# 09: Cells and cell methods

Sometimes a data value does not represent a point, but instead some coordinate range that we can call a *cell*. Some examples could be:

* Maximum temperatures within each month
* Values in latitude/longitude bins
* Mean sea ice salinity from melting a chunk of an ice core between two depth values

In this tutorial we will look at some examples of how you can encode the limits of each cell in CF-NetCDF.

## Maximum monthly temperatures

Let's start with the basic code to create our xarray object for a time series of temperature values.

Units of months and years are not recommended in CF because their lengths vary (leap years, for example), which can cause confusion. Instead, we can express the first day of each month in 2023 as days since 1970-01-01 (the epoch).

In [3]:
import datetime

# Function to calculate days since 1970-01-01 for the start of each month in a year
def get_days_since_epoch(year):
    days_since_epoch = []
    for month in range(1, 13):
        first_day_of_month = datetime.date(year, month, 1)
        days_since_epoch.append((first_day_of_month - datetime.date(1970, 1, 1)).days)
    return days_since_epoch

# Get days since 1970-01-01 for the start of each month in 2023
days_since_epoch_2023 = get_days_since_epoch(2023)
print(days_since_epoch_2023)


[19358, 19389, 19417, 19448, 19478, 19509, 19539, 19570, 19601, 19631, 19662, 19692]
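A quick way to confirm that these numbers mean what we expect is to decode them back to calendar dates. The sketch below is only an illustration; `decode_days` and `EPOCH` are hypothetical names introduced here, not part of any library.

```python
import datetime

# Reference date used by the 'days since 1970-01-01' units
EPOCH = datetime.date(1970, 1, 1)

def decode_days(days_since_epoch):
    """Convert a count of days since 1970-01-01 back to a calendar date."""
    return EPOCH + datetime.timedelta(days=int(days_since_epoch))

# First and last values from the list printed above
print(decode_days(19358))  # 2023-01-01
print(decode_days(19692))  # 2023-12-01
```

The same check works for any other value stored with these units, such as the limits of a cell.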


In [2]:
import xarray as xr

days_since_epoch = [19358, 19389, 19417, 19448, 19478, 19509, 19539, 19570, 19601, 19631, 19662, 19692]
maximum_monthly_temperatures = [4.6, 5.2, 7.1, 12.3, 17.8, 21.3, 24.6, 22.8, 19.0, 14.2, 8.8, 6.1]

xrds = xr.Dataset(
    coords={
        'time': days_since_epoch
    },
    data_vars={
        'Temperature': ('time', maximum_monthly_temperatures)
    } 
)

xrds['Temperature'].attrs = {
    'standard_name': 'air_temperature',
    'long_name': 'Maximum air temperatures per month',
    'units': 'degree_Celsius',
    'coverage_content_type': 'physicalMeasurement'
}
xrds['time'].attrs = {
    'standard_name': 'time',
    'long_name': 'time in months',
    'units': 'days since 1970-01-01',
    'coverage_content_type': 'coordinate'
}
xrds

But our values are not representative of just one day; they are maximum values for each month! So we need to include bounds for each month. First, we need to create a 2D array with one row per month that holds the start and end of the cell. The bounds must be in the same units as the `time` coordinate, so here they are also days since 1970-01-01.

In [3]:
import numpy as np
time_bounds_2d_array = np.array([
    [19358, 19388],
    [19389, 19416],
    [19417, 19447],
    [19448, 19477],
    [19478, 19508],
    [19509, 19538],
    [19539, 19569],
    [19570, 19600],
    [19601, 19630],
    [19631, 19661],
    [19662, 19691],
    [19692, 19722]
])
time_bounds_2d_array

array([[19358, 19388],
       [19389, 19416],
       [19417, 19447],
       [19448, 19477],
       [19478, 19508],
       [19509, 19538],
       [19539, 19569],
       [19570, 19600],
       [19601, 19630],
       [19631, 19661],
       [19662, 19691],
       [19692, 19722]])

Maybe instead you have the start and end times in a pandas dataframe. Here is an example of how to create the 2D array from a dataframe. We create a dummy dataframe first.

In [4]:
# Function to create a pandas DataFrame of start and end days for each month in a year
import datetime
import pandas as pd

epoch = datetime.date(1970, 1, 1)

def create_dataframe(year):
    months = []
    for month in range(1, 13):
        first_day_of_month = datetime.date(year, month, 1)
        # First day of the following month (January of the next year after December)
        first_day_of_next_month = datetime.date(year, month + 1, 1) if month != 12 else datetime.date(year + 1, 1, 1)
        last_day_of_month = first_day_of_next_month - datetime.timedelta(days=1)
        # Store both limits as days since 1970-01-01, the same units as the time coordinate
        months.append({
            'Month': month,
            'Start Day': (first_day_of_month - epoch).days,
            'End Day': (last_day_of_month - epoch).days
        })
    return pd.DataFrame(months)

# Create a pandas DataFrame for the start and end days of each month in 2023
df_2023 = create_dataframe(2023)
df_2023

,Month,Start Day,End Day
0,1,19358,19388
1,2,19389,19416
2,3,19417,19447
3,4,19448,19477
4,5,19478,19508
5,6,19509,19538
6,7,19539,19569
7,8,19570,19600
8,9,19601,19630
9,10,19631,19661
10,11,19662,19691
11,12,19692,19722


In [5]:
time_bounds_2d_array = df_2023[['Start Day', 'End Day']].to_numpy()
time_bounds_2d_array

array([[19358, 19388],
       [19389, 19416],
       [19417, 19447],
       [19448, 19477],
       [19478, 19508],
       [19509, 19538],
       [19539, 19569],
       [19570, 19600],
       [19601, 19630],
       [19631, 19661],
       [19662, 19691],
       [19692, 19722]])

Now we need to create a new variable in our xarray object for the time bounds. This variable needs 2 dimensions: `time` of course, but also another dimension that we will call `nv` for number of vertices. This dimension has a length of 2 in this case because each cell is defined only by its minimum and maximum time. Assigning the variable with both dimension names creates the `nv` dimension automatically, so `Temperature` keeps its single `time` dimension.

In [6]:
# Assigning a variable that uses a new dimension name creates that dimension
xrds['time_bounds'] = (['time', 'nv'], time_bounds_2d_array)
xrds

Now we need to add metadata to make this machine readable. Below we are saying that the `time_bounds` variable defines the bounds of the `time` variable. We are using `cell_methods` to state that values are the maximums within each cell with respect to time.

In [7]:
xrds['time'].attrs['bounds'] = 'time_bounds'
xrds['Temperature'].attrs['cell_methods'] = 'time: maximum'
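Before publishing, it can be worth checking that the bounds variable really matches the coordinate it describes. The following is a minimal sketch rather than an official CF checker: `check_bounds` is a hypothetical helper, and the three-month dataset is a shortened stand-in for the one built above.

```python
import numpy as np
import xarray as xr

# Hypothetical three-month example, in days since 1970-01-01
time = np.array([19358, 19389, 19417])
time_bounds = np.array([[19358, 19388], [19389, 19416], [19417, 19447]])

ds = xr.Dataset(
    coords={'time': time},
    data_vars={
        'Temperature': ('time', [4.6, 5.2, 7.1]),
        'time_bounds': (['time', 'nv'], time_bounds),
    },
)
ds['time'].attrs['bounds'] = 'time_bounds'
ds['Temperature'].attrs['cell_methods'] = 'time: maximum'

def check_bounds(ds, coord):
    """Return True if the bounds variable named by `coord` looks consistent."""
    bounds = ds[ds[coord].attrs['bounds']]
    values = ds[coord].values
    lower, upper = bounds.values[:, 0], bounds.values[:, 1]
    return bool(
        bounds.dims == (coord, 'nv')       # one row per coordinate value
        and bounds.sizes['nv'] == 2        # a lower and an upper limit
        and np.all(lower <= values)        # each value lies inside its cell
        and np.all(values <= upper)
    )

print(check_bounds(ds, 'time'))
```

The check only looks at shape and ordering; it does not verify that the bounds are in the right units.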

A full list of possible cell methods that you can use is provided here: https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#appendix-cell-methods

This includes `maximum`, `minimum`, `mean`, `median`, `standard_deviation` and more.
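If you publish several statistics for the same cells, each variable can carry its own `cell_methods` attribute. Here is a small sketch under assumed conditions: the daily values, the 3-day cells and the variable names are all made up for illustration.

```python
import numpy as np
import xarray as xr

# Hypothetical daily temperatures for two 3-day cells (one row per cell)
daily = np.array([[2.0, 4.0, 6.0],
                  [1.0, 5.0, 9.0]])

ds = xr.Dataset(
    coords={'time': [0, 3]},  # start of each cell, in days since some reference date
    data_vars={
        'temperature_max': ('time', daily.max(axis=1)),
        'temperature_min': ('time', daily.min(axis=1)),
        'temperature_mean': ('time', daily.mean(axis=1)),
        'time_bounds': (['time', 'nv'], [[0, 3], [3, 6]]),
    },
)
ds['time'].attrs['bounds'] = 'time_bounds'

# Each statistic states how it was derived within the cell
methods = {
    'temperature_max': 'maximum',
    'temperature_min': 'minimum',
    'temperature_mean': 'mean',
}
for name, method in methods.items():
    ds[name].attrs['standard_name'] = 'air_temperature'
    ds[name].attrs['units'] = 'degree_Celsius'
    ds[name].attrs['cell_methods'] = f'time: {method}'
```

All three variables share the same `time` coordinate and bounds; only the `cell_methods` attribute differs.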

## Ice core data

Here is a full example for ice core data. You might have a pandas dataframe similar to the one we create below.

In [8]:
import pandas as pd

# Sample data
data = {
    'minimum_depth': [0, 10, 20, 30],
    'maximum_depth': [10, 20, 30, 38],
    'salinity': [35.4, 35.6, 35.5, 35.2]
}

# Creating the DataFrame
df = pd.DataFrame(data)
df

,minimum_depth,maximum_depth,salinity
0,0,10,35.4
1,10,20,35.6
2,20,30,35.5
3,30,38,35.2


Here is the code to get this into an xarray object with bounds:

In [9]:
xrds = xr.Dataset(
    coords={
        'depth': df['minimum_depth'] # Could be any value within the cell
    },
    data_vars={
        'Salinity': ('depth', df['salinity'])
    } 
)

depths_2d_array = df[['minimum_depth', 'maximum_depth']].to_numpy()

# Assigning with both dimension names creates the nv dimension automatically,
# so Salinity keeps its single depth dimension
xrds['depth_bounds'] = (['depth', 'nv'], depths_2d_array)

xrds['Salinity'].attrs = {
    'standard_name': 'sea_ice_salinity',
    'long_name': 'Salinity of sea ice measured by melting chunks of an ice core',
    'units': '1e-3',
    'coverage_content_type': 'physicalMeasurement',
    'cell_methods': 'depth: mean'
}
xrds['depth'].attrs = {
    'standard_name': 'depth',
    'long_name': 'depth in ice core',
    'units': 'cm',
    'positive': 'down',
    'coverage_content_type': 'coordinate',
    'bounds': 'depth_bounds'
}
xrds
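Because the coordinate can be any value within the cell, you might prefer the midpoint of each chunk instead of its minimum depth. The sketch below shows one way to derive midpoints and chunk thicknesses from the bounds. It is an illustration using the same sample values, not the only valid choice.

```python
import numpy as np
import xarray as xr

# Lower and upper depth of each chunk, in cm (same sample values as above)
depth_bounds = np.array([[0, 10], [10, 20], [20, 30], [30, 38]])

# Midpoint of each cell, used here as the coordinate value
midpoints = depth_bounds.mean(axis=1)
# Thickness of each chunk, which the bounds make recoverable
thickness = depth_bounds[:, 1] - depth_bounds[:, 0]

ds = xr.Dataset(
    coords={'depth': midpoints},
    data_vars={
        'Salinity': ('depth', [35.4, 35.6, 35.5, 35.2]),
        'depth_bounds': (['depth', 'nv'], depth_bounds),
    },
)
ds['depth'].attrs = {
    'standard_name': 'depth',
    'units': 'cm',
    'positive': 'down',
    'bounds': 'depth_bounds',
}
ds['Salinity'].attrs['cell_methods'] = 'depth: mean'

print(midpoints)
print(thickness)
```

Whichever value you choose for the coordinate, the bounds are what tell the user the true extent of each chunk.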



## Cell methods when you have multiple bounds

Suppose you have some wind speed model data for a grid of latitude and longitude points. Your model gives you one value for each 0.1 degree cell. You export some variables from your model every second, but you only want to publish the mean values per hour.

You want to specify the beginning and end time of your cell in your NetCDF file, and state that values are the mean values per hour.

In this case, you will have bounds for your latitude, longitude and time.

In [7]:
import datetime
import numpy as np
import xarray as xr

latitudes = np.arange(58, 62.1, 0.1)
longitudes = np.arange(9, 13, 0.1)
# Start of each one-hour cell
timestamps_hours = [
    datetime.datetime(2023, 6, 11, 0, 0, 0),
    datetime.datetime(2023, 6, 11, 1, 0, 0),
    datetime.datetime(2023, 6, 11, 2, 0, 0),
    datetime.datetime(2023, 6, 11, 3, 0, 0),
    datetime.datetime(2023, 6, 11, 4, 0, 0),
    datetime.datetime(2023, 6, 11, 5, 0, 0)
]

num_latitudes = len(latitudes)
num_longitudes = len(longitudes)
num_timestamps = len(timestamps_hours)

# Generating random wind speeds (assuming a range of 0 to 5 m/s for demonstration purposes)
mean_wind_speeds = np.random.rand(num_latitudes, num_longitudes, num_timestamps) * 5

xrds = xr.Dataset(
    coords={
        'time': timestamps_hours,
        'latitude': latitudes,
        'longitude': longitudes
    },
    data_vars={
        'wind_speed': (['latitude', 'longitude', 'time'], mean_wind_speeds)
    }
)
xrds
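To show how the three sets of bounds might fit together, here is a minimal, self-contained sketch on a much smaller grid. Several things in it are assumptions made for illustration: the coordinate values are taken to be cell centres, time is stored as hours since an arbitrary reference time, `edges_to_bounds` is a hypothetical helper, and only the hourly mean is recorded in `cell_methods`.

```python
import numpy as np
import xarray as xr

def edges_to_bounds(edges):
    """Turn n+1 cell edges into an (n, 2) array of [lower, upper] bounds."""
    edges = np.asarray(edges)
    return np.column_stack([edges[:-1], edges[1:]])

# Cell edges for a small illustrative grid (0.1 degree cells, 1 hour cells)
lat_bounds = edges_to_bounds(np.linspace(58.0, 58.5, 6))   # 5 latitude cells
lon_bounds = edges_to_bounds(np.linspace(9.0, 9.4, 5))     # 4 longitude cells
time_bounds = edges_to_bounds(np.arange(0, 4))             # 3 hourly cells

# Assumption: spatial coordinates are cell centres, time is the start of the hour
latitudes = lat_bounds.mean(axis=1)
longitudes = lon_bounds.mean(axis=1)
times = time_bounds[:, 0]

# Random wind speeds between 0 and 5 m/s, for demonstration only
rng = np.random.default_rng(0)
mean_wind_speeds = rng.random((len(latitudes), len(longitudes), len(times))) * 5

ds = xr.Dataset(
    coords={'time': times, 'latitude': latitudes, 'longitude': longitudes},
    data_vars={
        'wind_speed': (['latitude', 'longitude', 'time'], mean_wind_speeds),
        'time_bounds': (['time', 'nv'], time_bounds),
        'latitude_bounds': (['latitude', 'nv'], lat_bounds),
        'longitude_bounds': (['longitude', 'nv'], lon_bounds),
    },
)

ds['time'].attrs = {
    'standard_name': 'time',
    'units': 'hours since 2023-06-11T00:00:00Z',  # assumed reference time
    'bounds': 'time_bounds',
}
ds['latitude'].attrs = {
    'standard_name': 'latitude',
    'units': 'degrees_north',
    'bounds': 'latitude_bounds',
}
ds['longitude'].attrs = {
    'standard_name': 'longitude',
    'units': 'degrees_east',
    'bounds': 'longitude_bounds',
}
ds['wind_speed'].attrs = {
    'standard_name': 'wind_speed',
    'units': 'm s-1',
    'cell_methods': 'time: mean',  # values are means over each hourly cell
}
```

All three bounds variables share the same `nv` dimension of length 2, and each coordinate points to its own bounds variable through the `bounds` attribute.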
