# Xarray and gridded data

Oceanographic data are often gridded: for example, we may record ocean temperature over a range of latitudes and longitudes, as well as a range of depths. Such data are inherently 3-dimensional.

Naively, we could store such data in a 3-dimensional numpy array. However, in doing so we lose information about the **coordinates** of the grid, e.g., the depth value that corresponds to index 3. What we need is a higher-dimensional equivalent of pandas, where information about the coordinates is stored alongside the data.

The third-party `xarray` module provides such an extension. As an added benefit, `xarray` also provides an interface for loading and saving netCDF files, a common external file format for gridded data.

To import xarray, run the line below (again, the `as xr` part is optional but standard).

In [1]:
import xarray as xr
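
To make the motivation above concrete, here is a minimal sketch that contrasts a bare numpy array with an xarray `DataArray` carrying the same numbers. The grid sizes, coordinate values, and the variable name `temperature` are all made up for illustration and are not part of the B-SOSE data.

```python
import numpy as np
import xarray as xr

# Hypothetical grid: 4 depths, 3 latitudes, 5 longitudes (all values are made up).
depth = np.array([5.0, 15.0, 30.0, 50.0])
lat = np.array([-60.0, -55.0, -50.0])
lon = np.array([0.0, 10.0, 20.0, 30.0, 40.0])

# A bare numpy array holds the numbers, but not what each index means.
temp_np = np.zeros((4, 3, 5))

# A DataArray stores the same numbers together with named dimensions
# and the coordinate labels along each dimension.
temp = xr.DataArray(
    temp_np,
    dims=("depth", "lat", "lon"),
    coords={"depth": depth, "lat": lat, "lon": lon},
    name="temperature",
)

# The depth value at index 3 can now be read off the object itself.
depth_at_index_3 = float(temp["depth"][3])
print(temp.dims)
print(depth_at_index_3)
```

With the numpy array alone, the mapping from index 3 to a depth would have to be tracked in a separate variable.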

## Loading and inspecting a netCDF file

For this week's pre-class readings, we will work with a subset of the B-SOSE (Southern Ocean State Estimate) model output. The data is stored in the netCDF file `bsose_monthly_velocities.nc` accessible [here](https://github.com/OCEAN-215-2025/preclass/tree/main/week_07/data/bsose_monthly_velocities.nc).

To load the data, we call the `xr.open_dataset()` function, and assign the result to a variable:

In [2]:
bsose = xr.open_dataset("data/bsose_monthly_velocities.nc")

We can inspect the data using the `display()` function. The output is an interactive HTML snippet with parts that you can expand or collapse:

In [3]:
display(bsose)

Notice that `bsose` is an xarray `Dataset` object. Note also that multiple sets of data ("Data variables") are stored alongside the labels of the coordinates ("Coordinates") in a single object. In addition, the dimensions (a.k.a. shape in numpy lingo) of the data are shown in the "Dimensions" section at the top. Furthermore, there is metadata ("Attributes") associated with the whole dataset, such as its name.

By clicking on the document-looking icon, you can see that each coordinate and data variable has metadata associated with it. For example, we see that `depth` is measured in m while `U` and `V` are measured in m/s. Finally, you can preview the actual data by clicking on the cylinder icon.
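
The same information shown in the interactive display can also be read programmatically. The sketch below builds a small stand-in `Dataset` so that it runs without the netCDF file; the variable names, sizes, and attribute values are hypothetical and only loosely mimic the structure described above.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for a velocity dataset; names, sizes and attributes are made up.
ds = xr.Dataset(
    data_vars={
        # each entry is (dimension names, data, attributes)
        "U": (("depth", "lat"), np.zeros((2, 3)), {"units": "m/s"}),
    },
    coords={
        "depth": ("depth", [5.0, 15.0], {"units": "m"}),
        "lat": ("lat", [-60.0, -55.0, -50.0]),
    },
    attrs={"name": "toy velocity dataset"},
)

print(dict(ds.sizes))              # dimension names and their lengths
print(list(ds.data_vars))          # names of the data variables
print(list(ds.coords))             # names of the coordinates
print(ds.attrs)                    # metadata attached to the whole dataset
print(ds["U"].attrs["units"])      # metadata attached to one data variable
print(ds["depth"].attrs["units"])  # metadata attached to one coordinate
```

Assuming the file loads as described, the same attributes (`sizes`, `data_vars`, `coords`, `attrs`) should be available on `bsose`.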

## Extracting parts of an xarray Dataset

As we have seen, an xarray `Dataset` consists of both data variables and coordinates, and each of these can be extracted from the `Dataset` object as an xarray `DataArray` object.

In [14]:
# extract the data variable `U` as an xarray DataArray
bsose["U"]

In [16]:
# extract the coordinate `lat` as an xarray DataArray
bsose.coords["lat"]

Furthermore, the internal data of a `DataArray` (usually a numpy `ndarray`) can be extracted using the `.values` attribute. For example, to extract the internal data of `U`:

In [21]:
# extract the internal data of a DataArray
Uarray = bsose["U"].values

# check that the data is stored as a plain numpy array
print(type(Uarray))

# check the shape of the numpy array
print(Uarray.shape)

<class 'numpy.ndarray'>
(12, 10, 147, 135)
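
A bare shape tuple no longer says which axis is which. As a small sketch on a made-up `Dataset` (the dimension names and sizes here are hypothetical, not those of the B-SOSE file), the dimension names kept on the `DataArray` can be paired with the shape of the extracted numpy array:

```python
import numpy as np
import xarray as xr

# Made-up dataset with one data variable on a (time, depth) grid.
ds = xr.Dataset(
    {"U": (("time", "depth"), np.arange(6.0).reshape(2, 3))},
    coords={"time": [0, 1], "depth": [5.0, 15.0, 30.0]},
)

u = ds["U"]          # a DataArray: data plus dimension names and coordinates
u_values = u.values  # a plain numpy array: data only

print(type(u).__name__)
print(type(u_values))

# Pair each dimension name with the length of the matching numpy axis.
shape_by_dim = dict(zip(u.dims, u_values.shape))
print(shape_by_dim)

# Coordinates are DataArrays too, so .values works on them as well.
depth_values = ds.coords["depth"].values
print(depth_values)
```

Applied to `bsose["U"]`, the same `zip` over `.dims` and `.shape` would label each of the four axis lengths printed above.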
