## Introduction to Xarray

This notebook introduces the structure of an Xarray Dataset. The examples below use the [Impact Observatory 9-Class](https://docs.cecil.earth/Land-Cover-9-Class-111ef16bbbe481c0bb41e6e79ec441c8) dataset. To run this notebook yourself, swap out the data request ID for one of your own. You may also need to change some variable names and timestamps to match the dataset you're working with.

In [1]:
import cecil
import xarray
import matplotlib.pyplot as plt

client = cecil.Client()

### Load dataset with xarray

Each raster dataset available through Cecil is delivered in the form of an Xarray Dataset. 

A ['Dataset'](https://docs.xarray.dev/en/stable/user-guide/data-structures.html#dataset) is one of the data structures within the Xarray library. A single Dataset contains one or more [DataArrays](https://docs.xarray.dev/en/stable/user-guide/data-structures.html#dataarray), which correspond to different data variables. 

After creating a data request with Cecil, you can access the data by passing the data request ID to the client's `load_xarray()` method.

In [2]:
data_request_id = '1dd55fdb-fbab-4943-a984-4d327baf4356'

ds = client.load_xarray(data_request_id)
ds

By default, this notebook displays the dataset structure using the interactive 'html' style. To display the output as plain text instead, set the display style to 'text' or use `print(ds)`.

In [3]:
with xarray.set_options(display_style="text"):
    display(ds)

### Examine properties of a dataset

Xarray Datasets have four key properties:

- `dims`: dictionary that maps dimension names to the length of the dimension. For example, the dimensions {'x': 1500, 'y': 1000, 'time': 8} would correspond to a dataset with 1500 pixels in the x direction, 1000 pixels in the y direction, and 8 timesteps. 
- `coords`: dict-like container of DataArrays that are used to label points along the dataset's dimensions. For example, arrays representing each x and y coordinate value present in the Dataset, or datetime objects indicating the timesteps.
- `data_vars`: dict-like container of variables, where each data variable is represented as a single DataArray.
- `attrs`: dictionary holding metadata associated with the dataset. 

We will walk through how to access and interpret each of these properties below. 
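Before looking at the real dataset, it can help to see the four properties on a small synthetic Dataset. The sketch below is illustrative only: the array shape, coordinate values, and the `toy example` attribute are made up, and only the variable name `land_cover_class` is borrowed from the dataset used in this notebook. It uses `sizes`, which Xarray documents as the mapping from dimension name to length.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for a delivered dataset: 3 timesteps, 4 rows (y), 5 columns (x).
# All values here are invented for illustration.
times = np.array(["2020-01-01", "2021-01-01", "2022-01-01"], dtype="datetime64[ns]")

toy = xr.Dataset(
    data_vars={
        "land_cover_class": (("time", "y", "x"), np.zeros((3, 4, 5), dtype="uint8")),
    },
    coords={
        "time": times,
        "y": np.linspace(40.0, 10.0, 4),
        "x": np.linspace(0.0, 40.0, 5),
    },
    attrs={"dataset_name": "toy example"},
)

print(dict(toy.sizes))      # dimension name -> length
print(list(toy.coords))     # coordinate names
print(list(toy.data_vars))  # data variable names
print(toy.attrs)            # dataset-level metadata
```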

**Dimensions**

All datasets delivered by Cecil will have `x` and `y` spatial dimensions, and most will also have a `time` dimension. We can inspect the dimensions of a dataset and their sizes by running the following:

In [4]:
ds.dims



**Coordinates**

The data delivered through Cecil has coordinates that correspond to the dimensions of the data. These are called dimension coordinates (represented by the `*` symbol), and they are the actual values that fall along each of the dimensions of the dataset. The [Xarray Tutorial](https://tutorial.xarray.dev/intro.html) suggests it's helpful to think about coordinates as the tick marks along an axis, and dimensions as the axis labels. 

In this example, the coordinates `x`, `y`, and `time` specify the spatial locations of the pixels (x and y coordinates), and the timestep associated with each data layer (time coordinates). These coordinates are represented as 1D arrays. 

It is also possible to have coordinates that do not correspond to one of the dimensions of the dataset. These types of coordinates can be multidimensional. 
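The `spatial_ref` entry in the `ds.coords` output is one such coordinate: it has no `*` marker and no dimension. As a rough sketch of the multidimensional case, the example below attaches hypothetical 2D `lon` and `lat` coordinates to a tiny array. The names and values are invented for illustration and are not part of the Cecil dataset.

```python
import numpy as np
import xarray as xr

# Hypothetical 2 x 3 grid; all names and values are made up for illustration.
x = np.array([0.0, 10.0, 20.0])
y = np.array([50.0, 40.0])
lon2d, lat2d = np.meshgrid(x / 10.0, y / 10.0)  # both have shape (y, x) = (2, 3)

grid = xr.DataArray(
    np.arange(6).reshape(2, 3),
    dims=("y", "x"),
    coords={
        "x": x,                      # dimension coordinate (1D, named after its dimension)
        "y": y,                      # dimension coordinate
        "lon": (("y", "x"), lon2d),  # non-dimension coordinate, 2D
        "lat": (("y", "x"), lat2d),  # non-dimension coordinate, 2D
        "spatial_ref": 0,            # scalar coordinate with no dimension at all
    },
)

print(grid.coords)
```

In the printed output, only `x` and `y` should carry the `*` marker, since they are the only coordinates that share a name with a dimension.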

In [5]:
ds.coords

Coordinates:
  * x            (x) float64 43kB -1.779e+07 -1.779e+07 ... -1.773e+07
  * y            (y) float64 35kB 2.541e+06 2.541e+06 ... 2.494e+06 2.494e+06
    spatial_ref  int64 8B 0
  * time         (time) datetime64[ns] 64B 2017-01-01 2018-01-01 ... 2024-01-01

**Data Variables**

Each variable listed on the dataset's documentation page is represented in Xarray as a 'data variable'. Each data variable is a 2D or 3D array, depending on whether the dataset has a time dimension.

In the example below, there is only one data variable (`land_cover_class`) with three dimensions (time, y, x). We can see that the data itself is represented as a [Dask array](https://docs.dask.org/en/stable/array.html) with a chunk size of (1, 2000, 2000). Because Xarray can wrap chunked Dask arrays in this way, we can run analyses on very large arrays that may not fit into memory.

In [6]:
ds.data_vars

Data variables:
    land_cover_class  (time, y, x) uint8 188MB dask.array<chunksize=(1, 2000, 2000), meta=np.ndarray>
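To illustrate what working with a chunked array looks like, here is a minimal sketch using a synthetic Dask-backed DataArray. It assumes `dask` is installed alongside Xarray, and the shape, chunk sizes, and the choice of class value `0` are arbitrary illustration values rather than properties of the Cecil dataset.

```python
import dask.array as dask_array
import xarray as xr

# Synthetic lazy array; shape and chunking are arbitrary illustration values.
lazy = xr.DataArray(
    dask_array.zeros((4, 200, 200), chunks=(1, 100, 100), dtype="uint8"),
    dims=("time", "y", "x"),
    name="land_cover_class",
)

print(lazy.chunks)  # chunk sizes along each dimension

# Building the expression is lazy: no pixel values are computed yet.
frac_zero = (lazy == 0).mean(dim=("y", "x"))

# .compute() executes the task graph and returns a NumPy-backed DataArray.
result = frac_zero.compute()
print(result.values)
```

Only the per-timestep means are held in memory at the end, which is the property that makes this pattern useful for arrays larger than available RAM.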

As with the dataset itself, we can access the dimensions and coordinates associated with a data variable like this:

In [7]:
ds.land_cover_class.dims

('time', 'y', 'x')

In [8]:
ds.land_cover_class.coords

Coordinates:
  * x            (x) float64 43kB -1.779e+07 -1.779e+07 ... -1.773e+07
  * y            (y) float64 35kB 2.541e+06 2.541e+06 ... 2.494e+06 2.494e+06
    spatial_ref  int64 8B 0
  * time         (time) datetime64[ns] 64B 2017-01-01 2018-01-01 ... 2024-01-01
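Because coordinates label positions along each dimension, they can be used to select data by value rather than by array index. The sketch below shows this on a small synthetic array using `sel` (label-based) and `isel` (position-based). The coordinate values are invented, so adjust them to the coordinates of your own dataset.

```python
import numpy as np
import xarray as xr

# Small synthetic array with made-up coordinate values.
times = np.array(["2017-01-01", "2018-01-01", "2019-01-01"], dtype="datetime64[ns]")
arr = xr.DataArray(
    np.arange(3 * 2 * 4).reshape(3, 2, 4),
    dims=("time", "y", "x"),
    coords={"time": times, "y": [20.0, 10.0], "x": [0.0, 10.0, 20.0, 30.0]},
    name="land_cover_class",
)

# Label-based: pick the layer whose time coordinate equals 2018-01-01.
layer_2018 = arr.sel(time=np.datetime64("2018-01-01"))

# Label-based with nearest-neighbour matching: the pixel closest to (x=12, y=18).
pixel = arr.sel(x=12.0, y=18.0, method="nearest")

# Position-based: the first timestep, whatever its label is.
first = arr.isel(time=0)

print(layer_2018.dims, pixel.dims, first.dims)
```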

**Attributes**

The final property of an Xarray Dataset is its attributes. Attributes can hold metadata at the dataset level, or at the level of an individual data variable.

Cecil provides some attributes by default with all datasets delivered (e.g. dataset name, provider name, and IDs relevant to the Cecil platform).

In [9]:
ds.attrs

{'provider_name': 'Impact Observatory',
 'dataset_id': 'a4bb9aea-b6df-4d19-9083-38357f8fa76c',
 'dataset_name': 'Land Cover 9-Class',
 'dataset_crs': 'EPSG:3857',
 'aoi_id': 'd9532dde-cf4d-49f8-9b93-94c6d4a60618',
 'data_request_id': '1dd55fdb-fbab-4943-a984-4d327baf4356'}

We can access these attributes from the dataset and use them to pull information from the Cecil platform. For example, to find out the size of the AOI used for making the data request, we can run:

In [10]:
aoi_ha = client.get_aoi(ds.attrs['aoi_id']).hectares
print(f'AOI size in hectares: {aoi_ha:.2f}')

AOI size in hectares: 174325.32


Or to check when a data request was originally created:

In [11]:
created_at = client.get_data_request(ds.attrs['data_request_id']).created_at
print(f'Data request {data_request_id} was created_at: {created_at}')

Data request 1dd55fdb-fbab-4943-a984-4d327baf4356 was created_at: 2025-08-20 09:56:53.557000+00:00


As mentioned previously, attributes can apply to the dataset as a whole, or to individual data variables. To see the attributes associated with the `land_cover_class` data variable, we can run the following:

In [12]:
ds.land_cover_class.attrs

{'AREA_OR_POINT': 'Area',
 '_FillValue': np.uint8(255),
 'scale_factor': 1.0,
 'add_offset': 0.0}

These attributes are often added by default when the Xarray Dataset is created. One attribute worth paying attention to is `'_FillValue'`, which records the value used to represent NoData and can be used to filter or mask those pixels in a data variable.
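As a minimal sketch of that masking step, the example below builds a tiny synthetic `uint8` array with a `_FillValue` of 255 (mirroring the attributes shown above) and masks the NoData pixels with `where`. The pixel values are made up for illustration.

```python
import numpy as np
import xarray as xr

# Synthetic uint8 layer in which 255 marks NoData; pixel values are invented.
classes = xr.DataArray(
    np.array([[1, 2, 255], [255, 5, 5]], dtype="uint8"),
    dims=("y", "x"),
    attrs={"_FillValue": np.uint8(255)},
    name="land_cover_class",
)

fill_value = classes.attrs["_FillValue"]

# where() keeps values where the condition holds and inserts NaN elsewhere,
# which promotes the integer array to a floating-point dtype.
masked = classes.where(classes != fill_value)

# Count the pixels that hold real data.
valid_pixels = int(masked.notnull().sum())
print(valid_pixels)
```

For categorical data such as land cover classes, it may be preferable to keep the boolean mask (`classes != fill_value`) separate and apply it only when needed, so that the class values stay as integers.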