# `xarray` 

`xarray`:

- Python package
- augments NumPy arrays by adding labeled dimensions, coordinates, and attributes
- based on the NetCDF data model

Today: learn `xarray.DataArray` and `xarray.Dataset`

## `xarray.DataArray`

- Primary object of `xarray`
- An n-dimensional array with **labeled dimensions**
- Represents a single variable in the NetCDF data format: holds the variable's values, dimensions, and attributes

In `xarray`, each dimension has a set of **coordinates**. A dimension's coordinates are the values along that dimension (like tick labels along an axis).

### Create an `xarray.DataArray`

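We will build an `xarray.DataArray` step by step from the pieces of our example: the variable values, the dimensions and coordinates, and the attributes.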

In [1]:
import pandas as pd
import numpy as np
import xarray as xr

**Variable values**

The underlying data in an `xarray.DataArray` is a `numpy.ndarray` that holds the variable's values.

We start by making a NumPy array of our mock temperature data:

In [2]:
# values of a single variable (temperature) at each point of the coords
temp_data = np.array([np.zeros((5,5)),
         np.ones((5,5)),
         np.ones((5,5))*2]).astype(int)

temp_data

array([[[0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0]],

       [[1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1]],

       [[2, 2, 2, 2, 2],
        [2, 2, 2, 2, 2],
        [2, 2, 2, 2, 2],
        [2, 2, 2, 2, 2],
        [2, 2, 2, 2, 2]]])

**Dimensions and coordinates**

To specify the dimensions of our `xarray.DataArray` let's think about how we constructed the `np.array` which holds the data.

We have that:

- 1st dimension: time, coords: 2022-09-01, 2022-09-02, 2022-09-03
- 2nd dimension: latitude, coords: from 70 to 30, decreasing by 10
- 3rd dimension: longitude, coords: from 60 to 100, increasing by 10

Add dims and coords:

In [3]:
# names of dimensions in the required order
dims = ('time', 'lat', 'lon')

# create coords along each dimension (dictionary)
coords = {'time' : pd.date_range('2022-09-01', '2022-09-03'),
          'lat' : np.arange(70, 20, -10),
          'lon' : np.arange(60, 110, 10)}

**Attributes**

In [4]:
# add the attributes (metadata) as a dictionary
attrs = {'title' : 'temp across weather stations',
        'standard_name' : 'air_temperature',
        'units' : 'degree_c'}

**Combine**

In [5]:
# initialize xarray.DataArray
temp = xr.DataArray(data = temp_data,
                   dims = dims,
                   coords = coords,
                   attrs = attrs)

temp
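
Once the object exists, its parts can be read back individually. The following is a minimal sketch, not part of the original notebook, that rebuilds the same mock `temp` array so it runs on its own and then prints the dimension names, the shape, one set of coordinates, one attribute, and the type of the underlying data.

```python
import numpy as np
import pandas as pd
import xarray as xr

# rebuild the same mock DataArray so this cell runs on its own
temp = xr.DataArray(
    data=np.array([np.zeros((5, 5)),
                   np.ones((5, 5)),
                   np.ones((5, 5)) * 2]).astype(int),
    dims=('time', 'lat', 'lon'),
    coords={'time': pd.date_range('2022-09-01', '2022-09-03'),
            'lat': np.arange(70, 20, -10),
            'lon': np.arange(60, 110, 10)},
    attrs={'title': 'temp across weather stations',
           'standard_name': 'air_temperature',
           'units': 'degree_c'},
)

print(temp.dims)             # dimension names, in order
print(temp.shape)            # length of each dimension
print(temp['lat'].values)    # coordinate values along 'lat'
print(temp.attrs['units'])   # a single metadata entry
print(type(temp.values))     # the underlying NumPy array
```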

## Subsetting 

To select data from an `xarray.DataArray` we need to specify the subsets we want along each dimension. We can do this in two ways:

- relying on the dimension's positions (**dimension lookup by position**)
- by calling each dimension by its name (**dimension lookup by name**)

**Example**

We want the temperature recorded by the weather station located at 40N 80E on Sept 1, 2022.
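
One possible way to write this selection is sketched below; it is not taken from the original notebook. The cell rebuilds the mock `temp` array so it runs on its own. The variable names `by_position`, `by_name_index`, and `by_name_label` are illustrative. With the coordinates defined earlier, Sept 1 is index 0 along `time`, 40N is index 3 along `lat`, and 80E is index 2 along `lon`.

```python
import numpy as np
import pandas as pd
import xarray as xr

# rebuild the same mock DataArray so this cell runs on its own
temp = xr.DataArray(
    data=np.array([np.zeros((5, 5)),
                   np.ones((5, 5)),
                   np.ones((5, 5)) * 2]).astype(int),
    dims=('time', 'lat', 'lon'),
    coords={'time': pd.date_range('2022-09-01', '2022-09-03'),
            'lat': np.arange(70, 20, -10),
            'lon': np.arange(60, 110, 10)},
)

# dimension lookup by position: indices follow the order (time, lat, lon)
by_position = temp[0, 3, 2]

# dimension lookup by name, using integer positions along each dimension
by_name_index = temp.isel(time=0, lat=3, lon=2)

# dimension lookup by name, using coordinate labels
by_name_label = temp.sel(time='2022-09-01', lat=40, lon=80)

print(by_name_label.item())
```

Lookup by name does not depend on the order of the dimensions, so it keeps working if the array is transposed.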

## Reduction

`xarray` provides reduction methods such as `mean`, `sum`, `min`, and `max` that collapse an `xarray.DataArray` along one or more dimensions.

**Example**

Calculate average temperature at each station over time:

In [6]:
# average over the time dimension; the result keeps the lat and lon dimensions
avg_temp = temp.mean(dim = 'time')
avg_temp

In [8]:
# replace the attributes so they describe the averaged data
avg_temp.attrs = {'title' : 'average temperature over three days'}
avg_temp
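
A reduction can also collapse several dimensions at once, or all of them. The sketch below is an illustration rather than part of the original notebook: it rebuilds the mock `temp` array, averages over both spatial dimensions, and takes the overall maximum while passing `keep_attrs=True` explicitly so the metadata is carried over. The names `daily_mean` and `overall_max` are illustrative.

```python
import numpy as np
import pandas as pd
import xarray as xr

# rebuild the same mock DataArray so this cell runs on its own
temp = xr.DataArray(
    data=np.array([np.zeros((5, 5)),
                   np.ones((5, 5)),
                   np.ones((5, 5)) * 2]).astype(int),
    dims=('time', 'lat', 'lon'),
    coords={'time': pd.date_range('2022-09-01', '2022-09-03'),
            'lat': np.arange(70, 20, -10),
            'lon': np.arange(60, 110, 10)},
    attrs={'title': 'temp across weather stations',
           'standard_name': 'air_temperature',
           'units': 'degree_c'},
)

# collapse both spatial dimensions: one mean value per day
daily_mean = temp.mean(dim=['lat', 'lon'])

# collapse every dimension; keep_attrs=True carries the metadata over
overall_max = temp.max(keep_attrs=True)

print(daily_mean.values)
print(overall_max.item(), overall_max.attrs['units'])
```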

## `xarray.Dataset`

`xarray.Dataset`:

- resembles an in-memory representation of a NetCDF file
- consists of *multiple* variables (each variable is an `xarray.DataArray`)
- self-describing
- attributes can belong to a variable, a dimension, or describe the whole dataset
- variables in an `xarray.Dataset` can have the same dimensions, share some dimensions, or have no dimensions in common

**Example**

Combine the `temp` and `avg_temp` data into a single object:

In [9]:
# make dictionaries with variables and attributes
data_vars = {
    'avg_temp' : avg_temp,
    'temp' : temp
}

attrs = {'title' : 'temperature data at weather stations: daily and average',
        'description' : 'simple example of an xarray.Dataset'}

# create xarray.Dataset
temp_dataset = xr.Dataset(data_vars = data_vars,
                         attrs = attrs)
temp_dataset
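
Each variable in the dataset can be pulled back out by name as an `xarray.DataArray`. The sketch below is an illustration rather than part of the original notebook; it rebuilds a pared-down version of the dataset (attributes omitted) so it runs on its own. It shows that `temp` and `avg_temp` share the `lat` and `lon` dimensions, while only `temp` has `time`.

```python
import numpy as np
import pandas as pd
import xarray as xr

# rebuild the mock data so this cell runs on its own (attributes omitted)
temp = xr.DataArray(
    data=np.array([np.zeros((5, 5)),
                   np.ones((5, 5)),
                   np.ones((5, 5)) * 2]).astype(int),
    dims=('time', 'lat', 'lon'),
    coords={'time': pd.date_range('2022-09-01', '2022-09-03'),
            'lat': np.arange(70, 20, -10),
            'lon': np.arange(60, 110, 10)},
)
avg_temp = temp.mean(dim='time')
temp_dataset = xr.Dataset(data_vars={'avg_temp': avg_temp, 'temp': temp})

print(list(temp_dataset.data_vars))    # names of the variables
print(temp_dataset['temp'].dims)       # dimensions of the daily data
print(temp_dataset['avg_temp'].dims)   # dimensions of the averaged data
print(dict(temp_dataset.sizes))        # length of each dimension in the dataset
```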