# `xarray` day

`xarray`:
- Python package
- augments NumPy arrays by adding labeled dimensions, coordinates, and attributes
- based on the NetCDF data model
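The sketch below compares a plain NumPy reduction, where you refer to an axis by its position, with the same reduction on an `xarray.DataArray`, where you refer to the axis by its name. This is a minimal illustration only. The values and the dimension names `time` and `station` are made up.

```python
import numpy as np
import xarray as xr

# Hypothetical data: 2 time steps x 3 stations
values = np.array([[10.0, 12.0, 14.0],
                   [20.0, 22.0, 24.0]])

# NumPy: we must remember that axis 0 is time
numpy_mean = values.mean(axis=0)

# xarray: same data, but the dimensions have names
da = xr.DataArray(values, dims=("time", "station"))
xarray_mean = da.mean(dim="time")  # no need to remember the axis order

print(numpy_mean)          # [15. 17. 19.]
print(xarray_mean.values)  # [15. 17. 19.]
print(xarray_mean.dims)    # ('station',)
```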


Today: Learn about `xarray.DataArray` and `xarray.Dataset`

## `xarray.DataArray`

- primary object of `xarray`
- an n-dimensional array with **labeled dimensions**
- represents a single variable in the NetCDF data format: holds the variable's values, dimensions, and attributes

In `xarray`, each dimension has a set of **coordinates**. A dimension's coordinates give the values along that dimension (the tick labels along the dimension).
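To make the "tick labels" idea concrete, here is a small sketch of a 1-D `DataArray` whose `time` coordinate labels each data value. The readings are hypothetical.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical daily readings at a single station
readings = xr.DataArray(
    np.array([15.0, 17.5, 16.0]),
    dims=("time",),
    coords={"time": pd.date_range("2022-09-01", periods=3)},  # tick labels along "time"
)

# The coordinate is stored alongside the data, on the same dimension
print(readings["time"].values)

# Each position along "time" pairs one tick label with one data value
for label, value in zip(readings["time"].values, readings.values):
    print(str(label)[:10], value)
```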

## Let's create an `xarray.DataArray`

We will use the mock temperature data from our example: readings at a 5 x 5 grid of weather stations over three days.

In [2]:
import pandas as pd 
import numpy as np
import xarray as xr #this is new!

**Variable values**

The underlying data in an `xr.DataArray` is a NumPy array (`numpy.ndarray`) that holds the variable's values. We start by making a NumPy array of our mock temperature data.

In [3]:
# REMINDER:
np.zeros((5, 5))  # gives us a 5 x 5 array of 0s

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

In [4]:
# Make a NumPy array by stacking three 5 x 5 arrays (filled with 0, 1, and 2): shape (3, 5, 5)
# Each value is the temperature (a single variable) at one point of the coordinates
temp_data = np.array([np.zeros((5,5)), np.ones((5,5)), np.ones((5,5)) * 2]).astype(int)
temp_data

array([[[0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0]],

       [[1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1]],

       [[2, 2, 2, 2, 2],
        [2, 2, 2, 2, 2],
        [2, 2, 2, 2, 2],
        [2, 2, 2, 2, 2],
        [2, 2, 2, 2, 2]]])
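The same array can also be built with NumPy broadcasting. This is only an alternative sketch (the cell above already works): a day index of shape (3, 1, 1) is multiplied by a 5 x 5 array of ones, which stretches it into shape (3, 5, 5).

```python
import numpy as np

# Day index 0, 1, 2 reshaped to (3, 1, 1) so it broadcasts against a 5 x 5 grid
day_values = np.arange(3).reshape(3, 1, 1)
temp_data_alt = day_values * np.ones((5, 5), dtype=int)

print(temp_data_alt.shape)  # (3, 5, 5)
```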

**Dimensions and coordinates**

To specify the dimensions of our `xr.DataArray`, let's think about how we constructed the `np.array` that holds the data. Remember: we index from the top left corner of a matrix, and each 5 x 5 slice of our array works the same way, so the first row is the northernmost latitude and the first column is the westernmost longitude. This is why our latitude coordinates will be in decreasing order and our longitude coordinates in increasing order.

We have that:
- 1st dimension = time, coordinates: 2022-09-01, 2022-09-02, 2022-09-03
- 2nd dimension = latitude, coordinates: from 70 to 30, decreasing by 10
- 3rd dimension = longitude, coordinates: from 60 to 100, increasing by 10

Adding dimensions and coordinates to our array:
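A quick sketch of how these coordinates can be generated. Note that `pd.date_range` includes both endpoints, while `np.arange` excludes its stop value, so the stop has to be one step past the last coordinate (20 for latitude, 110 for longitude).

```python
import numpy as np
import pandas as pd

# pd.date_range includes both endpoints: 3 daily timestamps
time = pd.date_range("2022-09-01", "2022-09-03")

# np.arange excludes the stop value, so the stop goes one step past the last coordinate
lat = np.arange(70, 20, -10)   # rows, top to bottom: 70, 60, 50, 40, 30
lon = np.arange(60, 110, 10)   # columns, left to right: 60, 70, 80, 90, 100

print(len(time), len(lat), len(lon))  # 3 5 5
```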

In [5]:
# Names of the dimensions, in the same order as the axes of temp_data
dims = ('time', 'lat', 'lon')

# Coordinates along each dimension, as a dictionary mapping dimension name to tick labels:
# the date range for time and the lat/lon values of the stations
coords = {'time': pd.date_range('2022-09-01', '2022-09-03'),
          'lon': np.arange(60, 110, 10),
          'lat': np.arange(70, 20, -10)}
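Two details about `dims` and `coords` are worth checking, shown here on a small hypothetical 3 x 5 array: the order of the dimensions comes from the `dims` tuple (the key order in `coords` does not matter), and a coordinate whose length does not match the data along its dimension is rejected with a `ValueError`.

```python
import numpy as np
import xarray as xr

data = np.zeros((3, 5))

# The order of dims follows the `dims` tuple, not the order of keys in `coords`
ok = xr.DataArray(data, dims=("time", "lat"),
                  coords={"lat": np.arange(70, 20, -10), "time": [0, 1, 2]})
print(ok.dims)  # ('time', 'lat')

# Only 4 latitude labels for 5 latitude positions: xarray refuses to build it
try:
    xr.DataArray(data, dims=("time", "lat"), coords={"lat": np.arange(70, 30, -10)})
    mismatch_error = None
except ValueError as err:
    mismatch_error = err
print(mismatch_error)
```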

**Attributes**

In [6]:
# Add the attributes (metadata) as a dictionary
attrs = {'title': 'temp across weather stations', 
        'standard_name': 'air_temperature',
        'units': 'degree_c'}
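Attributes are stored as a plain Python dictionary, so they can also be edited after the array is created, and coordinates can carry attributes of their own. A minimal sketch on a hypothetical 1-D array; the `degrees_north` unit on the coordinate is an illustrative choice, not part of our example.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.array([1, 2, 3]),
    dims=("lat",),
    # A coordinate can be given as (dims, values, attrs) to attach its own metadata
    coords={"lat": ("lat", [70, 60, 50], {"units": "degrees_north"})},
    attrs={"units": "degree_c"},
)

# The variable's attrs behave like a dict: add a key after creation
da.attrs["standard_name"] = "air_temperature"

print(da.attrs)         # {'units': 'degree_c', 'standard_name': 'air_temperature'}
print(da["lat"].attrs)  # {'units': 'degrees_north'}
```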

In [7]:
# Now we combine everything we've created into our DataArray

# Initialize the xarray.DataArray
temp = xr.DataArray(data=temp_data, dims=dims, coords=coords, attrs=attrs)
temp  # here we can see all the information we have on the array


# The sizes in parentheses in the output, (time: 3, lat: 5, lon: 5), tell us we have 3 time points, 5 latitude points, and 5 longitude points
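The sizes shown in parentheses can also be read programmatically. The sketch below rebuilds the `temp` array from this notebook so it runs on its own, then inspects `sizes`, `shape`, and one attribute.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Rebuild the temperature DataArray from this notebook
temp_data = np.array([np.zeros((5, 5)), np.ones((5, 5)), np.ones((5, 5)) * 2]).astype(int)
temp = xr.DataArray(
    data=temp_data,
    dims=("time", "lat", "lon"),
    coords={"time": pd.date_range("2022-09-01", "2022-09-03"),
            "lat": np.arange(70, 20, -10),
            "lon": np.arange(60, 110, 10)},
    attrs={"title": "temp across weather stations",
           "standard_name": "air_temperature",
           "units": "degree_c"},
)

print(dict(temp.sizes))     # {'time': 3, 'lat': 5, 'lon': 5}
print(temp.shape)           # (3, 5, 5)
print(temp.attrs["units"])  # degree_c
```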

## Subsetting Data

To select data from an `xarray.DataArray`, we need to specify the subset we want along each dimension.
We can do this in two ways:
- relying on the dimensions' positions (**dimension lookup by position**)
- calling each dimension by its name (**dimension lookup by name**)

In either case, we can index with integer positions or with coordinate labels.
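Both kinds of lookup also accept ranges. The sketch below uses a hypothetical 5 x 5 grid with the same lat/lon layout as our example. Integer slices (`isel`) exclude the end, like NumPy, while label slices (`sel`) include both ends and follow the coordinate's order, so our decreasing latitudes are sliced from high to low.

```python
import numpy as np
import xarray as xr

# Hypothetical grid with the same lat/lon coordinates as our example
grid = xr.DataArray(
    np.arange(25).reshape(5, 5),
    dims=("lat", "lon"),
    coords={"lat": np.arange(70, 20, -10), "lon": np.arange(60, 110, 10)},
)

# By name, integer positions: end-exclusive slices
by_position = grid.isel(lat=slice(1, 4), lon=slice(0, 2))

# By name, labels: end-inclusive slices, high-to-low for decreasing latitude
by_label = grid.sel(lat=slice(60, 40), lon=slice(60, 70))

print(by_position.equals(by_label))  # True
```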

**Example**
We want the temperature recorded by the weather station located at 40N, 80E on September 1, 2022.

# Dimension lookup by position, then integers for indexing (similar to a NumPy array)
# This way is not as easy: we need to know the order of the dimensions and the position of each value

temp[0, 3, 2]

# 0 = first time step (2022-09-01), 3 = fourth latitude (40, the row), 2 = third longitude (80, the column)

In [7]:
# Dimension lookup by position, then labels for indexing
temp.loc['2022-09-01', 40, 80]
# This way is easier because we can input the coordinate values instead of their integer positions

# We get the same result with both of these approaches ^

In [8]:
# We don't actually need to rely on the order of the dimensions: we can use dimension lookup by name. Much easier!

# Accessing dimensions by name, then using integers for indexing:
temp.isel(time=0, lat=3, lon=2)

# Accessing dimensions by name, then using labels for indexing
# (.isel() only accepts integer positions; passing labels to it raises an IndexError):
temp.sel(time='2022-09-01', lat=40, lon=80)

## Reduction
`xarray` has several methods (for example `mean`, `sum`, `min`, and `max`) to reduce an `xarray.DataArray` along any number of dimensions.

**Examples**
Calculate the average temperature at each station over time:

In [8]:
avg_temp = temp.mean(dim='time')
avg_temp

In [9]:
# Give the averaged array a title that describes it
avg_temp.attrs = {'title': 'average temperature over three days'}
avg_temp
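Reductions can also run over several dimensions at once. A minimal sketch, rebuilding our `temp` array (values 0, 1, and 2 on the three days): averaging over both spatial dimensions gives one value per day, and calling a reduction with no `dim` argument reduces over every dimension.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Rebuild the temperature DataArray from this notebook
temp = xr.DataArray(
    np.array([np.zeros((5, 5)), np.ones((5, 5)), np.ones((5, 5)) * 2]).astype(int),
    dims=("time", "lat", "lon"),
    coords={"time": pd.date_range("2022-09-01", "2022-09-03"),
            "lat": np.arange(70, 20, -10),
            "lon": np.arange(60, 110, 10)},
)

# Average over both spatial dimensions: one value per day
daily_mean = temp.mean(dim=["lat", "lon"])
print(daily_mean.values)  # [0. 1. 2.]

# No dim argument: reduce over all dimensions
print(float(temp.max()))  # 2.0
```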

## `xarray.Dataset`

`xarray.Dataset`:
- resembles an in-memory representation of a NetCDF file
- consists of *multiple* variables (each variable is an `xarray.DataArray`)
- self-describing
- attributes can belong to a variable, a dimension, or describe the whole dataset
- variables in an `xarray.Dataset` can have the same dimensions, share some dimensions, or have no dimensions in common
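The sketch below illustrates these points with a small hypothetical dataset (all names and values are made up): one variable on `(time, station)`, one on `station` only, and one scalar variable with no dimensions, plus attributes at the variable, dimension (coordinate), and dataset levels.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    data_vars={
        # (dims, values, attrs): a variable-level attribute
        "temp": (("time", "station"), np.array([[10.0, 12.0], [11.0, 13.0]]),
                 {"units": "degree_c"}),
        # shares only the "station" dimension with temp
        "elevation": ("station", np.array([250.0, 1200.0]), {"units": "m"}),
        # scalar variable: no dimensions at all
        "instrument_id": ((), 7),
    },
    # attribute attached to the "station" dimension's coordinate
    coords={"station": ("station", ["A", "B"], {"long_name": "station code"})},
    # dataset-wide attribute
    attrs={"title": "hypothetical station data"},
)

print(dict(ds.sizes))
print(ds["elevation"].dims)      # ('station',)
print(ds["instrument_id"].dims)  # ()
```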

**Example**
Combine the `temp` and `avg_temp` data into a single object.


In [11]:
# Make dictionaries with the variables and the attributes

data_vars = {'avg_temp': avg_temp,
             'temp': temp}

attrs = {'title': 'temperature data at weather stations: daily and average',
         'description': 'simple example of an xarray.Dataset'}

# Create the xarray.Dataset

temp_dataset = xr.Dataset(data_vars=data_vars,
                          attrs=attrs)

temp_dataset
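Once the variables live in one `Dataset`, each keeps its own dimensions, and a selection along `time` only affects the variables that have a time dimension. A short sketch, rebuilding `temp` and `avg_temp` (without attributes) so it runs on its own:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Rebuild the two variables from this notebook
temp = xr.DataArray(
    np.array([np.zeros((5, 5)), np.ones((5, 5)), np.ones((5, 5)) * 2]).astype(int),
    dims=("time", "lat", "lon"),
    coords={"time": pd.date_range("2022-09-01", "2022-09-03"),
            "lat": np.arange(70, 20, -10),
            "lon": np.arange(60, 110, 10)},
)
avg_temp = temp.mean(dim="time")
temp_dataset = xr.Dataset(data_vars={"avg_temp": avg_temp, "temp": temp})

# Each variable keeps its own dimensions
print(temp_dataset["temp"].dims)      # ('time', 'lat', 'lon')
print(temp_dataset["avg_temp"].dims)  # ('lat', 'lon')

# Selecting one day drops "time" from temp; avg_temp has no time dimension and is unchanged
one_day = temp_dataset.sel(time="2022-09-02")
print(one_day["temp"].dims)                        # ('lat', 'lon')
print(float(one_day["temp"].sel(lat=40, lon=80)))  # 1.0
```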