# Explaining xarray

## Sources: https://docs.xarray.dev/en/stable/getting-started-guide/

### Xarray is an open-source Python package for working with labeled multi-dimensional (N-dimensional, ND) arrays

### Xarray allows you to attach spatial and time coordinates to ND arrays, making it easier to index, slice, and perform operations on the datasets

## Core data structures of xarray

### 1. DataArray: the implementation of a labeled, N-dimensional array; it is the multidimensional generalization of pandas.Series

### 2. Dataset: a multidimensional, in-memory array database that acts as a dictionary-like container of DataArray objects; it serves a similar purpose to pandas.DataFrame, but for multidimensional data
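
The dictionary-like behaviour of a Dataset can be sketched as follows. This is a minimal illustration rather than part of the original notebook: the variable names `temperature` and `pressure` and their values are hypothetical.

```python
import numpy as np
import xarray as xr

# Hypothetical coordinates and variables, chosen only for illustration.
coords = {"lat": [1, 2, 3], "lon": [1, 2, 3]}
temperature = xr.DataArray(
    np.arange(9.0).reshape(3, 3), coords=coords, dims=["lat", "lon"]
)
pressure = xr.DataArray(np.ones((3, 3)), coords=coords, dims=["lat", "lon"])

# A Dataset maps variable names to DataArrays that share coordinates.
ds = xr.Dataset({"temperature": temperature, "pressure": pressure})

print(list(ds.data_vars))        # names of the variables held by the Dataset
temp_again = ds["temperature"]   # dictionary-style access returns a DataArray
print(type(temp_again).__name__)
```

Looking up a key returns a DataArray, much as selecting a column of a pandas.DataFrame returns a pandas.Series.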

## Basic functions of xarray

### 1. Creating a DataArray

In [1]:
# import packages
import numpy as np
import xarray as xr

In [11]:
data = np.random.randn(3, 3)    # creates random data in a 3x3 grid

In [6]:
coord = {'lat': [1, 2, 3], 'lon': [1, 2, 3]}   # define coordinate labels for the data's 2 dimensions (lat and lon), 3 values each

In [8]:
dataarray = xr.DataArray(data, coords=coord, dims=['lat', 'lon'])  # create a DataArray from the random data and the assigned coordinate values

In [9]:
print(dataarray)

<xarray.DataArray (lat: 3, lon: 3)>
array([[0.32362634, 0.27688404, 0.7881579 ],
       [0.58657455, 0.77105235, 0.63926525],
       [0.16912153, 0.31896529, 0.36962015]])
Coordinates:
  * lat      (lat) int64 1 2 3
  * lon      (lon) int64 1 2 3
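
The pieces shown in that printout (dimension names, shape, coordinates, and the underlying NumPy array) can also be read back individually. A small sketch, using deterministic values instead of random ones so that the result is reproducible:

```python
import numpy as np
import xarray as xr

# Same layout as the DataArray above, but with fixed values (an assumption
# made here so the example is reproducible).
arr = xr.DataArray(
    np.arange(9.0).reshape(3, 3),
    coords={"lat": [1, 2, 3], "lon": [1, 2, 3]},
    dims=["lat", "lon"],
)

print(arr.dims)                   # dimension names, in order
print(arr.shape)                  # length of each dimension
print(arr.coords["lat"].values)   # coordinate labels along lat
print(type(arr.values).__name__)  # the raw data is a NumPy ndarray
```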


## 2. Indexing a DataArray

### Indexing works much as it does in pandas; xarray supports four kinds of indexing:

#### 1. Positional: by position and integer index, similar to NumPy:

In [14]:
dataarray[0,:]  # selects the first position along the first dimension (lat) and all data along the second dimension (lon)

#### 2. Location (loc): by position and coordinate label, similar to pandas

#### The .loc indexer in xarray selects data based on the coordinate labels, not the integer position; dimensions are still identified by their order

In [20]:
dataarray.loc[3]  # selects where the latitude coordinate label is 3 (lat is the first dimension); use dataarray.loc[:, 3] for longitude
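
Because `.loc` identifies dimensions by their order, it helps to see it applied to each dimension in turn. The sketch below uses assumed coordinate labels (10, 20, 30 and 100, 110, 120) that differ from the integer positions, so that labels and positions cannot be confused:

```python
import numpy as np
import xarray as xr

# Illustrative data; the coordinate labels are assumptions, not values
# from the notebook above.
arr = xr.DataArray(
    np.arange(9).reshape(3, 3),
    coords={"lat": [10, 20, 30], "lon": [100, 110, 120]},
    dims=["lat", "lon"],
)

first_dim = arr.loc[20]        # lat label 20, every lon
second_dim = arr.loc[:, 110]   # every lat, lon label 110
label_slice = arr.loc[10:20]   # lat labels 10 through 20, both ends included

print(first_dim.values)
print(second_dim.values)
print(label_slice.shape)
```

As in pandas, a label-based slice includes its end point, which is why `arr.loc[10:20]` keeps two rows.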

#### 3. Integer select (isel): by dimension name and integer label

#### This selects data based on integer/index position of coordinates along a given dimension

In [27]:
dataarray.isel(lat=2)  # this selects the third latitude in the dataarray using index/integer positioning

#### 4. Select (sel): by dimension name and coordinate label

#### This method selects by label, allowing you to select data along a specific dimension using coordinate labels rather than index position

In [30]:
dataarray.sel(lon=2)   # select data where lon is 2
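
To compare the four kinds of indexing side by side, the sketch below selects the same row in each style. The coordinate labels are assumptions chosen to differ from the integer positions.

```python
import numpy as np
import xarray as xr

# Illustrative data with labels that differ from integer positions.
arr = xr.DataArray(
    np.arange(9).reshape(3, 3),
    coords={"lat": [10, 20, 30], "lon": [100, 110, 120]},
    dims=["lat", "lon"],
)

by_position = arr[0, :]      # 1. positional, integer index
by_loc = arr.loc[10, :]      # 2. positional, coordinate label
by_isel = arr.isel(lat=0)    # 3. dimension name, integer index
by_sel = arr.sel(lat=10)     # 4. dimension name, coordinate label

# All four expressions pick out the same row of data.
for result in (by_position, by_loc, by_isel, by_sel):
    print(result.values)
```

The name-based forms (`isel` and `sel`) do not depend on the order of the dimensions, so they keep working if the array is later transposed.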

## 3. Doing computations with DataArrays

## This works very similarly to NumPy ndarrays

### Addition:

In [32]:
dataarray + 10   # adds 10 to every element in the DataArray

### NumPy functions can also be applied directly to a DataArray

In [34]:
np.cos(dataarray)   # computes the cosine of each element using NumPy
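
One point worth checking is that these element-wise operations return a DataArray with the dimension names and coordinates intact, not a bare NumPy array. A minimal sketch with fixed values (an assumption, so the result is reproducible):

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(
    np.arange(9.0).reshape(3, 3),
    coords={"lat": [1, 2, 3], "lon": [1, 2, 3]},
    dims=["lat", "lon"],
)

shifted = arr + 10      # arithmetic is applied element-wise
cosines = np.cos(arr)   # NumPy ufuncs return a DataArray as well

print(type(cosines).__name__)            # still a DataArray
print(shifted.dims)                      # dimension names are preserved
print(shifted.sel(lat=1, lon=1).item())  # 0.0 + 10 -> 10.0
```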

### Take the sum of the array

In [36]:
dataarray.sum()  # sums every element, reducing the array to a single (zero-dimensional) value

## Operations can also be applied along a named dimension instead of over the entire array

In [40]:
dataarray.mean(dim='lon')   # computes the mean of the DataArray along the longitude (lon) dimension
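
The difference between reducing over the whole array and reducing along one named dimension shows up in the dimensions that remain. A small sketch with fixed values (an assumption, so the numbers can be checked by hand):

```python
import numpy as np
import xarray as xr

# Rows are [0, 1, 2], [3, 4, 5], [6, 7, 8].
arr = xr.DataArray(
    np.arange(9.0).reshape(3, 3),
    coords={"lat": [1, 2, 3], "lon": [1, 2, 3]},
    dims=["lat", "lon"],
)

total = arr.sum()                     # reduces over every dimension
mean_over_lon = arr.mean(dim="lon")   # one value per lat
mean_over_lat = arr.mean(dim="lat")   # one value per lon

print(total.dims, float(total))                   # () 36.0
print(mean_over_lon.dims, mean_over_lon.values)   # ('lat',) [1. 4. 7.]
print(mean_over_lat.dims, mean_over_lat.values)   # ('lon',) [3. 4. 5.]
```

The dimension named in `dim` is the one that disappears; the other dimension and its coordinates are kept.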