# Introduction to XArray <img align="right" src="../Supplementary_data/DE_Africa_Logo_Stacked_RGB_small.jpg">

* **Prerequisites**:  Users of this notebook should have a basic understanding of:
    * How to run a [Jupyter notebook](01_Jupyter_notebooks.ipynb)

## Background
XArray is a Python library which simplifies working with labelled multi-dimensional arrays. XArray introduces labels in the form of dimensions, coordinates and attributes on top of raw Numpy-like arrays, allowing for more intuitive and concise development. More information about XArray data structures and functions can be found [here](http://xarray.pydata.org/en/stable/).


## Description
This notebook is designed to introduce users to XArray using Python code in Jupyter Notebooks via JupyterLab.

Topics covered include:

* How to use XArray functions in a Jupyter Notebook cell
* How to access XArray dimensions and metadata
* Using indexing to explore multi-dimensional XArray data
* Application of built-in XArray functions such as sum, std and mean

***


## Getting started
...

### Introduction to XArray
DEA uses XArray as its data model. To better understand what it is, let's first do a simple experiment using a combination of plain numpy arrays and Python dictionaries.

Suppose we have a satellite image with three bands: Red, NIR and SWIR. These bands are represented as 2-dimensional numpy arrays, and the latitude and longitude coordinates for each dimension are represented using 1-dimensional arrays. Finally, we also have some metadata that comes with this image.

In [1]:
import numpy as np

red = np.random.rand(250,250)
nir = np.random.rand(250,250)
swir = np.random.rand(250,250)

lats = np.linspace(-23.5, -26.0, num=red.shape[0], endpoint=False)
lons = np.linspace(110.0, 112.5, num=red.shape[1], endpoint=False)

title = "Image of the desert"
date = "2019-11-10"

image = {"red": red,
         "nir": nir,
         "swir": swir,
         "latitude": lats,
         "longitude": lons,
         "title": title,
         "date": date}

All our data is conveniently packed in a dictionary. Now we can use this dictionary to work with the data it contains:

In [None]:
print(image["date"])
image["red"].mean()

Still, to select data we have to use numpy indexes. Wouldn't it be convenient to be able to select data from the images using the coordinates of the pixels instead of their relative positions?
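To make the problem concrete, the sketch below looks up the pixel closest to a given coordinate using plain numpy. It uses a cut-down copy of the dictionary above, and the target latitude and longitude are hypothetical values chosen for illustration.

```python
import numpy as np

# Cut-down copy of the dictionary-based image (synthetic data)
red = np.random.rand(250, 250)
lats = np.linspace(-23.5, -26.0, num=250, endpoint=False)
lons = np.linspace(110.0, 112.5, num=250, endpoint=False)
image = {"red": red, "latitude": lats, "longitude": lons}

# Hypothetical point of interest
target_lat, target_lon = -24.0, 111.0

# With plain numpy we first have to translate coordinates into array positions
row = int(np.abs(image["latitude"] - target_lat).argmin())
col = int(np.abs(image["longitude"] - target_lon).argmin())

# Only then can we index the band
pixel_value = image["red"][row, col]
print(row, col, pixel_value)
```

Every selection needs this manual bookkeeping, and nothing stops us from accidentally swapping the row and column positions.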

This is exactly what XArray solves! Let's see how it works:

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
from datetime import datetime
import numpy as np

import xarray as xr

To explore XArray we have a file containing some reflectance data of Canberra that has been generated using the DEA library.

The object that we get, `ds`, is an XArray `Dataset`, which in some ways is very similar to the dictionary that we created before, but with lots of convenient functionality available.

In [None]:
ds = xr.open_dataset('data/canberra_ls8.nc')
ds

### XArray dataset structure
A `Dataset` can be seen as a dictionary structure packing up the data, dimensions and attributes. Variables in a `Dataset` object are called `DataArrays` and they share dimensions with the higher level `Dataset`. 
The figure below provides an illustrative example:


<img src="../Supplementary_data/07_Intro_to_xarray/dataset-diagram.png" alt="drawing" width="600" align="left"/>
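As a rough illustration of this structure, the sketch below packs the synthetic image from the dictionary experiment into a `Dataset`. The name `example_ds` and the random values are assumptions made for this example; they are unrelated to the Canberra file loaded above.

```python
import numpy as np
import xarray as xr

lats = np.linspace(-23.5, -26.0, num=250, endpoint=False)
lons = np.linspace(110.0, 112.5, num=250, endpoint=False)

example_ds = xr.Dataset(
    # Each variable is declared as (dimension names, data)
    data_vars={
        "red": (("latitude", "longitude"), np.random.rand(250, 250)),
        "nir": (("latitude", "longitude"), np.random.rand(250, 250)),
        "swir": (("latitude", "longitude"), np.random.rand(250, 250)),
    },
    # Coordinates label the positions along each dimension
    coords={"latitude": lats, "longitude": lons},
    # Free-form metadata goes into the attributes
    attrs={"title": "Image of the desert", "date": "2019-11-10"},
)

print(example_ds)
```

The three bands share the same `latitude` and `longitude` dimensions, which is what allows a single selection to be applied to all of them at once.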

To access a variable we can index the `Dataset` as if it were a Python dictionary, or use the `.` notation, which is more convenient.

In [None]:
ds["green"]

# or alternatively

ds.green

Dimensions are also stored as numeric arrays.

In [None]:
ds['time']

# or alternatively

ds.time

Metadata is referred to as attributes and is stored internally under `.attrs`, but the same convenient `.` notation applies to it.

In [None]:
ds.attrs['Conventions']

# or alternatively

ds.Conventions

DataArrays store their data internally as multidimensional numpy arrays, but these arrays carry dimensions and labels that make the data easier to handle. To access the underlying numpy array of a `DataArray` we can use the `.values` notation.

In [None]:
arr = ds.green.values

type(arr), arr.shape
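The difference between a `DataArray` and its underlying numpy array can be seen on a tiny synthetic example. The array `example_da` and its coordinate values are made up for illustration.

```python
import numpy as np
import xarray as xr

example_da = xr.DataArray(
    np.arange(6).reshape(2, 3),
    dims=("latitude", "longitude"),
    coords={"latitude": [-23.5, -24.0], "longitude": [110.0, 110.5, 111.0]},
    name="green",
)

# .values returns the raw numpy array; the labels are left behind
raw = example_da.values
print(type(raw), raw.shape)

# Arithmetic on the DataArray itself keeps dimensions and coordinates
doubled = example_da * 2
print(doubled.dims)
```

Dropping to `.values` is useful when a function only accepts numpy arrays, but any result then has to be matched back to its coordinates manually.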

### Indexing
XArray offers two different ways of selecting data. The first is the `isel()` approach, where data is selected based on its integer index (like numpy).


In [None]:
print(ds.time.values)

ss = ds.green.isel(time=0)

ss

The second is the `sel()` approach, used for selecting data based on its dimension or label value.

In [None]:
ss = ds.green.sel(time=datetime(2016,1,1))

ss
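By default `sel()` needs an exact match for the label and raises a `KeyError` otherwise. The sketch below compares `isel()` and `sel()` on a small synthetic time series and shows the `method="nearest"` option for inexact labels. The dates and the 16-day spacing are assumptions for this example, not properties of the Canberra file.

```python
from datetime import datetime

import numpy as np
import pandas as pd
import xarray as xr

# Four hypothetical acquisition dates, 16 days apart
times = pd.date_range("2016-01-01", periods=4, freq="16D")

example_da = xr.DataArray(
    np.arange(4 * 2 * 2).reshape(4, 2, 2),
    dims=("time", "latitude", "longitude"),
    coords={"time": times, "latitude": [-35.2, -35.3], "longitude": [149.0, 149.1]},
)

# Select the second time step by position ...
by_position = example_da.isel(time=1)

# ... or the same time step by its label (exact match required)
by_label = example_da.sel(time=datetime(2016, 1, 17))

# No image exists for 20 January, so ask for the closest available date
nearest = example_da.sel(time=datetime(2016, 1, 20), method="nearest")

print(by_position.equals(by_label))
print(nearest.time.values)
```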

Slices can also be used to select a range of values along a dimension.

In [None]:
ss = ds.green.sel(time=datetime(2016,1,1), latitude=slice(-35.30,-35.24))

ss
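One detail to watch with label-based slices is that the bounds are expected to follow the order of the coordinate. The sketch below uses a hypothetical latitude coordinate stored in descending order (north to south), which is common for satellite imagery; whether the Canberra file is stored that way is not assumed here.

```python
import numpy as np
import xarray as xr

# Hypothetical latitude coordinate in descending order
lats = np.array([-35.20, -35.24, -35.28, -35.32])
example_da = xr.DataArray(np.arange(4), dims="latitude", coords={"latitude": lats})

# Bounds given in the same order as the coordinate (descending)
matching_order = example_da.sel(latitude=slice(-35.22, -35.30))

# Bounds given in the opposite order select nothing
reversed_order = example_da.sel(latitude=slice(-35.30, -35.22))

print(matching_order.latitude.values)
print(reversed_order.sizes["latitude"])
```

If a slice unexpectedly returns an empty result, checking the order of the coordinate values (for example with `ds.latitude.values`) is a good first step.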

XArray exposes lots of functions to easily transform and analyse `Datasets` and `DataArrays`. For example, to calculate the mean, standard deviation or sum of the green band over all of its dimensions (time as well as space):

In [None]:
print("Mean of green band:", ds.green.mean())
print("Standard deviation of green band:", ds.green.std())
print("Sum of green band:", ds.green.sum())
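These functions also accept a `dim` argument that restricts the calculation to the named dimensions. The sketch below uses a small synthetic array (its shape and values are made up for illustration) to contrast a reduction over everything with a per-time-step spatial mean and a per-pixel temporal mean.

```python
import numpy as np
import xarray as xr

# Synthetic array: 2 time steps, 3 latitudes, 4 longitudes
example_da = xr.DataArray(
    np.arange(24, dtype=float).reshape(2, 3, 4),
    dims=("time", "latitude", "longitude"),
)

# No dim argument: reduce over every dimension to a single number
overall_mean = example_da.mean()

# Spatial mean: one value per time step
spatial_mean = example_da.mean(dim=["latitude", "longitude"])

# Temporal mean: one value per pixel
temporal_mean = example_da.mean(dim="time")

print(float(overall_mean))
print(spatial_mean.values)
print(temporal_mean.dims)
```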

### Plotting data with Matplotlib
Plotting is also conveniently integrated in the library.

In [None]:
ds["green"].isel(time=0).plot()

We can still do things manually using numpy and matplotlib:

In [None]:
rgb = np.dstack((ds.red.isel(time=0).values, ds.green.isel(time=0).values, ds.blue.isel(time=0).values))
rgb = np.clip(rgb, 0, 2000) / 2000

plt.imshow(rgb)

But compare that to this elegant way of chaining operations within XArray:

In [None]:
ds[['red', 'green', 'blue']].isel(time=0).to_array().plot.imshow(robust=True, figsize=(6, 6))
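To see what the `to_array()` step in that chain does, the sketch below applies it to a small synthetic `Dataset` (the name `example_ds` and the random values are assumptions for this example) and compares the result with the manual `np.dstack` approach. No plot is produced here.

```python
import numpy as np
import xarray as xr

bands = ["red", "green", "blue"]

# Small synthetic Dataset with three bands on a 4 x 5 grid
example_ds = xr.Dataset(
    {band: (("latitude", "longitude"), np.random.rand(4, 5)) for band in bands}
)

# to_array() stacks the selected variables along a new labelled dimension
stacked = example_ds[bands].to_array()
print(stacked.dims, stacked.shape)
print(stacked["variable"].values)

# The manual numpy equivalent puts the bands last and carries no labels
manual = np.dstack([example_ds[band].values for band in bands])
print(manual.shape)
```

Because the band axis is labelled, the band order stays attached to the data instead of being implied by the order of arguments to `np.dstack`.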

## Recommended next steps

For more advanced information about working with Jupyter Notebooks or JupyterLab, you can explore the [JupyterLab documentation page](https://jupyterlab.readthedocs.io/en/stable/user/notebook.html).

To continue working through this beginner's guide, the notebooks are designed to be completed in the following order:

1. [Jupyter Notebooks](01_Jupyter_notebooks.ipynb)
2. [Products and Measurements](02_Products_and_measurements.ipynb)
3. [Loading data](03_Loading_data.ipynb)
4. [Plotting](04_Plotting.ipynb)
5. [Performing a basic analysis](05_Basic_analysis.ipynb)
6. [Introduction to Numpy](06_Intro_to_numpy.ipynb)
7. **Introduction to XArray (this notebook)**
8. [Parallel processing with Dask](06_Parallel_processing_with_dask.ipynb)

Once you have completed the above eight tutorials, join advanced users in exploring:

* The "Datasets" directory in the repository, where you can explore DE Africa products in depth.
* The "Frequently used code" directory, which contains a recipe book of common techniques and methods for analysing DE Africa data.
* The "Real-world examples" directory, which provides more complex workflows and analysis case studies.