## Xarray

### N-D labeled arrays and datasets in Python

Multi-dimensional (a.k.a. N-dimensional, ND) arrays (sometimes called “tensors”) are an essential part of computational science. They are encountered in a wide range of fields, including physics, astronomy, geoscience, bioinformatics, engineering, finance, and deep learning.

![image.png](attachment:image.png)


- Xarray is an open source Python package that makes working with labeled **multi-dimensional arrays** simple, efficient, and fun! 
- It extends NumPy by adding labels to arrays, making it easier to work with scientific data. 
- Xarray doesn’t just keep track of labels on arrays – it uses them to provide a powerful and concise interface.
- Xarray also provides a number of powerful functions for working with labeled arrays, such as:

    - **Dimension names:** Xarray arrays have dimension names, which make it easier to understand and manipulate the data. For example, a temperature array might have dimension names "time" and "latitude".
    - **Coordinates:** Xarray arrays can also have coordinates, which are labels attached to the positions along each dimension. For example, the temperature array might have the coordinate value "2023-08-11" along "time" and "37.77" along "latitude".
    - **Attributes:** Xarray arrays can also have attributes, which are key-value pairs that store additional information about the data. For example, the temperature array might have an attribute "units" with the value "degrees Celsius".
  
- Xarray makes it easy to work with labeled multi-dimensional arrays in a number of ways:

    - **Intuitive syntax**: Xarray's syntax is designed to be intuitive and easy to understand. For example, you can easily select a subset of data by using dimension names and coordinates.
    - **Efficient operations:** Xarray operations are vectorized: they apply to the whole array at once rather than element by element in a Python loop. This can save a lot of time when working with large datasets.
    - **Flexible data structures:** Xarray supports two core data structures: DataArrays (single labeled arrays) and Datasets (collections of DataArrays that share dimensions). This flexibility makes it easy to work with different types of data.
    - **Integration with other libraries:** Xarray integrates well with other Python libraries, such as NumPy, Pandas, and Matplotlib. This makes it easy to use Xarray to process, analyze, and visualize data.
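
The temperature example from the list above can be sketched as follows. The values, dates, and latitudes are made up for illustration; only `xr.DataArray`, `.sel`, and `.attrs` come from xarray itself.

```python
import numpy as np
import xarray as xr

# Hypothetical temperature readings: 3 days x 2 latitudes (values are invented).
temperature = xr.DataArray(
    np.array([[21.5, 18.2], [22.1, 18.9], [20.7, 17.5]]),
    dims=("time", "latitude"),  # dimension names
    coords={
        # coordinates: labels for the positions along each dimension
        "time": np.array(["2023-08-11", "2023-08-12", "2023-08-13"], dtype="datetime64[ns]"),
        "latitude": [37.77, 40.71],
    },
    attrs={"units": "degrees Celsius"},  # attributes: free-form metadata
)

# Select by label instead of by integer position.
one_latitude = temperature.sel(latitude=37.77)
one_point = temperature.sel(time="2023-08-12", latitude=40.71)

print(temperature.dims)
print(temperature.attrs["units"])
print(one_point.item())
```

Selecting with `.sel` means the code does not need to know that "latitude" is the second axis or that 40.71 sits at index 1.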


#### Key Features & Capabilities

| Feature                | Description                                                                                                        |
|------------------------|--------------------------------------------------------------------------------------------------------------------|
| Dimension names        | Xarray arrays have dimension names, making it easier to understand and manipulate data.                         |
| Coordinates            | Xarray arrays also support coordinates, which are labels for the positions along each dimension.                 |
| Attributes             | Xarray arrays can have attributes, storing key-value pairs for additional data information.                      |
| Intuitive syntax       | Xarray's syntax is designed to be intuitive and easy to understand. You can conveniently select subsets of data. |
| Efficient operations   | Xarray operations are vectorized, efficiently performed on the entire dataset, saving processing time.            |
| Flexible data structures | Xarray supports various data structures, simplifying work with different types of data.                          |
| Integration with other libraries | Xarray integrates seamlessly with Python libraries like NumPy, Pandas, and Matplotlib.                   |
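
As a rough illustration of the "efficient operations" and "integration" rows, the sketch below applies arithmetic and a reduction to a small array and then hands the result to NumPy and pandas. The array contents and the labels `a`, `b`, `c` are arbitrary.

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(
    np.arange(6.0).reshape(2, 3),  # [[0, 1, 2], [3, 4, 5]]
    dims=("x", "y"),
    coords={"x": [10, 20], "y": ["a", "b", "c"]},
)

doubled = arr * 2 + 1            # elementwise arithmetic; labels are preserved
mean_over_y = arr.mean(dim="y")  # reduce along a dimension by name, not axis number

as_numpy = arr.values            # plain NumPy array
as_series = arr.to_series()      # pandas Series indexed by (x, y)

print(mean_over_y.values)
print(as_series.loc[(20, "b")])
```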


### Core data structures

Xarray has two core data structures, which build upon and extend the core strengths of `NumPy` and `pandas`. Both data structures are fundamentally N-dimensional:

- `DataArray`: a labeled, N-dimensional array. It is an N-D generalization of a `pandas.Series`.
- `Dataset`: a multi-dimensional, in-memory array database. It is a dict-like container of `DataArray` objects aligned along any number of shared dimensions, and serves a similar purpose in xarray to the `pandas.DataFrame`.


| Feature                | DataArray                                            | Dataset                                                  |
|------------------------|------------------------------------------------------|----------------------------------------------------------|
| Dimensions             | A single array with one fixed set of named dimensions. | Holds several variables, each of which may use a different subset of the Dataset's dimensions. |
| Coordinates            | Coordinates label the positions along its own dimensions. | Coordinates are shared by the variables that use the corresponding dimensions. |
| Attributes             | Has its own attributes storing additional information. | Has Dataset-level attributes; each variable also keeps its own attributes. |
| Operations             | Vectorized operations on all data at once.          | Vectorized operations, applied across all of its variables. |
| Integration with other libraries | Converts to and from NumPy arrays and pandas Series, and plots with Matplotlib. | Converts to and from pandas DataFrames; each variable can be used like a DataArray. |
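
A minimal sketch of the differences in the table, using invented variable names (`temperature`, `elevation`) and placeholder values: the two variables use different dimensions, share the `station` coordinate, and carry their own attributes alongside the Dataset-level ones.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    data_vars={
        # each entry is (dims, data, attrs)
        "temperature": (("time", "station"), np.zeros((4, 2)), {"units": "degrees Celsius"}),
        "elevation": (("station",), [12.0, 340.0], {"units": "m"}),
    },
    coords={"station": ["A", "B"]},           # shared by both variables
    attrs={"title": "hypothetical example"},  # Dataset-level metadata
)

print(ds["temperature"].dims)
print(ds["elevation"].dims)
print(ds["elevation"].attrs, ds.attrs)
```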


### Examples

Here are some examples of how `DataArrays` and `Datasets` can be used:

* You can use a DataArray to store temperature data for a single location over time.
* You can use a Dataset to store temperature data for multiple locations over time.
* You can use a DataArray to store the results of a scientific simulation.
* You can use a Dataset to store the results of multiple scientific simulations.
* You can plot a DataArray directly with Matplotlib.
* You can convert a Dataset to a pandas DataFrame and visualize it with Seaborn.
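
The first two use cases might look like the sketch below. The city names and temperatures are hypothetical, and storing one variable per location is only one possible layout; a single variable with a `location` dimension would work as well.

```python
import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("2023-08-11", periods=4, freq="D")

# One DataArray per location: temperature over time (invented values).
city_a = xr.DataArray([21.0, 22.5, 20.0, 19.5], dims="time", coords={"time": time})
city_b = xr.DataArray([15.0, 16.5, 14.0, 13.5], dims="time", coords={"time": time})

# A Dataset groups both series along the shared time dimension.
stations = xr.Dataset({"city_a": city_a, "city_b": city_b})

difference = stations["city_a"] - stations["city_b"]  # aligned on time labels
warmest_day = city_a.idxmax(dim="time")               # coordinate label of the maximum

print(difference.values)
print(warmest_day.values)
```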

### Xarray dependencies
(If you install xarray with pip, optional dependencies can be installed by specifying extras. The pip command used in this notebook is given in the Installation section below.)

- **Required dependencies:** Python (3.9 or later), numpy (1.21 or later), packaging (21.3 or later), pandas (1.4 or later)
- **Optional dependencies:**

    - **netCDF4:** recommended if you want to use xarray for reading or writing netCDF files

    - **scipy:** used as a fallback for reading/writing netCDF3

    - **pydap:** used as a fallback for accessing OPeNDAP

    - **h5netcdf:** an alternative library for reading and writing netCDF4 files that does not use the netCDF-C libraries

    - **PyNIO:** for reading GRIB and other geoscience specific file formats. Note that PyNIO is not available for Windows and that the PyNIO backend may be moved outside of xarray in the future.

    - **zarr:** for chunked, compressed, N-dimensional arrays.

    - **cftime:** recommended if you want to encode/decode datetimes for non-standard calendars or dates before year 1678 or after year 2262.

    - **PseudoNetCDF:** recommended for accessing CAMx, GEOS-Chem (bpch), NOAA ARL files, ICARTT files (ffi1001) and many others.

    - **iris:** for conversion to and from iris’ Cube objects

- **For accelerating xarray:**

    - **scipy:** necessary to enable the interpolation features for xarray objects
    - **bottleneck:** speeds up NaN-skipping and rolling window aggregations by a large factor
    - **numbagg:** for exponential rolling window operations
- **For parallel computing:**
  - **dask.array:** required for parallel computing with Dask.

- **For plotting:**
  - **matplotlib:** required for plotting
  - **cartopy:** recommended for maps
  - **seaborn:** for better color palettes
  - **nc-time-axis:** for plotting cftime.datetime objects
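
To see which of these optional packages are already present in your environment, a quick check along the following lines may help. It uses only the standard library; the list of import names is an assumption based on the packages above (note that `nc-time-axis` is imported as `nc_time_axis`).

```python
from importlib.util import find_spec

# Import names of some optional dependencies listed above (assumed names).
optional = [
    "netCDF4", "scipy", "h5netcdf", "zarr", "cftime",
    "bottleneck", "dask", "matplotlib", "cartopy", "seaborn", "nc_time_axis",
]

# find_spec returns None when a top-level package cannot be found.
available = {name: find_spec(name) is not None for name in optional}

for name, installed in available.items():
    print(f"{name:14s} {'installed' if installed else 'missing'}")
```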

### Installation

To install xarray and its dependencies, run:

`pip install xarray dask netCDF4 bottleneck -q`

In [11]:
%pip install xarray dask netCDF4 bottleneck -q


## Importing the required libraries

In [13]:
import numpy as np

import pandas as pd

import xarray as xr

### Create a DataArray

In [15]:
np.random.randn(2, 3) 

array([[-0.77612402, -0.53706608, -0.96633729],
       [-0.43203258,  2.02312807,  1.40358926]])

In [22]:
data = xr.DataArray(
    np.random.randn(2, 3),  # 2x3 array of samples from a standard normal distribution
    dims=("x", "y"),        # names of the two dimensions
    coords={"x": [10, 20]}, # labels for the two positions along x
)

data

In this case, we generated a 2D array, assigned the names x and y to the two dimensions respectively, and associated two coordinate labels ‘10’ and ‘20’ with the two locations along the x dimension. The y dimension has no coordinate labels, so it can only be indexed by position.

In [23]:
data.values

array([[-1.06363162, -1.19813606, -1.58878695],
       [-0.71194045,  0.32420941, -0.7814277 ]])

In [24]:
data.dims

('x', 'y')

In [25]:
data.coords

Coordinates:
  * x        (x) int64 10 20

In [26]:
data[0, :]

In [34]:
data.loc[10]
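
The two cells above select the same row, first by integer position and then by coordinate label. As a sketch, the same selections can also be written with the named-dimension methods `isel` and `sel`, which avoid relying on the order of the dimensions:

```python
import numpy as np
import xarray as xr

data = xr.DataArray(
    np.random.randn(2, 3),
    dims=("x", "y"),
    coords={"x": [10, 20]},
)

by_position = data.isel(x=0)  # same row as data[0, :]
by_label = data.sel(x=10)     # same row as data.loc[10]

print(by_position.identical(by_label))
```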

### Create a Dataset

In [36]:
dict(foo=data, bar=("x", [1, 2]), baz=np.pi)

{'foo': <xarray.DataArray (x: 2, y: 3)>
 array([[-1.06363162, -1.19813606, -1.58878695],
        [-0.71194045,  0.32420941, -0.7814277 ]])
 Coordinates:
   * x        (x) int64 10 20
 Dimensions without coordinates: y,
 'bar': ('x', [1, 2]),
 'baz': 3.141592653589793}

In [35]:
ds = xr.Dataset(dict(foo=data, bar=("x", [1, 2]), baz=np.pi))

ds
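
Because a `Dataset` is dict-like, each variable can be pulled back out by name. The sketch below rebuilds the same kind of dataset and inspects the dimensions of each variable: `foo` keeps both dimensions, `bar` lives on `x` only, and `baz` is a scalar with no dimensions.

```python
import numpy as np
import xarray as xr

data = xr.DataArray(np.random.randn(2, 3), dims=("x", "y"), coords={"x": [10, 20]})
ds = xr.Dataset(dict(foo=data, bar=("x", [1, 2]), baz=np.pi))

print(list(ds.data_vars))  # names of the variables
print(ds["foo"].dims)      # 2-D variable
print(ds["bar"].dims)      # 1-D variable along x
print(ds["baz"].dims)      # scalar variable
```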

In [37]:
ds.to_netcdf("example.nc")

reopened = xr.open_dataset("example.nc")

reopened

### References

- https://xarray.dev/blog/flox
- https://registry.opendata.aws/nwm-archive/
- https://xarray.dev/
- https://docs.xarray.dev/en/stable/getting-started-guide/faq.html
- https://coderzcolumn.com/tutorials/python/xarray-dataset-multi-dimensional-labelled-arrays