# What is Xarray?

* A Python library for handling labeled multi-dimensional data and metadata.
* In scientific computing we often deal with multi-dimensional data. Libraries like NumPy are fast but lack labels. pandas provides labels, but it is designed for 1D and 2D data.

* Xarray combines the best of both worlds:
    * Like NumPy: supports multi-dimensional arrays
    * Like pandas: adds labels (named dimensions and coordinates)
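
As a rough illustration of the point above, the sketch below contrasts positional indexing in NumPy with label-based selection in xarray. The array values, the `city` dimension name, and the `day1`/`day2` labels are made up for this example.

```python
import numpy as np
import xarray as xr

# Plain NumPy: you have to remember that axis 0 is time and axis 1 is city.
raw = np.array([[10.0, 20.0, 30.0],
                [11.0, 21.0, 31.0]])
la_numpy = raw[:, 1]  # column 1 happens to be "LA"

# xarray: the same data with named dimensions and coordinate labels
# (the names and labels here are illustrative assumptions).
labeled = xr.DataArray(
    raw,
    dims=["time", "city"],
    coords={"time": ["day1", "day2"], "city": ["NY", "LA", "CHI"]},
)
la_xarray = labeled.sel(city="LA")         # select by label, not position
mean_over_time = labeled.mean(dim="time")  # reduce along a dimension by name
```

Both selections return the same numbers, but the xarray version does not depend on remembering which axis or column index corresponds to "LA".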

# DataArray Object

* A labeled N-dimensional array (like a NumPy ndarray) with metadata.
* In this example each column of the 3 x 4 array corresponds to a different city, and each row corresponds to a different time step.
    * Shape of the array: (3, 4) → 3 time steps × 4 locations
    * Dimensions: dims=["time", "location"]
        * The first axis (axis 0) is time
        * The second axis (axis 1) is location
    * The order of the dimensions list matters: it defines which axis of the underlying NumPy array corresponds to which named dimension.

In [7]:
import xarray as xr
import numpy as np

temp = xr.DataArray(
    np.random.rand(3, 4),
    dims=["time", "location"],
    coords={"time": ["2023-01-01", "2023-01-02", "2023-01-03"],
            "location": ["NY", "LA", "CHI", "DAL"]},
    name="temperature",
    attrs={"units": "degC"}
)

temp

In [3]:
temp.values

array([[0.53210249, 0.93427762, 0.86934021, 0.98828409],
       [0.84561086, 0.44276634, 0.30869545, 0.38593125],
       [0.37362267, 0.23139627, 0.27734725, 0.43732941]])

In [4]:
temp.time

In [5]:
temp.location
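
The mapping between named dimensions and array axes can be checked directly. The sketch below rebuilds an array with the same dims and coords as above, but uses `np.arange` instead of random values (an assumption made so the results are reproducible), and stores it in a separate `demo` variable.

```python
import numpy as np
import xarray as xr

# Deterministic stand-in for the random data used above (illustrative only).
demo = xr.DataArray(
    np.arange(12.0).reshape(3, 4),
    dims=["time", "location"],
    coords={"time": ["2023-01-01", "2023-01-02", "2023-01-03"],
            "location": ["NY", "LA", "CHI", "DAL"]},
    name="temperature",
    attrs={"units": "degC"},
)

# Which NumPy axis does each named dimension occupy?
time_axis = demo.get_axis_num("time")          # axis 0
location_axis = demo.get_axis_num("location")  # axis 1

# Reordering the dimensions reorders the underlying axes.
flipped = demo.transpose("location", "time")   # shape becomes (4, 3)

# Positional and label-based selection along named dimensions.
first_step = demo.isel(time=0)        # first time step, all locations
la_series = demo.sel(location="LA")   # all time steps for one location
```

Because selection goes through the dimension names, `flipped.sel(location="LA")` returns the same values as `demo.sel(location="LA")` even though the axis order differs.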

# Dataset Object

* A dict-like container of DataArray objects that share dimensions and coordinates.
* Common question: what is the difference between a DataArray and a Dataset?
    * A DataArray holds a single multi-dimensional variable and its coordinates.
    * A Dataset holds multiple variables that potentially share the same coordinates.
* For example, a GFS GRIB file would load as a Dataset. A GRIB file isn't just one variable: it usually contains many fields, which vary in time, exist on different pressure levels, and have different units and metadata.
* That makes a GRIB file a natural fit for a Dataset, which is designed to hold multiple DataArrays (each one a separate variable) with shared coordinates (like time, latitude, longitude, pressure).
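* For example, here we have a Dataset made up of two DataArrays, temperature and relative humidity (rh), that share the same time and location coordinates.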

In [16]:
# Coordinates and dimension names shared by both variables
coords = {"time": ["2023-01-01", "2023-01-02", "2023-01-03"],
          "location": ["NY", "LA", "CHI", "DAL"]}
dims = ["time", "location"]

temp = xr.DataArray(
    np.random.rand(3, 4),
    dims=dims,
    coords=coords,
    name="temperature",
    attrs={"units": "degC"}
)

rh = xr.DataArray(
    np.random.rand(3, 4),
    dims=temp.dims,
    coords=temp.coords,
    name="rh",  # matches the key used for this variable in the Dataset
    attrs={"units": "%"}
)


ds = xr.Dataset({
    "temperature": temp,
    "rh": rh
})

ds
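
To make the DataArray/Dataset distinction concrete, here is a small sketch of pulling variables back out of a Dataset. It builds a separate `demo_ds` with fixed values (an assumption, so the results are reproducible) using the `(dims, data, attrs)` tuple form of the `xr.Dataset` constructor rather than the DataArray objects above.

```python
import numpy as np
import xarray as xr

coords = {"time": ["2023-01-01", "2023-01-02", "2023-01-03"],
          "location": ["NY", "LA", "CHI", "DAL"]}

# Two variables on the same (time, location) grid; values are illustrative.
demo_ds = xr.Dataset(
    {
        "temperature": (("time", "location"),
                        np.arange(12.0).reshape(3, 4),
                        {"units": "degC"}),
        "rh": (("time", "location"),
               np.full((3, 4), 50.0),
               {"units": "%"}),
    },
    coords=coords,
)

var_names = list(demo_ds.data_vars)   # names of the variables in the Dataset
temperature = demo_ds["temperature"]  # indexing by name returns a DataArray

# Selecting on a shared coordinate applies to every variable at once.
day_two = demo_ds.sel(time="2023-01-02")
```

Indexing a Dataset by variable name gives back a single DataArray with its own attributes, while `sel` on the Dataset slices all variables along the shared coordinate together.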