# Data Structures

- Authors: Marc Shapiro, Zeb Engberg
- Date: 2023-04-14
- `pycontrails`: v0.39.6

An overview of the core data structures in `pycontrails`.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/contrailcirrus/2023-04-pycontrails-workshop/blob/main/notebooks/02-Data-Structures.ipynb)

In [None]:
!pip install pycontrails

# VectorDataset

> This is primarily used as a base class for `GeoVectorDataset` and `Flight`

A structure that holds 1D arrays of consistent size in a dictionary.
It is similar to a [`pandas.DataFrame`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html), but designed to be more performant.

In [1]:
import numpy as np

from pycontrails import VectorDataset

In [2]:
vector = VectorDataset({
    "a": np.arange(0, 4),
    "b": np.linspace(1, 100, 4),
})
vector

| Attributes | |
|---|---|

| | a | b |
|---|---|---|
| 0 | 0 | 1.0 |
| 1 | 1 | 34.0 |
| 2 | 2 | 67.0 |
| 3 | 3 | 100.0 |


You can pass in scalar attributes (`attrs`) to annotate the dataset.

In [3]:
vector = VectorDataset({
    "a": np.arange(0, 4),
    "b": np.linspace(1, 100, 4),
}, attrs={"version": 1})
vector

| Attributes | |
|---|---|
| version | 1 |

| | a | b |
|---|---|---|
| 0 | 0 | 1.0 |
| 1 | 1 | 34.0 |
| 2 | 2 | 67.0 |
| 3 | 3 | 100.0 |


Data and attributes are accessed via the `data` and `attrs` properties, respectively.

In [4]:
print(vector.data)
print(vector.attrs)

{'a': array([0, 1, 2, 3]), 'b': array([  1.,  34.,  67., 100.])}
{'version': 1}
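
The "1D arrays of consistent size" requirement can be illustrated without `pycontrails`. The sketch below uses a hypothetical helper, `check_consistent_size`, which is not part of the library; it only shows the kind of invariant such a structure has to maintain, and how the same dictionary maps onto a `pandas.DataFrame`.

```python
import numpy as np
import pandas as pd


def check_consistent_size(data: dict) -> int:
    """Return the common length of all arrays, or raise if they disagree.

    Hypothetical helper for illustration only; not part of pycontrails.
    """
    shapes = {key: np.asarray(arr).shape for key, arr in data.items()}
    for key, shape in shapes.items():
        if len(shape) != 1:
            raise ValueError(f"'{key}' must be 1D, got shape {shape}")
    lengths = {shape[0] for shape in shapes.values()}
    if len(lengths) > 1:
        raise ValueError(f"Arrays have inconsistent sizes: {shapes}")
    return lengths.pop() if lengths else 0


data = {"a": np.arange(0, 4), "b": np.linspace(1, 100, 4)}
n = check_consistent_size(data)  # both arrays hold 4 values

# The same dictionary of equal-length arrays maps directly onto a DataFrame
df = pd.DataFrame(data)
```

Passing arrays of different lengths, for example `{"a": np.arange(3), "b": np.arange(4)}`, makes the helper raise a `ValueError`.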


# GeoVectorDataset

> Extends `VectorDataset`

Base class to hold 1D geospatial arrays of consistent size.

A `GeoVectorDataset` requires the following geospatial coordinate keys:

- `latitude`: WGS84
- `longitude`: WGS84
- `altitude`: meters, or `level`: hPa
- `time`: `np.datetime64[ns]`

Each spatial variable is expected to have "float32" or "float64" `dtype`.
The time variable is expected to have "datetime64[ns]" `dtype`.
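
As a minimal sketch of these dtype expectations, the arrays can be prepared explicitly with NumPy and pandas before they are passed to the constructor. The variable names are illustrative, and whether `pycontrails` converts other dtypes automatically is not something this sketch relies on.

```python
import numpy as np
import pandas as pd

# np.arange with integer arguments yields an integer dtype
latitude = np.arange(0, 4)

# Cast explicitly so the coordinate has a floating-point dtype
latitude = latitude.astype(np.float64)

# Build the time coordinate and make sure it is datetime64[ns]
time = pd.date_range("2022-03-10 00", "2022-03-10 05", periods=4)
time = time.to_numpy().astype("datetime64[ns]")
```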

Use the attribute `attrs["crs"]` to specify the coordinate reference system
using [PROJ](https://proj.org/) or [EPSG](https://epsg.org/home.html) syntax.

In [5]:
import pandas as pd
import numpy as np

from pycontrails import GeoVectorDataset

In [6]:
geovector = GeoVectorDataset({
    "latitude": np.arange(0, 4),
    "longitude": np.linspace(1, 100, 4),
    "altitude": np.linspace(1000, 8000, 4),
    "time": pd.date_range(
        start="2022-03-10 00",
        end="2022-03-10 05",
        periods=4,
    ),
}, attrs={"version": 1})
geovector

| Attributes | |
|---|---|
| time | [2022-03-10 00:00:00, 2022-03-10 05:00:00] |
| longitude | [1.0, 100.0] |
| latitude | [0.0, 3.0] |
| altitude | [1000.0, 8000.0] |
| version | 1 |
| crs | EPSG:4326 |

| | latitude | longitude | altitude | time |
|---|---|---|---|---|
| 0 | 0.0 | 1.0 | 1000.0 | 2022-03-10 00:00:00 |
| 1 | 1.0 | 34.0 | 3333.333333 | 2022-03-10 01:40:00 |
| 2 | 2.0 | 67.0 | 5666.666667 | 2022-03-10 03:20:00 |
| 3 | 3.0 | 100.0 | 8000.0 | 2022-03-10 05:00:00 |
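
The `altitude` (meters) and `level` (hPa) keys both describe vertical position. One common way to relate them is the International Standard Atmosphere; the sketch below applies its tropospheric formula to the altitudes used above. The helper `altitude_to_level` is hypothetical and is not claimed to match how `pycontrails` converts between the two.

```python
import numpy as np

# International Standard Atmosphere constants (troposphere only)
P0_HPA = 1013.25        # sea-level pressure, hPa
T0_K = 288.15           # sea-level temperature, K
LAPSE_K_PER_M = 0.0065  # temperature lapse rate, K/m
EXPONENT = 5.25588      # g / (R * lapse rate), dimensionless


def altitude_to_level(altitude_m):
    """Approximate pressure level (hPa) for altitudes below about 11 km.

    Hypothetical helper for illustration; not the pycontrails implementation.
    """
    altitude_m = np.asarray(altitude_m, dtype=np.float64)
    return P0_HPA * (1.0 - LAPSE_K_PER_M * altitude_m / T0_K) ** EXPONENT


altitude = np.linspace(1000, 8000, 4)
level = altitude_to_level(altitude)  # pressure decreases as altitude increases
```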


# Flight

> Extends `GeoVectorDataset`

A single flight trajectory.

The `Flight` class has many useful utilities.
See [02-Flight.ipynb](02-Flight.ipynb) for more details.

In [7]:
import pandas as pd
import numpy as np

from pycontrails import Flight

In [8]:
flight = Flight({
    "latitude": np.linspace(38.8, 38.7, 5),
    "longitude": np.linspace(-77, -77.2, 5),
    "altitude": np.linspace(9000, 9000, 5),
    "time": pd.date_range(
        start="2022-03-01 00:50:00",
        end="2022-03-01 00:54:00",
        periods=5,
    ),
}, attrs={
    "flight_id": "AC1234",
    "aircraft_type": "A320",
})
flight

| Attributes | |
|---|---|
| time | [2022-03-01 00:50:00, 2022-03-01 00:54:00] |
| longitude | [-77.2, -77.0] |
| latitude | [38.7, 38.8] |
| altitude | [9000.0, 9000.0] |
| flight_id | AC1234 |
| aircraft_type | A320 |
| crs | EPSG:4326 |

| | latitude | longitude | altitude | time |
|---|---|---|---|---|
| 0 | 38.8 | -77.0 | 9000.0 | 2022-03-01 00:50:00 |
| 1 | 38.775 | -77.05 | 9000.0 | 2022-03-01 00:51:00 |
| 2 | 38.75 | -77.1 | 9000.0 | 2022-03-01 00:52:00 |
| 3 | 38.725 | -77.15 | 9000.0 | 2022-03-01 00:53:00 |
| 4 | 38.7 | -77.2 | 9000.0 | 2022-03-01 00:54:00 |
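
To give a sense of the kind of quantity a trajectory supports, the sketch below computes horizontal segment lengths between the waypoints above in plain NumPy. It assumes a spherical Earth and ignores altitude; `haversine_segment_lengths` is a hypothetical helper, not the `Flight` implementation.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation


def haversine_segment_lengths(latitude, longitude):
    """Great-circle distance in meters between consecutive waypoints.

    Hypothetical helper for illustration; not the pycontrails implementation.
    """
    lat = np.deg2rad(np.asarray(latitude, dtype=np.float64))
    lon = np.deg2rad(np.asarray(longitude, dtype=np.float64))
    dlat = np.diff(lat)
    dlon = np.diff(lon)
    # Haversine formula applied to each pair of neighboring waypoints
    a = (
        np.sin(dlat / 2) ** 2
        + np.cos(lat[:-1]) * np.cos(lat[1:]) * np.sin(dlon / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))


latitude = np.linspace(38.8, 38.7, 5)
longitude = np.linspace(-77, -77.2, 5)
segments = haversine_segment_lengths(latitude, longitude)  # 4 segments for 5 waypoints
total_length = segments.sum()  # total horizontal path length in meters
```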


# Meteorology

## MetDataset

Meteorological dataset with multiple variables.

A composition around [`xarray.Dataset`](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html#xarray.Dataset) that enforces certain coordinates and dimensions:

- `latitude`: WGS84
- `longitude`: WGS84
- `level`: hPa, or `altitude`: meters
- `time`: `np.datetime64[ns]`

See [02-Meteorology.ipynb](02-Meteorology.ipynb) for more details.

In [9]:
import numpy as np
import pandas as pd
import xarray as xr

from pycontrails import MetDataset

In [10]:
# create a random number generator
rng = np.random.default_rng(seed=2020)

ds = xr.Dataset(
    {
        "random": (["longitude", "latitude", "level", "time"], rng.random((20, 15, 4, 5))),
        "ones": (["longitude", "latitude", "level", "time"], np.ones((20, 15, 4, 5))),
    },
    coords={
        "longitude": np.arange(-100, -80, 1.0),
        "latitude": np.arange(30, 45, 1.0),
        "level": np.arange(100, 500, 100),
        "time": pd.date_range(
            start="2022-03-01 00:00:00",
            end="2022-03-01 05:00:00",
            periods=5,
        ),
    },
)
met = MetDataset(ds)
met
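
One detail of the `time` coordinate above is worth spelling out: with `periods=5` over a five-hour range, `pd.date_range` spaces the points 75 minutes apart rather than hourly. The short sketch below contrasts a fixed number of points with a fixed step; the variable names are illustrative.

```python
import pandas as pd

# Fixed number of points: the step is (end - start) / (periods - 1)
by_periods = pd.date_range(
    start="2022-03-01 00:00:00",
    end="2022-03-01 05:00:00",
    periods=5,
)
step = by_periods[1] - by_periods[0]  # 1 hour 15 minutes

# Fixed step: the number of points follows from the range
hourly = pd.date_range(
    start="2022-03-01 00:00:00",
    end="2022-03-01 05:00:00",
    freq=pd.Timedelta(hours=1),
)
```

Here `by_periods` has 5 timestamps and `hourly` has 6, since both endpoints are included.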

## MetDataArray

Meteorological data array of a single variable.

Thin composition around [`xr.DataArray`](https://xarray.pydata.org/en/stable/user-guide/data-structures.html#dataarray) to enforce certain variables and dimensions.

In [11]:
import numpy as np
import pandas as pd
import xarray as xr

from pycontrails import MetDataArray

In [12]:
rng = np.random.default_rng(seed=2020)

# construct an xarray DataArray with coordinate labels for data
da = xr.DataArray(
    name="random",
    data=rng.random((20, 15, 4, 5)),
    dims=["longitude", "latitude", "level", "time"],
    coords={
        "longitude": np.arange(-100, -80, 1.0),
        "latitude": np.arange(30, 45, 1.0),
        "level": np.arange(100, 500, 100),
        "time": pd.date_range(
            start="2022-03-01 00:00:00",
            end="2022-03-01 05:00:00",
            periods=5,
        ),
    },
)
# wrap the DataArray in a MetDataArray
met = MetDataArray(da)
met
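
To make the idea of enforced dimensions concrete, the sketch below checks an `xr.DataArray` for the four dimension names used in this notebook before it would be wrapped. The `REQUIRED_DIMS` tuple and the `missing_dims` helper are assumptions made for illustration; they are not part of `pycontrails`, and the library's own validation may differ.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Dimension names assumed from the examples in this notebook
REQUIRED_DIMS = ("longitude", "latitude", "level", "time")


def missing_dims(da):
    """Return the required dimension names that are absent from `da`.

    Hypothetical helper for illustration; not part of pycontrails.
    """
    return [dim for dim in REQUIRED_DIMS if dim not in da.dims]


# A DataArray that lacks the vertical dimension
da_no_level = xr.DataArray(
    name="random",
    data=np.zeros((2, 3, 4)),
    dims=["longitude", "latitude", "time"],
    coords={
        "longitude": [-100.0, -99.0],
        "latitude": [30.0, 31.0, 32.0],
        "time": pd.date_range("2022-03-01", periods=4, freq=pd.Timedelta(hours=1)),
    },
)
print(missing_dims(da_no_level))  # ['level']
```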