# Time-series: SDO/AIA & SDO/EVE (MEGS-A)

In [None]:
import ast
import sunpy.map
import torch

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

from sdoml import SDOMLDataset
from sdoml.sources import DataSource

from timeit import default_timer as timer

First, we instantiate the ``SDOMLDataset`` class to load one month of three of the six optically-thin SDO/AIA channels (94A/193A/211A) from ``fdl-sdoml-v2/sdomlv2_small.zarr``, alongside EVE MEGS-A irradiance from Fe XVIII, Fe XII and Fe XIV (the primary source ions for those channels, in that order) from ``fdl-sdoml-v2/sdomlv2_eve.zarr/``.

In [None]:
data_to_load = {
    "AIA": {
        "root": "s3://gov-nasa-hdrl-data1/contrib/fdl-sdoml/fdl-sdoml-v2/sdomlv2_small.zarr/",
        "channels": ["94A", "193A", "211A"],
    },
    "EVE": {
        "root": "s3://gov-nasa-hdrl-data1/contrib/fdl-sdoml/fdl-sdoml-v2/sdomlv2_eve.zarr/",
        "channels": ["Fe XVIII", "Fe XII", "Fe XIV"],
    },
}

datasource_arr = [DataSource(instrument=k, meta=v) for k, v in data_to_load.items()]

In [None]:
sdomlds = SDOMLDataset(
    cache_max_size=1 * 512 * 512 * 2048,
    years=[
        "2010",
    ],
    data_to_load=datasource_arr,
)

With the dataset instantiated, we could access it directly through the ``__getitem__`` method (``sdomlds.__getitem__(idx)`` loads and returns a single sample from the dataset at the given index ``idx``). Instead, we will use the ``torch.utils.data.DataLoader`` iterator with a ``batch_size`` of 64 and no shuffling of the data.

As will be evident, the first access to a given chunk is relatively slow, because it is retrieved from the remote store (the ``s3://`` roots given above). The second access is faster, as it is served from the cache. For more information see https://zarr.readthedocs.io/en/stable/api/storage.html#zarr.storage.LRUStoreCache
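
The effect of a least-recently-used cache on repeated reads can be sketched with the standard library alone. This is an analogy, not the `zarr` `LRUStoreCache` itself: `load_chunk` and `fetch_count` are hypothetical names, the "remote read" is simulated, and `functools.lru_cache` bounds the cache by number of entries.

```python
from functools import lru_cache

fetch_count = 0  # number of simulated remote reads (hypothetical counter)


@lru_cache(maxsize=128)
def load_chunk(chunk_id: int) -> bytes:
    """Hypothetical stand-in for reading one chunk from a remote store."""
    global fetch_count
    fetch_count += 1   # only reached on a cache miss
    return bytes(16)   # placeholder payload


load_chunk(0)  # first access: cache miss, the "remote" read happens
load_chunk(0)  # second access: served from the in-memory cache

print(fetch_count)                    # 1
print(load_chunk.cache_info().hits)   # 1
```

Only the first call reaches the simulated store, which is the reason the second pass over the same chunks is faster.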

In [None]:
dataloader = torch.utils.data.DataLoader(
    sdomlds,
    batch_size=64,
    shuffle=False,
)

In [None]:
data = next(iter(dataloader))
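
With ``shuffle=False``, a batch is a run of consecutive sample indices. The sketch below mimics that grouping in plain Python to show which indices end up in each batch. It is not the `DataLoader` implementation: `sequential_batches` is a hypothetical helper, the dataset length of 150 is made up, and it assumes the final, shorter batch is kept rather than dropped.

```python
def sequential_batches(n_samples, batch_size):
    """Yield lists of consecutive indices, one list per batch."""
    for start in range(0, n_samples, batch_size):
        stop = min(start + batch_size, n_samples)
        yield list(range(start, stop))


# 150 is an illustrative dataset length, not the real one.
batches = list(sequential_batches(n_samples=150, batch_size=64))

print([len(b) for b in batches])  # [64, 64, 22]
print(batches[0][:3])             # [0, 1, 2]
```

Calling ``next(iter(dataloader))`` as above therefore corresponds to the first group, indices 0 to 63.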

``SDOMLDataset()`` returns both the image data and the metadata as a single dictionary:

```
In:  data.keys()
Out: dict_keys(['data', 'meta'])
```

* The images (under the ``data`` key) returned by ``__getitem__(idx)`` for a single observation are of size ``(1, 3, 512, 512)``, where each item consists of the 3 requested co-temporal observations (SDO/AIA ``[94, 193, 211]``), each of ``torch.Size([512, 512])``. As shown below, with a batch size of 64, the data is of ``torch.Size([64, 3, 512, 512])``.

```
In:  data['data'].keys()
Out: dict_keys(['AIA', 'EVE'])
In:  data['data']['AIA'].shape
Out: torch.Size([64, 3, 512, 512])
```

* The ``metadata`` for AIA is a list of ``str(dictionary)``, each with 175 key-value pairs. EVE is stored similarly, with each dictionary containing 3 key-value pairs. As shown below, each list has length 64, one entry per sample in the batch.

```
In:  data['meta'].keys()
Out: dict_keys(['AIA', 'EVE'])
In:  len(data['meta']['AIA'])
Out: 64
```

A small excerpt of the AIA dictionary (for index 0 in the batch of 64) is shown below for ``['DEG_COR', 'EXPTIME', 'WAVELNTH', 'T_OBS']``.


```
> batch_index = 0
> data['data']['AIA'][batch_index].shape
torch.Size([3, 512, 512])

> ast.literal_eval(data['meta']['AIA'][batch_index])

{
    ...
    'DEG_COR': [1.083, 0.99217, 0.982774],
    'EXPTIME': [2.901124, 2.000068, 2.900861],
    'WAVELNTH': [94, 193, 211],
    'T_OBS': ['2010-08-01T00:00:09.57Z',
              '2010-08-01T00:00:08.84Z',
              '2010-08-01T00:00:02.07Z']
    ...
}
```

The whole dictionary is shown for EVE:

```
> batch_index = 0
> data['data']['EVE'][batch_index]
tensor([4.6366e-06, 3.7070e-05, 1.6266e-05])

> ast.literal_eval(data['meta']['EVE'][batch_index])

Out:
{
    'ion': ['Fe XVIII', 'Fe XII', 'Fe XIV'],
    'logT': ['6.81 MK', '6.13 MK', '6.27 MK'],
    'wavelength': ['9.3926 nm', '19.512 nm', '21.1331 nm']
}
```

Alternatively, one can use a list comprehension to get a ``np.array`` of dictionaries for each instrument in the batch:

In [None]:
# Decode each str(dictionary) back into a dict, one per sample in the batch
aia_meta = np.array([ast.literal_eval(meta) for meta in data["meta"]["AIA"]])
eve_meta = np.array([ast.literal_eval(meta) for meta in data["meta"]["EVE"]])
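
Once decoded, the metadata can be used to check which EVE line belongs with which AIA channel. The sketch below is a minimal illustration using only the values from the excerpts shown earlier. `nm_string_to_angstrom` is a hypothetical helper, and nearest-wavelength matching is an assumption made for illustration, not part of `sdoml`.

```python
# Values copied from the EVE and AIA metadata excerpts shown earlier.
eve_record = {
    "ion": ["Fe XVIII", "Fe XII", "Fe XIV"],
    "wavelength": ["9.3926 nm", "19.512 nm", "21.1331 nm"],
}
aia_channels = [94, 193, 211]  # WAVELNTH, in angstroms


def nm_string_to_angstrom(text):
    """Convert a string such as '9.3926 nm' to a float in angstroms."""
    value, unit = text.split()
    if unit != "nm":
        raise ValueError(f"unexpected unit: {unit!r}")
    return float(value) * 10.0  # 1 nm = 10 angstroms


# Pair each ion with the AIA channel closest to its line wavelength.
pairing = {}
for ion, wavelength in zip(eve_record["ion"], eve_record["wavelength"]):
    angstrom = nm_string_to_angstrom(wavelength)
    pairing[ion] = min(aia_channels, key=lambda ch: abs(ch - angstrom))

print(pairing)  # {'Fe XVIII': 94, 'Fe XII': 193, 'Fe XIV': 211}
```

The Fe XII line at 195.12 Å is paired with the 193 Å channel because that is the nearest of the three requested channels.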

### Plotting SDO/AIA, SDO/EVE MEGS-A time-series

The following code blocks plot the time-series from SDO/AIA (94/193/211 Å) and from their respective primary source ions (obtained from SDO/EVE MEGS-A).

In [None]:
# !TODO need to add 'T_OBS' to EVE metadata
# Until then, use T_OBS of the first AIA channel (index 0) as the time axis
times = [aia_meta[i]["T_OBS"][0] for i in range(len(data["meta"]["AIA"]))]
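
The cell above takes ``T_OBS`` from the first channel only and uses it as the time axis for every series. The three channels are co-temporal but not simultaneous, so this is an approximation. A minimal sketch, using the three ``T_OBS`` values from the metadata excerpt shown earlier, of how large the offset is for that sample:

```python
from datetime import datetime

# T_OBS values for the 94, 193 and 211 channels, from the excerpt above.
t_obs = [
    "2010-08-01T00:00:09.57Z",
    "2010-08-01T00:00:08.84Z",
    "2010-08-01T00:00:02.07Z",
]

# %f accepts one to six fractional digits; the trailing "Z" is matched literally.
parsed = [datetime.strptime(t, "%Y-%m-%dT%H:%M:%S.%fZ") for t in t_obs]

# Largest gap between any two channels within this one sample.
spread = max(parsed) - min(parsed)
print(spread.total_seconds())  # 7.5
```

For this sample the channels are spread over 7.5 seconds.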

In [None]:

plt.figure(figsize=(15, 5))

colours = ["cadetblue", "darkorange", "lightcoral"]

# Average each image over its two spatial axes, then iterate channel by channel
for aia_index, aia_data in enumerate(data["data"]["AIA"].mean(dim=[2, 3]).T):
    aialabel = (
        "SDO/AIA "
        + str(ast.literal_eval(data["meta"]["AIA"][0])["WAVELNTH"][aia_index])
        + " Å"
    )
    plt.plot(
        pd.to_datetime(pd.Series(times)),
        (aia_data - aia_data.mean()) / aia_data.std(),
        "-",
        lw=6,
        alpha=0.4,
        c=colours[aia_index],
        label=aialabel,
    )

for eve_index in range(len(data["data"]["EVE"].T)):
    eve_data = data["data"]["EVE"][:, eve_index]
    evelabel = "SDO/EVE " + str(
        ast.literal_eval(data["meta"]["EVE"][0])["ion"][eve_index]
    )
    plt.plot(
        pd.to_datetime(pd.Series(times)),
        (eve_data - eve_data.mean()) / eve_data.std(),
        "-o",
        lw=2,
        c=colours[eve_index],
        label=evelabel,
    )

plt.ylabel("Standardised Observations")
plt.title(
    "Time-series Comparison of Three SDO/AIA Channels and Their Primary Source Ions (SDO/EVE MEGS-A)"
)

plt.legend()
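
Each series in the plot is standardised as ``(x - mean) / std`` so that AIA image means and EVE irradiances, which have very different units and magnitudes, can share one axis. A minimal standard-library sketch of that transformation follows. The input values are made up, and ``standardise`` is a hypothetical helper. ``statistics.stdev`` is the sample standard deviation (dividing by ``n - 1``), which matches the default behaviour of ``torch.Tensor.std``.

```python
from statistics import mean, stdev


def standardise(values):
    """Return the z-scores of a sequence: (x - mean) / sample std."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (n - 1 denominator)
    return [(v - mu) / sigma for v in values]


series = [2.0, 4.0, 4.0, 6.0]  # made-up observations
z = standardise(series)

# After standardising, the series has zero mean and unit sample std.
print(abs(mean(z)) < 1e-12)
print(abs(stdev(z) - 1.0) < 1e-12)
```

Because only the shape of each curve survives this step, the plot compares relative variability and not absolute brightness.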

---