# Data I/O (old)

A temporary demonstration of how input data for the toolboxes can be loaded.

The `ExternalData` class is not intended to be used directly. It defines general utilities for handling the input (and output) data of each toolbox. Each toolbox defines its own subclasses of it, e.g. `SecsInputs`, `TfaMagInputs`. These subclasses define which datasets to connect to, supply default configuration for those datasets, and perform some preprocessing (e.g. generating auxiliary/derived parameters).
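The subclass pattern can be sketched in plain Python. This is a minimal, hypothetical illustration rather than swarmpal's actual code. It assumes a hook method (here named `_preprocess`) that the base class calls after loading and that a toolbox subclass overrides to add a derived parameter. A dict of lists stands in for the xarray Dataset, and the names `BaseData`, `MagLikeData`, `B_NEC` and `F_derived` are illustrative.

```python
import math


class BaseData:
    """Hypothetical stand-in for a generic data class such as ExternalData."""

    def __init__(self, data):
        # Copy the input so preprocessing never mutates the caller's object
        self.data = {name: list(values) for name, values in data.items()}
        self._preprocess()

    def _preprocess(self):
        """Hook for subclasses; the generic class adds nothing."""


class MagLikeData(BaseData):
    """Hypothetical toolbox subclass that adds a derived parameter."""

    def _preprocess(self):
        # Derive the field intensity |B| from each (N, E, C) vector
        self.data["F_derived"] = [
            math.sqrt(n**2 + e**2 + c**2) for n, e, c in self.data["B_NEC"]
        ]


d = MagLikeData({"B_NEC": [(3.0, 4.0, 0.0), (0.0, 0.0, 2.0)]})
print(d.data["F_derived"])  # [5.0, 2.0]
```

Keeping the derivation in a subclass hook means the generic loading code stays the same for every toolbox, while each toolbox decides which extra parameters it needs.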

An example subclass, `MagExternalData(ExternalData)`, is provided and used below to demonstrate the general behaviour of the data objects.

Some behaviours of `ExternalData`:
- Each subclass defines which datasets to plug into, and which parameters to fetch etc.
- The user chooses which particular dataset to fetch
- `ExternalData` objects only hold a single time series, represented as an xarray Dataset
- Remote datasets are configured to come from VirES by default

Some methods have been added to make usage more flexible:
- The expensive part (fetching data) happens in a separate step:  
  `.initialise()`  
  which runs by default when fetching data from VirES (but can be disabled by passing `initialise=False` on object creation)
- Preloaded data can be saved to a file:  
  `.to_file("filename.nc")`  (saves a netCDF file from the xarray object)  
  This is useful for preparing a bulk dataset for processing (i.e. download all the data first, then apply the toolbox algorithms)
- Choose where the data used to initialise the object come from by passing `source="vires" | "swarmpal_file" | "manual"` on object creation:
  - `"vires"` (default) to fetch from VirES
  - `"swarmpal_file"` to provide a file prepared from the `.to_file()` method
  - `"manual"` to manually pass an xarray Dataset
  - Data from a file or passed manually are loaded after object creation, in a second step:
    `.initialise(xarray)`  
    `.initialise("filename.nc")`
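The deferred initialisation and the `source` options listed above can be modelled with a small stand-in class. This sketch assumes nothing about swarmpal internals: `LazyData` is hypothetical, a `fetch` callable replaces the VirES request, a plain dict replaces the xarray Dataset, and JSON replaces netCDF so that it runs with the standard library alone. As with `.xarray` in the cells below, reading `.data` before initialisation raises `AttributeError`.

```python
import json
import os
import tempfile


class LazyData:
    """Hypothetical stand-in showing deferred loading keyed on `source`."""

    SOURCES = ("vires", "swarmpal_file", "manual")

    def __init__(self, source="vires", initialise=True, fetch=None):
        if source not in self.SOURCES:
            raise ValueError(f"source must be one of {self.SOURCES}, got {source!r}")
        self.source = source
        self._fetch = fetch  # callable standing in for the remote (VirES) request
        # Only the remote source can load itself without further input
        if source == "vires" and initialise:
            self.initialise()

    @property
    def data(self):
        # Reading data before initialisation is an AttributeError
        try:
            return self._data
        except AttributeError:
            raise AttributeError("data not loaded: call .initialise() first") from None

    def initialise(self, obj=None):
        if self.source == "vires":
            self._data = self._fetch()  # the expensive step
        elif self.source == "swarmpal_file":
            with open(obj) as f:  # obj is a path written by to_file()
                self._data = json.load(f)
        else:  # "manual": obj is the data itself
            self._data = obj

    def to_file(self, path):
        with open(path, "w") as f:
            json.dump(self.data, f)


# Usage: defer the "download", run it, save it, then reload it in a new object
remote = LazyData(fetch=lambda: {"time": [0, 1], "F": [5.0, 2.0]}, initialise=False)
print(hasattr(remote, "data"))  # False
remote.initialise()
path = os.path.join(tempfile.mkdtemp(), "prepared.json")
remote.to_file(path)
from_file = LazyData(source="swarmpal_file")
from_file.initialise(path)
print(from_file.data == remote.data)  # True
```

Separating construction from loading lets an object be configured cheaply and filled later, from whichever source was chosen.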

In [None]:
# This allows module code to be reloaded live
# - useful for testing out things when working on an editable install of the package
%load_ext autoreload
%autoreload 2

In [None]:
from swarmpal.io import ExternalData, MagExternalData

## Properties of `ExternalData` objects

In the base `ExternalData` class, the collections and defaults are unset. These configure which dataset (collection) to connect to and the default (user-overridable) parameters to pass to VirES.

In [None]:
ExternalData.COLLECTIONS

In [None]:
ExternalData.DEFAULTS

Subclasses replace these to configure the data they require access to:

In [None]:
MagExternalData.COLLECTIONS

In [None]:
MagExternalData.DEFAULTS
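How a subclass narrows the base configuration can be sketched as follows, assuming (hypothetically) that class-level `COLLECTIONS` restrict which collection the user may pick and that user keyword arguments override class-level `DEFAULTS`. `BaseConfig`, `MagConfig` and the default values are illustrative; the real ones are those shown by `MagExternalData.COLLECTIONS` and `MagExternalData.DEFAULTS`.

```python
class BaseConfig:
    # Hypothetical generic class: nothing configured yet
    COLLECTIONS = []
    DEFAULTS = {}

    def __init__(self, collection, **user_params):
        if collection not in self.COLLECTIONS:
            raise ValueError(f"{collection!r} is not one of {self.COLLECTIONS}")
        self.collection = collection
        # User-supplied parameters take precedence over the class defaults
        self.params = {**self.DEFAULTS, **user_params}


class MagConfig(BaseConfig):
    # Illustrative values only, not the real MagExternalData configuration
    COLLECTIONS = [f"SW_OPER_MAG{sat}_LR_1B" for sat in "ABC"]
    DEFAULTS = {"model": "IGRF", "sampling_step": None}


cfg = MagConfig("SW_OPER_MAGA_LR_1B", model="CHAOS")
print(cfg.params)  # {'model': 'CHAOS', 'sampling_step': None}
```

In this sketch the empty base-class `COLLECTIONS` admits no collection at all, so the generic class is only usable through a subclass.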

## Get data from VirES

The user creates a data object, specifying the particular collection, model, and time window to use:

In [None]:
d_vires = MagExternalData(
    collection="SW_OPER_MAGA_LR_1B",
    model="IGRF",
    start_time="2022-01-01",
    end_time="2022-01-01T01:00:00",
    viresclient_kwargs=dict(
        asynchronous=True, show_progress=True
    ),  # optional (default)
    source="vires",  # optional (default)
    initialise=False,  # defaults to True
)

Data is stored in the `.xarray` property.

It is not available yet because we set `initialise=False`, so accessing it raises an `AttributeError`:

In [None]:
# catch the error and just print the error message
try:
    d_vires.xarray
except AttributeError as e:
    print(e)

In [None]:
d_vires.initialise()

In [None]:
d_vires.xarray

## Manually create an `ExternalData` object from the data above

In [None]:
d_manual = MagExternalData(source="manual")

In [None]:
try:
    d_manual.xarray
except AttributeError as e:
    print(e)

Initialise it with the data we fetched earlier.

Any data could be supplied here, but it is up to the user to ensure that it is valid input.
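A cheap guard before calling `.initialise()` manually is to check that the expected variables are present. The helper below is a hypothetical sketch, and the `REQUIRED` names are assumptions for illustration, not the variables `MagExternalData` actually needs. It relies only on the `in` operator, so it works with a dict or with an xarray Dataset, where `name in ds` tests variables and coordinates.

```python
def missing_variables(dataset, required):
    """Return the names in `required` that are absent from `dataset`, in order.

    Only the `in` operator is used, so `dataset` may be a dict or an
    xarray.Dataset (where `name in ds` checks data variables and coordinates).
    """
    return [name for name in required if name not in dataset]


# Hypothetical requirements, for illustration only
REQUIRED = ["Timestamp", "B_NEC", "B_NEC_IGRF"]
candidate = {"Timestamp": [], "B_NEC": []}
print(missing_variables(candidate, REQUIRED))  # ['B_NEC_IGRF']
```

Checking names does not guarantee valid content (units, time ordering, and so on), but it catches the most common mistake cheaply.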

In [None]:
d_manual.initialise(d_vires.xarray.copy())

In [None]:
d_manual.xarray

## Create `ExternalData` from a file

Suppose we prepared the input in an earlier step:

In [None]:
d1 = MagExternalData(
    collection="SW_OPER_MAGA_LR_1B",
    model="IGRF",
    start_time="2022-01-01",
    end_time="2022-01-01T01:00:00",
    viresclient_kwargs=dict(asynchronous=True, show_progress=True),
)
d1.to_file("test_file.nc")

Now we can create the object from this file directly:

In [None]:
d2 = MagExternalData(source="swarmpal_file")

In [None]:
d2.initialise("test_file.nc")
d2.xarray

In [None]:
# Clean up the temporary file
from os import remove

remove("test_file.nc")
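Preparing a bulk dataset, as suggested for `.to_file()`, typically means splitting a long interval into shorter request windows and saving one file per window. The sketch below only computes the windows and hypothetical file names with the standard library; the commented `MagExternalData(...)` call marks where the fetch-and-save step would go, using ISO 8601 strings as in the cells above. The daily window size and the naming scheme are assumptions.

```python
from datetime import datetime, timedelta


def time_windows(start, end, step=timedelta(days=1)):
    """Split [start, end) into consecutive (start, end) windows of at most `step`."""
    windows = []
    t0 = start
    while t0 < end:
        t1 = min(t0 + step, end)
        windows.append((t0, t1))
        t0 = t1
    return windows


for t0, t1 in time_windows(datetime(2022, 1, 1), datetime(2022, 1, 3, 12)):
    fname = f"mag_{t0:%Y%m%dT%H%M}.nc"  # hypothetical naming scheme
    # Sketch of the fetch-and-save step (needs swarmpal and VirES access):
    # MagExternalData(
    #     collection="SW_OPER_MAGA_LR_1B", model="IGRF",
    #     start_time=t0.isoformat(), end_time=t1.isoformat(),
    # ).to_file(fname)
    print(t0.isoformat(), t1.isoformat(), fname)
```

The last window is shortened to end exactly at the requested end time, so the windows cover the interval without gaps or overlaps.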