# Quickstart

There are two main things to understand in SwarmPAL: *data* and *processes*. *Data* live within an [xarray DataTree](https://xarray-datatree.readthedocs.io/) (type `DataTree`), and *processes* are callable objects; that is, they behave like functions (and are of type `PalProcess`). Processes act on data, transforming them by adding derived parameters to the data object.
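As a rough mental model only, the data/process split can be pictured in plain Python. None of this is SwarmPAL's API. A dictionary stands in for a `DataTree`, and `AddFieldMagnitude` (with its output variable `F_derived`) is a hypothetical process. It is a callable object that adds a derived parameter to the data:

```python
import math


class AddFieldMagnitude:
    """Hypothetical toy 'process': a callable object that adds a derived variable."""

    def __init__(self, dataset, variable):
        self.dataset = dataset
        self.variable = variable

    def __call__(self, tree):
        # Look up the dataset by name, like browsing a datatree as a dictionary
        ds = tree[self.dataset]
        # Derive the field magnitude from each (N, E, C) vector
        ds["F_derived"] = [math.sqrt(sum(c * c for c in vec)) for vec in ds[self.variable]]
        # Return the tree, which now carries the derived parameter
        return tree


# Tiny stand-in for a datatree: dataset name -> {variable name -> values}
tree = {"SW_OPER_MAGA_LR_1B": {"B_NEC": [[3.0, 4.0, 0.0], [0.0, 0.0, 2.0]]}}

process = AddFieldMagnitude(dataset="SW_OPER_MAGA_LR_1B", variable="B_NEC")
tree = process(tree)  # used like a function
print(tree["SW_OPER_MAGA_LR_1B"]["F_derived"])  # [5.0, 2.0]
```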

## Fetching data

Data are fetched over the web and organised into a `DataTree` using `create_paldata` and `PalDataItem`:

```python
from swarmpal.io import create_paldata, PalDataItem

data = create_paldata(
    PalDataItem.from_vires(
        server_url="https://vires.services/ows",
        collection="SW_OPER_MAGA_LR_1B",
        measurements=["B_NEC"],
        start_time="2020-01-01T00:00:00",
        end_time="2020-01-01T03:00:00",
        options=dict(asynchronous=False, show_progress=False),
    )
)
print(data)
```

Now you can skip ahead to [Applying Processes](#Applying-Processes), or read on to learn more about data...

Swarm data are fetched from the [VirES service](https://vires.services/), and SwarmPAL uses the Python package [`viresclient`](https://viresclient.readthedocs.io/) underneath to transfer and load the data. Data can also be fetched from any [HAPI server](http://hapi-server.org/), in which case [`hapiclient`](https://github.com/hapi-server/client-python) is used underneath.

`create_paldata` and `PalDataItem` have a few features for flexible use:
- Pass multiple items to `create_paldata` to assemble a complex datatree. Pass them as keyword arguments (e.g. `HAPI_SW_OPER_MAGA_LR_1B=...` below) if you want to manually change the name in the datatree, otherwise they will default to the collection/dataset name.
- Use `.from_vires()` and `.from_hapi()` to fetch data from different services. Note that the argument names and usage are a bit different (though equivalent) in each case. These follow the nomenclature used in `viresclient` and `hapiclient` respectively.
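The naming rule from the first bullet can be sketched in plain Python. This is a hypothetical helper (`assemble_tree`, `Item`), not SwarmPAL's implementation. Positional items are stored under their own collection/dataset name, and keyword items are stored under the keyword:

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical stand-in for a fetched data item."""

    name: str  # collection (VirES) or dataset (HAPI) name
    source: str


def assemble_tree(*items, **named_items):
    tree = {}
    # Positional items default to their collection/dataset name
    for item in items:
        tree[item.name] = item
    # Keyword items are stored under the keyword instead
    for key, item in named_items.items():
        tree[key] = item
    return tree


tree = assemble_tree(
    Item("SW_OPER_MAGA_LR_1B", "vires"),
    HAPI_SW_OPER_MAGA_LR_1B=Item("SW_OPER_MAGA_LR_1B", "hapi"),
)
print(sorted(tree))  # ['HAPI_SW_OPER_MAGA_LR_1B', 'SW_OPER_MAGA_LR_1B']
```

Both items here share the dataset name `SW_OPER_MAGA_LR_1B`. Without the keyword, the second would silently overwrite the first in this sketch. How SwarmPAL itself handles such a clash is not covered here.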

```python
data = create_paldata(
    PalDataItem.from_vires(
        server_url="https://vires.services/ows",
        collection="SW_OPER_MAGA_LR_1B",
        measurements=["B_NEC"],
        start_time="2020-01-01T00:00:00",
        end_time="2020-01-01T03:00:00",
        options=dict(asynchronous=False, show_progress=False),
    ),
    HAPI_SW_OPER_MAGA_LR_1B=PalDataItem.from_hapi(
        server="https://vires.services/hapi",
        dataset="SW_OPER_MAGA_LR_1B",
        parameters="Latitude,Longitude,Radius,B_NEC",
        start="2020-01-01T00:00:00",
        stop="2020-01-01T03:00:00",
    ),
)
print(data)
```

While you can learn more about using datatrees in the [xarray-datatree documentation](https://xarray-datatree.readthedocs.io/), this should not be necessary for basic usage of SwarmPAL. If you are familiar with xarray, you can access a dataset by browsing the datatree like a dictionary. You can then use either the `.ds` accessor to get an immutable view of the dataset or `.to_dataset()` to extract a mutable copy.
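The difference between a read-only view and an editable copy can be illustrated with the standard library alone. This is an analogy, not datatree code. `types.MappingProxyType` plays the role of `.ds`, and a `dict` copy plays the role of `.to_dataset()`:

```python
from types import MappingProxyType

# Stand-in for one node of the tree
node = {"B_NEC": [1.0, 2.0, 3.0]}

view = MappingProxyType(node)  # read-only view, analogous to .ds
copy = dict(node)              # shallow copy, analogous to .to_dataset()

try:
    view["new_var"] = 0.0      # views reject modification
except TypeError:
    print("view is read-only")

copy["new_var"] = 0.0          # allowed: adding keys to the copy leaves the original untouched
print("new_var" in node)       # False
```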

```python
data["SW_OPER_MAGA_LR_1B"].ds
```

Using the VirES API, additional variables beyond the original dataset (models and auxiliaries) can be requested. See [the viresclient documentation](https://viresclient.readthedocs.io/en/latest/available_parameters.html) for details, or the [Swarm Notebooks](https://notebooks.vires.services/) for more examples. The `options` argument is an extendable dictionary of special options that are passed through to `viresclient`. Here we specify `asynchronous=False` to process the request synchronously (faster, but it will fail for longer requests) and disable the progress bars with `show_progress=False`.
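Because `options` is an ordinary dictionary, it can be assembled programmatically. The sketch below is hypothetical. The helper `vires_options` and its one-day cutoff for switching to asynchronous processing are arbitrary choices for illustration, not documented VirES limits. Only the `asynchronous` and `show_progress` keys come from the text above:

```python
from datetime import datetime, timedelta


def vires_options(start_time, end_time, max_sync_span=timedelta(days=1), **overrides):
    """Hypothetical helper building the `options` dict passed through to viresclient."""
    span = datetime.fromisoformat(end_time) - datetime.fromisoformat(start_time)
    options = {
        # Synchronous requests are faster but fail for long requests,
        # so only use them below an (assumed, arbitrary) cutoff
        "asynchronous": span > max_sync_span,
        "show_progress": False,
    }
    # Explicit keyword overrides win over the defaults above
    options.update(overrides)
    return options


print(vires_options("2020-01-01T00:00:00", "2020-01-01T03:00:00"))
# {'asynchronous': False, 'show_progress': False}
```

The result could then be passed as `options=vires_options(...)` to `PalDataItem.from_vires`.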

```python
data = create_paldata(
    PalDataItem.from_vires(
        server_url="https://vires.services/ows",
        collection="SW_OPER_MAGA_LR_1B",
        measurements=["B_NEC"],
        models=["IGRF"],
        auxiliaries=["QDLat", "MLT"],
        start_time="2020-01-01T00:00:00",
        end_time="2020-01-01T03:00:00",
        options=dict(asynchronous=False, show_progress=False),
    )
)
```

## Applying Processes

A process is a special object type you can import from different toolboxes in SwarmPAL.

First we import the relevant toolbox and create a process from the `.processes` submodule:

```python
from swarmpal.toolboxes import tfa

process = tfa.processes.Preprocess()
```

Each process has a `.set_config()` method which configures the behaviour of the process:

```python
help(process.set_config)
```

```python
process.set_config(
    dataset="SW_OPER_MAGA_LR_1B",
    timevar="Timestamp",
    active_variable="B_NEC",
    active_component=0,
)
```

Processes are *callable*, which means they can be used like functions. They act on datatrees to alter them. We can use this process on the data we built above.

```python
data = process(data)
print(data)
```

The data have been modified: in this case, a new data variable called `TFA_Variable` has been added. We can inspect it using the usual xarray/matplotlib tooling, for example:

```python
data["SW_OPER_MAGA_LR_1B"]["TFA_Variable"]
```

```python
data["SW_OPER_MAGA_LR_1B"]["TFA_Variable"].plot.line(x="TFA_Time");
```

... but in this case, the TFA toolbox has additional tools for inspecting data:

```python
tfa.plotting.time_series(data);
```

## Saving/loading data

Since `data` is just a normal datatree, we can use the usual xarray tools to write and read files. This can be useful for:
- Saving preprocessed (i.e. interim) data and reloading it later for further processing. For example, one might download a whole series of data once, then analyse it in a second, more iterative workflow without having to wait for the download again
- Saving the output of a process to use in other tools
- Saving the output of a process to reload later just for visualisation
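The save-then-reload workflow from the first bullet can be sketched with the standard library. JSON stands in for NetCDF here, and the file name and layout are hypothetical. A `tempfile.TemporaryDirectory` removes the file automatically. That is an alternative to calling `os.remove` by hand:

```python
import json
import tempfile
from pathlib import Path

# Interim results from a (toy) first, slow stage of a workflow
interim = {"SW_OPER_MAGA_LR_1B": {"TFA_Variable": [10.0, 11.0]}}

with tempfile.TemporaryDirectory() as tmpdir:
    path = Path(tmpdir) / "interim.json"
    # Stage 1: save once, after the download/preprocessing
    path.write_text(json.dumps(interim))
    # Stage 2: reload instead of re-downloading
    # (in practice, from a persistent path in a later session)
    reloaded = json.loads(path.read_text())
# The temporary directory and its file are deleted on leaving the with-block

print(reloaded == interim)  # True
```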

```python
from os import remove
from datatree import open_datatree

# Save the file as NetCDF
data.to_netcdf("testdata.nc")
# Load the data as a new datatree
reloaded_data = open_datatree("testdata.nc")
# Remove that file we just made
remove("testdata.nc")
print(reloaded_data)
```

## The `.swarmpal` accessor

Importing `swarmpal` registers an *accessor* on datatrees, making extra tools available under `<datatree>.swarmpal.<...>`. One use of this is to read metadata stored within the datatree. Here we see that the `Preprocess` process from the TFA toolbox has saved the configuration that was used:

```python
reloaded_data.swarmpal.pal_meta
```

Because this metadata is stored within the data itself, it is preserved through round trips to and from files, so a subsequent process can read it, even in a different session.
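Why metadata can survive a file round trip is easiest to see in a sketch. NetCDF attributes cannot hold nested dictionaries, so this sketch serialises the configuration to a JSON string inside the dataset attributes. That is an assumption for illustration. The attribute key `PAL_meta`, the entry name `TFA_Preprocess`, and this layout are hypothetical and may differ from how SwarmPAL actually stores `pal_meta`:

```python
import json

config = {
    "dataset": "SW_OPER_MAGA_LR_1B",
    "timevar": "Timestamp",
    "active_variable": "B_NEC",
    "active_component": 0,
}

# First process: record its configuration in the dataset attributes as a string
attrs = {"PAL_meta": json.dumps({"TFA_Preprocess": config})}

# A file round trip keeps string attributes intact; simulated here with JSON
attrs_after_reload = json.loads(json.dumps(attrs))

# A following process (even in a new session) can recover the earlier settings
meta = json.loads(attrs_after_reload["PAL_meta"])
print(meta["TFA_Preprocess"]["active_component"])  # 0
```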