---
title: THEMIS State data pipeline
---

We use low-resolution [OMNI data](https://omniweb.gsfc.nasa.gov/ow.html) for the plasma state data, as in the [OMNI notebook](../omni/index.ipynb).

In [None]:
# | export
import polars as pl
import pandas

from kedro.pipeline import node
from kedro.pipeline.modular_pipeline import pipeline

In [None]:
# | default_exp pipelines/themis/state

## Solar wind state
We also have an additional data file that indicates whether `THEMIS` is in the solar wind.

In [None]:
#| export
def load_sw_data(raw_data: pandas.DataFrame) -> pl.DataFrame:
    """Convert the raw pandas DataFrame into a polars DataFrame."""
    return pl.from_dataframe(raw_data)

In [None]:
# | export
def preprocess_sw_data(
    raw_data: pl.LazyFrame,
) -> pl.LazyFrame:
    """
    - Applying naming conventions for columns
    - Parsing and typing data (like from string to datetime for time columns)
    """

    return raw_data.with_columns(
        # Note: `polars` needs either both hour and minute in the format, or neither,
        # so " 00" is appended as the minutes before parsing.
        pl.concat_str(pl.col("start"), pl.lit(" 00")).str.to_datetime(
            format="%Y %j %H %M"
        ),
        pl.concat_str(pl.col("end"), pl.lit(" 00")).str.to_datetime(
            format="%Y %j %H %M"
        ),
    )
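
As a quick sanity check of the format string used above, here is a minimal sketch that uses only the standard library, not `polars`. It assumes the `start` and `end` columns hold strings of the form `"<year> <day of year> <hour>"`, which is inferred from the format string and not confirmed by the data file. The sample value is made up for illustration.

```python
from datetime import datetime

# Hypothetical sample value; the real file contents are not shown here.
raw_start = "2008 032 12"

# Same idea as in `preprocess_sw_data`: pad with minutes, then parse
# year (%Y), day of year (%j), hour (%H) and minute (%M).
parsed = datetime.strptime(raw_start + " 00", "%Y %j %H %M")

print(parsed)  # 2008-02-01 12:00:00
```

Day 32 of the year falls on 1 February, so the day-of-year field is converted to a calendar date as expected.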

## Pipelines

In [None]:
#| export
def create_sw_pipeline(sat_id="THB", source="STATE"):
    namespace = f"{sat_id}.{source}"
    node_load_sw_data = node(
        load_sw_data,
        inputs="original_sw_data",
        outputs="raw_data_sw",
        name="load_solar_wind_data",
    )
    node_preprocess_sw_state = node(
        preprocess_sw_data,
        inputs="raw_data_sw",
        outputs="inter_data_sw",
        name="preprocess_solar_wind_data",
    )
    return pipeline(
        [
            node_load_sw_data,
            node_preprocess_sw_state,
        ],
        namespace=namespace,
    )
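
Because the pipeline is wrapped with `namespace=f"{sat_id}.{source}"`, Kedro prefixes the node and dataset names with that namespace, so the data catalog has to use the prefixed names. The sketch below only mimics that prefixing for the default arguments. `namespaced` is a hypothetical helper written for illustration, not part of Kedro.

```python
def namespaced(name: str, sat_id: str = "THB", source: str = "STATE") -> str:
    """Hypothetical helper: mimic the namespace prefix applied to a dataset name."""
    return f"{sat_id}.{source}.{name}"


# Dataset names as written in `create_sw_pipeline`.
datasets = ["original_sw_data", "raw_data_sw", "inter_data_sw"]

catalog_names = [namespaced(name) for name in datasets]
print(catalog_names)
# ['THB.STATE.original_sw_data', 'THB.STATE.raw_data_sw', 'THB.STATE.inter_data_sw']
```

With a different `sat_id` or `source`, the same three datasets would be looked up under a different prefix, which is what allows one pipeline definition to be reused per satellite.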

In [None]:
# | export
def create_pipeline(sat_id="THB", source="STATE"):
    return create_sw_pipeline(sat_id=sat_id, source=source)