---
title: IDs from ARTEMIS
---

## Background

The ARTEMIS spacecraft (THEMIS-B and THEMIS-C) are exposed to the solar wind at 1 AU during parts of their orbits around the Moon, so their data are well worth examining.

- For the time intervals when THEMIS-B is in the solar wind, see [Link](https://omniweb.gsfc.nasa.gov/ftpbrowser/themis_b_sw.txt)
- For the time intervals when THEMIS-C is in the solar wind, see [Link](https://omniweb.gsfc.nasa.gov/ftpbrowser/themis_c_sw.txt)


## Setup

Run the following command in a shell first, since `pipeline` is a project-specific command:

```{sh}
kedro pipeline create themis
```

To get the candidate data, run `kedro run --from-inputs=jno.feature_1s --to-outputs=candidates.jno_1s`

In [None]:
# | code-summary: import all the packages needed for the project
# | output: hide
# | export
import polars as pl
from kedro.pipeline import Pipeline, node
from kedro.pipeline.modular_pipeline import pipeline

from ids_finder.pipelines.themis.mag import create_pipeline as create_mag_data_pipeline
from ids_finder.pipelines.themis.state import create_pipeline as create_state_data_pipeline
from ids_finder.pipelines.default.mission import create_combined_data_pipeline

In [None]:
#| hide
#| default_exp pipelines/themis/pipeline
%load_ext autoreload
%autoreload 2

#### `Kedro`

In [None]:
# | eval: false
from ids_finder.utils.basic import load_catalog

In [None]:
catalog = load_catalog()

jno_start_date = catalog.load("params:jno_start_date")
jno_end_date = catalog.load("params:jno_end_date")
trange = [jno_start_date, jno_end_date]

## Processing the whole data

In [None]:
#| export
from ids_finder.utils.basic import filter_tranges_df

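def filter_sw_events(events: pl.LazyFrame, sw_state: pl.LazyFrame) -> pl.LazyFrame:
    """Keep only the events that fall within the solar wind intervals in `sw_state`."""
    # Unpacking a collected DataFrame yields its columns, here `start` and `end`
    start, end = sw_state.select(["start", "end"]).collect()
    sw_events = filter_tranges_df(events.collect(), (start, end))

    return sw_events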

def create_sw_events_pipeline(
    sat_id,
    tau: int = 60,
    ts_mag: int = 1,
):
    ts_mag_str = f"ts_{ts_mag}s"
    tau_str = f"tau_{tau}s"

    node_filter_sw_events = node(
        filter_sw_events,
        inputs=[
            f"candidates.{sat_id}_{ts_mag_str}_{tau_str}",
            f"{sat_id}.inter_state_sw",
        ],
        outputs=f"events.sw.{sat_id}_{ts_mag_str}_{tau_str}",
    )

    nodes = [node_filter_sw_events]
    return pipeline(nodes)
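
The idea behind `filter_sw_events` is to keep only the candidate events whose time falls inside a solar wind interval. The internals of `filter_tranges_df` are not shown here, so the following is only a plain-Python sketch of that idea, not the project's implementation. The helper `filter_by_intervals`, the `time` key, and all sample values are hypothetical.

```python
from datetime import datetime


def filter_by_intervals(events, starts, ends, time_key="time"):
    """Keep events whose timestamp lies inside any [start, end] interval.

    Hypothetical stand-in for the interval filtering step; the inclusive
    bounds are an assumption made for this sketch.
    """
    intervals = list(zip(starts, ends))
    return [
        event
        for event in events
        if any(start <= event[time_key] <= end for start, end in intervals)
    ]


# Made-up solar wind intervals and candidate events, for illustration only.
sw_starts = [datetime(2020, 1, 1), datetime(2020, 1, 10)]
sw_ends = [datetime(2020, 1, 3), datetime(2020, 1, 12)]
candidate_events = [
    {"time": datetime(2020, 1, 2), "id": "a"},   # inside the first interval
    {"time": datetime(2020, 1, 5), "id": "b"},   # outside both intervals
    {"time": datetime(2020, 1, 11), "id": "c"},  # inside the second interval
]

sw_events = filter_by_intervals(candidate_events, sw_starts, sw_ends)
print([event["id"] for event in sw_events])  # prints ['a', 'c']
```
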

In [None]:
# | export
def create_pipeline(
    sat_id="thb",
    tau=60,  # time window, in seconds
    ts_state="1h",  # time resolution of state data
    ts_mag=1,  # time resolution of mag data, in seconds
) -> Pipeline:
    return (
        create_mag_data_pipeline(sat_id, tau=tau)
        + create_state_data_pipeline(sat_id, ts=ts_state)
        + create_combined_data_pipeline(sat_id, tau=tau, ts_mag=ts_mag, ts_state=ts_state)
        + create_sw_events_pipeline(sat_id, tau=tau, ts_mag=ts_mag)
    )
    )
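
The `sat_id`, `tau`, and `ts_mag` arguments determine the dataset names that the solar wind events node reads and writes. As a quick check of what those names look like, the sketch below repeats the f-string pattern from `create_sw_events_pipeline`. The helper `sw_dataset_names` is hypothetical and only builds strings; whether matching entries exist in the catalog is not verified here.

```python
def sw_dataset_names(sat_id: str, tau: int = 60, ts_mag: int = 1) -> dict:
    """Build the dataset names used by the solar wind events node.

    Hypothetical helper mirroring the f-strings in `create_sw_events_pipeline`.
    """
    ts_mag_str = f"ts_{ts_mag}s"
    tau_str = f"tau_{tau}s"
    suffix = f"{sat_id}_{ts_mag_str}_{tau_str}"
    return {
        "candidates": f"candidates.{suffix}",
        "sw_state": f"{sat_id}.inter_state_sw",
        "sw_events": f"events.sw.{suffix}",
    }


names = sw_dataset_names("thb")
for role, name in names.items():
    print(f"{role}: {name}")
```

With the defaults above, the candidates input resolves to `candidates.thb_ts_1s_tau_60s` and the output to `events.sw.thb_ts_1s_tau_60s`.
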

In [None]:
# | eval: false
# catalog.load('thb.inter_mag_4s').collect().describe()
# catalog.load('thb.primary_state_1h').collect().describe()

## Obsolete code

### Check and preprocess the data

Since we are only interested in data taken while THEMIS is in the solar wind, for simplicity we keep only the data where both `X, SSE` and `X, GSE` are positive.

- State data time resolution is 1 minute...

- FGS data time resolution is 4 seconds...

In [None]:
def get_thm_state(sat):
    sat_pos_sse_files = f"../data/{sat}_pos_sse.parquet"
    sat_pos_sse = pl.scan_parquet(sat_pos_sse_files).set_sorted("time")
    sat_pos_gse_files = f"../data/{sat}_pos_gse.parquet"
    sat_pos_gse = pl.scan_parquet(sat_pos_gse_files).set_sorted("time")
    sat_state = sat_pos_sse.join(sat_pos_gse, on="time", how="inner")
    return sat_state

In [None]:
%%markdown
df = (
    sat_state_sw.upsample("time", every="1m")
    .group_by_dynamic("time", every="1d")
    .agg(pl.col("X, SSE").null_count().alias("null_count"))
    .with_columns(
        pl.when(pl.col("null_count") > 720).then(0).otherwise(1).alias("availablity")
    )
)

properties = {
    'width': 800,
}

chart1 = alt.Chart(df).mark_point().encode(
    x='time',
    y='null_count'
).properties(**properties)

chart2  = alt.Chart(df).mark_point().encode(
    x='time',
    y='availablity'
).properties(**properties)

(chart1 & chart2)