---
title: Finding magnetic discontinuities
order: 0
---

Finding magnetic discontinuities can be divided into two parts:

1. Finding the discontinuities, see [this notebook](./01_ids_detection.ipynb)
    - This step corresponds to limited feature extraction / anomaly detection
    - The output should contain the "tstart" and "tstop" of each event

2. Calculating the properties of the discontinuities, see [this notebook](./02_ids_properties.ipynb)
    - This step can use higher time resolution data than the detection step
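
The two-part split above can be illustrated with a dependency-free toy sketch. It is not the package's algorithm: `detect_candidates` and `event_properties` are hypothetical helpers, the threshold detector merely stands in for the real detection step, and the "jump" property is an invented placeholder. It only shows that detection yields "tstart"/"tstop" intervals on low-cadence data, and that properties are then computed from higher-cadence data inside those intervals.

```python
from statistics import pstdev


def detect_candidates(times, values, window=4, threshold=1.0):
    """Toy detector: flag non-overlapping windows whose spread exceeds a threshold."""
    events = []
    for i in range(0, len(times) - window + 1, window):
        if pstdev(values[i : i + window]) > threshold:
            events.append({"tstart": times[i], "tstop": times[i + window - 1]})
    return events


def event_properties(event, times, values):
    """Toy property step: change of the signal across the event interval."""
    inside = [
        v for t, v in zip(times, values) if event["tstart"] <= t <= event["tstop"]
    ]
    return {**event, "n_points": len(inside), "jump": inside[-1] - inside[0]}


# Low-cadence series (1 s sampling) with a jump between t=5 and t=6
low_t = list(range(8))
low_b = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 5.0, 5.0]

# Higher-cadence series (0.5 s sampling) covering the same interval
high_t = [0.5 * k for k in range(16)]
high_b = [1.0 if t < 5.5 else 5.0 for t in high_t]

events = detect_candidates(low_t, low_b)  # step 1: intervals only
ids = [event_properties(e, high_t, high_b) for e in events]  # step 2: properties
```

Keeping the event table down to "tstart" and "tstop" means the second step can be rerun on a different data product without repeating the detection.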

In [None]:
# | default_exp core/pipeline

In [None]:
# | export
# | code-summary: "Import all the packages needed for the project"
import polars as pl
from discontinuitypy.detection.variance import detect_variance
from discontinuitypy.core.propeties import process_events
from space_analysis.ds.ts.io import df2ts
from loguru import logger

from typing import Callable

## Processing the whole dataset

Note that the candidates only require a small portion of the data, so we can compress the data to the event intervals to speed up the processing.
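
As a rough illustration of the idea, the sketch below keeps only the samples that fall inside at least one event interval. It is plain Python rather than the `filter_df_by_ranges` implementation: `keep_samples_in_ranges` is a hypothetical helper, and closed intervals and sorted timestamps are assumptions made for the example.

```python
from bisect import bisect_left, bisect_right


def keep_samples_in_ranges(times, starts, ends):
    """Return sorted indices of samples inside any closed interval [start, end].

    Assumes `times` is sorted in ascending order.
    """
    keep = set()
    for start, end in zip(starts, ends):
        lo = bisect_left(times, start)  # first index with times[i] >= start
        hi = bisect_right(times, end)  # first index with times[i] > end
        keep.update(range(lo, hi))
    return sorted(keep)


times = list(range(100))  # 100 samples
starts = [10, 40, 42]  # event start times; the last two events overlap
ends = [12, 43, 45]  # event stop times

kept = keep_samples_in_ranges(times, starts, ends)
```

Here only 9 of the 100 samples survive, and samples shared by overlapping events are kept once.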

In [None]:
# | exporti
from beforerr.polars import filter_df_by_ranges


def compress_data_by_events(data: pl.DataFrame, events: pl.DataFrame):
    """Compress the data for parallel processing"""
    starts = events["tstart"]
    ends = events["tstop"]
    return filter_df_by_ranges(data, starts, ends)


def get_bcols(df: pl.LazyFrame):
    """Get the magnetic field components"""
    bcols = df.collect_schema().names()
    bcols.remove("time")
    if len(bcols) != 3:
        logger.error("Expect 3 field components")
    return bcols

In [None]:
# | export
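def ids_finder(
    detection_df: pl.LazyFrame,  # data used for anomaly detection (typically low cadence data)
    bcols=None,
    detect_func: Callable[..., pl.LazyFrame] = detect_variance,
    detect_kwargs: dict = None,  # extra keyword arguments passed to `detect_func`
    extract_df: pl.LazyFrame = None,  # data used for feature extraction (typically high cadence data)
    **kwargs,
):
    bcols = bcols or get_bcols(detection_df)
    detection_df = detection_df.select(bcols + ["time"]).sort("time")
    # Fall back to the detection data. Compare against None explicitly:
    # the truth value of a LazyFrame is ambiguous, so `extract_df or ...` raises.
    extract_df = (detection_df if extract_df is None else extract_df).sort("time")

    # Avoid a shared mutable default by substituting an empty dict here
    events = detect_func(detection_df, bcols=bcols, **(detect_kwargs or {}))

    # Keep only the samples inside event intervals before computing properties
    data_c = compress_data_by_events(extract_df.collect(), events)
    sat_fgm = df2ts(data_c, bcols)
    ids = process_events(events, sat_fgm, **kwargs)
    return ids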