## Outlier removal

To remove outliers, load the level 0 CSV file and set the DateTime column as the index.

The first cell loads all the data, plots an individual variable, and saves a CSV of the points that are selected as outliers (flagged True). This outlier CSV can then be used to mask the corresponding values in the original DataFrame, as shown in the second cell.

_Note: No data is changed in the input dataframe._

In [None]:
import pandas as pd
import os
import numpy as np
from helikite.processing import choose_flags

INPUT_DATA_FILENAME = os.path.join(os.getcwd(), "level0", "20240402A_level_0.csv")
FLAG_FILENAME = os.path.join(os.getcwd(), "flags.csv")
df = pd.read_csv(INPUT_DATA_FILENAME, low_memory=False, parse_dates=True, index_col=0)

# Assign 'hovering' to selected points
choose_flags(df=df, y="FC_Pressure", flag_file=FLAG_FILENAME, key='instrument_state', value="hovering")

# Mask the original DataFrame

Load the flag CSV file with the DateTime column as the index. Passing `parse_dates=True` lets pandas parse the index as dates rather than plain strings. Once the flags are joined to the original DataFrame on this shared index, any values flagged True can be masked.

In [None]:
flags = pd.read_csv(FLAG_FILENAME, index_col=0, parse_dates=True)
flags
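The effect of `parse_dates=True` on the index can be checked with a small in-memory CSV. This is a minimal sketch: the column names and values below are made up for illustration and are not taken from a real flag file.

```python
import io

import pandas as pd

# Hypothetical flag file contents (illustrative only)
csv_text = (
    "DateTime,flag\n"
    "2024-04-02 10:00:00,True\n"
    "2024-04-02 10:01:00,True\n"
)

# Without parse_dates the index holds plain strings
as_strings = pd.read_csv(io.StringIO(csv_text), index_col=0)

# With parse_dates=True the index becomes a DatetimeIndex
as_dates = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)

print(type(as_strings.index).__name__)
print(type(as_dates.index).__name__)
```

A `DatetimeIndex` on both DataFrames is what allows them to be aligned by timestamp in the next step.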

In [None]:
# Add the additional columns to the original df by their common index
df = df.merge(flags, left_index=True, right_index=True)
df
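The merge above only attaches the flag columns; it does not yet blank out any values. The sketch below shows one way the masking step could look, using synthetic data. The `flag_outlier` column name, the pressure values, and the assumption that the flag file contains only the flagged timestamps are all hypothetical and may not match what `choose_flags` writes.

```python
import pandas as pd

# Synthetic stand-in for the level 0 data (values are illustrative only)
index = pd.date_range("2024-04-02 10:00:00", periods=5, freq="min")
df_example = pd.DataFrame(
    {"FC_Pressure": [950.0, 949.5, 1200.0, 948.7, 948.1]}, index=index
)

# Hypothetical flag table: one boolean column, present only for flagged rows
flags_example = pd.DataFrame({"flag_outlier": [True]}, index=index[[2]])

# A left join keeps every original row; unflagged rows get NaN in the flag column
joined = df_example.join(flags_example, how="left")

# NaN compares as not equal to True, so unflagged rows become False
is_outlier = joined["flag_outlier"].eq(True)

# Replace flagged values with NaN in a copy, leaving df_example untouched
masked = df_example.copy()
masked["FC_Pressure"] = masked["FC_Pressure"].mask(is_outlier)
```

A left join is used here so that rows without a flag are kept. With the default inner join of `merge`, rows that have no matching entry in the flag table are dropped from the result.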