# Data processing - Level 1.5

- Remove data before takeoff and after landing; the data processor has to make an expert judgement on where to cut (first couple of meters)
- Define the columns and order them as specified for the final output, 1 file per campaign
- Only report instruments that were flown in that campaign

Timestamp (native), altitude, lat, lon, pressure, Temp, RH, wind speed, wind direction
Total conc: POPS, mSEMS, mCDA, CPC, ‘other’ (e.g., Partector, LOAC)
Other total variables: absorption coefficients, eBC concentration, CO2, CO, O3
Size distributions: usable POPS bins; mSEMS bins, mCDA bins, other (partector bins, LOAC bins)
Pollution flag*, Flight number, campaign name

*: Flag pollution only if it was not targeted by the science and is not representative of the local environment (e.g., in ALPACA pollution is not to be flagged; in ArtofMelt pollution has to be flagged).
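
The flagging rule above can be sketched as a per-campaign switch. This is only an illustration: the `FLAG_POLLUTION` mapping, the `pollution_flag` function, and the boolean `detected` input are hypothetical names and are not part of the helikite package.

```python
import pandas as pd

# Hypothetical per-campaign switch, taken from the two examples above:
# ALPACA targets pollution (do not flag), ArtofMelt does not (flag it).
FLAG_POLLUTION = {"ALPACA": False, "ArtofMelt": True}


def pollution_flag(detected: pd.Series, campaign: str) -> pd.Series:
    """Return a 0/1 flag; detections are kept only if the campaign flags pollution."""
    if FLAG_POLLUTION[campaign]:
        return detected.astype(int)
    return pd.Series(0, index=detected.index, dtype=int)


# Made-up detection mask, for demonstration only
detected = pd.Series([False, True, True, False])
flag_artofmelt = pollution_flag(detected, "ArtofMelt")
flag_alpaca = pollution_flag(detected, "ALPACA")
```

With the same detections, the ArtofMelt flag follows the mask while the ALPACA flag stays at 0 throughout.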

In [12]:
from pathlib import Path

DATA_FLIGHT_DIR_BASENAME = "2025-02-12_A"

DATA_DIRPATH = Path("/home/rina/Desktop/studies/EPFL/EERL/data/ORACLES/Helikite/Data")
DATA_PROCESSING_DIRPATH = DATA_DIRPATH / "Processing"

DATA_LEVEL0_DIRPATH = DATA_PROCESSING_DIRPATH / "Level0"
DATA_LEVEL1_DIRPATH = DATA_PROCESSING_DIRPATH / "Level1"
DATA_LEVEL1_5_DIRPATH = DATA_PROCESSING_DIRPATH / "Level1.5"

INPUT_DATA_FILE_BASENAME = "level0_2025-02-12T10-35"

## Load level 1 dataset

In [None]:
import pandas as pd

# CHANGE NAME OF INPUT FILE

df_level1 = pd.read_csv(
    DATA_LEVEL1_DIRPATH / f"level1_{DATA_FLIGHT_DIR_BASENAME}.csv",
    index_col='DateTime',
    parse_dates=['DateTime']
)

# pandas renames a duplicated 'DateTime' column to 'DateTime.1' on read; restore the name
if 'DateTime.1' in df_level1.columns:
    df_level1.rename(columns={'DateTime.1': 'DateTime'}, inplace=True)

df_level1

In [13]:
from helikite.metadata.utils import load_parquet

_, metadata = load_parquet(DATA_LEVEL0_DIRPATH / f"{INPUT_DATA_FILE_BASENAME}.parquet")

## Fill in mSEMS values at takeoff and landing times

'takeoff_time' and 'landing_time' are selected in level 0 and stored in the metadata.

mSEMS scan and inverted data are available every 3 minutes. To avoid losing data points at the beginning and end of the flight (due to cutting the DataFrame at takeoff and landing), the empty timestamps at takeoff and landing are filled with the closest available mSEMS data within a 90-second window.
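
The idea of the nearest-value fill can be sketched as follows. This is a minimal illustration, not the implementation of `fill_msems_takeoff_landing`: the helper `fill_nearest_within_window`, the demo DataFrame, and the timestamps are assumptions made for the example.

```python
import numpy as np
import pandas as pd


def fill_nearest_within_window(df, target_time, columns, window_seconds=90):
    """Copy the closest available row of `columns` onto `target_time`, if it lies within the window."""
    target_time = pd.Timestamp(target_time)
    available = df[columns].dropna(how="all")
    if available.empty or target_time not in df.index:
        return df
    # Absolute time distance (in seconds) between each available scan and the target
    deltas = np.abs((available.index - target_time).total_seconds().to_numpy())
    nearest = int(np.argmin(deltas))
    if deltas[nearest] <= window_seconds:
        df.loc[target_time, columns] = available.iloc[nearest].to_numpy()
    return df


# Demo: 1 Hz time series with sparse mSEMS values (made-up numbers)
index = pd.date_range("2025-02-12 10:00:00", periods=301, freq=pd.Timedelta(seconds=1))
demo = pd.DataFrame({"mSEMS_total_N": np.nan}, index=index)
demo.loc[pd.Timestamp("2025-02-12 10:01:00"), "mSEMS_total_N"] = 100.0
demo.loc[pd.Timestamp("2025-02-12 10:04:00"), "mSEMS_total_N"] = 200.0

takeoff = pd.Timestamp("2025-02-12 10:00:30")
landing = pd.Timestamp("2025-02-12 10:04:40")
for t in (takeoff, landing):
    demo = fill_nearest_within_window(demo, t, ["mSEMS_total_N"], window_seconds=90)
```

In this demo the takeoff row receives the scan taken 30 s later and the landing row the scan taken 40 s earlier; a scan further away than the window would leave the row empty.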

In [None]:
from helikite.processing.post.level1 import fill_msems_takeoff_landing
fill_msems_takeoff_landing(df_level1, metadata, time_window_seconds=90)

## Remove data from before takeoff and after landing time

'takeoff_time' and 'landing_time' are selected in level 0 and stored in the metadata.

In [None]:
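# Label-based slice on the DateTime index (both endpoints included); copy so later column assignments do not act on a view
df_level1 = df_level1.loc[metadata.takeoff_time : metadata.landing_time].copy()
# Manual override of the landing time, if needed:
# df_level1 = df_level1.loc[metadata.takeoff_time : "2025-02-06 17:58:00"]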
df_level1.iloc[[0, -1]]
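
A small self-contained sketch of the cut, with made-up timestamps standing in for `metadata.takeoff_time` and `metadata.landing_time`. It shows that a `.loc` slice on a sorted `DatetimeIndex` is label-based and keeps both endpoints.

```python
import pandas as pd

# Ten seconds of fake 1 Hz data; altitude values are placeholders
index = pd.date_range("2025-02-12 10:00:00", periods=10, freq=pd.Timedelta(seconds=1))
df = pd.DataFrame({"Altitude": range(10)}, index=index)

# Assumed values for the demo only
takeoff_time = pd.Timestamp("2025-02-12 10:00:02")
landing_time = pd.Timestamp("2025-02-12 10:00:07")

# Keep only rows between takeoff and landing, endpoints included
flight = df.loc[takeoff_time:landing_time]

# First and last remaining rows, as a quick check of the cut
first_last = flight.iloc[[0, -1]]
```

Inspecting the first and last rows, as done in the cell above, is a quick way to confirm that the cut landed on the intended timestamps.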

## Columns - rename and select defined columns
**Column list and names for final data file**

datetime,Altitude,Lat,Long,P,TEMP,RH,WindSpeed,WindDir,
POPS_total_N,mSEMS_total_N, mCDA_total_N, CPC_total_N,
Filter_position,Filter_flow,
POPS_b3,POPS_b4,POPS_b5,POPS_b6,POPS_b7,POPS_b8,POPS_b9,POPS_b10,POPS_b11,POPS_b12,POPS_b13,POPS_b14,POPS_b15,
mSEMS_Bin_Conc1,mSEMS_Bin_Conc2,mSEMS_Bin_Conc3,mSEMS_Bin_Conc4,mSEMS_Bin_Conc5,mSEMS_Bin_Conc6,mSEMS_Bin_Conc7,mSEMS_Bin_Conc8,mSEMS_Bin_Conc9,mSEMS_Bin_Conc10,
mSEMS_Bin_Conc11,mSEMS_Bin_Conc12,mSEMS_Bin_Conc13,mSEMS_Bin_Conc14,mSEMS_Bin_Conc15,mSEMS_Bin_Conc16,mSEMS_Bin_Conc17,mSEMS_Bin_Conc18,mSEMS_Bin_Conc19,mSEMS_Bin_Conc20,
mSEMS_Bin_Conc21,mSEMS_Bin_Conc22,mSEMS_Bin_Conc23,mSEMS_Bin_Conc24,mSEMS_Bin_Conc25,mSEMS_Bin_Conc26,mSEMS_Bin_Conc27,mSEMS_Bin_Conc28,mSEMS_Bin_Conc29,mSEMS_Bin_Conc30,
mSEMS_Bin_Conc31,mSEMS_Bin_Conc32,mSEMS_Bin_Conc33,mSEMS_Bin_Conc34,mSEMS_Bin_Conc35,mSEMS_Bin_Conc36,mSEMS_Bin_Conc37,mSEMS_Bin_Conc38,mSEMS_Bin_Conc39,mSEMS_Bin_Conc40,
mSEMS_Bin_Conc41,mSEMS_Bin_Conc42,mSEMS_Bin_Conc43,mSEMS_Bin_Conc44,mSEMS_Bin_Conc45,mSEMS_Bin_Conc46,mSEMS_Bin_Conc47,mSEMS_Bin_Conc48,mSEMS_Bin_Conc49,mSEMS_Bin_Conc50,
mSEMS_Bin_Conc51,mSEMS_Bin_Conc52,mSEMS_Bin_Conc53,mSEMS_Bin_Conc54,mSEMS_Bin_Conc55,mSEMS_Bin_Conc56,mSEMS_Bin_Conc57,mSEMS_Bin_Conc58,mSEMS_Bin_Conc59,mSEMS_Bin_Conc60,
mCDA_dataB1, ... , mCDA_dataB256,
tapir_GL,tapir_Lat,tapir_Le,tapir_Lon,tapir_Lm,tapir_speed,tapir_route,tapir_TP,tapir_Tproc1,tapir_Tproc2,tapir_Tproc3,tapir_Tproc4,tapir_TH,tapir_Thead1,tapir_Thead2,tapir_Thead3,tapir_Thead4,tapir_TB,tapir_Tbox,
flag_pollution,flag_hovering,flag_cloud,flight_nr,campaign
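
The numbered bin columns in the list above do not have to be typed by hand. The sketch below builds them programmatically and uses `DataFrame.reindex` to select and order columns. It is an illustration only, not the behaviour of `create_level1_dataframe` or `rename_columns`; the `tapir_*` columns are left out for brevity and the demo DataFrame is made up.

```python
import pandas as pd

# Bin column names, following the ranges in the list above
pops_bins = [f"POPS_b{i}" for i in range(3, 16)]
msems_bins = [f"mSEMS_Bin_Conc{i}" for i in range(1, 61)]
mcda_bins = [f"mCDA_dataB{i}" for i in range(1, 257)]

# tapir_* columns omitted here for brevity
final_columns = (
    ["datetime", "Altitude", "Lat", "Long", "P", "TEMP", "RH", "WindSpeed", "WindDir",
     "POPS_total_N", "mSEMS_total_N", "mCDA_total_N", "CPC_total_N",
     "Filter_position", "Filter_flow"]
    + pops_bins + msems_bins + mcda_bins
    + ["flag_pollution", "flag_hovering", "flag_cloud", "flight_nr", "campaign"]
)

# Demo input with one wanted column out of order and one unwanted column
demo = pd.DataFrame({"Altitude": [10.0], "datetime": ["2025-02-12 10:35:00"], "extra": [1]})

# reindex keeps only the listed columns, in the listed order;
# columns absent from the input are created and filled with NaN
ordered = demo.reindex(columns=final_columns)
```

Because `reindex` silently creates missing columns as NaN, a typo in a column name shows up as an empty column rather than as an error, so it is worth checking the result for unexpectedly empty columns.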

In [None]:
import numpy as np

# Add placeholder columns only if they are missing, so existing data is not overwritten
for col in ['latitude_dd', 'longitude_dd', 'flight_computer_F_smp_flw']:
    if col not in df_level1.columns:
        df_level1[col] = np.nan

In [None]:
from helikite.processing.post.level1 import (
    create_level1_dataframe,
    rename_columns,
    round_flightnbr_campaign,
)

df_level1 = create_level1_dataframe(df_level1)
df_level1 = rename_columns(df_level1)
df_level1 = round_flightnbr_campaign(df_level1, metadata, decimals=2)

df_level1

## Level 1.5
**Save the file with the columns to keep, cut to takeoff and landing.**

In [None]:
print(df_level1.columns.tolist())

In [None]:
# CHANGE NAME OF OUTPUT FILE

df_level1.to_csv(DATA_LEVEL1_5_DIRPATH / f"level1.5_{DATA_FLIGHT_DIR_BASENAME}.csv", index=False)
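
One detail of `to_csv(..., index=False)` is worth checking before saving: the DataFrame index is not written. The sketch below uses a temporary folder and a made-up two-row DataFrame to show the difference. Whether the pipeline output already carries the timestamps as a regular `datetime` column depends on `create_level1_dataframe` and `rename_columns`, which are not shown here.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up data with the timestamps stored in the index
index = pd.DatetimeIndex(["2025-02-12 10:35:00", "2025-02-12 10:35:01"], name="DateTime")
df = pd.DataFrame({"Altitude": [1.0, 2.0]}, index=index)

with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp) / "Level1.5"
    out_dir.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing

    # index=False: the DateTime index is NOT written to the file
    df.to_csv(out_dir / "without_index.csv", index=False)
    # Moving the index into a column first keeps the timestamps
    df.reset_index().to_csv(out_dir / "with_datetime.csv", index=False)

    cols_without = pd.read_csv(out_dir / "without_index.csv").columns.tolist()
    cols_with = pd.read_csv(out_dir / "with_datetime.csv").columns.tolist()
```

Printing the column list before saving, as done in the cell above, is enough to confirm that a timestamp column is present.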