# Cleaner class example (LEVEL 0)

The `Cleaner` class can be initialised and used to perform the same kinds of operations as the autoprocessing functionality, one step at a time.

It relies on the definition of the instruments described in the [README.md](https://github.com/EERL-EPFL/helikite-data-processing?tab=readme-ov-file#the-instrument-class) to perform the instantiation.

In this first cell, we define where the input data resides and instantiate the `Cleaner` class into a variable `cleaner`. On instantiation, it scans for the files and loads them into memory, allowing us to work on all of the instruments in bulk.

The functions used to clean the data and perform corrections are all specific to each `Instrument` and can be modified through its definition in the Instrument class. For example, the Flight Computer's processing code is in [the repository](https://github.com/EERL-EPFL/helikite-data-processing/blob/main/helikite/instruments/flight_computer.py). Editing this file and reloading the environment will change its behaviour in this notebook.
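To make the "instrument-specific" idea concrete, here is a minimal sketch of the pattern in plain Python. All names (`HypotheticalInstrument`, `correct`, `correct_all`) and both correction rules are hypothetical and are not helikite's code: each instrument class overrides its own correction method, and a bulk step simply calls that method on every instrument. This is why editing one instrument's definition changes how only that instrument is processed.

```python
class HypotheticalInstrument:
    """Hypothetical base class: each subclass overrides correct()."""

    name = "base"

    def correct(self, rows):
        return rows  # default: leave the data unchanged


class HypotheticalPressureSensor(HypotheticalInstrument):
    name = "pressure_sensor"

    def correct(self, rows):
        # Illustrative rule only: drop non-physical (non-positive) pressures
        return [row for row in rows if row["pressure"] > 0]


class HypotheticalCounter(HypotheticalInstrument):
    name = "counter"

    def correct(self, rows):
        # Illustrative rule only: convert counts per minute to counts per second
        return [{**row, "counts": row["counts"] / 60} for row in rows]


def correct_all(data_by_name, instrument_objects):
    """Bulk step: apply each instrument's own correct() to that instrument's data."""
    return {inst.name: inst.correct(data_by_name[inst.name]) for inst in instrument_objects}


raw = {
    "pressure_sensor": [{"pressure": 950.0}, {"pressure": -1.0}],
    "counter": [{"counts": 120}],
}
corrected = correct_all(raw, [HypotheticalPressureSensor(), HypotheticalCounter()])
print(corrected)  # {'pressure_sensor': [{'pressure': 950.0}], 'counter': [{'counts': 2.0}]}
```

The bulk function never needs to know what each correction does; adding an instrument means adding a subclass, not changing `correct_all`.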

Let's get started!

```python
import datetime
import os

from helikite import Cleaner, instruments

# The folder where the data resides. In this example, it is the folder 'rawdata'
# relative to the current working directory (in Jupyter, normally the notebook's folder).
INPUT_DATA = os.path.join(os.getcwd(), "rawdata")

# Initialise the Cleaner class: it scans the input data folder and imports the files
cleaner = Cleaner(
    # These instrument definitions contain all the functions needed to process
    # each instrument. Add or remove entries according to the flight.
    instruments=[
        instruments.flight_computer_v1,
        instruments.smart_tether,
        instruments.pops,
        instruments.msems_readings,
        instruments.msems_inverted,
        instruments.msems_scan,
        instruments.stap,
    ],
    # We need a reference instrument; in this flight it is the flight computer
    reference_instrument=instruments.flight_computer_v1,
    input_folder=INPUT_DATA,
    flight_date=datetime.date(2024, 4, 2),
    # Takeoff and landing are commented out because we set them interactively below.
    # If you already know them, you can pass them here as datetime objects.
    # time_takeoff=datetime.datetime(2024, 4, 2, 9, 45, 15),
    # time_landing=datetime.datetime(2024, 4, 2, 13, 10),
    time_offset=datetime.time(0),  # If there is a time offset to apply, define it here
)
```

# Checking state

Ok! Our class is now instantiated. It scanned our input directory and identified each file using the `file_identifier()` method of the respective instrument class, just as happens when running from the CLI or Docker!
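The matching idea can be sketched as follows. This is an illustration, not helikite's implementation: the classes, the header strings `FC_HEADER` and `POPS_HEADER`, the `identify_files` helper, and the assumption that `file_identifier()` receives the first few lines of a file are all hypothetical. Each instrument inspects a file's header and says whether it recognises it, and each instrument claims at most one file.

```python
import os
import tempfile


class HypotheticalFlightComputer:
    """Hypothetical instrument that recognises files by their first header line."""

    name = "flight_computer"

    @staticmethod
    def file_identifier(first_lines):
        return bool(first_lines) and first_lines[0].startswith("FC_HEADER")


class HypotheticalPops:
    """Hypothetical instrument that recognises a header anywhere in the first lines."""

    name = "pops"

    @staticmethod
    def file_identifier(first_lines):
        return any(line.startswith("POPS_HEADER") for line in first_lines)


def identify_files(folder, instrument_classes, n_lines=5):
    """Map each instrument name to the first file in `folder` that it recognises."""
    matches = {}
    for filename in sorted(os.listdir(folder)):
        path = os.path.join(folder, filename)
        if not os.path.isfile(path):
            continue
        # Read only the first few lines; readline() returns "" past the end of file
        with open(path, encoding="utf-8", errors="replace") as fh:
            first_lines = [fh.readline() for _ in range(n_lines)]
        for instrument in instrument_classes:
            if instrument.name not in matches and instrument.file_identifier(first_lines):
                matches[instrument.name] = path
                break
    return matches


# Usage: three small files in a temporary folder, one of which matches nothing
with tempfile.TemporaryDirectory() as folder:
    for fname, header in [
        ("fc.txt", "FC_HEADER,time,pressure\n"),
        ("notes.txt", "unrelated file\n"),
        ("pops.csv", "POPS_HEADER,time,counts\n"),
    ]:
        with open(os.path.join(folder, fname), "w", encoding="utf-8") as fh:
            fh.write(header)
    found = {
        name: os.path.basename(path)
        for name, path in identify_files(folder, [HypotheticalFlightComputer, HypotheticalPops]).items()
    }

print(found)  # {'flight_computer': 'fc.txt', 'pops': 'pops.csv'}
```

Unrecognised files such as `notes.txt` are simply ignored, which is why only the instruments listed in the first cell need a matching file.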

There are no errors, so we can assume the raw CSVs have been loaded into memory as instructed by the `read_data()` method of each instrument class. Each instrument can now be accessed with `cleaner.<instrument_name>`, and it has two pandas DataFrames available: `.df`, which holds our data as we progress through any corrections, and `.df_raw`, a copy that will not be changed.

Both DataFrames can be used however you wish, just as if you had loaded them directly with pandas. If you want, you can stop here and work with them as if you had imported them manually.

```python
# Here's an example
cleaner.flight_computer.df
```

### Wait...
But that's no fun, right? Let's try to make it a bit easier for ourselves! At least we know we are not bound by the capabilities of our library if we want to explore the data differently.

The `.state()` method is now available to summarise how our class is managing our data. It can be used at any stage of the cleaning process to see what is happening inside `cleaner`.

```python
# Let's see what it says now...
cleaner.state()
```

### "No operations have been completed"

What does that mean? Some of these functions mutate the data, so they can only be performed in a specific order: some cannot be run twice, and some require others to have run before them. These dependencies are declared with a function decorator in the `Cleaner` class, and the class tracks which methods have been executed. As we progress, we can therefore see what we have done, and if we call a method whose prerequisites have not run yet, it will not execute.
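The bookkeeping described above can be sketched with a small decorator. This is a minimal illustration under our own assumptions, not the decorator helikite actually uses: `operation`, `requires`, `run_once` and `MiniCleaner` are hypothetical, and although two method names mirror those used later in this notebook, their bodies are placeholders. Each decorated method declares its prerequisites; the wrapper refuses to run it if any are missing, or if it is a run-once method that already ran, and records it on success so that `state()` and `help()` can report on it.

```python
import functools


def operation(requires=(), run_once=True):
    """Hypothetical decorator: enforce prerequisites and run-once semantics."""

    def decorator(method):
        @functools.wraps(method)
        def wrapper(self, *args, **kwargs):
            name = method.__name__
            # Refuse to run if any prerequisite has not been completed yet
            missing = [dep for dep in requires if dep not in self.completed]
            if missing:
                print(f"Cannot run {name}: run {', '.join(missing)} first")
                return None
            # Refuse to run a run-once operation a second time
            if run_once and name in self.completed:
                print(f"{name} has already been run and can only run once")
                return None
            result = method(self, *args, **kwargs)
            if name not in self.completed:
                self.completed.append(name)  # record the operation for state()
            return result

        # Keep the metadata on the function so help() can report it
        wrapper.requires = tuple(requires)
        wrapper.run_once = run_once
        return wrapper

    return decorator


class MiniCleaner:
    """Hypothetical stand-in for a cleaner that tracks completed operations."""

    def __init__(self):
        self.completed = []

    @operation()
    def set_time_as_index(self):
        return "indexed"  # placeholder body

    @operation(requires=("set_time_as_index",))
    def data_corrections(self):
        return "corrected"  # placeholder body

    def state(self):
        if not self.completed:
            return "No operations have been completed"
        return "Completed: " + ", ".join(self.completed)

    def help(self):
        lines = []
        for name in dir(type(self)):
            attr = getattr(type(self), name)
            if callable(attr) and hasattr(attr, "requires"):
                needs = ", ".join(attr.requires) or "nothing"
                lines.append(f"{name}: requires {needs}; run once: {attr.run_once}")
        return "\n".join(lines)


demo = MiniCleaner()
demo.data_corrections()  # prints a refusal: set_time_as_index has not run yet
demo.set_time_as_index()
demo.data_corrections()
print(demo.state())  # Completed: set_time_as_index, data_corrections
print(demo.help())
```

Storing the requirements on the wrapped function keeps the dependency rules next to each method, so the order checks, the state summary and the help text all come from a single declaration.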

So what can we do next? Let's use `.help()` to see which methods are available to continue with.

Note that each description states which methods need to be run first, and whether the method can only be executed once.

```python
cleaner.help()
```

```python
# If we try to run a function whose dependencies have not run yet, we are told we cannot
cleaner.correct_time_and_pressure(max_lag=180)
```

```python
# Run the prerequisite steps first, then we can come back to the time and pressure correction
cleaner.set_time_as_index()
cleaner.data_corrections()
cleaner.set_pressure_column()
```

# Setting our flight times

Now that the necessary functions have run to clean the DataFrames, we can set our flight times. This is done interactively: click on the plot to select the start time, then click again to select the end time (if there's a mistake, click on the start time again). All instruments are plotted, but times can only be selected from the reference instrument, which we set at the beginning as the flight computer. Zoom in to the plot to pick a good point.
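Once takeoff and landing are chosen, the natural next step is to restrict attention to that window. The sketch below shows the idea on plain `(timestamp, value)` pairs using only the standard library. What `define_flight_times()` does with data outside the window is not described in this notebook, so treat `crop_to_flight` as a hypothetical helper and the pressure readings as synthetic; the takeoff and landing times mirror the commented-out values in the first cell.

```python
import datetime


def crop_to_flight(records, takeoff, landing):
    """Keep (timestamp, value) pairs with takeoff <= timestamp <= landing."""
    if landing <= takeoff:
        raise ValueError("landing must be after takeoff")
    return [(t, v) for t, v in records if takeoff <= t <= landing]


takeoff = datetime.datetime(2024, 4, 2, 9, 45, 15)
landing = datetime.datetime(2024, 4, 2, 13, 10)

# Synthetic pressure readings, one per hour starting at 08:45:15
start = datetime.datetime(2024, 4, 2, 8, 45, 15)
records = [(start + datetime.timedelta(hours=h), 1000.0 - 10 * h) for h in range(6)]

in_flight = crop_to_flight(records, takeoff, landing)
print([t.strftime("%H:%M:%S") for t, _ in in_flight])
# ['09:45:15', '10:45:15', '11:45:15', '12:45:15']
```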

```python
cleaner.define_flight_times()
```

# Correct time and pressure based on time lag
Let's correct the instruments based on their time lag to the reference instrument and plot the pressure to see the result.
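A common way to estimate such a lag is to shift one pressure series against the reference, one step at a time up to `max_lag` steps in each direction, and keep the shift with the highest correlation. The sketch below implements that general technique in plain Python for evenly sampled series. It is not necessarily what `correct_time_and_pressure()` does internally; `estimate_lag` is a hypothetical name, and counting `max_lag` in samples is our assumption (with 1 Hz data, samples and seconds coincide).

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def estimate_lag(reference, other, max_lag):
    """Return the shift k (in samples) maximising corr(reference[t], other[t + k]).

    A positive k means `other` records the same feature k samples after `reference`.
    """
    best_lag, best_corr = 0, -math.inf
    for k in range(-max_lag, max_lag + 1):
        # Pair reference[t] with other[t + k] over the overlapping range only
        if k >= 0:
            ref_part, other_part = reference[: len(reference) - k], other[k:]
        else:
            ref_part, other_part = reference[-k:], other[: len(other) + k]
        n = min(len(ref_part), len(other_part))
        if n < 2:
            continue
        corr = pearson(ref_part[:n], other_part[:n])
        if corr > best_corr:
            best_lag, best_corr = k, corr
    return best_lag


# Synthetic example: the second instrument sees the same pressure signal 3 samples late
reference = [1000 - 5 * math.sin(t / 4) - 0.1 * t for t in range(60)]
other = [reference[0]] * 3 + reference[:-3]
print(estimate_lag(reference, other, max_lag=10))  # 3
```

Bounding the search with `max_lag` keeps the cost proportional to `2 * max_lag + 1` correlations and avoids spurious matches far from the true offset.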

```python
cleaner.correct_time_and_pressure(max_lag=180)
cleaner.plot_pressure()
```

```python
# Depending on preference, this could happen earlier
cleaner.remove_duplicates()
```
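One common form of duplication in instrument logs is a repeated timestamp. As a rough illustration (the exact rule `remove_duplicates()` applies is not described in this notebook), the hypothetical helper below keeps the first record seen for each timestamp and preserves the original order. On a time-indexed pandas DataFrame, `df[~df.index.duplicated(keep="first")]` expresses the same rule.

```python
import datetime


def drop_duplicate_timestamps(records):
    """Keep the first (timestamp, row) pair for each timestamp, preserving order."""
    seen = set()
    unique = []
    for timestamp, row in records:
        if timestamp not in seen:
            seen.add(timestamp)
            unique.append((timestamp, row))
    return unique


t0 = datetime.datetime(2024, 4, 2, 10, 0, 0)
one_second = datetime.timedelta(seconds=1)
# Synthetic log in which the second timestamp was written twice
records = [
    (t0, {"pressure": 950.1}),
    (t0 + one_second, {"pressure": 950.0}),
    (t0 + one_second, {"pressure": 950.0}),
    (t0 + 2 * one_second, {"pressure": 949.8}),
]
cleaned = drop_duplicate_timestamps(records)
print(len(records), "->", len(cleaned))  # 4 -> 3
```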

```python
cleaner.state()
```

```python
# Merge the instruments; the result will be available in cleaner.master_df
cleaner.merge_instruments()
```

# Check our merge

As noted, we can look at the merged master DataFrame in the `cleaner.master_df` attribute.
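Conceptually, merging aligns every instrument on a shared time axis. Below is a minimal sketch of an outer join on timestamps, assuming each instrument's data is a dict mapping timestamp to a row of columns; column names are prefixed with the instrument name so they cannot collide. It is only an illustration: how `merge_instruments()` aligns timestamps, names columns and fills gaps in `master_df` is not specified here, `merge_on_time` is a hypothetical helper, and the data are synthetic. With time-indexed pandas DataFrames, `pd.concat(frames, axis=1, join="outer")` performs a comparable outer join.

```python
import datetime


def merge_on_time(data_by_instrument):
    """Outer-join {instrument: {timestamp: {column: value}}} into one time-sorted table.

    Returns a list of (timestamp, row) pairs; values an instrument lacks are None.
    """
    columns = sorted(
        {f"{name}_{col}" for name, data in data_by_instrument.items() for row in data.values() for col in row}
    )
    timestamps = sorted({t for data in data_by_instrument.values() for t in data})
    merged = []
    for t in timestamps:
        row = dict.fromkeys(columns)  # start with every column set to None
        for name, data in data_by_instrument.items():
            for col, value in data.get(t, {}).items():
                row[f"{name}_{col}"] = value
        merged.append((t, row))
    return merged


t0 = datetime.datetime(2024, 4, 2, 10, 0, 0)
one_second = datetime.timedelta(seconds=1)
# Synthetic data: the two instruments share only the second timestamp
instrument_data = {
    "flight_computer": {t0: {"pressure": 950.1}, t0 + one_second: {"pressure": 950.0}},
    "pops": {t0 + one_second: {"total_conc": 12.0}, t0 + 2 * one_second: {"total_conc": 13.5}},
}
master = merge_on_time(instrument_data)
for t, row in master:
    print(t.time(), row)
```

An outer join keeps every timestamp from every instrument, so gaps appear as missing values rather than silently dropped rows.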

```python
cleaner.master_df
```

```python
cleaner.export_data()
```