# Chapter 1: Loading data and metadata

In this chapter we learn how to load and save data.

We start with a quick conceptual overview of what midap-tools does. Midap-tools is a framework for loading and handling entire midap experiments.
There are two ways to load an experiment:

- directly from a midap output folder
- from a .h5 save file. These save files can be created using midap-tools after performing calculations and operations

## Loading data

### Loading from midap output folder

To load an experiment from a midap output folder, we use the following command:


In [15]:
from fluid_experiment.fluid_experiment import FluidExperiment

PATH = "../../data/midap-tools_example"
experiment = FluidExperiment(PATH)

Loading sample at position pos1 for color channel YFP
Loading sample at position pos1 for color channel CFP
Loading sample at position pos2 for color channel YFP
Loading sample at position pos2 for color channel CFP
Loading sample at position pos3 for color channel YFP
Loading sample at position pos3 for color channel CFP
Loading sample at position pos4 for color channel YFP
Loading sample at position pos4 for color channel CFP
Loading sample at position pos5 for color channel YFP
Loading sample at position pos5 for color channel CFP
Loading sample at position pos6 for color channel YFP
Loading sample at position pos6 for color channel CFP
Successfully loaded data with consistent number of frames: 143


***

As we can see from the output, the program has loaded data from 6 positions, each with 2 color channels (YFP and CFP).

We also receive brief feedback about the length of the experiment, 143 frames in this case.

If we want more in-depth information, we can use the print command:

In [16]:
print(experiment)

FluidExperiment with name: experiment
Path: ../../data/midap-tools_example
6 positions: pos1, pos2, pos3, pos4, pos5, pos6
2 color channels: YFP, CFP
length of experiment is consistent: 143
experiment has consistent headers: Unnamed: 0, globalID, frame, labelID, trackID, lineageID, trackID_d1, trackID_d2, split, trackID_mother, first_frame, last_frame, area, edges_min_row, edges_min_col, edges_max_row, edges_max_col, intensity_max, intensity_mean, intensity_min, minor_axis_length, major_axis_length, x, y



***

In addition to the positions and color channels, we get information about the path where the data is saved and the headers.

The `experiment` object that we created stores the data for each position and color channel independently. From the report we can see which headers are consistent across all the data.
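To make the idea of "consistent headers" concrete, here is a minimal sketch of how such a check could be done by hand. It is not the midap-tools implementation: the nested dictionary `data` is a made-up stand-in for the experiment (position, then color channel, then DataFrame), and `common_headers` is a hypothetical helper.

```python
import pandas as pd

# Made-up stand-in for an experiment: position -> color channel -> DataFrame.
# Real data would come from a FluidExperiment; these tiny frames are invented.
data = {
    "pos1": {
        "YFP": pd.DataFrame({"frame": [0, 1], "trackID": [1, 1], "area": [410.0, 425.0]}),
        "CFP": pd.DataFrame({"frame": [0, 1], "trackID": [1, 1], "area": [380.0, 395.0]}),
    },
    "pos2": {
        # This frame carries one extra column that the others lack.
        "YFP": pd.DataFrame({"frame": [0], "trackID": [1], "area": [402.0], "extra": [1]}),
        "CFP": pd.DataFrame({"frame": [0], "trackID": [1], "area": [377.0]}),
    },
}


def common_headers(nested):
    """Hypothetical helper: columns shared by every DataFrame, in first-seen order."""
    frames = [df for channels in nested.values() for df in channels.values()]
    shared = set(frames[0].columns)
    for df in frames[1:]:
        shared &= set(df.columns)
    return [col for col in frames[0].columns if col in shared]


headers = common_headers(data)
print(headers)
```

The column `extra` exists in only one of the four frames, so it does not appear in the shared list.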

This data is stored as pandas DataFrames, and we can access the data of an individual position and color channel using:

In [17]:
df = experiment["pos1"]["YFP"]
df.head(5)

Unnamed: 0.1,Unnamed: 0,globalID,frame,labelID,trackID,lineageID,trackID_d1,trackID_d2,split,trackID_mother,...,edges_min_col,edges_max_row,edges_max_col,intensity_max,intensity_mean,intensity_min,minor_axis_length,major_axis_length,x,y
0,0,1,0,1,1,1,2.0,27.0,0,,...,698.0,60.0,720.0,0.772549,0.60372,0.219608,10.58697,50.863027,37.220049,708.195599
1,1,2,1,6,1,1,2.0,27.0,0,,...,696.0,59.0,725.0,0.803922,0.644886,0.282353,10.277597,51.163987,37.695214,709.246851
2,2,3,2,2,1,1,2.0,27.0,0,,...,694.0,57.0,727.0,0.811765,0.658262,0.294118,9.953063,52.665972,36.811558,709.575377
3,3,4,3,7,1,1,2.0,27.0,0,,...,695.0,55.0,731.0,0.788235,0.650033,0.337255,9.766916,53.222691,36.231362,711.992288
4,4,5,4,12,1,1,2.0,27.0,0,,...,695.0,50.0,738.0,0.780392,0.630351,0.305882,9.911308,55.118799,33.609337,715.420147


***

As we can see, this DataFrame contains all the data saved in the `track_output.csv` file that midap creates for the lineages.
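Because each entry is an ordinary pandas DataFrame, the usual pandas operations apply. The following is a minimal sketch on a small made-up table that reuses three of the column names shown above (`frame`, `trackID`, `area`); the values are invented and the table only stands in for something like `experiment["pos1"]["YFP"]`.

```python
import pandas as pd

# Made-up stand-in for experiment["pos1"]["YFP"], with only a few of the real columns.
df = pd.DataFrame({
    "frame": [0, 0, 1, 1, 1],
    "trackID": [1, 2, 1, 2, 3],
    "area": [400.0, 380.0, 410.0, 390.0, 200.0],
})

# Number of distinct tracked cells in each frame.
cells_per_frame = df.groupby("frame")["trackID"].nunique()

# Mean cell area in each frame.
mean_area = df.groupby("frame")["area"].mean()

# All rows that belong to a single track, ordered by time.
track_1 = df[df["trackID"] == 1].sort_values("frame")

print(cells_per_frame.to_dict())
print(mean_area.to_dict())
print(track_1)
```

Working on a copy of the DataFrame (`df.copy()`) is a reasonable precaution if you do not want such exploratory steps to touch the data held by the experiment.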


By default, the data loader loads all the data in the parent directory. We may wish to load only selected positions. This can be achieved by using:

In [18]:
experiment = FluidExperiment(PATH,positions=["pos1","pos2","pos3"])
print(experiment)

Loading sample at position pos1 for color channel YFP
Loading sample at position pos1 for color channel CFP
Loading sample at position pos2 for color channel YFP
Loading sample at position pos2 for color channel CFP
Loading sample at position pos3 for color channel YFP
Loading sample at position pos3 for color channel CFP
Successfully loaded data with consistent number of frames: 143
FluidExperiment with name: experiment
Path: ../../data/midap-tools_example
3 positions: pos1, pos2, pos3
2 color channels: YFP, CFP
length of experiment is consistent: 143
experiment has consistent headers: Unnamed: 0, globalID, frame, labelID, trackID, lineageID, trackID_d1, trackID_d2, split, trackID_mother, first_frame, last_frame, area, edges_min_row, edges_min_col, edges_max_row, edges_max_col, intensity_max, intensity_mean, intensity_min, minor_axis_length, major_axis_length, x, y



***

Similarly we can load only a subset of all color channels using:

In [19]:
experiment = FluidExperiment(PATH,color_channels=["YFP"])
print(experiment)

Loading sample at position pos1 for color channel YFP
Loading sample at position pos2 for color channel YFP
Loading sample at position pos3 for color channel YFP
Loading sample at position pos4 for color channel YFP
Loading sample at position pos5 for color channel YFP
Loading sample at position pos6 for color channel YFP
Successfully loaded data with consistent number of frames: 143
FluidExperiment with name: experiment
Path: ../../data/midap-tools_example
6 positions: pos1, pos2, pos3, pos4, pos5, pos6
1 color channels: YFP
length of experiment is consistent: 143
experiment has consistent headers: Unnamed: 0, globalID, frame, labelID, trackID, lineageID, trackID_d1, trackID_d2, split, trackID_mother, first_frame, last_frame, area, edges_min_row, edges_min_col, edges_max_row, edges_max_col, intensity_max, intensity_mean, intensity_min, minor_axis_length, major_axis_length, x, y



***

Or a combination of both positions and color channels:

In [20]:
experiment = FluidExperiment(PATH,color_channels=["YFP"],positions=["pos1","pos2","pos3"])
print(experiment)

Loading sample at position pos1 for color channel YFP
Loading sample at position pos2 for color channel YFP
Loading sample at position pos3 for color channel YFP
Successfully loaded data with consistent number of frames: 143
FluidExperiment with name: experiment
Path: ../../data/midap-tools_example
3 positions: pos1, pos2, pos3
1 color channels: YFP
length of experiment is consistent: 143
experiment has consistent headers: Unnamed: 0, globalID, frame, labelID, trackID, lineageID, trackID_d1, trackID_d2, split, trackID_mother, first_frame, last_frame, area, edges_min_row, edges_min_col, edges_max_row, edges_max_col, intensity_max, intensity_mean, intensity_min, minor_axis_length, major_axis_length, x, y



***

We can also set the experiment's name. This will be the default name used when saving the experiment, so it makes sense to set it to something recognizable:

In [21]:
experiment = FluidExperiment(PATH,name = "midap_setup_test")
print(experiment)

Loading sample at position pos1 for color channel YFP
Loading sample at position pos1 for color channel CFP
Loading sample at position pos2 for color channel YFP
Loading sample at position pos2 for color channel CFP
Loading sample at position pos3 for color channel YFP
Loading sample at position pos3 for color channel CFP
Loading sample at position pos4 for color channel YFP
Loading sample at position pos4 for color channel CFP
Loading sample at position pos5 for color channel YFP
Loading sample at position pos5 for color channel CFP
Loading sample at position pos6 for color channel YFP
Loading sample at position pos6 for color channel CFP
Successfully loaded data with consistent number of frames: 143
FluidExperiment with name: midap_setup_test
Path: ../../data/midap-tools_example
6 positions: pos1, pos2, pos3, pos4, pos5, pos6
2 color channels: YFP, CFP
length of experiment is consistent: 143
experiment has consistent headers: Unnamed: 0, globalID, frame, labelID, trackID, lineageID, 

### Loading from save file

If we use midap-tools to work with data from an experiment, we have the option to export all our analysis results as a single .h5 file.
Our example data comes with a prepared save file. We can load it by using:

In [22]:
SAVEFILE_PATH = "../../data/midap-tools_example/example_experiment.h5"

experiment_loaded = FluidExperiment.load(SAVEFILE_PATH)
print(experiment_loaded)

Successfully loaded experiment with data from 6 positions and 2 color channels
FluidExperiment with name: example_experiment
Path: ../../data/midap-tools_example
6 positions: pos1, pos2, pos3, pos4, pos5, pos6
2 color channels: CFP, YFP
length of experiment is consistent: 143
experiment has consistent headers: Unnamed: 0, globalID, frame, labelID, trackID, lineageID, trackID_d1, trackID_d2, split, trackID_mother, first_frame, last_frame, area, edges_min_row, edges_min_col, edges_max_row, edges_max_col, intensity_max, intensity_mean, intensity_min, minor_axis_length, major_axis_length, x, y, density_CFP, density_YFP, major_axis_length_log, area_log
Experiment has metadata:
          device_channel          experiment   group position
position                                                     
pos1                   1  example_experiment  Group1     pos1
pos2                   2  example_experiment  Group1     pos2
pos3                   3  example_experiment  Group1     pos3
pos4     

## Create metadata

As we can see, the previously loaded data came with metadata already annotated. The first time we create a new experiment, we should always create a metadata template and fill in our grouping variables.
This template can then be loaded, giving the FluidExperiment access to information about the experimental design.

This is a 3-step process:

1.  export a metadata template
2.  fill in the metadata (can be done with Excel)
3.  load the metadata template

We start by exporting the template:

In [23]:
experiment.create_metadata_template()

Skipped: metadata file already exists at ../../data/midap-tools_example/midap_setup_test_metadata.csv


***

On a first run, this creates an empty metadata file. Here the file already existed from an earlier run, so midap-tools skipped creating it. We can now open this metadata file, for example with Excel (or any other program that can edit .csv files), and fill in our groups.
In this process, as many new columns as required can be added to the metadata.
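Filling in the template does not have to happen in a spreadsheet program. As a minimal sketch, the same edit can be done with pandas. The file written below is a made-up stand-in for a real template: its column names follow the metadata printed in this chapter, but the exact layout of the file that midap-tools writes may differ. The group labels `control` and `treated` and the extra column `antibiotic_ug_per_ml` are hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Write a made-up template to a temporary folder so the sketch is self-contained.
# In practice this would be the file written by create_metadata_template().
template = pd.DataFrame({
    "position": ["pos1", "pos2", "pos3", "pos4"],
    "group": ["default"] * 4,
    "experiment": ["midap_setup_test"] * 4,
    "device_channel": [None] * 4,
})
csv_path = Path(tempfile.mkdtemp()) / "midap_setup_test_metadata.csv"
template.to_csv(csv_path, index=False)

# Fill in the grouping variables (labels are invented for illustration).
meta = pd.read_csv(csv_path)
groups = {"pos1": "control", "pos2": "control", "pos3": "treated", "pos4": "treated"}
meta["group"] = meta["position"].map(groups)
meta["device_channel"] = [1, 2, 3, 4]

# Add a new, hypothetical column describing the experimental design.
meta["antibiotic_ug_per_ml"] = meta["group"].map({"control": 0.0, "treated": 2.5})

# Write the filled-in template back to the same file.
meta.to_csv(csv_path, index=False)
print(pd.read_csv(csv_path))
```

Editing the file programmatically keeps the group assignment reproducible, which can help when the same design is reused across several experiments.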

**Tip:** by default midap-tools will not overwrite existing templates (to prevent accidents). If you want to force it to create a new template in such cases you can use `experiment.create_metadata_template(overwrite = True)`

We can then load this file again using:

In [24]:
experiment.load_metadata_template()
print(experiment)

FluidExperiment with name: midap_setup_test
Path: ../../data/midap-tools_example
6 positions: pos1, pos2, pos3, pos4, pos5, pos6
2 color channels: YFP, CFP
length of experiment is consistent: 143
experiment has consistent headers: Unnamed: 0, globalID, frame, labelID, trackID, lineageID, trackID_d1, trackID_d2, split, trackID_mother, first_frame, last_frame, area, edges_min_row, edges_min_col, edges_max_row, edges_max_col, intensity_max, intensity_mean, intensity_min, minor_axis_length, major_axis_length, x, y
Experiment has metadata:
         position    group        experiment  device_channel
position                                                    
pos1         pos1  default  midap_setup_test             NaN
pos2         pos2  default  midap_setup_test             NaN
pos3         pos3  default  midap_setup_test             NaN
pos4         pos4  default  midap_setup_test             NaN
pos5         pos5  default  midap_setup_test             NaN
pos6         pos6  default  mida

As we can see, now the experiment has metadata associated with it.

**Tip:** by default, midap-tools will save and load the template in the midap output folder (where the experiment data is located), with a filename based on the experiment name. If you want to save and load from a different location, both functions accept a path: `experiment.create_metadata_template(path = "/PATH/FILE.csv")` and `experiment.load_metadata_template(path = "/PATH/FILE.csv")`

## Saving data

Finally, midap-tools also allows you to save an experiment. This includes the filtered / processed data as well as any metadata and other associated data such as the filtering history.
To save the entire experiment you can use:

In [25]:
experiment.save()

Saved experiment at ../../data/midap-tools_example/midap_setup_test.h5


This creates a `midap_setup_test.h5` save file in the midap output folder.

**Tip:** by default midap-tools will save the output file in the experiment folder under the experiment's name. You can force saving to a different location by using `experiment.save(file_path="/PATH/FILE.h5")`
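The default file names can be read off the messages printed above: the metadata template ended up at `<folder>/<name>_metadata.csv` and the save file at `<folder>/<name>.h5`. The sketch below simply rebuilds these two paths with `pathlib`. The helper `default_output_paths` is hypothetical and the naming pattern is inferred from the printed output, not taken from the midap-tools source.

```python
from pathlib import Path


def default_output_paths(experiment_folder, name):
    """Hypothetical helper: rebuild the default file names seen in the printed output."""
    folder = Path(experiment_folder)
    return {
        "metadata": folder / f"{name}_metadata.csv",
        "save_file": folder / f"{name}.h5",
    }


paths = default_output_paths("../../data/midap-tools_example", "midap_setup_test")
print(paths["metadata"].as_posix())
print(paths["save_file"].as_posix())
```

A helper like this can be handy for checking whether a template or save file already exists (for example with `paths["save_file"].exists()`) before deciding to overwrite it.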


Now that you have learned how to load data, add metadata, and save an entire experiment, we can move on to the next chapter, where we will learn how to perform simple QC operations on experiments.