# Experiment Documentation

This notebook is a user's guide to conducting experiments.  
It walks through the actions related to an experiment, using a running example for demonstration. 

Before we start, note that an experiment is always connected to one or more Projects. 
You can read more about Projects [here](https://github.com/KaplanOpenSource/hera/blob/master/hera/doc/jupyter/datalayer/Project.ipynb).  

Given only a zip file containing the metadata of an experiment, you can easily create the experiment and connect it to a project with a few simple command-line (CLI) commands in your environment. 

## Introduction  
### Experiment and Experiment ToolKit
An **Experiment** is simply a folder that contains all the files related to an experiment, such as the device data, the metadata (which devices are in the experiment, etc.), the experiment class (explained below), and other files. The **Experiment ToolKit** is the interface through which you can access that folder and perform actions and manipulations on it according to your requirements. 

### Why you should use it  
#### Functions are already built for you
The Experiment ToolKit is designed to handle experiment data and metadata inside projects. Instead of writing the code yourself, you can rely on the functions the toolkit already provides for the common actions.  
#### Easy access to experiments
The toolkit also spares you from remembering where every experiment folder is located. A project may contain a large number of experiments, and you may not remember their paths. The toolkit stores the paths for you, so you can create as many experiments as you wish without keeping track of their locations.

## Step 1 - Creation of an Experiment  

Given a zip file which contains the experiment metadata, we can easily create an experiment folder.  
We use the 'create' command, which looks like this:

<p style="background:black">
<code style="background:black;color:white">>> hera-experiment create &lt;experiment&gt; [--path &lt;path to experiment folder&gt;] [--zip &lt;zip file&gt;]
</code>
</p>   
  
Arguments:  
- **experiment**: The name of the experiment. Must be defined.
- **path**: Path to the experiment directory (Optional). If you don't provide one, the current directory is used as the experiment folder.
- **zip**: Path to the zip file containing the metadata of the experiment (Optional). If it is not provided, an empty experiment is created (a zip file can be added to it later).



After running this command, a few things will happen inside your experiment folder:  
1) A new folder <i>code</i> will be created. Inside it, you will find an empty class named after the experiment you provided. You can implement this class if you wish to write functions specific to your experiment, or leave it empty. This class is also called the experiment class.
2) A new folder <i>data</i> will be created. This folder will contain the data of the devices related to the experiment.
3) A new folder <i>runtimeExperimentData</i> will be created (only when a zip file is provided). It will include the zip file you provided and a configuration file. This directory is not meant for direct use.
4) A JSON file named "experimentName"_repository.json will be created, containing the metadata of the experiment. For example, if you named your experiment 'Jerusalem_2019', the file will be named Jerusalem_2019_repository.json.  
This JSON file is also called a **Repository**. Its purpose is to make creating or updating a project easier.  You can read more about Repositories [here](https://github.com/KaplanOpenSource/hera/blob/master/hera/doc/jupyter/Repository.ipynb).  
For now, all you need to know is that once you add this file to the repositories list, every project you create or update will contain its metadata (in our case, the experiment).

### Example:  
- We have a zip file containing the metadata of an experiment held in Haifa in 2014, named <i>HaifaFluxes2014.zip</i>.
- We wish to use our current folder, named <i>haifaExperiment</i>, as our experiment folder. The folder is empty.
- The zip file containing the metadata is in the path: <i>home/salo/Projects/2024/experimentToolKit/zip/HaifaFluxes2014.zip</i>.
- We wish to use the name <i>Haifa2014</i> as the experiment name. 

The command will look like this:  

<p style="background:black">
<code style="background:black;color:white">>> hera-experiment create Haifa2014 --zip home/salo/Projects/2024/experimentToolKit/zip/HaifaFluxes2014.zip
</code>
</p>   

Now, our folder will look like this:  

```
haifaExperiment
│   Haifa2014_repository.json
├───code
│       Haifa2014.py
├───data
└───runtimeExperimentData
        Haifa2014.zip
        Datasources_Configurations.json
```
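
The layout described above can be checked with a few lines of Python. This is only a minimal sketch, not part of hera: the helper name `missing_experiment_entries` is hypothetical, and the expected entries are taken from the list and the folder tree shown in this section.

```python
from pathlib import Path


def missing_experiment_entries(folder, experiment_name, with_zip=True):
    """Return the expected entries that are absent from an experiment folder.

    Hypothetical helper: the expected names follow the description in this
    notebook (code/<name>.py, data/, <name>_repository.json and, when a zip
    file was provided, runtimeExperimentData/).
    """
    root = Path(folder)
    expected = [
        root / "code" / f"{experiment_name}.py",
        root / "data",
        root / f"{experiment_name}_repository.json",
    ]
    if with_zip:
        # This folder is created only when a zip file was passed to 'create'.
        expected.append(root / "runtimeExperimentData")
    # Report paths relative to the experiment folder for readability.
    return [str(p.relative_to(root)) for p in expected if not p.exists()]
```

Calling `missing_experiment_entries(".", "Haifa2014")` from inside <i>haifaExperiment</i> should return an empty list if the folder matches the tree above.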

## Step 2 - Add the repository to the repositories list  
Now we only need to add the Repository we just created to the list of repositories, so that when we create a project, the experiment will be loaded into it.  

We do this with the following command:  

<p style="background:black">
<code style="background:black;color:white">>> hera-project repository add &lt;repository&gt;
</code>
</p>  

### Example:  
In our example, we will use:
<p style="background:black">
<code style="background:black;color:white">>> hera-project repository add Haifa2014_repository.json
</code>
</p>   

## Step 3 - Creating the project  
As mentioned above, every experiment must be connected to a Project before we can perform actions on it. So now we only need to create a new project, and the experiment will be automatically connected to it.  
We do this with:  
<p style="background:black">
<code style="background:black;color:white">>> hera-project project create &lt;projectName&gt;
</code>
</p>   

### Example:
If we wish to create a new project with the name 'northProject', we will use:  
<p style="background:black">
<code style="background:black;color:white">>> hera-project project create northProject
</code>
</p>   

Great! Now we have a project with the name 'northProject', with the experiment 'Haifa2014' connected to it.    

**Note**: From now on, every new project you create will have the Haifa2014 experiment connected to it, as long as Haifa2014_repository.json is in the repositories list. You can always remove the JSON file from the list (you can read more about Repositories [here](/hera/doc/jupyter/Repository.ipynb)).
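
If you prefer to script Steps 1 to 3, the three commands can be assembled in Python and passed to `subprocess`. The following is a minimal sketch under stated assumptions: the helper names `workflow_commands` and `run_workflow` are hypothetical (they are not part of hera), and the script is assumed to run from inside the experiment folder, as in the example above.

```python
import subprocess


def workflow_commands(experiment, project, zip_file=None):
    """Build the CLI calls of Steps 1-3 as argument lists (hypothetical helper)."""
    create = ["hera-experiment", "create", experiment]
    if zip_file is not None:
        # --zip is optional; without it an empty experiment is created.
        create += ["--zip", str(zip_file)]
    # The repository file is named after the experiment, as described in Step 1.
    repository = f"{experiment}_repository.json"
    return [
        create,
        ["hera-project", "repository", "add", repository],
        ["hera-project", "project", "create", project],
    ]


def run_workflow(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sequence at the first failing command.
            subprocess.run(cmd, check=True)
```

With the default `dry_run=True`, `run_workflow(workflow_commands("Haifa2014", "northProject", zip_file="HaifaFluxes2014.zip"))` only prints the three commands, so you can review them before running anything.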

## Loading an experiment into an existing project
If you wish to load an experiment into an existing project (without creating a new one), repeat steps 1 & 2 and then run the update command as follows:

<p style="background:black">
<code style="background:black;color:white">>> hera-project project update &lt;projectName&gt; [--overwrite]
</code>
</p>   

Arguments:  

- **projectName**: The name of the Project. Must be defined.
- **overwrite**: If specified, overwrites the existing project (Optional). Default is False.

## Other useful CLI  

### List  
To display the experiment list in a project, you can type the following command:  

<p style="background:black">
<code style="background:black;color:white">>> hera-experiment list [--projectName &lt;the project name&gt;]
</code>
</p>   
Arguments:  

- **projectName**: The name of the Project (Optional). If not provided, it is taken from the configuration.json file in your directory.

### Table
To display the experiments inside a project in a table, with more details, you can type the following command:  

<p style="background:black">
<code style="background:black;color:white">>> hera-experiment table [--projectName &lt;the project name&gt;]
</code>
</p>   
Arguments:  

- **projectName**: The name of the Project (Optional). If not provided, it is taken from the configuration.json file in your directory.

## Getting the Data  

You can display the data of a certain device inside an experiment. To do so, type the following command:  

<p style="background:black">
<code style="background:black;color:white">>> hera-experiment data [--projectName &lt;the project name&gt;] &lt;experiment&gt; &lt;deviceType&gt; [--deviceName &lt;device name&gt;] [--perDevice &lt;boolean argument&gt;]
</code>
</p>   

Arguments:  

- **projectName**: The name of the Project (Optional). If not provided, it is taken from the configuration.json file in your directory.
- **experiment**: The name of the experiment. Must be defined.
- **deviceType**: The name of the device type you wish to display. Must be defined.
- **deviceName**: The name of the device you wish to display (Optional). However, if perDevice is true, it must be specified.
- **perDevice**: Boolean argument that defines whether the data is stored per device (long experiment or not). If true, the device name must be specified.
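
The dependency between **perDevice** and **deviceName** is easy to miss, so it can help to validate the arguments before calling the command. The sketch below is hypothetical and not part of hera: the function name `build_data_command` is made up, and passing the literal string `True` as the boolean value is an assumption, since the synopsis above only says "boolean argument".

```python
def build_data_command(experiment, device_type, project_name=None,
                       device_name=None, per_device=False):
    """Assemble the 'hera-experiment data' call as an argument list.

    Hypothetical helper; raises ValueError when perDevice is requested
    without a device name, mirroring the rule in the argument list.
    """
    if per_device and device_name is None:
        raise ValueError("deviceName must be specified when perDevice is true")
    cmd = ["hera-experiment", "data"]
    if project_name is not None:
        cmd += ["--projectName", project_name]
    # The two positional arguments are always required.
    cmd += [experiment, device_type]
    if device_name is not None:
        cmd += ["--deviceName", device_name]
    if per_device:
        # Assumption: the boolean value is passed as the text 'True'.
        cmd += ["--perDevice", "True"]
    return cmd
```

The resulting list can be passed to `subprocess.run`, for example `subprocess.run(build_data_command("Haifa2014", "TRH"), check=True)`.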

## Experiment ToolKit

Once we have an experiment connected to a project, we can use the Experiment ToolKit in Python.  

To do that, we need to import the **toolkitHome** module and specify the <i>toolkitName</i> argument as **toolkitHome.EXPERIMENT**, as in the following cell. We also need to specify the Project we want to work with, using the <i>projectName</i> argument. We will use the project we created above.

In [19]:
from hera import toolkitHome

projectName = 'northProject'
experimentToolKit = toolkitHome.getToolkit(toolkitName=toolkitHome.EXPERIMENT, projectName=projectName)

In [20]:
experimentToolKit

<hera.measurements.experiment.experiment.experimentHome at 0x7069fd625650>

### Useful Functions  
Using the experimentToolKit, we can perform various useful functions:

#### List
List of all experiments in a project (only names):

In [21]:
experimentToolKit.keys()

['Haifa2014']

#### Table
List of all experiments in a project, displayed as a table:

In [22]:
experimentToolKit.getExperimentsTable()

| | dataFormat | resource | experimentPath | toolkit | datasourceName | version |
|---|---|---|---|---|---|---|
| 0 | parquet | /home/salo/Projects/2024/experimentToolKit/hai... | /home/salo/Projects/2024/experimentToolKit/hai... | experimentToolKit | Haifa2014 | [0, 0, 1] |


#### Reaching the Experiment Class
You can reach the experiment class (whether or not you implemented it) by specifying the experiment name:

In [31]:
experimentToolKit.getExperiment('Haifa2014')

<Haifa2014.Haifa2014 at 0x7069fc4d3c90>

#### Metadata
You can reach the metadata of all experiments in the project:

In [24]:
experimentToolKit.getMeasurementsDocuments()

[<Measurements: {
    "_cls": "Metadata.Measurements",
    "projectName": "northProject",
    "desc": {
        "experimentPath": "/home/salo/Projects/2024/experimentToolKit/haifaExperiment",
        "toolkit": "experimentToolKit",
        "datasourceName": "Haifa2014",
        "version": [
            0,
            0,
            1
        ]
    },
    "type": "ToolkitDataSource",
    "resource": "/home/salo/Projects/2024/experimentToolKit/haifaExperiment/",
    "dataFormat": "parquet"
}>, <Measurements: {
    "_cls": "Metadata.Measurements",
    "projectName": "northProject",
    "desc": {
        "deviceType": "TRH",
        "experimentName": "Haifa2014"
    },
    "type": "Experiment_rawData",
    "resource": "/home/salo/Projects/2024/experimentToolKit/haifaExperiment/data/TRH",
    "dataFormat": "parquet"
}>, <Measurements: {
    "_cls": "Metadata.Measurements",
    "projectName": "northProject",
    "desc": {
        "deviceType": "Sonic",
        "experimentName": "Haifa2014"
 

The first Document (also called a DataSource) is the link to the experiment class you implemented. The rest are paths to the Parquet files of each device type in the experiment.
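
To illustrate how these documents can be used, the sketch below collects the raw-data path of each device type. It is a hypothetical helper, not a hera function, and it assumes the documents have already been converted to plain Python dictionaries with the same fields as the output above; how that conversion is done is outside this sketch.

```python
def device_data_paths(documents, experiment_name):
    """Map each device type of an experiment to its raw-data resource path.

    Hypothetical helper operating on plain dicts shaped like the
    documents printed above.
    """
    paths = {}
    for doc in documents:
        # Skip the DataSource document and anything that is not raw data.
        if doc.get("type") != "Experiment_rawData":
            continue
        desc = doc.get("desc", {})
        if desc.get("experimentName") == experiment_name:
            paths[desc["deviceType"]] = doc["resource"]
    return paths


# Sample input copied from the first two documents in the output above.
sample = [
    {
        "type": "ToolkitDataSource",
        "desc": {"toolkit": "experimentToolKit", "datasourceName": "Haifa2014"},
        "resource": "/home/salo/Projects/2024/experimentToolKit/haifaExperiment/",
    },
    {
        "type": "Experiment_rawData",
        "desc": {"deviceType": "TRH", "experimentName": "Haifa2014"},
        "resource": "/home/salo/Projects/2024/experimentToolKit/haifaExperiment/data/TRH",
    },
]
print(device_data_paths(sample, "Haifa2014"))
```

For the sample above, the result contains a single entry that maps `TRH` to its folder under <i>data</i>.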

#### Getting the Data  
Say we have Parquet files inside the /data folder of our experiment folder. We can get the data of a given device type using the following cell:

In [39]:
haifa2014_class = experimentToolKit.getExperiment('Haifa2014')
trh_data = haifa2014_class.getExperimentData().getData('TRH')
trh_data



| | TIMESTAMP | RECORD | TC_T | TRH | RH |
|---|---|---|---|---|---|
| npartitions=1 | | | | | |
| | string | string | string | string | string |
| | ... | ... | ... | ... | ... |


We first reach the experiment class, and then use the getExperimentData() and getData() functions. 'TRH' is the device type in this case.  
We can then display it as a pandas DataFrame:

In [40]:
import pandas as pd

pd.DataFrame(trh_data)

| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | 2022-09-18 00:00:00 | 21710734 | 20.69 | 20.00 | |
| 1 | 2022-09-18 00:00:08 | 21710735 | 20.73 | 19.98 | |
| 2 | 2022-09-18 00:00:09 | 21710736 | 20.73 | 19.92 | |
| 3 | 2022-09-18 00:00:10 | 21710737 | 20.74 | 19.92 | |
| 4 | 2022-09-18 00:00:11 | 21710738 | 20.74 | 19.90 | |
| ... | ... | ... | ... | ... | ... |
| 53988 | 2022-09-18 14:59:55 | 21764722 | 26.61 | 26.37 | |
| 53989 | 2022-09-18 14:59:56 | 21764723 | 26.53 | 26.35 | |
| 53990 | 2022-09-18 14:59:57 | 21764724 | 26.55 | 26.31 | |
| 53991 | 2022-09-18 14:59:58 | 21764725 | 26.54 | 26.35 | |
