# The `ExperimentData` object

We define an experiment as a single set of input parameters that is used to run a simulation or an experiment. `f3dasm` uses a custom `ExperimentSample` object to keep track of the inputs and outputs of each experiment.

Each `ExperimentSample` is effectively an individual experiment. 
- It contains a dictionary `input_data` with key-value pairs of the input variables and a dictionary `output_data` with key-value pairs of the resulting output variables.
- The property `job_status` is used to keep track of the status of the experiment. It can be one of the following values: `open`, `in progress`, `finished`, or `error`.
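
As a rough mental model, an `ExperimentSample` can be pictured as a small record holding two dictionaries and a status. The sketch below is a simplified stand-in written with the standard library only, not the actual `f3dasm` class; the names `SampleSketch` and `Status` are hypothetical and exist purely for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict


class Status(Enum):
    """Hypothetical stand-in for the four job statuses listed above."""
    OPEN = "open"
    IN_PROGRESS = "in progress"
    FINISHED = "finished"
    ERROR = "error"


@dataclass
class SampleSketch:
    """Simplified stand-in for an ExperimentSample (not the f3dasm class)."""
    input_data: Dict[str, Any]
    output_data: Dict[str, Any] = field(default_factory=dict)
    job_status: Status = Status.OPEN


# One experiment: two input variables, no outputs yet, so it is still open.
sample = SampleSketch(input_data={"x0": 0.1, "x1": 0.4})

# Once a result is available, store it and update the status.
sample.output_data["y"] = 0.5
sample.job_status = Status.FINISHED
```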

All of these individual experiments are bundled together in the custom `ExperimentData` object. The `ExperimentData` object is the main object used to keep track of results, perform optimization, and extract data for machine learning purposes. All other processes of `f3dasm` use this object to manipulate and access data about your experiments.

The `ExperimentData` object consists of the following attributes:

- `domain`: The `Domain` of the experiment. This is used for keeping track of the input and output variables of the experiments.
- `data`: A dictionary containing the data of the experiments. The keys of the dictionary are numerical identifiers (starting from $0$) of the experiments and the values are `ExperimentSample` objects.
- `project_dir`: A user-defined project directory where all data related to your data-driven process will be stored.

## Creating the `ExperimentData` object

The `ExperimentData` object can be constructed in several ways.

You can construct an `ExperimentData` object by providing it with `input_data`, `output_data`, a `Domain` object, and a project directory:


In [1]:
from f3dasm import ExperimentData

The `Domain` object needs to be constructed before creating the `ExperimentData` object. It is used to keep track of the input and output variables of the experiments:

In [2]:
from f3dasm.design import Domain

domain = Domain()
domain.add_float(name='x0', low=0., high=1.)
domain.add_float(name='x1', low=0., high=1.)

The `input_data` and `output_data` can be provided in tabular form, in one of the following formats:
- a 2D numpy array. The first dimension corresponds to the number of experiments and the second dimension corresponds to the input/output variables.


In [3]:
import numpy as np

input_data = np.array([
    [0.1, 0.4],
    [0.2, 0.5],
    [0.3, 0.6]
])

experimentdata = ExperimentData(domain=domain, input_data=input_data)
experimentdata

| | jobs | input: x0 | input: x1 |
|---|---|---|---|
| 0 | open | 0.1 | 0.4 |
| 1 | open | 0.2 | 0.5 |
| 2 | open | 0.3 | 0.6 |


- a pandas DataFrame. The columns of the DataFrame correspond to the input/output variables and the rows correspond to the experiments.

In [4]:
import pandas as pd

input_data = pd.DataFrame({
    'x0': [0.1, 0.2, 0.3],
    'x1': [0.4, 0.5, 0.6]
})


experimentdata = ExperimentData(domain=domain, input_data=input_data)
experimentdata

| | jobs | input: x0 | input: x1 |
|---|---|---|---|
| 0 | open | 0.1 | 0.4 |
| 1 | open | 0.2 | 0.5 |
| 2 | open | 0.3 | 0.6 |



- a list of dictionaries. Each dictionary corresponds to an experiment and the keys of the dictionary correspond to the input/output variables.

In [5]:
input_data = [{'x0': 0.1, 'x1': 0.4}, 
              {'x0': 0.2, 'x1': 0.5}, 
              {'x0': 0.3, 'x1': 0.6}]

experimentdata = ExperimentData(domain=domain, input_data=input_data)
experimentdata

| | jobs | input: x0 | input: x1 |
|---|---|---|---|
| 0 | open | 0.1 | 0.4 |
| 1 | open | 0.2 | 0.5 |
| 2 | open | 0.3 | 0.6 |


> The names of the input and output variables can also be inferred automatically from the input and output data. For numpy arrays, there is no way to infer the names of the variables, so default names (e.g. `x0`, `x1`, `y0`, `y1`) will be used.

- a path to a `.csv` file.

In [6]:
input_data = pd.DataFrame({
    'x0': [0.1, 0.2, 0.3],
    'x1': [0.4, 0.5, 0.6]
})

# For the sake of this example, we store the input_data in a file:
input_data.to_csv('input_data.csv')

# Now we can load the input_data from the file:
experimentdata = ExperimentData(domain=domain, input_data='input_data.csv')
experimentdata

| | jobs | input: x0 | input: x1 |
|---|---|---|---|
| 0 | open | 0.1 | 0.4 |
| 1 | open | 0.2 | 0.5 |
| 2 | open | 0.3 | 0.6 |


We demonstrated how to use various data types for the `input_data`, but the same applies to the `output_data` of the `ExperimentData` object:

In [7]:
domain.add_output(name='y')

input_data = [{'x0': 0.1, 'x1': 0.4}, 
              {'x0': 0.2, 'x1': 0.5}, 
              {'x0': 0.3, 'x1': 0.6}]

output_data = [{'y': 0.5},
               {'y': 0.6}]

experimentdata = ExperimentData(domain=domain, input_data=input_data, output_data=output_data)
experimentdata

| | jobs | input: x0 | input: x1 | output: y |
|---|---|---|---|---|
| 0 | finished | 0.1 | 0.4 | 0.5 |
| 1 | finished | 0.2 | 0.5 | 0.6 |
| 2 | open | 0.3 | 0.6 | |


For experiments where the output data is given upon creation, the `job_status` will be set to `finished`, indicating that these experiments do not need to be run again.
For experiments where the output data is not given upon creation, the `job_status` will be set to `open`, indicating that these experiments are open to be evaluated.
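
The rule above can be illustrated with plain Python. This is only a sketch of the described behaviour, under the assumption that outputs are matched to inputs by position; `assign_status` is a hypothetical helper and not part of `f3dasm`.

```python
from typing import Any, Dict, List


def assign_status(inputs: List[Dict[str, Any]],
                  outputs: List[Dict[str, Any]]) -> List[str]:
    """Return 'finished' for experiments that have output, 'open' otherwise.

    Assumption: outputs are matched to inputs by position, and a missing or
    empty output dictionary means the experiment has not been evaluated.
    """
    statuses = []
    for i, _ in enumerate(inputs):
        has_output = i < len(outputs) and bool(outputs[i])
        statuses.append("finished" if has_output else "open")
    return statuses


input_data = [{"x0": 0.1, "x1": 0.4},
              {"x0": 0.2, "x1": 0.5},
              {"x0": 0.3, "x1": 0.6}]
output_data = [{"y": 0.5}, {"y": 0.6}]

statuses = assign_status(input_data, output_data)
print(statuses)  # ['finished', 'finished', 'open']
```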

The status of a job can be manually set by using the `mark` method of the `ExperimentData` object:

In [8]:
experimentdata.mark(indices=1, status='open')
experimentdata

| | jobs | input: x0 | input: x1 | output: y |
|---|---|---|---|---|
| 0 | finished | 0.1 | 0.4 | 0.5 |
| 1 | open | 0.2 | 0.5 | 0.6 |
| 2 | open | 0.3 | 0.6 | |


Upon inspecting the `ExperimentData` object, you will see that the `data` attribute is a dictionary with numerical identifiers as keys and `ExperimentSample` objects as values:

In [9]:
experimentdata.data

defaultdict(f3dasm._src.experimentsample.ExperimentSample,
            {0: ExperimentSample(input_data={'x0': 0.1, 'x1': 0.4}, output_data={'y': 0.5}, job_status=JobStatus.FINISHED),
             1: ExperimentSample(input_data={'x0': 0.2, 'x1': 0.5}, output_data={'y': 0.6}, job_status=JobStatus.OPEN),
             2: ExperimentSample(input_data={'x0': 0.3, 'x1': 0.6}, output_data={}, job_status=JobStatus.OPEN)})

---

## Manipulating the `ExperimentData` object

Multiple `ExperimentData` objects can be combined using the `+` operator. This creates a new `ExperimentData` object that contains all the experiments from the two original `ExperimentData` objects.
If applicable, the `Domain` objects are combined as well.

In [10]:
experimentdata_1 = ExperimentData(domain=domain, input_data=np.array([
    [0.1, 0.4],
    [0.2, 0.5],
    [0.3, 0.6]
]))

experimentdata_2 = ExperimentData(domain=domain, input_data=pd.DataFrame({
    'x0': [0.7, 0.8, 0.9],
    'x1': [0.3, 0.2, 0.1]
}))

experimentdata_1 + experimentdata_2

| | jobs | input: x0 | input: x1 |
|---|---|---|---|
| 0 | open | 0.1 | 0.4 |
| 1 | open | 0.2 | 0.5 |
| 2 | open | 0.3 | 0.6 |
| 3 | open | 0.7 | 0.3 |
| 4 | open | 0.8 | 0.2 |
| 5 | open | 0.9 | 0.1 |
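
Note that the experiments of the second object receive new identifiers (3 to 5) in the combined result. The sketch below mimics that renumbering with plain dictionaries. It is an illustration of the idea only, not the `f3dasm` implementation, and `combine` is a hypothetical helper.

```python
from typing import Any, Dict


def combine(first: Dict[int, Any], second: Dict[int, Any]) -> Dict[int, Any]:
    """Concatenate two index-keyed collections of experiments.

    Assumption: the experiments of `second` are renumbered so that they
    follow directly after the last index of `first`.
    """
    offset = max(first) + 1 if first else 0
    combined = dict(first)
    ordered = sorted(second.items(), key=lambda item: item[0])
    for new_index, (_, sample) in enumerate(ordered, start=offset):
        combined[new_index] = sample
    return combined


data_1 = {0: {"x0": 0.1, "x1": 0.4},
          1: {"x0": 0.2, "x1": 0.5},
          2: {"x0": 0.3, "x1": 0.6}}
data_2 = {0: {"x0": 0.7, "x1": 0.3},
          1: {"x0": 0.8, "x1": 0.2},
          2: {"x0": 0.9, "x1": 0.1}}

combined = combine(data_1, data_2)
print(sorted(combined))  # [0, 1, 2, 3, 4, 5]
```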


---

## Storing the `ExperimentData` object

The `ExperimentData` object can be stored to a series of files using the `store()` method and loaded again using the `from_file()` method.

- The `project_dir` argument sets the directory in which the `ExperimentData` is stored on disk. You can provide the directory as a string or as a path, and the path can be either relative or absolute. If the directory does not exist, it will be created.
- The `store()` method writes the experiment data to that directory.

In [11]:
experimentdata = experimentdata.set_project_dir('./my_project')

experimentdata.store()

The data is stored in several files in an `experiment_data` subfolder of the provided project directory:

```
my_project/
├── my_script.py
└── experiment_data
      ├── domain.json
      ├── input_data.csv
      ├── output_data.csv
      └── jobs.csv
```

In order to load the data, you can use `ExperimentData.from_file()`:

In [12]:
data_loaded = ExperimentData.from_file(project_dir="./my_project")

---

## Exporting the `ExperimentData` object

The `ExperimentData` object can be exported to several common data formats:

- To two numpy arrays (one for the input data and one for the output data) using the `to_numpy()` method.

In [13]:
experimentdata.to_numpy()

(array([[0.1, 0.4],
        [0.2, 0.5],
        [0.3, 0.6]]),
 array([[0.5],
        [0.6],
        [nan]]))
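
Experiments without output appear as `nan` in the output array. Before passing the arrays to a machine learning model, you may want to drop those rows. The following is a minimal sketch with plain numpy; it uses arrays copied from the output above rather than a live `ExperimentData` object, and the variable names are chosen for illustration.

```python
import numpy as np

# Arrays copied from the output shown above.
x = np.array([[0.1, 0.4],
              [0.2, 0.5],
              [0.3, 0.6]])
y = np.array([[0.5],
              [0.6],
              [np.nan]])

# Keep only the rows for which every output value is available.
has_output = ~np.isnan(y).any(axis=1)
x_valid, y_valid = x[has_output], y[has_output]

print(x_valid.shape, y_valid.shape)  # (2, 2) (2, 1)
```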

- To two pandas DataFrames (one for the input data and one for the output data) using the `to_pandas()` method.

In [14]:
df_input, df_output = experimentdata.to_pandas()
df_input

| | x0 | x1 |
|---|---|---|
| 0 | 0.1 | 0.4 |
| 1 | 0.2 | 0.5 |
| 2 | 0.3 | 0.6 |


- To an `xarray.Dataset` object using the `to_xarray()` method.

In [15]:
experimentdata.to_xarray()