# Running an Experiment
Now that you have downloaded OpenDC, we will create a simple experiment.
In this experiment, we will compare the performance of a small and a big data center on the same workload.

Running this demo requires OpenDC. Download the latest release [here](https://github.com/atlarge-research/opendc/releases) and put it in this folder.

## 1. Designing a Data Center

The first requirement to run an experiment in OpenDC is a `topology`. 
A `topology` defines the hardware on which a `workload` is executed. 
Larger topologies can run more workloads, and will often finish them more quickly.

A `topology` is defined using a JSON file. A `topology` contains one or more _clusters_.
A _cluster_ is a group of _hosts_ at a specific location. Each _cluster_ consists of one or more _hosts_.
A _host_ is a machine on which one or more tasks can be executed. A _host_ is composed of a _cpu_ and a _memory_ unit.
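
The nesting described above can also be built programmatically. The sketch below is only an illustration: `make_host` and `make_topology` are hypothetical helpers, not part of OpenDC, and the field names are taken from the topology files shown in this tutorial.

```python
import json


def make_host(name, core_count, core_speed, memory_size):
    """Return one host entry: a name, a cpu, and a memory unit."""
    return {
        "name": name,
        "cpu": {"coreCount": core_count, "coreSpeed": core_speed},
        "memory": {"memorySize": memory_size},
    }


def make_topology(clusters):
    """Build a topology from a mapping of cluster name -> list of host entries."""
    return {
        "clusters": [
            {"name": cluster_name, "hosts": hosts}
            for cluster_name, hosts in clusters.items()
        ]
    }


# One cluster with one host, using the values of the small data center.
topology = make_topology({"C01": [make_host("H01", 12, 3300, 140457600000)]})
print(json.dumps(topology, indent=4))
```

Writing the result of `json.dumps` to a `.json` file gives a file with the same structure as the hand-written topologies.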


### Small Data Center
In this experiment, we compare two data centers. Below is the `topology` file of the small data center:

```json
{
    "clusters":
    [
        {
            "name": "C01",
            "hosts" :
            [
                {
                    "name": "H01",
                    "cpu":
                    {
                        "coreCount": 12,
                        "coreSpeed": 3300
                    },
                    "memory": {
                        "memorySize": 140457600000
                    }
                }
            ]
        }
    ]
}
```

This `topology` consists of a single _cluster_ with a single _host_.

The `topology` file can be found [here](topologies/small_datacenter.json).

### Large Data Center
We compare the previous data center with a larger one, defined by the following `topology` file:

```json
{
    "clusters":
    [
        {
            "name": "C01",
            "hosts" :
            [
                {
                    "name": "H01",
                    "cpu":
                    {
                        "coreCount": 32,
                        "coreSpeed": 3200
                    },
                    "memory": {
                        "memorySize": 256000
                    }
                }
            ]
        },
        {
            "name": "C02",
            "hosts" :
            [
                {
                    "name": "H02",
                    "count": 6,
                    "cpu":
                    {
                        "coreCount": 8,
                        "coreSpeed": 2930
                    },
                    "memory": {
                        "memorySize": 64000
                    }
                }
            ]
        },
        {
            "name": "C03",
            "hosts" :
            [
                {
                    "name": "H03",
                    "count": 2,
                    "cpu":
                    {
                        "coreCount": 16,
                        "coreSpeed": 3200
                    },
                    "memory": {
                        "memorySize": 128000
                    }
                }
            ]
        }
    ]
}
```

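Unlike the small topology, the big topology consists of three clusters, each defining a single type of host. The `count` field sets how many hosts of that type the cluster contains, so C01 has one host, C02 has six, and C03 has two.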

The `topology` file can be found [here](topologies/big_datacenter.json).

For more in-depth information about topologies, see [Topology](https://atlarge-research.github.io/opendc/docs/documentation/Input/Topology/).
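
To make the size difference concrete, the sketch below counts clusters, hosts, and cores in both topologies. It assumes that `count` is the number of identical hosts and defaults to 1 when omitted. `summarize_topology` is a hypothetical helper, not part of OpenDC, and the topologies are inlined in abbreviated form (names and memory omitted) to keep the example self-contained.

```python
def summarize_topology(topology):
    """Return (cluster_count, host_count, core_count) for a topology dict."""
    host_total = 0
    core_total = 0
    for cluster in topology["clusters"]:
        for host in cluster["hosts"]:
            copies = host.get("count", 1)  # assumption: no `count` means one host
            host_total += copies
            core_total += copies * host["cpu"]["coreCount"]
    return len(topology["clusters"]), host_total, core_total


small = {"clusters": [
    {"hosts": [{"cpu": {"coreCount": 12, "coreSpeed": 3300}}]},
]}

big = {"clusters": [
    {"hosts": [{"cpu": {"coreCount": 32, "coreSpeed": 3200}}]},
    {"hosts": [{"count": 6, "cpu": {"coreCount": 8, "coreSpeed": 2930}}]},
    {"hosts": [{"count": 2, "cpu": {"coreCount": 16, "coreSpeed": 3200}}]},
]}

print(summarize_topology(small))  # (1, 1, 12)
print(summarize_topology(big))    # (3, 9, 112)
```

The same function works on the real files after loading them with `json.load`.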


## 2. Workloads

In addition to the topology, we need a workload to simulate on the data center.
In OpenDC, workloads are defined as a bag of tasks. Each task is accompanied by one or more fragments.
These fragments define the computational requirements of the task over time.
For this experiment, we will use the bitbrains-small workload. This is a small workload of 50 tasks,
spanning a bit more than a month. You can download the workload [here](documents/workloads/bitbrains-small.zip "download").

Workload traces define when tasks are submitted and what their computational requirements are.
A workload consists of two trace files, both stored as Parquet files:

- `tasks.parquet` provides a general overview of the tasks executed during the workload. It defines when tasks are scheduled and the hardware they require.
- `fragments.parquet` provides detailed information about each task during its runtime.

```python
import pandas as pd

# Paths are relative to this folder; adjust them if the traces are stored elsewhere.
df_tasks = pd.read_parquet("workload_traces/bitbrains-small/tasks.parquet")
df_fragments = pd.read_parquet("workload_traces/bitbrains-small/fragments.parquet")

df_tasks.head()
```

| | id | submission_time | duration | cpu_count | cpu_capacity | mem_capacity |
|---|---|---|---|---|---|---|
| 0 | 1019 | 1376314546000 | 2592252000 | 1 | 2926.000135 | 181352 |
| 1 | 1023 | 1376314546000 | 2592252000 | 1 | 2925.99956 | 260096 |
| 2 | 1026 | 1376314546000 | 2592252000 | 1 | 2925.999717 | 249972 |
| 3 | 1052 | 1377787092000 | 577855000 | 1 | 2926.000107 | 131245 |
| 4 | 1073 | 1377083232000 | 1823566000 | 1 | 2599.999649 | 179306 |
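
The raw `submission_time` and `duration` values are hard to read. The sketch below converts a few of the rows shown above, assuming both columns are in milliseconds and that `submission_time` is a Unix timestamp. The trace itself does not state its units here, so treat this as an interpretation. `to_utc` and `to_days` are hypothetical helpers.

```python
from datetime import datetime, timedelta, timezone


def to_utc(epoch_ms):
    """Interpret a value as milliseconds since the Unix epoch (assumption)."""
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)


def to_days(duration_ms):
    """Convert a duration in milliseconds (assumption) to days."""
    return timedelta(milliseconds=duration_ms) / timedelta(days=1)


# (id, submission_time, duration) copied from the first rows of tasks.parquet
tasks = [
    (1019, 1376314546000, 2592252000),
    (1052, 1377787092000, 577855000),
    (1073, 1377083232000, 1823566000),
]

for task_id, submission_time, duration in tasks:
    print(task_id, to_utc(submission_time).isoformat(), round(to_days(duration), 2))
```

Under these assumptions, the longest tasks run for about 30 days, which matches the description of a workload spanning a bit more than a month.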


```python
df_fragments.head()
```

| | id | duration | cpu_usage |
|---|---|---|---|
| 0 | 1019 | 300000 | 0.0 |
| 1 | 1019 | 300000 | 11.703998 |
| 2 | 1019 | 600000 | 0.0 |
| 3 | 1019 | 300000 | 11.703998 |
| 4 | 1019 | 900000 | 0.0 |
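
Because each task has several fragments, it is often useful to aggregate them per task. The sketch below does this for the five fragments shown above, computing the total duration and the duration-weighted average `cpu_usage`. It uses plain Python so it runs without the trace files. `summarize_fragments` is a hypothetical helper, and no particular unit is assumed for `cpu_usage`.

```python
from collections import defaultdict

# (id, duration, cpu_usage) copied from the first rows of fragments.parquet
fragments = [
    (1019, 300000, 0.0),
    (1019, 300000, 11.703998),
    (1019, 600000, 0.0),
    (1019, 300000, 11.703998),
    (1019, 900000, 0.0),
]


def summarize_fragments(rows):
    """Return {task_id: (total_duration, duration-weighted mean cpu_usage)}."""
    total_duration = defaultdict(int)
    weighted_usage = defaultdict(float)
    for task_id, duration, cpu_usage in rows:
        total_duration[task_id] += duration
        weighted_usage[task_id] += duration * cpu_usage
    # Assumes every task has a positive total duration.
    return {
        task_id: (total, weighted_usage[task_id] / total)
        for task_id, total in total_duration.items()
    }


summary = summarize_fragments(fragments)
print(summary[1019])
```

With pandas, the total duration per task could be obtained with `df_fragments.groupby("id")["duration"].sum()`.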


## 3. Executing an experiment

To run an experiment, we need to create an `experiment` file. This is a JSON file that defines what OpenDC should execute,
and how. Below is an example of a simple `experiment` file:

```json
{
    "name": "simple",
    "topologies": [
        {
            "pathToFile": "topologies/small_datacenter.json"
        },
        {
            "pathToFile": "topologies/big_datacenter.json"
        }
    ],
    "workloads": [
        {
            "pathToFile": "workload_traces/bitbrains-small",
            "type": "ComputeWorkload"
        }
    ],
    "exportModels": [
        {
            "exportInterval": 3600,
            "printFrequency": 168,
            "filesToExport": [
                "host",
                "powerSource",
                "service",
                "task"
            ]
        }
    ]
}
```

The `experiment` file defines four parameters. First, `name` defines what the experiment is called in the output folder. Second, `topologies` defines where OpenDC can find the topology files. Third, `workloads` defines which workload OpenDC should run. Finally, `exportModels` defines how OpenDC should export its results. In this case, we set `exportInterval`, `printFrequency`, and `filesToExport`. `exportInterval` and `printFrequency` determine how often OpenDC samples for output and how often it prints to the terminal, respectively. With `filesToExport`, we specify that only specific files should be exported. For more in-depth information about export models, see [ExportModel](https://atlarge-research.github.io/opendc/docs/documentation/Input/ExportModel).

As you can see, both `topologies` and `workloads` are defined as lists. This allows the user to define multiple values. OpenDC will run a simulation for each separate combination of parameter values. In this case, two simulations will be run: one with the small topology, and one with the big topology.
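
The combinations can be enumerated with a Cartesian product. The sketch below only mirrors the behaviour described above for `topologies` and `workloads`; it is not OpenDC's implementation, and `planned_simulations` is a hypothetical helper.

```python
from itertools import product

# The list-valued parts of the experiment file shown above.
experiment = {
    "topologies": [
        {"pathToFile": "topologies/small_datacenter.json"},
        {"pathToFile": "topologies/big_datacenter.json"},
    ],
    "workloads": [
        {"pathToFile": "workload_traces/bitbrains-small", "type": "ComputeWorkload"},
    ],
}


def planned_simulations(experiment):
    """Return one (topology path, workload path) pair per combination."""
    return [
        (topology["pathToFile"], workload["pathToFile"])
        for topology, workload in product(
            experiment["topologies"], experiment["workloads"]
        )
    ]


runs = planned_simulations(experiment)
for topology_path, workload_path in runs:
    print(topology_path, "x", workload_path)
print(len(runs), "simulations")
```

Adding a second workload to the list would double the number of pairs, since every topology is paired with every workload.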

For more in-depth information about experiments, see [Experiment](https://atlarge-research.github.io/opendc/docs/documentation/Input/Experiment).

## 4. Running OpenDC

An experiment in OpenDC can be executed directly from the terminal. The only parameter that needs to be provided is `--experiment-path`, which is the path to the `experiment` file we defined in Section 3. While running the experiment, OpenDC periodically prints information about the status of the simulation. In this experiment, OpenDC prints once per simulated week, but this can be changed through `printFrequency` in `exportModels`.

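```python
import subprocess

# Paths are relative to this folder, assuming OpenDC was unpacked here as described at the top.
pathToScenario = "experiments/simple_experiment.json"
subprocess.run(["OpenDCExperimentRunner/bin/OpenDCExperimentRunner", "--experiment-path", pathToScenario])
```

On Windows, pointing `subprocess.run` at the extensionless launcher fails with `OSError: [WinError 193] %1 is not a valid Win32 application`, most likely because that file is a Unix shell script. In that case, use the Windows launcher instead (for example a `.bat` file in the same `bin` folder, if your release provides one).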
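
One way to avoid the `WinError 193` failure is to pick the launcher based on the operating system. The sketch below assumes the release ships an `OpenDCExperimentRunner.bat` next to the Unix launcher; check your `bin` folder before relying on it. `runner_command` is a hypothetical helper, and the call that actually starts OpenDC is left as a comment so the example runs without OpenDC installed.

```python
import platform
from pathlib import Path


def runner_command(runner_dir, experiment_path, system=None):
    """Build the argument list for subprocess.run on the current platform."""
    system = system or platform.system()
    # Assumption: a .bat launcher exists for Windows, as is common for JVM tools.
    launcher = "OpenDCExperimentRunner"
    if system == "Windows":
        launcher += ".bat"
    return [
        str(Path(runner_dir) / "bin" / launcher),
        "--experiment-path",
        experiment_path,
    ]


command = runner_command("OpenDCExperimentRunner", "experiments/simple_experiment.json")
print(command)

# To actually start the simulation:
# import subprocess
# subprocess.run(command, check=True)
```

Passing `check=True` makes `subprocess.run` raise an exception when the runner exits with a non-zero status, so a failed run is not silently ignored.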

Running the simulation creates the `output` folder, which contains information about the experiment.
In the next tutorial, we will use these files for analysis and visualization.