# Dataloading Example

This notebook shows how to load the `_resources.csv` and `_runtimes.csv` files from `/data`.

In [1]:
import pandas as pd
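
The file names appear to encode the run and the number of GPUs, which is needed to build the column headers for the resource files. The following is a minimal sketch of extracting that information. The naming scheme `<run_id>_<stage>_<N>gpus_<kind>.csv` is an assumption inferred from the single example path used later in this notebook, and `parse_metrics_filename` is a hypothetical helper, not part of any library.

```python
import re
from pathlib import PurePosixPath

# Assumed naming scheme (inferred from one example path, not documented):
#   <run_id>_<stage>_<N>gpus_<kind>.csv, with kind in {resources, runtimes}
_NAME_RE = re.compile(
    r"^(?P<run_id>\d+)_(?P<stage>[^_]+)_(?P<num_gpus>\d+)gpus_"
    r"(?P<kind>resources|runtimes)\.csv$"
)


def parse_metrics_filename(path):
    """Return (run_id, stage, num_gpus, kind) parsed from a metrics file name."""
    name = PurePosixPath(path).name
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized metrics file name: {name!r}")
    return match["run_id"], match["stage"], int(match["num_gpus"]), match["kind"]


info = parse_metrics_filename(
    "ocp-metrics/s2ef/gemnet_t/1658138522_stage1_8gpus_resources.csv"
)
print(info)
```

If other files follow a different pattern, the regular expression has to be adapted; the helper raises `ValueError` rather than guessing.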

## Read resource files

Resource files have the following structure:

- _datetime_: Timestamp when the resource snapshot was taken
- _epoch_: Current epoch of the trainer
- _memory\_used_: Host (CPU) RAM in use (the sample values below suggest bytes)
- _memory\_free_: Host (CPU) RAM available (the sample values below suggest bytes)

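For each GPU index <code>i</code> in <code>range(num_gpus)</code>:
- _gpu\_{i}\_memory\_used_: VRAM of GPU i in use (the sample values below suggest MiB)
- _gpu\_{i}\_memory\_free_: VRAM of GPU i available (the sample values below suggest MiB)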

In [2]:
num_gpus = 8

# Two columns per GPU, in file order: gpu_0 used, gpu_0 free, gpu_1 used, ...
gpu_headers = [
    f"gpu_{i}_memory_{kind}"
    for i in range(num_gpus)
    for kind in ("used", "free")
]

# The CSV has no header row, so the column names are supplied explicitly.
resources = pd.read_csv(
    "ocp-metrics/s2ef/gemnet_t/1658138522_stage1_8gpus_resources.csv",
    header=None,
    names=["datetime", "epoch", "memory_used", "memory_free"] + gpu_headers,
)

resources

|  | datetime | epoch | memory_used | memory_free | gpu_0_memory_used | gpu_0_memory_free | gpu_1_memory_used | gpu_1_memory_free | gpu_2_memory_used | gpu_2_memory_free | gpu_3_memory_used | gpu_3_memory_free | gpu_4_memory_used | gpu_4_memory_free | gpu_5_memory_used | gpu_5_memory_free | gpu_6_memory_used | gpu_6_memory_free | gpu_7_memory_used | gpu_7_memory_free |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2022-07-18 10:05:02.496490 | 0 | 42750967808 | 98423980032 | 47173.4375 | 1966.5625 | 37352.4375 | 11787.5625 | 38978.4375 | 10161.5625 | 39392.4375 | 9747.5625 | 35242.4375 | 13897.5625 | 39146.4375 | 9993.5625 | 36578.4375 | 12561.5625 | 39018.4375 | 10121.5625 |
| 1 | 2022-07-18 10:08:02.598793 | 0 | 43498582016 | 97660993536 | 48789.4375 | 350.5625 | 37352.4375 | 11787.5625 | 39856.4375 | 9283.5625 | 41604.4375 | 7535.5625 | 36906.4375 | 12233.5625 | 39970.4375 | 9169.5625 | 38246.4375 | 10893.5625 | 39018.4375 | 10121.5625 |
| 2 | 2022-07-18 10:11:02.700742 | 0 | 44104036352 | 97040134144 | 31115.4375 | 18024.5625 | 39758.4375 | 9381.5625 | 39856.4375 | 9283.5625 | 41604.4375 | 7535.5625 | 37760.4375 | 11379.5625 | 40804.4375 | 8335.5625 | 38246.4375 | 10893.5625 | 39018.4375 | 10121.5625 |
| 3 | 2022-07-18 10:14:02.802194 | 0 | 44610613248 | 96521674752 | 39507.4375 | 9632.5625 | 39758.4375 | 9381.5625 | 39856.4375 | 9283.5625 | 41604.4375 | 7535.5625 | 37760.4375 | 11379.5625 | 42618.4375 | 6521.5625 | 40002.4375 | 9137.5625 | 40836.4375 | 8303.5625 |
| 4 | 2022-07-18 10:17:02.903738 | 0 | 45281861632 | 95820500992 | 39507.4375 | 9632.5625 | 40630.4375 | 8509.5625 | 40740.4375 | 8399.5625 | 41604.4375 | 7535.5625 | 37760.4375 | 11379.5625 | 42618.4375 | 6521.5625 | 40002.4375 | 9137.5625 | 40836.4375 | 8303.5625 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 138 | 2022-07-18 16:59:13.254287 | 4 | 48534450176 | 91633631232 | 33625.4375 | 15514.5625 | 41666.4375 | 7473.5625 | 41156.4375 | 7983.5625 | 41892.4375 | 7247.5625 | 43210.4375 | 5929.5625 | 45428.4375 | 3711.5625 | 39498.4375 | 9641.5625 | 40082.4375 | 9057.5625 |
| 139 | 2022-07-18 17:02:13.353563 | 4 | 49754624000 | 90403942400 | 33625.4375 | 15514.5625 | 41666.4375 | 7473.5625 | 41156.4375 | 7983.5625 | 41892.4375 | 7247.5625 | 43210.4375 | 5929.5625 | 45428.4375 | 3711.5625 | 39498.4375 | 9641.5625 | 40082.4375 | 9057.5625 |
| 140 | 2022-07-18 17:05:13.454855 | 4 | 49903726592 | 90247999488 | 33625.4375 | 15514.5625 | 41666.4375 | 7473.5625 | 41156.4375 | 7983.5625 | 41892.4375 | 7247.5625 | 43210.4375 | 5929.5625 | 45428.4375 | 3711.5625 | 39498.4375 | 9641.5625 | 40082.4375 | 9057.5625 |
| 141 | 2022-07-18 17:08:13.553549 | 4 | 50011889664 | 90137124864 | 33625.4375 | 15514.5625 | 41666.4375 | 7473.5625 | 43010.4375 | 6129.5625 | 41892.4375 | 7247.5625 | 43210.4375 | 5929.5625 | 45428.4375 | 3711.5625 | 39498.4375 | 9641.5625 | 40082.4375 | 9057.5625 |
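
Once loaded, the snapshots can be turned into derived quantities. The sketch below is illustrative rather than part of the original notebook: it parses the timestamps, computes each GPU's memory utilization as `used / (used + free)`, and adds the time elapsed since the first snapshot. To stay self-contained it uses the first two rows of the output above, truncated to GPUs 0 and 1. The names `resources_sample`, `gpu_{i}_utilization` and `elapsed` are introduced here for illustration.

```python
import io

import pandas as pd

num_gpus = 2  # the real file has 8 GPUs; two are kept here for brevity
columns = ["datetime", "epoch", "memory_used", "memory_free"]
columns += [
    f"gpu_{i}_memory_{kind}" for i in range(num_gpus) for kind in ("used", "free")
]

# First two rows of the output above, truncated to GPUs 0 and 1.
sample = io.StringIO(
    "2022-07-18 10:05:02.496490,0,42750967808,98423980032,"
    "47173.4375,1966.5625,37352.4375,11787.5625\n"
    "2022-07-18 10:08:02.598793,0,43498582016,97660993536,"
    "48789.4375,350.5625,37352.4375,11787.5625\n"
)
resources_sample = pd.read_csv(
    sample, header=None, names=columns, parse_dates=["datetime"]
)

# Fraction of each GPU's memory in use: used / (used + free).
for i in range(num_gpus):
    used = resources_sample[f"gpu_{i}_memory_used"]
    free = resources_sample[f"gpu_{i}_memory_free"]
    resources_sample[f"gpu_{i}_utilization"] = used / (used + free)

# Time elapsed since the first snapshot, as a Timedelta column.
first_snapshot = resources_sample["datetime"].iloc[0]
resources_sample["elapsed"] = resources_sample["datetime"] - first_snapshot

print(resources_sample[["elapsed", "gpu_0_utilization", "gpu_1_utilization"]])
```

The same loop works on the full `resources` frame with `num_gpus = 8`, provided the `datetime` column is parsed first, for example with `pd.to_datetime`.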


## Read runtime files

Runtime files have the following structure:

- _rank_: Rank of device (=GPU)
- _epoch_: Current epoch of the trainer
- _epoch\_time_: Total time of epoch (in s)
- _dataloading\_time_: Time spent on data loading (in s)
- _forward\_time_: Time spent in the forward pass (in s)
- _backward\_time_: Time spent in the backward pass (in s)

In [3]:
# The CSV has no header row, so the column names are supplied explicitly.
runtimes = pd.read_csv(
    "ocp-metrics/s2ef/gemnet_t/1658138522_stage1_8gpus_runtimes.csv",
    header=None,
    names=[
        "rank",
        "epoch",
        "epoch_time",
        "dataloading_time",
        "forward_time",
        "backward_time",
    ],
)

runtimes

|  | rank | epoch | epoch_time | dataloading_time | forward_time | backward_time |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | 0 | 4831.628426 | 2.502689 | 2029.771993 | 2305.274723 |
| 1 | 6 | 0 | 4831.838826 | 3.137321 | 2029.729852 | 2306.2563 |
| 2 | 7 | 0 | 4831.991909 | 2.420488 | 2012.909939 | 2305.317963 |
| 3 | 4 | 0 | 4831.998671 | 2.444411 | 2029.129666 | 2305.356793 |
| 4 | 3 | 0 | 4832.032114 | 2.464529 | 2016.577881 | 2304.924315 |
| 5 | 5 | 0 | 4832.042439 | 2.252492 | 2012.139287 | 2304.625425 |
| 6 | 2 | 0 | 4832.046059 | 2.442905 | 2025.135091 | 2304.629135 |
| 7 | 1 | 0 | 4832.072531 | 2.356498 | 2004.619684 | 2304.765459 |
| 8 | 0 | 1 | 5637.699811 | 2.422401 | 2698.231861 | 2305.291131 |
| 9 | 7 | 1 | 5637.507586 | 2.320783 | 2683.775887 | 2305.674166 |
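
A common next step is to see how each epoch's time splits across the measured phases. The sketch below is an illustration, not part of the original notebook: it averages over ranks per epoch and expresses each phase as a share of `epoch_time`. It uses four rows copied from the output above so that it runs on its own. The column `other_time` is a name introduced here for the remainder not covered by the three phases; what that remainder contains is not documented in this notebook.

```python
import io

import pandas as pd

columns = [
    "rank", "epoch", "epoch_time",
    "dataloading_time", "forward_time", "backward_time",
]

# Four rows copied from the output above (ranks 0 and 7, epochs 0 and 1).
sample = io.StringIO(
    "0,0,4831.628426,2.502689,2029.771993,2305.274723\n"
    "7,0,4831.991909,2.420488,2012.909939,2305.317963\n"
    "0,1,5637.699811,2.422401,2698.231861,2305.291131\n"
    "7,1,5637.507586,2.320783,2683.775887,2305.674166\n"
)
runtimes_sample = pd.read_csv(sample, header=None, names=columns)

phases = ["dataloading_time", "forward_time", "backward_time"]

# Epoch time not covered by the three measured phases.
runtimes_sample["other_time"] = (
    runtimes_sample["epoch_time"] - runtimes_sample[phases].sum(axis=1)
)

# Mean over ranks per epoch, then each phase as a share of the epoch time.
per_epoch = runtimes_sample.groupby("epoch")[
    ["epoch_time", *phases, "other_time"]
].mean()
shares = per_epoch[[*phases, "other_time"]].div(per_epoch["epoch_time"], axis=0)

print(shares.round(3))
```

Because `other_time` is defined as the remainder, the shares in each row sum to one. Applying the same steps to the full `runtimes` frame only requires replacing `runtimes_sample`.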
