# Section 3b: Data Integration

This notebook documents how the joined dataset `prom_slurm_joined.parquet` was created. The ready-to-use file is already uploaded to the Zenodo repository.
Running the notebook took about 1 hour and requires 50-100 GB of storage, including scratch space in `./checkpoint`.

Data integration: we combine job and node data by joining each job record collected by SLURM with the individual timestamps collected by Prometheus. The figure below was created manually with a diagram editor.

![image](plots/section_3/data-integration.png)

Fig. 2: An example of the data integration process. We
match each job record to the fine-grained, 30 s interval
timestamps of the node dataset.
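
As a rough illustration of the matching shown in Fig. 2, the following pure-Python sketch pairs toy node samples with toy job records. The record layouts, the values, and the helper name `match_samples_to_jobs` are assumptions made for this example only; the notebook performs the actual join in Spark further below.

```python
from datetime import datetime

def match_samples_to_jobs(samples, jobs):
    """Pair every node sample with each job that ran on that node at that time."""
    matches = []
    for node, ts in samples:
        for job_id, nodes, start, end in jobs:
            # a sample belongs to a job if the node is allocated to it
            # and the timestamp lies inside the job's runtime
            if node in nodes and start <= ts <= end:
                matches.append((node, ts, job_id))
    return matches

# hypothetical toy data, modelled on the tables in this notebook
jobs = [
    (1, ["r37n1"], datetime(2022, 8, 31, 12, 54, 40), datetime(2022, 8, 31, 12, 56, 19)),
    (2, ["r37n1", "r37n2"], datetime(2022, 8, 31, 12, 55, 10), datetime(2022, 8, 31, 12, 55, 50)),
]
samples = [
    ("r37n1", datetime(2022, 8, 31, 12, 55, 0)),
    ("r37n1", datetime(2022, 8, 31, 12, 55, 30)),
    ("r37n2", datetime(2022, 8, 31, 12, 55, 30)),
    ("r37n2", datetime(2022, 8, 31, 12, 57, 0)),
]
matched = match_samples_to_jobs(samples, jobs)
print([(node, job_id) for node, _, job_id in matched])
```

A sample can appear more than once when several jobs share a node at the same time, and samples that fall outside every job are dropped.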

In [2]:
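from util.read_and_print_df import *
from util.handle_nan import *
from util.extract_json_attributes import *
from util.plotting import *
# explicit imports for names used below (they may also be provided by the wildcard imports above)
import builtins
import os
import pyspark.sql.functions as F
from pyspark.sql.functions import col, lit
from pyspark.sql.types import LongType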

In [4]:
# set the path for scratch space here (checkpoint folder for DataFrames)
checkpoint_folder = "checkpoint"

def create_directory(directory_name):
    """Create the scratch directory if it does not exist yet."""
    if not os.path.exists(directory_name):
        try:
            os.mkdir(directory_name)
            print("Directory created successfully")
        except OSError as e:
            print(f"Error creating directory '{directory_name}': {e}")
    else:
        print("Directory already exists")

create_directory(checkpoint_folder)

Directory already exists
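
A more compact alternative is sketched below, assuming that nested checkpoint paths should also be supported. The helper name `ensure_directory` is hypothetical and not part of the notebook's `util` package.

```python
import os
import tempfile

def ensure_directory(directory_name):
    """Create the directory (and any missing parents); return True if it was newly created."""
    existed = os.path.isdir(directory_name)
    os.makedirs(directory_name, exist_ok=True)
    return not existed

# demo in a temporary location so the example leaves no files behind
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "scratch", "checkpoint")
    first_call = ensure_directory(target)
    second_call = ensure_directory(target)

print(first_call, second_call)
```

`os.makedirs(..., exist_ok=True)` avoids the separate existence check and does not fail when the directory is already present.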


In [4]:
spark = get_spark_session()

Assigning 541 GB of memory per spark driver and executor, and use 126 cores.


In [4]:
df_prom = spark.read.parquet(path_node_dataset)
df_slurm = spark.read.parquet(path_job_dataset)

df_prom.show(5, False)
df_slurm.show(5, False)

+---------+-------------------+-----+-----------------+-----------+----------------+---+------------------------+-----------------------+---------------------------+----------+----------+------------------------+----------------------------+-------------------------+-----------------------+-----------------+------------------------+----------------------------+---------------+-----------------------+------------------------+----------------------+-----------------------------+--------------------------+------------------+-------------------------+-------------------------+------------------+----------------+---------------------------+----------------------------+---------------------------+-------------------------------+-------------------------+-------------------------------------+--------------------------------+---------------------------------+------------------------------+--------------------------+---------------------------+--------------------------+------------------------

In [6]:
# show the data types of the columns
print(df_prom.dtypes)
print(df_slurm.dtypes)

[('id', 'decimal(20,0)'), ('timestamp', 'timestamp'), ('node', 'string'), ('node_time_seconds', 'double'), ('node_load15', 'double'), ('node_power_usage', 'double'), ('up', 'double'), ('node_netstat_Tcp_OutSegs', 'double'), ('node_netstat_Tcp_InErrs', 'double'), ('node_context_switches_total', 'double'), ('node_load5', 'double'), ('node_load1', 'double'), ('node_memory_Active_bytes', 'double'), ('node_netstat_Tcp_RetransSegs', 'double'), ('node_netstat_Udp_InErrors', 'double'), ('node_memory_Dirty_bytes', 'double'), ('node_ambient_temp', 'double'), ('node_netstat_Icmp_InMsgs', 'double'), ('node_netstat_Udp_InDatagrams', 'double'), ('node_intr_total', 'double'), ('node_netstat_Tcp_InSegs', 'double'), ('node_memory_Percpu_bytes', 'double'), ('node_boot_time_seconds', 'double'), ('node_netstat_Udp_OutDatagrams', 'double'), ('node_netstat_Icmp_InErrors', 'double'), ('node_procs_blocked', 'double'), ('node_netstat_Icmp_OutMsgs', 'double'), ('node_memory_MemFree_bytes', 'double'), ('node_pro

In [7]:
# check date ranges and intersection

df_slurm_timestamps = df_slurm.select(F.min("start_date").alias("min"), F.max("end_date").alias("max")).first()
df_prom_timestamps = df_prom.select(F.min("timestamp").alias("min"), F.max("timestamp").alias("max")).first()

print("DataFrame slurm - Min Timestamp:", df_slurm_timestamps["min"])
print("DataFrame slurm - Max Timestamp:", df_slurm_timestamps["max"])
print("DataFrame slurm range in days:", (df_slurm_timestamps["max"] - df_slurm_timestamps["min"]).days + 1)

print("DataFrame prom - Min Timestamp:", df_prom_timestamps["min"])
print("DataFrame prom - Max Timestamp:", df_prom_timestamps["max"])
print("DataFrame prom range in days:", (df_prom_timestamps["max"] - df_prom_timestamps["min"]).days + 1)

max_intersect_ts = min(df_slurm_timestamps["max"], df_prom_timestamps["max"]) 
min_intersect_ts = max(df_slurm_timestamps["min"], df_prom_timestamps["min"])
print("Intersection in days:", (max_intersect_ts - min_intersect_ts).days + 1)



DataFrame slurm - Min Timestamp: 2021-12-26 23:06:31
DataFrame slurm - Max Timestamp: 2022-11-01 13:59:18
DataFrame slurm range in days: 310
DataFrame prom - Min Timestamp: 2022-06-30 16:00:30
DataFrame prom - Max Timestamp: 2022-11-22 11:20:30
DataFrame prom range in days: 145
Intersection in days: 124
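
The day counts above can be reproduced without Spark from the printed minimum and maximum timestamps. This is a minimal sketch; the helper name `inclusive_days` is an assumption for illustration. Note that `(end - start).days + 1` counts started 24-hour periods rather than distinct calendar dates.

```python
from datetime import datetime

def inclusive_days(start, end):
    """Number of started 24-hour periods between start and end."""
    return (end - start).days + 1

# values copied from the output above
slurm_min = datetime(2021, 12, 26, 23, 6, 31)
slurm_max = datetime(2022, 11, 1, 13, 59, 18)
prom_min = datetime(2022, 6, 30, 16, 0, 30)
prom_max = datetime(2022, 11, 22, 11, 20, 30)

# the overlap starts at the later minimum and ends at the earlier maximum
overlap_start = max(slurm_min, prom_min)
overlap_end = min(slurm_max, prom_max)
overlap_days = inclusive_days(overlap_start, overlap_end) if overlap_start <= overlap_end else 0

print(inclusive_days(slurm_min, slurm_max), inclusive_days(prom_min, prom_max), overlap_days)
```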


                                                                                

In [8]:
def extract_nodes(c):
    """Expand a SLURM node list such as 'r10n[13,19],r11n5' into an array of node names."""

    def process_rack(g):
        # A rack group has the form r13n[1,2,3] or r13n1.
        # The leading "r" is lost in the split below for every group but the first.
        # Identify and extract the rack and node identifiers.
        rack_id = F.regexp_extract(g, r"([0-9]+)n", 1)
        node_ids = F.regexp_extract(g, r"n\[?([0-9,]+)", 1)
        node_id_list = F.split(node_ids, ",")
        combined_ids = F.transform(node_id_list, lambda nid: F.concat(F.lit("r"), rack_id, F.lit("n"), nid))
        return combined_ids

    splits = F.split(c, ",r")
    all_racks = F.transform(splits, lambda x: process_rack(x))
    return F.flatten(all_racks)

# preview the five jobs with the longest node lists
(df_slurm.orderBy(F.length("node").desc())
.withColumn("nodez", extract_nodes(F.col("node")))
.select("id", "node", "nodez", "start_date", "end_date")
.limit(5).toPandas())

Unnamed: 0,id,node,nodez,start_date,end_date
0,1616160,"r10n[13,19,25],r11n[19,20,22,23,24,28],r12n[4,...","[r10n13, r10n19, r10n25, r11n19, r11n20, r11n2...",2022-07-12 11:48:24,2022-07-12 11:48:39
1,539519,"r10n[7,11,13,17,19,26,30],r11n[14,23,24,30,32]...","[r10n7, r10n11, r10n13, r10n17, r10n19, r10n26...",2022-04-27 08:50:20,2022-04-27 08:50:58
2,539520,"r10n[7,11,13,17,19,26,30],r11n[14,23,24,30,32]...","[r10n7, r10n11, r10n13, r10n17, r10n19, r10n26...",2022-04-27 08:51:02,2022-04-27 08:51:40
3,539517,"r10n[7,11,13,17,19,26,30],r11n[14,23,24,30,32]...","[r10n7, r10n11, r10n13, r10n17, r10n19, r10n26...",2022-04-27 08:49:05,2022-04-27 08:49:33
4,539518,"r10n[7,11,13,17,19,26,30],r11n[14,23,24,30,32]...","[r10n7, r10n11, r10n13, r10n17, r10n19, r10n26...",2022-04-27 08:49:37,2022-04-27 08:50:16
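
To make the expansion easier to follow, here is a pure-Python counterpart of `extract_nodes`. It is only a sketch: the function name `expand_nodes` is hypothetical, and like the Spark expression above it assumes comma-separated node numbers without ranges such as `[1-4]`.

```python
import re

def expand_nodes(nodelist):
    """Expand 'r10n[13,19],r13n5' into ['r10n13', 'r10n19', 'r13n5']."""
    nodes = []
    # one match per rack group: rack digits, then a bracketed list or a single node number
    for rack, bracketed, single in re.findall(r"r(\d+)n(?:\[([\d,]+)\]|(\d+))", nodelist):
        for node_id in (bracketed or single).split(","):
            nodes.append(f"r{rack}n{node_id}")
    return nodes

expanded = expand_nodes("r10n[13,19,25],r11n[19,20],r13n5")
print(expanded)
```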


In [9]:
nodes_extracted_df  =(df_slurm.orderBy(F.length("node").desc())
            .withColumn("nodez", extract_nodes(col("node")))
            .select("*", F.dayofyear("start_date").alias("dayofyear"),  F.dayofyear("end_date").alias("dayofyear_end")))

In [10]:
prom_subset_df = df_prom.withColumn("dayofyear", F.dayofyear("timestamp"))
prom_subset_df.show(5, False)

+---------+-------------------+-----+-----------------+-----------+----------------+---+------------------------+-----------------------+---------------------------+----------+----------+------------------------+----------------------------+-------------------------+-----------------------+-----------------+------------------------+----------------------------+---------------+-----------------------+------------------------+----------------------+-----------------------------+--------------------------+------------------+-------------------------+-------------------------+------------------+----------------+---------------------------+----------------------------+---------------------------+-------------------------------+-------------------------+-------------------------------------+--------------------------------+---------------------------------+------------------------------+--------------------------+---------------------------+--------------------------+------------------------

In [12]:
# number of additional calendar days each job spans
# (a day-of-year difference is only valid when start and end fall in the same year;
#  only jobs inside the 2022 overlap window are kept further below)
nodes_extracted_df_exploded = nodes_extracted_df.withColumn("diff_days", F.dayofyear("end_date") - F.dayofyear("start_date"))
nodes_extracted_df_exploded = nodes_extracted_df_exploded.orderBy("diff_days", ascending=False)
nodes_extracted_df_exploded.show(10, False)
print(nodes_extracted_df_exploded.count())

# explode diff_days: one row per calendar day the job runs on (offsets 0..diff_days)
nodes_extracted_df_exploded = nodes_extracted_df_exploded.withColumn("diff_days_exploded", F.explode(F.sequence(F.lit(0), F.col("diff_days"))))
nodes_extracted_df_exploded.show(10, False)
print(nodes_extracted_df_exploded.count())

# add the day offset to start_date to get the running date
nodes_extracted_df_exploded = nodes_extracted_df_exploded.withColumn("running_date_exploded", F.expr("date_add(start_date, diff_days_exploded)"))
nodes_extracted_df_exploded.show(10, False)
print(nodes_extracted_df_exploded.count())

nodes_extracted_df_exploded = nodes_extracted_df_exploded.drop("diff_days", "diff_days_exploded")
nodes_extracted_df_exploded = nodes_extracted_df_exploded.withColumn("dayofyear_running", F.dayofyear("running_date_exploded"))

                                                                                

+---+-------------------+-------------------+------+---------+--------+--------+-------------------+-------+--------+---------+-------------+---------+
|id |start_date         |end_date           |node  |nodetypes|numnodes|numcores|submit_date        |state  |nodez   |dayofyear|dayofyear_end|diff_days|
+---+-------------------+-------------------+------+---------+--------+--------+-------------------+-------+--------+---------+-------------+---------+
|1  |2021-12-26 23:06:31|2021-12-31 23:06:50|r13n5 |normal(1)|1       |16      |2021-12-26 17:11:18|TIMEOUT|[r13n5] |360      |365          |5        |
|2  |2021-12-26 23:06:43|2021-12-31 23:06:50|r14n27|normal(1)|1       |16      |2021-12-26 17:11:18|TIMEOUT|[r14n27]|360      |365          |5        |
|3  |2021-12-26 23:06:43|2021-12-31 23:06:50|r15n12|normal(1)|1       |16      |2021-12-26 17:11:18|TIMEOUT|[r15n12]|360      |365          |5        |
|4  |2021-12-26 23:06:43|2021-12-31 23:06:50|r10n14|normal(1)|1       |16      |2021-12-

                                                                                

+---+-------------------+-------------------+------+---------+--------+--------+-------------------+-------+--------+---------+-------------+---------+------------------+
|id |start_date         |end_date           |node  |nodetypes|numnodes|numcores|submit_date        |state  |nodez   |dayofyear|dayofyear_end|diff_days|diff_days_exploded|
+---+-------------------+-------------------+------+---------+--------+--------+-------------------+-------+--------+---------+-------------+---------+------------------+
|1  |2021-12-26 23:06:31|2021-12-31 23:06:50|r13n5 |normal(1)|1       |16      |2021-12-26 17:11:18|TIMEOUT|[r13n5] |360      |365          |5        |0                 |
|1  |2021-12-26 23:06:31|2021-12-31 23:06:50|r13n5 |normal(1)|1       |16      |2021-12-26 17:11:18|TIMEOUT|[r13n5] |360      |365          |5        |1                 |
|1  |2021-12-26 23:06:31|2021-12-31 23:06:50|r13n5 |normal(1)|1       |16      |2021-12-26 17:11:18|TIMEOUT|[r13n5] |360      |365          |5   

                                                                                

1702895


                                                                                

+---+-------------------+-------------------+------+---------+--------+--------+-------------------+-------+--------+---------+-------------+---------+------------------+---------------------+
|id |start_date         |end_date           |node  |nodetypes|numnodes|numcores|submit_date        |state  |nodez   |dayofyear|dayofyear_end|diff_days|diff_days_exploded|running_date_exploded|
+---+-------------------+-------------------+------+---------+--------+--------+-------------------+-------+--------+---------+-------------+---------+------------------+---------------------+
|1  |2021-12-26 23:06:31|2021-12-31 23:06:50|r13n5 |normal(1)|1       |16      |2021-12-26 17:11:18|TIMEOUT|[r13n5] |360      |365          |5        |0                 |2021-12-26           |
|1  |2021-12-26 23:06:31|2021-12-31 23:06:50|r13n5 |normal(1)|1       |16      |2021-12-26 17:11:18|TIMEOUT|[r13n5] |360      |365          |5        |1                 |2021-12-27           |
|1  |2021-12-26 23:06:31|2021-12-31

1702895
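
The per-day expansion can be sketched in plain Python as follows. Unlike the cell above, this sketch uses the difference between calendar dates instead of the day-of-year difference, so it also covers jobs that cross a year boundary. The helper name `running_dates` is an assumption for illustration.

```python
from datetime import datetime, timedelta

def running_dates(start, end):
    """Return one calendar date for every day a job touches, from start to end inclusive."""
    n_days = (end.date() - start.date()).days
    return [start.date() + timedelta(days=offset) for offset in range(n_days + 1)]

# job 1 from the table above: 2021-12-26 to 2021-12-31
dates = running_dates(datetime(2021, 12, 26, 23, 6, 31), datetime(2021, 12, 31, 23, 6, 50))
# hypothetical job that crosses the year boundary
crossing = running_dates(datetime(2021, 12, 30, 8, 0, 0), datetime(2022, 1, 2, 9, 0, 0))

print([d.isoformat() for d in dates])
print([d.isoformat() for d in crossing])
```

Each returned date corresponds to one exploded row, which is what later allows the join to pre-filter on the day of year.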


                                                                                

In [13]:
# job duration in seconds
duration_s = F.col("end_date").cast(LongType()) - F.col("start_date").cast(LongType())

# job categories: shorter than one 30 s sampling interval, within one calendar day, spanning several days
# (the day-of-year comparison assumes that start and end lie in the same year)
jobs_within_30s = df_slurm.filter(duration_s < 30)
jobs_within_one_day = df_slurm.filter(F.dayofyear("start_date") == F.dayofyear("end_date"))
jobs_within_multiple_days = df_slurm.filter(F.dayofyear("start_date") < F.dayofyear("end_date"))

def sum_hours(df):
    # total runtime of all jobs in df, in hours
    return df.select(F.sum(duration_s / 3600)).first()[0]

jobs_count = df_slurm.count()
jobs_count_within_30s = jobs_within_30s.count()
jobs_count_within_one_day = jobs_within_one_day.count()
jobs_count_within_multiple_days = jobs_within_multiple_days.count()

jobs_sum_hours_within_30s = sum_hours(jobs_within_30s)
jobs_sum_hours_within_one_day = sum_hours(jobs_within_one_day)
jobs_sum_hours_within_multiple_days = sum_hours(jobs_within_multiple_days)
# jobs shorter than 30 s are already contained in the two day-based categories, so they are not added again
jobs_sum_hours_total = jobs_sum_hours_within_one_day + jobs_sum_hours_within_multiple_days

print(f"Jobs count: \t\t\t\t{jobs_count}")
print(f"Jobs within 30s: \t\t\t{jobs_count_within_30s} ({builtins.round(jobs_count_within_30s / jobs_count * 100, 2)}%)")
print(f"Jobs within one day: \t\t\t{jobs_count_within_one_day} ({builtins.round(jobs_count_within_one_day / jobs_count * 100, 2)}%)")
print(f"Jobs within multiple days: \t\t{jobs_count_within_multiple_days} ({builtins.round(jobs_count_within_multiple_days / jobs_count * 100, 2)}%)")
print(f"Sum Job hours within 30s: \t\t{builtins.round(jobs_sum_hours_within_30s, 2)} ({builtins.round(jobs_sum_hours_within_30s / jobs_sum_hours_total * 100, 2)}%)")
print(f"Sum Job hours within one day: \t\t{builtins.round(jobs_sum_hours_within_one_day, 2)} ({builtins.round(jobs_sum_hours_within_one_day / jobs_sum_hours_total * 100, 2)}%)")
print(f"Sum Job hours within multiple days: \t{builtins.round(jobs_sum_hours_within_multiple_days, 2)} ({builtins.round(jobs_sum_hours_within_multiple_days / jobs_sum_hours_total * 100, 2)}%)")

Jobs count: 				1596963
Jobs within 30s: 			778844 (48.77%)
Jobs within one day: 			1548216 (96.95%)
Jobs within multiple days: 		48649 (3.05%)
Sum Job hours within 30s: 		1133.98 (0.07%)
Sum Job hours within one day: 		539018.07 (32.04%)
Sum Job hours within multiple days: 	1143160.5 (67.96%)
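
A quick arithmetic check on the printed counts shows that the two day-based categories do not quite add up to the total number of jobs. The sketch below only re-uses the numbers from the output above. One possible explanation, which is an assumption and has not been verified against the data, is that the remaining jobs have a start day-of-year larger than their end day-of-year (for example, jobs running across New Year).

```python
# counts copied from the output above
jobs_count = 1596963
within_one_day = 1548216
within_multiple_days = 48649

# jobs that fall into neither day-based category
unclassified = jobs_count - (within_one_day + within_multiple_days)

# shares as printed by the notebook
share_one_day = round(within_one_day / jobs_count * 100, 2)
share_multiple_days = round(within_multiple_days / jobs_count * 100, 2)

print(unclassified, share_one_day, share_multiple_days)
```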


In [14]:
prom_subset_df = prom_subset_df.withColumnRenamed("id", "prom_id")
prom_subset_df = prom_subset_df.withColumnRenamed("node", "prom_node")
prom_subset_df = prom_subset_df.withColumnRenamed("dayofyear", "prom_dayofyear")

nodes_extracted_df_exploded = nodes_extracted_df_exploded.withColumnRenamed("id", "slurm_id")
nodes_extracted_df_exploded = nodes_extracted_df_exploded.withColumnRenamed("node", "slurm_node")
nodes_extracted_df_exploded = nodes_extracted_df_exploded.withColumnRenamed("dayofyear", "slurm_dayofyear")

prom_subset_df = prom_subset_df.filter((col("timestamp") >= min_intersect_ts) & (col("timestamp") <= max_intersect_ts))
nodes_extracted_df_exploded = nodes_extracted_df_exploded.filter((col("start_date") >= min_intersect_ts) & (col("end_date") <= max_intersect_ts))

In [15]:
# checkpoint the subset with all columns so it can be read back at the end and re-joined to the combined df
prom_subset_df = checkpoint_df(prom_subset_df, spark, f"{checkpoint_folder}/prom_subset_df_updated.parquet")
nodes_extracted_df_exploded = checkpoint_df(nodes_extracted_df_exploded, spark, f"{checkpoint_folder}/nodes_extracted_df_exploded_updated.parquet")

                                                                                

In [16]:
prom_subset_df = prom_subset_df.select("prom_id", "prom_node", "prom_dayofyear", "timestamp")
prom_subset_df = checkpoint_df(prom_subset_df, spark, f"{checkpoint_folder}/prom_subset_df_selected_updated.parquet")

                                                                                

In [17]:
# runtime: ~15 min; join each node sample to every job running on that node at that timestamp
df_combined = (prom_subset_df.join(nodes_extracted_df_exploded)
.where(prom_subset_df.prom_dayofyear == nodes_extracted_df_exploded.dayofyear_running)
.where(F.array_contains(nodes_extracted_df_exploded.nodez, prom_subset_df.prom_node))
.filter(prom_subset_df.timestamp >= nodes_extracted_df_exploded.start_date)
.filter(prom_subset_df.timestamp <= nodes_extracted_df_exploded.end_date)
)

print(df_combined.count())



94090170
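
Because the node data is sampled every 30 s, a job only appears in the joined dataset if at least one sample falls inside its runtime. The sketch below counts such samples for a single job. It assumes an ideal sampling grid aligned to `:00` and `:30` of every minute, which matches the timestamps shown in this notebook but ignores gaps in the real data; the helper name `grid_samples_in_job` is hypothetical.

```python
import math
from datetime import datetime, timedelta

def grid_samples_in_job(start, end, step_s=30):
    """Timestamps on a step_s-second grid (anchored at midnight) that lie in [start, end]."""
    midnight = start.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed_s = (start - midnight).total_seconds()
    # first grid point at or after the job start
    ts = midnight + timedelta(seconds=math.ceil(elapsed_s / step_s) * step_s)
    samples = []
    while ts <= end:
        samples.append(ts)
        ts += timedelta(seconds=step_s)
    return samples

# job 1930815 from the joined output: 12:54:40 to 12:56:19
long_job = grid_samples_in_job(datetime(2022, 8, 31, 12, 54, 40), datetime(2022, 8, 31, 12, 56, 19))
# job 1616160 from the node-list preview: 15 s long
short_job = grid_samples_in_job(datetime(2022, 7, 12, 11, 48, 24), datetime(2022, 7, 12, 11, 48, 39))
# hypothetical 20 s job that falls between two samples
missed_job = grid_samples_in_job(datetime(2022, 7, 12, 10, 0, 5), datetime(2022, 7, 12, 10, 0, 25))

print(len(long_job), len(short_job), len(missed_job))
```

Under this assumption, jobs shorter than 30 s contribute at most one row per node and may contribute none at all.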


                                                                                

In [18]:
df_combined.show(10, False)



+--------+---------+--------------+-------------------+--------+-------------------+-------------------+----------+------------------+--------+--------+-------------------+---------+-------+---------------+-------------+---------------------+-----------------+
|prom_id |prom_node|prom_dayofyear|timestamp          |slurm_id|start_date         |end_date           |slurm_node|nodetypes         |numnodes|numcores|submit_date        |state    |nodez  |slurm_dayofyear|dayofyear_end|running_date_exploded|dayofyear_running|
+--------+---------+--------------+-------------------+--------+-------------------+-------------------+----------+------------------+--------+--------+-------------------+---------+-------+---------------+-------------+---------------------+-----------------+
|67659139|r37n1    |243           |2022-08-31 12:55:00|1930815 |2022-08-31 12:54:40|2022-08-31 12:56:19|r37n1     |shared_52c_384g(1)|1       |16      |2022-08-31 12:54:40|COMPLETED|[r37n1]|243            |243        

                                                                                

In [19]:
# runtime: ~35 min
prom_subset_df = checkpoint_df(prom_subset_df, spark, f"{checkpoint_folder}/prom_subset_df_selected_updated.parquet")



                                                                                

In [20]:
# read back the checkpoint that holds all node columns and re-attach them to the join result via prom_id
prom_subset_df = spark.read.parquet(f"{checkpoint_folder}/prom_subset_df_updated.parquet")
df_combined = df_combined.join(prom_subset_df, df_combined.prom_id == prom_subset_df.prom_id, "inner").drop(prom_subset_df.prom_id).drop(prom_subset_df.prom_node).drop(prom_subset_df.prom_dayofyear).drop(prom_subset_df.timestamp)

In [21]:
# order columns again, so that prom_subset_df columns are first, then the remaining df_combined columns
df_combined = df_combined.select(prom_subset_df.columns + [c for c in df_combined.columns if c not in prom_subset_df.columns])
df_combined = df_combined.withColumnRenamed("slurm_dayofyear", "slurm_dayofyear_start")
df_combined = df_combined.withColumnRenamed("dayofyear_running", "slurm_dayofyear_running")
df_combined = df_combined.drop("running_date_exploded")
# swap last 2 columns
df_combined = df_combined.select(df_combined.columns[:-2] + df_combined.columns[-1:] + df_combined.columns[-2:-1])
df_combined.show(10, False)

+-------+-------------------+---------+-----------------+-----------+----------------+---+------------------------+-----------------------+---------------------------+----------+----------+------------------------+----------------------------+-------------------------+-----------------------+-----------------+------------------------+----------------------------+---------------+-----------------------+------------------------+----------------------+-----------------------------+--------------------------+------------------+-------------------------+-------------------------+------------------+----------------+---------------------------+----------------------------+---------------------------+-------------------------------+-------------------------+-------------------------------------+--------------------------------+---------------------------------+------------------------------+--------------------------+---------------------------+--------------------------+----------------------

                                                                                

In [22]:
df_combined = df_combined.orderBy(['prom_node', 'timestamp', 'start_date'])
print("Size:", df_combined.count(), "x", len(df_combined.columns))
df_combined.show(10, False)

                                                                                

Size: 94090170 x 105
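
The reported width of 105 columns can be reproduced from column counts shown elsewhere in this notebook. The sketch below is plain bookkeeping: the variable names are illustrative, and the counts are read off the printed tables rather than computed from the data.

```python
joined_cols = 18     # columns of df_combined right after the join (see the In [18] output)
node_cols = 91 + 1   # node dataset columns plus the added day-of-year column
duplicate_keys = 4   # prom_id, prom_node, prom_dayofyear, timestamp dropped after re-joining
dropped_helper = 1   # running_date_exploded

width_after_rejoin = joined_cols + node_cols - duplicate_keys - dropped_helper
# five helper columns are dropped before the dataset is written
final_width = width_after_rejoin - 5

print(width_after_rejoin, final_width)
```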


                                                                                

+-------+-------------------+---------+-----------------+-----------+----------------+---+------------------------+-----------------------+---------------------------+----------+----------+------------------------+----------------------------+-------------------------+-----------------------+-----------------+------------------------+----------------------------+---------------+-----------------------+------------------------+----------------------+-----------------------------+--------------------------+------------------+-------------------------+-------------------------+------------------+----------------+---------------------------+----------------------------+---------------------------+-------------------------------+-------------------------+-------------------------------------+--------------------------------+---------------------------------+------------------------------+--------------------------+---------------------------+--------------------------+----------------------

In [25]:
df_combined_renamed = df_combined.drop('prom_dayofyear', 'slurm_node', 'slurm_dayofyear_start', 'slurm_dayofyear_running', 'dayofyear_end')
df_combined_renamed = df_combined_renamed.withColumnRenamed('nodez', 'slurm_nodes')
df_combined_renamed = df_combined_renamed.withColumnRenamed('prom_node', 'node')

df_combined_renamed.show(10, False)

+-------+-------------------+-----+-----------------+-----------+----------------+---+------------------------+-----------------------+---------------------------+----------+----------+------------------------+----------------------------+-------------------------+-----------------------+-----------------+------------------------+----------------------------+---------------+-----------------------+------------------------+----------------------+-----------------------------+--------------------------+------------------+-------------------------+-------------------------+------------------+----------------+---------------------------+----------------------------+---------------------------+-------------------------------+-------------------------+-------------------------------------+--------------------------------+---------------------------------+------------------------------+--------------------------+---------------------------+--------------------------+--------------------------

                                                                                

In [27]:
df_combined_renamed.write.parquet(path_job_node_joined_dataset, mode="overwrite")

In [6]:
df_slurm = spark.read.parquet(path_job_dataset)
print("Job Dataset Size:", df_slurm.count(), "x", len(df_slurm.columns), "Start Date:", df_slurm.select(F.min("start_date")).first()[0], "End Date:", df_slurm.select(F.max("end_date")).first()[0])
df_slurm.show(5, False)

df_prom = spark.read.parquet(path_node_dataset)
print("Node Dataset Size:", df_prom.count(), "x", len(df_prom.columns), "Start Date:", df_prom.select(F.min("timestamp")).first()[0], "End Date:", df_prom.select(F.max("timestamp")).first()[0])
df_prom.show(5, False)

df_combined = spark.read.parquet(path_job_node_joined_dataset)
print("Job-Node Dataset Size:", df_combined.count(), "x", len(df_combined.columns), "Start Date:", df_combined.select(F.min("timestamp")).first()[0], "End Date:", df_combined.select(F.max("timestamp")).first()[0])
df_combined.show(5, False)

Job Dataset Size: 1596963 x 9 Start Date: 2021-12-26 23:06:31 End Date: 2022-11-01 13:59:18
+---+-------------------+-------------------+------+---------+--------+--------+-------------------+-------+
|id |start_date         |end_date           |node  |nodetypes|numnodes|numcores|submit_date        |state  |
+---+-------------------+-------------------+------+---------+--------+--------+-------------------+-------+
|1  |2021-12-26 23:06:31|2021-12-31 23:06:50|r13n5 |normal(1)|1       |16      |2021-12-26 17:11:18|TIMEOUT|
|2  |2021-12-26 23:06:43|2021-12-31 23:06:50|r14n27|normal(1)|1       |16      |2021-12-26 17:11:18|TIMEOUT|
|3  |2021-12-26 23:06:43|2021-12-31 23:06:50|r15n12|normal(1)|1       |16      |2021-12-26 17:11:18|TIMEOUT|
|4  |2021-12-26 23:06:43|2021-12-31 23:06:50|r10n14|normal(1)|1       |16      |2021-12-26 17:11:18|TIMEOUT|
|5  |2021-12-26 23:06:43|2021-12-31 23:06:50|r10n30|normal(1)|1       |16      |2021-12-26 17:11:18|TIMEOUT|
+---+-------------------+-----------

                                                                                

Node Dataset Size: 127827719 x 91 Start Date: 2022-06-30 16:00:30 End Date: 2022-11-22 11:20:30


+---------+-------------------+-----+-----------------+-----------+----------------+---+------------------------+-----------------------+---------------------------+----------+----------+------------------------+----------------------------+-------------------------+-----------------------+-----------------+------------------------+----------------------------+---------------+-----------------------+------------------------+----------------------+-----------------------------+--------------------------+------------------+-------------------------+-------------------------+------------------+----------------+---------------------------+----------------------------+---------------------------+-------------------------------+-------------------------+-------------------------------------+--------------------------------+---------------------------------+------------------------------+--------------------------+---------------------------+--------------------------+------------------------

                                                                                

Job-Node Dataset Size: 94090170 x 100 Start Date: 2022-06-30 16:01:00 End Date: 2022-11-01 13:59:00
+--------+-------------------+------+-----------------+-----------+----------------+---+------------------------+-----------------------+---------------------------+----------+----------+------------------------+----------------------------+-------------------------+-----------------------+-----------------+------------------------+----------------------------+---------------+-----------------------+------------------------+----------------------+-----------------------------+--------------------------+------------------+-------------------------+-------------------------+------------------+----------------+---------------------------+----------------------------+---------------------------+-------------------------------+-------------------------+-------------------------------------+--------------------------------+---------------------------------+------------------------------+------