# Node usage analysis with `sacct` output

This notebook analyzes node usage from Slurm `sacct` output.  
The analysis includes:
- Parsing the `AllocTRES` field to extract CPU, memory, and node information.
- Calculating the node hours used and the node hours remaining.
- Estimating how many days a given number of nodes can run.
- Estimating how many nodes can run for a given number of days.
- Estimating the maximum number of simulation runs.

In [1]:
import numpy as np
import pandas as pd

## Load the `sacct` result

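The `sacct` command reports accounting information for jobs and job steps. The file was generated with the command below, where `-p` writes pipe-delimited output and `-T` truncates each job's start and end times to the `-S`/`-E` window: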

```shell
sacct -S 2024-10-01 -E 2024-12-01 --format="JobID,JobName,Partition,State,AllocTRES,ElapsedRaw" -p -T > sacct.txt
```

In [2]:
# Load the `sacct` result
info = pd.read_csv("./sacct.txt", delimiter="|", skiprows=1)

In [3]:
# Display the dataset
info

Unnamed: 0,JobID,JobName,Partition,State,AllocTRES,ElapsedRaw,Unnamed: 6
0,14775436,check_cores,normal,COMPLETED,"cpu=2,mem=14200M,node=1",2,
1,14775436.batch,batch,,COMPLETED,"cpu=2,mem=14200M,node=1",2,
2,14775436.extern,extern,,COMPLETED,"cpu=2,mem=14200M,node=1",2,
3,14858548,hr5-016-01,normal,COMPLETED,"cpu=32,mem=227200M,node=1",4,
4,14858548.batch,batch,,COMPLETED,"cpu=32,mem=227200M,node=1",4,
...,...,...,...,...,...,...,...
167,14920353.0,ramses3d,,FAILED,"cpu=1200,mem=7726620M,node=30",220871,
168,14930321,LCDM,large_cpu,CANCELLED by 43801,"billing=3024,cpu=4320,mem=7726620M,node=30",7657,
169,14930321.batch,batch,,CANCELLED,"cpu=144,mem=257554M,node=1",7658,
170,14930321.extern,extern,,COMPLETED,"billing=3024,cpu=4320,mem=7726620M,node=30",7657,
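
The empty `Unnamed: 6` column comes from the trailing `|` that `sacct -p` appends to every line. The following is a minimal sketch of dropping it after loading. The inline `sample` text and the `demo` name are hypothetical stand-ins for `sacct.txt` and `info`.

```python
import io

import pandas as pd

# Hypothetical two-row sample in the same pipe-delimited layout as sacct.txt.
sample = io.StringIO(
    "JobID|JobName|Partition|State|AllocTRES|ElapsedRaw|\n"
    "14775436|check_cores|normal|COMPLETED|cpu=2,mem=14200M,node=1|2|\n"
    "14775436.batch|batch||COMPLETED|cpu=2,mem=14200M,node=1|2|\n"
)

# Keep JobID as a string so step suffixes such as ".batch" or ".0" survive.
demo = pd.read_csv(sample, delimiter="|", dtype={"JobID": str})

# The trailing "|" yields an all-empty "Unnamed: N" column; drop it.
demo = demo.dropna(axis="columns", how="all")

print(demo.columns.tolist())
```

Columns that are only partly empty, such as `Partition` on step rows, are kept because `how="all"` removes a column only when every value is missing.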


## Parse `AllocTRES` field

Extract `cpu`, `mem`, and `node` information from the `AllocTRES` column and add them as new columns to the DataFrame.

In [4]:
# Extract cpu, mem, and node from AllocTRES
alloc_cols = info["AllocTRES"].str.extract(r'cpu=(\d+),mem=(\d+M),node=(\d+)')
alloc_cols.columns = ["cpu", "mem", "node"]

# Add the parsed data to the DataFrame
info = pd.concat([info, alloc_cols], axis=1)

In [5]:
info

Unnamed: 0,JobID,JobName,Partition,State,AllocTRES,ElapsedRaw,Unnamed: 6,cpu,mem,node
0,14775436,check_cores,normal,COMPLETED,"cpu=2,mem=14200M,node=1",2,,2,14200M,1
1,14775436.batch,batch,,COMPLETED,"cpu=2,mem=14200M,node=1",2,,2,14200M,1
2,14775436.extern,extern,,COMPLETED,"cpu=2,mem=14200M,node=1",2,,2,14200M,1
3,14858548,hr5-016-01,normal,COMPLETED,"cpu=32,mem=227200M,node=1",4,,32,227200M,1
4,14858548.batch,batch,,COMPLETED,"cpu=32,mem=227200M,node=1",4,,32,227200M,1
...,...,...,...,...,...,...,...,...,...,...
167,14920353.0,ramses3d,,FAILED,"cpu=1200,mem=7726620M,node=30",220871,,1200,7726620M,30
168,14930321,LCDM,large_cpu,CANCELLED by 43801,"billing=3024,cpu=4320,mem=7726620M,node=30",7657,,4320,7726620M,30
169,14930321.batch,batch,,CANCELLED,"cpu=144,mem=257554M,node=1",7658,,144,257554M,1
170,14930321.extern,extern,,COMPLETED,"billing=3024,cpu=4320,mem=7726620M,node=30",7657,,4320,7726620M,30
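
The regular expression in In [4] expects `cpu`, `mem`, and `node` to appear next to each other in that order, with memory given in `M`. That holds for the rows shown here. If other entries such as GPU resources or a different memory unit could appear, a key/value parser may be more tolerant. The sketch below shows one way to do it. The helper `parse_alloc_tres` and the sample series are hypothetical and not part of the original notebook.

```python
import pandas as pd


def parse_alloc_tres(value):
    """Split a string such as 'cpu=2,mem=14200M,node=1' into a dict."""
    if not isinstance(value, str) or not value:
        return {}  # missing or empty AllocTRES
    # Each comma-separated item is one "key=value" pair.
    return dict(item.split("=", 1) for item in value.split(",") if "=" in item)


# Hypothetical sample; the second entry carries a leading "billing" field.
tres = pd.Series([
    "cpu=2,mem=14200M,node=1",
    "billing=3024,cpu=4320,mem=7726620M,node=30",
    None,
])

parsed = pd.DataFrame(tres.map(parse_alloc_tres).tolist(), index=tres.index)
# Keep only the three fields used later; absent keys become NaN.
parsed = parsed.reindex(columns=["cpu", "mem", "node"])
print(parsed)
```

As with `str.extract`, the values are returned as strings and still need to be converted to numbers before use.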


## Compute node hours

In [6]:
# Convert necessary columns to proper data types
info["ElapsedRaw"] = info["ElapsedRaw"].astype(float) # sec
info["node"] = info["node"].astype(float)

# Calculate node hours
info["NodeHours"] = (info["ElapsedRaw"] * info["node"]) / 3600 # hr

In [7]:
info

Unnamed: 0,JobID,JobName,Partition,State,AllocTRES,ElapsedRaw,Unnamed: 6,cpu,mem,node,NodeHours
0,14775436,check_cores,normal,COMPLETED,"cpu=2,mem=14200M,node=1",2.0,,2,14200M,1.0,0.000556
1,14775436.batch,batch,,COMPLETED,"cpu=2,mem=14200M,node=1",2.0,,2,14200M,1.0,0.000556
2,14775436.extern,extern,,COMPLETED,"cpu=2,mem=14200M,node=1",2.0,,2,14200M,1.0,0.000556
3,14858548,hr5-016-01,normal,COMPLETED,"cpu=32,mem=227200M,node=1",4.0,,32,227200M,1.0,0.001111
4,14858548.batch,batch,,COMPLETED,"cpu=32,mem=227200M,node=1",4.0,,32,227200M,1.0,0.001111
...,...,...,...,...,...,...,...,...,...,...,...
167,14920353.0,ramses3d,,FAILED,"cpu=1200,mem=7726620M,node=30",220871.0,,1200,7726620M,30.0,1840.591667
168,14930321,LCDM,large_cpu,CANCELLED by 43801,"billing=3024,cpu=4320,mem=7726620M,node=30",7657.0,,4320,7726620M,30.0,63.808333
169,14930321.batch,batch,,CANCELLED,"cpu=144,mem=257554M,node=1",7658.0,,144,257554M,1.0,2.127222
170,14930321.extern,extern,,COMPLETED,"billing=3024,cpu=4320,mem=7726620M,node=30",7657.0,,4320,7726620M,30.0,63.808333


In [8]:
node_hours_total = 63000 # total node hours available on Olaf
node_hours_used = info["NodeHours"].sum()
node_hours_left = node_hours_total - node_hours_used

print(f"Total node hours: {node_hours_total:10.2f} h")
print(f"Used node hours:  {node_hours_used:10.2f} h ({node_hours_used/node_hours_total*100:.2f}%)")
print(f"Left node hours:  {node_hours_left:10.2f} h ({node_hours_left/node_hours_total*100:.2f}%)")

Total node hours:   63000.00 h
Used node hours:     9784.12 h (15.53%)
Left node hours:    53215.88 h (84.47%)
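
The sum in In [8] runs over every row, including the `.batch`, `.extern`, and numbered step rows, which repeat all or part of the parent job's allocation. If usage is charged once per job allocation (an assumption about the accounting policy, not something the `sacct` output states), the step rows should be excluded before summing. The sketch below does this on a hypothetical three-row frame that mirrors job 14930321.

```python
import pandas as pd

# Hypothetical rows mirroring job 14930321 and its steps from the table above.
jobs = pd.DataFrame({
    "JobID": ["14930321", "14930321.batch", "14930321.extern"],
    "node": [30.0, 1.0, 30.0],
    "ElapsedRaw": [7657.0, 7658.0, 7657.0],  # seconds
})
jobs["NodeHours"] = jobs["ElapsedRaw"] * jobs["node"] / 3600

# Step rows carry a "." suffix in JobID; parent allocation rows do not.
is_step = jobs["JobID"].astype(str).str.contains(".", regex=False)
allocations = jobs[~is_step]

print(f"All rows:         {jobs['NodeHours'].sum():.2f} h")
print(f"Allocations only: {allocations['NodeHours'].sum():.2f} h")
```

Running `sacct` with `-X` (`--allocations`) gives the same restriction at the source, since it reports only the job allocation rows.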


## Estimate running days for a given number of nodes

In [9]:
nodes = 90
days_left = node_hours_left / 24 / nodes
print(f"Running days with {nodes} nodes: {days_left:.2f} days")

Running days with 90 nodes: 24.64 days


## Estimate required nodes for a given number of remaining days

In [10]:
days_left = 30
nodes = node_hours_left / 24 / days_left
print(f"Required nodes for {days_left} days: {nodes:.2f} nodes")

Required nodes for 30 days: 73.91 nodes
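
Both estimates rearrange the same relation, node hours = nodes × days × 24. The sketch below wraps them in two hypothetical helpers, `days_for_nodes` and `nodes_for_days`. The second rounds down, on the assumption that only whole nodes can be allocated and the budget must not be exceeded.

```python
import math


def days_for_nodes(node_hours, nodes):
    """Days that `nodes` nodes can run continuously on `node_hours`."""
    return node_hours / 24 / nodes


def nodes_for_days(node_hours, days):
    """Largest whole number of nodes that can run continuously for `days`."""
    return math.floor(node_hours / 24 / days)


node_hours_left = 53215.88  # value printed by In [8]

print(f"{days_for_nodes(node_hours_left, 90):.2f} days")
print(f"{nodes_for_days(node_hours_left, 30)} nodes")
```

Rounding 73.91 down to 73 nodes leaves a small reserve rather than overspending the allocation.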


## Estimate the maximum number of runs

In [16]:
# Resources for one simulation run
nodes = 30
days_run = 10  # days per run (about 6 days in the ideal case)
node_hours_run = nodes * days_run * 24

In [17]:
node_hours_left / node_hours_run

7.3910941358024695
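
Only complete runs are useful, so the ratio above can be rounded down. The sketch below does this and also reports the node hours left over. The names `full_runs` and `spare_hours` are illustrative, and the inputs are copied from In [8] and In [16].

```python
import math

node_hours_left = 53215.88  # value printed by In [8]
node_hours_run = 30 * 10 * 24  # nodes * days * hours per day, as in In [16]

# Number of runs that fit entirely within the remaining budget.
full_runs = math.floor(node_hours_left / node_hours_run)
# Node hours that remain after those runs.
spare_hours = node_hours_left - full_runs * node_hours_run

print(f"Complete runs:    {full_runs}")
print(f"Spare node hours: {spare_hours:.2f} h")
```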