# GPU Monitoring
...

## Load Data into Dataframes

In [1]:
%pip install pandas



[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.1.2[0m[39;49m -> [0m[32;49m24.0[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpython3.11 -m pip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


In [2]:
import sqlite3
import pandas as pd

connection = sqlite3.connect("data/gpu_monitor.db")

gpu_infos: pd.DataFrame = pd.read_sql_query("SELECT * FROM 'gpu_infos';", connection)
process_infos: pd.DataFrame = pd.read_sql_query("SELECT * FROM 'process_infos';", connection)

connection.close()

Pyarrow will become a required dependency of pandas in the next major release of pandas (pandas 3.0),
(to allow more performant data types, such as the Arrow string type, and better interoperability with other libraries)
but was not found to be installed on your system.
If this would cause problems for you,
please provide us feedback at https://github.com/pandas-dev/pandas/issues/54466
        
  import pandas as pd


In [15]:
from datetime import datetime
import re

month_pattern = re.compile("[A-Z][a-z]{2}")

def split_process_infos(text: str) -> pd.Series:
    # text contains fields for %mem=,%cpu=,user=,stat=,bsdstart=,bsdtime=,cmd=
    components = text.strip().split()

    memory_percentage = float(components[0])
    cpu_percentage = float(components[1])
    user = components[2]
    status = components[3]
    
    # From the ps man page: If the process was started less than 24 hours ago, the
    # output format is " HH:MM", else it is "Mmm:SS" (where Mmm is the three letters 
    # of the month).
    if month_pattern.match(components[4]):
    
    created_at = " ".join(components[4:6])

    # From the ps man page: [bsdtime is] accumulated cpu time, user + system.  The
    # display format is usually "MMM:SS"
    cpu_time = components[6]
    
    cmd = " ".join(components[7:])
    return pd.Series([cpu_percentage, memory_percentage, user, status, created_at, cpu_time, cmd])

process_infos[["cpu_percentage", "memory_percentage", "user", "status", "created_at", "cpu_time", "cmd"]] = process_infos["pid_info"].apply(split_process_infos)

In [16]:
process_infos.head()

Unnamed: 0,pid,pid_info,host_id,timestamp,cpu_percentage,memory_percentage,user,created_at,cmd,cpu_time,status
0,4164634,8.9 1.6 root Ssl Jan 18 271:42 /usr/loc...,teach2,2024-01-29 13:37:27,1.6,8.9,root,Jan 18,/usr/local/bin/python -m ipykernel_launcher -f...,271:42,Ssl
1,441525,0.8 13.1 root Ssl Jan 26 609:36 /usr/loc...,teach2,2024-01-29 13:37:27,13.1,0.8,root,Jan 26,/usr/local/bin/python -m ipykernel_launcher -f...,609:36,Ssl
2,1823085,0.9 55.5 joerg Ssl 13:00 20:45 /usr/bin...,teach3,2024-01-29 13:37:28,55.5,0.9,joerg,13:00 20:45,-m ipykernel_launcher -f /root/.local/share/ju...,/usr/bin/python3,Ssl
3,1823085,0.9 55.5 joerg Ssl 13:00 20:45 /usr/bin...,teach3,2024-01-29 13:37:28,55.5,0.9,joerg,13:00 20:45,-m ipykernel_launcher -f /root/.local/share/ju...,/usr/bin/python3,Ssl
4,4164634,8.9 1.6 root Ssl Jan 18 271:42 /usr/loc...,teach2,2024-01-29 13:44:56,1.6,8.9,root,Jan 18,/usr/local/bin/python -m ipykernel_launcher -f...,271:42,Ssl


In [4]:
gpu_infos.head()

Unnamed: 0,pid,gpu_memory,host_id,timestamp
0,4164634,1711 MiB,teach2,2024-01-29 13:37:27
1,441525,1887 MiB,teach2,2024-01-29 13:37:27
2,1823085,80890 MiB,teach3,2024-01-29 13:37:28
3,1823085,30566 MiB,teach3,2024-01-29 13:37:28
4,4164634,1711 MiB,teach2,2024-01-29 13:44:56


In [None]:
# TODO: Parse types of timestamp and floats

## Analyse Data
Interesting analysis questions include:
* In which context do gpu intensive programs run?
* How is gpu usage distributed? (visualize over time)
* Which resources do gpu-using processes also use? (memory, cpu)
* ¬¬¬¬¬¬