# Measuring and logging CPU usage on `hpc05`

A measurement is taken **every 15 minutes**, after which this website is updated.

Found a mistake or want to know something? Ask/e-mail Bas at [basnijholt@gmail.com](mailto:basnijholt@gmail.com) or see the complete code on [GitHub](https://github.com/basnijholt/cluster-logger).

This notebook (`ipynb`) is available [here](https://github.com/basnijholt/cluster-logger/raw/master/index.ipynb) and the data for the last 60 days [here](https://hpc05.quantumtinkerer.tudelft.nl/database.p).


In [None]:
from logger import *
print('This script last ran at {}'.format(now))

# Current usage on `hpc05`

In [None]:
print_stat()

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

today = str(now.date())
month = now.strftime("%B")

processes = load_processes('database.p')
df = pd.DataFrame(processes)
df.index = pd.to_datetime(df.current_time, unit='s', utc=True)

gb = df.groupby('Job ID', as_index=False)
df['cpu_time'] = gb['cpu_time'].transform(lambda x: x-x.min())
df['reserved_time'] = gb['current_time'].transform(lambda x: x-x.min())
df['reserved_cpu_time'] = df['reserved_time'] * df['num_cores']
df['activity'] = df['cpu_time'] / df['reserved_cpu_time'] * 100
lasts = gb.last()

def get_user_df(lasts, only_today=False):
    lasts = lasts.copy()
    if only_today:
        # select only today
        lasts.index = pd.to_datetime(lasts.pop('current_time'), unit='s', utc=True)
        lasts = lasts.loc[today]

    by_user = lasts.groupby('Username')
    reserved_days = by_user.reserved_cpu_time.sum() / 86400
    cpu_days = by_user.cpu_time.sum() / 86400
    idle_days = reserved_days - cpu_days
    activity = cpu_days * 100 / reserved_days
    cols = ['CPU time (days)', 'Reserved time (days)',
            'IDLE time (days)', 'Activity (%)']
    user_df = pd.DataFrame([cpu_days, reserved_days, idle_days, activity],
                           index=cols).T
    return user_df
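
The activity figure computed in the cell above reduces to simple arithmetic per job: CPU seconds used since the job was first seen, divided by wall-clock seconds since then times the number of reserved cores. The following is a minimal sketch in plain Python, not the notebook's own code. The helper name `job_activity` and the sample numbers are made up for illustration.

```python
def job_activity(samples, num_cores):
    """Return the activity (%) of one job from its measurement points.

    `samples` is a list of (current_time, cpu_time) tuples in seconds,
    ordered by time. The values used below are hypothetical.
    """
    times = [t for t, _ in samples]
    cpus = [c for _, c in samples]
    # Same normalisation as `x - x.min()` in the notebook, taken at the last row
    cpu_time = cpus[-1] - min(cpus)
    reserved_time = times[-1] - min(times)
    reserved_cpu_time = reserved_time * num_cores
    if reserved_cpu_time == 0:
        # Only one measurement point so far: activity is undefined
        return float('nan')
    return cpu_time / reserved_cpu_time * 100


# Three measurement points, 15 minutes (900 s) apart, on 2 reserved cores
samples = [(0, 100), (900, 1000), (1800, 1900)]
print(job_activity(samples, num_cores=2))  # 50.0
```

Here the job used 1800 CPU seconds out of 3600 reserved core-seconds, so it kept its cores busy half of the time.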

# Data of the last 60 days

Note that data collection only started on May 11th.

In [None]:
user_df = get_user_df(lasts)
user_df.sort_values('IDLE time (days)', ascending=False)

In [None]:
ax = user_df.sort_values('Activity (%)').plot.bar(y=['Reserved time (days)', 'CPU time (days)'])
ax.set_ylabel('CPU time in days')
ax.set_title('CPU time used per user for the last 60 days')

In [None]:
# `weekday_name` was removed in pandas 1.0; `day_name()` is its replacement
ax = df.groupby(df.index.day_name(), sort=False).cpu_time.sum().divide(86400 * 7 * 365).plot.bar()
ax.set_xlabel('Weekday')
ax.set_ylabel('CPU time in years')
ax.set_title('CPU time per weekday in the last 60 days')

In [None]:
# Wrap around midnight so the shifted hour stays within 0-23
ax = df.groupby((df.index.hour + tz_offset) % 24, sort=False).cpu_time.sum().divide(86400 * 24).plot.bar()
ax.set_ylabel('CPU time in days')
ax.set_xlabel('Hour of the day')
ax.set_title('CPU time per hour in the last 60 days')
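
Shifting a UTC hour by a fixed offset can produce values outside 0 to 23, so hour labels are usually wrapped with a modulo. The sketch below assumes `tz_offset` is a whole number of hours. The helper name `local_hour` is hypothetical and not part of `logger`.

```python
def local_hour(utc_hour, tz_offset):
    """Shift a UTC hour (0-23) by `tz_offset` hours and wrap around midnight."""
    return (utc_hour + tz_offset) % 24


print(local_hour(23, 2))   # 1
print(local_hour(0, -1))   # 23
```

Python's `%` returns a non-negative result for a positive divisor, so negative offsets wrap correctly as well.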

# Only today

In [None]:
user_df_today = get_user_df(lasts, only_today=True)
user_df_today.sort_values('IDLE time (days)', ascending=False)

In [None]:
ax = user_df_today.sort_values('Activity (%)').plot.bar(y=['Reserved time (days)', 'CPU time (days)'])
ax.set_ylabel('CPU time in days')
# Use the same `today` string that was used to select the rows
ax.set_title('CPU time per user today ({})'.format(today))

# Ideas?
* Showing usage per department
* Average number of cores used per day