## Power Sampling Rate

<b>Central Question</b>: How much does the calculated energy for a data center server change as the power sampling frequency varies?

<b>Process</b>: Compare "true" energy against the energy calculated at different sampling rates. The process of calculating energy from subsets of the full power sample time series can be repeated for different sampling frequencies (e.g. 5s, 10s, 30s, 1 min, 5 min, 1 hr) to compare the accuracy of the calculated energy to the "true" energy value.
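
The process described above can be sketched as a loop over step sizes. This is a minimal illustration on a synthetic trace, not the notebook's data: the helper `energy_kwh`, the array `power_1s`, and the oscillation period are all assumptions made for the example.

```python
import numpy as np

def energy_kwh(power_w, interval_s):
    """Energy in kWh, treating each sample as constant for interval_s seconds."""
    return float(np.sum(power_w) * interval_s / 3600 / 1000)

# Hypothetical 1 Hz power trace covering 24 hours:
# a 100 W baseline plus a slow oscillation (values are made up)
t = np.arange(24 * 3600)
power_1s = 100.0 + 20.0 * np.sin(2 * np.pi * t / 4000.0)

# Reference value computed from every sample
reference = energy_kwh(power_1s, 1)

# Keep every n-th sample and weight it by n seconds
steps_s = [5, 10, 30, 60, 300, 3600]
results = {step: energy_kwh(power_1s[::step], step) for step in steps_s}

for step, value in results.items():
    print(f"{step:>5} s: {value:.4f} kWh (difference {value - reference:+.4f} kWh)")
```

Keeping the step sizes in one list makes it easy to add or remove sampling rates without duplicating the calculation.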

<b>Power vs. Energy</b>:
- Power is sampled at regular intervals (e.g. 1s)
- Energy is calculated by multiplying each power sample by the time between samples and summing the products
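
Written as a formula, with $P_i$ the $i$-th power sample in watts, $N$ the number of samples, and $\Delta t$ the sampling interval in seconds (symbols introduced here for illustration), the summation described above is:

$$
E_{\text{kWh}} = \frac{1}{3600 \cdot 1000} \sum_{i=1}^{N} P_i \, \Delta t
$$

The factor 3600 converts watt-seconds to watt-hours, and the factor 1000 converts watt-hours to kilowatt-hours.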

### Setup & Imports

In [1]:
!pip install pandas --user



In [2]:
# Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

### Initial look at data

Data source: https://ieee-dataport.org/open-access/data-server-energy-consumption-dtaset

According to the source, the data was collected from an HP Z440 workstation for 245 days (35 weeks) with a sampling rate of one value per second.

Columns represent the following variables:
* Voltage (V)
* Current (A)
* Power (PA) - Watts (W)
* Frequency - Hertz (Hz)
* Active Energy - kilowatt-hours (kWh)
* Power factor - Dimensionless
* ESP32 temperature - Degrees Celsius (°C)
* CPU consumption - Percentage (%)
* CPU power consumption - Percentage (%)
* CPU temperature - Degrees Celsius (°C)
* GPU consumption - Percentage (%)
* GPU power consumption - Percentage (%)
* GPU temperature - Degrees Celsius (°C)
* RAM memory consumption - Percentage (%)
* RAM memory power consumption - Percentage (%)

In [3]:
# Load the csv file into a pandas DataFrame
data = pd.read_csv('./data/1mayo - agosto 2021.csv')

In [4]:
# Print column names of the DataFrame
print(data.columns)

Index(['MAC', 'weekday', 'fecha_servidor', 'fecha_esp32', 'voltaje',
       'corriente', 'potencia', 'frecuencia', 'energia', 'fp', 'ESP32_temp',
       'WORKSTATION_CPU', 'WORKSTATION_CPU_POWER', 'WORKSTATION_CPU_TEMP',
       'WORKSTATION_GPU', 'WORKSTATION_GPU_POWER', 'WORKSTATION_GPU_TEMP',
       'WORKSTATION_RAM', 'WORKSTATION_RAM_POWER'],
      dtype='object')


In [5]:
# Show first 5 rows of DataFrame
data.head(5)

Unnamed: 0,MAC,weekday,fecha_servidor,fecha_esp32,voltaje,corriente,potencia,frecuencia,energia,fp,ESP32_temp,WORKSTATION_CPU,WORKSTATION_CPU_POWER,WORKSTATION_CPU_TEMP,WORKSTATION_GPU,WORKSTATION_GPU_POWER,WORKSTATION_GPU_TEMP,WORKSTATION_RAM,WORKSTATION_RAM_POWER
0,3C:61:05:12:96:30,4,2021-05-06 10:00:00,2021-05-06 10:00:00,120.1,0.93,96.3,60.0,1.16,0.86,0.0,0.0,0.0,0,0.0,0,0.0,0.0,0.0
1,3C:61:05:12:96:30,4,2021-05-06 10:00:01,2021-05-06 10:00:01,120.1,0.93,96.3,59.9,1.16,0.86,0.0,0.0,0.0,0,0.0,0,0.0,0.0,0.0
2,3C:61:05:12:96:30,4,2021-05-06 10:00:01,2021-05-06 10:00:01,120.0,0.94,96.6,59.9,1.16,0.86,0.0,0.0,0.0,0,0.0,0,0.0,0.0,0.0
3,3C:61:05:12:96:30,4,2021-05-06 10:00:02,2021-05-06 10:00:02,120.0,0.94,96.6,59.9,1.16,0.86,0.0,0.0,0.0,0,0.0,0,0.0,0.0,0.0
4,3C:61:05:12:96:30,4,2021-05-06 10:00:03,2021-05-06 10:00:03,120.0,0.94,96.6,59.9,1.16,0.86,0.0,0.0,0.0,0,0.0,0,0.0,0.0,0.0


In [6]:
print(data.describe(include='all'))

                      MAC       weekday       fecha_servidor  \
count             7887568  7.887568e+06              7887568   
unique                  1           NaN              6499426   
top     3C:61:05:12:96:30           NaN  2021-05-06 10:00:03   
freq              7887568           NaN                    6   
mean                  NaN  4.097151e+00                  NaN   
std                   NaN  2.015128e+00                  NaN   
min                   NaN  1.000000e+00                  NaN   
25%                   NaN  2.000000e+00                  NaN   
50%                   NaN  4.000000e+00                  NaN   
75%                   NaN  6.000000e+00                  NaN   
max                   NaN  7.000000e+00                  NaN   

                fecha_esp32       voltaje     corriente      potencia  \
count               7887568  7.887568e+06  7.887568e+06  7.887568e+06   
unique              6499426           NaN           NaN           NaN   
top     2021

In [7]:
data.dtypes

MAC                       object
weekday                    int64
fecha_servidor            object
fecha_esp32               object
voltaje                  float64
corriente                float64
potencia                 float64
frecuencia               float64
energia                  float64
fp                       float64
ESP32_temp               float64
WORKSTATION_CPU          float64
WORKSTATION_CPU_POWER    float64
WORKSTATION_CPU_TEMP       int64
WORKSTATION_GPU          float64
WORKSTATION_GPU_POWER      int64
WORKSTATION_GPU_TEMP     float64
WORKSTATION_RAM          float64
WORKSTATION_RAM_POWER    float64
dtype: object

In [8]:
# Make a copy of the DataFrame and clean data
data_clean = data.drop(columns=['MAC'])
data_clean = data_clean.rename(columns={
    "fecha_servidor": "date_server",
    "fecha_esp32": "date_esp32",
    "voltaje": "voltage",
    "corriente": "current",
    "potencia": "power", 
    "frecuencia": "frequency",
    "energia": "energy",
})
data_clean

Unnamed: 0,weekday,date_server,date_esp32,voltage,current,power,frequency,energy,fp,ESP32_temp,WORKSTATION_CPU,WORKSTATION_CPU_POWER,WORKSTATION_CPU_TEMP,WORKSTATION_GPU,WORKSTATION_GPU_POWER,WORKSTATION_GPU_TEMP,WORKSTATION_RAM,WORKSTATION_RAM_POWER
0,4,2021-05-06 10:00:00,2021-05-06 10:00:00,120.1,0.93,96.3,60.0,1.16,0.86,0.00,0.0,0.0,0,0.0,0,0.0,0.0,0.0
1,4,2021-05-06 10:00:01,2021-05-06 10:00:01,120.1,0.93,96.3,59.9,1.16,0.86,0.00,0.0,0.0,0,0.0,0,0.0,0.0,0.0
2,4,2021-05-06 10:00:01,2021-05-06 10:00:01,120.0,0.94,96.6,59.9,1.16,0.86,0.00,0.0,0.0,0,0.0,0,0.0,0.0,0.0
3,4,2021-05-06 10:00:02,2021-05-06 10:00:02,120.0,0.94,96.6,59.9,1.16,0.86,0.00,0.0,0.0,0,0.0,0,0.0,0.0,0.0
4,4,2021-05-06 10:00:03,2021-05-06 10:00:03,120.0,0.94,96.6,59.9,1.16,0.86,0.00,0.0,0.0,0,0.0,0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7887563,1,2021-08-02 02:35:18,2021-08-02 02:35:18,120.0,0.86,100.0,60.0,165.76,0.97,33.89,0.0,0.0,0,0.0,0,0.0,0.0,0.0
7887564,1,2021-08-02 02:35:19,2021-08-02 02:35:19,120.0,0.84,97.5,60.0,165.76,0.97,33.89,0.0,0.0,0,0.0,0,0.0,0.0,0.0
7887565,1,2021-08-02 02:35:21,2021-08-02 02:35:21,120.0,0.85,98.6,60.0,165.76,0.97,33.89,0.0,0.0,0,0.0,0,0.0,0.0,0.0
7887566,1,2021-08-02 02:35:22,2021-08-02 02:35:22,120.0,0.85,99.0,60.0,165.76,0.97,33.89,0.0,0.0,0,0.0,0,0.0,0.0,0.0


In [9]:
data_clean['date_server'] = pd.to_datetime(data_clean['date_server'])
data_clean['date_esp32'] = pd.to_datetime(data_clean['date_esp32'])
data_clean.dtypes

weekday                           int64
date_server              datetime64[ns]
date_esp32               datetime64[ns]
voltage                         float64
current                         float64
power                           float64
frequency                       float64
energy                          float64
fp                              float64
ESP32_temp                      float64
WORKSTATION_CPU                 float64
WORKSTATION_CPU_POWER           float64
WORKSTATION_CPU_TEMP              int64
WORKSTATION_GPU                 float64
WORKSTATION_GPU_POWER             int64
WORKSTATION_GPU_TEMP            float64
WORKSTATION_RAM                 float64
WORKSTATION_RAM_POWER           float64
dtype: object

In [10]:
data_clean = data_clean.sort_values(['date_server'], ascending=True)
data_clean.head()

Unnamed: 0,weekday,date_server,date_esp32,voltage,current,power,frequency,energy,fp,ESP32_temp,WORKSTATION_CPU,WORKSTATION_CPU_POWER,WORKSTATION_CPU_TEMP,WORKSTATION_GPU,WORKSTATION_GPU_POWER,WORKSTATION_GPU_TEMP,WORKSTATION_RAM,WORKSTATION_RAM_POWER
20,3,2021-05-05 22:05:27,2021-05-05 22:05:27,119.9,1.15,126.4,60.0,0.0,0.92,0.0,0.0,0.0,0,0.0,0,0.0,0.0,0.0
22,3,2021-05-05 22:05:28,2021-05-05 22:05:28,119.9,1.09,118.5,60.0,0.0,0.91,0.0,0.0,0.0,0,0.0,0,0.0,0.0,0.0
21,3,2021-05-05 22:05:28,2021-05-05 22:05:28,119.9,1.15,126.4,60.0,0.0,0.92,0.0,0.0,0.0,0,0.0,0,0.0,0.0,0.0
24,3,2021-05-05 22:05:29,2021-05-05 22:05:29,120.0,1.01,107.7,60.0,0.0,0.89,0.0,0.0,0.0,0,0.0,0,0.0,0.0,0.0
23,3,2021-05-05 22:05:29,2021-05-05 22:05:29,119.9,1.09,118.5,60.0,0.0,0.91,0.0,0.0,0.0,0,0.0,0,0.0,0.0,0.0


In [11]:
start_date = data_clean['date_server'].iat[0]
start_date

Timestamp('2021-05-05 22:05:27')

In [12]:
# Seconds elapsed since the first sample (vectorized, so no loop is needed)
data_clean['total_seconds'] = (data_clean['date_server'] - start_date).dt.total_seconds()
data_clean

Unnamed: 0,weekday,date_server,date_esp32,voltage,current,power,frequency,energy,fp,ESP32_temp,WORKSTATION_CPU,WORKSTATION_CPU_POWER,WORKSTATION_CPU_TEMP,WORKSTATION_GPU,WORKSTATION_GPU_POWER,WORKSTATION_GPU_TEMP,WORKSTATION_RAM,WORKSTATION_RAM_POWER,total_seconds
20,3,2021-05-05 22:05:27,2021-05-05 22:05:27,119.9,1.15,126.4,60.0,0.00,0.92,0.00,0.0,0.0,0,0.0,0,0.0,0.0,0.0,0.0
22,3,2021-05-05 22:05:28,2021-05-05 22:05:28,119.9,1.09,118.5,60.0,0.00,0.91,0.00,0.0,0.0,0,0.0,0,0.0,0.0,0.0,1.0
21,3,2021-05-05 22:05:28,2021-05-05 22:05:28,119.9,1.15,126.4,60.0,0.00,0.92,0.00,0.0,0.0,0,0.0,0,0.0,0.0,0.0,1.0
24,3,2021-05-05 22:05:29,2021-05-05 22:05:29,120.0,1.01,107.7,60.0,0.00,0.89,0.00,0.0,0.0,0,0.0,0,0.0,0.0,0.0,2.0
23,3,2021-05-05 22:05:29,2021-05-05 22:05:29,119.9,1.09,118.5,60.0,0.00,0.91,0.00,0.0,0.0,0,0.0,0,0.0,0.0,0.0,2.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7887563,1,2021-08-02 02:35:18,2021-08-02 02:35:18,120.0,0.86,100.0,60.0,165.76,0.97,33.89,0.0,0.0,0,0.0,0,0.0,0.0,0.0,7619391.0
7887564,1,2021-08-02 02:35:19,2021-08-02 02:35:19,120.0,0.84,97.5,60.0,165.76,0.97,33.89,0.0,0.0,0,0.0,0,0.0,0.0,0.0,7619392.0
7887565,1,2021-08-02 02:35:21,2021-08-02 02:35:21,120.0,0.85,98.6,60.0,165.76,0.97,33.89,0.0,0.0,0,0.0,0,0.0,0.0,0.0,7619394.0
7887566,1,2021-08-02 02:35:22,2021-08-02 02:35:22,120.0,0.85,99.0,60.0,165.76,0.97,33.89,0.0,0.0,0,0.0,0,0.0,0.0,0.0,7619395.0


### Calculate "true" energy consumed
The most accurate energy estimate uses every power sample in the time series, so the value computed from the 1 second data serves as the "true" reference.

In [64]:
data_power_1s = data_clean[['total_seconds', 'power', 'energy']].copy().reset_index(drop=True)
data_power_1s

Unnamed: 0,total_seconds,power,energy
0,0.0,126.4,0.00
1,1.0,118.5,0.00
2,1.0,126.4,0.00
3,2.0,107.7,0.00
4,2.0,118.5,0.00
...,...,...,...
7887563,7619391.0,100.0,165.76
7887564,7619392.0,97.5,165.76
7887565,7619394.0,98.6,165.76
7887566,7619395.0,99.0,165.76


In [65]:
# average power values taken at the same second
data_power_1s = data_power_1s.groupby(['total_seconds'])['power'].agg('mean').reset_index()
data_power_1s

Unnamed: 0,total_seconds,power
0,0.0,126.40
1,1.0,122.45
2,2.0,113.10
3,3.0,104.30
4,4.0,100.90
...,...,...
6499421,7619391.0,100.00
6499422,7619392.0,97.50
6499423,7619394.0,98.60
6499424,7619395.0,99.00
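
The output above has about 6.5 million rows while `total_seconds` runs to about 7.6 million, so some seconds have no sample at all. One way to quantify such gaps is sketched below on a small hypothetical frame that reuses the notebook's column names; the values are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for data_power_1s with one missing stretch
df = pd.DataFrame({
    "total_seconds": [0.0, 1.0, 2.0, 3.0, 577.0, 578.0],
    "power": [126.4, 122.45, 113.1, 104.3, 110.25, 109.0],
})

# Time elapsed since the previous sample; the first row has no predecessor
gaps = df["total_seconds"].diff().dropna()

# Seconds with no sample: anything beyond the expected 1 second spacing
missing_seconds = (gaps - 1).clip(lower=0).sum()
largest_gap = gaps.max()

print("Missing seconds:", missing_seconds)
print("Largest gap (s):", largest_gap)
```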


In [69]:
# Assume each sample covers a 1 second interval (gaps between samples are ignored)
energy_true = (data_power_1s['power'] * 1).sum()  # units: Watt * second
energy_true = energy_true / 3600  # units: Watt * hour
energy_true = energy_true / 1000  # units: Kilowatt * hour
print("The true energy consumed is " + str(round(energy_true,3)) + " kWh.") 

The true energy consumed is 141.946 kWh.
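
The calculation above treats every sample as covering exactly 1 second. Since `total_seconds` shows that consecutive samples are sometimes more than 1 second apart, one alternative (not used in this notebook) is to weight each sample by the actual time until the next one. The following is a minimal sketch on a small hypothetical frame. The function name `energy_kwh_time_weighted` is an assumption, and whether long gaps should be filled this way depends on whether the machine was actually running during them.

```python
import numpy as np
import pandas as pd

def energy_kwh_time_weighted(df):
    """Energy in kWh, holding each power sample until the next timestamp."""
    t = df["total_seconds"].to_numpy(dtype=float)
    p = df["power"].to_numpy(dtype=float)
    dt = np.diff(t)                     # seconds between sample i and sample i+1
    watt_seconds = (p[:-1] * dt).sum()  # the last sample has no following interval
    return watt_seconds / 3600 / 1000

# Hypothetical data: note the 2 second step between the last two rows
df = pd.DataFrame({
    "total_seconds": [0.0, 1.0, 2.0, 4.0],
    "power": [100.0, 100.0, 200.0, 100.0],
})

energy = energy_kwh_time_weighted(df)
print(energy, "kWh")
```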


### Energy value using power values at 5 second intervals

In [67]:
# Subsample: keep every 5th row (about 5 seconds apart where the data has no gaps)
data_power_5s = data_power_1s.iloc[::5, :]
data_power_5s

Unnamed: 0,total_seconds,power
0,0.0,126.400000
5,5.0,105.400000
10,10.0,102.600000
15,15.0,98.900000
20,20.0,99.266667
...,...,...
6499405,7619374.0,67.000000
6499410,7619380.0,66.700000
6499415,7619385.0,96.300000
6499420,7619390.0,100.000000


In [70]:
energy_true_5s = (data_power_5s['power'] * 5).sum()  # units: Watt * second
energy_true_5s = energy_true_5s / 3600  # units: Watt * hour
energy_true_5s = energy_true_5s / 1000  # units: Kilowatt * hour
print("Using 5 second power intervals, the calculated energy consumed is " + str(round(energy_true_5s,3)) + " kWh.") 

Using 5 second power intervals, the calculated energy consumed is 141.949 kWh.


### Energy value using power values at 10 second intervals

In [71]:
# Subsample: keep every 10th row (about 10 seconds apart where the data has no gaps)
data_power_10s = data_power_1s.iloc[::10, :]
data_power_10s

Unnamed: 0,total_seconds,power
0,0.0,126.400000
10,10.0,102.600000
20,20.0,99.266667
30,30.0,98.566667
40,40.0,97.900000
...,...,...
6499380,7619348.0,66.800000
6499390,7619358.0,67.900000
6499400,7619369.0,68.200000
6499410,7619380.0,66.700000


In [72]:
energy_true_10s = (data_power_10s['power'] * 10).sum()  # units: Watt * second
energy_true_10s = energy_true_10s / 3600  # units: Watt * hour
energy_true_10s = energy_true_10s / 1000  # units: Kilowatt * hour
print("Using 10 second power intervals, the calculated energy consumed is " + str(round(energy_true_10s,3)) + " kWh.") 

Using 10 second power intervals, the calculated energy consumed is 141.96 kWh.


### Energy value using power values at 30 second intervals

In [73]:
# Subsample: keep every 30th row (about 30 seconds apart where the data has no gaps)
data_power_30s = data_power_1s.iloc[::30, :]
data_power_30s

Unnamed: 0,total_seconds,power
0,0.0,126.400000
30,30.0,98.566667
60,577.0,110.250000
90,607.0,96.900000
120,637.0,97.200000
...,...,...
6499290,7619252.0,67.900000
6499320,7619284.0,66.700000
6499350,7619316.0,66.900000
6499380,7619348.0,66.800000
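
In the output above, row 60 falls at 577 s rather than 60 s, so slicing by row position does not give a fixed time step wherever the data has gaps. A hedged alternative, which is not what this notebook does, is to keep the first sample in each fixed-width time bin. The function name `subsample_by_time` and the sample values are hypothetical.

```python
import pandas as pd

def subsample_by_time(df, step_s):
    """Keep the first row in each step_s-wide time bin; empty bins are skipped."""
    bins = (df["total_seconds"] // step_s).astype(int).rename("bin")
    return df.groupby(bins, sort=True).first().reset_index(drop=True)

# Hypothetical data with a gap between 30 s and 577 s
df = pd.DataFrame({
    "total_seconds": [0.0, 10.0, 29.0, 30.0, 577.0, 600.0, 607.0],
    "power": [126.4, 102.6, 99.0, 98.6, 110.25, 97.0, 96.9],
})

out = subsample_by_time(df, 30)
print(out)
```

Bins without any sample simply produce no row, so a later energy calculation would still need to decide how to treat those missing intervals.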


In [74]:
energy_true_30s = (data_power_30s['power'] * 30).sum()  # units: Watt * second
energy_true_30s = energy_true_30s / 3600  # units: Watt * hour
energy_true_30s = energy_true_30s / 1000  # units: Kilowatt * hour
print("Using 30 second power intervals, the calculated energy consumed is " + str(round(energy_true_30s,3)) + " kWh.") 

Using 30 second power intervals, the calculated energy consumed is 141.974 kWh.


### Energy value using power values at 1 minute intervals

In [75]:
# Subsample: keep every 60th row (about 1 minute apart where the data has no gaps)
data_power_1m = data_power_1s.iloc[::60, :]
data_power_1m

Unnamed: 0,total_seconds,power
0,0.0,126.40
60,577.0,110.25
120,637.0,97.20
180,697.0,97.10
240,757.0,97.40
...,...,...
6499140,7619093.0,67.20
6499200,7619157.0,100.50
6499260,7619220.0,68.30
6499320,7619284.0,66.70


In [79]:
energy_true_1m = (data_power_1m['power'] * 60).sum()  # units: Watt * second
energy_true_1m = energy_true_1m / 3600  # units: Watt * hour
energy_true_1m = energy_true_1m / 1000  # units: Kilowatt * hour
print("Using 1 minute power intervals, the calculated energy consumed is " + str(round(energy_true_1m,3)) + " kWh.") 

Using 1 minute power intervals, the calculated energy consumed is 141.987 kWh.


### Energy value using power values at 5 minute intervals

In [77]:
# Subsample: keep every 300th row (about 5 minutes apart where the data has no gaps)
data_power_5m = data_power_1s.iloc[::300, :]
data_power_5m

Unnamed: 0,total_seconds,power
0,0.0,126.400000
300,817.0,97.250000
600,1117.0,97.233333
900,1417.0,96.100000
1200,1717.0,97.150000
...,...,...
6498000,7617881.0,69.000000
6498300,7618200.0,98.500000
6498600,7618519.0,66.900000
6498900,7618838.0,66.800000


In [78]:
energy_true_5m = (data_power_5m['power'] * 300).sum()  # units: Watt * second
energy_true_5m = energy_true_5m / 3600  # units: Watt * hour
energy_true_5m = energy_true_5m / 1000  # units: Kilowatt * hour
print("Using 5 minute power intervals, the calculated energy consumed is " + str(round(energy_true_5m,3)) + " kWh.") 

Using 5 minute power intervals, the calculated energy consumed is 141.989 kWh.


### Energy value using power values at 1 hour intervals

In [80]:
# Subsample: keep every 3600th row (about 1 hour apart where the data has no gaps)
data_power_1h = data_power_1s.iloc[::3600, :]
data_power_1h

Unnamed: 0,total_seconds,power
0,0.0,126.4
3600,4117.0,96.9
7200,7717.0,96.6
10800,11317.0,96.5
14400,14917.0,98.1
...,...,...
6483600,7602576.0,66.8
6487200,7606402.0,67.6
6490800,7610228.0,66.9
6494400,7614055.0,66.9


In [81]:
energy_true_1h = (data_power_1h['power'] * 3600).sum()  # units: Watt * second
energy_true_1h = energy_true_1h / 3600  # units: Watt * hour
energy_true_1h = energy_true_1h / 1000  # units: Kilowatt * hour
print("Using 1 hour power intervals, the calculated energy consumed is " + str(round(energy_true_1h,3)) + " kWh.") 

Using 1 hour power intervals, the calculated energy consumed is 142.227 kWh.


### Compare "true" energy value with values from subsampled power

In [82]:
print("True energy consumed: " + str(round(energy_true,3)) + " kWh.") 
print("Calculated energy using 5 second power intervals: " + str(round(energy_true_5s,3)) + " kWh.") 
print("Calculated energy using 10 second power intervals: " + str(round(energy_true_10s,3)) + " kWh.") 
print("Calculated energy using 30 second power intervals: " + str(round(energy_true_30s,3)) + " kWh.") 
print("Calculated energy using 1 minute power intervals: " + str(round(energy_true_1m,3)) + " kWh.") 
print("Calculated energy using 5 minute power intervals: " + str(round(energy_true_5m,3)) + " kWh.") 
print("Calculated energy using 1 hour power intervals: " + str(round(energy_true_1h,3)) + " kWh.") 


True energy consumed: 141.946 kWh.
Calculated energy using 5 second power intervals: 141.949 kWh.
Calculated energy using 10 second power intervals: 141.96 kWh.
Calculated energy using 30 second power intervals: 141.974 kWh.
Calculated energy using 1 minute power intervals: 141.987 kWh.
Calculated energy using 5 minute power intervals: 141.989 kWh.
Calculated energy using 1 hour power intervals: 142.227 kWh.
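
To express these differences relative to the "true" value, the printed results can be turned into percent errors. The sketch below copies the rounded numbers from the output above; the names `estimates_kwh` and `percent_error` are introduced here for illustration.

```python
# Values copied from the printed results above (rounded to 3 decimals)
energy_true_kwh = 141.946
estimates_kwh = {
    "5 s": 141.949,
    "10 s": 141.96,
    "30 s": 141.974,
    "1 min": 141.987,
    "5 min": 141.989,
    "1 hr": 142.227,
}

def percent_error(estimate, reference):
    """Signed difference from the reference, as a percentage of the reference."""
    return (estimate - reference) / reference * 100

errors = {label: percent_error(value, energy_true_kwh)
          for label, value in estimates_kwh.items()}

for label, err in errors.items():
    print(f"{label:>6}: {err:+.3f} %")
```

With these inputs the largest deviation is for 1 hour sampling, at roughly 0.2 % of the reference value.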
