# Plot the data

__Question:__ What is the first thing you should do after you receive your data?

__Answer:__ Plot the data for visual inspection!

## Load the data

In [1]:
import pandas as pd

In [2]:
df_2020 = pd.read_csv("../raw_data/2020_heat.csv", delimiter=";", index_col=False)
df_2021 = pd.read_csv("../raw_data/2021_heat.csv", delimiter=";", index_col=False)
df_2022 = pd.read_csv("../raw_data/2022_heat.csv", delimiter=";", index_col=False)

df = pd.concat([df_2020, df_2021, df_2022], ignore_index=True)
df.rename({"S1": "heat_power", "S2": "flow_rate", "S3": "leader_temp", "S4": "return_temp"}, axis=1, inplace=True)
df

Unnamed: 0,Timestamp,heat_power,flow_rate,leader_temp,return_temp
0,2020-01-01T00:15:00.000000+01:00,713.000,17650.000,81.000,46.000
1,2020-01-01T00:30:00.000000+01:00,330.000,9600.000,81.000,51.000
2,2020-01-01T00:45:00.000000+01:00,705.000,16730.000,81.000,44.000
3,2020-01-01T01:00:00.000000+01:00,663.000,16890.000,81.000,47.000
4,2020-01-01T01:15:00.000000+01:00,412.000,10690.000,81.000,48.000
...,...,...,...,...,...
102715,2022-12-07T23:00:00.000000+01:00,488.286,23120.000,78.286,59.714
102716,2022-12-07T23:15:00.000000+01:00,524.625,23136.250,79.375,59.500
102717,2022-12-07T23:30:00.000000+01:00,435.143,21630.000,79.571,62.143
102718,2022-12-07T23:45:00.000000+01:00,534.500,24685.000,78.750,59.750


## Plot the data: the quick way

In [5]:
import plotly.express as px

In [9]:
fig = px.line(data_frame=df, y="heat_power")
fig.show()

## Plot the data: the good way

AKA fun with datetime

Everyone's favorite pastime: dealing with datetimes. Basically, I want the naive but local datetime as a variable. For now, I am not going to use the datetime as the index, because of things like daylight saving time.
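
To illustrate the daylight saving time concern, here is a minimal sketch on four made-up timestamps (not taken from the data set) around the end of summer time in Germany. It assumes that "naive but local" means the wall-clock time with the UTC offset dropped.

```python
import pandas as pd

# Four half-hourly UTC instants around the end of summer time in Germany (25 Oct 2020).
utc = pd.date_range("2020-10-25 00:30", periods=4, freq="30min", tz="UTC")

local = utc.tz_convert("Europe/Berlin")  # tz-aware local time, still unambiguous
naive = local.tz_localize(None)          # drop the offset, keep the wall-clock time

print(local)
print(naive)
print("aware unique:", local.is_unique, "| naive unique:", naive.is_unique)
```

The wall-clock time 02:30 occurs twice in the naive version, so a naive local index would contain duplicates and would not be sorted on that night.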

In [10]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 102720 entries, 0 to 102719
Data columns (total 5 columns):
 #   Column       Non-Null Count   Dtype  
---  ------       --------------   -----  
 0   Timestamp    102720 non-null  object 
 1   heat_power   100407 non-null  float64
 2   flow_rate    100407 non-null  float64
 3   leader_temp  100407 non-null  float64
 4   return_temp  100407 non-null  float64
dtypes: float64(4), object(1)
memory usage: 3.9+ MB


In [None]:
df.head()

Unnamed: 0,Timestamp,heat_power,flow_rate,leader_temp,return_temp
0,2020-01-01T00:15:00.000000+01:00,713.0,17650.0,81.0,46.0
1,2020-01-01T00:30:00.000000+01:00,330.0,9600.0,81.0,51.0
2,2020-01-01T00:45:00.000000+01:00,705.0,16730.0,81.0,44.0
3,2020-01-01T01:00:00.000000+01:00,663.0,16890.0,81.0,47.0
4,2020-01-01T01:15:00.000000+01:00,412.0,10690.0,81.0,48.0


As we can see, the Timestamp column is not stored as a datetime but as an object (string) in ISO 8601 notation, including the UTC offset. We have to convert it to a proper datetime dtype.

The following code looks far simpler than the path to it was. It took me hours, spread over several days...

In [12]:
# Parse the timestamp strings as UTC, then convert them to German local time (tz-aware).
df["Timestamp"] = pd.to_datetime(df["Timestamp"], utc=True).dt.tz_convert("Europe/Berlin")
df.head()

Unnamed: 0,Timestamp,heat_power,flow_rate,leader_temp,return_temp
0,2020-01-01 00:15:00+01:00,713.0,17650.0,81.0,46.0
1,2020-01-01 00:30:00+01:00,330.0,9600.0,81.0,51.0
2,2020-01-01 00:45:00+01:00,705.0,16730.0,81.0,44.0
3,2020-01-01 01:00:00+01:00,663.0,16890.0,81.0,47.0
4,2020-01-01 01:15:00+01:00,412.0,10690.0,81.0,48.0
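
The conversion can be tried in isolation on a tiny sample. This is a sketch only: the two strings below are hand-written in the same format as the Timestamp column (the summer value with `+02:00` is an assumption about how the file encodes summer time), and `naive_local` is a hypothetical name for the offset-free local time.

```python
import pandas as pd

sample = pd.Series([
    "2020-01-01T00:15:00.000000+01:00",  # winter time (CET)
    "2020-07-01T00:15:00.000000+02:00",  # summer time (CEST), illustrative value
])

# utc=True maps both offsets onto one common time axis before converting back.
aware = pd.to_datetime(sample, utc=True).dt.tz_convert("Europe/Berlin")

# Dropping the time zone keeps the local wall-clock time as a naive datetime.
naive_local = aware.dt.tz_localize(None)

print(aware)
print(naive_local)
```

Parsing with `utc=True` first gives a single tz-aware dtype for the whole column even when the offsets in the strings differ.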


In [13]:
fig = px.line(data_frame=df, x="Timestamp", y="heat_power")
fig.show()

## Observations

We clearly see seasonal patterns:
* yearly patterns
* daily patterns
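
One way to check such patterns numerically is to average by month and by hour of day. The sketch below uses a synthetic signal (the frame `demo` and its cosine shape are made up for illustration and are not the real measurements); the same two `groupby` calls should carry over to a frame with a datetime `Timestamp` column.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: one year of 15-minute readings with a yearly and a daily cycle.
ts = pd.date_range("2020-01-01", "2020-12-31 23:45", freq="15min")
signal = (
    500
    + 300 * np.cos(2 * np.pi * ts.dayofyear.to_numpy() / 366)  # high in winter
    + 100 * np.cos(2 * np.pi * ts.hour.to_numpy() / 24)        # high around midnight
)
demo = pd.DataFrame({"Timestamp": ts, "heat_power": signal})

# Average per calendar month (yearly pattern) and per hour of day (daily pattern).
monthly = demo.groupby(demo["Timestamp"].dt.month)["heat_power"].mean()
hourly = demo.groupby(demo["Timestamp"].dt.hour)["heat_power"].mean()

print(monthly.round(1))
print(hourly.round(1))
```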

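We also see some missing values: `df.info()` reports 100407 non-null values out of 102720 rows for each measurement column, so 2313 readings (about 2.3 %) are missing.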
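
Missing readings can also be counted and located directly. A minimal sketch on a synthetic frame (`demo` and its artificial one-hour gap are made up for illustration; the column names follow the ones used above):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: one day of 15-minute readings with an artificial one-hour gap.
ts = pd.date_range("2020-01-01", periods=96, freq="15min", tz="Europe/Berlin")
demo = pd.DataFrame({"Timestamp": ts, "heat_power": np.linspace(300.0, 700.0, 96)})
demo.loc[40:43, "heat_power"] = np.nan  # label-based slice, both ends included

missing = demo["heat_power"].isna()
n_missing = int(missing.sum())   # number of missing readings
share_missing = missing.mean()   # fraction of missing readings
gap_start = demo.loc[missing, "Timestamp"].min()
gap_end = demo.loc[missing, "Timestamp"].max()

print(n_missing, f"{share_missing:.1%}", gap_start, gap_end)
```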