## Introduction 
[Kaggle](https://www.kaggle.com) offers datasets under various licences. The [Indoor Temp Over an Oven and Cooktop](https://www.kaggle.com/datasets/rdickenson/cooktoptemp?select=temp_humidity_control_2018-02-16--2018-03-06.csv) dataset is licensed CC0: Public Domain. You can download a dataset in Python using kagglehub. To install kagglehub, run the following (this needs to be done only once):

In [None]:
pip install kagglehub

Several packages are useful for data analysis in Python, among them [pandas](https://pandas.pydata.org) and [numpy](https://numpy.org). For visualization, [matplotlib](https://matplotlib.org), [seaborn](https://seaborn.pydata.org) or [plotly](https://plotly.com/python/) (which requires nbformat) are quite helpful. Here, plotly is used. Again, this is an installation step that needs to be run only once.

In [None]:
pip install pandas numpy plotly nbformat

The following code summarizes the packages used. You only need to execute it if you do not run all of the following steps.

In [None]:
import kagglehub
import pandas
import plotly.express as px
import plotly.graph_objects as pgo
import datetime


Now download the dataset. The call returns the path to the dataset files on your computer.

In [None]:
import kagglehub

# Download latest version
path = kagglehub.dataset_download("rdickenson/cooktoptemp")

print("Path to dataset files:", path)
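Since the later cells refer to the CSV files by name, it can help to check which files were actually downloaded. The following is a minimal sketch using only the standard library. `list_csv_files` is a hypothetical helper (not part of kagglehub), and the temporary directory with made-up file names merely stands in for the `path` returned above so that the sketch runs offline.

```python
import tempfile
from pathlib import Path


def list_csv_files(folder):
    """Return the sorted names of all CSV files directly inside folder."""
    return sorted(p.name for p in Path(folder).glob("*.csv"))


# Stand-in for the download directory; the file names are made up
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("b.csv", "a.csv", "notes.txt"):
        (Path(demo_dir) / name).write_text("datetime,degC\n")
    csv_files = list_csv_files(demo_dir)

print(csv_files)
```

With the real download, the call would be `list_csv_files(path)`.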

A CSV file can be read using the [read_csv](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html) function of pandas. read_csv returns a [dataframe](https://pandas.pydata.org/docs/reference/frame.html), a class for two-dimensional data with many methods. A column containing date and time can be converted with [to_datetime](https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html) and then used as the index for the other columns' data via [set_index](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.set_index.html).

We read the first dataset, which contains the temperature above the stove in degrees Celsius.

In [None]:
import pandas

# read dataset
stove_data = pandas.read_csv(path+'/stove_temp_2018-02-16--2018-03-06.csv', sep=',' )
# the dataset has a column 'datetime' that holds ISO 8601 formatted text.
# the next line converts this column to date and time values
stove_data['datetime'] = pandas.to_datetime(stove_data['datetime'], format='ISO8601')
# and finally, we use the column as index
stove_data.set_index('datetime', inplace = True)

# a look at the data
stove_data
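The three steps (read, convert, index) can be tried without the download. The sketch below uses a few made-up rows that only mimic the 'datetime' and 'degC' columns. It also shows, as an alternative, the `parse_dates` and `index_col` parameters of read_csv, which do the conversion and indexing in one call.

```python
import io
import pandas

# Made-up sample rows, not taken from the real dataset
csv_text = """datetime,degC
2018-02-16T06:00:00,20.5
2018-02-16T06:15:00,21.0
"""

# Variant 1: the three explicit steps
step_by_step = pandas.read_csv(io.StringIO(csv_text))
step_by_step['datetime'] = pandas.to_datetime(step_by_step['datetime'])
step_by_step = step_by_step.set_index('datetime')

# Variant 2: read_csv parses the dates and sets the index itself
one_call = pandas.read_csv(io.StringIO(csv_text),
                           parse_dates=['datetime'], index_col='datetime')

print(step_by_step)
print(one_call)
```

Whether the one-call variant copes with every timestamp in the real files is not verified here; the explicit steps leave more control over the format.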


Now load a second data set that contains the temperature of the corner of the kitchen. 

In [None]:
# read dataset
kitchen_data = pandas.read_csv(path+'/temp_humidity_control_2018-02-16--2018-03-06.csv', 
                                sep=',')
# the dataset has a column 'datetime' that is stored as text.
# the next line converts this column to date and time values
kitchen_data['datetime'] = pandas.to_datetime(kitchen_data['datetime'])
kitchen_data.set_index('datetime', inplace = True)

kitchen_data


We do not need the columns 'device_id' and 'datetime_stamp'. Thus, we [drop](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.drop.html) these columns in both datasets. Further, we [sort](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sort_index.html) both dataframes by their index so that both are in chronological order.

In [None]:
stove_data = stove_data.drop(columns=['datetime_stamp', 'device_id'])
stove_data.sort_index(inplace = True)
kitchen_data = kitchen_data.drop(columns=['datetime_stamp', 'device_id'])
kitchen_data.sort_index(inplace = True)

stove_data
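The effect of drop and sort_index is easiest to see on a tiny dataframe. The sketch below uses made-up rows that are deliberately out of chronological order. It assigns the results back instead of using `inplace=True`, which is only a stylistic alternative.

```python
import pandas

# Made-up rows, deliberately out of chronological order
demo = pandas.DataFrame(
    {
        'degC': [22.0, 20.0, 21.0],
        'device_id': ['a', 'a', 'a'],
    },
    index=pandas.to_datetime(
        ['2018-02-16 07:00', '2018-02-16 06:00', '2018-02-16 06:30']
    ),
)

# drop returns a new dataframe without the listed columns
demo = demo.drop(columns=['device_id'])
# sort_index orders the rows by their timestamps
demo = demo.sort_index()

print(demo)
```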

Now all the data is set up, and we do a first visualization to see what the data looks like. To this end, we use [plotly.express](https://plotly.com/python/plotly-express/). It offers a simple way to draw a [line plot](https://plotly.com/python/line-charts/) by specifying the data for the x and y dimensions. Here, x is the index (date and time) and y is the column 'degC' of the dataframes. The plots are interactive (zoom and pan).

In [None]:
import plotly.express as px

fig_k = px.line(kitchen_data, x=kitchen_data.index, y=kitchen_data['degC'], 
                title='Interactive Plot with Zoom and Pan: kitchen')
fig_s = px.line(stove_data, x=stove_data.index, y=stove_data['degC'], 
                title='Interactive Plot with Zoom and Pan: stove')

fig_k.show()
fig_s.show()


As you can see in the plots, the files contain different numbers of data values. The description of the data says: <em>Measurements were recorded only when the temperature changed by 1 degree C or more and at a minimum of every 15 minutes.</em> This means that the time between two values in the data could be hours when there was no change. We would like to have a time interval of 15 minutes. Therefore, we have to fill samples into the dataset so that we get a proper visualization.

The method [resample](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.resample.html) can do this. We create new dataframes with a sampling interval of 15 minutes; ffill (forward fill) gives each 15-minute time step the most recent recorded value.

In [None]:
kitchen_15m = kitchen_data.resample('15min').ffill()
stove_15m = stove_data.resample('15min').ffill()

stove_15m
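What resample with ffill does to irregular readings can be checked on a small example. The sketch below uses three made-up readings. The resulting grid starts at 06:00, before the first reading, so that first row has nothing to carry forward.

```python
import pandas

# Made-up, irregularly spaced readings
raw = pandas.DataFrame(
    {'degC': [20.0, 21.0, 23.0]},
    index=pandas.to_datetime(
        ['2018-02-16 06:02', '2018-02-16 06:20', '2018-02-16 07:00']
    ),
)

# Build a regular 15-minute grid and carry the last known reading forward
regular = raw.resample('15min').ffill()

print(regular)
```

Each grid value is the last reading at or before that grid time, not an average over the interval. Rows without any earlier reading stay NaN, which is one way missing values can enter the resampled data.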

Now we plot the original data and the resampled data of the kitchen in one plot. Plotly provides the [graph objects](https://plotly.com/python/graph-objects/) classes that can do this.

In [None]:
import plotly.graph_objects as pgo
# Create a figure
fig = pgo.Figure()

# Add traces
fig.add_trace(pgo.Scatter(x=kitchen_data.index, y=kitchen_data['degC'], mode='lines+markers', name='orig. kitchen'))
fig.add_trace(pgo.Scatter(x=kitchen_15m.index, y=kitchen_15m['degC'], mode='lines+markers', name='res. kitchen'))

# Update layout
fig.update_layout(title='Temperature',
                  xaxis_title='time',
                  yaxis_title='deg C')

fig.show()

Next, we take a look at the stove temperature and the kitchen temperature together.

In [None]:
import plotly.graph_objects as pgo
import datetime

# Create a figure
fig = pgo.Figure()

# Add traces
fig.add_trace(pgo.Scatter(x=kitchen_15m.index, y=kitchen_15m['degC'], mode='lines+markers', name='res. kitchen'))
fig.add_trace(pgo.Scatter(x=stove_15m.index, y=stove_15m['degC'], mode='lines+markers', name='res. stove'))

# Update layout
fig.update_layout(title='Temperature',
                  xaxis_title='time',
                  yaxis_title='deg C')

# Zoom x axis
#fig.update_layout(xaxis_range=[datetime.datetime(2018, 2, 18), datetime.datetime(2018, 2, 20)])

fig.show()

The data description mentions that <em>First use of the cooktop is typically to prepare coffee between 05:30 and 07:00; any absence of this event is likely an indicator of trouble.</em> Now investigate the data and think about how you could determine whether this event is absent on a given day.

It is easy to get some statistics of the data using the [describe method](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.describe.html).

In [None]:
print(stove_15m.describe())
print(kitchen_15m.describe())

One insight from the data is that the temperature near the stove depends on the temperature in the kitchen. When the room is colder, the temperature near the stove is also lower, even when the stove is in use.

Therefore, we investigate the difference between both values. A dataframe of difference between the columns' data is created by subtraction of the dataframes. There are a couple of measurements that do not have corresponding values. These will have values NaN. They are removed with [dropna](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.dropna.html).

We plot the data and see that there is no 'drift' at the beginning of March now. 

In [None]:
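# subtract only the 'degC' columns, so that a column present in just one
# dataframe cannot introduce NaN; rows are matched by their timestamp
diff_temp = (stove_15m['degC'] - kitchen_15m['degC']).to_frame()
# timestamps without a corresponding value yield NaN and are removed
diff_temp = diff_temp.dropna()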

fig_d = px.line(diff_temp, x=diff_temp.index, y=diff_temp['degC'], title='Interactive Plot with Zoom and Pan: diff', markers=True)

fig_d.show()

print(diff_temp.describe())
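The NaN values come from the way pandas aligns data before subtracting: values are paired by their index label, not by their position. The sketch below illustrates this with two made-up series that overlap only partly in time.

```python
import pandas

# Made-up 15-minute series that overlap only partly in time
stove = pandas.Series(
    [30.0, 45.0, 40.0],
    index=pandas.date_range('2018-02-16 06:00', periods=3, freq='15min'),
    name='degC',
)
room = pandas.Series(
    [20.0, 21.0, 21.0],
    index=pandas.date_range('2018-02-16 06:15', periods=3, freq='15min'),
    name='degC',
)

# Subtraction pairs values by timestamp; unmatched timestamps become NaN
raw_diff = stove - room
print(raw_diff)

# dropna keeps only the timestamps that both series share
diff = raw_diff.dropna()
print(diff)
```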


Finally, we would like to see whether there is a peak every day between 05:30 and 07:00. To do so, we compare the mean temperature late at night (03:30 to 05:30) with the mean temperature between 05:30 and 07:00.

In [None]:
# Function to check for increased temperature between 05:30 and 07:00 - AI support used :-)
def check_temperature_morning(data):
    # Group the data by calendar day
    daily_data = data.resample('D')
    
    # One (day, increased, mean_diff) tuple per day
    results = []
    
    # Iterate over each day
    for day, group in daily_data:
        # Reference window (2 h before) and the window of interest.
        # between_time includes both end points, so 05:30 falls into both windows
        temp_out_range = group.between_time('03:30', '05:30')['degC']
        temp_in_range  = group.between_time('05:30', '07:00')['degC']
        
        # Compare the mean of 05:30 to 07:00 with the mean of the 2 h before
        if not temp_in_range.empty and not temp_out_range.empty:
            mean_diff = temp_in_range.mean() - temp_out_range.mean() 
            results.append((day, bool(mean_diff > 0), mean_diff))
        else:
            # No data in one of the windows: report no increase, -100 marks the missing value
            results.append((day, False, -100))
    
    return results

# Example usage with our data
# diff_temp is a DataFrame with a datetime index and a 'degC' column
results = check_temperature_morning(diff_temp)

# Print the results
for day, increased, value in results:
    print(f"Date: {day.date()}, check_temperature_morning: {increased}, difference: {value}")
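The same comparison can also be written without an explicit loop. The following is a sketch on made-up data, not a replacement for the function above. `morning_increase` is a hypothetical helper, and its `threshold` parameter is an assumption: a difference only slightly above 0 could be noise, so a larger threshold may be worth trying.

```python
import pandas


def morning_increase(series, threshold=0.0):
    """Hypothetical helper: compare the two morning windows for every day.

    Returns a dataframe with the per-day mean difference and a flag that is
    True when the difference exceeds threshold (an assumed parameter).
    """
    # between_time includes both end points by default
    before = series.between_time('03:30', '05:30')
    during = series.between_time('05:30', '07:00')
    # Average each window per calendar day
    before_mean = before.groupby(before.index.normalize()).mean()
    during_mean = during.groupby(during.index.normalize()).mean()
    # Days missing in either window end up as NaN and compare as False
    mean_diff = during_mean - before_mean
    return pandas.DataFrame(
        {'mean_diff': mean_diff, 'increased': mean_diff > threshold}
    )


# Made-up data: two days on a 15-minute grid, flat except for one morning peak
index = pandas.date_range('2018-02-16', periods=2 * 96, freq='15min')
demo = pandas.Series(0.0, index=index)
peak_start = pandas.Timestamp('2018-02-16 06:00')
peak_end = pandas.Timestamp('2018-02-16 06:30')
demo.loc[peak_start:peak_end] = 5.0

result = morning_increase(demo, threshold=1.0)
print(result)
```

On the real data the call would be `morning_increase(diff_temp['degC'])`. A suitable threshold is not given in the data description and would have to be chosen by inspecting the plots.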