# Voltron - AI

Our task was to help monitor vineyards using data that should come from multiple on-site sensors.
We start by forecasting temperature so that we can anticipate future trends.  

First we analyse and format our data to get a glimpse of its structure and distribution.  
Then we use [prophet](https://facebook.github.io/prophet/) to predict future values of the time series.


**Usage**:  
You need to upload your [kaggle key](https://www.kaggle.com/docs/api), or upload the CSV file directly to the session.  
1. Download dataset 
2. Cleaning dataset
3. Plots
4. Create model
5. Prediction
6. Testing model











---


## Download dataset

First we upload our kaggle key to the environment.
Then we download the dataset and install prophet, the tool we use to build our model.  
Finally we read our data into a DataFrame and look at its structure.

In [None]:
!rm -rf ~/.kaggle
!mkdir -p ~/.kaggle
!mv ./kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
!kaggle datasets list

In [None]:
!kaggle datasets download -d garystafford/environmental-sensor-data-132k



In [None]:
!unzip environmental-sensor-data-132k.zip

In [None]:
!pip install prophet
# see https://facebook.github.io/prophet/ for further details

In [49]:
import pandas as pd
from datetime import datetime, timedelta
import matplotlib.pyplot as plt
import plotly.express as px 
from prophet import Prophet
from prophet.diagnostics import cross_validation


import seaborn as sn


In [None]:
df = pd.read_csv('iot_telemetry_data.csv')


print(f"""
##############################
Shape:
  {df.shape}

Dtypes:
  {df.dtypes}
############################## 
""")
df.head()

---

## Cleaning dataset

To use our data properly we convert the `ts` column (Unix epoch seconds) to datetime.  
Then we split our data by device.

In [None]:
df.dropna(inplace=True)
start = datetime(1970, 1, 1)  # Unix epoch start time
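# ts holds seconds since the Unix epoch
df['time'] = pd.to_datetime(df['ts'], unit='s')
# Replace MAC addresses with readable names, in the device column only
df['device'] = df['device'].replace({'b8:27:eb:bf:9d:51': 'Device1', '00:0f:00:70:91:0a': 'Device2', '1c:bf:ce:15:ec:4d': 'Device3'})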
df
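Epoch seconds can be converted to timestamps either in one vectorised call or row by row. The sketch below compares the two on a toy frame; the values are made up, and only the `ts` column name comes from the dataset.

```python
import pandas as pd
from datetime import datetime, timedelta

# Toy stand-in for the telemetry frame; values are invented for illustration.
toy = pd.DataFrame({"ts": [1594512094.39, 1594512098.07, 1594512101.76]})

# Vectorised conversion of epoch seconds to timestamps.
toy["time"] = pd.to_datetime(toy["ts"], unit="s")

# Row-by-row equivalent: add the offset to the Unix epoch start.
start = datetime(1970, 1, 1)
manual = toy["ts"].apply(lambda x: start + timedelta(seconds=x))

# The two should agree to within floating-point rounding.
max_diff = (toy["time"] - manual).abs().max()
print(toy["time"].iloc[0], max_diff)
```

Both approaches treat the values as UTC; the vectorised call is usually much faster on a frame of this size.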

In [None]:
print(df['time'].min())
print(df['time'].max())
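The per-device grouping can also be sketched with `groupby`, which avoids one variable per device. The frame below is a toy example with invented readings; only the `device` and `temp` column names are taken from the dataset.

```python
import pandas as pd

# Toy frame with invented readings for three devices.
toy = pd.DataFrame({
    "device": ["Device1", "Device2", "Device1", "Device3"],
    "temp": [22.0, 19.5, 22.4, 27.0],
})

# One sub-frame per device, keyed by device name.
per_device = {name: group for name, group in toy.groupby("device")}
shapes = {name: group.shape for name, group in per_device.items()}

# Per-device summary computed in a single call.
mean_temp = toy.groupby("device")["temp"].mean()
print(shapes)
print(mean_temp)
```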


In [None]:
data_1 = df[df.device == 'Device1']
data_2 = df[df.device == 'Device2']
data_3 = df[df.device == 'Device3']

print(f"""
  data_1.shape: {data_1.shape}
  data_2.shape: {data_2.shape}
  data_3.shape: {data_3.shape}
""")



---


## Plots

We plot our data by device and notice that each device recorded very different values.  
The daily averages in particular show that our 3 devices are located in very different environments.

In [None]:
plt.figure(figsize=(20, 6), dpi=80)

plt.plot(data_1['time'], data_1['co'], label='Device1')
plt.plot(data_2['time'], data_2['co'], label='Device2')
plt.plot(data_3['time'], data_3['co'], label='Device3')

plt.title('carbon monoxide')
plt.legend()
plt.show()

In [None]:
plt.figure(figsize=(20, 6), dpi=80)

plt.plot(data_1['time'], data_1['humidity'], label='Device1')
plt.plot(data_2['time'], data_2['humidity'], label='Device2')
plt.plot(data_3['time'], data_3['humidity'], label='Device3')

plt.title('humidity')
plt.legend()
plt.show()

In [None]:
plt.figure(figsize=(20, 6), dpi=80)

plt.plot(data_1['time'], data_1['temp'], label='Device1')
plt.plot(data_2['time'], data_2['temp'], label='Device2')
plt.plot(data_3['time'], data_3['temp'], label='Device3')

plt.title('temperature')
plt.legend()
plt.show()

We plot the daily average temperature to get a broader view of the temperature trend.

In [None]:
# numeric_only=True skips the string `device` column, which cannot be averaged
grp_1 = data_1.groupby(data_1["time"].dt.day).mean(numeric_only=True)
grp_1['ts'] = grp_1['ts'].apply(lambda x: start + timedelta(seconds=x))

grp_2 = data_2.groupby(data_2["time"].dt.day).mean(numeric_only=True)
grp_2['ts'] = grp_2['ts'].apply(lambda x: start + timedelta(seconds=x))

grp_3 = data_3.groupby(data_3["time"].dt.day).mean(numeric_only=True)
grp_3['ts'] = grp_3['ts'].apply(lambda x: start + timedelta(seconds=x))
plt.figure(figsize=(20, 6), dpi=80)


plt.plot(grp_1['ts'], grp_1['temp'], label='Device1')
plt.plot(grp_2['ts'], grp_2['temp'], label='Device2')
plt.plot(grp_3['ts'], grp_3['temp'], label='Device3')
plt.title('temperature per day average')
plt.legend()
plt.show()
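Grouping on `dt.day` keys each row by its day of the month only, so it would merge days from different months if a recording spanned more than one month. A possible alternative, sketched here on invented values and assuming a datetime `time` column and a numeric `temp` column, is to resample on calendar days:

```python
import pandas as pd

# Toy readings on the 12th of two different months (values are invented).
toy = pd.DataFrame({
    "time": pd.to_datetime([
        "2020-07-12 00:00:00", "2020-07-12 12:00:00",
        "2020-08-12 00:00:00", "2020-08-12 12:00:00",
    ]),
    "temp": [20.0, 22.0, 30.0, 32.0],
})

# Day-of-month grouping: both months collapse into the single key 12.
by_day_of_month = toy.groupby(toy["time"].dt.day)["temp"].mean()

# Calendar-day resampling: one row per actual date that has data.
by_calendar_day = toy.set_index("time")["temp"].resample("D").mean().dropna()

print(by_day_of_month)
print(by_calendar_day)
```

With resampling the index is already a timestamp, so no averaged `ts` value has to be converted back for plotting.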


In [None]:
df_prophet_1 = pd.DataFrame({"ds": data_1["time"], 'y': data_1['temp']})
df_prophet_1
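Prophet expects a frame with a `ds` (timestamp) column and a `y` (value) column. The helper below is a hypothetical sketch, not part of the notebook: `to_prophet_frame` and its `rule` parameter are names invented here. It builds such a frame and can optionally average the readings into coarser time buckets, which may shorten fitting time on dense sensor data.

```python
import pandas as pd

def to_prophet_frame(data, time_col="time", value_col="temp", rule=None):
    """Hypothetical helper: return a sorted frame with `ds` and `y` columns."""
    out = pd.DataFrame({
        "ds": data[time_col].to_numpy(),
        "y": data[value_col].to_numpy(),
    })
    out = out.dropna().sort_values("ds")
    if rule is not None:
        # Optional downsampling: average the readings inside each time bucket.
        out = out.set_index("ds")["y"].resample(rule).mean().dropna().reset_index()
    return out.reset_index(drop=True)

# Toy readings, deliberately out of order (values are invented).
toy = pd.DataFrame({
    "time": pd.to_datetime([
        "2020-07-12 00:00:05", "2020-07-12 00:00:01", "2020-07-12 00:01:02",
    ]),
    "temp": [21.0, 19.0, 25.0],
})

raw = to_prophet_frame(toy)
per_minute = to_prophet_frame(toy, rule="1min")
print(raw)
print(per_minute)
```

Whether downsampling helps depends on the forecast resolution needed; forecasting at 4-second steps requires keeping the raw rows.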

## Create model

In [None]:
from prophet import Prophet
from prophet.plot import plot_yearly
from prophet.diagnostics import cross_validation



model = Prophet()
model.fit(df_prophet_1)

## Prediction

In [None]:
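# 15000 steps of 4 seconds, about 16.7 hours ahead; lowercase "s" avoids the pandas deprecation warning for "S"
future = model.make_future_dataframe(periods=15000, freq="4s")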
forecast = model.predict(future)
fig1 = model.plot(forecast)
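To see how far ahead this forecast reaches, the future timestamps can be reproduced with plain pandas. This is a sketch only: `last_date` is an assumed value standing in for the last timestamp in the training data.

```python
import pandas as pd

# Assumed last timestamp of the history; the real one comes from the data.
last_date = pd.Timestamp("2020-07-20 00:00:00")

# 15000 steps of 4 seconds, all strictly after the last observed timestamp.
future_dates = pd.date_range(start=last_date, periods=15000 + 1, freq="4s")[1:]

# Total length of the forecast window.
span = future_dates[-1] - last_date
print(len(future_dates), span)
```

The window therefore covers 60000 seconds, which is 16 hours and 40 minutes.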

In [None]:
fig2 = model.plot_components(forecast)


In [None]:
forecast

## Testing model

Cross-validation picks cutoff points in the time series, fits the model on the data before each cutoff, and then computes different metrics (MAE, MAPE, ...) by comparing the forecast with the held-out data that follows the cutoff.

In [None]:
df_cv = cross_validation(model, initial='5 days', period='1 days', horizon='2 days')

In [None]:
df_cv
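The cutoff scheme can be illustrated by hand. The function below is a simplified rolling-origin sketch, not Prophet's actual implementation: `rolling_cutoffs` is a name invented here, and the start and end dates are assumed round values rather than the real bounds of the data.

```python
from datetime import datetime, timedelta

def rolling_cutoffs(first, last, initial, period, horizon):
    """Simplified rolling-origin cutoffs, returned oldest first."""
    cutoffs = []
    # The latest cutoff leaves exactly one horizon of data to test on.
    cutoff = last - horizon
    # Step back by `period` while at least `initial` of history remains.
    while cutoff >= first + initial:
        cutoffs.append(cutoff)
        cutoff -= period
    return cutoffs[::-1]

# Assumed bounds of the time series, for illustration only.
first = datetime(2020, 7, 12)
last = datetime(2020, 7, 20)

cutoffs = rolling_cutoffs(
    first, last,
    initial=timedelta(days=5),
    period=timedelta(days=1),
    horizon=timedelta(days=2),
)
print(cutoffs)
```

With eight days of data, five days of initial training and a two-day horizon, only two cutoffs fit, so the metrics rest on two folds.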

In [None]:
from prophet.plot import plot_cross_validation_metric
from prophet.diagnostics import performance_metrics


fig = plot_cross_validation_metric(df_cv, metric='mape')


In [None]:
df_p = performance_metrics(df_cv)
df_p.head()
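The error metrics can be checked by hand from the `y` (actual) and `yhat` (predicted) columns of the cross-validation output. The sketch below uses invented numbers and computes a single overall MAE and MAPE; `performance_metrics` aggregates over the forecast horizon instead, so its figures will not match a global average exactly.

```python
import pandas as pd

# Toy cross-validation output: actual values and predictions (invented).
toy_cv = pd.DataFrame({
    "y": [20.0, 25.0, 40.0],
    "yhat": [22.0, 24.0, 36.0],
})

abs_err = (toy_cv["y"] - toy_cv["yhat"]).abs()

# Mean absolute error, in the unit of the measurement.
mae = abs_err.mean()

# Mean absolute percentage error, as a fraction of the actual value.
mape = (abs_err / toy_cv["y"].abs()).mean()

print(mae, mape)
```

MAPE is undefined when an actual value is zero, which matters for temperatures near 0 degrees.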