# <b>1 <span style='color:#F1C40F'>|</span> Introduction to Date and Time</b>

<div style="color:white;display:fill;border-radius:8px;
            background-color:#323232;font-size:150%;
            font-family:Nexa;letter-spacing:0.5px">
    <p style="padding: 8px;color:white;"><b>1.1 | How to import data?</b></p>
</div>

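First, we import all the datasets needed for this kernel. A time series column can be loaded directly as a datetime column with the **<span style='color:#F1C40F'>parse_dates</span>** parameter and set as the index of the dataframe with the **<span style='color:#F1C40F'>index_col</span>** parameter. In this notebook the CSV files are read as-is, and the `date_time` column is converted with `pd.to_datetime` and set as the index later, in the feature engineering step.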
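A minimal sketch of the `parse_dates` / `index_col` approach described above. It reads a small in-memory CSV instead of the competition files; the two rows and their values are hypothetical and only mimic the shape of the training data.

```python
import io

import pandas as pd

# Hypothetical two-row CSV with the same column names as the competition data
csv_text = """date_time,target_benzene
2010-03-10 18:00:00,11.9
2010-03-10 19:00:00,9.4
"""

# parse_dates converts the column to datetime64; index_col makes it the index
df = pd.read_csv(io.StringIO(csv_text),
                 parse_dates=["date_time"],
                 index_col="date_time")
print(df.index)
```

With a `DatetimeIndex` in place, time-based slicing and resampling work without a separate conversion step.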

<div style="color:white;display:fill;border-radius:8px;
            background-color:#323232;font-size:150%;
            font-family:Nexa;letter-spacing:0.5px">
    <p style="padding: 8px;color:white;"><b>1.2 | Timestamps and Periods</b></p>
</div>

Timestamps represent a single point in time, while Periods represent an interval of time (for example, a whole month). A Period can be used to check whether a specific event falls within that interval, and the two forms can be converted into each other.
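The short sketch below illustrates both ideas with pandas' `Timestamp` and `Period` classes; the dates are arbitrary examples, not taken from the dataset.

```python
import pandas as pd

ts = pd.Timestamp("2021-07-15 13:45")      # a single point in time
period = pd.Period("2021-07", freq="M")    # the whole month of July 2021

# Containment check: does the event at `ts` fall inside the period?
inside = period.start_time <= ts <= period.end_time

# Conversions between the two forms
ts_as_period = ts.to_period("M")     # the monthly period containing ts
period_start = period.to_timestamp() # first instant of the period

print(inside, ts_as_period, period_start)
```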

📌 Video: [How to use dates and times with pandas](https://campus.datacamp.com/courses/manipulating-time-series-data-in-python/working-with-time-series-in-pandas?ex=1): explains **<span style='color:#F1C40F'>Timestamp</span>** and **<span style='color:#F1C40F'>Period</span>** data.

<div style="color:white;display:fill;border-radius:8px;
            background-color:#323232;font-size:150%;
            font-family:Nexa;letter-spacing:0.5px">
    <p style="padding: 8px;color:white;"><b>1.3 | Using date_range</b></p>
</div>

date_range is a pandas function that returns a **<span style='color:#F1C40F'>fixed-frequency DatetimeIndex</span>**. It is useful when creating your own time series attribute for pre-existing data, or when arranging the whole dataset around a time series attribute you created.
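A minimal sketch of both uses: generating a daily index and attaching it to existing data that has no time column. The `readings` values are hypothetical.

```python
import pandas as pd

# Five consecutive days starting on 2021-01-01, at a fixed daily frequency
idx = pd.date_range(start="2021-01-01", periods=5, freq="D")

# Attach the generated index to pre-existing data (hypothetical values)
readings = pd.DataFrame({"value": [1.2, 0.8, 1.5, 1.1, 0.9]}, index=idx)
print(readings)
```

Because the index carries a fixed frequency, operations such as `resample` or `shift` can rely on it directly.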

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import calendar
import datetime
from learntools.time_series.style import *

from pathlib import Path
from statsmodels.tsa.deterministic import CalendarFourier, DeterministicProcess
from statsmodels.tsa.stattools import adfuller
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder, OneHotEncoder
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.metrics import mean_absolute_error
from xgboost import XGBRegressor
from sklearn.neighbors import KNeighborsRegressor

import plotly.express as px
from plotly.subplots import make_subplots
import plotly.figure_factory as ff
import plotly.offline as offline
import plotly.graph_objs as go
from sklearn.model_selection import TimeSeriesSplit
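# The package was renamed from fbprophet to prophet in version 1.0
try:
    from prophet import Prophet
except ImportError:
    from fbprophet import Prophet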

from pandas_profiling import ProfileReport as PR
import warnings
warnings.filterwarnings("ignore")

In [None]:
path = Path('../input/tabular-playground-series-jul-2021')
train = pd.read_csv(path/"train.csv")
test = pd.read_csv(path/'test.csv')
df_data = pd.concat([train,test], sort=True)

# <b>2 <span style='color:#F1C40F'>|</span> Feature Engineering</b>

<div style="color:white;display:fill;border-radius:8px;
            background-color:#323232;font-size:150%;
            font-family:Nexa;letter-spacing:0.5px">
    <p style="padding: 8px;color:white;"><b> Time Series Features</b></p>
</div>

We break the timestamp down into separate feature columns:

* One for the year
* One for the month
* One for the ISO week of the year
* One for the quarter of the year
* One for the day of the week
* One indicator column per season (winter, spring, summer, autumn)
* One indicator column for weekends

In [None]:
# Parse the timestamp strings into datetime64 values
df_data['date_time'] = pd.to_datetime(df_data['date_time'])
# Calendar components
df_data['year'] = df_data['date_time'].dt.year
df_data['month'] = df_data['date_time'].dt.month
df_data['week'] = df_data['date_time'].dt.isocalendar().week  # ISO week number
df_data['quarter'] = df_data['date_time'].dt.quarter
df_data['day_of_week'] = df_data['date_time'].dt.day_name()
# dayofweek: Monday=0 ... Sunday=6, so values >= 5 are Saturday and Sunday
df_data['is_weekend'] = (df_data["date_time"].dt.dayofweek >= 5).astype("int")
# Meteorological seasons (Northern Hemisphere), one indicator column each
df_data["is_winter"] = df_data["month"].isin([1, 2, 12]).astype("int")
df_data["is_spring"] = df_data["month"].isin([3, 4, 5]).astype("int")
df_data["is_summer"] = df_data["month"].isin([6, 7, 8]).astype("int")
df_data["is_autumn"] = df_data["month"].isin([9, 10, 11]).astype("int")
# Use the timestamp as the index for time-based slicing and resampling
df_data = df_data.set_index('date_time')
df_data.head()
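One detail of the `week` feature worth checking: `dt.isocalendar()` follows ISO 8601, so the first days of a calendar year can belong to the last ISO week of the previous year. The sketch below applies the same accessors as the cell above to three arbitrary dates (not from the dataset); the `iso_year` column is an addition for illustration only.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2021-01-01", "2021-01-04", "2021-07-03"]))
iso = dates.dt.isocalendar()  # DataFrame with ISO year, week and day columns

features = pd.DataFrame({
    "year": dates.dt.year,
    "iso_year": iso["year"],   # ISO year can differ from the calendar year
    "week": iso["week"],
    "day_of_week": dates.dt.day_name(),
    "is_weekend": (dates.dt.dayofweek >= 5).astype("int"),
})
print(features)
```

Here 2021-01-01 (a Friday) falls in ISO week 53 of 2020, so pairing `year` with `week` can place those hours at the end of the "wrong" year.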

# <b>3 <span style='color:#F1C40F'>|</span> Data Visualization</b>

<div style="color:white;display:fill;border-radius:8px;
            background-color:#323232;font-size:150%;
            font-family:Nexa;letter-spacing:0.5px">
    <p style="padding: 8px;color:white;"><b>3.1 | Trend</b></p>
</div>

In [None]:
from statsmodels.tsa.seasonal import seasonal_decompose

# The first train.shape[0] rows of df_data are the training rows (the test rows have no targets).
# The readings are hourly, so period=24 decomposes around a daily cycle
# (period=365 would correspond to about 15 days of hourly data, not a year).
decomp = seasonal_decompose(df_data[:train.shape[0]]['target_benzene'], period=24,
                            model='additive', extrapolate_trend='freq')
fig, ax = plt.subplots(ncols=2, nrows=2, sharex=True, figsize=(22, 10))
ax[0, 0].set_title('Observed Benzene Values', fontsize=16)
decomp.observed.plot(ax=ax[0, 0], legend=False, color='dodgerblue')

ax[0, 1].set_title('Benzene Trend', fontsize=16)
decomp.trend.plot(ax=ax[0, 1], legend=False, color='dodgerblue')

ax[1, 0].set_title('Benzene Seasonality (daily cycle)', fontsize=16)
decomp.seasonal.plot(ax=ax[1, 0], legend=False, color='dodgerblue')

ax[1, 1].set_title('Residuals', fontsize=16)
decomp.resid.plot(ax=ax[1, 1], legend=False, color='dodgerblue')
plt.tight_layout()
plt.show()

In [None]:
def adf_test(series, title=''):
    """
    Pass in a time series and an optional title; prints an ADF report.
    """
    print('Augmented Dickey-Fuller Test: {}'.format(title))
    # .dropna() handles differenced data
    result = adfuller(series.dropna(),autolag='AIC') 
    
    labels = ['ADF test statistic','p-value','# lags used','# observations']
    out = pd.Series(result[0:4],index=labels)

    for key,val in result[4].items():
        out['critical value ({})'.format(key)]=val
        
    # .to_string() removes the line "dtype: float64"
    print(out.to_string())          
    
    if result[1] <= 0.05:
        print("Strong evidence against the null hypothesis")
        print("Reject the null hypothesis")
        print("Data has no unit root and is stationary")
    else:
        print("Weak evidence against the null hypothesis")
        print("Fail to reject the null hypothesis")
        print("Data likely has a unit root and is non-stationary")
        
def stationarity(target):
    y = df_data[target].copy()
    y.index = pd.to_datetime(y.index)
    y = y.resample('1H').mean()
    adf_test(y,title=target)

In [None]:
stationarity('target_benzene')
stationarity('target_carbon_monoxide')
stationarity('target_nitrogen_oxides')

In [None]:
# Exogenous features passed to Prophet as extra regressors
REGRESSORS = ['deg_C', 'absolute_humidity', 'relative_humidity',
              'sensor_1', 'sensor_2', 'sensor_3', 'sensor_4', 'sensor_5']
SPLIT_DATE = datetime.datetime(2011, 1, 1)

def prediction(target):
    print('==================================')
    print(target)
    print('==================================')
    model = Prophet()
    for reg in REGRESSORS:
        model.add_regressor(reg)

    # Prophet expects the timestamp column to be named 'ds' and the target 'y'
    train_df = (df_data[df_data.index < SPLIT_DATE]
                .reset_index()[['date_time', target] + REGRESSORS]
                .rename(columns={'date_time': 'ds', target: 'y'}))
    model.fit(train_df)

    # Rows from the split date onwards are forecast from the regressors alone
    x_valid = (df_data[df_data.index >= SPLIT_DATE]
               .reset_index()[['date_time'] + REGRESSORS]
               .rename(columns={'date_time': 'ds'}))
    y_pred = model.predict(x_valid)
    print('\n')
    return y_pred, train_df

In [None]:
benzene,trainB = prediction('target_benzene')
carbon,trainC = prediction('target_carbon_monoxide')
nitrogen, trainN = prediction('target_nitrogen_oxides')

In [None]:
# Training history (actual values) and Prophet forecast for benzene
fig = go.Figure()
fig.add_trace(go.Scatter(x=trainB['ds'], y=trainB['y'], mode='lines', name='Benzene',
                         line=dict(color='#334668')))
fig.add_trace(go.Scatter(x=benzene['ds'], y=benzene['yhat'], mode='lines', name='Prediction'))
fig.update_layout(height=350,
                  margin=dict(b=0, r=20, l=20),
                  title_text="Benzene Prediction",
                  template="plotly_white",
                  title_font=dict(size=25, color='#8a8d93', family="Lato, sans-serif"),
                  font=dict(color='#8a8d93'),
                  hoverlabel=dict(bgcolor="#f2f2f2", font_size=13, font_family="Lato, sans-serif"),
                  showlegend=False)
fig.show()

In [None]:
# Training history (actual values) and Prophet forecast for carbon monoxide
fig = go.Figure()
fig.add_trace(go.Scatter(x=trainC['ds'], y=trainC['y'], mode='lines', name='Carbon monoxide',
                         line=dict(color='#334668')))
fig.add_trace(go.Scatter(x=carbon['ds'], y=carbon['yhat'], mode='lines', name='Prediction'))
fig.update_layout(height=350,
                  margin=dict(b=0, r=20, l=20),
                  title_text="Carbon Monoxide Prediction",
                  template="plotly_white",
                  title_font=dict(size=25, color='#8a8d93', family="Lato, sans-serif"),
                  font=dict(color='#8a8d93'),
                  hoverlabel=dict(bgcolor="#f2f2f2", font_size=13, font_family="Lato, sans-serif"),
                  showlegend=False)
fig.show()

In [None]:
# Training history (actual values) and Prophet forecast for nitrogen oxides
fig = go.Figure()
fig.add_trace(go.Scatter(x=trainN['ds'], y=trainN['y'], mode='lines', name='Nitrogen oxides',
                         line=dict(color='#334668')))
fig.add_trace(go.Scatter(x=nitrogen['ds'], y=nitrogen['yhat'], mode='lines', name='Prediction'))
fig.update_layout(height=350,
                  margin=dict(b=0, r=20, l=20),
                  title_text="Nitrogen Oxides Prediction",
                  template="plotly_white",
                  title_font=dict(size=25, color='#8a8d93', family="Lato, sans-serif"),
                  font=dict(color='#8a8d93'),
                  hoverlabel=dict(bgcolor="#f2f2f2", font_size=13, font_family="Lato, sans-serif"),
                  showlegend=False)
fig.show()

In [None]:
y_submit = pd.DataFrame({'date_time':benzene.loc[1:,'ds'],'target_benzene':benzene['yhat'],'target_carbon_monoxide':carbon['yhat'],'target_nitrogen_oxides':nitrogen['yhat']})
y_submit = y_submit.drop(0,axis=0)
y_submit = y_submit.set_index('date_time')
y_submit.to_csv('./submission.csv')

In [None]:
y_submit