# Time Series Classification Hello World

My first attempt at feature engineering on a time series and at building and evaluating a classifier based on those features. I want to compute the features manually to understand how this works before using a package like [tsfresh](https://tsfresh.readthedocs.io/en/latest/#).

Have used a number of resources:

In [4]:
import xgboost
import pandas as pd

In [6]:
print(f"xgboost: {xgboost.__version__}")
print(f"pandas: {pd.__version__}")

xgboost: 1.2.0
pandas: 0.25.3


Let's start with a univariate time series data set of daily minimum temperatures, measured in degrees Celsius, from 1981-01-01 until 1990-12-31 (3,650 measurements in total). We want to learn to predict the minimum temperature on a given day.

In [8]:
data = pd.read_csv("data/daily-min-temperatures.csv", header=0, index_col=0, parse_dates=True)

In [9]:
data.head()

Date,Temp
1981-01-01,20.7
1981-01-02,17.9
1981-01-03,18.8
1981-01-04,14.6
1981-01-05,15.8


In [17]:
print(f"Number of measurements: {len(data)}")
print(f"Earliest date: {data.index[0]}")
print(f"Latest date: {data.index[-1]}")

Number of measurements: 3650
Earliest date: 1981-01-01 00:00:00
Latest date: 1990-12-31 00:00:00


One way to create features would be to decompose the dates into date-time based features.

In [27]:
# Build date-based features directly from the DatetimeIndex (no Python-level loops).
df_features_1 = pd.DataFrame()
df_features_1["Month"] = data.index.month
df_features_1["Day"] = data.index.day
df_features_1["Temp"] = data["Temp"].to_numpy()

In [28]:
df_features_1.sample(5)

,Month,Day,Temp
3066,5,27,10.2
1687,8,16,8.4
670,11,2,6.9
613,9,6,8.2
1391,10,23,18.2


Predicting the temperature from the month and day alone is probably not an effective approach. Other date-time based features, such as time of day or season, might perform better for the problem at hand.
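As a rough sketch of what such alternative date-time features could look like, the cell below derives the day of the year and a three-month "season block" from a `DatetimeIndex`. The frame `demo`, its values, and the column names `DayOfYear` and `SeasonBlock` are made up for illustration; the block index deliberately avoids naming seasons, since that would depend on the hemisphere the data comes from.

```python
import pandas as pd

# Hypothetical stand-in for `data`: a few made-up daily values with a DatetimeIndex.
dates = pd.to_datetime(
    ["1981-01-15", "1981-04-15", "1981-07-15", "1981-10-15", "1981-12-15"]
)
demo = pd.DataFrame({"Temp": [18.0, 12.0, 7.0, 11.0, 16.0]}, index=dates)

df_features_2 = pd.DataFrame()
# Position of the date within the year (1..365, or 366 in leap years).
df_features_2["DayOfYear"] = demo.index.dayofyear
# Three-month blocks: 0 = Dec-Feb, 1 = Mar-May, 2 = Jun-Aug, 3 = Sep-Nov.
df_features_2["SeasonBlock"] = (demo.index.month % 12) // 3
df_features_2["Temp"] = demo["Temp"].to_numpy()

print(df_features_2)
```

Swapping `demo` for `data` should give the same two columns for the full data set, assuming its index is parsed as dates as in the `read_csv` call above.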

Another way to create features is to use lag: predict the value at $t + 1$ based on the value(s) at previous points in time.

In [45]:
data_shifted_1 = pd.concat([data["Temp"].shift(1), data["Temp"]], axis=1)
data_shifted_1.columns = ["t-1", "t+1"]
data_shifted_1.head()

Date,t-1,t+1
1981-01-01,,20.7
1981-01-02,20.7,17.9
1981-01-03,17.9,18.8
1981-01-04,18.8,14.6
1981-01-05,14.6,15.8
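The same idea extends to more than one previous value. The cell below is a minimal sketch that uses the five temperatures from `data.head()` above; the window size `n_lags` and the column names `lag_k` and `target` are hypothetical choices, not something fixed by the data set. Rows that lack a complete history contain `NaN` after shifting and are dropped.

```python
import pandas as pd

# The five values shown by data.head(), rebuilt here so the cell is self-contained.
temps = pd.Series(
    [20.7, 17.9, 18.8, 14.6, 15.8],
    index=pd.date_range("1981-01-01", periods=5, freq="D"),
    name="Temp",
)

n_lags = 3  # hypothetical window size

# One column per lag (oldest first), followed by the value to predict.
columns = [temps.shift(k).rename(f"lag_{k}") for k in range(n_lags, 0, -1)]
columns.append(temps.rename("target"))

# The first n_lags rows have no complete history, so they are removed.
lagged = pd.concat(columns, axis=1).dropna()

print(lagged)
```

With three lags, only the last two of the five rows survive, which is worth keeping in mind when choosing the window size on a short series.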
