In [None]:
try:
    import tsai
except ImportError:
    !pip install tsai

<hr style="border: solid 3px blue;">

# Introduction

![](https://upload.wikimedia.org/wikipedia/commons/9/95/Continuous_wavelet_transform.gif)

Picture Credit: https://upload.wikimedia.org

Last time, I trained and tested an InceptionTime model on this time series in [this notebook](https://www.kaggle.com/code/ohseokkim/tps-apr-let-s-go-over-the-wall-inceptiontime).

Here, we train and test on the same time series with a model that combines wavelet decomposition with the XceptionTimePlus architecture.

Data laid out along the time axis, such as a time series, can often be analyzed more clearly after it has been transformed into another domain. Fourier and wavelet transforms do exactly this: they re-express the signal in terms of frequency (Fourier) or frequency and time together (wavelet). Here, we use a model that splits each series into its components through wavelet decomposition and then processes those components with XceptionTimePlus.
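
To make the idea of "decomposing into components" concrete, here is a minimal sketch of a single level of wavelet decomposition. It uses the Haar wavelet purely because it is the simplest one to write by hand; that choice, the helper names `haar_step` and `haar_inverse`, and the synthetic signal are all assumptions for illustration and are not taken from the model used later.

```python
import numpy as np

def haar_step(x):
    """One level of the Haar wavelet transform for an even-length 1D signal."""
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)  # low-frequency part (smooth trend)
    detail = (even - odd) / np.sqrt(2.0)  # high-frequency part (local changes)
    return approx, detail

def haar_inverse(approx, detail):
    """Rebuild the original signal from its approximation and detail parts."""
    x = np.empty(approx.size * 2)
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x

# Synthetic 60-step signal: a slow sine plus a fast alternating component.
t = np.arange(60)
signal = np.sin(2 * np.pi * t / 30) + 0.1 * np.cos(np.pi * t)

approx, detail = haar_step(signal)
restored = haar_inverse(approx, detail)

print(approx.shape, detail.shape)    # each half as long as the input
print(np.allclose(restored, signal)) # the decomposition loses no information
```

The approximation carries the slow trend and the detail carries the fast fluctuations. Repeating the same step on the approximation gives a multilevel decomposition.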

---------------------------------
# Checking Metrics

![](https://miro.medium.com/max/722/1*pk05QGzoWhCgRiiFbz-oKQ.png)

Picture Credit: https://miro.medium.com

Submissions are evaluated on area under the ROC curve between the predicted probability and the observed target.

**What are ROC and AUROC**
> The ROC curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. The true-positive rate is also known as sensitivity, recall or probability of detection. The false-positive rate is also known as probability of false alarm and can be calculated as (1 − specificity). It can also be thought of as a plot of the power as a function of the Type I Error of the decision rule (when the performance is calculated from just a sample of the population, it can be thought of as estimators of these quantities). The ROC curve is thus the sensitivity or recall as a function of fall-out. 

Ref: https://en.wikipedia.org/wiki/Receiver_operating_characteristic
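
An equivalent way to read AUROC: it is the probability that a randomly chosen positive sample gets a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it that way on a tiny made-up example. The function name `auroc_pairwise` and the data are hypothetical, and the pairwise approach is only meant for small inputs.

```python
import numpy as np

def auroc_pairwise(y_true, y_score):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    # Compare every positive score with every negative score.
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return wins / (pos.size * neg.size)

# Made-up labels and predicted probabilities.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

auc = auroc_pairwise(y_true, y_score)
print(auc)  # 0.75: three of the four positive/negative pairs are ordered correctly
```

Because only the ordering of the scores matters, AUROC does not depend on any particular decision threshold.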

-------------------------------------
# Setting Up

In [None]:
import numpy as np 
import pandas as pd 
from fastai.text.all import *

import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib import rcParams

import warnings
warnings.filterwarnings(action='ignore')

import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go

sns.set(style="ticks", context="talk")
plt.style.use("dark_background")

In [None]:
from tsai.all import *
computer_setup()

In [None]:
train = pd.read_csv('../input/tabular-playground-series-apr-2022/train.csv')
test = pd.read_csv('../input/tabular-playground-series-apr-2022/test.csv')
submission_df = pd.read_csv("../input/tabular-playground-series-apr-2022/sample_submission.csv")
labels_df = pd.read_csv("../input/tabular-playground-series-apr-2022/train_labels.csv")

----------------------------------------
# Checking Target Imbalance

In [None]:
labels_df.head().T.style.set_properties(**{'background-color': 'black',
                           'color': 'white',
                           'border-color': 'white'})

In [None]:
colors = ['gold', 'mediumturquoise']
labels = ['0','1']
values = labels_df['state'].value_counts()/labels_df['state'].shape[0]

fig = go.Figure(data=[go.Pie(labels=labels, values=values, hole=.3)])
fig.update_traces(hoverinfo='label+percent', textinfo='percent', textfont_size=20,
                  marker=dict(colors=colors, line=dict(color='#000000', width=2)))
fig.update_layout(
    title_text="Target Balance",
    title_font_color="white",
    legend_title_font_color="yellow",
    paper_bgcolor="black",
    plot_bgcolor='black',
    font_color="white",
)
fig.show()

<span style="color:Blue"> Observation:
* OK! Target is well-balanced.

------------------------------------
# Preprocessing

In [None]:
train.head().T.style.set_properties(**{'background-color': 'black',
                           'color': 'white',
                           'border-color': 'white'})

## Augmentation

In [None]:
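# Sensor columns: everything after the "sequence", "subject" and "step" id columns.
features = train.columns.tolist()[3:]

def augment(df):
    """Add lag-1, first-difference and sign-flipped copies of every sensor column, in place."""
    for feature in features:
        # Shift within each sequence so values never leak across sequences;
        # the first step of a sequence has no predecessor and is filled with 0.
        df[feature + '_lag1'] = df.groupby('sequence')[feature].shift(1).fillna(0)
        df[feature + '_diff1'] = df[feature] - df[feature + '_lag1']
        df[feature + '_flip'] = -df[feature]

augment(train)
augment(test)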
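
To see what the lag and difference features look like, here is a small sketch on a toy frame with two sequences of three steps each. The column name `sensor_00` and all values are placeholders made up for the example.

```python
import pandas as pd

# Toy data: two sequences, three steps each (values are made up).
toy = pd.DataFrame({
    "sequence":  [0, 0, 0, 1, 1, 1],
    "sensor_00": [1.0, 3.0, 6.0, 10.0, 9.0, 7.0],
})

# Lag by one step inside each sequence; the first step has no predecessor, so fill with 0.
toy["sensor_00_lag1"] = toy.groupby("sequence")["sensor_00"].shift(1).fillna(0)

# First difference: current value minus previous value.
toy["sensor_00_diff1"] = toy["sensor_00"] - toy["sensor_00_lag1"]

print(toy)
```

Note that the first row of sequence 1 gets a lag of 0 rather than the last value of sequence 0. That is the reason for grouping by `sequence` before shifting.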

## Scaling

In [None]:
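from sklearn.preprocessing import StandardScaler

# Scale only the sensor-derived columns; the id columns are dropped later anyway.
features = [c for c in train.columns if c not in ("sequence", "subject", "step")]
sc = StandardScaler()
train[features] = sc.fit_transform(train[features])  # fit on train only
test[features] = sc.transform(test[features])        # reuse the train statistics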

Each sequence is a 60-second sensor recording stored as 60 rows, so we set the window length to 60.

In [None]:
Window = 60

Next, we reshape the data into a 3D ndarray with the following axes, in this order:
* Samples
* Variables
* Length (aka time or sequence steps)

In [None]:
y_train = labels_df['state'].to_numpy()
train = train.drop(["sequence", "subject", "step"], axis=1).to_numpy()
train = train.reshape(-1, Window, train.shape[-1])
train = np.transpose(train, (0, 2, 1))

test = test.drop(["sequence", "subject", "step"], axis=1).to_numpy()
test = test.reshape(-1, Window, test.shape[-1])
test = np.transpose(test, (0, 2, 1))

In [None]:
train.shape, y_train.shape
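
The reshape and transpose above are easy to get wrong, so here is a minimal sketch on a small synthetic array that checks where each value ends up. The sizes (2 sequences, 3 variables) and the names `flat` and `cube` are made up for the example.

```python
import numpy as np

n_sequences, window, n_vars = 2, 60, 3

# Flat table as it comes out of the DataFrame: one row per step, one column per variable.
flat = np.arange(n_sequences * window * n_vars, dtype=float).reshape(-1, n_vars)

# Group every `window` consecutive rows into one sample: (samples, length, variables).
cube = flat.reshape(-1, window, n_vars)

# Swap the last two axes to get (samples, variables, length).
cube = np.transpose(cube, (0, 2, 1))

print(cube.shape)  # (2, 3, 60)

# Variable 0 of sample 0 is exactly the first 60 rows of column 0 in the flat table.
print(np.array_equal(cube[0, 0], flat[:window, 0]))
```

This only works because the rows of each sequence are contiguous and already ordered by step in the flat table.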

----------------------------------------
# Splitting Train/Valid Dataset

We build the validation dataset by random, stratified sampling of 20% of the training sequences, as shown below.

In [None]:
splits = get_splits(y_train, valid_size=.2, stratify=True, random_state=23, shuffle=True)
splits
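
For intuition, a stratified split shuffles the indices of each class separately and takes the same fraction from each, so the class ratio is the same in both parts. The sketch below shows that idea in plain NumPy. It is not tsai's implementation; the helper `stratified_split` and the toy labels are hypothetical.

```python
import numpy as np

def stratified_split(y, valid_size=0.2, seed=23):
    """Return (train_idx, valid_idx) with the same class ratio in both parts."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    train_idx, valid_idx = [], []
    for cls in np.unique(y):
        # Shuffle the positions of this class only.
        idx = rng.permutation(np.flatnonzero(y == cls))
        n_valid = int(round(valid_size * idx.size))
        valid_idx.extend(idx[:n_valid])
        train_idx.extend(idx[n_valid:])
    return np.sort(np.array(train_idx)), np.sort(np.array(valid_idx))

# Toy labels: 50 samples of each class.
y = np.array([0] * 50 + [1] * 50)
train_idx, valid_idx = stratified_split(y)

print(len(train_idx), len(valid_idx))  # 80 20
print(y[valid_idx].mean())             # 0.5, same class ratio as the full set
```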

-----------------------------------------------
# Making Pipeline and Dataloaders

In [None]:
%%time
bs = 64
tfms  = [None, [Categorize()]]
dsets = TSDatasets(train, y_train, tfms=tfms, splits=splits)
dls   = TSDataLoaders.from_dsets(dsets.train, dsets.valid, bs=[bs,bs*2])
dls.show_batch()

------------------------------
# Modeling


![](https://d3i71xaburhd42.cloudfront.net/3caeefc64c863b8573755c4109944836267d2c14/3-Figure2-1.png)

Picture Credit: https://d3i71xaburhd42.cloudfront.net



As the figure above shows, the input is first split by wavelet decomposition; we then model the resulting components with the XceptionTimePlus architecture as the base model.

In [None]:
%%time
model = mWDNPlus(dls.vars, dls.c, dls.len, base_arch=XceptionTimePlus)
learn = Learner(dls,
                model,  
                metrics=[accuracy,RocAucBinary()],
                cbs = [EarlyStoppingCallback(monitor='accuracy',  min_delta=0.01, patience=3)]
               )
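
The early-stopping settings above (`min_delta=0.01`, `patience=3`) follow the usual patience rule: training stops once the monitored metric has failed to beat its best value by more than `min_delta` for `patience` epochs in a row. Here is a simplified pure-Python sketch of that rule. The function `early_stop_epoch` and the accuracy values are made up, and the exact bookkeeping inside the callback may differ.

```python
def early_stop_epoch(scores, min_delta=0.01, patience=3):
    """Return the epoch index at which training would stop, or None if it never stops.

    Assumes a higher score is better (as with accuracy).
    """
    best = None
    wait = 0
    for epoch, score in enumerate(scores):
        if best is None or score - min_delta > best:
            # Real improvement: remember it and reset the counter.
            best, wait = score, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Made-up validation accuracies, one per epoch.
accuracies = [0.70, 0.75, 0.755, 0.757, 0.758, 0.80]
stopped_at = early_stop_epoch(accuracies)
print(stopped_at)  # 4: epochs 2, 3 and 4 each improve on 0.75 by less than 0.01
```

In this toy run the jump to 0.80 at epoch 5 is never reached, which shows the trade-off: a `min_delta` of 0.01 on accuracy is fairly strict, so slow but steady gains can trigger a stop.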

In [None]:
learn.model

------------------------------
# Training

In [None]:
%%time
with plt.rc_context({"figure.figsize": (4,4), "figure.dpi": (200)}):
    sr = learn.lr_find()
sr.valley

In [None]:
%%time
learn.fit_one_cycle(100, sr.valley)

In [None]:
with plt.rc_context({"figure.figsize": (4,4), "figure.dpi": (200)}):
    learn.recorder.plot_loss()

<span style="color:Blue"> Observation:
* Early stopping appears to have triggered at an appropriate point.

In [None]:
learn.plot_confusion_matrix(figsize=(3,3),dpi=200)

In [None]:
learn.show_probas(figsize=(5,5),dpi=300)

-------------------------------------------
# Checking Feature Importance

In [None]:
learn.feature_importance()

<span style="color:Blue"> Observation:
* The plot above shows which sensor features matter most to the trained model.

----------------------------------------
# Predicting

In [None]:
test_probas, test_targets, test_preds = learn.get_X_preds(test, with_decoded=True)

In [None]:
results = test_probas[:,1].tolist()

In [None]:
submission_df['state'] = results

In [None]:
submission_df.to_csv('submission.csv', index = False)

<hr style="border: solid 3px blue;">