# About this Notebook
If you have read my previous kernels, you might know how much I love the EDA part. But it struck me that writing about one particular thing would not help me grow, so I decided to venture into unfamiliar territory and explore new things. For this competition, people are mostly using Rapids and the tabular data. I have hardly seen any kernels that use only the images, or both the images and the tabular data.

A few days ago I saw Abhishek's post on LinkedIn about Tabnet and I was really curious about it. I wanted to apply the idea here on the Trends data, but it had already been done and didn't give good results, so I dropped it.

After watching Sebastien on Abhishek Talks, I realized that Tabnet's potential isn't being fully utilized.

**This notebook presents a fully structured, working pipeline for training an n-fold Tabnet regressor for this competition. It achieves 0.1620 without a lot of effort, and with some tweaks it could beat the Rapids SVMs and reach the 0.1595 benchmark. I also explain the pros and cons of using Tabnet (although I don't find a lot of cons 😜).**

Here is the [link](https://arxiv.org/pdf/1908.07442.pdf) to the Tabnet paper.

<font color='red'>If you like my efforts, please leave an upvote. I am not planning on doing this competition for now, but if you all like my efforts I plan to release more public kernels on Tabnet with higher scores.</font>

# Token of Gratitude

* For everything other than modelling, I have used most of the code from this wonderful [kernel](https://www.kaggle.com/aerdem4/rapids-svm-on-trends-neuroimaging) by Ahmet. Thank you for writing it.
* A big thanks to the Pytorch-Tabnet team for writing such a beautiful implementation with so much functionality. The repo can be found [here](https://github.com/dreamquark-ai/tabnet). The documentation is very nicely written, and Sebastien has also provided example notebooks that help in understanding the model and its usage. Everything can be found in the repo mentioned above.

# Advantages of Tabnet

Tabnet gives us the following advantages:
* The best thing I found is that Tabnet lets us train a multi-output regressor (MULTIREGRESSOR), so we don't need to create a separate model for every target.
* It uses attention to select the set of features to focus on for a given data point, and we can even visualize that to see which features get attention for a particular decision. We can also play with the number of features we want Tabnet to focus on.
* It is trained end to end with backpropagation, which gives us more control over the training process.
* We can use the fine-tuning techniques that have worked for us elsewhere, along with the usual deep-learning tools such as LR annealing, custom losses, etc.
* The headache of feature selection goes away, as Tabnet does that on its own.
* It gives competitive results with just the defaults, with no feature engineering or fine-tuning. I wonder what it can do with sufficient feature engineering and fine-tuning.

There are a lot more advantages and ideas that I have for Tabnet, which I plan to release in the future.

If you want to learn more about Tabnet and its inner workings, please refer to this [video](https://www.youtube.com/watch?v=ysBaZO8YmX8).
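
To make the attention-based feature selection mentioned above a bit more concrete, here is a minimal NumPy sketch of sparsemax, the normalization the Tabnet paper uses for its feature masks. Unlike softmax, it can assign exactly zero weight to some features. The function names and the toy scores are mine and purely illustrative; this is not the pytorch-tabnet code.

```python
import numpy as np

def softmax(z):
    """Standard softmax: every entry ends up strictly positive."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

def sparsemax(z):
    """Project z onto the probability simplex; small scores become exactly 0."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]            # scores in descending order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    support = 1 + k * z_sorted > cumsum    # which sorted entries stay non-zero
    k_max = k[support][-1]                 # size of the support
    tau = (cumsum[k_max - 1] - 1) / k_max  # threshold subtracted from every score
    return np.maximum(z - tau, 0.0)

# Hypothetical attention scores for three features
scores = [2.0, 1.0, 0.1]
dense_mask = softmax(scores)     # all three features get some weight
sparse_mask = sparsemax(scores)  # only the strongest feature survives
print(dense_mask, sparse_mask)
```

Both masks sum to 1, but only the sparsemax mask drops features entirely, which is what makes the selected features easy to read off.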

In [None]:
!pip install pytorch-tabnet

In [None]:
# Preliminaries
import numpy as np
import pandas as pd 
import os
import random

#Visuals
import matplotlib.pyplot as plt
import seaborn as sns

#Torch and Tabnet
import torch
from pytorch_tabnet.tab_model import TabNetRegressor

#Sklearn only for splitting
from sklearn.model_selection import KFold

# Configuration

In [None]:
NUM_FOLDS = 7  # you can specify your folds here
seed = 2020   # seed for reproducible results

# Seed Everything

Seeding Everything for Reproducible Results

In [None]:
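def seed_everything(seed):
    """Seed the Python, NumPy and PyTorch (CPU and GPU) random number generators."""
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False  # autotuning may pick different kernels between runs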

In [None]:
seed_everything(seed)
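
As a quick sanity check of what seeding buys us, the sketch below reseeds Python's and NumPy's generators and confirms that the same seed reproduces the same draws. The helper `draw` is a hypothetical name used only for this illustration, and it leaves PyTorch out to stay lightweight.

```python
import random
import numpy as np

def draw(seed):
    """Reseed both generators, then draw one Python float and three NumPy floats."""
    random.seed(seed)
    np.random.seed(seed)
    return random.random(), np.random.rand(3)

a_py, a_np = draw(2020)
b_py, b_np = draw(2020)  # same seed: identical draws
c_py, c_np = draw(2021)  # different seed: different draws

print(a_py == b_py, np.array_equal(a_np, b_np), a_py == c_py)
```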

# Metric

Since Tabnet allows us to create a multi-output regressor, we don't have to create multiple models and loop through them. I have modified the metric to account for that.

In [None]:
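def metric(y_true, y_pred):
    """Weighted sum of per-target normalized absolute errors.

    Columns must follow the order of `weights`: the first target (age here)
    gets 0.3 and the remaining four get 0.175 each.
    """
    weights = [.3, .175, .175, .175, .175]
    overall_score = 0

    for i, w in zip(range(y_true.shape[1]), weights):
        # Normalized absolute error of target i: sum|y - y_hat| / sum(y)
        ind_score = np.sum(np.abs(y_true[:,i] - y_pred[:,i])) / np.sum(y_true[:,i])
        overall_score += w*ind_score

    return overall_score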
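
To see the metric at work, here is a small vectorized sketch of the same computation on toy numbers. The name `weighted_nae` and the toy arrays are assumptions made for illustration. Every prediction is 10% too high, so each per-target error is 0.1, and because the weights sum to 1 the overall score is 0.1 as well.

```python
import numpy as np

WEIGHTS = np.array([0.3, 0.175, 0.175, 0.175, 0.175])

def weighted_nae(y_true, y_pred, weights=WEIGHTS):
    """Weighted normalized absolute error over the target columns."""
    abs_err = np.abs(y_true - y_pred).sum(axis=0)  # sum |y - y_hat| per target
    per_target = abs_err / y_true.sum(axis=0)      # normalize by sum(y) per target
    return float(np.dot(weights, per_target))

# Two toy rows, five targets
y_true = np.array([[50.0, 40.0, 60.0, 50.0, 50.0],
                   [50.0, 60.0, 40.0, 50.0, 50.0]])
y_pred = y_true * 1.1  # every prediction overshoots by 10%

score = weighted_nae(y_true, y_pred)
perfect = weighted_nae(y_true, y_true)
print(score, perfect)
```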

# Data Preparation

Mostly Taken from Ahmet's kernel

In [None]:
fnc_df = pd.read_csv("../input/trends-assessment-prediction/fnc.csv")
loading_df = pd.read_csv("../input/trends-assessment-prediction/loading.csv")

fnc_features, loading_features = list(fnc_df.columns[1:]), list(loading_df.columns[1:])
df = fnc_df.merge(loading_df, on="Id")
features = fnc_features + loading_features


labels_df = pd.read_csv("../input/trends-assessment-prediction/train_scores.csv")
target_features = list(labels_df.columns[1:])
labels_df["is_train"] = True


df = df.merge(labels_df, on="Id", how="left")

test_df = df[df["is_train"] != True].copy()
df = df[df["is_train"] == True].copy()

df.shape, test_df.shape

In [None]:
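# Creating folds

# Drop rows with missing values (some targets are missing), then shuffle once so the folds are random.
df = df.dropna().reset_index(drop=True)
df["kfold"] = -1

df = df.sample(frac=1,random_state=seed).reset_index(drop=True)

kf = KFold(n_splits=NUM_FOLDS)

# Tag every row with the fold in which it serves as validation data.
for fold, (trn_, val_) in enumerate(kf.split(X=df, y=df)):
    df.loc[val_, 'kfold'] = fold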
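
The fold assignment above shuffles the frame first and then splits it in order. A roughly equivalent sketch, on a toy frame with made-up column names, lets `KFold` do the shuffling itself through `shuffle=True` and `random_state`. It is only an alternative way to get the same kind of split, not a change to the pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

# Toy frame standing in for the training data
toy = pd.DataFrame({"x": np.arange(10)})
toy["kfold"] = -1

kf = KFold(n_splits=5, shuffle=True, random_state=2020)
for fold, (_, val_idx) in enumerate(kf.split(toy)):
    # val_idx holds positions, which match the labels of a default RangeIndex
    toy.loc[val_idx, "kfold"] = fold

fold_sizes = toy["kfold"].value_counts().sort_index().tolist()
print(fold_sizes)
```

Each row lands in exactly one validation fold, so training on `kfold != fold` and validating on `kfold == fold` never mixes the two.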

In [None]:
# Giving less importance to FNC features since they are easier to overfit due to high dimensionality.
FNC_SCALE = 1/500

df[fnc_features] *= FNC_SCALE
test_df[fnc_features] *= FNC_SCALE

# Model

In [None]:
model = TabNetRegressor(n_d=16,
                       n_a=16,
                       n_steps=4,
                       gamma=1.9,
                       n_independent=4,
                       n_shared=5,
                       seed=seed,
                       optimizer_fn = torch.optim.Adam,
                       scheduler_params = {"milestones": [150,250,300,350,400,450],'gamma':0.2},
                       scheduler_fn=torch.optim.lr_scheduler.MultiStepLR)
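
To get a feel for what the `MultiStepLR` settings above do, the sketch below computes the learning rate at a few epochs using the scheduler's closed form: the base rate multiplied by `gamma` once for every milestone already reached. The base rate of 0.02 is an assumption (I believe it is the pytorch-tabnet default, since no `optimizer_params` are passed), and `multistep_lr` is a hypothetical helper, not a library function.

```python
from bisect import bisect_right

BASE_LR = 0.02  # assumed default learning rate; not set explicitly in this notebook
MILESTONES = [150, 250, 300, 350, 400, 450]
GAMMA = 0.2

def multistep_lr(epoch, base_lr, milestones, gamma):
    """Learning rate at `epoch`: decay by `gamma` for every milestone reached so far."""
    return base_lr * gamma ** bisect_right(milestones, epoch)

schedule = {e: multistep_lr(e, BASE_LR, MILESTONES, GAMMA) for e in (0, 149, 150, 250, 450)}
for epoch, lr in schedule.items():
    print(epoch, lr)
```

Under that assumption the rate drops five-fold at epoch 150 and is several orders of magnitude smaller after the last milestone, so most of the learning happens early on.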

# Engine

In [None]:
# 3D array (test rows x targets x folds) storing each fold's test predictions
y_test = np.zeros((test_df.shape[0], len(target_features), NUM_FOLDS))

In [None]:
def run(fold):
    df_train = df[df.kfold != fold]
    df_valid = df[df.kfold == fold]
    
    X_train = df_train[features].values
    Y_train = df_train[target_features].values
    
    X_valid = df_valid[features].values
    Y_valid = df_valid[target_features].values
    
    print("--------Training Beginning for fold {}-------------".format(fold+1))
     
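    # These keyword arguments match the pytorch-tabnet release this notebook was written against;
    # later releases take eval_set=[(X_valid, Y_valid)] instead of X_valid/y_valid.
    model.fit(X_train = X_train,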
             y_train = Y_train,
             X_valid = X_valid,
             y_valid = Y_valid,
             max_epochs = 1000,
             patience =70)
              
    
    print("--------Validating For fold {}------------".format(fold+1))
    
    y_oof = model.predict(X_valid)
    y_test[:,:,fold] = model.predict(test_df[features].values)
    
    val_score = metric(Y_valid,y_oof)
    
    print("Validation score: {:<8.5f}".format(val_score))
    
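    # Visualization: training vs. validation loss per epoch
    plt.figure(figsize=(12,6))
    plt.plot(model.history['train']['loss'], label='train loss')
    plt.plot(model.history['valid']['loss'], label='valid loss')
    plt.xlabel('epoch')
    plt.ylabel('loss')
    plt.title('Fold {}'.format(fold+1))
    plt.legend()
    plt.show()
    
    # Plotting the metric
    #plt.plot([-x for x in model.history['train']['metric']])
    #plt.plot([-x for x in model.history['valid']['metric']])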
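
The `patience` argument passed to `fit` above controls early stopping. A minimal sketch of the general idea, as I understand it, is below: training stops once the validation loss has gone that many epochs without improving. The helper `early_stop_epoch` and the toy loss curve are hypothetical, and this is not pytorch-tabnet's own implementation.

```python
def early_stop_epoch(val_losses, patience):
    """Return (stopping epoch, best epoch) for a sequence of validation losses."""
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch  # new best: reset the counter
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch             # no improvement for `patience` epochs
    return len(val_losses) - 1, best_epoch       # ran out of epochs first

# Toy validation curve: improves until epoch 2, then stalls
losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.9]
stop_epoch, best_epoch = early_stop_epoch(losses, patience=3)
print(stop_epoch, best_epoch)
```

With a patience of 70 and a cap of 1000 epochs, as used here, a fold normally ends well before the cap once its validation loss plateaus.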

### I am hiding the training output. Please unhide it to look at the results and loss plots for any fold.

In [None]:
run(fold=0)

In [None]:
run(fold=1)

In [None]:
run(fold=2)

In [None]:
run(fold=3)

In [None]:
run(fold=4)

In [None]:
run(fold=5)

In [None]:
run(fold=6)

# Creating Submission

In [None]:
y_test = y_test.mean(axis=-1) # Taking mean of all the fold predictions
test_df[target_features] = y_test
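
The averaging step above collapses the last axis of the 3D prediction array. A tiny NumPy sketch with made-up shapes and values shows what happens: each "fold" writes its predictions into its own slice, and the mean over `axis=-1` leaves one row per test sample and one column per target.

```python
import numpy as np

n_rows, n_targets, n_folds = 4, 5, 3  # toy sizes, not the competition's
fold_preds = np.zeros((n_rows, n_targets, n_folds))

for fold in range(n_folds):
    # Pretend fold k predicts the constant k + 1 everywhere
    fold_preds[:, :, fold] = fold + 1.0

avg = fold_preds.mean(axis=-1)  # average over folds: (1 + 2 + 3) / 3 = 2
print(avg.shape, avg[0])
```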

In [None]:
test_df = test_df[["Id", "age", "domain1_var1", "domain1_var2", "domain2_var1", "domain2_var2"]]

In [None]:
sub_df = pd.melt(test_df, id_vars=["Id"], value_name="Predicted")
sub_df["Id"] = sub_df["Id"].astype("str") + "_" +  sub_df["variable"].astype("str")

sub_df = sub_df.drop("variable", axis=1).sort_values("Id")
assert sub_df.shape[0] == test_df.shape[0]*5
sub_df.head(10)
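
For anyone unfamiliar with `pd.melt`, here is the same reshaping on a toy frame with two subjects and two targets (the values are made up). Each wide row turns into one long row per target, and the `Id` becomes `<Id>_<target>`, which is the layout the submission file needs.

```python
import pandas as pd

toy = pd.DataFrame({
    "Id": [10001, 10002],
    "age": [50.0, 60.0],
    "domain1_var1": [48.0, 52.0],
})

# Wide -> long: one row per (Id, target) pair
long_df = pd.melt(toy, id_vars=["Id"], value_name="Predicted")
long_df["Id"] = long_df["Id"].astype(str) + "_" + long_df["variable"].astype(str)
long_df = long_df.drop("variable", axis=1).sort_values("Id").reset_index(drop=True)

print(long_df)
```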

In [None]:
sub_df.to_csv('submission.csv',index=False)

# End Notes:
* Tabnet gives us greater control over training and predictions.
* With some ideas, Tabnet could let us combine image and tabular data.
* I have dropped the rows with missing target values and used the raw data without any pre-processing, feature engineering, etc.
* I would be glad to see interesting results if someone fine-tunes it further.