Hi Kagglers!
I was wondering how I should approach this month's competition and what can be done that hasn't been done yet. This time I decided to focus on building a model in TensorFlow with a deep understanding of the workflow and of what it takes to create a robust model, in the hope of improving the metrics. I will also try to create some nice plots in the EDA part.

## 📡Import Libraries and Datasets

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
import seaborn as sns

import missingno as no

from sklearn.preprocessing import LabelEncoder

In [None]:
sns.set_style('whitegrid')

In [None]:
train_df = pd.read_csv("/kaggle/input/tabular-playground-series-jun-2021/train.csv")
test_df = pd.read_csv("/kaggle/input/tabular-playground-series-jun-2021/test.csv")
sample_df = pd.read_csv("/kaggle/input/tabular-playground-series-jun-2021/sample_submission.csv")

##  📊 Exploratory Data Analysis

Although this dataset is similar to the previous one and we could start building a model straight away, it is good practice to look at the properties of the dataset we are working on.

In [None]:
train_df.head()

In [None]:
no.matrix(train_df, figsize=(18,4));

There are no missing values in training and test datasets.

In [None]:
train_df.shape, test_df.shape

In [None]:
train_df.drop('id', axis=1).describe().T.style.bar(subset=['mean'], color=px.colors.qualitative.Pastel[4])\
                                        .background_gradient(subset=['std'], cmap='Greens')

### 🎯 Our target labels

In [None]:
plt.figure(figsize=(12,5))
sns.countplot(x=train_df['target'], palette='coolwarm')
plt.title("Distribution of target labels", fontdict={'fontsize':24})
plt.xlabel('Target', fontdict={'fontsize':16})
plt.ylabel('Count', fontdict={'fontsize':16});

In [None]:
class_ratio = 100 * train_df['target'].value_counts() / len(train_df)

plt.figure(figsize=(12,5))
sns.barplot(x=class_ratio.index,y=class_ratio.values, palette='coolwarm')
plt.title("Target labels Percentage in training dataset", fontdict={'fontsize':24})
plt.ylabel("Percentage %")
plt.xlabel("Target label");

As we can see, we have an imbalance problem again: most samples belong to Class_6 and Class_8, so I will use StratifiedKFold, which keeps the class proportions the same in every fold. Knowing whether the target labels are balanced also matters when evaluating the model: with imbalanced classes, f1, precision and recall are more reliable metrics than accuracy.
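A minimal sketch of what stratification buys us, using made-up labels rather than the competition data (the names `y_toy`, `X_toy` and the 10/30/60 class split are assumptions for illustration only): with `StratifiedKFold`, every validation fold keeps roughly the same class proportions as the full dataset, so a rare class never goes missing from a fold.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical imbalanced labels: class 2 dominates, class 0 is rare
rng_toy = np.random.default_rng(0)
y_toy = rng_toy.choice([0, 1, 2], size=1000, p=[0.1, 0.3, 0.6])
X_toy = rng_toy.normal(size=(len(y_toy), 4))  # placeholder features

overall_toy = np.bincount(y_toy, minlength=3) / len(y_toy)
skf_toy = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

fold_ratios_toy = []
for fold_toy, (tr_idx_toy, va_idx_toy) in enumerate(skf_toy.split(X_toy, y_toy)):
    # Class proportions inside each validation fold
    ratios = np.bincount(y_toy[va_idx_toy], minlength=3) / len(va_idx_toy)
    fold_ratios_toy.append(ratios)
    print(f"fold {fold_toy}: {np.round(ratios, 3)}")

print("overall:", np.round(overall_toy, 3))
```

The same idea is used later in the notebook, where the folds are split on `num_target`.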

### Correlation between features and target.

In [None]:
lb = LabelEncoder()
train_df['num_target'] = lb.fit_transform(train_df['target'])

In [None]:
fig, ax = plt.subplots(figsize=(28,16))
corr_mat = train_df.drop(["id", 'target'], axis=1).corr()
mask = np.zeros_like(corr_mat, dtype=bool)  # np.bool was removed in NumPy 1.24
mask[np.triu_indices_from(mask)] = True

sns.heatmap(corr_mat, mask=mask, square=True, ax=ax, linewidths=0.1,center=0, cmap='coolwarm_r');

Let's look at each feature's correlation with the target.

In [None]:
fig = plt.figure(figsize=(18,5))
sns.barplot(y=corr_mat['num_target'].values[:-1],x=corr_mat['num_target'].index[:-1], palette='coolwarm')
plt.xticks(rotation=90);

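From these plots we can tell that all features are only weakly correlated with the target and that the correlations are positive, while feature_20 shows no correlation with the target at all. Keep in mind that num_target is an arbitrary integer code for the class names, so these correlations are only a rough indication of association.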
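To see why correlations with a label-encoded target should be read with care, here is a toy sketch with made-up numbers (not competition data): swapping which class gets code 0 and which gets code 2 leaves the grouping untouched, yet the Pearson correlation flips sign.

```python
import numpy as np

# Made-up feature values for 6 samples from 3 classes
feature_toy = np.array([0.0, 0.1, 1.0, 1.1, 2.0, 2.1])
codes_toy = np.array([0, 0, 1, 1, 2, 2])          # one arbitrary encoding of the classes
relabeled_toy = np.array([2, 1, 0])[codes_toy]    # same classes, codes of first and last swapped

corr_original = np.corrcoef(feature_toy, codes_toy)[0, 1]
corr_relabeled = np.corrcoef(feature_toy, relabeled_toy)[0, 1]
print(f"original encoding: {corr_original:.3f}, swapped encoding: {corr_relabeled:.3f}")
```

So the bars in the plot above depend on the sorted order in which LabelEncoder assigns codes to Class_1 ... Class_9.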

### Features distribution in training and test dataset.

In [None]:
feat_cols = [col for col in train_df.columns if col not in ['target','num_target','id']]

fig = plt.figure(figsize=(20,16))
for i,col in enumerate(feat_cols[:12]):
    # Mean value of the feature in each class
    temp_df = train_df[[col,'target']].groupby('target').mean()
    plt.subplot(4,3,i+1)
    sns.barplot(x=temp_df.index,y=temp_df[col], palette='coolwarm')
    plt.ylabel(f"{col} mean")
plt.tight_layout()

In [None]:
fig = plt.figure(figsize=(20,40))

for i, col in enumerate(feat_cols):
    plt.subplot(25,3, i+1)
    sns.kdeplot(train_df[col], fill=True, color='red')
    sns.kdeplot(test_df[col], fill=True, color='blue')
plt.tight_layout()

The training and test distributions are virtually the same, with lots of outliers. Normally we would have to deal with them, but this dataset was generated synthetically using a CTGAN, and I found very little difference in the model's performance when handling them, so I will not deal with outliers in this notebook. Let's take a closer look at a few features with different distributions, create some nice plots and have some fun with them.

In [None]:
# Create fig and gridspec
fig = plt.figure(figsize=(16,10),dpi=80)
grid = plt.GridSpec(4,4, hspace=0.5,wspace=0.2)

# Define the axes
ax_main = fig.add_subplot(grid[:-1,:-1])
ax_right = fig.add_subplot(grid[:-1,-1], xticklabels=[],yticklabels=[])
ax_bottom = fig.add_subplot(grid[-1,0:-1],xticklabels=[],yticklabels=[])

# Scatterplot on main ax
ax_main.scatter(x='feature_12', y='feature_39',c='num_target',data=train_df,alpha=.9,cmap="coolwarm")

# Boxplot on the right
ax_right.boxplot(x=train_df['feature_39'])

# boxplot on the bottom
ax_bottom.boxplot(x=train_df['feature_12'], vert=False)

# Decorations (feature_12 is on the x-axis, feature_39 on the y-axis)
ax_main.set(title='Scatterplot with Boxplot \n feature_39 vs. feature_12', xlabel='feature_12', ylabel='feature_39');

In [None]:
sns.set_style('white')

In [None]:
df_agg = train_df.loc[:,['feature_39','target']].groupby('target')
vals = [df['feature_39'].values.tolist() for i,df in df_agg]
class_labels = [name for name, _ in df_agg]
plt.figure(figsize=(16,9),dpi=80)
# create color list
colors = [plt.cm.coolwarm(i/float(len(vals)-1)) for i in range(len(vals))]
# plot histogram, one stacked layer per class
n, bins, patches = plt.hist(vals,30,stacked=True,density=False,color=colors,label=class_labels)
plt.xlim(0,15)
# decorations
plt.legend()
plt.title('Stacked histogram of feature_39 colored by class', fontsize=22);

In [None]:
# Density Plot: one curve per class (lb.classes_ holds Class_1 ... Class_9 in sorted order)
plt.figure(figsize=(10,5), dpi= 80)
for i, label in enumerate(lb.classes_):
    sns.kdeplot(train_df.loc[train_df['target'] == label, "feature_39"], color=colors[i], alpha=.3, label=label)
plt.title("Density plot of feature_39", fontsize=22)
plt.legend();

In [None]:
!pip install joypy
import joypy

In [None]:
fig, axes = joypy.joyplot(train_df, 
                          column=['feature_39','feature_10','feature_67'],
                          by='target',
                          figsize=(14,10),
                          legend=True,
                          color=['g','r','b'])
plt.title("Chosen features distribution per class",fontsize=22);

# It's time to create a base ANN (Artificial Neural Network) model. 🥁🎺

In this notebook I would like to create a robust model using TensorFlow. First I will create a base model and then try to improve it. There are a few things to be done to make a neural network work:
* All data needs to be numerical.
* Data should be presented as tensors (TensorFlow also works fine with NumPy arrays).
* Scale the data (models usually train much better once the data is normalized).

To create a base model I will split the data with train_test_split. Later, when we try to improve the model's performance and make it more robust, I will use a cross-validation method to split the data.
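Min-max scaling maps each feature to [0, 1] using the minimum and maximum of the training data only. The toy sketch below (made-up values; `toy_train`, `toy_test` and `toy_scaler` are illustrative names) does it by hand and checks it against `MinMaxScaler`. Test values outside the training range land outside [0, 1], which is expected and not a leak.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy train/test matrices (hypothetical values), one column per feature
toy_train = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
toy_test = np.array([[2.5, 40.0]])

# Min-max scaling by hand: the statistics come from the training data only
col_min = toy_train.min(axis=0)
col_max = toy_train.max(axis=0)
manual_test = (toy_test - col_min) / (col_max - col_min)

# The same transformation with scikit-learn's MinMaxScaler
toy_scaler = MinMaxScaler(feature_range=(0, 1)).fit(toy_train)
sk_test = toy_scaler.transform(toy_test)

print(manual_test)  # the second value is 1.5: test data can fall outside [0, 1]
```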

In [None]:
# Drop unwanted columns
train_df.drop(["id","target"], axis=1, inplace=True)
test_df.drop("id", axis=1, inplace=True)

In [None]:
train_df.shape, test_df.shape

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Split our data into X & y
X = train_df.iloc[:,:-1].values
y = train_df.iloc[:,-1].values

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, 
                                                    y, 
                                                    test_size=0.2,
                                                    random_state=42,
                                                    stratify=y)

# Normalize the data
sc = MinMaxScaler(feature_range=(0,1))
X_train_norm = sc.fit_transform(X_train)  # Fit and transform the training set first, then only transform the test set to avoid data leakage
X_test_norm = sc.transform(X_test)

In [None]:
# Check the shape of our datasets
X_train_norm.shape, X_test_norm.shape

## Base ANN model

In [None]:
import tensorflow as tf
import tensorflow.keras.backend as K
print(tf.__version__)

In [None]:
# Create a base model
base_model = tf.keras.Sequential([
    tf.keras.layers.Dense(75, activation='relu'),
    tf.keras.layers.Dense(75, activation='relu'),
    tf.keras.layers.Dense(9, activation='softmax')  # we have multi-class classification problem
])

# Compile the base model
base_model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(), # expects labels to be provided as integers
                   optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                   metrics=["accuracy"])

# Fit the base model
base_history = base_model.fit(X_train_norm, 
                              y_train, 
                              epochs=20,
                              validation_data=(X_test_norm, y_test))

### Evaluate our base model

In [None]:
# Create a data frame
base_history_df = pd.DataFrame(base_history.history)

In [None]:
import warnings
warnings.filterwarnings("ignore")

In [None]:
def plot_history(df, fold=1):
    fig, ax = plt.subplots(nrows=1,ncols=2, figsize=(18,5))
    
    fig.suptitle(f"FOLD={fold}")

    # First plot
    df[["loss","val_loss"]].plot(ax=ax[0])
    ax[0].set_xlabel("Epochs")
    ax[0].set_ylabel("Loss")
    ax[0].set_title("Training and Validation Loss")

    # Second plot
    df[["accuracy","val_accuracy"]].plot(ax=ax[1])
    ax[1].set_xlabel("Epochs")
    ax[1].set_ylabel("Accuracy")
    ax[1].set_title("Training and Validation Accuracy")
    
plot_history(base_history_df)

In [None]:
base_model.evaluate(X_train_norm, y_train)
base_model.evaluate(X_test_norm,y_test)

🔑 **Note:** As the first plot above shows, the training loss started at ~1.80 and went down to ~1.72, and it looks like it could go a little lower if we trained for more epochs. The loss on the test data, on the other hand, went down from ~1.78 to ~1.77, then levelled off, and it starts going up as the number of epochs increases. It looks like the base model is overfitting, so we will have to apply regularization to tackle this issue. The second plot shows the same pattern for accuracy: it increases on the training data but decreases on the test data.


To evaluate classification model we can use other metrics as:
1. Precision - the ratio of True Positives to all positives predicted by the model (the more false positives the model predicts, the lower the precision).
2. Recall (also known as Sensitivity) - the ratio of True Positives to all actual positives in the data.
3. F1-score - the harmonic mean of precision and recall, useful when we want a balance between the two.
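The three definitions above can be checked on a small made-up binary example (the arrays are hypothetical). In the multi-class case, `classification_report` computes the same quantities per class, treating that class as positive and all the others as negative.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

# Hypothetical binary example: 1 = positive class
y_true_toy = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred_toy = np.array([1, 1, 1, 0, 1, 1, 0, 0, 0, 0])

tp = np.sum((y_pred_toy == 1) & (y_true_toy == 1))  # 3 true positives
fp = np.sum((y_pred_toy == 1) & (y_true_toy == 0))  # 2 false positives
fn = np.sum((y_pred_toy == 0) & (y_true_toy == 1))  # 1 false negative

precision = tp / (tp + fp)   # share of predicted positives that are correct
recall = tp / (tp + fn)      # share of actual positives that were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(precision, recall, f1)
```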

Alongside visualizing our model's results as much as possible, there are a handful of evaluation methods we should be familiar with. The main ones include:
* Confusion matrix
* Classification report
* Receiver Operating Characteristic (ROC) curve

Let's make predictions and try these methods to evaluate our model. Remember that the model outputs prediction probabilities (the softmax activation in the output layer gives one probability per class), so to get class labels we take the index of the highest probability with argmax.
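A small sketch of that last step with hypothetical logits (not outputs of the model above): softmax turns raw scores into per-row probabilities that sum to 1, and `argmax(axis=1)` returns the index of the most likely class, which is what the next cell does with the model's predictions.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row max keeps exp() numerically stable."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Hypothetical raw outputs (logits) for 2 samples and 3 classes
logits_toy = np.array([[2.0, 1.0, 0.1],
                       [0.5, 0.5, 3.0]])
probs_toy = softmax(logits_toy)

# Each row is a probability distribution over the classes ...
print(probs_toy.sum(axis=1))
# ... and argmax picks the index of the most likely class
labels_toy = probs_toy.argmax(axis=1)
print(labels_toy)  # [0 2]
```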

In [None]:
# Make predictions
base_y_pred = base_model.predict(X_test_norm)
# Convert the prediction probabilities into class indices (index of the highest probability)
base_y_pred_int = base_y_pred.argmax(axis=1)
base_y_pred_int[:10]

### Classification report



In [None]:
from sklearn.metrics import classification_report
print(classification_report(y_test, base_y_pred_int))

### Confusion matrix

In [None]:
classes_names = {l:i for (i,l) in enumerate(lb.classes_)}

In [None]:
import itertools
from sklearn.metrics import confusion_matrix

def make_confusion_matrix(y_true, y_preds, classes=None, figsize=(15,15),text_size=15):
    """
    Plots confusion matrix for given true labels and model predictions.
    """
    # Create the confusion matrix
    cm = confusion_matrix(y_true, y_preds)
    cm_norm = cm.astype("float") / cm.sum(axis=1)[:,np.newaxis] # normalize our confusion matrix row-wise
    n_classes = cm.shape[0]
    
    # Create a matrix plot
    fig, ax = plt.subplots(figsize=figsize)
    cax = ax.matshow(cm, cmap=plt.cm.Blues)
    fig.colorbar(cax)
    
    # Set labels to be classes
    if classes:
        labels=classes
    else:
        labels=np.arange(cm.shape[0])
        
    # Label the axes
    ax.set(title="Confusion matrix",
           xlabel="Predicted labels",
           ylabel="True labels",
           xticks=np.arange(n_classes),
           yticks=np.arange(n_classes),
           xticklabels=labels,
           yticklabels=labels)
    
    # Set x-axis labels to bottom
    ax.xaxis.set_label_position("bottom")
    ax.xaxis.tick_bottom()
    
    # Adjust label size
    ax.yaxis.label.set_size(text_size)
    ax.xaxis.label.set_size(text_size)
    ax.title.set_size(text_size)
    
    # Set threshold to different colors
    threshold = (cm.max() + cm.min()) / 2.
    
    # Plot the text on each cell
    for i,j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, f"{cm[i,j]} ({cm_norm[i,j]*100:.1f}%)",
        horizontalalignment='center',
        color='white' if cm[i,j] > threshold else 'black',
        size=text_size)

In [None]:
confusion_matrix(y_test, base_y_pred_int)

In [None]:
# Make a confusion matrix plot
make_confusion_matrix(y_true=y_test, 
                      y_preds=base_y_pred_int, 
                      classes=classes_names.keys(), 
                      figsize=(18,15),
                      text_size=15)

### AUC-ROC curve

🔑 **Note:** The ROC curve plots the true positive rate against the false positive rate across classification threshold settings, and the AUC (area under the ROC curve) measures the degree of separability: it tells how well the model is capable of distinguishing between the classes. For a multi-class problem we draw one curve per class, treating that class as positive and all the others as negative (one-vs-rest).
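To make the one-vs-rest idea concrete, here is a sketch on synthetic data (the random labels and Dirichlet-based probabilities are assumptions, not model outputs). It computes one binary AUC per class and checks that scikit-learn's `roc_auc_score(..., multi_class="ovr", average="macro")` is simply their unweighted mean.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

rng_auc = np.random.default_rng(1)
n_classes_toy = 3
y_true_auc = rng_auc.integers(0, n_classes_toy, size=300)

# Hypothetical predicted probabilities: random noise nudged toward the true class
probs_auc = rng_auc.dirichlet(np.ones(n_classes_toy), size=300)
probs_auc[np.arange(300), y_true_auc] += 0.5
probs_auc /= probs_auc.sum(axis=1, keepdims=True)  # rows must sum to 1

# One-vs-rest: each class in turn is "positive", all the others "negative"
y_onehot = label_binarize(y_true_auc, classes=np.arange(n_classes_toy))
per_class_auc = [roc_auc_score(y_onehot[:, k], probs_auc[:, k]) for k in range(n_classes_toy)]

# Macro one-vs-rest AUC = unweighted mean of the per-class AUCs
macro_auc = roc_auc_score(y_true_auc, probs_auc, multi_class="ovr", average="macro")
print(np.round(per_class_auc, 3), round(macro_auc, 3))
```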

In [None]:
from sklearn.metrics import auc, roc_curve

In [None]:
n_classes = len(classes_names)

def roc_auc_plot(y_true, y_preds, n_classes):
    """
    Compute the ROC curve and ROC area for each class, then create a plot.
    """
    # Compute roc and auc for each class
    fpr = dict()
    tpr = dict()
    roc_auc = dict()
    y_true = tf.one_hot(y_true, depth=n_classes)
    for i in range(n_classes):
        fpr[i], tpr[i], _ = roc_curve(y_true[:,i],y_preds[:,i])
        roc_auc[i] = auc(fpr[i],tpr[i])
        
    # Plot Roc curve
    linestyles = ['-', '--', '-.', ':','-', '--', '-.', ':','-']

    plt.figure(figsize=(12,10))
    for i in range(n_classes):
        plt.plot(fpr[i], 
                 tpr[i], 
                 label='ROC curve of class {0} (area={1:0.2f})'.format(i+1, roc_auc[i]),
                 linestyle=linestyles[i])
        
    plt.plot([0, 1], [0, 1], 'k--')
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.05])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('Receiver operating characteristic to multi-class')
    plt.legend(loc="lower right")
    plt.show()

In [None]:
roc_auc_plot(y_test, base_y_pred, n_classes)

## How we can improve our model?

To improve our model we can:
* Add more hidden layers
* Add more neurons to the hidden layers
* Change the non-linear activation function
* Find the ideal learning rate
* Use a different weight initialization
* Change the optimizer
* Cross-validate the model
* Normalize/scale the data
* Use batch normalization

To avoid overfitting we can:
* Use dropout
* Set up early stopping
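Dropout is easy to misread from the loss curves, so here is a NumPy sketch of inverted dropout, the scheme Keras' `Dropout` layer uses: during training a fraction `rate` of units is zeroed and the survivors are scaled by 1 / (1 - rate), while at inference the layer does nothing. `inverted_dropout` is a hypothetical helper written for illustration, not a Keras API.

```python
import numpy as np

def inverted_dropout(x, rate, training, rng):
    """Zero a fraction `rate` of units during training and rescale the rest by
    1 / (1 - rate) so the expected activation is unchanged; identity at inference."""
    if not training or rate == 0.0:
        return x
    keep_mask = rng.random(x.shape) >= rate
    return np.where(keep_mask, x / (1.0 - rate), 0.0)

rng_drop = np.random.default_rng(0)
activations = np.ones((1000, 100))

train_out = inverted_dropout(activations, rate=0.3, training=True, rng=rng_drop)
infer_out = inverted_dropout(activations, rate=0.3, training=False, rng=rng_drop)

print((train_out == 0).mean())   # roughly 0.3 of the units are dropped
print(train_out.mean())          # roughly 1.0, the same as the input mean
```

This is also why, with dropout, the training loss can sit above the validation loss: the training pass runs a randomly thinned network.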

It's a good idea to change one thing at a time and see if our model improves. I'm going to create a function where I will try these techniques to build a robust model.

In [None]:
def ann_model(X_train, X_test, y_train, y_true):
    
    # Create Early Stopping
    early_stop = tf.keras.callbacks.EarlyStopping(
        monitor='val_accuracy',patience=4,min_delta=0,verbose=1,
        mode='max',baseline=0,restore_best_weights=True)
    
    # Create a model
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(75,activation='sigmoid'),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(100,activation='sigmoid'),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(100,activation='sigmoid'),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(75, activation='sigmoid'),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(9, activation='softmax')  # We have multi-class classification problem
    ])
    
    # Compile the model
    model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(), # expects labels to be provided as integers
                  optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  metrics=['accuracy']
                 )
    
    # Fit the model
    history = model.fit(X_train, 
                        y_train, 
                        epochs=60,
                        validation_data=(X_test, y_true),
                        callbacks=[early_stop]) 
    
    model.evaluate(X_test,y_true)
    
    return history, model

In [None]:
tf.random.set_seed(42)
history, model_ann = ann_model(X_train=X_train_norm,
                     X_test=X_test_norm,
                     y_train=y_train,
                     y_true=y_test)

In [None]:
history_df = pd.DataFrame(history.history)
plot_history(history_df)

🔑**Note:** It looks like we could increase the number of epochs: val_loss is still lower than the training loss (partly because dropout is only active during training), and the training and test accuracy stay at the same level (here the two curves are beautifully intertwined).

In [None]:
def find_ideal_lr(X_train, X_test, y_train, y_test):
    
    # Create the learning rate callback to find ideal learning rate
    lr_scheduler = tf.keras.callbacks.LearningRateScheduler(lambda epoch: 1e-4 * 10**(epoch/20))
    
    # Create a model
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(75,activation='sigmoid'),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(100,activation='sigmoid'),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(100,activation='sigmoid'),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(75, activation='sigmoid'),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(9, activation='softmax')  # We have multi-class classification problem
    ])
    
    # Compile the model
    model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(), # expects labels to be provided as integers
                  optimizer=tf.keras.optimizers.Adam(),
                  metrics=['accuracy']
                 )
    
    # Fit the model
    history = model.fit(X_train, 
                        y_train, 
                        epochs=60,
                        validation_data=(X_test, y_test),
                        callbacks=[lr_scheduler])
    
    # Evaluate the model
    model.evaluate(X_test,y_test)
    
    return history, model

In [None]:
tf.random.set_seed(42)
history, _ = find_ideal_lr(X_train=X_train_norm,
                           X_test=X_test_norm,
                           y_train=y_train,
                           y_test=y_test)

In [None]:
find_lr_history_df = pd.DataFrame(history.history)
plot_history(find_lr_history_df)

In [None]:
# Plot the learning rate decay curve
lrs = 1e-4 * (10**(np.arange(60)/20))
plt.figure(figsize=(12,7))
plt.semilogx(lrs, history.history['loss'])
plt.axvline(x=1e-2, linestyle='--', color='red')
plt.xlabel("Learning rate")
plt.ylabel("Loss")
plt.title("Finding the ideal learning rate")
plt.grid()

In [None]:
def ann_model(X_train, X_test, y_train, y_test):
    
    # Create Early Stopping
    early_stop = tf.keras.callbacks.EarlyStopping(
        monitor='val_accuracy',patience=4,min_delta=0,verbose=1,
        mode='max',baseline=0,restore_best_weights=True)
    
    # Create a model
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(75,activation='sigmoid'),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(100,activation='sigmoid'),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(100,activation='sigmoid'),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(75, activation='sigmoid'),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(9, activation='softmax')  # We have multi-class classification problem
    ])
    
    # Compile the model
    model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(), # expects labels to be provided as integers
                  optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),  # ideal learning rate
                  metrics=['accuracy']
                 )
    
    # Fit the model
    history = model.fit(X_train, 
                        y_train, 
                        epochs=60,
                        validation_data=(X_test, y_test),
                        callbacks=[early_stop]) 
    
    # Evaluate the model
    model.evaluate(X_test,y_test)
    
    return history, model

tf.random.set_seed(42)
history, ann = ann_model(X_train=X_train_norm,
                         X_test=X_test_norm,
                         y_train=y_train,
                         y_test=y_test)

In [None]:
history_df = pd.DataFrame(history.history)
plot_history(history_df)

As we can see, the model has improved and needed fewer epochs, which means less computation time. Now I will try explicit weight initialization to find out whether it improves the model. I'm also going to change the metric we monitor: accuracy is not our goal (we submit class probabilities, which are judged by their log loss), so monitoring accuracy does not necessarily give us the model with the best loss.
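To illustrate why accuracy alone can be misleading here, a sketch with two hypothetical sets of predicted probabilities: both have the same argmax, hence the same accuracy, but very different log loss. `multiclass_log_loss` is an illustrative helper that uses the same clipping (1e-15) as the `custom_metric` below, and it is checked against scikit-learn's `log_loss`.

```python
import numpy as np
from sklearn.metrics import accuracy_score, log_loss

def multiclass_log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-probability assigned to the true class."""
    y_prob = np.clip(y_prob, eps, 1 - eps)
    y_prob = y_prob / y_prob.sum(axis=1, keepdims=True)
    return -np.mean(np.log(y_prob[np.arange(len(y_true)), y_true]))

y_true_ll = np.array([0, 1, 2, 2])

# Two hypothetical models with identical argmax predictions (the last sample is wrong for both)
confident = np.array([[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.05, 0.05, 0.9], [0.6, 0.2, 0.2]])
hesitant = np.array([[0.4, 0.3, 0.3], [0.3, 0.4, 0.3], [0.3, 0.3, 0.4], [0.4, 0.3, 0.3]])

for name, probs in [("confident", confident), ("hesitant", hesitant)]:
    acc_toy = accuracy_score(y_true_ll, probs.argmax(axis=1))
    print(name, acc_toy, round(multiclass_log_loss(y_true_ll, probs), 4))
```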

In [None]:
SEED=45
EPOCHS=100
BATCH_SIZE=512
N_FOLDS=10
N_CLASS=9

In [None]:
def custom_metric(y_true,y_pred):
    cce = tf.keras.losses.SparseCategoricalCrossentropy()
    y_pred = K.clip(y_pred, 1e-15, 1-1e-15)
    loss = K.mean(cce(y_true, y_pred))
    return loss

In [None]:
def ann_model_2(X_train, X_test, y_train, y_test):
    
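    # Create Early Stopping. No baseline: a loss can never beat a baseline of 0,
    # so baseline=0 would stop training after `patience` epochs regardless
    es = tf.keras.callbacks.EarlyStopping(
         monitor='val_custom_metric',patience=42,min_delta=0.0001,verbose=1,
         mode='min',restore_best_weights=False)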
    
    # Create weights initializer
    weights_initializer = tf.keras.initializers.GlorotUniform(seed=SEED)
    
    # Create plateau
    plateau = tf.keras.callbacks.ReduceLROnPlateau(
        monitor='val_loss',factor=0.04, patience=3,verbose=1,mode='min',cooldown=1)
    
    # Create a model
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(75,activation='sigmoid', kernel_initializer=weights_initializer),
        tf.keras.layers.Dropout(0.2),
        # tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dense(100,activation='sigmoid', kernel_initializer=weights_initializer),
        tf.keras.layers.Dropout(0.2),
        # tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dense(100,activation='sigmoid', kernel_initializer=weights_initializer),
        tf.keras.layers.Dropout(0.2),
        # tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dense(75, activation='sigmoid', kernel_initializer=weights_initializer),
        tf.keras.layers.Dropout(0.2),
        # tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dense(9, activation='softmax'), # We have multi-class classification problem
    ])
    
    # Compile the model
    model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(), # expects labels to be provided as integers
                  optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),  # ideal learning rate
                  metrics=["accuracy",custom_metric]
                 )
    
    # Fit the model
    history = model.fit(X_train, 
                        y_train, 
                        epochs=EPOCHS,
                        batch_size=BATCH_SIZE,
                        validation_data=(X_test, y_test),
                        callbacks=[plateau,es],
                        verbose=1) 
    
    # Evaluate the model
    # model.evaluate(X_test,y_test)
    
    return history, model

In [None]:
tf.random.set_seed(42)
history, ann = ann_model_2(X_train=X_train_norm, # use X_train when using BatchNormalization
                           X_test=X_test_norm,
                           y_train=y_train,
                           y_test=y_test)

In [None]:
history_df = pd.DataFrame(history.history)
plot_history(history_df)

## Submission

In [None]:
# Scale our test data first
test_df_norm = sc.transform(test_df.values) # the scaler is already fitted, so we only transform the test data
# Make predictions
ann_y_preds = ann.predict(test_df_norm)
# Paste the prediction for each class into sample df
for i,col in enumerate(classes_names):
    sample_df[col] = ann_y_preds[:,i]
    
sample_df.to_csv("submission7.csv", index=False)

In [None]:
K.clear_session()

In [None]:
from sklearn.model_selection import StratifiedKFold
train_df["kfold"] = -1
skf = StratifiedKFold(n_splits=N_FOLDS, shuffle=True, random_state=SEED)

for fold, (tr_idx, ts_idx) in enumerate(skf.split(X=train_df, y=train_df['num_target'])):
    train_df.loc[ts_idx,"kfold"] = fold

In [None]:
train_df["kfold"].value_counts()

In [None]:
def ann_model_kfolds(train, test):
    
    """
    Performs cross-validation using StratifiedKFold.
    """
    
    # Create placeholders for our predictions
    oof_train = np.zeros(shape=(train.shape[0], N_CLASS))
    oof_preds = np.zeros(shape=(test.shape[0], N_CLASS))
    history_dict = {}
    test_fold_preds = {}
    folds_acc = []
    folds_loss = []
    
    for fold in range(N_FOLDS):
        print(f"=========FOLD_{fold+1}=========")
        # Original row positions of the validation fold (v_df's index is reset below)
        valid_idx = np.where(train.kfold == fold)[0]
        t_df = train[train.kfold !=fold].reset_index(drop=True)
        v_df = train[train.kfold ==fold].reset_index(drop=True)
        
        # Split into training and validation sets
        xtrain = t_df.drop(["num_target",'kfold'], axis=1).values
        xvalid = v_df.drop(["num_target","kfold"],axis=1).values
        ytrain = t_df["num_target"].values
        yvalid = v_df["num_target"].values
        
        # Normalize datasets
        sc = MinMaxScaler()
        xtrain_norm = sc.fit_transform(xtrain)
        xvalid_norm = sc.transform(xvalid)
        test_norm = sc.transform(test.values)
        
        # Time for our model
        history, ann_model = ann_model_2(X_train=xtrain_norm,
                                         X_test=xvalid_norm,
                                         y_train=ytrain,
                                         y_test=yvalid)
        
        # Save history for a model in specific fold
        history_dict[f"Fold_{fold+1}"] = history
        
        # Make predictions for our model in a fold split 
        fold_y_preds = ann_model.predict(xvalid_norm)
        fold_y_pred_test = ann_model.predict(test_norm)
        
        # Evaluate our model
        model_eval = ann_model.evaluate(xvalid_norm, yvalid)
        
        # Print our results
        print(f"Fold_{fold+1} Validation Accuracy={model_eval[1]}")
        print(f"Fold_{fold+1} Validation Loss={model_eval[0]}")
        print("\n")
        
        # Save our metrics and predictions
        folds_acc.append(model_eval[1])
        folds_loss.append(model_eval[0])
        
        oof_train[valid_idx] = fold_y_preds                    # write back to the original rows
        oof_preds += fold_y_pred_test                          # running sum, divided by N_FOLDS later
        test_fold_preds[f"fold_{fold+1}"] = fold_y_pred_test   # this fold's own test predictions
        
    return folds_acc, folds_loss, history_dict, oof_train, oof_preds, test_fold_preds
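Out-of-fold (OOF) predictions give every training row exactly one prediction, made by the model that did not see it during training. Because `v_df` is built with `reset_index(drop=True)`, its index starts again at 0, so the fold's original row positions have to be captured before resetting. A toy sketch of that bookkeeping (the frame `toy` and the stand-in predictions are hypothetical; no model is trained):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold

# Hypothetical toy frame with one feature, a label column and a fold assignment
toy = pd.DataFrame({"x": np.arange(12, dtype=float), "num_target": [0, 1, 2] * 4})
toy["kfold"] = -1
skf_oof = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
for f, (_, va) in enumerate(skf_oof.split(toy, toy["num_target"])):
    toy.loc[va, "kfold"] = f

oof = np.full(len(toy), np.nan)
for f in range(4):
    valid_pos = np.where(toy["kfold"] == f)[0]     # original row positions of this fold
    v_toy = toy.iloc[valid_pos].reset_index(drop=True)
    fake_preds = v_toy["x"].values                  # stand-in for model.predict(...)
    oof[valid_pos] = fake_preds                     # write back to the original rows

# Every row received exactly one out-of-fold prediction, in the right place
print(np.array_equal(oof, toy["x"].values))
```

Writing to `v_toy.index` instead would put every fold's predictions into the first few rows.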

In [None]:
        
acc, loss, folds_history, train_pred, test_pred, test_fold_dict = ann_model_kfolds(train_df, test_df)
print("\n")
print("============Final Models Metrics==============")
print(f"Mean Accuracy after {N_FOLDS}_folds: {100*np.mean(acc):.2f}%")
print(f"Mean Loss after {N_FOLDS}_folds: {np.mean(loss)}")

In [None]:
for i in range(1,N_FOLDS+1):
    temp_df = pd.DataFrame(folds_history[f"Fold_{i}"].history)
    plot_history(temp_df,i)

## Submit our n_folds model

In [None]:
test_pred = np.clip((test_pred / N_FOLDS), 1e-15, 1-1e-15)
sub_id_df = pd.DataFrame(sample_df['id'], columns=['id'])
sub_df = pd.DataFrame(test_pred, columns=lb.classes_)
sub_concat_df = pd.concat([sub_id_df,sub_df], axis=1)
sub_concat_df.to_csv(f"sub_({N_FOLDS})_folds.csv", index=False)

In [None]:
K.clear_session()

## Different approach

In [None]:
cat_tr_df = train_df.drop(['num_target','kfold'], axis=1).astype('category')
cat_ts_df = test_df.astype('category')
cat_tr_df['train'] = 1
cat_ts_df['train'] = 0

tr_ts_df = pd.concat([cat_tr_df, cat_ts_df])
dummy_tr_ts_df = pd.get_dummies(tr_ts_df, drop_first=True)

dummy_tr_df = dummy_tr_ts_df[dummy_tr_ts_df['train']==1]
dummy_ts_df = dummy_tr_ts_df[dummy_tr_ts_df['train']==0]

dummy_tr_df = pd.concat([dummy_tr_df,train_df[['num_target','kfold']]], axis=1)
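The cell above concatenates train and test before calling `pd.get_dummies`, and there is a good reason for it: encoding the two frames separately can produce different dummy columns whenever a value appears in only one of them. A toy sketch with made-up categories (`toy_train_cat`, `toy_test_cat` are illustrative names):

```python
import pandas as pd

# Hypothetical categorical column whose train and test values only partly overlap
toy_train_cat = pd.DataFrame({"f": pd.Series([0, 1, 2], dtype="category")})
toy_test_cat = pd.DataFrame({"f": pd.Series([1, 3], dtype="category")})

# Encoding separately: the two frames end up with different dummy columns
sep_train_cols = list(pd.get_dummies(toy_train_cat).columns)
sep_test_cols = list(pd.get_dummies(toy_test_cat).columns)
print(sep_train_cols, sep_test_cols)

# Encoding the concatenation once, then splitting, gives both frames the same columns
combined = pd.concat([toy_train_cat.assign(train=1), toy_test_cat.assign(train=0)], ignore_index=True)
combined["f"] = combined["f"].astype("category")
dummies = pd.get_dummies(combined, columns=["f"])
enc_train = dummies[dummies["train"] == 1].drop(columns="train")
enc_test = dummies[dummies["train"] == 0].drop(columns="train")
print(list(enc_train.columns) == list(enc_test.columns))
```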

In [None]:
N_COMPONENTS = 75

from sklearn.decomposition import PCA, SparsePCA

In [None]:
pca = PCA(n_components=N_COMPONENTS, random_state=SEED).fit(dummy_tr_df.drop(['num_target','kfold'],axis=1))
#sparse_pca = SparsePCA(n_components=N_COMPONENTS, random_state=SEED).fit(dummy_tr_df.drop(['num_target','kfold'],axis=1))

In [None]:
# Create training and test datasets with features from PCA
pca_tr_df = pd.DataFrame(pca.transform(dummy_tr_df.drop(['num_target','kfold'], axis=1)),
                         columns=[f"feature_{i}" for i in range(N_COMPONENTS)])

pca_ts_df = pd.DataFrame(pca.transform(dummy_ts_df),
                         columns = [f"feature_{i}" for i in range(N_COMPONENTS)])

# Create training and test dataset with features from SparsePCA
#spca_tr_df = pd.DataFrame(sparse_pca.transform(dummy_tr_df.drop(['num_target','kfold'], axis=1)),
                          #columns=[f"feature_{i}" for i in range(N_COMPONENTS)])

#spca_ts_df = pd.DataFrame(sparse_pca.transform(dummy_ts_df),
                          #columns=[f"feature_{i}" for i in range(N_COMPONENTS)])

In [None]:
pca_tr_df['kfold'] = -1
pca_tr_df['num_target'] = train_df['num_target'].values
skf = StratifiedKFold(n_splits=N_FOLDS, shuffle=True, random_state=SEED)
for fold, (tr_idx, ts_idx) in enumerate(skf.split(X=pca_tr_df, y=pca_tr_df['num_target'])):
    pca_tr_df.loc[ts_idx,'kfold'] = fold

In [None]:
pca_tr_df.shape, pca_ts_df.shape

In [None]:
acc, loss, folds_history, train_pred, test_pred, test_fold_dict = ann_model_kfolds(pca_tr_df, 
                                                                                   pca_ts_df)
print("\n")
print("============Final Metrics for the new approach ==============")
print(f"Mean Accuracy after {N_FOLDS}_folds: {100*np.mean(acc):.2f}%")
print(f"Mean Loss after {N_FOLDS}_folds: {np.mean(loss)}")

In [None]:
test_pred = test_pred / N_FOLDS
sub_id_df = pd.DataFrame(sample_df['id'], columns=['id'])
sub_df = pd.DataFrame(test_pred, columns=lb.classes_)
sub_concat_df = pd.concat([sub_id_df,sub_df], axis=1)
sub_concat_df.to_csv(f"sub_({N_FOLDS})_folds_dummy_and_pca.csv", index=False)