# DSBA 22/23 HSE & University of London

# Practical assignment 1. DL in classification.

## General info
Release data: 26.09.2022

Soft deadline: 10.10.2022 23:59 MSK

Hard deadline: 13.10.2021 23:59 MSK

In this task, you are to build a NN for a binary classification task. We suggest using Google Colab for access to GPU. Competition invite link: https://www.kaggle.com/t/1917e22edb71437ca24d790ab1d57695

## Evaluation and fines

Each section has a defined "value" (in brackets near the section). Maximum grade for the task - 10 points, other points can be assigned to your tests.

**Your notebook with the best solution must be reproducible should be sent to the dropbox!** If the assessor cannot reproduce your results, you may be assigned score = 0, so make all your computations fixed!

**You can only use neural networks / linear / nearest neighbors models for this task - tree-based models are forbidden!**

All the parts must be done independently.

After the hard deadline is passed, the hometask is not accepted. If you send the hometask after the soft deadline, you will be excluded from competition among your mates and the homework will only be scored by the "Beating the baseline" part.

Feel free to ask questions both the teacher and your mates, but __do not copy the code or do it together__. "Similar" solutions are considered a plagiarism and all the involved students (the ones who gave & the ones who did) cannot get more than 0.01 points for the task. If you found a solution in some open source, you __must__ reference it in a special block at the end of your work (to exclude the suspicions in plagiarism).


## Format of handing over

The tasks are sent to the dropbox: https://www.dropbox.com/request/Y6TJouxNbm3r0RgcBL35. Don't forget to attach your name, surname & your group.


## 1. Model training

**Important!** Public Leaderboard contains only 33% of the test data. Your points will be measured wrt to the whole test set, therefore your position on the LB after the end of the competition may change.

* test_accuracy > weak baseline (public LB): 3 points

* test_accuracy > medium baseline (public LB): + 3 points

* test_accuracy > strong baseline (public LB): + 2 points

* You are among 25% most successful students (private LB): + 2 point

* You are among top-3 most successful students (private LB): + 1 point

* You are among top-2 most successful students (private LB): + 1 point

* You are among top-1 most successful students (private LB): + 1 point

In [1]:
!pip install torch



You should consider upgrading via the 'c:\users\egor\appdata\local\programs\python\python39\python.exe -m pip install --upgrade pip' command.


In [2]:
# Your code here ╰( ͡° ͜ʖ ͡° )つ──☆*:

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [4]:
import torch
from torch import nn
from torch.nn import functional as F
from torch.utils.data import Dataset, DataLoader

In [5]:
from sklearn import preprocessing
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

In [6]:
from tensorflow import keras
from keras.wrappers.scikit_learn import KerasClassifier
from keras.utils import np_utils
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.callbacks import EarlyStopping
from keras import optimizers

# **Preprocessing data**

In [7]:
train_df = pd.read_csv('train.csv')
test_df = pd.read_csv('test.csv')
target_df = pd.read_csv('train_target.csv')
train_expected_target1 = pd.read_csv('train_expected_target_agent_1.csv')
train_expected_target2 = pd.read_csv('train_expected_target_agent_2.csv')
train_target_agent_1 = pd.read_csv('train_target_agent_1.csv')
train_target_agent_2 = pd.read_csv('train_target_agent_2.csv')

In [8]:
train_target_agent_1 = train_target_agent_1.rename(columns={"0": "expected_target1"})
train_target_agent_2 = train_target_agent_2.rename(columns={"0": "expected_target2"})

In [9]:
train_df = pd.concat([train_df, train_target_agent_1, train_target_agent_2], axis=1)

In [10]:
train_df.head()

Unnamed: 0,agent_1_feat_Possession%,agent_1_feat_Pass%,agent_1_feat_AerialsWon,agent_1_feat_Rating,agent_1_feat_XGrealiz,agent_1_feat_XGArealiz,agent_1_feat_PPDA,agent_1_feat_OPPDA,agent_1_feat_DC,agent_1_feat_ODC,...,agent_2_feattotal_xg_1,agent_2_feattotal_xg_mean_3,agent_2_feattotal_xg_mean,agent_2_featboth_scored_3,agent_2_featboth_scored_2,agent_2_featboth_scored_1,agent_2_featboth_scored_mean_3,agent_2_featboth_scored_mean,expected_target1,expected_target2
0,58.8,85.1,15.8,6.99,1.1437,0.928715,7.13,14.16,267.0,194.0,...,2.739439,2.739439,,0.473684,0.473684,0.473684,0.473684,,1,2
1,44.8,71.1,23.4,6.84,0.954159,0.97535,9.99,7.66,191.0,287.0,...,2.336756,2.336756,,0.578947,0.578947,0.578947,0.578947,,2,2
2,46.3,70.8,21.7,6.77,0.918434,1.118603,9.56,7.34,179.0,298.0,...,2.120322,2.120322,,0.368421,0.368421,0.368421,0.368421,,0,1
3,50.2,77.5,24.4,6.87,1.037613,0.956836,9.6,9.53,195.0,239.0,...,2.216415,2.216415,,0.210526,0.210526,0.210526,0.210526,,0,1
4,44.9,75.0,17.2,6.77,0.983691,0.948837,12.24,8.76,161.0,283.0,...,2.604025,2.604025,,0.421053,0.421053,0.421053,0.421053,,2,2


In [11]:
test_df.head()

Unnamed: 0,agent_1_feat_Possession%,agent_1_feat_Pass%,agent_1_feat_AerialsWon,agent_1_feat_Rating,agent_1_feat_XGrealiz,agent_1_feat_XGArealiz,agent_1_feat_PPDA,agent_1_feat_OPPDA,agent_1_feat_DC,agent_1_feat_ODC,...,agent_2_feattotal_xg_3,agent_2_feattotal_xg_2,agent_2_feattotal_xg_1,agent_2_feattotal_xg_mean_3,agent_2_feattotal_xg_mean,agent_2_featboth_scored_3,agent_2_featboth_scored_2,agent_2_featboth_scored_1,agent_2_featboth_scored_mean_3,agent_2_featboth_scored_mean
0,58.6,87.0,15.2,6.83,0.844742,1.165049,9.19,16.5,337.0,179.0,...,2.66187,1.893116,4.24136,2.932115,2.690442,1.0,0.0,1.0,0.666667,0.333333
1,50.7,81.3,14.2,6.65,0.743218,1.152593,10.31,13.63,311.0,208.0,...,3.550724,2.3737,4.19701,3.373811,3.075302,0.0,1.0,1.0,0.666667,0.625
2,47.3,81.4,17.7,6.73,0.954509,0.956938,14.21,11.82,207.0,270.0,...,2.693652,2.042668,0.966665,1.900995,3.007033,0.0,1.0,1.0,0.666667,0.555556
3,54.5,84.8,14.5,6.85,1.155612,1.049618,10.95,12.46,339.0,186.0,...,3.9381,1.466409,0.922046,2.108852,2.643923,1.0,0.0,0.0,0.333333,0.444444
4,51.3,81.8,16.4,6.81,1.199718,0.856327,11.27,11.52,193.0,293.0,...,3.358338,2.138405,1.872476,2.456406,3.113815,0.0,0.0,0.0,0.0,0.555556


In [12]:
train_df.shape

(2470, 236)

In [13]:
train_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2470 entries, 0 to 2469
Columns: 236 entries, agent_1_feat_Possession% to expected_target2
dtypes: float64(212), int64(24)
memory usage: 4.4 MB


In [14]:
target_df.drop('id', axis = 1, inplace = True)

In [15]:
train_df = pd.concat([target_df, train_df], axis = 1)

In [16]:
train_df

Unnamed: 0,category,agent_1_feat_Possession%,agent_1_feat_Pass%,agent_1_feat_AerialsWon,agent_1_feat_Rating,agent_1_feat_XGrealiz,agent_1_feat_XGArealiz,agent_1_feat_PPDA,agent_1_feat_OPPDA,agent_1_feat_DC,...,agent_2_feattotal_xg_1,agent_2_feattotal_xg_mean_3,agent_2_feattotal_xg_mean,agent_2_featboth_scored_3,agent_2_featboth_scored_2,agent_2_featboth_scored_1,agent_2_featboth_scored_mean_3,agent_2_featboth_scored_mean,expected_target1,expected_target2
0,1,58.8,85.1,15.8,6.99,1.143700,0.928715,7.13,14.16,267.0,...,2.739439,2.739439,,0.473684,0.473684,0.473684,0.473684,,1,2
1,1,44.8,71.1,23.4,6.84,0.954159,0.975350,9.99,7.66,191.0,...,2.336756,2.336756,,0.578947,0.578947,0.578947,0.578947,,2,2
2,0,46.3,70.8,21.7,6.77,0.918434,1.118603,9.56,7.34,179.0,...,2.120322,2.120322,,0.368421,0.368421,0.368421,0.368421,,0,1
3,0,50.2,77.5,24.4,6.87,1.037613,0.956836,9.60,9.53,195.0,...,2.216415,2.216415,,0.210526,0.210526,0.210526,0.210526,,0,1
4,1,44.9,75.0,17.2,6.77,0.983691,0.948837,12.24,8.76,161.0,...,2.604025,2.604025,,0.421053,0.421053,0.421053,0.421053,,2,2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2465,1,41.6,76.0,17.1,6.62,1.046406,1.032989,18.00,8.27,138.0,...,3.684860,4.024907,3.872622,1.000000,0.000000,0.000000,0.333333,0.444444,1,2
2466,1,42.9,76.1,18.3,6.61,1.161802,1.066236,16.14,7.60,201.0,...,1.568175,2.000313,2.572016,0.000000,0.000000,0.000000,0.000000,0.444444,2,3
2467,0,41.0,72.2,19.1,6.51,1.000858,1.026472,15.99,7.99,164.0,...,3.871643,2.496854,2.555157,0.000000,0.000000,1.000000,0.333333,0.500000,0,5
2468,1,51.4,79.3,14.1,6.62,1.037986,1.161401,9.73,10.47,222.0,...,4.904164,2.977092,2.495116,1.000000,0.000000,0.000000,0.333333,0.222222,1,3


## Delete outliers

In [17]:
train_expected_target1 = train_expected_target1.rename(columns={"0": "train_expected_target1"})
train_expected_target2 = train_expected_target2.rename(columns={"0": "train_expected_target2"})
train_df = pd.concat([train_expected_target1, train_df], axis = 1)
train_df = pd.concat([train_expected_target2, train_df], axis = 1)
train_df.head()

Unnamed: 0,train_expected_target2,train_expected_target1,category,agent_1_feat_Possession%,agent_1_feat_Pass%,agent_1_feat_AerialsWon,agent_1_feat_Rating,agent_1_feat_XGrealiz,agent_1_feat_XGArealiz,agent_1_feat_PPDA,...,agent_2_feattotal_xg_1,agent_2_feattotal_xg_mean_3,agent_2_feattotal_xg_mean,agent_2_featboth_scored_3,agent_2_featboth_scored_2,agent_2_featboth_scored_1,agent_2_featboth_scored_mean_3,agent_2_featboth_scored_mean,expected_target1,expected_target2
0,0.278076,1.16635,1,58.8,85.1,15.8,6.99,1.1437,0.928715,7.13,...,2.739439,2.739439,,0.473684,0.473684,0.473684,0.473684,,1,2
1,0.613273,1.2783,1,44.8,71.1,23.4,6.84,0.954159,0.97535,9.99,...,2.336756,2.336756,,0.578947,0.578947,0.578947,0.578947,,2,2
2,1.11757,1.90067,0,46.3,70.8,21.7,6.77,0.918434,1.118603,9.56,...,2.120322,2.120322,,0.368421,0.368421,0.368421,0.368421,,0,1
3,0.909774,0.423368,0,50.2,77.5,24.4,6.87,1.037613,0.956836,9.6,...,2.216415,2.216415,,0.210526,0.210526,0.210526,0.210526,,0,1
4,0.991901,1.68343,1,44.9,75.0,17.2,6.77,0.983691,0.948837,12.24,...,2.604025,2.604025,,0.421053,0.421053,0.421053,0.421053,,2,2


In [18]:
print('Rows before deleting: ', train_df.shape[0])
train_df = train_df.drop(train_df[(train_df.train_expected_target2 > 0.9) &
                                  (train_df.train_expected_target1 > 0.9) &
                                  (train_df.category == 0)].index)
train_df.drop(['train_expected_target1', 'train_expected_target2'], axis = 1, inplace = True)
print('Rows after deleting: ', train_df.shape[0])

Rows before deleting:  2470
Rows after deleting:  2230


## Work with missing variables

In [19]:
print('Rows before deleting: ', train_df.shape[0])
train_df = train_df.dropna()  
print('Rows after deleting: ', train_df.shape[0])

Rows before deleting:  2230
Rows after deleting:  2103


# Split train and test dataset on two for each agent

In [20]:
test_df

Unnamed: 0,agent_1_feat_Possession%,agent_1_feat_Pass%,agent_1_feat_AerialsWon,agent_1_feat_Rating,agent_1_feat_XGrealiz,agent_1_feat_XGArealiz,agent_1_feat_PPDA,agent_1_feat_OPPDA,agent_1_feat_DC,agent_1_feat_ODC,...,agent_2_feattotal_xg_3,agent_2_feattotal_xg_2,agent_2_feattotal_xg_1,agent_2_feattotal_xg_mean_3,agent_2_feattotal_xg_mean,agent_2_featboth_scored_3,agent_2_featboth_scored_2,agent_2_featboth_scored_1,agent_2_featboth_scored_mean_3,agent_2_featboth_scored_mean
0,58.6,87.0,15.2,6.83,0.844742,1.165049,9.19,16.50,337.0,179.0,...,2.661870,1.893116,4.241360,2.932115,2.690442,1.0,0.0,1.0,0.666667,0.333333
1,50.7,81.3,14.2,6.65,0.743218,1.152593,10.31,13.63,311.0,208.0,...,3.550724,2.373700,4.197010,3.373811,3.075302,0.0,1.0,1.0,0.666667,0.625000
2,47.3,81.4,17.7,6.73,0.954509,0.956938,14.21,11.82,207.0,270.0,...,2.693652,2.042668,0.966665,1.900995,3.007033,0.0,1.0,1.0,0.666667,0.555556
3,54.5,84.8,14.5,6.85,1.155612,1.049618,10.95,12.46,339.0,186.0,...,3.938100,1.466409,0.922046,2.108852,2.643923,1.0,0.0,0.0,0.333333,0.444444
4,51.3,81.8,16.4,6.81,1.199718,0.856327,11.27,11.52,193.0,293.0,...,3.358338,2.138405,1.872476,2.456406,3.113815,0.0,0.0,0.0,0.000000,0.555556
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
565,50.8,80.3,16.1,6.72,0.814332,1.292407,13.35,8.32,220.4,304.0,...,2.369897,4.584170,3.757075,3.570381,3.240205,0.0,1.0,0.0,0.333333,0.666667
566,51.8,81.8,14.0,6.69,1.186944,0.769231,14.87,14.67,190.0,266.0,...,3.465800,2.800194,4.670804,3.645599,3.315186,0.0,1.0,0.0,0.333333,0.611111
567,63.1,84.9,15.1,7.05,0.877193,0.257069,9.05,22.23,380.0,197.6,...,1.745647,1.865882,3.505180,2.372236,2.343328,0.0,0.0,1.0,0.333333,0.277778
568,68.2,89.7,12.7,7.11,0.846805,0.401606,7.86,26.84,440.8,129.2,...,3.534070,0.828837,3.122840,2.495249,2.364201,1.0,0.0,1.0,0.666667,0.444444


In [21]:
agent1_test_df = test_df.filter(regex=("agent_1_.*"))
agent2_test_df = test_df.filter(regex=("agent_2_.*"))

In [22]:
train_df = train_df.reset_index(drop=True)
train_df

Unnamed: 0,category,agent_1_feat_Possession%,agent_1_feat_Pass%,agent_1_feat_AerialsWon,agent_1_feat_Rating,agent_1_feat_XGrealiz,agent_1_feat_XGArealiz,agent_1_feat_PPDA,agent_1_feat_OPPDA,agent_1_feat_DC,...,agent_2_feattotal_xg_1,agent_2_feattotal_xg_mean_3,agent_2_feattotal_xg_mean,agent_2_featboth_scored_3,agent_2_featboth_scored_2,agent_2_featboth_scored_1,agent_2_featboth_scored_mean_3,agent_2_featboth_scored_mean,expected_target1,expected_target2
0,0,44.0,70.3,25.1,6.79,0.711201,0.915529,10.74,9.43,218.0,...,1.608046,2.112304,1.608046,0.578947,0.578947,1.0,0.719298,1.000000,0,0
1,0,57.0,84.6,15.9,7.07,1.094698,0.938272,7.57,13.92,575.0,...,2.479335,2.214160,2.479335,0.526316,0.526316,1.0,0.684211,1.000000,0,1
2,1,48.1,76.9,17.7,6.74,0.994530,1.235052,9.77,8.24,175.0,...,1.712261,2.183093,1.712261,0.526316,0.526316,1.0,0.684211,1.000000,3,3
3,0,50.7,82.1,14.4,6.86,1.124694,0.875939,11.79,10.66,156.0,...,1.331644,2.260683,1.331644,0.368421,0.368421,0.0,0.245614,0.000000,3,0
4,1,46.7,74.6,22.9,6.85,0.942386,0.818815,11.21,8.03,217.0,...,2.884400,2.536700,2.884400,0.421053,0.421053,1.0,0.614035,1.000000,1,3
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2098,1,41.6,76.0,17.1,6.62,1.046406,1.032989,18.00,8.27,138.0,...,3.684860,4.024907,3.872622,1.000000,0.000000,0.0,0.333333,0.444444,1,2
2099,1,42.9,76.1,18.3,6.61,1.161802,1.066236,16.14,7.60,201.0,...,1.568175,2.000313,2.572016,0.000000,0.000000,0.0,0.000000,0.444444,2,3
2100,0,41.0,72.2,19.1,6.51,1.000858,1.026472,15.99,7.99,164.0,...,3.871643,2.496854,2.555157,0.000000,0.000000,1.0,0.333333,0.500000,0,5
2101,1,51.4,79.3,14.1,6.62,1.037986,1.161401,9.73,10.47,222.0,...,4.904164,2.977092,2.495116,1.000000,0.000000,0.0,0.333333,0.222222,1,3


In [23]:
Y = train_df['category']

In [24]:
agent1_train_df = train_df.filter(regex=("agent_1_.*"))
agent2_train_df = train_df.filter(regex=("agent_2_.*"))
agent1_train_df = pd.concat([agent1_train_df, train_df['expected_target1']], axis=1)
agent2_train_df = pd.concat([agent2_train_df, train_df['expected_target2']], axis=1)

## Split dataset on train and test

In [25]:
X1 = agent1_train_df.drop('expected_target1', axis=1)
Y1 = agent1_train_df['expected_target1']
X_train1, X_test1, y_train1, y_test1 = (X1.iloc[0:int(len(X1)*0.8)], 
                                    X1.iloc[int(len(X1)*0.8):len(X1)-1], 
                                    Y1.iloc[0:int(len(Y1)*0.8)], 
                                    Y1.iloc[int(len(Y1)*0.8):len(Y1)-1])

In [26]:
X_train1.shape, X_test1.shape, y_train1.shape, y_test1.shape

((1682, 117), (420, 117), (1682,), (420,))

In [27]:
X2 = agent2_train_df.drop('expected_target2', axis=1)
Y2 = agent2_train_df['expected_target2']
X_train2, X_test2, y_train2, y_test2 = (X2.iloc[0:int(len(X2)*0.8)], 
                                    X2.iloc[int(len(X2)*0.8):len(X2)-1], 
                                    Y2.iloc[0:int(len(Y2)*0.8)], 
                                    Y2.iloc[int(len(Y2)*0.8):len(Y2)-1])

In [28]:
X_train2.shape, X_test2.shape, y_train2.shape, y_test2.shape

((1682, 117), (420, 117), (1682,), (420,))

## Scale data

In [29]:
ss1 = StandardScaler()
X_train1 = ss1.fit_transform(X_train1)
X_test1 = ss1.transform(X_test1)
agent1_test_df = ss1.transform(agent1_test_df)

In [30]:
ss2 = StandardScaler()
X_train2 = ss2.fit_transform(X_train2)
X_test2 = ss2.transform(X_test2)
agent2_test_df = ss1.transform(agent2_test_df)

# **Model**

In [31]:
batch_size = [32, 64, 128, 256]
epochs = [50, 75, 150]
optimizer = ['SGD', 'Adam']
learning_rate = [0.001, 0.0001]
param_opt = dict(batch_size=batch_size, epochs=epochs, learning_rate=learning_rate, optimizer = optimizer)

In [32]:
def create_model(batch_size, epochs, learning_rate, optimizer, num_classes, input_shape):
    model = Sequential()
    model.add(Dense(128, activation="relu", input_shape=(input_shape, ))) # Hidden Layer 1

    model.add(Dense(64, activation="relu")) # Hidden Layer 2
    model.add(Dropout(0.2))

    model.add(Dense(64, activation="relu")) # Hidden Layer 3
    model.add(Dropout(0.2))

    model.add(Dense(32, activation="relu")) # Hidden Layer 4
    model.add(Dropout(0.2))
    
    model.add(Dense(num_classes, activation="softmax")) # Outout Layer
    
#     opt = optimizers.Adam(learning_rate=learning_rate)

    model.compile(optimizer='adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])

    return model

In [33]:
# model_GridSearch = KerasClassifier(build_fn=create_model, verbose=0)
# grid = GridSearchCV(estimator=model_GridSearch, param_grid=param_opt, n_jobs=1, cv=3, verbose = 0)
# grid_result = grid.fit(X_train, y_train)

In [34]:
# print('Best parameters are: ')
# print('batch_size: ' + str(grid_result.best_params_['batch_size']))
# print('epochs: ' + str(grid_result.best_params_['epochs']))
# print('optimizer: ' + str(grid_result.best_params_['optimizer']))
# print('learning_rate: ' + str(grid_result.best_params_['learning_rate']))

In [35]:
# batch_size = grid_result.best_params_['batch_size']
# epochs = grid_result.best_params_['epochs']
# learning_rate = grid_result.best_params_['learning_rate']
# optimizer = grid_result.best_params_['optimizer']

# Model for Agent1

In [36]:
batch_size = 32
epochs = 50
learning_rate = 0.001
optimizer = 'Adam'

In [37]:
y_train1 = pd.get_dummies(y_train1)
y_test1 = pd.get_dummies(y_test1)

In [38]:
dif = len(y_train1.columns) - len(y_test1.columns)
if dif == 0:
    print('finish')
elif dif > 0:
    while dif != 0:
        y_test1[int(y_test1.columns[-1])+1] = 0
        dif -= 1
else:
    while dif != 0:
        y_train1[int(y_train1.columns[-1])+1] = 0
        dif += 1

In [39]:
y_train1

Unnamed: 0,0,1,2,3,4,5,6,7,8
0,1,0,0,0,0,0,0,0,0
1,1,0,0,0,0,0,0,0,0
2,0,0,0,1,0,0,0,0,0
3,0,0,0,1,0,0,0,0,0
4,0,1,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...
1677,0,1,0,0,0,0,0,0,0
1678,0,1,0,0,0,0,0,0,0
1679,0,1,0,0,0,0,0,0,0
1680,0,1,0,0,0,0,0,0,0


In [40]:
y_test1

Unnamed: 0,0,1,2,3,4,5,6,7,8
1682,0,0,1,0,0,0,0,0,0
1683,1,0,0,0,0,0,0,0,0
1684,0,1,0,0,0,0,0,0,0
1685,0,0,1,0,0,0,0,0,0
1686,0,1,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...
2097,0,0,1,0,0,0,0,0,0
2098,0,1,0,0,0,0,0,0,0
2099,0,0,1,0,0,0,0,0,0
2100,1,0,0,0,0,0,0,0,0


In [41]:
num_classes = agent1_train_df['expected_target1'].nunique()
model1 = create_model(batch_size, epochs, learning_rate, optimizer, num_classes, X_train1.shape[1])

In [42]:
history1 = model1.fit(X_train1, y_train1, batch_size = batch_size, epochs = epochs, shuffle = True)

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50


In [43]:
validation_loss, validation_accuracy = model1.evaluate(X_test1, y_test1, batch_size=batch_size)
print("Loss: "+ str(np.round(validation_loss, 3)))
print("Accuracy: "+ str(np.round(validation_accuracy, 3)))

Loss: 5.047
Accuracy: 0.298


# Model for agent2

In [44]:
batch_size = 32
epochs = 50
learning_rate = 0.001
optimizer = 'Adam'

In [45]:
y_train2 = pd.get_dummies(y_train2)
y_test2 = pd.get_dummies(y_test2)

In [46]:
dif = len(y_train2.columns) - len(y_test2.columns)
if dif == 0:
    print('finish')
elif dif > 0:
    while dif != 0:
        y_test2[int(y_test2.columns[-1])+1] = 0
        dif -= 1
else:
    while dif != 0:
        y_train2[int(y_train2.columns[-1])+1] = 0
        dif += 1

In [47]:
y_train2

Unnamed: 0,0,1,2,3,4,5,6,7,8
0,1,0,0,0,0,0,0,0,0
1,0,1,0,0,0,0,0,0,0
2,0,0,0,1,0,0,0,0,0
3,1,0,0,0,0,0,0,0,0
4,0,0,0,1,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...
1677,0,0,1,0,0,0,0,0,0
1678,0,0,0,0,1,0,0,0,0
1679,1,0,0,0,0,0,0,0,0
1680,1,0,0,0,0,0,0,0,0


In [48]:
y_test2

Unnamed: 0,0,1,2,3,4,5,6,7,9
1682,0,1,0,0,0,0,0,0,0
1683,1,0,0,0,0,0,0,0,0
1684,1,0,0,0,0,0,0,0,0
1685,0,1,0,0,0,0,0,0,0
1686,0,1,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...
2097,1,0,0,0,0,0,0,0,0
2098,0,0,1,0,0,0,0,0,0
2099,0,0,0,1,0,0,0,0,0
2100,0,0,0,0,0,1,0,0,0


In [49]:
num_classes = agent2_train_df['expected_target2'].nunique()
model2 = create_model(batch_size, epochs, learning_rate, optimizer, num_classes, X_train2.shape[1])

In [50]:
history2 = model2.fit(X_train2, y_train2, batch_size = batch_size, epochs = epochs, shuffle = True)

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50


In [51]:
validation_loss, validation_accuracy = model1.evaluate(X_test2, y_test2, batch_size=batch_size)
print("Loss: "+ str(np.round(validation_loss, 3)))
print("Accuracy: "+ str(np.round(validation_accuracy, 3)))

Loss: 5.536
Accuracy: 0.271


# Make a submission

In [52]:
Answer = np.round(model.predict(test_df), 0)
sample_submission = pd.read_csv('data/sample_submission.csv')
sample_submission['tmp'] = Answer
sample_submission.drop(['category'], axis = 1, inplace= True)
sample_submission = sample_submission.rename(columns={"tmp": "category"})
print(sample_submission.head())
sample_submission.to_csv('Answer.csv', index = False)

NameError: name 'model' is not defined