# Brief EDA with a basic Neural Network

This notebook performs a short Exploratory Data Analysis (EDA) on the given data and introduces a basic Artificial Neural Network whose performance is then evaluated.

In [None]:
# Import all the necessary modules
# (standalone `keras` imports are replaced by `tf.keras` to avoid mixing two Keras installations;
#  unused imports are dropped)

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import f1_score

import tensorflow as tf
from tensorflow.keras import regularizers

# Suppress warnings to keep the output readable
import warnings
warnings.filterwarnings('ignore')

%matplotlib inline

## A brief Exploratory Data Analysis

In the following, we briefly look at the properties of the dataset: its columns and rows, missing values, and class imbalance.
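The imbalance check can be made explicit with relative class frequencies. The following is a minimal sketch on hypothetical toy labels standing in for `df["label"]`; the names `toy`, `class_share` and `imbalance_ratio` are illustrative and not part of the notebook.

```python
import pandas as pd

# Hypothetical toy labels standing in for df["label"]; the real data lives in data/Train.csv.
toy = pd.DataFrame({"label": [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]})

# Share of each class: a strongly uneven distribution indicates class imbalance.
class_share = toy["label"].value_counts(normalize=True).sort_index()
print(class_share)

# Ratio between the most and the least frequent class as a single imbalance indicator.
imbalance_ratio = class_share.max() / class_share.min()
print(f"imbalance ratio: {imbalance_ratio:.1f}")
```

A ratio close to 1 means balanced classes; large values suggest that accuracy alone may be misleading.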

In [None]:
df_original = pd.read_csv("data/Train_original.csv")
df = pd.read_csv("data/Train.csv")
df1 = pd.read_csv("data/Train_Dataset1.csv")
df2 = pd.read_csv("data/Train_Dataset2.csv")
df3 = pd.read_csv("data/Train_Dataset3.csv")
df4 = pd.read_csv("data/Train_Dataset4.csv")

In [None]:
df.columns

In [None]:
df.head()

In [None]:
df.describe()

In [None]:
df.info()

In [None]:
df.shape

In [None]:
df.isnull().sum()

In [None]:
sns.countplot(x="label", color = 'green', data=df)

In [None]:
df_original['date'] = pd.to_datetime(df_original['date'])
df_original['month'] = df_original['date'].dt.month

Boxplots are created to show how the values of each spectral band are distributed for each label, where the labels represent the crop types.
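A boxplot summarizes each group by its quartiles. As a rough sketch of the numbers behind one panel, the per-label quartiles can be computed with pandas; the toy data below (one band `B02` and two labels) is hypothetical and not taken from the dataset.

```python
import pandas as pd

# Hypothetical toy data: one band ("B02") and a crop-type label; not the real dataset.
toy = pd.DataFrame({
    "label": [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
    "B02":   [10, 12, 14, 16, 18, 30, 32, 34, 36, 38],
})

# Per label, a boxplot draws the median, a box from Q1 to Q3, and (by default) whiskers
# reaching the furthest data point within 1.5 * IQR of the box.
quartiles = toy.groupby("label")["B02"].quantile([0.25, 0.5, 0.75]).unstack()
quartiles.columns = ["Q1", "median", "Q3"]
quartiles["IQR"] = quartiles["Q3"] - quartiles["Q1"]
print(quartiles)
```

Non-overlapping boxes between labels, as for the two toy labels here, hint that a band helps to separate those crop types.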

In [None]:
# One boxplot per band, showing the distribution of band values for each label
bands = ['B02', 'B03', 'B04', 'B08', 'B11', 'B12']

plt.figure(figsize=(20, 15))
for i, band in enumerate(bands, start=1):
    plt.subplot(2, 3, i)
    sns.boxplot(x='label', y=band, data=df_original, palette='viridis')

In [None]:
#df=df[df.CLM != 255.0]
#df.date.max()
#df.date.min()
#df.field_id.unique()
#df['date'] = pd.to_datetime(df["date"])
#df['month'] = df['date'].dt.month
#df.CLM.value_counts()

## Modelling

* First, the features are stored in `X` and the target in `y`; the feature values are used to predict the target.
* Afterwards, the dataset is split into training and test sets, and the features are scaled.

In [None]:
X = df[['B02_04', 'B02_05', 'B02_06', 'B02_07', 'B02_08', 'B02_09',
       'B02_10', 'B02_11', 'B03_04', 'B03_05', 'B03_06', 'B03_07', 'B03_08',
       'B03_09', 'B03_10', 'B03_11', 'B04_04', 'B04_05', 'B04_06', 'B04_07',
       'B04_08', 'B04_09', 'B04_10', 'B04_11', 'B08_04', 'B08_05', 'B08_06',
       'B08_07', 'B08_08', 'B08_09', 'B08_10', 'B08_11', 'B11_04', 'B11_05',
       'B11_06', 'B11_07', 'B11_08', 'B11_09', 'B11_10', 'B11_11', 'B12_04',
       'B12_05', 'B12_06', 'B12_07', 'B12_08', 'B12_09', 'B12_10', 'B12_11',
       'NDVI_04', 'NDVI_05', 'NDVI_06', 'NDVI_07', 'NDVI_08', 'NDVI_09',
       'NDVI_10', 'NDVI_11', 'WET_04', 'WET_05', 'WET_06', 'WET_07', 'WET_08',
       'WET_09', 'WET_10', 'WET_11', 'PVR_04', 'PVR_05', 'PVR_06', 'PVR_07',
       'PVR_08', 'PVR_09', 'PVR_10', 'PVR_11']]
y = df.label
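The 72 feature names follow the pattern `<band or index>_<month>`. A minimal sketch that rebuilds the same list from that pattern, assuming months 04 to 11 and the nine prefixes used above (`prefixes`, `months` and `feature_cols` are illustrative names):

```python
# Rebuild the 72 feature names from their naming pattern "<band or index>_<month>",
# assuming months 04 to 11 as in the explicit list above.
prefixes = ["B02", "B03", "B04", "B08", "B11", "B12", "NDVI", "WET", "PVR"]
months = [f"{m:02d}" for m in range(4, 12)]

feature_cols = [f"{p}_{m}" for p in prefixes for m in months]
print(len(feature_cols))  # 9 prefixes x 8 months
print(feature_cols[:3])

# With the real data this list could be used as: X = df[feature_cols]
```

The order matches the explicit list, so `df[feature_cols]` would select the same columns in the same order.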

In [None]:
# Split the data into a training set, used to fit the model,
# and a test set, used to estimate the model's performance on unseen data.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=150, shuffle=True)
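Since the EDA checks for class imbalance, a stratified split may be worth considering; the split above does not use one. This sketch on hypothetical imbalanced toy data shows that `stratify=` keeps the class proportions equal in both parts:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced toy data (not the real dataset): 80 samples of class 0, 20 of class 1.
X_toy = np.arange(100).reshape(-1, 1)
y_toy = np.array([0] * 80 + [1] * 20)

# stratify=y_toy keeps the class proportions identical in the train and test parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=150, shuffle=True, stratify=y_toy
)
print(y_tr.mean(), y_te.mean())  # share of class 1 in each split
```

With the real data, the same effect would come from passing `stratify=y` to the call above.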

In [None]:
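# Fit the scaler on the training data only, then apply it to both splits.
# X_train and X_test are overwritten so that all later cells use the standardized features.
scaler = StandardScaler()
X_train = pd.DataFrame(scaler.fit_transform(X_train), columns=X_train.columns, index=X_train.index)
X_test = pd.DataFrame(scaler.transform(X_test), columns=X_test.columns, index=X_test.index)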
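`StandardScaler` computes a z-score per feature, using the mean and standard deviation learned from the training set. A small sketch on hypothetical toy features that checks the scaler against the formula by hand:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy features (not the real bands): 4 training rows, 2 features.
X_tr = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
X_te = np.array([[4.0, 10.0]])

scaler = StandardScaler().fit(X_tr)  # mean and std are learned from the training set only

# StandardScaler computes z = (x - mean) / std, using the population std (ddof=0).
z_manual = (X_te - X_tr.mean(axis=0)) / X_tr.std(axis=0)
print(scaler.transform(X_te), z_manual)
```

Fitting on the training set only keeps test-set statistics out of the preprocessing, so the test score stays an honest estimate.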

In the following, a neural network with L2 regularization and several layers is created; the number of layers and nodes can be adjusted to the available computing power. The width of the model grows with the number of nodes per layer, its depth with the number of layers, and the arrangement of layers and nodes constitutes the architecture of the network. An optional second network additionally uses dropout.
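The L2 regularization used here, `regularizers.l2(0.02)`, adds `0.02 * sum(w**2)` of each regularized kernel to the training loss, which pushes the weights towards small values. A NumPy sketch of that penalty summed over several layers; the kernels and the helper `l2_penalty` are hypothetical:

```python
import numpy as np

def l2_penalty(kernels, factor=0.02):
    """Total L2 penalty over several layer kernels: factor * sum(w**2), summed per layer."""
    return sum(factor * np.sum(np.square(w)) for w in kernels)

# Two small hypothetical kernels standing in for Dense-layer weights.
k1 = np.array([[0.5, -1.0], [2.0, 0.0]])
k2 = np.array([[1.0], [-1.0]])

print(l2_penalty([k1, k2]))  # 0.02 * (5.25 + 2.0)
```

With four regularized layers of 1000 nodes each, this term can be a sizeable part of the reported loss.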

A node (also called a neuron) is a computational unit with weighted input connections, an activation (transfer) function, and an output connection. Nodes are organized into layers, which together make up the network. A network with multiple layers of such nodes is also called a Multilayer Perceptron (MLP).
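As a sketch of what one fully connected layer of such nodes computes, here is a NumPy forward pass with the ReLU activation; the sizes and weights are hypothetical, chosen to be small enough to check by hand:

```python
import numpy as np

def dense_relu(x, W, b):
    """One fully connected layer: weighted sum of the inputs plus bias, then ReLU."""
    return np.maximum(0.0, x @ W + b)

# Hypothetical sizes: 3 input features feeding a layer of 2 nodes.
x = np.array([1.0, 2.0, 3.0])
W = np.array([[0.2, -0.5],
              [0.1,  0.3],
              [-0.4, 0.2]])
b = np.array([0.5, 0.0])

print(dense_relu(x, W, b))
```

Each column of `W` holds one node's input weights; stacking such layers gives the multilayer network described above.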

The input layer must match the number of input features, which in this notebook changes from dataset to dataset. It is specified through the `shape` argument of the input, here `(72,)` by default, i.e. 72 input variables.

> Properties of the network:
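* "Input" layer: receives the input features
* "Dense" layers: fully connected layers using the ReLU nonlinearity
* "Hidden" layers: layers between input and output, i.e. not directly connected to the inputs or outputs
* "Output" layer: a layer of nodes that produces the output variables (here with a softmax activation, giving class probabilities)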


When compiling, the loss function, the optimizer (which updates the weights to reduce the loss), and the evaluation metric must be specified. The first model uses cross-entropy as the loss, which is suitable for classification problems such as this one. As the labels are integer-encoded crop types, the Keras loss `sparse_categorical_crossentropy` is used; `binary_crossentropy` would only apply to a two-class problem.
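Sparse categorical cross-entropy is the mean of `-log(p)`, where `p` is the probability the model assigns to the true class of each sample. A NumPy sketch on hypothetical softmax outputs; the small clipping constant is an assumption that mirrors, in spirit, how Keras avoids `log(0)`:

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, probs):
    """Mean of -log(probability assigned to the true class); y_true holds integer class ids."""
    eps = 1e-7  # avoid log(0)
    p_true = probs[np.arange(len(y_true)), y_true]
    return float(np.mean(-np.log(np.clip(p_true, eps, 1.0))))

# Hypothetical softmax outputs for 2 samples and 3 classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
y_true = np.array([0, 1])

print(sparse_categorical_crossentropy(y_true, probs))  # (-log 0.7 - log 0.8) / 2
```

The "sparse" variant takes integer labels directly, so no one-hot encoding of `y` is needed.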

In [None]:
train_n = len(X_train)
# Batch size: number of samples processed before the model's weights are updated.
batch_size = 100
# Epoch: One pass through all of the rows in the training dataset.
epochs = 100
STEPS_PER_EPOCH = train_n // batch_size

lr_schedule = tf.keras.optimizers.schedules.InverseTimeDecay(0.01, decay_steps=STEPS_PER_EPOCH*1000, decay_rate=1, staircase=False)
optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule, name='Adam')
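With `staircase=False`, `InverseTimeDecay` sets the learning rate to `initial_lr / (1 + decay_rate * step / decay_steps)`. A plain-Python sketch of that formula; `steps_per_epoch = 50` is a hypothetical stand-in for `STEPS_PER_EPOCH`:

```python
def inverse_time_decay(step, initial_lr=0.01, decay_steps=1000, decay_rate=1.0):
    """Learning rate of InverseTimeDecay with staircase=False:
    initial_lr / (1 + decay_rate * step / decay_steps)."""
    return initial_lr / (1.0 + decay_rate * step / decay_steps)

# Hypothetical value; in the notebook it is train_n // batch_size.
steps_per_epoch = 50
decay_steps = steps_per_epoch * 1000

for epoch in (0, 100, 1000):
    lr = inverse_time_decay(epoch * steps_per_epoch, decay_steps=decay_steps)
    print(f"epoch {epoch:>4}: lr = {lr:.5f}")
```

With `decay_steps=STEPS_PER_EPOCH*1000`, the rate halves only after about 1000 epochs, so over the 100 training epochs it drops by less than 10%.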

In [None]:
# Number of output classes; assumes integer labels 0..K-1, as required by sparse_categorical_crossentropy
n_classes = int(y.max()) + 1

inp = tf.keras.Input(shape=(72,))
a = tf.keras.layers.Dense(1000, activation='relu', kernel_regularizer=regularizers.l2(0.02))(inp)
b = tf.keras.layers.Dense(1000, activation='relu', kernel_regularizer=regularizers.l2(0.02))(a)
c = tf.keras.layers.Dense(1000, activation='relu', kernel_regularizer=regularizers.l2(0.02))(b)
d = tf.keras.layers.Dense(1000, activation='relu', kernel_regularizer=regularizers.l2(0.02))(c)
out = tf.keras.layers.Dense(n_classes, activation='softmax')(d)

model = tf.keras.Model(inp, out)
model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
history = model.fit(X_train, y_train, validation_split=0.25, batch_size=batch_size, epochs=epochs)

In [None]:
model_two = tf.keras.Sequential([
    tf.keras.Input(shape=(72,)),
    tf.keras.layers.Dense(72, activation='relu', kernel_regularizer=regularizers.l2(0.02)),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1000, activation='relu', kernel_regularizer=regularizers.l2(0.02)),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1000, activation='relu', kernel_regularizer=regularizers.l2(0.02)),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1000, activation='relu', kernel_regularizer=regularizers.l2(0.02)),
    tf.keras.layers.Dropout(0.2),
    # One output per class; assumes integer labels 0..K-1
    tf.keras.layers.Dense(int(y.max()) + 1, activation='softmax')
])

# Loss: mean absolute error (MAE), the average of the absolute differences between predicted and actual values.
# Metric: mean squared error (MSE), the average of the squared differences between predicted and actual values.
# Both are regression measures; for a softmax classifier, cross-entropy is the usual choice.
# A fresh Adam instance is used because one optimizer instance should not be shared between two models.
model_two.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr_schedule), loss='mae', metrics=['mse'])
history_two = model_two.fit(X_train, y_train, validation_split=0.25, batch_size=batch_size, epochs=epochs)
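The second model differs mainly by its `Dropout(0.2)` layers. Keras uses inverted dropout: during training each activation is zeroed with probability `rate` and the survivors are scaled by `1 / (1 - rate)`, while at inference the layer passes inputs through unchanged. A NumPy sketch of that behaviour; the helper `inverted_dropout` and the all-ones activations are hypothetical:

```python
import numpy as np

def inverted_dropout(x, rate=0.2, training=True, rng=None):
    """Zero each activation with probability `rate` and rescale survivors by 1/(1 - rate)
    during training; return the input unchanged at inference time."""
    if not training:
        return x
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)

rng = np.random.default_rng(0)
acts = np.ones((1000, 100))  # hypothetical layer activations

dropped = inverted_dropout(acts, rate=0.2, rng=rng)
print((dropped == 0).mean())  # roughly 0.2 of the units are zeroed
print(dropped.mean())         # roughly 1.0: the rescaling preserves the expected value
```

Because the rescaling happens during training, no correction is needed when the model is evaluated.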

In [None]:
model.summary()

In [None]:
# model_two.summary()

## Model Evaluation

The model's performance is evaluated in terms of accuracy, loss, and F1-score; accuracy and loss are also plotted over the training epochs.

> The `evaluate()` function generates a prediction for each input, compares it with the corresponding target, and collects the scores: the average loss and the chosen metric, which in this case is accuracy. It returns a list with two values: the first is the model's loss on the dataset (including the L2 regularization penalty), the second is its accuracy on the dataset.

In [None]:
score = model.evaluate(X_test, y_test, verbose = 0) 
print('Test loss:', score[0]) 
print('Test accuracy:', score[1])

In [None]:
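# evaluate() returns [loss, accuracy]; keep only the accuracy for the comparison
train_loss, train_acc = model.evaluate(X_train, y_train, verbose=0)
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)

train_acc, test_acc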

In [None]:
plt.figure(figsize = (10, 10))
plt.subplot(211)
plt.title('Loss Diagram')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.plot(range(epochs), history.history['loss'], label = 'Training loss')
plt.plot(range(epochs), history.history['val_loss'], label = 'Validation loss')
plt.legend()

plt.subplot(212)
plt.title('Accuracy Diagram')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.plot(history.history['accuracy'], label='Training accuracy')
plt.plot(history.history['val_accuracy'], label='Validation accuracy')
plt.legend()

plt.show()

In [None]:
# Making predictions and getting the F1 score.

y_predict = np.argmax(model.predict(X_test), axis=-1)
f1_score(y_test, y_predict,average='micro')
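For a single-label multi-class problem, the micro-averaged F1-score always equals accuracy, because micro-averaging pools all true positives, false positives and false negatives. The macro average weights every class equally and reacts more strongly to poorly predicted minority classes. A sketch on hypothetical labels and predictions:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical labels and predictions for an imbalanced 3-class problem.
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 0, 0, 0, 0, 0, 1, 0, 2])

acc = accuracy_score(y_true, y_pred)
f1_micro = f1_score(y_true, y_pred, average="micro")
f1_macro = f1_score(y_true, y_pred, average="macro")
print(acc, f1_micro, f1_macro)
```

Given the class imbalance checked in the EDA, `average='macro'` may therefore be the more informative summary alongside accuracy.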

## Further Analysis

In this further analysis of the second model, the data is first converted into NumPy arrays; the second model is then trained and its MSE over the epochs is plotted.
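The second model is compiled with `loss='mae'` and tracks `metrics=['mse']`. A small NumPy sketch on hypothetical values shows how the two differ: squaring makes MSE weigh large errors much more heavily than MAE.

```python
import numpy as np

# Hypothetical predictions and targets.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 2.0, 2.0, 6.0])

errors = y_pred - y_true
mae = np.mean(np.abs(errors))   # mean absolute error
mse = np.mean(errors ** 2)      # mean squared error
print(mae, mse)
```

The single error of 2.0 contributes 4.0 to the squared sum but only 2.0 to the absolute sum.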

In [None]:
X_train = X_train.values
X_test = X_test.values
y_train = y_train.values
y_test = y_test.values

In [None]:
# Train the second model with the same batch size as the first one and record its history
history_two = model_two.fit(X_train, y_train, validation_split=0.25, verbose=0, batch_size=batch_size, epochs=epochs)

In [None]:
plt.plot(history_two.history['mse'])
plt.plot(history_two.history['val_mse'])
plt.title('Model MSE')
plt.ylabel('MSE')
plt.xlabel('Epoch')
plt.legend(['Training', 'Validation'])
plt.show()