# Introduction

I use the “glass_data.csv” dataset to carry out the following analysis with a neural network:

Performing an initial exploratory data analysis (EDA) of the data

Implementing the necessary data preparation

Creating and running a neural network that classifies samples by the Type of glass feature (the class attribute)

Testing and improving the model using various configurations of neurons, layers, loss functions and activation functions

Classifying the test data with the final neural network configuration

# Exploratory Data Analysis(EDA)

EDA is the process of examining the data to understand it and to extract its insights and main characteristics. It is divided into two kinds of methods: graphical analysis and non-graphical analysis. Histograms, box plots, scatter plots and other charts are used for plotting in EDA. Exploring the data usually takes a lot of time, but it makes it possible to define the problem statement and, most importantly, to characterize the data set.

As the first step of the analysis, EDA helps to gain insight into a data set, understand its underlying structure, extract the important parameters and the relationships between them, and test underlying assumptions.

##### The main purpose of EDA is 
Assessing the data distribution

Managing missing values of the dataset (it is a very common issue with most datasets)

Eliminating duplicate data

Managing outliers

Encoding the categorical variables

Normalizing and scaling


## Step 1

###  DATA PREPARATION

#### Importing Libraries 

First, all the necessary Python libraries are imported.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score
from collections import Counter
from IPython.display import display
from keras.utils import to_categorical
from keras.layers import Activation, Flatten, Dropout, BatchNormalization, Input
import plotly.express as px
from sklearn.metrics import confusion_matrix
from keras.wrappers.scikit_learn import KerasClassifier
import keras
from sklearn.model_selection import GridSearchCV
from keras.models import Sequential # Neural network library
from keras.layers import Dense # layer library
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold
from sklearn.decomposition import PCA
from mpl_toolkits.mplot3d import Axes3D
from sklearn.model_selection import train_test_split
#from sklearn.preprocessing import StandardScaler
sns.set_style('darkgrid')

## Step 2

#### Reading Data

A very important step in EDA is loading the data into a pandas data frame. The pandas function read_csv() is used to read the CSV file.

In [None]:
dataset = pd.read_csv('Database/glass_data.csv')

The first five observations from the data set are returned using “.head()” function of the pandas library.

In [None]:
dataset.head()

#### Load the dataset and assign column names
As we can observe, the columns don’t have names, so we reload the dataset and assign the column names.

In [None]:
dataset = pd.read_csv('Database/glass_data.csv', names=['Id_number', 'RI', 'Na', 'Mg', 'Al', 'Si', 'K', 'Ca', 'Ba', 'Fe', 'Type_of_glass' ])
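
The effect of the names argument can be seen on a small in-memory sample. This is only a sketch: the two rows below are an illustrative stand-in for the header-less file, and the variable names `raw`, `without_names` and `with_names` are assumptions made for this example.

```python
import io
import pandas as pd

# Illustrative two-row sample laid out like the header-less CSV file
raw = (
    "1,1.52101,13.64,4.49,1.10,71.78,0.06,8.75,0.0,0.0,1\n"
    "2,1.51761,13.89,3.60,1.36,72.73,0.48,7.83,0.0,0.0,1\n"
)
columns = ['Id_number', 'RI', 'Na', 'Mg', 'Al', 'Si', 'K', 'Ca', 'Ba', 'Fe', 'Type_of_glass']

# Without names, pandas consumes the first data row as the header
without_names = pd.read_csv(io.StringIO(raw))

# With names, every row is kept as data and the columns get readable labels
with_names = pd.read_csv(io.StringIO(raw), names=columns)

print(without_names.shape)
print(with_names.shape)
```

Reading without names silently loses one observation to the header, which is why the dataset is reloaded with explicit column names.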

The first five observations from the data set are returned using “.head()” function of the pandas library.

In [None]:
dataset.head()

The last five observations from the data set are returned using “.tail()” function of the pandas library.

In [None]:
dataset.tail()

## Step 3

#### Observe the dimensions

The dimensions of the data i.e. total number of rows and columns can be observed using shape.

In [None]:
dataset.shape

## Step 4

#### Checking the data types

The info() function is used to check which columns the data contains, their data types, and how many non-null values each column has.

In [None]:
print('\n',"_"*110,'\n')
print(dataset.info())
print('\n',"_"*110,'\n')

After inspecting the above output, it can be concluded that the data consists of 9 float columns and 2 integer columns, and that all columns are non-null, which means there are no empty or missing values.

## Step 5

The describe() method provides the count, mean, standard deviation, minimum and maximum values, and the quartiles of the data.

In [None]:
dataset.describe().T

## Step 6

#### Handling missing values

Missing values in the dataset need to be handled. As it turns out, there are no missing values in this dataset, but that is rarely the case in the real world.

We check for the presence of any null values and print their count per column.

In [None]:
print('\n',"_"*110,'\n')
print(dataset.isnull().sum())
print('\n',"_"*110,'\n')
print('Number of Null Values: ',dataset.isnull().sum().sum())
print('\n',"_"*110)

We can now see that there are no null values in our dataset.

## Step 7

To check whether any glass type is wrong or missing, we group the data by type of glass.

In [None]:
dataset.groupby(['Type_of_glass']).mean()

The unique() function returns the unique values of the Type_of_glass column.

In [None]:
dataset.Type_of_glass.unique()

This confirms that type 4 is missing from the Type_of_glass column.

Next we look at the number of observations for each type of glass.


In [None]:
dataset.Type_of_glass.value_counts()

## Step 8

#### Checking for duplicate values

We check for duplicate rows in our dataset, as they can distort the accuracy of our ML model.


In [None]:
duplicate = dataset.duplicated()
print('\n',"_"*110,'\n')
print("Number of Duplicate Values: ",duplicate.sum())
print('\n',"_"*110,'\n')
dataset[duplicate]

Now we know that there are no fully duplicated rows in our dataset.

However, there could be duplicate entries with different IDs (normally consecutive), which the method above does not recognise as duplicates because of the different ID. We therefore select duplicate rows based on a list of column names that excludes the ID.


In [None]:
duplicate_rows = dataset[dataset.duplicated(['RI', 'Na', 'Mg', 'Al', 'Si', 'K', 'Ca', 'Ba', 'Fe', 'Type_of_glass'])]
duplicate_rows

Display all rows that share the same values to verify whether their IDs are consecutive.


In [None]:
dataset[dataset.duplicated(['RI', 'Na', 'Mg', 'Al', 'Si', 'K', 'Ca', 'Ba', 'Fe', 'Type_of_glass'], keep=False)]

Now we know that the ID numbers of the duplicate rows are consecutive.
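
The difference between comparing whole rows and comparing only the measurement columns can be sketched on a toy frame. The values and the names `toy` and `measurement_cols` are hypothetical and only serve the illustration.

```python
import pandas as pd

# Hypothetical rows: IDs 2 and 3 carry identical measurements
toy = pd.DataFrame({
    'Id_number': [1, 2, 3],
    'RI': [1.52, 1.51, 1.51],
    'Na': [13.6, 13.9, 13.9],
    'Type_of_glass': [1, 2, 2],
})

# Comparing whole rows finds nothing because the IDs differ
full_row_duplicates = toy.duplicated().sum()

# Comparing only the measurement columns flags the repeated row
measurement_cols = ['RI', 'Na', 'Type_of_glass']
subset_duplicates = toy.duplicated(measurement_cols).sum()

# keep=False marks every member of a duplicate group, not just the repeats
all_members = toy[toy.duplicated(measurement_cols, keep=False)]

print(full_row_duplicates, subset_duplicates)
print(all_members)
```

With keep=False both rows of the pair are shown, which makes it easy to inspect their IDs side by side.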

#### Handling duplicate values

By using drop_duplicates() the duplicate values can be removed.


In [None]:
# print the dataset shape before removing the duplicate rows
print('Dataset shape before removing duplicate rows: ', dataset.shape)

# remove the duplicate rows, ignoring the ID column
dataset.drop_duplicates(['RI', 'Na', 'Mg', 'Al', 'Si', 'K', 'Ca', 'Ba', 'Fe', 'Type_of_glass'],inplace=True)

# print the dataset shape after removing the duplicate rows
print('Dataset shape after removing duplicate rows: ', dataset.shape)

In [None]:
duplicate_rows = dataset[dataset.duplicated(['RI', 'Na', 'Mg', 'Al', 'Si', 'K', 'Ca', 'Ba', 'Fe', 'Type_of_glass'])]
duplicate_rows

We have handled the duplicate values.

## Step 9

#### Removing unnecessary columns

Drop the ID Number column as it is of no use.

In [None]:
# Drop the ID Number as it is of no use.
dataset.drop('Id_number', axis = 1, inplace = True)

In [None]:
dataset.head(3)

Here we can see that the Id_number column was dropped successfully.

## Step 10

#### Handling the outliers

Outliers are the extreme values in the data. They can be identified with box plots and histograms.

First we make a copy of the dataset and replace the numeric values of the ‘Type of glass’ column with the real names.

Syntax: DataFrame.copy(deep=True)

With deep=True (the default), a new object is created with a copy of the calling object’s data and indices. Modifications to the data or indices of the copy are not reflected in the original object.

With deep=False, a new object is created without copying the calling object’s data or index (only references to the data and index are copied). Changes to the data of the original are then reflected in the shallow copy and vice versa.
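
A minimal sketch of the deep-copy behaviour, using a hypothetical three-row frame and shortened labels. Only the deep copy is demonstrated, because the shallow-copy behaviour described above depends on the pandas version (newer releases with Copy-on-Write enabled no longer propagate changes between the two objects).

```python
import pandas as pd

# Hypothetical miniature of the class column
original = pd.DataFrame({'Type_of_glass': [1, 2, 7]})

# deep=True copies the data, so edits to the copy stay in the copy
deep = original.copy(deep=True)
deep['Type_of_glass'] = deep['Type_of_glass'].replace(
    [1, 2, 7], ['Building windows', 'Building windows (non float)', 'Headlamps']
)

print(original['Type_of_glass'].tolist())
print(deep['Type_of_glass'].tolist())
```

The original keeps its numeric codes, which is what allows the notebook to use the labelled copy for plots while the numeric frame remains available for modelling.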

In [None]:
df = dataset.copy(deep=True)
df['Type_of_glass'] = df['Type_of_glass'].replace([1,2,3,4,5,6,7],['Building windows float processed', 'Building windows non float processed', 'Vehicle windows float processed', 'Vehicle windows non float processed (none in this database)', 'Containers', 'Tableware', 'Headlamps'])


Let's group by type of glass and take the mean of each feature.

In [None]:
df.groupby(['Type_of_glass']).mean()

Let's calculate the percentage of each Type_of_glass category.

In [None]:
print('\n',"_"*110,'\n')
print(df.Type_of_glass.value_counts(normalize=True))
print('\n',"_"*110,'\n')

Plot a bar graph of the percentage of each Type_of_glass category.

In [None]:
print('\n',"_"*110,'\n')
df.Type_of_glass.value_counts(normalize=True).plot.barh(figsize= (18,5), fontsize=20, color=['Red','Pink','LightBlue','Purple','Brown','Green'])
plt.show()
print('\n',"_"*110,'\n')

Creating the plot_hist_box function, which draws a histogram and a box plot for a given column.

In [None]:
def plot_hist_box(name,x_lbl):
    fig, axes = plt.subplots(1,2,figsize=(27,5))
    sns.set(font_scale = 2)
    plt.tight_layout()
    sns.histplot(df[name], ax = axes[0])
    axes[0].set_xlabel(x_lbl, fontsize=20)
    axes[0].set_ylabel('Count', fontsize=20)
    axes[0].yaxis.tick_left()

    sns.boxplot(x = dataset['Type_of_glass'], y = name, data = df, hue = 'Type_of_glass', palette="Set1", dodge = False, ax = axes[1])
    axes[1].set_xlabel('Type of glass', fontsize=20)
    axes[1].set_ylabel(x_lbl, fontsize=20)
    axes[1].legend(bbox_to_anchor=(0., 1.02, 1., .102), loc='lower left', ncol=2, mode="expand", borderaxespad=0.)
    axes[1].yaxis.set_label_position("right")
    axes[1].yaxis.tick_right()
    plt.show()

In [None]:
# Refractive Index
plot_hist_box('RI','Refractive Index')

# Sodium
plot_hist_box('Na', 'Sodium')

# Magnesium
plot_hist_box('Mg', 'Magnesium')

# Aluminum
plot_hist_box('Al', 'Aluminum')

# Silicon
plot_hist_box('Si', 'Silicon')

# Potassium
plot_hist_box('K', 'Potassium')

# Calcium
plot_hist_box('Ca', 'Calcium')

# Barium
plot_hist_box('Ba', 'Barium')

# Iron
plot_hist_box('Fe', 'Iron')

Creating a box plot function that can be called again to check the result after handling the outliers.


In [None]:
def boxplot(otlr_df):
    plotlocation = 0
    plt.figure(figsize=(25,15))
    sns.set(font_scale = 2)
    columns = otlr_df.columns.values
    for column in columns:
        plotlocation = plotlocation + 1
        plt.subplot(2,5,plotlocation)
        plt.tight_layout()
        otlr_df.boxplot(column=column)

In [None]:
boxplot(dataset.loc[:,dataset.columns != 'Type_of_glass'])

In the box plots above, the bulk of the data lies within the box and whiskers, while the outliers are shown as small circles beyond the whiskers.

To handle them we can either drop the outlier values or replace them using the IQR (interquartile range) method.

Since we have few data points we cannot afford to drop the outliers, so we replace the outlier values using the IQR.

The IQR is the difference between the 75th and 25th percentiles of the data. The percentiles can be calculated by sorting the values and selecting the values at specific indices. The IQR is used to identify outliers by defining limits on the sample values that lie a factor k of the IQR below the 25th percentile or above the 75th percentile. A common value for the factor k is 1.5.
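
The IQR rule can be worked through on a small hypothetical sample. The sketch below uses made-up values and caps with Series.clip, a compact alternative that is an assumption of this example rather than the approach used in the cells that follow.

```python
import pandas as pd

# Hypothetical sample with one obvious extreme value
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 100])

q1 = values.quantile(0.25)   # 25th percentile
q3 = values.quantile(0.75)   # 75th percentile
iqr = q3 - q1

k = 1.5
lower = q1 - k * iqr
upper = q3 + k * iqr

# Cap values outside [lower, upper] instead of dropping them
capped = values.clip(lower=lower, upper=upper)

print(q1, q3, iqr, lower, upper)
print(capped.tolist())
```

Here the quartiles are 3 and 7, so the limits are -3 and 13 and only the value 100 is pulled back to the upper limit. The number of observations is unchanged.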


Creating a copy of dataset

In [None]:
dataset_outlier_removed = dataset.copy(deep=True)
dataset_outlier_removed.head()

Let's find the IQR (interquartile range) boundaries.

In [None]:
def find_boundaries(df,variables):
    # compute the IQR (interquartile range) and the 1.5 * IQR limits
    Q1 = df[variables].quantile(0.25)
    Q3 = df[variables].quantile(0.75)
    IQR = Q3 - Q1
    lower_boundary = Q1 - (1.5 * IQR)
    upper_boundary = Q3 + (1.5 * IQR)
    
    return lower_boundary, upper_boundary

Finding the lower and upper limits


In [None]:
lower_RI, upper_RI = find_boundaries(dataset_outlier_removed, 'RI')
lower_Na, upper_Na = find_boundaries(dataset_outlier_removed, 'Na')
lower_Mg, upper_Mg = find_boundaries(dataset_outlier_removed, 'Mg')
lower_Al, upper_Al = find_boundaries(dataset_outlier_removed, 'Al')
lower_Si, upper_Si = find_boundaries(dataset_outlier_removed, 'Si')
lower_K, upper_K = find_boundaries(dataset_outlier_removed, 'K')
lower_Ca, upper_Ca = find_boundaries(dataset_outlier_removed, 'Ca')
#lower_Ba, upper_Ba = find_boundaries(dataset_outlier_removed, 'Ba')
lower_Fe, upper_Fe = find_boundaries(dataset_outlier_removed, 'Fe')

In [None]:
print('Lower limit of RI is: ', lower_RI,'\nUpper limit of RI is: ', upper_RI)
print('Lower limit of Na is: ', lower_Na,'\nUpper limit of Na is: ', upper_Na)
print('Lower limit of Mg is: ', lower_Mg,'\nUpper limit of Mg is: ', upper_Mg)
print('Lower limit of Al is: ', lower_Al,'\nUpper limit of Al is: ', upper_Al)
print('Lower limit of Si is: ', lower_Si,'\nUpper limit of Si is: ', upper_Si)
print('Lower limit of K is: ', lower_K,'\nUpper limit of K is: ', upper_K)
print('Lower limit of Ca is: ', lower_Ca,'\nUpper limit of Ca is: ', upper_Ca)
#print('Lower limit of Ba is: ', lower_Ba,'\nUpper limit of Ba is: ', upper_Ba)
print('Lower limit of Fe is: ', lower_Fe,'\nUpper limit of Fe is: ', upper_Fe)

Capping the variables at the upper and lower limits

In [None]:
def capping_ver(ver,upper,lower):
    dataset_outlier_removed[ver] = np.where(dataset_outlier_removed[ver] > upper, upper,
                          np.where(dataset_outlier_removed[ver] < lower, lower, dataset_outlier_removed[ver]))
    return dataset_outlier_removed[ver]

In [None]:
capping_ver('RI',upper_RI,lower_RI)
capping_ver('Na',upper_Na,lower_Na)
capping_ver('Mg',upper_Mg,lower_Mg)
capping_ver('Al',upper_Al,lower_Al)
capping_ver('Si',upper_Si,lower_Si)
capping_ver('K',upper_K,lower_K)
capping_ver('Ca',upper_Ca,lower_Ca)
#capping_ver('Ba',upper_Ba,lower_Ba)
capping_ver('Fe',upper_Fe,lower_Fe)

Plot the box plots again to check whether the outliers were handled.


In [None]:
boxplot(dataset_outlier_removed.loc[:,dataset.columns != 'Type_of_glass'])

Here we can see that all outliers were handled except in the 'Ba' column. We did not process that column because it consists largely of values flagged as outliers, and we would lose that information if we capped them.

To confirm that we didn’t lose any rows, let's look at the shape of our dataset.

In [None]:
dataset_outlier_removed.shape

We have the same shape, which means we didn’t lose any data.


## Step 11

Normalizing and scaling: data normalization, or feature scaling, is a process used to standardize the range of the features, since the ranges may vary a lot. It is a preprocessing step applied before the data is passed to ML algorithms.

### Normalizing and Scaling (StandardScaler, MinMaxScaler and RobustScaler)

In [None]:
#from sklearn.preprocessing import RobustScaler
from sklearn import preprocessing

# taking all the columns except Type_of_glass column
data = dataset.loc[:,dataset.columns != 'Type_of_glass']
# perform a robust scaler transform of the dataset
scaler = preprocessing.RobustScaler()
robust_df = scaler.fit_transform(data)
# convert the array back to a dataframe
robust_df = pd.DataFrame(robust_df, columns =['RI', 'Na', 'Mg', 'Al','Si','K','Ca','Ba','Fe'])

# perform a Standard Scaler transform of the dataset
scaler = preprocessing.StandardScaler()
standard_df = scaler.fit_transform(data)
# convert the array back to a dataframe
standard_df = pd.DataFrame(standard_df, columns =['RI', 'Na', 'Mg', 'Al','Si','K','Ca','Ba','Fe'])

# perform a MinMaxScaler transform of the dataset
scaler = preprocessing.MinMaxScaler()
minmax_df = scaler.fit_transform(data)
# convert the array back to a dataframe
minmax_df = pd.DataFrame(minmax_df, columns =['RI', 'Na', 'Mg', 'Al','Si','K','Ca','Ba','Fe'])
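
For reference, the three transforms can be written out by hand with NumPy. This is a sketch on a hypothetical one-dimensional sample, assuming the default settings of the scikit-learn scalers (population standard deviation, a 0 to 1 range, and the 25th to 75th percentile range).

```python
import numpy as np

# Hypothetical single feature
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Standard scaling: zero mean, unit (population) standard deviation
standard = (x - x.mean()) / x.std()

# Min-max scaling: map the smallest value to 0 and the largest to 1
minmax = (x - x.min()) / (x.max() - x.min())

# Robust scaling: centre on the median and divide by the interquartile range
q1, median, q3 = np.percentile(x, [25, 50, 75])
robust = (x - median) / (q3 - q1)

print(standard)
print(minmax)
print(robust)
```

Because robust scaling uses the median and the IQR instead of the mean and the extremes, a single extreme value shifts it far less than it shifts the other two transforms.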

Creating a function to plot the data that we scaled using StandardScaler, MinMaxScaler and RobustScaler to compare the difference

In [None]:
def plot_scale(df_type_1,df_type_2,title_1,title_2):
    fig, axes = plt.subplots(1,2,figsize=(27,8))
    # increasing font size
    sns.set(font_scale = 2)
    sns.kdeplot(df_type_1['RI'], ax = axes[0], label='RI')
    sns.kdeplot(df_type_1['Na'], ax = axes[0], label='Na')
    sns.kdeplot(df_type_1['Mg'], ax = axes[0], label='Mg')
    sns.kdeplot(df_type_1['Al'], ax = axes[0], label='Al')
    sns.kdeplot(df_type_1['Si'], ax = axes[0], label='Si')
    sns.kdeplot(df_type_1['K'], ax = axes[0], label='K')
    sns.kdeplot(df_type_1['Ca'], ax = axes[0], label='Ca')
    sns.kdeplot(df_type_1['Ba'], ax = axes[0], label='Ba')
    sns.kdeplot(df_type_1['Fe'], ax = axes[0], label='Fe')
    axes[0].set_xlabel('data', fontsize=20)
    axes[0].set_title(title_1, fontsize=25)
    axes[0].legend(bbox_to_anchor=(0., 1.02, 1., .102), loc='lower left', ncol=5, mode="expand", borderaxespad=0.)
    axes[0].yaxis.set_label_position("left")
    axes[0].yaxis.tick_left()
    
    # increasing font size
    sns.set(font_scale = 2)
    sns.kdeplot(df_type_2['RI'], ax = axes[1], label='RI')
    sns.kdeplot(df_type_2['Na'], ax = axes[1], label='Na')
    sns.kdeplot(df_type_2['Mg'], ax = axes[1], label='Mg')
    sns.kdeplot(df_type_2['Al'], ax = axes[1], label='Al')
    sns.kdeplot(df_type_2['Si'], ax = axes[1], label='Si')
    sns.kdeplot(df_type_2['K'], ax = axes[1], label='K')
    sns.kdeplot(df_type_2['Ca'], ax = axes[1], label='Ca')
    sns.kdeplot(df_type_2['Ba'], ax = axes[1], label='Ba')
    sns.kdeplot(df_type_2['Fe'], ax = axes[1], label='Fe')
    axes[1].set_xlabel('data', fontsize=20)
    axes[1].set_title(title_2, fontsize=25)
    axes[1].legend(bbox_to_anchor=(0., 1.02, 1., .102), loc='lower left', ncol=5, mode="expand", borderaxespad=0.)
    axes[1].yaxis.set_label_position("right")
    axes[1].yaxis.tick_right()
    plt.show()

In [None]:
plot_scale(data,robust_df,'Before Scaling\n\n\n','After Robust Scaling\n\n\n')
plot_scale(standard_df,minmax_df,'After Standard Scaling\n\n\n','After Min-Max Scaling\n\n\n')

Of the three, we use standard scaling for the rest of the process, as it appeared to give the best result in the comparison above.


Let's look at the shape of the data to make sure that we didn't lose any values.

In [None]:
print('Data shape before process: ',data.shape)
print('Data shape After Robust Scaling: ',robust_df.shape)
print('Data shape After Standard Scaling: ',standard_df.shape)
print('Data shape After Min-Max Scaling: ',minmax_df.shape)

The data has been normalized.


Let's add the Type_of_glass column back to the dataframe.

In [None]:
standard_df['Type_of_glass'] = dataset_outlier_removed['Type_of_glass'].values
#dataset_outlier_removed[dataset_outlier_removed.isnull().any(axis=1)]
standard_df.head(4)

## Step 12

#### Data Visualization and Preprocessing

The pairplot below uses the following seaborn parameters:

#### hue

Variable in data to map plot aspects to different colors.

#### palette

Set of colours used for mapping the hue variable.

#### kind

Kind of plot used for the non-identity relationships: {‘scatter’, ‘reg’}.

#### diag_kind

Kind of plot used for the diagonal subplots: {‘hist’, ‘kde’}.

(tutorialspoint, 2021)

Let's make a pairplot to visualize the data and look for relationships between the features.

In [None]:
print('\n',"_"*110,'\n')
sns.set(font_scale=1.8)
g = sns.pairplot(standard_df, hue="Type_of_glass",diag_kind = "kde",diag_kws={"hue": None, "color": ".2"},kind = "scatter",palette = "rainbow")
handles = g._legend_data.values()
labels = g._legend_data.keys()
g.fig.legend(title='Type_of_glass',handles=handles, labels=labels, loc='upper center', ncol=6)
g.fig.legend(title='Type_of_glass',handles=handles, labels=labels, loc='lower center', ncol=6)
g.fig.subplots_adjust(top=0.94, bottom=0.09)
g._legend.set_bbox_to_anchor((0.4, 0.))
g._legend.remove()
plt.show()
print('\n',"_"*110,'\n')

The corr() method can be used to find the pairwise correlation between the different columns of the data.

standard_df.corr() computes the pairwise correlation of all columns in the data frame. Any NaN values are automatically excluded.

The resulting coefficient is a value between -1 and 1 inclusive, where:

1: Total positive linear correlation

0: No linear correlation (the two variables may still be related in a non-linear way)

-1: Total negative linear correlation
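
The coefficient returned by corr() is, by default, the Pearson correlation. A minimal sketch on hypothetical columns shows the two extreme cases. The helper `pearson` and the frame `toy` are names made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical columns: 'up' rises with x, 'down' falls with x
toy = pd.DataFrame({
    'x': [1.0, 2.0, 3.0, 4.0],
    'up': [2.0, 4.0, 6.0, 8.0],
    'down': [8.0, 6.0, 4.0, 2.0],
})

def pearson(a, b):
    """Pearson correlation: covariance divided by the product of the spreads."""
    a_centered = a - a.mean()
    b_centered = b - b.mean()
    return (a_centered * b_centered).sum() / np.sqrt(
        (a_centered ** 2).sum() * (b_centered ** 2).sum()
    )

r_up = pearson(toy['x'], toy['up'])
r_down = pearson(toy['x'], toy['down'])

# pandas computes the same quantity for every pair of columns
corr_matrix = toy.corr()

print(r_up, r_down)
print(corr_matrix)
```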


Creating a heatmap using Seaborn for visualizing the correlation between the different columns of our data:


In [None]:
corr = standard_df.corr()
corr

In [None]:
print('\n',"_"*110,'\n')
#Plot figsize
plt.figure(figsize=(17, 17))
# increasing font size
sns.set(font_scale = 1.5)
#Generate Heat Map
ax = sns.heatmap(corr, cmap=plt.cm.plasma, annot=True,linecolor="white", square=True, fmt=".2f",linewidths=(1,1))

plt.title("Correlations between the dimensions\n", fontsize=30)
plt.xlabel("Features", fontsize=20)
plt.ylabel("Features", fontsize=20)
#Apply xticks
plt.xticks(rotation=90)
plt.xticks(np.arange(len(corr.columns))+0.5, corr.columns)

#Apply yticks
plt.yticks(np.arange(len(corr.columns))+0.5, corr.columns)
plt.yticks(rotation=0) 
#show plot
plt.show()
print('\n',"_"*110,'\n')

Visualizing the relation using Seaborn.


In [None]:
chartlocation=0
columns =['RI', 'Na', 'Mg', 'Al','Si','K','Ca','Ba','Fe']
plt.figure(figsize=(17,14))
for i in columns:
    chartlocation = chartlocation + 1
    plt.subplot(3,3,chartlocation)
    plt.tight_layout()
    sns.regplot(x=i,y='Type_of_glass',data = standard_df)

In [None]:
features = standard_df.loc[:,standard_df.columns != 'Type_of_glass']
target = standard_df.loc[:,'Type_of_glass']

In [None]:
features.corr()

##### Observation

The refractive index is quite strongly correlated with the calcium oxide (CaO) content.
With increasing CaO content (or R ratio), heat-treated glasses are reported to exhibit a direct band gap in the 5.92-6.01 eV range.
The Urbach energy lies within the 0.62-0.86 eV range for all the heat-treated glass samples.

This indicates that the optical parameters of the glass are influenced by the CaO content.


In [None]:
#Plot figsize
plt.figure(figsize=(17, 8))
corr.iloc[0,:].plot(kind='bar', color=['Red','Pink','LightBlue','Purple','Brown','Green','Blue','DarkGreen','Yellow'])
#show plot
plt.show()
print('\n',"_"*110,'\n')

Calcium and iron content are positively correlated with the refractive index of the glass, while the rest of the elements are negatively correlated with it.




##### Let's have a look at how the other elements are correlated with each other.

In [None]:
chartlocation = 0
plt.figure(figsize=(17,18))
columns = np.copy(corr.columns.values)
for index, row in corr.iterrows():
    column_name = columns[chartlocation]
    chartlocation = chartlocation + 1
    plt.subplot(4,3,chartlocation)
    plt.tight_layout()
    row.drop(index).plot(kind='bar', title='\n'+column_name+'\n', color=['Red','Pink','LightBlue','Purple','Brown','Green','Blue','DarkGreen','Yellow'])

Silica is the only element that is negatively correlated with every other element that makes up glass: if the silica content increases, the share of every other element has to decrease. This increase mainly affects the refractive index, as seen in the graph.


In [None]:
X = standard_df.drop(columns = ['Type_of_glass'])
y = standard_df['Type_of_glass']

In [None]:
X.skew().plot(kind='bar', figsize=(17,8))
plt.show()

##### Visualizing the dataset

In [None]:
fig = plt.figure(1, figsize=(15, 15))
# create the 3D axes through the figure; Axes3D(fig) is not attached automatically in recent matplotlib
ax = fig.add_subplot(projection='3d')
X_reduced = PCA(n_components=3).fit_transform(features)
ax.scatter(X_reduced[:, 0], X_reduced[:, 1], X_reduced[:, 2], c=y)
plt.title("Principal components 3")
plt.show()

plt.figure(1, figsize=(17, 8))
X_reduced = PCA(n_components=2).fit_transform(features)
plt.title("Principal components 2")
plt.scatter(X_reduced[:, 0], X_reduced[:, 1], c=y)
plt.show()

## Step 13

### Splitting the dataset into the Training (80%) set and Test(20%) set

In [None]:
# Splitting the dataset into the Training set and Test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state = 42)

Let's have a look at the shape of our data.

In [None]:
print('\n',"_"*110,'\n')
print("Shape of X_train: ",X_train.shape)
print("Shape of X_test: ", X_test.shape)
print("Shape of y_train: ",y_train.shape)
print("Shape of y_test:",y_test.shape)
print('\n',"_"*110,'\n')

####  Feature Scaling

Now we scale the training and test sets using scikit-learn’s StandardScaler. Feature scaling standardizes the range of our independent variables; neural networks trained by gradient descent generally train faster and more stably on inputs of similar scale. The scaler is fitted on the training set only and then applied unchanged to the test set.

In [None]:
# Feature Scaling
sc = preprocessing.StandardScaler()
#X_train_scaled = pd.DataFrame(sc.fit_transform(X_train), columns=X_train.columns.values)
#X_test_scaled = pd.DataFrame(sc.transform(X_test), columns=X_test.columns.values)

# fit the scaler on the training data only, then reuse its statistics for the test data
X_train_scaled = sc.fit_transform(X_train)
X_test_scaled = sc.transform(X_test)
print('\n',"_"*110,'\n')
print(X_train_scaled.shape)
print(X_test_scaled.shape)
print('\n',"_"*110,'\n')
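
What fitting and transforming amount to can be sketched with plain NumPy. This is a sketch of the usual practice on hypothetical arrays: the mean and standard deviation are learned from the training rows only and then reused for the test rows.

```python
import numpy as np

# Hypothetical one-feature training and test sets
train = np.array([[1.0], [2.0], [3.0]])
test = np.array([[2.0], [4.0]])

# "fit": learn the statistics from the training rows only
mean = train.mean(axis=0)
std = train.std(axis=0)

# "transform": apply the same statistics to both sets
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print(train_scaled.ravel())
print(test_scaled.ravel())
```

A test value equal to the training mean maps to 0, and a test value beyond the training range maps beyond the scaled training range. Fitting a separate scaler on the test set would hide such differences.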


We will use one of the ensemble methods to find how important different elements are to the making of a glass.

In [None]:
from sklearn.ensemble import GradientBoostingClassifier

std_df = standard_df.loc[:,standard_df.columns != 'Type_of_glass']

clf = GradientBoostingClassifier(n_estimators=100)
clf = clf.fit(X_train_scaled, np.ravel(y_train,order='C'))
feature_with_importance = pd.DataFrame()
feature_with_importance['columns'] = std_df.columns
feature_with_importance['importance'] = clf.feature_importances_
feature_with_importance.sort_values(by=['importance'], ascending=True, inplace=True)
feature_with_importance.set_index('columns', inplace=True)
feature_with_importance.plot(kind='bar',figsize=(17, 8))
plt.show()

In [None]:
# Defining a function to encode output column
def encode(data):
    print('Shape of data (BEFORE encode): %s' % str(data.shape))
    encoded = to_categorical(data)
    print('Shape of data (AFTER  encode): %s\n' % str(encoded.shape))
    return encoded

In [None]:
y_train_encoded = encode(y_train)

In [None]:
y_test_encoded = encode(y_test)

In [None]:
y_train_encoded = np.delete(y_train_encoded, [0,4], axis = 1)
y_test_encoded = np.delete(y_test_encoded, [0,4], axis = 1)
print('\n',"_"*110,'\n')
print(y_train_encoded[2])
print(y_test_encoded[2])
print('\n',"_"*110,'\n')
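
The reason for deleting columns 0 and 4 can be sketched with NumPy. The labels run from 1 to 7 with type 4 absent, so a one-hot matrix indexed by label value has 8 columns, two of which are never used. Here np.eye(8)[labels] stands in for the one-hot encoding and is an assumption of this sketch.

```python
import numpy as np

# One sample per glass type present in the data (type 4 never occurs)
labels = np.array([1, 2, 3, 5, 6, 7])

# One-hot matrix indexed by label value: columns 0..7
one_hot = np.eye(8)[labels]

# Columns 0 and 4 are always zero, so they can be removed
compact = np.delete(one_hot, [0, 4], axis=1)

# Position of each class in the 6-column encoding
class_index = compact.argmax(axis=1)

print(one_hot.shape, compact.shape)
print(class_index)
```

After the deletion, types 1, 2, 3, 5, 6 and 7 map to positions 0 to 5, which is the order that the six output units of the network correspond to.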

## Step 14

## Building the Artificial Neural Network(ANN)

First we create an ANN model; we will then improve it.

#### Adding input layer (First Hidden Layer)
Layers are added to the ANN with the add method. The first parameter, units, is the number of nodes in the layer; there is no fixed rule of thumb for choosing it. The second parameter, kernel_initializer, is the function used to initialize the weights. The input_shape parameter gives the number of nodes in the input layer, which equals the number of independent variables.
#### Adding Hidden Layers
The input_shape parameter does not need to be specified for the second hidden layer. It was given in the first hidden layer so that the network knows how many input nodes to expect; every later layer infers its input size from the layer before it.
#### Adding the output layer
In this case there are 6 classes, so the first parameter is set to 6 (units=6) and the activation function is softmax. Softmax generalizes the sigmoid function to a dependent variable with more than two categories: it turns the outputs into probabilities that sum to 1.
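
A minimal NumPy sketch of softmax, with hypothetical raw outputs for six classes. It also checks the two-class case, where softmax reduces to the sigmoid function.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)   # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

# Hypothetical raw outputs of a 6-unit output layer
logits = np.array([2.0, 1.0, 0.1, -1.0, 0.5, 3.0])
probs = softmax(logits)

# With two classes and scores (z, 0), softmax equals sigmoid(z)
z = 0.7
two_class = softmax(np.array([z, 0.0]))[0]
sigmoid = 1.0 / (1.0 + np.exp(-z))

print(probs, probs.sum())
print(two_class, sigmoid)
```

The predicted class is the position of the largest probability, which is what np.argmax extracts from the model output later on.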

In [None]:
# Initialising ANN by creating an instance of Sequential
first_model = Sequential()
# Adding the input layer and the first hidden layer
first_model.add(Dense(units = 18, input_shape=(9,), kernel_initializer = 'uniform', activation = 'relu'))
# Adding the second hidden layer
first_model.add(Dense(units = 10, kernel_initializer = 'uniform', activation = 'relu'))
# Adding the output layer
first_model.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'softmax'))
first_model.summary()

#### Compiling the ANN
Compiling configures how the network is trained, which here is a form of stochastic gradient descent applied to the whole neural network. The first parameter, optimizer, is the algorithm used to find the optimal set of weights in the network; it has many variants, and Adam is an efficient one. The second parameter is the loss function, and here we use categorical_crossentropy. The final argument, metrics, lists the criteria used to evaluate the model; in this case we use accuracy.
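
The categorical cross-entropy loss can be computed by hand for a couple of hypothetical predictions. This is a simplified sketch: library implementations additionally clip the probabilities to avoid taking the logarithm of zero.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred):
    """Per-sample loss: minus the log of the probability given to the true class."""
    return -np.sum(y_true * np.log(y_pred), axis=1)

# Hypothetical one-hot targets and predicted probabilities for 3 classes
y_true = np.array([[0.0, 1.0, 0.0],
                   [1.0, 0.0, 0.0]])
y_pred = np.array([[0.1, 0.8, 0.1],
                   [0.5, 0.3, 0.2]])

per_sample = categorical_crossentropy(y_true, y_pred)
mean_loss = per_sample.mean()

print(per_sample)
print(mean_loss)
```

The confident, correct prediction in the first row gets a smaller loss than the hesitant one in the second row, which is the behaviour the optimizer exploits when it adjusts the weights.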

In [None]:
# TRAIN Model 
# Compiling the ANN model
first_model.compile(optimizer = 'adam', loss = 'categorical_crossentropy' , metrics = ['accuracy'])

# Fitting the ANN to the Training set
first_model_history = first_model.fit(X_train_scaled,
                                   y_train_encoded,
                                   validation_data=(X_test_scaled, y_test_encoded),
                                   batch_size = 10,
                                   epochs = 500)
print('\n',"_"*110,'\n')

In [None]:
print('\n',"_"*110,'\n')
print("Training set - first_model: ", first_model_history.history.get('accuracy')[-1])
print("Test set - first_model: ", first_model_history.history.get('val_accuracy')[-1])
print('\n')

Now let's create a function that plots the model's training and validation accuracy and loss.

In [None]:
def plot_acc_lss_oneTry(model,title):
    plt.figure(figsize = (17,8))
    plt.plot(model.history["accuracy"], label = "Training Accuracy")
    plt.plot(model.history["val_accuracy"], label = "Validation Accuracy")
    plt.plot(model.history["loss"], label = "Training Loss")
    plt.plot(model.history["val_loss"], label = "Validation Loss")
    plt.xlabel("Number of Epochs")
    plt.ylabel("Accuracy")
    plt.title(title+"\n\n\n\n")
    plt.legend(bbox_to_anchor=(0., 1.02, 1., .102), loc='lower left', ncol=2, mode="expand", borderaxespad=0.)
    plt.show()

Let's visualize the results to get a clear understanding of the model's accuracy and loss.

In [None]:
plot_acc_lss_oneTry(first_model_history,"Model Accuracy and Loss")

From the above visualisation we can see that the validation loss is very high, especially after 200 epochs.

Let's print the final loss and accuracy.

In [None]:
print("_"*110)
print('\n First Model\n')
final_loss_fm1, final_accuracy_fm1 = first_model.evaluate(X_test_scaled, y_test_encoded)
print('Final Loss: {}, Final Accuracy: {}'.format(final_loss_fm1, final_accuracy_fm1))

Let's make a plot to compare the predicted and actual values (type of glass).

In [None]:
glasses = ['Building windows\n float processed', 'Building windows\n non float processed', 'Vehicle windows\n float processed', 'Containers', 'Tableware', 'Headlamps']
Y_true = np.argmax(y_test_encoded, axis=1)
# Model 01
Y_pred = first_model.predict(X_test_scaled)

Y_pred = np.argmax(Y_pred, axis=1)

cm = confusion_matrix(Y_true, Y_pred)
plt.figure(figsize=(20, 10))
# increasing font size
sns.set(font_scale=1.5)
ax = sns.heatmap(cm, cmap="RdYlGn" , annot=True, square=True, xticklabels=glasses, yticklabels=glasses,linewidths=.5)
ax.set_ylabel('Actual', fontsize=20)
ax.set_xlabel('Predicted', fontsize=20)
plt.title('Model Accuracy\n', fontsize=30)
plt.show()
print('\n',"_"*110,'\n')

Here we can see that the predictions are not very accurate.
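
How a confusion matrix relates to accuracy can be sketched with hypothetical labels for three classes. Rows are the actual classes and columns the predicted ones, the same orientation as in the heatmap above.

```python
import numpy as np

# Hypothetical actual and predicted class indices
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])

n_classes = 3
cm = np.zeros((n_classes, n_classes), dtype=int)
for actual, predicted in zip(y_true, y_pred):
    cm[actual, predicted] += 1   # row = actual class, column = predicted class

# Correct predictions sit on the diagonal
accuracy = np.trace(cm) / cm.sum()

print(cm)
print(accuracy)
```

Off-diagonal cells show which classes are confused with each other, information that the single accuracy figure hides.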

In [None]:
print('\n',"_"*110,'\n')
print("Calculating first model accuracy")
scores = first_model.evaluate(X_test_scaled, y_test_encoded)
print(f"First Model - Test Accuracy: {scores[1]*100}")
print('\n')

Our model does not perform well and its accuracy is low, so we need to improve it.

## Step 15

#### Defining a function to pass by build_fn argument

The KerasClassifier class takes an argument build_fn, which is the function to call to get the model.
We must therefore define a function that builds our model, compiles it and returns it.

Inside that function we initialize the ANN by creating an instance of Sequential, which represents a linear stack of layers. Layers are then added one at a time using the Dense class.


#### Adding input layer (First Hidden Layer)
Layers are added to our ANN using the add method. The first parameter of the layer is the number of nodes (units); there is no rule of thumb for choosing it. The second parameter, kernel_initializer, is the function used to initialize the weights. The final parameter, input_shape (or the equivalent input_dim), gives the number of nodes in the input layer, which equals the number of independent variables (9 here).
#### Adding Hidden Layers
The input_shape parameter doesn’t need to be specified when adding the second hidden layer. It was specified in the first hidden layer to let that layer know how many input nodes to expect; every later layer infers its input size from the output of the layer before it.
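
As a rough check on how the layer sizes fit together, the sketch below counts the trainable parameters of each Dense layer using the layer sizes of build_model_01 (9 inputs, then 18, 12 and 6 units). The helper name dense_param_count is hypothetical; the totals should correspond to what model.summary() reports for a stack of plain Dense layers.

```python
def dense_param_count(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# Layer sizes taken from build_model_01: 9 inputs -> 18 -> 12 -> 6.
layer_sizes = [9, 18, 12, 6]

counts = [
    dense_param_count(n_in, n_out)
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
total = sum(counts)
print(counts)  # parameters per Dense layer
print(total)   # total trainable parameters
```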
#### Adding the output layer
In this case there are 6 classes, so the first parameter is changed to 6 (units = 6) and the activation function is changed to softmax. Softmax generalizes the sigmoid function to a dependent variable with more than two categories: it turns the 6 outputs into probabilities that sum to 1.
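
The difference between softmax and an element-wise sigmoid can be seen in a small sketch. The six scores below are hypothetical values chosen only for illustration.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    """Squash a single score into (0, 1), independently of the other scores."""
    return 1.0 / (1.0 + math.exp(-z))

logits = [2.0, 1.0, 0.1, -1.0, 0.5, 0.0]  # hypothetical scores for 6 classes

soft = softmax(logits)
sig = [sigmoid(z) for z in logits]

print(sum(soft))  # softmax outputs sum to 1 (up to rounding)
print(sum(sig))   # element-wise sigmoid outputs generally do not
```

Because the softmax outputs form a probability distribution over the classes, they pair naturally with the categorical cross-entropy loss.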
#### Compiling the ANN
Compiling configures how the network will be trained with stochastic gradient descent. The first parameter is the optimizer, the algorithm used to find the optimal set of weights in the neural network. It has many variants, and an efficient one is Adam. The loss function is the second parameter; here we use the categorical_crossentropy loss function. The final argument is the list of metrics used to evaluate our model; in this case we use accuracy.
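
To illustrate what the categorical cross-entropy loss measures, here is a minimal sketch for a single sample. The two predictions are hypothetical, and the small eps guard against log(0) is an assumption of this sketch rather than a description of Keras internals.

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Loss for one sample: -sum(y_true * log(y_pred))."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

y_true = [0, 0, 1, 0, 0, 0]                       # one-hot target, class index 2
confident = [0.02, 0.02, 0.90, 0.02, 0.02, 0.02]  # hypothetical good prediction
unsure = [0.20, 0.20, 0.20, 0.15, 0.15, 0.10]     # hypothetical poor prediction

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)
print(loss_confident, loss_unsure)
```

Only the probability assigned to the true class contributes, so the loss shrinks as that probability approaches 1.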


Defining a function named build_model_01:


In [None]:
def build_model_01():
    # Initialising Model
    model = Sequential()
    # Adding the input layer and first hidden layer
    model.add(Dense(units = 18, input_shape=(9,), kernel_initializer = 'uniform', activation = 'relu'))
    # Adding the hidden layer
    model.add(Dense(units = 12, kernel_initializer = 'uniform', activation = 'relu'))
    # Adding the output layer
    model.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'softmax'))
    model.summary()
    # compile model
    model.compile(optimizer = 'adam', loss = 'categorical_crossentropy' , metrics = ['accuracy'])
    return model

This function will build the model and return it for use in the next step.



#### Creating model with default batch_size

In [None]:
model_01 = KerasClassifier(build_fn = build_model_01, epochs=100)
model_01.evaluate= "classifier"

#### Fitting model

X_train_scaled represents the independent variables we’re using to train our ANN, and y_train_encoded represents the column we’re predicting. Epochs represents the number of times we’re going to pass our full dataset through the ANN. 

In [None]:
history_01 = model_01.fit(X_train_scaled,
                          y_train_encoded, 
                          validation_data=(X_test_scaled, y_test_encoded))

From above output:

The model trained for 100 epochs. Because the default batch size is 32 and we have 164 training samples, each epoch has 164 / 32 = 5.125, rounded up to 6, batches: five batches of 32 samples and a final batch of 4. The parameters (weights and biases) are updated and the reported accuracy is recalculated after each batch in each epoch.

For example, in the 1st epoch the parameters and accuracy are calculated after the 1st batch (1/6).

Then the parameters and accuracy are recalculated after the 2nd batch (2/6).

Then they are recalculated after the 3rd batch (3/6), and so on.
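
The batch arithmetic above can be sketched as follows, using the 164 training samples quoted in the text. The helper name batches_per_epoch is hypothetical.

```python
import math

def batches_per_epoch(n_samples, batch_size):
    """Number of weight updates in one epoch; the last batch may be smaller."""
    return math.ceil(n_samples / batch_size)

n_train = 164  # training-set size quoted in the text above

for batch_size in (32, 10):
    steps = batches_per_epoch(n_train, batch_size)
    last = n_train - (steps - 1) * batch_size  # size of the final batch
    print(batch_size, steps, last)
```

A smaller batch size therefore means more weight updates per epoch: 17 instead of 6 for this training set.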

#### Creating model with a decreased batch_size

The only change here is the addition of batch_size = 10. batch_size is the number of observations after which the weights are updated.

In [None]:
model_02 = KerasClassifier(build_fn = build_model_01, 
                           epochs=100, 
                           batch_size = 10)
model_02.evaluate= "classifier"

#### Fitting model

In [None]:
history_02 = model_02.fit(X_train_scaled, 
                          y_train_encoded, 
                          validation_data=(X_test_scaled, y_test_encoded))

Decreasing the batch size increases the number of iterations per epoch, and with it the computation time and CPU usage.

Now let's create a function that plots the training accuracy and training loss of two models.

In [None]:
def plot_acc_lss(model_1, model_2, m1_label_1, m1_label_2, m2_label_1, m2_label_2, title):
    plt.figure(figsize = (17,8))
    plt.plot(model_1.history["accuracy"], label = m1_label_1)
    plt.plot(model_2.history["accuracy"], label = m2_label_1)
    plt.plot(model_1.history["loss"], label = m1_label_2)
    plt.plot(model_2.history["loss"], label = m2_label_2)
    plt.xlabel("Number of Epochs")
    plt.ylabel("Accuracy / Loss")
    plt.title(title+"\n\n\n\n")
    plt.legend(bbox_to_anchor=(0., 1.02, 1., .102), loc='lower left', ncol=2, mode="expand", borderaxespad=0.)
    plt.show()

Let's visualize the plot.

In [None]:
plot_acc_lss(history_01,
             history_02,
             "Batch size = 32 (Model_01)  Accuracy",
             "Batch size = 32 (Model_01) Loss",
             "Batch size = 10 (Model_02)  Accuracy",
             "Batch size = 10 (Model_02) Loss",
             "Effects of batch size on accuracy")

Here we can see that batch size 10 gives higher accuracy and lower loss than batch size 32.

#### Evaluating model with cross_val_score and StratifiedKFold

A new classifier is created for k-fold cross validation, with the model-building function passed as build_fn, followed by the number of epochs and the batch size. Scikit-learn's cross_val_score function applies the k-fold cross validation. The estimator is the model built with build_model, and cv is the fold splitter (here a StratifiedKFold with 4 folds). cross_val_score returns the accuracy on the test fold of each split.

In [None]:
model = KerasClassifier(build_fn = build_model, 
                        epochs=100, 
                        batch_size = 10)

kfold = StratifiedKFold(n_splits = 4, 
                        shuffle = True, 
                        random_state = 42)

accuracies = cross_val_score(estimator = model, 
                             X = X_train_scaled, 
                             y = y_train, 
                             cv = kfold)
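
To show what stratification does, the sketch below splits a small set of hypothetical, imbalanced labels with the same StratifiedKFold settings and counts the classes in each test fold. The labels and placeholder features are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical imbalanced labels: 8 samples of class 0 and 4 of class 1.
y = np.array([0] * 8 + [1] * 4)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder features

kfold = StratifiedKFold(n_splits=4, shuffle=True, random_state=42)

fold_counts = []
for train_idx, test_idx in kfold.split(X, y):
    # Count how many samples of each class land in this test fold.
    fold_counts.append(np.bincount(y[test_idx], minlength=2).tolist())

print(fold_counts)
```

Each test fold keeps the 2:1 class ratio of the full label set, which matters for a dataset where some classes are rare.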

Let's visualize the plot.

In [None]:
plt.subplots(figsize = (10,6))
plt.plot(accuracies)
plt.xlabel("K-fold values of Cross Validation Score")
plt.ylabel("Accuracies")
plt.title("Cross Validation Accuracies vs K-Folds")
plt.grid(axis = "both")

plt.show()

In [None]:
print("Best accuracy : {} @ k-fold value of {}".format(round(accuracies.max()*100,2),accuracies.argmax()))

#### Fighting Overfitting

Overfitting occurs when a model learns the details and noise in the training set so closely that it performs poorly on the test set. It shows up as a large gap between training and test accuracy, or as high variance between folds when k-fold cross validation is applied. Dropout regularization is a technique used in artificial neural networks to counteract this. It randomly disables some neurons at each training iteration, which prevents neurons from becoming too dependent on each other.
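
The mechanism can be sketched with NumPy. This is a simplified illustration of inverted dropout applied at training time, not the Keras implementation; the function name dropout and the all-ones activations are assumptions made for the example.

```python
import numpy as np

def dropout(activations, rate, rng):
    """Inverted dropout: zero a random fraction `rate` of units, rescale the rest."""
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
activations = np.ones(10_000)  # hypothetical layer output
dropped = dropout(activations, rate=0.1, rng=rng)

fraction_zeroed = np.mean(dropped == 0)
print(fraction_zeroed)  # close to 0.1
print(dropped.mean())   # close to 1.0, because kept units are scaled up
```

With a rate of 0.1, roughly one unit in ten is zeroed on each pass, and the surviving units are scaled up so that the expected layer output stays the same.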

In [None]:
def build_model_02():
    print('\n Model 04 \n')      
    # Initialising Model
    model = Sequential()
    # Adding the input layer and the first hidden layer
    model.add(Dense(units = 162, input_shape=(9,), kernel_initializer = 'uniform', activation = 'relu'))
    # adding dropout layer
    model.add(Dropout(0.1))
    # Adding the second hidden layer
    model.add(Dense(units = 142, kernel_initializer = 'uniform', activation = 'relu'))
    # adding dropout layer
    model.add(Dropout(0.1))
    # Adding hidden layer
    model.add(Dense(units = 72, kernel_initializer = 'uniform', activation = 'relu'))
    # adding dropout layer
    model.add(Dropout(0.1))
    # Adding hidden layer
    model.add(Dense(units = 9, kernel_initializer = 'uniform', activation = 'relu'))
    # adding dropout layer
    model.add(Dropout(0.1))
    # Adding BatchNormalization
    model.add(BatchNormalization())
    # Adding the output layer
    model.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'sigmoid'))
    model.summary()
    # compile model
    model.compile(optimizer = 'adam', loss = 'categorical_crossentropy' , metrics = ['accuracy'])
    return model

Here a dropout layer follows each hidden layer, from the first hidden layer up to (but excluding) the output layer. Using 0.1 means that 10% of the neurons will be disabled at each iteration.


Now let's try with 500 epochs and batch_size 10.

In [None]:
model_03 = KerasClassifier(build_fn = build_model_02, 
                           epochs=500, 
                           batch_size = 10)
history_03 = model_03.fit(X_train_scaled, 
                          y_train_encoded, 
                          validation_data=(X_test_scaled, y_test_encoded))

Let's try with a reduced batch_size (5).

In [None]:
model_04 = KerasClassifier(build_fn = build_model_02, 
                           epochs=500, 
                           batch_size = 5)
history_04 = model_04.fit(X_train_scaled, 
                          y_train_encoded, 
                          validation_data=(X_test_scaled, y_test_encoded))

Let's plot to get a clear understanding of each model's accuracy and loss.

In [None]:
plot_acc_lss(history_03,
             history_04,
             "Batch size = 10 (Model_03)  Accuracy",
             "Batch size = 10 (Model_03) Loss",
             "Batch size = 5 (Model_04)   Accuracy",
             "Batch size = 5 (Model_04) Loss",
             "Effects of batch size on accuracy")

Let's create a plot to visualize and compare the training accuracy, validation accuracy, training loss and validation loss of the 4 models.

In [None]:
fig, ax = plt.subplots(2,2, figsize=(25,15))

# Training Accuracy
ax[0, 0].plot(history_01.history['accuracy'], label='Model 01')
ax[0, 0].plot(history_02.history['accuracy'], label='Model 02')
ax[0, 0].plot(history_03.history['accuracy'], label='Model 03')
ax[0, 0].plot(history_04.history['accuracy'], label='Model 04')
ax[0, 0].set_title('Training Accuracy\n\n\n')
ax[0, 0].set_xlabel('Epoch', fontsize=18)
ax[0, 0].set_ylabel('Accuracy', fontsize=18)
ax[0, 0].legend(bbox_to_anchor=(0., 1.02, 1., .102), loc='lower left', ncol=2, mode="expand", borderaxespad=0.)

# Validation accuracy
ax[0, 1].plot(history_01.history['val_accuracy'], label='Model 01', linestyle='--')
ax[0, 1].plot(history_02.history['val_accuracy'], label='Model 02', linestyle='--')
ax[0, 1].plot(history_03.history['val_accuracy'], label='Model 03', linestyle='--')
ax[0, 1].plot(history_04.history['val_accuracy'], label='Model 04', linestyle='--')
ax[0, 1].set_title('Validation Accuracy\n\n\n')
ax[0, 1].set_xlabel('Epoch', fontsize=18)
ax[0, 1].set_ylabel('Accuracy', fontsize=18)
ax[0, 1].legend(bbox_to_anchor=(0., 1.02, 1., .102), loc='lower left', ncol=2, mode="expand", borderaxespad=0.)

# Training Loss
ax[1, 0].plot(history_01.history['loss'], label='Model 01')
ax[1, 0].plot(history_02.history['loss'], label='Model 02')
ax[1, 0].plot(history_03.history['loss'], label='Model 03')
ax[1, 0].plot(history_04.history['loss'], label='Model 04')
ax[1, 0].set_title('Training Loss\n\n\n')
ax[1, 0].set_xlabel('Epoch', fontsize= 18)
ax[1, 0].set_ylabel('Loss', fontsize= 18)
ax[1, 0].legend(bbox_to_anchor=(0., 1.02, 1., .102), loc='lower left', ncol=2, mode="expand", borderaxespad=0.)

# Validation Loss
ax[1, 1].plot(history_01.history['val_loss'], label='Model 01', linestyle='--')
ax[1, 1].plot(history_02.history['val_loss'], label='Model 02', linestyle='--')
ax[1, 1].plot(history_03.history['val_loss'], label='Model 03', linestyle='--')
ax[1, 1].plot(history_04.history['val_loss'], label='Model 04', linestyle='--')
ax[1, 1].set_title('Validation Loss\n\n\n')
ax[1, 1].set_xlabel('Epoch', fontsize=18)
ax[1, 1].set_ylabel('Loss', fontsize=18)
ax[1, 1].legend(bbox_to_anchor=(0., 1.02, 1., .102), loc='lower left', ncol=2, mode="expand", borderaxespad=0.)

plt.subplots_adjust(left=0.3,
                    bottom=0.3, 
                    right=0.9, 
                    top=0.9, 
                    wspace=0.2, 
                    hspace=0.9)

plt.show()

Let's print the training and test accuracy of each model.

In [None]:
print('\n',"_"*110,'\n')
print("Training set accuracy - Model 01: ", history_01.history.get('accuracy')[-1])
print("Test set accuracy - Model 01: ", history_01.history.get('val_accuracy')[-1])
print('\n')

print("Training set accuracy - Model 02: ", history_02.history.get('accuracy')[-1])
print("Test set accuracy - Model 02: ", history_02.history.get('val_accuracy')[-1])
print('\n')

print("Training set accuracy - Model 03: ", history_03.history.get('accuracy')[-1])
print("Test set accuracy - Model 03: ", history_03.history.get('val_accuracy')[-1])
print('\n')

print("Training set accuracy - Model 04: ", history_04.history.get('accuracy')[-1])
print("Test set accuracy - Model 04: ", history_04.history.get('val_accuracy')[-1])
print('\n',"_"*110,'\n')

Model 04 is more accurate than the other three models.

## Step 18

### Comparing the Results

Creating a dataframe with the training accuracy, test accuracy, training loss and test loss of all 4 models to compare and visualize them.

In [None]:
models = [('Model_01', history_01.history.get('accuracy')[-1], history_01.history.get('val_accuracy')[-1],
           history_01.history.get('loss')[-1], history_01.history.get('val_loss')[-1]),
          ('Model_02', history_02.history.get('accuracy')[-1], history_02.history.get('val_accuracy')[-1],
          history_02.history.get('loss')[-1], history_02.history.get('val_loss')[-1]),
          ('Model_03', history_03.history.get('accuracy')[-1], history_03.history.get('val_accuracy')[-1],
          history_03.history.get('loss')[-1], history_03.history.get('val_loss')[-1]),
          ('Model_04', history_04.history.get('accuracy')[-1], history_04.history.get('val_accuracy')[-1],
          history_04.history.get('loss')[-1], history_04.history.get('val_loss')[-1])
         ]

In [None]:
predict = pd.DataFrame(data = models, columns=['ANN', 'Training Accuracy', 'Test Accuracy', 'Training Loss', 'Test Loss'])
print('\n',"_"*110,'\n')
print(predict)
print('\n',"_"*110,'\n')

###  Visualizing Models Performance

Creating two plots to visualize the accuracy and loss of the 4 models.

In [None]:
plt.rcParams["figure.figsize"] = [17, 8]
#plt.style.use('fivethirtyeight') 
predict.reset_index().plot(x="ANN", y=["Training Accuracy", "Test Accuracy"], kind="bar")
plt.title("\n Train and Test Accuracy \n\n", fontsize = 30)
plt.xlabel("\n Models", fontsize = 20)
plt.ylabel("Accuracy\n", fontsize = 20)
plt.legend(bbox_to_anchor=(0., 1.02, 1., .102), loc='lower left',
           ncol=4, mode="expand", borderaxespad=0.)
plt.show()

plt.rcParams["figure.figsize"] = [17, 8]
#plt.style.use('fivethirtyeight') 
predict.reset_index().plot(x="ANN", y=["Training Accuracy", "Test Accuracy", 'Training Loss', 'Test Loss'], kind="bar")
plt.title("\n Model Accuracy and Loss \n\n", fontsize = 30)
plt.xlabel("\n Models", fontsize = 20)
plt.ylabel("Accuracy and Loss\n", fontsize = 20)
plt.legend(bbox_to_anchor=(0., 1.02, 1., .102), loc='lower left',
           ncol=4, mode="expand", borderaxespad=0.)
plt.show()

## Step 19

### Deep learning with Grid Search

A grid search is used to evaluate different configurations for our neural network model.

The combination that provides the best estimated performance will be reported.

The build_model() function is defined to take 4 arguments: activation, init, optimizer and loss. All four must have default values. This lets us evaluate the effect of different optimization algorithms and weight initialization schemes on our network.

After creating our model, we define the arrays of values for the parameters we wish to search, specifically:

Optimizers, for trying different algorithms for finding the weight values.

Initializers, for preparing the network weights using different schemes.

Epochs, for training the model for different numbers of exposures to the training dataset.

Batches, for varying the number of samples before a weight update.

The options are collected into a dictionary and passed to the scikit-learn GridSearchCV class. This class evaluates a version of our neural network model for every combination of parameters.

In [None]:
# Tuning the ANN
def build_model(activation='relu',init='glorot_uniform',optimizer='adam',loss='categorical_crossentropy'):
    # create model
    model = Sequential()
    # adding input layer and first hidden layer
    model.add(Dense(units = 32, kernel_initializer = init, activation = activation, input_dim = 9))
    # adding dropout layer
    model.add(Dropout(0.1))
    # adding layer
    model.add(Dense(units = 24, kernel_initializer = init, activation = activation))
    # adding dropout layer
    model.add(Dropout(0.1))
    # adding layer
    model.add(Dense(units = 16, kernel_initializer = init, activation = activation))
    # adding dropout layer
    model.add(Dropout(0.1))
    # adding layer
    model.add(Dense(units = 8, kernel_initializer = init, activation = activation))
    # adding dropout layer
    model.add(Dropout(0.1))
    # Adding BatchNormalization
    model.add(BatchNormalization())
    # adding output layer
    model.add(Dense(units = 6, kernel_initializer = init, activation = activation))
    # summary
    model.summary()
    # compile model
    model.compile(optimizer = optimizer, loss = loss, metrics = ['accuracy'])
    return model

#### create KerasClassifier object

KerasClassifier is a wrapper class from Keras that exposes Keras models through the scikit-learn classifier API. Model parameters corresponding to the arguments of the build_model function, as well as other hyperparameters, can be passed to it.


In [None]:
model = KerasClassifier(build_fn = build_model)
model.evaluate= "classifier"

Searching the following hyperparameters

activation,
loss,
optimizer type,
initialization method,
batch size,
number of epochs

Creating a dictionary of search parameters and passing it onto the Scikit-learn GridSearchCV estimator.


X_train_scaled holds the independent variables used to train the ANN, and y_train_encoded holds the column being predicted. epochs is the number of times the full dataset is passed through the ANN. batch_size is the number of observations after which the weights are updated.

In [None]:
def dimention_function(activations,loss,optimizers,initializers,batches,epochs):
    parameters_grid = dict(activation = activations,
                           loss = loss,
                           optimizer = optimizers,
                           epochs = epochs,
                           batch_size = batches,
                           init = initializers)

    grid_search = GridSearchCV(estimator = model,
                               param_grid = parameters_grid)
    # Fitting our ANN to the training set
    grid_search = grid_search.fit(X_train_scaled, 
                                  y_train_encoded, 
                                  validation_data=(X_test_scaled, y_test_encoded))
    return grid_search

To shorten the time the search takes, a small number of epochs is used while many different dimensions are passed. This helps to eliminate poorly performing dimensions first, so that more epochs and a smaller batch_size can be tried afterwards.

In [None]:
# dimensions to search over
activations = ['relu','tanh','sigmoid']
loss = ['categorical_crossentropy','mean_squared_error',"binary_crossentropy"]
optimizers = ['adam', 'rmsprop','sgd']
initializers = ['glorot_uniform', 'normal', 'uniform']
batches = [10, 32, 100]
epochs = [5, 10, 15]

# calling function
grid_search_01 = dimention_function(activations,loss,optimizers,initializers,batches,epochs)
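
To get a feel for why small epoch counts matter here, the size of this grid can be counted with a short sketch. The candidate values are the ones listed above; the number of folds is an assumption (5 is the GridSearchCV default for cv in recent scikit-learn versions).

```python
from itertools import product

# Same candidate values as the search above.
param_grid = {
    "activation": ["relu", "tanh", "sigmoid"],
    "loss": ["categorical_crossentropy", "mean_squared_error", "binary_crossentropy"],
    "optimizer": ["adam", "rmsprop", "sgd"],
    "init": ["glorot_uniform", "normal", "uniform"],
    "batch_size": [10, 32, 100],
    "epochs": [5, 10, 15],
}

combinations = list(product(*param_grid.values()))
n_folds = 5  # assumption: default cv of GridSearchCV in recent scikit-learn versions
n_fits = len(combinations) * n_folds

print(len(combinations))  # number of configurations
print(n_fits)             # number of model fits
```

Every extra value in any dimension multiplies the total, so trimming weak candidates early saves far more time than shortening a single run.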

#### Summarize results

Here we define a function to summarize the results and create a dataframe from them.

In [None]:
def summarize_results(best_parameters,best_accuracy,grid_search):
    print("_"*110,'\n')
    print('Best Parameters after tuning: {}'.format(best_parameters))
    print('Best Accuracy after tuning: {}'.format(best_accuracy))
    print('\n',"_"*110,'\n')
    means = grid_search.cv_results_['mean_test_score']
    stds = grid_search.cv_results_['std_test_score']
    params = grid_search.cv_results_['params']
    for mean, stdev, param in zip(means, stds, params):
        print("%f (%f) with: %r" % (mean, stdev, param))
    print("_"*110,'\n')
    data_f = pd.DataFrame(params)
    data_f['Mean'] = means
    data_f['Std. Dev'] = stds
    return data_f

We can get the best combination of parameters from the best_params_ attribute of the grid search object. Likewise, best_score_ gives the best score.
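
The relationship between these attributes and cv_results_ can be sketched with plain Python. The dictionary below is a hypothetical excerpt with made-up scores, not output from this notebook.

```python
# Hypothetical excerpt of cv_results_ with made-up scores (illustrative only).
cv_results = {
    "params": [
        {"optimizer": "adam", "batch_size": 10},
        {"optimizer": "rmsprop", "batch_size": 10},
        {"optimizer": "sgd", "batch_size": 10},
    ],
    "mean_test_score": [0.61, 0.66, 0.48],
}

# The best score is the highest mean cross-validated score;
# the best parameters are the set that achieved it.
scores = cv_results["mean_test_score"]
best_index = max(range(len(scores)), key=lambda i: scores[i])
best_score = scores[best_index]
best_params = cv_results["params"][best_index]
print(best_params, best_score)
```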

In [None]:
best_parameters_01 = grid_search_01.best_params_
best_accuracy_01 = grid_search_01.best_score_

data_f_01 = summarize_results(best_parameters_01,
                              best_accuracy_01,
                              grid_search_01)

Let's make a plot.

In [None]:
def gridSearch_plot(data_f):
    # pandas creates its own figure, so pass the size to plot() instead of plt.figure()
    data_f.plot(x='Std. Dev', figsize=(17,10))
    plt.legend(bbox_to_anchor=(0., 1.02, 1., .102),
               loc='lower left', 
               ncol=2, 
               mode="expand", 
               borderaxespad=0.)
    plt.show()

Let's visualize the plot.

In [None]:
gridSearch_plot(data_f_01)

#### Let's run with the same dimensions to see whether it gives the same result

In [None]:
# dimensions to search over
activations = ['relu','tanh','sigmoid']
loss = ['categorical_crossentropy','mean_squared_error',"binary_crossentropy"]
optimizers = ['adam', 'rmsprop','sgd']
initializers = ['glorot_uniform', 'normal', 'uniform']
batches = [10, 32, 100]
epochs = [5, 10, 15]

# calling function
grid_search_01_second = dimention_function(activations,
                                           loss,
                                           optimizers,
                                           initializers,
                                           batches,
                                           epochs)

We can get the best combination of parameters from the best_params_ attribute of the grid search object. Likewise, best_score_ gives the best score.

In [None]:
best_parameters_01_second = grid_search_01_second.best_params_
best_accuracy_01_second = grid_search_01_second.best_score_

data_f_01_second = summarize_results(best_parameters_01_second,
                                     best_accuracy_01_second,
                                     grid_search_01_second)

Let's visualize the plot.

In [None]:
gridSearch_plot(data_f_01_second)

This time we add one more activation and two more loss functions, along with an extra optimizer and two more initializers, to find the best combination. A single batch size of 10 and 15 epochs are used so that the search finishes a bit faster.

In [None]:
# dimensions to search over
activations = ['relu','tanh','sigmoid','softmax']
loss = ['categorical_crossentropy','mean_squared_error',"binary_crossentropy",
        'sparse_categorical_crossentropy','kullback_leibler_divergence']
opt = keras.optimizers.Adam(learning_rate=0.01)
optimizers = ['adam', 'rmsprop','sgd',opt]
initializers = ['glorot_uniform', 'normal', 'uniform','zeros','random_normal']
batches = [10]
epochs = [15]

# calling function
grid_search_01_third = dimention_function(activations,
                                          loss,
                                          optimizers,
                                          initializers,
                                          batches,
                                          epochs)

In [None]:
best_parameters_01_third = grid_search_01_third.best_params_
best_accuracy_01_third = grid_search_01_third.best_score_

data_f_01_third = summarize_results(best_parameters_01_third, 
                                    best_accuracy_01_third, 
                                    grid_search_01_third)

Let's visualize the plot.

In [None]:
gridSearch_plot(data_f_01_third)

Let's print the best parameters and accuracy.

In [None]:
print('\n',"_"*110,'\n')
print('Best Parameters_01 after tuning: {}'.format(best_parameters_01))
print('Best Accuracy_01 after tuning: {}'.format(best_accuracy_01))
print('\n',"_"*110,'\n')
print('Best Parameters_01_second after tuning: {}'.format(best_parameters_01_second))
print('Best Accuracy_01_second after tuning: {}'.format(best_accuracy_01_second))
print('\n',"_"*110,'\n')
print('Best Parameters_01_third after tuning: {}'.format(best_parameters_01_third))
print('Best Accuracy_01_third after tuning: {}'.format(best_accuracy_01_third))
print('\n',"_"*110,'\n')
print('Best Parameters_new_model after tuning: {}'.format(best_parameters_new_model))
print('Best Accuracy_new_model after tuning: {}'.format(best_accuracy_new_model))
print('\n',"_"*110,'\n')

The results above show the best parameters found by each search; they are the same in every case except for the loss.

Now let's create new models using the parameters that were found to be best. mean_squared_error was reported as the best loss in 3 out of 4 searches, so we compare it with 'binary_crossentropy', starting with binary_crossentropy.

In [None]:
# Tuning the ANN
def build_model_binary_crossentropy():
    # create model
    model = Sequential()
    # adding input layer and first hidden layer
    model.add(Dense(units = 32, kernel_initializer = 'glorot_uniform', activation = 'tanh', input_dim = 9))
    # adding dropout layer
    model.add(Dropout(0.1))
    # adding layer
    model.add(Dense(units = 24, kernel_initializer ='glorot_uniform', activation = 'tanh'))
    # adding dropout layer
    model.add(Dropout(0.1))
    # adding layer
    model.add(Dense(units = 16, kernel_initializer = 'glorot_uniform', activation = 'tanh'))
    # adding dropout layer
    model.add(Dropout(0.1))
    # adding layer
    model.add(Dense(units = 8, kernel_initializer = 'glorot_uniform', activation = 'tanh'))
    # adding dropout layer
    model.add(Dropout(0.1))
    # Adding BatchNormalization
    model.add(BatchNormalization())
    # adding output layer
    model.add(Dense(units = 6, kernel_initializer = 'glorot_uniform', activation = 'tanh'))
    model.summary()
    # compile model
    model.compile(optimizer = 'rmsprop', loss = 'binary_crossentropy', metrics = ['accuracy'])
    return model

In [None]:
new_model_binary_crossentropy = KerasClassifier(build_fn = build_model_binary_crossentropy, 
                                                epochs=500, 
                                                batch_size = 10)
new_model_binary_crossentropy_history = new_model_binary_crossentropy.fit(X_train_scaled, 
                                                                          y_train_encoded, 
                                                                          validation_data=(X_test_scaled, y_test_encoded))

In [None]:
print("Training set accuracy (loss dimension 'binary_crossentropy'): ", new_model_binary_crossentropy_history.history.get('accuracy')[-1])
print("Test set accuracy (loss dimension 'binary_crossentropy'): ", new_model_binary_crossentropy_history.history.get('val_accuracy')[-1])
print('\n')

Now we can try the other loss function ('mean_squared_error') to see which one fits our model better.

In [None]:
# Tuning the ANN
def build_model_mean_squared_error():
    # create model
    model = Sequential()
    # adding input layer and first hidden layer
    model.add(Dense(units = 32, kernel_initializer = 'glorot_uniform', activation = 'tanh', input_dim = 9))
    # adding dropout layer
    model.add(Dropout(0.1))
    # adding layer
    model.add(Dense(units = 24, kernel_initializer ='glorot_uniform', activation = 'tanh'))
    # adding dropout layer
    model.add(Dropout(0.1))
    # adding layer
    model.add(Dense(units = 16, kernel_initializer = 'glorot_uniform', activation = 'tanh'))
    # adding dropout layer
    model.add(Dropout(0.1))
    # adding layer
    model.add(Dense(units = 8, kernel_initializer = 'glorot_uniform', activation = 'tanh'))
    # adding dropout layer
    model.add(Dropout(0.1))
    # Adding BatchNormalization
    model.add(BatchNormalization())
    # adding output layer
    model.add(Dense(units = 6, kernel_initializer = 'glorot_uniform', activation = 'tanh'))
    model.summary()
    # compile model
    model.compile(optimizer = 'rmsprop', loss = 'mean_squared_error', metrics = ['accuracy'])
    return model

In [None]:
new_model_mean_squared_error = KerasClassifier(build_fn = build_model_mean_squared_error, 
                                               epochs=500, 
                                               batch_size = 10)
new_model_mean_squared_error_history = new_model_mean_squared_error.fit(X_train_scaled, 
                                                                        y_train_encoded, 
                                                                        validation_data=(X_test_scaled, y_test_encoded))

Let's print the training and test accuracy and loss.

In [None]:
print("Training set accuracy (loss dimension 'mean_squared_error'): ", new_model_mean_squared_error_history.history.get('accuracy')[-1])
print("Test set accuracy (loss dimension 'mean_squared_error'): ", new_model_mean_squared_error_history.history.get('val_accuracy')[-1])
print("Training set loss (loss dimension 'mean_squared_error'): ", new_model_mean_squared_error_history.history.get('loss')[-1])
print("Test set loss (loss dimension 'mean_squared_error'): ", new_model_mean_squared_error_history.history.get('val_loss')[-1])

print('\n')

Let's visualize the training accuracy and loss of both models.

In [None]:
plot_acc_lss(new_model_binary_crossentropy_history,
             new_model_mean_squared_error_history,
             "Accuracy (loss dimension 'binary_crossentropy')",
             "Loss (loss dimension 'binary_crossentropy')",
             "Accuracy (loss dimension 'mean_squared_error')",
             "Loss (loss dimension 'mean_squared_error')",
             "Effect of loss function")

Let's visualize the training and validation accuracy and loss of both models to get a clearer picture.

In [None]:
fig, ax = plt.subplots(2,2, figsize=(25,15))

# Training Accuracy
ax[0, 0].plot(new_model_binary_crossentropy_history.history['accuracy'], label='binary_crossentropy')
ax[0, 0].plot(new_model_mean_squared_error_history.history['accuracy'], label='mean_squared_error')
ax[0, 0].set_title('Training Accuracy\n\n')
ax[0, 0].set_xlabel('Epoch', fontsize=18)
ax[0, 0].set_ylabel('Accuracy', fontsize=18)
ax[0, 0].legend(bbox_to_anchor=(0., 1.02, 1., .102), loc='lower left', ncol=2, mode="expand", borderaxespad=0.)

# Validation accuracy
ax[0, 1].plot(new_model_binary_crossentropy_history.history['val_accuracy'], label='binary_crossentropy', linestyle='--')
ax[0, 1].plot(new_model_mean_squared_error_history.history['val_accuracy'], label='mean_squared_error', linestyle='--')
ax[0, 1].set_title('Validation Accuracy\n\n')
ax[0, 1].set_xlabel('Epoch', fontsize=18)
ax[0, 1].set_ylabel('Accuracy', fontsize=18)
ax[0, 1].legend(bbox_to_anchor=(0., 1.02, 1., .102), loc='lower left', ncol=2, mode="expand", borderaxespad=0.)

# Training Loss
ax[1, 0].plot(new_model_binary_crossentropy_history.history['loss'], label='binary_crossentropy')
ax[1, 0].plot(new_model_mean_squared_error_history.history['loss'], label='mean_squared_error')
ax[1, 0].set_title('Training Loss\n\n')
ax[1, 0].set_xlabel('Epoch', fontsize= 18)
ax[1, 0].set_ylabel('Loss', fontsize= 18)
ax[1, 0].legend(bbox_to_anchor=(0., 1.02, 1., .102), loc='lower left', ncol=2, mode="expand", borderaxespad=0.)

# Validation Loss
ax[1, 1].plot(new_model_binary_crossentropy_history.history['val_loss'], label='binary_crossentropy', linestyle='--')
ax[1, 1].plot(new_model_mean_squared_error_history.history['val_loss'], label='mean_squared_error', linestyle='--')
ax[1, 1].set_title('Validation Loss\n\n')
ax[1, 1].set_xlabel('Epoch', fontsize=18)
ax[1, 1].set_ylabel('Loss', fontsize=18)
ax[1, 1].legend(bbox_to_anchor=(0., 1.02, 1., .102), loc='lower left', ncol=2, mode="expand", borderaxespad=0.)

plt.subplots_adjust(left=0.3,
                    bottom=0.3, 
                    right=0.9, 
                    top=0.9, 
                    wspace=0.2, 
                    hspace=0.9)

plt.show()

From these plots it is clear that mean_squared_error performs better than binary_crossentropy for this model.
Because MSE squares each error, large errors are penalized much more heavily than small ones, which makes it effective at discouraging predictions that are far from the target.
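
To illustrate why squaring makes MSE sensitive to large errors, here is a minimal NumPy sketch. The one-hot target and the two prediction vectors are made-up values chosen for illustration, not outputs of the models above, and `mean_squared_error_np` is a hypothetical helper name.

```python
import numpy as np

def mean_squared_error_np(y_true, y_pred):
    """Average of the squared differences between target and prediction."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Hypothetical one-hot target for a 6-class problem (true class is index 0)
y_true = [1, 0, 0, 0, 0, 0]

# Hypothetical prediction that is close to the target (errors of 0.1)
y_pred_small_error = [0.9, 0.1, 0.0, 0.0, 0.0, 0.0]
# Hypothetical prediction that is far from the target (errors of 0.9)
y_pred_large_error = [0.1, 0.9, 0.0, 0.0, 0.0, 0.0]

mse_small = mean_squared_error_np(y_true, y_pred_small_error)
mse_large = mean_squared_error_np(y_true, y_pred_large_error)

# The individual errors are 9 times larger, but the MSE is 81 times larger
print("MSE (small errors):", mse_small)
print("MSE (large errors):", mse_large)
print("Ratio:", mse_large / mse_small)
```

In this toy case the errors grow by a factor of 9 while the loss grows by a factor of 81, which is the quadratic penalty described above.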

## Step 20

### Creating our final model

In [None]:
# Initialising the model
final_model = Sequential()
# Adding the input layer and the first hidden layer
final_model.add(Dense(units = 24, input_shape=(9,), kernel_initializer = 'glorot_uniform', activation = 'tanh'))
# Adding a dropout layer
final_model.add(Dropout(0.1))
# Adding the second hidden layer
final_model.add(Dense(units = 20, kernel_initializer = 'glorot_uniform', activation = 'tanh'))
# Adding a dropout layer
final_model.add(Dropout(0.1))
# Adding the third hidden layer
final_model.add(Dense(units = 14, kernel_initializer = 'glorot_uniform', activation = 'tanh'))
# Adding a dropout layer
final_model.add(Dropout(0.1))
# Adding the fourth hidden layer
final_model.add(Dense(units = 8, kernel_initializer = 'glorot_uniform', activation = 'tanh'))
# Adding a dropout layer
final_model.add(Dropout(0.1))
# BatchNormalization (disabled)
#final_model.add(BatchNormalization())
# Adding the output layer (one unit per glass type)
final_model.add(Dense(units = 6, kernel_initializer = 'glorot_uniform', activation = 'tanh'))
# Printing the model summary
final_model.summary()


Let's train for 500 epochs with a batch size of 10.
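
As a rough guide to what these settings mean in practice, the sketch below counts the weight updates implied by a batch size and an epoch count. The training-set size of 150 is a hypothetical placeholder (the real value is the number of rows in `X_train_scaled`), `updates_per_epoch` is an illustrative helper name, and the epoch count is the one passed to `fit` in the next cell.

```python
import math

def updates_per_epoch(n_samples, batch_size):
    """Number of mini-batches per epoch, counting a final partial batch."""
    return math.ceil(n_samples / batch_size)

n_train = 150      # hypothetical; use X_train_scaled.shape[0] in the notebook
batch_size = 10    # as used in this step
epochs = 500       # value passed to fit() in the next cell

per_epoch = updates_per_epoch(n_train, batch_size)
total_updates = per_epoch * epochs

print("Updates per epoch:", per_epoch)
print("Total updates:", total_updates)
```

A small batch size gives many updates per epoch, which can speed up learning on a small dataset but makes each epoch noisier.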

In [None]:
# compile model
final_model.compile(optimizer = 'rmsprop', loss = 'mean_squared_error' , metrics = ['accuracy'])
final_model_history = final_model.fit(X_train_scaled,
                                      y_train_encoded,
                                      validation_data=(X_test_scaled, y_test_encoded),
                                      batch_size = 10,
                                      epochs=500)


In [None]:
print('\n',"_"*110,'\n')
print("Training set - final_Model: ", final_model_history.history.get('accuracy')[-1])
print("Test set - final_Model: ", final_model_history.history.get('val_accuracy')[-1])
print('\n',"_"*110)

##### Checking the final loss and accuracy

Evaluating the final model's loss and accuracy on the test set

In [None]:
print("_"*110)
print('\nModel\n')
final_loss, final_accuracy = final_model.evaluate(X_test_scaled, y_test_encoded)
print('Final Loss: {}, Final Accuracy: {}'.format(final_loss, final_accuracy))


In [None]:
plot_acc_lss_oneTry(final_model_history,"Final Model Accuracy and loss")

The visualisation above shows the training and testing accuracy and loss of our final model. Training accuracy increases continuously, but testing accuracy decreases slightly after about 300 epochs. Training loss decreases continuously, but testing loss increases slightly after about 100 epochs, which suggests that the model starts to overfit the training data.
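
One way to locate the point where the testing loss stops improving is to search the recorded history for its minimum. The sketch below assumes a dictionary shaped like `final_model_history.history`. The helper name `best_epoch` and the toy loss values are illustrative only.

```python
import numpy as np

def best_epoch(history, metric="val_loss"):
    """Return the 1-based epoch with the lowest value of `metric` and that value."""
    values = np.asarray(history[metric], dtype=float)
    idx = int(np.argmin(values))
    return idx + 1, float(values[idx])

# Toy history: the validation loss falls, then rises again (overfitting pattern)
toy_history = {"val_loss": [0.20, 0.15, 0.12, 0.13, 0.16]}

epoch, value = best_epoch(toy_history)
print("Lowest val_loss at epoch", epoch, "with value", value)
```

In the notebook this could be called as `best_epoch(final_model_history.history)`. Keras also offers the `EarlyStopping` callback (for example with `monitor='val_loss'` and `restore_best_weights=True`), which stops training automatically once the monitored value stops improving.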

In [None]:
glasses = ['Building windows\n float processed', 'Building windows\n non float processed', 'Vehicle windows\n float processed', 'Containers', 'Tableware', 'Headlamps']
Y_true = np.argmax(y_test_encoded, axis=1)
# Predictions of the final model
Y_pred = final_model.predict(X_test_scaled)

Y_pred = np.argmax(Y_pred, axis=1)

cm = confusion_matrix(Y_true, Y_pred)
plt.figure(figsize=(20, 10))
# increasing font size
sns.set(font_scale=1.5)
ax = sns.heatmap(cm, cmap="RdYlGn" , annot=True, square=True, xticklabels=glasses, yticklabels=glasses,linewidths=.5)
ax.set_ylabel('Actual', fontsize=20)
ax.set_xlabel('Predicted', fontsize=20)
plt.title('Model Accuracy\n', fontsize=30)
plt.show()
print('\n',"_"*110,'\n')

Here we can see that the predictions are much more accurate than those of our first model. Accuracy is high for Building windows float processed, Building windows non float processed and Headlamps, while the least accurate predictions are for Vehicle windows float processed and Containers (no correct predictions).
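
The per-class observations above can be turned into numbers by dividing each diagonal entry of the confusion matrix by its row total, which gives the share of each actual class that was predicted correctly (the recall). The sketch below uses a small made-up 3-class matrix rather than the notebook's `cm`, and `per_class_recall` is a hypothetical helper name.

```python
import numpy as np

def per_class_recall(cm):
    """Fraction of each actual class (row) that was predicted correctly.

    Rows are actual classes and columns are predicted classes, as in
    sklearn's confusion_matrix. Classes with no samples yield NaN.
    """
    cm = np.asarray(cm, dtype=float)
    row_totals = cm.sum(axis=1)
    correct = np.diag(cm)
    return np.divide(correct, row_totals,
                     out=np.full_like(correct, np.nan),
                     where=row_totals > 0)

# Made-up confusion matrix: the third class is never predicted correctly
toy_cm = [[8, 2, 0],
          [1, 9, 0],
          [2, 1, 0]]

recall = per_class_recall(toy_cm)
for label, r in zip(["class A", "class B", "class C"], recall):
    print(f"{label}: {r:.2f}")
```

Applied to the real matrix, `per_class_recall(cm)` paired with the `glasses` labels would show directly which glass types the model struggles with.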

Let's print the accuracy of the first model and of the final model.

In [None]:
print("_"*110)
print('\n****** First Model *******\n')
final_loss_fm1, final_accuracy_fm1 = first_model.evaluate(X_test_scaled, y_test_encoded)
print('Final Loss: {}, Final Accuracy: {}'.format(final_loss_fm1, final_accuracy_fm1))
print("_"*110)
print('\n****** Last Model *******\n')
final_loss, final_accuracy = final_model.evaluate(X_test_scaled, y_test_encoded)
print('Final Loss: {}, Final Accuracy: {}'.format(final_loss, final_accuracy))
print("_"*110)

Here we can see that the model has improved considerably.

I tried many different approaches to improve the model, and the process was successful: there is a clear difference between the accuracy and loss of the first model and those of the final model.

# References

tutorialspoint, 2021. Seaborn - Visualizing Pairwise Relationship - Tutorialspoint [WWW Document]. URL https://www.tutorialspoint.com/seaborn/seaborn_visualizing_pairwise_relationship.htm (accessed 5.15.21).

ankthon, 2020. Find duplicate rows in a Dataframe based on all or selected columns. GeeksforGeeks. URL https://www.geeksforgeeks.org/find-duplicate-rows-in-a-dataframe-based-on-all-or-selected-columns/ (accessed 5.18.21).

Singh, R., 2020. Exploratory Data Analysis(EDA) from Scratch | With Python Implementation. Analytics Vidhya. URL https://www.analyticsvidhya.com/blog/2020/08/exploratory-data-analysiseda-from-scratch-in-python/ (accessed 5.18.21).

Patil, P., 2018. What is Exploratory Data Analysis? [WWW Document]. Medium. URL https://towardsdatascience.com/exploratory-data-analysis-8fc1cb20fd15 (accessed 5.18.21).

seaborn, 2020. seaborn.violinplot — seaborn 0.11.1 documentation [WWW Document]. URL https://seaborn.pydata.org/generated/seaborn.violinplot.html (accessed 5.18.21).

carladasilvamatos, 2020. How to customize a heat map with seaborn [WWW Document]. Carla da Silva Matos. URL https://www.carladasilvamatos.com/blog/2019/12/25/lnn5xyodn2kv4i0o2mg9agdw07qgo0 (accessed 5.18.21).

Kington, J., 2012. python - Improve subplot size/spacing with many subplots in matplotlib [WWW Document]. Stack Overflow. URL https://stackoverflow.com/questions/6541123/improve-subplot-size-spacing-with-many-subplots-in-matplotlib (accessed 5.19.21).

Lynn, S. (2021) Bar Plots in Python using Pandas DataFrames | Shane Lynn. Available at: https://www.shanelynn.ie/bar-plots-in-python-using-pandas-dataframes/ (Accessed: 24 May 2021).

Team, K. (2021) Keras documentation: Layer activation functions. Available at: https://keras.io/api/layers/activations/ (Accessed: 26 May 2021).

REDOC-ER (2021) ‘EDA: Exploratory Data Analysis | Introduction to Exploratory Data Analysis’, Analytics Vidhya, 12 February. Available at: https://www.analyticsvidhya.com/blog/2021/02/introduction-to-exploratory-data-analysis-eda/ (Accessed: 27 May 2021).

ashwinsharmap (2020) ‘StandardScaler, MinMaxScaler and RobustScaler techniques - ML’, GeeksforGeeks, 15 July. Available at: https://www.geeksforgeeks.org/standardscaler-minmaxscaler-and-robustscaler-techniques-ml/ (Accessed: 28 May 2021).

Coder’s Digest (2020) Outlier detection techniques(python)| how to avoid outliers without deleting it. Available at: https://www.youtube.com/watch?v=NPibqifVAW4 (Accessed: 28 May 2021).

Dan, A. (2020) Exploratory Data Analysis (EDA) in Python, Medium. Available at: https://medium.com/@atanudan/exploratory-data-analysis-eda-in-python-893f963cc0c0 (Accessed: 30 May 2021).

Brownlee, J. (2016) ‘Use Keras Deep Learning Models with Scikit-Learn in Python’, Machine Learning Mastery, 30 May. Available at: https://machinelearningmastery.com/use-keras-deep-learning-models-scikit-learn-python/ (Accessed: 30 May 2021).

Mwiti, D. (2021) Introduction to Deep Learning with Keras, Medium. Available at: https://heartbeat.fritz.ai/introduction-to-deep-learning-with-keras-c7c3d14e1527 (Accessed: 31 May 2021).