
© 2021 Zaka AI, Inc. All Rights Reserved

# Binary and Multi-class classification
**Objective:** This notebook consists of two independent exercises: multi-class classification with the Iris flower data and binary classification with the Sonar data. In the first exercise, we prepare data for a multi-class classification model and train it. In the second, we train and evaluate a binary classification model, learn how to apply standardization to a dataset, and create a pipeline for evaluating models.

In [1]:
# clone git repo
!git clone https://github.com/zaka-ai/intro2dl.git

# change directory
%cd intro2dl/data/

Cloning into 'intro2dl'...
remote: Enumerating objects: 16, done.
remote: Counting objects: 100% (16/16), done.
remote: Compressing objects: 100% (15/15), done.
remote: Total 16 (delta 1), reused 7 (delta 0), pack-reused 0
Unpacking objects: 100% (16/16), done.
/content/intro2dl/data


## Multi-class classification with Iris Dataset

### 1. Load data

In this notebook, we are going to use the [**Iris flower** dataset](https://archive.ics.uci.edu/ml/datasets/Iris). This is another standard machine learning dataset from the UCI Machine Learning Repository. Each instance records the measurements of an observed flower, and the output variable is the iris species.

This is a multi-class classification problem, meaning that there are more than two classes to predict. Here there are three flower species.

The variables can be summarized as follows:

**Input Variables (X):**


1. Sepal length in cm
2. Sepal width in cm
3. Petal length in cm
4. Petal width in cm

**Output Variable (Y):**

*   Class:
    - Iris Setosa
    - Iris Versicolour
    - Iris Virginica




In [2]:
from pandas import read_csv

# load dataset
dataframe = read_csv("iris.csv", header=None)
dataset = dataframe.values

# split X and Y features
X = dataset[:,0:4].astype(float)
Y = dataset[:,4]


### 2. Encode the output variable


In [6]:
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.utils import to_categorical

# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)

# convert integers to dummy variables (i.e. one hot encoded)
dummy_y = to_categorical(encoded_Y)
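
The two encoding steps above can be illustrated in plain Python. This is only a sketch of the idea, not the library code: the label strings are a hypothetical sample (the exact spelling in `iris.csv` is an assumption), and `one_hot` is a helper name introduced here for illustration.

```python
# Plain-Python illustration of label encoding followed by one-hot encoding.
labels = ["Iris-setosa", "Iris-virginica", "Iris-versicolor", "Iris-setosa"]  # hypothetical sample

# Step 1: map each distinct label to an integer, in sorted order
classes = sorted(set(labels))
class_to_index = {name: i for i, name in enumerate(classes)}
encoded = [class_to_index[name] for name in labels]

# Step 2: turn each integer into a row with a single 1.0 at that position
def one_hot(index, num_classes):
    row = [0.0] * num_classes
    row[index] = 1.0
    return row

dummy = [one_hot(i, len(classes)) for i in encoded]

print(encoded)  # [0, 2, 1, 0]
print(dummy)
```

Each row of `dummy` has exactly one non-zero entry, which is the shape of target that a 3-unit softmax output layer trained with `categorical_crossentropy` expects.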

### 3. Define Keras Model

Create a Keras Sequential model that has 1 hidden layer with the `relu` activation function.

We should define a `create_model()` function that will create the model, compile it and return it.

In [7]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# define baseline model
def create_model():
	# create model
	model = Sequential()
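	# hidden layer: 10 units, fed by the 4 flower measurements
	model.add(Dense(10, input_dim=4, activation='relu'))
	# output layer: one softmax probability per species
	model.add(Dense(3, activation='softmax'))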
  
	# Compile model
	model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])
  
	return model
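
To make the `softmax` output and the `categorical_crossentropy` loss more concrete, here is a minimal plain-Python sketch of both computations for a single sample. The logits are hypothetical values, and the functions are simplified illustrations rather than the Keras implementations.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability; the result sums to 1
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(one_hot_target, probs):
    # Only the true class contributes, so this reduces to -log(p_true)
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, probs))

logits = [2.0, 1.0, 0.1]            # hypothetical raw outputs for one flower
probs = softmax(logits)
loss = categorical_crossentropy([1.0, 0.0, 0.0], probs)

print(probs)
print(loss)
```

The loss shrinks toward 0 as the probability assigned to the true class approaches 1, which is what training pushes the network toward.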

### 4. Train Model

Let's train the model for 20 epochs with a batch size of 5.

In [8]:
model = create_model()
model.fit(X, dummy_y, epochs=20, batch_size=5)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.callbacks.History at 0x7f1a331248d0>
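
As a quick check on what these settings imply, the arithmetic below counts the weight updates performed during training. It assumes the standard 150-row Iris file; the row count is not stated in this notebook.

```python
import math

n_samples = 150   # assumption: the standard Iris file has 150 rows
batch_size = 5
epochs = 20

# One weight update per batch; a final partial batch would still count as a step
steps_per_epoch = math.ceil(n_samples / batch_size)
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, total_updates)  # 30 600
```

The `History` object shown as the cell's output is the return value of `model.fit`; its `history` attribute holds the per-epoch loss and metric values.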

## Binary Classification with Sonar Dataset

### 1. Load dataset

The dataset we will use in this tutorial is the [Sonar dataset](https://archive.ics.uci.edu/ml/datasets/Connectionist+Bench+(Sonar,+Mines+vs.+Rocks)).

This is a dataset that describes sonar chirp returns bouncing off different surfaces. The 60 input variables are the strength of the returns at different angles. It is a binary classification problem that requires a model to differentiate rocks from metal cylinders (mines).

It is a well-understood dataset. All of the variables are continuous and generally in the range of 0 to 1. The output variable is a string, “M” for mine and “R” for rock, which needs to be converted to integers. `LabelEncoder` assigns integers in sorted order of the labels, so “M” becomes 0 and “R” becomes 1.

In [9]:
# Binary Classification with Sonar Dataset: Baseline

from pandas import read_csv
# load dataset
dataframe = read_csv("sonar.csv", header=None)
dataset = dataframe.values

# split into input (X) and output (Y) variables
X = dataset[:,0:60].astype(float)
Y = dataset[:,60]

### 2. Encode output variable


In [10]:
from sklearn.preprocessing import LabelEncoder

# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)

### 3. Define Keras Model

Create a Keras model with 1 hidden layer of size 60 and 1 output layer. The layers should have a 'normal' initialization of weights.

Compile the model with the `adam` optimizer.

We should define a `create_baseline()` function that will create the model, compile it and return it.

In [11]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

def create_baseline():
	# create model: one hidden layer of 60 units and a single sigmoid output
	model = Sequential()
	model.add(Dense(60, input_dim=60, kernel_initializer='normal', activation='relu'))
	model.add(Dense(1, kernel_initializer='normal', activation='sigmoid'))

	# Compile model
	model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

	return model

### 4. Evaluate model

Evaluate the model using stratified cross validation in the scikit-learn framework. Number of splits should be 10. 

In [12]:
! pip install scikeras
from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import StratifiedKFold

# evaluate model with dataset
estimator = KerasClassifier(model=create_baseline, epochs=100, batch_size=5, verbose=0)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=15)
results = cross_val_score(estimator, X, encoded_Y, cv=kfold)
print("Baseline: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting scikeras
  Downloading scikeras-0.9.0-py3-none-any.whl (27 kB)
Installing collected packages: scikeras
Successfully installed scikeras-0.9.0
Baseline: 79.74% (7.56%)
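
The idea behind stratification can be sketched in plain Python: each fold should contain roughly the same proportion of each class as the full dataset. The function below is a simplified illustration, not scikit-learn's algorithm; `stratified_folds` is a name introduced here, the labels are hypothetical, and no shuffling is done.

```python
from collections import Counter, defaultdict

def stratified_folds(labels, n_splits):
    """Assign each sample index to a fold so class proportions stay similar."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        # Deal the indices of one class round-robin across the folds
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)
    return folds

labels = ["M"] * 12 + ["R"] * 8   # hypothetical labels, not the Sonar data
folds = stratified_folds(labels, n_splits=4)

for test_idx in folds:
    print(Counter(labels[i] for i in test_idx))
```

Every fold here ends up with 3 "M" and 2 "R" samples, mirroring the 12:8 ratio of the full label list. In cross-validation, each fold serves once as the test set while the remaining folds are used for training.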


## Apply Standardization on Dataset

An effective data preparation scheme for tabular data when building neural network models is **standardization**. This is where the data is rescaled such that the mean value for each attribute is 0 and the standard deviation is 1. This preserves Gaussian and Gaussian-like distributions whilst normalizing the central tendencies for each attribute.

We can use scikit-learn to perform the standardization of our Sonar dataset using the `StandardScaler` class.
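
For a single attribute, the rescaling amounts to subtracting the mean and dividing by the standard deviation. A minimal plain-Python sketch, using hypothetical values and a helper name (`standardize`) introduced here for illustration:

```python
import statistics

def standardize(column):
    # Population standard deviation (divide by n), matching StandardScaler's default
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    return [(x - mean) / std for x in column]

column = [0.02, 0.04, 0.10, 0.30, 0.54]   # hypothetical values for one attribute
scaled = standardize(column)

print(scaled)
print(statistics.fmean(scaled), statistics.pstdev(scaled))
```

After the transformation, the column has mean 0 and standard deviation 1 up to floating-point rounding, while the relative ordering and spacing of the values are preserved.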

## Create a pipeline

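The scikit-learn `Pipeline` is a wrapper that chains one or more preprocessing steps with a final model and runs them together within each pass of the cross-validation procedure. The scaler is therefore fit on the training folds only and then applied to the held-out fold, so no information from the test data leaks into the scaling. Here, we can define a pipeline with the `StandardScaler` followed by our neural network model.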

In [13]:
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

# evaluate baseline model with standardized dataset
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(model=create_baseline, epochs=100, batch_size=5, verbose=0)))

pipeline = Pipeline(estimators)
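
What the pipeline does with the scaler inside each cross-validation pass can be sketched as follows: the statistics are learned from the training fold and then reused, unchanged, on the held-out fold. The values and the helper names `fit_scaler` and `transform` are hypothetical, chosen only for this illustration.

```python
import statistics

def fit_scaler(train_column):
    # Learn mean and standard deviation from the training fold only
    return statistics.fmean(train_column), statistics.pstdev(train_column)

def transform(column, mean, std):
    # Apply previously learned statistics; nothing is re-estimated here
    return [(x - mean) / std for x in column]

train = [0.1, 0.2, 0.3, 0.4]   # hypothetical training-fold values
test = [0.25, 0.9]             # hypothetical held-out values

mean, std = fit_scaler(train)
train_scaled = transform(train, mean, std)
test_scaled = transform(test, mean, std)   # reuses the training statistics

print(train_scaled)
print(test_scaled)
```

The held-out values are not forced to mean 0 and standard deviation 1. They are simply expressed on the scale learned from the training data, which is how the model would see new data in practice.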

### Evaluate model

In [14]:
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=15)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Standardized: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Standardized: 85.10% (6.58%)
