<a href="https://colab.research.google.com/github/Ash100/Biopython/blob/main/Dealing_Multiclass_classification_problem_3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

##Multiclass Classification Problem
I am **Dr. Ashfaq Ahmad**, and this notebook was created for teaching and research purposes. With readers from the field of Biology in mind, I have tried my best to keep it as simple as possible. For detailed instructions and explanations, please watch the video tutorial at **https://www.youtube.com/@Bioinformaticsinsights**

This notebook is based on the book **Deep Learning with Python** by Jason Brownlee.

##Aims and Learning Goals
In this tutorial you will learn how to use Keras to develop and evaluate neural network models for multiclass classification problems. After completing this step-by-step tutorial, you will know:

**1. How to load data from CSV and make it available to Keras.**

**2. How to prepare multiclass classification data for modeling with neural networks.**

**3. How to evaluate Keras neural network models with scikit-learn.**

In this tutorial we will use **the iris flowers dataset**. It can be downloaded from the UC Irvine Machine Learning Repository or Kaggle. Each row contains four numerical input variables followed by the class label:
1. Sepal length in centimeters.
2. Sepal width in centimeters.
3. Petal length in centimeters.
4. Petal width in centimeters.
5. Class (the flower species).

This is a multiclass classification problem, meaning that there are more than two classes to be predicted; here there are three flower species. It is a useful problem for practicing with neural networks because the three class values require special handling.

In [None]:
#Install Keras and Scikit
!pip install --upgrade keras
!pip install --upgrade scikit_learn


In [None]:
#Install as per your need
!pip install scikeras[tensorflow-cpu]

In [None]:
#Install as per your need
!pip install scikeras[tensorflow]      # gpu compute platform

In [1]:
import numpy
import pandas
from keras.models import Sequential
from keras.layers import Dense
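from scikeras.wrappers import KerasClassifier  # keras.wrappers.scikit_learn was removed from recent Keras; SciKeras (installed above) replaces it
from keras import utils as np_utils  # keras.utils.np_utils was also removed; this alias keeps np_utils.to_categorical working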
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import LabelEncoder
from sklearn.pipeline import Pipeline

Set the random seed for reproducibility

In [2]:
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
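
As a quick check of what the seed does, the sketch below (an illustration, not part of the original notebook) reseeds NumPy twice and confirms that the same random numbers come back. Keep in mind that this fixes only NumPy's generator; Keras and TensorFlow keep their own random state, so training results may still vary a little between runs.

```python
import numpy

seed = 7

# Seed NumPy and draw three random numbers
numpy.random.seed(seed)
first_draw = numpy.random.rand(3)

# Reseed with the same value and draw again
numpy.random.seed(seed)
second_draw = numpy.random.rand(3)

# Identical seeds give identical sequences
same = numpy.array_equal(first_draw, second_draw)
print(same)
```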

Let's load the downloaded dataset

In [5]:
# load dataset
dataframe = pandas.read_csv("/content/sample_data/IRIS_1.csv", header=None)
dataset = dataframe.values
X = dataset[:,0:4].astype(float)
Y = dataset[:,4]
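
To see what the loading step produces without needing the file, here is a minimal sketch that reads a few illustrative iris rows from an in-memory string (the `csv_text` variable is an assumption made for this example; the notebook itself reads `IRIS_1.csv`). It applies the same column split and prints the resulting shapes.

```python
import io
import pandas

# A few illustrative rows in the same layout as the iris CSV (no header row)
csv_text = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""

dataframe = pandas.read_csv(io.StringIO(csv_text), header=None)
dataset = dataframe.values

# First four columns are the numeric inputs, the fifth is the class label
X = dataset[:, 0:4].astype(float)
Y = dataset[:, 4]

print(X.shape, X.dtype)
print(Y.shape)
print(sorted(set(Y)))
```

If `X.shape` does not have 4 columns, or `Y` contains numbers instead of species names, the CSV probably has a header row or an extra index column.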

**Encode the output variable**

The output column contains three class values:
**1.** Iris-setosa, **2.** Iris-versicolor, and **3.** Iris-virginica.
When modeling multiclass classification problems with neural networks, it is good practice to reshape the output attribute from a single vector of class values into a matrix with one boolean column per class, indicating whether or not a given instance belongs to that class. This is called **one hot encoding**, or **creating dummy variables** from a categorical variable.

An example of the binary matrix:

| Iris-setosa | Iris-versicolor | Iris-virginica |
|---|---|---|
| 1 | 0 | 0 |
| 0 | 1 | 0 |
| 0 | 0 | 1 |

We can do this by first encoding the strings consistently as integers using the scikit-learn class *LabelEncoder()*, and then converting the vector of integers to a one hot encoding using the Keras function *to_categorical()*.

In [7]:
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# convert integers to dummy variables (i.e. one hot encoded)
dummy_y = np_utils.to_categorical(encoded_Y)
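
To make the two encoding steps visible, the sketch below runs them on four hypothetical labels. It swaps the Keras helper for a plain NumPy identity-matrix lookup, which is an assumption made here so the example runs without Keras; for integer class labels it yields the same kind of one-hot matrix.

```python
import numpy
from sklearn.preprocessing import LabelEncoder

# Four hypothetical labels, deliberately out of order
Y = numpy.array(["Iris-setosa", "Iris-virginica", "Iris-versicolor", "Iris-setosa"])

# Step 1: strings -> integers (classes are numbered in sorted order)
encoder = LabelEncoder()
encoded_Y = encoder.fit_transform(Y)

# Step 2: integers -> one-hot rows; row i of the identity matrix encodes class i
n_classes = len(encoder.classes_)
dummy_y = numpy.eye(n_classes)[encoded_Y]

print(encoder.classes_)
print(encoded_Y)
print(dummy_y)
```

Each row of `dummy_y` contains a single 1, placed in the column of that sample's class.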

###Define Model
Below is a function that creates a baseline neural network for the iris classification problem. The approach is much the same as in the previous lessons.
We will create a simple fully connected network with one hidden layer of 4 neurons, matching the number of inputs (any number of neurons could be used). The hidden layer uses the rectifier activation function (relu), which is a common default choice. Because we used a one-hot encoding for
our iris dataset, the output layer must produce 3 output values, one for each class. The class with the largest output
value is taken as the model's prediction.

**Network Topology**

4 inputs -> [4 hidden nodes] -> 3 outputs

We will use a sigmoid activation function in the output layer. This ensures that each output value lies between 0 and 1, so it can be read as a per-class score (unlike softmax, sigmoid outputs are not forced to sum to 1). Finally, the network uses the efficient Adam gradient descent optimization algorithm with a logarithmic loss function, which is called categorical crossentropy in Keras.
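
The topology can be traced by hand with a few lines of NumPy. This is only a sketch: the weights are random placeholders rather than trained values, and the helper names `relu` and `sigmoid` are defined here for illustration. It shows the shapes involved, how the predicted class is read off as the largest output, and how many trainable parameters the network has.

```python
import numpy

rng = numpy.random.default_rng(7)

def relu(z):
    # Rectifier: negative values become 0
    return numpy.maximum(z, 0.0)

def sigmoid(z):
    # Squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + numpy.exp(-z))

# Hypothetical (untrained) weights: 4 inputs -> 4 hidden nodes -> 3 outputs
W1, b1 = rng.normal(scale=0.05, size=(4, 4)), numpy.zeros(4)
W2, b2 = rng.normal(scale=0.05, size=(4, 3)), numpy.zeros(3)

x = numpy.array([[5.1, 3.5, 1.4, 0.2]])  # one flower, four measurements

hidden = relu(x @ W1 + b1)
outputs = sigmoid(hidden @ W2 + b2)

# The class with the largest output value is the prediction
predicted_class = int(numpy.argmax(outputs, axis=1)[0])

# Weights plus biases of both layers
n_params = W1.size + b1.size + W2.size + b2.size

print(outputs.shape, predicted_class, n_params)
```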

In [15]:
# Define baseline model
def baseline_model():
    # Imported here so the function is self-contained when the wrapper calls it
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense
    # Create model: 4 inputs -> [4 hidden nodes] -> 3 outputs
    model = Sequential()
    model.add(Dense(4, input_dim=4, kernel_initializer='normal', activation='relu'))
    model.add(Dense(3, kernel_initializer='normal', activation='sigmoid'))
    # Compile model with log loss and the Adam optimizer (default settings)
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
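
For intuition about the loss used in `compile()`, here is a hand-written NumPy sketch of the logarithmic loss for one-hot targets. The function below is an illustrative helper, not the Keras implementation (which may apply extra normalization internally), and the prediction arrays are made-up values. It shows that confident, correct predictions give a lower loss than hesitant ones.

```python
import numpy

def log_loss_one_hot(y_true, y_pred, eps=1e-7):
    """Mean log loss; y_true is one-hot, y_pred holds per-class probabilities."""
    y_pred = numpy.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    per_sample = -numpy.sum(y_true * numpy.log(y_pred), axis=1)
    return per_sample.mean()

# Two samples: the first is class 0, the second is class 1
y_true = numpy.array([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0]])

# Hypothetical model outputs
confident = numpy.array([[0.9, 0.05, 0.05],
                         [0.1, 0.8, 0.1]])
unsure = numpy.array([[0.4, 0.3, 0.3],
                      [0.3, 0.4, 0.3]])

loss_confident = log_loss_one_hot(y_true, confident)
loss_unsure = log_loss_one_hot(y_true, unsure)

print(loss_confident, loss_unsure)
```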

In [None]:
estimator = KerasClassifier(build_fn=baseline_model, epochs=200, batch_size=5, verbose=0)

##Model Evaluation with K-fold
We can now evaluate the neural network model on our training data. The scikit-learn library has excellent capability to evaluate models using a suite of techniques. The gold standard for evaluating machine learning models is k-fold cross validation. First we define the model evaluation procedure. Here, we set the number of folds to 10 (a common default) and shuffle the data before partitioning it.

In [17]:
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)
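
To see what this splitter does, the sketch below applies the same `KFold` settings to row indices only, assuming the standard 150-row iris file (your CSV may differ). Each fold holds out 15 rows for testing and trains on the remaining 135, and every row is tested exactly once.

```python
import numpy
from sklearn.model_selection import KFold

seed = 7
n_samples = 150  # assumed size of the standard iris dataset
indices = numpy.arange(n_samples)

kfold = KFold(n_splits=10, shuffle=True, random_state=seed)

fold_sizes = []
all_test = []
for train_idx, test_idx in kfold.split(indices):
    # Record (training rows, test rows) for this fold
    fold_sizes.append((len(train_idx), len(test_idx)))
    all_test.extend(test_idx.tolist())

print(fold_sizes[0])
# Every row index appears in exactly one test fold
print(sorted(all_test) == list(range(n_samples)))
```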


Now we can evaluate our model (estimator) on our dataset (X and dummy_y) using a 10-fold cross validation procedure (kfold).

In [None]:
results = cross_val_score(estimator, X, dummy_y, cv=kfold)
print("Accuracy: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

This reports the mean accuracy and its standard deviation across the 10 folds; the model achieves reasonable performance on the iris dataset.