<a href="https://colab.research.google.com/github/Rahafzsh/DL-Assignment/blob/main/DL_Assignment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ASSIGNMENT 1: Iris Data Classification (Using TensorFlow)
### Prepared by [Mustafa Youldash, Ph.D.](https://github.com/youldash)

### The Iris Data Set (i.e., Problem Set)

The [Iris data set](https://archive.ics.uci.edu/ml/datasets/Iris/) is a popular data set for classification tasks in machine learning. It consists of 150 samples of iris plants, with each sample consisting of four features (sepal length, sepal width, petal length, and petal width) and a target label indicating the species of the iris plant (setosa, versicolor, or virginica).

To solve the assignment using the Iris data set, students would need to preprocess the data, develop and train a Deep Learning model, and evaluate the performance of the model. Preprocessing the data might involve scaling the features and splitting the data into training and validation sets. Developing and training the model could involve selecting an appropriate architecture and optimization algorithm, setting the learning rate, and choosing the number of epochs. Evaluating the performance of the model could involve using metrics such as accuracy, precision, and recall to assess the model's ability to classify the iris plants correctly.

In [None]:
# What version of Python do you currently have?
import sys


print(sys.version)

In [None]:
# Do you have TensorFlow installed on your system?
import tensorflow as tf


print(tf.__version__)

## Helpful Functions for Keras and TensorFlow

In [None]:
#from util import helper

## Exploratory Data Analysis

Exploratory Data Analysis (EDA) is a process of analyzing and summarizing a data set in order to understand the underlying structure and relationships within the data. EDA is an important step in the data science process, as it allows you to identify patterns, trends, and anomalies in the data that may not be immediately apparent.

There are several benefits of performing EDA for Deep Learning:

- EDA helps you understand the data: By performing EDA, you can get a better understanding of the data you are working with, including the distribution of the data, the relationships between different features, and any missing or corrupted values.
- EDA can identify potential problems: EDA can help you identify potential problems with the data, such as missing values or outliers, which could impact the performance of your Deep Learning model.
- EDA can inform model selection: EDA can help you understand the characteristics of the data, which can inform your choice of Deep Learning model. For example, if the data is highly non-linear, you may want to consider using a model that is capable of capturing complex relationships, such as a neural network.
- EDA can improve model performance: By understanding the underlying structure of the data, you can better tune the hyperparameters of your Deep Learning model, which can lead to improved performance.

Overall, EDA is an important step in the Deep Learning process, as it helps you understand the data and identify potential issues that could impact the performance of your model. EDA is open-ended, and it is up to you to decide how to look at different ways to slice and dice your data.
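
As a minimal sketch of the checks listed above (missing values, class balance, and outliers), the snippet below uses a tiny hand-made DataFrame. The column names and values are invented for illustration and are not rows from `IRIS.csv`; the 1.5 × IQR outlier rule is one common convention, assumed here rather than taken from the assignment.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real data set (not real iris rows).
toy = pd.DataFrame({
    "petal_length": [1.4, 1.5, 1.3, 4.7, 4.5, np.nan, 6.0, 5.9, 25.0],
    "species": ["a", "a", "a", "b", "b", "b", "c", "c", "c"],
})

# 1) Missing values per column.
missing_per_column = toy.isna().sum()

# 2) Class balance: number of samples per label.
class_counts = toy["species"].value_counts()

# 3) Outliers flagged with the 1.5 * IQR rule (an assumed convention).
q1 = toy["petal_length"].quantile(0.25)
q3 = toy["petal_length"].quantile(0.75)
iqr = q3 - q1
is_outlier = (toy["petal_length"] < q1 - 1.5 * iqr) | (toy["petal_length"] > q3 + 1.5 * iqr)
outliers = toy.loc[is_outlier, "petal_length"].tolist()

print(missing_per_column)
print(class_counts)
print(outliers)
```

The same three checks can be run on the `iris` DataFrame once it is loaded, replacing `toy` and the column name accordingly.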

## Importing Libraries

In [None]:
import numpy as np
import pandas as pd
import tensorflow as tf
from matplotlib import pyplot
# Use a single Keras namespace (tensorflow.keras) to avoid mixing it with standalone keras.
from tensorflow.keras import models, layers, optimizers, utils
from sklearn.preprocessing import StandardScaler


In [None]:
import pandas as pd
#from pandas_profiling.profile_report import ProfileReport

iris = pd.read_csv('IRIS.csv', na_values=['NA','?'])
iris

In [None]:
# Hint: use a DataFrame for both EDA and model development.
iris.shape

In [None]:
# Preview the first five rows of the DataFrame.
iris.head()

In [None]:
iris.groupby("species").size()

In [None]:
iris.species.unique()

In [None]:
iris.describe()

In [None]:
iris.hist()

In [None]:
from pandas.plotting import scatter_matrix

scatter_matrix(iris)

### Data Preprocessing

In [None]:
arr = iris.values

# All columns except the last are features; the last column is the species label.
features = arr[:, :-1]
print(features.shape)

labels = arr[:, -1]
print(labels.shape)

In [None]:
from sklearn.preprocessing import LabelEncoder

# Map the species names to integer class ids (0, 1, 2).
encoder = LabelEncoder()
labels = encoder.fit_transform(labels)

print(set(labels))

In [None]:
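# One-hot encode the integer labels to match the 3-unit softmax output.
# Cast to float32 because recent pandas versions return boolean dummy columns.
newlabels = pd.get_dummies(labels).values.astype("float32")
newlabels[:5]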

In [None]:
data = np.asarray(features).astype("float32")

## Model Classification

After preprocessing and Exploratory Data Analysis (EDA), I split the data set into 70% for training and 30% for validation. The validation set also serves as the held-out test set in the evaluation step.
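
The sketch below illustrates a 70/30 split on synthetic data of the same shape as the Iris features (150 × 4). The random data, the variable names, and the `stratify=y` argument are assumptions for illustration; stratification keeps the class proportions equal in both subsets and is an optional addition, not part of the split described above. The scaler is fitted on the training subset only, so no statistics from the validation data leak into training.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the Iris features and labels (not the real data).
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(150, 4)).astype("float32")
y = np.repeat([0, 1, 2], 50)  # three balanced classes

# 70/30 split; stratify=y is an assumed extra that preserves class balance.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y
)

# Fit the scaler on the training data only, then reuse it for validation.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_valid_scaled = scaler.transform(X_valid)

print(X_train.shape, X_valid.shape)
print(np.bincount(y_valid))  # samples per class in the validation subset
```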

In [None]:
from sklearn.model_selection import train_test_split

train_data, valid_data, train_labels, valid_labels = train_test_split(
    data, newlabels, test_size=0.30, random_state=42)

# Scale the features: fit on the training set only, then apply to both sets.
scaler = StandardScaler()
train_data = scaler.fit_transform(train_data)
valid_data = scaler.transform(valid_data)

In [None]:
print("Train Data Shape: ", train_data.shape)
print("Train Labels Shape: ", train_labels.shape)
print("Validation Data Shape: ", valid_data.shape)
print("Validation Labels Shape: ", valid_labels.shape)


## Sequential Model

I built a Keras sequential model with four dense layers. The first layer has 16 neurons and receives the four input features (one value per feature); because it already applies weights and an activation, it acts as the first hidden layer rather than a plain input layer. It is followed by two more hidden layers, one with 16 neurons and one with 8 neurons. Finally, the output layer has three neurons, one for each class.
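
As a quick sanity check on this architecture, the number of trainable parameters of a dense layer is `inputs * units + units` (weights plus biases). The sketch below applies that formula to the four layers; the list `layer_shapes` is a name introduced here for illustration. The totals should agree with what `model.summary()` reports for the same layer sizes.

```python
# (inputs, units) for each dense layer in the model described above.
layer_shapes = [(4, 16), (16, 16), (16, 8), (8, 3)]

# Each dense layer has one weight per input-unit pair plus one bias per unit.
params = [n_in * n_out + n_out for n_in, n_out in layer_shapes]
total = sum(params)

print(params, total)  # [80, 272, 136, 27] 515
```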


I compile the model with the **Adam** optimizer, **categorical crossentropy** as the loss function, and **accuracy** as the metric.
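
To make the loss concrete: for one-hot labels, categorical crossentropy is the negative log of the probability the model assigns to the true class, averaged over the samples. The NumPy sketch below computes it by hand for two made-up predictions. The values and the clipping constant `eps` are assumptions for illustration, and Keras handles clipping and reduction in its own way, so treat this as an approximation of the idea rather than the library's exact implementation.

```python
import numpy as np

# Two hypothetical samples: one-hot true labels and softmax-like predictions.
y_true = np.array([[1, 0, 0],
                   [0, 1, 0]], dtype="float32")
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1]], dtype="float32")

eps = 1e-7  # assumed small constant to avoid log(0)
per_sample = -np.sum(y_true * np.log(np.clip(y_pred, eps, 1.0)), axis=1)
mean_loss = per_sample.mean()

print(per_sample)  # roughly -ln(0.7) and -ln(0.8)
print(mean_loss)
```

A confident, correct prediction (probability close to 1 for the true class) gives a loss close to 0, while a confident wrong prediction is penalized heavily.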

In [None]:
# Define and build your model.
np.random.seed(42)
tf.random.set_seed(42)

model = models.Sequential()
model.add(layers.Dense(16, activation="relu", input_shape=(4,), kernel_initializer='normal'))  # 1st hidden layer (receives the 4 features)
model.add(layers.Dense(16, activation="relu", kernel_initializer='normal'))  # 2nd hidden layer
model.add(layers.Dense(8, activation="relu", kernel_initializer='normal'))  # 3rd hidden layer
model.add(layers.Dense(3, activation="softmax"))  # output layer (one unit per class)

# Compile the model.
model.compile(
    loss='categorical_crossentropy',
    optimizer='adam',
    metrics=['accuracy']) 

## Model Training

I train the model for 60 epochs with a batch size of 10, and validate it on the validation set after each epoch.
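
These settings determine how many weight updates the optimizer performs. The short calculation below assumes 105 training samples (70% of the 150 samples in the data set) and that the last, smaller batch of each epoch is still used, which is the default behavior of `model.fit` on NumPy arrays.

```python
import math

n_train = 105      # assumed: 70% of 150 samples
batch_size = 10
epochs = 60

# The final batch of each epoch holds the 5 leftover samples.
steps_per_epoch = math.ceil(n_train / batch_size)
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, total_updates)  # 11 660
```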

In [None]:
history = model.fit(train_data, train_labels, validation_data=(valid_data, valid_labels), epochs=60, batch_size=10)

## Model Evaluation

After training is complete, we can evaluate the model on the held-out validation set.
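
Beyond overall accuracy, per-class precision and recall can be read directly from a confusion matrix. The sketch below uses a made-up 3 × 3 matrix (rows are true classes, columns are predicted classes); the numbers are hypothetical and are not results from this notebook.

```python
import numpy as np

# Hypothetical confusion matrix: rows = true class, columns = predicted class.
cm = np.array([[15, 0, 0],
               [0, 13, 2],
               [0, 1, 14]])

true_pos = np.diag(cm)

# Precision: of everything predicted as a class, how much was correct.
precision = true_pos / cm.sum(axis=0)
# Recall: of everything that truly belongs to a class, how much was found.
recall = true_pos / cm.sum(axis=1)
# Accuracy: correct predictions over all predictions.
accuracy = true_pos.sum() / cm.sum()

print(precision)
print(recall)
print(accuracy)
```

These are the same quantities that `sklearn.metrics.classification_report` summarizes per class.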

In [None]:
loss, acc = model.evaluate(valid_data, valid_labels)

In [None]:
print(f"Test accuracy: {acc * 100:.2f}%")
print(f"Test loss: {loss:.4f}")

In [None]:
from sklearn import metrics

# Evaluate the predictions with a confusion matrix and a per-class report.
prediction = model.predict(valid_data)

# Convert one-hot labels and softmax probabilities back to class indices.
y_test_class = np.argmax(valid_labels, axis=1)
y_pred_class = np.argmax(prediction, axis=1)

print("\nConfusion Matrix:\n", metrics.confusion_matrix(y_test_class, y_pred_class))
print("\nResult Report:\n", metrics.classification_report(y_test_class, y_pred_class))


test_scores = model.evaluate(valid_data, valid_labels, verbose=2)
print("\nTest loss:\n", test_scores[0])
print("\nTest accuracy:\n", test_scores[1])

In [None]:
model.summary()

# Reflection 