# ASSIGNMENT 1: Iris Data Classification (Using TensorFlow)
## Prepared by [Mustafa Youldash, Ph.D.](https://github.com/youldash)

### Student Name:
### Student ID:
### Section Number:

### The Iris Data Set (i.e., Problem Set)

The [Iris data set](https://archive.ics.uci.edu/ml/datasets/Iris/) is a popular data set for classification tasks in machine learning. It contains 150 samples of iris plants. Each sample has four features (sepal length, sepal width, petal length, and petal width) and a target label indicating the species of the plant (setosa, versicolor, or virginica).

To complete the assignment using the Iris data set, you need to preprocess the data, develop and train a Deep Learning model, and evaluate the model's performance. Preprocessing the data might involve scaling the features and splitting the data into training and validation sets. Developing and training the model involves selecting an appropriate architecture and optimization algorithm, setting the learning rate, and choosing the number of epochs. Evaluating the model could involve metrics such as accuracy, precision, and recall, which measure how well the model classifies the iris plants.
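
The preprocessing step described above can be sketched as follows. This is a minimal illustration, not the assignment's solution: it assumes scikit-learn is installed, uses scikit-learn's bundled copy of Iris (`load_iris`) as a stand-in for `./data/iris.csv`, and the 20% validation fraction is an arbitrary illustrative choice.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's bundled Iris data stands in for ./data/iris.csv.
features, labels = load_iris(return_X_y=True)

# Hold out part of the data for validation. stratify keeps the three
# species equally represented in both splits.
x_train, x_val, y_train, y_val = train_test_split(
    features, labels, test_size=0.2, stratify=labels, random_state=42)

# Fit the scaler on the training split only, then apply it to both splits,
# so that no statistics from the validation data leak into training.
scaler = StandardScaler().fit(x_train)
x_train_scaled = scaler.transform(x_train)
x_val_scaled = scaler.transform(x_val)

print(x_train_scaled.shape, x_val_scaled.shape)
```

Fitting the scaler before splitting would also work mechanically, but it lets the validation samples influence the scaling, which tends to make validation scores look slightly better than they should.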

In [2]:
# What version of Python do you currently have?
import sys


print(sys.version)

3.11.0 (main, Oct 24 2022, 18:26:48) [MSC v.1933 64 bit (AMD64)]


In [1]:
# Do you have TensorFlow installed on your system?
import tensorflow as tf


print(tf.__version__)

ModuleNotFoundError: No module named 'tensorflow'
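
The `ModuleNotFoundError` above means TensorFlow is not installed in the environment the notebook is running in. A small guard such as the one below can report this without raising. It is only a sketch: the variable name `tensorflow_available` and the suggested `pip` command are illustrative assumptions, and the right install command depends on your setup.

```python
import importlib.util

# find_spec returns None when a top-level package cannot be located,
# and it does not import the package while looking for it.
tensorflow_available = importlib.util.find_spec("tensorflow") is not None

if tensorflow_available:
    import tensorflow as tf
    print("TensorFlow", tf.__version__)
else:
    print("TensorFlow is not installed in this environment. "
          "Install it (for example with `pip install tensorflow`) "
          "and restart the kernel.")
```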

## Helpful Functions for Keras and TensorFlow

In [None]:
from util import helper

## Exploratory Data Analysis

Exploratory Data Analysis (EDA) is a process of analyzing and summarizing a data set in order to understand the underlying structure and relationships within the data. EDA is an important step in the data science process, as it allows you to identify patterns, trends, and anomalies in the data that may not be immediately apparent.

There are several benefits of performing EDA for Deep Learning:

- By performing EDA, you can get a better understanding of the data you are working with, including the distribution of the data, the relationships between different features, and any missing or corrupted values.
- EDA can help you identify potential problems with the data, such as missing values or outliers, which could impact the performance of your Deep Learning model.
- EDA can help you understand the characteristics of the data, which can inform your choice of Deep Learning model. For example, if the data is highly non-linear, you may want to consider using a model that is capable of capturing complex relationships, such as a neural network.
- By understanding the underlying structure of the data, you can better tune the hyperparameters of your Deep Learning model, which can lead to improved performance.

In short, EDA helps you understand the data and catch issues that could hurt your model's performance before you start training. EDA is open-ended: it is up to you to decide which summaries, groupings, and plots are worth exploring.
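
A few of the checks mentioned above (missing values, class balance, per-class feature summaries) can be sketched with pandas. The tiny inline CSV and its feature column names are hypothetical stand-ins for `./data/iris.csv`; only the `species` column name and the `na_values` setting come from this notebook.

```python
import io

import pandas as pd

# Hypothetical miniature data set; the real file is ./data/iris.csv and
# its feature column names may differ.
csv_text = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,?,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.4,3.2,NA,1.5,versicolor
6.3,3.3,6.0,2.5,virginica
5.8,2.7,5.1,1.9,virginica
"""

# Treat "NA" and "?" as missing, as the notebook's read_csv call does.
df = pd.read_csv(io.StringIO(csv_text), na_values=['NA', '?'])

# Missing values per column.
missing = df.isna().sum()
print(missing)

# Class balance: how many samples of each species.
class_counts = df["species"].value_counts()
print(class_counts)

# Summary statistics and per-species means of the numeric features.
feature_cols = [c for c in df.columns if c != "species"]
print(df[feature_cols].describe())
per_species_mean = df.groupby("species")[feature_cols].mean()
print(per_species_mean)
```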

In [None]:
import os

import pandas as pd

# Location of the data set.
path = "./data/"

# Read the data, treating "NA" and "?" as missing values.
filename = os.path.join(path, "iris.csv")
df = pd.read_csv(filename, na_values=['NA', '?'])

In [None]:
# Hint: use a DataFrame for both EDA and model development.

In [None]:
# Your code goes here...

## Iris Flower Classification

In [None]:
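# Imports.
import os

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential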

In [None]:
# File path.
path = "./data/"

# Read the data.
filename = os.path.join(path, "iris.csv")    
df = pd.read_csv(filename, na_values=['NA','?'])

# Encode the text labels in the "species" column as integer class indexes (one index per species).
species = helper.encode_text_index(df, "species")

# Convert a Pandas DataFrame to the (x,y) inputs that TensorFlow needs.
x, y = helper.to_xy(df, "species")

# Split the data into training and testing sets.
x_train, x_test, y_train, y_test = train_test_split(    
    x, y, test_size=FIXME, random_state=42)
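
The `helper` module is not shown in this notebook, so the following is only an assumption about what the encoding and `(x, y)` conversion steps amount to: species names become integer class indexes, features become a float matrix, and targets become one-hot rows (which is consistent with the model later using `y.shape[1]` as its output size). The miniature DataFrame and its feature column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame; the feature column names are assumptions.
df = pd.DataFrame({
    "sepal_length": [5.1, 7.0, 6.3, 4.9],
    "petal_length": [1.4, 4.7, 6.0, 1.4],
    "species": ["setosa", "versicolor", "virginica", "setosa"],
})

# Map each species name to an integer class index (classes sorted by name).
codes, classes = pd.factorize(df["species"], sort=True)

# Features: every column except the target, as a float32 matrix.
x = df.drop(columns="species").to_numpy(dtype=np.float32)

# Targets: one-hot rows, with one column per class.
y = np.eye(len(classes), dtype=np.float32)[codes]

print(list(classes))
print(x.shape, y.shape)
```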

In [None]:
# Define and build your model.
# Hint: try different activation functions and see which one produces better results.
model = Sequential()
model.add(Dense(FIXME, input_dim=x.shape[1], kernel_initializer='normal',
                activation='FIXME'))
model.add(Dense(1, kernel_initializer='normal'))
model.add(Dense(y.shape[1], activation='FIXME'))

# Compile the model.
# Hint: try different optimizers and see which one produces better results.
model.compile(
    loss='FIXME',
    optimizer='FIXME')

# Define the training callbacks.
monitor = EarlyStopping(
    monitor='val_loss', min_delta=1e-3, patience=5, verbose=1, mode='auto')

# Train the model on the training split only; the held-out split is used for validation.
model.fit(
    x_train, y_train, validation_data=(x_test, y_test),
    callbacks=[monitor], verbose=2, epochs=FIXME)
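
To see what the `min_delta` and `patience` settings of the early-stopping callback mean, here is a simplified pure-Python sketch of the stopping rule. It is an illustration of the idea, not Keras's actual implementation; the function name `early_stopping_epoch` and the list of validation losses are hypothetical.

```python
def early_stopping_epoch(val_losses, min_delta=1e-3, patience=5):
    """Return the epoch index at which training would stop, or None.

    A loss counts as an improvement only if it beats the best loss seen
    so far by more than min_delta. Training stops once `patience`
    consecutive epochs pass without such an improvement.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Hypothetical validation losses: quick progress, then a plateau where
# each change is smaller than min_delta.
losses = [1.0, 0.8, 0.7, 0.6999, 0.6998, 0.6997, 0.6996, 0.6995, 0.5]
print(early_stopping_epoch(losses))
```

With these values the last loss (0.5) is never reached, because the plateau exhausts the patience first. This is why the `epochs` argument acts as an upper bound rather than a fixed training length when early stopping is enabled.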

In [None]:
from sklearn import metrics


# Evaluate the success rate using accuracy.
pred = FIXME

y_compare = FIXME

# Log the accuracy score.
score = metrics.accuracy_score(y_compare, pred)
print(f"Accuracy score: {score}")
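
Accuracy alone can hide per-class errors, so precision, recall, and a confusion matrix are often reported alongside it. The sketch below uses `sklearn.metrics` on small hypothetical arrays of integer class indexes; they are made-up values for illustration, not output from the model in this notebook.

```python
from sklearn import metrics

# Hypothetical class indexes for six validation samples (not real model output).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]

accuracy = metrics.accuracy_score(y_true, y_pred)

# average="macro" computes the metric per class and takes the unweighted mean.
precision = metrics.precision_score(y_true, y_pred, average="macro")
recall = metrics.recall_score(y_true, y_pred, average="macro")

# Rows are true classes, columns are predicted classes.
confusion = metrics.confusion_matrix(y_true, y_pred)

print(f"Accuracy:  {accuracy:.3f}")
print(f"Precision: {precision:.3f}")
print(f"Recall:    {recall:.3f}")
print(confusion)
```

In this made-up example one sample of class 1 is predicted as class 2, which lowers recall for class 1 and precision for class 2 while the other classes stay perfect.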