<a href="https://colab.research.google.com/github/NickPetrilli/AI/blob/main/lab06_ai.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Artificial Intelligence
Lab 06

By R. Coleman, Ph.D., Nick Petrilli

---
The goal of this lab is to build a model for learning the iris data.

Study this notebook. Much of the code is already here. The tasks are to complete the *TODO* steps (below).

In the last step, Step 12, format the output like this [example](https://drive.google.com/file/d/11ouY_YKluC7tfxyqi15eqkXDOKSXjVkx/view?usp=sharing); clear formatting is a best practice.

The point is to make the output easy to read. The actual contents of the output will vary with each student and run because the interneuron weights are initialized randomly.
Thus, don't try to make them look *exactly* like the example; that won't be possible. However, take note of the column justification.
If you choose to deviate from the example in formatting, have a good reason for doing so, although I have not thought of one.

At the bottom of this notebook you will find the delivery instructions.

In [None]:
# Step 1: Import the Pandas library
import pandas as pd

# Step 2: Read in the data to a DataFrame using the CSV reader method
url = "https://raw.githubusercontent.com/uiuc-cse/data-fa14/gh-pages/data/iris.csv"
df = pd.read_csv(url)

In [None]:
# Step 2: Randomize the rows of the dataset since the data are typically ordered by species.
from sklearn.utils import shuffle
df = shuffle(df, random_state=42).reset_index(drop=True)
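
Fixing `random_state` makes the shuffle repeatable from run to run. The idea can be sketched with NumPy alone; the ten-row stand-in data and the use of `default_rng` are assumptions for illustration, not a description of how `sklearn.utils.shuffle` works internally.

```python
import numpy as np

# Hypothetical stand-in for an ordered dataset: 0..9 represent row indices.
rows = np.arange(10)

# Two generators seeded identically produce the same permutation.
perm_a = np.random.default_rng(42).permutation(rows)
perm_b = np.random.default_rng(42).permutation(rows)

print(perm_a)
print(np.array_equal(perm_a, perm_b))
```

Every row still appears exactly once; only the order changes.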

In [None]:
# Step 3: Normalize numeric columns
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()

# These are the numeric columns
numeric_columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']

# Each column is scaled independently to [0, 1] using its own min and max; the result is a NumPy array.
normalized_columns = scaler.fit_transform(df[numeric_columns])

# Convert the numpy array to a pandas dataframe
df_normalized = pd.DataFrame(normalized_columns)
df_normalized.head()

Unnamed: 0,0,1,2,3
0,0.5,0.333333,0.627119,0.458333
1,0.388889,0.75,0.118644,0.083333
2,0.944444,0.25,1.0,0.916667
3,0.472222,0.375,0.59322,0.583333
4,0.694444,0.333333,0.644068,0.541667
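
Min-max scaling maps each column to the range [0, 1] using that column's own minimum and maximum. A minimal NumPy sketch of the arithmetic, using made-up measurements rather than the iris data:

```python
import numpy as np

# Hypothetical measurements: 3 rows, 2 feature columns.
X = np.array([[5.0, 1.0],
              [6.0, 3.0],
              [7.0, 2.0]])

# Column-wise minimum and maximum (axis=0 runs down the rows).
col_min = X.min(axis=0)
col_max = X.max(axis=0)

# x_scaled = (x - min) / (max - min), applied to each column independently.
X_scaled = (X - col_min) / (col_max - col_min)
print(X_scaled)
```

The smallest value in each column becomes 0 and the largest becomes 1, so features with different units end up on a comparable scale.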


In [None]:
# Step 4: One-hot encode the 'species' column
from sklearn.preprocessing import OneHotEncoder
encoder = OneHotEncoder(sparse_output=False)

# One-hot encode and also get the unique categories
species_encoded = encoder.fit_transform(df[['species']])

# The categories are stored as a list of NumPy arrays, one per encoded column; the first holds the species names
species_categories = encoder.categories_[0]

df_species_encoded = pd.DataFrame(species_encoded, columns=species_categories)
df_species_encoded.head()

Unnamed: 0,setosa,versicolor,virginica
0,0.0,1.0,0.0
1,1.0,0.0,0.0
2,0.0,0.0,1.0
3,0.0,1.0,0.0
4,0.0,1.0,0.0
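
One-hot encoding turns each label into a row with a single 1 in the column for its category. A small NumPy sketch of the idea, with a made-up label list; it illustrates the encoding and is not the `OneHotEncoder` implementation:

```python
import numpy as np

# Hypothetical labels in shuffled order.
labels = np.array(['versicolor', 'setosa', 'virginica', 'versicolor'])

# Sorted unique categories, plus the category index of each label.
categories, indices = np.unique(labels, return_inverse=True)

# Row i of the identity matrix is the one-hot vector for category i.
one_hot = np.eye(len(categories))[indices]

print(categories)
print(one_hot)
```

The column order follows the sorted category names, which matches the setosa, versicolor, virginica column order in the table above.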


## At this point *df_normalized* and *df_species_encoded* are READY for MLP
See Lab 4 for details on the steps up to this point.

In [None]:
# Step 5. Split the normalized/encoded data into training and testing sets.
from sklearn.model_selection import train_test_split
X_train_normalized, X_test_normalized, y_train_encoded, y_test_encoded = train_test_split(df_normalized, df_species_encoded, test_size=0.2, random_state=44)

In [None]:
X_train_normalized.shape

(120, 4)

In [None]:
y_train_encoded.shape

(120, 3)
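
The shapes above follow from the split ratio: with 150 rows and `test_size=0.2`, 30 rows go to testing and 120 to training. A rough sketch of such a split using index arrays; the seed and the rounding rule are assumptions for illustration and are not taken from `train_test_split`:

```python
import numpy as np

n_rows = 150       # total rows, implied by the 120-row training set at 80%
test_size = 0.2

# Shuffle the row indices, then cut them into test and train parts.
rng = np.random.default_rng(44)  # illustrative seed
order = rng.permutation(n_rows)
n_test = int(round(n_rows * test_size))
test_idx, train_idx = order[:n_test], order[n_test:]

print(len(train_idx), len(test_idx))
```

No index appears in both parts, so the model is never evaluated on rows it trained on.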

In [None]:
# Defines the callback to get diagnostic output.
from tensorflow import keras

class DiagnosticCallback(keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        if (epoch + 1) % 100 == 0:
            print(f"Epoch {epoch + 1}: loss = {logs['loss']:.4f}, accuracy = {logs['accuracy']:.4f}")
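
Keras passes a zero-based `epoch` to the callback, so `(epoch + 1) % 100 == 0` fires on the 100th, 200th, and so on. A plain-Python sketch of which epochs would be reported, assuming the 3000 epochs used in Step 8:

```python
NUM_EPOCHS = 3000  # assumed to match the training step

# Collect the 1-based epoch numbers on which the callback would print.
reported = [epoch + 1 for epoch in range(NUM_EPOCHS) if (epoch + 1) % 100 == 0]

print(len(reported), reported[:3], reported[-1])
```

With a single epoch, as suggested for a first test run, the list is empty and the callback prints nothing.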

In [None]:
# Step 6.
# Instantiate a dense model with an input layer, one neuron per feature,
# one hidden layer with 8 nodes and ReLU activation, and finally, an output layer
# of 3 nodes (corresponding to the one-hot encoding) and softmax activation.

# Import the modules from Keras, which is available standalone or bundled with TensorFlow.
from tensorflow import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.initializers import Constant

model = Sequential()

model.add(Dense(4, input_dim=4, activation='relu', bias_initializer=Constant(1.0)))
model.add(Dense(8, activation='relu'))
model.add(Dense(3, activation='softmax'))
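
Each dense layer holds one weight per input-output pair plus one bias per output node. A short sketch of the parameter count for the layer sizes above; the result should agree with what `model.summary()` reports, though that is not verified here:

```python
# Layer widths: 4 input features -> Dense(4) -> Dense(8) -> Dense(3).
layer_sizes = [4, 4, 8, 3]

# Parameters per dense layer = inputs * outputs (weights) + outputs (biases).
params = [n_in * n_out + n_out
          for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

print(params, sum(params))
```

This gives 20, 40, and 27 parameters per layer, 87 in total, which is a very small network by any standard.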

In [None]:
# Step 7.
# Compile the model with categorical crossentropy loss, the Adam optimizer, and the accuracy metric.
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
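
The Step 7 comment names categorical crossentropy, the usual loss for one-hot targets paired with a softmax output. A minimal NumPy sketch of how that loss is computed; the targets and predicted probabilities are made up for illustration:

```python
import numpy as np

# Hypothetical one-hot targets and softmax outputs for two samples.
y_true = np.array([[0.0, 1.0, 0.0],
                   [1.0, 0.0, 0.0]])
y_prob = np.array([[0.1, 0.8, 0.1],
                   [0.7, 0.2, 0.1]])

# Per-sample loss: -sum over classes of target * log(predicted probability).
# With one-hot targets this reduces to -log(probability of the true class).
per_sample = -np.sum(y_true * np.log(y_prob), axis=1)
loss = per_sample.mean()

print(per_sample, loss)
```

Only the probability assigned to the correct class contributes, so the loss approaches 0 as that probability approaches 1.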

In [None]:
# Step 8.
# Train the model with a batch size of 8 and the diagnostic callback.
# Start with a single epoch to test the failure mode of the MLP, then increase
# the count to improve the accuracy; 3000 epochs are used here.
NUM_EPOCHS = 3000
model.fit(X_train_normalized, y_train_encoded, epochs=NUM_EPOCHS, batch_size=8, verbose=0, callbacks=[DiagnosticCallback()])

Epoch 100: loss = 0.2802, accuracy = 0.9250
Epoch 200: loss = 0.2015, accuracy = 0.9667
Epoch 300: loss = 0.1123, accuracy = 0.9667
Epoch 400: loss = 0.0640, accuracy = 0.9833
Epoch 500: loss = 0.0450, accuracy = 0.9917
Epoch 600: loss = 0.0358, accuracy = 0.9917
Epoch 700: loss = 0.0313, accuracy = 0.9917
Epoch 800: loss = 0.0280, accuracy = 0.9917
Epoch 900: loss = 0.0260, accuracy = 0.9917
Epoch 1000: loss = 0.0245, accuracy = 0.9917
Epoch 1100: loss = 0.0231, accuracy = 0.9917
Epoch 1200: loss = 0.0236, accuracy = 0.9917
Epoch 1300: loss = 0.0224, accuracy = 0.9917
Epoch 1400: loss = 0.0214, accuracy = 0.9917
Epoch 1500: loss = 0.0202, accuracy = 0.9917
Epoch 1600: loss = 0.0198, accuracy = 0.9917
Epoch 1700: loss = 0.0197, accuracy = 0.9917
Epoch 1800: loss = 0.0195, accuracy = 0.9917
Epoch 1900: loss = 0.0190, accuracy = 0.9917
Epoch 2000: loss = 0.0185, accuracy = 0.9917
Epoch 2100: loss = 0.0177, accuracy = 0.9917
Epoch 2200: loss = 0.0169, accuracy = 0.9917
Epoch 2300: loss = 

<keras.src.callbacks.History at 0x78b92f9c1b10>

In [None]:
# Step 9. Make predictions on the normalized test set.
y_pred = model.predict(X_test_normalized)
y_pred



array([[9.9999994e-01, 1.8389560e-14, 0.0000000e+00],
       [3.5350575e-15, 9.9999994e-01, 1.1030957e-17],
       [9.3625996e-11, 9.9999994e-01, 6.5775884e-20],
       [1.7503658e-17, 9.9999994e-01, 9.0658046e-15],
       [1.2740827e-28, 9.0445263e-15, 9.9999994e-01],
       [1.0286682e-26, 1.3419307e-07, 9.9999982e-01],
       [2.4674776e-11, 9.9999994e-01, 1.9931133e-21],
       [7.3334552e-30, 8.2283798e-16, 9.9999994e-01],
       [9.9999994e-01, 5.0307515e-16, 0.0000000e+00],
       [5.2144573e-09, 9.9999994e-01, 1.3776176e-20],
       [4.7279939e-15, 9.9999994e-01, 5.2558320e-18],
       [3.4507248e-19, 9.9999994e-01, 9.2980302e-12],
       [7.5661651e-22, 9.9998540e-01, 1.4587957e-05],
       [9.9999994e-01, 2.1835330e-16, 0.0000000e+00],
       [9.9999994e-01, 1.1596209e-13, 0.0000000e+00],
       [1.0291974e-20, 9.9999994e-01, 2.5700355e-12],
       [9.9999994e-01, 2.3068246e-16, 0.0000000e+00],
       [9.9999994e-01, 3.3053319e-16, 0.0000000e+00],
       [2.9191449e-19, 9.999

In [None]:
# Step 10. The argmax function returns, for each row, the index of the largest
# value along axis 1, i.e., an array that contains values of 0, 1, or 2.
y_pred_labels = y_pred.argmax(axis=1)
y_pred_labels

array([0, 1, 1, 1, 2, 2, 1, 2, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0,
       2, 2, 2, 2, 2, 0, 1, 2])
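
Taking the argmax along axis 1 picks the most probable class in each row. A small sketch with made-up probabilities; the species order is assumed to match the encoder's category order:

```python
import numpy as np

# Hypothetical softmax outputs: one row per sample, one column per class.
probs = np.array([[0.90, 0.08, 0.02],
                  [0.10, 0.70, 0.20],
                  [0.05, 0.15, 0.80]])

# Index of the largest probability in each row.
labels = probs.argmax(axis=1)

# Indexing an array of names with those indices decodes them to species.
names = np.array(['setosa', 'versicolor', 'virginica'])[labels]

print(labels, names)
```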

In [None]:
# Verify that y_test_encoded is encoded correctly
y_test_encoded

Unnamed: 0,setosa,versicolor,virginica
144,1.0,0.0,0.0
9,0.0,1.0,0.0
79,0.0,1.0,0.0
95,0.0,1.0,0.0
104,0.0,0.0,1.0
47,0.0,0.0,1.0
118,0.0,1.0,0.0
107,0.0,0.0,1.0
102,1.0,0.0,0.0
90,0.0,1.0,0.0


In [None]:
# Step 11. Since y_test_encoded is a DataFrame, get its values as a NumPy array,
# then take the argmax along axis 1 to get the class index of each row.
y_test_labels = y_test_encoded.values.argmax(axis=1)

In [None]:
# Count the number of correct predictions
import numpy as np
identical_counts = np.sum(np.equal(y_pred_labels, y_test_labels))
identical_counts

29
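
Comparing two label arrays elementwise gives a boolean mask; its sum is the number of correct predictions and its mean is the accuracy. A sketch with made-up labels:

```python
import numpy as np

# Hypothetical true and predicted class indices.
y_true = np.array([0, 1, 2, 2, 1])
y_hat = np.array([0, 1, 1, 2, 1])

matches = y_true == y_hat          # boolean mask, True where prediction is right
correct = int(matches.sum())       # True counts as 1
accuracy = matches.mean()          # fraction of correct predictions
missed = np.flatnonzero(~matches)  # positions of the wrong predictions

print(correct, accuracy, missed)
```

The `missed` indices are handy for spotting which test rows the model got wrong.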

In [None]:
# Decode the species column and transform into a list
species_decoded = encoder.inverse_transform(y_test_encoded)
species_decoded_list = species_decoded.tolist()
print(species_decoded_list[0][0])

setosa


In [None]:
# Decode the predicted label indices into species names for use in the output
y_pred_labels_encoded = [str(species_categories[i]) for i in y_pred_labels]
print(y_pred_labels_encoded)

['setosa', 'versicolor', 'versicolor', 'versicolor', 'virginica', 'virginica', 'versicolor', 'virginica', 'setosa', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'setosa', 'setosa', 'versicolor', 'setosa', 'setosa', 'versicolor', 'versicolor', 'setosa', 'setosa', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'setosa', 'versicolor', 'virginica']


In [None]:
# Step 12.
# Output the results, test by test.

# "10" is the field width, which matches the length of "versicolor" so everything lines up nicely.
print(f'{"#":>2} {"LABEL":10} {"PREDICTED":10}')

y_labels_count = len(y_pred_labels)

for i in range(y_labels_count):
  if (species_decoded[i][0] != y_pred_labels_encoded[i]):
    print(f'{i:>2} {species_decoded[i][0]:10} {y_pred_labels_encoded[i]:10} MISSED !')
  else:
    print(f'{i:>2} {species_decoded[i][0]:10} {y_pred_labels_encoded[i]:10}')

accuracy = identical_counts / y_labels_count * 100
print(f'Accuracy {identical_counts}/{y_labels_count} or {accuracy:.1f}%')



 # LABEL      PREDICTED 
 0 setosa     setosa    
 1 versicolor versicolor
 2 versicolor versicolor
 3 versicolor versicolor
 4 virginica  virginica 
 5 virginica  virginica 
 6 versicolor versicolor
 7 virginica  virginica 
 8 setosa     setosa    
 9 versicolor versicolor
10 versicolor versicolor
11 versicolor versicolor
12 virginica  versicolor MISSED !
13 setosa     setosa    
14 setosa     setosa    
15 versicolor versicolor
16 setosa     setosa    
17 setosa     setosa    
18 versicolor versicolor
19 versicolor versicolor
20 setosa     setosa    
21 setosa     setosa    
22 virginica  virginica 
23 virginica  virginica 
24 virginica  virginica 
25 virginica  virginica 
26 virginica  virginica 
27 setosa     setosa    
28 versicolor versicolor
29 virginica  virginica 
Accuracy 29/30 or 96.7%
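
The alignment in the table above comes from f-string format specs: `:10` pads a string to 10 characters (left-aligned by default) and `:>2` right-aligns within 2 characters. A standalone sketch with a made-up row:

```python
# Header and one sample row using the same field widths as Step 12.
header = f'{"#":>2} {"LABEL":10} {"PREDICTED":10}'
row = f'{7:>2} {"setosa":10} {"setosa":10}'

print(header)
print(row)
print(len(header), len(row))
```

Both lines come out the same length, which is why the columns line up regardless of the species name.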


## Deliverables

1. Share the notebook as viewable only. *Do not remove the outputs.* Copy the link and paste it into the assignment shell.
2. Complete the [submission flight checklist](https://docs.google.com/spreadsheets/u/0/d/1lgCttHGUIbCUTrd0TZIm4Nxfy8wy3jnIvNv7cUPJ-Gw/edit).
When done, export the checklist as lab04-checklist.pdf, and upload it to the assignment shell.