
## Lab 7
This lab demonstrates classification with the k-nearest neighbors (KNN) algorithm.

By Ron Coleman, Ph.D., Nick Petrilli

Study this notebook and add code to Step 11 to generate markdown output that matches the sample shown in the next-to-last cell for random_state=42. (The very last cell gives the deliverables.)

Here's how to go about this. After you write the code for Step 11 and get its output, create a new text cell and copy the output into it. The output of Step 11 is *markdown*, so it won't look formatted like the sample (random_state=42) until you paste it into a text cell. It should have identical formatting and values.

Once this is done, set random_state=44. Create a new text cell, put random_state=44 at the top, and paste the random_state=44 output below it. It should have identical formatting, but the results will differ.

Finally, repeat this for a random state that corresponds to the last two digits of your student ID.

In total, you will have three different outputs: random_state=42, random_state=44, and random_state=(last two digits of your student ID).

If you are not familiar with markdown, there are many resources on the internet that explain how to write it. [This](https://www.w3schools.io/file/markdown-introduction/) is one from W3Schools that you might find helpful. You can also ask [ChatGPT](https://chat.openai.com/auth/login) for guidance or even a crash course.

At the bottom of this notebook, find the deliverables.

In [None]:
# Step 1: Import the Pandas library
import pandas as pd

# Step 2: Read in the data to a DataFrame using the CSV reader method
url = "https://raw.githubusercontent.com/uiuc-cse/data-fa14/gh-pages/data/iris.csv"
df = pd.read_csv(url)

In [None]:
# Step 2 (continued): Shuffle the rows of the dataset, since the data are typically ordered by species.
from sklearn.utils import shuffle
df = shuffle(df, random_state=42).reset_index(drop=True)

In [None]:
# Step 3: Normalize numeric columns
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()

# These are the numeric columns
numeric_columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']

# Each column is scaled independently to the range [0, 1]; the result is a NumPy array.
normalized_columns = scaler.fit_transform(df[numeric_columns])

# Convert the NumPy array back to a pandas DataFrame, keeping the column names
df_normalized = pd.DataFrame(normalized_columns, columns=numeric_columns)
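
The scaling in Step 3 can be illustrated with a plain-Python sketch. This is not scikit-learn's implementation; the helper name `min_max_scale`, the sample values, and the choice to return zeros for a constant column are assumptions made for illustration.

```python
def min_max_scale(column):
    """Scale a list of numbers linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(column), max(column)
    span = hi - lo
    # Assumption: a constant column (zero range) is mapped to all zeros
    if span == 0:
        return [0.0 for _ in column]
    return [(x - lo) / span for x in column]

# Hypothetical sepal lengths in centimeters (illustrative values only)
sepal_length = [4.0, 5.0, 6.0, 8.0]
scaled = min_max_scale(sepal_length)
print(scaled)  # [0.0, 0.25, 0.5, 1.0]
```

Scaling matters for KNN because distances are computed across all features, so a feature with a larger numeric range would otherwise dominate the result.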

In [None]:
# Step 4: One-hot encode the 'species' column
from sklearn.preprocessing import OneHotEncoder
encoder = OneHotEncoder(sparse_output=False)

# One-hot encode and also get the unique categories
species_encoded = encoder.fit_transform(df[['species']])

# The categories are stored as a list of NumPy arrays, one per encoded column; the first element holds the species names
species_categories = encoder.categories_[0]

df_species_encoded = pd.DataFrame(species_encoded, columns=species_categories)
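
As a rough illustration of what one-hot encoding produces, and of how `argmax` later recovers a class index from an encoded row, here is a minimal plain-Python sketch. The helpers `one_hot` and `argmax` and the sample species list are hypothetical; the sketch only assumes that categories are ordered alphabetically, as they are in this notebook.

```python
def one_hot(values):
    """Return the sorted unique categories and one 0/1 row per input value."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = [[1.0 if index[v] == j else 0.0 for j in range(len(categories))]
            for v in values]
    return categories, rows

def argmax(row):
    """Return the position of the largest entry in a row."""
    return max(range(len(row)), key=lambda j: row[j])

# Illustrative species values, not rows from the dataset
species = ['virginica', 'setosa', 'versicolor', 'setosa']
categories, encoded = one_hot(species)
print(categories)   # ['setosa', 'versicolor', 'virginica']
print(encoded[0])   # [0.0, 0.0, 1.0]

# Decoding: the index of the 1 in each row is the class label
labels = [argmax(row) for row in encoded]
print(labels)       # [2, 0, 1, 0]
```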

## At this point *df_normalized* and *df_species_encoded* are ready for KNN
See Lab 4 for details to this point.

In [None]:
# Step 5: Split the dataset into training and testing sets
RANDOM_STATE = 23
from sklearn.model_selection import train_test_split
X_trains, X_tests, y_trains, y_tests = train_test_split(df_normalized, df_species_encoded, test_size=0.2, random_state=RANDOM_STATE)
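
The role of the random state can be sketched in plain Python: the same seed reproduces the same shuffle and therefore the same train/test split. The helper `split_rows` is hypothetical and uses Python's `random` module, so it will not reproduce the exact rows that `train_test_split` selects; it only illustrates why a fixed seed makes results repeatable.

```python
import random

def split_rows(rows, test_size, seed):
    """Shuffle a copy of rows with a seeded generator and split off a test set."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    # Assumption: the test count is the rounded fraction of the rows
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]

rows = list(range(150))  # stand-in for the 150 iris rows
train_a, test_a = split_rows(rows, 0.2, seed=23)
train_b, test_b = split_rows(rows, 0.2, seed=23)

print(len(train_a), len(test_a))  # 120 30
print(test_a == test_b)           # True
```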

In [None]:
# Step 6: Get the labels as indexes into species_categories.
y_test_labels = y_tests.values.argmax(axis=1)
print(y_test_labels)

[2 1 2 1 2 1 1 2 0 1 2 0 0 2 2 2 2 0 1 0 2 1 2 2 1 1 0 2 1 1]


In [None]:
# Step 7: Create a k=1 nearest neighbors classifier
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=1)

In [None]:
# Step 8: Train the classifier
knn.fit(X_trains, y_trains)

In [None]:
# Step 9: Make predictions on the test set
y_pred = knn.predict(X_tests)
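
With k=1, a prediction is simply the label of the closest training point. The sketch below shows that idea in plain Python, assuming Euclidean distance. The function `predict_1nn` and the two-feature sample points are hypothetical, and the sketch uses species names as labels rather than the one-hot targets used in this notebook.

```python
import math

def predict_1nn(train_points, train_labels, query):
    """Return the label of the training point nearest to query (Euclidean distance)."""
    distances = [math.dist(point, query) for point in train_points]
    nearest = min(range(len(distances)), key=lambda i: distances[i])
    return train_labels[nearest]

# Illustrative normalized features, two per sample
train_points = [(0.1, 0.1), (0.5, 0.5), (0.9, 0.9)]
train_labels = ['setosa', 'versicolor', 'virginica']

print(predict_1nn(train_points, train_labels, (0.8, 0.7)))  # virginica
```

There is no training computation in this sketch: "fitting" a nearest-neighbor model amounts to storing the training data so that it can be searched at prediction time.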

In [None]:
# Step 10: Convert one-hot encoded predictions to class labels (axis=1 takes the argmax across the columns of each row)
import numpy as np
y_pred_labels = np.argmax(y_pred, axis=1)
print(y_pred_labels)

[2 1 2 1 2 1 1 2 0 1 2 0 0 2 2 2 2 0 1 0 2 1 1 2 1 1 0 2 1 1]


In [None]:
# Step 11. TODO: Write the code to output the results, test by test.
# The output should be markdown which when copied to a text cell
# renders like the cell below.

import numpy as np
identical_counts = np.sum(np.equal(y_pred_labels, y_test_labels))

# Map the predicted class indexes back to species names for the output
y_pred_labels_encoded = [species_categories[i] for i in y_pred_labels]

# Map the test class indexes back to species names for the output
y_test_labels_encoded = [species_categories[i] for i in y_test_labels]

print(f'{"#":>2} {"LABEL":10} {"PREDICTED":10} {"OUTCOME":10}')

y_labels_count = len(y_pred_labels)

for i in range(y_labels_count):
  if (y_pred_labels_encoded[i] != y_test_labels_encoded[i]):
    print(f'{i:>2} {y_test_labels_encoded[i]:10} {y_pred_labels_encoded[i]:10} MISSED !')
  else:
    print(f'{i:>2} {y_test_labels_encoded[i]:10} {y_pred_labels_encoded[i]:10}')

accuracy = identical_counts / y_labels_count * 100
print(f'Accuracy {identical_counts}/{y_labels_count} or {accuracy:.1f}%')

 # LABEL      PREDICTED  OUTCOME   
 0 virginica  virginica 
 1 versicolor versicolor
 2 virginica  virginica 
 3 versicolor versicolor
 4 virginica  virginica 
 5 versicolor versicolor
 6 versicolor versicolor
 7 virginica  virginica 
 8 setosa     setosa    
 9 versicolor versicolor
10 virginica  virginica 
11 setosa     setosa    
12 setosa     setosa    
13 virginica  virginica 
14 virginica  virginica 
15 virginica  virginica 
16 virginica  virginica 
17 setosa     setosa    
18 versicolor versicolor
19 setosa     setosa    
20 virginica  virginica 
21 versicolor versicolor
22 virginica  versicolor MISSED !
23 virginica  virginica 
24 versicolor versicolor
25 versicolor versicolor
26 setosa     setosa    
27 virginica  virginica 
28 versicolor versicolor
29 versicolor versicolor
Accuracy 29/30 or 96.7%
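
The output above is plain text. One possible way to emit the same results as a markdown table, in the layout of the random_state=42 sample, is sketched here. The function `results_to_markdown` and its tiny input lists are hypothetical, and the exact cell padding is an assumption based on the sample rows.

```python
def results_to_markdown(labels, predictions):
    """Build a markdown table of label vs. prediction plus an accuracy line."""
    lines = ['|# |LABEL     |PREDICTED|OUTCOME|', '|-:|-|-|-|']
    hits = 0
    for i, (label, predicted) in enumerate(zip(labels, predictions)):
        if label == predicted:
            hits += 1
            outcome = ''
        else:
            outcome = 'MISSED !'
        lines.append(f'|{i:>2}| {label:10}| {predicted:10}| {outcome} |')
    total = len(labels)
    # Guard against an empty test set to avoid dividing by zero
    accuracy = hits / total * 100 if total else 0.0
    lines.append(f'Accuracy: {hits}/{total} or {accuracy:.1f}%')
    return '\n'.join(lines)

# Illustrative inputs, not the notebook's test set
labels = ['virginica', 'setosa', 'virginica']
predictions = ['virginica', 'setosa', 'versicolor']
markdown = results_to_markdown(labels, predictions)
print(markdown)
```

In the notebook, the two arguments would be the species-name lists built from `y_test_labels` and `y_pred_labels`.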


## random_state = 42
|# |LABEL     |PREDICTED|OUTCOME|
|-:|-|-|-|
| 0| virginica | virginica |  |
| 1| versicolor| versicolor|  |
| 2| versicolor| versicolor|  |
| 3| setosa    | setosa    |  |
| 4| virginica | virginica |  |
| 5| setosa    | setosa    |  |
| 6| versicolor| versicolor|  |
| 7| versicolor| versicolor|  |
| 8| setosa    | setosa    |  |
| 9| setosa    | setosa    |  |
|10| versicolor| versicolor|  |
|11| setosa    | setosa    |  |
|12| versicolor| versicolor|  |
|13| versicolor| versicolor|  |
|14| virginica | virginica |  |
|15| setosa    | setosa    |  |
|16| virginica | virginica |  |
|17| versicolor| versicolor|  |
|18| versicolor| versicolor|  |
|19| setosa    | setosa    |  |
|20| setosa    | setosa    |  |
|21| virginica | virginica |  |
|22| virginica | virginica |  |
|23| setosa    | setosa    |  |
|24| virginica | virginica |  |
|25| versicolor| versicolor|  |
|26| setosa    | setosa    |  |
|27| virginica | virginica |  |
|28| versicolor| versicolor|  |
|29| setosa    | setosa    |  |
Accuracy: 30/30 or 100.0%

## random_state = 44
|# |LABEL     |PREDICTED|OUTCOME|
|-:|-|-|-|
| 0| setosa    | setosa    |  |
| 1| versicolor| versicolor|  |
| 2| versicolor| versicolor|  |
| 3| versicolor| versicolor|  |
| 4| virginica | virginica |  |
| 5| virginica | virginica |  |
| 6| versicolor| versicolor|  |
| 7| virginica | virginica |  |
| 8| setosa    | setosa    |  |
| 9| versicolor| versicolor|  |
|10| versicolor| versicolor|  |
|11| versicolor| versicolor|  |
|12| virginica | versicolor| MISSED ! |
|13| setosa    | setosa    |  |
|14| setosa    | setosa    |  |
|15| versicolor| versicolor|  |
|16| setosa    | setosa    |  |
|17| setosa    | setosa    |  |
|18| versicolor| versicolor|  |
|19| versicolor| versicolor|  |
|20| setosa    | setosa    |  |
|21| setosa    | setosa    |  |
|22| virginica | virginica |  |
|23| virginica | versicolor| MISSED ! |
|24| virginica | virginica |  |
|25| virginica | virginica |  |
|26| virginica | virginica |  |
|27| setosa    | setosa    |  |
|28| versicolor| versicolor|  |
|29| virginica | virginica |  |
Accuracy 28/30 or 93.3%

## random_state = 23
## student id = 20106623
|# |LABEL     |PREDICTED|OUTCOME|
|-:|-|-|-|
| 0| virginica | virginica |  |
| 1| versicolor| versicolor|  |
| 2| virginica | virginica |  |
| 3| versicolor| versicolor|  |
| 4| virginica | virginica |  |
| 5| versicolor| versicolor|  |
| 6| versicolor| versicolor|  |
| 7| virginica | virginica |  |
| 8| setosa    | setosa    |  |
| 9| versicolor| versicolor|  |
|10| virginica | virginica |  |
|11| setosa    | setosa    |  |
|12| setosa    | setosa    |  |
|13| virginica | virginica |  |
|14| virginica | virginica |  |
|15| virginica | virginica |  |
|16| virginica | virginica |  |
|17| setosa    | setosa    |  |
|18| versicolor| versicolor|  |
|19| setosa    | setosa    |  |
|20| virginica | virginica |  |
|21| versicolor| versicolor|  |
|22| virginica | versicolor| MISSED ! |
|23| virginica | virginica |  |
|24| versicolor| versicolor|  |
|25| versicolor| versicolor|  |
|26| setosa    | setosa    |  |
|27| virginica | virginica |  |
|28| versicolor| versicolor|  |
|29| versicolor| versicolor|  |
Accuracy 29/30 or 96.7%


# Deliverables
1. The notebook should have clearly labelled outputs for the random states indicated above. Full credit requires properly labelling the output which is in markdown, correct formatting, and correct results.
1. Share the notebook as viewable only. *Do not remove the outputs.* Copy the link and paste it into the assignment shell.
1. Complete the [submission flight checklist](https://docs.google.com/spreadsheets/u/0/d/1lgCttHGUIbCUTrd0TZIm4Nxfy8wy3jnIvNv7cUPJ-Gw/edit).
When done, export the checklist as lab04-checklist.pdf, and upload it to the assignment shell.