# Introduction to Artificial Intelligence and Machine Learning (AI/ML)

## Project Objectives
- Define AI and ML and explain the difference between them
- Compare ML with expert systems
- Explore ways in which AI/ML is used
- Work through a "Hello World" machine learning example ($k$-nearest neighbors) as an introduction to AI/ML

## Our Sources

- <a href="https://web.archive.org/web/20231129013703/https://www.ibm.com/topics/artificial-intelligence">What is Artificial Intelligence? (IBM)</a>
- <a href="https://web.archive.org/web/20231127132627/https://www.ibm.com/topics/machine-learning">What is Machine Learning? (IBM)</a>
- <a href="https://web.archive.org/web/20140505045226/http://stpk.cs.rtu.lv/sites/all/files/stpk/materiali/MI/Artificial%20Intelligence%20A%20Modern%20Approach.pdf">Russell & Norvig (1995)</a>
- <a href="https://www.statlearning.com/">An Introduction to Statistical Learning (ISL)</a>

## AI and ML: Definitions and Differences (2 points)

Read both of the following pages on the difference (or lack thereof) between AI and ML:

- <a href="https://web.archive.org/web/20231129013703/https://www.ibm.com/topics/artificial-intelligence">What is Artificial Intelligence? (IBM)</a>
- <a href="https://web.archive.org/web/20231127132627/https://www.ibm.com/topics/machine-learning">What is Machine Learning? (IBM)</a>

**In 1-2 sentences, describe the difference (or lack thereof) between machine learning and artificial intelligence in the cell below.** Citations are not required.

## Expert Based Systems (2 points)

Read about **expert systems** in <a href="https://web.archive.org/web/20140505045226/http://stpk.cs.rtu.lv/sites/all/files/stpk/materiali/MI/Artificial%20Intelligence%20A%20Modern%20Approach.pdf">this AI textbook by Russell & Norvig (1995)</a>. The relevant section is titled *Knowledge-based systems: The key to power? (1969-1979)* and starts on page 22; you should only need to read this one section. Then describe the difference between an expert system and machine learning.

**In a few sentences, describe the difference between machine learning and expert systems in the cell below**. Citations are not required.

## Use Cases For AI/ML (2 points)

Return to <a href="https://web.archive.org/web/20231127132627/https://www.ibm.com/topics/machine-learning">What is Machine Learning? (IBM)</a> and scroll down to the "Real-world machine learning use cases" section.

**In 1-3 sentences, explain which one of the given use cases interests you the most and why**. Citations are not required.

## A Machine Learning Algorithm: $k$-nearest neighbors (kNN)

Below we implement the $k$-nearest neighbors algorithm using scikit-learn, a machine learning package.

*You might not understand what most of this code is doing, and we don't expect you to! The entire algorithm is mostly implemented for you; all you need to do is edit a few lines of code to finish it.* **There will be clear instructions at the two points where you need to edit the code to get it to work.**

If you are interested in learning more about $k$-nearest neighbors, check out Section 2.2.3 of ISL (https://www.statlearning.com/) or visit the <a href=https://the-examples-book.com/starter-guides/data-science/data-analysis/k-nearest-neighbors>Starter Guide page</a> (this code comes directly from the `knn` code example).
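For intuition about what `KNeighborsClassifier` does when it predicts, here is a minimal from-scratch sketch in NumPy. It assumes Euclidean distance and an unweighted majority vote, which match scikit-learn's defaults (`metric='minkowski'` with `p=2`, `weights='uniform'`). The function `knn_predict` and the toy points are hypothetical and for illustration only; scikit-learn uses faster neighbor-search structures and may break ties differently.

```python
import numpy as np
from collections import Counter

def knn_predict(train_points, train_labels, query, k):
    """Hypothetical helper: predict one query point's label by majority vote of its k nearest neighbors."""
    # Euclidean distance from the query to every training point
    distances = np.sqrt(((train_points - query) ** 2).sum(axis=1))
    # Positions of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Count the neighbors' labels and return the most common one
    votes = Counter(train_labels[nearest])
    return votes.most_common(1)[0][0]

# Toy 2-D data: class 0 clusters near the origin, class 1 near (5, 5)
toy_x = np.array([[0.0, 0.0], [0.5, 0.2], [0.1, 0.6],
                  [5.0, 5.0], [5.2, 4.8], [4.9, 5.3]])
toy_y = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(toy_x, toy_y, np.array([0.3, 0.3]), k=3))  # 0
print(knn_predict(toy_x, toy_y, np.array([4.8, 5.1]), k=3))  # 1
```

"Training" a kNN model is just storing the data; all the work happens at prediction time, which is why testing many values of $k$ on a large dataset can be slow.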

In [None]:
import warnings

import numpy as np
import pandas as pd
import openpyxl  # not called directly; pandas uses it as the engine for reading .xlsx files
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

warnings.filterwarnings('ignore')  # hide library warnings to keep the output readable

In [None]:
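df = pd.read_excel("data.xlsx")  # load the dataset
df = df.dropna()  # drop any row with a missing value
columns_to_convert = ['satisfaction_v2','Gender','Customer Type','Type of Travel','Class']

# Convert each text column to a category and store its integer codes in a new "_coded" column
for column in columns_to_convert:
    df[column] = df[column].astype('category')
    df[column+"_coded"] = df[column].cat.codes

old_df = df  # drop() returns a new DataFrame, so old_df keeps the pre-drop columns

df = df.drop(columns=['id'])  # id is an identifier, not a feature
df = df.drop(columns=columns_to_convert)  # keep only the coded versions of the text columns

columns_to_norm = ['Age','Flight Distance','Departure Delay in Minutes','Arrival Delay in Minutes']

# Divide by each column's maximum so these large-valued columns don't dominate kNN's distances
for column in columns_to_norm:
    df[column] = df[column]/np.max(df[column])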
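To see what the preprocessing cell above does to individual values, here is a small sketch on a hypothetical four-row DataFrame (the column names mirror the real data, but the rows are made up). It shows that `.cat.codes` numbers the categories in sorted order and that dividing by the column maximum rescales values so the largest becomes 1.0.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows standing in for data.xlsx
toy = pd.DataFrame({
    "Class": ["Eco", "Business", "Eco Plus", "Business"],
    "Age": [20, 40, 60, 80],
})

# Text -> categorical -> integer codes (categories are sorted alphabetically)
toy["Class"] = toy["Class"].astype("category")
toy["Class_coded"] = toy["Class"].cat.codes
print(toy["Class"].cat.categories.tolist())  # ['Business', 'Eco', 'Eco Plus']
print(toy["Class_coded"].tolist())           # [1, 0, 2, 0]

# Divide by the column maximum so the largest value becomes 1.0
toy["Age"] = toy["Age"] / np.max(toy["Age"])
print(toy["Age"].tolist())                   # [0.25, 0.5, 0.75, 1.0]
```

Note that the integer codes impose an order (here Business < Eco < Eco Plus) that kNN treats as a distance; for this introduction that simplification is fine.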

#### Train Test Splits (2 points)

Below we split our data into a training set and a test set. You can learn about train/test splits here: https://the-examples-book.com/starter-guides/data-science/data-modeling/resampling-methods/cross-validation/train-valid-test

**Set a float called `test_size` to some value between 0.05 and 0.30 to create a test split that is 5-30% of our total dataset**. `test_size` is passed to the scikit-learn function `train_test_split`, which shuffles our data and creates the train and test splits automatically.
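As a sketch of how `train_test_split` sizes the splits, the example below uses a hypothetical 100-row toy dataset and an example value of 0.2 (not a recommendation; pick your own `test_size` in the cell below). With a float `test_size`, scikit-learn rounds the test-set size up, so 20% of 100 rows gives 20 test rows and 80 training rows, and fixing `random_state` makes the shuffle reproducible.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy dataset: 100 rows, 2 features, a binary label
toy_data = pd.DataFrame({"f1": np.arange(100), "f2": np.arange(100) * 2})
toy_labels = pd.Series(np.arange(100) % 2)

example_test_size = 0.2  # illustrative only
tr_x, te_x, tr_y, te_y = train_test_split(
    toy_data, toy_labels, test_size=example_test_size, random_state=42
)
print(len(tr_x), len(te_x))  # 80 20

# Re-running with the same random_state reproduces exactly the same split
_, te_x2, _, _ = train_test_split(
    toy_data, toy_labels, test_size=example_test_size, random_state=42
)
print(te_x.index.equals(te_x2.index))  # True
```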

In [None]:
test_size = ??

In [None]:
labels = df['satisfaction_v2_coded']  # the target: coded passenger satisfaction
data = df.drop(columns=['satisfaction_v2_coded'])  # the features: every column except the label
train_x, test_x, train_y, test_y = train_test_split(data, labels, test_size=test_size, random_state=42)

#### Setting the Max $k$ Value (2 points)

Below we use a for loop to try several different values of $k$; here we set the largest value of $k$ to test. Keep `max_k` at 20 or below: higher values can take a while to run, and you will likely see that this dataset doesn't benefit from a $k$ value higher than 10. Because the loop starts at $k = 2$, `max_k` must be at least 2. **Set the variable `max_k` to an int from 2 to 20 of your choice.**
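The loop tries every $k$ from 2 through `max_k`, but NumPy's `argmax` returns a position in the accuracy list (starting at 0), not a $k$ value. This sketch, using an example `max_k` and made-up accuracy numbers purely for illustration, shows how to map that position back to the corresponding $k$.

```python
import numpy as np

example_max_k = 6  # illustrative only; set your own max_k in the cell below
example_k_values = list(range(2, example_max_k + 1))
print(example_k_values)  # [2, 3, 4, 5, 6]

# Hypothetical test accuracies, one per k (made-up numbers)
example_test_acc = [0.91, 0.93, 0.92, 0.94, 0.93]

# argmax gives the position of the best accuracy, not the k value itself
best_index = int(np.argmax(example_test_acc))
print(best_index)                    # 3
print(example_k_values[best_index])  # 5
```

Looking the position up in the list of $k$ values is equivalent to adding 2 (the starting $k$), but it stays correct if the range of $k$ values ever changes.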

In [None]:
max_k = ??

In [None]:
k_values = []
train_acc = []
test_acc = []

# For each k from 2 up to and including max_k, train a model and record its accuracy
for k in range(2, max_k + 1):
    # Train the model and predict on the test set
    print("Now testing value of k:", k)
    neigh = KNeighborsClassifier(n_neighbors=k).fit(train_x, train_y)
    yhat = neigh.predict(test_x)
    k_values.append(k)
    train_acc.append(metrics.accuracy_score(train_y, neigh.predict(train_x)))
    test_acc.append(metrics.accuracy_score(test_y, yhat))

# Collect the results in a DataFrame
results_data = {'k': k_values, 'Training Accuracy': train_acc, 'Test Accuracy': test_acc}
results_df = pd.DataFrame(data=results_data)

# np.argmax returns a list position, so look up the matching k value
print("The k value with the highest test accuracy between 2 and", max_k, "is", k_values[np.argmax(test_acc)])

# Set the figure size
fig, ax = plt.subplots(figsize=(30, 18))

# Plot test accuracy against k
sns.lineplot(data=results_df, x='k', y='Test Accuracy', ax=ax).set_title("Test Accuracy For Each k Value")
plt.show()