# CSI 4106 Introduction to Artificial Intelligence 
## Assignment 3: Neural Networks

## Report Title: Implementing Neural Networks, Tuning Hyperparameters and Evaluating Models for Machine Learning

### Identification

Name: Alex Govier <br/>
Student Number: 300174954

#### Import Necessary Libraries

In [7]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import precision_score, recall_score, f1_score, make_scorer
from sklearn.metrics import classification_report
from sklearn.dummy import DummyClassifier
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input

### Exploratory Analysis

#### 1. Loading Dataset and Summary

In [8]:
# Loading the three datasets from my GitHub
test = "https://github.com/alex-govier5/intro-to-ai/raw/master/A3/cb513_test.csv"
test_set = pd.read_csv(test)

train = "https://github.com/alex-govier5/intro-to-ai/raw/master/A3/cb513_train.csv"
training_set = pd.read_csv(train)

valid = "https://github.com/alex-govier5/intro-to-ai/raw/master/A3/cb513_valid.csv"
valid_set = pd.read_csv(valid)

Here I load my datasets. The training set contains 58,291 examples, the validation set contains 7,409 examples, and the test set contains 7,432 examples. The target variable, in the first column, takes one of three values: 0, 1, or 2. The remaining 462 columns are attributes with numerical values ranging from 0 to 1.
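The summary above can be checked with a small helper like the one below. This is only a sketch: the summarize function and the toy DataFrame are hypothetical stand-ins (not part of the assignment data), built to have the same layout as the CB513 files, with the target in the first column and features in the remaining columns.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for one of the CSV files: first column is the
# target (0, 1, or 2), remaining columns are features in [0, 1).
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.random((10, 5)))
toy.insert(0, "target", rng.integers(0, 3, size=10))

def summarize(df):
    """Return example count, feature count, target classes, and feature range."""
    features = df.iloc[:, 1:]
    return {
        "n_examples": df.shape[0],
        "n_features": features.shape[1],
        "classes": sorted(df.iloc[:, 0].unique().tolist()),
        "feature_min": float(features.min().min()),
        "feature_max": float(features.max().max()),
    }

summary = summarize(toy)
print(summary)
```

Calling the same helper on training_set, valid_set, and test_set should report the example counts, the 462 features, and the 0 to 1 range described above.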

#### 2. Shuffling the Rows
Here I will shuffle the rows so that the original ordering of the examples, in which adjacent rows may be related, does not negatively affect model training.

In [9]:
# Shuffling the rows of the datasets
test_set = test_set.sample(frac=1).reset_index(drop=True)
training_set = training_set.sample(frac=1).reset_index(drop=True)
valid_set = valid_set.sample(frac=1).reset_index(drop=True)

The frac=1 argument means that 100% of the rows are sampled, so the whole dataset is shuffled. The reset_index(drop=True) call resets the index after shuffling and discards the old one. This should help with the adjacent examples problem.
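As a small illustration of what the shuffle does, the sketch below applies the same call to a toy DataFrame. The toy data and the random_state=42 argument are assumptions added here for repeatability; the cell above does not fix a seed, so its order changes on every run.

```python
import pandas as pd

# Toy data (hypothetical): rows are ordered by class, like adjacent related examples.
toy = pd.DataFrame({
    "target": [0, 0, 0, 1, 1, 1],
    "feature": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
})

# frac=1 samples every row once; random_state (an assumption, not used in the
# report) makes the shuffled order repeatable between runs.
shuffled = toy.sample(frac=1, random_state=42).reset_index(drop=True)

print(shuffled)
```

Each row keeps its own target and feature values; only the row order and the index change.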

#### 3. Isolating the Target and the Data
Here I will isolate the target and separate it from the features before I scale any features so that the target variable does not get scaled.

In [10]:
# Separate features (X) and target (y) for training set
y_train = training_set.iloc[:, 0]     # Only the first column
X_train = training_set.iloc[:, 1:]    # All columns except the first

# Separate features (X) and target (y) for validation set
y_valid = valid_set.iloc[:, 0]        # Only the first column
X_valid = valid_set.iloc[:, 1:]       # All columns except the first

# Separate features (X) and target (y) for test set
y_test = test_set.iloc[:, 0]          # Only the first column
X_test = test_set.iloc[:, 1:]         # All columns except the first

The three datasets are now each separated into the features (X) and the target (y).

#### 4. Scaling the Numerical Features
Here I will scale the features of one dataset so that I can use it later as an experiment to compare performance against the unscaled data.

In [11]:
# Scaling the train features to use as experiment later on
scaler = MinMaxScaler()

X_train_scaled = scaler.fit_transform(X_train)

Here I use the min-max scaler to fit and transform the features of one of my datasets; I chose X_train. Since X_train contains all of my feature columns, the scaling is applied to every column in the same way. I will use this scaled set later on to see if it improves performance.
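If the scaled training set is later compared against validation or test data, those sets would normally be transformed with the scaler that was already fitted on the training features, rather than fitted again. The sketch below shows that pattern on toy arrays; the toy data and variable names are hypothetical, and this step is an assumption about how the experiment might continue, not something the cell above does.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy feature matrices (hypothetical) standing in for X_train and X_valid.
rng = np.random.default_rng(0)
X_train_toy = rng.random((20, 4)) * 10
X_valid_toy = rng.random((5, 4)) * 10

scaler = MinMaxScaler()

# Learn each column's min and max from the training data only...
X_train_toy_scaled = scaler.fit_transform(X_train_toy)

# ...and reuse those same statistics for the validation data.
X_valid_toy_scaled = scaler.transform(X_valid_toy)

print(X_train_toy_scaled.min(axis=0), X_train_toy_scaled.max(axis=0))
```

With this pattern the validation values can fall slightly outside 0 to 1, because the minimum and maximum come from the training columns.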

#### 5. Model Development
Here I will set up my three models: a dummy classifier, a logistic regression baseline, and a neural network.

In [None]:
# Dummy model that predicts the majority class
dummy_clf = DummyClassifier(strategy="most_frequent")
dummy_clf.fit(X_train, y_train)

# Baseline model, logistic regression
baseline_clf = LogisticRegression(max_iter=200)
baseline_clf.fit(X_train, y_train)

# Neural network 
nn_model = Sequential([
    Input(shape=(462,)),             # Input layer
    Dense(8),                        # Hidden layer with 8 nodes
    Dense(3, activation='softmax')   # Output layer with 3 nodes for 3 classes
])

Here I implement my dummy model, which predicts the majority class. For the baseline I chose logistic regression for several reasons: it tends to work well with high-dimensional data such as my datasets, its probabilistic output provides a natural way to interpret the likelihood of each structure, it handles scaled data well (which will be useful for my scaled dataset experiment), and it is a relatively simple model that is fast to train. I then implement the neural network with TensorFlow and Keras: an input layer with 462 nodes, a hidden layer with 8 nodes and the default activation function (none is specified, which in Keras means a linear activation), and an output layer with 3 nodes using the softmax activation function. My three models are now set up for training.
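One consequence of leaving the hidden layer's activation unspecified is worth illustrating. A Keras Dense layer with no activation computes a linear map, so the 8-node hidden layer followed by the output layer is equivalent, before the softmax, to a single linear layer. The NumPy sketch below checks this with random weights; the weights and inputs are hypothetical and are not taken from the trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical inputs and weights with the same shapes as the network above:
# 462 inputs -> 8 hidden nodes (no activation) -> 3 output logits.
x = rng.random((4, 462))
W1, b1 = rng.normal(size=(462, 8)), rng.normal(size=8)
W2, b2 = rng.normal(size=(8, 3)), rng.normal(size=3)

# Two stacked linear layers, as in Dense(8) followed by Dense(3) before softmax.
two_layer_logits = (x @ W1 + b1) @ W2 + b2

# The same computation folded into one linear layer.
W = W1 @ W2
b = b1 @ W2 + b2
one_layer_logits = x @ W + b

same = np.allclose(two_layer_logits, one_layer_logits)
print(same)
```

If a non-linear hidden layer is wanted, an activation such as relu can be passed to Dense; that would be a change from the default setup described here.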

#### 6. Model Evaluation
Here I will evaluate my models with cross validation for the baseline, and using the validation set for the neural network.
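A minimal sketch of the cross-validation step for the dummy and baseline models is shown below, run on toy data so that it is self-contained. The toy data, the choice of 5 folds, and the macro-averaged F1 scorer are assumptions made for illustration; the report has not yet fixed these settings.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import cross_val_score

# Toy three-class problem (hypothetical): the class depends on the first feature.
rng = np.random.default_rng(0)
X_toy = rng.random((300, 6))
y_toy = np.digitize(X_toy[:, 0], [1 / 3, 2 / 3])  # values 0, 1, or 2

# Macro-averaged F1 weights the three classes equally (an assumed metric choice).
# zero_division=0 scores a never-predicted class as 0 without raising a warning.
macro_f1 = make_scorer(f1_score, average="macro", zero_division=0)

dummy_scores = cross_val_score(
    DummyClassifier(strategy="most_frequent"), X_toy, y_toy, cv=5, scoring=macro_f1
)
baseline_scores = cross_val_score(
    LogisticRegression(max_iter=200), X_toy, y_toy, cv=5, scoring=macro_f1
)

print("dummy mean macro F1:", dummy_scores.mean())
print("baseline mean macro F1:", baseline_scores.mean())
```

Replacing X_toy and y_toy with X_train and y_train would apply the same procedure to the real training data.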

#### 7. Baseline Model


#### 8. Neural Network


#### 9. Model Comparison


--------------------------------------------------------------------------

### References
[Matplotlib Pyplot Documentation](https://matplotlib.org/3.5.3/api/_as_gen/matplotlib.pyplot.html)<br/>
[Numpy User Guide](https://numpy.org/devdocs/user/)<br/>
[Pandas User Guide](https://pandas.pydata.org/docs/user_guide/index.html) <br/>
[Seaborn User Guide](https://seaborn.pydata.org/tutorial/introduction.html)<br/>
[Sklearn Linear Model Documentation](https://scikit-learn.org/stable/modules/linear_model.html)<br/>
[Sklearn Metrics Documentation](https://scikit-learn.org/stable/api/sklearn.metrics.html)<br/>
[Sklearn Model Selection Documentation](https://scikit-learn.org/stable/api/sklearn.model_selection.html)<br/>
[Sklearn Neighbors Documentation](https://scikit-learn.org/stable/modules/neighbors.html)<br/>
[Sklearn Preprocessing Documentation](https://scikit-learn.org/stable/modules/preprocessing.html)<br/>
[Sklearn Tree Documentation](https://scikit-learn.org/stable/modules/tree.html)<br/>
<br/>
Most of my references came from my own first assignment, since many of its techniques could be reused for this assignment. For the newer concepts, I referred to the documentation and the course lecture notes to see how to implement them.