# Lab Assignment Four: Multi-Layer Perceptron

Code by Miller Boyd

## Introduction
In this lab, you will compare the performance of several multi-layer perceptrons that you implement yourself. The project counts for 10% of the final grade. Each team must submit a comprehensive report as a Jupyter notebook that includes all code, visualizations, and narrative. For visualizations that cannot be embedded directly, include screenshots. The results must be reproducible from the submitted notebook.


## Dataset Selection
This assignment uses the US Census dataset chosen by the instructor. The dataset is available on Kaggle and can also be downloaded from [Dropbox](https://www.dropbox.com/s/bf7i7qjftk7cmzq/acs2017_census_tract_data.csv?dl=0). The task is to predict the child poverty rate of each census tract. Because the rate is a continuous value, quantize it into four levels to turn the problem into a four-class classification task.

Found in file: acs2017_census_tract_data.csv

### Load, Split, and Balance (1.5 points)
- **[0.5 points]** Load the data into a pandas DataFrame, remove missing data, encode strings as integers, and decide on the inclusion of the "county" variable with justification.
- **[0.5 points]** Balance the dataset so that every class has an equal number of instances. Explain your chosen balancing method and whether it should apply to both the training and testing sets.
- **[0.5 points]** Split the dataset into an 80/20 train/test ratio, aiming for equal classification performance across classes. Only one-hot encode the target at this stage.

In [2]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
import numpy as np

# Load the data
data_path = 'acs2017_census_tract_data.csv'
df = pd.read_csv(data_path)

# Remove missing data
df = df.dropna()

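# Encode string columns as integers
# Assumes every object-dtype column (including the county column) holds strings. Adjust as necessary.
# One LabelEncoder is kept per column so each integer mapping can be inverted later.
encoders = {}
for col in df.select_dtypes(include=['object']).columns:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])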


In [None]:
# Decide on the inclusion of the "county" variable
# This is a placeholder for your analysis of whether to keep or remove the county variable.
# Example decision: drop the column if it was encoded above. The column is assumed to be
# named 'County' (capitalized) in the CSV header; check df.columns before running.
# df = df.drop(columns=['County'])

In [3]:
# Balance the dataset
# First, quantize the 'ChildPoverty' variable into 4 classes
df['ChildPoverty_Quantized'] = pd.qcut(df['ChildPoverty'], 4, labels=False)

# Count the number of instances in each class to decide on balancing method
class_counts = df['ChildPoverty_Quantized'].value_counts()
print("Class distribution before balancing:", class_counts)

# Simple random undersampling for demonstration: draw min_class_size rows from each class.
# Depending on the class distribution, a more sophisticated method may be preferable.
# A fixed random_state keeps the balanced sample reproducible across runs.
min_class_size = class_counts.min()
df_balanced = df.groupby('ChildPoverty_Quantized').sample(n=min_class_size, random_state=42).reset_index(drop=True)


Class distribution before balancing: ChildPoverty_Quantized
0    18229
1    18171
3    18170
2    18148
Name: count, dtype: int64



In [4]:
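# Split the dataset into 80% training and 20% testing
# 'ChildPoverty_Quantized' is the target. The raw 'ChildPoverty' column is dropped from the
# features as well: the target is derived from it, so keeping it would leak the label.
X = df_balanced.drop(['ChildPoverty', 'ChildPoverty_Quantized'], axis=1)
y = df_balanced['ChildPoverty_Quantized']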

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

print("Training set size:", X_train.shape[0])
print("Testing set size:", X_test.shape[0])

Training set size: 58073
Testing set size: 14519
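
The rubric asks for the target to be one-hot encoded at this stage. A minimal sketch of that step with NumPy follows; the helper name `one_hot` and the small `example_labels` array are hypothetical stand-ins, and in the notebook you would pass `y_train.to_numpy()` and `y_test.to_numpy()` instead.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return an (n_samples, num_classes) one-hot matrix for integer class labels."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.shape[0], num_classes), dtype=float)
    # Set a single 1.0 per row, in the column given by that row's label.
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

# Hypothetical labels standing in for y_train (four quantized poverty levels: 0..3).
example_labels = np.array([0, 3, 1, 2, 3])
Y_example = one_hot(example_labels, num_classes=4)
```

Keeping the integer labels alongside the one-hot matrix is convenient, since accuracy is usually computed by comparing integer labels with the `argmax` of the predicted probabilities.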



### Pre-processing and Initial Modeling (2.5 points)
- **[0.5 points]** Employ a two-layer perceptron (with vectorized gradient computation, mini-batching, cross entropy loss, and Glorot initialization) to model the data without normalization or one-hot encoding. Ensure model convergence by graphing loss over epochs.
- **[0.5 points]** Normalize continuous numeric features, reapply the two-layer perceptron model, and graph loss over epochs for performance quantification.
- **[0.5 points]** Normalize numeric features and one-hot encode categorical data, reapply the model, and graph loss over epochs.
- **[1 point]** Compare the performance across the three models, discussing any significant differences and their potential causes. Use one-hot encoding and normalization for all data in subsequent tasks.
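
The requirements in the first bullet (vectorized gradients, mini-batching, cross entropy loss, Glorot initialization) can be sketched in plain NumPy as below. This is one possible layout under stated assumptions, not the course's reference implementation: the sigmoid hidden layer, the hidden size, the learning rate, and every function name here are illustrative choices, and the demo data is synthetic rather than the census data.

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng):
    """Glorot (Xavier) uniform initialization for a fan_in x fan_out weight matrix."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def sigmoid(z):
    # Clip to avoid overflow in exp for large-magnitude inputs (e.g. unnormalized features).
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, Y):
    """Mean cross entropy between predicted probabilities and one-hot targets."""
    return float(-np.mean(np.sum(Y * np.log(probs + 1e-12), axis=1)))

def train_two_layer(X, Y, hidden=16, lr=0.1, epochs=50, batch_size=32, seed=0):
    """Train a sigmoid-hidden, softmax-output perceptron with mini-batch gradient descent.

    Returns the parameters and the full-data cross entropy recorded after each epoch.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    k = Y.shape[1]
    W1, b1 = glorot_uniform(d, hidden, rng), np.zeros(hidden)
    W2, b2 = glorot_uniform(hidden, k, rng), np.zeros(k)
    losses = []
    for _ in range(epochs):
        order = rng.permutation(n)  # reshuffle the rows every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            xb, yb = X[idx], Y[idx]
            # Forward pass over the whole mini-batch at once (vectorized).
            a1 = sigmoid(xb @ W1 + b1)
            probs = softmax(a1 @ W2 + b2)
            # Backward pass: softmax + cross entropy gives (probs - y) at the output.
            delta2 = (probs - yb) / xb.shape[0]
            delta1 = (delta2 @ W2.T) * a1 * (1.0 - a1)
            W2 -= lr * (a1.T @ delta2)
            b2 -= lr * delta2.sum(axis=0)
            W1 -= lr * (xb.T @ delta1)
            b1 -= lr * delta1.sum(axis=0)
        probs = softmax(sigmoid(X @ W1 + b1) @ W2 + b2)
        losses.append(cross_entropy(probs, Y))
    return (W1, b1, W2, b2), losses

# Synthetic four-class demo: the class is the quadrant of the first two features.
demo_rng = np.random.default_rng(1)
X_demo = demo_rng.normal(size=(200, 5))
demo_labels = (X_demo[:, 0] > 0).astype(int) + 2 * (X_demo[:, 1] > 0).astype(int)
Y_demo = np.eye(4)[demo_labels]
params, losses = train_two_layer(X_demo, Y_demo)
```

Plotting `losses` against the epoch index (for example with `matplotlib.pyplot.plot`) gives the loss-over-epochs graph the rubric asks for. Running the same function on raw, normalized, and normalized plus one-hot encoded features keeps the three comparisons consistent.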


### Modeling (5 points)
- **[1 point]** Extend the perceptron model to include a third layer, incorporating gradient magnitude tracking for each layer per epoch. Quantify performance and graph gradient magnitudes.
- **[1 point]** Add a fourth layer to the model, repeating the performance quantification and gradient magnitude tracking.
- **[1 point]** Introduce a fifth layer, continuing with performance quantification and gradient tracking.
- **[2 points]** Implement an adaptive learning technique (excluding AdaM) for the five-layer network. Discuss your choice of technique and compare model performance with and without the adaptive strategy.
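
Two pieces recur across the modeling tasks above: recording a gradient magnitude for each layer, and swapping plain gradient descent for an adaptive update. The sketch below assumes the gradients are available as a list of NumPy arrays, one per layer, and uses RMSProp as the adaptive technique. RMSProp is only one admissible choice (AdaGrad or AdaDelta would also satisfy the rubric), and the function names and hyperparameter defaults are illustrative.

```python
import numpy as np

def layer_gradient_magnitudes(grads):
    """Mean absolute gradient for each layer, suitable for logging once per epoch."""
    return [float(np.mean(np.abs(g))) for g in grads]

def rmsprop_step(weights, grads, cache, lr=0.001, decay=0.9, eps=1e-8):
    """Apply one in-place RMSProp update to every layer.

    cache holds a running average of squared gradients, one array per layer,
    and is updated in place alongside the weights.
    """
    for W, g, c in zip(weights, grads, cache):
        c *= decay
        c += (1.0 - decay) * g ** 2
        W -= lr * g / (np.sqrt(c) + eps)

# Toy two-layer example with hand-picked gradients (not from a real network).
weights = [np.ones((2, 2)), np.ones((2, 1))]
grads = [np.full((2, 2), 0.5), np.full((2, 1), -2.0)]
cache = [np.zeros_like(W) for W in weights]

magnitudes = layer_gradient_magnitudes(grads)
rmsprop_step(weights, grads, cache)
```

Because RMSProp divides each gradient by a running estimate of its own scale, both toy layers move by the same amount here even though their gradients differ by a factor of four. That rescaling is the property worth discussing when comparing against the non-adaptive five-layer model.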

### Exceptional Work (1 point)
- **5000 level students:** You are encouraged to explore additional analyses.
- **7000 level students (required):** Implement AdaM in the five-layer neural network and compare its performance with other models.
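
For the AdaM requirement, the update rule for a single parameter array can be sketched as follows. The function name `adam_step` and the toy objective are hypothetical; the default hyperparameters are the commonly used ones, not values prescribed by this assignment. In the five-layer network, one `(m, v)` pair would be kept per weight matrix and bias vector.

```python
import numpy as np

def adam_step(W, g, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return updated (W, m, v) after one Adam step; t is the 1-based step count."""
    m = beta1 * m + (1.0 - beta1) * g          # running mean of gradients
    v = beta2 * v + (1.0 - beta2) * g ** 2     # running mean of squared gradients
    m_hat = m / (1.0 - beta1 ** t)             # bias correction for the zero initialization
    v_hat = v / (1.0 - beta2 ** t)
    W = W - lr * m_hat / (np.sqrt(v_hat) + eps)
    return W, m, v

# Toy objective f(w) = (w - 3)^2 with gradient 2 * (w - 3), minimized from w = 0.
w = np.array([0.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
first_step = None
for t in range(1, 501):
    grad = 2.0 * (w - 3.0)
    w, m, v = adam_step(w, grad, m, v, t, lr=0.1)
    if t == 1:
        first_step = w.copy()
```

On the first step the bias-corrected ratio reduces to the sign of the gradient, so the parameter moves by almost exactly `lr` regardless of the gradient's size.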