# Supervised Learning Section

1. [Introduction](#introduction)
2. [Data Setup](#data-setup)
3. [Principal Component Analysis (PCA) Review](#principal-component-analysis-pca-review)
    - [Significant Features](#significant-features)
4. [Supervised Learning Model Implementation](#supervised-learning-model-implementation)
    - [Data Splitting](#data-splitting)
    - [Lasso Regression](#lasso-regression)
        - [Lasso Training](#lasso-training)
        - [Lasso Evaluation](#lasso-evaluation)
    - [Decision Tree](#decision-tree)
        - [Tree Training](#tree-training)
        - [Tree Evaluation](#tree-evaluation)
5. [Comparison of Model Results](#comparison-of-model-results)
    - [R Squared Analysis](#r-squared-analysis)
    - [MSE Analysis](#mse-analysis)
6. [Policy Recommendation](#policy-recommendation)
    - [Interpretation of Findings](#interpretation-of-findings)
    - [Our Policy Recommendations](#our-policy-recommendations)
7. [Conclusion](#conclusion)


## Introduction
Now that we have performed PCA and clustering to identify key features in the dataset, we would like to corroborate these findings with supervised learning. Our goal in this section is to train easily interpretable supervised learning models that predict digital equity statistics for areas of Michigan, and to use them to gain insight into which factors matter most in determining digital equity. Each model exposes the relative importance of its features: for the lasso regression, the magnitude of each coefficient (on standardized features), with the L1 penalty shrinking uninformative coefficients to exactly zero; for the decision tree, its impurity-based feature importances. With these key features in mind, we will recommend policy decisions that use this insight to better allocate public funds. Because our predictors mix categorical and quantitative variables and our equity metric is quantitative, we compare the efficacy of a lasso regression with that of a decision tree regressor.

## Data setup
We load the upload and download speed results, along with the other indicators, from `/assets` into a pandas DataFrame named `df`.

```python
import pandas as pd
import sklearn
```
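The modeling cells below expect a feature matrix `X` and a target `y`. A sketch of building them from a CSV, assuming the data is stored as CSV; the file name in `/assets` and the name of the equity metric column are not specified in this draft, so both are parameters of the hypothetical helper `load_features_target`, and the column names in the toy input are hypothetical:

```python
import io

import pandas as pd


def load_features_target(path, target_col):
    """Read a CSV and split it into a feature matrix X and target vector y.

    `path` and `target_col` are placeholders: the actual file in /assets
    and the name of our equity metric column are not fixed here.
    """
    df = pd.read_csv(path)
    # Drop rows with a missing target; they cannot be used for training
    df = df.dropna(subset=[target_col])
    y = df[target_col]
    # One-hot encode any categorical (string) columns among the features
    X = pd.get_dummies(df.drop(columns=[target_col]))
    return X, y


# Toy CSV with hypothetical columns; the second row has no target value
csv_text = (
    "download_mbps,county_type,equity_score\n"
    "10.0,rural,0.1\n"
    "20.0,urban,\n"
    "30.0,rural,0.3\n"
)
X_toy, y_toy = load_features_target(io.StringIO(csv_text), "equity_score")
print(X_toy)
print(y_toy)
```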

## Principal Component Analysis (PCA) Review
Here we summarize which groups our PCA highlighted and which features emerged as the most significant, so that we can check whether our supervised learning models yield the same results.
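One common way to read "most significant features" off a PCA is to rank features by the absolute size of their loadings on the leading components. A minimal sketch, assuming standardized inputs and scikit-learn's `PCA`; the helper `top_pca_features` and the toy columns `a` to `d` are hypothetical, and this is not necessarily how our earlier PCA section ranked features:

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def top_pca_features(X, n_components=2, top_k=3):
    """Rank features by their largest absolute loading on the leading components."""
    # PCA is scale-sensitive, so standardize the features first
    X_std = StandardScaler().fit_transform(X)
    pca = PCA(n_components=n_components).fit(X_std)
    # components_ has shape (n_components, n_features); transpose to one row per feature
    loadings = pd.DataFrame(pca.components_.T, index=X.columns,
                            columns=[f"PC{i + 1}" for i in range(n_components)])
    # Score each feature by its strongest absolute loading across components
    score = loadings.abs().max(axis=1)
    return score.sort_values(ascending=False).head(top_k)


# Toy data: "a" and "b" are strongly correlated, "c" and "d" are independent noise
rng = np.random.default_rng(0)
z = rng.normal(size=500)
X_toy = pd.DataFrame({
    "a": z,
    "b": z + rng.normal(scale=0.1, size=500),
    "c": rng.normal(size=500),
    "d": rng.normal(size=500),
})
top = top_pca_features(X_toy, n_components=1, top_k=2)
print(top)
```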

### Significant features
bullet list of key features

## Supervised Learning Model Implementation
### Data splitting

```python
from sklearn.model_selection import train_test_split

# Hold out 20% of rows as a test set; the fixed seed makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```
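With a single 80/20 split, test scores can shift noticeably depending on which rows land in the test set, especially on small datasets. A sketch of k-fold cross-validation as a complement, using scikit-learn's `KFold` and `cross_val_score` on synthetic data from `make_regression` (not our data); the model settings mirror the ones used below:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data standing in for our features and equity metric
X_demo, y_demo = make_regression(n_samples=150, n_features=5, n_informative=5,
                                 noise=10.0, random_state=42)

# Use the same shuffled 5-fold split for both models so their scores are comparable
cv = KFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = {}
for name, model in [("lasso", Lasso(alpha=0.1)),
                    ("tree", DecisionTreeRegressor(random_state=42))]:
    scores = cross_val_score(model, X_demo, y_demo, cv=cv, scoring="r2")
    cv_scores[name] = scores
    print(f"{name}: mean R^2 = {scores.mean():.3f} (std {scores.std():.3f})")
```

Reporting the spread across folds alongside the mean gives a sense of how much a single split's score can be trusted.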

## Lasso Regression

### Lasso training 

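```python
from sklearn import linear_model

# Lasso regression model; alpha sets the strength of the L1 penalty
lasso = linear_model.Lasso(alpha=0.1)
# Fit on the training split only, then predict the held-out test split
lasso.fit(X_train, y_train)
y_pred = lasso.predict(X_test)
```

### Lasso evaluation

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Score the predictions against the held-out test targets
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)  # the squared=False option is deprecated in recent scikit-learn
r2 = r2_score(y_test, y_pred)

# Coefficients shrunk to exactly zero mark features the lasso dropped
print("Coefficients:", dict(zip(X.columns, lasso.coef_)))
print(f"R^2:  {r2:.3f}")
print(f"MSE:  {mse:.3f}")
print(f"RMSE: {rmse:.3f}")
```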
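The lasso above uses a hand-picked `alpha=0.1` on unscaled features. Lasso coefficients are only comparable across features when the features share a scale, so a sketch that standardizes first and chooses `alpha` by cross-validation with scikit-learn's `LassoCV` may be worth considering; it is shown here on synthetic data, not our dataset:

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 8 features, only 3 of which actually drive the target
X_demo, y_demo = make_regression(n_samples=200, n_features=8, n_informative=3,
                                 noise=5.0, random_state=0)
X_demo = pd.DataFrame(X_demo, columns=[f"x{i}" for i in range(8)])

# Standardize, then let LassoCV pick alpha by 5-fold cross-validation
pipe = make_pipeline(StandardScaler(), LassoCV(cv=5))
pipe.fit(X_demo, y_demo)

lasso_cv = pipe.named_steps["lassocv"]
cv_coefs = pd.Series(lasso_cv.coef_, index=X_demo.columns)
print("Chosen alpha:", lasso_cv.alpha_)
# Features the lasso kept (non-zero coefficients), largest magnitude first
print(cv_coefs[cv_coefs != 0].abs().sort_values(ascending=False))
```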

## Decision Tree

### Tree training

```python
from sklearn.tree import DecisionTreeRegressor

# Fit the tree on the training split and predict the held-out test split
tree_model = DecisionTreeRegressor(random_state=42)
tree_model.fit(X_train, y_train)
y_pred = tree_model.predict(X_test)
```

### Tree evaluation

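```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.tree import export_text

# Score the tree's predictions against the held-out test targets
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)

# Print the learned splits as human-readable if/else rules
tree_rules = export_text(tree_model, feature_names=list(X.columns))
print(tree_rules)
print(f"R^2:  {r2:.3f}")
print(f"MSE:  {mse:.3f}")
print(f"RMSE: {rmse:.3f}")
```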
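An unconstrained `DecisionTreeRegressor` keeps splitting until its leaves are pure, which tends to overfit and produces rule lists too long to interpret. A sketch of limiting tree size with `GridSearchCV` and then reading the tree's `feature_importances_`; the grid values are illustrative assumptions and the data is synthetic:

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in: 6 features, 2 of them informative
X_demo, y_demo = make_regression(n_samples=300, n_features=6, n_informative=2,
                                 noise=5.0, random_state=1)
X_demo = pd.DataFrame(X_demo, columns=[f"x{i}" for i in range(6)])

# Search over small depths and leaf sizes so the fitted tree stays readable
search = GridSearchCV(
    DecisionTreeRegressor(random_state=42),
    param_grid={"max_depth": [2, 3, 4, 5], "min_samples_leaf": [5, 10]},
    cv=5,
    scoring="neg_mean_squared_error",
)
search.fit(X_demo, y_demo)
best_tree = search.best_estimator_

# Impurity-based importances are normalized to sum to 1 across features
importances = pd.Series(best_tree.feature_importances_, index=X_demo.columns)
print("Best params:", search.best_params_)
print(importances.sort_values(ascending=False))
```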

## Comparison of Model Results

We compare the statistics from the previous section and check whether the key features (those with the largest weights) match those highlighted by our unsupervised learning results.
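A sketch of that comparison, assuming the lasso coefficients, tree importances and PCA-flagged features have already been collected into plain Python dicts and lists. The helper `compare_key_features` and all feature names and weights below are hypothetical illustrations, not our results:

```python
def compare_key_features(lasso_coefs, tree_importances, pca_features, top_k=3):
    """Return each model's top-k features and the ones all three analyses agree on.

    lasso_coefs / tree_importances: dicts mapping feature name -> weight.
    pca_features: feature names flagged by the PCA (hypothetical input here).
    """
    def top(weights):
        # Rank by absolute weight, largest first, and keep the top_k names
        ranked = sorted(weights.items(), key=lambda kv: abs(kv[1]), reverse=True)
        return [name for name, _ in ranked[:top_k]]

    lasso_top = top(lasso_coefs)
    tree_top = top(tree_importances)
    return {
        "lasso_top": lasso_top,
        "tree_top": tree_top,
        "agreed_by_all": sorted(set(pca_features) & set(lasso_top) & set(tree_top)),
    }


# Hypothetical weights, for illustration only
result = compare_key_features(
    lasso_coefs={"download_mbps": 1.4, "upload_mbps": 0.0, "rural": -1.8, "income": 0.6},
    tree_importances={"download_mbps": 0.5, "upload_mbps": 0.05, "rural": 0.3, "income": 0.15},
    pca_features=["download_mbps", "income"],
)
print(result)
```

Ranking by absolute value matters for the lasso, since a large negative coefficient (such as the hypothetical `rural` above) is just as influential as a large positive one.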

### R squared analysis
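For reference, the coefficient of determination computed by `r2_score` on the $n$ test observations is

$$
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
$$

where $y_i$ are the observed test values, $\hat{y}_i$ the model's predictions, and $\bar{y}$ the mean of the observed test values. Values near 1 mean the model explains most of the variance in the equity metric; a negative value means it does worse than always predicting $\bar{y}$.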

### MSE Analysis
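For reference, with the same notation as for $R^2$, the two error metrics printed above are

$$
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad \mathrm{RMSE} = \sqrt{\mathrm{MSE}}
$$

RMSE is expressed in the same units as the equity metric, which makes it easier to interpret; both are only comparable between models evaluated on the same test set.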

## Policy Recommendation
Using the key features we identified, we argue for how public spending should be allocated to take these results into account.

### Interpretation of Findings

### Our Policy Recommendations

## Conclusion
Write a few sentences summing up the findings, and giving contact info / link to our repo