# Lab Assignment 2: Regression and Classification
Please refer to `README.md` for the full laboratory instructions; part of it is reproduced below for your reference.

## Part A: Linear Regression

We are given data from a study of the homicide rate (HOM) in Detroit over the years 1961-1973. The data were collected by J.C. Fisher and used in his paper “Homicide in Detroit: The Role of Firearms,” Criminology, vol. 14, pp. 387-400, 1976. Each row corresponds to one year, and each column holds the values of one variable. A picture of the table follows for your reference, but you also have access to the raw data in this lab.

![image](https://peilundai.com/ps2_programming/table.png)

* FTP    - Full-time police per 100,000 population
* UEMP   - % unemployed in the population
* MAN    - Number of manufacturing workers in thousands
* LIC    - Number of handgun licenses per 100,000 population
* GR     - Number of handgun registrations per 100,000 population
* NMAN   - Number of non-manufacturing workers in thousands
* GOV    - Number of government workers in thousands
* HE     - Average hourly earnings
* WE     - Average weekly earnings
* HOM    - Number of homicides per 100,000 population

It turns out that three of the variables together are good predictors of the homicide rate: `FTP`, `WE`, and one more variable.

Use methods described in Chapter 3 of the textbook to devise a mathematical formulation to determine the third variable. Implement your formulation and then conduct experiments to determine the third variable. In your report, be sure to provide the step-by-step mathematical formulation (citing Chapter 3 as needed) that corresponds to the implementation you turn in. Also give plots and a rigorous argument to justify the scheme you use and your conclusions.
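
One possible formulation is sketched here, under the assumption that ordinary least squares with an intercept is the intended method. The symbols $x_{ik}$, $X_k$, and $k^*$ are notation introduced for this sketch and are not taken from the textbook. For each candidate variable $k$ (every column other than `FTP`, `WE`, and `HOM`), fit

$$
\mathrm{HOM}_i = \beta_0 + \beta_1\,\mathrm{FTP}_i + \beta_2\,\mathrm{WE}_i + \beta_3\,x_{ik} + \varepsilon_i, \qquad i = 1, \dots, 13,
$$

where $x_{ik}$ is the value of candidate $k$ in year $i$. Stacking the 13 years gives a design matrix $X_k \in \mathbb{R}^{13 \times 4}$ whose columns are a column of ones, `FTP`, `WE`, and the candidate. With $y$ the vector of `HOM` values, the least-squares estimate and its mean squared error are

$$
\hat{\beta}_k = \arg\min_{\beta} \lVert y - X_k \beta \rVert_2^2 = (X_k^\top X_k)^{-1} X_k^\top y,
\qquad
\mathrm{MSE}_k = \frac{1}{13} \lVert y - X_k \hat{\beta}_k \rVert_2^2 ,
$$

assuming $X_k^\top X_k$ is invertible. The third variable would then be chosen as $k^* = \arg\min_k \mathrm{MSE}_k$.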

### Accessing the Data Set
The data are stored in a file called `detroit.npy`, which has already been loaded into this environment, so you do not need to run the following cell. The download command is included only for your reference.

You can find the data under the path:
`Assignment2/detroit.npy`.

In [8]:
# download data. 
#!wget https://peilundai.com/ps2_programming/detroit.npy

### Write and Run Your Own Code

In [1]:
#Library declarations
import matplotlib.pyplot as plt
import numpy as np

In [9]:
# load data
X=np.load('detroit.npy')
print(X.shape)
print(X)

# Note: Least-squares linear regression in Python can be done with the help of np.linalg.lstsq()

## PLEASE ADD YOUR CODE HERE

(13, 10)
[[ 260.35   11.    455.5   178.15  215.98  538.1   133.9     2.98  117.18
     8.6 ]
 [ 269.8     7.    480.2   156.41  180.48  547.6   137.6     3.09  134.02
     8.9 ]
 [ 272.04    5.2   506.1   198.02  209.57  562.8   143.6     3.23  141.68
     8.52]
 [ 272.96    4.3   535.8   222.1   231.67  591.    150.3     3.33  147.98
     8.89]
 [ 272.51    3.5   576.    301.92  297.65  626.1   164.3     3.46  159.85
    13.07]
 [ 261.34    3.2   601.7   391.22  367.62  659.8   179.5     3.6   157.19
    14.57]
 [ 268.89    4.1   577.3   665.56  616.54  686.2   187.5     3.73  155.29
    21.36]
 [ 295.99    3.9   596.9  1131.21 1029.75  699.6   195.4     2.91  131.75
    28.03]
 [ 319.87    3.6   613.5   837.6   786.23  729.9   210.3     4.25  178.74
    31.49]
 [ 341.43    7.1   569.3   794.9   713.77  757.8   223.8     4.47  178.3
    37.39]
 [ 356.59    8.4   548.8   817.74  750.43  755.3   227.7     5.04  209.54
    46.26]
 [ 376.69    7.7   563.4   583.17 1027.38  787.    230.9 

In [19]:
import numpy as np

# Load dataset from file
dataset = np.load("detroit.npy")

# Define column headers
features = ["FTP", "UEMP", "MAN", "LIC", "GR", "NMAN", "GOV", "HE", "WE", "HOM"]

# Display column mapping
print("Feature Index Mapping:")
for index, name in enumerate(features):
    print(f"{index}: {name}")

# Assign target and fixed predictors
target_col = 9
fixed_col1 = 0  # FTP
fixed_col2 = 8  # WE

# Identify candidate predictors (excluding target and fixed predictors)
candidate_cols = [i for i in range(len(features)) if i not in {target_col, fixed_col1, fixed_col2}]
print("\nPotential Predictor Indices:", candidate_cols)

# Extract dependent variable
target_values = dataset[:, target_col]

# Initialize tracking for best predictor
optimal_mse = float("inf")
best_predictor = None
best_params = None

# Iterate over candidate predictors
print("\nEvaluating Candidate Predictors...\n")
for col in candidate_cols:
    # Prepare design matrix with intercept, fixed predictors, and candidate feature
    intercept = np.ones((dataset.shape[0], 1))  # Intercept column
    fixed_feat1 = dataset[:, fixed_col1].reshape(-1, 1)  # Fixed predictor 1
    fixed_feat2 = dataset[:, fixed_col2].reshape(-1, 1)  # Fixed predictor 2
    candidate_feat = dataset[:, col].reshape(-1, 1)  # Candidate feature

    # Stack the selected columns horizontally
    X_matrix = np.hstack((intercept, fixed_feat1, fixed_feat2, candidate_feat))

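    # Compute regression parameters by least squares. This solves the same
    # problem as the Normal Equation but avoids explicitly inverting X^T X,
    # which is numerically less stable.
    beta_params, *_ = np.linalg.lstsq(X_matrix, target_values, rcond=None)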

    # Generate predictions
    predictions = X_matrix @ beta_params

    # Calculate Mean Squared Error (MSE)
    mse_value = np.mean((target_values - predictions) ** 2)

    # Output results for current predictor
    print(f"Feature Evaluated: {features[col]} (Index {col})")
    print(f"Model Coefficients: {beta_params}\n")

    # Update best predictor if current one performs better
    if mse_value < optimal_mse:
        optimal_mse = mse_value
        best_predictor = col
        best_params = beta_params

# Display results of the best performing predictor
print("\n---------------------------")
print(f"Optimal Predictor: {features[best_predictor]} (Index {best_predictor})")
print(f"Minimal MSE: {optimal_mse:.4f}")
print(f"Optimal Model Coefficients: {best_params}")


Feature Index Mapping:
0: FTP
1: UEMP
2: MAN
3: LIC
4: GR
5: NMAN
6: GOV
7: HE
8: WE
9: HOM

Potential Predictor Indices: [1, 2, 3, 4, 5, 6, 7]

Evaluating Candidate Predictors...

Feature Evaluated: UEMP (Index 1)
Model Coefficients: [-7.98156958e+01  3.76695265e-01 -3.55387663e-02 -6.43074332e-01]

Feature Evaluated: MAN (Index 2)
Model Coefficients: [-1.09986829e+02  3.60165290e-01 -6.13921018e-02  6.44700813e-02]

Feature Evaluated: LIC (Index 3)
Model Coefficients: [-5.81244081e+01  1.84691260e-01  1.06849952e-01  1.64636819e-02]

Feature Evaluated: GR (Index 4)
Model Coefficients: [-5.74412199e+01  2.02762985e-01  7.04621169e-02  1.62152012e-02]

Feature Evaluated: NMAN (Index 5)
Model Coefficients: [-9.38741400e+01  2.29977228e-01 -5.74717104e-02  8.71595828e-02]

Feature Evaluated: GOV (Index 6)
Model Coefficients: [-7.38479369e+01  2.04970958e-01 -2.20430680e-02  2.16965508e-01]

Feature Evaluated: HE (Index 7)
Model Coefficients: [-75.11003345   0.31106152  -0.12617632   6.82
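
The loop above reports coefficients for each candidate but only tracks the MSE internally. As a hedged sketch of how the comparison could be made explicit in the write-up, the snippet below ranks candidates by MSE and also reports $R^2$. The helper names `fit_ols` and `rank_candidates` are hypothetical, and the demo runs on small synthetic data rather than `detroit.npy`, so its numbers say nothing about the Detroit result.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit with an intercept; returns (beta, mse, r2)."""
    A = np.column_stack([np.ones(len(y)), X])      # prepend intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)   # solve min ||y - A beta||^2
    resid = y - A @ beta
    mse = np.mean(resid ** 2)
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
    return beta, mse, r2


def rank_candidates(data, fixed_cols, candidate_cols, target_col):
    """Fit fixed predictors plus each candidate; sort by ascending MSE."""
    y = data[:, target_col]
    results = []
    for c in candidate_cols:
        _, mse, r2 = fit_ols(data[:, list(fixed_cols) + [c]], y)
        results.append((c, mse, r2))
    return sorted(results, key=lambda r: r[1])


# Synthetic demo (assumption: NOT the Detroit data). Column 4 is built from
# columns 0, 1 and 3, so column 3 should rank ahead of column 2.
rng = np.random.default_rng(0)
demo = rng.normal(size=(13, 5))
demo[:, 4] = (2.0 * demo[:, 0] - 1.0 * demo[:, 1] + 3.0 * demo[:, 3]
              + 0.01 * rng.normal(size=13))

ranking = rank_candidates(demo, fixed_cols=[0, 1], candidate_cols=[2, 3],
                          target_col=4)
for col, mse, r2 in ranking:
    print(f"candidate column {col}: MSE = {mse:.4f}, R^2 = {r2:.4f}")
```

On the real data, the call would presumably be `rank_candidates(dataset, [0, 8], candidate_cols, 9)`, reusing the names from the cell above. With only 13 observations, in-sample MSE differences can be small, so the ranking is best read alongside plots of the fitted values rather than as conclusive on its own.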

### What to Submit
You should submit a single .pdf file that contains the following:
1. A brief post-lab write-up that contains the following for each part of this assignment:

    a. Your paper design.
    
    b. A brief description of your model. Justify your selection of model parameters.
    
    c. An evaluation of your model, including evidence as appropriate.
    
    d. A brief (couple of sentences) reflection on your take-aways from this lab exercise.