# Assignment 1 - Linear Regression - Task 2 SciKit Implementation

Class: COMP 5630 - Machine Learning

Author: Chris Hinkson

Email: cmh0201@auburn.edu

This notebook fits scikit-learn's `LinearRegression` model to the data provided for Task 2, as a quick cross-check of the results from my own implementation.
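As background for this kind of cross-check, the sketch below shows that `LinearRegression` and a closed-form least-squares solve should agree on the same data. It uses synthetic data rather than the housing data, and the names `X`, `y`, `true_weights`, and `theta` are illustrative assumptions, not part of the assignment code.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (hypothetical, not the housing data): 20 rows, 3 features
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
true_weights = np.array([2.0, -1.0, 0.5])
y = X @ true_weights + 4.0 + rng.normal(scale=0.1, size=20)

# Closed-form least squares: prepend a column of ones for the intercept
X_aug = np.column_stack([np.ones(len(X)), X])
theta, *_ = np.linalg.lstsq(X_aug, y, rcond=None)

# scikit-learn fit on the same data
model = LinearRegression().fit(X, y)

# Both solutions should agree up to numerical precision
intercept_matches = bool(np.isclose(theta[0], model.intercept_))
coefs_match = bool(np.allclose(theta[1:], model.coef_))
print(intercept_matches, coefs_match)
```

If a hand-written model is trained by gradient descent instead of a direct solve, small differences from these values are expected and usually shrink with more iterations.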

In [5]:
# System
import os

# Data
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Scikit
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

### DataLoader2 Class

This code cell defines a `DataLoader2` class that loads the provided xlsx data into pandas DataFrames and separates the features from the target for both the training and testing sets.

In [6]:
class DataLoader2:

	# Class Constructor
	def __init__(self, dataDirectoryPath: str="../data/") -> None:

		# Save directory path
		self.dataDirectoryPath = dataDirectoryPath

		# Announce data load
		print(f"Attempting to load training and testing data from directory `{self.dataDirectoryPath}`!")

		# Define path for excel file
		excelFilePath = os.path.join(self.dataDirectoryPath, "Housing_data_regression.xlsx")

		# Check if excel file exists
		if not os.path.exists(excelFilePath):
			raise FileNotFoundError(f"Attempted to load excel data but could not find file `{excelFilePath}`!")

		# Load training and testing data from their excel sheets
		self.trainDf = pd.read_excel(excelFilePath, sheet_name="Train")
		self.testDf = pd.read_excel(excelFilePath, sheet_name="Test")

		# Split into features and target (note: Train uses `Price` as the target column, Test uses `Local Price`)
		self.trainFeatures = self.trainDf.drop(columns=["Price"])
		self.trainTarget = self.trainDf["Price"]
		self.testFeatures = self.testDf.drop(columns=["Local Price"])
		self.testTarget = self.testDf["Local Price"]

		# Print data load information
		print(f"Successfully loaded data from excel file `{excelFilePath}`!")
		print(f"-> Train Features Shape: {self.trainFeatures.shape}")
		print(f"-> Train Target Shape: {self.trainTarget.shape}")
		print(f"-> Test Features Shape: {self.testFeatures.shape}")
		print(f"-> Test Target Shape: {self.testTarget.shape}")

# Load data from the data directory
Task2DataLoader = DataLoader2(dataDirectoryPath="../data/")

Attempting to load training and testing data from directory `../data/`!
Successfully loaded data from excel file `../data/Housing_data_regression.xlsx`!
-> Train Features Shape: (20, 8)
-> Train Target Shape: (20,)
-> Test Features Shape: (5, 8)
-> Test Target Shape: (5,)
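Because the loader reads the target from `Price` for the training sheet and from `Local Price` for the testing sheet, it may be worth confirming that both feature frames end up with the same columns in the same order. The sketch below is one way to do that; the helper `split_features_target` and the toy frames are hypothetical stand-ins, not the actual spreadsheet contents.

```python
import pandas as pd

def split_features_target(df: pd.DataFrame, target: str):
    """Return (features, target) after checking that the target column exists."""
    if target not in df.columns:
        raise KeyError(f"Target column `{target}` not found; available columns: {list(df.columns)}")
    return df.drop(columns=[target]), df[target]

# Hypothetical toy frames standing in for the Train and Test sheets
train_df = pd.DataFrame({"Rooms": [3, 4], "Age": [10, 5], "Price": [100.0, 150.0]})
test_df = pd.DataFrame({"Rooms": [5], "Age": [2], "Price": [200.0]})

train_X, train_y = split_features_target(train_df, "Price")
test_X, test_y = split_features_target(test_df, "Price")

# Feature columns should match in both name and order before calling predict
columns_match = list(train_X.columns) == list(test_X.columns)
print(columns_match)
```

Applied to the real data, the same comparison on `Task2DataLoader.trainFeatures.columns` and `Task2DataLoader.testFeatures.columns` would show whether the two sheets are being split consistently.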


### Model Evaluation

This code cell fits a scikit-learn `LinearRegression` model on the training data, predicts on the testing data, and reports the mean squared error, mean absolute error, and R^2 score.

In [7]:
# Create model
Task2Model = LinearRegression()

# Train model
Task2Model.fit(Task2DataLoader.trainFeatures, Task2DataLoader.trainTarget)

# Test model
Task2Predictions = Task2Model.predict(Task2DataLoader.testFeatures)

# Evaluate model
Task2MSE = mean_squared_error(Task2DataLoader.testTarget, Task2Predictions)
Task2MAE = mean_absolute_error(Task2DataLoader.testTarget, Task2Predictions)
Task2R2 = r2_score(Task2DataLoader.testTarget, Task2Predictions)

# Print results
print("Task 2 SciKit Linear Regression Model Results:")
print(f"-> Mean Squared Error (MSE): {Task2MSE}")
print(f"-> Mean Absolute Error (MAE): {Task2MAE}")
print(f"-> R^2 Score: {Task2R2}")

Task 2 SciKit Linear Regression Model Results:
-> Mean Squared Error (MSE): 55438202899.023254
-> Mean Absolute Error (MAE): 149246.7174611826
-> R^2 Score: -0.5148225385301262
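To help interpret these numbers, the sketch below recomputes the three metrics by hand with NumPy on a tiny made-up example. The function `regression_metrics` and the sample values are illustrative assumptions. It shows that R^2 is 0 when the predictions equal the mean of the true targets and becomes negative when the predictions do worse than that constant baseline, which is the situation a negative R^2 on the test set indicates.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Compute MSE, MAE, and R^2 from their standard definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mse = np.mean(residuals ** 2)
    mae = np.mean(np.abs(residuals))
    ss_res = np.sum(residuals ** 2)                  # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mse, mae, r2

y_true = [1.0, 2.0, 3.0]

# Predicting the mean of y_true everywhere gives R^2 = 0
mse, mae, r2 = regression_metrics(y_true, [2.0, 2.0, 2.0])

# Predictions worse than the mean baseline give a negative R^2
mse_bad, mae_bad, r2_bad = regression_metrics(y_true, [3.0, 2.0, 1.0])

print(r2, r2_bad)
```

These definitions match what `mean_squared_error`, `mean_absolute_error`, and `r2_score` compute with their default arguments for a single target.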
