# Lab 3: Time to Make, Train, and Evaluate a Linear Regression Model!

### **Objective**
This lab focuses on:
- Reading and pre-processing data
- Splitting the dataset into training and test sets
- Creating and training a linear regression model
- Making predictions
- Evaluating the model's performance using various metrics

---


## Step 1: Import Statements and Setup

In [2]:
# Importing necessary libraries
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error
import numpy as np
import pandas as pd

### 1.1: Loading the Dataset

In [3]:
# Load the dataset
file_path = "BostonHousing.csv"  # Update this path if necessary
df = pd.read_csv(file_path)

# Display the first few rows of the dataset
print("Dataset preview:")
print(df.head())

Dataset preview:
      crim    zn  indus  chas    nox     rm   age     dis  rad  tax  ptratio  \
0  0.00632  18.0   2.31     0  0.538  6.575  65.2  4.0900    1  296     15.3   
1  0.02731   0.0   7.07     0  0.469  6.421  78.9  4.9671    2  242     17.8   
2  0.02729   0.0   7.07     0  0.469  7.185  61.1  4.9671    2  242     17.8   
3  0.03237   0.0   2.18     0  0.458  6.998  45.8  6.0622    3  222     18.7   
4  0.06905   0.0   2.18     0  0.458  7.147  54.2  6.0622    3  222     18.7   

        b  lstat  medv  
0  396.90   4.98  24.0  
1  396.90   9.14  21.6  
2  392.83   4.03  34.7  
3  394.63   2.94  33.4  
4  396.90   5.33  36.2  


## Step 2: Data Preprocessing

### 2.1: Separating Features (X) and Target (y)


In [4]:
# The target (medv) is the last column; every other column is treated as a feature
X = df.iloc[:, :-1]
y = df.iloc[:, -1]

# Confirm separation
print("\nFeatures (X):")
print(X.head())
print("\nTarget (y):")
print(y.head())



Features (X):
      crim    zn  indus  chas    nox     rm   age     dis  rad  tax  ptratio  \
0  0.00632  18.0   2.31     0  0.538  6.575  65.2  4.0900    1  296     15.3   
1  0.02731   0.0   7.07     0  0.469  6.421  78.9  4.9671    2  242     17.8   
2  0.02729   0.0   7.07     0  0.469  7.185  61.1  4.9671    2  242     17.8   
3  0.03237   0.0   2.18     0  0.458  6.998  45.8  6.0622    3  222     18.7   
4  0.06905   0.0   2.18     0  0.458  7.147  54.2  6.0622    3  222     18.7   

        b  lstat  
0  396.90   4.98  
1  396.90   9.14  
2  392.83   4.03  
3  394.63   2.94  
4  396.90   5.33  

Target (y):
0    24.0
1    21.6
2    34.7
3    33.4
4    36.2
Name: medv, dtype: float64
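
Positional slicing works here only because `medv` happens to be the last column. A minimal sketch of selecting the target by name instead is shown below. It uses a small stand-in frame (`toy_df`, a hypothetical name) with a few values copied from the preview above, so it runs without the CSV file.

```python
import pandas as pd

# Stand-in for the lab's df: three rows and three columns taken from the preview.
# toy_df, X_toy, and y_toy are illustrative names, not part of the lab.
toy_df = pd.DataFrame({
    "rm": [6.575, 6.421, 7.185],
    "lstat": [4.98, 9.14, 4.03],
    "medv": [24.0, 21.6, 34.7],
})

target_col = "medv"

# Drop the target by name so the split does not depend on column order
X_toy = toy_df.drop(columns=[target_col])
y_toy = toy_df[target_col]

print(list(X_toy.columns))
print(y_toy.name)
```

On the full dataset the same two lines would read `X = df.drop(columns=["medv"])` and `y = df["medv"]`, which should give the same `X` and `y` as the positional version as long as `medv` stays in the last column.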


### 2.2: Checking Data Types and Null Values

In [5]:
print("\nColumn Data Types:")
print(df.dtypes)

# Check for null values
null_check = df.isnull().sum()
print("\nNull Value Check:")
print(null_check)

# Ensure every feature column has a numeric dtype
# (checking only for 'object' would miss category or datetime columns)
if X.select_dtypes(exclude=['number']).empty:
    print("\nNo non-numeric columns found.")
else:
    print("\nWarning: Non-numeric columns detected!")



Column Data Types:
crim       float64
zn         float64
indus      float64
chas         int64
nox        float64
rm         float64
age        float64
dis        float64
rad          int64
tax          int64
ptratio    float64
b          float64
lstat      float64
medv       float64
dtype: object

Null Value Check:
crim       0
zn         0
indus      0
chas       0
nox        0
rm         0
age        0
dis        0
rad        0
tax        0
ptratio    0
b          0
lstat      0
medv       0
dtype: int64

No non-numeric columns found.


## Step 3: Splitting the Dataset

In [6]:
# Splitting data into training and testing subsets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, shuffle=True)

# Display shapes of resulting datasets
print("\nDataset Shapes:")
print(f"Training Features: {X_train.shape}, Training Target: {y_train.shape}")
print(f"Testing Features: {X_test.shape}, Testing Target: {y_test.shape}")



Dataset Shapes:
Training Features: (404, 13), Training Target: (404,)
Testing Features: (102, 13), Testing Target: (102,)
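
The 404/102 split follows from how `train_test_split` handles a fractional `test_size`: the test count is rounded up and the training set gets the remainder. The sketch below reproduces that arithmetic for the 506 rows implied by the shapes above, then cross-checks it against `train_test_split` on placeholder arrays of zeros (the `*_dummy` names are illustrative, not the lab's data).

```python
import math

import numpy as np
from sklearn.model_selection import train_test_split

n_samples = 506   # 404 + 102, from the shapes printed above
test_size = 0.2

# A fractional test_size is rounded up; the training set takes what is left
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test
print(f"Expected train/test sizes: {n_train}/{n_test}")

# Cross-check with placeholder arrays of the same shape as the lab's X and y
X_dummy = np.zeros((n_samples, 13))
y_dummy = np.zeros(n_samples)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_dummy, y_dummy, test_size=test_size, random_state=42
)
print(X_tr.shape, X_te.shape)
```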


## Step 4: Model Training and Prediction


In [7]:
def build_model(X_train, X_test, y_train):
    # Create and train the linear regression model
    model = LinearRegression()
    model.fit(X_train, y_train)
    
    # Make predictions
    y_predictions = model.predict(X_test)
    return model, y_predictions

# Train the model and get predictions
model, y_predictions = build_model(X_train, X_test, y_train)
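
Once fitted, a `LinearRegression` model exposes its learned weights through `coef_` and `intercept_`. The sketch below shows one way to inspect them. To stay runnable without the CSV, it fits a synthetic dataset whose true relationship is known (`y = 3a - 2b + 5`); the names `X_demo`, `y_demo`, `demo_model`, `feature_a`, and `feature_b` are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data with a known linear relationship (illustrative only)
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    "feature_a": rng.normal(size=50),
    "feature_b": rng.normal(size=50),
})
y_demo = 3.0 * X_demo["feature_a"] - 2.0 * X_demo["feature_b"] + 5.0

demo_model = LinearRegression().fit(X_demo, y_demo)

# Pair each coefficient with its column name for readability
coef_table = pd.Series(demo_model.coef_, index=X_demo.columns)
print(coef_table)
print(f"Intercept: {demo_model.intercept_:.2f}")
```

For the lab's model, the equivalent would be `pd.Series(model.coef_, index=X_train.columns)`. Keep in mind that the features are on different scales, so the raw coefficient sizes are not directly comparable as measures of importance.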


## Step 5: Model Evaluation


In [8]:
# Evaluate the model
mae = mean_absolute_error(y_test, y_predictions)
mse = mean_squared_error(y_test, y_predictions)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_predictions)

# Display evaluation metrics
print("\nModel Evaluation Metrics:")
print(f"Mean Absolute Error (MAE): {mae:.2f}")
print(f"Mean Squared Error (MSE): {mse:.2f}")
print(f"Root Mean Squared Error (RMSE): {rmse:.2f}")
print(f"R-squared (R2 Score): {r2:.2f}")

# Optional: Save evaluation results
results = {
    "MAE": mae,
    "MSE": mse,
    "RMSE": rmse,
    "R2 Score": r2
}
results_df = pd.DataFrame([results])
results_df.to_csv("model_evaluation_results.csv", index=False)



Model Evaluation Metrics:
Mean Absolute Error (MAE): 3.19
Mean Squared Error (MSE): 24.29
Root Mean Squared Error (RMSE): 4.93
R-squared (R2 Score): 0.67
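
To make the four metrics concrete, the sketch below computes each one by hand with NumPy and compares the result with the scikit-learn function used above. The true values are the first five `medv` entries from the preview; the predictions are invented for illustration and are not output from the lab's model.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([24.0, 21.6, 34.7, 33.4, 36.2])  # first five medv values
y_pred = np.array([25.0, 22.0, 33.0, 30.0, 35.0])  # made-up predictions

errors = y_true - y_pred

mae_manual = np.mean(np.abs(errors))   # average absolute error
mse_manual = np.mean(errors ** 2)      # average squared error
rmse_manual = np.sqrt(mse_manual)      # same units as the target

# R^2 = 1 - (residual sum of squares / total sum of squares around the mean)
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(np.isclose(mae_manual, mean_absolute_error(y_true, y_pred)))
print(np.isclose(mse_manual, mean_squared_error(y_true, y_pred)))
print(np.isclose(r2_manual, r2_score(y_true, y_pred)))
```

Read this way, the lab's R2 of 0.67 says the model accounts for about two thirds of the variance in the test-set `medv` values, and the RMSE of 4.93 is expressed in the same units as `medv`.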
