# No-Framework Linear Regression

### This implementation builds a linear regression model using only NumPy

### Goal: Predict used car prices using gradient descent optimization

What we'll implement manually:
- Train/Test split
- Feature scaling (z-score normalization)
- Forward pass (predictions)
- Cost function (Mean Squared Error)
- Gradient computation
- Parameter updates (gradient descent)
- Evaluation metrics (MSE, RMSE, R^2)
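
The steps listed above can be sketched end to end on a small synthetic dataset. This is only an illustrative outline, not the implementation used in this notebook: the toy data, the 80/20 split ratio, the learning rate `lr`, the iteration count, and all variable names are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)  # separate generator; toy data only, not the car dataset

# Toy data: 200 samples, 2 features on very different scales, plus a little noise
X_toy = rng.normal(size=(200, 2)) * np.array([5.0, 50.0]) + np.array([10.0, 100.0])
y_toy = X_toy @ np.array([3.0, -0.5]) + 7.0 + rng.normal(scale=0.1, size=200)

# Train/test split: shuffle the row indices and hold out 20% (assumed ratio)
idx = rng.permutation(len(X_toy))
split = int(0.8 * len(X_toy))
X_train, X_test = X_toy[idx[:split]], X_toy[idx[split:]]
y_train, y_test = y_toy[idx[:split]], y_toy[idx[split:]]

# Z-score scaling: statistics come from the training set only
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
X_train_s = (X_train - mu) / sigma
X_test_s = (X_test - mu) / sigma

# Gradient descent on the MSE cost
w = np.zeros(X_train_s.shape[1])
b = 0.0
lr = 0.1  # assumed learning rate
costs = []
for _ in range(500):
    y_hat = X_train_s @ w + b                      # forward pass
    err = y_hat - y_train
    costs.append(np.mean(err ** 2))                # cost: mean squared error
    grad_w = 2.0 * (X_train_s.T @ err) / len(err)  # dMSE/dw
    grad_b = 2.0 * err.mean()                      # dMSE/db
    w -= lr * grad_w                               # parameter updates
    b -= lr * grad_b

# Evaluation on the held-out rows
test_pred = X_test_s @ w + b
mse = np.mean((test_pred - y_test) ** 2)
rmse = np.sqrt(mse)
r2 = 1.0 - np.sum((y_test - test_pred) ** 2) / np.sum((y_test - y_test.mean()) ** 2)
print(f"MSE: {mse:.4f}  RMSE: {rmse:.4f}  R^2: {r2:.4f}")
```

Scaling the test set with the training mean and standard deviation keeps information about the held-out rows out of the fitted parameters.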

In [3]:
# NumPy: core library for numerical operations on arrays
# ONLY external dependency for the model itself
import numpy as np

# matplotlib: for creating visualizations of training progress and results
import matplotlib.pyplot as plt

# os: for handling file paths in a cross-platform way
import os

# Set random seed for reproducibility
# Project-wide seed of 113
np.random.seed(113)

# Load cleaned data

- Load the pre-processed dataset that was cleaned in the data-preparation step
- This same file will be used by all 4 frameworks for fair comparison

In [4]:
# Define path to our cleaned dataset
DATA_PATH = os.path.join('..', '..', 'data', 'processed', 'vehicles_clean.csv')

# np.genfromtxt() reads CSV files into numpy arrays
# delimiter=',' specifies that columns are separated by commas
# skip_header=1 skips the first row (column names)
# Gives us a 2D array where each row is a car, each column is a feature
data = np.genfromtxt(DATA_PATH, delimiter=',', skip_header=1)

# Verify the data loaded correctly
# shape should be (100000, 12)
print(f"Data shape: {data.shape}")
print(f"First row: {data[0]}")

Data shape: (100000, 12)
First row: [2.9990e+04 2.0140e+03 7.0000e+00 2.0000e+00 6.0000e+00 2.0000e+00
 2.6129e+04 0.0000e+00 2.0000e+00 0.0000e+00 8.0000e+00 1.7000e+01]
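
With the default float dtype, `np.genfromtxt()` stores `nan` for any field that is empty or cannot be parsed as a number, so a quick NaN count is a cheap sanity check after loading. The sketch below uses a tiny in-memory CSV with made-up rows (not the real `vehicles_clean.csv`); the names `csv_text`, `toy`, and `toy_clean` are hypothetical.

```python
import io
import numpy as np

# Made-up CSV with one non-numeric field and one empty field
csv_text = "price,year,fuel\n29990,2014,2\n15000,n/a,1\n8000,2009,\n"

toy = np.genfromtxt(io.StringIO(csv_text), delimiter=',', skip_header=1)

# Count NaNs per column to see which columns failed to parse
nan_per_column = np.isnan(toy).sum(axis=0)
print(f"NaNs per column: {nan_per_column}")

# One possible policy (an assumption): keep only rows with no NaN at all
rows_ok = ~np.isnan(toy).any(axis=1)
toy_clean = toy[rows_ok]
print(f"Rows kept: {toy_clean.shape[0]} of {toy.shape[0]}")
```

Applied to the real array, `np.isnan(data).sum()` should be 0 if the cleaning step encoded every column numerically.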


# Separate features and target
- Our columns are: price, year, manufacturer, condition, cylinders, fuel, odometer, title_status, transmission, drive, type, state
- price (column 0) is our TARGET - what we want to predict
- All other columns (1-11) are FEATURES - inputs to our model

In [6]:
# Extract target variable (price)
# data[:, 0] means "all rows, column 0"
y = data[:, 0]

# Extract feature variables
# data[:, 1:] means "all rows, columns 1 through the end"
X = data[:, 1:]

# Print shapes to verify separation
print(f"Features (X) shape: {X.shape}")
print(f"Target (y) shape: {y.shape}")

Features (X) shape: (100000, 11)
Target (y) shape: (100000,)
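
The shapes above follow from how NumPy indexing works: an integer index drops that axis, while a slice or a list index keeps it. A small sketch on a made-up 2x3 array (the values and the names `toy`, `y_toy`, `X_toy`, `y_col` are illustrative only):

```python
import numpy as np

# Two made-up rows: price, year, odometer
toy = np.array([[29990.0, 2014.0, 26129.0],
                [15000.0, 2010.0, 90000.0]])

y_toy = toy[:, 0]    # integer index drops the column axis -> 1-D
X_toy = toy[:, 1:]   # slice keeps the column axis -> 2-D
y_col = toy[:, [0]]  # list index keeps the axis -> 2-D column vector

print(y_toy.shape, X_toy.shape, y_col.shape)
```

A 1-D target of shape `(n,)` lines up with the output of `X @ w` when `w` is 1-D. If the weights were stored as a column vector instead, the `(n, 1)` form would be the matching choice.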


In [7]:
# Define feature names for reference (matching our cleaned data columns)
FEATURE_NAMES = ['year', 'manufacturer', 'condition', 'cylinders', 'fuel', 'odometer', 'title_status', 'transmission', 'drive', 'type', 'state']
print(f"Feature Names: {FEATURE_NAMES}")

Feature Names: ['year', 'manufacturer', 'condition', 'cylinders', 'fuel', 'odometer', 'title_status', 'transmission', 'drive', 'type', 'state']
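
Because `X` is a plain array without column labels, a name-to-index lookup can make later code easier to read. The dictionary `FEATURE_INDEX` below is a hypothetical helper, not part of the original notebook; the sample row reuses the feature values from the first row printed earlier.

```python
import numpy as np

FEATURE_NAMES = ['year', 'manufacturer', 'condition', 'cylinders', 'fuel', 'odometer',
                 'title_status', 'transmission', 'drive', 'type', 'state']

# Hypothetical helper: map each feature name to its column position in X
FEATURE_INDEX = {name: i for i, name in enumerate(FEATURE_NAMES)}

# Feature part of the first data row shown in the load step (price removed)
first_row = np.array([2014.0, 7.0, 2.0, 6.0, 2.0, 26129.0, 0.0, 2.0, 0.0, 8.0, 17.0])

odometer = first_row[FEATURE_INDEX['odometer']]
print(f"odometer column index: {FEATURE_INDEX['odometer']}, value: {odometer}")
```

On the full feature matrix, the same lookup would select a whole column, for example `X[:, FEATURE_INDEX['odometer']]`.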
