## Linear Regression Application Using the California Housing Price Dataset
### Demonstrating Matrix-Vector and Matrix-Matrix Multiplication


In this notebook, we will demonstrate how concepts from Linear Algebra, such as matrix-vector multiplication and matrix-matrix multiplication, appear in Data Science and Machine Learning.

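We will use the **California Housing dataset**, a classic dataset derived from the 1990 U.S. census. Each row describes a census block group (a small district) rather than a single house; for simplicity, this notebook refers to rows as "houses".  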
It includes features such as:
- Median income of households in a block
- Average number of rooms per house
- Average number of bedrooms
- Population and occupancy
- Latitude and longitude

and the target variable:
- **Median house value** (`MedHouseVal`), expressed in units of 100,000 US dollars.

---

### Why this dataset?
- It is small enough to understand, yet real-world.
- Perfect for demonstrating **linear regression** using the matrix equation:

$$
A \cdot x = b
$$

where:  
- \( A \): Feature matrix (housing attributes)  
- \( x \): Weight vector (parameters we want to learn)  
- \( b \): Target vector (house prices)  

---

### Our Plan
1. Use a **subset of rows and columns** to keep the math simple and visual.  
2. Express linear regression as solving \( A \cdot x = b \).  
3. Learn weights (\(x\)) using linear algebra.  
4. Finally, **predict the price of a new house** using matrix-vector multiplication.  


In [8]:
from sklearn.datasets import fetch_california_housing
import pandas as pd

# Load dataset
housing = fetch_california_housing(as_frame=True)
df = housing.frame

# Show first 5 rows
df.head()


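|   | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude | MedHouseVal |
|---|--------|----------|----------|-----------|------------|----------|----------|-----------|-------------|
| 0 | 8.3252 | 41.0 | 6.984127 | 1.02381 | 322.0 | 2.555556 | 37.88 | -122.23 | 4.526 |
| 1 | 8.3014 | 21.0 | 6.238137 | 0.97188 | 2401.0 | 2.109842 | 37.86 | -122.22 | 3.585 |
| 2 | 7.2574 | 52.0 | 8.288136 | 1.073446 | 496.0 | 2.80226 | 37.85 | -122.24 | 3.521 |
| 3 | 5.6431 | 52.0 | 5.817352 | 1.073059 | 558.0 | 2.547945 | 37.85 | -122.25 | 3.413 |
| 4 | 3.8462 | 52.0 | 6.281853 | 1.081081 | 565.0 | 2.181467 | 37.85 | -122.25 | 3.422 |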


### Representing the Problem in Matrix Form

We have chosen **3 features**:
1. `MedInc`  (Median income)  
2. `AveRooms` (Average number of rooms)  
3. `AveOccup` (Average occupancy)  

and only **5 houses** (rows).

This gives us:

- A **feature matrix** \( A \) of shape \( 5 \times 3 \):

$$
A =
\begin{bmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33} \\
a_{41} & a_{42} & a_{43} \\
a_{51} & a_{52} & a_{53} \\
\end{bmatrix}
$$


- A **weight vector** \( x \) (the model parameters we want to learn):

$$
x =
\begin{bmatrix}
w_1 \\
w_2 \\
w_3
\end{bmatrix}
$$

- A **target vector** \( b \) (house prices) of shape \( 5 \times 1 \):

$$
b =
\begin{bmatrix}
b_1 \\
b_2 \\
b_3 \\
b_4 \\
b_5
\end{bmatrix}
$$

The linear regression model assumes:

$$
A \cdot x \approx b
$$



Our goal: **find weights \(x\) such that the predicted prices are close to the actual prices.**


In [24]:
import numpy as np

# Select 3 features and 5 houses
features = ["MedInc", "AveRooms", "AveOccup"]
target = "MedHouseVal"

df_small = df[features + [target]].head(5)

# Feature matrix A (5x3) and target vector b (5x1)
A = df_small[features].values
b = df_small[target].values.reshape(-1, 1)  # make it a column vector

print("Feature matrix A (5x3):\n", A)
print("\nTarget vector b (5x1):\n", b)
print("\nShapes: A =", A.shape, ", b =", b.shape)


Feature matrix A (5x3):
 [[8.3252     6.98412698 2.55555556]
 [8.3014     6.23813708 2.10984183]
 [7.2574     8.28813559 2.80225989]
 [5.6431     5.8173516  2.54794521]
 [3.8462     6.28185328 2.18146718]]

Target vector b (5x1):
 [[4.526]
 [3.585]
 [3.521]
 [3.413]
 [3.422]]

Shapes: A = (5, 3) , b = (5, 1)


### Matrix-Vector Multiplication in Regression

If we take a weight vector:

$$
x =
\begin{bmatrix}
w_1 \\
w_2 \\
w_3
\end{bmatrix}
$$

then multiplying the **feature matrix** \(A\) with \(x\) gives us the **predicted prices**:

$$
\hat{b} = A \cdot x
$$

Concretely, for the first house:

$$
\hat{b}_1 = (8.3252 \times w_1) + (6.9841 \times w_2) + (2.5556 \times w_3)
$$

This is just a **dot product** between the house's features and the weight vector (values rounded to four decimals).

For all 5 houses together, it is simply a **matrix-vector multiplication**:

$$
\begin{bmatrix}
8.3252 & 6.9841 & 2.5556 \\
8.3014 & 6.2381 & 2.1098 \\
7.2574 & 8.2881 & 2.8023 \\
5.6431 & 5.8174 & 2.5479 \\
3.8462 & 6.2819 & 2.1815 \\
\end{bmatrix}
\cdot
\begin{bmatrix}
w_1 \\
w_2 \\
w_3
\end{bmatrix}
=
\begin{bmatrix}
\hat{b}_1 \\
\hat{b}_2 \\
\hat{b}_3 \\
\hat{b}_4 \\
\hat{b}_5
\end{bmatrix}
$$

So regression amounts to:  
**"Find the weights \( (w_1, w_2, w_3) \) that make \( \hat{b} \) as close as possible to the real prices \( b \)."**
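
To make the link between dot products and matrix-vector multiplication concrete, the sketch below computes each prediction as a separate dot product and checks that a single product `A @ w` gives the same five numbers. The entries of `A` are copied from the printed output above; the weight vector `w` is an arbitrary illustrative choice (an assumption, not a fitted result).

```python
import numpy as np

# Feature matrix copied from the printed output above
# (columns: MedInc, AveRooms, AveOccup)
A = np.array([
    [8.3252, 6.98412698, 2.55555556],
    [8.3014, 6.23813708, 2.10984183],
    [7.2574, 8.28813559, 2.80225989],
    [5.6431, 5.8173516, 2.54794521],
    [3.8462, 6.28185328, 2.18146718],
])

# Arbitrary weights, chosen only for illustration (not learned values)
w = np.array([0.5, 0.1, 0.2])

# One dot product per row: features of house i times the weights
row_by_row = np.array([np.dot(A[i], w) for i in range(A.shape[0])])

# The same predictions from a single matrix-vector product
all_at_once = A @ w

print("Row-by-row dot products:", row_by_row)
print("Matrix-vector product:  ", all_at_once)
print("Identical:", np.allclose(row_by_row, all_at_once))
```

Each entry of `A @ w` is exactly one row-times-weights dot product, which is why a single multiplication can score every house at once.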


In [36]:
# Random guess for weights (3x1)
x_guess = np.array([[5], [0.3], [0.1]])

# Predicted house prices with this guess
b_hat = A.dot(x_guess)

print("Random guess weights:\n", x_guess)
print("\nPredicted prices (b_hat):\n", b_hat)
print("\nActual prices (b):\n", b)


Random guess weights:
 [[5. ]
 [0.3]
 [0.1]]

Predicted prices (b_hat):
 [[43.97679365]
 [43.58942531]
 [39.05366667]
 [30.2155    ]
 [21.3337027 ]]

Actual prices (b):
 [[4.526]
 [3.585]
 [3.521]
 [3.413]
 [3.422]]


### Why a Random Guess Fails

From the previous step, we saw that multiplying \(A\) with a guessed weight vector \(x\) gave predictions \(\hat{b}\) that were far from the actual prices \(b\).

This shows two key points:
1. **Matrix-vector multiplication works mechanically**: features × weights → predictions.  
2. But **choosing the right weights matters**: random guesses won't align predictions with reality.

So, how do we find the "best" \(x\)?  

We can't solve \( A \cdot x = b \) exactly here, because the system is *overdetermined* (5 equations, 3 unknowns) and such a system generally has no exact solution.  
Instead, we look for an \(x\) that minimizes the error:

$$
\min_x \| A \cdot x - b \|^2
$$

This is the **Least Squares solution**, the foundation of Linear Regression.
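
As a rough illustration of what "minimizing the error" means, the sketch below evaluates \( \| A \cdot x - b \|^2 \) for two candidate weight vectors. `A`, `b`, and the guess `[5, 0.3, 0.1]` are copied from the cells above; the helper `squared_error` and the second candidate `x_scaled` are hypothetical names introduced here only for comparison.

```python
import numpy as np

# Data copied from the printed output above
A = np.array([
    [8.3252, 6.98412698, 2.55555556],
    [8.3014, 6.23813708, 2.10984183],
    [7.2574, 8.28813559, 2.80225989],
    [5.6431, 5.8173516, 2.54794521],
    [3.8462, 6.28185328, 2.18146718],
])
b = np.array([4.526, 3.585, 3.521, 3.413, 3.422])


def squared_error(A, x, b):
    """Return the squared Euclidean norm of the residual A x - b."""
    residual = A @ x - b
    return float(residual @ residual)


# The guess used earlier in the notebook
x_guess = np.array([5.0, 0.3, 0.1])

# A second, hypothetical candidate: the same guess scaled down by 10
x_scaled = x_guess / 10

err_guess = squared_error(A, x_guess, b)
err_scaled = squared_error(A, x_scaled, b)

print("Squared error of the original guess:", err_guess)
print("Squared error of the scaled guess:  ", err_scaled)
```

The objective gives a single number for ranking candidates: the smaller it is, the closer the predictions are to the actual prices. Least Squares looks for the weight vector with the smallest such value.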


### Solving Linear Regression via Normal Equations

We want the weights \(x\) that minimize the squared error:

$$
\min_x \|A x - b\|^2
$$

Take the gradient with respect to \(x\) and set it to zero:

$$
\frac{\partial}{\partial x}\|A x - b\|^2 = 2A^\top(Ax - b)=0
\;\;\Rightarrow\;\;
A^\top A\,x = A^\top b
$$

This is the **Normal Equation**.  
If $A^\top A$ is invertible (i.e., columns of $A$ are linearly independent), the solution is:

$$
x = (A^\top A)^{-1}A^\top b
$$

**Special case:**  
If \(A\) were a **square, invertible matrix**, then \(A^\top\) would also be invertible, and the normal equations would reduce to:

$$
Ax = b \quad\Rightarrow\quad x = A^{-1} b
$$

which is just the standard solution of a square system of linear equations.  
So, Least Squares generalizes the idea of solving linear systems to the case where we have *more equations than unknowns*.
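
The closed-form expression can be cross-checked numerically. The sketch below is one possible check, not the notebook's own method: it solves the normal equations with `np.linalg.solve` (which avoids forming an explicit inverse), compares the result with NumPy's general least-squares routine `np.linalg.lstsq`, and verifies that the gradient \( A^\top (Ax - b) \) vanishes at the solution. `A` and `b` are copied from the printed output above.

```python
import numpy as np

# Data copied from the printed output above
A = np.array([
    [8.3252, 6.98412698, 2.55555556],
    [8.3014, 6.23813708, 2.10984183],
    [7.2574, 8.28813559, 2.80225989],
    [5.6431, 5.8173516, 2.54794521],
    [3.8462, 6.28185328, 2.18146718],
])
b = np.array([4.526, 3.585, 3.521, 3.413, 3.422])

# Matrix-matrix product A^T A (3x3) and matrix-vector product A^T b (3,)
AtA = A.T @ A
Atb = A.T @ b

# Solve the normal equations A^T A x = A^T b as a linear system
x_solve = np.linalg.solve(AtA, Atb)

# General least-squares solver that works on A directly
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

# At the minimizer, the gradient direction A^T (A x - b) should be ~0
gradient = A.T @ (A @ x_solve - b)

print("Weights via solve:", x_solve)
print("Weights via lstsq:", x_lstsq)
print("Gradient at solution:", gradient)
```

Both routes should agree up to floating-point error. Solving the system rather than inverting \( A^\top A \) is a common choice because it is usually more numerically stable when the columns of \(A\) are close to linearly dependent.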


In [56]:
# Normal Equations: x = (A^T A)^(-1) A^T b
AtA = A.T @ A          # (3x3)
Atb = A.T @ b          # (3x1)
x_ne = np.linalg.inv(AtA) @ Atb   # coefficients (3x1)

print("Weights from Normal Equations (x):\n", x_ne)

# Predictions on the training set
b_hat = A @ x_ne
print("\nPredicted prices:\n", b_hat.flatten())
print("\nActual prices:\n", b.flatten())

# Residuals
residuals = b.flatten() - b_hat.flatten()
print("\nResiduals:\n", residuals)


Weights from Normal Equations (x):
 [[ 0.17196178]
 [-0.01109443]
 [ 1.06340333]]

Predicted prices:
 [4.07171755 3.60192773 4.13597574 3.61535071 2.91148526]

Actual prices:
 [4.526 3.585 3.521 3.413 3.422]

Residuals:
 [ 0.45428245 -0.01692773 -0.61497574 -0.20235071  0.51051474]


### Predicting a New House Price

Now that we have learned the weight vector \(x\) from the Normal Equations, we can predict the price of a **new house** using the same linear model:

$$
\hat{b}_{\text{new}} = A_{\text{new}} \cdot x
$$

- \(A_{\text{new}}\) is a 1×3 row vector representing the features of the new house: `MedInc`, `AveRooms`, `AveOccup`.  
- \(x\) is the 3×1 weight vector learned from our training data.  
- The multiplication gives \(\hat{b}_{\text{new}}\), the predicted house price, in the same units as the target (100,000 US dollars).

This demonstrates **matrix-vector multiplication in practice**: the same operation we used for all 5 training houses is applied here for a new, unseen example.
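
The same product extends to several new houses at once by stacking their feature rows into a matrix. The sketch below assumes the weights printed by the Normal Equations cell above (copied by hand, so rounded to eight decimals). The first row is the example house `[7.0, 5.5, 2.5]`; the second row is a hypothetical house added only for illustration.

```python
import numpy as np

# Weights copied from the Normal Equations output above (3x1, rounded)
x_ne = np.array([[0.17196178], [-0.01109443], [1.06340333]])

# Two new houses, one per row (columns: MedInc, AveRooms, AveOccup).
# The second row is a made-up example, not taken from the dataset.
new_houses = np.array([
    [7.0, 5.5, 2.5],
    [4.0, 6.0, 3.0],
])

# (2x3) @ (3x1) = (2x1): one predicted price per house
predicted = new_houses @ x_ne

# MedHouseVal is expressed in units of $100,000
predicted_dollars = predicted * 100_000

for features, price in zip(new_houses, predicted_dollars.ravel()):
    print(f"features={features} -> predicted price of about ${price:,.0f}")
```

With only five training rows and no intercept term, these numbers illustrate the mechanics of the multiplication rather than a reliable price estimate.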


In [60]:
# Define a new house with the same 3 features: MedInc, AveRooms, AveOccup
new_house = np.array([[7.0, 5.5, 2.5]])  # shape (1x3)

# Predict price using the weights from Normal Equations
predicted_price = new_house @ x_ne  # (1x3) · (3x1) = (1x1)

print("New house features:", new_house.flatten())
print("Predicted house price:", predicted_price.item())


New house features: [7.  5.5 2.5]
Predicted house price: 3.801221386690359
