# Predicting House Prices

## Learning Goals:

- Implement gradient descent from scratch using NumPy
- Understand cost function optimization visually
- Practice data preprocessing and feature scaling
- Create professional documentation for your portfolio

## Why This Project?

This is your foundation. Once you have built linear regression from scratch, later models will click more easily, because they rely on the same two ideas: a cost function and an optimizer that minimizes it. Plus, house price prediction is a classic, relatable problem that recruiters and hiring managers instantly understand.
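
To make the first learning goal concrete, here is a minimal sketch of batch gradient descent for linear regression in NumPy. It is not taken from the steps that follow: the function names (`compute_cost`, `gradient_descent`), the learning rate, the iteration count, and the synthetic `X_demo`/`y_demo` data are all assumptions chosen for illustration, and the data is deliberately not the housing dataset.

```python
import numpy as np

def compute_cost(X, y, w, b):
    """Mean squared error cost with the conventional 1/(2m) factor."""
    m = X.shape[0]
    residuals = X @ w + b - y
    return (residuals @ residuals) / (2 * m)

def gradient_descent(X, y, learning_rate=0.1, n_iters=1000):
    """Fit weights w and bias b by batch gradient descent (hypothetical helper)."""
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    cost_history = []
    for _ in range(n_iters):
        residuals = X @ w + b - y
        grad_w = X.T @ residuals / m   # dJ/dw
        grad_b = residuals.mean()      # dJ/db
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
        cost_history.append(compute_cost(X, y, w, b))
    return w, b, cost_history

# Synthetic data (an assumption, not the housing data):
# y = 3*x1 - 2*x2 + 5 plus a little Gaussian noise
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
y_demo = X_demo @ np.array([3.0, -2.0]) + 5.0 + rng.normal(scale=0.1, size=200)

w_fit, b_fit, costs = gradient_descent(X_demo, y_demo)
print("weights:", w_fit, "bias:", b_fit)
print("first cost:", costs[0], "final cost:", costs[-1])
```

The fitted weights should land close to the true values of 3, -2, and 5, and `costs` gives you the curve to plot when you visualize the optimization. The sketch converges quickly because the synthetic features already have similar scales, which is not true of the raw housing features.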

# 📖 Step 1: Create Your Notebook and Imports

In [5]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
from sklearn.datasets import fetch_california_housing

# Set visualization style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (10, 6)


# 📊 Step 2: Load and Explore the Data

In [6]:
# Load the dataset
housing = fetch_california_housing()

# Create a DataFrame for easier manipulation
X = pd.DataFrame(housing.data, columns=housing.feature_names)
y = pd.Series(housing.target, name='MedHouseVal')

print(f"Dataset shape: {X.shape}")
print(f"Features: {list(X.columns)}")
print("\nFirst few rows:")
X.head()

Dataset shape: (20640, 8)
Features: ['MedInc', 'HouseAge', 'AveRooms', 'AveBedrms', 'Population', 'AveOccup', 'Latitude', 'Longitude']

First few rows:


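|   | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude |
|---|--------|----------|----------|-----------|------------|----------|----------|-----------|
| 0 | 8.3252 | 41.0 | 6.984127 | 1.02381 | 322.0 | 2.555556 | 37.88 | -122.23 |
| 1 | 8.3014 | 21.0 | 6.238137 | 0.97188 | 2401.0 | 2.109842 | 37.86 | -122.22 |
| 2 | 7.2574 | 52.0 | 8.288136 | 1.073446 | 496.0 | 2.80226 | 37.85 | -122.24 |
| 3 | 5.6431 | 52.0 | 5.817352 | 1.073059 | 558.0 | 2.547945 | 37.85 | -122.25 |
| 4 | 3.8462 | 52.0 | 6.281853 | 1.081081 | 565.0 | 2.181467 | 37.85 | -122.25 |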
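
Even these first rows show why feature scaling is one of the learning goals: `MedInc` is in single digits while `Population` runs into the thousands. As a rough sketch of z-score standardization, the snippet below hard-codes three columns from the five rows shown above. The `standardize` helper and the variable names are assumptions for illustration, not part of the notebook.

```python
import numpy as np

# MedInc, HouseAge, Population copied from the five rows displayed above
sample = np.array([
    [8.3252, 41.0, 322.0],
    [8.3014, 21.0, 2401.0],
    [7.2574, 52.0, 496.0],
    [5.6431, 52.0, 558.0],
    [3.8462, 52.0, 565.0],
])

def standardize(features):
    """Rescale each column to zero mean and unit standard deviation (hypothetical helper)."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    return (features - mean) / std, mean, std

scaled, col_mean, col_std = standardize(sample)
print("column means after scaling:", scaled.mean(axis=0).round(6))
print("column stds after scaling: ", scaled.std(axis=0).round(6))
```

The helper returns the mean and standard deviation alongside the scaled array so that the same statistics can be reused later. In a real train/test workflow you would compute them on the training split only and apply them unchanged to the test split.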


In [13]:
# Display basic statistics
print("Dataset Statistics:")
print(X.describe())
#X.describe()

print("\nTarget variable (price) statistics:")
print(y.describe())
#y.describe()


# Check for missing values
#print(f"\nMissing values: {X.isnull().sum().sum()}")

Dataset Statistics:
             MedInc      HouseAge      AveRooms     AveBedrms    Population  \
count  20640.000000  20640.000000  20640.000000  20640.000000  20640.000000   
mean       3.870671     28.639486      5.429000      1.096675   1425.476744   
std        1.899822     12.585558      2.474173      0.473911   1132.462122   
min        0.499900      1.000000      0.846154      0.333333      3.000000   
25%        2.563400     18.000000      4.440716      1.006079    787.000000   
50%        3.534800     29.000000      5.229129      1.048780   1166.000000   
75%        4.743250     37.000000      6.052381      1.099526   1725.000000   
max       15.000100     52.000000    141.909091     34.066667  35682.000000   

           AveOccup      Latitude     Longitude  
count  20640.000000  20640.000000  20640.000000  
mean       3.070655     35.631861   -119.569704  
std       10.386050      2.135952      2.003532  
min        0.692308     32.540000   -124.350000  
25%        2.42974