## What is Gradient Descent?
Imagine you're blindfolded on a mountain and want to reach the bottom (the lowest point). You:

1. Feel the ground with your foot
2. Take a step in the steepest downward direction
3. Repeat until you reach the bottom

Gradient Descent works the same way: it minimizes a function (typically a model's error) by repeatedly taking small steps in the direction of steepest descent, which is the negative of the gradient.
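
The walk down the mountain can be sketched in a few lines. This is only a minimal illustration, assuming a toy one-dimensional "mountain" f(x) = (x - 3)^2; the names `gradient_descent` and `grad_f`, the starting point, and the learning rate of 0.1 are illustrative choices and do not come from this notebook.

```python
def gradient_descent(grad, start, learning_rate=0.1, n_steps=100):
    """Repeatedly step against the gradient, starting from `start`."""
    x = start
    for _ in range(n_steps):
        # "Feel the slope" with grad(x), then take a small step downhill
        x = x - learning_rate * grad(x)
    return x


# Toy "mountain": f(x) = (x - 3)^2, whose lowest point is at x = 3
def grad_f(x):
    return 2 * (x - 3)


x_min = gradient_descent(grad_f, start=10.0)
print(round(x_min, 4))
```

The learning rate controls the step size: too small and the walk takes many steps, too large and it can overshoot the bottom.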

### Types of Gradient Descent 
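1. Batch Gradient Descent

- Uses the ENTIRE dataset to compute each step
- Pros: Stable, smooth convergence
- Cons: Slow with large datasets, since every step needs a full pass over the data
- Note: this is not the same as the direct (closed-form) solution used by scikit-learn's `LinearRegression`, which computes the exact least-squares answer without iterating but only applies to models that have such a solution

2. Stochastic Gradient Descent (SGD)

- Uses ONE data point at a time
- Pros: Fast updates, memory efficient, the noise can help escape local minima
- Cons: Noisy, jumpy path

3. Mini-batch Gradient Descent

- Uses SMALL BATCHES of data (e.g., 32 points)
- Pros: Best of both worlds: speed and stability
- Cons: The batch size has to be chosen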
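
One way to see how the iterative variants relate is that they differ only in how many samples feed each gradient estimate. The sketch below is a hedged illustration, not this notebook's implementation: `linear_regression_gd`, its default learning rate and epoch count, and the synthetic noise-free data are all assumptions made for the example.

```python
import numpy as np


def linear_regression_gd(X, y, batch_size, learning_rate=0.05, n_epochs=200, seed=0):
    """Fit y ~ X @ w + b by gradient descent on the mean squared error.

    batch_size = len(X) -> full-batch gradient descent
    batch_size = 1      -> stochastic gradient descent
    anything in between -> mini-batch gradient descent
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_epochs):
        order = rng.permutation(n_samples)  # reshuffle the data every epoch
        for start in range(0, n_samples, batch_size):
            idx = order[start:start + batch_size]
            error = X[idx] @ w + b - y[idx]
            # Gradient of the mean squared error over the current batch
            grad_w = 2 * X[idx].T @ error / len(idx)
            grad_b = 2 * error.mean()
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b


# Synthetic, noise-free data with known parameters: y = 2*x0 - 1*x1 + 0.5
rng = np.random.default_rng(42)
X_toy = rng.normal(size=(256, 2))
y_toy = X_toy @ np.array([2.0, -1.0]) + 0.5

for name, size in [("full batch", 256), ("mini-batch", 32), ("stochastic", 1)]:
    w, b = linear_regression_gd(X_toy, y_toy, batch_size=size)
    print(f"{name:>10}: w = {np.round(w, 3)}, b = {b:.3f}")
```

On real, noisy data the three settings would trade off differently: smaller batches give cheaper but noisier updates, which is why the batch size and learning rate usually need tuning together.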

### Loading Data
Since we used synthetic data in the last file, let us use the California Housing dataset this time, loaded here with scikit-learn's `fetch_california_housing`.

In [6]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression, SGDRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import r2_score, mean_squared_error
from sklearn.datasets import fetch_california_housing

np.random.seed(42)

california_housing = fetch_california_housing()
X = california_housing.data
y = california_housing.target

feature_names = california_housing.feature_names
df = pd.DataFrame(X, columns=feature_names)
df['MedHouseVal'] = y

print("Dataset shape: " ,df.shape)
print("Features in the dataset are: ")
for i, feature in enumerate(feature_names):
    print(f"{i+1}. {feature}")

print("\nDataset description:")
print("This dataset contains information about California housing from the 1990 census.")
print("Target: Median house value in hundreds of thousands of dollars")

print("\nFirst few rows:")
df.head()


Dataset shape:  (20640, 9)
Features in the dataset are: 
1. MedInc
2. HouseAge
3. AveRooms
4. AveBedrms
5. Population
6. AveOccup
7. Latitude
8. Longitude

Dataset description:
This dataset contains information about California housing from the 1990 census.
Target: Median house value in hundreds of thousands of dollars

First few rows:


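|   | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude | MedHouseVal |
|---|--------|----------|----------|-----------|------------|----------|----------|-----------|-------------|
| 0 | 8.3252 | 41.0 | 6.984127 | 1.02381 | 322.0 | 2.555556 | 37.88 | -122.23 | 4.526 |
| 1 | 8.3014 | 21.0 | 6.238137 | 0.97188 | 2401.0 | 2.109842 | 37.86 | -122.22 | 3.585 |
| 2 | 7.2574 | 52.0 | 8.288136 | 1.073446 | 496.0 | 2.80226 | 37.85 | -122.24 | 3.521 |
| 3 | 5.6431 | 52.0 | 5.817352 | 1.073059 | 558.0 | 2.547945 | 37.85 | -122.25 | 3.413 |
| 4 | 3.8462 | 52.0 | 6.281853 | 1.081081 | 565.0 | 2.181467 | 37.85 | -122.25 | 3.422 |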


### Exploratory Data Analysis
In simple terms: inspect the training and testing data, check them for missing values, and remove any that are found.
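
A minimal sketch of the missing-value check and removal, assuming a small hypothetical frame (`toy_df` and its values are made up for illustration and are not part of this notebook's data):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with gaps; the values are made up for illustration
toy_df = pd.DataFrame({
    "rooms": [6.5, np.nan, 5.9, 7.1],
    "price": [24.0, 21.6, np.nan, 33.4],
})

# Count the missing values in each column
missing_per_column = toy_df.isnull().sum()
print(missing_per_column)

# Drop every row that contains at least one missing value
clean_df = toy_df.dropna().reset_index(drop=True)
print(clean_df.shape)
```

Dropping rows is the simplest option; when many rows have gaps, filling them (for example with `fillna`) may preserve more data.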

In [2]:
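# train_df is not defined in the cells shown here; it is assumed to be loaded earlier in the notebook
print("Dataset Information: ")
# info() prints its summary itself and returns None, so wrapping it in print() also prints "None"
print(train_df.info())
# Summary statistics (count, mean, std, quartiles) for the numeric columns
print(train_df.describe())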

Dataset Information: 
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 333 entries, 0 to 332
Data columns (total 15 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   ID       333 non-null    int64  
 1   crim     333 non-null    float64
 2   zn       333 non-null    float64
 3   indus    333 non-null    float64
 4   chas     333 non-null    int64  
 5   nox      333 non-null    float64
 6   rm       333 non-null    float64
 7   age      333 non-null    float64
 8   dis      333 non-null    float64
 9   rad      333 non-null    int64  
 10  tax      333 non-null    int64  
 11  ptratio  333 non-null    float64
 12  black    333 non-null    float64
 13  lstat    333 non-null    float64
 14  medv     333 non-null    float64
dtypes: float64(11), int64(4)
memory usage: 39.1 KB
None
               ID        crim          zn       indus        chas         nox  \
count  333.000000  333.000000  333.000000  333.000000  333.000000  333.000000   
mean   2

To keep training simple, let us use just a few important features instead of all of them:
1. Room Number (`rm`)
2. Lower Status (`lstat`)
3. Pupil-teacher ratio (`ptratio`)
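
Selecting these features might look like the sketch below. It assumes the column names `rm`, `lstat`, and `ptratio` from the `info()` output above and treats `medv` as the target, which is an assumption; `toy_train_df` is a hypothetical stand-in with made-up rows, not the real training data.

```python
import pandas as pd

# Hypothetical stand-in for the training frame; the rows are made up for illustration
toy_train_df = pd.DataFrame({
    "rm": [6.5, 6.4, 7.1],
    "lstat": [4.9, 9.1, 4.0],
    "ptratio": [15.3, 17.8, 17.8],
    "tax": [296, 242, 242],
    "medv": [24.0, 21.6, 34.7],
})

selected_features = ["rm", "lstat", "ptratio"]
X_selected = toy_train_df[selected_features]  # feature matrix with only the chosen columns
y_target = toy_train_df["medv"]               # assumed target column

print(X_selected.shape, y_target.shape)
```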