
# Week 4 – Intro to Machine Learning (Colab Format)

This week we dip into **machine learning** using scikit-learn. We'll cover:
- Loading data
- Splitting into train/test sets
- Training a simple model
- Evaluating it with easy-to-understand metrics

**No complex math** — just practical steps.

**How to use in Google Colab**
1. Download this notebook.
2. Open https://colab.research.google.com
3. File → Upload notebook → select this file.
4. Run cells top to bottom (Shift + Enter).

---

## 📚 Free Learning Resources
- Kaggle: [Intro to Machine Learning](https://www.kaggle.com/learn/intro-to-machine-learning)  
- scikit-learn: [Tutorials](https://scikit-learn.org/stable/tutorial/index.html)  



## 0) Setup


In [None]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score

pd.__version__, np.__version__



## 1) Load Example Dataset

We'll use the **California Housing** dataset, which scikit-learn downloads and caches on first use (20,640 rows, no missing values).


In [None]:

from sklearn.datasets import fetch_california_housing

california = fetch_california_housing(as_frame=True)
df = california.frame
df.head()



## 2) Inspect Data


In [None]:

df.info()
df.describe()



## 3) Select Features & Target

We'll predict **median house value** (`MedHouseVal`, expressed in units of $100,000) based on a few features.


In [None]:

X = df[["MedInc", "AveRooms", "HouseAge", "AveOccup"]]  # Features
y = df["MedHouseVal"]  # Target



## 4) Split Data into Train/Test Sets


In [None]:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train.shape, X_test.shape
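
To see what `test_size` and `random_state` do, here is a minimal sketch on a made-up 10-row table. The names `toy_X`, `toy_y`, and the `Xa_*` / `Xb_*` variables are hypothetical and only used for illustration; they are not part of the housing data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data (hypothetical): 10 rows, one feature column and one target.
toy_X = pd.DataFrame({"feature": np.arange(10)})
toy_y = pd.Series(np.arange(10) * 2.0, name="target")

# test_size=0.2 holds out 20% of the rows (2 of 10) for testing.
Xa_train, Xa_test, ya_train, ya_test = train_test_split(
    toy_X, toy_y, test_size=0.2, random_state=42
)

# Repeating the call with the same random_state gives the same split.
Xb_train, Xb_test, yb_train, yb_test = train_test_split(
    toy_X, toy_y, test_size=0.2, random_state=42
)

print(len(Xa_train), len(Xa_test))           # 8 2
print(Xa_test.index.equals(Xb_test.index))   # True
```

Fixing `random_state` makes the split repeatable, so metrics can be compared fairly between runs and between models.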



## 5) Train a Linear Regression Model


In [None]:

model = LinearRegression()
model.fit(X_train, y_train)

print("Model coefficients:", model.coef_)
print("Model intercept:", model.intercept_)
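
The raw coefficient array is easier to read when each value is paired with its feature name. Below is a small sketch of that idea on synthetic data, where the true relationship is known in advance (`y = 3a - 2b + 5`). The names `toy_X`, `toy_y`, `toy_model`, and `coef_table` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data (an assumption made for illustration).
rng = np.random.default_rng(0)
toy_X = pd.DataFrame({"a": rng.normal(size=50), "b": rng.normal(size=50)})
toy_y = 3.0 * toy_X["a"] - 2.0 * toy_X["b"] + 5.0

toy_model = LinearRegression().fit(toy_X, toy_y)

# Label each coefficient with the column it belongs to.
coef_table = pd.Series(toy_model.coef_, index=toy_X.columns, name="coefficient")
print(coef_table)
print("intercept:", toy_model.intercept_)
```

The same pattern should work in this notebook with `pd.Series(model.coef_, index=X_train.columns)`. Each coefficient is the change in the predicted target for a one-unit increase in that feature, with the other features held fixed.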



## 6) Make Predictions


In [None]:

y_pred = model.predict(X_test)
y_pred[:5]



## 7) Evaluate the Model

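We'll use **Mean Absolute Error (MAE)**, the average size of the prediction errors in the target's own units, and the **R² score**, the fraction of the target's variance that the model explains (1.0 is a perfect fit).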


In [None]:

mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Absolute Error:", round(mae, 3))
print("R² score:", round(r2, 3))
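
To make the two metrics less of a black box, here is a hand-computed sketch in plain Python on four made-up values. The lists `y_true` and `y_hat` are hypothetical; the results should agree with what `mean_absolute_error` and `r2_score` return for the same inputs.

```python
# Toy values (hypothetical), chosen so the arithmetic is easy to follow.
y_true = [3.0, 1.0, 2.0, 4.0]
y_hat = [2.5, 1.5, 2.0, 3.0]

n = len(y_true)

# MAE: average of the absolute differences between actual and predicted.
mae_manual = sum(abs(t - p) for t, p in zip(y_true, y_hat)) / n

# R²: 1 - (residual sum of squares / total sum of squares around the mean).
mean_true = sum(y_true) / n
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_hat))
ss_tot = sum((t - mean_true) ** 2 for t in y_true)
r2_manual = 1 - ss_res / ss_tot

print("MAE:", mae_manual)          # 0.5
print("R²:", round(r2_manual, 3))  # 0.7
```

An R² of 0 means the model does no better than always predicting the mean of `y_true`, and it can go negative if the model does worse than that.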



## 8) Visualize Predictions vs Actual


In [None]:

plt.figure(figsize=(6, 6))
plt.scatter(y_test, y_pred, alpha=0.3)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--')
plt.xlabel("Actual Values")
plt.ylabel("Predicted Values")
plt.title("Predicted vs Actual Median House Value")
plt.show()



## 9) Mini Assignment – Try Another Model

Switch to **DecisionTreeRegressor** or **RandomForestRegressor** and compare results.

Example:
```python
from sklearn.tree import DecisionTreeRegressor
model = DecisionTreeRegressor(max_depth=5, random_state=42)
model.fit(X_train, y_train)
```
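
One possible way to structure the comparison is to loop over several models and record the same two metrics for each. The sketch below runs on synthetic data so it is self-contained; the names `toy_X`, `toy_y`, `candidates`, and `results` are hypothetical, and `n_estimators=50` is an arbitrary choice rather than a recommended setting.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (an assumption): one linear and one non-linear effect plus noise.
rng = np.random.default_rng(42)
toy_X = rng.uniform(0, 10, size=(200, 2))
toy_y = 2.0 * toy_X[:, 0] + np.sin(toy_X[:, 1]) + rng.normal(0, 0.5, size=200)

Xt_train, Xt_test, yt_train, yt_test = train_test_split(
    toy_X, toy_y, test_size=0.2, random_state=42
)

candidates = {
    "LinearRegression": LinearRegression(),
    "DecisionTree": DecisionTreeRegressor(max_depth=5, random_state=42),
    "RandomForest": RandomForestRegressor(n_estimators=50, random_state=42),
}

# Fit each model on the same training set and score it on the same test set.
results = {}
for name, candidate in candidates.items():
    candidate.fit(Xt_train, yt_train)
    preds = candidate.predict(Xt_test)
    results[name] = {
        "MAE": mean_absolute_error(yt_test, preds),
        "R2": r2_score(yt_test, preds),
    }

for name, scores in results.items():
    print(f"{name:>16}  MAE={scores['MAE']:.3f}  R2={scores['R2']:.3f}")
```

For the assignment, swap the synthetic arrays for the notebook's `X_train`, `X_test`, `y_train`, and `y_test`. Lower MAE and higher R² on the test set indicate the better model.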



## ✅ Week 4 Deliverables
- Load and inspect dataset
- Train/test split
- Train **Linear Regression** model
- Evaluate with MAE and R²
- Visualize predictions
- (Bonus) Try another model and compare

**Next (Week 5):** Classification models (predicting categories instead of numbers).



---

### 📤 Save Your Work to GitHub
1) File → Download → Download `.ipynb`  
2) In GitHub Desktop, **Show in Explorer** → copy the file into your `ai-journey` repo  
3) Commit: `Add Week 4 Colab notebook` → **Push origin**  
4) Add a new section in `README.md` with an **Open in Colab** badge pointing to:  
   `https://colab.research.google.com/github/YOUR_USERNAME/ai-journey/blob/main/Week_4_Intro_to_Machine_Learning_Colab.ipynb`
