# Unit 4 - Example 10: Linear Regression

## 📚 Learning Objectives

By completing this notebook, you will:
- Fit simple and multiple linear regression models with scikit-learn's `LinearRegression`
- Evaluate a fitted model with R² on training and test data
- Recognize why linear regression is a poor fit for classification problems

## 🔗 Prerequisites

- ✅ Basic Python
- ✅ Basic NumPy/Pandas (when applicable)

---

## Official Structure Reference

This notebook supports **Course 05, Unit 4** requirements from `DETAILED_UNIT_DESCRIPTIONS.md`.

---


In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

In [2]:
"""
Unit 4 - Example 10: Linear Regression
"""
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")
print("=" * 70)
print("Example 10: Linear Regression")
print("=" * 70)

Example 10: Linear Regression


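## 1. Simple Linear Regression

This section fits a one-feature model on synthetic data in which price is generated as 50 × size + 100,000 plus Gaussian noise with a standard deviation of 30,000.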

In [4]:
print("\n1. Simple Linear Regression")
print("-" * 70)
np.random.seed(42)
house_size = np.linspace(1000, 4000, 100)
house_price = 50 * house_size + 100000 + np.random.normal(0, 30000, 100)
df_simple = pd.DataFrame({'size': house_size, 'price': house_price})
print("\nSample Data:")
print(df_simple.head())
X = df_simple[['size']]
y = df_simple['price']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model_simple = LinearRegression()
model_simple.fit(X_train, y_train)
y_train_pred = model_simple.predict(X_train)
y_test_pred = model_simple.predict(X_test)
print("\nModel Parameters:")
print(f"Intercept: {model_simple.intercept_:.2f}")
print(f"Coefficient: {model_simple.coef_[0]:.4f}")
train_r2 = r2_score(y_train, y_train_pred)
test_r2 = r2_score(y_test, y_test_pred)
print(f"\nTraining R²: {train_r2:.4f}")
print(f"Test R²: {test_r2:.4f}")
# Visualize
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
axes[0].scatter(X_train, y_train, alpha=0.6, label='Training Data')
axes[0].plot(X_train, y_train_pred, 'r-', linewidth=2, label='Regression Line')
axes[0].set_xlabel('House Size (sq ft)')
axes[0].set_ylabel('Price ($)')
axes[0].set_title('Simple Linear Regression Training')
axes[0].legend()
axes[0].grid(True, alpha=0.3)
axes[1].scatter(X_test, y_test, alpha=0.6, color='green', label='Test Data')
axes[1].plot(X_test, y_test_pred, 'r-', linewidth=2, label='Regression Line')
axes[1].set_xlabel('House Size (sq ft)')
axes[1].set_ylabel('Price ($)')
axes[1].set_title('Simple Linear Regression Test')
axes[1].legend()
axes[1].grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('10_linear_regression.png', dpi=300, bbox_inches='tight')
print("\n✓ Plot saved")
plt.close()


1. Simple Linear Regression
----------------------------------------------------------------------

Sample Data:
          size          price
0  1000.000000  164901.424590
1  1030.303030  147367.222480
2  1060.606061  172460.959173
3  1090.909091  200236.350238
4  1121.212121  149036.004819

Model Parameters:
Intercept: 93385.15
Coefficient: 51.2084

Training R²: 0.7209
Test R²: 0.7837



✓ Plot saved
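The first cell imports `mean_squared_error` and `mean_absolute_error`, but the cells above only report R². As a minimal sketch, assuming the same synthetic data and the same split as above, the following computes MAE, MSE, and RMSE on the test set. The variable names `mae`, `mse`, and `rmse` are introduced here for illustration and are not part of the original notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

# Rebuild the synthetic house data (same seed and formula as the cell above).
np.random.seed(42)
house_size = np.linspace(1000, 4000, 100)
house_price = 50 * house_size + 100000 + np.random.normal(0, 30000, 100)

X = house_size.reshape(-1, 1)  # scikit-learn expects a 2-D feature array
y = house_price
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_train, y_train)
y_test_pred = model.predict(X_test)

# Error metrics on the held-out test set.
mae = mean_absolute_error(y_test, y_test_pred)   # average absolute error, in dollars
mse = mean_squared_error(y_test, y_test_pred)    # average squared error, in dollars squared
rmse = np.sqrt(mse)                              # back on the dollar scale

print(f"Test MAE:  {mae:,.0f}")
print(f"Test MSE:  {mse:,.0f}")
print(f"Test RMSE: {rmse:,.0f}")
```

Because MAE and RMSE are expressed in dollars, they can be compared directly with the noise standard deviation of 30,000 used to generate the prices, which R² alone does not allow.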


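## 2. Multiple Linear Regression

Here the price depends on four features (size, bedrooms, age, and location score), again with known generating coefficients and Gaussian noise, so the fitted coefficients can be compared with the values used to create the data.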

In [6]:
print("\n\n2. Multiple Linear Regression")
print("-" * 70)
np.random.seed(42)
n_samples = 200
data_multiple = {
    'size': np.random.uniform(1000, 4000, n_samples),
    'bedrooms': np.random.randint(2, 6, n_samples),
    'age': np.random.uniform(0, 30, n_samples),
    'location_score': np.random.uniform(1, 10, n_samples)
}
df_multiple = pd.DataFrame(data_multiple)
price = (50 * df_multiple['size'] + 30000 * df_multiple['bedrooms'] -
         5000 * df_multiple['age'] + 15000 * df_multiple['location_score'] +
         50000 + np.random.normal(0, 40000, n_samples))
df_multiple['price'] = price
X_multiple = df_multiple[['size', 'bedrooms', 'age', 'location_score']]
y_multiple = df_multiple['price']
X_train_m, X_test_m, y_train_m, y_test_m = train_test_split(
    X_multiple, y_multiple, test_size=0.2, random_state=42)
model_multiple = LinearRegression()
model_multiple.fit(X_train_m, y_train_m)
y_train_pred_m = model_multiple.predict(X_train_m)
y_test_pred_m = model_multiple.predict(X_test_m)
print("\nCoefficients")
for feature, coef in zip(X_multiple.columns, model_multiple.coef_):
    print(f"  {feature}: {coef:.4f}")
test_r2_m = r2_score(y_test_m, y_test_pred_m)
print(f"\nTest R² Score: {test_r2_m:.4f}")
# Visualize
fig, ax = plt.subplots(figsize=(8, 6))
ax.scatter(y_test_m, y_test_pred_m, alpha=0.6, color='green')
ax.plot([y_test_m.min(), y_test_m.max()], [y_test_m.min(), y_test_m.max()], 'r--', linewidth=2)
ax.set_xlabel('Actual Price ($)')
ax.set_ylabel('Predicted Price ($)')
ax.set_title(f'Multiple Regression: Predicted vs Actual (R² = {test_r2_m:.4f})')
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('10_multiple_regression.png', dpi=300, bbox_inches='tight')
print("✓ Multiple regression plot saved")
plt.close()
print("\n" + "=" * 70)
print("Summary")
print("=" * 70)
print("\nNext Steps: Continue to Example 11 for Classification")




2. Multiple Linear Regression
----------------------------------------------------------------------

Coefficients
  size: 44.4460
  bedrooms: 29230.2999
  age: -4661.6167
  location_score: 13817.9991

Test R² Score: 0.8521
✓ Multiple regression plot saved

Summary

Next Steps: Continue to Example 11 for Classification
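To see what `LinearRegression` computes, the same least-squares problem can be solved directly with NumPy. The sketch below is an illustration rather than part of the original notebook: it rebuilds the synthetic data, fits on all 200 rows instead of the 80% training split (so the numbers will differ slightly from the output above), and compares `np.linalg.lstsq` with scikit-learn. The names `X_design`, `beta`, and `true_coefs` are assumptions introduced for this example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Rebuild the synthetic data with the same seed and formula as the cell above.
np.random.seed(42)
n_samples = 200
df = pd.DataFrame({
    'size': np.random.uniform(1000, 4000, n_samples),
    'bedrooms': np.random.randint(2, 6, n_samples),
    'age': np.random.uniform(0, 30, n_samples),
    'location_score': np.random.uniform(1, 10, n_samples),
})
df['price'] = (50 * df['size'] + 30000 * df['bedrooms']
               - 5000 * df['age'] + 15000 * df['location_score']
               + 50000 + np.random.normal(0, 40000, n_samples))

features = ['size', 'bedrooms', 'age', 'location_score']
X = df[features].to_numpy(dtype=float)
y = df['price'].to_numpy()

# Prepend a column of ones so the intercept is estimated as one more weight.
X_design = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)

# Fit scikit-learn on the same rows for comparison.
model = LinearRegression().fit(X, y)

# Coefficients used to generate the data (taken from the price formula).
true_coefs = {'size': 50, 'bedrooms': 30000, 'age': -5000, 'location_score': 15000}

print(f"{'feature':<16}{'true':>10}{'lstsq':>14}{'sklearn':>14}")
for name, b_lstsq, b_sklearn in zip(features, beta[1:], model.coef_):
    print(f"{name:<16}{true_coefs[name]:>10}{b_lstsq:>14.2f}{b_sklearn:>14.2f}")
print(f"{'intercept':<16}{50000:>10}{beta[0]:>14.2f}{model.intercept_:>14.2f}")
```

The two fitted columns should agree to within floating-point error, while both differ from the true values because of the noise added to the prices.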


## 🚫 When Linear Regression Hits a Dead End

**BEFORE**: We have used linear regression to predict continuous values such as house prices.

**AFTER**: We find that some targets are categories or classes, which linear regression is not designed to predict.

**Why this matters**: Many practical problems, such as spam detection or medical diagnosis, ask for a class label rather than a number.

---

### The Problem We've Discovered

We've learned:
- ✅ How to use linear regression for continuous value prediction
- ✅ How to build simple and multiple linear regression models
- ✅ How to evaluate regression models

**But we have a problem:**
- ❓ **What if we need to predict categories (e.g., spam/not spam, yes/no)?**
- ❓ **What if the target variable is discrete, not continuous?**
- ❓ **What if we need classification, not regression?**

**The Dead End:**
- Linear regression predicts continuous values (prices, temperatures, etc.)
- But many problems require categorical predictions (classes, categories, labels)
- Linear regression's output is unbounded, so it can fall below 0 or above 1 and cannot be read directly as a class probability

---

### Demonstrating the Problem

Let's see what happens when we try to use linear regression for a classification problem:


In [7]:
print("\n" + "=" * 70)
print("🚫 DEMONSTRATING THE DEAD END: Linear Regression for Classification")
print("=" * 70)

# Create a classification problem (binary: 0 or 1)
np.random.seed(42)
n_samples = 100
X_class = np.random.randn(n_samples, 2)
# Create binary classification: class 0 or 1 based on a decision boundary
y_class = ((X_class[:, 0] + X_class[:, 1]) > 0).astype(int)

print(f"\n📊 Classification Problem Created:")
print(f"   - Features: 2 numerical features")
print(f"   - Target: Binary classification (0 or 1)")
print(f"   - Goal: Predict which class each sample belongs to")

# Try to use linear regression for classification
# (LinearRegression is already imported in the first cell)
from sklearn.metrics import accuracy_score

print(f"\n⚠️  Attempting Linear Regression for Classification:")
lr_class = LinearRegression()
lr_class.fit(X_class, y_class)
y_pred_lr = lr_class.predict(X_class)

# Convert predictions to binary (threshold at 0.5)
y_pred_binary = (y_pred_lr > 0.5).astype(int)

# Note: accuracy is measured on the same data the model was fitted on
accuracy_lr = accuracy_score(y_class, y_pred_binary)
print(f"   - Linear Regression Accuracy: {accuracy_lr:.2%}")

print(f"\n💡 The Problem:")
print(f"   - Linear regression outputs continuous values (e.g., 0.3, 0.7, 1.2)")
print(f"   - Classification needs discrete classes (0 or 1)")
print(f"   - We have to threshold the output, which is not ideal")
print(f"   - Linear regression doesn't model probabilities well")
print(f"   - For classification problems, we need classification algorithms!")

print(f"\n📋 Real-World Classification Problems:")
print(f"   1. Email: Spam (1) or Not Spam (0)")
print(f"   2. Medical: Disease (1) or Healthy (0)")
print(f"   3. Customer: Will Buy (1) or Won't Buy (0)")
print(f"   4. Image: Cat (1) or Dog (0)")
print(f"   - All require predicting categories, not continuous values!")

print(f"\n➡️  Solution Needed:")
print(f"   - We need classification algorithms (Logistic Regression, Decision Trees, etc.)")
print(f"   - We need algorithms designed for categorical predictions")
print(f"   - We need proper classification metrics (accuracy, precision, recall)")
print(f"   - This leads us to Example 11: Classification Algorithms")

print("\n" + "=" * 70)



🚫 DEMONSTRATING THE DEAD END: Linear Regression for Classification

📊 Classification Problem Created:
   - Features: 2 numerical features
   - Target: Binary classification (0 or 1)
   - Goal: Predict which class each sample belongs to

⚠️  Attempting Linear Regression for Classification:
   - Linear Regression Accuracy: 97.00%

💡 The Problem:
   - Linear regression outputs continuous values (e.g., 0.3, 0.7, 1.2)
   - Classification needs discrete classes (0 or 1)
   - We have to threshold the output, which is not ideal
   - Linear regression doesn't model probabilities well
   - For classification problems, we need classification algorithms!

📋 Real-World Classification Problems:
   1. Email: Spam (1) or Not Spam (0)
   2. Medical: Disease (1) or Healthy (0)
   3. Customer: Will Buy (1) or Won't Buy (0)
   4. Image: Cat (1) or Dog (0)
   - All require predicting categories, not continuous values!

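➡️  Solution Needed:
   - We need classification algorithms (Logistic Regression, Decision Trees, etc.)
   - We need algorithms designed for categorical predictions
   - We need proper classification metrics (accuracy, precision, recall)
   - This leads us to Example 11: Classification Algorithms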
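The 97% accuracy above is measured on the training data of a linearly separable toy problem, so accuracy alone does not expose the weakness. One symptom that does show up is that the raw predictions are not confined to the interval [0, 1]. The following sketch, which rebuilds the same data and introduces the illustrative names `below_zero` and `above_one`, counts how many predictions fall outside that range.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Rebuild the toy classification data (same seed and rule as the cell above).
np.random.seed(42)
X_class = np.random.randn(100, 2)
y_class = ((X_class[:, 0] + X_class[:, 1]) > 0).astype(int)

# Fit linear regression directly on the 0/1 labels and keep the raw outputs.
y_pred_lr = LinearRegression().fit(X_class, y_class).predict(X_class)

# Count predictions that cannot be interpreted as probabilities.
below_zero = int((y_pred_lr < 0).sum())
above_one = int((y_pred_lr > 1).sum())

print(f"Smallest prediction: {y_pred_lr.min():.3f}")
print(f"Largest prediction:  {y_pred_lr.max():.3f}")
print(f"Predictions below 0: {below_zero}")
print(f"Predictions above 1: {above_one}")
```

Values outside [0, 1] are harmless if all we do is threshold at 0.5, but they make the output unusable as an estimate of how likely a sample is to belong to class 1.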

### What We Need Next

**The Solution**: We need classification algorithms:
- **Logistic Regression**: For binary and multi-class classification
- **Decision Trees**: For non-linear classification boundaries
- **Other classifiers**: SVM, Random Forest, etc.
- **Classification metrics**: Accuracy, precision, recall, F1-score

**This dead end leads us to Example 11: Classification Algorithms**
- We'll learn how to predict categories instead of continuous values
- We'll evaluate those predictions with classification metrics rather than R²
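As a brief preview, and only as a sketch under stated assumptions, here is how a classifier could be applied to the same toy data. It uses scikit-learn's `LogisticRegression` with default settings and a stratified 80/20 split. The split, the variable names, and the choice of metrics are assumptions made for this illustration, not the approach Example 11 necessarily takes.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Rebuild the toy classification data used in the demonstration above.
np.random.seed(42)
X_class = np.random.randn(100, 2)
y_class = ((X_class[:, 0] + X_class[:, 1]) > 0).astype(int)

# Hold out 20% of the samples so the metrics are not measured on training data.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_class, y_class, test_size=0.2, random_state=42, stratify=y_class)

clf = LogisticRegression().fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)[:, 1]  # estimated probability of class 1
y_pred = clf.predict(X_te)             # hard 0/1 labels

accuracy = accuracy_score(y_te, y_pred)
print(f"Probability range: {proba.min():.3f} to {proba.max():.3f}")
print(f"Accuracy:  {accuracy:.2%}")
print(f"Precision: {precision_score(y_te, y_pred):.2%}")
print(f"Recall:    {recall_score(y_te, y_pred):.2%}")
print(f"F1-score:  {f1_score(y_te, y_pred):.2%}")
```

Unlike the raw linear regression output, `predict_proba` always returns values between 0 and 1, and the metrics are computed on samples the model did not see during fitting.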
