# 🎓 AI Bootcamp Practice Exercises
## Week 5 - Day 2: Linear Regression

---

### 📚 Topics Covered Today:
- Understanding the linear equation (y = mx + b)
- Cost Function (Mean Squared Error)
- Gradient Descent optimization
- Building Linear Regression from scratch with NumPy
- Visualizing predictions and model performance

---

### 📝 Instructions:
- Read each **Example** carefully before attempting exercises
- Fill in the **TODO** sections with your code
- Run each cell to check your output
- Expected outputs are provided for reference
- Ask for help if you get stuck!

---

**Let's get started! 🚀**

---
## 🔹 Section 1: Introduction & Setup

**Linear Regression** is your first machine learning algorithm! It's a supervised learning technique used to predict a continuous value (like house prices, temperatures, or salaries) based on input features.

**Why Linear Regression?**
- 🎯 **Simple & Interpretable**: Easy to understand and explain
- 📈 **Foundation of ML**: Many advanced algorithms build on this concept
- 🚀 **Fast Training**: Computationally efficient
- 🔍 **Great Baseline**: Always try linear regression first!

**The Linear Equation:**
```
y = mx + b
```
Where:
- **y**: Predicted value (output)
- **m**: Slope (how much y changes per unit of x)
- **x**: Input feature
- **b**: Intercept (y-value when x = 0)

In [None]:
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt

# Set random seed for reproducibility
np.random.seed(42)

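# Display settings
# Note: the 'seaborn-v0_8-darkgrid' style name needs Matplotlib 3.6 or newer;
# older versions call it 'seaborn-darkgrid'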
plt.style.use('seaborn-v0_8-darkgrid')
plt.rcParams['figure.figsize'] = (10, 6)

print(f"NumPy version: {np.__version__}")
print("✅ Setup complete!")

---
## 🔹 Section 2: Basic Exercises

### Exercise 1: Understanding the Linear Equation
**Learn:** Get comfortable with the equation y = mx + b

#### 📖 Example:

In [None]:
# Example: Calculate y for different values of x
# Let's say: House Price = $50,000 per bedroom + $100,000 base price

m = 50000  # slope (price per bedroom)
b = 100000  # intercept (base price)

# Predict for different number of bedrooms
bedrooms = np.array([1, 2, 3, 4, 5])
prices = m * bedrooms + b

print("Number of Bedrooms | Predicted Price")
print("-" * 40)
for bed, price in zip(bedrooms, prices):
    print(f"      {bed}            | ${price:,}")

# Expected Output:
# Number of Bedrooms | Predicted Price
# ----------------------------------------
#       1            | $150,000
#       2            | $200,000
# ...

#### ✏️ Your Turn:

In [None]:
# TODO: Salary Prediction
# A company pays $5,000 per year of experience + $30,000 base salary
# Calculate salaries for people with 0, 2, 5, 10, and 15 years of experience

# TODO: Define m (slope) and b (intercept)
m = # Your code here (5000)
b = # Your code here (30000)

# TODO: Create array of years of experience
experience_years = # Your code here (use np.array)

# TODO: Calculate predicted salaries
predicted_salaries = # Your code here

# Print results
print("Years of Experience | Predicted Salary")
print("-" * 45)
for years, salary in zip(experience_years, predicted_salaries):
    print(f"        {years:2d}          | ${salary:,}")

# Expected Output:
# Years of Experience | Predicted Salary
# ---------------------------------------------
#         0          | $30,000
#         2          | $40,000
# ...

---
### Exercise 2: Visualizing the Linear Relationship
**Learn:** See how changing m and b affects the line

#### 📖 Example:

In [None]:
# Example: Visualize different lines

x = np.linspace(0, 10, 50)

# Different slopes, same intercept
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.plot(x, 0.5*x + 2, label='m=0.5, b=2', linewidth=2)
plt.plot(x, 1.0*x + 2, label='m=1.0, b=2', linewidth=2)
plt.plot(x, 2.0*x + 2, label='m=2.0, b=2', linewidth=2)
plt.title('Different Slopes (m)', fontsize=14, fontweight='bold')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.grid(True, alpha=0.3)

# Same slope, different intercepts
plt.subplot(1, 2, 2)
plt.plot(x, 1.0*x + 1, label='m=1.0, b=1', linewidth=2)
plt.plot(x, 1.0*x + 3, label='m=1.0, b=3', linewidth=2)
plt.plot(x, 1.0*x + 5, label='m=1.0, b=5', linewidth=2)
plt.title('Different Intercepts (b)', fontsize=14, fontweight='bold')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("üìä Notice how:")
print("   - Slope (m) controls how steep the line is")
print("   - Intercept (b) controls where the line crosses the y-axis")

#### ✏️ Your Turn:

In [None]:
# TODO: Create your own visualization
# Plot three lines with:
# Line 1: m=3, b=1
# Line 2: m=-2, b=8
# Line 3: m=0, b=5 (horizontal line)

x = np.linspace(0, 10, 50)

# TODO: Calculate y values for each line
y1 = # Your code here
y2 = # Your code here
y3 = # Your code here

# TODO: Plot all three lines
plt.figure(figsize=(10, 6))
# Your plotting code here


plt.title('My Linear Functions', fontsize=14, fontweight='bold')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

---
### Exercise 3: Generate Sample Data
**Learn:** Create synthetic data to practice linear regression

#### 📖 Example:

In [None]:
# Example: Generate data with noise

# True parameters
true_m = 2.5
true_b = 5.0

# Generate x values
n_samples = 100
X = np.random.uniform(0, 10, n_samples)

# Generate y values with some noise
noise = np.random.normal(0, 2, n_samples)  # mean=0, std=2
y = true_m * X + true_b + noise

# Visualize
plt.figure(figsize=(10, 6))
plt.scatter(X, y, alpha=0.6, s=50, label='Data points')
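# X is unsorted, so draw the line from its two endpoints to keep the dashes clean
x_line = np.array([X.min(), X.max()])
plt.plot(x_line, true_m * x_line + true_b, 'r--', linewidth=2, label=f'True line: y = {true_m}x + {true_b}')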
plt.title('Sample Data with Noise', fontsize=14, fontweight='bold')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

print(f"Generated {n_samples} data points")
print(f"True parameters: m = {true_m}, b = {true_b}")

#### ✏️ Your Turn:

In [None]:
# TODO: Generate your own dataset
# Create 50 data points with true_m=3, true_b=10, noise std=3

# Set parameters
true_m = # Your code here
true_b = # Your code here
n_samples = # Your code here

# Generate data
X = # np.random.uniform(0, 20, n_samples)
noise = # np.random.normal(0, 3, n_samples)
y = # true_m * X + true_b + noise

# Visualize
plt.figure(figsize=(10, 6))
# Your plotting code here


plt.show()

print(f"Data shape: X={X.shape}, y={y.shape}")

---
## 🔹 Section 3: Intermediate Exercises

### Exercise 4: Cost Function (Mean Squared Error)
**Learn:** Measure how good our predictions are
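
Before the example, it may help to work MSE out on numbers small enough to check by hand. The sketch below uses three made-up points; the names `x_demo`, `y_demo`, and `mse_demo` are illustrative assumptions and are not part of the exercise data. For the line y = 2x + 1 the errors are 0, 1, and -2, so the MSE is (0 + 1 + 4) / 3.

```python
import numpy as np

# Three hypothetical points, chosen so the arithmetic is easy to follow
x_demo = np.array([1.0, 2.0, 3.0])
y_demo = np.array([3.0, 6.0, 5.0])

def mse_demo(x, y, m, b):
    """Mean of the squared differences between y and the line m*x + b."""
    y_pred = m * x + b
    return np.mean((y - y_pred) ** 2)

# The line y = 2x + 1 predicts [3, 5, 7], so the errors are [0, 1, -2]
print(mse_demo(x_demo, y_demo, m=2.0, b=1.0))  # (0 + 1 + 4) / 3

# The flat line y = 0 predicts [0, 0, 0], so the errors are [3, 6, 5]
print(mse_demo(x_demo, y_demo, m=0.0, b=0.0))  # (9 + 36 + 25) / 3
```

Squaring makes every error positive and penalizes large misses more heavily than small ones, which is why the flat line scores so much worse here.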

#### 📖 Example:

In [None]:
# Example: Calculate MSE for different parameters

def calculate_mse(X, y, m, b):
    """
    Calculate Mean Squared Error
    MSE = (1/n) * Σ(y_true - y_pred)²
    """
    y_pred = m * X + b
    mse = np.mean((y - y_pred) ** 2)
    return mse

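# Using X and y from the Exercise 3 example (true_m = 2.5, true_b = 5.0).
# If you ran the Exercise 3 "Your Turn" cell, it overwrote X and y, so re-run
# the Exercise 3 example cell before this one.
# Try different parameter values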
test_params = [
    (2.0, 5.0),
    (2.5, 5.0),  # True parameters
    (3.0, 5.0),
    (2.5, 3.0),
]

print("Testing different parameters:")
print("-" * 50)
print(f"{'m':<8} {'b':<8} {'MSE':<15}")
print("-" * 50)

for m, b in test_params:
    mse = calculate_mse(X, y, m, b)
    print(f"{m:<8.2f} {b:<8.2f} {mse:<15.2f}")

print("\nüìä Lower MSE = Better fit!")

#### ✏️ Your Turn:

In [None]:
# TODO: Complete the MSE function and test it

def my_mse(X, y, m, b):
    """
    Calculate Mean Squared Error
    
    Parameters:
    X: input features
    y: true values
    m: slope
    b: intercept
    
    Returns:
    mse: mean squared error
    """
    # TODO: Calculate predictions
    y_pred = # Your code here
    
    # TODO: Calculate squared errors
    squared_errors = # Your code here
    
    # TODO: Calculate mean
    mse = # Your code here
    
    return mse

# Test your function
test_mse = my_mse(X, y, 2.5, 5.0)
print(f"MSE with m=2.5, b=5.0: {test_mse:.2f}")

# TODO: Try finding better parameters by trial and error
# Test at least 5 different combinations of m and b
# Print which combination gives the lowest MSE

---
## 🔹 Section 4: Advanced Exercises

### Exercise 5: Implement Linear Regression from Scratch
**Learn:** Build the complete algorithm using gradient descent
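
Gradient descent relies on the gradients of MSE with respect to `m` and `b`. One way to build trust in those formulas is to compare them against a numerical estimate. The sketch below does that on a small made-up dataset; `x_demo`, `y_demo`, and the helper names are illustrative assumptions, not part of the exercise.

```python
import numpy as np

# Small made-up dataset (hypothetical values, for illustration only)
x_demo = np.array([0.0, 1.0, 2.0, 3.0])
y_demo = np.array([1.0, 3.0, 4.0, 8.0])

def mse(m, b):
    """MSE of the line m*x + b on the demo data."""
    return np.mean((y_demo - (m * x_demo + b)) ** 2)

def analytic_gradients(m, b):
    """Partial derivatives of MSE with respect to m and b."""
    n = len(x_demo)
    residual = y_demo - (m * x_demo + b)
    dm = -(2 / n) * np.sum(x_demo * residual)
    db = -(2 / n) * np.sum(residual)
    return dm, db

def numeric_gradients(m, b, eps=1e-6):
    """Central finite differences: nudge one parameter at a time."""
    dm = (mse(m + eps, b) - mse(m - eps, b)) / (2 * eps)
    db = (mse(m, b + eps) - mse(m, b - eps)) / (2 * eps)
    return dm, db

m0, b0 = 0.5, 0.0
print("analytic:", analytic_gradients(m0, b0))
print("numeric: ", numeric_gradients(m0, b0))
```

The two results should agree to several decimal places. Both gradients are negative at this starting point, which means increasing `m` and `b` lowers the cost, and the update rule moves them in exactly that direction.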

#### 📖 Example:

In [None]:
# Example: Complete Linear Regression implementation

class LinearRegression:
    def __init__(self, learning_rate=0.01, n_iterations=1000):
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations
        self.m = 0  # slope
        self.b = 0  # intercept
        self.cost_history = []
        
    def fit(self, X, y):
        """
        Train the model using gradient descent
        """
        n_samples = len(X)
        
        for i in range(self.n_iterations):
            # Make predictions
            y_pred = self.m * X + self.b
            
            # Calculate cost (MSE)
            cost = (1/n_samples) * np.sum((y - y_pred) ** 2)
            self.cost_history.append(cost)
            
            # Calculate gradients
            dm = -(2/n_samples) * np.sum(X * (y - y_pred))
            db = -(2/n_samples) * np.sum(y - y_pred)
            
            # Update parameters
            self.m = self.m - self.learning_rate * dm
            self.b = self.b - self.learning_rate * db
            
            # Print progress every 100 iterations
            if (i+1) % 100 == 0:
                print(f"Iteration {i+1}: Cost = {cost:.4f}, m = {self.m:.4f}, b = {self.b:.4f}")
    
    def predict(self, X):
        """
        Make predictions
        """
        return self.m * X + self.b

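# Train the model on X, y from the Exercise 3 example (X between 0 and 10).
# With a wider X range, learning_rate=0.01 may be too large and the cost can blow up.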
model = LinearRegression(learning_rate=0.01, n_iterations=1000)
model.fit(X, y)

print(f"\n✅ Training complete!")
print(f"Final parameters: m = {model.m:.4f}, b = {model.b:.4f}")
print(f"True parameters: m = {true_m}, b = {true_b}")

#### 📊 Visualize Training Progress:

In [None]:
# Plot cost function over iterations
plt.figure(figsize=(10, 6))
plt.plot(model.cost_history, linewidth=2)
plt.title('Cost Function Over Time', fontsize=14, fontweight='bold')
plt.xlabel('Iteration')
plt.ylabel('Cost (MSE)')
plt.grid(True, alpha=0.3)
plt.show()

print("üìâ The cost decreases over time - our model is learning!")

#### 📈 Visualize Final Predictions:

In [None]:
# Plot predictions vs actual data
y_pred = model.predict(X)

plt.figure(figsize=(10, 6))
plt.scatter(X, y, alpha=0.6, s=50, label='Actual data')
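# X is unsorted, so draw each straight line from its two endpoints
x_line = np.array([X.min(), X.max()])
plt.plot(x_line, model.predict(x_line), 'r-', linewidth=2, label=f'Predicted line: y = {model.m:.2f}x + {model.b:.2f}')
plt.plot(x_line, true_m * x_line + true_b, 'g--', linewidth=2, alpha=0.7, label=f'True line: y = {true_m}x + {true_b}')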
plt.title('Linear Regression Results', fontsize=14, fontweight='bold')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

# Calculate final MSE
final_mse = np.mean((y - y_pred) ** 2)
print(f"Final MSE: {final_mse:.4f}")

#### ✏️ Your Turn:

In [None]:
# TODO: Implement a simplified version of Linear Regression
# Complete the missing parts

class MyLinearRegression:
    def __init__(self, learning_rate=0.01, n_iterations=500):
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations
        self.m = 0
        self.b = 0
        self.costs = []
    
    def fit(self, X, y):
        n = len(X)
        
        for i in range(self.n_iterations):
            # TODO: Calculate predictions
            y_pred = # Your code here
            
            # TODO: Calculate MSE cost
            cost = # Your code here
            self.costs.append(cost)
            
            # TODO: Calculate gradients
            dm = # Your code here: -(2/n) * sum(X * (y - y_pred))
            db = # Your code here: -(2/n) * sum(y - y_pred)
            
            # TODO: Update parameters
            self.m = # Your code here
            self.b = # Your code here
    
    def predict(self, X):
        return self.m * X + self.b

# Test your implementation
my_model = MyLinearRegression(learning_rate=0.01, n_iterations=500)
my_model.fit(X, y)

print(f"Your model: m = {my_model.m:.4f}, b = {my_model.b:.4f}")
print(f"Example model: m = {model.m:.4f}, b = {model.b:.4f}")

# TODO: Visualize your results
# Plot the cost history and final predictions

---
## 🔹 Section 5: Final Challenge

### Challenge: Real Dataset - Salary Prediction
**Task:** Apply Linear Regression to predict salaries based on years of experience

In [None]:
# Challenge: Salary Prediction Dataset

# Create realistic salary data
np.random.seed(42)
years_experience = np.array([1.1, 1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7, 3.9, 4.0, 4.0, 4.1, 4.5, 4.9, 5.1, 5.3, 5.9, 6.0, 6.8, 7.1, 7.9, 8.2, 8.7, 9.0, 9.5, 9.6, 10.3, 10.5])
salary = np.array([39343, 46205, 37731, 43525, 39891, 56642, 60150, 54445, 64445, 57189, 63218, 55794, 56957, 57081, 61111, 67938, 66029, 83088, 81363, 93940, 91738, 98273, 101302, 113812, 109431, 105582, 116969, 112635, 122391, 121872])

print("üìä Salary Dataset")
print(f"Number of samples: {len(years_experience)}")
print(f"Experience range: {years_experience.min():.1f} - {years_experience.max():.1f} years")
print(f"Salary range: ${salary.min():,} - ${salary.max():,}")

# Visualize the data
plt.figure(figsize=(10, 6))
plt.scatter(years_experience, salary, alpha=0.6, s=80, color='blue', edgecolors='black')
plt.title('Salary vs Years of Experience', fontsize=14, fontweight='bold')
plt.xlabel('Years of Experience')
plt.ylabel('Salary ($)')
plt.grid(True, alpha=0.3)
plt.show()

In [None]:
# TODO: Complete the challenge
# 1. Train a Linear Regression model on this data
# 2. Plot the cost function history
# 3. Visualize predictions vs actual data
# 4. Make predictions for someone with 5 and 15 years of experience
# 5. Calculate and print the final MSE

# Step 1: Train model
salary_model = # Your code here

# Step 2: Plot cost history
# Your code here

# Step 3: Visualize predictions
# Your code here

# Step 4: Make predictions
pred_5_years = # Your code here
pred_15_years = # Your code here

print(f"\nüí∞ Salary Predictions:")
print(f"5 years experience: ${pred_5_years:,.2f}")
print(f"15 years experience: ${pred_15_years:,.2f}")

# Step 5: Calculate MSE
# Your code here

---
## 🔹 Section 6: Summary & Key Concepts

### 🎯 What You Learned Today:

**1. Linear Regression Fundamentals:**
- The equation: `y = mx + b`
- Slope (m) controls the steepness
- Intercept (b) is the value of y when x = 0
- Used for predicting continuous values

**2. Cost Function (MSE):**
- Measures prediction error
- MSE = Average of squared errors
- Lower MSE = Better model
- Goal: Minimize MSE

**3. Gradient Descent:**
- Optimization algorithm to find best parameters
- Iteratively updates m and b
- Learning rate controls step size
- Converges toward the minimum-MSE parameters when the learning rate is small enough
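
Written out as equations, the gradient descent step from Exercise 5 looks like this. The symbols $J$ for the cost and $\alpha$ for the learning rate are notation assumed here; the notebook code calls them `cost` and `learning_rate`.

$$
J(m, b) = \frac{1}{n} \sum_{i=1}^{n} \bigl(y_i - (m x_i + b)\bigr)^2
$$

$$
\frac{\partial J}{\partial m} = -\frac{2}{n} \sum_{i=1}^{n} x_i \bigl(y_i - (m x_i + b)\bigr),
\qquad
\frac{\partial J}{\partial b} = -\frac{2}{n} \sum_{i=1}^{n} \bigl(y_i - (m x_i + b)\bigr)
$$

$$
m \leftarrow m - \alpha \, \frac{\partial J}{\partial m},
\qquad
b \leftarrow b - \alpha \, \frac{\partial J}{\partial b}
$$

The two partial derivatives correspond to `dm` and `db` in the `fit` method, and the last line is the parameter update.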

**4. Implementation:**
- Built Linear Regression from scratch using NumPy
- Trained models on synthetic and real data
- Visualized results and training progress
- Made predictions on new data

---

### 📌 Key Takeaways:

✅ **Linear Regression is simple yet powerful** - great baseline for any regression problem  
✅ **Gradient Descent finds optimal parameters** - variants of it are used to train many ML models  
✅ **Visualization helps understanding** - always plot your data and results  
✅ **Lower learning rate = slower but more stable** - higher = faster but may overshoot or diverge  
✅ **More iterations = better convergence** - but watch for diminishing returns  
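
The learning-rate trade-off can be seen in a few lines. This is a minimal sketch on made-up, noise-free data; `x_demo`, `y_demo`, `final_cost`, and the three learning rates are illustrative assumptions, not values from the exercises.

```python
import numpy as np

# Hypothetical noise-free data on the line y = 2x + 1
x_demo = np.linspace(0, 10, 20)
y_demo = 2.0 * x_demo + 1.0

def final_cost(learning_rate, n_iterations=200):
    """Run plain gradient descent on MSE and return the last cost."""
    m, b = 0.0, 0.0
    n = len(x_demo)
    for _ in range(n_iterations):
        residual = y_demo - (m * x_demo + b)
        m -= learning_rate * (-(2 / n) * np.sum(x_demo * residual))
        b -= learning_rate * (-(2 / n) * np.sum(residual))
    return np.mean((y_demo - (m * x_demo + b)) ** 2)

# Same data, same number of iterations, three different step sizes
for lr in (0.0001, 0.01, 0.05):
    print(f"learning_rate={lr}: final cost = {final_cost(lr):.4g}")
```

On this data the smallest rate is still far from the minimum after 200 iterations, the middle rate gets close, and the largest rate overshoots on every step so the cost grows instead of shrinking. The exact thresholds depend on the scale of the input feature, so they will differ on other datasets.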

---

### 🚀 Next Steps:

- Tomorrow: **Polynomial Features and Regularization**
- Learn about overfitting and underfitting
- Explore Ridge and Lasso regression
- Build more complex models

---

### 💪 Keep Practicing!

Try these additional challenges:
1. Experiment with different learning rates
2. Add more noise to the data and see how it affects results
3. Try Linear Regression on different datasets
4. Compare your implementation with scikit-learn
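
For a quick comparison that needs only NumPy, a gradient descent fit can be checked against `np.polyfit`, which solves the same least-squares problem directly. This sketch uses `np.polyfit` as a stand-in reference rather than scikit-learn, and the data and variable names are made up for illustration.

```python
import numpy as np

# Hypothetical data: the line y = 3x + 2 plus a little noise
rng = np.random.default_rng(0)
x_demo = np.linspace(0, 5, 30)
y_demo = 3.0 * x_demo + 2.0 + rng.normal(0, 0.5, size=x_demo.shape)

# Gradient descent with the same update rule as in Exercise 5
m, b = 0.0, 0.0
n = len(x_demo)
for _ in range(5000):
    residual = y_demo - (m * x_demo + b)
    m -= 0.01 * (-(2 / n) * np.sum(x_demo * residual))
    b -= 0.01 * (-(2 / n) * np.sum(residual))

# Reference: least-squares fit of a degree-1 polynomial (slope, then intercept)
m_ref, b_ref = np.polyfit(x_demo, y_demo, deg=1)

print(f"gradient descent: m = {m:.4f}, b = {b:.4f}")
print(f"np.polyfit:       m = {m_ref:.4f}, b = {b_ref:.4f}")
```

If the two lines of output match closely, gradient descent has converged. A visible gap usually means it needs more iterations or a different learning rate.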

**Congratulations on building your first ML algorithm from scratch! 🎉**