# 07. Implementing ML Models Using Scikit-learn
## Regression and Classification

## 📚 Learning Objectives

By completing this notebook, you will:
- Implement regression models with scikit-learn
- Implement classification models with scikit-learn
- Train and evaluate models using the scikit-learn API
- Apply models to real datasets
- Understand the ML workflow

## 🔗 Prerequisites

- ✅ Example 6: Data Preparation for ML (need prepared data!)
- ✅ Understanding of ML concepts (supervised learning)
- ✅ Basic scikit-learn knowledge

---

## Official Structure Reference

This notebook covers practical activities from **Course 05, Unit 4**:
- Implementing ML models using Scikit-learn library (regression, classification)
- **Source:** `DETAILED_UNIT_DESCRIPTIONS.md` - Unit 4 Practical Content

---

## 📚 Prerequisites (What You Need First)

**BEFORE starting this notebook**, you should have completed:
- ✅ **Example 6: Data Preparation** - Data must be prepared for ML!
- ✅ **Unit 2: Data Cleaning** - Clean data is essential!
- ✅ **Understanding of ML**: What is training? What is prediction?

**If you haven't completed these**, you might struggle with:
- Understanding the ML workflow
- Knowing which model to use
- Understanding model evaluation

---

## 🔗 Where This Notebook Fits

**This is a key example in Unit 4: Introduction to Machine Learning**

**Why implementing ML models?**
- **After** preparing data, we build models
- **Foundation** for all ML work
- **Scikit-learn** is one of the most widely used Python libraries for classical ML

**Builds on**:
- 📓 Example 6: Data Preparation (prepared data ready for models)

**Leads to**:
- 📓 Example 8: Supervised Learning (logistic regression)
- 📓 Example 12: Model Evaluation (evaluate these models!)
- 📓 All ML applications

**Why this order?**
1. Model implementation teaches you **the ML workflow** (essential)
2. Scikit-learn is **a de facto standard library** for classical ML in Python
3. These skills are **the foundation** for all ML work

---

## The Story: Building Your First Model

Imagine you're learning to cook. **Before** creating complex dishes, you learn basic recipes - how to follow steps, use ingredients, check if it's done. **After** mastering basics, you can cook anything!

Same with ML: **Before** building complex models, we learn basic implementation - load data, train model, make predictions. **After** mastering basics, we can build any model!

---

## Why ML Model Implementation Matters

ML model implementation is essential because:
- **Foundation**: All ML work starts with model implementation
- **Standard Library**: Scikit-learn is an industry standard for classical ML in Python
- **Workflow**: Learn the complete ML pipeline
- **Practical Skills**: Build real models for real problems

**Common Student Questions:**
- **Q: Which model should I use?**
  - Answer: It depends on your problem type.
  - Example: Predicting a price (regression) → Linear Regression; classifying an email (classification) → Logistic Regression
  - Rule: Continuous output → regression; categorical output → classification

- **Q: What's the ML workflow?**
  - Answer: Prepare data → Split train/test → Train model → Predict → Evaluate
  - Example: Load data → `train_test_split` → `model.fit()` → `model.predict()` → metrics
  - Tip: Follow these steps in order, and evaluate only on the held-out test split.
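
The rule of thumb and the workflow above can be sketched together. The example below is a minimal illustration and not part of the original notebook: the synthetic data, the variable names (`X_num`, `y_cat`, and so on), and the seed are assumptions chosen for the demo. It runs the same split, fit, predict, and evaluate steps once for a continuous target and once for a categorical one.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import accuracy_score, mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# --- Continuous target -> regression ---------------------------------------
# Hypothetical data: y is roughly 2*x + 1 plus Gaussian noise.
X_num = rng.uniform(0, 10, size=(200, 1))
y_num = 2 * X_num[:, 0] + 1 + rng.normal(0, 0.5, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_num, y_num, test_size=0.2, random_state=0
)
reg = LinearRegression().fit(X_tr, y_tr)            # train
mse = mean_squared_error(y_te, reg.predict(X_te))   # predict + evaluate

# --- Categorical target -> classification ----------------------------------
# Hypothetical data: class 1 when the two features sum to a positive number.
X_cat = rng.normal(size=(200, 2))
y_cat = (X_cat[:, 0] + X_cat[:, 1] > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_cat, y_cat, test_size=0.2, random_state=0, stratify=y_cat
)
clf = LogisticRegression().fit(X_tr, y_tr)          # train
acc = accuracy_score(y_te, clf.predict(X_te))       # predict + evaluate

print(f"Regression MSE: {mse:.3f}")
print(f"Classification accuracy: {acc:.3f}")
```

Only the estimator and the metric change between the two halves; the calls to `train_test_split`, `fit`, and `predict` are identical.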

---

## Introduction

**Scikit-learn** provides a comprehensive toolkit for implementing machine learning models. This notebook demonstrates the complete ML workflow from data preparation to model evaluation.
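
To make "from data preparation to model evaluation" concrete, here is a hedged sketch that chains a preparation step and a model into one object. It is an illustration rather than the notebook's own code: the two-feature dataset, the choice of `StandardScaler`, and the name `workflow` are assumptions for the demo. `make_pipeline` is scikit-learn's helper for building such a chain.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)  # fixed seed so the sketch is reproducible

# Hypothetical data: two informative features on very different scales.
X = np.column_stack([
    rng.normal(0, 1, size=300),     # feature on a unit scale
    rng.normal(0, 1000, size=300),  # feature on a much larger scale
])
y = (X[:, 0] + X[:, 1] / 1000 > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1, stratify=y
)

# The pipeline fits the scaler on the training split only, then reuses those
# statistics when predicting, so no test information leaks into training.
workflow = make_pipeline(StandardScaler(), LogisticRegression())
workflow.fit(X_train, y_train)

accuracy = accuracy_score(y_test, workflow.predict(X_test))
print(f"Test accuracy: {accuracy:.3f}")
```

The pipeline exposes the same `fit` and `predict` methods as a single model, so it can be dropped into the workflow used in the rest of this notebook without other changes.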

In [1]:
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import accuracy_score, mean_squared_error, classification_report
from sklearn.model_selection import train_test_split

print("✅ Libraries imported!")
print("\n" + "=" * 70)
print("Implementing ML Models with Scikit-learn")
print("=" * 70)

# ============================================================================
# PART 1: PREPARE DATA
# ============================================================================
print("\n📊 PART 1: Prepare Data")
print("-" * 70)

# Create sample dataset
np.random.seed(42)
n = 200

# Regression dataset
X_reg = np.random.rand(n, 1) * 10
y_reg = 2 * X_reg.flatten() + 1 + np.random.randn(n) * 0.5

# Classification dataset
X_clf = np.random.randn(n, 2)
y_clf = (X_clf[:, 0] + X_clf[:, 1] > 0).astype(int)

print("✅ Sample datasets created:")
print(f"   Regression: {X_reg.shape[0]} samples, {X_reg.shape[1]} features")
print(f"   Classification: {X_clf.shape[0]} samples, {X_clf.shape[1]} features")

# ============================================================================
# PART 2: REGRESSION MODELS
# ============================================================================
print("\n" + "=" * 70)
print("PART 2: Regression Models")
print("=" * 70)

# Split data
X_train_reg, X_test_reg, y_train_reg, y_test_reg = train_test_split(
    X_reg, y_reg, test_size=0.2, random_state=42
)

# Linear Regression
print("\n✅ Example 1: Linear Regression")
print("-" * 70)
model_lr = LinearRegression()
model_lr.fit(X_train_reg, y_train_reg)
y_pred_lr = model_lr.predict(X_test_reg)
mse_lr = mean_squared_error(y_test_reg, y_pred_lr)

print("   Model: Linear Regression")
print(f"   Mean Squared Error: {mse_lr:.4f}")
print(f"   Coefficient: {model_lr.coef_[0]:.4f}")
print(f"   Intercept: {model_lr.intercept_:.4f}")

# Decision Tree Regression
print("\n✅ Example 2: Decision Tree Regression")
print("-" * 70)
model_dt_reg = DecisionTreeRegressor(random_state=42)
model_dt_reg.fit(X_train_reg, y_train_reg)
y_pred_dt = model_dt_reg.predict(X_test_reg)
mse_dt = mean_squared_error(y_test_reg, y_pred_dt)

print("   Model: Decision Tree Regression")
print(f"   Mean Squared Error: {mse_dt:.4f}")

# ============================================================================
# PART 3: CLASSIFICATION MODELS
# ============================================================================
print("\n" + "=" * 70)
print("PART 3: Classification Models")
print("=" * 70)

# Split data
X_train_clf, X_test_clf, y_train_clf, y_test_clf = train_test_split(
    X_clf, y_clf, test_size=0.2, random_state=42
)

# Logistic Regression
print("\n✅ Example 3: Logistic Regression")
print("-" * 70)
model_log = LogisticRegression(random_state=42)
model_log.fit(X_train_clf, y_train_clf)
y_pred_log = model_log.predict(X_test_clf)
acc_log = accuracy_score(y_test_clf, y_pred_log)

print("   Model: Logistic Regression")
print(f"   Accuracy: {acc_log:.4f} ({acc_log*100:.1f}%)")

# Decision Tree Classifier
print("\n✅ Example 4: Decision Tree Classifier")
print("-" * 70)
model_dt_clf = DecisionTreeClassifier(random_state=42)
model_dt_clf.fit(X_train_clf, y_train_clf)
y_pred_dt_clf = model_dt_clf.predict(X_test_clf)
acc_dt = accuracy_score(y_test_clf, y_pred_dt_clf)

print("   Model: Decision Tree Classifier")
print(f"   Accuracy: {acc_dt:.4f} ({acc_dt*100:.1f}%)")

# Random Forest
print("\n✅ Example 5: Random Forest Classifier")
print("-" * 70)
model_rf = RandomForestClassifier(n_estimators=100, random_state=42)
model_rf.fit(X_train_clf, y_train_clf)
y_pred_rf = model_rf.predict(X_test_clf)
acc_rf = accuracy_score(y_test_clf, y_pred_rf)

print("   Model: Random Forest Classifier")
print(f"   Accuracy: {acc_rf:.4f} ({acc_rf*100:.1f}%)")

# ============================================================================
# PART 4: SCIKIT-LEARN WORKFLOW
# ============================================================================
print("\n" + "=" * 70)
print("PART 4: Scikit-learn Workflow")
print("=" * 70)

print("""
✅ Standard ML Workflow:

1. Import Model
   from sklearn.linear_model import LinearRegression

2. Create Instance
   model = LinearRegression()

3. Fit on Training Data
   model.fit(X_train, y_train)

4. Predict on Test Data
   y_pred = model.predict(X_test)

5. Evaluate Performance
   from sklearn.metrics import mean_squared_error
   mse = mean_squared_error(y_test, y_pred)

💡 The same fit/predict pattern works for every scikit-learn regressor and classifier;
   only the evaluation metric changes!
""")

# ============================================================================
# SUMMARY
# ============================================================================
print("\n" + "=" * 70)
print("Summary")
print("=" * 70)
print("""
✅ What you learned:
   1. Regression Models: Linear Regression, Decision Tree
   2. Classification Models: Logistic Regression, Decision Tree, Random Forest
   3. Scikit-learn Workflow: Import → Create → Fit → Predict → Evaluate
   4. Model Evaluation: MSE for regression, Accuracy for classification

🎯 Key Takeaways:
   - Regression: Predict continuous values (price, temperature)
   - Classification: Predict categories (spam/not spam, yes/no)
   - Workflow: Same for every regressor and classifier (fit → predict → evaluate)
   - Scikit-learn: Consistent API across models

📚 Next Steps:
   - Example 8: Supervised Learning (more details on logistic regression)
   - Example 12: Model Evaluation (comprehensive evaluation techniques)
   - Try different models and compare performance
""")
print("✅ Scikit-learn ML models concepts understood!")

✅ Libraries imported!

Implementing ML Models with Scikit-learn

📊 PART 1: Prepare Data
----------------------------------------------------------------------
✅ Sample datasets created:
   Regression: 200 samples, 1 features
   Classification: 200 samples, 2 features

PART 2: Regression Models

✅ Example 1: Linear Regression
----------------------------------------------------------------------
   Model: Linear Regression
   Mean Squared Error: 0.2712
   Coefficient: 2.0052
   Intercept: 1.0073

✅ Example 2: Decision Tree Regression
----------------------------------------------------------------------
   Model: Decision Tree Regression
   Mean Squared Error: 0.3473

PART 3: Classification Models

✅ Example 3: Logistic Regression
----------------------------------------------------------------------
   Model: Logistic Regression
   Accuracy: 1