# Week 3, Day 7: Review and Feedback Session

## Session Overview
This session will review the key concepts covered in Week 3 and provide practice exercises to reinforce learning:

1. Introduction to Machine Learning
2. Linear Regression
3. Logistic Regression
4. Decision Trees and Random Forests
5. Support Vector Machines

## Learning Objectives
- Reinforce key ML concepts
- Practice model selection
- Master model evaluation
- Prepare for Week 4

In [None]:
# Import required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.metrics import mean_squared_error, accuracy_score, classification_report

## 1. Model Selection Review

In [None]:
def model_selection_review():
    # Generate different types of data
    np.random.seed(42)
    n_samples = 300
    
    # Linear relationship
    X_linear = np.random.rand(n_samples, 1) * 10
    y_linear = 2 * X_linear + 1 + np.random.randn(n_samples, 1) * 0.5
    
    # Nonlinear relationship
    X_nonlinear = np.random.rand(n_samples, 1) * 10
    y_nonlinear = np.sin(X_nonlinear) + np.random.randn(n_samples, 1) * 0.2
    
    # Classification data
    X_class = np.random.randn(n_samples, 2)
    y_class = (X_class[:, 0] + X_class[:, 1] > 0).astype(int)
    
    # Visualize datasets
    plt.figure(figsize=(15, 5))
    
    plt.subplot(131)
    plt.scatter(X_linear, y_linear)
    plt.title('Linear Relationship')
    
    plt.subplot(132)
    plt.scatter(X_nonlinear, y_nonlinear)
    plt.title('Nonlinear Relationship')
    
    plt.subplot(133)
    plt.scatter(X_class[:, 0], X_class[:, 1], c=y_class)
    plt.title('Classification Data')
    
    plt.tight_layout()
    plt.show()
    
    print("Which model would you choose for each dataset?")
    print("1. Linear data: Linear Regression")
    print("2. Nonlinear data: a Decision Tree regressor or an SVR with a nonlinear (e.g. RBF) kernel")
    print("3. Classification data: Logistic Regression or a linear SVM (the classes are separated by a straight line)")

model_selection_review()
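You can check the answers printed above instead of taking them on trust. The minimal sketch below fits one linear and two nonlinear regressors to a sine dataset generated the same way and compares their test MSE. It assumes `DecisionTreeRegressor` and `SVR`, the regression counterparts of the classifiers imported earlier. `max_depth=5` and `C=10.0` are illustrative choices, not tuned values. The data is drawn with a separate random generator, so the points differ from the plot.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Recreate a sine dataset like the one in model_selection_review()
rng = np.random.default_rng(42)
X = rng.random((300, 1)) * 10
y = np.sin(X).ravel() + rng.normal(0, 0.2, 300)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Illustrative hyperparameters (assumptions, not tuned)
regressors = {
    "Linear Regression": LinearRegression(),
    "Decision Tree (max_depth=5)": DecisionTreeRegressor(max_depth=5, random_state=42),
    "SVR (RBF kernel)": SVR(kernel="rbf", C=10.0),
}

test_mse = {}
for name, reg in regressors.items():
    reg.fit(X_train, y_train)
    test_mse[name] = mean_squared_error(y_test, reg.predict(X_test))
    print(f"{name}: test MSE = {test_mse[name]:.3f}")
```

Expect the linear model's error to be several times larger, since a straight line cannot follow the oscillation of sin(x).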

## 2. Model Evaluation Review

In [None]:
def model_evaluation_review():
    # Generate sample classification data
    np.random.seed(42)
    n_samples = 1000
    
    # Create features
    X = np.random.randn(n_samples, 3)
    y = (np.sum(X**2, axis=1) > 3).astype(int)
    
    # Split data
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    
    # Train different models
    models = {
        'Logistic Regression': LogisticRegression(random_state=42),
        'Decision Tree': DecisionTreeClassifier(random_state=42),
        'Random Forest': RandomForestClassifier(random_state=42),
        'SVM': SVC(random_state=42)
    }
    
    # Evaluate models
    results = []
    for name, model in models.items():
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        
        accuracy = accuracy_score(y_test, y_pred)
        results.append({'Model': name, 'Accuracy': accuracy})
        
        print(f"\n{name} Results:")
        print(f"Accuracy: {accuracy:.4f}")
        print("\nClassification Report:")
        # zero_division=0 avoids warnings when a model never predicts one of the classes
        print(classification_report(y_test, y_pred, zero_division=0))
    
    # Visualize results
    results_df = pd.DataFrame(results)
    plt.figure(figsize=(10, 6))
    sns.barplot(data=results_df, x='Model', y='Accuracy')
    plt.title('Model Comparison')
    plt.xticks(rotation=45)
    plt.tight_layout()  # keep rotated tick labels from being clipped
    plt.show()

model_evaluation_review()
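Logistic Regression tends to trail the other models here. The label depends on whether a point lies inside a sphere, and no single hyperplane can separate those two regions. The sketch below makes this visible by comparing cross-validated accuracy on the raw features with accuracy after squaring them. With squared inputs, the boundary `sum(X**2) = 3` becomes linear. It assumes 5-fold `cross_val_score` is an acceptable evaluation protocol for this comparison.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Recreate the dataset from model_evaluation_review(): the label depends on
# the squared distance from the origin, so the true boundary is a sphere.
np.random.seed(42)
X = np.random.randn(1000, 3)
y = (np.sum(X**2, axis=1) > 3).astype(int)

# Raw features: a linear boundary cannot separate "inside" from "outside" a sphere
raw_scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5, scoring="accuracy")

# Engineered features: with x_i^2 as inputs, the sphere boundary is a hyperplane
X_sq = X**2
sq_scores = cross_val_score(LogisticRegression(max_iter=1000), X_sq, y, cv=5, scoring="accuracy")

print(f"Raw features:     {raw_scores.mean():.3f} +/- {raw_scores.std():.3f}")
print(f"Squared features: {sq_scores.mean():.3f} +/- {sq_scores.std():.3f}")
```

Cross-validation averages over five train/test splits, so it gives a steadier comparison than the single split used in `model_evaluation_review()`.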

## 3. Comprehensive Practice Exercise

In [None]:
def comprehensive_exercise():
    # Generate synthetic dataset
    np.random.seed(42)
    n_samples = 500
    
    # Create features
    X = np.random.randn(n_samples, 4)
    # Continuous (regression) target: a squared term, an exponential term,
    # an interaction term, and small Gaussian noise
    y = (X[:, 0]**2 + np.exp(X[:, 1]) + X[:, 2] * X[:, 3] + np.random.randn(n_samples) * 0.1)
    
    # Split data
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    
    # Scale features
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)
    
    print("Exercise Tasks:")
    print("1. Analyze the relationship between features and target")
    print("2. Select appropriate model(s)")
    print("3. Train and evaluate models")
    print("4. Optimize the best performing model")
    print("5. Provide insights and recommendations")
    
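    # Your solution here: start from the scaled splits returned below
    return X_train_scaled, X_test_scaled, y_train, y_test

X_train_scaled, X_test_scaled, y_train, y_test = comprehensive_exercise()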
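Here is one possible starting point for tasks 2 and 3. It is not a reference solution. The target contains a squared term, an exponential term and an interaction term, so a degree-2 polynomial expansion should capture most of the structure that a plain linear model misses. The sketch regenerates the data with the same seed so it runs on its own. The model list and `n_estimators=200` are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Regenerate the exercise data exactly as comprehensive_exercise() does
np.random.seed(42)
n_samples = 500
X = np.random.randn(n_samples, 4)
y = X[:, 0]**2 + np.exp(X[:, 1]) + X[:, 2] * X[:, 3] + np.random.randn(n_samples) * 0.1

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

candidates = {
    # Baseline: a straight-line fit cannot represent x0^2 or x2*x3
    "Linear": make_pipeline(StandardScaler(), LinearRegression()),
    # Degree-2 expansion adds squares and pairwise products, covering x0^2 and x2*x3
    "Polynomial (degree 2)": make_pipeline(
        StandardScaler(), PolynomialFeatures(degree=2), LinearRegression()
    ),
    # Nonparametric alternative that needs no hand-picked features
    "Random Forest": RandomForestRegressor(n_estimators=200, random_state=42),
}

test_mse = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    test_mse[name] = mean_squared_error(y_test, model.predict(X_test))
    print(f"{name}: test MSE = {test_mse[name]:.3f}")
```

Task 4 could then tune the strongest candidate with cross-validation, for example the forest's `max_depth` or the polynomial degree.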

## Week 3 Review Quiz

### Multiple Choice Questions

1. Which model is best suited to a continuous target that has a linear relationship with the features?
   - a) Decision Trees
   - b) Linear Regression
   - c) SVM
   - d) Random Forest

2. What is the output range of logistic regression?
   - a) -∞ to +∞
   - b) 0 to 1
   - c) -1 to 1
   - d) 0 to ∞

3. Which of these models is usually the most interpretable?
   - a) Neural Networks
   - b) Decision Trees
   - c) Random Forest
   - d) SVM

4. What does the C parameter in SVM control?
   - a) Learning rate
   - b) Regularization
   - c) Number of trees
   - d) Kernel type

5. Which metric is used for regression problems?
   - a) Accuracy
   - b) Precision
   - c) MSE
   - d) F1-score

6. What is bagging?
   - a) Feature selection
   - b) Ensemble method
   - c) Data cleaning
   - d) Model evaluation

7. Which of these SVM kernels can model nonlinear decision boundaries?
   - a) Linear
   - b) RBF
   - c) Identity
   - d) None

8. Overfitting is a symptom of:
   - a) High bias
   - b) High variance
   - c) Low accuracy
   - d) Low precision

9. Which is NOT a hyperparameter in Random Forest?
   - a) n_estimators
   - b) max_depth
   - c) learning_rate
   - d) min_samples_split

10. What is cross-validation used for?
    - a) Feature scaling
    - b) Model evaluation
    - c) Data cleaning
    - d) Feature selection

Answers: 1-b, 2-b, 3-b, 4-b, 5-c, 6-b, 7-b, 8-b, 9-c, 10-b
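For question 2, logistic regression passes a linear score through the sigmoid function σ(z) = 1 / (1 + e^(-z)). This is why its output always lies between 0 and 1. The short sketch below evaluates σ directly. It then checks that scikit-learn's `predict_proba` for a binary model matches σ applied to `decision_function`. The toy dataset is an assumption for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    """Logistic function: maps any real z into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-10, 10, 5)  # [-10, -5, 0, 5, 10]
print(sigmoid(z))

# For a binary model, predict_proba applies the same function to the linear score w.x + b
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
clf = LogisticRegression().fit(X, y)
proba = clf.predict_proba(X)[:, 1]
print(proba.min(), proba.max())
```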
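For question 4, `C` sets how strongly an SVM penalizes margin violations, so it acts as an inverse regularization strength. A small `C` tolerates violations and produces a wide margin with many support vectors. A large `C` fits the training data more tightly. The sketch below counts support vectors with a linear kernel on a synthetic, deliberately overlapping dataset. Both the dataset and the three `C` values are assumptions for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Two overlapping classes: the label is only noisily linked to x0 + x1
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.8, size=300) > 0).astype(int)

n_sv = {}
for C in [0.01, 1.0, 100.0]:
    clf = SVC(kernel="linear", C=C).fit(X, y)
    n_sv[C] = int(clf.n_support_.sum())  # support vectors summed over both classes
    print(f"C={C:>6}: {n_sv[C]} support vectors, training accuracy={clf.score(X, y):.3f}")
```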
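For question 8, an overfit model has high variance. It fits the noise in its training set, so its training accuracy is much higher than its test accuracy. The sketch below compares a fully grown decision tree with a depth-limited one on synthetic data where 15% of the labels are flipped on purpose. The flip rate and `max_depth=3` are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Linear ground truth plus 15% deliberately flipped labels (label noise)
rng = np.random.default_rng(1)
X = rng.normal(size=(600, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
flip = rng.random(600) < 0.15
y[flip] = 1 - y[flip]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

results = {}
for depth in [None, 3]:  # None grows the tree until every leaf is pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    train_acc = tree.score(X_train, y_train)
    test_acc = tree.score(X_test, y_test)
    results[depth] = (train_acc, test_acc)
    print(f"max_depth={depth}: train={train_acc:.3f}, test={test_acc:.3f}, gap={train_acc - test_acc:.3f}")
```

The unrestricted tree memorizes the flipped labels, while limiting depth trades a little training accuracy for a much smaller train/test gap.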

## Week 3 Summary

### Key Concepts Covered:
1. Machine Learning fundamentals and workflow
2. Supervised Learning algorithms
3. Model selection and evaluation
4. Hyperparameter tuning and optimization
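Tuning hyperparameters, one of the topics listed above, usually means running a cross-validated search over a grid of settings. Below is a minimal sketch with scikit-learn's `GridSearchCV` on a Random Forest. The grid values and the smaller version of the sphere dataset from the model evaluation review are assumptions chosen to keep the run short.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Smaller version of the sphere dataset from the model evaluation review
np.random.seed(42)
X = np.random.randn(500, 3)
y = (np.sum(X**2, axis=1) > 3).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 2 x 2 x 2 = 8 candidate settings, each scored with 5-fold cross-validation
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, None],
    "min_samples_split": [2, 10],
}
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)  # refits the best setting on the full training set

print("Best parameters:", search.best_params_)
print(f"Best CV accuracy: {search.best_score_:.3f}")
print(f"Held-out test accuracy: {search.score(X_test, y_test):.3f}")
```

The test set is touched only once, after the search, so the held-out score is not biased by the choice of hyperparameters.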

### Preparation for Week 4:
- Review challenging concepts
- Practice model implementation
- Prepare for unsupervised learning
- Review Python and scikit-learn

### Additional Resources:
- Scikit-learn documentation: https://scikit-learn.org/stable/
- Machine Learning crash course: https://developers.google.com/machine-learning/crash-course
- Feature engineering guide: https://machinelearningmastery.com/discover-feature-engineering-how-to-engineer-features-and-how-to-get-good-at-it/