# Introduction to Supervised Learning

**Learning Objectives:**
- Understand the fundamental concepts of supervised learning
- Learn about different types of supervised learning problems
- Implement basic classification and regression algorithms
- Evaluate model performance using appropriate metrics
- Apply supervised learning to real-world datasets

**Expected Duration:** 60-90 minutes

**Prerequisites:**
- Basic Python programming
- Understanding of basic statistics
- Familiarity with NumPy and Pandas

## 1. What is Supervised Learning?

Supervised learning is a type of machine learning in which an algorithm learns from labeled training data: each example pairs an input (its features) with a known output (its label). The goal is to learn a mapping from inputs to outputs that predicts accurately on new, unseen data.
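
As a minimal sketch of "learning a mapping from labeled data", the snippet below fits a linear model to five labeled points and predicts the output for an input it has not seen. The rule `y = 2x + 1` and the names `X_demo`, `y_demo`, and `demo_model` are assumptions made up for this illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Labeled training data: each input in X_demo is paired with a known output in y_demo.
# The labels follow the made-up rule y = 2x + 1 (an assumption for illustration).
X_demo = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y_demo = 2 * X_demo.ravel() + 1

# "Learning" here means estimating the mapping from the labeled examples.
demo_model = LinearRegression()
demo_model.fit(X_demo, y_demo)

# Predict the output for an input that was not part of the training data.
X_new = np.array([[10.0]])
prediction = demo_model.predict(X_new)[0]

print(f"Learned slope: {demo_model.coef_[0]:.2f}, intercept: {demo_model.intercept_:.2f}")
print(f"Prediction for x=10: {prediction:.2f}")
```

Because the toy labels are noise-free and exactly linear, the fitted model should recover the rule; real datasets are noisier, which is why later sections evaluate on held-out data.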

In [None]:
# Import required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.svm import SVC, SVR
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    mean_squared_error, mean_absolute_error, r2_score,
    confusion_matrix, classification_report
)
# load_boston was removed in scikit-learn 1.2 and is not used in this notebook
from sklearn.datasets import load_iris, make_classification, make_regression
import ipywidgets as widgets
from IPython.display import display, HTML

# Set random seed for reproducibility
np.random.seed(42)

# Set style for visualizations
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

## 2. Types of Supervised Learning

### 2.1 Classification
Classification predicts a discrete category (class label) for each input.

In [None]:
# Create synthetic classification dataset
X_class, y_class = make_classification(
    n_samples=1000, n_features=20, n_informative=15, n_redundant=5,
    n_classes=3, n_clusters_per_class=1, random_state=42
)

print(f"Classification dataset shape: {X_class.shape}")
print(f"Classes: {np.unique(y_class)}")
print(f"Class distribution: {np.bincount(y_class)}")

# Visualize the data
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.scatter(X_class[:, 0], X_class[:, 1], c=y_class, cmap='viridis', alpha=0.6)
plt.title('Feature Space Visualization')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')

plt.subplot(1, 2, 2)
sns.countplot(x=y_class)
plt.title('Class Distribution')
plt.xlabel('Class')
plt.ylabel('Count')

plt.tight_layout()
plt.show()

### 2.2 Regression
Regression predicts a continuous numerical value for each input.

In [None]:
# Create synthetic regression dataset
X_reg, y_reg = make_regression(
    n_samples=1000, n_features=20, n_informative=15, noise=0.1,
    random_state=42
)

print(f"Regression dataset shape: {X_reg.shape}")
print(f"Target range: [{y_reg.min():.2f}, {y_reg.max():.2f}]")
print(f"Target statistics: Mean={y_reg.mean():.2f}, Std={y_reg.std():.2f}")

# Visualize the data
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.scatter(X_reg[:, 0], y_reg, alpha=0.6)
plt.title('Feature vs Target Relationship')
plt.xlabel('Feature 1')
plt.ylabel('Target')

plt.subplot(1, 2, 2)
plt.hist(y_reg, bins=30, alpha=0.7, edgecolor='black')
plt.title('Target Distribution')
plt.xlabel('Target Value')
plt.ylabel('Frequency')

plt.tight_layout()
plt.show()
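
The regression imports at the top of the notebook (`LinearRegression`, `mean_squared_error`, `mean_absolute_error`, `r2_score`) can be put to work on this kind of data. The following is a sketch rather than a definitive recipe: it regenerates a synthetic dataset with the same settings so that it is self-contained, and the variable names (`X_reg_demo`, `reg`, `rmse`, and so on) are assumptions chosen for this example.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Regenerate a synthetic regression dataset with the same settings as above
X_reg_demo, y_reg_demo = make_regression(
    n_samples=1000, n_features=20, n_informative=15, noise=0.1,
    random_state=42
)

# Hold out 20% of the samples to measure performance on unseen data
X_tr, X_te, y_tr, y_te = train_test_split(
    X_reg_demo, y_reg_demo, test_size=0.2, random_state=42
)

# Fit an ordinary least squares model and predict on the held-out set
reg = LinearRegression().fit(X_tr, y_tr)
y_te_pred = reg.predict(X_te)

# Regression metrics: error magnitudes (RMSE, MAE) and explained variance (R^2)
rmse = mean_squared_error(y_te, y_te_pred) ** 0.5
mae = mean_absolute_error(y_te, y_te_pred)
r2 = r2_score(y_te, y_te_pred)

print(f"RMSE: {rmse:.4f}")
print(f"MAE:  {mae:.4f}")
print(f"R^2:  {r2:.4f}")
```

`make_regression` builds the target as a linear combination of the features plus Gaussian noise, so a linear model fits it almost perfectly. Expect much lower R² on real data.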

## 3. The Supervised Learning Workflow

### 3.1 Data Preparation
Loading and preprocessing data for modeling

In [None]:
# Load a real-world dataset - Iris classification
iris = load_iris()
X_iris = iris.data
y_iris = iris.target
feature_names = iris.feature_names
target_names = iris.target_names

print("Iris Dataset:")
print(f"Shape: {X_iris.shape}")
print(f"Features: {feature_names}")
print(f"Target classes: {target_names}")

# Split the data
X_train, X_test, y_train, y_test = train_test_split(
    X_iris, y_iris, test_size=0.2, random_state=42, stratify=y_iris
)

print(f"\nTraining set shape: {X_train.shape}")
print(f"Test set shape: {X_test.shape}")

# Standardize features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print("\nData preprocessing completed!")
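
The scaler above is fitted on the training set only and then reused on the test set, which keeps test-set statistics from leaking into training. One way to make that pattern harder to get wrong is scikit-learn's `make_pipeline`, sketched below. The names `pipe`, `fitted_scaler`, and `pipe_accuracy` are assumptions for this example, and logistic regression is just a stand-in final estimator.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_demo, y_demo = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# The pipeline fits the scaler and the classifier together on the training data
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X_tr, y_tr)

# score() applies the already-fitted scaler to X_te before predicting
pipe_accuracy = pipe.score(X_te, y_te)

# The scaler's statistics come from the training split only
fitted_scaler = pipe.named_steps['standardscaler']
print(f"Scaler means match training means: {np.allclose(fitted_scaler.mean_, X_tr.mean(axis=0))}")
print(f"Test accuracy: {pipe_accuracy:.4f}")
```

A pipeline also travels well: the same object can be passed to cross-validation or grid search, and the scaler is refitted inside each training fold.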

### 3.2 Model Training
Training multiple algorithms and comparing their performance

In [None]:
# Define models to compare
models = {
    'Logistic Regression': LogisticRegression(random_state=42, max_iter=1000),
    'Decision Tree': DecisionTreeClassifier(random_state=42),
    'Random Forest': RandomForestClassifier(random_state=42, n_estimators=100),
    'SVM': SVC(random_state=42, probability=True)
}

# Train and evaluate models
results = {}

for name, model in models.items():
    # Train the model
    model.fit(X_train_scaled, y_train)
    
    # Make predictions
    y_pred = model.predict(X_test_scaled)
    y_pred_proba = model.predict_proba(X_test_scaled) if hasattr(model, 'predict_proba') else None
    
    # Calculate metrics
    results[name] = {
        'accuracy': accuracy_score(y_test, y_pred),
        'precision': precision_score(y_test, y_pred, average='weighted'),
        'recall': recall_score(y_test, y_pred, average='weighted'),
        'f1_score': f1_score(y_test, y_pred, average='weighted')
    }
    
    print(f"\n{name} Results:")
    print(f"Accuracy: {results[name]['accuracy']:.4f}")
    print(f"Precision: {results[name]['precision']:.4f}")
    print(f"Recall: {results[name]['recall']:.4f}")
    print(f"F1-Score: {results[name]['f1_score']:.4f}")

# Visualize results
results_df = pd.DataFrame(results).T
# DataFrame.plot creates its own figure, so a separate plt.figure() call would leave an empty one
results_df.plot(kind='bar', figsize=(12, 6))
plt.title('Model Performance Comparison')
plt.ylabel('Score')
plt.xlabel('Model')
plt.xticks(rotation=45)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()
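
With a 20% split of Iris the test set holds only 30 samples, so one misclassified flower moves accuracy by more than three points. A possible complement is k-fold cross-validation, sketched below with `cross_val_score`. The choice of five folds, the two candidate models, and the names `candidates` and `cv_scores` are assumptions for this illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X_cv, y_cv = load_iris(return_X_y=True)

# Five stratified folds: every sample is used for testing exactly once
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

candidates = {
    # Scaling lives inside the pipeline so it is refitted on each training fold
    'Logistic Regression': make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    'Decision Tree': DecisionTreeClassifier(random_state=42),
}

cv_scores = {}
for cand_name, estimator in candidates.items():
    scores = cross_val_score(estimator, X_cv, y_cv, cv=cv, scoring='accuracy')
    cv_scores[cand_name] = scores
    print(f"{cand_name}: mean accuracy {scores.mean():.4f} (std {scores.std():.4f})")
```

The standard deviation across folds gives a rough sense of how much of the gap between two models is noise.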

### 3.3 Model Evaluation
Comprehensive evaluation of the best performing model

In [None]:
# Select best model
best_model_name = results_df['accuracy'].idxmax()
best_model = models[best_model_name]

print(f"Best performing model: {best_model_name}")

# Make predictions with best model
y_pred = best_model.predict(X_test_scaled)
y_pred_proba = best_model.predict_proba(X_test_scaled) if hasattr(best_model, 'predict_proba') else None

# Confusion Matrix
plt.figure(figsize=(10, 4))

plt.subplot(1, 2, 1)
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
            xticklabels=target_names, yticklabels=target_names)
plt.title('Confusion Matrix')
plt.xlabel('Predicted')
plt.ylabel('Actual')

# Classification Report
plt.subplot(1, 2, 2)
report = classification_report(y_test, y_pred, target_names=target_names, output_dict=True)
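# Drop the 'support' row, then transpose so each class or average is a row
report_df = pd.DataFrame(report).iloc[:-1, :].T
# Plot precision, recall, and f1-score (slicing off the last column would hide f1-score)
sns.heatmap(report_df, annot=True, fmt='.3f', cmap='YlOrRd')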
plt.title('Classification Report')

plt.tight_layout()
plt.show()

print("\nDetailed Classification Report:")
print(classification_report(y_test, y_pred, target_names=target_names))
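
To see where the numbers in a classification report come from, the sketch below derives per-class precision and recall directly from a confusion matrix and checks them against scikit-learn. The label arrays `y_true_demo` and `y_pred_demo` are made-up values for illustration, not results from the models above.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical true and predicted labels for a 3-class problem
y_true_demo = np.array([0, 0, 0, 1, 1, 1, 1, 2, 2, 2])
y_pred_demo = np.array([0, 0, 1, 1, 1, 1, 2, 2, 2, 0])

# Rows are actual classes, columns are predicted classes
cm_demo = confusion_matrix(y_true_demo, y_pred_demo)
true_positives = np.diag(cm_demo)

# Precision: of everything predicted as class k, how much really is class k (column sums)
precision_per_class = true_positives / cm_demo.sum(axis=0)
# Recall: of everything that really is class k, how much was found (row sums)
recall_per_class = true_positives / cm_demo.sum(axis=1)

# 'weighted' averaging weights each class by its support (number of true samples)
support = cm_demo.sum(axis=1)
weighted_precision = np.average(precision_per_class, weights=support)

print(cm_demo)
print("Precision per class:", np.round(precision_per_class, 3))
print("Recall per class:   ", np.round(recall_per_class, 3))
print("Matches sklearn:", np.allclose(
    precision_per_class, precision_score(y_true_demo, y_pred_demo, average=None)
))
```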

## 4. Interactive Model Exploration

### 4.1 Parameter Tuning Widget
Explore how different parameters affect model performance

In [None]:
# Create interactive widgets for parameter exploration
def train_and_evaluate_model(model_type, test_size, max_depth=None, n_estimators=100):
    """Train and evaluate model with given parameters"""
    
    # Split data
    X_train, X_test, y_train, y_test = train_test_split(
        X_iris, y_iris, test_size=test_size, random_state=42, stratify=y_iris
    )
    
    # Scale features
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)
    
    # Create and train model
    if model_type == 'Logistic Regression':
        model = LogisticRegression(random_state=42, max_iter=1000)
    elif model_type == 'Decision Tree':
        model = DecisionTreeClassifier(random_state=42, max_depth=max_depth)
    elif model_type == 'Random Forest':
        model = RandomForestClassifier(random_state=42, n_estimators=n_estimators, max_depth=max_depth)
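    elif model_type == 'SVM':
        model = SVC(random_state=42, probability=True)
    else:
        # Fail clearly instead of hitting a NameError on model.fit below
        raise ValueError(f"Unknown model type: {model_type}")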
    
    model.fit(X_train_scaled, y_train)
    y_pred = model.predict(X_test_scaled)
    
    # Calculate metrics
    accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred, average='weighted')
    recall = recall_score(y_test, y_pred, average='weighted')
    f1 = f1_score(y_test, y_pred, average='weighted')
    
    return accuracy, precision, recall, f1

# Create widgets
model_widget = widgets.Dropdown(
    options=['Logistic Regression', 'Decision Tree', 'Random Forest', 'SVM'],
    value='Random Forest',
    description='Model:'
)

test_size_widget = widgets.FloatSlider(
    value=0.2, min=0.1, max=0.5, step=0.05,
    description='Test Size:'
)

max_depth_widget = widgets.IntSlider(
    value=5, min=1, max=20, step=1,
    description='Max Depth:'
)

n_estimators_widget = widgets.IntSlider(
    value=100, min=10, max=200, step=10,
    description='N Estimators:'
)

output_widget = widgets.Output()

def on_button_click(b):
    with output_widget:
        output_widget.clear_output()
        
        accuracy, precision, recall, f1 = train_and_evaluate_model(
            model_widget.value, test_size_widget.value, 
            max_depth_widget.value, n_estimators_widget.value
        )
        
        print(f"Model: {model_widget.value}")
        print(f"Test Size: {test_size_widget.value}")
        print(f"Accuracy: {accuracy:.4f}")
        print(f"Precision: {precision:.4f}")
        print(f"Recall: {recall:.4f}")
        print(f"F1-Score: {f1:.4f}")

button = widgets.Button(description="Train Model")
button.on_click(on_button_click)

# Display widgets
display(widgets.VBox([
    model_widget, test_size_widget, max_depth_widget, 
    n_estimators_widget, button, output_widget
]))
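
Where `ipywidgets` is unavailable (for example in a static export of this notebook), the same exploration can be approximated with a plain loop. The sketch below sweeps `max_depth` for a decision tree and records training and test accuracy. The depth values and the name `sweep_results` are assumptions for this example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X_sweep, y_sweep = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_sweep, y_sweep, test_size=0.2, random_state=42, stratify=y_sweep
)

# Each entry is (max_depth, training accuracy, test accuracy)
sweep_results = []
for depth in [1, 2, 3, 5, None]:  # None lets the tree grow until its leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=42).fit(X_tr, y_tr)
    train_acc = tree.score(X_tr, y_tr)
    test_acc = tree.score(X_te, y_te)
    sweep_results.append((depth, train_acc, test_acc))
    print(f"max_depth={str(depth):>4}  train={train_acc:.4f}  test={test_acc:.4f}")
```

A depth-1 tree has only two leaves, so it can predict at most two of the three species and underfits. Comparing the training and test columns as depth grows is a quick way to spot overfitting.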

### 4.2 Feature Importance Analysis
Understand which features contribute most to predictions

In [None]:
# Train a Random Forest for feature importance
rf_model = RandomForestClassifier(random_state=42, n_estimators=100)
rf_model.fit(X_train_scaled, y_train)

# Get feature importance
feature_importance = rf_model.feature_importances_
feature_importance_df = pd.DataFrame({
    'feature': feature_names,
    'importance': feature_importance
}).sort_values('importance', ascending=False)

# Visualize feature importance
plt.figure(figsize=(10, 6))
sns.barplot(data=feature_importance_df, x='importance', y='feature')
plt.title('Feature Importance (Random Forest)')
plt.xlabel('Importance')
plt.tight_layout()
plt.show()

print("Feature Importance Ranking:")
for idx, row in feature_importance_df.iterrows():
    print(f"{row['feature']}: {row['importance']:.4f}")
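
The `feature_importances_` attribute is impurity-based and is computed from the training data. As a cross-check, scikit-learn also provides `permutation_importance`, which measures how much a held-out score drops when one feature's values are shuffled. The sketch below is one way to run it. The names `forest` and `perm` and the choice of `n_repeats=10` are assumptions, and scaling is skipped because tree-based models do not depend on feature scale.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

iris_demo = load_iris()
X_tr, X_te, y_tr, y_te = train_test_split(
    iris_demo.data, iris_demo.target, test_size=0.2, random_state=42,
    stratify=iris_demo.target
)

forest = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_tr, y_tr)

# Shuffle each feature 10 times on the test set and record the drop in accuracy
perm = permutation_importance(forest, X_te, y_te, n_repeats=10, random_state=42)

print("Permutation importance (mean drop in test accuracy):")
for i in perm.importances_mean.argsort()[::-1]:
    print(f"{iris_demo.feature_names[i]}: "
          f"{perm.importances_mean[i]:.4f} +/- {perm.importances_std[i]:.4f}")
```

If the two rankings disagree sharply, that is usually worth investigating. Strongly correlated features, for instance, can share importance in ways that differ between the two methods.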

## 5. Real-World Application: Customer Churn Prediction

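Let's apply supervised learning to a practical business problem: predicting which customers will churn. The dataset below is synthetic, generated to resemble a real customer table.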

In [None]:
# Create synthetic customer churn dataset
def create_customer_churn_data(n_samples=1000):
    """Create synthetic customer churn dataset"""
    np.random.seed(42)
    
    # Customer features
    age = np.random.normal(40, 15, n_samples)
    tenure = np.random.exponential(24, n_samples)
    monthly_charges = np.random.normal(70, 25, n_samples)
    total_charges = tenure * monthly_charges + np.random.normal(0, 100, n_samples)
    contract_type = np.random.choice([0, 1, 2], n_samples, p=[0.5, 0.3, 0.2])  # Month-to-month, 1-year, 2-year
    
    # Create churn based on features
    churn_prob = 1 / (1 + np.exp(-(
        -3 + 0.02 * age - 0.05 * tenure + 0.01 * monthly_charges - 0.8 * contract_type
    )))
    churn = np.random.binomial(1, churn_prob)
    
    # Create DataFrame
    df = pd.DataFrame({
        'age': age,
        'tenure': tenure,
        'monthly_charges': monthly_charges,
        'total_charges': total_charges,
        'contract_type': contract_type,
        'churn': churn
    })
    
    return df

# Create and explore the dataset
churn_df = create_customer_churn_data(1000)
print("Customer Churn Dataset:")
print(churn_df.head())
print(f"\nDataset shape: {churn_df.shape}")
print(f"Churn rate: {churn_df['churn'].mean():.2%}")

# Visualize relationships
plt.figure(figsize=(15, 10))

plt.subplot(2, 3, 1)
sns.boxplot(data=churn_df, x='churn', y='age')
plt.title('Age vs Churn')

plt.subplot(2, 3, 2)
sns.boxplot(data=churn_df, x='churn', y='tenure')
plt.title('Tenure vs Churn')

plt.subplot(2, 3, 3)
sns.boxplot(data=churn_df, x='churn', y='monthly_charges')
plt.title('Monthly Charges vs Churn')

plt.subplot(2, 3, 4)
sns.countplot(data=churn_df, x='contract_type', hue='churn')
plt.title('Contract Type vs Churn')
plt.xticks([0, 1, 2], ['Month-to-month', '1-year', '2-year'])

plt.subplot(2, 3, 5)
sns.scatterplot(data=churn_df, x='tenure', y='monthly_charges', hue='churn', alpha=0.6)
plt.title('Tenure vs Monthly Charges')

plt.subplot(2, 3, 6)
churn_df['churn'].value_counts().plot(kind='pie', autopct='%1.1f%%')
plt.title('Churn Distribution')

plt.tight_layout()
plt.show()

In [None]:
# Prepare data for modeling
X_churn = churn_df.drop('churn', axis=1)
y_churn = churn_df['churn']

# Split data
X_train_churn, X_test_churn, y_train_churn, y_test_churn = train_test_split(
    X_churn, y_churn, test_size=0.2, random_state=42, stratify=y_churn
)

# Scale features
scaler_churn = StandardScaler()
X_train_churn_scaled = scaler_churn.fit_transform(X_train_churn)
X_test_churn_scaled = scaler_churn.transform(X_test_churn)

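# Train models
from sklearn.ensemble import GradientBoostingClassifier  # not included in the import cell above

churn_models = {
    'Logistic Regression': LogisticRegression(random_state=42, max_iter=1000),
    'Random Forest': RandomForestClassifier(random_state=42, n_estimators=100),
    # Use an actual gradient boosting model rather than a second random forest
    'Gradient Boosting': GradientBoostingClassifier(random_state=42, n_estimators=100)
}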

from sklearn.metrics import roc_auc_score  # not included in the import cell above

churn_results = {}

for name, model in churn_models.items():
    model.fit(X_train_churn_scaled, y_train_churn)
    y_pred = model.predict(X_test_churn_scaled)
    # Probability of the positive class (churn = 1), needed for ROC AUC
    y_pred_proba = model.predict_proba(X_test_churn_scaled)[:, 1] if hasattr(model, 'predict_proba') else None
    
    # zero_division=0 avoids warnings if a model predicts no churners at all
    churn_results[name] = {
        'accuracy': accuracy_score(y_test_churn, y_pred),
        'precision': precision_score(y_test_churn, y_pred, zero_division=0),
        'recall': recall_score(y_test_churn, y_pred, zero_division=0),
        'f1_score': f1_score(y_test_churn, y_pred, zero_division=0),
        'auc_roc': roc_auc_score(y_test_churn, y_pred_proba) if y_pred_proba is not None else None
    }

# Display results
churn_results_df = pd.DataFrame(churn_results).T
print("Churn Prediction Model Performance:")
display(churn_results_df)

# Visualize results
# DataFrame.plot creates its own figure, so a separate plt.figure() call would leave an empty one
churn_results_df.drop('auc_roc', axis=1).plot(kind='bar', figsize=(12, 6))
plt.title('Churn Prediction Model Performance')
plt.ylabel('Score')
plt.xlabel('Model')
plt.xticks(rotation=45)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()
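
When one class is rare, as churners are here, accuracy alone can look strong for a model that never predicts the rare class. A quick sanity check is to compare against a majority-class baseline, sketched below with `DummyClassifier`. The synthetic data is generated separately with a 95/5 class split, which is an assumption for this example and not the churn dataset above.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split

# Imbalanced binary dataset: roughly 95% negatives and 5% positives (assumed ratio)
X_imb, y_imb = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_imb, y_imb, test_size=0.2, random_state=42, stratify=y_imb
)

# Baseline that always predicts the most frequent class seen in training
baseline = DummyClassifier(strategy='most_frequent').fit(X_tr, y_tr)
y_base = baseline.predict(X_te)

baseline_accuracy = accuracy_score(y_te, y_base)
baseline_recall = recall_score(y_te, y_base, zero_division=0)

print(f"Baseline accuracy: {baseline_accuracy:.4f}")
print(f"Baseline recall on the rare class: {baseline_recall:.4f}")
```

Any churn model worth deploying should beat this baseline on recall, F1, or ROC AUC, not just match its accuracy.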

## 6. Key Concepts Summary

### What We Learned:
1. **Types of Supervised Learning**: Classification (discrete outputs) vs Regression (continuous outputs)
2. **Model Training**: How algorithms learn from labeled data
3. **Evaluation Metrics**: Accuracy, precision, recall, F1-score, ROC AUC, and the confusion matrix for classification
4. **Generalization**: Holding out a test set (or, more robustly, cross-validating) to check that models perform well on unseen data
5. **Feature Importance**: Understanding which features drive predictions
6. **Real-World Applications**: Applying supervised learning to business problems

### Best Practices:
- Always split data into training and test sets
- Use appropriate evaluation metrics for your problem
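- Scale features for algorithms that are sensitive to feature magnitude (e.g., SVMs, k-nearest neighbors, regularized logistic regression); tree-based models do not require it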
- Perform hyperparameter tuning for optimal performance
- Consider class imbalance in classification problems
- Validate models on multiple metrics, not just accuracy

### Next Steps:
- Explore more advanced algorithms (XGBoost, LightGBM)
- Learn about ensemble methods and stacking
- Study feature engineering techniques
- Dive into hyperparameter optimization
- Learn about model deployment and monitoring

## 7. Exercises and Challenges

### Exercise 1: Model Comparison
Train at least 3 different classification models on the Iris dataset and compare their performance using multiple metrics.

### Exercise 2: Hyperparameter Tuning
Use GridSearchCV or RandomizedSearchCV to find the best hyperparameters for a Random Forest classifier.

### Exercise 3: Feature Engineering
Create new features from the customer churn dataset and see if they improve model performance.

### Exercise 4: Imbalanced Data
Create an imbalanced dataset and apply techniques like SMOTE or class weights to handle the imbalance.

### Exercise 5: Cross-Validation
Implement k-fold cross-validation and compare the results with a simple train-test split.

**Challenge**: Build a complete machine learning pipeline that includes data preprocessing, model training, evaluation, and deployment considerations.

## 8. Further Learning Resources

### Books:
- "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow" by Aurélien Géron
- "Pattern Recognition and Machine Learning" by Christopher Bishop
- "The Elements of Statistical Learning" by Trevor Hastie, Robert Tibshirani, and Jerome Friedman

### Online Courses:
- Andrew Ng's Machine Learning Course (Coursera)
- Fast.ai Practical Deep Learning course
- Google's Machine Learning Crash Course

### Documentation:
- [Scikit-learn Documentation](https://scikit-learn.org/stable/)
- [Pandas Documentation](https://pandas.pydata.org/docs/)
- [NumPy Documentation](https://numpy.org/doc/)

### Practice Platforms:
- Kaggle (kaggle.com)
- DataCamp (datacamp.com)
- LeetCode (leetcode.com)

### Community:
- Stack Overflow
- Reddit r/MachineLearning
- Towards Data Science (Medium)