## 2. Types of Machine Learning

### Overview
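Machine learning methods are commonly grouped by **how algorithms learn** from data, that is, by the kind of feedback available during training: labeled targets (supervised), no labels (unsupervised), or rewards from an environment (reinforcement).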

### 2.1 **Supervised Learning**
- **Core Concept:** Learn from labeled examples to predict outcomes
- **Training Process:** Model learns mapping from features → labels
- **Data Required:** Labeled training data (often costly to collect, since every example needs a target value)
- **Feedback:** Explicit target values guide learning

#### Classification Tasks
- **Goal:** Predict discrete categories/classes
- **Output:** Class label (e.g., SPAM/NOT_SPAM, Cat/Dog/Bird)
- **Examples:**
  - Email spam detection (Spam vs Not Spam)
  - Medical diagnosis (Disease vs Healthy)
  - Image classification (identify objects)
  - Sentiment analysis (Positive/Negative)
- **Algorithms:** Logistic Regression, Decision Trees, SVM, Neural Networks, Random Forest
- **Metrics:** Accuracy, Precision, Recall, F1-Score, Confusion Matrix
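
The metrics listed above all derive from the four cells of a binary confusion matrix. The following is a minimal sketch in plain Python; the labels and predictions are made-up values for a hypothetical spam classifier, not results from a real model.

```python
# Hypothetical ground truth and predictions: 1 = spam, 0 = not spam
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

pairs = list(zip(y_true, y_pred))

# The four cells of the binary confusion matrix
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # spam correctly flagged
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-spam correctly passed
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-spam wrongly flagged
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # spam that slipped through

accuracy = (tp + tn) / len(pairs)    # share of all predictions that are correct
precision = tp / (tp + fp)           # of everything flagged, how much was spam
recall = tp / (tp + fn)              # of all spam, how much was flagged
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"Confusion matrix: TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"Accuracy={accuracy:.2f} Precision={precision:.2f} "
      f"Recall={recall:.2f} F1={f1:.2f}")
```

Accuracy alone can mislead on imbalanced classes, which is why precision and recall are usually reported alongside it.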

#### Regression Tasks
- **Goal:** Predict continuous numerical values
- **Output:** Real number (e.g., price, temperature, salary)
- **Examples:**
  - House price prediction based on features (size, location, age)
  - Stock price forecasting
  - Temperature prediction
  - Salary estimation
- **Algorithms:** Linear Regression, Polynomial Regression, SVR, Neural Networks
- **Metrics:** MSE (Mean Squared Error), RMSE, MAE (Mean Absolute Error), R² Score
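
To make the regression metrics above concrete, here is a small sketch that computes each one by hand. The target and predicted values are invented for illustration (prices in thousands of dollars).

```python
import math

# Hypothetical true prices and model predictions, in $1000s
y_true = [200.0, 300.0, 400.0, 500.0]
y_pred = [210.0, 290.0, 420.0, 480.0]

n = len(y_true)
errors = [t - p for t, p in zip(y_true, y_pred)]

mse = sum(e ** 2 for e in errors) / n   # penalizes large errors quadratically
rmse = math.sqrt(mse)                   # same units as the target
mae = sum(abs(e) for e in errors) / n   # average absolute miss

# R² compares the model's squared error to that of always predicting the mean
mean_true = sum(y_true) / n
ss_res = sum(e ** 2 for e in errors)
ss_tot = sum((t - mean_true) ** 2 for t in y_true)
r2 = 1 - ss_res / ss_tot

print(f"MSE={mse:.1f} RMSE={rmse:.2f} MAE={mae:.1f} R2={r2:.3f}")
```

RMSE and MAE are in the units of the target, so they are usually easier to communicate than MSE.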

#### Code Examples:
```python
# Classification Example
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

iris = load_iris()
X = iris.data
y = iris.target  # Labeled: 0, 1, 2 (three iris species)

# Hold out 20% of the data for testing; random_state makes the run reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)
print(f"Classification Accuracy: {accuracy_score(y_test, predictions):.2%}")

# Regression Example
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Toy sample: predict house prices from floor area
X_houses = np.array([[1000], [1500], [2000], [2500]])  # sq ft
y_prices = np.array([200000, 300000, 400000, 500000])  # price in $

reg = LinearRegression()
reg.fit(X_houses, y_prices)
predicted_price = reg.predict([[1800]])
print(f"Predicted price for 1800 sqft: ${predicted_price[0]:,.0f}")

# Training-set MSE (close to zero here because the toy data is exactly linear)
train_mse = mean_squared_error(y_prices, reg.predict(X_houses))
print(f"Training MSE: {train_mse:.2f}")
```


### 2.2 **Unsupervised Learning**
- **Core Concept:** Discover hidden patterns without labels
- **Training Process:** Find structure and relationships in unlabeled data
- **Data Required:** Unlabeled data (abundant, cheaper)
- **No Explicit Feedback:** Model must find patterns independently

#### Clustering
- **Goal:** Group similar data points together
- **Output:** Group assignments (clusters)
- **Examples:**
  - Customer segmentation (identify customer groups for targeted marketing)
  - Gene clustering (group genes with similar expressions)
  - Document clustering (group similar documents/news articles)
  - Image clustering (group similar images)
- **Algorithms:** K-Means, Hierarchical Clustering, DBSCAN, Gaussian Mixture Models
- **Distance/Similarity Measures:** Euclidean distance, Manhattan distance, Cosine similarity
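
The three measures listed above can be written in a few lines each. This is an illustrative sketch in plain Python; the helper names and the sample points are assumptions made for the example.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences (city-block distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Two hypothetical 2-D points
p = [1.0, 2.0]
q = [4.0, 6.0]

print(f"Euclidean: {euclidean(p, q):.3f}")
print(f"Manhattan: {manhattan(p, q):.3f}")
print(f"Cosine similarity: {cosine_similarity(p, q):.3f}")
```

Note that cosine similarity measures direction rather than magnitude, so larger values mean *more* similar, the opposite of a distance.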

#### Dimensionality Reduction
- **Goal:** Reduce number of features while preserving important information
- **Benefits:** Visualization, noise reduction, computational efficiency
- **Examples:**
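  - PCA (Principal Component Analysis): project correlated features onto a few principal components (e.g., 10 features down to 3) that retain most of the variance
  - t-SNE: visualize high-dimensional data in 2D
  - Feature selection: keep only the most informative original features instead of constructing new ones
- **Algorithms:** PCA, t-SNE, Autoencoders; feature selection methods (filter, wrapper, embedded)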

#### Association Rules
- **Goal:** Find relationships and patterns between variables
- **Examples:**
  - Market basket analysis (customers who buy X also buy Y)
  - Recommendation systems
- **Output:** Rules like "If customer buys milk and bread, they likely buy butter"
- **Algorithms:** Apriori, Eclat, FP-Growth
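
Rules such as "milk and bread → butter" are usually scored with support, confidence, and lift. The sketch below computes these three measures directly on a tiny, made-up set of transactions; it is not an implementation of Apriori or FP-Growth, which exist to find frequent itemsets efficiently on large data.

```python
# Hypothetical shopping baskets
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk", "eggs"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

antecedent = {"milk", "bread"}
consequent = {"butter"}

# Support of the rule: how often all items appear together
rule_support = support(antecedent | consequent)
# Confidence: how often the consequent appears when the antecedent does
confidence = rule_support / support(antecedent)
# Lift > 1 suggests the antecedent makes the consequent more likely than chance
lift = confidence / support(consequent)

print(f"support={rule_support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```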

```python
# Unsupervised: Clustering Example
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Generate sample data
X, _ = make_blobs(n_samples=100, centers=3, random_state=42)

# K-Means clustering
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)  # fixed seed for reproducible clusters
clusters = kmeans.fit_predict(X)
print(f"Cluster centers:\n{kmeans.cluster_centers_}")
print(f"Sample point belongs to cluster: {clusters[0]}")

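# Dimensionality Reduction Example
from sklearn.decomposition import PCA

# make_blobs produces 2 features by default, so generate 5-feature data to reduce
X_high, _ = make_blobs(n_samples=100, centers=3, n_features=5, random_state=42)

# Reduce to 2D for visualization
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_high)
print(f"Original shape: {X_high.shape}")
print(f"Reduced shape: {X_reduced.shape}")
print(f"Explained variance ratio: {pca.explained_variance_ratio_}")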
```

### 2.3 **Reinforcement Learning**
- **Core Concept:** Learn through trial and error with rewards and penalties
- **Training Process:** Agent takes actions, receives rewards/penalties, learns optimal behavior
- **Key Components:** Environment, Agent, State, Action, Reward
- **No Labeled Data:** Learns from interaction feedback
- **Goal:** Maximize cumulative reward over time
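
"Cumulative reward over time" is usually formalized as a discounted return, in which a reward received *t* steps in the future is weighted by γ^t for a discount factor γ between 0 and 1. A minimal sketch follows; the reward sequence and the function name are assumptions chosen for illustration.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards where the reward at step t is weighted by gamma**t."""
    g = 0.0
    # Work backwards so each step applies G_t = r_t + gamma * G_{t+1}
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Hypothetical episode: three steps costing -1 each, then +10 at the goal
rewards = [-1, -1, -1, 10]

print(f"gamma=0.9: {discounted_return(rewards, 0.9):.2f}")
print(f"gamma=1.0: {discounted_return(rewards, 1.0):.2f}")
```

A smaller γ makes the agent value immediate rewards more heavily; γ = 1 treats all rewards in the episode equally.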

#### How It Works:
1. **Agent** observes the current **State** of the environment
2. Agent takes an **Action**
3. Environment returns a **Reward** (positive or negative)
4. Environment moves to a new **State**, which the agent observes
5. Repeat, gradually learning which actions lead to the highest cumulative reward

#### Real-World Applications:
- **Autonomous Vehicles:** Learn optimal driving behavior (reward: safe arrival, penalty: collision)
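- **Game AI:** AlphaGo beat top human Go players after training on human expert games and then improving through self-play; its successor AlphaGo Zero learned from self-play alone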
- **Robotics:** Robot learns to grasp objects through trial and error
- **Trading Systems:** Learn optimal trading strategies (reward: profit, penalty: loss)
- **Recommendation Systems:** Learn which recommendations users prefer

#### Key Algorithms:
- **Q-Learning:** Off-policy method that learns Q-values (expected cumulative discounted reward for taking an action in a state)
- **SARSA:** On-policy method (updates use the action the agent actually takes next)
- **Deep Q-Networks (DQN):** Combine Q-Learning with neural networks
- **Policy Gradient Methods:** Directly learn the policy (mapping state → action)
- **Actor-Critic Methods:** Combination of policy and value learning
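
In the tabular case, Q-Learning and SARSA share the same update rule and differ only in the target they bootstrap from. The sketch below shows that difference on a single update; the function names and all numeric values are hypothetical.

```python
def q_learning_target(reward, next_q_values, gamma):
    """Off-policy target: assume the best action is taken in the next state."""
    return reward + gamma * max(next_q_values)

def sarsa_target(reward, next_q_values, next_action, gamma):
    """On-policy target: use the action the agent actually takes next."""
    return reward + gamma * next_q_values[next_action]

def td_update(old_q, target, alpha):
    """Move the old estimate a fraction alpha toward the target."""
    return old_q + alpha * (target - old_q)

# Hypothetical values for one transition
next_q = [0.0, 2.0, 5.0, 1.0]   # Q-values of the four actions in the next state
reward, gamma, alpha = -1.0, 0.9, 0.1
old_q = 1.0
next_action = 1                 # e.g. an exploratory (non-greedy) choice

q_target = q_learning_target(reward, next_q, gamma)
s_target = sarsa_target(reward, next_q, next_action, gamma)
new_q_ql = td_update(old_q, q_target, alpha)
new_q_sarsa = td_update(old_q, s_target, alpha)

print(f"Q-Learning: target={q_target:.2f}, updated Q={new_q_ql:.3f}")
print(f"SARSA:      target={s_target:.2f}, updated Q={new_q_sarsa:.3f}")
```

When the next action happens to be the greedy one, the two targets coincide; they diverge only on exploratory steps.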

```python
# Simplified Reinforcement Learning Example: Grid World
import numpy as np

class GridWorld:
    def __init__(self, grid_size=5):
        self.grid_size = grid_size
        self.agent_pos = [0, 0]
        self.goal_pos = [grid_size-1, grid_size-1]
        
    def step(self, action):
        """
        Actions: 0=up, 1=down, 2=left, 3=right
        Returns: new_state, reward
        """
        new_pos = self.agent_pos.copy()
        
        # Position is [row, col]; moves are clipped at the grid edges
        if action == 0:  # up: row - 1
            new_pos[0] = max(0, new_pos[0] - 1)
        elif action == 1:  # down: row + 1
            new_pos[0] = min(self.grid_size-1, new_pos[0] + 1)
        elif action == 2:  # left: col - 1
            new_pos[1] = max(0, new_pos[1] - 1)
        elif action == 3:  # right: col + 1
            new_pos[1] = min(self.grid_size-1, new_pos[1] + 1)
        
        self.agent_pos = new_pos
        
        # Reward: +10 for reaching goal, -1 for each step (encourage efficiency)
        if self.agent_pos == self.goal_pos:
            reward = 10
        else:
            reward = -1
            
        return tuple(self.agent_pos), reward

# Simple Q-Learning example
env = GridWorld()
Q = {}  # Q-table: maps (state, action) → expected future reward
learning_rate = 0.1
discount = 0.9

# Training loop
for episode in range(100):
    env.agent_pos = [0, 0]  # reset the environment so it matches the start state
    state = (0, 0)
    for step in range(20):
        # Epsilon-greedy: explore randomly sometimes, exploit best action other times
        if np.random.random() < 0.1:  # 10% explore
            action = int(np.random.randint(0, 4))
        else:  # 90% exploit
            q_values = [Q.get((state, a), 0) for a in range(4)]
            action = int(np.argmax(q_values))
        
        new_state, reward = env.step(action)
        
        # Q-Learning update
        max_next_q = max(Q.get((new_state, a), 0) for a in range(4))
        old_q = Q.get((state, action), 0)
        Q[(state, action)] = old_q + learning_rate * (
            reward + discount * max_next_q - old_q
        )
        
        state = new_state
        if state == tuple(env.goal_pos):  # the episode ends at the goal
            break

print(f"Learned Q-values: {len(Q)} state-action pairs")
```

### Comparison Table

| Aspect | Supervised | Unsupervised | Reinforcement |
|--------|-----------|-------------|---------------|
| **Data Type** | Labeled | Unlabeled | Interactive Environment |
| **Training Goal** | Predict output | Find patterns | Maximize rewards |
| **Feedback** | Explicit labels | None (structure inferred from the data) | Reward signal from trial and error |
| **Examples** | Classification, Regression | Clustering, Dimensionality reduction | Game playing, Robotics |
| **Data Requirement** | Large labeled dataset | Abundant unlabeled data | Simulation environment |
| **Complexity** | Moderate | High | Very High |
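| **Interpretability** | Depends on the model (high for linear models and trees, low for deep networks) | Medium (clusters and components need manual interpretation) | Often low (learned policies are hard to inspect) |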

---