## 4. Online Learning

### Definition
**Online Learning** trains models **incrementally**, updating parameters as new data arrives **one sample or small batch at a time**. The model continuously evolves and adapts to new information.

### How It Works

```
[Sample 1] → [Model Update 1]
                     ↓
[Sample 2] → [Model Update 2]
                     ↓
[Sample 3] → [Model Update 3]
                     ↓
         ... continuous updates ...
                     ↓
[Real-time Predictions]
```

Each new data point immediately affects model parameters.
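
To make the update step concrete, here is a minimal sketch of what a single "Model Update" box could look like for logistic regression trained with stochastic gradient descent. The function name `sgd_update`, the learning rate, and the toy labeling rule are assumptions chosen for illustration; they do not come from any library.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgd_update(w, b, x, y, lr=0.1):
    """Apply one online update from a single sample (x, y)."""
    p = sigmoid(w @ x + b)   # predicted probability of class 1
    grad = p - y             # gradient of the log loss w.r.t. the logit
    w = w - lr * grad * x    # move the weights against the gradient
    b = b - lr * grad
    return w, b

rng = np.random.default_rng(0)
w, b = np.zeros(2), 0.0

# Stream 2000 samples one at a time; nothing is stored after the update
for _ in range(2000):
    x = rng.random(2)
    y = float(x[0] > 0.5)    # toy rule (assumed): label depends on feature 0
    w, b = sgd_update(w, b, x, y)

# Check the learned parameters on fresh samples
X_test = rng.random((1000, 2))
pred = sigmoid(X_test @ w + b) > 0.5
accuracy = np.mean(pred == (X_test[:, 0] > 0.5))
print(f"weights={w}, bias={b:.3f}, accuracy={accuracy:.3f}")
```

Each call uses only the current sample and the current parameters, which is why the memory footprint does not grow with the length of the stream.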

### Characteristics

1. **Streaming Data:** Processes data as it arrives
2. **Incremental Updates:** Updates model for each sample or mini-batch
3. **Memory Efficient:** Only keeps current batch in memory
4. **Adaptive:** Quickly responds to data changes
5. **Real-time Ready:** Can make predictions immediately
6. **Continuous Learning:** Model is never "finished"; it keeps updating as new data arrives

### Advantages of Online Learning

#### 1. **Efficiency** 🚀
- **Memory:** Only one sample/batch in memory at a time
- **Computation:** Each update touches only the current sample or batch, so per-step cost stays small
- **Storage:** Don't need to store entire dataset
- **Cost:** Can run on modest hardware (even a Raspberry Pi)


In [None]:
# Example: Memory-efficient streaming
import numpy as np
from sklearn.linear_model import SGDClassifier

# Instead of loading 1 million samples at once
# Process them in mini-batches of 1000

# Logistic loss is named 'log_loss' since scikit-learn 1.1 ('log' was removed in 1.3)
model = SGDClassifier(loss='log_loss', random_state=42)

total_samples = 1_000_000
batch_size = 1_000

for i in range(0, total_samples, batch_size):
    # Generate mini-batch (in real scenario, stream from source)
    X_batch = np.random.rand(batch_size, 100)
    y_batch = np.random.randint(0, 2, batch_size)
    
    # Update model with current batch only
    if i == 0:
        model.partial_fit(X_batch, y_batch, classes=[0, 1])
    else:
        model.partial_fit(X_batch, y_batch)
    
    if i % 100_000 == 0:
        print(f"Processed {i:,} samples, Memory efficient!")

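# Same model trained on 1M samples without loading all at once.
# Labels here are random, so this score should hover near 0.5 (chance);
# the example demonstrates the constant memory footprint, not predictive quality.
accuracy = model.score(X_batch, y_batch)
print(f"Accuracy on last batch: {accuracy:.4f}")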


#### 2. **Adaptability** 🔄
- Quickly responds to new patterns and concept drift
- Learns immediately when market conditions change
- Can detect anomalies in real-time
- Stays current without waiting for retraining


In [None]:
# Example: Adapting to concept drift
import numpy as np
from sklearn.linear_model import SGDClassifier

# Simulate concept drift: data distribution changes over time
model = SGDClassifier(loss='log_loss', random_state=42)

# Phase 1: Initial data distribution
for t in range(100):
    X_t = np.random.rand(50, 20)
    y_t = (X_t[:, 0] > 0.5).astype(int)  # Rule: feature 0 > 0.5
    
    if t == 0:
        model.partial_fit(X_t, y_t, classes=[0, 1])
    else:
        model.partial_fit(X_t, y_t)

accuracy_phase1 = model.score(X_t, y_t)
print(f"Phase 1 Accuracy: {accuracy_phase1:.4f}")

# Phase 2: Concept drift - rules change
for t in range(100, 200):
    X_t = np.random.rand(50, 20)
    # NEW RULE: feature 1 > 0.5 (completely different!)
    y_t = (X_t[:, 1] > 0.5).astype(int)
    
    # Model shifts toward the new pattern with every batch it sees
    model.partial_fit(X_t, y_t)

accuracy_phase2 = model.score(X_t, y_t)
print(f"Phase 2 Accuracy: {accuracy_phase2:.4f} (adapted to new rule)")


#### 3. **Real-time Performance** ⚡
- Minimal latency between data and predictions
- Suitable for mission-critical applications
- Can respond to immediate opportunities/threats
- Predictions always based on latest data

Real-world example:
```
Stock Trading:
- Online ML: Model learns new market pattern in milliseconds
- Makes adjusted trading decision immediately
- Captures gains before others notice the pattern

Fraud Detection:
- Online ML: New fraud pattern detected in real-time
- Transaction blocked immediately
- Prevents financial loss

Recommendation:
- Online ML: User clicks on product
- Model instantly learns preference
- Next recommendation reflects this immediate interest
```
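
Because predictions and updates interleave in these scenarios, a common way to measure an online model is to test on each batch before training on it (often called prequential or test-then-train evaluation). The sketch below assumes synthetic data and a simple threshold rule; the variable names are illustrative.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(42)
model = SGDClassifier(random_state=42)

correct = 0
seen = 0
for step in range(200):
    X_batch = rng.standard_normal((50, 5))
    y_batch = (X_batch[:, 0] > 0).astype(int)   # assumed toy rule

    if step > 0:
        # Test first: the model has not seen this batch yet
        correct += int((model.predict(X_batch) == y_batch).sum())
        seen += len(y_batch)

    # Then train on the same batch
    model.partial_fit(X_batch, y_batch, classes=[0, 1])

prequential_accuracy = correct / seen
print(f"Prequential accuracy over {seen} samples: {prequential_accuracy:.4f}")
```

Every sample is scored exactly once, and always before the model has learned from it, so the figure is not inflated by evaluating on training data.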

#### 4. **No Retraining Needed**
- Model updates continuously
- Routine updates do not require a full retrain from scratch
- Faster adaptation compared to batch retraining


In [None]:
# Compare: Batch vs Online for new data
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import SGDClassifier
import time

# Batch approach
X_initial = np.random.rand(100_000, 50)
y_initial = np.random.randint(0, 2, 100_000)

batch_model = RandomForestClassifier(n_estimators=100)

start = time.time()
batch_model.fit(X_initial, y_initial)  # Initial training: slow
initial_time = time.time() - start
print(f"Batch: Initial training time: {initial_time:.2f}s")

# New data arrives
X_new = np.random.rand(10_000, 50)
y_new = np.random.randint(0, 2, 10_000)

start = time.time()
# Must retrain on ALL data again!
batch_model.fit(np.vstack([X_initial, X_new]), 
                np.hstack([y_initial, y_new]))
retrain_time = time.time() - start
print(f"Batch: Retraining time: {retrain_time:.2f}s")

# Online approach
# SGDClassifier has no n_estimators parameter (that belongs to ensembles)
online_model = SGDClassifier(random_state=42)

start = time.time()
online_model.fit(X_initial, y_initial)
online_initial_time = time.time() - start
print(f"\nOnline: Initial training time: {online_initial_time:.2f}s")

start = time.time()
# Just update with new data!
online_model.partial_fit(X_new, y_new)
update_time = time.time() - start
print(f"Online: Update time: {update_time:.4f}s (much faster!)")


### When to Use Online Learning

#### 1. **Streaming Data**
- Financial data (stock prices, transactions)
- Sensor data (IoT devices, weather stations)
- Social media feeds
- Website analytics
- Network traffic monitoring

#### 2. **Real-time Applications**
- Fraud detection in financial transactions
- Spam email filtering (learns new spam patterns instantly)
- Recommendation systems (learns user preferences immediately)
- Anomaly detection in network security
- Self-driving cars (adapt to road conditions in real-time)

#### 3. **Concept Drift Scenarios**
- Where data patterns change over time
- Fashion trends evolving
- User behavior changing with seasons
- Market conditions shifting
- Disease patterns evolving
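
One simple way to notice drift like this is to compare each new batch's accuracy, measured before training on it, against a rolling average of recent batches. The sketch below is only an illustration: the window size, the `DRIFT_THRESHOLD` value, and the synthetic rule change at step 100 are all assumptions.

```python
from collections import deque

import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
model = SGDClassifier(random_state=0)

window = deque(maxlen=10)    # accuracy of the last 10 batches
DRIFT_THRESHOLD = 0.15       # hypothetical: flag a drop larger than this
drift_steps = []

for step in range(200):
    X = rng.standard_normal((100, 5))
    col = 0 if step < 100 else 1          # the labeling rule changes at step 100
    y = (X[:, col] > 0).astype(int)

    if step > 0:
        acc = model.score(X, y)           # score before learning from this batch
        if len(window) == window.maxlen and np.mean(window) - acc > DRIFT_THRESHOLD:
            drift_steps.append(step)
        window.append(acc)

    model.partial_fit(X, y, classes=[0, 1])

print(f"Possible drift flagged at steps: {drift_steps}")
```

The model keeps learning after a flag is raised; in practice the flag might trigger an alert, a temporarily higher learning rate, or a model reset.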

#### 4. **Memory Constraints**
- Limited RAM available
- Processing on edge devices
- Mobile applications
- Embedded systems
- IoT applications on Raspberry Pi

#### 5. **Data at Scale**
- Millions of samples arriving daily
- Cannot store complete history
- Need models that handle infinite data streams
- Storage costs prohibitive

### Disadvantages of Online Learning

#### 1. **Complexity in Implementation** 🔴
- Requires handling continuous data streams
- Managing state and incremental updates complex
- Error handling more involved
- Harder to debug (data constantly changing)


In [None]:
# Example: Complexity of managing streaming state
import numpy as np
from sklearn.linear_model import SGDClassifier

class OnlineLearningPipeline:
    def __init__(self):
        self.model = SGDClassifier(loss='log_loss')
        self.feature_stats = None  # Must track statistics!
        self.model_version = 0
        self.training_samples = 0
        
    def process_stream(self, X_batch, y_batch):
        """
        Handle streaming data with proper state management
        """
        try:
            # Update feature statistics
            if self.feature_stats is None:
                self.feature_stats = {
                    'mean': X_batch.mean(axis=0),
                    'std': X_batch.std(axis=0)
                }
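            else:
                # Incrementally update statistics with an exponential moving average
                alpha = 0.95  # smoothing factor: weight kept by the old estimate
                self.feature_stats['mean'] = (
                    alpha * self.feature_stats['mean'] + 
                    (1-alpha) * X_batch.mean(axis=0)
                )
                # Update std the same way (an approximation of the running std)
                self.feature_stats['std'] = (
                    alpha * self.feature_stats['std'] + 
                    (1-alpha) * X_batch.std(axis=0)
                )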
            
            # Normalize features
            if self.feature_stats['std'].sum() > 0:
                X_normalized = (X_batch - self.feature_stats['mean']) / (
                    self.feature_stats['std'] + 1e-8
                )
            else:
                X_normalized = X_batch
            
            # Update model
            if self.training_samples == 0:
                self.model.partial_fit(X_normalized, y_batch, classes=[0, 1])
            else:
                self.model.partial_fit(X_normalized, y_batch)
            
            self.training_samples += len(X_batch)
            self.model_version += 1
            
            return {
                'status': 'success',
                'samples_processed': self.training_samples,
                'version': self.model_version
            }
            
        except Exception as e:
            # Error handling: Must gracefully handle mid-stream errors
            print(f"Error processing batch: {e}")
            return {'status': 'failed', 'error': str(e)}

# Usage
pipeline = OnlineLearningPipeline()
for i in range(10):
    X = np.random.rand(100, 20)
    y = np.random.randint(0, 2, 100)
    result = pipeline.process_stream(X, y)
    print(f"Batch {i}: {result}")
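
The hand-rolled statistics above are one option. As an alternative sketch, scikit-learn's `StandardScaler` exposes `partial_fit`, which maintains the running mean and variance across batches. The data below is synthetic and the batch size of 100 is an arbitrary choice.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=(1000, 3))

scaler = StandardScaler()
for start in range(0, len(data), 100):
    # Each call folds one batch into the running mean and variance
    scaler.partial_fit(data[start:start + 100])

# The incremental statistics agree with a single pass over the full array
mean_matches = np.allclose(scaler.mean_, data.mean(axis=0))
var_matches = np.allclose(scaler.var_, data.var(axis=0))
print(f"mean matches: {mean_matches}, variance matches: {var_matches}")
```

Unlike the exponential moving average in the class above, this weights every sample equally, so it tracks the statistics of the whole stream rather than of the recent past.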


#### 2. **Accuracy Variability** 📉
- May underfit compared to batch learning on same data
- Sensitive to noisy mini-batches
- Requires careful hyperparameter tuning
- Risk of overfitting to recent noise
- Order of data can affect final model (unlike batch)


In [None]:
# Example: Online learning affected by data order
import numpy as np
from sklearn.linear_model import SGDClassifier

online_model1 = SGDClassifier(loss='log_loss', random_state=42)
online_model2 = SGDClassifier(loss='log_loss', random_state=42)

# Create synthetic data whose labeling rule changes between chunks
X1_early = np.random.rand(500, 20)
y1_early = (X1_early[:, 0] > 0.5).astype(int)

X1_late = np.random.rand(500, 20)
y1_late = (X1_late[:, 0] > 0.3).astype(int)  # Different rule!

# Scenario 1: early chunk first, late chunk last.
# Separate partial_fit calls are applied in sequence; a single fit() call
# would shuffle all 1000 samples together (shuffle=True by default) and
# hide the ordering effect.
online_model1.partial_fit(X1_early, y1_early, classes=[0, 1])
online_model1.partial_fit(X1_late, y1_late)

# Scenario 2: late chunk first, early chunk last
online_model2.partial_fit(X1_late, y1_late, classes=[0, 1])
online_model2.partial_fit(X1_early, y1_early)

acc1 = online_model1.score(X1_late, y1_late)
acc2 = online_model2.score(X1_late, y1_late)
print(f"Model 1 (early→late) accuracy: {acc1:.4f}")
print(f"Model 2 (late→early) accuracy: {acc2:.4f}")
print(f"Difference (order matters!): {abs(acc1-acc2):.4f}")


#### 3. **Data Dependency**
- Performance heavily depends on data quality
- If noisy data arrives, model learns noise
- Continuous supply of data needed for good performance
- Bad data is hard to undo: once an update is applied, recovery needs a saved checkpoint or enough clean data to overwrite it
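
One possible safeguard against learning from bad data is to keep a checkpoint of the last good model and roll back when accuracy on a trusted hold-out set drops sharply. The sketch below assumes such a hold-out set exists; `make_batch`, `MAX_DROP`, and the simulated label corruption are hypothetical names and values chosen for illustration.

```python
import copy

import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(1)

def make_batch(n, corrupt=False):
    """Synthetic batch; corrupt=True flips every label to mimic a bad feed."""
    X = rng.standard_normal((n, 5))
    y = (X[:, 0] > 0).astype(int)
    return X, (1 - y if corrupt else y)

X_val, y_val = make_batch(500)            # trusted hold-out set (assumed available)

model = SGDClassifier(random_state=1)
model.partial_fit(*make_batch(200), classes=[0, 1])
checkpoint = copy.deepcopy(model)
best_acc = checkpoint.score(X_val, y_val)

MAX_DROP = 0.10                           # hypothetical tolerance before rolling back
rollbacks = 0
for step in range(50):
    X, y = make_batch(200, corrupt=(20 <= step < 25))
    model.partial_fit(X, y)
    acc = model.score(X_val, y_val)
    if acc < best_acc - MAX_DROP:
        model = copy.deepcopy(checkpoint)  # discard the harmful update
        rollbacks += 1
    elif acc >= best_acc:
        checkpoint, best_acc = copy.deepcopy(model), acc

final_acc = model.score(X_val, y_val)
print(f"Rollbacks: {rollbacks}, final hold-out accuracy: {final_acc:.4f}")
```

This trades some adaptability for safety: a genuine concept change would also lower hold-out accuracy, so the hold-out set itself has to be refreshed when the underlying rule really does move.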

#### 4. **Hyperparameter Sensitivity** 🎯
- Learning rate becomes critical
- Batch size affects convergence
- More difficult to tune than batch learning
- Requires experimentation with streaming data


In [None]:
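# Example: Learning rate impact on online learning
import numpy as np
from sklearn.linear_model import SGDClassifier

learning_rates = [0.001, 0.01, 0.1, 0.5, 1.0]
accuracies = []

# Fixed held-out set so every learning rate is scored on the same data
X_test = np.random.rand(1000, 20)
y_test = (X_test[:, 0] > 0.5).astype(int)

for lr in learning_rates:
    model = SGDClassifier(eta0=lr, learning_rate='constant', random_state=42)
    
    for i in range(100):
        X = np.random.rand(100, 20)
        # Learnable rule; purely random labels would score ~0.5 at every rate
        y = (X[:, 0] > 0.5).astype(int)
        
        if i == 0:
            model.partial_fit(X, y, classes=[0, 1])
        else:
            model.partial_fit(X, y)
    
    final_acc = model.score(X_test, y_test)
    accuracies.append(final_acc)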

print("Learning Rate vs Final Accuracy:")
for lr, acc in zip(learning_rates, accuracies):
    print(f"  LR={lr}: {acc:.4f}")


### Online Learning Tools and Libraries


In [None]:
# River: Modern library for online learning
# Install: pip install river

from river import linear_model, preprocessing, compose, metrics
from sklearn.datasets import make_classification
import numpy as np

# Create online learning pipeline
model = compose.Pipeline(
    preprocessing.StandardScaler(),
    linear_model.LogisticRegression()
)

# Stream data one sample at a time
metric = metrics.Accuracy()

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

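for i, (xi, yi) in enumerate(zip(X, y)):
    # River expects each sample as a dict of feature name -> value
    xi = dict(enumerate(xi))
    yi = bool(yi)
    
    # Make prediction on sample (before learning from it)
    y_pred = model.predict_one(xi)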
    
    # Update metric
    if y_pred is not None:
        metric.update(yi, y_pred)
    
    # Learn from sample
    model.learn_one(xi, yi)
    
    if i % 200 == 0:
        print(f"Sample {i}: Accuracy = {metric.get():.4f}")

print(f"Final Accuracy: {metric.get():.4f}")

# Scikit-learn: partial_fit for online learning
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler

X_train = np.random.rand(10000, 20)
y_train = np.random.randint(0, 2, 10000)

model = SGDClassifier(loss='log_loss', n_jobs=-1)  # 'log' was renamed to 'log_loss'

# Train in mini-batches
for batch_start in range(0, len(X_train), 100):
    batch_end = min(batch_start + 100, len(X_train))
    X_batch = X_train[batch_start:batch_end]
    y_batch = y_train[batch_start:batch_end]
    
    if batch_start == 0:
        model.partial_fit(X_batch, y_batch, classes=[0, 1])
    else:
        model.partial_fit(X_batch, y_batch)

print(f"Trained on {len(X_train)} samples using online learning")


---
