# Stochastic Processes for Machine Learning
## Time Series Analysis and Sequential Data Modeling

Welcome to the **mathematics of randomness over time**! Stochastic processes provide the theoretical foundation for understanding how random systems evolve, making them essential for time series analysis, sequential prediction, and dynamic modeling.

### What You'll Master
By the end of this notebook, you'll understand:
1. **Markov chains** - Memoryless random walks
2. **Hidden Markov models** - Observable effects of hidden states
3. **Gaussian processes** - Infinite-dimensional Bayesian models
4. **Brownian motion** - Continuous random walks
5. **Martingales** - Fair games and convergence theory
6. **Time series analysis** - ARIMA, state space models

### Why This is Essential
- **Financial modeling** - Stock prices, risk management
- **Signal processing** - Filtering noise from signals
- **Natural language** - Sequential text generation
- **Reinforcement learning** - Markov decision processes

### Real-World Applications
- **Speech recognition**: Hidden Markov models for phonemes
- **Algorithmic trading**: Mean reversion and momentum strategies
- **Weather forecasting**: Stochastic differential equations
- **Genomics**: DNA sequence analysis with Markov models

Let's dive into the beautiful theory of randomness through time! ⏰

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from scipy.stats import norm, multivariate_normal
from scipy.linalg import expm
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel, Matern
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
import warnings
warnings.filterwarnings('ignore')

# Set style
plt.style.use('seaborn-v0_8')
sns.set_palette("Set2")
np.random.seed(42)

print("⏰ Stochastic Processes toolkit loaded!")
print("Ready to model randomness through time!")

## 1. Markov Chains: The Foundation of Sequential Modeling

### What is a Markov Chain?
A **Markov chain** is a sequence of random variables in which the distribution of the next state depends only on the current state, not on the path taken to reach it. This is the **Markov property**.

**Mathematical Definition**:
```
P(X_{n+1} = j | X_n = i, X_{n-1} = i_{n-1}, ..., X_0 = i_0) = P(X_{n+1} = j | X_n = i)
```

### Key Components
1. **State space**: Set of all possible states S = {1, 2, ..., N}
2. **Transition matrix**: P_{ij} = P(X_{n+1} = j | X_n = i)
3. **Initial distribution**: π_0(i) = P(X_0 = i)
4. **Stationary distribution**: π such that π = πP, with entries summing to 1
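
The stationary condition π = πP, together with the requirement that the entries of π sum to 1, is a linear system, so for a small chain it can be solved directly. The sketch below is one way to do that, not the only one. The matrix `P_example` and the helper name `stationary_distribution` are hypothetical, chosen only for illustration, and the code assumes the chain has a unique stationary distribution.

```python
import numpy as np

def stationary_distribution(P):
    """Solve pi = pi P with sum(pi) = 1 as a least-squares problem."""
    n = P.shape[0]
    # (P^T - I) pi = 0, plus one extra row that enforces sum(pi) = 1
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

# Hypothetical two-state chain (an assumption, not taken from this notebook)
P_example = np.array([
    [0.9, 0.1],
    [0.5, 0.5],
])

pi = stationary_distribution(P_example)
print(pi)              # approximately [0.8333, 0.1667]
print(pi @ P_example)  # one more step leaves pi unchanged
```

Solving the linear system avoids picking an eigenvector by index, at the cost of assuming the solution is unique.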

### Properties
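- **Memoryless**: The future depends only on the present state, not on the past
- **Time-homogeneous**: Transition probabilities don't change over time (assumed throughout this notebook)
- **Ergodic**: For an irreducible, aperiodic chain on a finite state space, long-run behavior is independent of the starting state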
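
A quick way to see what "independent of the starting state" means is to look at powers of the transition matrix: row i of P^n is the distribution after n steps when starting in state i. The sketch below uses two hypothetical two-state chains (both are assumptions made for illustration). In the first, the rows of P^n become identical. In the second, which is periodic, the powers keep alternating and never settle.

```python
import numpy as np

# Hypothetical irreducible, aperiodic chain
P_ergodic = np.array([
    [0.9, 0.1],
    [0.5, 0.5],
])

# Hypothetical periodic chain: it flips state deterministically at every step
P_periodic = np.array([
    [0.0, 1.0],
    [1.0, 0.0],
])

# Row i of P^n is the n-step distribution when starting in state i
Pn_ergodic = np.linalg.matrix_power(P_ergodic, 50)
print(Pn_ergodic)  # both rows are (nearly) identical

# For the periodic chain, even and odd powers differ
P50 = np.linalg.matrix_power(P_periodic, 50)
P51 = np.linalg.matrix_power(P_periodic, 51)
print(P50)
print(P51)
```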

### Types of States
- **Transient**: Positive probability of never returning once left
- **Recurrent**: Will return with probability 1
- **Absorbing**: Once entered, never left (P_{ii} = 1)
- **Periodic**: Returns are possible only at multiples of some period d > 1
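
For chains with absorbing states, the standard fundamental-matrix calculation gives absorption probabilities and expected time to absorption without simulation. The sketch below applies it to a hypothetical gambler's-ruin chain on four states (an assumption made for illustration). It treats every non-absorbing state as transient, which holds for this example but not for chains in general.

```python
import numpy as np

# Hypothetical gambler's-ruin chain on {0, 1, 2, 3}:
# states 0 and 3 are absorbing, states 1 and 2 are transient.
P = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 1.0],
])

# Absorbing states have P[i, i] = 1; the rest are transient in this example
absorbing = [i for i in range(len(P)) if P[i, i] == 1.0]
transient = [i for i in range(len(P)) if i not in absorbing]

Q = P[np.ix_(transient, transient)]  # transient -> transient block
R = P[np.ix_(transient, absorbing)]  # transient -> absorbing block

# Fundamental matrix: N[i, j] = expected visits to transient j starting from i
N = np.linalg.inv(np.eye(len(transient)) - Q)

absorption_probs = N @ R       # rows: start state, columns: absorbing state
expected_steps = N.sum(axis=1) # expected number of steps until absorption

print(absorption_probs)  # approximately [[0.667, 0.333], [0.333, 0.667]]
print(expected_steps)    # approximately [2., 2.]
```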

### Real-World Examples
- **Weather**: Sunny → Cloudy → Rainy → Sunny...
- **Stock market**: Bull → Bear → Bull...
- **Customer behavior**: Browse → Cart → Purchase → Return...
- **Gene expression**: Active → Inactive → Active...
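
In applications like these, the transition matrix is usually not known and has to be estimated from an observed sequence. A common approach is to count transitions and normalize each row, which is the maximum-likelihood estimate for a time-homogeneous chain. The sketch below assumes integer-coded states; the sequence `observed` and the helper name `estimate_transition_matrix` are hypothetical.

```python
import numpy as np

def estimate_transition_matrix(sequence, n_states):
    """Count observed transitions and normalize each row to sum to 1."""
    counts = np.zeros((n_states, n_states))
    for current, nxt in zip(sequence[:-1], sequence[1:]):
        counts[current, nxt] += 1
    row_totals = counts.sum(axis=1, keepdims=True)
    # States with no observed outgoing transition keep an all-zero row
    return np.divide(counts, row_totals,
                     out=np.zeros_like(counts), where=row_totals > 0)

# Hypothetical observed sequence: 0 = Sunny, 1 = Cloudy, 2 = Rainy
observed = [0, 0, 1, 2, 1, 0, 0, 1, 1, 2, 0]
P_hat = estimate_transition_matrix(observed, n_states=3)
print(P_hat)
```

With a sequence this short the estimate is very noisy; it is meant only to show the mechanics.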

In [None]:
def demonstrate_markov_chains():
    """Explore Markov chains with practical examples"""
    
    print("🔗 Markov Chains: Modeling Sequential Dependencies")
    print("=" * 52)
    
    fig, axes = plt.subplots(2, 3, figsize=(18, 12))
    
    # 1. Simple weather model
    print("\n1. Weather Prediction Model")
    print("   States: Sunny, Cloudy, Rainy")
    
    # Transition matrix
    # States: 0=Sunny, 1=Cloudy, 2=Rainy
    P_weather = np.array([
        [0.7, 0.2, 0.1],  # From Sunny
        [0.3, 0.4, 0.3],  # From Cloudy
        [0.2, 0.6, 0.2]   # From Rainy
    ])
    
    states = ['Sunny', 'Cloudy', 'Rainy']
    
    # Visualize transition matrix
    im = axes[0, 0].imshow(P_weather, cmap='Blues', aspect='auto')
    axes[0, 0].set_xticks(range(3))
    axes[0, 0].set_yticks(range(3))
    axes[0, 0].set_xticklabels(states)
    axes[0, 0].set_yticklabels(states)
    axes[0, 0].set_xlabel('To State')
    axes[0, 0].set_ylabel('From State')
    axes[0, 0].set_title('Weather Transition Matrix')
    
    # Add probability values
    for i in range(3):
        for j in range(3):
            axes[0, 0].text(j, i, f'{P_weather[i, j]:.1f}', 
                           ha='center', va='center', color='white' if P_weather[i, j] > 0.4 else 'black')
    
    # Calculate stationary distribution
    eigenvals, eigenvecs = np.linalg.eig(P_weather.T)
    stationary_idx = np.argmin(np.abs(eigenvals - 1.0))
    stationary_dist = np.real(eigenvecs[:, stationary_idx])
    stationary_dist = stationary_dist / stationary_dist.sum()
    
    print(f"   Transition probabilities:")
    for i, state_from in enumerate(states):
        print(f"   {state_from}: {dict(zip(states, P_weather[i]))}")
    print(f"   Stationary distribution: {dict(zip(states, stationary_dist))}")
    
    # 2. Simulate weather sequence
    print("\n2. Weather Sequence Simulation")
    
    def simulate_markov_chain(P, initial_state, n_steps):
        """Simulate a Markov chain"""
        states = [initial_state]
        current_state = initial_state
        
        for _ in range(n_steps - 1):
            # Sample next state based on transition probabilities
            current_state = np.random.choice(len(P), p=P[current_state])
            states.append(current_state)
        
        return np.array(states)
    
    # Simulate 100 days starting from sunny
    weather_sequence = simulate_markov_chain(P_weather, 0, 100)
    weather_names = [states[i] for i in weather_sequence]
    
    # Plot sequence
    days = np.arange(len(weather_sequence))
    colors = ['gold', 'lightgray', 'blue']
    for i, state in enumerate(states):
        mask = weather_sequence == i
        axes[0, 1].scatter(days[mask], weather_sequence[mask], 
                         c=colors[i], label=state, alpha=0.7, s=20)
    
    axes[0, 1].set_xlabel('Day')
    axes[0, 1].set_ylabel('Weather State')
    axes[0, 1].set_title('100-Day Weather Simulation')
    axes[0, 1].set_yticks(range(3))
    axes[0, 1].set_yticklabels(states)
    axes[0, 1].legend()
    axes[0, 1].grid(True, alpha=0.3)
    
    # Calculate empirical frequencies
    empirical_freq = np.bincount(weather_sequence) / len(weather_sequence)
    print(f"   Empirical frequencies: {dict(zip(states, empirical_freq))}")
    print(f"   Convergence to stationary: Close? {np.allclose(empirical_freq, stationary_dist, atol=0.1)}")
    
    # 3. Convergence to stationary distribution
    print("\n3. Convergence Analysis")
    print("   How quickly does the chain reach equilibrium?")
    
    # Start with extreme initial distribution
    initial_dist = np.array([1.0, 0.0, 0.0])  # Always sunny
    
    # Evolve distribution over time
    n_steps = 20
    distributions = [initial_dist]
    current_dist = initial_dist.copy()
    
    for step in range(n_steps):
        current_dist = current_dist @ P_weather
        distributions.append(current_dist.copy())
    
    distributions = np.array(distributions)
    
    # Plot convergence
    steps = np.arange(n_steps + 1)
    for i, state in enumerate(states):
        axes[0, 2].plot(steps, distributions[:, i], 'o-', label=f'{state}', linewidth=2)
        axes[0, 2].axhline(y=stationary_dist[i], color=axes[0, 2].get_lines()[-1].get_color(), 
                         linestyle='--', alpha=0.7)
    
    axes[0, 2].set_xlabel('Time Steps')
    axes[0, 2].set_ylabel('Probability')
    axes[0, 2].set_title('Convergence to Stationary Distribution')
    axes[0, 2].legend()
    axes[0, 2].grid(True, alpha=0.3)
    
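    # Estimate mixing time: first step at which the Euclidean distance to the
    # stationary distribution drops below 0.01
    distances = np.linalg.norm(distributions - stationary_dist, axis=1)
    below_tol = np.flatnonzero(distances < 0.01)
    if below_tol.size:
        print(f"   Mixing time (ε=0.01): {below_tol[0]} steps")
    else:
        print(f"   Mixing time (ε=0.01): not reached within {n_steps} steps")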
    
    # 4. Random walk on a graph
    print("\n4. Random Walk on a Graph")
    print("   Modeling diffusion processes")
    
    # Create a simple graph (cycle)
    n_nodes = 6
    P_graph = np.zeros((n_nodes, n_nodes))
    
    # Each node connects to its neighbors with equal probability
    for i in range(n_nodes):
        P_graph[i, (i-1) % n_nodes] = 0.5  # Left neighbor
        P_graph[i, (i+1) % n_nodes] = 0.5  # Right neighbor
    
    # Simulate random walk
    walk_length = 1000
    walk = simulate_markov_chain(P_graph, 0, walk_length)
    
    # Plot walk trajectory
    time_steps = np.arange(len(walk))
    axes[1, 0].plot(time_steps, walk, 'b-', alpha=0.7, linewidth=1)
    axes[1, 0].scatter(time_steps[::50], walk[::50], c='red', s=30, zorder=5)
    axes[1, 0].set_xlabel('Time Step')
    axes[1, 0].set_ylabel('Node')
    axes[1, 0].set_title('Random Walk on Cycle Graph')
    axes[1, 0].set_yticks(range(n_nodes))
    axes[1, 0].grid(True, alpha=0.3)
    
    # Calculate visit frequencies
    visit_freq = np.bincount(walk, minlength=n_nodes) / len(walk)
    stationary_graph = np.ones(n_nodes) / n_nodes  # Uniform for symmetric graph
    
    print(f"   Theoretical stationary: uniform ({1/n_nodes:.3f} each)")
    print(f"   Empirical frequencies: {visit_freq}")
    
    # 5. Absorbing Markov chain
    print("\n5. Absorbing Markov Chain")
    print("   Modeling processes with terminal states")
    
    # Simple model: student progress
    # States: 0=Freshman, 1=Sophomore, 2=Junior, 3=Senior, 4=Graduate, 5=Dropout
    P_student = np.array([
        [0.0, 0.8, 0.0, 0.0, 0.0, 0.2],  # Freshman
        [0.0, 0.0, 0.7, 0.0, 0.0, 0.3],  # Sophomore
        [0.0, 0.0, 0.0, 0.75, 0.0, 0.25], # Junior
        [0.0, 0.0, 0.0, 0.0, 0.85, 0.15], # Senior
        [0.0, 0.0, 0.0, 0.0, 1.0, 0.0],  # Graduate (absorbing)
        [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]   # Dropout (absorbing)
    ])
    
    student_states = ['Fresh.', 'Soph.', 'Junior', 'Senior', 'Grad.', 'Drop.']
    
    # Simulate many student careers
    n_students = 1000
    outcomes = []
    
    for _ in range(n_students):
        # Start as freshman
        path = simulate_markov_chain(P_student, 0, 10)  # Max 10 years
        
        # Find final state
        if 4 in path:  # Graduated
            outcomes.append('Graduate')
        elif 5 in path:  # Dropped out
            outcomes.append('Dropout')
        else:
            outcomes.append('Still studying')
    
    # Count outcomes
    outcome_counts = pd.Series(outcomes).value_counts()
    outcome_percentages = outcome_counts / n_students * 100
    
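    # Plot outcomes (colors keyed by outcome, since value_counts() orders by frequency)
    outcome_names = outcome_counts.index
    color_map = {'Graduate': 'green', 'Dropout': 'red', 'Still studying': 'orange'}
    colors_outcome = [color_map[name] for name in outcome_names]
    axes[1, 1].bar(outcome_names, outcome_percentages, color=colors_outcome, alpha=0.7)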
    axes[1, 1].set_ylabel('Percentage of Students')
    axes[1, 1].set_title('Student Career Outcomes')
    axes[1, 1].tick_params(axis='x', rotation=45)
    axes[1, 1].grid(True, alpha=0.3)
    
    # Add percentage labels
    for i, (name, pct) in enumerate(zip(outcome_names, outcome_percentages)):
        axes[1, 1].text(i, pct + 1, f'{pct:.1f}%', ha='center', va='bottom')
    
    print(f"   Simulation results ({n_students} students):")
    for outcome, pct in outcome_percentages.items():
        print(f"   {outcome}: {pct:.1f}%")
    
    # 6. PageRank algorithm
    print("\n6. PageRank: Markov Chains for Web Search")
    print("   Random surfer model")
    
    # Simple web graph
    # 4 web pages with links
    n_pages = 4
    
    # Link matrix (who links to whom)
    links = np.array([
        [0, 1, 1, 0],  # Page 0 links to 1, 2
        [1, 0, 1, 1],  # Page 1 links to 0, 2, 3
        [1, 0, 0, 1],  # Page 2 links to 0, 3
        [0, 1, 1, 0]   # Page 3 links to 1, 2
    ])
    
    # Convert to transition matrix
    # Each page distributes its rank equally among its outlinks
    P_pagerank = links.astype(float)
    row_sums = P_pagerank.sum(axis=1)
    P_pagerank = P_pagerank / row_sums[:, np.newaxis]
    
    # Add damping factor (random jump)
    damping = 0.85
    n = len(P_pagerank)
    P_pagerank = damping * P_pagerank + (1 - damping) / n * np.ones((n, n))
    
    # Calculate PageRank (stationary distribution)
    eigenvals, eigenvecs = np.linalg.eig(P_pagerank.T)
    pagerank_idx = np.argmin(np.abs(eigenvals - 1.0))
    pagerank = np.real(eigenvecs[:, pagerank_idx])
    pagerank = pagerank / pagerank.sum()
    
    # Plot PageRank scores
    pages = [f'Page {i}' for i in range(n_pages)]
    bars = axes[1, 2].bar(pages, pagerank, color='skyblue', alpha=0.7)
    axes[1, 2].set_ylabel('PageRank Score')
    axes[1, 2].set_title('PageRank Algorithm Results')
    axes[1, 2].grid(True, alpha=0.3)
    
    # Add score labels
    for bar, score in zip(bars, pagerank):
        height = bar.get_height()
        axes[1, 2].text(bar.get_x() + bar.get_width()/2., height + 0.01,
                       f'{score:.3f}', ha='center', va='bottom')
    
    print(f"   Link structure:")
    for i, row in enumerate(links):
        linked_pages = [j for j, has_link in enumerate(row) if has_link]
        print(f"   Page {i} → Pages {linked_pages}")
    print(f"   PageRank scores: {dict(zip(pages, pagerank))}")
    print(f"   Most important page: {pages[np.argmax(pagerank)]}")
    
    plt.tight_layout()
    plt.show()
    
    print("\n🎯 Key Markov Chain Concepts:")
    print("• Memoryless property: future depends only on present state")
    print("• Stationary distribution: long-run equilibrium probabilities")
    print("• Mixing time: how quickly the chain forgets its initial state")
    print("• Absorbing states: terminal states that trap the process")
    print("• PageRank: uses Markov chains to rank web pages")

demonstrate_markov_chains()
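
The PageRank step above reads the stationary distribution off an eigendecomposition. For larger graphs a common alternative is power iteration, which repeatedly applies the transition matrix to a rank vector until it stops changing. The sketch below reuses the four-page link matrix from the example; the function name `pagerank_power_iteration` and the `tol` and `max_iter` parameters are hypothetical, and the code assumes every page has at least one outgoing link.

```python
import numpy as np

def pagerank_power_iteration(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Approximate PageRank by repeatedly applying the damped transition matrix."""
    links = np.asarray(links, dtype=float)
    n = links.shape[0]
    # Assumes no dangling pages: every row has at least one outlink
    P = links / links.sum(axis=1, keepdims=True)
    G = damping * P + (1 - damping) / n  # add the uniform random jump
    rank = np.full(n, 1.0 / n)           # start from the uniform distribution
    for _ in range(max_iter):
        new_rank = rank @ G
        if np.abs(new_rank - rank).sum() < tol:
            return new_rank
        rank = new_rank
    return rank

# Same link structure as the PageRank example above
links = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
])

rank = pagerank_power_iteration(links)
print(rank, rank.sum())
```

Because each step is a single vector-matrix product, this approach scales to sparse graphs where a full eigendecomposition would be impractical.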