# Advanced Exploration Strategies

## 📚 Learning Objectives

By completing this notebook, you will:
- Understand the key concepts of this topic
- Apply the topic using Python code examples
- Practice with small, realistic datasets or scenarios

## 🔗 Prerequisites

- ✅ Basic Python
- ✅ Basic NumPy/Pandas (when applicable)

---

## Official Structure Reference

This notebook supports **Course 09, Unit 4** requirements from `DETAILED_UNIT_DESCRIPTIONS.md`.

---


## AIAT 123 - Reinforcement Learning

## Learning Objectives

- Understand the exploration-exploitation trade-off
- Implement Thompson Sampling for a multi-armed bandit
- Apply Thompson Sampling to a simulated A/B test
- Compare exploration strategies

## Real-World Context

Exploration strategies decide how much traffic to spend on uncertain options in recommendation systems, A/B testing, and online learning.

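**Industry Impact**: Better exploration is credited with improving recommendation systems by 10-30%, though the size of the gain depends on the metric and the baseline being compared.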

In [1]:
%pip install numpy matplotlib scipy -q
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import beta
print('✅ Setup complete!')

Note: you may need to restart the kernel to use updated packages.


✅ Setup complete!


## Part 1: Thompson Sampling


In [2]:
class ThompsonSampling:
    """
    Thompson Sampling for multi-armed bandit.
    
    Real-world: A/B testing, recommendation systems
    """
    def __init__(self, n_arms):
        self.n_arms = n_arms
        self.successes = np.zeros(n_arms)
        self.failures = np.zeros(n_arms)
    
    def select_arm(self):
        """Select an arm by sampling each arm's posterior and taking the argmax."""
        # Draw one sample per arm from its Beta(successes + 1, failures + 1) posterior;
        # the +1 offsets correspond to a uniform Beta(1, 1) prior on each arm's rate.
        samples = beta.rvs(self.successes + 1, self.failures + 1)
        return int(np.argmax(samples))
    
    def update(self, arm, reward):
        """Update arm statistics"""
        if reward > 0:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1

print('✅ Thompson Sampling implemented')

✅ Thompson Sampling implemented
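
The update rule in the class above amounts to a Beta-Bernoulli posterior: starting from a uniform Beta(1, 1) prior, an arm with `s` successes and `f` failures has posterior Beta(s + 1, f + 1). As a minimal sketch, the helper below (`posterior_summary` is a hypothetical name, not part of the original notebook) reports that posterior's mean and an equal-tailed credible interval, which makes it easier to see how uncertainty shrinks as an arm collects more pulls.

```python
from scipy.stats import beta


def posterior_summary(successes, failures, confidence=0.95):
    """Return (mean, lower, upper) of the Beta(successes + 1, failures + 1) posterior.

    Hypothetical helper for illustration only; it assumes a uniform Beta(1, 1)
    prior, matching the +1 offsets used in ThompsonSampling.select_arm.
    """
    a, b = successes + 1, failures + 1
    mean = a / (a + b)  # closed-form mean of Beta(a, b)
    lower, upper = beta.interval(confidence, a, b)  # equal-tailed interval
    return mean, lower, upper


# Same observed click rate (10%), but very different amounts of data
few = posterior_summary(successes=1, failures=9)
many = posterior_summary(successes=100, failures=900)
print(f"10 pulls:   mean={few[0]:.3f}, interval width={few[2] - few[1]:.3f}")
print(f"1000 pulls: mean={many[0]:.3f}, interval width={many[2] - many[1]:.3f}")
```

A wide interval means samples from that arm vary a lot, so the arm still has a reasonable chance of producing the largest sample and being explored. A narrow interval means the arm is chosen mostly when its estimated rate really is the highest.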


## Part 2: A/B Testing Application


In [3]:
# Simulate A/B testing scenario
# Real-world: Testing different website designs
n_trials = 1000
true_ctrs = [0.1, 0.15, 0.12, 0.18, 0.11]  # True click-through rates
n_arms = len(true_ctrs)

# Thompson Sampling
ts = ThompsonSampling(n_arms)
rewards_ts = []

for trial in range(n_trials):
    arm = ts.select_arm()
    reward = 1 if np.random.random() < true_ctrs[arm] else 0
    ts.update(arm, reward)
    rewards_ts.append(reward)

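print('Thompson Sampling Results:')
print(f'Total reward: {sum(rewards_ts)}')
# Estimate how often the trained policy now picks the best arm, using 100 fresh draws
# (a post-training selection rate, not the share of pulls during the 1000 trials)
optimal_arm = np.argmax(true_ctrs)
optimal_picks = sum(ts.select_arm() == optimal_arm for _ in range(100))
print(f'Optimal arm selected in 100 post-training draws: {optimal_picks}%')
print('\n✅ A/B testing optimization demonstrated!')

Thompson Sampling Results:
Total reward: 193
Optimal arm selected in 100 post-training draws: 73%

✅ A/B testing optimization demonstrated!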
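
One way to compare exploration strategies on the same simulated A/B test is to track cumulative expected regret, the running sum of the gap between the best arm's click-through rate and the rate of the arm actually chosen. The sketch below is an illustration under stated assumptions: the epsilon-greedy baseline, the value `epsilon=0.1`, the seed, and the function names `run_thompson`, `run_epsilon_greedy`, and `cumulative_regret` are not from the original notebook. Results depend on the seed and on `n_trials`, so a single run should not be read as a verdict on either method.

```python
import numpy as np


def run_thompson(true_ctrs, n_trials, rng):
    """Beta-Bernoulli Thompson Sampling; returns the sequence of chosen arms."""
    n_arms = len(true_ctrs)
    successes = np.zeros(n_arms)
    failures = np.zeros(n_arms)
    chosen = np.empty(n_trials, dtype=int)
    for t in range(n_trials):
        # One posterior sample per arm, then act greedily on the samples
        arm = int(np.argmax(rng.beta(successes + 1, failures + 1)))
        if rng.random() < true_ctrs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
        chosen[t] = arm
    return chosen


def run_epsilon_greedy(true_ctrs, n_trials, rng, epsilon=0.1):
    """Epsilon-greedy baseline (assumed here for comparison); returns chosen arms."""
    n_arms = len(true_ctrs)
    pulls = np.zeros(n_arms)
    clicks = np.zeros(n_arms)
    chosen = np.empty(n_trials, dtype=int)
    for t in range(n_trials):
        if rng.random() < epsilon:
            arm = int(rng.integers(n_arms))  # explore uniformly at random
        else:
            # Empirical click rate per arm; arms never pulled default to 0
            estimates = np.divide(clicks, pulls, out=np.zeros(n_arms), where=pulls > 0)
            arm = int(np.argmax(estimates))
        pulls[arm] += 1
        if rng.random() < true_ctrs[arm]:
            clicks[arm] += 1
        chosen[t] = arm
    return chosen


def cumulative_regret(chosen, true_ctrs):
    """Running sum of (best true CTR - true CTR of the chosen arm)."""
    ctrs = np.asarray(true_ctrs)
    return np.cumsum(ctrs.max() - ctrs[chosen])


true_ctrs = [0.1, 0.15, 0.12, 0.18, 0.11]  # same rates as the simulation above
n_trials = 1000
rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

regret_ts = cumulative_regret(run_thompson(true_ctrs, n_trials, rng), true_ctrs)
regret_eg = cumulative_regret(run_epsilon_greedy(true_ctrs, n_trials, rng), true_ctrs)
print(f"Thompson Sampling cumulative regret: {regret_ts[-1]:.1f}")
print(f"Epsilon-greedy cumulative regret:    {regret_eg[-1]:.1f}")
```

Regret is computed from the true rates rather than from observed clicks, which removes reward noise from the comparison. This only works in simulation, where the true rates are known. Averaging over many seeds gives a more reliable picture than any single run.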


## Real-World Applications

- **E-commerce**: Product recommendation
- **Marketing**: Ad placement optimization
- **Healthcare**: Treatment selection
- **Finance**: Investment strategy selection
