# Why Are The Results Not Repeatable?

In this notebook, we will explore the concept of non-repeatability in the context of Multi-Armed Bandit problems and machine learning algorithms. We will delve into:

- What does it mean for results to be non-repeatable?
- Why does this happen?
- How can we mitigate this issue?

We will also provide exercises along with solutions, discuss the importance and drawbacks of repeatability, and relate it to real-world scenarios.

In [None]:
import numpy as np

# Function to simulate pulling an arm of a slot machine
def pull_arm(mean):
    return np.random.normal(mean, 1)

# Simulated slot machines (bandit arms) with different true means
true_means = [0.1, 0.5, 0.8]

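# Epsilon-greedy algorithm: explore a random arm with probability epsilon,
# otherwise exploit the arm with the highest estimated mean.
def epsilon_greedy(true_means, epsilon=0.1, n_rounds=100):
    n_arms = len(true_means)
    estimated_means = [0.0] * n_arms
    n_pulls = [0] * n_arms
    rewards = []
    for _ in range(n_rounds):
        if np.random.rand() < epsilon:
            arm = np.random.randint(0, n_arms)  # explore
        else:
            arm = np.argmax(estimated_means)  # exploit (ties go to the lowest index)
        reward = pull_arm(true_means[arm])
        rewards.append(reward)
        n_pulls[arm] += 1
        # Incremental update of the running mean for the pulled arm
        estimated_means[arm] = ((n_pulls[arm] - 1) * estimated_means[arm] + reward) / n_pulls[arm]
    return np.sum(rewards)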

# Run the epsilon-greedy algorithm multiple times
results = [epsilon_greedy(true_means) for _ in range(10)]
results

[65.84397505180827,
 41.92242619641306,
 46.29607765515124,
 74.93695988258933,
 91.07002462925088,
 42.63112876089401,
 74.2165901262519,
 51.37395261562892,
 40.55183215699645,
 68.06995162029423]

## Evaluation and Explanation

As we can see from the output, running the epsilon-greedy algorithm multiple times yields different total rewards. This is a classic example of non-repeatability in machine learning algorithms.

### What does it mean for results to be non-repeatable?
Non-repeatability means that running the same algorithm multiple times on the same data can produce different outcomes. This is often due to inherent randomness in the algorithm or the data it interacts with.

### Why does this happen?
In our example, the non-repeatability arises from the stochastic nature of the rewards (randomly generated based on a normal distribution) and the epsilon-greedy algorithm's exploration step, which randomly selects an arm with probability \(\epsilon\).
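
The two sources of randomness can be separated in a small sketch. It assumes NumPy's `default_rng` generators instead of the global `np.random` functions used in this notebook, and the names `epsilon_greedy_two_streams`, `policy_rng`, `reward_rng`, and `run` are illustrative only. One generator drives the exploration decisions and the other drives the rewards, so either source can be varied while the other is held fixed.

```python
import numpy as np

def epsilon_greedy_two_streams(true_means, policy_rng, reward_rng,
                               epsilon=0.1, n_rounds=100):
    """Epsilon-greedy with separate random streams for exploration and rewards."""
    n_arms = len(true_means)
    estimates = np.zeros(n_arms)
    pulls = np.zeros(n_arms, dtype=int)
    total = 0.0
    for _ in range(n_rounds):
        if policy_rng.random() < epsilon:
            arm = int(policy_rng.integers(0, n_arms))   # exploration randomness
        else:
            arm = int(np.argmax(estimates))
        reward = reward_rng.normal(true_means[arm], 1.0)  # reward randomness
        pulls[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
        total += reward
    return total

def run(policy_seed, reward_seed):
    return epsilon_greedy_two_streams(
        [0.1, 0.5, 0.8],
        np.random.default_rng(policy_seed),
        np.random.default_rng(reward_seed),
    )

baseline = run(0, 0)
same = run(0, 0)          # both streams fixed: identical total
new_rewards = run(0, 1)   # only the reward noise changes
new_policy = run(1, 0)    # only the exploration decisions change
print(baseline, same, new_rewards, new_policy)
```

With both seeds fixed the total is reproduced exactly. Changing the reward seed changes the total, and changing the policy seed will typically change it too, because different exploration steps lead to different arms being pulled.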

### How can we mitigate this issue?
One common approach is to fix a random seed at the start of each run. The random number generator then produces the same sequence of numbers every time, so the exploration choices and the sampled rewards, and therefore the total reward, are identical across runs.
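
As a minimal sketch of seeding, assuming NumPy's `default_rng` API rather than the global `np.random.seed`, the example below passes an explicit generator into a run. The function `noisy_rewards` and its uniformly random arm choice are simplifications made up for illustration, not the notebook's algorithm.

```python
import numpy as np

def noisy_rewards(means, n_rounds, rng):
    """Pull a uniformly random arm each round and return the total reward."""
    total = 0.0
    for _ in range(n_rounds):
        arm = rng.integers(0, len(means))      # random arm choice
        total += rng.normal(means[arm], 1.0)   # noisy reward for that arm
    return total

means = [0.1, 0.5, 0.8]
run_a = noisy_rewards(means, 100, np.random.default_rng(42))
run_b = noisy_rewards(means, 100, np.random.default_rng(42))  # same seed as run_a
run_c = noisy_rewards(means, 100, np.random.default_rng(7))   # different seed
print(run_a == run_b, run_a == run_c)
```

Passing the generator explicitly keeps the seed local to the run, so other code that uses NumPy's global random state is unaffected.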

### Real-world Scenario
Imagine you're a casino owner using a Multi-Armed Bandit algorithm to optimize the layout of slot machines. Non-repeatability could mean that the algorithm suggests different layouts each time it's run, making it challenging to identify the most effective arrangement.

## Exercise 1: Set a Random Seed

Modify the epsilon-greedy algorithm to include a random seed. Run the algorithm multiple times and observe the results.

## Exercise 2: Real-world Application

Discuss a real-world scenario where non-repeatability could be problematic and how setting a random seed could mitigate the issue.

## Exercise 3: Evaluate the Importance

Evaluate the importance of repeatability in machine learning algorithms. Discuss its advantages and disadvantages.

## Code Explanation

In the code above, we implemented an epsilon-greedy algorithm to solve a Multi-Armed Bandit problem with three arms having different true means of 0.1, 0.5, and 0.8. We ran the algorithm 10 times and observed different total rewards each time.

### Key Components:
- `pull_arm(mean)`: Simulates pulling an arm with a given mean reward. It returns a reward sampled from a normal distribution centered at the given mean.
- `epsilon_greedy(true_means, epsilon=0.1, n_rounds=100)`: Implements the epsilon-greedy algorithm. It takes the true means of the arms, epsilon value for exploration, and the number of rounds to run.

### Output:
The output is a list of the total rewards obtained in 10 separate runs of the epsilon-greedy algorithm. The totals differ from run to run, which demonstrates the non-repeatability of the algorithm.

### Why Are The Results Not Repeatable?
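The results are not repeatable because the problem is stochastic in two places. Each time we pull an arm, the reward is a random sample from a normal distribution, and with probability `epsilon` the algorithm picks an arm at random instead of the one with the best current estimate. Both draws differ from run to run unless the random number generator is seeded.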

In [None]:
# Solution to Exercise 1: Set a Random Seed

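def epsilon_greedy_with_seed(true_means, epsilon=0.1, n_rounds=100, seed=None):
    # Re-seeding on every call resets NumPy's global RNG, so each call with the
    # same seed replays exactly the same sequence of random numbers.
    if seed is not None:
        np.random.seed(seed)
    n_arms = len(true_means)
    estimated_means = [0.0] * n_arms
    n_pulls = [0] * n_arms
    rewards = []
    for _ in range(n_rounds):
        if np.random.rand() < epsilon:
            arm = np.random.randint(0, n_arms)  # explore
        else:
            arm = np.argmax(estimated_means)  # exploit
        reward = pull_arm(true_means[arm])
        rewards.append(reward)
        n_pulls[arm] += 1
        # Incremental update of the running mean for the pulled arm
        estimated_means[arm] = ((n_pulls[arm] - 1) * estimated_means[arm] + reward) / n_pulls[arm]
    return np.sum(rewards)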

# Run the epsilon-greedy algorithm with a seed multiple times
results_with_seed = [epsilon_greedy_with_seed(true_means, seed=42) for _ in range(10)]
results_with_seed

[88.48384229815011,
 88.48384229815011,
 88.48384229815011,
 88.48384229815011,
 88.48384229815011,
 88.48384229815011,
 88.48384229815011,
 88.48384229815011,
 88.48384229815011,
 88.48384229815011]

## Real-World Analogy

Imagine you're a basketball player practicing free throws. Even if you've mastered the technique, not every shot will go in. Various factors like slight changes in your posture, grip, or even air resistance can affect the outcome. This is similar to the non-repeatability we observe in algorithms. Just like each free throw has an element of randomness, each 'pull' in our Multi-Armed Bandit problem is influenced by randomness.

## Exercises

1. **Set a Random Seed**: Modify the code to set a random seed before running the epsilon-greedy algorithm. Observe if the results become repeatable.
2. **Change the Number of Rounds**: Run the epsilon-greedy algorithm for different numbers of rounds (e.g., 50, 200, 500). How does this affect the variability of the results?
3. **Use a Different Exploration Strategy**: Replace epsilon-greedy with a different exploration strategy like UCB (Upper Confidence Bound). Compare the repeatability of the results.

In [None]:
# Exercise 1: Set a Random Seed

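# Seed NumPy's global RNG once for reproducibility
np.random.seed(42)

# Run the epsilon-greedy algorithm ten times after seeding once. The runs draw
# successive numbers from the same stream, so they differ from one another,
# but re-running this cell reproduces the same list.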
results_with_seed = [epsilon_greedy(true_means) for _ in range(10)]
results_with_seed

[88.48384229815011,
 61.440879293419556,
 64.08853701732512,
 41.29795330063085,
 73.94279662843427,
 86.0942596672609,
 81.89335062649266,
 62.35298983751934,
 45.39092135943944,
 51.247461656612785]

## Solution Evaluation

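Seeding once before the loop makes the list as a whole repeatable: re-running the cell returns the same ten totals. The ten runs still differ from one another, because each run continues from where the previous one left the random stream. Only the first total (about 88.48) matches the value that `epsilon_greedy_with_seed(true_means, seed=42)` returned on every call, since that function re-seeds at the start of each run.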

### Solution to Exercise 2: Real-world Application

In clinical trials, non-repeatability could be a significant issue. If the analysis of the same experiment gives different answers each time it is run, it is hard to draw a conclusive result. Setting a random seed makes the random assignment of patients to groups reproducible, so the allocation can be audited and the analysis re-run exactly. It does not remove the genuine variability in how patients respond.

### Solution to Exercise 3: Evaluate the Importance

Repeatability is crucial in machine learning for debugging, comparison of different algorithms, and for the deployment of models into production. However, it can also mask the model's sensitivity to initial conditions, which might be an important factor in some applications.

## Exercise 1 Solution and Evaluation

In the first exercise, we set a random seed with `np.random.seed(42)` in two ways: inside the function at the start of every run (`epsilon_greedy_with_seed`), and once before a loop of ten runs of `epsilon_greedy`. The idea was to check whether setting a random seed makes the results repeatable.

### Observations:
- When the seed is set inside the function, every call resets the generator, so all ten runs return the identical total reward.
- When the seed is set once before the loop, only the first run matches that value. Later runs continue from wherever the random stream left off and therefore differ from one another, although the list as a whole is reproduced each time the cell is re-run. To make individual runs identical, the seed must be reset before each run.
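
The difference between seeding once and re-seeding before every run can be sketched with a stand-in for a bandit run. The helper `total_noise` is hypothetical and simply sums draws from NumPy's global generator, which is the same generator the notebook's functions use.

```python
import numpy as np

def total_noise(n_draws=100):
    """Stand-in for one bandit run: sums draws from the global NumPy RNG."""
    return float(np.sum(np.random.normal(0.5, 1.0, size=n_draws)))

# Seed once: the runs differ from each other, but the whole list is reproducible.
np.random.seed(42)
seed_once_first = [total_noise() for _ in range(3)]
np.random.seed(42)
seed_once_again = [total_noise() for _ in range(3)]

# Re-seed before every run: every run is identical.
reseeded = []
for _ in range(3):
    np.random.seed(42)
    reseeded.append(total_noise())

print(seed_once_first == seed_once_again)  # the list is reproduced
print(len(set(seed_once_first)))           # number of distinct values within it
print(len(set(reseeded)))                  # number of distinct values when re-seeding
```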

### Importance:
Setting a random seed is crucial when we need to reproduce the results, especially in scientific research or when debugging algorithms.

### Drawbacks:
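While setting a random seed ensures repeatability, a single fixed seed shows only one realization of the randomness. Conclusions or hyperparameters tuned to that one realization may not hold for other seeds, which is a form of overfitting to a particular set of random conditions.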
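
One way to keep repeatability without relying on a single seed is to run over a fixed list of seeds and report summary statistics. The sketch below assumes NumPy's `default_rng`; the function name `epsilon_greedy_total` and the choice of 20 seeds are illustrative, not part of the notebook.

```python
import numpy as np

def epsilon_greedy_total(true_means, seed, epsilon=0.1, n_rounds=100):
    """Total reward of one epsilon-greedy run with its own seeded generator."""
    rng = np.random.default_rng(seed)
    n_arms = len(true_means)
    estimates = np.zeros(n_arms)
    pulls = np.zeros(n_arms, dtype=int)
    total = 0.0
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            arm = int(rng.integers(0, n_arms))
        else:
            arm = int(np.argmax(estimates))
        reward = rng.normal(true_means[arm], 1.0)
        pulls[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
        total += reward
    return total

seeds = range(20)  # fixed list of seeds, so the whole experiment is repeatable
totals = np.array([epsilon_greedy_total([0.1, 0.5, 0.8], s) for s in seeds])
print(f"mean={totals.mean():.2f}, std={totals.std(ddof=1):.2f}, "
      f"min={totals.min():.2f}, max={totals.max():.2f}")
```

Each individual run can still be replayed from its seed, while the reported mean and spread describe behaviour across seeds.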

In [None]:
# Exercise 2: Change the Number of Rounds

# Run the epsilon-greedy algorithm for different numbers of rounds
results_50_rounds = epsilon_greedy(true_means, n_rounds=50)
results_200_rounds = epsilon_greedy(true_means, n_rounds=200)
results_500_rounds = epsilon_greedy(true_means, n_rounds=500)

results_50_rounds, results_200_rounds, results_500_rounds

(23.967846213084215, 101.0192907757666, 375.0666351658136)

## Exercise 2 Solution and Evaluation

In the second exercise, we ran the epsilon-greedy algorithm for different numbers of rounds: 50, 200, and 500.

### Observations:
- The total reward increased as the number of rounds increased. This is expected as more rounds give more opportunities to accumulate rewards.
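- The spread of the total reward across runs should also grow in absolute terms with more rounds, because every additional pull adds another noisy reward. A single run per setting, as above, cannot show this; measuring it requires repeating each setting many times.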
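
To measure variability rather than infer it from one run per setting, each setting can be repeated many times. The following is a sketch under assumptions: it uses `default_rng` generators, a hypothetical helper `epsilon_greedy_total`, and an arbitrary 100 repetitions per setting.

```python
import numpy as np

def epsilon_greedy_total(true_means, n_rounds, rng, epsilon=0.1):
    """Total reward of one epsilon-greedy run driven by the generator `rng`."""
    n_arms = len(true_means)
    estimates = np.zeros(n_arms)
    pulls = np.zeros(n_arms, dtype=int)
    total = 0.0
    for _ in range(n_rounds):
        explore = rng.random() < epsilon
        arm = int(rng.integers(0, n_arms)) if explore else int(np.argmax(estimates))
        reward = rng.normal(true_means[arm], 1.0)
        pulls[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
        total += reward
    return total

true_means = [0.1, 0.5, 0.8]
n_repeats = 100  # arbitrary; larger values give steadier estimates
summary = {}
for n_rounds in (50, 200, 500):
    totals = np.array([
        epsilon_greedy_total(true_means, n_rounds, np.random.default_rng(seed))
        for seed in range(n_repeats)
    ])
    mean, std = totals.mean(), totals.std(ddof=1)
    summary[n_rounds] = (mean, std)
    print(f"n_rounds={n_rounds:3d}  mean={mean:7.2f}  std={std:6.2f}  std/mean={std / mean:.3f}")
```

The `std` column gives the absolute spread and `std/mean` the spread relative to the typical total, which are worth reading separately because they need not move in the same direction.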

### Importance:
Understanding the impact of the number of rounds on the results is crucial for tuning the algorithm for specific applications. For instance, in a real-world marketing campaign, choosing the number of rounds could be analogous to deciding the duration of the campaign.

### Drawbacks:
Increasing the number of rounds raises the computational cost and draws more random numbers, so unseeded totals can differ by larger absolute amounts. With a fixed seed, a run stays repeatable regardless of its length.

In [None]:
# Exercise 3: Use a Different Exploration Strategy (UCB)

def upper_confidence_bound(true_means, n_rounds=100):
    n_arms = len(true_means)
    estimated_means = [0.0] * n_arms
    n_pulls = [0] * n_arms
    rewards = []
    for t in range(1, n_rounds + 1):
        # UCB index: estimated mean plus an exploration bonus that shrinks as an
        # arm is pulled more often; 1e-5 avoids division by zero for unpulled arms.
        ucb_values = [mean + np.sqrt(2 * np.log(t) / (n + 1e-5)) for mean, n in zip(estimated_means, n_pulls)]
        arm = np.argmax(ucb_values)  # deterministic given the rewards seen so far
        reward = pull_arm(true_means[arm])
        rewards.append(reward)
        n_pulls[arm] += 1
        estimated_means[arm] = ((n_pulls[arm] - 1) * estimated_means[arm] + reward) / n_pulls[arm]
    return np.sum(rewards)

# Run the UCB algorithm multiple times
results_ucb = [upper_confidence_bound(true_means) for _ in range(10)]
results_ucb

[75.82687810129202,
 48.91150961292058,
 69.1415830867102,
 60.925020983703924,
 59.54970262837014,
 86.00877557411191,
 60.38155487583608,
 67.31672355697116,
 59.54936767337836,
 73.10501711713603]

## Exercise 3 Solution and Evaluation

In the third exercise, we implemented the Upper Confidence Bound (UCB) algorithm as an alternative exploration strategy.

### Observations:
- Similar to the epsilon-greedy algorithm, the UCB algorithm also produced different total rewards in multiple runs.
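- In these ten runs the UCB totals averaged about 66, against about 60 for the first batch of ten epsilon-greedy runs. The seeded epsilon-greedy batch, however, also averaged about 66. With so few runs and a spread this wide, the gap is suggestive rather than conclusive.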
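
A fairer comparison repeats both strategies over the same list of seeds and reports a standard error. This is a sketch under assumptions, not the notebook's code: it uses `default_rng` generators, the helper names `run_bandit`, `epsilon_greedy_choice`, and `ucb_choice` are made up for illustration, and 200 seeds is an arbitrary choice.

```python
import numpy as np

def run_bandit(true_means, choose_arm, rng, n_rounds=100):
    """Run one bandit episode; `choose_arm(t, estimates, pulls, rng)` picks the arm."""
    n_arms = len(true_means)
    estimates = np.zeros(n_arms)
    pulls = np.zeros(n_arms, dtype=int)
    total = 0.0
    for t in range(1, n_rounds + 1):
        arm = choose_arm(t, estimates, pulls, rng)
        reward = rng.normal(true_means[arm], 1.0)
        pulls[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
        total += reward
    return total

def epsilon_greedy_choice(t, estimates, pulls, rng, epsilon=0.1):
    if rng.random() < epsilon:
        return int(rng.integers(0, len(estimates)))
    return int(np.argmax(estimates))

def ucb_choice(t, estimates, pulls, rng):
    # Same index as the UCB cell above; rng is unused because the choice is deterministic.
    bonus = np.sqrt(2 * np.log(t) / (pulls + 1e-5))
    return int(np.argmax(estimates + bonus))

true_means = [0.1, 0.5, 0.8]
seeds = range(200)
eg = np.array([run_bandit(true_means, epsilon_greedy_choice, np.random.default_rng(s)) for s in seeds])
ucb = np.array([run_bandit(true_means, ucb_choice, np.random.default_rng(s)) for s in seeds])
for name, totals in (("epsilon-greedy", eg), ("UCB", ucb)):
    sem = totals.std(ddof=1) / np.sqrt(len(totals))
    print(f"{name:15s} mean={totals.mean():.2f} +/- {sem:.2f} (standard error)")
```

If the two means differ by much more than their standard errors, the difference is unlikely to be an artifact of the particular runs.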

### Importance:
Choosing the right exploration strategy is crucial in real-world applications. For example, in online advertising, using a more effective strategy like UCB could lead to higher click-through rates.

### Drawbacks:
UCB computes a confidence bound for every arm in every round, which adds some per-round work compared with epsilon-greedy and grows with the number of arms. This could be a limitation in real-time applications.

In [None]:
# Exercise 3 (variant): UCB with a tunable exploration coefficient c
# Note: this redefines upper_confidence_bound from the previous cell.

def upper_confidence_bound(true_means, c=2, n_rounds=100):
    n_arms = len(true_means)
    estimated_means = [0.0] * n_arms
    n_pulls = [0] * n_arms
    rewards = []
    for t in range(1, n_rounds + 1):
        # Estimated mean plus an exploration bonus scaled by c; the 1e-5 term
        # avoids division by zero for arms that have not been pulled yet.
        confidence_bounds = [
            estimated_means[i] + c * np.sqrt(np.log(t) / (n_pulls[i] + 1e-5))
            for i in range(n_arms)
        ]
        arm = np.argmax(confidence_bounds)
        reward = pull_arm(true_means[arm])
        rewards.append(reward)
        n_pulls[arm] += 1
        estimated_means[arm] = ((n_pulls[arm] - 1) * estimated_means[arm] + reward) / n_pulls[arm]
    return np.sum(rewards)

# Run the UCB algorithm multiple times
results_ucb = [upper_confidence_bound(true_means) for _ in range(10)]
results_ucb

[50.895947479333145,
 66.89923355354013,
 64.89170312431912,
 64.4205496064923,
 50.87903969171735,
 75.8236562776429,
 56.72118731921274,
 82.73698944945083,
 67.87962937597388,
 72.14891122745223]

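## Exercise 3 Solution and Evaluation (Variant with Coefficient `c`)

This second version of the UCB cell exposes the exploration coefficient `c` (default 2) and replaces the earlier definition of `upper_confidence_bound`.

### Observations:
- The UCB algorithm again produced varying results across runs, so it is not inherently repeatable. Its arm selection is deterministic given the rewards observed so far, which means all of the variation comes from the reward noise.
- The ten totals averaged about 65, compared with about 60 for the first batch of epsilon-greedy runs. Ten runs are too few to establish that one strategy is better.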

### Importance:
Choosing the right exploration strategy is crucial for the performance of the algorithm. UCB is often preferred when the scale of the reward noise is roughly known, because the size of its exploration bonus should be matched to that scale.
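
For reference, the two UCB cells in this notebook score each arm as written below, where \(t\) is the current round, \(n_i\) is the number of pulls of arm \(i\) so far, and \(\hat{\mu}_i\) is its estimated mean. The superscripts (1) and (2) are labels introduced here to tell the two cells apart.

```latex
% First UCB cell
\[
\mathrm{UCB}^{(1)}_i(t) = \hat{\mu}_i + \sqrt{\frac{2 \ln t}{n_i + 10^{-5}}}
\]

% Second UCB cell, with exploration coefficient c (default c = 2)
\[
\mathrm{UCB}^{(2)}_i(t) = \hat{\mu}_i + c \sqrt{\frac{\ln t}{n_i + 10^{-5}}}
\]
```

The two indices coincide when \(c = \sqrt{2}\). With the default \(c = 2\), the second cell's exploration bonus is \(\sqrt{2}\) times larger, so it explores somewhat more.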

### Drawbacks:
UCB can be slightly more expensive computationally, since it evaluates a confidence bound for every arm in every round. Also, like epsilon-greedy, it is not immune to non-repeatability: the rewards are stochastic, so unseeded runs still differ.