# Random Guessing vs. Optimized Guessing

This notebook compares the expected error of uniform random guessing with that of guessing strategies optimized for a known distribution of values.

## Random Guessing

Let's first define a random guessing strategy where we assign equal probability to each possible value.

In [7]:
import numpy as np

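# For random guessing, we assign equal probability to each value.
# `values_and_percentages` (the true distribution, value -> probability) must
# already be defined, so run the cell that defines it before this one.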
possible_values = list(values_and_percentages.keys())
random_guessing = {value: 1/len(possible_values) for value in possible_values}
random_guessing

{0.0: 0.2, 0.25: 0.2, 0.5: 0.2, 0.75: 0.2, 1.0: 0.2}

## Error Calculation for Random Guessing

We'll calculate both the expected absolute error and the expected mean squared error (MSE) for random guessing, measured against the true distribution of values defined below.

In [8]:
values_and_percentages = {
    0.0: 0.05,
    0.25: 0.2,
    0.5: 0.5,
    0.75: 0.2,
    1.0: 0.05
}


## Optimized Guessing

For optimized guessing, we want to minimize the expected error by using our knowledge of the true distribution. The optimal strategy depends on our error metric:
- For absolute error: the median minimizes the expected error
- For MSE: the mean minimizes the expected error
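
As a quick sanity check of these two facts, the sketch below scans a grid of candidate guesses and reports which one minimizes each metric. The helper names (`expected_abs_error`, `expected_sq_error`), the grid resolution, and the skewed example distribution are illustrative assumptions, not part of the notebook; a skewed distribution is used so that the median and the mean differ.

```python
# Hypothetical skewed distribution (value -> probability), chosen so median != mean.
example_distribution = {0.0: 0.6, 0.5: 0.3, 1.0: 0.1}

def expected_abs_error(guess, distribution):
    """Expected |guess - true value| under the distribution."""
    return sum(p * abs(guess - t) for t, p in distribution.items())

def expected_sq_error(guess, distribution):
    """Expected (guess - true value)^2 under the distribution."""
    return sum(p * (guess - t) ** 2 for t, p in distribution.items())

# Candidate guesses on a grid from 0.00 to 1.00 in steps of 0.01.
candidates = [i / 100 for i in range(101)]

best_abs = min(candidates, key=lambda g: expected_abs_error(g, example_distribution))
best_sq = min(candidates, key=lambda g: expected_sq_error(g, example_distribution))

print(f"Guess minimizing absolute error: {best_abs}")
print(f"Guess minimizing squared error: {best_sq}")
```

For this example distribution the scan selects 0.0 (its median) for absolute error and 0.25 (its mean) for squared error.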

In [9]:
# Find the optimal guessing value
def find_median(distribution):
    # Sort values and create cumulative distribution
    values = sorted(distribution.keys())
    cum_prob = 0
    for value in values:
        cum_prob += distribution[value]
        if cum_prob >= 0.5:
            return value
    return values[-1]

def find_mean(distribution):
    return sum(value * prob for value, prob in distribution.items())

# Median minimizes absolute error
median_value = find_median(values_and_percentages)
# Mean minimizes MSE
mean_value = find_mean(values_and_percentages)

print(f"Optimal value for minimizing absolute error (median): {median_value}")
print(f"Optimal value for minimizing MSE (mean): {mean_value}")

Optimal value for minimizing absolute error (median): 0.5
Optimal value for minimizing MSE (mean): 0.5


In [10]:
# Calculate the expected error of a guessing strategy.
# The guess and the true value are drawn independently, so each (guess, true value)
# pair contributes its error weighted by the product of the two probabilities.

def calculate_error_metrics(guessing_strategy, true_distribution):
    # Accumulators for the expected absolute error and the expected squared error (MSE)
    abs_error = 0
    mse = 0
    
    for guess_value, guess_prob in guessing_strategy.items():
        for true_value, true_prob in true_distribution.items():
            # Contribution to expected absolute error
            abs_error += guess_prob * true_prob * abs(guess_value - true_value)
            # Contribution to expected MSE
            mse += guess_prob * true_prob * (guess_value - true_value)**2
    
    return abs_error, mse

# Calculate error for random guessing
random_abs_error, random_mse = calculate_error_metrics(random_guessing, values_and_percentages)

print(f"Random Guessing - Expected Absolute Error: {random_abs_error:.4f}")
print(f"Random Guessing - Expected MSE: {random_mse:.4f}")

Random Guessing - Expected Absolute Error: 0.3400
Random Guessing - Expected MSE: 0.1750
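
To see where the 0.3400 comes from, the sketch below breaks the expected absolute error down by guess value. It restates the distribution from the notebook so the block runs on its own; `per_guess_abs_error` and `uniform_average` are illustrative names.

```python
# True distribution from the notebook (value -> probability), restated so this runs standalone.
values_and_percentages = {0.0: 0.05, 0.25: 0.2, 0.5: 0.5, 0.75: 0.2, 1.0: 0.05}

# Expected absolute error of always guessing one fixed value.
per_guess_abs_error = {
    guess: sum(p * abs(guess - t) for t, p in values_and_percentages.items())
    for guess in values_and_percentages
}

for guess, err in per_guess_abs_error.items():
    print(f"Always guessing {guess}: expected absolute error {err:.4f}")

# Uniform random guessing averages the per-guess errors with equal weight.
uniform_average = sum(per_guess_abs_error.values()) / len(per_guess_abs_error)
print(f"Uniform average: {uniform_average:.4f}")
```

The extreme guesses (0.0 and 1.0) cost 0.5 each, which is what pulls the uniform average well above the 0.15 obtained by always guessing 0.5.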


In [11]:
# Create optimized guessing strategies
optimized_abs_error_strategy = {value: 1.0 if value == median_value else 0.0 for value in possible_values}
optimized_mse_strategy = {value: 1.0 if value == mean_value else 0.0 for value in possible_values}

# If the mean is not one of our possible values, guess the possible value closest to it.
# A randomized mix of the two neighbouring values would match the mean on average,
# but the extra spread of the guess adds to the squared error, so its MSE is higher.
if mean_value not in possible_values:
    closest_value = min(possible_values, key=lambda v: abs(v - mean_value))
    optimized_mse_strategy = {value: 1.0 if value == closest_value else 0.0 for value in possible_values}

# Calculate errors for optimized strategies
optimized_abs_error, _ = calculate_error_metrics(optimized_abs_error_strategy, values_and_percentages)
_, optimized_mse = calculate_error_metrics(optimized_mse_strategy, values_and_percentages)

print(f"Optimized Guessing - Expected Absolute Error: {optimized_abs_error:.4f}")
print(f"Optimized Guessing - Expected MSE: {optimized_mse:.4f}")

Optimized Guessing - Expected Absolute Error: 0.1500
Optimized Guessing - Expected MSE: 0.0500
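
When the mean is not one of the possible values (not the case here), the choice between a randomized mix and a single value matters. Because the guess is drawn independently of the true value, the expected squared error is the variance of the true distribution plus the expected squared distance of the guess from the mean, so randomizing between the two neighbours of the mean costs more than always guessing the nearer one. The sketch below illustrates this on a hypothetical skewed distribution; `expected_mse`, `skewed`, and the example numbers are assumptions for illustration only.

```python
def expected_mse(guessing_strategy, true_distribution):
    """Expected squared error when guess and true value are drawn independently."""
    return sum(
        guess_prob * true_prob * (guess - true_value) ** 2
        for guess, guess_prob in guessing_strategy.items()
        for true_value, true_prob in true_distribution.items()
    )

# Hypothetical skewed distribution; its mean (0.2) is not a possible value.
skewed = {0.0: 0.7, 0.5: 0.2, 1.0: 0.1}
skewed_mean = sum(v * p for v, p in skewed.items())

# Randomized mix of the mean's two neighbours, weighted so the average guess equals the mean.
mixed_strategy = {0.0: 0.6, 0.5: 0.4, 1.0: 0.0}

# Deterministic strategy: always guess the possible value closest to the mean.
closest_value = min(skewed, key=lambda v: abs(v - skewed_mean))
closest_strategy = {v: 1.0 if v == closest_value else 0.0 for v in skewed}

mixed_mse = expected_mse(mixed_strategy, skewed)
closest_mse = expected_mse(closest_strategy, skewed)
print(f"Mixed strategy MSE: {mixed_mse:.4f}")
print(f"Closest-value MSE:  {closest_mse:.4f}")
```

In this example the mixed strategy scores 0.17 and the closest-value strategy 0.15, so the deterministic guess is the better of the two.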


## Comparison

Let's quantify the improvement of optimized guessing over random guessing as the percentage reduction in each error metric.

In [12]:
abs_error_improvement = (random_abs_error - optimized_abs_error) / random_abs_error * 100
mse_improvement = (random_mse - optimized_mse) / random_mse * 100

print(f"Absolute Error Reduction: {abs_error_improvement:.2f}%")
print(f"MSE Reduction: {mse_improvement:.2f}%")

# Summary table
import pandas as pd

results = pd.DataFrame({
    'Strategy': ['Random Guessing', 'Optimized Guessing', 'Improvement (%)'],
    'Absolute Error': [random_abs_error, optimized_abs_error, abs_error_improvement],
    'MSE': [random_mse, optimized_mse, mse_improvement]
})

results

Absolute Error Reduction: 55.88%
MSE Reduction: 71.43%


| | Strategy | Absolute Error | MSE |
|---|---|---|---|
| 0 | Random Guessing | 0.34 | 0.175 |
| 1 | Optimized Guessing | 0.15 | 0.05 |
| 2 | Improvement (%) | 55.882353 | 71.428571 |


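Because the optimal guess under MSE is the mean $\mu$ of the true distribution $P$, the optimized MSE equals the variance of that distribution:

$$\text{MSE}_{\text{opt}} = \sum_t P(t) (t - \mu)^2$$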
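
The formula above can be checked numerically against the 0.0500 reported earlier. The sketch below restates the distribution so it runs on its own; `mu` and `mse_opt` are illustrative names mirroring the symbols in the formula.

```python
# True distribution from the notebook (value -> probability), restated so this runs standalone.
values_and_percentages = {0.0: 0.05, 0.25: 0.2, 0.5: 0.5, 0.75: 0.2, 1.0: 0.05}

# Mean of the true distribution.
mu = sum(t * p for t, p in values_and_percentages.items())

# Probability-weighted squared deviation from the mean, i.e. the variance.
mse_opt = sum(p * (t - mu) ** 2 for t, p in values_and_percentages.items())

print(f"mu = {mu:.4f}, MSE_opt = {mse_opt:.4f}")
```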