# background


we are given:

Data: `data.pkl` → `train_set` (10,000 games), `test_set` (1,000 games). Boolean/0-1 outcomes.
For each scenario we also have the prophets' predictions:
```
scenario_five_prophets.pkl
scenario_one_and_two_prophets.pkl
scenario_six_prophets.pkl
scenario_three_and_four_prophets.pkl
```

we will use the game data and the prophets' (i.e. models') predictions to calculate statistics on the predictors

important terms describing a model (a prophet):

True risk: the ground-truth probability that the prophet is wrong on the **true** population, not on the train or test set. Lower is better.



The ERM algorithm simply chooses the prophet with the fewest training errors
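
In symbols, a sketch of the quantities used in this notebook. The notation is an assumption introduced here for illustration: $\mathcal{H}$ is the set of prophets, $L_S$ the error on a training set $S$ of $m$ games, and $L_D$ the true risk. The approximation and estimation errors are written the way the `evaluate_prophet` helper computes them.

```latex
\hat{h} = \arg\min_{h \in \mathcal{H}} L_S(h),
\qquad
L_S(h) = \frac{1}{m} \sum_{i=1}^{m} \mathbb{1}\left[ h(x_i) \neq y_i \right]

\epsilon_{\mathrm{app}} = \min_{h \in \mathcal{H}} L_D(h),
\qquad
\epsilon_{\mathrm{est}} = L_D(\hat{h}) - \min_{h \in \mathcal{H}} L_D(h)
```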

## coding conventions:

when sampling games or prophets, sample uniformly with seed 3141
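
A minimal sketch of what this convention could look like in code. The helper name `sample_game_indices` and the constant names are hypothetical, and sampling with replacement is an assumption; the point is only that re-seeding with 3141 makes every draw reproducible.

```python
import numpy as np

SEED = 3141        # seed named in the convention above
N_GAMES = 10_000   # size of the training set in data.pkl


def sample_game_indices(n_samples, n_games=N_GAMES, seed=SEED):
    """Draw game indices uniformly at random (with replacement), reproducibly.

    Hypothetical helper, not part of the assignment code.
    """
    np.random.seed(seed)
    return np.random.randint(0, n_games, size=n_samples)


first = sample_game_indices(5)
second = sample_game_indices(5)
print(first)
print(np.array_equal(first, second))  # same seed, so the two draws match
```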



## examine data and prophets


In [None]:
import pickle

# Load the game data
with open('data.pkl', 'rb') as f:
    data = pickle.load(f)


with open('scenario_one_and_two_prophets.pkl', 'rb') as f:
    scenario_one_two = pickle.load(f)



In [None]:
data.keys()

dict_keys(['train_set', 'test_set'])

In [None]:
for key in data.keys():
    print(f'{key} with shape {data[key].shape=}')

train_set with shape data[key].shape=(10000,)
test_set with shape data[key].shape=(1000,)


In [None]:

for key in scenario_one_two.keys():
    print(f'{key} with shape {scenario_one_two[key].shape=}')

train_set with shape scenario_one_two[key].shape=(2, 10000)
test_set with shape scenario_one_two[key].shape=(2, 1000)
true_risk with shape scenario_one_two[key].shape=(2,)


In [None]:

scenario_one_two['true_risk']


array([0.2, 0.4])

let's summarise the data:

game data: a dict named `data` with keys `dict_keys(['train_set', 'test_set'])`, each holding a bool array of game results:
```
train_set with shape data[key].shape=(10000,)
test_set with shape data[key].shape=(1000,)
```

prophet data:
a dict `scenario_one_two` with keys: `dict_keys(['train_set', 'test_set', 'true_risk'])`
```
train_set with shape scenario_one_two[key].shape=(2, 10000) # bool
test_set with shape scenario_one_two[key].shape=(2, 1000)
true_risk with shape scenario_one_two[key].shape=(2,) # float
```

so in scenarios 1 and 2 there are 2 prophets
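
To make the layout concrete, here is a sketch that builds synthetic arrays with the same shapes. Everything in it is made up for illustration: the outcomes are fair coin flips rather than the real games, and each synthetic prophet is assumed to copy the outcome and flip it with probability equal to its true risk. It shows why the error measured on a finite set is close to, but not the same as, `true_risk`.

```python
import numpy as np

np.random.seed(3141)

n_train = 10_000
true_risk = np.array([0.2, 0.4])  # same values as scenario_one_two['true_risk']

# Synthetic game outcomes (assumption: fair coin flips, not the real data).
games = np.random.rand(n_train) < 0.5

# Each synthetic prophet flips the true outcome with probability true_risk[k].
flips = np.random.rand(len(true_risk), n_train) < true_risk[:, None]
predictions = games ^ flips  # shape (2, 10000), bool, like 'train_set'

# Fraction of games each prophet gets wrong on this finite sample.
empirical_error = (predictions != games).mean(axis=1)
print(predictions.shape, predictions.dtype)
print(empirical_error)  # close to true_risk, but not identical
```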

# general code:

things that run before all scenarios

In [None]:
import numpy as np
import pickle

np.random.seed(3141)

# Load the game data
with open('data.pkl', 'rb') as f:
    data = pickle.load(f)




In [None]:
## helper funcs

In [None]:
def erm_prophet(games, predictions):
    """Returns index of prophet with lowest error on games.
    
    Args:
        games: array of shape (n,) with true outcomes (boolean/0-1)
        predictions: array of shape (k, n) with k prophets' predictions
    
    Returns:
        int: index of prophet with minimum error
    """
    errors = (predictions != games).sum(axis=1)
    return errors.argmin()
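
A small usage sketch on toy data (the arrays are made up for illustration). It repeats the two lines of `erm_prophet` inline so it runs on its own, and it highlights one detail: `argmin` returns the first index when several prophets tie.

```python
import numpy as np

# Toy data: 3 games, 3 prophets.
games = np.array([True, False, True])
predictions = np.array([
    [True, True, True],     # 1 mistake
    [True, False, True],    # 0 mistakes
    [False, False, False],  # 2 mistakes
])

# Same computation as erm_prophet: count mismatches per prophet, take the minimum.
errors = (predictions != games).sum(axis=1)
print(errors, errors.argmin())

# A single game on which both prophets are right: a tie, resolved to index 0.
one_game = np.array([False])
one_game_preds = np.array([[False], [False]])
tie_errors = (one_game_preds != one_game).sum(axis=1)
print(tie_errors, tie_errors.argmin())
```

In `scenario_one_two`, index 0 is the prophet with the lower true risk (0.2), so ties on a single game are resolved in favour of the best prophet.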


In [None]:
def evaluate_prophet(prophet_data, selected_prophet_idx, game_data):
    """Evaluate approximation and estimation error.
    
    Args:
        prophet_data: dict with 'true_risk' array and 'test_set' predictions
        selected_prophet_idx: index of prophet selected by ERM
        game_data: dict with 'test_set' true outcomes
    
    Returns:
        dict: {'test_set_error', 'approximation_error', 'estimation_error'}
    """
    true_risks = prophet_data['true_risk']
    best_true_risk = true_risks.min()
    selected_true_risk = true_risks[selected_prophet_idx]
    
    approximation_error = best_true_risk
    estimation_error = selected_true_risk - best_true_risk
    
    # Evaluate on test set
    test_predictions = prophet_data['test_set'][selected_prophet_idx]
    test_outcomes = game_data['test_set']
    test_set_error = (test_predictions != test_outcomes).mean()
    
    return {
        'test_set_error': test_set_error,
        'approximation_error': approximation_error,
        'estimation_error': estimation_error
    }
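
A worked sketch of the two error terms for the two-prophet case, using the `true_risk` values shown earlier. It mirrors the arithmetic in `evaluate_prophet` without needing the pickle files; the `rows` list is only there for illustration.

```python
import numpy as np

true_risks = np.array([0.2, 0.4])  # values of scenario_one_two['true_risk']
best_true_risk = true_risks.min()

rows = []
for selected in range(len(true_risks)):
    # Approximation error depends only on the prophet class, not on the selection.
    approximation_error = best_true_risk
    # Estimation error is the extra risk paid for not selecting the best prophet.
    estimation_error = true_risks[selected] - best_true_risk
    rows.append((selected, approximation_error, estimation_error))
    print(f"selected={selected} approx={approximation_error:.2f} est={estimation_error:.2f}")
```

Selecting the best prophet gives an estimation error of zero, which is the condition `bootstrap_erm` uses to count the trials that chose the best model.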


In [None]:
def bootstrap_erm(game_data: dict, prophet_data: dict, n_trials: int, seed: int = 3141, verbose: bool = False) -> dict:
    """Run repeated ERM trials, each on a single random training game, and aggregate the results.

    Each trial samples one training game uniformly (with replacement across
    trials), selects a prophet with erm_prophet, and scores the selection
    with evaluate_prophet.

    Returns a dict with:
        mean_test_error: mean test-set error of the selected prophets
        best_model_count: number of trials that selected the best prophet
            (estimation error 0)
        mean_approximation_error: mean of the approximation errors
        mean_estimation_error: mean of the estimation errors
    """
    np.random.seed(seed)
    results = []
    
    for i in range(n_trials):
        game_idx = np.random.randint(0, len(game_data['train_set']))
        game_result = game_data['train_set'][game_idx:game_idx+1]
        prophet_preds = prophet_data['train_set'][:, game_idx:game_idx+1]
        
        selected = erm_prophet(game_result, prophet_preds)
        eval_result = evaluate_prophet(prophet_data, selected, game_data)
        results.append(eval_result)
    
        if verbose:
            print(
                f"Trial {i+1}: game={game_result[0]}, prophet={selected}, "
                f"preds={prophet_preds[selected]}, "
                f"test_err={eval_result['test_set_error']:.3f}, "
                f"approx_err={eval_result['approximation_error']:.3f}, "
                f"est_err={eval_result['estimation_error']:.3f}"
            )
    
    test_errors = [r['test_set_error'] for r in results]
    estimation_errors = [r['estimation_error'] for r in results]
    approximation_errors = [r['approximation_error'] for r in results]
    
    return {
        'mean_test_error': np.mean(test_errors),
        'best_model_count': sum(e == 0 for e in estimation_errors),
        'mean_approximation_error': np.mean(approximation_errors),
        'mean_estimation_error': np.mean(estimation_errors)
    }

# part 1

Scenario 1: Two prophets, one game.
Load the file `scenario_one_and_two_prophets.pkl`.
You have two prophets, one with a 20% error and another with a 40% error. You decide to evaluate each prophet on a single random game (i.e. a train set of size 1), using the ERM algorithm to choose the best one.
Repeat this experiment 100 times, each time selecting the best prophet using the ERM algorithm. For each experiment:
1. Select a prophet based on the ERM algorithm.
2. Evaluate your selected prophet on the test set and compute its average error.
3. Calculate the approximation and estimation error.

In your report:
• Report the average error of the selected prophets over the experiments.
• In how many experiments did you choose the best prophet?

In [None]:
import numpy as np

np.random.seed(3141)
game_idx = np.random.randint(0, 10000)

game_result = data['train_set'][game_idx]
prophet_predictions = scenario_one_two['train_set'][:, game_idx]

print(f"Game index: {game_idx}")
print(f"Game result: {game_result}")
print(f"Prophet predictions: {prophet_predictions}")

chosen = erm_prophet(game_result, prophet_predictions.reshape(-1, 1))
print(f"ERM chose prophet: {chosen}")


Game index: 3432
Game result: False
Prophet predictions: [False False]
ERM chose prophet: 0


In [None]:

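# evaluate_prophet also needs the game data; show the two risk-based terms
res = evaluate_prophet(prophet_data=scenario_one_two, selected_prophet_idx=0, game_data=data)
res['approximation_error'], res['estimation_error']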

(np.float64(0.2), np.float64(0.0))

In [None]:
bootstrap_erm(data,scenario_one_two,100,verbose=False)

{'mean_test_error': np.float64(0.21396),
 'best_model_count': np.int64(91),
 'mean_approximation_error': np.float64(0.19999999999999996),
 'mean_estimation_error': np.float64(0.018000000000000002)}
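
As a rough sanity check on these numbers, here is a back-of-the-envelope sketch. It rests on two assumptions that are not verified against the data: the two prophets err independently of each other, and ties go to prophet 0 (which is how `argmin` behaves). Treating the 100 trials as independent draws is also an approximation, since they resample one fixed training set.

```python
# Assumptions: independent prophet errors; ties resolved to prophet 0.
risk_best, risk_other = 0.2, 0.4
n_trials = 100

# With one game, ERM picks prophet 1 only if prophet 0 is wrong and prophet 1 is right.
p_pick_worse = risk_best * (1 - risk_other)
p_pick_best = 1 - p_pick_worse

expected_best_count = n_trials * p_pick_best
expected_estimation_error = p_pick_worse * (risk_other - risk_best)
# Binomial standard deviation of the count of "best prophet" selections.
std_best_count = (n_trials * p_pick_best * p_pick_worse) ** 0.5

print(f"P(pick best) = {p_pick_best:.2f}")
print(f"expected best_model_count = {expected_best_count:.0f} +/- {std_best_count:.1f}")
print(f"expected mean estimation error = {expected_estimation_error:.3f}")
```

Under these assumptions the observed `best_model_count` of 91 lies within about one standard deviation of the expected count.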

In [None]:

def Scenario_1():
    """
    Question 1.
    2 Prophets 1 Game.
    You may change the input & output parameters of the function as you wish.
    """
    np.random.seed(3141)
    with open('scenario_one_and_two_prophets.pkl', 'rb') as f:
        scenario_one_two = pickle.load(f)
    res = bootstrap_erm(data,scenario_one_two,100,verbose=False)

    return res

    



In [None]:
def Scenario_2():
    """
    Question 2.
    2 Prophets 10 Games.
    You may change the input & output parameters of the function as you wish.
    """
    ############### YOUR CODE GOES HERE ###############
    pass

In [None]:
def Scenario_3():
    """
    Question 3.
    500 Prophets 10 Games.
    You may change the input & output parameters of the function as you wish.
    """
    ############### YOUR CODE GOES HERE ###############
    pass

In [None]:
def Scenario_4():
    """
    Question 4.
    500 Prophets 1000 Games.
    You may change the input & output parameters of the function as you wish.
    """
    ############### YOUR CODE GOES HERE ###############
    pass

In [None]:
def Scenario_5():
    """
    Question 5.
    School of Prophets.
    You may change the input & output parameters of the function as you wish.
    """
    ############### YOUR CODE GOES HERE ###############
    pass

In [None]:
def Scenario_6():
    """
    Question 6.
    The Bias-Variance Tradeoff.
    You may change the input & output parameters of the function as you wish.
    """
    ############### YOUR CODE GOES HERE ###############
    pass

In [None]:
if __name__ == '__main__':
    

    print(f'Scenario 1 Results:')
    print(Scenario_1())

    print(f'Scenario 2 Results:')
    Scenario_2()

    print(f'Scenario 3 Results:')
    Scenario_3()

    print(f'Scenario 4 Results:')
    Scenario_4()

    print(f'Scenario 5 Results:')
    Scenario_5()

    print(f'Scenario 6 Results:')
    Scenario_6()