In [1]:
import sys
import os
import matplotlib.pyplot as plt
import numpy as np
from collections import defaultdict

def custom_breakpointhook(*args, **kwargs):
    from IPython.core.debugger import set_trace; set_trace(*args, **kwargs)
sys.breakpointhook = custom_breakpointhook

In [2]:
def add(path):
    if path not in sys.path:
        sys.path.append(path)
        
if os.environ.get("USER") == "dsweet2":  # .get() avoids a KeyError when USER is unset
    add("/Users/dsweet2/Projects/yuchen/auction-gym/src")
else:
    add("/Users/yuchenji/PycharmProjects/auction-gym/src")

In [3]:
from main import parse_config, instantiate_agents, instantiate_auction
from ad_auction import AdAuction
from bo_bidder import BOBidder
from const_shading_bidder import ConstShadingBidder

%load_ext autoreload
%autoreload 1
%aimport ad_auction
%aimport bo_bidder
%aimport const_shading_bidder
%aimport main




In [4]:
def evaluate(ad_auction, num_iterations):
    trace = []
    for _ in range(num_iterations):
        r = ad_auction.run_episode()
        trace.append(r)
    return np.array(trace).mean()

# Advantage-E

Bidders get an advantage when they shade their bids, i.e. when they bid a little lower than the ad is really worth, because they save money if they win the auction. If they bid *too* low, however, they lose the auction, which is bad because they want the ad slot.

With `config-advantage-E.json`, our agent shades its bid with `EmpiricalShadedBidder` and the other agents bid truthfully (`TruthfulBidder`).
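
To make the trade-off concrete, here is a minimal sketch of a single auction. It assumes a first-price rule (the highest bid wins and the winner pays its own bid); the function name and the numbers are illustrative and are not taken from auction-gym.

```python
def first_price_utility(value, bid, best_competing_bid):
    """Net utility for one bidder in a single first-price auction.

    Assumption: the highest bid wins and the winner pays its own bid.
    """
    if bid > best_competing_bid:
        return value - bid   # we win and pay what we bid
    return 0.0               # we lose and get nothing


value = 1.0                # what the ad slot is really worth to us
best_competing_bid = 0.6   # hypothetical highest bid among the other agents

for shading in (1.0, 0.8, 0.5):
    bid = shading * value
    utility = first_price_utility(value, bid, best_competing_bid)
    print(f"shading={shading:.1f} bid={bid:.2f} utility={utility:.2f}")
```

Under these assumptions, bidding the full value wins but earns nothing, moderate shading wins with positive utility, and shading too much loses the auction.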

In [5]:
ad_auction = AdAuction("configs/config-advantage-E.json", warm_up_iterations=1000)
print(evaluate(ad_auction, num_iterations=10000))

SEED: 0
0.004258810330345611


The number printed above is our average reward for an episode.

[Actually, it's called "return" when talking about an episode. One step -- in this case, an auction -- yields a reward. When you take many steps in sequence, receiving a reward for each step, you can sum (or average) the rewards and call the result the "return".]

In any case, the *return* here is `[our net utility - mean(other agents' net utility)] / (mean gross utility)`, averaged over the `num_iterations` simulated auctions. The mean gross utility is a mean over all of the agents (including us). It's a measure of the total value generated by the auction -- the total revenue that's up for grabs. That value (i.e., revenue) is split between the bidders and the company that runs the auction.

Since the number returned by `evaluate()` measures our performance relative to the other agents in the auctions, we'll refer to it as our *advantage*.
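
As a worked illustration of the formula, the sketch below computes the advantage for one episode. The function name, argument names, and utility values are hypothetical; the actual computation happens inside `AdAuction.run_episode()`, which is not shown here.

```python
import numpy as np


def advantage(our_net, others_net, all_gross):
    """[our net utility - mean(other agents' net utility)] / mean(gross utility).

    all_gross holds the gross utility of every agent, including us.
    """
    return (our_net - np.mean(others_net)) / np.mean(all_gross)


# Hypothetical utilities for one episode with three other agents
our_net = 0.30
others_net = [0.20, 0.25, 0.15]
all_gross = [1.0, 0.9, 1.1, 1.0]

print(advantage(our_net, others_net, all_gross))
```

When every agent earns the same net utility, the numerator vanishes and the advantage is zero.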


# TruthfulBidder vs. TO/EO/VO

Now it gets interesting. We bid truthfully (`TruthfulBidder`), and the *other* agents bid either:

- TO: Truthfully, also
- EO: Shading with `EmpiricalShadedBidder`
- VO: Shading with `ValueLearningBidder`

In the first case, TO, our expected advantage is exactly zero, since every agent uses the same strategy. The actual number shown below for TO is non-zero because the auction process is noisy. That's why we need to run multiple times -- and that's why real auction experiments take so long to run. (And, guess what, Bayesian optimization was made for problems where evaluation takes a long time and is noisy.)

In the other two cases, EO and VO, we have negative advantage. It's better to shade your bid in a smart way than to bid full price all the time.

## Warm-up

Notice the argument `warm_up_iterations`. It tells `AdAuction` how many times to run the auction before doing any evaluation at all. During the warm-up the learning agents (EO and VO) get a chance to learn the dynamics of the auction. The auction simulator is interesting in that the agents are aware of each other's behavior via the auction, so they are all learning simultaneously: making bids, and learning even more from observing others' bids. If you look at the plots at the bottom of `auction-gym/src/Getting Started with AuctionGym (2. Effects of Bid Shading).ipynb` you'll see that it can take a few iterations -- each of which consists of many auction rounds -- before the plots settle down. This initial period is called a transient, and it's there because the agents take time to learn.

We want to do our evaluations after the transient. That will simulate an engineer creating a new ad-bidding bot and trying to optimize it via experiment in an already-functioning ad auction market. Also, it's after the transient that the other agents are at their best.
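
The effect of a warm-up can be sketched with a toy model. Everything below is hypothetical: `ToyLearningMarket` stands in for an auction whose returns contain a decaying transient, and `evaluate_after_warm_up()` is only a guess at the idea behind `warm_up_iterations`, not the `AdAuction` implementation.

```python
import numpy as np


class ToyLearningMarket:
    """Hypothetical stand-in for an auction whose agents are still learning.

    Each episode's return is a steady-state value plus a transient that
    decays as the agents learn, plus a little noise.
    """

    def __init__(self, rng, steady_state=0.0, transient=1.0, decay=0.9):
        self.rng = rng
        self.steady_state = steady_state
        self.transient = transient
        self.decay = decay

    def run_episode(self):
        r = self.steady_state + self.transient + 0.01 * self.rng.normal()
        self.transient *= self.decay  # the agents have learned a little more
        return r


def evaluate_after_warm_up(market, warm_up_iterations, num_iterations):
    for _ in range(warm_up_iterations):
        market.run_episode()  # warm-up results are discarded
    return np.mean([market.run_episode() for _ in range(num_iterations)])


cold = evaluate_after_warm_up(ToyLearningMarket(np.random.default_rng(0)), 0, 100)
warm = evaluate_after_warm_up(ToyLearningMarket(np.random.default_rng(0)), 200, 100)
print(f"no warm-up: {cold:.3f}   with warm-up: {warm:.3f}")
```

Without a warm-up the average is biased by the transient; with one, it sits close to the steady-state value of zero.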



In [6]:
for ttype in ["TO", "EO", "VO"]:
    ad_auction = AdAuction(f"configs/config-{ttype}.json", warm_up_iterations=1000)
    print(ttype, evaluate(ad_auction, num_iterations=10000))

SEED: 0
TO -0.0005452502092600794
SEED: 0
EO -0.011657106026253928
SEED: 0
VO -0.0035311654468497353


# Plan

The goal is to optimize a bidder with Bayesian optimization. To do that we need:

1. A way to simulate a sequence of experiments. Done. See above.
2. A parameterized agent.
3. A Bayesian optimizer.


This week, work on step 2.

## BOBidder

We want to optimize parameters by observing rewards (or, in our case, "advantages"). When you optimize this way in RL, it's called *policy search*. If you look in the auction-gym codebase you'll find `PolicyLearningBidder`. That will be our starting point for BOBidder.

To get started, set up an evaluation of a hacked copy of `PolicyLearningBidder`:

- Make a copy of `PolicyLearningBidder` in a new file (`bo_bidder.py`) in your Capstone directory.
- Completely remove the method `update()` -- even the signature. You can also remove the code that is referred to as "Option 1". We're only going to use "Option 2". Also, get rid of any references to `gamma` or any other variables that you're not using. `self.model` will be doing the bulk of the work.
- Copy `configs/config-advantage-E.json` to `configs/config-advantage-BO.json`. Inside, replace the `EmpiricalShadedBidder` with `BOBidder`.
- Try to get `evaluate()` to run on your new config file.

It probably won't perform well, but that's because it's not optimized. The first step is just getting it to run.

Next, you'll need to figure out how to get and set the parameters. Create two methods in `BOBidder`:
- `get_parameters(self) -> np.ndarray`, and
- `set_parameters(self, parameters: np.ndarray)`

and make them do the right things.  Fortunately, I have some code lying around that could help. See `ParametersDirect` in `parameters_direct.py`. It should help you get and set the parameters of `self.model`, which is a PyTorch module.
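
The core idea is a round trip between a set of differently shaped arrays and one flat vector. The sketch below shows that round trip with plain NumPy arrays; `FlatParameters` is a hypothetical helper written for illustration, not the `ParametersDirect` class mentioned above. For a real PyTorch module, `torch.nn.utils.parameters_to_vector` and `torch.nn.utils.vector_to_parameters` perform the same kind of conversion.

```python
import numpy as np


class FlatParameters:
    """Hypothetical helper: view a list of arrays as one flat vector."""

    def __init__(self, arrays):
        self.arrays = arrays  # e.g. the weight and bias arrays of a small network

    def get_parameters(self) -> np.ndarray:
        return np.concatenate([a.ravel() for a in self.arrays])

    def set_parameters(self, parameters: np.ndarray) -> None:
        parameters = np.asarray(parameters, dtype=float)
        expected = sum(a.size for a in self.arrays)
        if parameters.shape != (expected,):
            raise ValueError(f"expected shape ({expected},), got {parameters.shape}")
        offset = 0
        for a in self.arrays:
            # Write in place so anything holding a reference to `a` sees the update
            a[...] = parameters[offset:offset + a.size].reshape(a.shape)
            offset += a.size


weights = [np.zeros((2, 3)), np.zeros(3)]
flat = FlatParameters(weights)
flat.set_parameters(np.arange(9))
print(flat.get_parameters())
print(weights[0])
```

A useful check for your own `BOBidder` methods is that `set_parameters(x)` followed by `get_parameters()` returns `x` unchanged.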


---

# BOBidder Test

Can we instantiate it and generate a bid?

In [7]:
rng = np.random.default_rng(17)
bob = BOBidder(rng)
bob.bid(1, None, .123)   # BOBidder does not use context. Maybe later.

0.10433203393220901

Can we run auctions with it?

In [8]:
extra_classes={'BOBidder': BOBidder}

ad_auction = AdAuction(
    "configs/config-advantage-BO.json", warm_up_iterations=10,
    extra_classes=extra_classes  # Let auction_gym know about our new bidder class
)
evaluate(ad_auction, num_iterations=100)

SEED: 0


0.06177991915116345

Now run a full-scale auction evaluation. Our BOBidder will run with randomly-initialized model parameters, and the rest of the bidders will be TruthfulBidders.

In [10]:
ad_auction = AdAuction("configs/config-advantage-BO.json", warm_up_iterations=1000, extra_classes=extra_classes)
print(evaluate(ad_auction, num_iterations=10000))

SEED: 0
0.06151882978200109


Apparently it's even better to shade your bid randomly than to bid truthfully!

## BOBidder vs Constant Bidder

Now create a new class, `ConstShadingBidder` (see `const_shading_bidder.py`), that just shades its bid by a randomly-chosen constant, `gamma`.

We'll have one `BOBidder` compete against a bunch of `ConstShadingBidder`s, and we'll track the advantage of the `BOBidder`.
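
For orientation, a constant-shading bidder could look roughly like the sketch below. This is not the code in `const_shading_bidder.py`. It assumes that `gamma` is drawn uniformly from [0, 1) at construction, that `bid()` takes three positional arguments in the order used by `bob.bid(1, None, .123)` (the argument names here are guesses), and that a truthful bid is `value * estimated_CTR`.

```python
import numpy as np


class ConstShadingSketch:
    """Hypothetical constant-shading bidder, for illustration only."""

    def __init__(self, rng):
        # Assumption: the shading factor is chosen once, uniformly at random
        self.gamma = rng.uniform(0.0, 1.0)

    def bid(self, value, context, estimated_CTR):
        # Assumption: the truthful bid is value * estimated_CTR; scale it by gamma.
        # The context argument is ignored.
        return self.gamma * value * estimated_CTR


rng = np.random.default_rng(17)
bidder = ConstShadingSketch(rng)
print(bidder.gamma, bidder.bid(1.0, None, 0.123))
```

Because `gamma` never changes after construction, such a bidder does not learn, which makes it a stable set of competitors to optimize against.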

In [11]:
extra_classes={'BOBidder': BOBidder, 'ConstShadingBidder': ConstShadingBidder}

In [12]:
ad_auction = AdAuction(
    "configs/config-BO-vs-CO.json",
    warm_up_iterations=1,
    extra_classes=extra_classes,  # We need to tell auction_gym about our new bidders
    seed=17
)
print(evaluate(ad_auction, num_iterations=1000))

SEED: 17
ConstShadingBidder: gamma = 0.47384335650314746
ConstShadingBidder: gamma = 0.5525884678082529
ConstShadingBidder: gamma = 0.924251318419947
ConstShadingBidder: gamma = 0.8131539896897586
ConstShadingBidder: gamma = 0.1339331171426752
ConstShadingBidder: gamma = 0.41476587562765754
ConstShadingBidder: gamma = 0.6757789965345686
ConstShadingBidder: gamma = 0.8117210236657704
ConstShadingBidder: gamma = 0.16072071423238365
ConstShadingBidder: gamma = 0.7012552619418484
-0.23723001678964176


Is the result consistent, even though we don't have a fixed seed (i.e., seed='random')?

In [13]:
print(evaluate(ad_auction, num_iterations=1000))

-0.242905916937121


Yes, both results are around -0.24.
With the same set of competitors (the 10 `ConstShadingBidder` agents) and the same `BOBidder`, we get approximately the same result, even though the auction process has randomness.
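
One way to quantify "approximately the same" is to treat repeated evaluations as samples and report their mean and standard error. The helper below is a minimal sketch (the name `mean_and_stderr` is made up here); it uses the two advantages printed above, rounded to four decimals. With only two samples the standard error is a rough estimate.

```python
import numpy as np


def mean_and_stderr(samples):
    """Mean of the samples and the standard error of that mean."""
    samples = np.asarray(samples, dtype=float)
    return samples.mean(), samples.std(ddof=1) / np.sqrt(len(samples))


# The two advantages printed above, rounded
mean, stderr = mean_and_stderr([-0.2372, -0.2429])
print(f"advantage = {mean:.4f} +/- {stderr:.4f}")
```

Calling `evaluate()` more times and passing all the results to this helper would tighten the estimate.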

# Optimize it

To optimize our `BOBidder` we need to be able to:

- Propose parameters
- Set the parameters in BOBidder
- Evaluate BOBidder with the parameters

You've already implemented the last two steps. We'll implement the first as just random parameter selection to get started. Later, the first step will be a Bayesian optimizer.

First, create an auction, find the `BOBidder`, and display its current parameters.

In [14]:
ad_auction = AdAuction(
    "configs/config-BO-vs-CO.json",
    warm_up_iterations=1,
    extra_classes=extra_classes,  # We need to tell auction_gym about our new bidders
    seed=17
)

bo_bidder = ad_auction.us().bidder
print ("PARAMETERS:", bo_bidder.get_parameters())

SEED: 17
ConstShadingBidder: gamma = 0.47384335650314746
ConstShadingBidder: gamma = 0.5525884678082529
ConstShadingBidder: gamma = 0.924251318419947
ConstShadingBidder: gamma = 0.8131539896897586
ConstShadingBidder: gamma = 0.1339331171426752
ConstShadingBidder: gamma = 0.41476587562765754
ConstShadingBidder: gamma = 0.6757789965345686
ConstShadingBidder: gamma = 0.8117210236657704
ConstShadingBidder: gamma = 0.16072071423238365
ConstShadingBidder: gamma = 0.7012552619418484
PARAMETERS: [-0.01156155 -0.00655526 -0.06522226  0.22283927  0.00949153 -0.13210738
  0.06465603  0.07124017 -0.11474831  0.0535613   0.00689558  0.08203357
  0.07121955 -0.00084424 -0.04082829  0.09103727  0.03131992 -0.14892773
 -0.05448692 -0.04833124 -0.02621545  0.22842921 -0.03772873  0.04773367]


Next, set the parameters to random values.

In [15]:
num_params = len(bo_bidder.get_parameters())
bo_bidder.set_parameters( .1*np.random.normal(size=(num_params,)) )

Now evaluate the `BOBidder` with the randomly-chosen parameters.

In [16]:
print(evaluate(ad_auction, num_iterations=1000))

-0.21485791372776306


To demonstrate optimization, we'll write a simple optimizer that just randomly perturbs the best parameters found so far, over and over, and keeps track of which parameter set performs best.

In [24]:
# Hyperparameters of our simple optimizer
num_rounds = 100
eps = .1

# Initialize the parameters, x_best, to all zeros.

x_best = np.zeros(shape=(num_params,))
y_best = -1e99


In [19]:
print (f"num_params = {num_params}")
x = x_best
for _ in range(num_rounds):
    # Evaluate the current x
    bo_bidder.set_parameters(x)
    y = evaluate(ad_auction, num_iterations=1000)
    
    # Keep track of the best so far
    if y > y_best:
        y_best = y
        x_best = x
    
    # Propose a new x that is a small perturbation
    #  of the best x.
    x = x_best + eps*np.random.normal(size=(num_params,))
    print (f"EVAL: y_best = {y_best:.4f} y = {y:.4f} x = {x[0]:.2f}, {x[1]:.2f}, ...")

num_params = 24
EVAL: y_best = -0.2193 y = -0.2193 x = -0.08, 0.01, ...
EVAL: y_best = -0.2043 y = -0.2043 x = -0.17, -0.07, ...
EVAL: y_best = -0.2043 y = -0.2437 x = -0.04, 0.08, ...
EVAL: y_best = -0.2043 y = -0.2808 x = -0.02, -0.05, ...
EVAL: y_best = -0.2043 y = -0.2435 x = -0.29, -0.01, ...
EVAL: y_best = -0.2043 y = -0.2410 x = -0.15, -0.04, ...
EVAL: y_best = -0.2043 y = -0.2337 x = -0.04, 0.09, ...
EVAL: y_best = -0.2043 y = -0.2597 x = -0.08, -0.07, ...
EVAL: y_best = -0.2043 y = -0.2588 x = 0.04, -0.15, ...
EVAL: y_best = -0.2043 y = -0.2414 x = 0.11, -0.04, ...
EVAL: y_best = -0.2043 y = -0.2244 x = -0.07, 0.01, ...
EVAL: y_best = -0.2043 y = -0.2116 x = -0.25, -0.11, ...
EVAL: y_best = -0.2043 y = -0.2699 x = -0.13, 0.01, ...
EVAL: y_best = -0.2043 y = -0.2589 x = -0.14, 0.01, ...
EVAL: y_best = -0.1798 y = -0.1798 x = -0.06, 0.01, ...
EVAL: y_best = -0.1798 y = -0.2425 x = -0.03, 0.14, ...
EVAL: y_best = -0.1798 y = -0.2217 x = -0.21, 0.04, ...
EVAL: y_best = -0.1798 y =

That's not bad for optimization by random perturbations.
Can we do better with Bayesian optimization? We'll find out.

# Bayesian Optimization

Please read through the [documentation](https://scikit-optimize.github.io/stable/auto_examples/bayesian-optimization.html) for scikit-optimize (`skopt`), a Bayesian optimization library, and see if you can apply `gp_minimize()` to this problem.

NB: The algorithm `gp_minimize()` minimizes -- hence the name :) -- but we want to *maximize* the output of `evaluate()`, so just pass the negated advantage, `-evaluate()`, to `gp_minimize()`, and it'll effectively perform a maximization.
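
The negation trick works with any minimizer. The sketch below uses a hypothetical `random_search_minimize()` as a stand-in so that it runs without scikit-optimize; it is plain random search, not Bayesian optimization, and its signature is invented for this example rather than copied from `gp_minimize()`.

```python
import random


def random_search_minimize(func, bounds, n_calls, seed=0):
    """Hypothetical stand-in for a minimizer such as gp_minimize().

    Samples points uniformly within bounds and returns the point with the
    smallest func value, together with that value.
    """
    rng = random.Random(seed)
    best_x, best_y = None, float("inf")
    for _ in range(n_calls):
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        y = func(x)
        if y < best_y:
            best_x, best_y = x, y
    return best_x, best_y


def toy_advantage(x):
    # Toy function we want to MAXIMIZE; its peak is 1.0 at x = [0.3]
    return 1.0 - (x[0] - 0.3) ** 2


# Minimize the negated function, then flip the sign of the result back
x_best, neg_y_best = random_search_minimize(
    lambda x: -toy_advantage(x), [(-1.0, 1.0)], n_calls=200
)
y_best = -neg_y_best
print(f"x_best = {x_best[0]:.2f}, best advantage = {y_best:.3f}")
```

Note the second sign flip at the end: the minimizer reports the smallest negated value, so it has to be negated again to recover the best advantage.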



In [19]:
pip install scikit-optimize

Collecting scikit-optimize
  Downloading scikit_optimize-0.9.0-py2.py3-none-any.whl (100 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.3/100.3 kB 4.9 MB/s eta 0:00:00
Collecting pyaml>=16.9 (from scikit-optimize)
  Downloading pyaml-23.9.7-py3-none-any.whl.metadata (11 kB)
Downloading pyaml-23.9.7-py3-none-any.whl (23 kB)
Installing collected packages: pyaml, scikit-optimize
Successfully installed pyaml-23.9.7 scikit-optimize-0.9.0
Note: you may need to restart the kernel to use updated packages.


In [17]:
from skopt import gp_minimize
from skopt.space import Real
from skopt.utils import use_named_args
import numpy as np

In [21]:
num_params = 24
param_space = [Real(-1.0, 1.0, name=f'param{i}') for i in range(num_params)]

best_advantage = None
best_round = 0
num_calls = 0

@use_named_args(param_space)
def objective(**params):
    global best_advantage, best_round, num_calls
    num_calls += 1

    # Convert parameter dictionary to array
    param_values = np.array(list(params.values()))
    
    # Update BOBidder's parameters
    bo_bidder.set_parameters(param_values)
 
    # Run the auction and compute the advantage
    advantage = evaluate(ad_auction, num_iterations=1000)
    print(f"Evaluating with params: {param_values}")

    # Track the highest advantage seen so far and the call that produced it.
    # (`result` does not exist until gp_minimize() returns, so count calls here.)
    if best_advantage is None or advantage > best_advantage:
        best_advantage = advantage
        best_round = num_calls

    # Since gp_minimize minimizes the function, return the negative advantage
    return -advantage

# Perform Bayesian Optimization
result = gp_minimize(objective, param_space, n_calls=100, random_state=17)

# Best parameter values and advantage found
print("Best parameters: {}".format(result.x))
print("Best advantage: {}".format(best_advantage))
print("Best advantage achieved in round: {}".format(best_round))

Evaluating with params: [-0.63644446  0.68846768  0.77772965  0.61364266 -0.1795535  -0.79401228
 -0.560998    0.25951523  0.71984816 -0.91745487  0.48513244  0.14083472
 -0.93137361 -0.56897357 -0.02486197 -0.45458829 -0.76187313 -0.40073184
 -0.24429188 -0.9977522  -0.04473638  0.04934259 -0.90708559 -0.34982097]
Evaluating with params: [-0.1907998  -0.40648697  0.33026712  0.03524038  0.51679849 -0.64696791
  0.02498483  0.58737329  0.22723194 -0.87322206 -0.28755736  0.08310411
  0.60982207 -0.55383232  0.92714685  0.63297771 -0.34462403 -0.08025335
  0.52637657 -0.89988368  0.66224509 -0.98022388 -0.57380259 -0.64506759]
Evaluating with params: [ 0.84526771  0.70028166 -0.53726871  0.65638532 -0.423402    0.90453651
 -0.31674666  0.03960486  0.84542337 -0.97860873 -0.25565996  0.56276889
  0.00547609 -0.88116033 -0.54875404  0.91460548 -0.67069517 -0.28774941
  0.95350252 -0.09722062  0.33430573 -0.21382033  0.29970141  0.45030673]
Evaluating with params: [ 0.82599835 -0.45889336 

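The Bayesian optimizer reported a best performance of 0.5408, compared with a best advantage of about -0.18 for the simple random optimizer in the output shown above. Because `objective()` returns the negated advantage, confirm that the reported number has the intended sign before relying on this comparison.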