In [None]:
import sys
import os
import matplotlib.pyplot as plt
import numpy as np
from collections import defaultdict

def custom_breakpointhook(*args, **kwargs):
    from IPython.core.debugger import set_trace; set_trace(*args, **kwargs)
sys.breakpointhook = custom_breakpointhook

In [None]:
def add(path):
    if path not in sys.path:
        sys.path.append(path)
        
if os.environ.get('USER') == "dsweet2":  # .get() avoids a KeyError when USER is unset
    add("/Users/dsweet2/Projects/yuchen/auction-gym/src")
else:
    add("/Users/yuchenji/PycharmProjects/auction-gym/src")

In [None]:
from main import parse_config, instantiate_agents, instantiate_auction
from ad_auction import AdAuction
from bo_bidder import BOBidder
from const_shading_bidder import ConstShadingBidder

%load_ext autoreload
%autoreload 1
%aimport ad_auction
%aimport bo_bidder
%aimport const_shading_bidder
%aimport main


In [None]:
def evaluate(ad_auction, num_iterations):
    trace = []
    for _ in range(num_iterations):
        r = ad_auction.run_episode()
        trace.append(r)
    return np.array(trace).mean()

# Advantage-E

Bidders get an advantage when they shade their bids, i.e. when they bid a little lower than the ad is really worth, because they save money if they win the auction. If they bid *too* low, however, they lose the auction, which is bad because they want the ad slot.
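
The trade-off can be illustrated with a toy Monte Carlo sketch. It is not AuctionGym code, and it rests on loudly stated assumptions: the auction is first-price (the winner pays its own bid), rivals bid uniformly at random in `[0, value]`, and the function name `expected_utility` and the shading factor `gamma` are made up for this illustration.

```python
import numpy as np

def expected_utility(gamma, value=1.0, num_rivals=3, num_auctions=100_000, seed=0):
    """Estimate our mean utility per auction when we bid gamma * value.

    Assumptions (not taken from AuctionGym): first-price payment, and
    each rival bids uniformly at random in [0, value].
    """
    rng = np.random.default_rng(seed)
    # Highest competing bid in each simulated auction
    best_rival = rng.uniform(0.0, value, size=(num_auctions, num_rivals)).max(axis=1)
    bid = gamma * value
    won = bid > best_rival
    # We earn (value - bid) when we win and nothing when we lose
    return float(np.mean(won * (value - bid)))

# gamma = 1.0 is truthful bidding; smaller gamma means more shading
for gamma in (1.0, 0.75, 0.25):
    print(f"gamma = {gamma:.2f}  mean utility = {expected_utility(gamma):.4f}")
```

Under these assumptions, bidding the full value wins often but earns nothing, while shading too aggressively earns a lot per win but rarely wins.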

With `config-advantage-E.json` our agent shades its bid with `EmpiricalShadedBidder` and the other agents bid the true value (`TruthfulBidder`).

In [None]:
ad_auction = AdAuction("configs/config-advantage-E.json", warm_up_iterations=1000)
print(evaluate(ad_auction, num_iterations=10000))

The number printed above is our average reward for an episode.

[Actually, it's called "return" when talking about an episode. One step -- in this case, an auction -- yields a reward. When you take many steps in sequence, receiving a reward for each step, you can sum (or average) the rewards and call the result the "return".]

In any case, the *return*, here, is `[our net utility - mean(other agents' net utility)] / (mean gross utility)` averaged over the `num_iterations` simulated auctions. The mean gross utility is a mean over all of the agents (including us). It's a measure of the total value generated by the auction -- the total revenue that's up for grabs. That value (i.e., revenue) is split between the bidders and the company that runs the auction.

Since the number returned by `evaluate()` is measured relative to the other agents in the auctions, we'll refer to it as our *advantage*.
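
As a worked example of the formula above, here is a minimal sketch for a single episode. The function name `advantage` and its arguments are hypothetical; how `AdAuction` computes the quantity internally is not shown in this notebook.

```python
import numpy as np

def advantage(our_net_utility, others_net_utility, gross_utilities):
    """[our net utility - mean(others' net utility)] / mean(gross utility).

    `gross_utilities` holds one entry per agent, including us.
    """
    others_mean = np.mean(others_net_utility)
    gross_mean = np.mean(gross_utilities)
    return (our_net_utility - others_mean) / gross_mean

# Made-up numbers: we net 3.0, the three other agents net 1.0, 2.0 and 3.0,
# and every one of the four agents has a gross utility of 4.0.
print(advantage(3.0, [1.0, 2.0, 3.0], [4.0, 4.0, 4.0, 4.0]))
```

If every agent nets the same utility, the numerator is zero and so is the advantage.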


# TruthfulBidder vs. TO/EO/VO

Now it gets interesting. We bid truthfully (`TruthfulBidder`), and the *other* agents bid either:

- TO: Truthfully, also
- EO: Shading with `EmpiricalShadedBidder`
- VO: Shading with `ValueLearningBidder`

In the first case, TO, our expected advantage is exactly zero. The actual number shown below for TO is non-zero because the auction process is noisy. That's why we need to run multiple times -- and that's why real auction experiments take so long to run. (And, guess what, Bayesian optimization was made for problems where evaluation takes a long time and is noisy.)

In the other two cases, EO and VO, we have negative advantage. It's better to shade your bid in a smart way than to bid full price all the time.
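
To get a feel for how noise shrinks with the number of runs, here is a small sketch that reports a mean together with its standard error. The noisy returns are synthetic (normally distributed with a true mean of zero, loosely mimicking the TO case); they are an assumption for illustration, not output from the simulator, and `mean_and_stderr` is a hypothetical helper.

```python
import numpy as np

def mean_and_stderr(returns):
    """Return the sample mean and the standard error of that mean."""
    returns = np.asarray(returns, dtype=float)
    mean = returns.mean()
    # ddof=1 gives the unbiased sample standard deviation
    stderr = returns.std(ddof=1) / np.sqrt(len(returns))
    return mean, stderr

rng = np.random.default_rng(0)
for n in (100, 10_000):
    # Synthetic stand-in for per-episode returns with true mean zero
    noisy_returns = rng.normal(loc=0.0, scale=1.0, size=n)
    m, se = mean_and_stderr(noisy_returns)
    print(f"n = {n:6d}  mean = {m:+.4f}  stderr = {se:.4f}")
```

The standard error falls roughly as one over the square root of the number of runs, so each extra digit of precision costs about 100 times more evaluations.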

## Warm-up

Notice the argument `warm_up_iterations`. This tells `AdAuction` how many times to run the auction before doing any evaluation at all. During the warm-up time the learning agents (EO and VO) get a chance to learn about the dynamics of the auction. The auction simulator is interesting in that the agents are aware of each other's behavior via the auction, so they are all learning simultaneously, making bids, and learning even more from observing others' bids. If you look at the plots at the bottom of `auction-gym/src/Getting Started with AuctionGym (2. Effects of Bid Shading).ipynb` you'll see that it can take a few iterations (each of which consists of many auction rounds) before the plots settle down. This initial period is called a transient; it's there because the agents take time to learn.

We want to do our evaluations after the transient. That will simulate an engineer creating a new ad-bidding bot and trying to optimize it via experiment in an already-functioning ad auction market. Also, it's after the transient that the other agents are at their best.



In [None]:
for ttype in ["TO", "EO", "VO"]:
    ad_auction = AdAuction(f"configs/config-{ttype}.json", warm_up_iterations=1000)
    print(ttype, evaluate(ad_auction, num_iterations=10000))

# Plan

The goal is to optimize a bidder with Bayesian optimization. To do that we need:

1. A way to simulate a sequence of experiments. Done. See above.
2. A parameterized agent.
3. A Bayesian optimizer.


This week, work on step 2.

## BOBidder

We want to optimize parameters by observing rewards (or, in our case, "advantages"). When you optimize this way in RL, it's called *policy search*. If you look in the auction-gym codebase you'll find `PolicyLearningBidder`. That will be our starting point for BOBidder.

To get started, set up an evaluation of a hacked copy of `PolicyLearningBidder`:

- Make a copy of `PolicyLearningBidder` in a new file (`bo_bidder.py`) in your Capstone directory.
- Completely remove the method `update()` -- even the signature. You can also remove the code that is referred to as "Option 1". We're only going to use "Option 2". Also, get rid of any references to `gamma` or any other variables that you're not using. `self.model` will be doing the bulk of the work.
- Copy `configs/config-advantage-E.json` to `configs/config-advantage-BO.json`. Inside, replace the `EmpiricalShadedBidder` with `BOBidder`.
- Try to get `evaluate()` to run on your new config file.

It probably won't perform well, but that's because it's not optimized. The first step is just getting it to run.

Next, you'll need to figure out how to get and set the parameters. Create two methods in `BOBidder`:
- `get_parameters(self) -> np.ndarray`, and
- `set_parameters(self, parameters: np.ndarray)`

and make them do the right things.  Fortunately, I have some code lying around that could help. See `ParametersDirect` in `parameters_direct.py`. It should help you get and set the parameters of `self.model`, which is a PyTorch module.
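
The core idea behind both methods is flattening a collection of weight arrays into one vector and restoring them later. Below is a NumPy-only sketch of that idea. It is not the contents of `parameters_direct.py`, which isn't shown here; the helper names `flatten_params` and `unflatten_params` are hypothetical. For an actual PyTorch module, `torch.nn.utils.parameters_to_vector` and `torch.nn.utils.vector_to_parameters` perform the analogous conversion on tensors.

```python
import numpy as np

def flatten_params(arrays):
    """Concatenate a list of arrays into a single 1-D vector."""
    return np.concatenate([np.asarray(a, dtype=float).ravel() for a in arrays])

def unflatten_params(vector, shapes):
    """Split a 1-D vector back into arrays with the given shapes."""
    vector = np.asarray(vector, dtype=float)
    arrays, start = [], 0
    for shape in shapes:
        size = int(np.prod(shape))
        arrays.append(vector[start:start + size].reshape(shape))
        start += size
    if start != len(vector):
        raise ValueError("vector length does not match the total size of shapes")
    return arrays

# Made-up "model": a 2x3 weight matrix and a length-2 bias
weights = [np.arange(6.0).reshape(2, 3), np.array([7.0, 8.0])]
flat = flatten_params(weights)                      # what get_parameters() might return
restored = unflatten_params(flat, [w.shape for w in weights])
print(flat.shape, [r.shape for r in restored])
```

Keeping the parameters as one flat `np.ndarray` is convenient because an optimizer can then treat the whole model as a single point in parameter space.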


---

# BOBidder Test

Can we instantiate it and generate a bid?

In [None]:
rng = np.random.default_rng(17)
bob = BOBidder(rng)
bob.bid(1, None, .123)   # BOBidder does not use context. Maybe later.

Can we run auctions with it?

In [None]:
extra_classes={'BOBidder': BOBidder}

ad_auction = AdAuction(
    "configs/config-advantage-BO.json", warm_up_iterations=10,
    extra_classes=extra_classes  # Let auction_gym know about our new bidder class
)
evaluate(ad_auction, num_iterations=100)

Now run a full-scale auction evaluation. Our BOBidder will run with randomly-initialized model parameters, and the rest of the bidders will be TruthfulBidders.

In [None]:
ad_auction = AdAuction("configs/config-advantage-BO.json", warm_up_iterations=1000, extra_classes=extra_classes)
print(evaluate(ad_auction, num_iterations=10000))

Apparently it's even better to shade your bid randomly than to bid truthfully!

## BOBidder vs Constant Bidder

Now create a new class, `ConstShadingBidder` (see `const_shading_bidder.py`), that just shades its bid by a randomly-chosen constant, `gamma`.

We'll have one `BOBidder` compete against a bunch of `ConstShadingBidder`s, and we'll track the advantage of the `BOBidder`.
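
For orientation, a constant-shading bidder might look roughly like the sketch below. This is not the contents of `const_shading_bidder.py`, which isn't shown here. The `bid(value, context, estimated_CTR)` signature is inferred from the `bob.bid(1, None, .123)` call earlier, and two things are assumptions: that a truthful bid is `value * estimated_CTR`, and that `gamma` is drawn uniformly from `[0.5, 1.0]`. The class name is deliberately different from the real one.

```python
import numpy as np

class ConstShadingBidderSketch:
    """Hypothetical stand-in: shades every bid by a fixed factor gamma."""

    def __init__(self, rng, gamma_low=0.5, gamma_high=1.0):
        # gamma is chosen once, at construction, and never updated
        self.gamma = rng.uniform(gamma_low, gamma_high)

    def bid(self, value, context, estimated_CTR):
        # Assumed truthful bid is value * estimated_CTR; context is ignored
        return self.gamma * value * estimated_CTR

rng = np.random.default_rng(17)
bidder = ConstShadingBidderSketch(rng)
print(f"gamma = {bidder.gamma:.3f}  bid = {bidder.bid(1.0, None, 0.123):.4f}")
```

Because such a bidder never learns, a field of them makes a stable set of competitors for the `BOBidder`.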

In [None]:
extra_classes={'BOBidder': BOBidder, 'ConstShadingBidder': ConstShadingBidder}

In [None]:
ad_auction = AdAuction(
    "configs/config-BO-vs-CO.json",
    warm_up_iterations=1,
    extra_classes=extra_classes,  # We need to tell auction_gym about our new bidders
    seed=17
)
print(evaluate(ad_auction, num_iterations=1000))

Is the result consistent, even though we don't have a fixed seed (i.e., seed='random')?

In [None]:
print(evaluate(ad_auction, num_iterations=1000))

Yes. They're both around -0.24.
With the same set of competitors (the 10 `ConstShadingBidder` agents) and the same `BOBidder`, we get approximately the same result, even though the auction process has randomness.

# Optimize it

To optimize our `BOBidder` we need to be able to:

- Propose parameters
- Set the parameters in BOBidder
- Evaluate BOBidder with the parameters

You've already implemented the last two steps. We'll implement the first as just random parameter selection to get started. Later, the first step will be a Bayesian optimizer.

First, create an auction, find the `BOBidder`, and display its current parameters.

In [None]:
ad_auction = AdAuction(
    "configs/config-BO-vs-CO.json",
    warm_up_iterations=1,
    extra_classes=extra_classes,  # We need to tell auction_gym about our new bidders
    seed=17
)

bo_bidder = ad_auction.us().bidder
print("PARAMETERS:", bo_bidder.get_parameters())

Next, set the parameters to random values.

In [None]:
num_params = len(bo_bidder.get_parameters())
bo_bidder.set_parameters( .1*np.random.normal(size=(num_params,)) )

Now evaluate the `BOBidder` with the randomly-chosen parameters.

In [None]:
print(evaluate(ad_auction, num_iterations=1000))

To demonstrate optimization, we'll write a simple optimizer that just randomly perturbs the parameters over and over and keeps track of which parameter set performs best.

In [None]:
# Hyperparameters of our simple optimizer
num_rounds = 100
eps = .1

# Initialize the parameters, x_best, to all zeros.

x_best = np.zeros(shape=(num_params,))
y_best = -1e99


In [None]:
print(f"num_params = {num_params}")
x = x_best
for _ in range(num_rounds):
    # Evaluate the current x
    bo_bidder.set_parameters(x)
    y = evaluate(ad_auction, num_iterations=1000)

    # Keep track of the best so far
    if y > y_best:
        y_best = y
        x_best = x

    # Report the x that was just evaluated, so that x and y match
    print(f"EVAL: y_best = {y_best:.4f} y = {y:.4f} x = {x[0]:.2f}, {x[1]:.2f}, ...")

    # Propose a new x that is a small perturbation
    #  of the best x.
    x = x_best + eps*np.random.normal(size=(num_params,))

That's not bad for optimization by random perturbations.
Can we do better with Bayesian optimization? We'll find out.

# Bayesian Optimization

Please read through SKOpt's Bayesian optimization library [documentation](https://scikit-optimize.github.io/stable/auto_examples/bayesian-optimization.html) and see if you can apply `gp_minimize()` to this problem.

NB: The algorithm `gp_minimize()` minimizes -- hence the name :) -- but we want to *maximize* the output of `evaluate()`. So have the objective function you give to `gp_minimize()` return the negative of the advantage, `-evaluate()`, and it'll perform a maximization.
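
As a starting point, here is a hedged sketch of the negation trick wired into `gp_minimize()`. Several things are assumptions: `toy_advantage` is a made-up stand-in for "set the parameters, then call `evaluate()`", the search bounds of `(-1.0, 1.0)` per parameter are arbitrary, and `make_objective` and `run_bo` are hypothetical helper names. The `skopt` import sits inside `run_bo` so that the rest of the cell runs even without scikit-optimize installed.

```python
import numpy as np

def make_objective(evaluate_fn):
    """Wrap a function we want to maximize so that a minimizer can use it."""
    def objective(x):
        # gp_minimize passes x as a plain list; negate to turn max into min
        return -float(evaluate_fn(np.asarray(x, dtype=float)))
    return objective

def toy_advantage(x):
    # Hypothetical stand-in for:
    #   bo_bidder.set_parameters(x); return evaluate(ad_auction, num_iterations=...)
    return -np.sum((x - 0.3) ** 2)

def run_bo(evaluate_fn, num_params, n_calls=20, seed=17):
    from skopt import gp_minimize  # requires scikit-optimize

    result = gp_minimize(
        make_objective(evaluate_fn),
        dimensions=[(-1.0, 1.0)] * num_params,  # assumed bounds, one pair per parameter
        n_calls=n_calls,
        random_state=seed,
    )
    # Undo the negation so the caller sees the best *advantage*
    return np.array(result.x), -result.fun
```

Calling `run_bo(toy_advantage, num_params=2)` should return a point near `[0.3, 0.3]`. For the real problem, replace `toy_advantage` with a function that sets the `BOBidder` parameters and returns `evaluate(ad_auction, ...)`. Keep in mind that each call is a full auction evaluation, so `n_calls` dominates the running time.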

