In [1]:
import sys
import matplotlib.pyplot as plt
import numpy as np
from collections import defaultdict

In [2]:
def add(path):
    if path not in sys.path:
        sys.path.append(path)
add("/Users/yuchenji/PycharmProjects/auction-gym/src")




In [3]:
import sys
sys.path.append('/Users/yuchenji/PycharmProjects/auction-gym/src')


In [4]:
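# auction-gym/src is already on sys.path, so main.py imports as a top-level module.
# ("auction-gym" contains a hyphen, so that directory can't be imported as a package named auction_gym.)
import main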


In [5]:
import sys
sys.path.append('/Users/yuchenji/PycharmProjects/auction-gym/src')
from main import parse_config, instantiate_agents, instantiate_auction

In [43]:
import main

In [59]:
from main import parse_config, instantiate_agents, instantiate_auction



In [51]:
from ad_auction import AdAuction



In [27]:
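def evaluate(ad_auction, num_iterations):
    """Run num_iterations episodes and return the mean per-episode result (our advantage)."""
    trace = []
    for _ in range(num_iterations):
        r = ad_auction.run_episode()  # one episode's result
        trace.append(r)
    return np.array(trace).mean()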


# Advantage-E

Bidders get an advantage when they shade their bids, i.e. when they bid a little lower than the ad is really worth, because they save money if they win the auction. If they bid *too* low, however, they lose the auction, which is bad because they want the ad slot.
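To make the trade-off concrete, here is a toy single-slot auction where the highest bid wins and the winner pays its own bid (a first-price rule). That pricing rule, the value of 1.0, and the competing bid of 0.7 are assumptions for illustration, not settings taken from AuctionGym. Bidding the full value wins but leaves nothing, shading a little wins and keeps the difference, and shading too much loses the slot.

```python
# Toy first-price auction: highest bid wins and pays its own bid.
# Assumption for illustration only; AuctionGym's actual rules come from its config.

def first_price_utility(value: float, bid: float, competing_bid: float) -> float:
    """Net utility for one auction: value minus price if we win, else 0."""
    if bid > competing_bid:
        return value - bid  # we win and pay our own bid
    return 0.0              # we lose: no ad slot, no payment

value = 1.0          # what the ad slot is really worth to us
competing_bid = 0.7  # highest bid among the other bidders

for bid in (1.0, 0.8, 0.6):
    print(f"bid={bid:.1f} -> utility={first_price_utility(value, bid, competing_bid):.2f}")
# bid=1.0 -> utility=0.00   (truthful: win, but pay full value)
# bid=0.8 -> utility=0.20   (shaded: win and save money)
# bid=0.6 -> utility=0.00   (shaded too much: lose the slot)
```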

With `config-advantage-E.json`, our agent shades its bid with `EmpiricalShadedBidder`, and the other agents bid their true value (`TruthfulBidder`).

In [52]:
ad_auction = AdAuction("configs/config-advantage-E.json", warm_up_iterations=1000)
print(evaluate(ad_auction, num_iterations=10000))

TO -0.000545250209260078
EO -0.011657106026253928
VO -0.0035311654468497353


The number that `evaluate()` prints is our average reward per episode.

[Strictly speaking, it's called "return" when talking about an episode. One step (in this case, one auction) yields a reward. When you take many steps in sequence, receiving a reward for each step, you can sum (or average) the rewards and call that the "return".]

In any case, the *return* here is `[our net utility - mean(other agents' net utility)] / (mean gross utility)`, averaged over the `num_iterations` simulated auctions. The mean gross utility is a mean over all of the agents (including us). It measures the total value generated by the auction, i.e., the total value that's up for grabs. That value is split between the bidders (as net utility) and the company that runs the auction (as revenue).

Since the number returned by `evaluate()` measures how we do relative to the other agents in the auction, we'll refer to it as our *advantage*.
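A minimal sketch of the advantage formula, assuming each auction gives us one net-utility and one gross-utility number per agent, with us at index 0. The `advantage` helper and the per-agent numbers are hypothetical and not taken from `AdAuction`. Averaging the per-auction values gives the return that `evaluate()` reports.

```python
import numpy as np

def advantage(net_utility: np.ndarray, gross_utility: np.ndarray, our_index: int = 0) -> float:
    """[our net utility - mean(other agents' net utility)] / mean(gross utility).

    Hypothetical helper: one array entry per agent, for a single auction.
    """
    others = np.delete(net_utility, our_index)  # everyone except us
    return float((net_utility[our_index] - others.mean()) / gross_utility.mean())

# Made-up per-agent utilities for two auctions (we are agent 0).
auctions = [
    (np.array([0.30, 0.20, 0.10]), np.array([1.0, 0.8, 0.6])),
    (np.array([0.20, 0.25, 0.15]), np.array([0.9, 0.9, 0.6])),
]
per_auction = [advantage(net, gross) for net, gross in auctions]
episode_return = float(np.mean(per_auction))  # average over auctions
print(per_auction, episode_return)
```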


# TO/EO/VO

Now it gets interesting. We bid truthfully (`TruthfulBidder`), and the other agents bid either:

- TO: Truthfully, also
- EO: Shading with `EmpiricalShadedBidder`
- VO: Shading with `ValueLearningBidder`

In the first case, TO, everyone bids the same way, so our expected advantage is exactly zero. The measured number for TO is nonzero because the auction process is noisy. That's why we need to average over many auctions, and that's why real auction experiments take so long to run. (And, guess what, Bayesian optimization was made for problems where evaluation is slow and noisy.)
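Why so many iterations? The standard error of an average shrinks like 1/sqrt(n), so 100 times more episodes buys roughly a 10 times tighter estimate. The sketch below stands in for the TO case with Gaussian noise around a true advantage of zero. The noise scale of 0.05 is an arbitrary assumption, not a measured property of AuctionGym.

```python
import numpy as np

rng = np.random.default_rng(0)
TRUE_ADVANTAGE = 0.0  # TO case: everyone bids the same way
NOISE_SCALE = 0.05    # arbitrary assumption for illustration

standard_errors = {}
for n in (100, 10_000):
    samples = rng.normal(TRUE_ADVANTAGE, NOISE_SCALE, size=n)  # n noisy episodes
    mean = samples.mean()
    sem = samples.std(ddof=1) / np.sqrt(n)                     # standard error of the mean
    standard_errors[n] = sem
    print(f"n={n:>6}: mean={mean:+.5f}, standard error={sem:.5f}")
```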

In the other two cases, EO and VO, we have negative advantage. It's better to shade your bid in a smart way than to bid full price all the time.

## Warm-up

Notice the argument `warm_up_iterations`. This tells `AdAuction` how many times to run the auction before doing any evaluation at all. During warm-up, the learning agents (EO and VO) get a chance to learn the dynamics of the auction. The auction simulator is interesting in that the agents are aware of each other's behavior via the auction: they all learn simultaneously, make bids, and learn even more from observing the others' bids. If you look at the plots at the bottom of `auction-gym/src/Getting Started with AuctionGym (2. Effects of Bid Shading).ipynb`, you'll see that it can take a few iterations (each consisting of many auction rounds) before the plots settle down. This initial period is called a transient, and it exists because the agents take time to learn.

We want to do our evaluations after the transient. That will simulate an engineer creating a new ad-bidding bot and trying to optimize it via experiment in an already-functioning ad auction market. Also, it's after the transient that the other agents are at their best.
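Here is a toy illustration of why the warm-up matters. The reward trace is hypothetical: it starts at 0.5 while the other agents are still learning and decays toward a steady state of 0.1. Averaging from episode 0 mixes the transient into the estimate, while discarding a warm-up period first recovers the steady-state value. `evaluate_with_warm_up` is a stand-in written for this sketch, not `AdAuction`'s implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

def toy_episode_reward(t: int) -> float:
    # Hypothetical trace: starts at 0.5, decays toward a steady state of 0.1
    # with a time constant of 200 episodes, plus a little noise.
    steady, start, tau = 0.1, 0.5, 200.0
    return float(steady + (start - steady) * np.exp(-t / tau) + rng.normal(0.0, 0.01))

def evaluate_with_warm_up(warm_up: int, num_iterations: int) -> float:
    for t in range(warm_up):  # run, but discard, the transient
        toy_episode_reward(t)
    trace = [toy_episode_reward(warm_up + i) for i in range(num_iterations)]
    return float(np.mean(trace))

print(evaluate_with_warm_up(warm_up=0, num_iterations=1000))     # biased upward by the transient
print(evaluate_with_warm_up(warm_up=1000, num_iterations=1000))  # close to the steady state, 0.1
```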



In [6]:
for ttype in ["TO", "EO", "VO"]:
    ad_auction = AdAuction(f"configs/config-{ttype}.json", warm_up_iterations=1000)
    print(ttype, evaluate(ad_auction, num_iterations=10000))


# Plan

The goal is to optimize a bidder with Bayesian optimization. To do that we need:

1. A way to simulate a sequence of experiments. Done. See above.
2. A parameterized agent.
3. A Bayesian optimizer.


This week, work on step 2.

## BOBidder

We want to optimize parameters by observing rewards (or, in our case, "advantages"). In RL, optimizing this way is called *policy search*. If you look in the auction-gym codebase, you'll find `PolicyLearningBidder`. That will be our starting point for BOBidder.
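Policy search in miniature: the policy below has one parameter, a shading factor `alpha` (bid = `alpha` × value), and we tune it purely by observing noisy average utilities. The setup is an assumption, not AuctionGym: our value and a single competing bid are each uniform on [0, 1], with first-price payment. Plain random search stands in for the optimizer; a Bayesian optimizer would replace the "propose" step with a model-guided choice. In this toy the expected utility is alpha(1 - alpha)/3, so the search should land near alpha = 0.5 with a utility near 1/12.

```python
import numpy as np

rng = np.random.default_rng(2)

def average_utility(alpha: float, n: int = 20_000) -> float:
    """Noisy estimate of the mean utility of the policy bid = alpha * value.

    Toy setup (assumption): value and competing bid ~ Uniform[0, 1]; first-price payment.
    """
    values = rng.uniform(0.0, 1.0, n)
    competing = rng.uniform(0.0, 1.0, n)
    bids = alpha * values
    wins = bids > competing
    return float(np.mean(np.where(wins, values - bids, 0.0)))

# Random search: propose parameters, observe the (noisy) reward, keep the best.
best_alpha, best_score = None, -np.inf
for alpha in rng.uniform(0.0, 1.0, 40):
    score = average_utility(float(alpha))
    if score > best_score:
        best_alpha, best_score = float(alpha), score

print(f"best shading factor {best_alpha:.2f}, mean utility {best_score:.4f}")
```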

To get started, set up an evaluation of a hacked copy of `PolicyLearningBidder`:

- Copy `PolicyLearningBidder` into a new file (`bo_bidder.py`) in your Capstone directory, and rename the class to `BOBidder`.
- Completely remove the method `update()` -- even the signature. You can also remove the code that is referred to as "Option 1". We're only going to use "Option 2". Also, get rid of any references to `gamma` or any other variables that you're not using. `self.model` will be doing the bulk of the work.
- Copy `configs/config-advantage-E.json` to `configs/config-advantage-BO.json`. Inside, replace the `EmpiricalShadedBidder` with `BOBidder`.
- Try to get `evaluate()` to run on your new config file.
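One way to do the config step without hand-editing is a small text substitution. This sketch assumes only that the bidder class name appears as a string in the JSON file; it makes no assumption about the config's schema, and the sample JSON in the usage comment is made up, not AuctionGym's real layout. `swap_bidder_class` and `make_bo_config` are hypothetical helpers.

```python
import json
from pathlib import Path

def swap_bidder_class(config_text: str, old: str = "EmpiricalShadedBidder",
                      new: str = "BOBidder") -> str:
    """Replace every occurrence of one bidder class name with another."""
    if old not in config_text:
        raise ValueError(f"{old!r} not found in config")
    new_text = config_text.replace(old, new)
    json.loads(new_text)  # raises json.JSONDecodeError if the result isn't valid JSON
    return new_text

def make_bo_config(src: Path, dst: Path) -> None:
    dst.write_text(swap_bidder_class(src.read_text()))

# Usage (paths from the steps above):
# make_bo_config(Path("configs/config-advantage-E.json"),
#                Path("configs/config-advantage-BO.json"))
```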

It probably won't perform well, but that's because it's not optimized. The first step is just getting it to run.

Next, you'll need to figure out how to get and set the parameters. Create two methods in `BOBidder`:
- `get_parameters(self) -> np.ndarray`, and
- `set_parameters(self, parameters: np.ndarray)`

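and make them do the right things: `get_parameters()` should return the parameters of `self.model` as a flat array, and `set_parameters()` should load such an array back into the model. Fortunately, I have some code lying around that could help. See `ParametersDirect` in `parameters_direct.py`. It should help you get and set the parameters of `self.model`, which is a PyTorch module.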
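The core idea behind both methods is flattening: concatenate every weight array into one vector, and later slice that vector back into the original shapes. The sketch below does this with plain NumPy arrays in a dict so it runs without PyTorch; `FlatParameters` is a hypothetical helper, not the `ParametersDirect` class. For an actual `torch.nn.Module`, `torch.nn.utils.parameters_to_vector` and `torch.nn.utils.vector_to_parameters` perform the same round trip on tensors.

```python
from typing import Dict

import numpy as np

class FlatParameters:
    """Hypothetical helper: view a dict of named weight arrays as one flat vector."""

    def __init__(self, weights: Dict[str, np.ndarray]):
        self.weights = weights  # dict order fixes the layout of the flat vector

    def get_parameters(self) -> np.ndarray:
        # Concatenate every array, in key order, into one 1-D vector.
        return np.concatenate([w.ravel() for w in self.weights.values()])

    def set_parameters(self, parameters: np.ndarray) -> None:
        parameters = np.asarray(parameters, dtype=float)
        expected = sum(w.size for w in self.weights.values())
        if parameters.shape != (expected,):
            raise ValueError(f"expected {expected} parameters, got shape {parameters.shape}")
        offset = 0
        for name, w in self.weights.items():
            # Take the next w.size entries and reshape them back to w's shape.
            self.weights[name] = parameters[offset:offset + w.size].reshape(w.shape).copy()
            offset += w.size

# Usage: a 2x3 weight matrix plus a bias of length 2 gives 8 parameters.
params = FlatParameters({"W": np.zeros((2, 3)), "b": np.zeros(2)})
theta = params.get_parameters()        # shape (8,)
params.set_parameters(np.arange(8.0))  # W becomes [[0, 1, 2], [3, 4, 5]], b becomes [6, 7]
```

Keeping the key order fixed is what lets an optimizer treat the model as a plain vector: whatever it proposes goes back into the same slots it came from.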
