# Active learning of an unknown demand (bonus: Altair plots)

You're about to launch your product next month, but you're not sure which price to charge.
You think the demand has a log-log form:

$$ \log q = \alpha + \beta \log p + \varepsilon $$

but you're unsure of the values of $\alpha$ and $\beta$.
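To make the setup concrete, here is a minimal sketch that draws noisy demand from the log-log form and recovers $\alpha$ and $\beta$ by least squares. The parameter values, the price range, and the helper name `draw_log_demand` are made up for illustration; they are not taken from the package used later in the post.

```python
import numpy as np

def draw_log_demand(price, alpha, beta, sigma, rng):
    """One noisy draw of log q = alpha + beta * log p + eps, with eps ~ N(0, sigma^2)."""
    price = np.asarray(price, dtype=float)
    return alpha + beta * np.log(price) + rng.normal(0.0, sigma, size=price.shape)

rng = np.random.default_rng(0)
alpha_true, beta_true, sigma = 1.0, -2.5, 0.3    # hypothetical values
prices = rng.uniform(0.5, 3.0, size=500)         # hypothetical price experiments
log_q = draw_log_demand(prices, alpha_true, beta_true, sigma, rng)

# Least-squares fit of log q on log p: the slope estimates beta, the intercept alpha
beta_hat, alpha_hat = np.polyfit(np.log(prices), log_q, deg=1)
print(alpha_hat, beta_hat)
```

With 500 freely chosen prices the estimates land close to the truth. The firm in this post does not have that luxury: it sees one price and one demand draw per period, and every experimental price costs profit.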

What should your pricing strategy be in this case? How should you go about learning about demand without sacrificing too much profit in the meantime?

We can distinguish between two broad types of learning: active and passive.

Passive means you're choosing the optimal price according to your beliefs each period, without thinking about how your beliefs will evolve in the future. You'll still update your beliefs if reality gives you a surprise (say, demand was much higher than your beliefs predicted), but you won't take into account how the way you choose prices affects the rate of learning.

Active means you choose the optimal price as a compromise between exploitation and exploration: do you wanna try to maximize the current profits or do you wanna explore to learn faster / make sure you get the right answer? Thus, in active learning you need to take into account not just the current profits, but also how the current price choice changes the evolution of beliefs. In practice, this involves earning a bit less profit now, but getting more in the long run.
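As a baseline, the passive rule can be sketched as a one-period maximization. The sketch assumes that beliefs are a vector of weights over a few candidate values of $\beta$, that $\alpha$ and the noise standard deviation are known, and that marginal cost is constant. The function name `myopic_price` and all numbers are hypothetical.

```python
import numpy as np

def myopic_price(weights, betas, alpha, sigma, cost, price_grid):
    """Price on the grid that maximizes expected current profit under the beliefs."""
    log_p = np.log(price_grid)[:, None]                      # shape (n_prices, 1)
    # E[q | p, beta_k] for lognormal demand is exp(alpha + beta_k * log p + sigma^2 / 2)
    expected_q = np.exp(alpha + betas[None, :] * log_p + 0.5 * sigma**2) @ weights
    expected_profit = (price_grid - cost) * expected_q
    return price_grid[np.argmax(expected_profit)]

price_grid = np.linspace(1.01, 5.0, 400)
betas = np.array([-3.0, -2.5, -2.0])                         # hypothetical candidates

# A firm that is certain beta = -2
p_certain = myopic_price(np.array([0.0, 0.0, 1.0]), betas,
                         alpha=1.0, sigma=0.3, cost=1.0, price_grid=price_grid)
# A firm that is still unsure
p_mixed = myopic_price(np.array([0.4, 0.4, 0.2]), betas,
                       alpha=1.0, sigma=0.3, cost=1.0, price_grid=price_grid)
print(p_certain, p_mixed)
```

When $\beta$ is known and below $-1$, the profit-maximizing price is $c\,\beta/(1+\beta)$, which is 2 for the certain firm here. An active learner starts from the same expected profit but adds the value of what the price teaches it.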

Objective of blog post here. **What am I gonna do and why is it interesting**




One way to model this problem is through dynamic programming and, more particularly, with a Bellman equation. Thus, the problem of choosing a sequence of prices under active learning can be written as:

$$V_{b_t}(I_t) = \max_{p_t \in P} \left\{ \pi(p_t, x_t) + \beta
                 \int V_{b_{t+1}}(I_{t+1}(x_{t+1}, I_t)) \, b_t(x_{t+1} \mid p_t, I_t) \; d x_{t+1} \right\}  $$


+ $\pi(p_t, x_t)$ is the current-period profit
+ $\beta$ in this equation is the discount factor, not the price coefficient of the demand equation
+ $x_{t+1}$ in this case is the log demand ($\log q$)
+ $I_t$ represents the information set of the firm at $t$
+ $b_t(x_{t+1} \mid p_t, I_t)$ represents the firm's belief about the value that $x_{t+1}$ (log demand) will take next period

To fully flesh out this model, I borrow the notation of Aguirregabiria & Jeon (2018): ["Firms' Belief and Learning in Oligopoly Markets"](http://aguirregabiria.net/wpapers/survey_rio.pdf)

The first important choice is the form of the belief function $b(\cdot)$. In their survey paper, Aguirregabiria & Jeon consider four types of learning and belief functions:

1. Rational expectations
2. Bayesian learning
3. Adaptive learning
4. Reinforcement learning

In this blog post I will only talk about Bayesian learning, but you're welcome to check the paper for the other approaches. Under Bayesian learning, the firm starts with some priors on how $x_{t+1}$ (log demand) evolves and then updates those priors as new information (i.e. prices chosen and observed demand) comes in.
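A minimal sketch of one Bayesian update, assuming the firm's beliefs are weights over a small set of candidate slopes and that $\alpha$ and the noise standard deviation are known. The function `bayes_update_weights` and the numbers are illustrative only; this is not the package's own update routine.

```python
import numpy as np

def bayes_update_weights(log_q, price, weights, betas, alpha, sigma):
    """Posterior weights over candidate slopes after one (price, log_q) observation.

    Bayes' rule: posterior_k is proportional to
    prior_k * Normal(log_q | alpha + beta_k * log(price), sigma^2).
    """
    means = alpha + betas * np.log(price)
    # Gaussian log-likelihood, up to a constant shared by every candidate
    log_lik = -0.5 * ((log_q - means) / sigma) ** 2
    with np.errstate(divide="ignore"):       # log(0) = -inf is fine for zero-weight candidates
        log_post = np.log(weights) + log_lik
    log_post -= log_post.max()               # avoid underflow before exponentiating
    post = np.exp(log_post)
    return post / post.sum()

# Hypothetical numbers, chosen only for illustration
betas = np.array([-3.0, -2.5, -2.0])
prior = np.array([0.4, 0.4, 0.2])
alpha, sigma = 1.0, 0.3

# Demand lands exactly where the third candidate predicts it
price = 1.5
log_q = alpha + betas[2] * np.log(price)
posterior = bayes_update_weights(log_q, price, prior, betas, alpha, sigma)

# At price = 1, log(price) = 0: every candidate predicts the same demand,
# so the observation carries no information and beliefs do not move
no_news = bayes_update_weights(alpha, 1.0, prior, betas, alpha, sigma)
print(posterior, no_news)
```

The second call is a small illustration of why the price matters for learning: under these assumptions some prices separate the candidate models much better than others.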



With the amazing Giovanni Ballarin (link here) we are writing a package that estimates such value functions under different settings.

First we'll import the package and get the value and policy functions to run our simulation.


## Pretty Altair graph

The demand data were generated by the third candidate model, so a firm that learns correctly ends up putting probability one on lambda3, the belief weight on that model.
The graph shows that some firms learnt this quite fast (say, firm id X), while others took up to Z periods to converge (for example, firm id H).
If you select firm number H, you can see why this might have happened: demand is random, so a few large error draws can keep the firm believing that the correct value is, say, lambda2.
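One way to put a number on "how fast a firm learnt" is the first period in which its weight on the true model crosses a threshold. The sketch below assumes a DataFrame with the columns `firm_id`, `t` and `lambda3`, as produced by the simulation further down. The helper name, the 0.99 threshold, and the toy data are arbitrary choices for illustration.

```python
import pandas as pd

def periods_to_learn(df, true_col="lambda3", threshold=0.99):
    """First period in which each firm's weight on the true model reaches `threshold`."""
    learned = df[df[true_col] >= threshold]
    first_t = learned.groupby("firm_id")["t"].min()
    # Firms that never reach the threshold show up as NaN
    return first_t.reindex(df["firm_id"].unique())

# Toy data: firm 0 learns in period 1, firm 1 never gets there
toy = pd.DataFrame({
    "firm_id": [0, 0, 0, 1, 1, 1],
    "t":       [0, 1, 2, 0, 1, 2],
    "lambda3": [0.2, 0.995, 1.0, 0.2, 0.5, 0.6],
})
result = periods_to_learn(toy)
print(result)
```

A first crossing is only a rough proxy for convergence, since a run of unlucky demand draws can push beliefs back below the threshold afterwards.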


Some explanation on how to write the Altair graph






## Solve for the value and policy function

In [4]:
!git clone https://github.com/cdagnino/LearningModels.git
!mkdir LearningModels/data


Cloning into 'LearningModels'...
remote: Enumerating objects: 280, done.
remote: Counting objects: 100% (280/280), done.
remote: Compressing objects: 100% (240/240), done.
remote: Total 280 (delta 147), reused 167 (delta 38), pack-reused 0
Receiving objects: 100% (280/280), 876.49 KiB | 1.11 MiB/s, done.
Resolving deltas: 100% (147/147), done.


In [11]:
#If you get No module named 'src', you might need to add the folder to your system path
!python LearningModels/examples/aguirregabiria_simple.py

doing 0 of 120
doing 10 of 120
doing 20 of 120
doing 30 of 120
doing 40 of 120
doing 50 of 120
doing 60 of 120
doing 70 of 120
doing 80 of 120
doing 90 of 120
doing 100 of 120
doing 110 of 120
...

After 60 iterations we get an error of 0.004. We could let it run longer to get a smaller error, but it should be fine for our plotting purposes.
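The "error" here is presumably the largest gap between two successive value function iterates; that is an assumption, since the script's stopping rule isn't shown in this post. A generic sketch of value function iteration with that stopping rule, on a made-up two-state, two-action problem (not the pricing model itself):

```python
import numpy as np

def value_iteration(reward, transition, discount=0.9, tol=1e-4, max_iter=1000):
    """reward has shape (S, A); transition has shape (S, A, S).

    Returns the value function, the greedy policy and the final sup-norm error.
    """
    V = np.zeros(reward.shape[0])
    for _ in range(max_iter):
        Q = reward + discount * (transition @ V)   # value of each (state, action)
        V_new = Q.max(axis=1)
        error = np.max(np.abs(V_new - V))          # sup-norm gap between iterates
        V = V_new
        if error < tol:
            break
    return V, Q.argmax(axis=1), error

# Toy problem: action 0 stays in the current state, action 1 switches state
reward = np.array([[1.0, 0.0],
                   [0.0, 2.0]])
transition = np.array([[[1.0, 0.0], [0.0, 1.0]],
                       [[0.0, 1.0], [1.0, 0.0]]])
V, policy, err = value_iteration(reward, transition)
print(V, policy, err)
```

Because the Bellman operator is a contraction, the gap shrinks geometrically at the rate of the discount factor, which is why running longer buys a smaller error.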

## Use the policy function to simulate

In [13]:
%matplotlib inline
import sys

import dill
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Make the cloned repository importable; adjust this path to wherever you cloned it
sys.path.append("/Users/cd/Documents/github_reps/cdagnino.github.io/notebooks/LearningModels")
sys.path.append('../')
import src

# Load the value function iteration results saved by the example script
#file_n = "2018-10-1vfi_dict.dill"
file_n = "2018-10-11vfi_dict.dill"
with open('LearningModels/data/' + file_n, 'rb') as file:
    data_d = dill.load(file)

lambdas = src.generate_simplex_3dims(n_per_dim=data_d['n_of_lambdas_per_dim'])
price_grid = np.linspace(data_d['min_price'], data_d['max_price'])

policy = data_d['policy']
valueF = data_d['valueF']



lambdas_ext = src.generate_simplex_3dims(n_per_dim=15) #15 should match the value function iteration
print(lambdas_ext.shape)

#Interpolate policy (level price). valueF is already a function
policyF = src.interpolate_wguess(lambdas_ext, policy)

def one_run(lambda0=np.array([0.4, 0.4, 0.2]),
            true_beta=src.betas_transition[2],
            dmd_σε=src.const.σ_ɛ+0.05, time_periods=40):
    """Simulate one firm: price with the policy, observe demand, update beliefs."""
    current_lambdas = lambda0
    d = {}
    d['level_prices'] = []
    d['log_dmd'] = []
    d['valueF'] = []
    d['lambda1'] = []
    d['lambda2'] = []
    d['lambda3'] = []
    d['t'] = []


    for t in range(time_periods):
        d['t'].append(t)
        d['lambda1'].append(current_lambdas[0])
        d['lambda2'].append(current_lambdas[1])
        d['lambda3'].append(current_lambdas[2])
        d['valueF'].append(valueF(current_lambdas[:2])[0])

        #0. Choose optimal price (last action of t-1)
        level_price = policyF(current_lambdas[:2]) #Check: Is this correctly defined with the first two elements?
        d['level_prices'].append(level_price[0])

        #1. Demand happens
        log_dmd = src.draw_true_log_dmd(level_price, true_beta, dmd_σε)
        d['log_dmd'].append(log_dmd[0])

        #2. lambda updates: log_dmd: Yes, level_price: Yes
        new_lambdas = src.update_lambdas(log_dmd, src.dmd_transition_fs, current_lambdas,
                       action=level_price, old_state=1.2)

        current_lambdas = new_lambdas
            
    return pd.DataFrame(d)

def many_runs(total_runs, **kwargs):
    dfs = []
    for run in range(total_runs):
        df = one_run(**kwargs)
        df['firm_id'] = run
        dfs.append(df)
        
    return pd.concat(dfs, axis=0)

all_firms = many_runs(7, time_periods=50)




(120, 3)
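The `(120, 3)` shape is what you would expect from an evenly spaced grid on the three-dimensional probability simplex with 15 points per dimension. The sketch below is my own reconstruction of such a grid, not the package's `generate_simplex_3dims`; it only shows where the count of 120 comes from.

```python
import numpy as np

def simplex_grid_3d(n_per_dim):
    """All (l1, l2, l3) on an evenly spaced grid with l1 + l2 + l3 = 1."""
    m = n_per_dim - 1
    # Pick integer counts i and j; the third count is whatever is left over
    points = [(i, j, m - i - j) for i in range(m + 1) for j in range(m + 1 - i)]
    return np.array(points, dtype=float) / m

grid = simplex_grid_3d(15)
print(grid.shape)  # (120, 3)
```

Since the weights sum to one, the third weight is pinned down by the first two, which is consistent with the value and policy functions above being evaluated at `current_lambdas[:2]`.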


## Plot with your new BFF: Altair



In [15]:
import altair as alt  # the module name is lowercase

# Convert log demand back to levels for plotting
all_firms['demand'] = np.exp(all_firms['log_dmd'])

# Hovering over a firm selects it; with nothing selected, all firms are shown.
# (Altair 5 renamed selection_single / add_selection to selection_point / add_params.)
selector = alt.selection_single(empty='all', fields=['firm_id'], on='mouseover')

# Shared base chart: every panel is filtered by the current selection
base = alt.Chart(all_firms).properties(
    width=250,
    height=250
).add_selection(selector).transform_filter(
    selector
)

color_timeseries = alt.Color('firm_id:N', legend=None)

x_for_tseries = alt.X('t', scale=alt.Scale(domain=(0, 30)))

#alt.Y('level_prices', scale=alt.Scale(domain=(0, 3.5)))
timeseries1 = base.mark_line(strokeWidth=2).encode(
    x=x_for_tseries,
    y=alt.Y('level_prices'),
    color=color_timeseries
)

timeseries2 = base.mark_line(strokeWidth=2).encode(
    x=x_for_tseries,
    y=alt.Y('demand'),
    color=color_timeseries
)

timeseries3 = base.mark_line(strokeWidth=2).encode(
    x=x_for_tseries,
    y=alt.Y('lambda3'),
    color=color_timeseries
)

timeseries4 = base.mark_line(strokeWidth=2).encode(
    x=x_for_tseries,
    y=alt.Y('valueF', scale=alt.Scale(domain=(14, 22))),
    color=color_timeseries
)

# Legend points are colored when selected and gray otherwise
color = alt.condition(selector,
                      alt.Color('firm_id:N', legend=None),
                      alt.value('lightgray'))

# A column of points, one per firm, that doubles as a clickable legend
legend = alt.Chart(all_firms).mark_point(size=400).encode(
    y=alt.Y('firm_id:N', axis=alt.Axis(orient='right')),
    color=color
).add_selection(
    selector
)

# 2x2 grid of time series with the legend on the right
(timeseries1 | timeseries2) & (timeseries3 | timeseries4) | legend