# Intro

Machine learning focuses on making accurate predictions. But we influence the world through decisions, not predictions. Predictions can be useful inputs to improve decision-making, but models that directly help us achieve goals will be more valuable than those that only make predictions.

Reinforcement learning aims to solve for optimal decision-making, but most mainstream business problems look little like existing RL work. Moreover, most RL research intentionally avoids reliance on humans' existing domain knowledge. I take the opposite approach, and show how human-chosen structure expands the scope of tractable applications of RL.

My approach combines RL with Structural Equation Modeling, a technique that encodes human domain knowledge into a model (typically using multiple equations that describe different parts of the domain).

My workflow is:
1. Use structural equation modeling to create a model of the business environment.
2. Use real data to estimate the structural model.
3. Treat the estimated model as a simulation environment, and apply reinforcement learning algorithms to find an optimal decision policy in the simulator.
4. Apply that decision policy to make optimized decisions in the real business environment.
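
The four steps can be sketched on a toy problem. Everything in this sketch is an assumption made for illustration: the linear demand curve, the noise level, and the use of a simple grid search where the workflow calls for a reinforcement learning algorithm. It is not the model used later in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "real" environment: linear demand plus noise (unknown to the firm).
def true_demand(price, rng):
    return 100.0 - 0.5 * price + rng.normal(0.0, 5.0, size=np.shape(price))

# Step 1: the analyst chooses a structure: demand = a + b * price.
# Step 2: estimate (a, b) from historical data with least squares.
hist_prices = rng.uniform(20.0, 180.0, size=500)
hist_qty = true_demand(hist_prices, rng)
X = np.column_stack([np.ones_like(hist_prices), hist_prices])
a_hat, b_hat = np.linalg.lstsq(X, hist_qty, rcond=None)[0]

# Step 3: treat the estimated model as a simulator and search it for the best price.
# A grid search stands in for an RL algorithm in this one-shot toy problem.
candidates = np.linspace(20.0, 180.0, 161)
sim_revenue = candidates * (a_hat + b_hat * candidates)
best_price = float(candidates[np.argmax(sim_revenue)])

# Step 4: apply the chosen price in the "real" environment.
real_revenue = best_price * true_demand(np.full(1000, best_price), rng).mean()
print(best_price, real_revenue)
```

In this toy setting the revenue-maximizing price under the true demand curve is 100, so the price found in the simulator should land close to it whenever the estimated coefficients are close to the true ones.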

The transfer from a simulation model to the data generating environment is inspired by [world models](https://arxiv.org/abs/1803.10122) while the use of structural equation models is standard in structural microeconometrics (the field of my PhD research).

I believe the approach suggested here can improve how we make decisions in a wide range of business applications.

# Example Use Cases

### Airline Pricing
Airlines use machine learning models to help set ticket prices. A model predicts how many tickets the airline can sell each day for each upcoming flight for each candidate price. The models consider price, seasonality, competitor prices, macroeconomic variables, etc. But even a perfect predictive model doesn't guarantee efficient price setting.

For example, consider a flight happening in 100 days which currently has 150 unsold seats. A predictive model says you can sell 1 ticket today for \$300 or you could sell 2 tickets if you set the price at \$250. Which price should you choose?

Airlines currently convert predictive models into pricing decisions with heuristics (e.g., a timetable of how many tickets to sell at pre-specified periods before the flight, or a goal of selling up to a pre-specified demand elasticity).
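
One way to see why the prediction alone doesn't settle the \$300 versus \$250 question: a seat sold today cannot be sold later, so the answer depends on what an unsold seat is expected to be worth on future days. The sketch below uses the two options from the example; the `future_value_per_seat` numbers are hypothetical and would have to come from a model of the remaining sales window.

```python
# price -> predicted tickets sold today (the two options from the example)
options = {300: 1, 250: 2}

def value_of_price(price, tickets_sold, future_value_per_seat):
    """Revenue today minus the expected future value of the seats given up."""
    return tickets_sold * (price - future_value_per_seat)

def best_price_today(future_value_per_seat):
    return max(options, key=lambda p: value_of_price(p, options[p], future_value_per_seat))

# Hypothetical values of an unsold seat on later days.
for future_value in (0, 150, 220):
    print(future_value, best_price_today(future_value))
```

With these numbers the lower price wins while a seat's future value is below \$200 and the higher price wins above it. A timetable heuristic fixes that trade-off in advance instead of solving for it.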

### Grocery Store Logistics
A grocery chain ran a [predictive modeling competition on Kaggle](https://www.kaggle.com/c/favorita-grocery-sales-forecasting) to improve demand forecasts. They aimed to match their purchases from wholesalers to their retail sales. However, the predicted sales volume is not always the optimal amount to stock.

If you purchase exactly the amount you are predicted to sell, you will experience frequent stockouts (when your model underestimates demand), reducing sales volume and disappointing customers.  Similarly, some items will spoil when predicted sales exceed actual sales. Unless the model is exactly correct every time, you face a tradeoff.  The optimal decision would consider factors like
- Markup rate on each item
- Spoilage rate
- Cost of storage
- Value of ensuring customers find the items they want
- etc.
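
A minimal sketch of this tradeoff for a single perishable item, considering only markup and spoilage. The prices, the forecast, and the Poisson-distributed forecast error are all hypothetical; they are chosen only to show that the profit-maximizing stock level can differ from the predicted sales.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical numbers for one perishable item.
unit_cost = 1.00        # wholesale cost per unit
unit_price = 1.50       # retail price per unit
predicted_sales = 100   # the model's point forecast

# Assumed distribution of actual demand around the forecast.
demand = rng.poisson(predicted_sales, size=20_000)

def expected_profit(stock):
    sold = np.minimum(demand, stock)                        # cannot sell more than is stocked
    return (unit_price * sold - unit_cost * stock).mean()   # unsold units spoil

stock_levels = np.arange(70, 131)
profits = np.array([expected_profit(s) for s in stock_levels])
best_stock = int(stock_levels[np.argmax(profits)])
print(best_stock)
```

With a thin markup like this one, spoilage costs more than a missed sale, so the best stock level falls below the forecast. A high-markup item would push it above the forecast instead.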

In practice, grocery store managers likely guess at how to make these tradeoffs, much as they may have guessed at how much of each food they would sell before adopting ML. But the approach in this notebook would help them make better decisions.


# Implementation Overview
This notebook focuses on the Airline Pricing example.

To illustrate how the resulting pricing policies perform in the original data generating environment, I use a simulation for the data generating process rather than real data. I still draw a fixed dataset from that simulation and use it to train the predictive model in the conventional way. For illustrative simplicity, this example considers a market with only two airlines.

## Market Set-Up
We train an agent to set prices for Jetblue; the competitor, whose prices we cannot control, is called Delta. 

There are two types of information:
1. Information and processes that are known to the airline (such as the number of seats on each flight, days remaining before takeoff, etc.).
2. Information and processes that aren't directly known to the airlines (their competitor's pricing policy, and the exact demand for tickets on each day).

The airline builds a model to predict factors it doesn't directly observe. Processes it doesn't directly observe are stored in a `CompetitiveConditions` object. When simulating data from the real environment, I build a `CompetitiveConditions` from the true data generating process. When optimizing pricing, the airline uses a `CompetitiveConditions` object based on its predictive model.

The other important class in the following code is the `Market`. A `Market` object holds some `CompetitiveConditions` as well as all information that airlines can directly observe. 

`Market` follows the OpenAI Gym API, so we can apply standard reinforcement learning tools to optimize the pricing policy from our model-based environment.
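
For readers unfamiliar with that interface, here is a minimal sketch of the classic Gym pattern: `reset()` returns an observation, and `step(action)` returns an observation, a reward, a done flag, and an info dict. `ToyMarketEnv` and `rollout` are hypothetical stand-ins written in plain Python; they are not the notebook's `Market` class or `run_env` helper, and the one-customer-per-day demand rule is an assumption.

```python
import random

class ToyMarketEnv:
    """Hypothetical environment exposing the classic Gym reset/step interface."""

    def __init__(self, seats=5, days=10, seed=0):
        self.seats, self.days = seats, days
        self.rng = random.Random(seed)

    def reset(self):
        self.seats_left, self.days_left = self.seats, self.days
        return (self.days_left, self.seats_left)

    def step(self, price):
        # Assumed toy demand: one customer per day with a random willingness to pay.
        willingness_to_pay = self.rng.uniform(0, 400)
        sold = int(self.seats_left > 0 and willingness_to_pay >= price)
        self.seats_left -= sold
        self.days_left -= 1
        done = self.days_left == 0 or self.seats_left == 0
        obs = (self.days_left, self.seats_left)
        return obs, price * sold, done, {}

def rollout(env, policy):
    """Run one episode and return the total reward (here, revenue)."""
    obs, done, total = env.reset(), False, 0.0
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
    return total

env = ToyMarketEnv()
print(rollout(env, lambda obs: 200.0))  # constant-price policy
```

Because the policy is just a function from observation to action, a hand-written pricing rule and a trained RL agent can be evaluated with the same loop. Newer Gymnasium releases changed these return signatures, so this sketch matches only the older API.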

# Step 1: Collect Data From Real Market

We define some parameters and import a function that determines the true data generating process. The exact market mechanisms (which the constants below affect) aren't central to the optimization workflow, so a description of the market details is postponed to the bottom of this notebook.

For now, you can safely treat the quantity-determining mechanism and its parameters as a black box, much as the airlines do.

In [1]:
import altair as alt
import numpy as np
import pandas as pd

from sem_policy_opt.true_dgp import get_true_qty_demanded_fn

# Constants hidden from airlines
CUSTOMER_LEVEL_RANDOMNESS = 20
DEMAND_SIGNAL_NOISINESS = 10
MAX_DEMAND_LEVEL = 400
POTENTIAL_CUSTOMERS_PER_DAY = 20

# Constants known to airlines
SEATS_PER_FLIGHT = 250
SALES_WINDOW_LENGTH = 120


I used trial and error to find a reasonable pricing function. I use the following function for both airlines when creating "real" data.

In [2]:
def simple_price_fn(my_demand_signal, days_before_flight, my_seats_avail, competitor_full): 
    # Charge more if you have a lot of time to sell seats, if few seats are available, or if you have little competition
    # On net, prices may increase over time because low seat inventory overwhelms remaining time effect.
    formula_price = 50 + my_demand_signal + 0.6 * days_before_flight - my_seats_avail + 40 * int(competitor_full)
    # demand_signal is noisy and can thus be negative. Never price tickets below some price_floor
    price_floor = 10
    actual_price = max(formula_price, price_floor)
    return actual_price


# Run Real Market

Airlines have historical data they can use to build a model. Here, we run the "real" environment to create this data.

In [3]:
from sem_policy_opt.market import Market
from sem_policy_opt.market_conditions import CompetitiveConditions
from sem_policy_opt.diagnostics import run_env

real_market_conditions = CompetitiveConditions(delta_price_fn = simple_price_fn, 
                                               qty_fn=get_true_qty_demanded_fn(POTENTIAL_CUSTOMERS_PER_DAY, CUSTOMER_LEVEL_RANDOMNESS))

real_market = Market(real_market_conditions, MAX_DEMAND_LEVEL, DEMAND_SIGNAL_NOISINESS, SEATS_PER_FLIGHT, SALES_WINDOW_LENGTH) 

train_profits, train_data = run_env(real_market, simple_price_fn, n_times=1000)
val_profits, val_data = run_env(real_market, simple_price_fn, n_times=50)

# Step 2: Fit Machine Learning Model on Real Data

We fit a model that predicts Delta's price and the quantity sold as a function of
- Days remaining
- Jetblue's demand signal
- Jetblue's remaining number of seats available
- Whether Delta's flight is fully booked (i.e. whether Delta is still selling tickets)


In [4]:
from sem_policy_opt.keras_models import get_keras_model, prep_for_keras_model
from sem_policy_opt.diagnostics import r_squared

train_x, train_y = prep_for_keras_model(train_data)
val_x, val_y = prep_for_keras_model(val_data)
predictive_model = get_keras_model(train_x, train_y, val_x, val_y, verbose=1)

print(r_squared(predictive_model, val_data))

Train on 120000 samples, validate on 6000 samples
Epochs 1/50 through 50/50 (per-epoch output omitted)
Train on 120000 samples, validate on 6000 samples
Epochs 1/50 through 11/50 (per-epoch output omitted)
{'delta_price_r2': 0.971098219340336, 'jb_qty_sold_r2': 0.6998958169507777, 'delta_qty_sold_r2': 0.6743210576470031}
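
The `r_squared` helper's source isn't shown in this notebook. Assuming it reports the standard coefficient of determination for each output, the computation for a single output would look like the sketch below (`r_squared_sketch` is a hypothetical name).

```python
import numpy as np

def r_squared_sketch(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

print(r_squared_sketch([1, 2, 3], [1, 2, 3]))  # perfect predictions
print(r_squared_sketch([1, 2, 3], [2, 2, 2]))  # always predicting the mean
```

Under that reading, the model explains almost all of the variation in Delta's price but only about two thirds of the variation in daily quantities sold.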


# Step 3: Set Up Model-Based Market Simulator

Now we create a market based not on the true data generating processes (which the firms don't know), but instead based on the predictive model.

As a diagnostic, I compare predicted profits from using Jetblue's current pricing function in the training, validation and simulator data.

There are important dynamics from outside the ML model used here.

**TODO: DESCRIPTION OF STRUCTURAL EQUATION MODEL**

In [None]:
sim_market_conditions = CompetitiveConditions(predictive_model=predictive_model)
# use function returning simulated market with baselines API to facilitate parallelism
sim_market_maker = lambda: Market(sim_market_conditions, MAX_DEMAND_LEVEL, DEMAND_SIGNAL_NOISINESS, SEATS_PER_FLIGHT, SALES_WINDOW_LENGTH)
noisy_sim_market_maker = lambda: Market(sim_market_conditions, MAX_DEMAND_LEVEL, DEMAND_SIGNAL_NOISINESS, SEATS_PER_FLIGHT, SALES_WINDOW_LENGTH, summarize_on_episode_end=True)
sim_market = sim_market_maker()


simple_price_sim_profits, simple_price_sim_data = run_env(sim_market, simple_price_fn, n_times=50)

print("Mean profits in training data: {} \n"
      "Mean profits in val data: {} \n"
      "Mean profits in sim data: {} \n".format(train_profits.mean(), val_profits.mean(), simple_price_sim_profits.mean()))

Mean profits in training data: 42501.5398 
Mean profits in val data: 43340.108 
Mean profits in sim data: 45366.084 



# Diagnostics and Experiment With Different Pricing Strategies

We can use the simulator to test the performance of an arbitrary pricing function.

As a simple experiment, we first estimate revenue per flight when multiplying all prices from the existing pricing function by various constants.

Results are shown below.

In [None]:
from sem_policy_opt.diagnostics import test_pricing_multipliers

price_comparison = test_pricing_multipliers(simple_price_fn, np.linspace(0.5, 1.5, 5), sim_market, real_market)

print(price_comparison)
alt.Chart(price_comparison).mark_point().encode(
    x='mean_predicted_rev',
    y='mean_actual_rev',
    color='base_price_mult',
    tooltip='base_price_mult')
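
The internals of `test_pricing_multipliers` aren't shown here. One plausible way to build the scaled pricing functions it compares is to wrap the base function, as in this sketch. `scale_price_fn` and `toy_price_fn` are hypothetical names, and the toy pricing rule is an assumption used only to keep the example self-contained.

```python
import numpy as np

def scale_price_fn(base_price_fn, multiplier):
    """Return a pricing function that charges `multiplier` times the base price."""
    def scaled_price_fn(*args, **kwargs):
        return multiplier * base_price_fn(*args, **kwargs)
    return scaled_price_fn

# Hypothetical stand-in for a pricing function.
def toy_price_fn(demand_signal, days_before_flight):
    return 50 + demand_signal + 0.6 * days_before_flight

for mult in np.linspace(0.5, 1.5, 5):
    scaled = scale_price_fn(toy_price_fn, mult)
    print(mult, scaled(100, 50))
```

Each scaled function keeps the signature of the base function, so it can be passed to the same evaluation code in both the simulated and the real market.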


We'll look at a couple more diagnostics before running a reinforcement learning algorithm to optimize our pricing function.

Below, I show predicted quantities sold in a given day for various candidate Jetblue prices (holding the number of days until the flight and Jetblue's demand signal constant).

In [None]:
days_before_flight = jb_demand_signal = 150
pred_outcomes_diff_jb_prices = []
for jb_price in np.linspace(0, MAX_DEMAND_LEVEL, 6):
    # Some extra munging here due to messiness associated with the multi-input / multi-output model.
    # Each input is fed in as a separate array to facilitate hiding jetblue_price from the prediction of delta_price.
    prediction_data = prep_for_keras_model([days_before_flight, jb_demand_signal, jb_price], skip_y=True)
    prediction = predictive_model.predict(prediction_data)
    delta_price, jb_seats_sold, delta_seats_sold = [i[0][0] for i in prediction]
    pred_outcomes_diff_jb_prices.append({'jb_price': jb_price,
                                         'delta_price': delta_price,
                                         'jetblue_seats_sold': jb_seats_sold,
                                         'delta_seats_sold': delta_seats_sold})
pd.DataFrame(pred_outcomes_diff_jb_prices).set_index(['jb_price'])

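It's reassuring that Delta's predicted price is independent of **jb_price**: Delta must choose its price without seeing Jetblue's. However, it's a shortcoming of the model that Delta is predicted to sell fewer seats as Jetblue's price increases. Because customers choose between the two airlines, a higher Jetblue price should, if anything, push more customers toward Delta.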

Even with shortcomings in the model, we can optimize a pricing policy against the model and find it improves profits in the real environment.

# Step 4: Optimize Policy Function
Everything below is currently in progress. I am experimenting with RL using the `stable_baselines` library, a better-maintained fork of OpenAI baselines.

In [None]:
from stable_baselines.common.vec_env import DummyVecEnv
from stable_baselines.sac.policies import MlpPolicy
from stable_baselines.sac import SAC
from time import time
import os

parallelism_level = 1         # use os.cpu_count() if not using SAC. SAC doesn't allow parallelism
env = DummyVecEnv([sim_market_maker for _ in range(parallelism_level)]) # Env is vectorized market for parallelism

model = SAC(MlpPolicy, env)

start_time = time()
for num_updates in range(1, 10):
    model.learn(total_timesteps=5000)
    sim_market_rewards, _ = run_env(sim_market, model, n_times=2)
    real_market_rewards, _ = run_env(real_market, model, n_times=2)
    print("""{} learn calls executed in {:.0f} seconds. 
             Current score in sim: {:.0f}. Current score in real market: {:.0f}.""".format(
                                                                                num_updates, 
                                                                                time()-start_time, 
                                                                                sim_market_rewards.mean(), 
                                                                                real_market_rewards.mean()))

## Details

#### How The Market Works
Some number of customers (`POTENTIAL_CUSTOMERS_PER_DAY`) come to a website each day. The customers' average willingness to pay for a flight on that day is `demand_level`. The `demand_level` on any given day is drawn from the distribution `uniform(0, MAX_DEMAND_LEVEL)`. Each airline receives a signal about `demand_level` on that day, and the signal is the `demand_level` plus some noise that is distributed `N(0, DEMAND_SIGNAL_NOISINESS)`. This demand signal might represent a prediction of demand from a model considering seasonality, macroeconomics, etc. Additionally, each customer has idiosyncratic preferences, so their willingness to pay for a ticket on any given airline is `demand_level + customer_preference`, where `customer_preference` is distributed `N(0, CUSTOMER_LEVEL_RANDOMNESS)`. The customer considers the price for each of the two airlines and purchases a ticket from the airline that gives them the highest consumer surplus (their personal willingness to pay for a ticket on that airline minus the cost of a ticket on that airline). If the customer's consumer surplus for both airlines is negative, they do not buy a ticket.
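
The customer choice rule described above can be sketched for a single day as follows. This is not the code in `sem_policy_opt.true_dgp`. It assumes the second argument of `N(0, ...)` is a standard deviation, that each customer draws an independent preference for each airline, and that seat capacity can be ignored. `simulate_day` is a hypothetical name.

```python
import numpy as np

def simulate_day(jb_price, delta_price, rng, n_customers=20,
                 max_demand_level=400, customer_level_randomness=20):
    """Return (jetblue_seats_sold, delta_seats_sold) for one simulated day."""
    demand_level = rng.uniform(0, max_demand_level)
    # Willingness to pay: one column per airline, one row per customer.
    wtp = demand_level + rng.normal(0, customer_level_randomness, size=(n_customers, 2))
    surplus = wtp - np.array([jb_price, delta_price])
    choice = surplus.argmax(axis=1)      # airline offering the higher surplus
    buys = surplus.max(axis=1) >= 0      # no purchase if both surpluses are negative
    jb_sold = int(np.sum(buys & (choice == 0)))
    delta_sold = int(np.sum(buys & (choice == 1)))
    return jb_sold, delta_sold

rng = np.random.default_rng(0)
print(simulate_day(jb_price=150, delta_price=160, rng=rng))
```

Under this rule, raising one airline's price can only shift customers toward the other airline or out of the market, which is the substitution pattern a well-behaved predictive model should reproduce.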
