# Market Making (RL)

*This notebook implements a simple market-making strategy using reinforcement learning. The market maker aims to maximize profit by providing liquidity to the market: it places two limit orders, one buy and one sell, around the mid price. It adjusts both the price and the size of these orders based on its current inventory and recent price movements, and it uses reinforcement learning to learn which price and size to quote.*
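To make the quoting idea concrete, here is a minimal sketch of inventory-skewed quotes. The function `skewed_quotes`, its parameters, and the linear skew rule are hypothetical and chosen for illustration only; the RL strategy in this notebook learns its own quoting behaviour rather than following this rule.

```python
def skewed_quotes(mid_price, half_spread, inventory, max_inventory, skew):
    """Return (bid, ask) around mid_price, shifted against the current inventory.

    Hypothetical rule for illustration: a long inventory lowers both quotes
    (a sell becomes more likely), a short inventory raises them.
    """
    # Normalised inventory, clipped to [-1, 1]
    ratio = max(-1.0, min(1.0, inventory / max_inventory))
    shift = -skew * ratio
    bid = mid_price + shift - half_spread
    ask = mid_price + shift + half_spread
    return bid, ask


# Made-up numbers: flat inventory gives symmetric quotes,
# a fully long inventory shifts both quotes down.
print(skewed_quotes(20000.0, 1.0, 0.0, 1.0, 0.5))
print(skewed_quotes(20000.0, 1.0, 1.0, 1.0, 0.5))
```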

## Setup

In [4]:
# Data loading and preprocessing
from utils.load_data import load_data
from utils.evaluate import evaluate_strategy, get_pnl

import plotly.graph_objects as go
from plotly.subplots import make_subplots


# Environment
from environment.env import Real_Data_Env

# Model
from strategies.rl import QLearning, RLStrategy, SARSA
from strategies.baselines import BestPosStrategy, StoikovStrategy


import numpy as np
import random

np.random.seed(13425)
random.seed(13425)

## Data Preparation
The data used in this notebook comes from [Crypto Lake](https://crypto-lake.com/). It contains the order book data for the "BTC/USDT" trading pair, as well as the trades on the Binance exchange. For each timestamp (~850k) of the day "1/10/2022", it provides 20 price levels on each side of the order book, plus the trades that occurred on that day.
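As a rough picture of what one order book snapshot looks like, the sketch below models each side as a list of `(price, size)` pairs with the best level first, which is how the cells in this notebook index `orderbook.asks` and `orderbook.bids`. The class `OrderBookSnapshot`, its methods, and the prices are hypothetical stand-ins, not the objects returned by `load_data`.

```python
from dataclasses import dataclass
from typing import List, Tuple

Level = Tuple[float, float]  # (price, size)


@dataclass
class OrderBookSnapshot:  # hypothetical stand-in for the notebook's order book object
    bids: List[Level]  # best (highest) bid first
    asks: List[Level]  # best (lowest) ask first

    def spread(self) -> float:
        """Best ask minus best bid."""
        return self.asks[0][0] - self.bids[0][0]

    def mid_price(self) -> float:
        """Average of best ask and best bid."""
        return (self.asks[0][0] + self.bids[0][0]) / 2

    def truncate(self, max_depth: int) -> "OrderBookSnapshot":
        """Keep only the first max_depth levels on each side."""
        return OrderBookSnapshot(self.bids[:max_depth], self.asks[:max_depth])


# Made-up prices and sizes, for illustration only
book = OrderBookSnapshot(
    bids=[(19999.5, 0.4), (19999.0, 1.2), (19998.5, 0.7)],
    asks=[(20000.5, 0.3), (20001.0, 0.9), (20001.5, 2.0)],
)
print(book.spread(), book.mid_price())
print(len(book.truncate(2).bids))
```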


In [13]:
market_event = load_data(max_depth=5)

# Chronological split: train / validation / test
market_event_train = market_event[:1000000]
market_event_val = market_event[1000000:2000000]
market_event_test = market_event[2000000:]

del market_event

In [3]:
print(
    f"Spread: {market_event_train[39].orderbook.asks[0][0] - market_event_train[39].orderbook.bids[0][0]:.3f}$"
)

ask_prices, ask_sizes = zip(*market_event_train[39].orderbook.asks)
bid_prices, bid_sizes = zip(*market_event_train[39].orderbook.bids)

# Create traces for asks and bids
fig = make_subplots(rows=1, cols=2, subplot_titles=("Asks", "Bids"), shared_yaxes=True)

fig.add_trace(
    go.Bar(x=ask_prices, y=ask_sizes, marker=dict(color="red"), name="Asks"),
    row=1,
    col=1,
)
fig.add_trace(
    go.Bar(x=bid_prices, y=bid_sizes, marker=dict(color="blue"), name="Bids"),
    row=1,
    col=2,
)
fig.update_layout(title_text="Order Book")
fig.update_xaxes(title_text="Dollar")
fig.update_yaxes(title_text="BTC", row=1, col=1)
fig.show()

Spread: 1.310$


# Training

## Parameters

In [3]:
EXECUTION_DELAY = 1e-4
MARKET_EVENT_DELAY = 1e-4

INITIAL_CASH = 0
MIN_POSITION = -1  # Example: minimum position size
MAX_POSITION = 1  # Example: maximum position size
INTERVAL_BTW_ORDERS = 5e-2
TRADE_SIZE = 0.001
MAKER_FEE = 0  # -0.00004

## RL Model

The reward is calculated as:
$$
    r_t = Q_t \cdot (m_t - m_{t-1}) + x_t^{\text{side}} \cdot x_t^{\text{size}} \cdot x_t^{\text{price}} - 100 \left( e^{4 \left| \frac{Q_t - Q_{\text{min}}}{\left| Q_{\text{max}} - Q_{\text{min}} \right|} - 0.5 \right|} - 1 \right)
$$

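where $Q_t$ is the inventory at time $t$,

$m_t$ is the mid price at time $t$,

$x_t^{\text{side}}$ is the side of the order at time $t$,

$x_t^{\text{size}}$ is the size of the order at time $t$,

$x_t^{\text{price}}$ is the price of the order at time $t$,

and $Q_{\text{min}}$, $Q_{\text{max}}$ are the inventory bounds (presumably `MIN_POSITION` and `MAX_POSITION`).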
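The reward formula above can be sketched directly in Python. This is a term-by-term transcription under stated assumptions, not the code used by `RLStrategy`: the function name `reward`, its argument names, and the default bounds of -1 and 1 are illustrative, and the sign convention of `side` is left to the caller because the formula does not specify it.

```python
import math


def reward(inventory, mid, prev_mid, side, size, price, q_min=-1.0, q_max=1.0):
    """Transcription of the reward formula; all names are illustrative."""
    # Mark-to-market change of the inventory: Q_t * (m_t - m_{t-1})
    pnl_term = inventory * (mid - prev_mid)
    # Order term: side * size * price (sign convention of `side` is assumed)
    trade_term = side * size * price
    # Inventory penalty: zero at the midpoint of [q_min, q_max],
    # growing exponentially towards either bound
    normalised = (inventory - q_min) / abs(q_max - q_min)
    penalty = 100.0 * (math.exp(4.0 * abs(normalised - 0.5)) - 1.0)
    return pnl_term + trade_term - penalty


# With no price move and no order, only the inventory penalty remains
for q in (0.0, 0.5, 1.0):
    print(q, reward(q, 100.0, 100.0, 0, 0.0, 0.0))
```

With these bounds the penalty vanishes at zero inventory and reaches $100(e^2 - 1)$ at either limit, so it dominates the other terms long before the position limit is hit.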


In [8]:
ALPHA = 0.9
ALPHA_DECAY = 0.9999
GAMMA = 0.99
ORDER_BOOK_DEPTH = 5

NB_TRAINING = 5

### Q learning
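An ensemble method is used to reduce the variance of the Q-values: the strategy is trained `NB_TRAINING` times, and the final Q-table is the average of the Q-tables from the individual runs. Within each run, the Q-values are assumed to follow the standard tabular Q-learning update, with learning rate $\alpha$ and discount factor $\gamma$:

$$
    Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left( r_t + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right)
$$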

In [6]:
QLearningTable = None

model = QLearning(ALPHA, ALPHA_DECAY, GAMMA)
strategy = RLStrategy(
    model=model,
    min_position=MIN_POSITION,
    max_position=MAX_POSITION,
    delay=INTERVAL_BTW_ORDERS,
    trade_size=TRADE_SIZE,
    maker_fee=MAKER_FEE,
    order_book_depth=ORDER_BOOK_DEPTH,
    log=False,
)

# Ensemble: average the Q-tables of NB_TRAINING training runs
# to reduce the variance of the learned values
for i in range(NB_TRAINING):
    print(f"Training: {i + 1}/{NB_TRAINING}")
    env = Real_Data_Env(market_event_train, EXECUTION_DELAY, MARKET_EVENT_DELAY)
    strategy.run(env, "train", 1500000)

    if QLearningTable is None:
        # Copy so the accumulator never aliases the model's own table
        QLearningTable = model.q_table.copy()
    else:
        QLearningTable += model.q_table

    strategy.reset()

QLearningTable /= NB_TRAINING
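The averaging step in the loop above can be isolated in a small helper. The sketch below assumes the Q-tables are equally shaped NumPy arrays (suggested by the `.npy` file used for saving, but not confirmed); `average_q_tables` and the toy values are hypothetical. The second half shows why accumulating into an array without copying it first is risky.

```python
import numpy as np


def average_q_tables(tables):
    """Element-wise mean of equally shaped Q-tables; the inputs are not modified."""
    total = np.zeros_like(tables[0], dtype=float)  # fresh accumulator, no aliasing
    for table in tables:
        total += table
    return total / len(tables)


# Three toy "runs" with made-up values
runs = [np.full((2, 2), v) for v in (1.0, 2.0, 3.0)]
ensemble = average_q_tables(runs)
print(ensemble)  # every entry is the mean of 1, 2 and 3

# Aliasing pitfall: without a copy, the accumulator and the source are one array
table = np.ones(2)
accumulator = table   # no copy: both names point at the same array
accumulator += table  # the in-place add also changes `table`
print(table is accumulator, table)
```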

In [None]:
# Save
strategy.model.q_table = QLearningTable
strategy.save_q_table("model/ensemble_qlearning_table.npy")

In [None]:
len(env.market_event)

In [14]:
# Validation
strategy.log = True
env = Real_Data_Env(market_event_val, EXECUTION_DELAY, MARKET_EVENT_DELAY)


trades, market_updates, orders, updates = strategy.run(env, "test", 1000000)

evaluate_strategy(strategy, trades, updates, orders)

Simulation runned for 135.81s


### SARSA

In [None]:
SARSATable = None

model = SARSA(ALPHA, ALPHA_DECAY, GAMMA)
strategy = RLStrategy(
    model=model,
    min_position=MIN_POSITION,
    max_position=MAX_POSITION,
    delay=INTERVAL_BTW_ORDERS,
    trade_size=TRADE_SIZE,
    maker_fee=MAKER_FEE,
    order_book_depth=ORDER_BOOK_DEPTH,
)

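# Same ensemble averaging as for Q-learning, here over 10 shorter runs
NB_SARSA_TRAINING = 10

for _ in range(NB_SARSA_TRAINING):
    env = Real_Data_Env(market_event_train, EXECUTION_DELAY, MARKET_EVENT_DELAY)
    strategy.run(env, "train", 500000)

    if SARSATable is None:
        # Copy so the accumulator never aliases the model's own table
        SARSATable = model.q_table.copy()
    else:
        SARSATable += model.q_table

    strategy.reset()

SARSATable /= NB_SARSA_TRAINING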
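For reference, the textbook difference between the two algorithms trained above lies in the bootstrap target: Q-learning uses the best action in the next state, while SARSA uses the action actually taken next. The sketch below shows the standard tabular forms on a toy table; the function names and the state and action indexing are assumptions for illustration and may not match how `strategies.rl` implements `QLearning` and `SARSA`.

```python
import numpy as np


def q_learning_update(q, s, a, r, s_next, alpha, gamma):
    """Off-policy: bootstrap from the greedy action in the next state."""
    target = r + gamma * np.max(q[s_next])
    q[s, a] += alpha * (target - q[s, a])


def sarsa_update(q, s, a, r, s_next, a_next, alpha, gamma):
    """On-policy: bootstrap from the action actually chosen in the next state."""
    target = r + gamma * q[s_next, a_next]
    q[s, a] += alpha * (target - q[s, a])


# Toy table: 2 states x 2 actions, made-up values for state 1
q = np.zeros((2, 2))
q[1] = [1.0, 3.0]
q_ql, q_sa = q.copy(), q.copy()

# Same transition (s=0, a=0, r=1, s'=1); SARSA's next action is the non-greedy one
q_learning_update(q_ql, 0, 0, 1.0, 1, alpha=0.5, gamma=0.9)
sarsa_update(q_sa, 0, 0, 1.0, 1, 0, alpha=0.5, gamma=0.9)
print(q_ql[0, 0], q_sa[0, 0])
```

Because the SARSA target follows the exploratory action, its update here is smaller than the Q-learning one; the two coincide whenever the next action is greedy.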