# **Paper: Multi-Agent Deep Reinforcement Learning for Liquidation Strategy Analysis** 
Authors: Wenhang Bao, Xiao-Yang Liu (Code available at: https://github.com/WenhangBao/MultiAgent-RL-for-Liquidation) \\
presented by Janet Wang (yw4fm)   

When the upload prompt appears, choose *ddpg_agent.py*, *model.py*, *syntheticChrissAlmgren.py*, and *utilsgiven.py*.

In [17]:
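# Mount Google Drive (Colab only)
import sys 
from google.colab import drive
drive.mount('/content/drive/')

# Upload the source files and write each one under its own name,
# so that a module is never overwritten with another file's contents
from google.colab import files
uploaded = files.upload()
for name, data in uploaded.items():
    with open(name, 'wb') as f:
        f.write(data)
import utilsgiven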


# Get the default financial and AC Model parameters
financial_params, ac_params = utilsgiven.get_env_param()

Drive already mounted at /content/drive/; to attempt to forcibly remount, call drive.mount("/content/drive/", force_remount=True).


Saving ddpg_agent.py to ddpg_agent (1).py
Saving model.py to model (1).py
Saving syntheticChrissAlmgren.py to syntheticChrissAlmgren (2).py
Saving utilsgiven.py to utilsgiven (2).py


## Setup

We use the **Almgren-Chriss market impact model** defined in *syntheticChrissAlmgren.py* to find an optimal liquidation strategy and to experiment with multiple scenarios for the agents. In the context of reinforcement learning, the model serves as a trading environment in which two agents make selling decisions and the environment returns the price as information. \\
To get started, we set up the **default financial environment** in *syntheticChrissAlmgren.py* by inputting the parameters below. These default parameters will remain the same across all experiments. 

In [6]:
financial_params

| Parameter | Value | Parameter | Value |
| :- | :-: | :- | :-: |
| Annual Volatility: | 12% | Bid-Ask Spread: | 0.125 |
| Daily Volatility: | 0.8% | Daily Trading Volume: | 5000000.0 |


Then, we input parameters to the Almgren-Chriss market impact model to define a multi-agent scenario. Two agents are assumed to sell all shares within the given time frame.

Each time we run a new experiment, we redefine the following parameters to obtain arrays of expected shortfalls and trajectories, and keep the rest at their defaults: 

1.   Total Number of Shares for Agent 1 to Sell
2.   Total Number of Shares for Agent 2 to Sell
3.   Trader's Risk Aversion for Agent 1
4.   Trader's Risk Aversion for Agent 2

For example, the parameters below define two competitive agents, each responsible for selling 0.5 million shares at risk aversion $\lambda_{1}=\lambda_{2}=10^{-6}$, to explore the cooperation v. competition relationship in *visualization.ipynb*. All outputs below follow this example.

We set "*Total Number of Shares for Agent 2 to Sell*" to 0.0001 to simulate a single-agent environment, which serves as the benchmark for comparison. The number of shares for Agent 2 must be greater than 0 to avoid a division error in normalization.
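
A minimal sketch of why a zero position breaks normalization. The helper `normalized_remaining` is hypothetical and only illustrates the idea; the actual normalization lives in *syntheticChrissAlmgren.py* and may differ in detail.

```python
def normalized_remaining(shares_remaining, total_shares):
    """Fraction of the original position still unsold (hypothetical helper)."""
    return shares_remaining / total_shares


# A regular agent: half of a 500,000-share position is left.
half_left = normalized_remaining(250_000, 500_000)

# An agent with zero total shares cannot be normalized.
try:
    normalized_remaining(0, 0)
    zero_position_fails = False
except ZeroDivisionError:
    zero_position_fails = True

# A tiny placeholder position keeps the division well defined.
placeholder = normalized_remaining(0.0001, 0.0001)

print(half_left, zero_position_fails, placeholder)
```

With a placeholder of 0.0001 shares, Agent 2 has a negligible effect on the market, so the run behaves like a single-agent benchmark.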

In [7]:
ac_params

| Parameter | Value | Parameter | Value |
| :- | :-: | :- | :-: |
| Total Number of Shares for Agent1 to Sell: | 500000 | Fixed Cost of Selling per Share: | $0.062 |
| Total Number of Shares for Agent2 to Sell: | 500000 | Trader's Risk Aversion for Agent 1: | 1e-06 |
| Starting Price per Share: | $50.00 | Trader's Risk Aversion for Agent 2: | 1e-06 |
| Price Impact for Each 1% of Daily Volume Traded: | $2.5e-06 | Permanent Impact Constant: | 2.5e-07 |
| Number of Days to Sell All the Shares: | 60 | Single Step Variance: | 0.144 |
| Number of Trades: | 60 | Time Interval between trades: | 1.0 |
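
Several of the values above can be reproduced from the financial parameters. The sketch below is a consistency check, not the repository's code: the formulas and the 250 trading days per year are assumptions, chosen because they match the numbers printed in the two tables.

```python
import math

# Values read from the two parameter tables in this notebook
annual_volatility = 0.12
bid_ask_spread = 0.125
daily_volume = 5_000_000
starting_price = 50.0
liquidation_days = 60
num_trades = 60

trading_days = 250  # assumption: trading days per year

# Assumed relations between the parameters
daily_volatility = annual_volatility / math.sqrt(trading_days)
single_step_variance = (daily_volatility * starting_price) ** 2
fixed_cost = bid_ask_spread / 2
temporary_impact = bid_ask_spread / (0.01 * daily_volume)  # per 1% of daily volume
permanent_impact = bid_ask_spread / (0.1 * daily_volume)
time_interval = liquidation_days / num_trades

print(f"daily volatility     : {daily_volatility:.4f}")
print(f"single step variance : {single_step_variance:.3f}")
print(f"fixed cost per share : {fixed_cost}")
print(f"temporary impact     : {temporary_impact}")
print(f"permanent impact     : {permanent_impact}")
print(f"time interval        : {time_interval}")
```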


After setting up the environment, we model the liquidation process as a ***Markov decision process (MDP)***.

We adopt the ***Actor-Critic method***, in which neural networks parameterize approximations of both the *Q-value* and the action *a*. 

*  The **critic** learns the Q-value function. The critic network estimates the return *r* of the state and feeds this performance estimate back to guide the actions of Agent 1 and Agent 2. 
*  The **actor** updates the policy using information from the critic. The actor takes state *s* as **input** and returns action *a*. 

The paper uses the **Deep Deterministic Policy Gradients (DDPG)** algorithm, an example of the Actor-Critic method, to solve the optimal liquidation problem. Details are in *ddpg_agent.py*.
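
To make the two roles concrete, here is a minimal NumPy sketch of the inputs and outputs of an actor and a critic. It is not the network from *model.py*: the layer sizes, the sigmoid output, the action size of 1, and the random weights are all illustrative assumptions, and no training happens here.

```python
import numpy as np

rng = np.random.default_rng(0)
state_size, action_size, hidden = 8, 1, 16  # illustrative sizes (assumptions)

# Small random weights stand in for trained parameters.
W1_actor = rng.normal(scale=0.1, size=(state_size, hidden))
W2_actor = rng.normal(scale=0.1, size=(hidden, action_size))
W1_critic = rng.normal(scale=0.1, size=(state_size + action_size, hidden))
W2_critic = rng.normal(scale=0.1, size=(hidden, 1))


def actor(state):
    """Deterministic policy: state s -> action a, squashed into (0, 1)."""
    h = np.maximum(0.0, state @ W1_actor)          # ReLU hidden layer
    return 1.0 / (1.0 + np.exp(-(h @ W2_actor)))   # sigmoid output


def critic(state, action):
    """Q-value estimate for a (state, action) pair, returned as a float."""
    h = np.maximum(0.0, np.concatenate([state, action]) @ W1_critic)
    return (h @ W2_critic).item()


state = rng.uniform(size=state_size)
action = actor(state)
q_value = critic(state, action)
print(action, q_value)
```

In DDPG the actor is then adjusted in the direction that increases the critic's Q-value for the actions it proposes.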

In [8]:
import numpy as np

import syntheticChrissAlmgren as sca
from ddpg_agent import Agent

from collections import deque

For each agent defined in *ddpg_agent.py*:

*   Goal: minimizing expected shortfall (selling cost) 
*   State: [past return *r*, remaining number of trades *m*, remaining number of shares *l*]
*   Reward: difference between utility functions before and after sale. Utility functions are defined by risk aversion level $\lambda$ and trading trajectory (vector of shares remaining at each time step $k$)
*   Action: sell (1) or hold (0) at each time step $k$, as well as market prices
*   Policy: selling percentage *a* at state *s*
*   Q(*s,a*): expected reward achieved by action *a* at state *s* 

Each agent observes only limited state information: it sees its own remaining shares, but not those of the other agent.
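
The training loop in this notebook enforces this partial observability with `np.delete`: index 8 of the joint state is removed for Agent 1 and index 7 for Agent 2. A small sketch with a made-up state vector follows; the layout of the first seven entries is an assumption, and only the last two positions (remaining shares of Agent 1 and Agent 2) are taken from how the notebook slices `cur_state[7:]`.

```python
import numpy as np

# Hypothetical 9-entry joint state:
# entries 0-6 are placeholders (assumed: past returns and remaining trades),
# entry 7 is Agent 1's remaining shares, entry 8 is Agent 2's.
joint_state = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.76, 0.66])

obs_agent1 = np.delete(joint_state, 8)  # hide Agent 2's remaining shares
obs_agent2 = np.delete(joint_state, 7)  # hide Agent 1's remaining shares

print(obs_agent1[-1], obs_agent2[-1])
```

Both observations have eight entries, so the same network architecture can be used for either agent.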

In [9]:
# Create simulation environment
env = sca.MarketEnvironment()

# Initialize Feed-forward DNNs for Actor and Critic models. 
agent1 = Agent(state_size=env.observation_space_dimension(), action_size=env.action_space_dimension(),random_seed = 1225)
agent2 = Agent(state_size=env.observation_space_dimension(), action_size=env.action_space_dimension(),random_seed = 108)
# Set the liquidation time
lqt = 60

# Set the number of trades
n_trades = 60

# Set trader's risk aversion
tr1 = 1e-6
tr2 = 1e-6

We then implement a reinforcement learning workflow to train the actor and critic. We use the DDPG algorithm to generate the optimal trading trajectory with minimum selling cost. At each step, we feed the states observed from the environment to Agent 1 and Agent 2, and the two agents act on the actions they predict. The environment then returns their new rewards and states. This process continues for 1,300 episodes in this case. 
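
The observe, act, store cycle can be sketched with a toy stand-in for the market. Everything here is hypothetical: `ToyLiquidationEnv`, its quadratic cost, and the random policy are illustrations only and do not reproduce the Almgren-Chriss environment or DDPG.

```python
import random


class ToyLiquidationEnv:
    """Hypothetical one-agent environment tracking the unsold fraction."""

    def __init__(self, n_trades):
        self.n_trades = n_trades

    def reset(self):
        self.remaining = 1.0
        self.steps_taken = 0
        return self.remaining

    def step(self, sell_fraction):
        sold = self.remaining * sell_fraction
        self.remaining -= sold
        self.steps_taken += 1
        reward = -(sold ** 2)  # toy cost: larger trades are penalized more
        done = self.steps_taken >= self.n_trades
        return self.remaining, reward, done


rng = random.Random(0)
env = ToyLiquidationEnv(n_trades=5)
replay = []  # stores (state, action, reward, new_state, done) tuples

state = env.reset()
done = False
while not done:
    action = rng.uniform(0.0, 1.0)            # placeholder for agent.act(state)
    new_state, reward, done = env.step(action)
    replay.append((state, action, reward, new_state, done))
    state = new_state                         # roll over to the new state

print(len(replay), replay[-1][4])
```

A learning agent would sample from `replay` to update its actor and critic instead of acting at random.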

Every 100 episodes, we output the average shortfall of each agent over the last 100 episodes.
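
The rolling average relies on `deque(maxlen=...)`, which silently discards the oldest entry once the window is full. A short sketch with a window of 3 instead of 100 and made-up shortfall values:

```python
from collections import deque
from statistics import mean

window = deque(maxlen=3)  # the notebook uses maxlen=100
for shortfall in [10.0, 20.0, 30.0, 40.0]:
    window.append(shortfall)

# Only the three most recent values remain in the window.
recent_values = list(window)
rolling_average = mean(window)
print(recent_values, rolling_average)
```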

In [21]:
# Set the number of episodes to run the simulation
episodes = 1300
shortfall_list = []
shortfall_hist1 = np.array([])
shortfall_hist2 = np.array([])
shortfall_deque1 = deque(maxlen=100)
shortfall_deque2 = deque(maxlen=100)
for episode in range(episodes): 
    # Reset the environment
    cur_state = env.reset(seed = episode, liquid_time = lqt, num_trades = n_trades, lamb1 = tr1,lamb2 = tr2)

    # set the environment to make transactions
    env.start_transactions()

    for i in range(n_trades + 1):
      
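        # Predict the best action for the current state based on DDPG. 
        # Each agent sees only its own remaining shares: index 7 holds
        # Agent 1's remaining shares and index 8 holds Agent 2's.
        cur_state1 = np.delete(cur_state,8)
        cur_state2 = np.delete(cur_state,7)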
       
        action1 = agent1.act(cur_state1, add_noise = True)
        action2 = agent2.act(cur_state2, add_noise = True)
        
        # Action is performed and new state, reward, info are received. 
        new_state, reward1, reward2, done1, done2, info = env.step(action1,action2)
        
        # current state, action, reward, new state are stored in the experience replay
        new_state1 = np.delete(new_state,8)
        new_state2 = np.delete(new_state,7)
        agent1.step(cur_state1, action1, reward1, new_state1, done1)
        agent2.step(cur_state2, action2, reward2, new_state2, done2)
        # roll over new state
        cur_state = new_state

        if info.done1 and info.done2:
            shortfall_hist1 = np.append(shortfall_hist1, info.implementation_shortfall1)
            shortfall_deque1.append(info.implementation_shortfall1)
            
            shortfall_hist2 = np.append(shortfall_hist2, info.implementation_shortfall2)
            shortfall_deque2.append(info.implementation_shortfall2)
            break
        
    if (episode + 1) % 100 == 0: # print average shortfall over last 100 episodes
        print('\rEpisode [{}/{}]\tAverage Shortfall for Agent1: ${:,.2f}'.format(episode + 1, episodes, np.mean(shortfall_deque1)))        
        print('\rEpisode [{}/{}]\tAverage Shortfall for Agent2: ${:,.2f}'.format(episode + 1, episodes, np.mean(shortfall_deque2)))
        shortfall_list.append([np.mean(shortfall_deque1),np.mean(shortfall_deque2)])
print('\nAverage Implementation Shortfall for Agent1: ${:,.2f} \n'.format(np.mean(shortfall_hist1)))
print('\nAverage Implementation Shortfall for Agent2: ${:,.2f} \n'.format(np.mean(shortfall_hist2)))
#utilsgiven.plot_price_model()

Episode [100/1300]	Average Shortfall for Agent1: $342,037.37
Episode [100/1300]	Average Shortfall for Agent2: $343,009.38
Episode [200/1300]	Average Shortfall for Agent1: $321,481.76
Episode [200/1300]	Average Shortfall for Agent2: $321,917.95
Episode [300/1300]	Average Shortfall for Agent1: $318,075.91
Episode [300/1300]	Average Shortfall for Agent2: $317,478.14
Episode [400/1300]	Average Shortfall for Agent1: $371,157.72
Episode [400/1300]	Average Shortfall for Agent2: $373,268.12
Episode [500/1300]	Average Shortfall for Agent1: $337,774.91
Episode [500/1300]	Average Shortfall for Agent2: $343,546.48
Episode [600/1300]	Average Shortfall for Agent1: $347,867.41
Episode [600/1300]	Average Shortfall for Agent2: $348,298.31
Episode [700/1300]	Average Shortfall for Agent1: $302,789.39
Episode [700/1300]	Average Shortfall for Agent2: $296,596.55
Episode [800/1300]	Average Shortfall for Agent1: $305,151.05
Episode [800/1300]	Average Shortfall for Agent2: $301,542.19
Episode [900/1300]	Aver

We save the shortfall list of both Agent 1 and Agent 2 as a .npy file to use as input in *visualization.ipynb*. The output below shows the first few rows of *"1e-6_1e-6_cooporation_shorfall_list.npy"*, for the case where Agent 1 and Agent 2 are in a cooperative relationship, trading at $\lambda_{1} = \lambda_{2} = 10^{-6}$.

In [14]:
shortfall = np.array(shortfall_list)
print(shortfall[0:5])
np.save('1e-6_1e-6_cooporation_shorfall_list.npy',shortfall)

[[1168737.12304209 1182497.03935972]
 [1281250.         1281250.        ]
 [1274753.8617609  1278818.425     ]
 [ 958446.85059046  996403.32558362]
 [ 321537.15559445  321944.70308262]]
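
A saved array can be read back with `np.load`. The sketch below uses a temporary directory, a hypothetical file name, and made-up values, so it does not touch the real result files; how *visualization.ipynb* actually loads them is not shown in this notebook.

```python
import os
import tempfile

import numpy as np

# Made-up shortfall averages: one row per 100 episodes, one column per agent.
example = np.array([[340000.0, 343000.0],
                    [321000.0, 322000.0]])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example_shortfall_list.npy')  # hypothetical name
    np.save(path, example)
    loaded = np.load(path)

agent1_curve = loaded[:, 0]  # learning curve of Agent 1
agent2_curve = loaded[:, 1]  # learning curve of Agent 2
print(agent1_curve, agent2_curve)
```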


The output below shows the fraction of shares remaining at each time step $k$, also known as the trading trajectory, for both agents. 

In [15]:
#print(tr1,tr2)
cur_state = env.reset(seed = episode, liquid_time = lqt, num_trades = n_trades, lamb1 = tr1,lamb2 = tr2)

# set the environment to make transactions
env.start_transactions()

trajectory = np.zeros([n_trades+1,2])
for i in range(n_trades + 1):
    # Record the remaining shares of both agents at this step
    trajectory[i] = cur_state[7:]

    print(cur_state[7:])
    # Predict the best action for the current state. 
    cur_state1 = np.delete(cur_state,8)
    cur_state2 = np.delete(cur_state,7)
    #print(cur_state[5:])
    action1 = agent1.act(cur_state1, add_noise = True)
    action2 = agent2.act(cur_state2, add_noise = True)
    #print(action1,action2)
    # Action is performed and new state, reward, info are received. 
    new_state, reward1, reward2, done1, done2, info = env.step(action1,action2)

    # current state, action, reward, new state are stored in the experience replay
    new_state1 = np.delete(new_state,8)
    new_state2 = np.delete(new_state,7)
    agent1.step(cur_state1, action1, reward1, new_state1, done1)
    agent2.step(cur_state2, action2, reward2, new_state2, done2)
    # roll over new state
    cur_state = new_state

    if info.done1 and info.done2:
        shortfall_hist1 = np.append(shortfall_hist1, info.implementation_shortfall1)
        shortfall_deque1.append(info.implementation_shortfall1)

        shortfall_hist2 = np.append(shortfall_hist2, info.implementation_shortfall2)
        shortfall_deque2.append(info.implementation_shortfall2)
        break

if (episode + 1) % 100 == 0: # print average shortfall over last 100 episodes
    print('\rEpisode [{}/{}]\tAverage Shortfall for Agent1: ${:,.2f}'.format(episode + 1, episodes, np.mean(shortfall_deque1)))        
    print('\rEpisode [{}/{}]\tAverage Shortfall for Agent2: ${:,.2f}'.format(episode + 1, episodes, np.mean(shortfall_deque2)))



1e-06 1e-06
[1. 1.]
[0.761694 0.656324]
[0.603648 0.454928]
[0.44365  0.334226]
[0.305346 0.25539 ]
[0.20247  0.202642]
[0.13316  0.148788]
[0.09197 0.10399]
[0.064072 0.074902]
[0.044238 0.052522]
[0.03257  0.036602]
[0.02397  0.024466]
[0.018556 0.01732 ]
[0.013314 0.011942]
[0.009696 0.008204]
[0.006774 0.005622]
[0.004728 0.003898]
[0.003236 0.002704]
[0.00228  0.001762]
[0.00165 0.00114]
[0.001234 0.000778]
[0.000932 0.000566]
[0.000674 0.000392]
[0.000506 0.00029 ]
[0.000376 0.000212]
[0.000294 0.00015 ]
[0.000224 0.000108]
[1.66e-04 7.40e-05]
[1.14e-04 5.40e-05]
[8.2e-05 4.2e-05]
[5.8e-05 3.4e-05]
[4.0e-05 2.8e-05]
[2.6e-05 2.2e-05]
[1.8e-05 1.8e-05]
[1.2e-05 1.4e-05]
[8.e-06 1.e-05]
[6.e-06 8.e-06]
[4.e-06 6.e-06]
[2.e-06 4.e-06]
[2.e-06 2.e-06]
[2.e-06 2.e-06]
[2.e-06 2.e-06]
[2.e-06 2.e-06]
[2.e-06 2.e-06]
[2.e-06 2.e-06]
[2.e-06 2.e-06]
[2.e-06 2.e-06]
[2.e-06 2.e-06]
[2.e-06 2.e-06]
[2.e-06 2.e-06]
[2.e-06 2.e-06]
[2.e-06 2.e-06]
[2.e-06 2.e-06]
[2.e-06 2.e-06]
[2.e-06 2.e-
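
Each printed row is the fraction of the original position still held by Agent 1 and Agent 2. As a worked example, the first three rows above can be converted into the number of shares sold per step, assuming each agent starts with the 500,000 shares from the example parameters:

```python
import numpy as np

total_shares = 500_000  # per agent, from the example parameters

# First three rows of the printed trajectory (Agent 1, Agent 2)
remaining = np.array([[1.0, 1.0],
                      [0.761694, 0.656324],
                      [0.603648, 0.454928]])

# Shares sold at each step = drop in the remaining fraction * total shares
shares_sold = -np.diff(remaining, axis=0) * total_shares
print(shares_sold)
```

Both agents sell a large block in the first step and progressively less afterwards, which matches the front-loaded shape of the full trajectory.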

We save the trajectory of both Agent 1 and Agent 2 as a .npy file to use as input in *visualization.ipynb* later. The cell below writes "*1e-6_1e-6_competition_trajectory_1500.npy*", for the case where Agent 1 and Agent 2 are in a competitive relationship, trading at $\lambda_{1} = \lambda_{2} = 10^{-6}$.

In [None]:
np.save('1e-6_1e-6_competition_trajectory_1500.npy',trajectory)

We will have to repeat the above several times by altering the ac_params in *syntheticChrissAlmgren.py* to generate .npy files of shortfall lists and trajectory lists. Here's a summary of the parameters that require changes and corresponding files generated categorized by experiments: 

Theorem 1:

| File name | Total Number of Shares for Agent 1 to Sell | Total Number of Shares for Agent 2 to Sell | Risk aversion for Agent 1 | Risk aversion for Agent 2 |
| :- | :-: | :-: | :-: | :-: |
| *1e-6_shortfall_list.npy* | 1,000,000 | 0.0001 | $10^{-6}$ | 0 |
| *1e-6_shortfall_list 0.3M.npy* | 300,000 | 700,000 | $10^{-6}$ | $10^{-6}$ |
| *1e-6_shortfall_list 0.7M.npy* | 700,000 | 300,000 | $10^{-6}$ | $10^{-6}$ |
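
One way to keep these runs organized is to record each table row as a small configuration object. This is only a bookkeeping sketch: `ExperimentConfig` is a hypothetical container, and applying the values still means editing the parameters in *syntheticChrissAlmgren.py* as described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """One row of the Theorem 1 table (hypothetical container)."""
    output_file: str
    shares_agent1: float
    shares_agent2: float
    risk_aversion1: float
    risk_aversion2: float


THEOREM_1_RUNS = [
    ExperimentConfig("1e-6_shortfall_list.npy", 1_000_000, 0.0001, 1e-6, 0.0),
    ExperimentConfig("1e-6_shortfall_list 0.3M.npy", 300_000, 700_000, 1e-6, 1e-6),
    ExperimentConfig("1e-6_shortfall_list 0.7M.npy", 700_000, 300_000, 1e-6, 1e-6),
]

for run in THEOREM_1_RUNS:
    total = run.shares_agent1 + run.shares_agent2
    print(f"{run.output_file}: total shares to sell = {total:,.4f}")
```

In every run the two positions add up to roughly one million shares, so the single-agent benchmark and the split runs liquidate the same overall quantity.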


Theorem 2:

| File name | Total Number of Shares for Agent 1 to Sell | Total Number of Shares for Agent 2 to Sell | Risk aversion for Agent 1 | Risk aversion for Agent 2 |
| :- | :-: | :-: | :-: | :-: |
| *1e-6_shortfall_optimal.npy* | 1,000,000 | 0.0001 | $10^{-4}$ | 0 |
| *1e-9_shortfall_optimal.npy* | 0.0001 | 1,000,000 | 0 | $10^{-9}$ |
| *1e-4_le-9_trajectory.npy* | 500,000 | 500,000 | $10^{-4}$ | $10^{-9}$ |

Competition v. Cooperation: 

| File name | Total Number of Shares for Agent 1 to Sell | Total Number of Shares for Agent 2 to Sell | Risk aversion for Agent 1 | Risk aversion for Agent 2 | Reward | Episodes |
| :- | :-: | :-: | :-: | :-: | :-: | :-: |
| *1e-6_shortfall_list.npy* | 1,000,000 | 0.0001 | $10^{-6}$ | 0 | | |
| *1e-6_le-6_competition_shortfall_list.npy* | 500,000 | 500,000 | $10^{-6}$ | $10^{-6}$ | competitive | 1300 |
| *1e-6_le-6_corporatition_shortfall_list.npy* | 500,000 | 500,000 | $10^{-6}$ | $10^{-6}$ | cooperative | 1300 |
| *1e-6_le-6_competition_trajectory_1500.npy* | 500,000 | 500,000 | $10^{-6}$ | $10^{-6}$ | competitive | 1500 |


Optimal Liquidation Strategy:

1.  *1e-6_optimal.npy*
2.  *1e-6_trajectory_fixed-competitor.npy*
3.  *1e-6_trajectory_fixed-corporation.npy*

| File name | Total Number of Shares for Agent 1 to Sell | Total Number of Shares for Agent 2 to Sell | Risk aversion for Agent 1 | Risk aversion for Agent 2 | Reward | Episodes |
| :- | :-: | :-: | :-: | :-: | :-: | :-: |
| *1e-6_optimal.npy* | 1,000,000 | 0.0001 | $10^{-6}$ | 0 | | |
| *1e-6_trajectory_fixed-competitor.npy* | 500,000 | 500,000 | $10^{-6}$ | $10^{-6}$ | competitive | 1300 |
| *1e-6_trajectory_fixed-corporation.npy* | 500,000 | 500,000 | $10^{-6}$ | $10^{-6}$ | cooperative | 1300 |




After getting all the files in the tables above, we will move on to *visualization.ipynb* to analyze and interpret the results.