### PPO AGENT:

#### Load the dataset and preprocess the dataframe in the required format.

Technical indicators created for the environment include:

- RSI
- MACD
- Stoch_k
- OBV
- Upper_BB
- ATR_1
- ATR_2
- ATR_5
- ATR_10
- ATR_20

In [81]:
## New Features to Help Identify Patterns and Make Predictions (added by me, Colin)

### 1 - Execution Speed
- Measures the time taken for orders to be executed after entering the market. Reflects market efficiency and liquidity.

### 2 - Order Size Distribution
- Reports the distribution of order sizes in the market. It can help identify trends in trading behavior, such as the presence of large institutional orders or changes in market participation.

### 3 - Price Movement
- Tracks the absolute change in price over lookback windows of 2, 10, and 40 rows (labeled 1, 5, and 20 minutes in the code). Helps spot trends and trading opportunities.

### 4 - Order Book Entropy
- Measures the unpredictability or randomness in the order book. Reveals market uncertainty.

### 5 - Order Size Ratio
- Ratio of the average order size to the total traded volume. Provides insights into market liquidity and the impact of individual trades on the market.
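
As a worked illustration of feature 4, the sketch below computes the two-outcome entropy of the top-of-book size split. The function name `book_entropy` is hypothetical, and treating an empty or one-sided book as zero entropy is an assumption chosen to mirror the NaN-to-zero replacement in the notebook code.

```python
import numpy as np

def book_entropy(bid_size, ask_size):
    """Shannon entropy (in bits) of the bid/ask size split at the top of the book."""
    total = bid_size + ask_size
    if total == 0:
        return 0.0  # assumption: an empty book carries no information
    probs = np.array([bid_size, ask_size], dtype=float) / total
    probs = probs[probs > 0]  # drop zero shares so log2 is never evaluated at 0
    return float(-(probs * np.log2(probs)).sum()) + 0.0  # "+ 0.0" normalizes -0.0

# Sizes taken from the first sample row of the dataset (bid 3101, ask 19)
print(book_entropy(3101, 19))   # close to 0: the book is heavily one-sided
print(book_entropy(100, 100))   # 1.0: a perfectly balanced book
```

Values near 1 indicate a balanced book, while values near 0 indicate that almost all resting size sits on one side.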

***

### Order arrival/cancellation rate (not implemented)
- The rate at which new orders enter the market or existing orders are canceled. Indicates market activity and trader sentiment.

### Correlation to Top Movers (not implemented)
- Calculates the correlation between a stock's price movement and the top movers in the market. It can help identify stocks that are influenced by broader market trends or specific sector movements.

### Market Sentiment (not implemented)
- Uses news sentiment analysis, social media sentiment, or other proprietary indicators to infer market participants' overall attitude towards the market. Provides context for interpreting market movements and identifying potential shifts in trend.


In [67]:
import pandas as pd
import numpy as np
import talib as ta

class TechnicalIndicators:
    def __init__(self, data):
        self.data = data

    def add_momentum_indicators(self):
        self.data['RSI'] = ta.RSI(self.data['Close'], timeperiod=14)
        self.data['MACD'], self.data['MACD_signal'], self.data['MACD_hist'] = ta.MACD(self.data['Close'], fastperiod=12, slowperiod=26, signalperiod=9)
        self.data['Stoch_k'], self.data['Stoch_d'] = ta.STOCH(self.data['High'], self.data['Low'], self.data['Close'],
                                                              fastk_period=14, slowk_period=3, slowd_period=3)

    def add_volume_indicators(self):
        self.data['OBV'] = ta.OBV(self.data['Close'], self.data['Volume'])

    def add_volatility_indicators(self):
        self.data['Upper_BB'], self.data['Middle_BB'], self.data['Lower_BB'] = ta.BBANDS(self.data['Close'], timeperiod=20)
        self.data['ATR_1'] = ta.ATR(self.data['High'], self.data['Low'], self.data['Close'], timeperiod=1)
        self.data['ATR_2'] = ta.ATR(self.data['High'], self.data['Low'], self.data['Close'], timeperiod=2)
        self.data['ATR_5'] = ta.ATR(self.data['High'], self.data['Low'], self.data['Close'], timeperiod=5)
        self.data['ATR_10'] = ta.ATR(self.data['High'], self.data['Low'], self.data['Close'], timeperiod=10)
        self.data['ATR_20'] = ta.ATR(self.data['High'], self.data['Low'], self.data['Close'], timeperiod=20)

    def add_trend_indicators(self):
        self.data['ADX'] = ta.ADX(self.data['High'], self.data['Low'], self.data['Close'], timeperiod=14)
        self.data['+DI'] = ta.PLUS_DI(self.data['High'], self.data['Low'], self.data['Close'], timeperiod=14)
        self.data['-DI'] = ta.MINUS_DI(self.data['High'], self.data['Low'], self.data['Close'], timeperiod=14)
        self.data['CCI'] = ta.CCI(self.data['High'], self.data['Low'], self.data['Close'], timeperiod=5)

    def add_other_indicators(self):
        self.data['DLR'] = np.log(self.data['Close'] / self.data['Close'].shift(1))
        self.data['TWAP'] = self.data['Close'].expanding().mean()
        self.data['VWAP'] = (self.data['Volume'] * (self.data['High'] + self.data['Low']) / 2).cumsum() / self.data['Volume'].cumsum()
    
    def add_new_indicators(self):
        self.data['Execution_Speed'] = self.data['ts_in_delta']/self.data['Volume']
        
        self.data['Running_Mean_Order_Size'] = self.data['size'].expanding().mean()
        self.data['Running_Std_Order_Size'] = self.data['size'].expanding().std()
        
        self.data['Price_Movement_1min'] = self.data['Close'].diff(periods=2).fillna(0)
        self.data['Price_Movement_5min'] = self.data['Close'].diff(periods=10).fillna(0)
        self.data['Price_Movement_20min'] = self.data['Close'].diff(periods=40).fillna(0)
        
        # Entropy of the top-of-book bid/ask size split:
        #### p = share of the resting size on the bid (or ask) side
        #### Entropy = -sum(p * log2(p))
        self.data['Total_Size'] = self.data['bid_sz_00'] + self.data['ask_sz_00']
        self.data['Bid_Probability'] = (self.data['bid_sz_00'] / self.data['Total_Size']).fillna(0)
        self.data['Ask_Probability'] = (self.data['ask_sz_00'] / self.data['Total_Size']).fillna(0)
        # p * log2(p) evaluates to NaN when p == 0; those rows are set to 0 below
        self.data['Entropy'] = - (self.data['Bid_Probability'] * np.log2(self.data['Bid_Probability']) +
                                  self.data['Ask_Probability'] * np.log2(self.data['Ask_Probability']))
        self.data['Entropy'] = self.data['Entropy'].fillna(0)
        
        
        self.data['average_order_size'] = (self.data['bid_sz_00'] + self.data['ask_sz_00']) / 2
        self.data['order_size_ratio'] = self.data['average_order_size'] / self.data['Volume']
        

    def add_all_indicators(self):
        self.add_momentum_indicators()
        self.add_volume_indicators()
        self.add_volatility_indicators()
        self.add_trend_indicators()
        self.add_other_indicators()
        self.add_new_indicators()
        return self.data
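
The cumulative indicators in `add_other_indicators` can be checked by hand on a tiny frame. The sketch below reuses the same three formulas on made-up bars (the numbers are illustrative, not taken from the dataset). Note that `TWAP` as defined here is an expanding mean of trade prices rather than a mean weighted by elapsed time.

```python
import numpy as np
import pandas as pd

# Three hypothetical bars, small enough to verify by hand
bars = pd.DataFrame({
    'High':   [10.0, 12.0, 12.0],
    'Low':    [10.0, 10.0, 12.0],
    'Close':  [10.0, 11.0, 12.0],
    'Volume': [1, 1, 2],
})

# Daily log return: log of the ratio of consecutive closes (first row is NaN)
bars['DLR'] = np.log(bars['Close'] / bars['Close'].shift(1))

# Expanding mean of the close, as in the notebook's TWAP column
bars['TWAP'] = bars['Close'].expanding().mean()

# Volume-weighted average of the (High + Low) / 2 midpoint, accumulated from the first row
midpoint = (bars['High'] + bars['Low']) / 2
bars['VWAP'] = (bars['Volume'] * midpoint).cumsum() / bars['Volume'].cumsum()

print(bars[['DLR', 'TWAP', 'VWAP']])
```

On the last row the VWAP is (10*1 + 11*1 + 12*2) / 4 = 11.25, while the TWAP is (10 + 11 + 12) / 3 = 11, which shows how the larger final trade pulls the VWAP upward.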

In [82]:
data = pd.read_csv('./trade-data.tbbo.csv')

# Preprocessing to create necessary columns
data['price']=data['price']/1e9
data['bid_px_00']=data['bid_px_00']/1e9
data['ask_px_00']=data['ask_px_00']/1e9

data['Close'] = data['price']
data['Volume'] = data['size']
data['High'] = data[['bid_px_00', 'ask_px_00']].max(axis=1)
data['Low'] = data[['bid_px_00', 'ask_px_00']].min(axis=1)
data['Open'] = data['Close'].shift(1).fillna(data['Close'])


ti = TechnicalIndicators(data)
df_with_indicators = ti.add_all_indicators()
market_features_df = df_with_indicators[35:]
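
The preprocessing above can be tried without the CSV file. The sketch below applies the same steps to two hand-built rows; the raw integer prices are hypothetical values reconstructed by multiplying displayed prices by 1e9, on the assumption (implied by the division in the cell above) that the feed stores prices as integers in units of 1e-9.

```python
import pandas as pd

# Hypothetical raw rows: prices as fixed-point integers in units of 1e-9
raw = pd.DataFrame({
    'price':     [194_050_000_000, 194_210_000_000],
    'bid_px_00': [194_000_000_000, 194_000_000_000],
    'ask_px_00': [194_300_000_000, 194_210_000_000],
    'size':      [50, 10],
})

PRICE_SCALE = 1e9  # assumed fixed-point scale
for col in ['price', 'bid_px_00', 'ask_px_00']:
    raw[col] = raw[col] / PRICE_SCALE

# Each row is a single trade, so OHLC fields are proxies rather than true bar values
raw['Close'] = raw['price']
raw['Volume'] = raw['size']
raw['High'] = raw[['bid_px_00', 'ask_px_00']].max(axis=1)  # best ask in a normal book
raw['Low'] = raw[['bid_px_00', 'ask_px_00']].min(axis=1)   # best bid in a normal book
raw['Open'] = raw['Close'].shift(1).fillna(raw['Close'])   # previous trade price

print(raw[['Open', 'High', 'Low', 'Close', 'Volume']])
```

Because `High` and `Low` come from the quote rather than from traded prices, range-based indicators such as ATR effectively measure the bid-ask spread in this setup.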


Checking the dataset:

In [70]:
# Show all columns in pandas
pd.set_option('display.max_columns', None)

market_features_df.head(5000)

Unnamed: 0,ts_recv,ts_event,rtype,publisher_id,instrument_id,action,side,depth,price,size,flags,ts_in_delta,sequence,bid_px_00,ask_px_00,bid_sz_00,ask_sz_00,bid_ct_00,ask_ct_00,symbol,Close,Volume,High,Low,Open,RSI,MACD,MACD_signal,MACD_hist,Stoch_k,Stoch_d,OBV,Upper_BB,Middle_BB,Lower_BB,ATR_1,ATR_2,ATR_5,ATR_10,ATR_20,ADX,+DI,-DI,CCI,DLR,TWAP,VWAP,Execution_Speed,Running_Mean_Order_Size,Running_Std_Order_Size,Price_Movement_1min,Price_Movement_5min,Price_Movement_20min,Total_Size,Bid_Probability,Ask_Probability,Entropy,average_order_size,order_size_ratio
35,1688371214386057385,1688371214385893078,1,2,32,T,N,0,194.05,50,130,164307,326232,194.00,194.30,3101,19,4,10,AAPL,194.05,50,194.30,194.00,194.05,54.544543,0.006271,-0.003130,0.009401,5.252525e+01,6.195286e+01,-266.0,194.065621,194.0170,193.968379,0.30,0.175078,0.098615,0.075141,0.072403,97.257397,30.435801,0.196362,166.666667,0.000000,194.020000,194.021894,3286.140000,52.638889,142.027795,0.00,0.05,0.00,3120,0.993910,0.006090,0.053576,1560.0,31.200000
36,1688371214386063777,1688371214385899379,1,2,32,T,N,0,194.05,50,130,164398,326233,194.00,194.30,3101,19,4,10,AAPL,194.05,50,194.30,194.00,194.05,54.544543,0.007108,-0.001082,0.008190,3.838384e+01,5.252525e+01,-266.0,194.068990,194.0200,193.971010,0.30,0.237539,0.138892,0.097627,0.083783,97.361721,22.989295,0.148320,83.333333,0.000000,194.020811,194.025188,3287.960000,52.567568,140.041966,0.00,0.05,0.00,3120,0.993910,0.006090,0.053576,1560.0,31.200000
37,1688371215804852019,1688371215804687301,1,2,32,T,B,0,194.21,10,130,164718,328131,194.00,194.21,3101,29,4,1,AAPL,194.21,10,194.21,194.00,194.05,85.890753,0.020446,0.003223,0.017223,4.040404e+01,4.377104e+01,-256.0,194.125889,194.0305,193.935111,0.21,0.223770,0.153114,0.108864,0.090094,97.458593,19.409454,0.125224,79.268293,0.000824,194.025789,194.025596,16471.800000,51.447368,138.309035,0.16,0.21,0.00,3130,0.990735,0.009265,0.075881,1565.0,156.500000
38,1688371219671476629,1688371219671312224,1,2,32,T,N,0,194.14,10,130,164405,331406,194.00,194.16,3101,400,4,1,AAPL,194.14,10,194.16,194.00,194.21,64.827662,0.025079,0.007594,0.017484,4.949495e+01,4.276094e+01,-266.0,194.142928,194.0375,193.932072,0.21,0.216885,0.164491,0.118978,0.096089,97.548546,16.622008,0.107240,-3.205128,-0.000360,194.028718,194.025873,16440.500000,50.384615,136.638327,0.09,0.14,0.00,3501,0.885747,0.114253,0.512613,1750.5,175.050000
39,1688371223368835585,1688371223368671235,1,2,32,T,B,0,194.13,10,130,164350,334235,194.00,194.13,3101,400,4,1,AAPL,194.13,10,194.13,194.00,194.14,62.470772,0.027625,0.011601,0.016025,5.757576e+01,4.915825e+01,-276.0,194.155247,194.0440,193.932753,0.14,0.178442,0.159593,0.121080,0.098285,97.632074,15.068361,0.097216,-113.095238,-0.000052,194.031250,194.026071,16435.000000,49.375000,135.026244,-0.08,0.08,0.00,3501,0.885747,0.114253,0.512613,1750.5,175.050000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5030,1688391020293267625,1688391020293098412,1,2,32,T,A,0,193.70,100,130,169213,15856950,193.70,193.75,100,376,1,3,AAPL,193.70,100,193.75,193.70,193.70,31.811325,-0.012130,-0.004826,-0.007304,3.505344e-13,5.555556e+00,872207.0,193.793410,193.7435,193.693590,0.05,0.049697,0.043135,0.035023,0.028844,30.895002,3.461891,10.081935,-41.666667,0.000000,193.633410,193.781079,1692.130000,226.037567,12262.488814,0.00,-0.07,-0.06,476,0.210084,0.789916,0.741643,238.0,2.380000
5031,1688391020299663410,1688391020299496955,1,2,32,T,A,0,193.70,198,0,166455,15857550,193.70,193.72,298,190,2,3,AAPL,193.70,198,193.72,193.70,193.70,31.811325,-0.013163,-0.006493,-0.006669,3.505344e-13,5.893557e-13,872207.0,193.793216,193.7405,193.687784,0.02,0.034849,0.038508,0.033521,0.028402,32.179550,3.301844,9.615834,-166.666667,0.000000,193.633423,193.781067,840.681818,226.031995,12261.270067,0.00,-0.06,-0.05,488,0.610656,0.389344,0.964375,244.0,1.232323
5032,1688391020299663410,1688391020299496955,1,2,32,T,A,0,193.70,100,130,166455,15857551,193.70,193.72,100,190,1,3,AAPL,193.70,100,193.72,193.70,193.70,31.811325,-0.013822,-0.007959,-0.005863,3.505344e-13,5.893557e-13,872207.0,193.792227,193.7375,193.682773,0.02,0.027424,0.034807,0.032169,0.027982,33.372345,3.145249,9.159790,-83.333333,0.000000,193.633436,193.781060,1664.550000,226.006954,12260.051805,0.00,-0.05,-0.05,290,0.344828,0.655172,0.929364,145.0,1.450000
5033,1688391020307445954,1688391020307281609,1,2,32,T,A,0,193.71,2,130,164345,15858143,193.71,193.72,2,390,1,4,AAPL,193.71,2,193.72,193.71,193.70,38.866410,-0.013383,-0.009044,-0.004339,4.166667e+00,1.388889e+00,872209.0,193.789955,193.7350,193.680045,0.02,0.023712,0.031845,0.030952,0.027583,34.479940,2.992413,8.714691,20.833333,0.000052,193.633452,193.781060,82172.500000,225.962455,12258.834185,0.01,-0.04,-0.04,392,0.005102,0.994898,0.046192,196.0,98.000000


In [71]:
df_with_indicators.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 59271 entries, 0 to 59270
Data columns (total 59 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   ts_recv                  59271 non-null  int64  
 1   ts_event                 59271 non-null  int64  
 2   rtype                    59271 non-null  int64  
 3   publisher_id             59271 non-null  int64  
 4   instrument_id            59271 non-null  int64  
 5   action                   59271 non-null  object 
 6   side                     59271 non-null  object 
 7   depth                    59271 non-null  int64  
 8   price                    59271 non-null  float64
 9   size                     59271 non-null  int64  
 10  flags                    59271 non-null  int64  
 11  ts_in_delta              59271 non-null  int64  
 12  sequence                 59271 non-null  int64  
 13  bid_px_00                59271 non-null  float64
 14  ask_px_00             

#### Create the Trading Environment class for the PPO Agent

In [72]:
import gym
from gym import spaces
import numpy as np
import pandas as pd

class TradingEnvironment(gym.Env):
    metadata = {'render.modes': ['human']}

    def __init__(self, data, daily_trading_limit):
        super(TradingEnvironment, self).__init__()
        self.data = data
        self.daily_trading_limit = daily_trading_limit
        self.current_step = 0

        # Extract state columns
        self.state_columns = ['Close', 'Volume', 'RSI', 'MACD', 'MACD_signal', 'MACD_hist', 'Stoch_k', 'Stoch_d', 'OBV', 
                              'Upper_BB', 'Middle_BB', 'Lower_BB', 'ATR_1', 'ADX', '+DI', '-DI', 'CCI', 'Execution_Speed', 
                              'Running_Mean_Order_Size', 'Running_Std_Order_Size', 'Price_Movement_1min', 'Price_Movement_5min',
                              'Price_Movement_20min', 'Total_Size', 'Bid_Probability', 'Ask_Probability', 'Entropy', 
                              'average_order_size', 'order_size_ratio']


        # Initialize balance, shares held, and total shares traded
        self.balance = 10_000_000.0  # $10 million
        self.shares_held = 0
        self.total_shares_traded = 0

        # Define action space: [Hold, Buy, Sell]
        self.action_space = spaces.Discrete(3)

        # Define observation space based on state columns
        self.observation_space = spaces.Box(
            low=-np.inf, high=np.inf, shape=(len(self.state_columns),), dtype=np.float32
        )

    def reset(self):
        self.current_step = 0
        self.balance = 10_000_000.0  # $10 million
        self.shares_held = 0
        self.total_shares_traded = 0
        self.cumulative_reward = 0
        self.trades = []
        return self._next_observation()

    def _next_observation(self):
        return self.data[self.state_columns].iloc[self.current_step].values

    def step(self, action):
        # Read the row the agent is acting on before the step counter moves
        row = self.data.iloc[self.current_step]
        expected_price = row['ask_px_00']
        actual_price = row['price']
        transaction_time = row['ts_in_delta']
        trades_before = len(self.trades)
        self._take_action(action)
        traded = len(self.trades) > trades_before  # True only if this step executed a trade
        reward = 0

        if action != 0:
            transaction_cost = self._calculate_transaction_cost(row['Volume'], 0.3, self.data['Volume'].mean())
            reward = self._calculate_reward(expected_price, actual_price, transaction_time, transaction_cost)
            self.cumulative_reward += reward
            if traded:
                self.trades[-1]['reward'] = reward
                self.trades[-1]['transaction_cost'] = transaction_cost
                self.trades[-1]['slippage'] = expected_price - actual_price
                self.trades[-1]['time_penalty'] = 100*transaction_time/1e9
        info = {
            'step': self.current_step,
            'action': action,
            'price': actual_price,
            'shares': self.trades[-1]['shares'] if traded else 0
        }
        # End the episode on the last row; otherwise advance so that the returned
        # observation is the next row rather than the one just acted on
        done = self.current_step >= len(self.data) - 1
        if not done:
            self.current_step += 1
        obs = self._next_observation()

        return obs, reward, done, info

    def _take_action(self, action):
        current_price = self.data.iloc[self.current_step]['Close']
        current_time = pd.to_datetime(self.data.iloc[self.current_step]['ts_event'])
        trade_info = {'step': self.current_step, 'timestamp': current_time, 'action': action, 'price': current_price, 'shares': 0, 'reward': 0, 'transaction_cost': 0, 'slippage': 0, 'time_penalty': 0}

        if action == 1: # and self.total_shares_traded < self.daily_trading_limit:  # Buy
            shares_bought = (self.balance * np.random.uniform(0.001, 0.005)) // current_price
            self.balance -= shares_bought * current_price
            self.shares_held += shares_bought
            self.total_shares_traded += shares_bought
            trade_info['shares'] = shares_bought
            if(shares_bought>0):
                self.trades.append(trade_info)
        elif action == 2: # and self.total_shares_traded < self.daily_trading_limit:  # Sell
            shares_sold = min((self.balance * np.random.uniform(0.001, 0.005)) // current_price, self.shares_held)
            self.balance += shares_sold * current_price
            self.shares_held -= shares_sold
            self.total_shares_traded -= shares_sold
            trade_info['shares'] = shares_sold
            if(shares_sold>0):
                self.trades.append(trade_info)

    def _calculate_reward(self, expected_price, actual_price, transaction_time, transaction_cost):
        slippage = expected_price - actual_price
        time_penalty = 100*transaction_time/1e9
        reward = - (slippage + time_penalty + transaction_cost)
        return reward
    
    def _calculate_transaction_cost(self, volume, volatility, daily_volume):
        return volatility * np.sqrt(volume / daily_volume)
    
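    def run(self):
        # step() requires an action, so one is sampled at random here;
        # this makes run() a random-policy baseline
        self.reset()
        for _ in range(len(self.data)):
            _, _, done, _ = self.step(self.action_space.sample())
            if done:
                break
        return self.cumulative_reward, self.trades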

    def render(self, mode='human', close=False):
        print(f'Step: {self.current_step}')
        print(f'Balance: {self.balance}')
        print(f'Shares held: {self.shares_held}')
        print(f'Total shares traded: {self.total_shares_traded}')
        print(f'Total portfolio value: {self.balance + self.shares_held * self.data.iloc[self.current_step]["Close"]}')
        print(f'Cumulative reward: {self.cumulative_reward}')
        self.print_trades()

    def print_trades(self):
        # Collect all trades in a pandas DataFrame
        trades_df = pd.DataFrame(self.trades)
        # Save a csv
        trades_df.to_csv('trades_ppo.csv', index=False)
        for trade in self.trades:
            print(f"Step: {trade['step']}, Timestamp: {trade['timestamp']}, Action: {trade['action']}, Price: {trade['price']}, Shares: {trade['shares']}, Reward: {trade['reward']}, Transaction Cost: {trade['transaction_cost']}, Slippage: {trade['slippage']}, Time Penalty: {trade['time_penalty']}")
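
To make the reward concrete, the sketch below reproduces `_calculate_transaction_cost` and `_calculate_reward` as free functions and evaluates them for one trade. The quote, price, latency, and size come from the first sample row shown earlier; the mean volume of 200 is a hypothetical placeholder, since the real value depends on the full dataset.

```python
import numpy as np

def transaction_cost(volume, volatility, daily_volume):
    # Square-root impact model: cost grows with the square root of relative trade size
    return volatility * np.sqrt(volume / daily_volume)

def reward(expected_price, actual_price, ts_in_delta_ns, cost):
    slippage = expected_price - actual_price    # ask quote minus traded price
    time_penalty = 100 * ts_in_delta_ns / 1e9   # latency in seconds, scaled by 100
    return -(slippage + time_penalty + cost)

# 200.0 is an assumed mean volume, used only for illustration
cost = transaction_cost(volume=50, volatility=0.3, daily_volume=200.0)
r = reward(expected_price=194.30, actual_price=194.05, ts_in_delta_ns=164307, cost=cost)

print(f"transaction cost: {cost:.4f}")
print(f"reward:           {r:.4f}")
```

Under these inputs the slippage term (0.25) dominates, followed by the impact cost (0.15) and the latency penalty (about 0.016). The reward is never positive for a trade priced at or below the ask, so the agent is rewarded only for losing less, not for profit.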

#### Train the PPO Agent with the environment and for different tickers.

In [73]:
# Define the daily trading limit (total number of shares to trade per day)
daily_trading_limit = 1000

ticker = 'AAPL'  # Specify the ticker you want to trade
ticker_data = market_features_df[market_features_df['symbol'] == ticker]

env = TradingEnvironment(ticker_data, daily_trading_limit)

In [74]:
import pandas as pd
from stable_baselines3 import PPO

# Define the daily trading limit (total number of shares to trade per day)
daily_trading_limit = 1000

ticker = 'AAPL'  # Specify the ticker you want to trade
ticker_data = market_features_df[market_features_df['symbol'] == ticker]

# Create the trading environment
env = TradingEnvironment(ticker_data, daily_trading_limit)

# Define the best hyperparameters
best_hyperparameters = {'learning_rate': 0.0009931989008886031,'n_steps': 512,'batch_size': 128, 'gamma': 0.9916829193042708,'clip_range': 0.21127653449387027,'n_epochs': 6} # type: ignore

# Create the RL model with the best hyperparameters
model = PPO('MlpPolicy', env, verbose=1, **best_hyperparameters)

# Train the model
model.learn(total_timesteps=10000)

# Save the model
model.save("trading_agent")

# Evaluate the model
obs = env.reset()
for _ in range(len(ticker_data)):
    action, _states = model.predict(obs)
    obs, rewards, done, info = env.step(action)
    if done:
        break

# Render the final state
env.render()

Using cpu device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.




----------------------------
| time/              |     |
|    fps             | 137 |
|    iterations      | 1   |
|    time_elapsed    | 3   |
|    total_timesteps | 512 |
----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 140         |
|    iterations           | 2           |
|    time_elapsed         | 7           |
|    total_timesteps      | 1024        |
| train/                  |             |
|    approx_kl            | 0.010698488 |
|    clip_fraction        | 0.0869      |
|    clip_range           | 0.211       |
|    entropy_loss         | -1.09       |
|    explained_variance   | -1.29       |
|    learning_rate        | 0.000993    |
|    loss                 | 0.104       |
|    n_updates            | 6           |
|    policy_gradient_loss | -0.0113     |
|    value_loss           | 1           |
-----------------------------------------
----------------------------------------


------------------------------------------
| time/                   |              |
|    fps                  | 145          |
|    iterations           | 13           |
|    time_elapsed         | 45           |
|    total_timesteps      | 6656         |
| train/                  |              |
|    approx_kl            | 0.0069027543 |
|    clip_fraction        | 0.187        |
|    clip_range           | 0.211        |
|    entropy_loss         | -0.966       |
|    explained_variance   | -0.00139     |
|    learning_rate        | 0.000993     |
|    loss                 | 0.263        |
|    n_updates            | 72           |
|    policy_gradient_loss | -0.00415     |
|    value_loss           | 1.72         |
------------------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 145         |
|    iterations           | 14          |
|    time_elapsed         | 49          |
|    total_times
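
The counters in the training log follow from the hyperparameters. The sketch below works through the arithmetic, assuming a single environment (as the `DummyVecEnv` message suggests) and that `n_updates` advances by `n_epochs` after each rollout; the helper `n_updates_logged` is a hypothetical name used only for this illustration.

```python
import math

n_steps, batch_size, n_epochs = 512, 128, 6
total_timesteps = 10_000

# Each rollout of n_steps transitions is split into minibatches and reused for n_epochs passes
minibatches_per_epoch = n_steps // batch_size
gradient_steps_per_rollout = minibatches_per_epoch * n_epochs

# Rollouts are collected in whole chunks of n_steps, so training overshoots the requested total
iterations = math.ceil(total_timesteps / n_steps)
timesteps_collected = iterations * n_steps

def n_updates_logged(iteration):
    # The log at iteration k is printed before that iteration's update has run
    return n_epochs * (iteration - 1)

print(minibatches_per_epoch, gradient_steps_per_rollout)  # 4 24
print(iterations, timesteps_collected)                    # 20 10240
print(n_updates_logged(2), n_updates_logged(13))          # 6 72
```

The last line agrees with the `n_updates` values of 6 and 72 reported at iterations 2 and 13 in the log above.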

### TRADING BLOTTER:

#### Preprocess the data for the trading blotter:

In [75]:
import pandas as pd
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

INITIAL_CASH = 10_000_000  # $10 million

def preprocess_data(df):
    df['liquidity'] = df['bid_sz_00'] * df['bid_px_00'] + df['ask_sz_00'] * df['ask_px_00']
    return df

def calculate_rsi(data, window=14):
    delta = data.diff()
    gain = (delta.where(delta > 0, 0)).rolling(window=window).mean()
    loss = (-delta.where(delta < 0, 0)).rolling(window=window).mean()
    rs = gain / loss
    rsi = 100 - (100 / (1 + rs))
    return rsi

def calculate_vol_and_liquidity(price_df, volume_df, window_size):
    # Calculate rolling statistics
    rolling_mean_vol = price_df.pct_change().rolling(window=window_size).mean()
    rolling_std_vol = price_df.pct_change().rolling(window=window_size).std()
    rolling_mean_liq = volume_df.rolling(window=window_size).mean()
    rolling_std_liq = volume_df.rolling(window=window_size).std()
    
    return rolling_mean_vol, rolling_std_vol, rolling_mean_liq, rolling_std_liq

def get_percentile(current_value, mean, std):
    if std > 0:
        z_score = (current_value - mean) / std
        percentile = norm.cdf(z_score)
    else:
        percentile = 0.5  # No variation
    return percentile

def get_trade_price(base_price, current_vol, current_liq, mean_vol, std_vol, mean_liq, std_liq, trade_direction):
    vol_percentile = get_percentile(current_vol, mean_vol, std_vol)
    liq_percentile = get_percentile(current_liq, mean_liq, std_liq)

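    # Define price adjustment scenarios based on market conditions.
    # The sampled values are applied below as fractions of base_price (e.g. -0.25 shifts the price by 25%).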
    if vol_percentile >= 0.9 and liq_percentile < 0.1:
        price_adjustment_percent = np.random.uniform(-0.25, -0.15)
    elif vol_percentile <= 0.1 and liq_percentile < 0.1:
        price_adjustment_percent = np.random.uniform(-0.10, -0.05)
    elif vol_percentile >= 0.9 and liq_percentile >= 0.9:
        price_adjustment_percent = np.random.uniform(-0.05, +0.10)
    else:
        price_adjustment_percent = np.random.uniform(-0.05, +0.05)  # Default for normal conditions

    # Adjust price based on trade direction
    if trade_direction == 'BUY':
        adjusted_price = base_price * (1 - price_adjustment_percent)
    else:  # SELL
        adjusted_price = base_price * (1 + price_adjustment_percent)
    
    return adjusted_price
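
The regime selection in `get_trade_price` rests on converting a value into a percentile under a normal approximation. The sketch below shows that mapping with only the standard library, using `math.erf` in place of `scipy.stats.norm.cdf`; the names `normal_percentile` and `regime` and the regime labels are hypothetical and exist only for this illustration.

```python
import math

def normal_percentile(value, mean, std):
    """Percentile of `value` under a normal distribution with the given mean and std."""
    if not std > 0:  # also catches NaN, e.g. before the rolling window has filled
        return 0.5
    z = (value - mean) / std
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def regime(vol_pct, liq_pct):
    # Same branch order as get_trade_price; the labels are illustrative
    if vol_pct >= 0.9 and liq_pct < 0.1:
        return 'high volatility, low liquidity'
    if vol_pct <= 0.1 and liq_pct < 0.1:
        return 'low volatility, low liquidity'
    if vol_pct >= 0.9 and liq_pct >= 0.9:
        return 'high volatility, high liquidity'
    return 'normal'

vol_pct = normal_percentile(0.004, mean=0.0, std=0.002)    # two standard deviations above the mean
liq_pct = normal_percentile(1.0e5, mean=5.0e5, std=2.0e5)  # two standard deviations below the mean
print(round(vol_pct, 4), round(liq_pct, 4), regime(vol_pct, liq_pct))
```

Only values roughly 1.28 standard deviations from the rolling mean cross the 0.1 and 0.9 thresholds, so most rows fall into the default branch.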


#### Create trading environment for the blotter

In [76]:
class TradingEnvironmentwithBlotter:
    def __init__(self, data, daily_trading_limit, window_size):
        # Work on a copy so the caller's DataFrame (often a filtered slice) is not modified
        self.data = preprocess_data(data.copy())
        self.daily_trading_limit = daily_trading_limit
        self.window_size = window_size
        self.state_columns = ['price', 'liquidity', 'RSI', 'MACD', 'MACD_signal', 'MACD_hist', 'Stoch_k', 'Stoch_d',
                              'OBV', 'Upper_BB', 'Middle_BB', 'Lower_BB', 'ATR_1', 'ADX', '+DI', '-DI', 'CCI',
                              'Execution_Speed', 'Running_Mean_Order_Size', 'Running_Std_Order_Size', 'Price_Movement_1min',
                              'Price_Movement_5min', 'Price_Movement_20min', 'Total_Size', 'Bid_Probability', 'Ask_Probability',
                              'Entropy', 'average_order_size', 'order_size_ratio']

        self.reset()

    def reset(self):
        self.current_step = 0
        self.balance = INITIAL_CASH
        self.shares_held = 0
        self.total_shares_traded = 0
        self.cumulative_reward = 0
        self.trades = []
        self.portfolio = {'cash': self.balance, 'holdings': {ticker: 0 for ticker in self.data['symbol'].unique()}}
        self.data['RSI'] = calculate_rsi(self.data['price'])
        self.data['pct_change'] = self.data['price'].pct_change()
        self.data['rolling_mean_vol'], self.data['rolling_std_vol'], self.data['rolling_mean_liq'], self.data['rolling_std_liq'] = calculate_vol_and_liquidity(self.data['price'], self.data['liquidity'], self.window_size)

    def step(self):
        row = self.data.iloc[self.current_step]
        current_price = row['price']
        current_time = pd.to_datetime(row['ts_event'])
        current_rsi = row['RSI']
        current_vol = row['pct_change']
        current_liq = row['liquidity']
        mean_vol = row['rolling_mean_vol']
        std_vol = row['rolling_std_vol']
        mean_liq = row['rolling_mean_liq']
        std_liq = row['rolling_std_liq']

        if current_rsi < 30:  # Entry signal based on RSI
            trade_direction = 'BUY'
            trade_price = get_trade_price(current_price, current_vol, current_liq, mean_vol, std_vol, mean_liq, std_liq, trade_direction)
            trade_size = (self.portfolio['cash'] * np.random.uniform(0.001, 0.005)) / trade_price
            if self.portfolio['cash'] >= trade_size * trade_price:
                self.portfolio['cash'] -= trade_size * trade_price
                self.portfolio['holdings'][row['symbol']] += trade_size
                trade_status = 'filled'
            else:
                trade_status = 'cancelled'
        elif current_rsi > 70:  # Exit signal based on RSI
            trade_direction = 'SELL'
            if self.portfolio['holdings'][row['symbol']] > 0:
                trade_size = min(self.portfolio['holdings'][row['symbol']], self.portfolio['cash']*np.random.uniform(0.001, 0.005) / current_price)
                trade_price = get_trade_price(current_price, current_vol, current_liq, mean_vol, std_vol, mean_liq, std_liq, trade_direction)
                self.portfolio['cash'] += trade_size * trade_price
                self.portfolio['holdings'][row['symbol']] -= trade_size
                trade_status = 'filled'
            else:
                trade_size = 0
                trade_status = 'cancelled'
        else:
            trade_direction = 'HOLD'
            trade_size = 0
            trade_price = current_price
            trade_status = 'skipped'

        if trade_size > 0:
            expected_price = row['ask_px_00']
            actual_price = row['price']
            transaction_time = row['ts_in_delta']
            transaction_cost = self._calculate_transaction_cost(row['Volume'], 0.3, self.data['Volume'].mean())
            slippage = expected_price - actual_price
            time_penalty = 1000 * transaction_time / 1e9
            reward = - (slippage + time_penalty + transaction_cost)
        
            self.cumulative_reward += reward
            self.trades.append({
                'step': self.current_step,
                'timestamp': current_time,
                'action': trade_direction,
                'price': trade_price,
                'shares': trade_size,
                'symbol': row['symbol'],
                'reward': reward,
                'transaction_cost': transaction_cost,
                'slippage': slippage,
                'time_penalty': time_penalty
            })

            

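        # Advance to the next row; stay on the last row once the data is exhausted
        # so that run() visits every row exactly once and render() stays in bounds
        if self.current_step < len(self.data) - 1:
            self.current_step += 1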

    def _calculate_transaction_cost(self, volume, volatility, daily_volume):
        return volatility * np.sqrt(volume / daily_volume)

    def run(self):
        self.reset()
        for _ in range(len(self.data)):
            self.step()
        return self.cumulative_reward, self.trades

    def render(self):
        print(f'Cumulative reward: {self.cumulative_reward}')
        row = self.data.iloc[self.current_step]
        print(f'Total portfolio value: {self.portfolio["cash"] + self.portfolio["holdings"][row["symbol"]]*row["Close"]}')
        # get trades in a pandas dataframe
        trades_df = pd.DataFrame(self.trades)
        # Save a csv
        trades_df.to_csv('trades_blotter.csv', index=False)
        for trade in self.trades:
            print(f"Step: {trade['step']}, Timestamp: {trade['timestamp']}, Action: {trade['action']}, Price: {trade['price']}, Shares: {trade['shares']}, Symbol: {trade['symbol']}, Reward: {trade['reward']}, Transaction Cost: {trade['transaction_cost']}, Slippage: {trade['slippage']}, Time Penalty: {trade['time_penalty']}")
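
The blotter's entry and exit rule can be isolated in a few lines. The sketch below repeats the rolling-mean RSI from `calculate_rsi` and maps each value to a signal with the same 30/70 thresholds; `simple_rsi` and `rsi_signal` are hypothetical names, and the prices are made up. Note that `reset()` overwrites the `RSI` column with this rolling-mean variant, which generally differs from the Wilder-smoothed `ta.RSI` values computed earlier.

```python
import numpy as np
import pandas as pd

def simple_rsi(prices, window=14):
    # RSI from simple rolling means of gains and losses (same form as calculate_rsi)
    delta = prices.diff()
    gain = delta.where(delta > 0, 0).rolling(window=window).mean()
    loss = (-delta.where(delta < 0, 0)).rolling(window=window).mean()
    return 100 - (100 / (1 + gain / loss))

def rsi_signal(rsi, oversold=30, overbought=70):
    # NaN fails both comparisons and therefore maps to HOLD
    if rsi < oversold:
        return 'BUY'
    if rsi > overbought:
        return 'SELL'
    return 'HOLD'

prices = pd.Series([10.0, 11.0, 10.0, 12.0])  # illustrative prices
rsi = simple_rsi(prices, window=2)
print(rsi.round(2).tolist())
print([rsi_signal(v) for v in (25.0, 50.0, 75.0, np.nan)])
```

With a window of 2, the third value averages one gain of 1 against one loss of 1, giving an RSI of 50, and the fourth averages a gain of 2 against a loss of 1, giving about 66.67.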

#### Run the trading blotter

In [77]:
# Filter data for the specified ticker
ticker = 'AAPL'  # Specify the ticker you want to trade
ticker_data = market_features_df[market_features_df['symbol'] == ticker]

window_size = 60
daily_trading_limit = 1000
# Create the trading environment
env = TradingEnvironmentwithBlotter(ticker_data, daily_trading_limit=daily_trading_limit, window_size=window_size)  # Daily trading limit of 1000 shares

# Run the environment
cumulative_reward, trades = env.run()

# Render the results
env.render()

Cumulative reward: -12231.516067279417
Total portfolio value: 9987373.107144544





In [78]:
df=market_features_df.copy()

In [79]:
df['timestamp']=pd.to_datetime(df['ts_recv'])
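
The conversion above treats the integer timestamps as nanoseconds since the Unix epoch and yields timezone-naive values. The sketch below makes the unit explicit and, assuming the feed reports UTC, converts to US Eastern time so the rows can be related to exchange hours. The two timestamps are `ts_recv` values from the sample rows shown earlier.

```python
import pandas as pd

ts_recv = pd.Series([1688371214386057385, 1688371215804852019])

# Explicit unit: the integers are nanoseconds since the Unix epoch
naive = pd.to_datetime(ts_recv, unit='ns')

# Assumption: the timestamps are UTC, so they can be localized and converted
utc = pd.to_datetime(ts_recv, unit='ns', utc=True)
eastern = utc.dt.tz_convert('America/New_York')

print(naive.iloc[0])
print(eastern.iloc[0])
```

Under that assumption, the first sample row at 08:00:14 UTC corresponds to 04:00:14 Eastern, before the regular session opens.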

In [80]:
df.head()

Unnamed: 0,ts_recv,ts_event,rtype,publisher_id,instrument_id,action,side,depth,price,size,flags,ts_in_delta,sequence,bid_px_00,ask_px_00,bid_sz_00,ask_sz_00,bid_ct_00,ask_ct_00,symbol,Close,Volume,High,Low,Open,RSI,MACD,MACD_signal,MACD_hist,Stoch_k,Stoch_d,OBV,Upper_BB,Middle_BB,Lower_BB,ATR_1,ATR_2,ATR_5,ATR_10,ATR_20,ADX,+DI,-DI,CCI,DLR,TWAP,VWAP,Execution_Speed,Running_Mean_Order_Size,Running_Std_Order_Size,Price_Movement_1min,Price_Movement_5min,Price_Movement_20min,Total_Size,Bid_Probability,Ask_Probability,Entropy,average_order_size,order_size_ratio,timestamp
35,1688371214386057385,1688371214385893078,1,2,32,T,N,0,194.05,50,130,164307,326232,194.0,194.3,3101,19,4,10,AAPL,194.05,50,194.3,194.0,194.05,54.544543,0.006271,-0.00313,0.009401,52.525253,61.952862,-266.0,194.065621,194.017,193.968379,0.3,0.175078,0.098615,0.075141,0.072403,97.257397,30.435801,0.196362,166.666667,0.0,194.02,194.021894,3286.14,52.638889,142.027795,0.0,0.05,0.0,3120,0.99391,0.00609,0.053576,1560.0,31.2,2023-07-03 08:00:14.386057385
36,1688371214386063777,1688371214385899379,1,2,32,T,N,0,194.05,50,130,164398,326233,194.0,194.3,3101,19,4,10,AAPL,194.05,50,194.3,194.0,194.05,54.544543,0.007108,-0.001082,0.00819,38.383838,52.525253,-266.0,194.06899,194.02,193.97101,0.3,0.237539,0.138892,0.097627,0.083783,97.361721,22.989295,0.14832,83.333333,0.0,194.020811,194.025188,3287.96,52.567568,140.041966,0.0,0.05,0.0,3120,0.99391,0.00609,0.053576,1560.0,31.2,2023-07-03 08:00:14.386063777
37,1688371215804852019,1688371215804687301,1,2,32,T,B,0,194.21,10,130,164718,328131,194.0,194.21,3101,29,4,1,AAPL,194.21,10,194.21,194.0,194.05,85.890753,0.020446,0.003223,0.017223,40.40404,43.771044,-256.0,194.125889,194.0305,193.935111,0.21,0.22377,0.153114,0.108864,0.090094,97.458593,19.409454,0.125224,79.268293,0.000824,194.025789,194.025596,16471.8,51.447368,138.309035,0.16,0.21,0.0,3130,0.990735,0.009265,0.075881,1565.0,156.5,2023-07-03 08:00:15.804852019
38,1688371219671476629,1688371219671312224,1,2,32,T,N,0,194.14,10,130,164405,331406,194.0,194.16,3101,400,4,1,AAPL,194.14,10,194.16,194.0,194.21,64.827662,0.025079,0.007594,0.017484,49.494949,42.760943,-266.0,194.142928,194.0375,193.932072,0.21,0.216885,0.164491,0.118978,0.096089,97.548546,16.622008,0.10724,-3.205128,-0.00036,194.028718,194.025873,16440.5,50.384615,136.638327,0.09,0.14,0.0,3501,0.885747,0.114253,0.512613,1750.5,175.05,2023-07-03 08:00:19.671476629
39,1688371223368835585,1688371223368671235,1,2,32,T,B,0,194.13,10,130,164350,334235,194.0,194.13,3101,400,4,1,AAPL,194.13,10,194.13,194.0,194.14,62.470772,0.027625,0.011601,0.016025,57.575758,49.158249,-276.0,194.155247,194.044,193.932753,0.14,0.178442,0.159593,0.12108,0.098285,97.632074,15.068361,0.097216,-113.095238,-5.2e-05,194.03125,194.026071,16435.0,49.375,135.026244,-0.08,0.08,0.0,3501,0.885747,0.114253,0.512613,1750.5,175.05,2023-07-03 08:00:23.368835585
