# Deep Q-Learning Applied to Algorithmic Trading

<a href="https://www.kaggle.com/addarm/unsupervised-learning-as-signals-for-pairs-trading" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

INTRO


The deep Q-network (DQN) in this notebook is inspired by the paper:
```BibTeX
@article{theate2021application,
  title={An application of deep reinforcement learning to algorithmic trading},
  author={Th{\'e}ate, Thibaut and Ernst, Damien},
  journal={Expert Systems with Applications},
  volume={173},
  pages={114632},
  year={2021},
  publisher={Elsevier}
}
```

```python
import os
import warnings
warnings.filterwarnings("ignore")

IS_KAGGLE = os.getenv('IS_KAGGLE', 'True') == 'True'
if IS_KAGGLE:
    # Kaggle configs
    print('Running in Kaggle...')
    %pip install scikit-learn
    %pip install tensorflow
    %pip install tqdm
    %pip install matplotlib
    %pip install python-dotenv
    %pip install yfinance
    %pip install pyarrow
    %pip install gym  # TradingEnv below subclasses gym.Env
    for dirname, _, filenames in os.walk('/kaggle/input'):
        for filename in filenames:
            print(os.path.join(dirname, filename))

    DATA_DIR = "/kaggle/input/DATASET"
else:
    DATA_DIR = "./data/"
    print('Running Local...')

import numpy as np
import yfinance as yf
import pandas as pd
from datetime import datetime
from pandas.tseries.offsets import BDay
import matplotlib.pyplot as plt
from tqdm import tqdm
from scipy.stats import skew, kurtosis
import pyarrow as pa
import pyarrow.parquet as pq

os.getcwd()
```


```python
START_DATE = "2012-01-01"
SPLIT_DATE = "2018-01-01"  # Turning point from train to test
END_DATE = "2019-12-31"  # pd.Timestamp(datetime.now() - BDay(1)).strftime('%Y-%m-%d')
DATA_DIR = "./data"
INDEX = "Date"
TICKER_SYMBOLS = [
    'DIA',  # Dow Jones
    'SPY',  # S&P 500
    'QQQ',  # NASDAQ 100
    'EZU',  # FTSE 100
    'EWJ',  # Nikkei 225
    'GOOGL',  # Google
    'AAPL',  # Apple
    'META',  # Facebook
    'AMZN',  # Amazon
    'MSFT',  # Microsoft
    'NOK',  # Nokia
    'PHIA.AS',  # Philips
    'SIE.DE',  # Siemens
    'BIDU',  # Baidu
    'BABA',  # Alibaba
    '0700.HK',  # Tencent
    '6758.T',  # Sony
    'JPM',  # JPMorgan Chase
    'HSBC',  # HSBC
    '0939.HK',  # CCB
    'XOM',  # ExxonMobil
    'TSLA',  # Tesla
    'VOW3.DE',  # Volkswagen
    '7203.T',  # Toyota
    'KO',  # Coca Cola
    'ABI.BR',  # AB InBev
    '2503.T',  # Kirin
]

TARGET = 'TSLA'
INTERVAL = "1d"

CAPITAL = 100000
STATE_LEN = 30
FEES = 0.1 / 100
OBS_SPACE = STATE_LEN * 4  # 4 features per day: Close, Low, High, Volume
ACT_SPACE = 2  # Number of discrete actions the agent chooses from
EPISODES = 50
```
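`OBS_SPACE` has to match the length of the flattened observation the environment hands the agent: a window of `STATE_LEN` days with four features each. A minimal sketch with a synthetic feature matrix checks that arithmetic; the helper `flatten_window` and the fake data are hypothetical and not part of the notebook.

```python
import numpy as np

STATE_LEN = 30
N_FEATURES = 4  # Close, Low, High, Volume
OBS_SPACE = STATE_LEN * N_FEATURES

def flatten_window(features: np.ndarray, step: int, state_len: int = STATE_LEN) -> np.ndarray:
    """Hypothetical helper: take the state_len rows ending just before `step` and flatten them row by row."""
    window = features[step - state_len:step]
    return window.flatten()

# Synthetic feature matrix standing in for 100 trading days x 4 features
features = np.arange(100 * N_FEATURES, dtype=np.float32).reshape(100, N_FEATURES)

# The first full window is available once STATE_LEN days have elapsed
obs = flatten_window(features, step=STATE_LEN)
print(obs.shape)
```

The flattening is row-major, so the four features of the oldest day come first and those of the most recent day come last.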

# Financial Data

```python
def get_tickerdata(tickers_symbols, start=START_DATE, end=END_DATE, interval=INTERVAL, datadir=DATA_DIR):
    tickers = {}
    earliest_end = datetime.strptime(end, '%Y-%m-%d')
    latest_start = datetime.strptime(start, '%Y-%m-%d')
    os.makedirs(datadir, exist_ok=True)
    for symbol in tickers_symbols:
        # The cache is written as Parquet, so use a matching extension
        cached_file_path = f"{datadir}/{symbol}-{start}-{end}-{interval}.parquet"

        try:
            if os.path.exists(cached_file_path):
                df = pd.read_parquet(cached_file_path)
                df.index = pd.to_datetime(df.index)
                assert len(df) > 0
            else:
                # Use the function arguments, not the module-level globals
                df = yf.download(
                    symbol,
                    start=start,
                    end=end,
                    progress=False,
                    interval=interval,
                )
                assert len(df) > 0
                # Keep the Date index: with index=False it is dropped, and the cached
                # frame comes back with integer positions parsed as 1970 timestamps
                df.to_parquet(cached_file_path, index=True, compression="snappy")
            min_date = df.index.min()
            max_date = df.index.max()
            nan_count = df["Close"].isnull().sum()
            skewness = round(skew(df["Close"].dropna()), 2)
            kurt = round(kurtosis(df["Close"].dropna()), 2)
            # Upper outliers only: closes more than 3 standard deviations above the mean
            outliers_count = (df["Close"] > df["Close"].mean() + (3 * df["Close"].std())).sum()
            print(
                f"{symbol} => min_date: {min_date}, max_date: {max_date}, kurt:{kurt}, skewness:{skewness}, outliers_count:{outliers_count},  nan_count: {nan_count}"
            )
            tickers[symbol] = df

            # Track the date range that every ticker covers
            if min_date > latest_start:
                latest_start = min_date
            if max_date < earliest_end:
                earliest_end = max_date
        except Exception as e:
            print(f"Error with {symbol}: {e}")

    return tickers, latest_start, earliest_end

tickers, latest_start, earliest_end = get_tickerdata(TICKER_SYMBOLS)
tickers[TARGET]
```

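Besides the price frames, `get_tickerdata` returns the date range covered by every ticker: the latest of the start dates and the earliest of the end dates. The same intersection logic in isolation, as a sketch with made-up dates (the helper `common_window` is hypothetical):

```python
from datetime import datetime

def common_window(ranges):
    """Hypothetical helper: given (start, end) pairs, return the span covered by all of them."""
    latest_start = max(start for start, _ in ranges)
    earliest_end = min(end for _, end in ranges)
    if latest_start > earliest_end:
        raise ValueError("The date ranges do not overlap")
    return latest_start, earliest_end

# Illustrative ranges only, not real listing dates
ranges = [
    (datetime(2012, 1, 3), datetime(2019, 12, 30)),
    (datetime(2014, 9, 19), datetime(2019, 12, 30)),
    (datetime(2012, 1, 3), datetime(2019, 12, 27)),
]
start, end = common_window(ranges)
print(start.date(), end.date())
```

A single ticker that starts trading after `START_DATE` is enough to push the common start forward for the whole universe.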


# Deep Q-Network Architecture
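The agent approximates the action-value function $Q(s, a)$ with a neural network. In `DQNAgent.replay`, each sampled transition $(s, a, r, s', d)$, where $d = 1$ marks a terminal step, is regressed toward a one-step Bellman target computed with a separate target network $Q^{-}$, whose weights are copied from the online network every `target_update_iter` replay calls:

$$
y = r + \gamma \,(1 - d)\, \max_{a'} Q^{-}(s', a')
$$

The online network is then fitted with a mean squared error loss. Only the output for the action $a$ actually taken is set to $y$; the other outputs are targeted at the network's own current predictions, so the update is driven mainly by the taken action.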

```python
import os
import random
from collections import deque

import gym
import numpy as np
import pandas as pd
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Dense, BatchNormalization, Dropout
from tensorflow.keras.optimizers import Adam


# Replay memory: a bounded FIFO buffer of (state, action, reward, next_state, done) transitions
class ReplayMemory:
    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(self.memory, batch_size)

    def __len__(self):
        return len(self.memory)


# Q-network: two hidden layers of 512 ReLU units with dropout, one linear output per action
class DQNModel(Sequential):
    def __init__(self, observation_space, action_space, learning_rate=0.0001):
        super(DQNModel, self).__init__()
        self.add(Dense(512, activation='relu', input_shape=(observation_space,)))
        #self.add(BatchNormalization())
        self.add(Dropout(0.2))
        self.add(Dense(512, activation='relu'))
        #self.add(BatchNormalization())
        self.add(Dropout(0.2))
        self.add(Dense(action_space, activation='linear'))
        self.compile(loss='mean_squared_error', optimizer=Adam(learning_rate=learning_rate))


class DQNAgent:
    def __init__(self, observation_space, action_space, replay_memory_size=100000, batch_size=64, gamma=0.99, epsilon=1.0, epsilon_min=0.01, epsilon_decay=0.995, learning_rate=0.0001, target_update_iter=1000):
        self.action_space = action_space
        self.memory = ReplayMemory(replay_memory_size)  # Replay memory to store experiences
        self.gamma = gamma  # Discount factor for future rewards
        self.epsilon = epsilon  # Exploration rate
        self.epsilon_min = epsilon_min
        self.epsilon_decay = epsilon_decay
        self.batch_size = batch_size
        self.target_update_iter = target_update_iter
        self.model = DQNModel(observation_space, action_space, learning_rate)  # Online network, trained every replay call
        self.target_model = DQNModel(observation_space, action_space, learning_rate)  # Target network, synced periodically
        self.target_model.set_weights(self.model.get_weights())
        self.iteration = 0

    def remember(self, state, action, reward, next_state, done):
        self.memory.push(state, action, reward, next_state, done)

    def act(self, state):
        # Epsilon-greedy: random action with probability epsilon, otherwise the highest-Q action
        if np.random.rand() <= self.epsilon:
            return random.randrange(self.action_space)
        q_values = self.model.predict(np.array([state]), verbose=0)
        return int(np.argmax(q_values[0]))

    def replay(self):
        if len(self.memory) < self.batch_size:
            return
        batch = self.memory.sample(self.batch_size)
        states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
        dones = dones.astype(np.float32)  # bool -> float so (1 - dones) masks terminal states

        # Bellman target from the target network; terminal transitions keep only the reward
        next_q = self.target_model.predict(next_states, verbose=0)
        q_update = rewards + self.gamma * np.amax(next_q, axis=1) * (1 - dones)

        # Only the Q-value of the action taken is moved toward its target
        q_values = self.model.predict(states, verbose=0)
        for i, action in enumerate(actions):
            q_values[i][action] = q_update[i]

        self.model.fit(states, q_values, batch_size=self.batch_size, verbose=0)

        # Exploration decays once per replay call, i.e. once per environment step
        if self.epsilon > self.epsilon_min:
            self.epsilon *= self.epsilon_decay

        # Sync the target network every target_update_iter replay calls
        self.iteration += 1
        if self.iteration % self.target_update_iter == 0:
            self.update_target_model()

    def update_target_model(self):
        self.target_model.set_weights(self.model.get_weights())

    def save_model(self, file_name):
        self.model.save_weights(file_name)
```




# Trading Environment

```python
import gym
import numpy as np

# Action indices as produced by the agent (randrange / argmax over ACT_SPACE outputs).
# With ACT_SPACE = 2 the agent only chooses between short and long.
ACT_SHORT = 0
ACT_LONG = 1
ACT_HOLD = 2

class TradingEnv(gym.Env):
    metadata = {'render.modes': ['human']}

    def __init__(self, data, startingDate, splitingDate, stock_df=TARGET, money=CAPITAL, stateLength=STATE_LEN, transactionCosts=FEES, endingDate=END_DATE):
        super(TradingEnv, self).__init__()

        self.stock_df = stock_df
        self.data = data
        self.initial_balance = money
        self.state_length = stateLength
        self.transaction_cost = transactionCosts
        self.startingDate = startingDate
        self.splitingDate = splitingDate

        self.balance = money
        self.position = 0
        self.total_shares = 0
        self.current_step = stateLength

        self.action_space = gym.spaces.Discrete(3)  # Short, Long, Hold
        # Raw prices and volume are unbounded, so the observation box is unbounded too
        self.observation_space = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(stateLength, 4), dtype=np.float32)

    def reset(self):
        self.balance = self.initial_balance
        self.position = 0
        self.total_shares = 0
        self.current_step = self.state_length
        return self._next_observation()

    def _next_observation(self):
        # The last state_length days before current_step
        frame = self.data.iloc[self.current_step-self.state_length:self.current_step]
        obs = frame[['Close', 'Low', 'High', 'Volume']].values
        return obs

    def step(self, action):
        current_price = self.data.iloc[self.current_step]['Close']
        self.current_step += 1
        done = self.current_step >= len(self.data)
        reward = 0

        if action == ACT_SHORT:
            if self.position > 0:
                # Close the long: sell every share, paying the transaction cost
                self.balance += self.total_shares * current_price * (1 - self.transaction_cost)
                reward = self.balance - self.initial_balance
                self.total_shares = 0
            # No borrowing is modelled, so "short" means flat (all cash)
            self.position = -1
        elif action == ACT_LONG:
            if self.position <= 0:
                # Open a long with all available cash, paying the transaction cost.
                # Skipped when already long, so held shares are not overwritten.
                self.total_shares = self.balance // (current_price * (1 + self.transaction_cost))
                self.balance -= self.total_shares * current_price * (1 + self.transaction_cost)
                self.position = 1
        # For action == ACT_HOLD just HODL: keep the current position

        next_obs = self._next_observation() if not done else np.zeros(self.observation_space.shape)

        return next_obs, reward, done, {}

    def render(self, mode='human', close=False):
        print(f'Step: {self.current_step}, Balance: {self.balance}')
```
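The environment sizes a long position as `balance // (price * (1 + fee))` and charges the fee again when the position is closed. A standalone sketch of one buy-then-sell round trip at an unchanged, made-up price of 250 shows what that costs with `FEES = 0.1 / 100`; the helpers `open_long` and `close_long` are hypothetical restatements of the two branches of `step`.

```python
CAPITAL = 100_000
FEES = 0.1 / 100

def open_long(balance, price, fee=FEES):
    """Hypothetical helper: buy as many whole shares as the cash allows, fee included."""
    shares = balance // (price * (1 + fee))
    return shares, balance - shares * price * (1 + fee)

def close_long(balance, shares, price, fee=FEES):
    """Hypothetical helper: sell every share and return the resulting cash balance."""
    return balance + shares * price * (1 - fee)

price = 250.0  # illustrative price, unchanged between entry and exit
shares, cash = open_long(CAPITAL, price)
final = close_long(cash, shares, price)
print(shares, round(cash, 2), round(final, 2), round(final - CAPITAL, 2))
```

Here 399 shares are bought, about 150 in cash is left over, and the flat round trip ends near 99,800.50, a loss of roughly 0.2% of the capital. A policy that flips position often has to clear that cost on every round trip before it shows a profit.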

# Trading Operations

```python
class TradingSimulator:
    def __init__(self, startingDate, endingDate, splitingDate, observationSpace, actionSpace, money, stateLength, transactionCosts, numberOfEpisodes, stock_df):
        # Train only on data up to the split date so the test period stays unseen
        train_df = stock_df.loc[startingDate:splitingDate]
        # Setup environment and agent without starting training
        self.env = TradingEnv(data=train_df, startingDate=startingDate, splitingDate=splitingDate, money=money, stateLength=stateLength, transactionCosts=transactionCosts)
        self.agent = DQNAgent(observation_space=observationSpace, action_space=actionSpace)
        self.episodes = numberOfEpisodes

    def preprocess_data(self, data):
        # Flatten the (stateLength, 4) window into the 1-D vector the Q-network expects
        return data.flatten()

    def simulateNewStrategy(self):
        for episode in range(self.episodes):
            state = self.preprocess_data(self.env.reset())
            done = False
            total_reward = 0

            while not done:
                # act() adds the batch dimension itself, so pass the flat state directly
                action = self.agent.act(state)
                next_state, reward, done, _ = self.env.step(action)
                next_state = self.preprocess_data(next_state)

                self.agent.remember(state, action, reward, next_state, done)
                state = next_state
                total_reward += reward

                self.agent.replay()

            if episode % self.agent.target_update_iter == 0:
                self.agent.update_target_model()

            print(f'Episode: {episode+1}, Total Reward: {total_reward}')
            if (episode + 1) % 10 == 0:
                # Keras 3 requires save_weights file names to end in ".weights.h5"
                self.agent.save_model(f'dqn_model_{episode + 1}.weights.h5')

    def simulateExistingStrategy(self, model_path):
        if os.path.isfile(model_path):
            self.agent.model.load_weights(model_path)
        else:
            raise FileNotFoundError(f"No trading strategy found at {model_path}; please provide a valid one.")
```
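`DQNAgent.replay` multiplies `epsilon` by `epsilon_decay` on every call, and the simulator calls `replay` once per environment step rather than once per episode. With the agent's defaults (`epsilon=1.0`, `epsilon_min=0.01`, `epsilon_decay=0.995`), a short loop counts how many replay calls it takes to reach the floor:

```python
import math

epsilon, epsilon_min, epsilon_decay = 1.0, 0.01, 0.995

# Same update rule as DQNAgent.replay, repeated until the floor is reached
steps = 0
while epsilon > epsilon_min:
    epsilon *= epsilon_decay
    steps += 1

print(steps)  # 919

# Closed form: smallest n with epsilon_decay ** n <= epsilon_min
closed_form = math.ceil(math.log(epsilon_min) / math.log(epsilon_decay))
```

Assuming roughly 252 trading days per year, the 2012 to 2018 training window alone holds about 1,500 daily steps, so exploration is essentially over before the first of the 50 episodes ends. Decaying per episode, or using a decay rate closer to 1, would spread exploration over more of the training run.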


```python
# Keras 3 requires save_weights/load_weights file names to end in ".weights.h5"
model_path = f"./strats/dqn_{TARGET}_{START_DATE}_{SPLIT_DATE}.weights.h5"
simulator = TradingSimulator(START_DATE, END_DATE, SPLIT_DATE, OBS_SPACE, ACT_SPACE, CAPITAL, STATE_LEN, FEES, EPISODES, tickers[TARGET])
if not os.path.exists(model_path):
    simulator.simulateNewStrategy()
    # Persist the final weights so the next run loads them instead of retraining
    os.makedirs(os.path.dirname(model_path), exist_ok=True)
    simulator.agent.save_model(model_path)
else:
    simulator.simulateExistingStrategy(model_path)
```






# Conclusion

CONCLUDE

## References

- [TensorFlow Agents](https://www.tensorflow.org/agents/overview)
- [OpenAI Gym GitHub](https://github.com/openai/gym)
- [Brockman et al., OpenAI Gym (2016)](https://arxiv.org/abs/1606.01540)
- [Théate, Thibaut, and Damien Ernst. "An application of deep reinforcement learning to algorithmic trading." Expert Systems with Applications 173 (2021): 114632.](https://www.sciencedirect.com/science/article/pii/S0957417421000737)

## Github

This article is also available on [GitHub](https://github.com/adamd1985/pairs_trading_unsupervised_learning).

Kaggle notebook available [here](https://www.kaggle.com/code/addarm/unsupervised-learning-as-signals-for-pairs-trading)

## Media

All media used (in the form of code or images) are either solely owned by me, acquired through licensing, or part of the Public Domain and granted use through Creative Commons License.

## CC Licensing and Use

<a rel="license" href="http://creativecommons.org/licenses/by-nc/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc/4.0/">Creative Commons Attribution-NonCommercial 4.0 International License</a>.