<h1> Reinforcement Learning Trading </h1>

<h2>Key Components</h2>
<ul>
    <li><strong>Agent:</strong> The trading algorithm that interacts with the environment.</li>
    <li><strong>Environment:</strong> The market the agent trades in (historical price data or a real-time trading platform).</li>
    <li><strong>Actions:</strong> The decisions the agent can make, such as buying, selling, or holding an asset.</li>
    <li><strong>Rewards:</strong> The feedback the agent receives based on its actions, such as profits (positive reward) or losses (negative reward).</li>
    <li><strong>Policy:</strong> The strategy the agent uses to decide which action to take at each step.</li>
</ul>
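The components above fit together in a simple loop: the environment emits an observation, the policy maps it to an action, and the environment returns a reward and the next observation. A minimal pure-Python sketch follows. `ToyMarket`, `random_policy`, and `run_episode` are hypothetical names, and the one-step long/short reward is a simplification chosen for illustration, not the reward used by the `TradingEnv` defined later in this notebook.

```python
import random


class ToyMarket:
    """Environment: replays a fixed list of closing prices (hypothetical data, at least two prices)."""

    def __init__(self, prices):
        self.prices = prices
        self.t = 0

    def reset(self):
        self.t = 0
        return self.prices[self.t]  # observation: the current price

    def step(self, action):
        # Actions: 0 = Hold, 1 = Buy (long for one step), 2 = Sell (short for one step)
        change = self.prices[self.t + 1] - self.prices[self.t]
        reward = {0: 0.0, 1: change, 2: -change}[action]
        self.t += 1
        done = self.t >= len(self.prices) - 1
        return self.prices[self.t], reward, done


def random_policy(observation):
    """Policy: maps an observation to an action; this one ignores it and picks at random."""
    return random.choice([0, 1, 2])


def run_episode(env, policy):
    """Agent loop: observe, act, receive a reward, and repeat until the episode ends."""
    observation = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = policy(observation)
        observation, reward, done = env.step(action)
        total_reward += reward
    return total_reward


prices = [100.0, 101.0, 99.0, 103.0]
print(run_episode(ToyMarket(prices), lambda obs: 1))  # always buy: 103 - 100 = 3.0
```

Swapping `random_policy` for a learned function of the observation is, in essence, what the RL training step does.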

<h2>Advantages for Trading</h2>
<ul>
    <li><strong>Adaptability:</strong> An RL agent can adapt to changing market conditions by learning from patterns in historical or real-time data.</li>
    <li><strong>Automation:</strong> RL automates decision-making, which makes it applicable to complex and high-frequency trading strategies.</li>
    <li><strong>Optimization:</strong> The reward can be designed to target long-term returns or risk-adjusted performance metrics such as the Sharpe ratio.</li>
</ul>
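As a reference for the Optimization point, the Sharpe ratio divides the mean excess return by the standard deviation of excess returns, and for daily data it is commonly annualized by multiplying by sqrt(252). The sketch below assumes daily simple returns, a zero risk-free rate by default, and 252 trading days per year; `annualized_sharpe` is a hypothetical helper. On a calendar-day series such as the forward-filled data used later, weekend returns would be zero and 252 would no longer be the right period count.

```python
import numpy as np


def annualized_sharpe(daily_returns, risk_free_daily=0.0, periods_per_year=252):
    """Annualized Sharpe ratio: mean excess return / its standard deviation * sqrt(periods)."""
    excess = np.asarray(daily_returns, dtype=float) - risk_free_daily
    std = excess.std(ddof=1)  # sample standard deviation
    if std == 0:
        return float("nan")  # undefined when returns never vary
    return float(excess.mean() / std * np.sqrt(periods_per_year))


# Hypothetical daily portfolio values -> simple daily returns
portfolio_values = np.array([50000.0, 50500.0, 50250.0, 51000.0])
daily_returns = np.diff(portfolio_values) / portfolio_values[:-1]
print(annualized_sharpe(daily_returns))
```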

In [1]:
import yfinance as yf
from datetime import datetime, timedelta

# Calculate 15 years ago from today
today = datetime.now()
fifteen_years_ago = today - timedelta(days=15*365.25)  # Account for leap years
yesterday = today - timedelta(days=1)

# Download daily S&P 500 (^GSPC) data; yfinance treats `end` as exclusive
data = yf.download('^GSPC', start=fifteen_years_ago, end=yesterday)

data.head()

[*********************100%***********************]  1 of 1 completed


Date,Open,High,Low,Close,Adj Close,Volume
2009-10-01,1054.910034,1054.910034,1029.449951,1029.849976,1029.849976,5791450000
2009-10-02,1029.709961,1030.599976,1019.950012,1025.209961,1025.209961,5583240000
2009-10-05,1026.869995,1042.579956,1025.920044,1040.459961,1040.459961,4313310000
2009-10-06,1042.02002,1060.550049,1042.02002,1054.719971,1054.719971,5029840000
2009-10-07,1053.650024,1058.02002,1050.099976,1057.579956,1057.579956,4238220000


In [2]:
data.index

DatetimeIndex(['2009-10-01', '2009-10-02', '2009-10-05', '2009-10-06',
               '2009-10-07', '2009-10-08', '2009-10-09', '2009-10-12',
               '2009-10-13', '2009-10-14',
               ...
               '2024-09-16', '2024-09-17', '2024-09-18', '2024-09-19',
               '2024-09-20', '2024-09-23', '2024-09-24', '2024-09-25',
               '2024-09-26', '2024-09-27'],
              dtype='datetime64[ns]', name='Date', length=3773, freq=None)

<h2>Forward Fill Missing Values (Weekends and Holidays)</h2>

In [3]:
import pandas as pd
data.index = pd.to_datetime(data.index)  # Ensure a DatetimeIndex (yfinance already returns one, as shown above)
data.index

DatetimeIndex(['2009-10-01', '2009-10-02', '2009-10-05', '2009-10-06',
               '2009-10-07', '2009-10-08', '2009-10-09', '2009-10-12',
               '2009-10-13', '2009-10-14',
               ...
               '2024-09-16', '2024-09-17', '2024-09-18', '2024-09-19',
               '2024-09-20', '2024-09-23', '2024-09-24', '2024-09-25',
               '2024-09-26', '2024-09-27'],
              dtype='datetime64[ns]', name='Date', length=3773, freq=None)

In [4]:
date_range = pd.date_range(start=data.index.min(), end=data.index.max())
missing_dates = date_range.difference(data.index)
print(f"Missing Dates: {missing_dates}")

Missing Dates: DatetimeIndex(['2009-10-03', '2009-10-04', '2009-10-10', '2009-10-11',
               '2009-10-17', '2009-10-18', '2009-10-24', '2009-10-25',
               '2009-10-31', '2009-11-01',
               ...
               '2024-08-25', '2024-08-31', '2024-09-01', '2024-09-02',
               '2024-09-07', '2024-09-08', '2024-09-14', '2024-09-15',
               '2024-09-21', '2024-09-22'],
              dtype='datetime64[ns]', length=1703, freq=None)
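The 1703 missing dates are a mix of weekends and weekday market closures (holidays and occasional special closures). A quick NumPy check, using only the first and last dates and the counts printed above, separates the two; the variable names are illustrative.

```python
import numpy as np

start = np.datetime64("2009-10-01")  # first date in data.index
end = np.datetime64("2024-09-27")    # last date in data.index
one_day = np.timedelta64(1, "D")

calendar_days = int((end - start) / one_day) + 1
# busday_count counts Monday-Friday dates in [begin, end), so shift the end by one day
weekdays = int(np.busday_count(start, end + one_day))
weekend_days = calendar_days - weekdays

trading_days = 3773   # length of data.index
missing_dates = 1703  # length of missing_dates
weekday_closures = weekdays - trading_days  # holidays and other market closures

print(calendar_days, weekend_days, weekday_closures)
```

With these dates the split works out to 1564 weekend days and 139 weekday closures, which together account for all 1703 missing dates.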


In [5]:
# Reindex the DataFrame to include every calendar day in the range
data_full = data.reindex(date_range)

# Forward fill missing values (propagating the last known value)
data_full = data_full.ffill()

missing_values = data_full.isnull().sum()
print(f"Missing values after forward fill: \n{missing_values}")

Missing values after forward fill: 
Open         0
High         0
Low          0
Close        0
Adj Close    0
Volume       0
dtype: int64



In [6]:
date_range = pd.date_range(start=data_full.index.min(), end=data_full.index.max())
missing_dates = date_range.difference(data_full.index)
print(f"Missing Dates: {missing_dates}")

Missing Dates: DatetimeIndex([], dtype='datetime64[ns]', freq='D')


In [7]:
data_full.head(30)

Date,Open,High,Low,Close,Adj Close,Volume
2009-10-01,1054.910034,1054.910034,1029.449951,1029.849976,1029.849976,5791450000.0
2009-10-02,1029.709961,1030.599976,1019.950012,1025.209961,1025.209961,5583240000.0
2009-10-03,1029.709961,1030.599976,1019.950012,1025.209961,1025.209961,5583240000.0
2009-10-04,1029.709961,1030.599976,1019.950012,1025.209961,1025.209961,5583240000.0
2009-10-05,1026.869995,1042.579956,1025.920044,1040.459961,1040.459961,4313310000.0
2009-10-06,1042.02002,1060.550049,1042.02002,1054.719971,1054.719971,5029840000.0
2009-10-07,1053.650024,1058.02002,1050.099976,1057.579956,1057.579956,4238220000.0
2009-10-08,1060.030029,1070.670044,1060.030029,1065.47998,1065.47998,4988400000.0
2009-10-09,1065.280029,1071.51001,1063.0,1071.48999,1071.48999,3763780000.0
2009-10-10,1065.280029,1071.51001,1063.0,1071.48999,1071.48999,3763780000.0
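On a small scale, the reindex-then-forward-fill step looks like this. The three rows are taken from the table above (Close rounded to two decimals). The weekend rows inherit Friday's values, including Volume, so the filled series contains days with zero price change and Friday's trading volume repeated.

```python
import pandas as pd

# Thursday, Friday, and Monday from the table above; the weekend is missing
prices = pd.DataFrame(
    {"Close": [1029.85, 1025.21, 1040.46], "Volume": [5791450000, 5583240000, 4313310000]},
    index=pd.to_datetime(["2009-10-01", "2009-10-02", "2009-10-05"]),
)

# Add every calendar day, then carry the last known row forward into the gaps
calendar = pd.date_range(prices.index.min(), prices.index.max(), freq="D")
filled = prices.reindex(calendar).ffill()

print(filled)
```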


<h2>Pre-process data</h2>
<ul>
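    <li>Normalize the features to the [0, 1] range (min-max scaling)</li>
    <li>Extract only the relevant features (Close and Volume), without adding engineered features that could introduce additional bias</li>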
</ul>

In [8]:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
# Scale Close (column 0) and Volume (column 1) to the [0, 1] range
data_scaled = scaler.fit_transform(data_full[['Close', 'Volume']])
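Min-max scaling maps each column with x' = (x - min) / (max - min). Fitting `MinMaxScaler` on the full 15-year history means every scaled value depends on the overall minimum and maximum, including prices from later years, which is a form of look-ahead bias. A common alternative, sketched here under the assumption of a simple chronological train/test split, learns the min and max from the training rows only and reuses them for later rows (with scikit-learn this corresponds to `scaler.fit(train)` followed by `scaler.transform(...)`). The helper names and sample values are hypothetical.

```python
import numpy as np


def fit_min_max(train):
    """Learn per-column min and max from the training rows only."""
    return train.min(axis=0), train.max(axis=0)


def transform_min_max(x, col_min, col_max):
    """Apply x' = (x - min) / (max - min); rows outside the training range fall outside [0, 1]."""
    span = np.where(col_max > col_min, col_max - col_min, 1.0)  # avoid division by zero
    return (x - col_min) / span


# Hypothetical (Close, Volume) rows in chronological order
features = np.array([[100.0, 10.0], [110.0, 20.0], [105.0, 15.0], [120.0, 30.0]])
split = 3  # first 3 rows for training, the rest held out

col_min, col_max = fit_min_max(features[:split])
train_scaled = transform_min_max(features[:split], col_min, col_max)
test_scaled = transform_min_max(features[split:], col_min, col_max)
print(test_scaled)
```

Values above 1.0 in the held-out rows are expected: they signal prices and volumes the agent never saw during training.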

<h2>Define the Trading Environment</h2>

In [10]:
import gymnasium as gym
from gymnasium import spaces
import numpy as np

class TradingEnv(gym.Env):
    def __init__(self, data, initial_balance=50000):
        super().__init__()

        # One row of features per step (here: scaled Close and Volume)
        self.data = np.asarray(data, dtype=np.float32)
        self.current_step = 0
        self.initial_balance = initial_balance
        self.balance = initial_balance
        self.shares_held = 0
        self.portfolio_value = self.balance

        self.action_space = spaces.Discrete(3)  # 0 = Hold, 1 = Buy, 2 = Sell
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(self.data.shape[1],), dtype=np.float32)

    def reset(self, seed=None, options=None):
        # Gymnasium API: reset accepts seed/options and returns (observation, info)
        super().reset(seed=seed)
        self.current_step = 0
        self.balance = self.initial_balance
        self.shares_held = 0
        self.portfolio_value = self.balance
        return self.data[self.current_step], {}

    def step(self, action):
        # Action: 0 = Hold, 1 = Buy, 2 = Sell
        # 'Close' is column 0 of data_scaled (columns: Close, Volume); these are scaled values, not dollars
        current_price = float(self.data[self.current_step, 0])

        # Action Logic
        if action == 1:  # Buy
            # Buy with all available cash; min-max scaling maps the lowest close to 0.0, so skip a zero price
            if self.balance > 0 and current_price > 0:
                self.shares_held = self.balance / current_price
                self.balance = 0  # Cash becomes 0 after buying
        elif action == 2:  # Sell
            if self.shares_held > 0:  # Sell if you hold shares
                self.balance = self.shares_held * current_price
                self.shares_held = 0  # No more shares held after selling

        # Update Portfolio Value
        self.portfolio_value = self.balance + self.shares_held * current_price

        # Reward: total profit/loss relative to the initial balance (cumulative, not per-step)
        reward = float(self.portfolio_value - self.initial_balance)

        # Move to the next step
        self.current_step += 1

        # End of data
        terminated = self.current_step >= len(self.data) - 1
        truncated = False

        # Gymnasium API: (observation, reward, terminated, truncated, info)
        return self.data[self.current_step], reward, terminated, truncated, {}

    def render(self):
        # This can be customized to display the portfolio performance over time
        print(f'Step: {self.current_step}')
        print(f'Portfolio Value: {self.portfolio_value}')
        print(f'Shares Held: {self.shares_held}')
        print(f'Cash Balance: {self.balance}')
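The buy/sell/hold accounting in `step` can be checked in isolation. The hypothetical `simulate` helper below replays a list of actions with the same all-in/all-out rules and the same reward definition (portfolio value minus the initial balance), using round dollar prices so the numbers are easy to verify by hand.

```python
def simulate(prices, actions, initial_balance=50000.0):
    """Replay actions (0 = Hold, 1 = Buy, 2 = Sell) with all-in/all-out trades.

    Returns a list of (portfolio_value, reward) per step, where reward is the
    portfolio value minus the initial balance.
    """
    balance, shares = initial_balance, 0.0
    history = []
    for price, action in zip(prices, actions):
        if action == 1 and balance > 0 and price > 0:  # Buy with all cash
            shares, balance = balance / price, 0.0
        elif action == 2 and shares > 0:  # Sell all shares
            balance, shares = shares * price, 0.0
        value = balance + shares * price
        history.append((value, value - initial_balance))
    return history


# Buy 500 shares at 100, hold while the price rises to 110, sell at 99
for value, reward in simulate([100.0, 110.0, 99.0], [1, 0, 2]):
    print(value, reward)
```

Because this reward is cumulative, a position that stays in profit earns a positive reward on every step it is held, so episode returns grow with holding time. A per-step reward, such as the change in portfolio value since the previous step, is a common alternative.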

<h2>Proximal Policy Optimization (PPO)</h2>
<h3>Stable-Baselines3</h3>

In [11]:
from stable_baselines3 import PPO

env = TradingEnv(data_scaled)
model = PPO("MlpPolicy", env, verbose=1)


A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.1.1 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.


Using cpu device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.


In [None]:
model.learn(total_timesteps=100000)
model.save("ppo_trading_model")
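PPO updates the policy by maximizing a clipped surrogate objective: with probability ratio r = pi_new(a|s) / pi_old(a|s) and advantage estimate A, each sample contributes min(r * A, clip(r, 1 - eps, 1 + eps) * A). A NumPy sketch of that objective is below; the function name is hypothetical, and `clip_range=0.2` mirrors the Stable-Baselines3 default. In the library, the policy loss is the negative of this quantity, combined with value-function and entropy terms.

```python
import numpy as np


def ppo_clipped_objective(new_log_probs, old_log_probs, advantages, clip_range=0.2):
    """Batch mean of PPO's clipped surrogate objective (a quantity to maximize)."""
    # Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probabilities
    ratio = np.exp(np.asarray(new_log_probs) - np.asarray(old_log_probs))
    advantages = np.asarray(advantages, dtype=float)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_range, 1.0 + clip_range) * advantages
    # The element-wise minimum gives no extra credit for moving the ratio outside [1 - eps, 1 + eps]
    return float(np.mean(np.minimum(unclipped, clipped)))


old = np.log(np.array([0.5, 0.5]))
new = np.log(np.array([1.0, 0.25]))  # ratios of about 2.0 and 0.5
print(ppo_clipped_objective(new, old, advantages=[1.0, -1.0]))
```

In the example, the first ratio is capped at 1.2 because its advantage is positive, while the second is floored at 0.8 because its advantage is negative, which is how the clipping keeps each update close to the previous policy.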

<h2>Analyzing Dataset Size and Memory Usage</h2>

In [12]:
# Check the size of your dataset
print(f"Number of rows: {data_scaled.shape[0]}")
print(f"Number of columns: {data_scaled.shape[1]}")

Number of rows: 5476
Number of columns: 2


In [13]:
size = data_scaled.size  # rows * columns
size

10952

In [16]:
# Memory footprint of the raw download (trading days only, before reindexing)
data.info(memory_usage='deep')

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 3773 entries, 2009-10-01 to 2024-09-27
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Open       3773 non-null   float64
 1   High       3773 non-null   float64
 2   Low        3773 non-null   float64
 3   Close      3773 non-null   float64
 4   Adj Close  3773 non-null   float64
 5   Volume     3773 non-null   int64  
dtypes: float64(5), int64(1)
memory usage: 335.4 KB
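The row and column counts of `data_scaled` translate directly into memory: it is a float64 array, so it occupies rows * columns * 8 bytes. The sketch below reproduces that arithmetic with a placeholder array of the same shape (the values are zeros, not the real data). At roughly 86 KiB the dataset is small, so training time for `model.learn` is likely driven by `total_timesteps` and the policy network rather than by the data size.

```python
import numpy as np

rows, cols = 5476, 2  # shape of data_scaled reported above
bytes_per_value = np.dtype(np.float64).itemsize  # 8 bytes per float64

expected_bytes = rows * cols * bytes_per_value
placeholder = np.zeros((rows, cols), dtype=np.float64)  # same shape and dtype as data_scaled

print(expected_bytes, placeholder.nbytes, expected_bytes / 1024)
```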
