# Cody Nichols' COMP 6970 Final Project
This notebook is where I test and compile the final results of my findings.

- Paper 1
    - ~~Read Paper~~
    - ~~Implement Bullish/Bearish Classification with NN Structure~~
    - ~~Implement Candlestick Classification with NN Structure~~
- Paper 2
    - ~~Read Paper~~
    - ~~Implement MS-CNN Architecture~~
    - ~~Create DDQN Architecture~~
- ~~Paper 3 (Time Permitting) (Decided Not To Do After Finishing Paper 2 Because of Complexity of Paper 2)~~
- Complete Project
    - Combine Code (in this ipynb)
        - Implement Dataset Curation
        - Create my own version of Paper 1
        - Create my own version of Paper 2
        - Combine the outputs of Paper 1 and 2's vision components as inputs for Paper 2's DDQN
    - Write Paper
    - Create Presentation
    - Make Recording

# Paper 1 Rendition
Paper 1 (Empowering Financial Technical Analysis Using Computer Vision Techniques) creates a CNN architecture that classifies candlestick charts in two ways. The first is a binary classification of a 20-candlestick image as bullish or bearish. The second is a multi-class classification of the candlestick pattern type, based on each individual candlestick and some of its surrounding candlesticks. The architecture is a 14-layer CNN with 8 convolution layers, 3 max pooling layers, a VGG-16 pretrained preprocessing layer, a fully connected layer, and an output layer. This architecture was implemented in paper1.ipynb for the binary classification task, but was not used for the multi-class classification task because the paper is ambiguous about how the model was trained. The assumed training method was to divide the candlestick charts into smaller images, which makes it easier to classify and label each candlestick. However, the images I created were not big enough to pass all the way through the network without being scaled back up.

In this notebook, I will perform the same classification tasks, using what I learned from implementing the author's solution, to achieve the best accuracy on the binary classification and the best per-class accuracy on the multi-class classification. I report per-class accuracy because, unlike the author of Paper 1, I include "None" labels for my candlesticks, since most candlesticks in a candlestick chart do not belong to a pattern.
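
To illustrate why per-class accuracy matters when "None" dominates, here is a minimal sketch in plain Python. The labels and the `per_class_accuracy` helper are hypothetical and are not taken from Paper 1 or from this notebook's data.

```python
from collections import Counter

def per_class_accuracy(y_true, y_pred):
    """Return {label: fraction of that label's samples predicted correctly}."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / count for label, count in total.items()}

# Hypothetical labels: 8 of 10 candlesticks have no pattern.
y_true = ["None"] * 8 + ["Hammer", "Doji"]
# A degenerate model that always predicts "None".
y_pred = ["None"] * 10

overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
per_class = per_class_accuracy(y_true, y_pred)
print(overall)    # 0.8
print(per_class)  # {'None': 1.0, 'Hammer': 0.0, 'Doji': 0.0}
```

The overall accuracy looks respectable even though no pattern is ever detected, which is the failure mode that per-class accuracy exposes.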

In [33]:
# Imports

import os
import cv2
import math
import torch
import random
import numpy as np
import pandas as pd
from collections import deque
import matplotlib.pyplot as plt
import torch.nn as nn
import torch.optim as optim
from torchvision import models, transforms
from torch.utils.data import random_split, DataLoader, Dataset
from PIL import Image
from pyts.image import GramianAngularField
from tqdm import tqdm
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from datetime import datetime, timedelta
import torch.nn.functional as F
from sklearn.metrics import confusion_matrix
from torchvision.transforms import ToTensor
from torch.utils.data import Subset
from sklearn.preprocessing import StandardScaler

In [34]:
# Device Checking

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("CUDA version:", torch.version.cuda)
print("Number of GPUs:", torch.cuda.device_count())
if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        print(f"GPU {i}: {torch.cuda.get_device_name(i)}")

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print("Device being used:", device)

PyTorch version: 2.5.1+cu118
CUDA available: True
CUDA version: 11.8
Number of GPUs: 1
GPU 0: NVIDIA GeForce GTX 1660 Ti
Device being used: cuda


# Paper 2

In [35]:
class AdvancedDataset(Dataset):
    def __init__(self, file_path=None, daily_data=None, window_size=20, scaler=None):
        """
        Initializes the AdvancedDataset.

        Args:
            file_path (str, optional): Path to the CSV file containing the data.
            daily_data (list of pd.DataFrame, optional): Pre-split list of daily DataFrames.
            window_size (int, optional): The size of the window for state representation. Defaults to 20.
            scaler (StandardScaler, optional): A pre-fitted StandardScaler instance.
        """
        self.window_size = window_size
        self.scaler = scaler  # Assign the scaler

        if daily_data is not None:
            # Initialize with pre-split daily data
            self.daily_data = daily_data
            # If a scaler is provided, use it. Otherwise, raise an error
            if self.scaler is None:
                raise ValueError("Scaler must be provided when initializing with daily_data.")
        elif file_path is not None:
            # Load and preprocess the data from the CSV file
            self.data = pd.read_csv(file_path, parse_dates=["datetime"])

            # Ensure all required columns are present
            required_columns = [
                "datetime", "open", "high", "low", "close", "volume",
                "RSI_20", "Bollinger_High", "Bollinger_Low", "Bollinger_Middle",
                "OBV", "ATR_20", "Stochastic_%K", "Stochastic_%D",
                "VWAP", "SMA_7", "SMA_20", "EMA_7", "EMA_20"
            ]
            missing_columns = set(required_columns) - set(self.data.columns)
            if missing_columns:
                raise ValueError(f"Missing columns in the data: {missing_columns}")

            # Extract the date component
            self.data['date'] = self.data['datetime'].dt.date

            # Handle missing values using ffill and bfill to avoid FutureWarning
            self.data = self.data.ffill().bfill()

            # Group the data by date
            self.grouped = self.data.groupby('date')

            # Split data into days, ensuring each day has at least `window_size` entries
            self.daily_data = []
            for date, group in self.grouped:
                if len(group) >= self.window_size:
                    self.daily_data.append(group.reset_index(drop=True))
            
            # Initialize the scaler if not provided
            if self.scaler is None:
                # Exclude 'datetime' and 'date' from scaling
                feature_columns = [
                    "open", "high", "low", "close", "volume",
                    "RSI_20", "Bollinger_High", "Bollinger_Low", "Bollinger_Middle",
                    "OBV", "ATR_20", "Stochastic_%K", "Stochastic_%D",
                    "VWAP", "SMA_7", "SMA_20", "EMA_7", "EMA_20"
                ]
                self.scaler = StandardScaler()
                self.scaler.fit(self.data[feature_columns].values)

        else:
            raise ValueError("Either 'file_path' or 'daily_data' must be provided.")

    def __len__(self):
        return len(self.daily_data)

    def __getitem__(self, idx):
        """
        Retrieves the item at the given index or slice.

        Args:
            idx (int or slice): The index or slice to retrieve.

        Returns:
            dict or AdvancedDataset: If idx is an int, returns a dictionary with processed data.
                                      If idx is a slice, returns a new AdvancedDataset instance.
        """
        if isinstance(idx, slice):
            # Handle slicing by returning a new AdvancedDataset instance with sliced daily_data and the same scaler
            sliced_data = self.daily_data[idx]
            return AdvancedDataset(daily_data=sliced_data, window_size=self.window_size, scaler=self.scaler)
        elif isinstance(idx, int):
            # Handle single index access
            day_data = self.daily_data[idx]

            # Select all 18 numerical features (excluding 'datetime' and 'date')
            feature_columns = [
                "open", "high", "low", "close", "volume",
                "RSI_20", "Bollinger_High", "Bollinger_Low", "Bollinger_Middle",
                "OBV", "ATR_20", "Stochastic_%K", "Stochastic_%D",
                "VWAP", "SMA_7", "SMA_20", "EMA_7", "EMA_20"
            ]
            raw_data = day_data[feature_columns].values  # Shape: [num_entries, 18]

            # Normalize the data
            normalized_data = self._normalize(raw_data)  # Shape: [num_entries, 18]

            states = []
            for i in range(len(normalized_data) - self.window_size + 1):
                state = normalized_data[i : i + self.window_size]  # Shape: [window_size, 18]
                state_tensor = torch.tensor(state, dtype=torch.float32).transpose(0, 1)  # Shape: [18, window_size]
                state_tensor = state_tensor.unsqueeze(0).unsqueeze(0)  # Shape: [1, 1, 18, window_size]
                states.append(state_tensor)
            
            return {
                'states': states,  # List of tensors, each of shape [1, 1, 18, window_size]
                'raw_data': raw_data,  # Raw numerical data
                'datetimes': day_data['datetime'].values[self.window_size - 1 :],
                'dates': day_data['date'].iloc[0]
            }
        else:
            raise TypeError("Invalid index type. Must be int or slice.")

    def _normalize(self, data):
        """
        Normalizes the data using standard scaling (zero mean, unit variance).

        Args:
            data (np.ndarray): The raw OHLCV and technical indicators data. Shape: [num_entries, 18]

        Returns:
            np.ndarray: The normalized data. Shape: [num_entries, 18]
        """
        # Apply standard scaling (zero mean, unit variance)
        return self.scaler.transform(data)
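
The windowing in `__getitem__` can be checked on a toy example. This is a minimal sketch in plain Python; the `sliding_windows` helper and the six-bar day are hypothetical stand-ins for the normalized feature rows. It shows that a day with `n` rows yields `n - window_size + 1` states, and that each state lines up with the timestamp of its last bar, which is what the `datetimes` slice relies on.

```python
def sliding_windows(rows, window_size):
    """Return every contiguous window of `window_size` rows, oldest first."""
    return [rows[i:i + window_size] for i in range(len(rows) - window_size + 1)]

# Hypothetical day with 6 one-minute bars, identified here by minute index.
minutes = [0, 1, 2, 3, 4, 5]
window_size = 4

windows = sliding_windows(minutes, window_size)
# Timestamp associated with each window: the last bar it contains.
aligned = minutes[window_size - 1:]

print(len(windows))            # 3, i.e. 6 - 4 + 1
print(windows[0], aligned[0])  # [0, 1, 2, 3] 3
```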

In [36]:
# Single Scale 1D Convolution

class SingleScale1D(nn.Module):
    def __init__(self, input_channels=18, output_channels=5, kernel_size=3):
        super(SingleScale1D, self).__init__()
        self.conv1d = nn.Conv1d(in_channels=input_channels, out_channels=output_channels, kernel_size=kernel_size)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.conv1d(x)
        x = self.relu(x)
        return x

In [37]:
# 3x3 2D Convolution

class ThreeByThreeConv2D(nn.Module):
    def __init__(self, input_channels=1, output_channels=2, kernel_size=3):
        super(ThreeByThreeConv2D, self).__init__()
        self.conv2d = nn.Conv2d(in_channels=input_channels, out_channels=output_channels, kernel_size=kernel_size, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.conv2d(x)
        x = self.relu(x)
        return x

In [38]:
# 5x5 2D Convolution

class FiveByFiveConv2D(nn.Module):
    def __init__(self, input_channels=1, output_channels=1, kernel_size=5):
        super(FiveByFiveConv2D, self).__init__()
        self.conv2d = nn.Conv2d(in_channels=input_channels, out_channels=output_channels, kernel_size=kernel_size, padding=2)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.conv2d(x)
        x = self.relu(x)
        return x

In [39]:
# Multi-scale Network

class MultiScaleNet(nn.Module):
    def __init__(self, height=18, width=20):
        super(MultiScaleNet, self).__init__()
        self.single_scale_1d = SingleScale1D(input_channels=18, output_channels=5, kernel_size=3)
        self.three_by_three = ThreeByThreeConv2D(input_channels=1, output_channels=2, kernel_size=3)
        self.five_by_five = FiveByFiveConv2D(input_channels=1, output_channels=1, kernel_size=5)
        self.height = height
        self.width = width

    def forward(self, x):
        batch_size, channels, height, width = x.size()
        
        # Assuming height=18 (features) and width=sequence_length
        x_1d = x.view(batch_size, self.height, self.width)  # [batch, 18, width]
        features_1d = self.single_scale_1d(x_1d)            # [batch, 5, width - 2] (kernel_size=3, no padding)

        # Aggregate along the time dimension
        features_1d = features_1d.mean(dim=2, keepdim=True)  # [batch, 5, 1]
        features_1d = features_1d.view(batch_size, 5, 1, 1) # [batch, 5, 1, 1]
        features_1d = F.interpolate(features_1d, size=(self.height, self.width), mode='bilinear', align_corners=False)  # [batch, 5, height, width]

        features_3x3 = self.three_by_three(x)               # [batch, 2, height, width]
        features_5x5 = self.five_by_five(x)                 # [batch, 1, height, width]

        combined_features = torch.cat((features_1d, features_3x3, features_5x5), dim=1)  # [batch, 8, height, width]
        
        return combined_features
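
The branch shapes can be verified with the standard convolution output-length formula. This is a minimal sketch in plain Python; the `conv_out_len` helper is my own name for illustration and not part of PyTorch. The unpadded 1D branch shortens the time axis from 20 to 18, which is harmless here because that branch is averaged over time and then interpolated back to 18x20 before concatenation. The padded 3x3 and 5x5 branches preserve the input size.

```python
def conv_out_len(length, kernel_size, padding=0, stride=1, dilation=1):
    """Output length of a convolution along one axis."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

width, height = 20, 18
print(conv_out_len(width, kernel_size=3))              # 18: 1D branch shortens the time axis
print(conv_out_len(height, kernel_size=3, padding=1))  # 18: 3x3 branch keeps the height
print(conv_out_len(width, kernel_size=5, padding=2))   # 20: 5x5 branch keeps the width
```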

In [40]:
# Efficient Channel Attention (ECA) Block

class ECABlock(nn.Module):
    def __init__(self, channels, gamma=2, b=1):
        super(ECABlock, self).__init__()
        t = int(abs((math.log(channels, 2) + b) / gamma))
        k = t if t % 2 else t + 1
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=(k - 1) // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        y = self.avg_pool(x)                             # [batch, channels, 1, 1]
        y = y.squeeze(-1).transpose(-1, -2)             # [batch, 1, channels]
        y = self.conv(y)                                 # [batch, 1, channels]
        y = self.sigmoid(y)                              # [batch, 1, channels]
        y = y.transpose(-1, -2).unsqueeze(-1)           # [batch, channels, 1, 1]
        return x * y.expand_as(x)
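
The adaptive kernel size in `ECABlock.__init__` is easy to evaluate by hand. This is a minimal sketch of the same arithmetic; the `eca_kernel_size` helper is a name I made up, and it uses `math.log2` in place of `math.log(channels, 2)`, which I assume is equivalent for the power-of-two channel counts shown.

```python
import math

def eca_kernel_size(channels, gamma=2, b=1):
    """Odd kernel size derived from the channel count, mirroring ECABlock.__init__."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1

for c in (8, 64, 256, 512):
    print(c, eca_kernel_size(c))
# 8 3
# 64 3
# 256 5
# 512 5
```

For the 64-channel backbone used here, the attention convolution therefore spans 3 neighbouring channels.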


In [41]:
# Backbone Network

class Backbone(nn.Module):
    def __init__(self, input_channels=8):
        super(Backbone, self).__init__()
        self.conv1 = nn.Conv2d(input_channels, 64, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(64)
        self.maxpool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(64, 64, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(64)
        self.eca = ECABlock(64)

    def forward(self, x):
        x = self.conv1(x)     # [batch, 64, H, W]
        x = self.bn1(x)
        x = F.relu(x)
        x = self.maxpool(x)   # [batch, 64, H/2, W/2]
        x = self.conv2(x)     # [batch, 64, H/2, W/2]
        x = self.bn2(x)
        x = F.relu(x)
        x = self.eca(x)       # [batch, 64, H/2, W/2]
        return x

In [42]:
# Combined Network with Backbone

class MSNetWithBackbone(nn.Module):
    def __init__(self):
        super(MSNetWithBackbone, self).__init__()
        self.multi_scale_net = MultiScaleNet()
        self.backbone = Backbone(input_channels=8)  # 5 + 2 + 1 from MultiScaleNet

    def forward(self, x):
        x = self.multi_scale_net(x)   # [batch, 8, H, W]
        x = self.backbone(x)         # [batch, 64, H/2, W/2]
        return x

In [43]:
# Final Network with Q-Value Output

class MSNetWithQValue(nn.Module):
    def __init__(self, num_actions=3):
        super(MSNetWithQValue, self).__init__()
        self.multi_scale_with_backbone = MSNetWithBackbone()

        self.conv_final = nn.Conv2d(64, 64, kernel_size=3, padding=1)
        self.flatten = nn.Flatten()
        
        # Determine the size after convolution
        sample_input = torch.randn(1, 1, 18, 20)  # [batch, channels, height=18, width=20]
        sample_output = self.multi_scale_with_backbone(sample_input)
        sample_output = self.conv_final(sample_output)
        sample_output = F.relu(sample_output)
        feature_map_size = sample_output.view(-1).size(0)

        self.fc1 = nn.Linear(feature_map_size, 128)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(128, num_actions)

    def forward(self, x):
        x = self.multi_scale_with_backbone(x)  # [batch, 64, H/2, W/2]
        x = self.conv_final(x)                 # [batch, 64, H/2, W/2]
        x = F.relu(x)
        x = self.flatten(x)                    # [batch, 64 * (H/2) * (W/2)]
        x = self.relu(self.fc1(x))            # [batch, 128]
        x = self.fc2(x)                        # [batch, num_actions]
        return x

In [44]:
# Create MS-CNN and ensure the output shape is the correct size (3 Q values: 1 for each action)

model = MSNetWithQValue(num_actions=3)
test_input = torch.randn(1, 1, 18, 20)
output = model(test_input)
print("Output shape:", output.shape)

Output shape: torch.Size([1, 3])
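
`MSNetWithQValue` finds the input size of `fc1` by running a sample tensor through the network. The same number can be derived by hand, which is a useful sanity check. This is a minimal sketch in plain Python; `flattened_size` is a hypothetical helper, and it assumes the padded convolutions preserve the spatial size so that only the single max pooling layer changes it.

```python
def flattened_size(height=18, width=20, channels=64, pool=2):
    """Size of the flattened feature map that feeds fc1."""
    # MaxPool2d(kernel_size=2, stride=2) floors each spatial dimension.
    pooled_h, pooled_w = height // pool, width // pool
    return channels * pooled_h * pooled_w

features = flattened_size()
print(features)        # 5760 = 64 * 9 * 10
print(features * 128)  # 737280 weights in fc1, excluding biases
```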


In [45]:
class ReplayBuffer:
    # Initialize buffer as a double ended queue
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    # Push a state, action, reward, next_state, done tuple on the buffer
    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    # Randomly sample
    def sample(self, batch_size):
        batch = random.sample(self.buffer, batch_size)
        state, action, reward, next_state, done = zip(*batch)
        state_batch = torch.cat(state, dim=0)
        next_state_batch = torch.cat(next_state, dim=0)
        return (
            state_batch,
            torch.tensor(action, dtype=torch.long),
            torch.tensor(reward, dtype=torch.float32),
            next_state_batch,
            torch.tensor(done, dtype=torch.float32),
        )

    # Override buffer's length method and return length
    def __len__(self):
        return len(self.buffer)
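
The eviction behavior that `ReplayBuffer` inherits from `deque(maxlen=...)` can be seen on a toy buffer. This is a minimal sketch with a hypothetical capacity of 3 and placeholder strings instead of state tensors.

```python
import random
from collections import deque

buffer = deque(maxlen=3)  # hypothetical tiny capacity for illustration
for step in range(5):
    # Each entry mirrors (state, action, reward, next_state, done).
    buffer.append((f"s{step}", 0, 0.0, f"s{step + 1}", False))

kept = [t[0] for t in buffer]
print(kept)  # ['s2', 's3', 's4']: s0 and s1 were evicted

random.seed(0)
batch = random.sample(buffer, 2)  # sampling without replacement
states, actions, rewards, next_states, dones = zip(*batch)
print(len(states), len(set(states)))  # 2 2
```

Once the buffer is full, every new transition silently replaces the oldest one, so the agent always trains on the most recent `capacity` transitions.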

In [46]:
class DDQNAgent:
    def __init__(self, buffer_capacity, gamma=0.99, lr=1e-3, target_update_freq=100):
        self.actor = MSNetWithQValue()
        self.target = MSNetWithQValue()
        self.target.load_state_dict(self.actor.state_dict())
        self.target.eval()

        self.buffer = ReplayBuffer(buffer_capacity)
        self.optimizer = optim.Adam(self.actor.parameters(), lr=lr)
        self.gamma = gamma
        self.target_update_freq = target_update_freq
        self.steps = 0
        self.epsilon = 1.0
        self.epsilon_min = 0.05
        self.epsilon_decay = 0.995

        self.position = 'sold_out'

    # Select an action based on current state
    def select_action(self, state):
        # Exploration block.
        # If bought in, a random hold or buy becomes hold (0); only sell (2) passes through.
        # If sold out, a random hold or sell becomes hold (0); only buy (1) passes through.
        if random.random() < self.epsilon:
            random_action = random.randint(0,2)
            if self.position == 'bought_in':
                if random_action in [0, 1]:
                    return 0
                else:
                    return 2
            else:
                if random_action in [0, 2]:
                    return 0
                else:
                    return 1
        # Exploitation block.
        # Select the action with the highest Q-value, then apply the same position mask.
        else:
            with torch.no_grad():
                q_values = self.actor(state)
                max_q, max_action = q_values.max(dim=1)
                max_action = max_action.item()
                
                if self.position == 'bought_in':
                    if max_action in [0, 1]:
                        return 0
                    else:
                        return 2
                else:
                    if max_action in [0, 2]:
                        return 0
                    else:
                        return 1

        return 0

    def train(self, batch_size):
        if len(self.buffer) < batch_size:
            return

        state_batch, action_batch, reward_batch, next_state_batch, done_batch = self.buffer.sample(batch_size)

        with torch.no_grad():
            next_q_values = self.actor(next_state_batch)
            next_actions = next_q_values.argmax(dim=1)
            target_q_values = self.target(next_state_batch)
            target_q = reward_batch + self.gamma * (1 - done_batch) * target_q_values[range(batch_size), next_actions]

        current_q = self.actor(state_batch)[range(batch_size), action_batch]

        loss = F.mse_loss(current_q, target_q)

        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()

        if self.steps % self.target_update_freq == 0:
            self.target.load_state_dict(self.actor.state_dict())

        self.epsilon = max(self.epsilon_min, self.epsilon * self.epsilon_decay)
        self.steps += 1

    def store_transition(self, state, action, reward, next_state, done):
        self.buffer.push(state, action, reward, next_state, done)

    def eval_mode(self):
        self.actor.eval()
        self.target.eval()
        self.epsilon = 0.0
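
The target computed in `train` follows the Double DQN pattern: the online network (`actor`) chooses the next action and the target network scores it. This is a minimal worked example for a single transition in plain Python. The Q-values and the `double_dqn_target` helper are hypothetical.

```python
def double_dqn_target(reward, done, gamma, online_next_q, target_next_q):
    """Target for one transition: online net picks the action, target net scores it."""
    best_action = max(range(len(online_next_q)), key=lambda a: online_next_q[a])
    return reward + gamma * (1.0 - done) * target_next_q[best_action]

# Hypothetical Q-values for the next state over (hold, buy, sell).
online_next_q = [0.2, 0.9, 0.1]  # online network prefers action 1
target_next_q = [0.5, 0.4, 0.8]  # target network would prefer action 2

y = double_dqn_target(reward=1.0, done=0.0, gamma=0.99,
                      online_next_q=online_next_q, target_next_q=target_next_q)
print(round(y, 3))  # 1.396 = 1.0 + 0.99 * 0.4

# A plain DQN target would use max(target_next_q) instead.
print(round(1.0 + 0.99 * max(target_next_q), 3))  # 1.792

# Terminal transitions drop the bootstrap term entirely.
y_terminal = double_dqn_target(1.0, 1.0, 0.99, online_next_q, target_next_q)
print(y_terminal)  # 1.0
```

Decoupling action selection from action evaluation gives the smaller of the two targets here, which is how Double DQN reduces overestimation of Q-values.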

In [47]:
class StockTradingEnvWithFeatures:
    def __init__(self, daily_dataset, k=5, testing=False):
        self.daily_dataset = daily_dataset
        self.current_day = 0
        self.num_days = len(daily_dataset)
        self.k = k
        self.testing = testing

        self.current_step = 0
        self.done = False
        self.price_history = []
        self.initial_cash = 500000
        self.cash_balance = self.initial_cash
        self.total_asset_value = self.cash_balance
        self.previous_total_asset_value = self.total_asset_value
        self.trade_log = []
        self.cost_basis = 0.0
        self.shares_held = 0

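        # Do not call reset() here: reset() advances current_day, so calling it in
        # __init__ would skip the first day. Callers invoke reset() once per day.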

    def reset(self):
        # Keep self.testing as set in __init__ so a testing environment stays in testing mode
        self.current_step = 0
        self.done = False
        self.price_history = []
        self.initial_cash = 500000
        self.cash_balance = self.initial_cash
        self.total_asset_value = self.cash_balance
        self.previous_total_asset_value = self.total_asset_value
        self.trade_log = []
        self.cost_basis = 0.0
        self.shares_held = 0

        if self.current_day >= self.num_days:
            self.done = True
            return None

        day_data = self.daily_dataset[self.current_day]
        self.states = day_data['states']
        self.raw_data = day_data['raw_data']
        self.datetimes = day_data['datetimes']
        self.window_size = self.states[0].shape[-1]
        self.num_steps = len(self.states)
        self.raw_close_prices = self.raw_data[self.window_size - 1 :, 3]

        self.current_day += 1

        return self._get_state()

    def _get_state(self):
        if self.current_step >= self.num_steps:
            self.done = True
            return None

        state = self.states[self.current_step]
        current_close_price = self.raw_data[self.current_step + self.window_size - 1, 3]
        self.price_history.append(current_close_price)
        return state

    def step(self, action):
        if self.current_step + 1 < self.num_steps:
            state = self.states[self.current_step]
            current_close_price = self.raw_close_prices[self.current_step]
            next_close_price = self.raw_close_prices[self.current_step + 1]
            datetime = self.datetimes[self.current_step]  # timestamp of the current bar

            profit = 0

            if action == 1:
                if current_close_price == 0:
                    num_shares = 0
                else:
                    num_shares = int(self.cash_balance / current_close_price)
                total_cost = num_shares * current_close_price
                self.cash_balance -= total_cost
                self.shares_held += num_shares
                self.cost_basis += total_cost

            elif action == 2:
                num_shares = self.shares_held
                total_proceeds = num_shares * current_close_price
                self.cash_balance += total_proceeds
                self.shares_held = 0
                profit = total_proceeds - self.cost_basis
                self.cost_basis = 0
            
            else:
                profit = 0.0

            self.previous_total_asset_value = self.total_asset_value
            self.total_asset_value = self.cash_balance + self.shares_held * current_close_price

            profit = self.total_asset_value - self.previous_total_asset_value

            if self.testing:
                reward = 0
            else:
                reward = self.calculate_reward(action, current_close_price)

            action_mapping = {0: 'Hold', 1: 'Buy', 2: 'Sell'}
            trade_info = {
                'datetime': datetime,
                'action': action_mapping[action],
                'reward': reward,
                'profit': profit,
                'total_asset_value': self.total_asset_value,
                'cash_balance': self.cash_balance,
                'shares_held': self.shares_held,
                'position': 'bought_in' if self.shares_held > 0 else 'sold_out'
            }
            self.trade_log.append(trade_info)

            self.current_step += 1
            done = self.current_step >= self.num_steps - 1
            return self._get_state(), reward, done, {}
        else:
            self.done = True
            return None, 0, True, {}

    def calculate_reward(self, action, current_close_price):
        POS_t = 1 if self.shares_held > 0 else 0

        R_k_t = []
        for i in range(1, self.k + 1):
            future_step = self.current_step + i
            if future_step >= self.num_steps:
                break
            future_price = self.raw_close_prices[future_step]
            if current_close_price == 0:
                R_k_t_i = 0
            else:
                R_k_t_i = (future_price - current_close_price) / current_close_price
            R_k_t.append(R_k_t_i)

        if len(R_k_t) < 2:
            SR_t = 0
        else:
            mean_R = np.mean(R_k_t)
            std_R = np.std(R_k_t)
            if std_R == 0:
                SR_t = 0
            else:
                SR_t = mean_R / std_R

        SSR_t = POS_t * SR_t

        return SSR_t
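
The reward in `calculate_reward` is a short-horizon Sharpe-like ratio gated by the current position. This is a minimal sketch of the same calculation in plain Python; the prices and the `sharpe_like_reward` helper are hypothetical. Note that each return is measured from the current price to one of the next `k` prices, so these are cumulative returns at horizons 1..k rather than bar-to-bar returns.

```python
import statistics

def sharpe_like_reward(prices, step, k, holding):
    """Mean/std of the returns from `step` to each of the next k bars, gated by position."""
    current = prices[step]
    future = prices[step + 1: step + 1 + k]
    if current == 0 or len(future) < 2:
        return 0.0
    returns = [(p - current) / current for p in future]
    std = statistics.pstdev(returns)  # population std, like np.std's default
    if std == 0:
        return 0.0
    return (1 if holding else 0) * statistics.mean(returns) / std

prices = [100.0, 101.0, 102.0, 103.0]  # hypothetical steadily rising closes
rising = sharpe_like_reward(prices, step=0, k=3, holding=True)
flat = sharpe_like_reward(prices, step=0, k=3, holding=False)
print(round(rising, 3))  # 2.449
print(flat)              # 0.0
```

Because the ratio is multiplied by the position, the agent receives no reward signal, positive or negative, while it is sold out.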

In [48]:
file_path = "./labeling/TQQQ_minute_data_cleaned_advanced.csv"
dataset = AdvancedDataset(file_path)
num_days = len(dataset)

train_size = int(0.8 * num_days)
test_size = num_days - train_size

train_dataset = dataset[:train_size]
test_dataset = dataset[train_size:]

print(f"Training Days: {len(train_dataset)}")
print(f"Testing Days: {len(test_dataset)}")

env_train = StockTradingEnvWithFeatures(train_dataset)
env_test = StockTradingEnvWithFeatures(test_dataset, testing=True)

# Initialize agent
agent = DDQNAgent(buffer_capacity=10000)
batch_size = 32

train_log_dir = "./training_logs/"
test_log_dir = "./testing_logs/"
os.makedirs(train_log_dir, exist_ok=True)
os.makedirs(test_log_dir, exist_ok=True)

Training Days: 369
Testing Days: 93
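
One thing to note about this split: `AdvancedDataset` fits its `StandardScaler` on the full CSV before slicing, so the testing days contribute to the normalization statistics. A possible alternative, sketched here in plain Python with hypothetical prices and helper names, is to compute the statistics from the training portion only and reuse them on the test portion. This is an illustration of the idea, not what the cells above do.

```python
import statistics

def fit_standardizer(values):
    """Return (mean, std) computed from `values` only."""
    return statistics.mean(values), statistics.pstdev(values)

def standardize(values, mean, std):
    """Apply previously fitted statistics to any list of values."""
    return [(v - mean) / std for v in values]

# Hypothetical closing prices, ordered in time.
closes = [10.0, 12.0, 14.0, 16.0, 30.0]
split = int(0.8 * len(closes))  # same 80/20 rule as above
train, test = closes[:split], closes[split:]

mean, std = fit_standardizer(train)  # statistics come from training data only
scaled_test = standardize(test, mean, std)
print(mean, round(std, 3))                 # 13.0 2.236
print([round(z, 3) for z in scaled_test])  # [7.603]
```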


In [None]:
'''
Training Loop
'''
for day in range(len(train_dataset)):
    state = env_train.reset()
    done = False
    total_reward = 0

    while not done and state is not None:
        action = agent.select_action(state)

        next_state, reward, done, _ = env_train.step(action)

        agent.store_transition(state, action, reward, next_state, done)

        agent.train(batch_size)

        if action == 1:
            agent.position = 'bought_in'
        elif action == 2:
            agent.position = 'sold_out'

        state = next_state
        total_reward += reward

    if env_train.shares_held > 0:
        last_close_price = env_train.raw_data[env_train.current_step + env_train.window_size - 1, 3]
        total_proceeds = env_train.shares_held * last_close_price
        profit = total_proceeds - env_train.cost_basis
        env_train.cash_balance += total_proceeds
        env_train.shares_held = 0
        env_train.cost_basis = 0.0
        env_train.total_asset_value = env_train.cash_balance

        trade_info = {
            'datetime': env_train.datetimes[-1],
            'action': 'Sell (EOD)',
            'reward': 0.0,
            'profit': profit,
            'total_asset_value': env_train.total_asset_value,
            'cash_balance': env_train.cash_balance,
            'shares_held': env_train.shares_held,
            'position': 'sold_out'
        }
        env_train.trade_log.append(trade_info)

        agent.position = 'sold_out'

    total_profit = env_train.total_asset_value - env_train.initial_cash
    print(f"Training Day {day + 1}/{len(train_dataset)}, Total Reward: {total_reward:.2f}, Total Profit: {total_profit:.2f}")

    trade_log = env_train.trade_log
    trade_log_df = pd.DataFrame(trade_log)
    trade_log_df.to_csv(f"{train_log_dir}/2trade_log_day_{day + 1}.csv", index=False)

    env_train.trade_log = []

Training Day 1/369, Total Reward: 43.94, Total Profit: -13129.23
Training Day 2/369, Total Reward: -67.87, Total Profit: 1831.91
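
When reading these logs, it helps to know how quickly exploration ends. `DDQNAgent.train` multiplies epsilon by 0.995 on every call that gets past the buffer-size check, so the number of updates before epsilon reaches its floor of 0.05 can be computed directly. This is a minimal sketch; `steps_until_min_epsilon` is a hypothetical helper.

```python
import math

def steps_until_min_epsilon(start=1.0, minimum=0.05, decay=0.995):
    """Number of multiplicative decays needed before epsilon is clamped at `minimum`."""
    return math.ceil(math.log(minimum / start) / math.log(decay))

n = steps_until_min_epsilon()
print(n)  # 598
```

If a trading day provides several hundred steps, which is an assumption about this dataset, the agent is already acting almost greedily within the first few training days.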


In [None]:
'''
Testing Loop
'''
torch.save(agent.actor.state_dict(), 'ddqn_actor3.pth')
torch.save(agent.target.state_dict(), 'ddqn_target3.pth')
print("Agent's actor and target networks have been saved successfully.")

loaded_agent = DDQNAgent(buffer_capacity=10000)
loaded_agent.actor.load_state_dict(torch.load('ddqn_actor3.pth', weights_only=True))
loaded_agent.target.load_state_dict(torch.load('ddqn_target3.pth', weights_only=True))
loaded_agent.eval_mode()

# print("Agent's actor and target networks have been loaded and set to evaluation mode successfully.")

for day in range(len(test_dataset)):
    state = env_test.reset()
    done = False
    total_reward = 0

    while not done and state is not None:
        action = loaded_agent.select_action(state)

        next_state, reward, done, _ = env_test.step(action)

        # Track the position on the agent that is actually selecting actions
        if action == 1:
            loaded_agent.position = 'bought_in'
        elif action == 2:
            loaded_agent.position = 'sold_out'

        state = next_state
        total_reward += reward

    if env_test.shares_held > 0:
        last_close_price = env_test.raw_data[env_test.current_step + env_test.window_size - 1, 3]
        total_proceeds = env_test.shares_held * last_close_price
        profit = total_proceeds - env_test.cost_basis
        env_test.cash_balance += total_proceeds
        env_test.shares_held = 0
        env_test.cost_basis = 0.0
        env_test.total_asset_value = env_test.cash_balance

        trade_info = {
            'datetime': env_test.datetimes[-1],
            'action': 'Sell (EOD)',
            'reward': 0.0,
            'profit': profit,
            'total_asset_value': env_test.total_asset_value,
            'cash_balance': env_test.cash_balance,
            'shares_held': env_test.shares_held,
            'position': 'sold_out'
        }
        env_test.trade_log.append(trade_info)

        loaded_agent.position = 'sold_out'

    total_profit = env_test.total_asset_value - env_test.initial_cash
    print(f"Testing Day {day + 1}/{len(test_dataset)}, Total Reward: {total_reward:.2f}, Total Profit: {total_profit:.2f}")

    trade_log = env_test.trade_log
    trade_log_df = pd.DataFrame(trade_log)
    trade_log_df.to_csv(f"{test_log_dir}/2trade_log_day_{day + 1}.csv", index=False)

    env_test.trade_log = []