# Neural Network Classifier

In this notebook I explore building a neural network classifier from lagged returns data. As with the SVM, this model is fundamentally flawed: under the efficient markets hypothesis, past returns should carry little information about future direction. Still, it provides good practice for building and backtesting models.

I use PyTorch and Lightning here. This is obviously overkill for such a simple NN classifier, but it is once again a good learning experience.
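To make "lagged returns data" concrete, here is a minimal sketch of how such features could be derived from a price series with pandas. The `returns_lagN` names mirror the columns shown later in this notebook, but the function name `add_lagged_returns`, the `direction` label column, and the use of simple percentage returns are assumptions for illustration, not the preprocessing actually used to produce `EURUSD_train.csv`.

```python
import pandas as pd


def add_lagged_returns(prices: pd.Series, n_lags: int = 5) -> pd.DataFrame:
    """Build lagged-return features and a 0/1 direction label (illustrative)."""
    returns = prices.pct_change()  # simple return from t-1 to t
    frame = pd.DataFrame({"returns": returns})
    for lag in range(1, n_lags + 1):
        # returns_lagK at time t is the return observed K steps earlier
        frame[f"returns_lag{lag}"] = returns.shift(lag)
    # Hypothetical label: 1.0 if the current return is positive, else 0.0
    frame["direction"] = (returns > 0).astype(float)
    # Drop the leading rows that do not yet have a full set of lags
    return frame.dropna()


# Toy price series that grows by 10% each step (made-up numbers)
example = add_lagged_returns(
    pd.Series([1.0, 1.1, 1.21, 1.331, 1.4641, 1.61051, 1.771561, 1.9487171])
)
```

Each row only uses returns from earlier time steps as features, so the label at time t is never leaked into its own inputs.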

In [1]:
import lightning as L
import pandas as pd
import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader, random_split
from lightning.pytorch.loggers import WandbLogger

In [2]:
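class Classifier(nn.Module):
    """Fully connected network that maps the input features to output logits."""

    def __init__(self, input_size, output_size, hidden_l1, hidden_l2):
        super().__init__()

        self.linear_relu_stack = nn.Sequential(
            nn.Linear(input_size, hidden_l1),
            nn.ReLU(),
            nn.Linear(hidden_l1, hidden_l2),
            nn.ReLU(),
            # The later hidden layers take hidden_l2 inputs, so the stack
            # also works when hidden_l1 != hidden_l2.
            nn.Linear(hidden_l2, hidden_l2),
            nn.ReLU(),
            nn.Linear(hidden_l2, hidden_l2),
            nn.ReLU(),
            nn.Linear(hidden_l2, output_size),
        )

    def forward(self, x):
        # Return raw logits; BCEWithLogitsLoss applies the sigmoid itself.
        return self.linear_relu_stack(x)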
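As a sanity check on the architecture above, the number of trainable parameters can be worked out by hand: each `nn.Linear(n_in, n_out)` holds `n_in * n_out` weights plus `n_out` biases. The sketch below does that arithmetic in plain Python for the sizes used later in this notebook (5 inputs, hidden widths of 10, 1 output); the helper names are hypothetical.

```python
def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def mlp_params(layer_sizes: list[int]) -> int:
    """Total parameters of a stack of linear layers with the given widths."""
    return sum(linear_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:]))


# input -> 10 -> 10 -> 10 -> 10 -> output, i.e. five linear layers
total = mlp_params([5, 10, 10, 10, 10, 1])
print(total)  # 401
```

This agrees with the 401 trainable parameters reported in the model summary printed during training.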

In [3]:
class LitClassifier(L.LightningModule):
    def __init__(self, Classifier, learning_rate):
        super().__init__()
        self.Classifier = Classifier
        self.learning_rate = learning_rate
        # Binary cross-entropy on raw logits (sigmoid is applied inside the loss)
        self.BCE = torch.nn.BCEWithLogitsLoss()
        self.save_hyperparameters()

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=self.learning_rate)
        return optimizer

    def training_step(self, batch, batch_idx):
        # One step of the training loop: forward pass, loss, logging
        x, y = batch
        logits = self.Classifier(x)
        loss = self.BCE(logits, y)
        self.log("train_loss", loss)
        return loss

    def validation_step(self, batch, batch_idx):
        # One step of the validation loop
        x, y = batch
        logits = self.Classifier(x)
        val_loss = self.BCE(logits, y)
        self.log("val_loss", val_loss)

    def test_step(self, batch, batch_idx):
        # One step of the test loop
        x, y = batch
        logits = self.Classifier(x)
        test_loss = self.BCE(logits, y)
        self.log("test_loss", test_loss)
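To show what the loss in the steps above computes, here is a dependency-free sketch of binary cross-entropy on a single logit, using the usual numerically stable form. It also includes a hypothetical helper, `direction_to_target`, for the case where labels are stored as -1/+1: `BCEWithLogitsLoss` expects targets in [0, 1], so such labels would need remapping first. Whether the CSV files here use -1/+1 or 0/1 is not stated in the notebook, so treat the remapping as an assumption.

```python
import math


def bce_with_logits(logit: float, target: float) -> float:
    """Per-sample binary cross-entropy on a raw logit.

    Equivalent to -[y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x))], written as
    max(x, 0) - x*y + log(1 + exp(-|x|)) to avoid overflow for large |x|.
    """
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def direction_to_target(direction: int) -> float:
    """Map a -1/+1 direction label to a 0.0/1.0 target (hypothetical helper)."""
    return (direction + 1) / 2


# A logit of 0 means "50/50", so the loss is log(2) whatever the target is
uninformed_loss = bce_with_logits(0.0, direction_to_target(1))
```

PyTorch's loss averages these per-sample values over the batch by default.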

In [13]:
df = pd.read_csv("EURUSD_train.csv")


In [14]:
df.iloc[:,8:14]


| | returns_lag1 | returns_lag2 | returns_lag3 | returns_lag4 | returns_lag5 |
|---|---|---|---|---|---|
| 0 | -0.000047 | -0.000702 | -0.000421 | 0.000327 | 0.000421 |
| 1 | 0.000515 | -0.000047 | -0.000702 | -0.000421 | 0.000327 |
| 2 | 0.000515 | 0.000515 | -0.000047 | -0.000702 | -0.000421 |
| 3 | 0.000047 | 0.000515 | 0.000515 | -0.000047 | -0.000702 |
| 4 | 0.000164 | 0.000047 | 0.000515 | 0.000515 | -0.000047 |
| ... | ... | ... | ... | ... | ... |
| 3995 | 0.000023 | 0.000046 | 0.000000 | -0.000046 | 0.000138 |
| 3996 | 0.000023 | 0.000023 | 0.000046 | 0.000000 | -0.000046 |
| 3997 | -0.000276 | 0.000023 | 0.000023 | 0.000046 | 0.000000 |
| 3998 | 0.000138 | -0.000276 | 0.000023 | 0.000023 | 0.000046 |



In [4]:
class MarketDataset(Dataset):
    def __init__(self, csv_file):
        self.data = pd.read_csv(csv_file, index_col=0)
        # Feature columns, selected by position (the lagged returns)
        self.features = self.data.iloc[:, 8:14].values
        # Label column, selected by position (the market direction)
        self.labels = self.data.iloc[:, 7].values

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        features = torch.tensor(self.features[idx], dtype=torch.float32)
        # Note: BCEWithLogitsLoss expects targets in [0, 1], so a -1/+1
        # direction encoding would need remapping to 0/1 before training.
        label = torch.tensor([self.labels[idx]], dtype=torch.float32)

        return features, label

In [5]:
class MarketDataModule(L.LightningDataModule):
    def __init__(
        self,
        train_dir: str = "./EURUSD_train.csv",
        test_dir: str = "./EURUSD_test.csv",
        batch_size: int = 32,
    ):
        super().__init__()
        self.train_dir = train_dir
        self.test_dir = test_dir
        self.batch_size = batch_size

    def setup(self, stage: str):
        if stage == "fit":
            # 80/20 train/validation split with a fixed seed for reproducibility
            markets_full = MarketDataset(self.train_dir)
            total = len(markets_full)
            train_val = round(total * 0.8)
            lengths = [train_val, total - train_val]
            self.markets_train, self.markets_val = random_split(
                markets_full, lengths, generator=torch.Generator().manual_seed(42)
            )

        elif stage == "test":
            # Same attribute name that test_dataloader() reads
            self.markets_test = MarketDataset(self.test_dir)
        # self.markets_predict = MarketDataset(self.da_dir)

    def train_dataloader(self):
        return DataLoader(
            self.markets_train,
            batch_size=self.batch_size,
            drop_last=True,
            shuffle=False,
        )

    def val_dataloader(self):
        return DataLoader(
            self.markets_val,
            batch_size=self.batch_size,
            drop_last=True,
            shuffle=False,
        )

    def test_dataloader(self):
        return DataLoader(self.markets_test, batch_size=self.batch_size)

    def predict_dataloader(self):
        # Not wired up yet: setup() does not create self.markets_predict
        return DataLoader(self.markets_predict, batch_size=self.batch_size)
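The split and batching logic in the data module above determines how many optimisation steps one epoch contains: 80% of the rows go to training, and `drop_last=True` discards any final partial batch. The sketch below works through that arithmetic in plain Python. The row count of 4000 is an assumption chosen for illustration (the notebook does not state the exact length of the training file), and `split_and_batch` is a hypothetical helper.

```python
def split_and_batch(total_rows: int, train_fraction: float = 0.8, batch_size: int = 32) -> dict:
    """Row and batch counts for a train/val split with drop_last=True."""
    n_train = round(total_rows * train_fraction)
    n_val = total_rows - n_train
    return {
        "train_rows": n_train,
        "val_rows": n_val,
        # Integer division models drop_last=True: partial batches are discarded
        "train_batches": n_train // batch_size,
        "val_batches": n_val // batch_size,
    }


# Assumed dataset size, for illustration only
counts = split_and_batch(4000)
print(counts)
# {'train_rows': 3200, 'val_rows': 800, 'train_batches': 100, 'val_batches': 25}
```

Under that assumption an epoch has 100 training steps, which would be consistent with the `100/100` shown in the training progress bar.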


In [6]:
marketsDataset = MarketDataModule()
model = LitClassifier(Classifier(5, 1, 10, 10), learning_rate=1e-3)


# Train the model, logging metrics and checkpoints to Weights & Biases
wandb_logger = WandbLogger(project="NN_trading", log_model="all")
trainer = L.Trainer(max_epochs=10,
                    default_root_dir="./checkpoints/",
                    logger=wandb_logger)
wandb_logger.watch(model)
# Pass the LightningDataModule through the dedicated `datamodule` argument
trainer.fit(model=model, datamodule=marketsDataset)

/Users/edroberts/opt/anaconda3/envs/DNN_trading/lib/python3.11/site-packages/lightning/pytorch/utilities/parsing.py:198: Attribute 'Classifier' is an instance of `nn.Module` and is already saved during checkpointing. It is recommended to ignore them using `self.save_hyperparameters(ignore=['Classifier'])`.
GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
[34m[1mwandb[0m: Currently logged in as: [33medroberts[0m ([33mjwst[0m). Use [1m`wandb login --relogin`[0m to force relogin


[34m[1mwandb[0m: logging graph, to disable use `wandb.watch(log_graph=False)`
/Users/edroberts/opt/anaconda3/envs/DNN_trading/lib/python3.11/site-packages/lightning/pytorch/utilities/parsing.py:43: attribute 'Classifier' removed from hparams because it cannot be pickled

  | Name       | Type              | Params
-------------------------------------------------
0 | Classifier | Classifier        | 401   
1 | BCE        | BCEWithLogitsLoss | 0     
-------------------------------------------------
401       Trainable params
0         Non-trainable params
401       Total params
0.002     Total estimated model params size (MB)


                                                                           

/Users/edroberts/opt/anaconda3/envs/DNN_trading/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:441: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=7` in the `DataLoader` to improve performance.
/Users/edroberts/opt/anaconda3/envs/DNN_trading/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:441: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=7` in the `DataLoader` to improve performance.


Epoch 9: 100%|██████████| 100/100 [00:00<00:00, 106.27it/s, v_num=acn2]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 100/100 [00:01<00:00, 87.04it/s, v_num=acn2] 