# Project: MoonShot - AI-powered Trading Strategy

This notebook outlines the development of MoonShot, an artificial intelligence (AI) system designed to implement and learn from a trading strategy. We aim to achieve the following objectives:

---

### Define a Trading Strategy:
We will establish a set of rules and indicators that MoonShot will use to identify and execute potential trades. This strategy could involve technical analysis, fundamental analysis, or a combination of both.

## Trading Strategy:
The MoonShot trading strategy uses a set of fundamental and technical indicators to generate buy/sell signals. MoonShot uses the following data points:

* Z score
* Bollinger Bands x RSI
* Simple Moving Averages
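
As a rough illustration of the indicators listed above, the sketch below derives a simple moving average, Bollinger Bands, Bollinger %b, and a rolling Z score from a series of closing prices using plain pandas. The 20-period window, the 2-standard-deviation band width, and the helper name `add_indicators` are assumptions for this example, not MoonShot's confirmed settings. RSI is left out for brevity.

```python
import numpy as np
import pandas as pd


def add_indicators(close: pd.Series, window: int = 20, num_std: float = 2.0) -> pd.DataFrame:
    """Compute SMA, Bollinger Bands, %b and a rolling Z score (illustrative only)."""
    sma = close.rolling(window).mean()
    # Population standard deviation over the same window (an assumption)
    std = close.rolling(window).std(ddof=0)
    upper = sma + num_std * std
    lower = sma - num_std * std
    return pd.DataFrame(
        {
            "close": close,
            "sma": sma,
            "upper_bb": upper,
            "lower_bb": lower,
            # %b is 0 at the lower band and 1 at the upper band
            "percent_b": (close - lower) / (upper - lower),
            # Z score: distance from the moving average in standard deviations
            "z_score": (close - sma) / std,
        }
    )


# Synthetic random-walk prices stand in for real market data
rng = np.random.default_rng(0)
prices = pd.Series(100 + rng.normal(0, 1, 250).cumsum())
indicators = add_indicators(prices)
print(indicators.dropna().tail())
```

Under these assumptions %b is a linear rescaling of the Z score, `(z_score + num_std) / (2 * num_std)`, so the two columns carry the same information when they share a window.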

### Implement the Strategy:
We will translate the defined trading strategy into code, enabling MoonShot to autonomously analyze market data, generate trading signals, and potentially execute trades (with proper safeguards in place).

---

### Train a Neural Network:
We will train a multilayer neural network on historical market data and the corresponding trading signals generated by MoonShot's strategy. The network will attempt to learn the initial strategy and, ideally, improve on it by identifying new patterns or refining existing ones.

We will consider the model a success if it increases the win rate by 15% while maintaining the profit percentage.
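
The success criterion can be written as a small check. The sketch below assumes the 15% is a relative improvement over the baseline win rate and that "maintaining" means the profit percentage does not fall; both readings are assumptions, as are the function name and the example numbers.

```python
def meets_success_criterion(
    baseline_win_rate: float,
    model_win_rate: float,
    baseline_profit_pct: float,
    model_profit_pct: float,
    min_relative_gain: float = 0.15,
) -> bool:
    """Return True if the win rate rose by the required relative margin
    and the profit percentage did not drop (both are assumed readings)."""
    win_rate_ok = model_win_rate >= baseline_win_rate * (1 + min_relative_gain)
    profit_ok = model_profit_pct >= baseline_profit_pct
    return win_rate_ok and profit_ok


# Hypothetical numbers, for illustration only
passes = meets_success_criterion(0.60, 0.70, 4.0, 4.1)  # needs a win rate of about 0.69
fails = meets_success_criterion(0.60, 0.65, 4.0, 4.1)
print(passes, fails)
```

If the 15% is instead meant as percentage points, the comparison would become `model_win_rate >= baseline_win_rate + 0.15`.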

#### Throughout this notebook, we will document the development process, including:

- Data acquisition and preparation
- Feature engineering and selection
- Building and training the neural network
- Evaluating the performance of MoonShot's strategy and the trained neural network

---

Disclaimer: This project is for educational purposes only and should not be used for real-world trading without proper risk management and regulatory compliance. The market is inherently risky, and any trading strategy, including those involving AI, is susceptible to losses.

# Import Required Libraries

In [None]:
# Required libraries

# Import the generic libraries
import sys
import pytictoc

# Import the neural network library
import torch
import torch.nn as nn
import torch.optim as optim

# Import financial data libraries
import ta
import yfinance as yf

# Import data science tools
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import pandas_datareader as pdr
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import seaborn as sns

---
# Load and Preprocess Data

> This step starts with importing the dataset.
Next, we preprocess the data to ensure its readiness for training.
This includes cleaning the data to address missing values or outliers, normalizing or scaling features for consistency, and partitioning the data into training, validation, and test sets for effective model training and evaluation.
This preprocessing helps ensure that the data is appropriately formatted for training our deep neural network, which in turn supports its performance and generalization.

## 1. Import the dataset

In [None]:
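buy_trades_core = pd.read_csv("CSV/buytable.csv")
# fillna returns a new DataFrame, so assign the result to keep the filled values
buy_trades_core = buy_trades_core.fillna(0)
buy_trades_core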

Unnamed: 0,Buy P/E Ratio,Buy Fwd P/E Ratio,Buy P/B Ratio,Buy RSI,Buy Upper BB,Buy Bollinger %b,Buy Lower BB,Buy vol,Buy Z Score,Buy MACD,Buy VWAP,Buy OBV,Buy Stoch,Buy Awesome Oscillator,Buy Ultimate Oscillator,Buy TSI,Buy Acum/Dist,Profitable
0,0.000000,0.000000,0.000000,24.211988,271.219748,-0.048257,251.430252,4.947374,-2.137498,-2.154089,261.214801,-2618000,1.950719,1.286618,41.995321,-1.470883,3.717088e+05,Yes
1,0.000000,0.000000,0.000000,43.786890,91.902800,0.220007,82.260200,2.410650,-1.091614,-0.277936,86.380876,519400,35.112060,-6.482647,47.008240,-15.532739,-3.451807e+06,Yes
2,16.743890,11.495973,1.871449,20.430433,153.330658,-0.101009,140.353342,3.244329,-2.343166,-1.432220,145.745994,-564100,8.786611,-1.082588,38.515067,2.319048,1.118320e+06,Yes
3,12.446736,7.277964,2.530213,24.356451,199.953365,-0.036277,173.050635,6.725683,-2.090792,-1.231614,185.151154,-5783400,11.827957,-12.969765,37.629497,-25.499065,-1.259635e+07,Yes
4,34.833874,20.529467,8.169850,43.106665,174.727510,0.288348,158.774490,3.988255,-0.825173,-0.567193,171.210284,-5316300,0.300120,-0.592059,-6.517047,-6.758611,-9.577518e+07,Yes
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2378,0.000000,0.000000,0.000000,26.240493,14.798538,0.006185,13.057462,0.435269,-1.925247,-0.201803,13.970538,155505900,11.235955,0.151353,34.518671,10.070177,-1.702076e+08,Yes
2379,33.582000,20.858385,4.872040,11.726526,128.535788,-0.207307,113.397212,3.784644,-2.757591,-1.646504,119.362273,74241800,6.334311,-2.076471,31.686509,-3.626917,4.712295e+07,Yes
2380,48.500000,115.476190,2.465977,49.989695,137.367120,0.424863,130.969880,1.599310,-0.292939,-0.466962,133.911666,23991500,57.364341,-2.311706,48.990757,-1.053302,1.962370e+07,Yes
2381,9.650743,10.323895,2.104130,40.495837,137.930515,0.150069,127.419485,2.627758,-1.364281,-1.142085,132.167718,27058200,21.522310,-1.201559,43.661236,5.598968,-1.785873e+07,Yes


### Preprocess the main dataset

* Drop data not required by any subsets
* Split the data into the X and Y sets
* Handle missing data
* Scaling, normalization, and correlations

In [None]:
# Convert the string values in the "Profitable" column to integers ("Yes" -> 1, anything else -> 0).
# This column becomes our target (y).
buy_trades_core["Profitable"] = (buy_trades_core["Profitable"] == "Yes").astype(int)

# Drop columns that no later subset uses.
# Dropping them upstream keeps unused data out of later scaling, normalization, and correlation steps.
buy_trades_core = buy_trades_core.drop(columns=["Buy vol"])
buy_trades = buy_trades_core

# Create two objects: one holds the input features (x), the other the target values (y).
buy_x = buy_trades.drop(columns="Profitable")
buy_y = buy_trades["Profitable"]

# Standardize the input features (zero mean and unit variance per column).
# Scaling keeps large-valued features from dominating training and helps gradient-based optimizers converge.
# Note that the scaler is fit here on all rows, before the train/test split.
scaler = StandardScaler()
scaler.fit(buy_x)
# Apply the fitted scaler to the features. The result is a NumPy array.
buy_x_scaled = scaler.transform(buy_x)

# Split the dataset into training and test sets.
# train_test_split shuffles the rows by default; random_state makes the shuffle reproducible.
# We start with a common 80/20 split and adjust if needed based on task complexity and dataset size.
buy_x_train, buy_x_test, buy_y_train, buy_y_test = train_test_split(buy_x_scaled, buy_y, test_size=0.2, random_state=42)

buy_x_train.shape

(1906, 16)
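
The cell above fits the scaler on every row before splitting, which lets test-set statistics influence the training features. A common alternative, sketched below on synthetic data, is to split first and fit `StandardScaler` on the training rows only. The array shapes, variable names, and the optional `stratify` argument are placeholders chosen for this example rather than the notebook's actual setup.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the feature matrix and binary target
rng = np.random.default_rng(42)
features = rng.normal(loc=50.0, scale=10.0, size=(500, 16))
target = rng.integers(0, 2, size=500)

# Split first so the test rows never influence the scaling statistics
x_train, x_test, y_train, y_test = train_test_split(
    features, target, test_size=0.2, random_state=42, stratify=target
)

# Fit on the training rows only, then apply the same transform to both sets
scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train)
x_test_scaled = scaler.transform(x_test)

print(x_train_scaled.shape, x_test_scaled.shape)
```

With this ordering the test set stays a fair proxy for unseen data, since its mean and variance play no part in the transformation.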

In [None]:

# Define a simple feed-forward neural network with the PyTorch library
class SimpleNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        # Input layer: maps <input_size> features to <hidden_size> units
        self.fc0 = nn.Linear(input_size, hidden_size)
        # Hidden layer: <hidden_size> units in, <hidden_size> units out
        self.fc1 = nn.Linear(hidden_size, hidden_size)
        # Optional second hidden layer (disabled)
        #self.fc2 = nn.Linear(hidden_size, hidden_size)
        # Optional third hidden layer (disabled)
        #self.fc3 = nn.Linear(hidden_size, hidden_size)
        # Output layer: maps <hidden_size> units to <output_size> outputs
        self.fcf = nn.Linear(hidden_size, output_size)
        # ReLU is the activation function applied after each hidden layer
        self.relu = nn.ReLU()
        # Sigmoid squashes the final output to a probability in (0, 1)
        self.sigmoid = nn.Sigmoid()
        # Softmax alternative for multi-class outputs (currently unused)
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        x = self.fc0(x)
        x = self.relu(x)

        x = self.fc1(x)
        x = self.relu(x)

        #x = self.fc2(x)  # Second hidden layer
        #x = self.relu(x)

        #x = self.fc3(x)  # Third hidden layer
        #x = self.relu(x)

        x = self.fcf(x)
        x = self.sigmoid(x)
        #x = self.softmax(x)
        #x = x.squeeze(1)  # removes the extra dimension [400, 1] added by the linear layers; not needed for softmax
        return x

## Added a second layer with lines
#         self.fc2a = nn.Linear(hidden_size, hidden_size)
#        x = self.relu(x)
#        x = self.fc2a(x)

## Training Setup

> We convert the NumPy arrays and pandas Series that hold our dataset into tensors.
Pandas DataFrames and NumPy arrays are used before this step for data exploration and manipulation, but the deep learning library PyTorch performs its operations on tensors.

In [None]:

try:
    buy_y_train_tensor = torch.from_numpy(buy_y_train.values)
    buy_y_train_tensor = buy_y_train_tensor.float()
    print("Created Y train tensor")
    buy_y_validation_tensor = torch.from_numpy(buy_y_test.values)
    buy_y_validation_tensor = buy_y_validation_tensor.float()
    print("Created Y validation tensor")
# torch.from_numpy raises TypeError for unsupported dtypes such as object arrays
except (TypeError, ValueError):
    print("Error: buy_y_train or buy_y_test contains non-convertible values.")
try:
    buy_x_train_tensor = torch.from_numpy(buy_x_train)
    buy_x_train_tensor = buy_x_train_tensor.float()
    print("Created X train tensor")
    buy_x_validation_tensor = torch.from_numpy(buy_x_test)
    buy_x_validation_tensor = buy_x_validation_tensor.float()
    print("Created X validation tensor")
except (TypeError, ValueError):
    print("Error: buy_x_train or buy_x_test contains non-convertible values.")



Created Y train tensor
Created Y validation tensor
Created X train tensor


## Training and Tuning

> Here we define the hyperparameters of the neural network and begin training the network with them. Deep learning is an iterative process in which model degradation and improvement both contribute to overall progress, so this section does not contain the history of experimental training and parameter tuning that MoonShot has undergone.

In [None]:

# Number of input features per sample
input_size = buy_x_train_tensor.shape[1]
hidden_size = 56
# Single output for binary classification
output_size = 1
learning_rate = 0.00001
num_epochs = 100

# Loss: 0.4661
# Loss with 2 layers: 0.4745


In [None]:
moonShot_buy = SimpleNN(input_size, hidden_size, output_size)
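# SimpleNN already applies a sigmoid in forward(), so use BCELoss, which expects probabilities.
# BCEWithLogitsLoss would apply a second sigmoid on top of the model's output.
criterion = nn.BCELoss()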
optimizer = torch.optim.Adam(moonShot_buy.parameters(), lr=learning_rate)
# optimizer = torch.optim.SGD(moonShot_buy.parameters(), lr=learning_rate, momentum=0.9)
# optimizer = torch.optim.Adagrad(moonShot_buy.parameters(), lr=learning_rate)
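
`nn.BCEWithLogitsLoss` applies a sigmoid internally and therefore expects raw logits, whereas `nn.BCELoss` expects probabilities. The sketch below uses made-up logits and labels to check that the two pairings agree, and to show that feeding already-sigmoided outputs into `BCEWithLogitsLoss` gives a different loss value.

```python
import torch
import torch.nn as nn

# Made-up logits and labels, for illustration only
logits = torch.tensor([[2.0], [-1.0], [0.5], [-3.0]])
labels = torch.tensor([[1.0], [0.0], [1.0], [0.0]])
probs = torch.sigmoid(logits)

# Sigmoid applied inside the loss
loss_from_logits = nn.BCEWithLogitsLoss()(logits, labels)
# Sigmoid applied by the model, plain BCE afterwards
loss_from_probs = nn.BCELoss()(probs, labels)
# Sigmoid applied twice: once by the model, once inside the loss
loss_double_sigmoid = nn.BCEWithLogitsLoss()(probs, labels)

print(f"logits + BCEWithLogitsLoss: {loss_from_logits.item():.4f}")
print(f"probs  + BCELoss:           {loss_from_probs.item():.4f}")
print(f"probs  + BCEWithLogitsLoss: {loss_double_sigmoid.item():.4f}")
```

When the sigmoid is applied twice, the effective predictions are confined to roughly (0.5, 0.73), because the second sigmoid only ever sees inputs between 0 and 1. That narrow range limits how far the loss can fall and weakens the gradients.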



In [None]:

def evaluate(model, x_val, y_val):
    """
    Evaluate the model's performance on a validation set.

    Args:
        model: The deep neural network model.
        x_val: Validation set input data.
        y_val: Validation set target labels.

    Returns:
        val_loss: The validation loss (computed with the global `criterion`).
        val_accuracy: The validation accuracy.
    """
    model.eval()  # Switch to evaluation mode
    with torch.no_grad():  # Disable gradient tracking during validation
        # Forward pass on the validation set
        val_outputs = model(x_val)
        val_loss = criterion(val_outputs, y_val)

        # Threshold the sigmoid outputs at 0.5 for binary classification
        predicted = (val_outputs > 0.5).float()
        val_accuracy = (predicted == y_val).float().mean()

    return val_loss.item(), val_accuracy.item()
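
Accuracy alone can mislead when one class dominates, so it helps to compare it against the majority-class baseline and to look at precision on predicted buys, which is one plausible reading of the win rate. The NumPy sketch below uses made-up labels and probabilities; the helper name `classification_summary` and the 0.5 threshold are assumptions for illustration.

```python
import numpy as np


def classification_summary(y_true, y_prob, threshold=0.5):
    """Return accuracy, majority-class baseline accuracy, and precision."""
    y_true = np.asarray(y_true).astype(int)
    y_pred = (np.asarray(y_prob) > threshold).astype(int)

    true_pos = int(np.sum((y_pred == 1) & (y_true == 1)))
    false_pos = int(np.sum((y_pred == 1) & (y_true == 0)))

    accuracy = float(np.mean(y_pred == y_true))
    # Accuracy obtained by always predicting the more common class
    positive_rate = float(np.mean(y_true))
    baseline = max(positive_rate, 1.0 - positive_rate)
    # Share of predicted positives that were actually profitable
    predicted_pos = true_pos + false_pos
    precision = true_pos / predicted_pos if predicted_pos > 0 else float("nan")

    return {"accuracy": accuracy, "baseline": baseline, "precision": precision}


# Made-up labels and predicted probabilities, for illustration only
y_true = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
y_prob = [0.9, 0.6, 0.7, 0.8, 0.2, 0.4, 0.55, 0.3, 0.95, 0.65]
summary = classification_summary(y_true, y_prob)
print(summary)
```

A validation accuracy that sits at the baseline and barely changes between epochs usually means the model is predicting a single class for nearly every row.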


In [None]:
# Reshape target tensors to match the model's output shape
buy_y_train_tensor = buy_y_train_tensor.view(-1, 1)
buy_y_validation_tensor = buy_y_validation_tensor.view(-1, 1)

for epoch in range(num_epochs):
    moonShot_buy.train()
    optimizer.zero_grad()

    # Forward pass as defined in the neural network architecture
    outputs = moonShot_buy(buy_x_train_tensor)
    loss = criterion(outputs, buy_y_train_tensor)

    # Backward pass of the calculated loss, followed by a parameter update
    loss.backward()
    optimizer.step()

    # Evaluate on the validation set
    val_loss, val_accuracy = evaluate(moonShot_buy, buy_x_validation_tensor, buy_y_validation_tensor)

    # Print loss and validation metrics every second epoch
    if (epoch + 1) % 2 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Train Loss: {loss.item():.4f}, Val Loss: {val_loss:.4f}, Val Accuracy: {val_accuracy:.4f}')


Epoch [2/100], Train Loss: 0.6539, Val Loss: 0.6631, Val Accuracy: 0.6059
Epoch [4/100], Train Loss: 0.6538, Val Loss: 0.6631, Val Accuracy: 0.6059
Epoch [6/100], Train Loss: 0.6538, Val Loss: 0.6631, Val Accuracy: 0.6059
Epoch [8/100], Train Loss: 0.6538, Val Loss: 0.6631, Val Accuracy: 0.6059
Epoch [10/100], Train Loss: 0.6538, Val Loss: 0.6631, Val Accuracy: 0.6059
Epoch [12/100], Train Loss: 0.6538, Val Loss: 0.6631, Val Accuracy: 0.6059
Epoch [14/100], Train Loss: 0.6538, Val Loss: 0.6631, Val Accuracy: 0.6059
Epoch [16/100], Train Loss: 0.6538, Val Loss: 0.6631, Val Accuracy: 0.6059
Epoch [18/100], Train Loss: 0.6538, Val Loss: 0.6631, Val Accuracy: 0.6059
Epoch [20/100], Train Loss: 0.6538, Val Loss: 0.6631, Val Accuracy: 0.6059
Epoch [22/100], Train Loss: 0.6538, Val Loss: 0.6631, Val Accuracy: 0.6059
Epoch [24/100], Train Loss: 0.6538, Val Loss: 0.6631, Val Accuracy: 0.6059
Epoch [26/100], Train Loss: 0.6538, Val Loss: 0.6631, Val Accuracy: 0.6080
Epoch [28/100], Train Loss: 0