# Project: MoonShot - AI-powered Trading Strategy

This notebook outlines the development of MoonShot, an Artificial intelligence (AI) system designed to assess the increase in a single stock price against a wide array of technical indicators and use this information to develop a trading pattern. We aim to achieve the following objectives:

---

### Select effective technical indicators:
We will feed a number of technical indicators into the AI as additional training parameters, it will determine correlations between indicators and the increase in stock price.
From these technical indicator correlations we will select the most effective techincal indicators with the most correlation to a rise in stock price. 

---

### Train a Neural Network:
We will train a multilayer neural network on historical market data and the corresponding stock price history as well as the technical indicators selected. This neural network will attempt to learn and potentially improve upon the initial strategy, potentially identifying new patterns or refining existing ones.

We will consider the model a success if it is able to increase both the Win rate by 15% while maintaining the profit percentage.

#### Throughout this notebook, we will document the development process, including:

- Data acquisition and preparation
- Feature engineering and selection
- Building and training the neural network
- Evaluating the performance of MoonShot's strategy and the trained neural network

---

Disclaimer: This project is for educational purposes only and should not be used for real-world trading without proper risk management and regulatory compliance. The market is inherently risky, and any trading strategy, including those involving AI, is susceptible to losses.

# Import Required Libraries

In [6]:
# Required libraries

# Import the generic libraries
import sys
import pytictoc

#Import the neural network architecture
import torch
import torch.nn as nn
import torch.optim as optim

#Import financial data
import ta
import yfinance as yf

# Import data science tools
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import pandas_datareader as pdr
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import seaborn as sns

from tickers500 import Tickers500
from tickerTA import Ticker
from tickerTA import TechnicalAnalysis

---
# Load and Preprocess Data

## 1. Initialize our Tickers500 class, update all TickerData stored CSV files, import the DataFrame we are interested in from CSV

In [7]:
tickers500 = Tickers500()
ticker = tickers500.get_random_ticker()
ticker


'EW'

In [8]:
# start_date = '2020-01-01'
ticker_df = tickers500.load_ticker_data_to_df(ticker)
# ticker_df = tickers500.load_ticker_data_to_df(ticker, start_date)
ticker_df

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume
0,2000-03-27,1.270833,1.385417,1.270833,1.375000,1.375000,11026800
1,2000-03-28,1.375000,1.375000,1.333333,1.338542,1.338542,3232800
2,2000-03-29,1.333333,1.343750,1.317708,1.333333,1.333333,1311600
3,2000-03-30,1.333333,1.333333,1.281250,1.281250,1.281250,5650800
4,2000-03-31,1.244792,1.244792,1.130208,1.130208,1.130208,23794800
...,...,...,...,...,...,...,...
6053,2024-04-18,87.349998,87.349998,85.980003,86.449997,86.449997,3122000
6054,2024-04-19,87.199997,87.199997,85.379997,85.940002,85.940002,3895700
6055,2024-04-22,86.540001,87.110001,85.730003,86.959999,86.959999,2408100
6056,2024-04-23,87.400002,87.930000,86.760002,87.750000,87.750000,2663600


## 2. Send that DataFrame through TechnicalAnalysis class to have the technical indicators added on to a DataFrame.

In [9]:
technical_analysis = TechnicalAnalysis(ticker, ticker_df)
technical_analysis.df_ta

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,Z Score Adj Close,MACD,MACD Signal,...,RSI,TSI,Awesome Oscillator,Ultimate Oscillator,Stoch,Stoch Signal,Williams %R,KAMA,PPO,ROC
0,2000-03-27,1.270833,1.385417,1.270833,1.375000,1.375000,11026800,,,,...,,,,,,,,,,
1,2000-03-28,1.375000,1.375000,1.333333,1.338542,1.338542,3232800,,,,...,,,,,,,,,,
2,2000-03-29,1.333333,1.343750,1.317708,1.333333,1.333333,1311600,,,,...,,,,,,,,,,
3,2000-03-30,1.333333,1.333333,1.281250,1.281250,1.281250,5650800,,,,...,,,,,,,,,,
4,2000-03-31,1.244792,1.244792,1.130208,1.130208,1.130208,23794800,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6053,2024-04-18,87.349998,87.349998,85.980003,86.449997,86.449997,3122000,-2.278788,-0.983743,0.464616,...,19.924083,-6.340158,-2.731000,38.448934,4.820447,13.138098,-95.179553,89.860998,-0.572482,-8.334218
6054,2024-04-19,87.199997,87.199997,85.379997,85.940002,85.940002,3895700,-2.074598,-1.025598,0.208217,...,18.339316,-10.247711,-3.536500,38.387253,5.779206,6.256418,-94.220794,88.970680,-0.904894,-7.611260
6055,2024-04-22,86.540001,87.110001,85.730003,86.959999,86.959999,2408100,-1.505336,-0.934872,-0.025502,...,31.429321,-12.218100,-4.320471,41.849286,16.305484,8.968379,-83.694516,88.692621,-1.066142,-4.649121
6056,2024-04-23,87.400002,87.930000,86.760002,87.750000,87.750000,2663600,-1.114287,-0.778351,-0.220089,...,40.320548,-12.889955,-4.702441,44.519442,26.362640,16.149110,-73.637360,88.567023,-1.110528,-5.339803


In [10]:
technical_analysis.df_ta.drop(columns=['Date', 'Open', 'High', 'Low', 'Close', 'Volume'], inplace=True)
technical_analysis.df_ta

Unnamed: 0,Adj Close,Z Score Adj Close,MACD,MACD Signal,MACD Histogram,RSI,TSI,Awesome Oscillator,Ultimate Oscillator,Stoch,Stoch Signal,Williams %R,KAMA,PPO,ROC
0,1.375000,,,,,,,,,,,,,,
1,1.338542,,,,,,,,,,,,,,
2,1.333333,,,,,,,,,,,,,,
3,1.281250,,,,,,,,,,,,,,
4,1.130208,,,,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6053,86.449997,-2.278788,-0.983743,0.464616,-1.448359,19.924083,-6.340158,-2.731000,38.448934,4.820447,13.138098,-95.179553,89.860998,-0.572482,-8.334218
6054,85.940002,-2.074598,-1.025598,0.208217,-1.233814,18.339316,-10.247711,-3.536500,38.387253,5.779206,6.256418,-94.220794,88.970680,-0.904894,-7.611260
6055,86.959999,-1.505336,-0.934872,-0.025502,-0.909371,31.429321,-12.218100,-4.320471,41.849286,16.305484,8.968379,-83.694516,88.692621,-1.066142,-4.649121
6056,87.750000,-1.114287,-0.778351,-0.220089,-0.558262,40.320548,-12.889955,-4.702441,44.519442,26.362640,16.149110,-73.637360,88.567023,-1.110528,-5.339803


In [18]:
technical_analysis.df_ta.dropna(inplace=True)

### preprocess the main dataset

* Drop data not required by any subsets
* Split the data into the X and Y sets
* Handle missing data
* scaling, normalization, and correlations

In [19]:
# Two new dataframes are created, one contains all of the input features of the dataset (x), the other contains the target values (y)
x = technical_analysis.df_ta.drop(columns=['Adj Close'])
price_diff = technical_analysis.df_ta['Adj Close'].diff()
y = price_diff.apply(lambda x: 1 if x > 0 else 0)


# The data is scaled
scaler = StandardScaler()
# scaler.fit(x)
scaler.fit_transform(x)
# x_train = sc.fit_transform(x_train)
x_scaled = scaler.transform(x)

# The data is split into training and testing sets
x_train, x_test, y_train, y_test = train_test_split(x_scaled, y, test_size=0.2, random_state=0)

# The data is converted to PyTorch tensors
x_train = torch.FloatTensor(x_train).to('cuda')
x_test = torch.FloatTensor(x_test).to('cuda')
y_train = torch.FloatTensor(y_train.to_numpy().copy()).to('cuda')
y_test = torch.FloatTensor(y_test.to_numpy().copy()).to('cuda')


In [20]:

# The neural network is defined
class ANN(nn.Module):
    # def __init__(self, input_features=9, hidden1=20, hidden2=20, output_features=1):
    def __init__(self, input_features, hidden1, hidden2, output_features):
        super().__init__()
        self.f_connected1 = nn.Linear(input_features, hidden1)
        self.f_connected2 = nn.Linear(hidden1, hidden2)
        self.out = nn.Linear(hidden2, output_features)
    def forward(self, x):
        x = torch.relu(self.f_connected1(x))
        x = torch.relu(self.f_connected2(x))
        x = self.out(x)
        return x
    


In [21]:
def evaluate_model(model, x_test, y_test):
    with torch.no_grad():
        y_val = model(x_test)
        loss = loss_function(y_val, y_test)

        # calculate accuracy
        y_val = model(x_test)
        predicted = torch.argmax(y_val, 1)
        correct = (predicted == y_test).sum().item()
        accuracy = correct / y_test.shape[0]
    return accuracy, loss.item()

In [22]:
# The model is instantiated
input_size = len(x_train[0])
hidden_size1 = 56
hidden_size2 = 56
output_size = 1
torch.manual_seed(20)
model = ANN(input_size, hidden_size1, hidden_size2, output_size)
model.to('cuda')

# The loss function and optimizer are defined
loss_function = nn.MSELoss()
# criterion = nn.BCEWithLogitsLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)


In [23]:

# The model is trained
epochs = 10000
final_losses = []


In [26]:
t = pytictoc.TicToc()
t.tic()

for epoch in range(epochs):
    epoch += 1
    y_pred = model.forward(x_train).squeeze()
    loss = loss_function(y_pred, y_train)
    final_losses.append(loss)

    val_accuracy, val_loss = evaluate_model(model, x_test, y_test)

    val_pred = model.forward(x_test).squeeze()
    val_loss = loss_function(val_pred, y_test)

    if epoch%10 == 1:
        # print('Epoch number: {} and the loss: {}'.format(epoch, loss.item()))
        print(f'Epoch [{epoch+1}/{epochs}], Train Loss: {loss.item():.4f}, Val Loss: {val_loss:.4f}, Val Accuracy: {val_accuracy:.4f}')
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

t.toc()

Epoch [2/10000], Train Loss: 0.0353, Val Loss: 0.2577, Val Accuracy: 0.4813
Epoch [12/10000], Train Loss: 0.0347, Val Loss: 0.2559, Val Accuracy: 0.4813
Epoch [22/10000], Train Loss: 0.0362, Val Loss: 0.2579, Val Accuracy: 0.4813
Epoch [32/10000], Train Loss: 0.0345, Val Loss: 0.2572, Val Accuracy: 0.4813
Epoch [42/10000], Train Loss: 0.0372, Val Loss: 0.2614, Val Accuracy: 0.4813
Epoch [52/10000], Train Loss: 0.0400, Val Loss: 0.2614, Val Accuracy: 0.4813
Epoch [62/10000], Train Loss: 0.0365, Val Loss: 0.2541, Val Accuracy: 0.4813
Epoch [72/10000], Train Loss: 0.0349, Val Loss: 0.2574, Val Accuracy: 0.4813
Epoch [82/10000], Train Loss: 0.0360, Val Loss: 0.2571, Val Accuracy: 0.4813
Epoch [92/10000], Train Loss: 0.0342, Val Loss: 0.2567, Val Accuracy: 0.4813
Epoch [102/10000], Train Loss: 0.0359, Val Loss: 0.2575, Val Accuracy: 0.4813
Epoch [112/10000], Train Loss: 0.0458, Val Loss: 0.2619, Val Accuracy: 0.4813
Epoch [122/10000], Train Loss: 0.0369, Val Loss: 0.2579, Val Accuracy: 0.48

Feature Scaling: Machine learning models often perform better when the input features are on a similar scale. You could consider scaling your input features using techniques such as Min-Max Scaling or Standard Scaling.

Feature Selection: Not all features are equally relevant for predicting your target variable. You could use feature selection techniques to identify and select the most informative features. This can also help to reduce overfitting.

Feature Engineering: You could create new features that might be informative for your prediction task. For example, you could calculate moving averages, price change percentages, or other technical indicators.

Time Series Considerations: Stock prices are a time series, and time series data has some unique characteristics that you might want to consider. For example, you could include lagged features (i.e., the value of the stock price or other features at previous time steps) as input features.

Model Selection: Different models have different strengths and weaknesses, and some models might be better suited to your task than others. You could consider trying out different types of models (e.g., linear models, decision tree-based models, neural networks) to see which one performs best.

Hyperparameter Tuning: The performance of your model can often be improved by tuning its hyperparameters. For neural networks, important hyperparameters include the learning rate, the number of layers, and the number of units in each layer.

## Training Setup

> We attempt to convert the numpy and pandas series we have currently used for our dataset into tensors
Pandas dataframes and Numpy Arrays are used before this step for data exploration and manipulation but the deep learning library pytorch performs operations on tensors.

## Training and Tuning

> Here we define the hyperparameters of the neural network and begin training the network with those parameters. As deep learning is an iterative process- with model degredation and improvements both contributing to overall progress- this section does not contain the history of experimental training and parameter tuning that moonShot has undergone.