# Predicting Stock Price Movement with a Random Forest Classifier
##  Also: Backtesting

As I read through the book "The Master Algorithm" by Pedro Domingos I learned that in the beginning of algorithmic trading at wallstreet, Random Forests were successfully incorperated into trading systems, aiding experienced traders in their decisions which stock to buy or sell. This sparked my interest because on the one hand, I haven't had the chance to dive deep into Random Forests and mixed with my interest in financial markets promised an interesting side-project for me. On the other hand I knew that while Random Forests have been overtaken by Neural Networks in many aspects, they can still achieve sota performance, if applied to the right task. In trading everything is about gainig alpha, that little bit of edge against the competition that lets you make smarter and more profitable decisions. To really compete, one has to find a niche that suits ones trading strategy, and maybe a Random Forest on just the right indicators could be my edge. One can dream, anyhow.

This is serves more as a showcase of a backtesting pipeline, as I'm not going to write yet another blog post explaining what Random Forests are or how to calculate basic trading indicators. There are literally hundred of these already, besides of course the literature, which I'm sure does a better job explaining than I could. The Pipeline is straight forward, and comments should be enough to give insight into more complicated code.

### Content
1. Data Preparation
2. Calculating Indicators
3. Model Fit
4. First Metrics
5. Backtesting
6. What Now?

In [2]:
"""The usual Imports"""
import ast
import time
import pandas as pd
from datetime import date, timedelta, datetime
import math
import numpy as np
import matplotlib.pyplot as plt



## 1. Data Preparation

Two Files are needed: 
tickers_and_sectors is a file I composed of a public available list of nasdaq tickers. As the name suggest, I only needed the ticker (Name of a stock) and its sector (e.g. Technology).

Historical Data. Obviously. Taken from yahoo finance with yfinance lib. Can stem from where-ever.

In [3]:
"""Tickers and Sectors"""

with open('tickers_and_sectors.txt', 'r') as f:
    tickers_and_sectors = ast.literal_eval(f.readlines()[0])

# Dict that sorts tickers by sector
tickers_by_sector = {}

for ticker, sector in tickers_and_sectors.items():
    if sector in tickers_by_sector:
        tickers_by_sector[sector].append(ticker)
    else:
        tickers_by_sector[sector] = [ticker]
        

"""Historical Data"""
start = time.time()

data = pd.read_csv('10_year_all_tickers', index_col=[0], header=[0,1])

end = time.time()
print("this took {} seconds".format(end-start))

data[:5]

this took 38.116071939468384 seconds


Unnamed: 0_level_0,AACG,AACG,AACG,AACG,AACG,AACG,AACQ,AACQ,AACQ,AACQ,...,ZYNE,ZYNE,ZYNE,ZYNE,ZYXI,ZYXI,ZYXI,ZYXI,ZYXI,ZYXI
Unnamed: 0_level_1,Adj Close,Close,High,Low,Open,Volume,Adj Close,Close,High,Low,...,High,Low,Open,Volume,Adj Close,Close,High,Low,Open,Volume
Date,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2,Unnamed: 19_level_2,Unnamed: 20_level_2,Unnamed: 21_level_2
2011-03-16,0.650397,8.1,8.25,7.69,7.71,158900.0,,,,,...,,,,,,,,,,
2011-03-17,0.649594,8.09,8.6,7.84,8.41,49200.0,,,,,...,,,,,0.408047,0.75,0.75,0.75,0.75,9000.0
2011-03-18,0.641565,7.99,8.1,7.68,8.1,18100.0,,,,,...,,,,,0.408047,0.75,0.75,0.75,0.75,6500.0
2011-03-21,0.635141,7.91,8.0,7.8,8.0,14700.0,,,,,...,,,,,0.408047,0.75,0.75,0.75,0.75,8600.0
2011-03-22,0.635141,7.91,8.0,7.86,8.0,4800.0,,,,,...,,,,,0.408047,0.75,0.75,0.75,0.75,100.0


In [4]:
"""
This cell gets rid of the multihierarchical index and sorts by ticker, rather than by date. This is needed for indicator calculation.
"""

start = time.time()
adj = data.T.melt(value_name='Adj Close', ignore_index=False).xs('Adj Close', level=1)
adj.index.name = 'Symbol'
adj = adj.sort_values(['Symbol', 'Date']).reset_index()

close = data.T.melt(value_name='Close', ignore_index=False).xs('Close', level=1)
close.index.name = 'Symbol'
close = close.sort_values(['Symbol', 'Date']).reset_index()

high = data.T.melt(value_name='High', ignore_index=False).xs('High', level=1)
high.index.name = 'Symbol'
high = high.sort_values(['Symbol', 'Date']).reset_index()

low = data.T.melt(value_name='Low', ignore_index=False).xs('Low', level=1)
low.index.name = 'Symbol'
low = low.sort_values(['Symbol', 'Date']).reset_index()

_open = data.T.melt(value_name='Open', ignore_index=False).xs('Open', level=1)
_open.index.name = 'Symbol'
_open = _open.sort_values(['Symbol', 'Date']).reset_index()

vol = data.T.melt(value_name='Volume', ignore_index=False).xs('Volume', level=1)
vol.index.name = 'Symbol'
vol = vol.sort_values(['Symbol', 'Date']).reset_index()

#ToDo give cleaned another name
cleaned = pd.merge(adj, close, how='inner', left_on=['Date', 'Symbol'], right_on=['Date', 'Symbol'])
cleaned = pd.merge(cleaned, high, how='inner', left_on=['Date', 'Symbol'], right_on=['Date', 'Symbol'])
cleaned = pd.merge(cleaned, low, how='inner', left_on=['Date', 'Symbol'], right_on=['Date', 'Symbol'])
cleaned = pd.merge(cleaned, _open, how='inner', left_on=['Date', 'Symbol'], right_on=['Date', 'Symbol'])
cleaned = pd.merge(cleaned, vol, how='inner', left_on=['Date', 'Symbol'], right_on=['Date', 'Symbol'])

end = time.time()

print("this took {} seconds".format(end-start))
cleaned

this took 70.4691858291626 seconds


Unnamed: 0,Symbol,Date,Adj Close,Close,High,Low,Open,Volume
0,AACG,2011-03-16,0.650397,8.100000,8.250000,7.690000,7.710000,158900.0
1,AACG,2011-03-17,0.649594,8.090000,8.600000,7.840000,8.410000,49200.0
2,AACG,2011-03-18,0.641565,7.990000,8.100000,7.680000,8.100000,18100.0
3,AACG,2011-03-21,0.635141,7.910000,8.000000,7.800000,8.000000,14700.0
4,AACG,2011-03-22,0.635141,7.910000,8.000000,7.860000,8.000000,4800.0
...,...,...,...,...,...,...,...,...
9514249,ZYXI,2021-03-10,16.740000,16.740000,17.799999,16.610001,17.600000,556200.0
9514250,ZYXI,2021-03-11,16.980000,16.980000,17.110001,16.370001,16.820000,518100.0
9514251,ZYXI,2021-03-12,16.870001,16.870001,16.959999,16.549999,16.879999,421500.0
9514252,ZYXI,2021-03-15,16.330000,16.330000,17.030001,16.129999,16.809999,463200.0


In [5]:
"""
From now on I rely on numpy arrays, as they work much quicker on this scale than pandas, as countless posts prove.
We're working with 10 years of dayily trading data for well above 3000 tickers.

Order of columns: ['ticker', 'date', 'adjclose', 'close', 'high', 'low', 'open', 'volume']
"""

np_data = cleaned.to_numpy()
np_data.shape

(9514254, 8)

## 2. Caluclating Technical Indicators

1. Change of Price (cop) & boolean for up-days
2. Relative Strength Indicator (rsi) & Exponentially Weighted Moving Average (ewma)
3. On-Balance Volume (obv)
4. Adding Prediction Column and cleaning data from NaN-Values
5. Sector indicators.

All of these are calculated in a separate cell. In a bigger systems one should of course look to minimize time spent, packing as much as possible in a single loop while maintaing read-, and scaleability.

In [6]:
"""
Calculating change in price together with up days (days which closed above their open price)
"""

# used only once, outcomment
np_data = np.hstack((np_data, np.ones((np_data.shape[0],2))))

start = time.time()
for i in range(np_data.shape[0]):
    cop = np_data[i][6] - np_data[i][3]
    np_data[i][8] = cop
    if cop < 0:
        np_data[i][9] = 0

end = time.time()    
print("This took {} seconds".format(end-start))
#print_lines(np_data,20)

This took 7.517102241516113 seconds


In [7]:
"""
Calculating Relative Strength Indicator
For the rsi required ewma for up-days and down-days are also used as separate indicators. Later we'll think about this again.
"""

# used only once, outcomment
np_data = np.hstack((np_data, np.ones((np_data.shape[0],3))))

#variables
alpha = 0.82
window_size = 14
#alpha = 2 / (window_size + 1.0)
up_day = 0 
cop = 0.0 # change of price
ewma_up, ewma_down = 0.0, 0.0

start_time = time.time()

for i, row in enumerate(np_data):
    #print(row, i)
    cop = 0 if np.isnan(row[8]) else row[8]
    up_day = row[9]
    
    # THINK ABOUT IT: does it make sense to add ewma up and down as seperate indicator, or does this introduce a stronger bias because
    # rsi already incoperates them? d
    if up_day == 1:
        ewma_up = alpha * cop + (1-alpha) * ewma_up
        row[11] = ewma_up
        
    else:
        ewma_down = alpha * (-cop) + (1-alpha) * ewma_up
        row[12] = ewma_down
             
    if i >= window_size:
        #print(ewma_down, ewma_up)
        try:
            row[10] = 100 - (100/ (1+ ewma_up/ewma_down))
        except ZeroDivisionError as err:
            row[10] = 0
            #print("Handling runtime error, rsi set to 0", err)
    
    try:
        if row[0] != np_data[i+1][0]:
            #print('New Ticker, time until now: {} sec.'.format(time.time()-start_time))
            ewma_up, ewma_down = 0, 0
    except IndexError as err:
        print("Finished with last index check out of bounds", err)

print("This took {} seconds".format(time.time()-start_time))

Finished with last index check out of bounds index 9514254 is out of bounds for axis 0 with size 9514254
This took 22.63750410079956 seconds


In [8]:
"""
Calculating On-Balance Volume Indicator
"""

# used only once, outcomment
np_data = np.hstack((np_data, np.ones((np_data.shape[0],1))))


# variables
obv = 0 #on balance volume
volume = 0
up_day = 0

start_time = time.time()

for i, row in enumerate(np_data):
    #print(row, i)
    volume = 0 if np.isnan(row[7]) else row[7] 
    up_day = row[9]
    
    if up_day == 1:
        obv += volume
    else:
        obv -= volume
    
    row[-1] = obv
    
    try:
        if row[0] != np_data[i+1][0]:
            #print('New Ticker, time until now: {} sec.'.format(time.time()-start_time))
            obv = 0
    except IndexError as err:
        print("Finished with last index check out of bounds", err)
            
print("This took {} seconds".format(time.time()-start_time))

Finished with last index check out of bounds index 9514254 is out of bounds for axis 0 with size 9514254
This took 17.308608055114746 seconds


In [9]:
"""
Adding Prediction Column
"""

# used only once, outcomment
# we add two columns here, because we want two extra for sector indices,
# which will be only added after another resort
np_data = np.hstack((np_data, np.zeros((np_data.shape[0],3))))


start_price, cur_price = 0, 0
time_span = 14

start_time = time.time()

for i, row in enumerate(np_data):
    if i+time_span >= np_data.shape[0]: break
        
    start_price = row[3]
    
    for j in range(time_span):
        cur_price = np_data[i+j][3]
        
        if cur_price >= (start_price * 1.15):
            np_data[i][-1] = 1
            
print("This took {} sec".format(time.time()-start_time))

This took 53.224575996398926 sec


In [10]:
"""
Sorting and Cleaning the data from NaN values
"""

import math

# Sort data for dates
np_data = np_data[np_data[:, 1].argsort()]

# Clean data from NaN values
cleaned_data = []

start = time.time()

for row in np_data:
    no_nan = True
    for ele in row[2:]:
        if math.isnan(ele):
            no_nan = False
            break
    if no_nan:
        cleaned_data.append(row)
        
print("This took {} secs".format(time.time()- start))        
cleaned_data = np.asarray(cleaned_data)
cleaned_data.shape

This took 24.340319871902466 secs


(5319966, 17)

In [11]:
"""
Sector indicators
Go through all tickers for one day, sum up the relevant values,
then reiterate through the day and fill in the result.

Requires data sorted by dates
"""

for sector, ticker_list in tickers_by_sector.items():
    idx = 0
    k = 0
    count_fin = False
    sector_updays = 0
    sector_ewma = 0
    n_tickers = len(ticker_list)
    
    pre_date = cleaned_data[0,1]
    
    print(sector)
    while idx < cleaned_data.shape[0]:
        row = cleaned_data[idx]
        ticker = row[0]
        date = row[1]
        #print(pre_date, date)
        # Summed all relevant vals of one day, now fill them in
        if date != pre_date and not count_fin:
            idx -= k
            count_fin = True
            #print(idx)
            continue
            
        # Do for one day:
        if ticker in ticker_list and not count_fin:
            
            up_day = row[9]
            sector_ewma = row[11] + row[12]
            
            if up_day:
                sector_updays += 1
            else:
                sector_updays -= 1
            #if sector_updays != 0:
                #print("here")
        
        if count_fin and ticker in ticker_list:
            cleaned_data[idx][-3] = sector_updays
            cleaned_data[idx][-2] = sector_ewma / n_tickers
            #row[-2] = sector_updays #because last row is already prediction row
            
        k += 1    
        # Next Day Begins
        if count_fin and date != pre_date:
            count_fin = False
            sector_updays = 0
            k = 0
        idx += 1
        pre_date = date

Consumer Defensive
Industrials
Financial Services
Technology
Healthcare
Communication Services
Utilities

Real Estate
Energy
Basic Materials
Consumer Cyclical
Services


Indicators are the vital part of a algorithmic trading strategy. You have to carefully pick a combination that you think will fit your strategy well, indicators that can work together and yield new information not only about the present day, but which also allow a look into the past. Dont forget to keep an eye out for the bigger picture, with sector indices providing a birds eye view of the current market sentiment. Of course one can go absolutely haywire and stuff as many indicators into a system as possible, but I believe in simplicity, price action and volume. Therefore I picked just a few indicators that support my view. While these are not all that I used over the course of the project, they are some of the most important ones.

In [12]:
raise Exception("You will not pass!")

Exception: You will not pass!

## 3. Model Fit

In [22]:
from sklearn.model_selection import train_test_split

X_cols = cleaned_data[:,2:-1].astype('float64')
#X_cols = numerical_data[:,:6]
Y_cols = cleaned_data[:,-1].astype('float64')

X_train, X_test, y_train, y_test = train_test_split(X_cols, Y_cols, train_size=0.6, random_state = 0)
print("X_train: {}, X_test: {}, y_train: {}, y_test: {}".format(X_train.shape, X_test.shape, y_train.shape, y_test.shape))

X_train: (3191979, 14), X_test: (2127987, 14), y_train: (3191979,), y_test: (2127987,)


In [None]:
from sklearn.ensemble import RandomForestClassifier

forest = RandomForestClassifier(n_estimators = 100,  criterion = "gini", random_state = 0)#oob_score = True,

start = time.time()

# Fit the data to the model
forest.fit(X_train, y_train)
#forest.fit(sampl, y_train_t)
#forest.fit(X_train[:10], y_train[:10])
#forest.fit(np.random.random_sample((3989,11)), y_train[:3989])

print("This took {} secs".format(time.time()-start))

In [None]:
import joblib
# save
joblib.dump(forest, "eng_10y_sector_trainsize06.joblib", compress=3)

eng trained on all rows sector forest, tested on german test data:
    Of the correct predictions, we have 1408 positives and 309284 negatives
    Of the incorrect predictions, we have 4723 positives and 26631 negatives
    
eng trained on all rows sector forest, tested on german train data:
    Of these correct predictions, we have 4171 positives and 927940 negatives
    Of the incorrect predictions, we have 13979 positives and 80047 negatives
    
eng trained price action solo indicators, tested on german test data:
    Of these correct predictions, we have 1399 positives and 309330 negatives
    Of the incorrect predictions, we have 4677 positives and 26640 negatives

In [None]:
sampl = np.random.uniform(low=2.42, high=1337.42, size=(150000,5))

In [None]:
y_train.shape

In [None]:
y_pred = forest.predict(X_test)

total_cor, total = 0, 0
total_true_pos, total_true_neg = 0, 0
total_false_pos, total_false_neg = 0, 0

for pred, target in zip(y_pred, y_test):
    total += 1
    if pred == target:
        total_cor += 1
        
        if target == 0: total_true_neg += 1
        else: total_true_pos += 1
    elif pred == 0:
        total_false_neg += 1
    else: total_false_pos += 1


In [None]:
print("{} were correct predictions, yielding an accuracy of {}%.".format(total_cor, (total_cor/total)*100))
print("Of the correct predictions, we have {} positives and {} negatives".format(total_true_pos, total_true_neg))
print("Of the incorrect predictions, we have {} positives and {} negatives".format(total_false_pos, total_false_neg))

In [None]:
total_cor, total = 0, 0

for i, rows in enumerate(zip(X_test, y_test)):
    x_row, y_row = rows[0], rows[1]
    
    y_pred = forest.predict(x_row.reshape(1, -1))
    
    if y_pred == y_row:
        total_cor += 1
        print("Correct predicted row: ", x_row)
    else:
        print("Incorrectly predicted row: ", x_row)

### Save the model with joblib dump

In [13]:
import joblib
# save
#joblib.dump(forest, "german_no_sector_forest.joblib", compress=3) #without compress its 6 gb big :D 

#to load:
forest = joblib.load("eng_10y_sector_trainsize06.joblib")

## Backtesting with budget

In [None]:
cleaned_data[389589]
    

In [None]:
import backtest

# Trade Attributes
total_n_trades = 0
total_n_trades_won = 0
total_n_trades_lost = 0
total_sum_trades_won = 0
total_sum_trades_lost = 0

total_average_gain = 0
total_average_loss = 0
total_total_profit = 0


# dict of all currently held stocks
held = {}
# dict of all buys of one stock
bought = {}
# dict of all sells of one stock
sold = {}
# dict of all profits of one day
profit_per_day = {}
total_revenue = 0

bet_percent = 0.12
budget = 30000
#bet_size = backtest.calc_bet_size(budget, bet_percent)

start = time.time()
old_date = 0

for row in cleaned_data[X_train.shape[0]+1727987:,]:
#for row in cleaned_data[-250000:,]:
#for row in test_t:
    ticker, date = row[0], row[1]
    data = row[2:-1] #2 if adj close should be incoperated
    bet_size = budget * bet_percent
    
    #if old_date != date:
        #print("\n###################\n     One day takes", time.time()-start)
        #print("################")
    old_date = date
    
    # Before buying, make sure ticker is not already bought
    # If already bought, check for a sell signal
    if ticker in held.keys():
        n_shares, held_data = held[ticker]

        # held_data[1] is close price
        if backtest.is_sell(held_data[1], data[1]):
            _, negative = backtest.is_sell(held_data[1], data[1]) # change two lines so you only need function call once
            revenue, profit, budget = backtest.sell(entry_ticker_close=held_data[1], cur_ticker_close=data[1],
                                           n_shares=n_shares, budget=budget, neg=negative, perfect=False)

            if negative:
                total_n_trades_lost += 1
                total_sum_trades_lost -= profit
            else:
                total_n_trades_won += 1
                total_sum_trades_won += profit

            del held[ticker]
            print("\nSold {} at {} for a profit of {:4f} returning {} as revenue".format(ticker, date,
                                                                                         profit, revenue))
            print("New budget: ", budget)
            
    
    # buy ticker
    #ToDo maybe input lag in buying, say one stock thas has been sold cant be rebought for one month
    elif backtest.is_buy(data, forest): # shouldnt hit if ticker is sold the same day
        if budget > 10000:
            print("\nBought ", ticker, " at ", date, end=" ")
            n_shares, left_over_bet, budget = backtest.buy(share_price= data[1],
                                                          bet_size=bet_size, budget = budget)
            held[ticker] = (n_shares, data)
            print("for {} n shares with per share price: {:4f}, totaling: {:4f}".format(n_shares, data[1],
                                                                                        n_shares*data[1]))
            print("New budget is: ", budget)
            
    
            
print("This took {} secs".format(time.time()-start))
print("Final budget is: ", budget)

money_held = 0
for n_shares, data in held.values():
    money_held += n_shares * data[1]
print("Money still held: ", money_held)

In [None]:
len(held.values())

In [None]:
for t in held.items():
    print(t)

In [None]:
print(cleaned_data[3989975])
print(date)

In [None]:
print("This took {} secs".format(time.time()-start))
print("Final budget is: ", budget)

money_held = 0
for ticker, (n_shares, data) in held.items():
    #print(ticker, n_shares)
    money_held += n_shares * data[1]
print("Money still held: ", money_held)

In [None]:
total_n_trades_lost

In [None]:
total_n_trades_won

In [None]:
len(held.keys())

In [None]:
total_sum_trades_lost

In [None]:
total_sum_trades_won

In [None]:
pd.Series(forest.feature_importances_, index=['adjclose', 'close', 'high', 'low', 'open', 'volume', 'cop', 'upday', 
                                              'rsi', 'ewma_up', 'ewma_down', 'obv', 'sector_up', 'sector_ewma']).sort_values(ascending=False)

## Test with fewer tickers

Now we want to run our test with tickers from the s&p 500. This will show us what happens in backtesting when restricting number of tickers and let us make an educated guess which ticker have more impact with the performance of the model. S&P 500 consists of the strongest tickers in the US-market, especially in the tech-sector. With this we can estimate if our model mainly relies on established and well performing tickers, or if it makes heavy use of smaller, more under-the-radar tickers.

In [14]:

sp500_data = pd.read_csv('s&p500_constituents_csv.csv', header=[0])
print(sp500_data.head())
sp500_tickers = sp500_data['Symbol'].to_list()

  Symbol                 Name       Sector
0    MMM                   3M  Industrials
1    AOS          A. O. Smith  Industrials
2    ABT  Abbott Laboratories  Health Care
3   ABBV               AbbVie  Health Care
4   ABMD              Abiomed  Health Care


In [23]:
import backtest

# Trade Attributes
total_n_trades = 0
total_n_trades_won = 0
total_n_trades_lost = 0
total_sum_trades_won = 0
total_sum_trades_lost = 0

total_average_gain = 0
total_average_loss = 0
total_total_profit = 0


# dict of all currently held stocks
held = {}
# dict of all buys of one stock
bought = {}
# dict of all sells of one stock
sold = {}
# dict of all profits of one day
profit_per_day = {}
total_revenue = 0

bet_percent = 0.12
budget = 30000
#bet_size = backtest.calc_bet_size(budget, bet_percent)

start = time.time()
old_date = 0

for row in cleaned_data[X_train.shape[0]+1727987:,]:
#for row in cleaned_data[-250000:,]:
#for row in test_t:
    ticker, date = row[0], row[1]
    
    if ticker not in sp500_tickers:
        continue
        
    data = row[2:-1] #2 if adj close should be incoperated
    bet_size = budget * bet_percent
    
    #if old_date != date:
        #print("\n###################\n     One day takes", time.time()-start)
        #print("################")
    old_date = date
    
    # Before buying, make sure ticker is not already bought
    # If already bought, check for a sell signal
    if ticker in held.keys():
        n_shares, held_data = held[ticker]

        # held_data[1] is close price
        if backtest.is_sell(held_data[1], data[1]):
            _, negative = backtest.is_sell(held_data[1], data[1]) # change two lines so you only need function call once
            revenue, profit, budget = backtest.sell(entry_ticker_close=held_data[1], cur_ticker_close=data[1],
                                           n_shares=n_shares, budget=budget, neg=negative, perfect=False)

            if negative:
                total_n_trades_lost += 1
                total_sum_trades_lost -= profit
            else:
                total_n_trades_won += 1
                total_sum_trades_won += profit

            del held[ticker]
            print("\nSold {} at {} for a profit of {:4f} returning {} as revenue".format(ticker, date,
                                                                                         profit, revenue))
            print("New budget: ", budget)
            
    
    # buy ticker
    #ToDo maybe input lag in buying, say one stock thas has been sold cant be rebought for one month
    elif backtest.is_buy(data, forest): # shouldnt hit if ticker is sold the same day
        if budget > 10000:
            print("\nBought ", ticker, " at ", date, end=" ")
            n_shares, left_over_bet, budget = backtest.buy(share_price= data[1],
                                                          bet_size=bet_size, budget = budget)
            held[ticker] = (n_shares, data)
            print("for {} n shares with per share price: {:4f}, totaling: {:4f}".format(n_shares, data[1],
                                                                                        n_shares*data[1]))
            print("New budget is: ", budget)
            
    
            
print("This took {} secs".format(time.time()-start))
print("Final budget is: ", budget)

money_held = 0
for n_shares, data in held.values():
    money_held += n_shares * data[1]
print("Money still held: ", money_held)


Bought  ENPH  at  2020-09-17 for 53 n shares with per share price: 66.870003, totaling: 3544.110146
New budget is:  26455.88985

Bought  ETSY  at  2020-09-17 for 29 n shares with per share price: 109.230003, totaling: 3167.670097
New budget is:  23288.219748

Bought  MKTX  at  2020-09-18 for 6 n shares with per share price: 438.660004, totaling: 2631.960022
New budget is:  20656.25972824

Bought  AMD  at  2020-09-18 for 33 n shares with per share price: 74.930000, totaling: 2472.690010
New budget is:  18183.5697208512

Bought  MPWR  at  2020-09-18 for 8 n shares with per share price: 253.940002, totaling: 2031.520020
New budget is:  16152.049704349056

Bought  CDW  at  2020-09-21 for 17 n shares with per share price: 108.809998, totaling: 1849.769958
New budget is:  14302.27974982717

Bought  ILMN  at  2020-09-22 for 6 n shares with per share price: 268.510010, totaling: 1611.060059
New budget is:  12691.219689847909

Bought  SIVB  at  2020-09-22 for 6 n shares with per share price: 2


Bought  DISCA  at  2020-11-05 for 69 n shares with per share price: 21.430000, totaling: 1478.670021
New budget is:  10980.93799309664

Sold WYNN at 2020-11-05 for a profit of 269.867976 returning 3265.677896 as revenue
New budget:  14246.61588909664

Bought  NTRS  at  2020-11-05 for 20 n shares with per share price: 81.580002, totaling: 1631.600037
New budget is:  12615.015852405044

Sold GOOGL at 2020-11-05 for a profit of 139.530030 returning 1669.4799799999998 as revenue
New budget:  14284.495832405044

Bought  AMAT  at  2020-11-05 for 24 n shares with per share price: 69.949997, totaling: 1678.799927
New budget is:  12605.69590251644

Bought  CZR  at  2020-11-05 for 26 n shares with per share price: 56.130001, totaling: 1459.380028
New budget is:  11146.315874214468

Sold HOLX at 2020-11-05 for a profit of 142.002012 returning 1668.7419519999999 as revenue
New budget:  12815.057826214468

Bought  APA  at  2020-11-05 for 171 n shares with per share price: 8.970000, totaling: 1533.


Bought  WDC  at  2020-11-25 for 40 n shares with per share price: 45.259998, totaling: 1810.399933
New budget is:  13475.685765469645

Bought  AAL  at  2020-11-27 for 107 n shares with per share price: 14.980000, totaling: 1602.859951
New budget is:  11872.825813613288

Bought  NTAP  at  2020-11-27 for 26 n shares with per share price: 53.259998, totaling: 1384.759956
New budget is:  10488.065855979694

Bought  FANG  at  2020-11-27 for 28 n shares with per share price: 43.520000, totaling: 1218.560013
New budget is:  9269.505843262132

Sold MRNA at 2020-11-27 for a profit of 392.886018 returning 2659.765958 as revenue
New budget:  11929.271801262132

Sold FANG at 2020-11-30 for a profit of -99.680040 returning 1118.87997 as revenue
New budget:  13048.151771262132

Bought  PENN  at  2020-11-30 for 22 n shares with per share price: 70.000000, totaling: 1540.000000
New budget is:  11508.151768710677

Sold AAL at 2020-11-30 for a profit of -90.949940 returning 1511.91001 as revenue
New bu


Bought  MRNA  at  2020-12-23 for 9 n shares with per share price: 130.339996, totaling: 1173.059967
New budget is:  9606.210256077979

Sold MRNA at 2020-12-24 for a profit of -62.549980 returning 1110.50999 as revenue
New budget:  10716.720246077979

Sold MPWR at 2020-12-24 for a profit of 261.684120 returning 3039.1739900000002 as revenue
New budget:  13755.89423607798

Bought  KLAC  at  2020-12-24 for 6 n shares with per share price: 259.079987, totaling: 1554.479919
New budget is:  12201.414317748622

Bought  LRCX  at  2020-12-24 for 3 n shares with per share price: 480.339996, totaling: 1441.019989
New budget is:  10760.394329618786

Bought  DISCA  at  2020-12-24 for 45 n shares with per share price: 28.570000, totaling: 1285.649986
New budget is:  9474.744340064532

Sold ABMD at 2020-12-28 for a profit of 224.448048 returning 2353.647998 as revenue
New budget:  11828.392338064532

Bought  AMAT  at  2020-12-28 for 16 n shares with per share price: 84.870003, totaling: 1357.920044



Sold FFIV at 2021-01-25 for a profit of 146.609970 returning 1767.51002 as revenue
New budget:  11437.342188805067

Sold VIAC at 2021-01-25 for a profit of 209.484006 returning 2055.003956 as revenue
New budget:  13492.346144805068

Sold DISCA at 2021-01-26 for a profit of 159.263994 returning 1604.064054 as revenue
New budget:  15096.410198805068

Sold MRNA at 2021-01-26 for a profit of 125.459934 returning 1435.659974 as revenue
New budget:  16532.070172805066

Bought  FANG  at  2021-01-26 for 33 n shares with per share price: 59.610001, totaling: 1967.130020
New budget is:  14564.940152068459

Sold DISCK at 2021-01-26 for a profit of 152.958042 returning 1774.2280520000002 as revenue
New budget:  16339.168204068459

Bought  ZBRA  at  2021-01-26 for 4 n shares with per share price: 395.160004, totaling: 1580.640015
New budget is:  14758.528189580245

Bought  APA  at  2021-01-26 for 115 n shares with per share price: 15.300000, totaling: 1759.500022
New budget is:  12999.028166830616


Sold NVDA at 2021-02-11 for a profit of 167.976012 returning 1718.106082 as revenue
New budget:  18928.267861948785

Sold DISCA at 2021-02-12 for a profit of 166.787970 returning 1848.60799 as revenue
New budget:  20776.875851948786

Bought  FOX  at  2021-02-12 for 79 n shares with per share price: 31.450001, totaling: 2484.550060
New budget is:  18292.32578971493

Sold AMAT at 2021-02-12 for a profit of 173.951952 returning 1751.2319819999998 as revenue
New budget:  20043.55777171493

Sold VIAC at 2021-02-12 for a profit of 115.500006 returning 1380.750026 as revenue
New budget:  21424.307797714933

Sold ILMN at 2021-02-12 for a profit of 91.020006 returning 948.840016 as revenue
New budget:  22373.14781371493

Sold ENPH at 2021-02-12 for a profit of 115.457958 returning 1368.597988 as revenue
New budget:  23741.745801714933

Sold ETSY at 2021-02-12 for a profit of 132.089982 returning 1548.960012 as revenue
New budget:  25290.705813714932

Bought  MAR  at  2021-02-12 for 23 n shares


Bought  PENN  at  2021-03-08 for 31 n shares with per share price: 109.279999, totaling: 3387.679962
New budget is:  24879.449498670707

Sold TSLA at 2021-03-08 for a profit of -104.850040 returning 1689.0 as revenue
New budget:  26568.449498670707

Bought  AAL  at  2021-03-08 for 148 n shares with per share price: 21.469999, totaling: 3177.559898
New budget is:  23390.889598830225

Bought  NXPI  at  2021-03-08 for 16 n shares with per share price: 171.000000, totaling: 2736.000000
New budget is:  20654.889596970595

Sold XRAY at 2021-03-08 for a profit of 235.704024 returning 2690.264054 as revenue
New budget:  23345.153650970595

Bought  PTC  at  2021-03-08 for 23 n shares with per share price: 118.849998, totaling: 2733.549965
New budget is:  20611.603682854126

Bought  APA  at  2021-03-09 for 114 n shares with per share price: 21.600000, totaling: 2462.400043
New budget is:  18149.20364091163

Bought  ENPH  at  2021-03-09 for 14 n shares with per share price: 148.699997, totaling:

In [26]:
print('Total trades won', total_n_trades_won)
print('Total trades lost', total_n_trades_lost)

Total trades won 178
Total trades lost 29


## Buying at open of the next day

Currently we used the close price as buy price for our stocks.

In [30]:
import backtest

# Trade Attributes
total_n_trades = 0
total_n_trades_won = 0
total_n_trades_lost = 0
total_sum_trades_won = 0
total_sum_trades_lost = 0

total_average_gain = 0
total_average_loss = 0
total_total_profit = 0

# list of stocks that should be bought at open the next day
to_buy = []
# dict of all currently held stocks
held = {}
# dict of all buys of one stock
bought = {}
# dict of all sells of one stock
sold = {}
# dict of all profits of one day
profit_per_day = {}
total_revenue = 0

bet_percent = 0.12
budget = 30000
#bet_size = backtest.calc_bet_size(budget, bet_percent)

start = time.time()
old_date = 0

for row in cleaned_data[X_train.shape[0]+1727987:,]:
#for row in cleaned_data[-250000:,]:
#for row in test_t:
    ticker, date = row[0], row[1]
    
    if ticker not in sp500_tickers:
        continue
    
    data = row[2:-1] #2 if adj close should be incoperated
    bet_size = budget * bet_percent
    
    if ticker in to_buy:
        print("\nBought ", ticker, " at ", date, end=" ")
        n_shares, left_over_bet, budget = backtest.buy(share_price=data[4],
                                                      bet_size=bet_size, budget = budget)
        held[ticker] = (n_shares, data)
        to_buy.remove(ticker)
        print("for {} n shares with per share price: {:4f}, totaling: {:4f}".format(n_shares, data[4],
                                                                                    n_shares*data[4]))
        print("New budget is: ", budget)
        
        
    #if old_date != date:
        #print("\n###################\n     One day takes", time.time()-start)
        #print("################")
    old_date = date
    
    # Before buying, make sure ticker is not already bought
    # If already bought, check for a sell signal
    if ticker in held.keys():
        n_shares, held_data = held[ticker]

        # held_data[1] is close price
        if backtest.is_sell(held_data[1], data[1]):
            _, negative = backtest.is_sell(held_data[1], data[1]) # change two lines so you only need function call once
            revenue, profit, budget = backtest.sell(entry_ticker_close=held_data[1], cur_ticker_close=data[1],
                                           n_shares=n_shares, budget=budget, neg=negative, perfect=False)

            if negative:
                total_n_trades_lost += 1
                total_sum_trades_lost -= profit
            else:
                total_n_trades_won += 1
                total_sum_trades_won += profit

            del held[ticker]
            print("\nSold {} at {} for a profit of {:4f} returning {} as revenue".format(ticker, date,
                                                                                         profit, revenue))
            print("New budget: ", budget)
            
    
    # buy ticker
    #ToDo maybe input lag in buying, say one stock thas has been sold cant be rebought for one month
    elif backtest.is_buy(data, forest): # shouldnt hit if ticker is sold the same day
        if budget > 10000:
            to_buy.append(ticker)
    
            
print("This took {} secs".format(time.time()-start))
print("Final budget is: ", budget)

money_held = 0
for n_shares, data in held.values():
    money_held += n_shares * data[1]
print("Money still held: ", money_held)


Bought  ENPH  at  2020-09-18 for 53 n shares with per share price: 67.720001, totaling: 3589.160065
New budget is:  26410.83994

Bought  ETSY  at  2020-09-18 for 28 n shares with per share price: 110.684998, totaling: 3099.179932
New budget is:  23311.6600072

Bought  MKTX  at  2020-09-21 for 6 n shares with per share price: 434.329987, totaling: 2605.979919
New budget is:  20705.680086336

Bought  AMD  at  2020-09-21 for 33 n shares with per share price: 74.230003, totaling: 2449.590111
New budget is:  18256.08997597568

Bought  MPWR  at  2020-09-21 for 8 n shares with per share price: 248.779999, totaling: 1990.239990
New budget is:  16265.849988858597

Bought  CDW  at  2020-09-22 for 18 n shares with per share price: 108.330002, totaling: 1949.940033
New budget is:  14315.909960195566

Bought  ILMN  at  2020-09-23 for 6 n shares with per share price: 274.700012, totaling: 1648.200073
New budget is:  12667.709884972097

Bought  KLAC  at  2020-09-23 for 8 n shares with per share pric


Sold ZION at 2020-10-28 for a profit of -32.999990 returning 612.00001 as revenue
New budget:  9535.181784903973

Sold ENPH at 2020-10-30 for a profit of -32.220010 returning 588.53998 as revenue
New budget:  10123.721764903972

Bought  DISH  at  2020-11-02 for 46 n shares with per share price: 26.000000, totaling: 1196.000000
New budget is:  8927.721763115496

Bought  ULTA  at  2020-11-02 for 5 n shares with per share price: 209.000000, totaling: 1045.000000
New budget is:  7882.721761541637

Bought  CSCO  at  2020-11-02 for 26 n shares with per share price: 36.189999, totaling: 940.939964
New budget is:  6941.78180015664

Bought  VRSK  at  2020-11-02 for 4 n shares with per share price: 180.509995, totaling: 722.039978
New budget is:  6219.741824137843

Bought  DISCK  at  2020-11-02 for 40 n shares with per share price: 18.480000, totaling: 739.199982
New budget is:  5480.541845241301

Bought  NWL  at  2020-11-02 for 36 n shares with per share price: 17.799999, totaling: 640.799973



Sold CSCO at 2020-11-13 for a profit of 85.800000 returning 1019.20004 as revenue
New budget:  9422.343912564353

Bought  MRNA  at  2020-11-13 for 13 n shares with per share price: 86.620003, totaling: 1126.060036
New budget is:  8296.283873056631

Sold TMUS at 2020-11-13 for a profit of 32.111994 returning 363.281994 as revenue
New budget:  8659.565867056632

Bought  FOXA  at  2020-11-13 for 40 n shares with per share price: 25.950001, totaling: 1038.000031
New budget is:  7621.565833009836

Bought  PENN  at  2020-11-13 for 13 n shares with per share price: 65.349998, totaling: 849.549980
New budget is:  6772.015853048655

Bought  UAL  at  2020-11-13 for 21 n shares with per share price: 37.240002, totaling: 782.040035
New budget is:  5989.975820682817

Bought  DISH  at  2020-11-13 for 23 n shares with per share price: 30.469999, totaling: 700.809984
New budget is:  5289.165832200879

Bought  FOX  at  2020-11-13 for 24 n shares with per share price: 25.430000, totaling: 610.320007
Ne


Bought  CDNS  at  2020-12-10 for 19 n shares with per share price: 114.690002, totaling: 2179.110046
New budget is:  16045.187896770543

Bought  PYPL  at  2020-12-10 for 9 n shares with per share price: 208.360001, totaling: 1875.240005
New budget is:  14169.947889158078

Bought  PENN  at  2020-12-11 for 22 n shares with per share price: 75.370003, totaling: 1658.140060
New budget is:  12511.807832459108

Sold WDC at 2020-12-15 for a profit of 166.968072 returning 1817.568022 as revenue
New budget:  14329.375854459107

Bought  AAL  at  2020-12-15 for 101 n shares with per share price: 16.990000, totaling: 1715.989977
New budget is:  12613.385881924014

Sold ENPH at 2020-12-15 for a profit of 177.066048 returning 1326.096028 as revenue
New budget:  13939.481909924014

Sold STX at 2020-12-15 for a profit of 125.028072 returning 1434.188032 as revenue
New budget:  15373.669941924014

Sold ETSY at 2020-12-16 for a profit of 132.815988 returning 1370.1759780000002 as revenue
New budget:  1


Bought  LRCX  at  2021-01-11 for 3 n shares with per share price: 495.519989, totaling: 1486.559967
New budget is:  11669.688030755618

Bought  ILMN  at  2021-01-11 for 3 n shares with per share price: 379.649994, totaling: 1138.949982
New budget is:  10530.738047064944

Sold TER at 2021-01-11 for a profit of 66.383982 returning 769.943992 as revenue
New budget:  11300.682039064945

Bought  AAL  at  2021-01-11 for 91 n shares with per share price: 14.820000, totaling: 1348.619972
New budget is:  9952.06206437715

Sold MRNA at 2021-01-12 for a profit of 144.576018 returning 1398.216028 as revenue
New budget:  11350.278092377152

Sold AMAT at 2021-01-12 for a profit of 110.375994 returning 1326.556034 as revenue
New budget:  12676.83412637715

Sold ETSY at 2021-01-12 for a profit of 329.856006 returning 3062.495896 as revenue
New budget:  15739.330022377151

Bought  DISCK  at  2021-01-13 for 62 n shares with per share price: 30.070000, totaling: 1864.339981
New budget is:  13874.9900396


Sold FANG at 2021-02-04 for a profit of 157.679970 returning 1855.3799699999997 as revenue
New budget:  9144.552289131836

Sold PYPL at 2021-02-04 for a profit of 194.351952 returning 2033.8719720000001 as revenue
New budget:  11178.424261131837

Bought  VIAC  at  2021-02-04 for 26 n shares with per share price: 50.340000, totaling: 1308.840004
New budget is:  9869.584259796016

Sold WDC at 2021-02-04 for a profit of 170.687988 returning 1782.527958 as revenue
New budget:  11652.112217796017

Bought  HST  at  2021-02-04 for 97 n shares with per share price: 14.320000, totaling: 1389.039970
New budget is:  10263.072251660495

Sold PENN at 2021-02-04 for a profit of 139.775982 returning 1445.2359620000002 as revenue
New budget:  11708.308213660495

Bought  AAL  at  2021-02-05 for 79 n shares with per share price: 17.600000, totaling: 1390.400030
New budget is:  10317.908188021236

Bought  TSCO  at  2021-02-05 for 8 n shares with per share price: 146.220001, totaling: 1169.760010
New bud


Sold DISCK at 2021-03-03 for a profit of 207.300000 returning 2381.80008 as revenue
New budget:  16971.8657209363

Bought  DISCA  at  2021-03-03 for 34 n shares with per share price: 59.209999, totaling: 2013.139969
New budget is:  14958.725754423946

Sold PTC at 2021-03-04 for a profit of -83.300010 returning 885.01 as revenue
New budget:  15843.735754423946

Bought  WBA  at  2021-03-04 for 39 n shares with per share price: 47.599998, totaling: 1856.399940
New budget is:  13987.335813893073

Sold FANG at 2021-03-04 for a profit of 222.456036 returning 1720.6760259999999 as revenue
New budget:  15708.011839893072

Sold PENN at 2021-03-04 for a profit of -151.579980 returning 2395.57999 as revenue
New budget:  18103.59182989307

Bought  DISCK  at  2021-03-05 for 41 n shares with per share price: 52.279999, totaling: 2143.479950
New budget is:  15960.1118803059

Sold APA at 2021-03-05 for a profit of 231.750000 returning 2240.25 as revenue
New budget:  18200.3618803059

Sold UAL at 2021

In [31]:
print('Total trades won', total_n_trades_won)
print('Total trades lost', total_n_trades_lost)

Total trades won 170
Total trades lost 35
