<a href="https://colab.research.google.com/github/branndonm1/NRandomStockPredictor/blob/main/NRandomStockPredictor.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction




This program gets NUMB_RAND_TICKS number of stocks tickers randomly out of all publicly traded stocks. Then it gets historical data from LOOK_BACK_DAYS numbers of days ago until yesterday. It trains regressions models with 80% of the data. Then tests with the remaining 20%. It then takes yesterdays data and todays "open" data to predict the "high" value for today. Based off this high value prediction, it gives a summary of estimated performance for all of the randomly picked stocks. One can specify the amount of CAPITAL one would like to invest today at opening to get predicted returns once the high value is obtained.


# Define constants


In [24]:
LOOK_BACK_DAYS=90
NUMB_RAND_TICKS=10
CAPITAL = 1000

# Installing and importing packages

In [25]:
!pip install yfinance 

import numpy as np #import numpy for handy array and math stuff 
import matplotlib.pyplot as plt #for plotting data
import pandas as pd #for stats/excel type stuff
import seaborn as sns #for statistical plots
import yfinance as yf #api to get stock data from yahoo finance
import scipy #for math stuff
from sklearn.linear_model import LinearRegression, Lasso, Ridge, ElasticNet #import linear regression method from linear model packs in sklearn
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split #import train test split method from sklearn package (within pandas)
from datetime import date, timedelta #for getting dates

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


# Getting and prep dates

In [26]:
today = date.today()
if today.weekday() == 0: #because markets are closed on weekends
  yesterday = today-timedelta(3)
else:  
  yesterday = today-timedelta(1)

tomorrow = today+timedelta(1)
look_back_day = today-timedelta(LOOK_BACK_DAYS)

todays_date = today.strftime("%Y-%m-%d")
yesterday_date = yesterday.strftime("%Y-%m-%d")
tomorrow_date = tomorrow.strftime("%Y-%m-%d")
look_back_date = look_back_day.strftime("%Y-%m-%d")

print("We will process historical data between", look_back_date, "and", todays_date) 

We will process historical data between 2022-06-01 and 2022-08-30


# Load and prep tickers data

In [27]:
#load list of all stock tickers into data frame -- will be used for choosing tickers
tickers_list_df = pd.read_csv('tickers_data.csv') 

#get n random tickers 
symbols_list = tickers_list_df["Symbol"].sample(n=NUMB_RAND_TICKS, replace = False).tolist() 

symbols_list


['NCTY', 'OKE', 'SIRI', 'QBTS', 'GGE', 'MCR', 'REKR', 'CANG', 'IIIIW', 'HOV']

# Load and prep stock data

In [28]:
#input: list of tkrs (strings), start date (string "YYYY-MM-DD"), end date (string "YYYY-MM-DD")
#returns: 1. list of (full) data frames *one for each stock in list of tkrs from start date date to end date 
#         2. list of corresponding tickers
def get_stock_data(tkrs, start_date, stop_date):
  stock_data = [yf.download(tkrs[i], start=start_date, end=stop_date) for i in range(len(tkrs))]  #download stock data between given dates
  for i in range(len(stock_data)):
    if stock_data[i]["High"].any() == "NaN" or stock_data[i].empty: #drops empty dfs and dfs with empty cells 
      stock_data.pop(i)
      tkrs.pop(i)
  return stock_data, tkrs

In [29]:
stock_data_dflist_full, symbols_full = get_stock_data(symbols_list, look_back_date, todays_date)
stock_data_dflist_yesterday, symbols_yesterday = get_stock_data(symbols_list, yesterday_date, todays_date)
stock_data_dflist_today, symbols_today = get_stock_data(symbols_list, todays_date, tomorrow_date)

[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%********

# Training and prediction






In [30]:
#define models
lr = LinearRegression()
la = Lasso()
ri = Ridge()
en = ElasticNet()

def train_n_predict(full_df,yesterday_df,today_df):
  #data prep
  x = full_df.drop("High", axis=1) #axis=1 for col, gets rid of "price" col 
  y = full_df["High"]  # only "price" col data
  x_train, x_test, y_train, y_test = train_test_split(x,y,train_size=0.2, random_state=0) #RETURNS arrays with desired splitting of data for training and testing

  #fit models with training data
  lr.fit(x_train, y_train)
  la.fit(x_train, y_train)
  ri.fit(x_train, y_train)
  en.fit(x_train, y_train)

  #given test inputs, make predictions
  lr_pred = lr.predict(x_test) 
  la_pred = la.predict(x_test)
  ri_pred = ri.predict(x_test)
  en_pred = en.predict(x_test)

  #compare prediction on test data with actual result
  print("Error metrics for each model")
  print("LinearReg mse:"+str(mean_squared_error(y_test, lr_pred)))
  print("Lasso mse:"+str(mean_squared_error(y_test, la_pred)))
  print("Ridge mse:"+str(mean_squared_error(y_test, ri_pred)))
  print("ElNet mse:"+str(mean_squared_error(y_test, en_pred)))
  print("\n")

  #prediction
  pred_df = yesterday_df.drop(["High"], axis=1)
  pred_df["Open"]=[today_df["Open"]]
  preds = [lr.predict(pred_df), la.predict(pred_df), ri.predict(pred_df), en.predict(pred_df)] 
  
  print("The high price predictions are:")
  for i in range(len(preds)):
    print(preds[i][0])
  print('\n')
  print("The open high spreads are")
  for i in range(len(preds)):
    print(preds[i][0] - today_df["Open"][0])
  print('\n')
  print("The open high percent gains are")
  for i in range(len(preds)):
    print(str(100*(preds[i][0] - today_df["Open"][0])/today_df["Open"][0])+"%")
  print('\n')
  print("Investing $"+str(CAPITAL)+" dollars and pulling out when high prediction is hit will give a profit of:")
  for i in range(len(preds)):
    print(CAPITAL*(preds[i][0] - today_df["Open"][0])/today_df["Open"][0])
  print('-------------------------------------------------------------\n')


for i in range(len(stock_data_dflist_full)):
  print("Info for ticker: ", symbols_full[i],"\n")
  train_n_predict(stock_data_dflist_full[i], stock_data_dflist_yesterday[i], stock_data_dflist_today[i])

Info for ticker:  NCTY 

Error metrics for each model
LinearReg mse:0.00802367092341272
Lasso mse:0.18495409692832548
Ridge mse:0.039055261218582674
ElNet mse:0.18495569830120162


The high price predictions are:
1.302970656092921
1.533901501646223
1.4207079954163662
1.5339008512235774


The open high spreads are
0.052970656092921065
0.28390150164622296
0.17070799541636617
0.28390085122357744


The open high percent gains are
4.237652487433685%
22.712120131697837%
13.656639633309293%
22.712068097886196%


Investing $1000 dollars and pulling out when high prediction is hit will give a profit of:
42.37652487433685
227.1212013169784
136.56639633309294
227.12068097886194
-------------------------------------------------------------

Info for ticker:  OKE 

Error metrics for each model
LinearReg mse:0.32065846566991907
Lasso mse:0.7799031041912934
Ridge mse:0.2878855014781217
ElNet mse:0.24645891713015788


The high price predictions are:
64.28153953460564
65.20791170226022
64.4781129748359