<a href="https://colab.research.google.com/github/adrnsrf/Stock-Market-Machine-Learning-Project/blob/main/Main_Model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## What is the goal of this model?

This model uses machine learning to help predict small increases in the market. It looks at a particular group of stocks that move similarly and presents a binary question each day:

**Based on the data of the past 60 days, and due to the conditions up to this TIME in the day for this group of stocks, is there a certain PERCENT INCREASE between now and the end of the day?**

Example of "conditions" the model looks at are simple things like: yesterdays close, yesterdays open, todays open, the slope of the price at 12pm, the current amplitude of price up until 12, etc. 

# Imports

Import most of the relevant python packages required 

In [1]:
!pip install yfinance --upgrade --no-cache-dir

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting yfinance
  Downloading yfinance-0.1.79-py2.py3-none-any.whl (29 kB)
Collecting requests>=2.26
  Downloading requests-2.28.1-py3-none-any.whl (62 kB)
[K     |████████████████████████████████| 62 kB 11.3 MB/s 
Installing collected packages: requests, yfinance
  Attempting uninstall: requests
    Found existing installation: requests 2.23.0
    Uninstalling requests-2.23.0:
      Successfully uninstalled requests-2.23.0
Successfully installed requests-2.28.1 yfinance-0.1.79


In [2]:
# Imports (model imports come later)
!pip install yfinance
import pandas as pd
import yfinance as yf
import pandas_datareader as pdr
from datetime import datetime, timedelta
import datetime as dt
from datetime import date
import numpy as np
import matplotlib.pyplot as plt
import itertools as it
import time
import statistics
from scipy.stats import norm
from collections import defaultdict
import math

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


# Define stockdata function to collect and manipulate data

This is a function that intakes arguemnts of stock symbol (ticker), number of days (period).

This function collects the 5 minute historical data from yahoo, and adds all columns (features) I wanted to use to in my model. It also adds all the columns I want as targets, a binary column indicating if there was a 0.5% increase, 0.7% increase, etc.

Note that most comments in the following code block are sections for added columns. Some columns are for subsetting, some are feature values, and some are target values.

In [3]:
def stockdata(ticker, period):

  # Inputs
  ticker = ticker
  period = period
  interval = '5m'
  rn = 6 # rounding number

  # Pull data from Yahoo Finance, separate datetime into date, day, time columns
  # This will help with subsetting
  data = yf.download(ticker, period=period, interval=interval)
  data.reset_index(level=0, inplace=True)

  thedays = ['Mon' , 'Tue' , 'Wed' , 'Thu' , 'Fri' , 'Sat' , 'Sun']

  data['Ticker'] = ticker
  data['Date'] = (data['Datetime'].dt.date).astype(str)
  data['Day'] = (data['Datetime'].dt.weekday).apply(lambda x: thedays[x]).astype(str)
  data['Day_num'] = data['Datetime'].dt.weekday
  data['Mon?'] = data['Day_num'].apply(lambda x: 1 if x == 0 else 0)
  data['Fri?'] = data['Day_num'].apply(lambda x: 1 if x == 4 else 0)
  data['Time'] = (data['Datetime'].dt.time).astype(str)

  # Get list of valid stock trading dates
  date_end = dt.datetime.strptime(data.loc[data.index[-1], 'Date'], '%Y-%m-%d') + dt.timedelta(days=2)
  date_start = date_end - dt.timedelta(days=100)
  date_data = yf.download(ticker, start=date_start, end=date_end)
  date_data.reset_index(level=0, inplace=True)
  date_data['Date_str'] = date_data['Date'].dt.strftime('%Y-%m-%d')
  datelist = date_data['Date_str'].tolist()

  # Define function to add/subtract dates that are in the datelist
  def date_by_adding_business_days(from_date, add_days):
    date_pos = datelist.index(from_date.strftime('%Y-%m-%d'))
    number = date_pos + add_days
    date_str = datelist[number]
    return date_str

  def date_by_subtracting_business_days(from_date, sub_days):
    date_pos = datelist.index(from_date.strftime('%Y-%m-%d'))
    number = date_pos - sub_days
    date_str = datelist[number]
    return date_str

  # Day Change Index: collect locations where the day changes into a list
  # This will help us with populating new columns by indexing locations of date change
  dci = [] 
  for x in range(1,len(data)):
    if data['Date'][x] == data['Date'][x-1]:
        pass
    else:
        dci.append(x)

  #Volume Moving Average: smooth out the values of trading volume
  column = 'Volume'
  for i in range(0,5):
    data.loc[data.index[i],'VMA'] = data.loc[data.index[i],column]
  for i in range(5,dci[0]-5):
    data.loc[data.index[i],'VMA'] = np.round(((  data.loc[data.index[i],column]
                                                  + data.loc[data.index[i+1],column] 
                                                  + data.loc[data.index[i+2],column]
                                                  + data.loc[data.index[i+3],column] 
                                                  + data.loc[data.index[i+4],column]
                                                  + data.loc[data.index[i+5],column] 
                                                  + data.loc[data.index[i-1],column] 
                                                  + data.loc[data.index[i-2],column]
                                                  + data.loc[data.index[i-3],column] 
                                                  + data.loc[data.index[i-4],column]
                                                  + data.loc[data.index[i-5],column])
                                                  /11),0)
  for i in range(dci[0]-5,dci[0]):
      data.loc[data.index[i],'VMA'] = data.loc[data.index[i],column]

  for i in range(dci[0],dci[0]+5):
    data.loc[data.index[i],'VMA'] = data.loc[data.index[i],column]
  for x in range(0,len(dci)-1):
    for i in range(dci[x]+5,dci[x+1]-5):
        data.loc[data.index[i],'VMA'] = np.round(((  data.loc[data.index[i],column]
                                                  + data.loc[data.index[i+1],column] 
                                                  + data.loc[data.index[i+2],column]
                                                  + data.loc[data.index[i+3],column] 
                                                  + data.loc[data.index[i+4],column]
                                                  + data.loc[data.index[i+5],column] 
                                                  + data.loc[data.index[i-1],column] 
                                                  + data.loc[data.index[i-2],column]
                                                  + data.loc[data.index[i-3],column] 
                                                  + data.loc[data.index[i-4],column]
                                                  + data.loc[data.index[i-5],column])
                                                  /11),0)
    for i in range(dci[x+1]-5,dci[x+1]+5):
        data.loc[data.index[i],'VMA'] = data.loc[data.index[i],column]

  for i in range(dci[-1]+5, len(data)-5):
    data.loc[data.index[i],'VMA'] = np.round(((  data.loc[data.index[i],column]
                                                  + data.loc[data.index[i+1],column] 
                                                  + data.loc[data.index[i+2],column]
                                                  + data.loc[data.index[i+3],column] 
                                                  + data.loc[data.index[i+4],column]
                                                  + data.loc[data.index[i+5],column] 
                                                  + data.loc[data.index[i-1],column] 
                                                  + data.loc[data.index[i-2],column]
                                                  + data.loc[data.index[i-3],column] 
                                                  + data.loc[data.index[i-4],column]
                                                  + data.loc[data.index[i-5],column])
                                                  /11),0)
  for i in range(len(data)-5,len(data)):
        data.loc[data.index[i],'VMA'] = data.loc[data.index[i],column]

  # Interval: collect time interval throughout day. 
  # This will help with graphing different days on the same plot
  data.loc[data.index[0],'Interval'] = 1
  for i in range(1,len(data)):
    if data['Date'][i-1] == data['Date'][i]:
        data.loc[data.index[i],'Interval'] = data.loc[data.index[i-1],'Interval'] + 1
    else:
        data.loc[data.index[i],'Interval'] = 1
 
  # Previous Day Close: create column where every row displays the previous days close 
  col_name = 'PDC'
  previous_how_many_days = 0
  days1 = previous_how_many_days +1
  today_date = dt.datetime.strptime(data.loc[data.index[0], 'Date'], '%Y-%m-%d')
  start_date = date_by_subtracting_business_days(today_date, days1)
  close = date_data[date_data['Date_str'] == start_date]['Adj Close'].iloc[0]
  data.loc[data.index[0],col_name] = close
  for i in range(1,len(data)):
    if i in dci:
        today_date = dt.datetime.strptime(data.loc[data.index[i], 'Date'], '%Y-%m-%d')
        start_date = date_by_subtracting_business_days(today_date, days1)
        close = date_data[date_data['Date_str'] == start_date]['Adj Close'].iloc[0]
        data.loc[data.index[i],col_name] = close
    else:
        data.loc[data.index[i],col_name] = data.loc[data.index[i-1],col_name]
    
  # Daily Percentage: create running list of percent of current from previous day close
  data.loc[data.index[0],'Daily Percent'] = (((data['Open'][0] / data['PDC'][0]) - 1)*100).round(2)
  for i in range(1,len(data)):
    if data['Date'][i-1] != data['Date'][i]:
        data.loc[data.index[i],'Daily Percent'] = (((data['Open'][i] / data['PDC'][i]) - 1)*100).round(2)
    else:
        data.loc[data.index[i],'Daily Percent'] = (((data['Open'][i] / data['PDC'][i]) - 1)*100).round(2)
            
  #Daily Percentage Moving Average: smooth out the values of Daily Percent
  column = 'Daily Percent'
  for i in range(0,5):
    data.loc[data.index[i],'DPMA'] = data.loc[data.index[i],column]
  for i in range(5,dci[0]-5):
    data.loc[data.index[i],'DPMA'] = np.round(((  data.loc[data.index[i],column]
                                                  + data.loc[data.index[i+1],column] 
                                                  + data.loc[data.index[i+2],column]
                                                  + data.loc[data.index[i+3],column] 
                                                  + data.loc[data.index[i+4],column]
                                                  + data.loc[data.index[i+5],column] 
                                                  + data.loc[data.index[i-1],column] 
                                                  + data.loc[data.index[i-2],column]
                                                  + data.loc[data.index[i-3],column] 
                                                  + data.loc[data.index[i-4],column]
                                                  + data.loc[data.index[i-5],column])
                                                  /11),rn)
  for i in range(dci[0]-5,dci[0]):
      data.loc[data.index[i],'DPMA'] = np.round(((  data.loc[data.index[i],column]
                                                  + data.loc[data.index[i-1],column] 
                                                  + data.loc[data.index[i-2],column]
                                                  + data.loc[data.index[i-3],column] 
                                                  + data.loc[data.index[i-4],column]
                                                  + data.loc[data.index[i-5],column] )
                                                  /6),rn)

  for i in range(dci[0],dci[0]+5):
    data.loc[data.index[i],'DPMA'] = np.round(((  data.loc[data.index[i],column]
                                                  + data.loc[data.index[i+1],column] 
                                                  + data.loc[data.index[i+2],column]
                                                  + data.loc[data.index[i+3],column] 
                                                  + data.loc[data.index[i+4],column]
                                                  + data.loc[data.index[i+5],column] )
                                                  /6),rn)
    
  for x in range(0,len(dci)-1):
    for i in range(dci[x]+5,dci[x+1]-5):
        data.loc[data.index[i],'DPMA'] = np.round(((  data.loc[data.index[i],column]
                                                  + data.loc[data.index[i+1],column] 
                                                  + data.loc[data.index[i+2],column]
                                                  + data.loc[data.index[i+3],column] 
                                                  + data.loc[data.index[i+4],column]
                                                  + data.loc[data.index[i+5],column] 
                                                  + data.loc[data.index[i-1],column] 
                                                  + data.loc[data.index[i-2],column]
                                                  + data.loc[data.index[i-3],column] 
                                                  + data.loc[data.index[i-4],column]
                                                  + data.loc[data.index[i-5],column])
                                                  /11),rn)
    

    for i in range(dci[x+1]-5,dci[x+1]):
        data.loc[data.index[i],'DPMA'] = np.round(((  data.loc[data.index[i],column]
                                                  + data.loc[data.index[i-1],column] 
                                                  + data.loc[data.index[i-2],column]
                                                  + data.loc[data.index[i-3],column] 
                                                  + data.loc[data.index[i-4],column]
                                                  + data.loc[data.index[i-5],column] )
                                                  /6),rn)
        
    for i in range(dci[x+1],dci[x+1]+5):
        data.loc[data.index[i],'DPMA'] = np.round(((  data.loc[data.index[i],column]
                                                  + data.loc[data.index[i+1],column] 
                                                  + data.loc[data.index[i+2],column]
                                                  + data.loc[data.index[i+3],column] 
                                                  + data.loc[data.index[i+4],column]
                                                  + data.loc[data.index[i+5],column] )
                                                  /6),rn)
      
    
  for i in range(dci[-1]+5, len(data)-5):
    data.loc[data.index[i],'DPMA'] = np.round(((  data.loc[data.index[i],column]
                                                  + data.loc[data.index[i+1],column] 
                                                  + data.loc[data.index[i+2],column]
                                                  + data.loc[data.index[i+3],column] 
                                                  + data.loc[data.index[i+4],column]
                                                  + data.loc[data.index[i+5],column] 
                                                  + data.loc[data.index[i-1],column] 
                                                  + data.loc[data.index[i-2],column]
                                                  + data.loc[data.index[i-3],column] 
                                                  + data.loc[data.index[i-4],column]
                                                  + data.loc[data.index[i-5],column])
                                                  /11),rn)
    

  for i in range(len(data)-6,len(data)): ## THIS WAS CHANGED TO 6 INSTEAD OF 5 BECAUSE END OF LAST DAY SHOWS 16:00:00
        data.loc[data.index[i],'DPMA'] = np.round(((  data.loc[data.index[i],column]
                                                  + data.loc[data.index[i-1],column] 
                                                  + data.loc[data.index[i-2],column]
                                                  + data.loc[data.index[i-3],column] 
                                                  + data.loc[data.index[i-4],column]
                                                  + data.loc[data.index[i-5],column] )
                                                  /6),rn)

  #Daily Percentage Moving Average trailing
  column = 'Daily Percent'
  newcolumn = 'DPMA_t'
  for i in range(0,5):
    data.loc[data.index[i],newcolumn] = data.loc[data.index[i],column]
  for i in range(5,dci[0]):
    data.loc[data.index[i],newcolumn] = np.round(((  data.loc[data.index[i],column] 
                                                  + data.loc[data.index[i-1],column] 
                                                  + data.loc[data.index[i-2],column]
                                                  + data.loc[data.index[i-3],column] 
                                                  + data.loc[data.index[i-4],column]
                                                  + data.loc[data.index[i-5],column])
                                                  /6),rn)

  for i in range(dci[0],dci[0]+5):
    data.loc[data.index[i],newcolumn] = data.loc[data.index[i],column]
    
  for x in range(0,len(dci)-1):
    for i in range(dci[x]+5,dci[x+1]-5):
        data.loc[data.index[i],newcolumn] = np.round(((  data.loc[data.index[i],column] 
                                                  + data.loc[data.index[i-1],column] 
                                                  + data.loc[data.index[i-2],column]
                                                  + data.loc[data.index[i-3],column] 
                                                  + data.loc[data.index[i-4],column]
                                                  + data.loc[data.index[i-5],column])
                                                  /6),rn)
    

    for i in range(dci[x+1]-5,dci[x+1]):
        data.loc[data.index[i],newcolumn] = np.round(((  data.loc[data.index[i],column]
                                                  + data.loc[data.index[i-1],column] 
                                                  + data.loc[data.index[i-2],column]
                                                  + data.loc[data.index[i-3],column] 
                                                  + data.loc[data.index[i-4],column]
                                                  + data.loc[data.index[i-5],column] )
                                                  /6),rn)
        
    for i in range(dci[x+1],dci[x+1]+5):
        data.loc[data.index[i],newcolumn] = data.loc[data.index[i],column]

    
    
  for i in range(dci[-1]+5, len(data)-5):
    data.loc[data.index[i],newcolumn] = np.round(((  data.loc[data.index[i],column] 
                                                  + data.loc[data.index[i-1],column] 
                                                  + data.loc[data.index[i-2],column]
                                                  + data.loc[data.index[i-3],column] 
                                                  + data.loc[data.index[i-4],column]
                                                  + data.loc[data.index[i-5],column])
                                                  /6),rn)
    
  for i in range(len(data)-6,len(data)): ## THIS WAS CHANGED TO 6 INSTEAD OF 5 BECAUSE END OF LAST DAY SHOWS 16:00:00
        data.loc[data.index[i],newcolumn] = np.round(((  data.loc[data.index[i],column]
                                                  + data.loc[data.index[i-1],column] 
                                                  + data.loc[data.index[i-2],column]
                                                  + data.loc[data.index[i-3],column] 
                                                  + data.loc[data.index[i-4],column]
                                                  + data.loc[data.index[i-5],column] )
                                                  /6),rn)

  # Daily Percentage Moving Average Shifted: shift daily percent moving average to start at 0
  column = 'DPMA'

  for i in range(0,dci[0]):
        data.loc[data.index[i],'DPMAS'] = data.loc[data.index[i], column] - data.loc[dci[0], column]

  for x in range(0,len(dci)-1):

    for i in range(dci[x],dci[x+1]):
        data.loc[data.index[i],'DPMAS'] = data.loc[data.index[i], column] - data.loc[dci[x], column]        
  for i in range(dci[-1],len(data)):
        data.loc[data.index[i],'DPMAS'] = data.loc[data.index[i], column] - data.loc[dci[-1], column]    
        
  # path: show intervals of increasing and decreasing from DPMA
  for i in range(0,dci[0]-2):
        data.loc[0,'path'] = 0
        if (data.loc[data.index[i],'DPMA']+data.loc[data.index[i+1],'DPMA']
            +data.loc[data.index[i+2],'DPMA'])/3 > data.loc[data.index[i-1],'DPMA']:
            data.loc[data.index[i],'path'] = data.loc[data.index[i-1],'path']+0.1
        else:
            data.loc[data.index[i],'path'] = data.loc[data.index[i-1],'path']-0.1

  for i in range(dci[0]-2,dci[0]):
    if (data.loc[data.index[i],'DPMA']+data.loc[data.index[i-1],'DPMA']
        +data.loc[data.index[i-2],'DPMA'])/3 < data.loc[data.index[i-1],'DPMA']:
        data.loc[data.index[i],'path'] = data.loc[data.index[i-1],'path']+0.1
    else:
        data.loc[data.index[i],'path'] = data.loc[data.index[i-1],'path']-0.1

  for x in range(0,len(dci)-1):

    for i in range(dci[x],dci[x+1]-2):
        data.loc[dci[x],'path'] = 0
        if (data.loc[data.index[i],'DPMA']+data.loc[data.index[i+1],'DPMA']
            +data.loc[data.index[i+2],'DPMA'])/3 > data.loc[data.index[i-1],'DPMA']:
            data.loc[data.index[i],'path'] = data.loc[data.index[i-1],'path']+0.1
        else:
            data.loc[data.index[i],'path'] = data.loc[data.index[i-1],'path']-0.1

    for i in range(dci[x+1]-2,dci[x+1]):
        if (data.loc[data.index[i],'DPMA']+data.loc[data.index[i-1],'DPMA']
            +data.loc[data.index[i-2],'DPMA'])/3 < data.loc[data.index[i-1],'DPMA']:
            data.loc[data.index[i],'path'] = data.loc[data.index[i-1],'path']+0.1
        else:
            data.loc[data.index[i],'path'] = data.loc[data.index[i-1],'path']-0.1

  for i in range(dci[-1],len(data)-2):
    data.loc[dci[-1],'path'] = 0
    if (data.loc[data.index[i],'DPMA']+data.loc[data.index[i+1],'DPMA']
        +data.loc[data.index[i+2],'DPMA'])/3 > data.loc[data.index[i-1],'DPMA']:
        data.loc[data.index[i],'path'] = data.loc[data.index[i-1],'path']+0.1
    else:
        data.loc[data.index[i],'path'] = data.loc[data.index[i-1],'path']-0.1

  for i in range(len(data)-2,len(data)):
    if (data.loc[data.index[i],'DPMA']+data.loc[data.index[i-1],'DPMA']
        +data.loc[data.index[i-2],'DPMA'])/3 < data.loc[data.index[i-1],'DPMA']:
        data.loc[data.index[i],'path'] = data.loc[data.index[i-1],'path']+0.1
    else:
        data.loc[data.index[i],'path'] = data.loc[data.index[i-1],'path']-0.1
    
  # bipath: show binary intervals of increasing and decreasing
  for i in range(0,dci[0]-2):
    data.loc[0,'bipath'] = 0
    if (data.loc[data.index[i],'DPMA']+data.loc[data.index[i+1],'DPMA']
        +data.loc[data.index[i+2],'DPMA'])/3 > data.loc[data.index[i-1],'DPMA']:
        data.loc[data.index[i],'bipath'] = 1
    else:
        data.loc[data.index[i],'bipath'] = 0

  for i in range(dci[0]-2,dci[0]):
    if (data.loc[data.index[i],'DPMA']+data.loc[data.index[i-1],'DPMA']
          +data.loc[data.index[i-2],'DPMA'])/3 < data.loc[data.index[i-1],'DPMA']:
        data.loc[data.index[i],'bipath'] = 1
    else:
        data.loc[data.index[i],'bipath'] = 0

  for x in range(0,len(dci)-1):

    for i in range(dci[x],dci[x+1]-2):
        data.loc[dci[x],'bipath'] = 0
        if (data.loc[data.index[i],'DPMA']+data.loc[data.index[i+1],'DPMA']
            +data.loc[data.index[i+2],'DPMA'])/3 > data.loc[data.index[i-1],'DPMA']:
            data.loc[data.index[i],'bipath'] = 1
        else:
            data.loc[data.index[i],'bipath'] = 0

    for i in range(dci[x+1]-2,dci[x+1]):
        if (data.loc[data.index[i],'DPMA']+data.loc[data.index[i-1],'DPMA']
            +data.loc[data.index[i-2],'DPMA'])/3 < data.loc[data.index[i-1],'DPMA']:
            data.loc[data.index[i],'bipath'] = 1
        else:
            data.loc[data.index[i],'bipath'] = 0

  for i in range(dci[-1],len(data)-2):
    data.loc[dci[-1],'bipath'] = 0
    if (data.loc[data.index[i],'DPMA']+data.loc[data.index[i+1],'DPMA']
        +data.loc[data.index[i+2],'DPMA'])/3 > data.loc[data.index[i-1],'DPMA']:
        data.loc[data.index[i],'bipath'] = 1
    else:
        data.loc[data.index[i],'bipath'] = 0

  for i in range(len(data)-2,len(data)):
    if (data.loc[data.index[i],'DPMA']+data.loc[data.index[i-1],'DPMA']
        +data.loc[data.index[i-2],'DPMA'])/3 < data.loc[data.index[i-1],'DPMA']:
        data.loc[data.index[i],'bipath'] = 1
    else:
        data.loc[data.index[i],'bipath'] = 0

  # Slope: show slope of DPMA 
  data.loc[0,'Slope'] = 0
  for i in range(1,len(data)):
    if i in dci:
        data.loc[data.index[i],'Slope'] = 0
    else:
        data.loc[data.index[i],'Slope'] = data['DPMA'][i].round(rn) - data['DPMA'][i-1].round(rn)
        
  # Slope trailing: show slope of DPMA_t
  data.loc[0,'Slope_t'] = 0
  for i in range(1,len(data)):
    if i in dci:
        data.loc[data.index[i],'Slope_t'] = 0
    else:
        data.loc[data.index[i],'Slope_t'] = data['DPMA_t'][i].round(rn) - data['DPMA_t'][i-1].round(rn)

  #Slope Moving Average: smooth out the data of Slope
  column = 'Slope'
  for i in range(0,5):
    data.loc[data.index[i],'SMA'] = data.loc[data.index[i],column]
  for i in range(5,dci[0]-5):
    data.loc[data.index[i],'SMA'] = np.round(((  data.loc[data.index[i],column]
                                                  + data.loc[data.index[i+1],column] 
                                                  + data.loc[data.index[i+2],column]
                                                  + data.loc[data.index[i+3],column] 
                                                  + data.loc[data.index[i+4],column]
                                                  + data.loc[data.index[i+5],column] 
                                                  + data.loc[data.index[i-1],column] 
                                                  + data.loc[data.index[i-2],column]
                                                  + data.loc[data.index[i-3],column] 
                                                  + data.loc[data.index[i-4],column]
                                                  + data.loc[data.index[i-5],column])
                                                  /11),rn)
  for i in range(dci[0]-5,dci[0]):
      data.loc[data.index[i],'SMA'] = data.loc[data.index[i],column]

  for i in range(dci[0],dci[0]+5):
    data.loc[data.index[i],'SMA'] = data.loc[data.index[i],column]
  for x in range(0,len(dci)-1):
    for i in range(dci[x]+5,dci[x+1]-5):
        data.loc[data.index[i],'SMA'] = np.round(((  data.loc[data.index[i],column]
                                                  + data.loc[data.index[i+1],column] 
                                                  + data.loc[data.index[i+2],column]
                                                  + data.loc[data.index[i+3],column] 
                                                  + data.loc[data.index[i+4],column]
                                                  + data.loc[data.index[i+5],column] 
                                                  + data.loc[data.index[i-1],column] 
                                                  + data.loc[data.index[i-2],column]
                                                  + data.loc[data.index[i-3],column] 
                                                  + data.loc[data.index[i-4],column]
                                                  + data.loc[data.index[i-5],column])
                                                  /11),rn)
    for i in range(dci[x+1]-5,dci[x+1]+5):
        data.loc[data.index[i],'SMA'] = data.loc[data.index[i],column]

  for i in range(dci[-1]+5, len(data)-5):
    data.loc[data.index[i],'SMA'] = np.round(((  data.loc[data.index[i],column]
                                                  + data.loc[data.index[i+1],column] 
                                                  + data.loc[data.index[i+2],column]
                                                  + data.loc[data.index[i+3],column] 
                                                  + data.loc[data.index[i+4],column]
                                                  + data.loc[data.index[i+5],column] 
                                                  + data.loc[data.index[i-1],column] 
                                                  + data.loc[data.index[i-2],column]
                                                  + data.loc[data.index[i-3],column] 
                                                  + data.loc[data.index[i-4],column]
                                                  + data.loc[data.index[i-5],column])
                                                  /11),rn)
  for i in range(len(data)-5,len(data)):
        data.loc[data.index[i],'SMA'] = data.loc[data.index[i],column]

  #Slope Moving Average trailing: smooth out the values of trailing slope
  column = 'Slope_t'
  newcolumn = 'SMA_t'

  for i in range(0,5):
    data.loc[data.index[i],newcolumn] = data.loc[data.index[i],column]
  for i in range(5,dci[0]):
    data.loc[data.index[i],newcolumn] = np.round(((  data.loc[data.index[i],column]
                                                  + data.loc[data.index[i-1],column] 
                                                  + data.loc[data.index[i-2],column]
                                                  + data.loc[data.index[i-3],column] 
                                                  + data.loc[data.index[i-4],column]
                                                  + data.loc[data.index[i-5],column])
                                                  /6),rn)

  for i in range(dci[0],dci[0]+5):
    data.loc[data.index[i],newcolumn] = data.loc[data.index[i],column]
    
  for x in range(0,len(dci)-1):
    for i in range(dci[x]+5,dci[x+1]-5):
        data.loc[data.index[i],newcolumn] = np.round(((  data.loc[data.index[i],column] 
                                                  + data.loc[data.index[i-1],column] 
                                                  + data.loc[data.index[i-2],column]
                                                  + data.loc[data.index[i-3],column] 
                                                  + data.loc[data.index[i-4],column]
                                                  + data.loc[data.index[i-5],column])
                                                  /6),rn)
    
    for i in range(dci[x+1]-5,dci[x+1]):
        data.loc[data.index[i],newcolumn] = np.round(((  data.loc[data.index[i],column]
                                                  + data.loc[data.index[i-1],column] 
                                                  + data.loc[data.index[i-2],column]
                                                  + data.loc[data.index[i-3],column] 
                                                  + data.loc[data.index[i-4],column]
                                                  + data.loc[data.index[i-5],column] )
                                                  /6),rn)
        
    for i in range(dci[x+1],dci[x+1]+5):
        data.loc[data.index[i],newcolumn] = data.loc[data.index[i],column]

    
    
  for i in range(dci[-1]+5, len(data)-5):
    data.loc[data.index[i],newcolumn] = np.round(((  data.loc[data.index[i],column] 
                                                  + data.loc[data.index[i-1],column] 
                                                  + data.loc[data.index[i-2],column]
                                                  + data.loc[data.index[i-3],column] 
                                                  + data.loc[data.index[i-4],column]
                                                  + data.loc[data.index[i-5],column])
                                                  /6),rn)
    
  for i in range(len(data)-6,len(data)): ## THIS WAS CHANGED TO 6 INSTEAD OF 5 BECAUSE END OF LAST DAY SHOWS 16:00:00
        data.loc[data.index[i],newcolumn] = np.round(((  data.loc[data.index[i],column]
                                                  + data.loc[data.index[i-1],column] 
                                                  + data.loc[data.index[i-2],column]
                                                  + data.loc[data.index[i-3],column] 
                                                  + data.loc[data.index[i-4],column]
                                                  + data.loc[data.index[i-5],column] )
                                                  /6),rn)

        
  # Conc (Concavity): show slope of Slope  
  data.loc[0,'Conc'] = 0
  for i in range(1,len(data)):
    if i in dci:
        data.loc[data.index[i],'Conc'] = 0
    else:
        data.loc[data.index[i],'Conc'] = data['SMA'][i].round(rn) - data['SMA'][i-1].round(rn)

  # Conc trailing: show slope of Slope_t
  data.loc[0,'Conc_t'] = 0
  for i in range(1,len(data)):
    if i in dci:
        data.loc[data.index[i],'Conc_t'] = 0
    else:
        data.loc[data.index[i],'Conc_t'] = data['Slope_t'][i].round(rn) - data['Slope_t'][i-1].round(rn)

  #Conc Moving Average: smooth out the data of Conc
  column = 'Conc'
  for i in range(0,5):
    data.loc[data.index[i],'CMA'] = data.loc[data.index[i],column]
  for i in range(5,dci[0]-5):
    data.loc[data.index[i],'CMA'] = np.round(((  data.loc[data.index[i],column]
                                                  + data.loc[data.index[i+1],column] 
                                                  + data.loc[data.index[i+2],column]
                                                  + data.loc[data.index[i+3],column] 
                                                  + data.loc[data.index[i+4],column]
                                                  + data.loc[data.index[i+5],column] 
                                                  + data.loc[data.index[i-1],column] 
                                                  + data.loc[data.index[i-2],column]
                                                  + data.loc[data.index[i-3],column] 
                                                  + data.loc[data.index[i-4],column]
                                                  + data.loc[data.index[i-5],column])
                                                  /11),rn)
  for i in range(dci[0]-5,dci[0]):
      data.loc[data.index[i],'CMA'] = data.loc[data.index[i],column]

  for i in range(dci[0],dci[0]+5):
    data.loc[data.index[i],'CMA'] = data.loc[data.index[i],column]
        
  for x in range(0,len(dci)-1):
    for i in range(dci[x]+5,dci[x+1]-5):
        data.loc[data.index[i],'CMA'] = np.round(((  data.loc[data.index[i],column]
                                                  + data.loc[data.index[i+1],column] 
                                                  + data.loc[data.index[i+2],column]
                                                  + data.loc[data.index[i+3],column] 
                                                  + data.loc[data.index[i+4],column]
                                                  + data.loc[data.index[i+5],column] 
                                                  + data.loc[data.index[i-1],column] 
                                                  + data.loc[data.index[i-2],column]
                                                  + data.loc[data.index[i-3],column] 
                                                  + data.loc[data.index[i-4],column]
                                                  + data.loc[data.index[i-5],column])
                                                  /11),rn)
    for i in range(dci[x+1]-5,dci[x+1]+5):
        data.loc[data.index[i],'CMA'] = data.loc[data.index[i],column]

  for i in range(dci[-1]+5, len(data)-5):
    data.loc[data.index[i],'CMA'] = np.round(((  data.loc[data.index[i],column]
                                                  + data.loc[data.index[i+1],column] 
                                                  + data.loc[data.index[i+2],column]
                                                  + data.loc[data.index[i+3],column] 
                                                  + data.loc[data.index[i+4],column]
                                                  + data.loc[data.index[i+5],column] 
                                                  + data.loc[data.index[i-1],column] 
                                                  + data.loc[data.index[i-2],column]
                                                  + data.loc[data.index[i-3],column] 
                                                  + data.loc[data.index[i-4],column]
                                                  + data.loc[data.index[i-5],column])
                                                  /11),rn)
  for i in range(len(data)-5,len(data)):
        data.loc[data.index[i],'CMA'] = data.loc[data.index[i],column]                

  #Conc Moving Average trailing: smooth out the data of SMA
  column = 'Conc_t'
  newcolumn = 'CMA_t'

  for i in range(0,5):
    data.loc[data.index[i],newcolumn] = data.loc[data.index[i],column]
  for i in range(5,dci[0]):
    data.loc[data.index[i],newcolumn] = np.round(((  data.loc[data.index[i],column]
                                                  + data.loc[data.index[i-1],column] 
                                                  + data.loc[data.index[i-2],column]
                                                  + data.loc[data.index[i-3],column] 
                                                  + data.loc[data.index[i-4],column]
                                                  + data.loc[data.index[i-5],column])
                                                  /6),rn)

  for i in range(dci[0],dci[0]+5):
    data.loc[data.index[i],newcolumn] = data.loc[data.index[i],column]
    
  for x in range(0,len(dci)-1):
    for i in range(dci[x]+5,dci[x+1]-5):
        data.loc[data.index[i],newcolumn] = np.round(((  data.loc[data.index[i],column] 
                                                  + data.loc[data.index[i-1],column] 
                                                  + data.loc[data.index[i-2],column]
                                                  + data.loc[data.index[i-3],column] 
                                                  + data.loc[data.index[i-4],column]
                                                  + data.loc[data.index[i-5],column])
                                                  /6),rn)
    

    for i in range(dci[x+1]-5,dci[x+1]):
        data.loc[data.index[i],newcolumn] = np.round(((  data.loc[data.index[i],column]
                                                  + data.loc[data.index[i-1],column] 
                                                  + data.loc[data.index[i-2],column]
                                                  + data.loc[data.index[i-3],column] 
                                                  + data.loc[data.index[i-4],column]
                                                  + data.loc[data.index[i-5],column] )
                                                  /6),rn)
        
    for i in range(dci[x+1],dci[x+1]+5):
        data.loc[data.index[i],newcolumn] = data.loc[data.index[i],column]

    
    
  for i in range(dci[-1]+5, len(data)-5):
    data.loc[data.index[i],newcolumn] = np.round(((  data.loc[data.index[i],column] 
                                                  + data.loc[data.index[i-1],column] 
                                                  + data.loc[data.index[i-2],column]
                                                  + data.loc[data.index[i-3],column] 
                                                  + data.loc[data.index[i-4],column]
                                                  + data.loc[data.index[i-5],column])
                                                  /6),rn)
    
  for i in range(len(data)-6,len(data)): ## THIS WAS CHANGED TO 6 INSTEAD OF 5 BECAUSE END OF LAST DAY SHOWS 16:00:00
        data.loc[data.index[i],newcolumn] = np.round(((  data.loc[data.index[i],column]
                                                  + data.loc[data.index[i-1],column] 
                                                  + data.loc[data.index[i-2],column]
                                                  + data.loc[data.index[i-3],column] 
                                                  + data.loc[data.index[i-4],column]
                                                  + data.loc[data.index[i-5],column] )
                                                  /6),rn)

  # RELATIVE MIN: binary to determine if each time increment represents a relative minimum of price
  for i in range(1,len(data)-2):
    if i in dci:
        data.loc[data.index[i],'rel_min'] = 0
    else:
        if data.loc[data.index[i-1],'SMA'] <= 0 and data.loc[data.index[i],'SMA'] <= 0 and data.loc[data.index[i+1],'SMA'] > 0 and data.loc[data.index[i+2],'SMA'] > 0:
            data.loc[data.index[i],'rel_min'] = 1
        else:
            data.loc[data.index[i],'rel_min'] = 0

  # SUM RELATIVE MINS: accumulation of number of relative mins up until each time increment
  data.loc[data.index[0],'num_mins'] = 0
  for i in range(0,len(data)):
    if i in dci:
        data.loc[data.index[i],'num_mins'] = 0
    elif data.loc[data.index[i],'rel_min'] == 0:
        data.loc[data.index[i],'num_mins'] = data.loc[data.index[i-1],'num_mins']
    elif data.loc[data.index[i],'rel_min'] == 1:
        data.loc[data.index[i],'num_mins'] = data.loc[data.index[i-1],'num_mins'] + 1
    else:
        continue


  # RELATIVE MAX: binary to determine if each time increment represents a relative maximum of price
  for i in range(0,len(data)-2): # note minus 1 because last day shows time 16:00:00
    if i in dci:
        data.loc[data.index[i],'rel_max'] = 0
    else:
        if data.loc[data.index[i-1],'SMA'] >= 0 and data.loc[data.index[i],'SMA'] >= 0 and data.loc[data.index[i+1],'SMA'] < 0 and data.loc[data.index[i+2],'SMA'] < 0:
            data.loc[data.index[i],'rel_max'] = 1
        else:
            data.loc[data.index[i],'rel_max'] = 0

  # SUM RELATIVE MAX: accumulation of number of relative maxes up until each time increment
  data.loc[data.index[0],'num_max'] = 0
  for i in range(1,len(data)):
    if i in dci:
        data.loc[data.index[i],'num_max'] = 0
    elif data.loc[data.index[i],'rel_max'] == 0:
        data.loc[data.index[i],'num_max'] = data.loc[data.index[i-1],'num_max']
    elif data.loc[data.index[i],'rel_max'] == 1:
        data.loc[data.index[i],'num_max'] = data.loc[data.index[i-1],'num_max'] + 1
    else:
        continue            

  # Rise first 5 min: did the stock rise in the first 5 minutes of the day?
  for i in range(0,dci[0]):
    if data.loc[data.index[1],'DPMA'] > data.loc[data.index[0],'DPMA']:
      data.loc[data.index[0:dci[0]],'rise_5'] = 1
    else:
      data.loc[data.index[0:dci[0]],'rise_5'] = 0

  for i in range(0,len(dci)-1):
    if data.loc[data.index[dci[i]+1],'DPMA'] > data.loc[data.index[dci[i]],'DPMA']:
      data.loc[data.index[dci[i]:dci[i+1]],'rise_5'] = 1
    else:
      data.loc[data.index[dci[i]:dci[i+1]],'rise_5'] = 0

  for i in range(dci[-1],len(data)):
    if data.loc[data.index[dci[-1]+1],'DPMA'] > data.loc[data.index[dci[-1]],'DPMA']:
      data.loc[data.index[dci[-1]:len(data)],'rise_5'] = 1
    else:
      data.loc[data.index[dci[-1]:len(data)],'rise_5'] = 0

  # Rise first 10 min: did the stock rise in the first 10 minutes of the day?
  for i in range(0,dci[0]):
    if data.loc[data.index[2],'DPMA'] > data.loc[data.index[0],'DPMA']:
      data.loc[data.index[0:dci[0]],'rise_10'] = 1
    else:
      data.loc[data.index[0:dci[0]],'rise_10'] = 0

  for i in range(0,len(dci)-1):
    if data.loc[data.index[dci[i]+2],'DPMA'] > data.loc[data.index[dci[i]],'DPMA']:
      data.loc[data.index[dci[i]:dci[i+1]],'rise_10'] = 1
    else:
      data.loc[data.index[dci[i]:dci[i+1]],'rise_10'] = 0

  for i in range(dci[-1],len(data)):
    if data.loc[data.index[dci[-1]+2],'DPMA'] > data.loc[data.index[dci[-1]],'DPMA']:
      data.loc[data.index[dci[-1]:len(data)],'rise_10'] = 1
    else:
      data.loc[data.index[dci[-1]:len(data)],'rise_10'] = 0

  # Rise first 15 min: did the stock rise in the first 15 minutes of the day?
  for i in range(0,dci[0]):
    if data.loc[data.index[3],'DPMA'] > data.loc[data.index[0],'DPMA']:
      data.loc[data.index[0:dci[0]],'rise_15'] = 1
    else:
      data.loc[data.index[0:dci[0]],'rise_15'] = 0

  for i in range(0,len(dci)-1):
    if data.loc[data.index[dci[i]+3],'DPMA'] > data.loc[data.index[dci[i]],'DPMA']:
      data.loc[data.index[dci[i]:dci[i+1]],'rise_15'] = 1
    else:
      data.loc[data.index[dci[i]:dci[i+1]],'rise_15'] = 0

  for i in range(dci[-1],len(data)):
    if data.loc[data.index[dci[-1]+3],'DPMA'] > data.loc[data.index[dci[-1]],'DPMA']:
      data.loc[data.index[dci[-1]:len(data)],'rise_15'] = 1
    else:
      data.loc[data.index[dci[-1]:len(data)],'rise_15'] = 0

  # Rise first 30 min: did the stock rise in the first 30 minutes of the day?
  for i in range(0,dci[0]):
    if data.loc[data.index[6],'DPMA'] > data.loc[data.index[0],'DPMA']:
      data.loc[data.index[0:dci[0]],'rise_30'] = 1
    else:
      data.loc[data.index[0:dci[0]],'rise_30'] = 0

  for i in range(0,len(dci)-1):
    if data.loc[data.index[dci[i]+6],'DPMA'] > data.loc[data.index[dci[i]],'DPMA']:
      data.loc[data.index[dci[i]:dci[i+1]],'rise_30'] = 1
    else:
      data.loc[data.index[dci[i]:dci[i+1]],'rise_30'] = 0

  for i in range(dci[-1],len(data)):
    if data.loc[data.index[dci[-1]+6],'DPMA'] > data.loc[data.index[dci[-1]],'DPMA']:
      data.loc[data.index[dci[-1]:len(data)],'rise_30'] = 1
    else:
      data.loc[data.index[dci[-1]:len(data)],'rise_30'] = 0

  # Previous Percentage Close: create column where every row displays the previous percentage close
  col_name = 'PPC'
  previous_how_many_days = 1
  days1 = previous_how_many_days +1
  today_date = dt.datetime.strptime(data.loc[data.index[0], 'Date'], '%Y-%m-%d')
  start_date1 = date_by_subtracting_business_days(today_date, days1)
  start_date2 = date_by_subtracting_business_days(today_date, (days1-1))
  close1 = date_data[date_data['Date_str'] == start_date1]['Adj Close'].iloc[0]
  close2 = date_data[date_data['Date_str'] == start_date2]['Adj Close'].iloc[0]
  data.loc[data.index[0],col_name] = (1-(close1 / close2))*100
  for i in range(1,len(data)):
    if i in dci:
        today_date = dt.datetime.strptime(data.loc[data.index[i], 'Date'], '%Y-%m-%d')
        start_date1 = date_by_subtracting_business_days(today_date, days1)
        start_date2 = date_by_subtracting_business_days(today_date, (days1-1))
        close1 = date_data[date_data['Date_str'] == start_date1]['Adj Close'].iloc[0]
        close2 = date_data[date_data['Date_str'] == start_date2]['Adj Close'].iloc[0]
        data.loc[data.index[i],col_name] = (1 - (close1 / close2))*100
    else:
        data.loc[data.index[i],col_name] = data.loc[data.index[i-1],col_name]

  # Previous 2 Percentage Close: create column where every row displays the previous percentage close
  col_name = 'P2PC'
  previous_how_many_days = 2
  days1 = previous_how_many_days +1
  today_date = dt.datetime.strptime(data.loc[data.index[0], 'Date'], '%Y-%m-%d')
  start_date1 = date_by_subtracting_business_days(today_date, days1)
  start_date2 = date_by_subtracting_business_days(today_date, (days1-1))
  close1 = date_data[date_data['Date_str'] == start_date1]['Adj Close'].iloc[0]
  close2 = date_data[date_data['Date_str'] == start_date2]['Adj Close'].iloc[0]
  data.loc[data.index[0],col_name] = (1 - (close1 / close2))*100
  for i in range(1,len(data)):
    if i in dci:
        today_date = dt.datetime.strptime(data.loc[data.index[i], 'Date'], '%Y-%m-%d')
        start_date1 = date_by_subtracting_business_days(today_date, days1)
        start_date2 = date_by_subtracting_business_days(today_date, (days1-1))
        close1 = date_data[date_data['Date_str'] == start_date1]['Adj Close'].iloc[0]
        close2 = date_data[date_data['Date_str'] == start_date2]['Adj Close'].iloc[0]
        data.loc[data.index[i],col_name] = (1 - (close1 / close2))*100
    else:
        data.loc[data.index[i],col_name] = data.loc[data.index[i-1],col_name] 
     
  # Today Percentage Open: create column where every row displays the previous percentage open
  for i in range(0,dci[0]):
    data.loc[data.index[i],'TPO'] = data['Daily Percent'][0].round(2)
    
  for x in range(0,len(dci)-1):
    for y in range(dci[x],dci[x+1]):
        data.loc[data.index[y],'TPO'] = data['Daily Percent'][dci[x]+0].round(2)
        
  for i in range(dci[-1],len(data)):
    data.loc[data.index[i],'TPO'] = data['Daily Percent'][dci[-1]+0].round(2)
    
  # Previous Percentage Open: create column where every row displays the previous percentage open
  col_name = 'PPO'
  previous_how_many_days = 1
  days1 = previous_how_many_days +1
  today_date = dt.datetime.strptime(data.loc[data.index[0], 'Date'], '%Y-%m-%d')
  start_date1 = date_by_subtracting_business_days(today_date, days1)
  start_date2 = date_by_subtracting_business_days(today_date, (days1-1))
  close1 = date_data[date_data['Date_str'] == start_date1]['Adj Close'].iloc[0]
  open2 = date_data[date_data['Date_str'] == start_date2]['Open'].iloc[0]
  data.loc[data.index[0],col_name] = (1 - (close1 / open2))*100
  for i in range(1,len(data)):
    if i in dci:
        today_date = dt.datetime.strptime(data.loc[data.index[i], 'Date'], '%Y-%m-%d')
        start_date1 = date_by_subtracting_business_days(today_date, days1)
        start_date2 = date_by_subtracting_business_days(today_date, (days1-1))
        close1 = date_data[date_data['Date_str'] == start_date1]['Adj Close'].iloc[0]
        open2 = date_data[date_data['Date_str'] == start_date2]['Open'].iloc[0]
        data.loc[data.index[i],col_name] = (1 - (close1 / open2))*100
    else:
        data.loc[data.index[i],col_name] = data.loc[data.index[i-1],col_name]

  # Previous 2 Percentage Open: create column where every row displays the previous percentage open
  col_name = 'P2PO'
  previous_how_many_days = 2
  days1 = previous_how_many_days +1
  today_date = dt.datetime.strptime(data.loc[data.index[0], 'Date'], '%Y-%m-%d')
  start_date1 = date_by_subtracting_business_days(today_date, days1)
  start_date2 = date_by_subtracting_business_days(today_date, (days1-1))
  close1 = date_data[date_data['Date_str'] == start_date1]['Adj Close'].iloc[0]
  open2 = date_data[date_data['Date_str'] == start_date2]['Open'].iloc[0]
  data.loc[data.index[0],col_name] = (1 - (close1 / open2))*100
  for i in range(1,len(data)):
    if i in dci:
        today_date = dt.datetime.strptime(data.loc[data.index[i], 'Date'], '%Y-%m-%d')
        start_date1 = date_by_subtracting_business_days(today_date, days1)
        start_date2 = date_by_subtracting_business_days(today_date, (days1-1))
        close1 = date_data[date_data['Date_str'] == start_date1]['Adj Close'].iloc[0]
        open2 = date_data[date_data['Date_str'] == start_date2]['Open'].iloc[0]
        data.loc[data.index[i],col_name] = (1 - (close1 / open2))*100
    else:
        data.loc[data.index[i],col_name] = data.loc[data.index[i-1],col_name]
 
  # Current Min of day: min price percentage up until time increment
  daylist = []
  daylist.append(data.loc[data.index[0],'Daily Percent'])
  data.loc[data.index[0],'day_min'] = data.loc[data.index[0],'Daily Percent']
  for i in range(1,len(data)-1):
    if data.loc[data.index[i],'Date'] == data.loc[data.index[i-1],'Date']:
        daylist.append(data.loc[data.index[i],'Daily Percent'])
        data.loc[data.index[i],'day_min'] = min(daylist)
    else:
        daylist = []
        daylist.append(data.loc[data.index[i],'Daily Percent'])
        data.loc[data.index[i],'day_min'] = min(daylist)

        
  # Current Max of day: max price percentage up until time increment
  daylist = []
  daylist.append(data.loc[data.index[0],'Daily Percent'])
  data.loc[data.index[0],'day_max'] = data.loc[data.index[0],'Daily Percent']
  for i in range(1,len(data)-1):
    if data.loc[data.index[i],'Date'] == data.loc[data.index[i-1],'Date']:
        daylist.append(data.loc[data.index[i],'Daily Percent'])
        data.loc[data.index[i],'day_max'] = max(daylist)
    else:
        daylist = []
        daylist.append(data.loc[data.index[i],'Daily Percent'])
        data.loc[data.index[i],'day_max'] = max(daylist)

  # Current Percent Between Min and Max: relative to the amplitude of the day, where does the current price fall? lower 10%? upper 15%?
  data['day_pw'] = (data['Daily Percent'] - data['day_min']) / (data['day_max'] - data['day_min'])

  # Current Percent Between Min and Max, moving average, trailing
  data['day_pwma_t'] = (data['DPMA_t'] - data['day_min']) / (data['day_max'] - data['day_min'])
        
  # Current Amplitude
  for i in range(0,len(data)):
    data.loc[data.index[i],'day_amp'] = data.loc[data.index[i],'day_max'] - data.loc[data.index[i],'day_min']



  ## END OF FEATURE COLUMNS
  ## START OF TARGET COLUMNS



  # 0.3p_inc TARGET VARIABLE: IS THERE 0.3% INCREASE BETWEEN TIME AND REST OF DAY    
  col = '0.3p_inc'
  p = 0.3

  daylist = []
  for i in range(0,dci[0]):
    daylist.append(data.loc[data.index[i],'Daily Percent'])

  for i in range(0,dci[0]):
    n = int(data.loc[data.index[i],'Interval'] - 1)
    if max(daylist[n:]) >= (data.loc[data.index[i],'Daily Percent'] + p):
        data.loc[data.index[i],col] = 1
    else:
        data.loc[data.index[i],col] = 0
        
  for x in range(0,len(dci)-1):
    daylist = []
    for y in range(dci[x],dci[x+1]):
        daylist.append(data.loc[data.index[y],'Daily Percent'])
        
    for y in range(dci[x],dci[x+1]):
        n = int(data.loc[data.index[y],'Interval'] - 1)
        if max(daylist[n:]) >= (data.loc[data.index[y],'Daily Percent'] + p):
            data.loc[data.index[y],col] = 1
        else:
            data.loc[data.index[y],col] = 0

  daylist = []        
  for i in range(dci[-1],len(data)):
    daylist.append(data.loc[data.index[i],'Daily Percent'])

  for i in range(dci[-1],len(data)):
    n = int(data.loc[data.index[i],'Interval'] - 1)
    if max(daylist[n:]) >= (data.loc[data.index[i],'Daily Percent'] + p):
        data.loc[data.index[i],col] = 1
    else:
        data.loc[data.index[i],col] = 0

  # 0.4p_inc TARGET VARIABLE: IS THERE 0.4% INCREASE BETWEEN TIME AND REST OF DAY    
  col = '0.4p_inc'
  p = 0.4

  daylist = []
  for i in range(0,dci[0]):
    daylist.append(data.loc[data.index[i],'Daily Percent'])

  for i in range(0,dci[0]):
    n = int(data.loc[data.index[i],'Interval'] - 1)
    if max(daylist[n:]) >= (data.loc[data.index[i],'Daily Percent'] + p):
        data.loc[data.index[i],col] = 1
    else:
        data.loc[data.index[i],col] = 0

        
  for x in range(0,len(dci)-1):
    daylist = []
    for y in range(dci[x],dci[x+1]):
        daylist.append(data.loc[data.index[y],'Daily Percent'])
        
    for y in range(dci[x],dci[x+1]):
        n = int(data.loc[data.index[y],'Interval'] - 1)
        if max(daylist[n:]) >= (data.loc[data.index[y],'Daily Percent'] + p):
            data.loc[data.index[y],col] = 1
        else:
            data.loc[data.index[y],col] = 0

  daylist = []        
  for i in range(dci[-1],len(data)):
    daylist.append(data.loc[data.index[i],'Daily Percent'])

  for i in range(dci[-1],len(data)):
    n = int(data.loc[data.index[i],'Interval'] - 1)
    if max(daylist[n:]) >= (data.loc[data.index[i],'Daily Percent'] + p):
        data.loc[data.index[i],col] = 1
    else:
        data.loc[data.index[i],col] = 0

  # 0.5p_inc TARGET VARIABLE: IS THERE 0.5% INCREASE BETWEEN TIME AND REST OF DAY    
  col = '0.5p_inc'
  p = 0.5

  daylist = []
  for i in range(0,dci[0]):
    daylist.append(data.loc[data.index[i],'Daily Percent'])

  for i in range(0,dci[0]):
    n = int(data.loc[data.index[i],'Interval'] - 1)
    if max(daylist[n:]) >= (data.loc[data.index[i],'Daily Percent'] + p):
        data.loc[data.index[i],col] = 1
    else:
        data.loc[data.index[i],col] = 0

        
  for x in range(0,len(dci)-1):
    daylist = []
    for y in range(dci[x],dci[x+1]):
        daylist.append(data.loc[data.index[y],'Daily Percent'])
        
    for y in range(dci[x],dci[x+1]):
        n = int(data.loc[data.index[y],'Interval'] - 1)
        if max(daylist[n:]) >= (data.loc[data.index[y],'Daily Percent'] + p):
            data.loc[data.index[y],col] = 1
        else:
            data.loc[data.index[y],col] = 0

  daylist = []        
  for i in range(dci[-1],len(data)):
    daylist.append(data.loc[data.index[i],'Daily Percent'])

  for i in range(dci[-1],len(data)):
    n = int(data.loc[data.index[i],'Interval'] - 1)
    if max(daylist[n:]) >= (data.loc[data.index[i],'Daily Percent'] + p):
        data.loc[data.index[i],col] = 1
    else:
        data.loc[data.index[i],col] = 0    

  # 0.6p_inc TARGET VARIABLE: IS THERE 0.6% INCREASE BETWEEN TIME AND REST OF DAY    
  col = '0.6p_inc'
  p = 0.6

  daylist = []
  for i in range(0,dci[0]):
    daylist.append(data.loc[data.index[i],'Daily Percent'])

  for i in range(0,dci[0]):
    n = int(data.loc[data.index[i],'Interval'] - 1)
    if max(daylist[n:]) >= (data.loc[data.index[i],'Daily Percent'] + p):
        data.loc[data.index[i],col] = 1
    else:
        data.loc[data.index[i],col] = 0

        
  for x in range(0,len(dci)-1):
    daylist = []
    for y in range(dci[x],dci[x+1]):
        daylist.append(data.loc[data.index[y],'Daily Percent'])
        
    for y in range(dci[x],dci[x+1]):
        n = int(data.loc[data.index[y],'Interval'] - 1)
        if max(daylist[n:]) >= (data.loc[data.index[y],'Daily Percent'] + p):
            data.loc[data.index[y],col] = 1
        else:
            data.loc[data.index[y],col] = 0

  daylist = []        
  for i in range(dci[-1],len(data)):
    daylist.append(data.loc[data.index[i],'Daily Percent'])

  for i in range(dci[-1],len(data)):
    n = int(data.loc[data.index[i],'Interval'] - 1)
    if max(daylist[n:]) >= (data.loc[data.index[i],'Daily Percent'] + p):
        data.loc[data.index[i],col] = 1
    else:
        data.loc[data.index[i],col] = 0

  # 0.7p_inc TARGET VARIABLE: IS THERE 0.7% INCREASE BETWEEN TIME AND REST OF DAY    
  col = '0.7p_inc'
  p = 0.7

  daylist = []
  for i in range(0,dci[0]):
    daylist.append(data.loc[data.index[i],'Daily Percent'])

  for i in range(0,dci[0]):
    n = int(data.loc[data.index[i],'Interval'] - 1)
    if max(daylist[n:]) >= (data.loc[data.index[i],'Daily Percent'] + p):
        data.loc[data.index[i],col] = 1
    else:
        data.loc[data.index[i],col] = 0

        
  for x in range(0,len(dci)-1):
    daylist = []
    for y in range(dci[x],dci[x+1]):
        daylist.append(data.loc[data.index[y],'Daily Percent'])
        
    for y in range(dci[x],dci[x+1]):
        n = int(data.loc[data.index[y],'Interval'] - 1)
        if max(daylist[n:]) >= (data.loc[data.index[y],'Daily Percent'] + p):
            data.loc[data.index[y],col] = 1
        else:
            data.loc[data.index[y],col] = 0

  daylist = []        
  for i in range(dci[-1],len(data)):
    daylist.append(data.loc[data.index[i],'Daily Percent'])

  for i in range(dci[-1],len(data)):
    n = int(data.loc[data.index[i],'Interval'] - 1)
    if max(daylist[n:]) >= (data.loc[data.index[i],'Daily Percent'] + p):
        data.loc[data.index[i],col] = 1
    else:
        data.loc[data.index[i],col] = 0

  # 0.75p_inc TARGET VARIABLE: IS THERE 0.75% INCREASE BETWEEN TIME AND REST OF DAY    
  col = '0.75p_inc'
  p = 0.75

  daylist = []
  for i in range(0,dci[0]):
    daylist.append(data.loc[data.index[i],'Daily Percent'])

  for i in range(0,dci[0]):
    n = int(data.loc[data.index[i],'Interval'] - 1)
    if max(daylist[n:]) >= (data.loc[data.index[i],'Daily Percent'] + p):
        data.loc[data.index[i],col] = 1
    else:
        data.loc[data.index[i],col] = 0

        
  for x in range(0,len(dci)-1):
    daylist = []
    for y in range(dci[x],dci[x+1]):
        daylist.append(data.loc[data.index[y],'Daily Percent'])
        
    for y in range(dci[x],dci[x+1]):
        n = int(data.loc[data.index[y],'Interval'] - 1)
        if max(daylist[n:]) >= (data.loc[data.index[y],'Daily Percent'] + p):
            data.loc[data.index[y],col] = 1
        else:
            data.loc[data.index[y],col] = 0

  daylist = []        
  for i in range(dci[-1],len(data)):
    daylist.append(data.loc[data.index[i],'Daily Percent'])

  for i in range(dci[-1],len(data)):
    n = int(data.loc[data.index[i],'Interval'] - 1)
    if max(daylist[n:]) >= (data.loc[data.index[i],'Daily Percent'] + p):
        data.loc[data.index[i],col] = 1
    else:
        data.loc[data.index[i],col] = 0

  # 0.8p_inc TARGET VARIABLE: IS THERE 0.8% INCREASE BETWEEN TIME AND REST OF DAY    
  col = '0.8p_inc'
  p = 0.8

  daylist = []
  for i in range(0,dci[0]):
    daylist.append(data.loc[data.index[i],'Daily Percent'])

  for i in range(0,dci[0]):
    n = int(data.loc[data.index[i],'Interval'] - 1)
    if max(daylist[n:]) >= (data.loc[data.index[i],'Daily Percent'] + p):
        data.loc[data.index[i],col] = 1
    else:
        data.loc[data.index[i],col] = 0

        
  for x in range(0,len(dci)-1):
    daylist = []
    for y in range(dci[x],dci[x+1]):
        daylist.append(data.loc[data.index[y],'Daily Percent'])
        
    for y in range(dci[x],dci[x+1]):
        n = int(data.loc[data.index[y],'Interval'] - 1)
        if max(daylist[n:]) >= (data.loc[data.index[y],'Daily Percent'] + p):
            data.loc[data.index[y],col] = 1
        else:
            data.loc[data.index[y],col] = 0

  daylist = []        
  for i in range(dci[-1],len(data)):
    daylist.append(data.loc[data.index[i],'Daily Percent'])

  for i in range(dci[-1],len(data)):
    n = int(data.loc[data.index[i],'Interval'] - 1)
    if max(daylist[n:]) >= (data.loc[data.index[i],'Daily Percent'] + p):
        data.loc[data.index[i],col] = 1
    else:
        data.loc[data.index[i],col] = 0

  # 0.9p_inc TARGET VARIABLE: IS THERE 0.9% INCREASE BETWEEN TIME AND REST OF DAY    
  col = '0.9p_inc'
  p = 0.9

  daylist = []
  for i in range(0,dci[0]):
    daylist.append(data.loc[data.index[i],'Daily Percent'])

  for i in range(0,dci[0]):
    n = int(data.loc[data.index[i],'Interval'] - 1)
    if max(daylist[n:]) >= (data.loc[data.index[i],'Daily Percent'] + p):
        data.loc[data.index[i],col] = 1
    else:
        data.loc[data.index[i],col] = 0

        
  for x in range(0,len(dci)-1):
    daylist = []
    for y in range(dci[x],dci[x+1]):
        daylist.append(data.loc[data.index[y],'Daily Percent'])
        
    for y in range(dci[x],dci[x+1]):
        n = int(data.loc[data.index[y],'Interval'] - 1)
        if max(daylist[n:]) >= (data.loc[data.index[y],'Daily Percent'] + p):
            data.loc[data.index[y],col] = 1
        else:
            data.loc[data.index[y],col] = 0

  daylist = []        
  for i in range(dci[-1],len(data)):
    daylist.append(data.loc[data.index[i],'Daily Percent'])

  for i in range(dci[-1],len(data)):
    n = int(data.loc[data.index[i],'Interval'] - 1)
    if max(daylist[n:]) >= (data.loc[data.index[i],'Daily Percent'] + p):
        data.loc[data.index[i],col] = 1
    else:
        data.loc[data.index[i],col] = 0

  # 1p_inc TARGET VARIABLE: IS THERE 1% INCREASE BETWEEN TIME AND REST OF DAY    
  col = '1p_inc'
  p = 1

  daylist = []
  for i in range(0,dci[0]):
    daylist.append(data.loc[data.index[i],'Daily Percent'])

  for i in range(0,dci[0]):
    n = int(data.loc[data.index[i],'Interval'] - 1)
    if max(daylist[n:]) >= (data.loc[data.index[i],'Daily Percent'] + p):
        data.loc[data.index[i],col] = 1
    else:
        data.loc[data.index[i],col] = 0

        
  for x in range(0,len(dci)-1):
    daylist = []
    for y in range(dci[x],dci[x+1]):
        daylist.append(data.loc[data.index[y],'Daily Percent'])
        
    for y in range(dci[x],dci[x+1]):
        n = int(data.loc[data.index[y],'Interval'] - 1)
        if max(daylist[n:]) >= (data.loc[data.index[y],'Daily Percent'] + p):
            data.loc[data.index[y],col] = 1
        else:
            data.loc[data.index[y],col] = 0

  daylist = []        
  for i in range(dci[-1],len(data)):
    daylist.append(data.loc[data.index[i],'Daily Percent'])

  for i in range(dci[-1],len(data)):
    n = int(data.loc[data.index[i],'Interval'] - 1)
    if max(daylist[n:]) >= (data.loc[data.index[i],'Daily Percent'] + p):
        data.loc[data.index[i],col] = 1
    else:
        data.loc[data.index[i],col] = 0

  # 1.25p_inc TARGET VARIABLE: IS THERE 1.25% INCREASE BETWEEN TIME AND REST OF DAY    
  col = '1.25p_inc'
  p = 1.25

  daylist = []
  for i in range(0,dci[0]):
    daylist.append(data.loc[data.index[i],'Daily Percent'])

  for i in range(0,dci[0]):
    n = int(data.loc[data.index[i],'Interval'] - 1)
    if max(daylist[n:]) >= (data.loc[data.index[i],'Daily Percent'] + p):
        data.loc[data.index[i],col] = 1
    else:
        data.loc[data.index[i],col] = 0

        
  for x in range(0,len(dci)-1):
    daylist = []
    for y in range(dci[x],dci[x+1]):
        daylist.append(data.loc[data.index[y],'Daily Percent'])
        
    for y in range(dci[x],dci[x+1]):
        n = int(data.loc[data.index[y],'Interval'] - 1)
        if max(daylist[n:]) >= (data.loc[data.index[y],'Daily Percent'] + p):
            data.loc[data.index[y],col] = 1
        else:
            data.loc[data.index[y],col] = 0

  daylist = []        
  for i in range(dci[-1],len(data)):
    daylist.append(data.loc[data.index[i],'Daily Percent'])

  for i in range(dci[-1],len(data)):
    n = int(data.loc[data.index[i],'Interval'] - 1)
    if max(daylist[n:]) >= (data.loc[data.index[i],'Daily Percent'] + p):
        data.loc[data.index[i],col] = 1
    else:
        data.loc[data.index[i],col] = 0

  # 1.5p_inc TARGET VARIABLE: IS THERE 1.5% INCREASE BETWEEN TIME AND REST OF DAY    
  col = '1.5p_inc'
  p = 1.5

  daylist = []
  for i in range(0,dci[0]):
    daylist.append(data.loc[data.index[i],'Daily Percent'])

  for i in range(0,dci[0]):
    n = int(data.loc[data.index[i],'Interval'] - 1)
    if max(daylist[n:]) >= (data.loc[data.index[i],'Daily Percent'] + p):
        data.loc[data.index[i],col] = 1
    else:
        data.loc[data.index[i],col] = 0

        
  for x in range(0,len(dci)-1):
    daylist = []
    for y in range(dci[x],dci[x+1]):
        daylist.append(data.loc[data.index[y],'Daily Percent'])
        
    for y in range(dci[x],dci[x+1]):
        n = int(data.loc[data.index[y],'Interval'] - 1)
        if max(daylist[n:]) >= (data.loc[data.index[y],'Daily Percent'] + p):
            data.loc[data.index[y],col] = 1
        else:
            data.loc[data.index[y],col] = 0

  daylist = []        
  for i in range(dci[-1],len(data)):
    daylist.append(data.loc[data.index[i],'Daily Percent'])

  for i in range(dci[-1],len(data)):
    n = int(data.loc[data.index[i],'Interval'] - 1)
    if max(daylist[n:]) >= (data.loc[data.index[i],'Daily Percent'] + p):
        data.loc[data.index[i],col] = 1
    else:
        data.loc[data.index[i],col] = 0

  # 1.75p_inc TARGET VARIABLE: IS THERE 1.75% INCREASE BETWEEN TIME AND REST OF DAY    
  col = '1.75p_inc'
  p = 1.75

  daylist = []
  for i in range(0,dci[0]):
    daylist.append(data.loc[data.index[i],'Daily Percent'])

  for i in range(0,dci[0]):
    n = int(data.loc[data.index[i],'Interval'] - 1)
    if max(daylist[n:]) >= (data.loc[data.index[i],'Daily Percent'] + p):
        data.loc[data.index[i],col] = 1
    else:
        data.loc[data.index[i],col] = 0

        
  for x in range(0,len(dci)-1):
    daylist = []
    for y in range(dci[x],dci[x+1]):
        daylist.append(data.loc[data.index[y],'Daily Percent'])
        
    for y in range(dci[x],dci[x+1]):
        n = int(data.loc[data.index[y],'Interval'] - 1)
        if max(daylist[n:]) >= (data.loc[data.index[y],'Daily Percent'] + p):
            data.loc[data.index[y],col] = 1
        else:
            data.loc[data.index[y],col] = 0

  daylist = []        
  for i in range(dci[-1],len(data)):
    daylist.append(data.loc[data.index[i],'Daily Percent'])

  for i in range(dci[-1],len(data)):
    n = int(data.loc[data.index[i],'Interval'] - 1)
    if max(daylist[n:]) >= (data.loc[data.index[i],'Daily Percent'] + p):
        data.loc[data.index[i],col] = 1
    else:
        data.loc[data.index[i],col] = 0

  # 2p_inc TARGET VARIABLE: IS THERE 2% INCREASE BETWEEN TIME AND REST OF DAY    
  col = '2p_inc'
  p = 2

  daylist = []
  for i in range(0,dci[0]):
    daylist.append(data.loc[data.index[i],'Daily Percent'])

  for i in range(0,dci[0]):
    n = int(data.loc[data.index[i],'Interval'] - 1)
    if max(daylist[n:]) >= (data.loc[data.index[i],'Daily Percent'] + p):
        data.loc[data.index[i],col] = 1
    else:
        data.loc[data.index[i],col] = 0

        
  for x in range(0,len(dci)-1):
    daylist = []
    for y in range(dci[x],dci[x+1]):
        daylist.append(data.loc[data.index[y],'Daily Percent'])
        
    for y in range(dci[x],dci[x+1]):
        n = int(data.loc[data.index[y],'Interval'] - 1)
        if max(daylist[n:]) >= (data.loc[data.index[y],'Daily Percent'] + p):
            data.loc[data.index[y],col] = 1
        else:
            data.loc[data.index[y],col] = 0

  daylist = []        
  for i in range(dci[-1],len(data)):
    daylist.append(data.loc[data.index[i],'Daily Percent'])

  for i in range(dci[-1],len(data)):
    n = int(data.loc[data.index[i],'Interval'] - 1)
    if max(daylist[n:]) >= (data.loc[data.index[i],'Daily Percent'] + p):
        data.loc[data.index[i],col] = 1
    else:
        data.loc[data.index[i],col] = 0

  # Return DataFrame
  pd.set_option('display.max_rows',500)
  pd.set_option('display.max_columns',500)
  return(data)
  

# Select stocklist
I have created lists of strategically grouped stocks to be used. If you are using this notebook for trading, it is recommended to use one of the groups listed below, but feel free to run experiments on whichever stocks you like. 

Make sure you run this cell below at least 10 minutes before the time you want a prediction. The program needs time to collect the 5 minute data from the past 60 days for each stock and perform all the transformations. You only have to run this cell once per day.

In [4]:
# Define stocklist
thetime1 = time.time()

period = '60d'

stocklist = ["AAPL" , "GOOG" , "AMZN" , "MSFT" , "NVDA" ,"META"]
#stocklist = ["BAC" , "JPM" , "WFC" , "GS" , "C" , "MS", "COF"]
#stocklist = ["TSLA" , "NIO" , "XPEV" , "LCID" , "RIVN" , "LI"]
#stocklist = ['LUV' , 'UAL' , 'AAL' , 'ALK' , 'JBLU' ,'HA']

# Loop through stocklist and apply our "stockdata" function to create the dataframe df
df = stockdata( (stocklist[0]) , period)
for i in range(1,len(stocklist)):
    b = stockdata( (stocklist[i]) , period)
    df = pd.concat([df,b], axis=0)

thetime2 = time.time()
time_elapsed = round(thetime2-thetime1,0)
print(f"CODE RUNTIME: {time_elapsed} seconds for period of {period}")

[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
CODE RUNTIME: 429.0 seconds for period of 60d


# Select the Stocktime
If you are using this to trade, it is recommended to leave the time at 12pm. Note that the time is in Eastern Standard Time. 

If you want a prediction for 12:00pm, you have to run this cell around 12:02pm, since there is a delay in the Yahoo Finance API. If you get errors in future cells, check to make sure df2 includes all the stocks in your stocklist. Sometimes the Yahoo API has trouble getting stock info, or just needs extra time. In that case, you can either run this cell again, or remove the missing stock from your stocklist.

In [5]:
# Get todays data
stocktime = '12:00:00'

df2 = stockdata( (stocklist[0]) , '2d')
for i in range(1,len(stocklist)):
    c = stockdata( (stocklist[i]) , '2d')
    df2 = pd.concat([df2,c], axis=0)

[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed


# Select the Target Increase
Choose a target percentage increase. It is highly highly recommended to only use 0.5% here. Some targets dont have enough data to properly train a machine learning model. Each stocktime and stocklist has an appropriate target that MUST NOT BE EXCEEDED. Otherwise that models predictions are completely worthless. 

In [6]:
# Define Target and Time
target_num = 0.5

# Separate df and df2 into df_no_today and df_today
latest_date = df2['Date'].max()
df_no_today = df[df['Date']!=latest_date]
df_today = df2[df2['Date']==latest_date]

# Set data to the stocktime specified
df_no_today = df_no_today[df_no_today['Time'] == stocktime]
df_today = df_today[df_today['Time'] == stocktime]


In [7]:
# Check the first and last sample of df_no_today, too big to look at all
# Just to make sure the first date is around 60 days ago and last date is the previous days date
df_no_today.iloc[[0,-1]]

Unnamed: 0,Datetime,Open,High,Low,Close,Adj Close,Volume,Ticker,Date,Day,Day_num,Mon?,Fri?,Time,VMA,Interval,PDC,Daily Percent,DPMA,DPMA_t,DPMAS,path,bipath,Slope,Slope_t,SMA,SMA_t,Conc,Conc_t,CMA,CMA_t,rel_min,num_mins,rel_max,num_max,rise_5,rise_10,rise_15,rise_30,PPC,P2PC,TPO,PPO,P2PO,day_min,day_max,day_pw,day_pwma_t,day_amp,0.3p_inc,0.4p_inc,0.5p_inc,0.6p_inc,0.7p_inc,0.75p_inc,0.8p_inc,0.9p_inc,1p_inc,1.25p_inc,1.5p_inc,1.75p_inc,2p_inc
30,2022-07-29 12:00:00-04:00,162.139999,162.340393,162.070007,162.149994,162.149994,689017,AAPL,2022-07-29,Fri,4,0,1,12:00:00,701550.0,31.0,157.131744,3.19,2.979091,2.88,2.979091,-0.2,1.0,0.06,0.125,0.024876,-0.023333,0.002892,0.048333,0.008122,0.046389,0.0,2.0,0.0,1.0,0.0,1.0,1.0,1.0,0.355902,3.31015,2.61,0.259578,0.7801,1.73,4.07,0.623932,0.491453,2.34,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4554,2022-10-20 12:00:00-04:00,135.710007,135.735001,135.179993,135.270004,135.270004,306329,META,2022-10-20,Thu,3,0,0,12:00:00,219091.0,31.0,133.229996,1.86,1.706364,2.145,1.076364,1.0,0.0,-0.1,-0.011667,-0.115041,-0.006667,-0.015289,0.019999,-0.009579,0.008333,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,0.322745,-0.933728,-0.23,0.067722,2.224818,-0.23,2.63,0.730769,0.83042,2.86,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [8]:
# Check if all stocks are present for today
# If not, there will be an error when running the machine learning model code below
df_today

Unnamed: 0,Datetime,Open,High,Low,Close,Adj Close,Volume,Ticker,Date,Day,Day_num,Mon?,Fri?,Time,VMA,Interval,PDC,Daily Percent,DPMA,DPMA_t,DPMAS,path,bipath,Slope,Slope_t,SMA,SMA_t,Conc,Conc_t,CMA,CMA_t,rel_min,num_mins,rel_max,num_max,rise_5,rise_10,rise_15,rise_30,PPC,P2PC,TPO,PPO,P2PO,day_min,day_max,day_pw,day_pwma_t,day_amp,0.3p_inc,0.4p_inc,0.5p_inc,0.6p_inc,0.7p_inc,0.75p_inc,0.8p_inc,0.9p_inc,1p_inc,1.25p_inc,1.5p_inc,1.75p_inc,2p_inc
108,2022-10-21 12:00:00-04:00,145.600006,146.100006,145.589996,146.000504,146.000504,1353852,AAPL,2022-10-21,Fri,4,0,1,12:00:00,841780.0,31.0,143.389999,1.54,1.337273,1.166667,1.162273,1.0,1.0,0.078182,0.156667,0.069174,0.127222,-0.002975,0.006667,-0.002682,0.023333,0.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,-0.327778,0.076464,-0.36,-0.587328,-1.453876,-0.5,1.54,1.0,0.816994,2.04,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0
108,2022-10-21 12:00:00-04:00,100.040001,100.309998,100.010002,100.220001,100.220001,498077,GOOG,2022-10-21,Fri,4,0,1,12:00:00,283021.0,31.0,100.529999,-0.49,-0.747273,-0.918333,0.334394,-2.775558e-17,1.0,0.086363,0.173334,0.070413,0.120556,-0.002975,0.001668,-0.001127,0.033056,0.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,0.238733,-1.096818,-2.06,0.525688,-0.685206,-2.06,-0.07,0.788945,0.573702,1.99,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0
108,2022-10-21 12:00:00-04:00,117.2201,117.690002,117.220001,117.660004,117.660004,725467,AMZN,2022-10-21,Fri,4,0,1,12:00:00,524698.0,31.0,115.25,1.71,1.522727,1.273333,1.36106,1.4,1.0,0.081818,0.136666,0.089339,0.128056,-0.001818,-0.003334,-0.0026,0.008333,0.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,0.156182,-1.121058,-0.4,-1.089342,-1.438411,-0.4,1.71,1.0,0.793049,2.11,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
108,2022-10-21 12:00:00-04:00,238.979996,239.740005,238.940002,239.5,239.5,333604,MSFT,2022-10-21,Fri,4,0,1,12:00:00,201571.0,31.0,236.149994,1.2,0.878182,0.658333,0.936515,0.4,1.0,0.084546,0.173333,0.084463,0.129444,-0.002727,0.023333,-0.002246,0.022222,0.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,-0.139742,-0.854197,-0.6,-0.301137,-0.615933,-0.6,1.2,1.0,0.699074,1.8,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0
108,2022-10-21 12:00:00-04:00,122.695,123.550003,122.68,123.514999,123.514999,1060856,NVDA,2022-10-21,Fri,4,0,1,12:00:00,702452.0,31.0,121.940002,0.62,0.707273,0.24,1.670606,1.0,1.0,0.166364,0.19,0.139669,0.234722,-0.006033,-0.065,-0.009068,0.021111,0.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.172708,0.697041,-0.79,0.511843,-0.740801,-1.58,1.6,0.691824,0.572327,3.18,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0
108,2022-10-21 12:00:00-04:00,128.169998,128.830002,128.164993,128.479996,128.479996,603195,META,2022-10-21,Fri,4,0,1,12:00:00,435399.0,31.0,131.529999,-2.55,-2.722727,-2.861667,0.452273,0.4,1.0,0.11,0.19,0.087107,0.189444,-0.001488,-0.038333,-0.002029,0.043611,0.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,-1.292478,0.322745,-3.97,-0.225685,0.067722,-4.49,-1.77,0.713235,0.598652,2.72,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0


# Verify df_today contains all stocks in stocklist

If one stock is missing, an error will be generated in the model code. Just go ahead and "run after" the "Get todays data" cell again until all stocks in stocklist are listed. 


# Machine Learning System

Here is where we answer the question:  
*Will there be a 0.5% percent increase after 12pm for each stock?*

This is a system that uses the past 60 days to train many models that will vote on the outcome for one day. 

Realistically it would be nice to make one model using a bunch of data like a traditional machine learning project, but due to the Yahoo API limit, we only have 60 days to work with.

Each day, a new set of models will be used to predict that day's outcome for each stock. 

Here is general process:
- Generate logistic regression models each with different set of features
- Score each model based on the precision and prediction rate
- Select the best models
- Best models vote on the outcome
- Show prediction for each stock

In [39]:
# Uncomment the next line, and follow the direction, if you get an error in this code block
# stocklist = [ new list with only the tickers in df_today ]

# Import relevant Machine Learning packages
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.metrics import accuracy_score, precision_score
import warnings
warnings.filterwarnings('ignore')

# Set Random Seed
import random
random.seed(19)

# Define different sets of features and random states
features1 = ['Day_num', "path" , "SMA_t" , "CMA_t" , 
            "PPC" , "P2PC" , "PPO" , "P2PO" , 
            "day_min" , "day_max" , "day_amp", "TPO",
            "day_pwma_t",]

features2 = ["path" , "SMA_t" , "CMA_t" , 
            "PPC" , "P2PC"  , "P2PO" , 
            "day_min" , "day_max" , "day_amp", "TPO",
            "day_pwma_t",]

features3 = ['Day_num', "path" , "SMA_t" , "CMA_t" ,  
            "day_min" , "day_max" , "day_amp",
            "day_pwma_t",]

features4 = ["path" , "SMA_t" , "CMA_t" , 
            "day_min" , "day_max" , "day_amp", "TPO",
            "day_pwma_t",]

featureslist = [features1, features2, features3, features4]
randomstatelist = [1,2,3,4,5,6,7,8,9,10]
target =  f"{str(target_num)}p_inc"

# Generate many models for different options of features
lrlist = []
for features in featureslist:
  X = df_no_today[features]
  y = df_no_today[target]

  # Split into training and test set. Make sure distribution of y_test and y_train are similar
  for x in randomstatelist:
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=x)
    train_set_inc = round(sum(y_train)/len(y_train),3)
    test_set_inc = round(sum(y_test)/len(y_test),3)
    train_test_diff = round((1 - (test_set_inc/train_set_inc)),3)

    if abs(train_test_diff) > 0.15:
      continue
    else:
      pass

    # Loop to find optimal proba for model precision
    probadict = {} # 
    steps = [('scaler', StandardScaler()),
              ('lr', LogisticRegression())]
    for i in np.arange(0.5,0.85,0.025):
      proba = round(i,3)
      pipeline = Pipeline(steps)
      lr_scaled = pipeline.fit(X_train, y_train)
      y_pred = pipeline.predict_proba(X_test)[:,1]>=proba
      tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
      p = round(precision_score(y_test, y_pred),3)
      pred_perc = round((tp+fp)/(tp+fp+tn+fn),3)
      probadict[round(i,3)] = [p,pred_perc]

    # Get scores1
    probascore1 = {}
    type1_req_precision = 0.70
    probadict1 = {k:v for (k,v) in probadict.items() if v[0] >type1_req_precision}
    if len(probadict1) == 0:
      break
    for i in range(0,len(probadict1)):
      con = 10
      exp0 = 3 
      exp1 = 2
      val = list(probadict1.values())[i]
      score1 = round((val[0]**exp0 * con)**2 * (val[1]-0.02)**exp1*con,1)
      key = list(probadict1.keys())[i]
      probascore1[key] = score1
    best_proba1 = max(probascore1, key=probascore1.get)

    g = [features,x, "type 1",
         best_proba1,probadict1[best_proba1][0],probadict1[best_proba1][1],probascore1[best_proba1]]
    lrlist.append(g)

    # Get scores2
    probascore2 = {}
    type2_req_precision = 0.75
    probadict2 = {k:v for (k,v) in probadict.items() if v[0] >type2_req_precision}
    if len(probadict2) == 0:
      break
    for i in range(0,len(probadict2)):
      con = 100
      exp0 = 8
      exp1 = 3
      val = list(probadict2.values())[i]
      score2 = round((val[0]**exp0 * con)**2 * (val[1]-0.02)**exp1*con,1)
      key = list(probadict2.keys())[i]
      probascore2[key] = score2
    best_proba2 = max(probascore2, key=probascore2.get)

    g = [features,x, "type 2",
         best_proba2,probadict2[best_proba2][0],probadict2[best_proba2][1],probascore2[best_proba2]]
    lrlist.append(g)

    # Get scores3
    probascore3 = {}
    type3_req_precision = 0.80
    type3_req_pred_rate = 0.03
    probadict3 = {k:v for (k,v) in probadict.items() if v[0] > type3_req_precision }
    if len(probadict3) == 0:
      break
    for i in range(0,len(probadict3)):
      con = 10000
      exp0 = 25
      exp1 = 4
      val = list(probadict3.values())[i]
      score3 = round((val[0]**exp0 * con)**2 * (val[1]-0.02)**exp1*con,1)
      key = list(probadict3.keys())[i]
      probascore3[key] = score3
    best_proba3 = max(probascore3, key=probascore3.get)

    g = [features,x, "type 3",
         best_proba3,probadict3[best_proba3][0],probadict3[best_proba3][1],probascore3[best_proba3]]
    lrlist.append(g)

# Create df_lr: the dataframe of the 20 best models from all the different options, partitioned into types
column_names = ['Features', 'state','type','best_proba', "precision", "perc_pred","score"]               
df_lr = pd.DataFrame(lrlist, columns=column_names)
df_lr_sorted = df_lr.sort_values(['type', 'score'],ascending=False)

a = df_lr_sorted[df_lr_sorted['type'] == 'type 1'].head(20)
b = df_lr_sorted[df_lr_sorted['type'] == 'type 2'].head(20)
c = df_lr_sorted[df_lr_sorted['type'] == 'type 3'].head(20)
df_lr = pd.concat([a,b,c])

# Add stock predictions to df_lr
steps = [('scaler', StandardScaler()),
          ('lr', LogisticRegression())]
for stock in stocklist:
  predlist = []
  for i in range(0,len(df_lr)):
    feat = df_lr['Features'].iloc[i]
    target = '0.5p_inc'
    X = df_no_today[feat]
    y = df_no_today[target]
    X_pred = df_today[df_today['Ticker']==stock][feat]
    state = df_lr['state'].iloc[i]
    proba = df_lr['best_proba'].iloc[i]

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=state)

    pipeline = Pipeline(steps)
    lr_scaled = pipeline.fit(X_train, y_train)
    y_pred = ((pipeline.predict_proba(X_pred)[:,1]>=proba)*1)[0]
    predlist.append(y_pred)
  col_name = f'{stock}_pred'
  df_lr[col_name] = predlist

df_lr['Date'] = latest_date
today_date = df_lr.pop('Date')
df_lr.insert(0,'Date',today_date)


# Create df_buy_instructions. Give buy scores based on aggregation of model performance and predictions
m = []
for stock in stocklist:
  col_name = f'{stock}_pred'
  pred1 = round(df_lr.groupby('type')[col_name].mean()[0],3)
  pred1_req = round(df_lr.groupby('type')['perc_pred'].mean()[0],3)
  pred2 = round(df_lr.groupby('type')[col_name].mean()[1],3)
  pred2_req = round(df_lr.groupby('type')['perc_pred'].mean()[1],3)
  pred3 = round(df_lr.groupby('type')[col_name].mean()[2],3)
  pred3_req = round(df_lr.groupby('type')['perc_pred'].mean()[2],3)
  buy_tier = 0

  if pred1 >= pred1_req:
    buy_tier += 1
  if pred2 >= pred2_req:
    buy_tier += 1
  if pred3 >= pred3_req:
    buy_tier += 1

  buy_score = round(pred1/pred1_req + pred2/pred2_req + pred3/pred3_req + buy_tier/2, 2)
  buy_score_log = round(np.log10(buy_score),3)
  buy_instruction = 'NO BUY'

  if buy_score_log >=0.5:
    buy_instruction = "BUY"
  else:
    buy_instruction = 'NO BUY'
  m.append([stock,pred1,pred1_req,pred2,pred2_req,pred3,pred3_req,buy_tier,buy_score, buy_score_log, buy_instruction])

cols = ['stock','pred1','pred1_req','pred2','pred2_req','pred3','pred3_req','buy_tier','buy_score','buy_score_log','buy_instruction']
df_buy_instructions = pd.DataFrame(m,columns=cols)

# Create df_summary: showing summary of how method performed at end of day
df_summary = df_buy_instructions
for i in range(0,len(df_summary)):
  stock = df_summary.loc[df_summary.index[i],'stock']
  price_at_time = round(df_today[df_today['Ticker']==stock]['Open'].item(),2)
  df_summary.loc[df_summary.index[i],'time_price'] = price_at_time

# target_price: showing the price of the day we expect stock to get to
df_summary['target_price'] = round(df_summary['time_price']*(1+(target_num/100)),2)

# Date: Insert date into df_summary
for i in range(0,len(df_summary)):
  stock = df_summary.loc[df_summary.index[i],'stock']
  today_date = df_today[df_today['Ticker']==stock]['Date'].item()
  df_summary.loc[df_summary.index[i],'Date'] = today_date
today_date = df_summary.pop('Date')
df_summary.insert(0,'Date',today_date)

# peak_after_time: download yf data for the max price after time to check if method worked 
for i in range(0,len(df_summary)):
  stock = df_summary.loc[df_summary.index[i],'stock']
  a = yf.download(stock, period='1d', interval='5m')
  a = a[a.index.time >= dt.datetime.strptime(stocktime, '%H:%M:%S').time()]
  a.drop(['Open', 'High', 'Low', 'Close','Volume'],axis=1,inplace=True)
  peak_after_time = round(a.to_numpy().max(),2)
  df_summary.loc[df_summary.index[i],'peak_after_time'] = peak_after_time

# p_inc: get the percent increase from time_price to peak after time
df_summary['p_inc'] = round(( (df_summary['peak_after_time'] - df_summary['time_price']) / df_summary['time_price'] )* 100, 2)
df_summary['target_inc'] = target_num

# outcome: show if todays model was true positive, false negative, etc.
for i in range(0,len(df_summary)):
  pred = df_summary.loc[df_summary.index[i],'buy_instruction']
  peak_after_time = df_summary.loc[df_summary.index[i],'peak_after_time']
  target_price = df_summary.loc[df_summary.index[i],'target_price']
  target = target
  if (pred == 'BUY') and (peak_after_time >= target_price):
    df_summary.loc[df_summary.index[i],'outcome'] = 'true positive'
  elif (pred == 'BUY') and (peak_after_time < target_price):
    df_summary.loc[df_summary.index[i],'outcome'] = 'false positive'
  elif (pred != 'BUY') and (peak_after_time >= target_price):
    df_summary.loc[df_summary.index[i],'outcome'] = 'false negative'
  elif (pred != 'BUY') and (peak_after_time < target_price):
    df_summary.loc[df_summary.index[i],'outcome'] = 'true negative'
  else:
    df_summary.loc[df_summary.index[i],'outcome'] = 'N/A'

# true_peak_after_time: download yf data for the max price after 12pm to check if method worked 
for i in range(0,len(df_summary)):
  stock = df_summary.loc[df_summary.index[i],'stock']
  a = yf.download(stock, period='1d', interval='30m')
  a = a[a.index.time >= dt.datetime.strptime(stocktime, '%H:%M:%S').time()]
  a.drop(['Volume'],axis=1,inplace=True)
  true_peak_after_time = round(a.to_numpy().max(),2)
  df_summary.loc[df_summary.index[i],'true_peak_after_time'] = true_peak_after_time

# true_p_inc: get the percent increase from time_price to peak after time
df_summary['true_p_inc'] = round(( (df_summary['true_peak_after_time'] - df_summary['time_price']) / df_summary['time_price'] )* 100, 2)

# true_outcome: show if todays model was true positive, false negative, etc.
for i in range(0,len(df_summary)):
  pred = df_summary.loc[df_summary.index[i],'buy_instruction']
  true_peak_after_time = df_summary.loc[df_summary.index[i],'true_peak_after_time']
  target_price = df_summary.loc[df_summary.index[i],'target_price']
  if (pred == 'BUY') and (true_peak_after_time >= target_price):
    df_summary.loc[df_summary.index[i],'true_outcome'] = 'true positive'
  elif (pred == 'BUY') and (true_peak_after_time < target_price):
    df_summary.loc[df_summary.index[i],'true_outcome'] = 'false positive'
  elif (pred != 'BUY') and (true_peak_after_time >= target_price):
    df_summary.loc[df_summary.index[i],'true_outcome'] = 'false negative'
  elif (pred != 'BUY') and (true_peak_after_time < target_price):
    df_summary.loc[df_summary.index[i],'true_outcome'] = 'true negative'
  else:
    df_summary.loc[df_summary.index[i],'true_outcome'] = 'N/A'
display(df_summary)
print(f'THE TIME FOR THIS TABLE IS {stocktime}')

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


[*********************100%***********************]  1 of 1 completed


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  errors=errors,


[*********************100%***********************]  1 of 1 completed


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  errors=errors,


[*********************100%***********************]  1 of 1 completed


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  errors=errors,


[*********************100%***********************]  1 of 1 completed


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  errors=errors,


[*********************100%***********************]  1 of 1 completed


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  errors=errors,


[*********************100%***********************]  1 of 1 completed


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  errors=errors,


[*********************100%***********************]  1 of 1 completed


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  errors=errors,


[*********************100%***********************]  1 of 1 completed


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  errors=errors,


[*********************100%***********************]  1 of 1 completed


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  errors=errors,


[*********************100%***********************]  1 of 1 completed


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  errors=errors,


[*********************100%***********************]  1 of 1 completed


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  errors=errors,


[*********************100%***********************]  1 of 1 completed


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  errors=errors,


Unnamed: 0,Date,stock,pred1,pred1_req,pred2,pred2_req,pred3,pred3_req,buy_tier,buy_score,buy_score_log,buy_instruction,time_price,target_price,peak_after_time,p_inc,target_inc,outcome,true_peak_after_time,true_p_inc,true_outcome
0,2022-10-21,AAPL,0.154,0.107,0.154,0.088,0.0,0.058,2,4.19,0.622,BUY,145.6,146.33,147.82,1.52,0.5,true positive,147.85,1.55,true positive
1,2022-10-21,GOOG,0.154,0.107,0.154,0.088,0.1,0.058,3,6.41,0.807,BUY,100.04,100.54,101.49,1.45,0.5,true positive,101.62,1.58,true positive
2,2022-10-21,AMZN,0.538,0.107,0.385,0.088,0.3,0.058,3,16.08,1.206,BUY,117.22,117.81,119.38,1.84,0.5,true positive,119.59,2.02,true positive
3,2022-10-21,MSFT,0.154,0.107,0.154,0.088,0.1,0.058,3,6.41,0.807,BUY,238.98,240.17,242.57,1.5,0.5,true positive,243.0,1.68,true positive
4,2022-10-21,NVDA,1.0,0.107,1.0,0.088,1.0,0.058,3,39.45,1.596,BUY,122.69,123.3,124.87,1.78,0.5,true positive,124.98,1.87,true positive
5,2022-10-21,META,0.308,0.107,0.308,0.088,0.4,0.058,3,14.78,1.17,BUY,128.17,128.81,130.01,1.44,0.5,true positive,130.12,1.52,true positive


THE TIME FOR THIS TABLE IS 12:00:00


# Predictions

In the table "df_summary", the column "buy_instruction" tells you if it thinks you should buy and "target_price" tells you what to set your sell limit order to. "true_outcome" can only be determined after day's close, but the buy instruction can be used a few minutes after the "stocktime".

Note: I tested this system over 3 months and BUY predictions were correct 75% of the time. This does not mean it will perform that well in the future, but it has worked in the past. Also, this notebook is slightly different than the one I used for my experiment. I have been making changes to see if I can get better performance, and at this point I have not tested how the changes impacted performance. Use at your own risk. 



