I'm going to reuse some parts of my original model, but now we want to use MULTIPLE models to give our main output (predicted close) a much more robust "range" for the closing price, rather than hinging on a single point.

### Output would be: "Predicted closing price range of: $435.75 - $439.42"

Given this range, our model can now make a programmatic decision on whether it thinks the stock is going to go up, down, or remain flat. This decision will be a CONFIGURATION, as it is entirely dependent on the human's threshold / tolerance for risk and what they're trying to achieve.

### "I'm interested in making trades when the model predicts a minimum upward swing of 4% on the stock."
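
Here is a minimal sketch of how that configurable decision could look. The function name `classify_move`, the `min_swing` parameter, and the previous close of 418.00 are all hypothetical, and the rule itself (only call "up" when even the LOW end of the range clears the threshold) is just one conservative assumption, not the only way to do it.

```python
def classify_move(last_close, range_low, range_high, min_swing=0.04):
    """Label a predicted close range as 'up', 'down', or 'flat'.

    min_swing is the trader's configurable threshold (0.04 = 4%).
    Assumed rule: 'up' only if the whole range sits above the upward
    threshold, 'down' only if the whole range sits below the downward one.
    """
    up_target = last_close * (1 + min_swing)
    down_target = last_close * (1 - min_swing)

    if range_low >= up_target:
        return "up"
    if range_high <= down_target:
        return "down"
    return "flat"


# Predicted range from the example above, against a made-up previous close
print(classify_move(418.00, 435.75, 439.42))
```

A more aggressive trader could compare the midpoint of the range against the threshold instead; that choice is exactly the kind of thing that belongs in the configuration.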

## Can we give ourselves a confidence rating?

It would be nice for the model to say "I'm 90% sure the closing price will fall within the predicted range", with that confidence expressed as a percentage.
How do we do this with only a "1:1" (row:output) comparison of the data?

The real answer is to do a time series model, but right now I'd like to focus on the tools at hand, while I'm learning.

To produce a confidence rating, I am able to use the "mean absolute error" of the model as it's trained on the training set and compared to the validation set... but if I'm training my model on the WHOLE data set, there is no held-out data left to compare against: my model should be almost 100% in sync with the training set by the time it's finished training, so the error would look unrealistically small.
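
As a rough sketch of what a confidence rating could be built from: measure the error on rows the model did NOT train on, then count how often the actual value landed inside "prediction +/- MAE". The helper names and the four sample values below are made up for illustration; in the notebook the inputs would be the validation targets and the model's predictions for those same rows.

```python
def mean_abs_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def coverage(actual, lows, highs):
    """Fraction of rows whose actual value landed inside [low, high]."""
    hits = sum(1 for a, lo, hi in zip(actual, lows, highs) if lo <= a <= hi)
    return hits / len(actual)


# Made-up held-out closes and the predictions for the same rows
val_actual = [100.0, 102.0, 101.0, 105.0]
val_predicted = [101.0, 101.0, 103.0, 104.0]

mae = mean_abs_error(val_actual, val_predicted)

# Turn each point prediction into a band of +/- one MAE
lows = [p - mae for p in val_predicted]
highs = [p + mae for p in val_predicted]

hit_rate = coverage(val_actual, lows, highs)
print(f"MAE: {mae:.2f}, share of actuals inside prediction +/- MAE: {hit_rate:.0%}")
```

The hit rate is only an empirical "how often was I right on held-out rows" number, not a true probability, but it is the kind of figure that could back a statement like "I'm X% sure".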

## How do we produce a range?

I think attempting to predict the range of ALL the desired features is a good idea. The problem is that none of the data will be dependent on the previous "time steps". If we ask the model to chop the data randomly, we're learning based on a moment in time and not on a series of moments.

*** THIS IS TOTALLY NOT GOOD MODELING AND I KNOW IT! ***
*** THIS IS JUST TO LEARN THE BASICS BEFORE WE GET MORE ROBUST WITH "TIME SERIES" MODELING!!! ***
*** IT WILL BE OF GENERAL INTEREST IF IT PRODUCES SEMI-ACCURATE RESULTS, THOUGH! ***

With this in mind, my gut says fitting the model on 1000 random splits of the data may produce some consistency, or some kind of normal distribution, that I can use to define a "range" for any given feature.

If I capture this data and plot it, I should see SOME outliers, but I should also hope to see a tight clustering that I can define as the "range" for that feature's prediction.
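
A small sketch of that idea, using only the standard library: throw out predictions more than one standard deviation from the mean, then call the min and max of what is left the "range". The function name `range_within_one_std` and the sample numbers are hypothetical; the one-standard-deviation cutoff is the same rule the notebook code uses further down.

```python
import statistics


def range_within_one_std(predictions):
    """Return (low, high) of the predictions within one std of the mean."""
    mean = statistics.mean(predictions)
    std = statistics.pstdev(predictions)  # population std, like np.std's default

    # Drop the outliers, keep the tight cluster
    kept = [p for p in predictions if mean - std <= p <= mean + std]
    return min(kept), max(kept)


# Made-up sample: a tight cluster around 100 plus one obvious outlier
sample = [98.0, 99.0, 100.0, 100.0, 101.0, 102.0, 140.0]
low, high = range_within_one_std(sample)
print(f"{low} - {high}")
```

Note that a single large outlier also inflates the standard deviation itself, so with very noisy samples a percentile-based cutoff might behave better.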

## 1st model

Determine a range for volume based on the high, low, open, and close features. (1000 trains with "volume" being the train_y = 1 RANGE PREDICTION)

## 2nd model

Determine a range for high & low based on volume, open, and close. (Train on the whole set of data, but make it predict based on the RANGE from the 1st model and the previous open and close price: one prediction for the LOW volume, one prediction for the HIGH volume = 2 TOTAL PREDICTIONS. In the code below this step is split into two regressors, one for Low and one for High.)

*** now we can say we've done everything we can to provide the FINAL pass with the most accurate predictions possible ***

## 3rd model

This model will be trained on the whole data set and attempt to predict a single Close price based on the 2 predictions it receives from the 2nd model.

The TWO predictions we receive from this 3rd model should be taken into consideration alongside the HIGH and LOW predictions of model #2 to attempt to build the final Close price RANGE prediction that a human trader would want to use as a guide for how to trade the next upcoming trading day.
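
To make the data flow concrete, here is a sketch of the chain with toy stand-in functions in place of the fitted models. Everything named here (`close_price_range`, `toy_low`, `toy_high`, `toy_close`) is hypothetical. Clamping the close predictions into the predicted Low/High band is only ONE possible way of "taking them into consideration", and is an assumption on my part.

```python
def close_price_range(volume_range, prev_open, prev_close,
                      predict_low, predict_high, predict_close):
    """Chain the stages: volume range -> low/high price -> close range."""
    low_volume, high_volume = volume_range

    # Stage 2: one price prediction per end of the volume range
    low_price = predict_low(prev_close, prev_open, low_volume)
    high_price = predict_high(prev_close, prev_open, high_volume)

    # Stage 3: one close prediction per volume scenario
    closes = [
        predict_close(prev_close, high_price, low_price, prev_open, volume)
        for volume in (low_volume, high_volume)
    ]

    # Assumption: keep each close inside the predicted low/high band
    clamped = [min(max(c, low_price), high_price) for c in closes]
    return min(clamped), max(clamped)


# Toy stand-ins for the fitted models (NOT real predictions)
def toy_low(prev_close, prev_open, volume):
    return prev_close - 1.0


def toy_high(prev_close, prev_open, volume):
    return prev_close + 2.0


def toy_close(prev_close, high_price, low_price, prev_open, volume):
    return 100.5 if volume < 30_000_000 else 103.0


result = close_price_range((20_000_000, 50_000_000), 100.0, 100.0,
                           toy_low, toy_high, toy_close)
print(result)
```

In the real notebook each `predict_*` stand-in would wrap a fitted regressor's `predict` call.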

Now that we have a north star, let's get coding!!!!

In [None]:
# load all relevant imports and download CSV from yfinance API call
import pandas as pd
import numpy as np
import requests 
import yfinance as yf
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
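from datetime import date
import os

ticker = 'AMZN'

# NOTE: certificate verification is switched off for this session (hence "unsafe");
# only do this on a network you trust
unsafe_session = requests.session()
unsafe_session.verify = False

def load_data(tickerSymbols):
    """Download price history for tickerSymbols, cache it as a CSV, and return it as a DataFrame."""
    os.makedirs('./csv', exist_ok=True)  # to_csv fails if the folder is missing
    yf.download(tickers=tickerSymbols
                , session=unsafe_session
                ).to_csv(f'./csv/{tickerSymbols}_data.csv')

    # load data into DataFrame
    return pd.read_csv(f'./csv/{tickerSymbols}_data.csv')

def prediction_csv(dataframe, ticker):
    """Write a DataFrame of predictions to ./predict-csv/<ticker>_data.csv."""
    os.makedirs('./predict-csv', exist_ok=True)
    dataframe.to_csv(f'./predict-csv/{ticker}_data.csv')

def add_line_to_file(file_path, new_line):
    """Append new_line (plus a trailing newline) to the file at file_path."""
    os.makedirs(os.path.dirname(file_path) or '.', exist_ok=True)
    with open(file_path, "a") as file:
        file.write(new_line + "\n")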

def mean_within_one_std(arr):
    """Returns the elements of arr that lie within one standard deviation of the mean.

    Despite the name, this returns the filtered values, not their mean;
    the caller takes the min and max of the result to build a range.
    """

    mean = np.mean(arr)
    std = np.std(arr)

    # Filter elements within one standard deviation
    filtered_arr = [x for x in arr if mean - std <= x <= mean + std]

    # Return the filtered values themselves
    return filtered_arr

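# get only data rows by skipping the first two rows, which presumably hold the
# extra header lines (ticker and date labels) that yfinance writes to the CSV
data_rows_only = load_data(ticker).iloc[2:]

# shift data down one row so each row also carries the previous trading day's values,
# then concat() the columns and rename them for analyzing data
df_shifted = data_rows_only.shift(1)
df_shifted.columns = ['Price_prev', 'Close_prev', 'High_prev', 'Low_prev', 'Open_prev', 'Volume_prev']
df_combined = pd.concat([data_rows_only, df_shifted], axis=1)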
df_combined.columns = ['Current Date', 'Current Close', 'Current High', 'Current Low', 'Current Open', 'Current Volume',
                       'Day_prev', 'Close_prev', 'High_prev', 'Low_prev', 'Open_prev', 'Volume_prev']

print(f"{len(df_shifted)} rows")
print(df_combined.columns)


In [2]:
# build stock model and feature sets
stock_model_1 = RandomForestRegressor(random_state=1)
stock_features_1 = ['Close_prev', 'High_prev', 'Low_prev', 'Open_prev']
### TARGET =  VOLUME ###

stock_model_2 = RandomForestRegressor(random_state=1)
stock_features_2 = ['Close_prev', 'Open_prev', 'Volume_prev'] # can share the feature set, no problem!
### TARGET =  LOW ###

stock_model_3 = RandomForestRegressor(random_state=1)
### TARGET =  HIGH  ###

stock_model_4 = RandomForestRegressor(random_state=1)
stock_features_4 = ['Close_prev', 'High_prev', 'Low_prev', 'Open_prev', 'Volume_prev']
### TARGET =  CLOSE  ###


In [None]:
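# last row of the data; its *_prev columns hold the session BEFORE that row,
# so the predictions below are for the last row's date
recent_closed = df_combined.iloc[-1]

# establish X (rows to analyze) and y (value to predict) variables
# iloc[1:] skips the first row, whose *_prev values are NaN after the shift
X1 = df_combined.iloc[1:][stock_features_1]
recent_closed_X1 = recent_closed[stock_features_1]
y1 = df_combined.iloc[1:]['Current Volume']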

X2 = df_combined.iloc[1:][stock_features_2] # same feature set
y2 = df_combined.iloc[1:]['Current Low']

X3 = df_combined.iloc[1:][stock_features_2] # same feature set
y3 = df_combined.iloc[1:]['Current High']

X4 = df_combined.iloc[1:][stock_features_4]
y4 = df_combined.iloc[1:]['Current Close']

# global scope variables for data extraction

prediction_array = []

# let's start the 1000 random prediction loops here:
# (this is not "training" the model, we are merely producing a sample of data to derive our educated guesses from)

for i in range(1000):
    # split the training set on each loop
    train_X1, val_X1, train_y1, val_y1 = train_test_split(X1, y1)

    # fit first model on the TRAINING split; each loop sees a different random subset of rows,
    # so the 1000 predictions form a sample of plausible outcomes
    # afterwards we keep only the predictions within 1 standard deviation of the mean and use their min/max as the range
    stock_model_1.fit(train_X1,train_y1)

    # wrap the feature row in a one-row DataFrame so the column names match the ones seen in fit()
    prediction_1 = stock_model_1.predict(pd.DataFrame([recent_closed_X1]))
    prediction_array.append(prediction_1[0]) # make a prediction and push it to the array

    print(f"Loop {i} prediction:: {prediction_1[0]}")

# drop the outliers, then use the min and max of what is left as the volume range
prediction_array = mean_within_one_std(prediction_array)

low_volume = float(np.min(prediction_array))
high_volume = float(np.max(prediction_array))

print("The range is:")
print(f"{low_volume} - {high_volume}")

print_txt = f"FOR TICKER '{ticker}', \n\tThe predicted volume range is:{low_volume} - {high_volume}"

add_line_to_file("./predictions/multi-predictions-log.txt", print_txt)

# fit last three models on WHOLE data set
stock_model_2.fit(X2,y2) # low model
stock_model_3.fit(X3,y3) # high model
stock_model_4.fit(X4,y4) # close model


#### Now we have the models fitted... let's make some predictions!

In [None]:
# the 1000 predictions above gave us the MINIMUM and MAXIMUM volume results, i.e. the volume range

# for reference, the outputs i got for the 1st model with 50 loops:
# 27891551.0 - 45309967.0

df_2 = pd.DataFrame({'Close_prev': [recent_closed["Close_prev"]], 'Open_prev': [recent_closed["Open_prev"]], 'Volume_prev': [low_volume], })
df_3 = pd.DataFrame({'Close_prev': [recent_closed["Close_prev"]], 'Open_prev': [recent_closed["Open_prev"]], 'Volume_prev': [high_volume], })

# make two predictions: the LOW price from the low end of the volume range,
# and the HIGH price from the high end of the volume range
prediction_2 = stock_model_2.predict(df_2)[0]
prediction_3 = stock_model_3.predict(df_3)[0]

print(f"Low price is: {prediction_2}")
print(f"High price is: {prediction_3}")
print_txt = f"\tLow price is: {prediction_2}. \n\tHigh price is: {prediction_3}."

add_line_to_file("./predictions/multi-predictions-log.txt", print_txt)


In [None]:
# timestamp for logging
from datetime import datetime

now = datetime.now()
timestamp = now.strftime("%H:%M %m/%d/%Y")

# Now let's make the very LAST prediction based on this new information!
df_4 = pd.DataFrame({'Close_prev': [recent_closed["Close_prev"]], 'High_prev': [prediction_3], 'Low_prev': [prediction_2], 
                     'Open_prev': [recent_closed["Open_prev"]], 'Volume_prev': [low_volume], })

df_5 = pd.DataFrame({'Close_prev': [recent_closed["Close_prev"]], 'High_prev': [prediction_3], 'Low_prev': [prediction_2], 
                     'Open_prev': [recent_closed["Open_prev"]], 'Volume_prev': [high_volume], })

prediction_4 = stock_model_4.predict(df_4)[0]
prediction_5 = stock_model_4.predict(df_5)[0]

add_line_to_file("./predictions/multi-predictions-log.txt", f"------------------------\n\n{timestamp}:\n\n")

# order the two predictions so the range always reads low - high
range_low, range_high = sorted([prediction_4, prediction_5])
print_txt = f"\tClose price range for next trading day on '{ticker}' is: {range_low} - {range_high}"

add_line_to_file("./predictions/multi-predictions-log.txt", print_txt)
print(print_txt)

add_line_to_file("./predictions/multi-predictions-log.txt", "\n\n------------------------\n\n")


In [None]:
from datetime import datetime

now = datetime.now()
timestamp = now.strftime("%H:%M %m/%d/%Y")

print(timestamp)
