# Agenda - Data Analytics Life Cycle


There are five relevant phases of the data analytics life cycle that are pertinent to this performance assessment.

1. Discovery Phase
2. Data Acquisition
3. Data Cleaning & Exploratory Data Analysis
4. Data Mining / Machine Learning
5. Reporting
   
Each of these phases will be talked about in detail in this document.

Please note that _**bolded and italicized letters**_ correspond to the WGU grading rubric.


# Discovery Phase

### Research Question - _**A**_

Using historical market stock prices and technical indicators (RSI, MACD, MFI), how accurately can a neural network model, specifically an LSTM-based Recurrent Neural Network, predict stock price movements after the occurrence of a bullish candlestick pattern (e.g., 1 day, 3 days, 5 days, 10 days, and 15 days afterwards)? Additionally, how does the performance differ when using binary classification versus regression for price prediction?

Note: I will only be identifying **bullish candlestick** patterns for this project, as these patterns are typically used to signal potential upward price movements. I am not identifying bearish candlestick patterns to save resources as well as I am personally more interested in upward price movement. These candlestick patterns that I will be identifying are shown in the "Data Definitions" section of this project.

### Short Summary: Benefits to Stock Traders

Using a neural network model like LSTM to predict stock price movements can benefit traders by providing more accurate forecasts based on historical stock price data and technical indicators like RSI, MACD, and MFI. By combining bullish candlestick patterns with these additional indicators, traders can make better-informed decisions instead of relying solely on pattern analysis. LSTM models excel at capturing long-term trends, making them ideal for stock price predictions. By using an LSTM, this method should enhance prediction reliability, improving investor confidence in anticipating price movements and make more profitable trades. From our data analysis we should also be able to recognize which specific candlestick patterns yield more accurate stock price predictions.

### Long Summary: Benefits to Stock Traders

My research question involves using a neural network model, specifically an LSTM-based Recurrent Neural Network (RNN), to predict stock price movements after a bullish candlestick pattern by analyzing historical market data and technical indicators like the Relative Strength Index (RSI), Moving Average Convergence Divergence (MACD), and Money Flow Index (MFI). This question benefits from data analysis because financial markets are complex and influenced by many factors that interact in unpredictable ways. Stock prices are impacted by a mix of past price movements, investor behavior, market trends, and technical indicators, making it difficult to predict future movements with simple models. By using data analysis and machine learning, we can uncover patterns in large amounts of historical data and understand how these different factors interact with one another, potentially leading to more accurate predictions.

Candlestick patterns are widely used by traders to identify trends and predict future price movements. However, analyzing candlestick patterns in isolation may not provide the most reliable predictions. By incorporating additional data, such as historical stock prices and technical indicators like RSI, MACD, and MFI, we can potentially strengthen the accuracy of these predictions. Data analysis allows us to test how well these combined factors—candlestick patterns along with RSI, MACD, and MFI—help forecast future price changes. This data-driven approach goes beyond subjective interpretation, enabling the model to detect patterns and relationships in the data that may not be obvious to human analysts. By isolating specific candlestick patterns and training an LSTM on each of my chosen candlestick patterns, we can determine which of these candlestick patterns yield the best stock price predictions.

Stock price prediction is a time-series problem, and LSTMs are well-suited for handling such data because they are designed to track long-term trends over time. In my case, I have a specific time-series problem that can be considered as a “pattern-based prediction problem” where I’m identifying specific events (candlestick patterns) and then predicting the immediate future based on these events. Data analysis helps fine-tune the LSTM model to improve its performance, ensuring it can make accurate predictions on new, unseen data. Additionally, comparing different approaches—such as binary classification (predicting whether the price will go up or down) and regression (predicting the exact price change)—requires rigorous analysis. By evaluating model performance through metrics like accuracy for classification or mean squared error for regression, data analysis enables an objective comparison of which approach provides more reliable predictions. 

In summary, data analysis is essential in this research because it allows us to better understand how candlestick patterns, when combined with historical stock prices and technical indicators like RSI, MACD, and MFI, can help predict stock price movements. By leveraging machine learning models like LSTMs, we can uncover complex patterns in the data, leading to more informed and reliable predictions. This data-driven approach provides a more objective and accurate way to forecast price movements than relying on intuition or subjective judgment.

### Hypothesis

After performing our data analysis, we should be able to conclude either the null or alternative hypothesis. The null hypothesis is the default assumption, stating that our independent variables (past stock price [low, high, open, close], RSI, MACD, MFI) have no effect, no relationship, or cause no change, to our dependent variable (future stock price).

**Null Hypothesis**: The LSTM-based Recurrent Neural Network model does not significantly predict stock price movements (up or down) better than random chance when using historical market stock prices and technical indicators (RSI, MACD, MFI) for prediction.

**Alternative Hypothesis**: The LSTM-based Recurrent Neural Network model significantly predicts stock price movements (up or down) better than random chance when using historical market stock prices and technical indicators (RSI, MACD, MFI) for prediction. 

### Data Definitions

**1) Candlestick Pattern** 

Charting technique used in technical analysis which helps traders identify bullish (belief that the price of a stock will rise) or bearish (belief that the price of a stock will decrease) patterns. Candlestick patterns are known to show the favored direction of a stock's price, but it is not guaranteed.

In the case of this project, it represents price movements of a stock over a specific time period. For example, in this project we will use daily candlesticks, this means each candlestick will provide the following information:

* Low: The lowest stock price reached during this day
* High: The highest stock price reached during this day
* Open: The stock price at the beginning of the day
* Close: The stock price at the close of the day

Below is an example of a bearish candle (left) and bullish candle (right) and how to interpet them:

<div style="text-align: center;">
  <img src="C:/Users/james/Documents/WGU/Course-D214-MSDA Capstone/Screenshots/candle_example.png">
</div>

There are many types of candlestick patterns, however as mentioned earlier in my project I will only identify bullish candlestick patterns. Specifically, I will identify five of the most popular candlestick patterns to save resources, specifically time. The following are the candlestick patterns that I will use for my data analysis and exactly how I will identify each of them:


1. Hammer
    + Description: A single candlestick with a small body at the top and a long lower shadow. It appears at the bottom of a downtrend.
    + Significance: Shows that despite strong selling pressure, buyers stepped in, potentially signaling a reversal.
    + How I'm identifying this pattern: Looking for a bullish or bearish candlestick with a small body (body is less than 30% of the total candlestick length; candlestick length is the distance between low and high), with the lower shadow (the distance between the low and the open for a bullish candle, or the low and the close for a bearish candle) being at least two times the length of the body. There also must be little to no upper shadow (upper shadow will be less than 10% of the total candle length). This pattern must occur during a downtrend; to confirm a downtrend, I will fit a regression line (line of best fit) through the closing prices for the previous five candles-if the slope is negative or equal to zero, this means that a downtrend is present. It doesn't matter if the single candlestick is a bearish or bullish candle, they both can display the hammer pattern. The images below are visual representations of exactly how I am going to identify the hammer pattern as highlighted with blue dots; it doesn't matter if the single candlestick is a bearish or bullish candle, they both display the hammer pattern. In the images below, although it shows that there is a decently sized upper shadow for the hammer pattern, this is incorrect as it should be less than 10% of the candle's total length.



<div style="text-align: center;">
  <img src="C:/Users/james/Documents/WGU/Course-D214-MSDA Capstone/Screenshots/Hammer_wtih_bearish_candle.png">
</div>

<div style="text-align: center;">
  <img src="C:/Users/james/Documents/WGU/Course-D214-MSDA Capstone/Screenshots/Hammer_wtih_bullish_candle.png">
</div>

2. Inverted Hammer
    + Description: A single candlestick with a small body near the bottom, a long upper shadow, and little to no lower shadow. This pattern appears at the bottom of a downtrend.
    + Significance: Shows that bulls may be gaining control, though confirmation from the next candle is needed.
    + How I'm identifying this pattern: Looking for a candlestick (bullish or bearish) with a small body (less than 30% of the total candlestick length; candlestick length is the distance between low and high) at the bottom, a long upper shadow (at least twice the length of the body), and little to no lower shadow (lower shadow will be less than 10% of the total candle length). This pattern must occur during a downtrend; to confirm a downtrend, I will fit a regression line (line of best fit) through the closing prices for the previous five candles-if the slope is negative or equal to zero, this means that a downtrend is present. The images below are visual representations of exactly how I am going to identify the inverted hammer pattern as highlighted with blue dots; it doesn't matter if the single candlestick is a bearish or bullish candle, they both display the inverted hammer pattern.

<div style="text-align: center;">
  <img src="C:/Users/james/Documents/WGU/Course-D214-MSDA Capstone/Screenshots/Inverted_Hammer_wtih_bearish_candle.png">
</div>

<div style="text-align: center;">
  <img src="C:/Users/james/Documents/WGU/Course-D214-MSDA Capstone/Screenshots/Inverted_Hammer_wtih_bullish_candle.png">
</div>

3. Bullish Engulfing Pattern
    + Description: A two-candle pattern where a small bearish candle is followed by a larger bullish candle that completely engulfs the previous one.
    + Significance: A strong signal of a shift from bearish to bullish sentiment, often indicating a reversal.
    + How I'm identifying this pattern: Looking for a small bearish candle followed by a large bullish candle that completely engulfs the range (high and low prices) of the first candle (two-candle pattern). The second candle's body (open and close price) must be larger by at least 2 times the body of the first candle and fully engulf the body of the first candle. This pattern must occur during a downtrend; to confirm a downtrend, I will fit a regression line (line of best fit) through the closing prices for the previous five candles-if the slope is negative or equal to zero, this means that a downtrend is present. Note that the first candle (the bearish candle) in the identified pattern will be treated as the fifth candle in the slope calculation for confirming the downtrend. The image below is a visual representation of exactly how I am going to identify the bullish engulfing pattern as highlighted with blue dots.
      
<div style="text-align: center;">
  <img src="C:/Users/james/Documents/WGU/Course-D214-MSDA Capstone/Screenshots/Bullish_engulfing.png">
</div>

4. Bullish Harami
    + Description: A two-candle pattern where a small bullish candle is contained entirely within the range of the previous large bearish candle.
    + Significance: A sign of potential reversal, suggesting that selling pressure is weakening and buyers are starting to take control.
    + How I'm identifying this pattern: Looking for a large bearish candle followed by a small bullish candle that is entirely within the range of the first candle; this means that the small bullish candle should be fully contained within the previous candle's high and low. The large bearish candle will be at least twice the entire length of the following bullish candle. The body of the large bearish candle will also completely engulf the body of the small bullish candle. This pattern must occur during a downtrend; to confirm a downtrend, I will fit a regression line (line of best fit) through the closing prices for the previous five candles-if the slope is negative or equal to zero, this means that a downtrend is present. Note that the first candle (the bearish candle) in the identified pattern will be treated as the fifth candle in the slope calculation for confirming the downtrend. The image below is a visual representation of exactly how I am going to identify the bullish harami pattern as highlighted with blue dots.

<div style="text-align: center;">
  <img src="C:/Users/james/Documents/WGU/Course-D214-MSDA Capstone/Screenshots/Bullish_harami.png">
</div>

5. Three White Soldiers
    + Description: A three-candle pattern consisting of three consecutive long bullish candles that close progressively higher.
    + Significance: A strong bullish signal, indicating a powerful upward trend and a continuation of the previous bullish move.
    + How I'm identifying this pattern: Looking for three consecutive bullish candles with each one closing higher than the previous candle. The candles should show a steady upward movement without large wicks. The upper and lower wicks should each be no more than 20% of the total candle length. Unlike the other patterns, this pattern does not need to occur during a downtrend. The visual below is a representation of how I am going to identify the three white soliders pattern as highlighted with blue dots.

<div style="text-align: center;">
  <img src="C:/Users/james/Documents/WGU/Course-D214-MSDA Capstone/Screenshots/Three_white_soldiers.png">
</div>


(Note: The way I am identifying these candlestick patterns is highly specific. This approach ensures consistency and a systematic method for recognizing patterns, which leads to more reliable and repeatable results when analyzing trends. While the specifications may differ in certain details, they still encompass the core definition of each candlestick pattern. Some examples of these adjustments include:

* The requirement for identifying a downtrend, using the slope of the 5 previous candles' closing prices. If the slope is negative, it confirms a downtrend.
* For example, just looking at the hammer pattern, the lower shadow (the distance between the low and the open for a bullish candle, or the low and the close for a bearish candle) must be at least twice the length of the body.
* Various other specific calculations for each pattern, ensuring that all conditions for pattern identification are met with precision.
  
These modifications are designed to provide a consistent, clear, and quantifiable framework for candlestick pattern identification, which enhances the accuracy of trend analysis.)


**2) Stock Price** 

Stock price is the current market value or price of a single share of a company's stock. Each stock has it's own ticker symbol, for example, Microsoft's ticker symbol is "MSFT".

I will be using four stock price associated independent variables for my analysis. My independent variables for stock price are daily variables, meaning they are recorded once daily. These variables are as follows:

1. Low: The lowest stock price reached during this day
2. High: The highest stock price reached during this day
3. Open: The stock price at the beginning of the day
4. Close: The stock price at the close of the day

My sole dependent variable for this analysis is also a stock price variable. It can be explained two different ways depending on the problem I am trying to solve:

1. Binary Classification: Predicting if the closing price will go 'up' or 'down' for multiple future time periods (1 day, 3 days, 5 days, 10 days, and 15 days), compared to the closing price from the last candle of the identified candlestick pattern.
2. Regression: Predicting the exact closing price (a continuous value) for multiple future time periods (1 day, 3 days, 5 days, 10 days, and 15 days), after the identified candlestick pattern.


**3) Relative Strength Index (RSI)** 

RSI will be another of my independent variables for this analysis. RSI is a momentum oscillator; a momentum oscillator is a type of technical indicator that reflects the rate at which a stock's price is moving. It helps traders determine whether a price movement is strengthening or weakening.

RSI is calculated on a scale of 0 to 100 and helps identify potential overbought or oversold conditions in the market. When the RSI rises above 70, it indicates that an asset may be overbought, suggesting a potential reversal or pullback. Conversely, when the RSI falls below 30, it suggests the asset may be oversold and due for a rebound.

<div style="text-align: center;">
  <img src="C:/Users/james/Documents/WGU/Course-D214-MSDA Capstone/Screenshots/Fidelity_RSI_70_30.png"><p style="text-align: right;">(Fidelity, n.d., RSI: Relative Strength Index)</p>
</div>

RSI is valuable not only for indicating overbought or oversold conditions but also for gauging the strength of a trend. A reading above 50 generally signals bullish momentum, while a reading below 50 suggests bearish momentum. 

Divergences between price and RSI can also provide important insights; for example, if stock price is making new highs but RSI is not, it could indicate weakening momentum and a potential reversal. Traders often use RSI to confirm entry and exit points, helping them make more informed decisions based on current market conditions and trends.

<div style="text-align: center;">
  <img src="C:/Users/james/Documents/WGU/Course-D214-MSDA Capstone/Screenshots/Fidelity_RSI_divergence.png"><p style="text-align: right;">(Fidelity, n.d., RSI: Relative Strength Index)</p>
</div>

Here is how RSI will be calculated in my analysis:

I will use a 14-day lookback period as it is the standard for calculating RSI, as originally proposed by J. Welles Wilder (he developed RSI) in his 1978 book _New Concepts in Technical Trading Systems_. This 14-day period is widely accepted in technical analysis and provides a balanced approach to evaluating price momentum over a short timeframe. It helps traders capture short- to medium-term trends, providing signals about the asset's momentum while still maintaining a reasonable level of reliability.

The following table is just an example illustration for 14 days, to help visualize how RSI is calculated.

| Day | Closing Price | Change | Gain | Loss |
|-----|---------------|--------|------|------|
| 1   | 100           | -      | -    | -    |
| 2   | 102           | +2     | 2    | 0    |
| 3   | 101           | -1     | 0    | 1    |
| 4   | 103           | +2     | 2    | 0    |
| 5   | 105           | +2     | 2    | 0    |
| 6   | 104           | -1     | 0    | 1    |
| 7   | 107           | +3     | 3    | 0    |
| 8   | 110           | +3     | 3    | 0    |
| 9   | 109           | -1     | 0    | 1    |
| 10  | 112           | +3     | 3    | 0    |
| 11  | 113           | +1     | 1    | 0    |
| 12  | 111           | -2     | 0    | 2    |
| 13  | 115           | +4     | 4    | 0    |
| 14  | 116           | +1     | 1    | 0    |

**Step 1)** Calculate average gain and average loss

* Sum the gains for the first 14 days = 2 + 0 + 2 + 2 + 0 + 3 + 3 + 0 + 3 + 1 + 0 + 4 + 1 = 21
* Sum the losses for the first 14 days = 0 + 1 + 0 + 0 + 1 + 0 + 0 + 1 + 0 + 0 + 2 + 0 + 0 = 5

* Average gain = 21 / 14 = 1.5
* Average loss = 5 / 14 = 0.36
  
<br>

**Step 2)** Calculate Relative Strength (RS)

* RS = Average gain / Average loss = 1.5 / 0.36 = 4.17

<br>

**Step 3)** Calculate RSI

* RSI = 100 - 100 / (1 + RS) = 100 - 100 / 5.17 = 80.65

Note: This (80.65) is the RSI value on the 14th day as it uses price data from day 1 to day 14. If I wanted the RSI value for the 15th day, it will be based on price data from day 2 to day 15.
<br>

**4) Money Flow Index (MFI)** 

In addition to the previously mentioned technical indicators, I will also incorporate the Money Flow Index (MFI) as another important independent variable in my analysis.

MFI is a momentum indicator that measures the flow of money into and out of an asset over a specific period of time, typically 14 days. The MFI combines both price and volume, which helps identify overbought or oversold conditions in the market. It is calculated by comparing the typical price (average of high, low, and close) of each period to the volume of trades during that period. When the MFI is above 80, it indicates that the asset is overbought, and when it is below 20, it indicates that the asset is oversold. Traders use the MFI to confirm price trends or spot potential reversals, as extreme values of MFI can signal that an asset is about to experience a trend change.

<div style="text-align: center;">
  <img src="C:/Users/james/Documents/WGU/Course-D214-MSDA Capstone/Screenshots/Fidelity_MFI.png"><p style="text-align: right;">(Fidelity, n.d., Money Flow Index)</p>
</div>

For the purposes of my analysis, I will be calculating MFI using a 14-day period, which is the standard lookback period. MFI will help me capture the relationship between price movements and volume flow, similar to Money Flow Index (MFI), but with the added complexity of considering both price and volume for identifying buying or selling pressure. 

Below is an example of how MFI might be calculated, starting with determining the typical price for each day (High + Low + Close) / 3, then multiplying by volume, and finally applying the formula to get the MFI over a 14-day period:


| Day | High | Low | Close | Volume | Typical Price | Money Flow | MFI Calculation |
|-----|------|-----|-------|--------|---------------|------------|-----------------|
| 1   | 105  | 98  | 102   | 2000   | (105 + 98 + 102) / 3 = 101.67 | Positive   | N/A (First day) |
| 2   | 107  | 100 | 104   | 2500   | (107 + 100 + 104) / 3 = 103.67 | Positive   | N/A (First 14 days) |
| 3   | 106  | 99  | 103   | 2200   | (106 + 99 + 103) / 3 = 102.67 | Negative   | N/A (First 14 days) |
| 4   | 108  | 101 | 106   | 2400   | (108 + 101 + 106) / 3 = 105.00 | Positive   | N/A (First 14 days) |
| 5   | 110  | 104 | 107   | 2300   | (110 + 104 + 107) / 3 = 107.00 | Positive   | N/A (First 14 days) |
| 6   | 111  | 105 | 109   | 2500   | (111 + 105 + 109) / 3 = 108.33 | Positive   | N/A (First 14 days) |
| 7   | 113  | 106 | 110   | 2600   | (113 + 106 + 110) / 3 = 109.67 | Positive   | N/A (First 14 days) |
| 8   | 115  | 108 | 112   | 2700   | (115 + 108 + 112) / 3 = 111.67 | Positive   | N/A (First 14 days) |
| 9   | 116  | 109 | 113   | 2800   | (116 + 109 + 113) / 3 = 112.67 | Positive   | N/A (First 14 days) |
| 10  | 118  | 110 | 114   | 2900   | (118 + 110 + 114) / 3 = 114.00 | Positive   | N/A (First 14 days) |
| 11  | 120  | 112 | 116   | 3000   | (120 + 112 + 116) / 3 = 116.00 | Positive   | N/A (First 14 days) |
| 12  | 122  | 113 | 118   | 3100   | (122 + 113 + 118) / 3 = 117.67 | Positive   | N/A (First 14 days) |
| 13  | 124  | 115 | 119   | 3200   | (124 + 115 + 119) / 3 = 119.33 | Positive   | N/A (First 14 days) |
| 14  | 126  | 116 | 121   | 3300   | (126 + 116 + 121) / 3 = 121.00 | Positive   | N/A (First 14 days) |
| 15  | 125  | 118 | 120   | 3400   | (125 + 118 + 120) / 3 = 121.00 | Negative   | MFI = (Positive Money Flow / Negative Money Flow) * 100 |




**5) Moving Average Convergence Divergence (MACD)** 

MACD will be associated with two more independent variables for my analysis, MACD line and signal line.

MACD is a trend-following momentum indicator that shows the relationship between two moving averages of a stock’s price. First, we need to understand what moving averages are in relationship to stock price. In simple terms, a moving average calculation computes the average of a stock's prices over a specific period. This helps to smooth out short-term price fluctuations, making it easier to identify long-term trends.

We have different types of moving averages: 

1. Simple Moving Average (SMA): This is the average of a stock's price over a specific period of time. For example, a 10-day SMA is the average of the last 10 days of closing prices. It gives equal weight to each price point in the calculation.
   
2. Exponential Moving Average (EMA): The EMA is a variation of the moving average that gives more weight to recent prices, making it more sensitive to new information. This is important for capturing recent price movements and trends more quickly than the SMA, which treats all price data equally.

Now that we understand moving averages, MACD is calculated by subtracting the 26-day exponential moving average (EMA) from the 12-day EMA. The resulting value is the MACD line. 
The 26-day EMA reflects long term price trends which smooths out price data over a longer period while the 12-day EMA captures short term price trends. The subtraction between them resulting in the MACD line is important as it represents the difference between short and long term price trends.

Along with the MACD line, a signal line is also calculated. This is a 9-day EMA of the MACD line, and it helps traders identify buy and sell signals. Specifically, the signal line is calculated by taking the MACD line values for the past 9 days and applying the standard EMA formula to smooth out fluctuations in the MACD line over this period.

When the MACD line crosses above the signal line, it is generally considered a bullish signal, suggesting a potential upward movement in the stock price. Conversely, when the MACD line crosses below the signal line, it is viewed as a bearish signal, indicating a potential downward price movement.

<div style="text-align: center;">
  <img src="C:/Users/james/Documents/WGU/Course-D214-MSDA Capstone/Screenshots/Fidelity_MACD.png"><p style="text-align: right;">(Fidelity, n.d., MACD)</p>
</div>

The MACD line also reports bullish signals when it turns up from below zero. Conversely, while it crosses below zero, it is considered bearish.

<div style="text-align: center;">
  <img src="C:/Users/james/Documents/WGU/Course-D214-MSDA Capstone/Screenshots/Fidelity_MACD_above_below_zero.png"><p style="text-align: right;">(Fidelity, n.d., MACD)</p>
</div>

Here is how MACD will be calculated in my analysis, starting with the following formula to calculate EMA:

<div style="text-align: center;">
  <img src="C:/Users/james/Documents/WGU/Course-D214-MSDA Capstone/Screenshots/MACD_EMA_formula.png">
</div>

To calculate the 12-day EMA, we need to start out by calculating the 12-day SMA (simple moving average).

| Day | Price |
|-----|-------|
| 1   | 100   |
| 2   | 102   |
| 3   | 105   |
| 4   | 107   |
| 5   | 110   |
| 6   | 108   |
| 7   | 111   |
| 8   | 113   |
| 9   | 112   |
| 10  | 115   |
| 11  | 118   |
| 12  | 120   |

**Step 1)** Calculate SMA

12-day SMA = sum of stock prices / 12 days = (100 + 102 + 105 + 107 + 110 + 108 + 111 + 113 + 112 + 115 + 118 + 120) / 12 = 100.08

Although 100.08 is the SMA of those 12 days, it is also called EMA(1) or EMA1.

<br>

**Step 2)** Calculate the EMA for subsequent days using the formula

* For Day 2

EMA(2) = (2 / (12 + 1)) × (102−100.08) + 100.08 = 0.1538 × 1.92 + 100.08 = 100.3753

* For Day 3

EMA(3) = (2 / (12 + 1)) × (105−100.3753) + 100.3753 = 0.1538 × 4.6247 + 100.3753 = 101.0866

* For Day 4

EMA(4) = (2 / (12 + 1)) × (107−101.0866) + 101.0866 = 0.1538 × 5.9134 + 101.0866 = 101.9961

Keep repeating until you get to day 12 or EMA(12); the value on this day is the 12-day EMA. When calculating 26-day EMA, the calculation is the same except the calculation will use a 26 day period. 

Then as mentioned previously, to get the MACD value (or MACD line), just subtract the 26-day EMA from the 12-day EMA.
<br>
<br>
<br>

Now, here is how the signal line will be calculated in my analysis, the calculation for the signal line is similar as it uses a similar EMA formula. To calculate the signal line, we have to use the following formula:

<div style="text-align: center;">
  <img src="C:/Users/james/Documents/WGU/Course-D214-MSDA Capstone/Screenshots/MACD_EMA_formula_signal_line.png">
</div>

(Example data)
| Day | 26-Day EMA | 12-Day EMA | MACD (12-Day EMA - 26-Day EMA) |
|-----|------------|------------|--------------------------------|
| 1   | 0.5        | 0.5        | 0                              |
| 2   | 0.515      | 0.531      | 0.016                          |
| 3   | 0.520      | 0.537      | 0.017                          |
| 4   | 0.531      | 0.555      | 0.024                          |
| 5   | 0.541      | 0.580      | 0.039                          |
| 6   | 0.563      | 0.601      | 0.038                          |
| 7   | 0.584      | 0.620      | 0.036                          |
| 8   | 0.602      | 0.634      | 0.032                          |
| 9   | 0.620      | 0.646      | 0.026                          |

**Step 1)** Calculate the 9-day SMA of MACD

9-day SMA = 0 + 0.016 + 0.017 + 0.024 + 0.039 + 0.038 + 0.036 + 0.032 + 0.026 / 9 = 0.0253

Although 0.0253 is the SMA of those 9 days, it is also called EMA(1) or EMA1 of the signal line.

<br>

**Step 2)** Apply EMA formula until we reach Day 9 which gives us the value of the 9-day EMA of the MACD Line

* For Day 2

EMA(2) = (2 / (9 + 1)) × (0.016−0.0253) + 0.0253 = 0.2 × (-0.0093) + 0.0253 = 0.02344

* For Day 3

EMA(3) = (2 / (9 + 1)) × (0.017−0.02344) + 0.02344 = 0.2 × (-0.00644
) + 0.02344 = 0.0222

* Repeat until Day 9 to get EMA(9) or EMA 9; this is the 9-day EMA of the MACD line, or in other words the value of the signal line.

<br>
To summarize, MACD (or MACD Line) will be calculated using a 12-day and 26-day lookback period in my analysis by subtracting the 26-day EMA from the 12-day EMA. The signal line will be calculated as the 9-day EMA of the MACD Line. These periods are standard and widely accepted for calculating the MACD and signal line.
<br>
<br>


**6) RNN (Recurrent Neural Network)** 

An RNN (Recurrent Neural Network) is a type of neural network designed for processing sequences of data, such as time series, speech, or text. Unlike traditional feedforward neural networks, RNNs have connections that loop back on themselves, allowing information to persist across steps in the sequence.

The key feature of an RNN is its ability to maintain a hidden state that acts as memory, enabling the network to remember past information and use it to influence predictions for future inputs. This makes RNNs suitable for tasks where the order and context of data are important.

For example, in addition to an RNN's ability to retain and process long-term dependencies from past information, it is also capable of understanding the context and meaning of a sentence based on the order of words. This ability allows the network to distinguish between different sentence structures that might use the same words but convey entirely different meanings, such as "the mouse ate the cat" versus "the cat ate the mouse."

While both sentences consist of the same words, their meaning changes dramatically depending on the order of the words. An RNN processes each word sequentially (processes each word, one at a time, from left to right) and updates it hidden state to reflect which is the word that performs the action (the eating) and which is the word that is the object being eaten.

In my project, I am deploying a time series model, where I believe the sequence of variables plays a crucial role in predicting future prices making an RNN a robust choice for my model selection. For instance, yesterday’s price action should have a stronger influence on predicting today’s stock prices than price action from ten days ago.

Since I am using 30 days of historical data to predict future stock prices, it is important that I use a model such as an RNN which can recognize the order in which events occur significantly influences future outcomes. For instance, the price action over the last few days is more relevant for predicting today's stock price than events that happened weeks ago.

However, vanilla RNNs have limitations, such as difficulty learning long-term dependencies due to issues like vanishing gradients. To address this, more advanced versions of RNNs like LSTMs (Long Short-Term Memory) were developed and is better at capturing long-range dependencies in sequences.
<br>

**7) Long Short-Term Memory (LSTM) neural network** 

Specifically, to answer my research question I will be using a variant of the RNN, which is called a Long Short-Term Memory (LSTM) neural network.

LSTMs (Long Short-Term Memory) are advanced types of RNNs that address common RNN issues like vanishing gradients. They are particularly effective for long text sequences because they retain information over longer dependencies.

To understand the issue of a vanishing gradient, it is important to understand what a gradient is. The gradient is the slope or the direction of deepest descent in relation to minimizing the error (cost) function. For a simplified understanding let us take the equation for univariate linear regression: y = mx + b. By getting the best fit line through our set of data points we now have 'm' (slope) and 'b' (intercept). But the values for 'm' and 'b' did not come from just anywhere - it came from minimizing the the linear regression error function. In essence the error function minimizes SSR which is the sum of the square root of the residuals (observed vs. predicted values) to get the best values for 'm' and 'b'. 

Let us pretend the error function for linear regression was f(x) = x^2 (which it is not, but pretend it is). We can see that the error function graphs a parabola. Let us for example plug in the value where x = 4, this returns a value of 16 as 4^2 = 16. So we get the coordinates on our parabola of (4,16), which are simple (x,y) coordinates, where the value of 16 is our total error when x = 4. If we take the partial derivative of the error function, f(x) = x^2 in respect to 'x', we get the following equation: d/dx = 2x. Here the 'd/dx' just refers to the partial derivative in respect to 'x'. So now, using the same x = 4 value to plug into our partial derivative formula we get: 2 * 4 = 8. Here, the value 8 represents the slope tangent to the coordinates (4,16); or we can say that at coordinates (4,16) the value 8 represents the direction of deepest descent in relation to minimizing the error function. Specifically, the value of 8 is our gradient when x = 4. We mentioned previously that for our error function, when x = 4, y = 16, the y-value of '16' represents the total error. Now, that we have our gradient we can subtract our current 'x' value of 4 by the gradient of 8, which gives us a new value, x = -4. However, when we do that we run into a problem where the error function will never be minimized as when we plug x = -4 back into our error function it returns the same total error value of 16. This is why it is recommended to multiply the gradient by a learning rate before subtracting it from the previous 'x' value. If our learning rate = 0.5, we can multiply our gradient: 0.25 * 8 = 2. So now we can take our 'x' value of 4 and subtract it with our gradient with the learning rate applied: 4 - (0.25 * 8) = 4 - 2 = 2. The value '2' will now be plugged into our error function f(x) = x^2 -> 2^2 = 4. Now we can see our total error is at '4', which is much lower than it was at '16' previously. We repeat this process iteratively, adjusting x each time, until the error is as small as possible. This in a nutshell is gradient descent. However just to clarify, in the case of linear regression, the actual error function requires computing the partial derivatives of the error function with respect to both 'm' (slope) and 'b' (intercept). These gradients are then used to minimize the error by updating both 'm' and 'b' iteratively.

Now that we understand what gradient descent is, neural networks rely on a process called backpropagation to adjust their weights (think of this as the slope and intercept in the previous example) based on the gradient, which indicates the direction of deepest descent in relation to minimizing error. However, in very deep networks or sequential models like Recurrent Neural Networks (RNNs), this process can lead to the vanishing gradient problem. As the gradients are passed backward through multiple layers or time steps, they are repeatedly multiplied, and if they are small values (fractions), this causes them to shrink exponentially. Eventually, the gradients become nearly zero, meaning earlier layers or time steps receive insignificant updates during training. As a result, the model struggles to learn long-term dependencies or patterns, making it difficult to capture important relationships in the data.

LSTM networks address this issue as their architecture consists of "cells", rather than traditional "nodes" found in other neural networks. These LSTM cells are the fundamental building blocks of LSTM networks which help address the vanishing gradient problem.

Each LSTM cell has a memory cell, a hidden state, and three main gates: the forget gate, the input gate, and the output gate. The memory cell stores important information over time. The forget gate decides what information in the memory cell should be discarded, helping the network forget irrelevant or outdated data. The input gate controls which new information should be added to the memory cell, allowing the LSTM to update its knowledge with new data as it comes in. The output gate determines which part of the information in the memory cell should be passed on to the hidden state, which is then passed on the the next LSTM cell. 

By controlling the flow of information in and out of the memory cell, the LSTM can remember long-term patterns in the data while preventing the vanishing gradient problem, which helps the model learn more effectively over longer sequences. 

I am using a sequence of 30 days of stock data (one observation), then the LSTM network will process each day of the sequence as a separate time step. For each of these time steps (daily data), there will be an LSTM cell that processes the data.

In my case:

* One obervation consists of 30 days of data (including candlestick data like open, close, high, low prices, as well as technical indicators like MFI, MACD, and RSI)
* There are 30 time steps because each day is treated as one time step
* There are 30 LSTM cells because one LSTM cell processes only one time step (one day)

Thus, the LSTM network will process the full 30-day sequence, one day (or time step) at a time, with each LSTM cell handling a single time step in the sequence. After the LSTM processes the one time step, it passes information to the next cell (via the output gate to the hidden state, then to the next cell) in the sequence. This allows the network to capture the temporal dependencies and patterns over the 30 days of data, meaning that earlier time steps are influencing later time steps. This allows the network to learn patterns over time and remember information from previous time steps, which is crucial for time-series data like stock prices.

After training the model, the LSTM will make predictions about future stock price movements by leveraging the patterns it has learned from the 30 time steps (days) across all of my observations.

The combination of these gates allows the LSTM to maintain and manage long-term dependencies in the data, unlike traditional RNNs, which struggle with this due to the vanishing gradient problem. As the gates selectively control the flow of information, the LSTM network can "remember" important information over long periods, making it very effective for time-series data such as stock prices, where past data can influence future trends. This is particularly useful in my research on stock price prediction, where patterns and trends from previous days or months could significantly impact predictions for the next day.

So, that we fully understand, I am going to use a quick second example. If we have a dataset with sentences that are all 10 words long--each sentence would be an observation, the sequence is the entire sentence, so 10 words long--one time step corresponds to one word in the sequence--since I have 10 time steps (one per word), I will have 10 LSTM cells, one for each word in the sentence. So, for each sentence (observation) with 10 words, the LSTM will process the sentence word by word (one time step at a time), and each word will be processed by an individual LSTM cell, with information being passed from one cell to the next.


### Summary of Variables and Process of Analysis

* Independent variables (8 variables)
    + Stock Price: Low
    + Stock Price: High
    + Stock Price: Open
    + Stock Price: Close
    + Relative Strength Index (RSI)
    + Money Flow Index (MFI)
    + Moving Average Convergence Divergence (MACD): MACD line
    + Moving Average Convergence Divergence (MACD): Signal line
<br><br>

* Dependent variable
    + Binary Classification: Predicting if the closing price will go 'up' or 'down' for multiple future time periods (1 day, 3 days, 5 days, 10 days, and 15 days), compared to the closing price from the last candle of the identified candlestick pattern.
    + Regression: Predicting the exact closing price (a continuous value) for multiple future time periods (1 day, 3 days, 5 days, 10 days, and 15 days), after the identified candlestick pattern.

<br>
As a reminder, we are going to identify these five candlestick patterns and train an LSTM on their 30-day sequences:

1. Hammer
2. Inverted Hammer
3. Bullish Englufing Pattern
4. Bullish Harami
5. Three White Soldiers

For example, with our data of daily stock prices we will identify all "hammer" candlestick patterns using the data definition I previously defined. I am then going to lookback 30 days from the end of my identified candlestick pattern to gather all the data for my independent variables; the last candle in the candlestick pattern is the 30th day in the sequence.

A specific example is that I identified the last candle in a "hammer" candlestick pattern to occur on January 30th. So, since my sequence is 30 days, I will take all the data from January 1 to January 30th. On each of those days, I will gather my 8 independent variables. Each time-step represents one day, meaning I have 8 independent variables per day, and across the entire 30-day sequence, I will have a dataset with 30 time-steps, each containing 8 features. This 30-day sequence will serve as the input to the LSTM model, which will learn to recognize patterns in the time series data and predict the closing price for future days, conditioned on the identified candlestick pattern.

Although, some of the identified patterns are of different lengths; the hammer pattern is identified using one days's worth of data, while the three white soldiers is identified using three day's worth of data--my sequence will always incorporate only 30 days worth of data.

Just something to note, since my independent variable data is a 30-day sequence, it is on the 30th day of that sequence which includes the data of the last candle for the identified candlestick pattern. On the 30th day of the sequence is where I retrieve my closing prices to compare against the closing price of a future day.

To conclude, at the end of my analysis I will have five different trained LSTMs each using a different set of 30-day sequence data which represents a specific candlestick pattern.

# Data Acquisition - _B, C_

### Load necessary packages for data acquisition:

1. yfinance: Used to access Yahoo finance data via their API
2. pandas: Used to read the finance data into a data frame and inspect data

An important thing to note about the 'yfinance' package is that it automatically adjusts the stock prices for stock splits when retrieving historical data. This is great, so that I do not have to manually adjust them.

In [1]:
#load packages
import yfinance as yf
import pandas as pd

### Preview Data

I am going to select the stock ticker 'SPY' to perform my initial data cleaning. Once I create my data cleaning function, I will be able to apply it to other stock tickers. However, for this project, I am selecting 'SPY' because it encompasses many stocks and represents a broad market index, making it a reliable proxy for overall market performance.

In [2]:
ticker = "SPY" #The stock ticker for 'S&P 500'
ticker_symbol = ticker #used for the training step in a later section as need to save ticker data
ticker = yf.Ticker(ticker)

# Define the custom date range (start and end dates in 'YYYY-MM-DD' format)
start_date = "2000-01-01"
end_date = "2025-2-14"

# Fetch the historical data for the defined date range
finance_df = ticker.history(start=start_date, end=end_date)
#finance_df = ticker.history(period="10y") #Get 10 years worth of data, #10y, #max

#write out CSV
finance_df.to_csv(f'{ticker_symbol}_financials_output.csv', index=False)  # `index=False` avoids writing the index column

#preview data
finance_df.head()

Unnamed: 0_level_0,Open,High,Low,Close,Volume,Dividends,Stock Splits,Capital Gains
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
2000-01-03 00:00:00-05:00,94.485455,94.485455,91.697098,92.69294,8164300,0.0,0.0,0.0
2000-01-04 00:00:00-05:00,91.478006,91.816592,88.998361,89.068069,8089800,0.0,0.0,0.0
2000-01-05 00:00:00-05:00,89.18756,90.203319,87.474713,89.227394,12177900,0.0,0.0,0.0
2000-01-06 00:00:00-05:00,88.988414,90.183424,87.793404,87.793404,6227200,0.0,0.0,0.0
2000-01-07 00:00:00-05:00,89.426569,92.892097,89.267234,92.892097,8066500,0.0,0.0,0.0


# Data Cleaning and Exploratory Data Analysis

### Load necessary packages for data acquisition:

1. numpy: Used for numerical computations, in my case, computations with data frames

In [3]:
#load package
import numpy as np

### Data Cleaning Process

1. Identify the 'hammer' pattern

2. Identify the 'inverted hammer' pattern

3. Identify the 'bullish engulfing' pattern

4. Identify the 'bullish harami' pattern

5. Identify the 'three white soldiers' pattern

6. Perform Further Cleaning on Variables / Calculate Variables (RSI, MACD, MFI)

### Data Cleaning Outcomes

Each step in this 'data cleaning outcomes' section corresponds to the numeric step in the 'data cleaning process' section.

**1) Identify the Hammer Pattern**

Remember my following definition of the hammer candlestick pattern:

_Looking for a bullish or bearish candlestick with a small body (body is less than 30% of the total candlestick length; candlestick length is the distance between low and high), with the lower shadow (the distance between the low and the open for a bullish candle, or the low and the close for a bearish candle) being at least two times the length of the body. There also must be little to no upper shadow (upper shadow will be less than 10% of the total candle length). This pattern must occur during a downtrend; to confirm a downtrend, I will fit a regression line (line of best fit) through the closing prices for the previous five candles-if the slope is negative or equal to zero, this means that a downtrend is present. It doesn't matter if the single candlestick is a bearish or bullish candle, they both can display the hammer pattern._

In [4]:
finance_df = finance_df.drop('Dividends', axis=1) #remove 'Dividends' column
finance_df = finance_df.drop('Stock Splits', axis=1) #remove 'Dividends' column
finance_df = finance_df.drop('Capital Gains', axis=1) #remove 'Dividends' column
finance_df['Row_index'] = range(1, len(finance_df) + 1) #creates index column
finance_df.head()

Unnamed: 0_level_0,Open,High,Low,Close,Volume,Row_index
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
2000-01-03 00:00:00-05:00,94.485455,94.485455,91.697098,92.69294,8164300,1
2000-01-04 00:00:00-05:00,91.478006,91.816592,88.998361,89.068069,8089800,2
2000-01-05 00:00:00-05:00,89.18756,90.203319,87.474713,89.227394,12177900,3
2000-01-06 00:00:00-05:00,88.988414,90.183424,87.793404,87.793404,6227200,4
2000-01-07 00:00:00-05:00,89.426569,92.892097,89.267234,92.892097,8066500,5


In [5]:
#Creates a conditional statement based on conditions, and applies the 'choices' label.
#This will result in a new column 'Candle', which describes if the daily candle was 'Bullish',
#Bearish, or Neutral
conditions = [
    finance_df['Close'] > finance_df['Open'],
    finance_df['Close'] == finance_df['Open'],
    finance_df['Open'] > finance_df['Close']
]
choices = ['Bullish', 'Neutral', 'Bearish']
finance_df['Candle'] = np.select(conditions, choices, default='Unknown')
finance_df.head() 

Unnamed: 0_level_0,Open,High,Low,Close,Volume,Row_index,Candle
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
2000-01-03 00:00:00-05:00,94.485455,94.485455,91.697098,92.69294,8164300,1,Bearish
2000-01-04 00:00:00-05:00,91.478006,91.816592,88.998361,89.068069,8089800,2,Bearish
2000-01-05 00:00:00-05:00,89.18756,90.203319,87.474713,89.227394,12177900,3,Bullish
2000-01-06 00:00:00-05:00,88.988414,90.183424,87.793404,87.793404,6227200,4,Bearish
2000-01-07 00:00:00-05:00,89.426569,92.892097,89.267234,92.892097,8066500,5,Bullish


In [6]:
#Create column 'Body_length' which calculates the body length of the 
#candle. 
finance_df['Body_length'] = abs(finance_df['Close'] - finance_df['Open']) #get the absolute number

In [7]:
#Create column 'Lower_shadow_length' which calculates the distance between 
#the low and the open for a bullish candle, or the low and the close for a bearish candle.
conditions = [
    finance_df['Candle'] == 'Bullish',
    finance_df['Candle'] == 'Neutral',
    finance_df['Candle'] == 'Bearish'
]
choices = [finance_df['Open'] - finance_df['Low'], 
           finance_df['Open'] - finance_df['Low'], 
           finance_df['Close'] - finance_df['Low']]
finance_df['Lower_shadow_length'] = np.select(conditions, choices, default=0.0)

In [8]:
#Create column 'Upper_shadow_length' which calculates the distance between 
#the high and the close for a bullish candle, or the high and the open for a bearish candle.
conditions = [
    finance_df['Candle'] == 'Bullish',
    finance_df['Candle'] == 'Neutral',
    finance_df['Candle'] == 'Bearish'
]
choices = [finance_df['High'] - finance_df['Close'], 
           finance_df['High'] - finance_df['Close'], 
           finance_df['High'] - finance_df['Open']]
finance_df['Upper_shadow_length'] = np.select(conditions, choices, default=0.0)

In [9]:
#Create column 'Total_candle_length' which calculates the distance between low and high prices
finance_df['Total_candle_length'] = finance_df['High'] - finance_df['Low']
finance_df.head()

Unnamed: 0_level_0,Open,High,Low,Close,Volume,Row_index,Candle,Body_length,Lower_shadow_length,Upper_shadow_length,Total_candle_length
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2000-01-03 00:00:00-05:00,94.485455,94.485455,91.697098,92.69294,8164300,1,Bearish,1.792515,0.995842,0.0,2.788357
2000-01-04 00:00:00-05:00,91.478006,91.816592,88.998361,89.068069,8089800,2,Bearish,2.409937,0.069709,0.338586,2.818232
2000-01-05 00:00:00-05:00,89.18756,90.203319,87.474713,89.227394,12177900,3,Bullish,0.039834,1.712847,0.975925,2.728606
2000-01-06 00:00:00-05:00,88.988414,90.183424,87.793404,87.793404,6227200,4,Bearish,1.19501,0.0,1.19501,2.39002
2000-01-07 00:00:00-05:00,89.426569,92.892097,89.267234,92.892097,8066500,5,Bullish,3.465529,0.159335,0.0,3.624863


In [10]:
hammer = [] #initialize empty list
for index, row in finance_df.iterrows():
    start_index = row['Row_index'] - 5 #Get the starting index of the fifth previous candle
    end_index = row['Row_index'] - 1 #subtacting 1 because only getting previous 5 candlestick closing prices, not current candle's close
    #for example, if row['Row_index'] = 6; start_index = 6 - 5 = 1; end_index = 6 - 1 = 5 --- getting rows 1 (start_index) through 5 (end_index) which 
    #is the previous five candles data since our current candle is the sixth candle
    
    
    if (start_index < 1): #since we are subtracting to get starting index, it will be negative numbers at first; skip these
        hammer.append("No")
        continue
        
    temp_df = finance_df[(finance_df['Row_index'] >= start_index) & (finance_df['Row_index'] <= end_index)]
    closing_prices = temp_df['Close'].values

    #Fit a regression line (line of best fit) through the closing prices; negative slope represent current downtrend
    x = np.arange(len(closing_prices))
    slope, intercept = np.polyfit(x, closing_prices, 1) 

    #if slope <= 0, then in a current downtrend the previous 5 days in terms of closing prices
    #to clarify, although slope may be zero which means no change, I will still be counting this as a downtrend
    if (slope <= 0):
        if ((row['Lower_shadow_length']) >= (row['Body_length'] * 2)): #lower shadow must be at least twice as long as body length
            if ((row['Upper_shadow_length']) < (row['Total_candle_length'] * 0.10)): #Upper shadow is less than 10% of the total candle length
                if ((row['Body_length']) < (row['Total_candle_length'] * 0.30)): #body is less than 30% of the total candlestick length
                    hammer.append("Yes")
                else:
                    hammer.append("No")
            else:
                hammer.append("No")
        else:
            hammer.append("No")
            
    elif (slope > 0):
        hammer.append("No")


#Create new column
finance_df['Hammer_pattern'] = hammer

In [11]:
#We can see that we now have a new column 'Hammer_pattern' -- 'Yes' = hammer pattern is identified
finance_df[finance_df['Hammer_pattern'] == "Yes"].head()

Unnamed: 0_level_0,Open,High,Low,Close,Volume,Row_index,Candle,Body_length,Lower_shadow_length,Upper_shadow_length,Total_candle_length,Hammer_pattern
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
2000-02-14 00:00:00-05:00,89.087943,89.087943,88.151852,88.908691,8528800,30,Bearish,0.179251,0.756839,0.0,0.936091,Yes
2000-05-22 00:00:00-04:00,90.252842,90.392615,87.53727,89.49408,10839400,98,Bearish,0.758763,1.956809,0.139772,2.855344,Yes
2000-09-14 00:00:00-04:00,95.989388,96.029417,94.888592,95.839279,3397100,178,Bearish,0.150109,0.950687,0.040029,1.140825,Yes
2001-02-14 00:00:00-05:00,85.432068,85.432068,84.150432,85.052086,8400100,283,Bearish,0.379982,0.901654,0.0,1.281636,Yes
2001-02-22 00:00:00-05:00,81.374623,81.496993,79.229964,81.02684,21281600,288,Bearish,0.347783,1.796876,0.122369,2.267028,Yes


In [12]:
#We can see how many hammer candlestick patterns are present
finance_df['Hammer_pattern'].value_counts()

Hammer_pattern
No     6248
Yes      70
Name: count, dtype: int64

**2) Identify the Inverted Hammer Pattern**

Remember my following definition of the inverted hammer candlestick pattern:

_Looking for a candlestick (bullish or bearish) with a small body (less than 30% of the total candlestick length; candlestick length is the distance between low and high) at the bottom, a long upper shadow (at least twice the length of the body), and little to no lower shadow (lower shadow will be less than 10% of the total candle length). This pattern must occur during a downtrend; to confirm a downtrend, I will fit a regression line (line of best fit) through the closing prices for the previous five candles-if the slope is negative or equal to zero, this means that a downtrend is present. It doesn't matter if the single candlestick is a bearish or bullish candle, they both can display the inverted hammer pattern._

In [13]:
inverted_hammer = [] #initialize empty list
for index, row in finance_df.iterrows():
    start_index = row['Row_index'] - 5 #Get the starting index of the fifth previous candle
    end_index = row['Row_index'] - 1 #subtacting 1 because only getting previous 5 candlestick closing prices, not current candle's close
    
    if (start_index < 1): #since we are subtracting to get starting index, it will be a negative number; skip these
        inverted_hammer.append("No")
        continue
        
    
    temp_df = finance_df[(finance_df['Row_index'] >= start_index) & (finance_df['Row_index'] <= end_index)]
    closing_prices = temp_df['Close'].values

    #Fit a regression line (line of best fit) through the closing prices; negative slope represents current downtrend
    x = np.arange(len(closing_prices))
    slope, intercept = np.polyfit(x, closing_prices, 1) 

    #if slope <= 0, then in a current downtrend the previous 5 days in terms of closing prices
    #to clarify, although slope may be zero which means no change, I will still be counting this as a downtrend
    if (slope <= 0):
        if ((row['Upper_shadow_length']) >= (row['Body_length'] * 2.0)): #upper shadow is at least twice the length of the candle body
            if ((row['Lower_shadow_length']) < (row['Total_candle_length'] * 0.10)): #lower shadow is less than 20% of the total candle length
                if ((row['Body_length']) < (row['Total_candle_length'] * 0.30)): #body is less than 30% of the total candlestick length
                    inverted_hammer.append("Yes")
                else:
                    inverted_hammer.append("No")
            else:
                inverted_hammer.append("No")
        else:
            inverted_hammer.append("No")
                
    elif (slope > 0):
        inverted_hammer.append("No")

#Create new column
finance_df['InvertedHammer_pattern'] = inverted_hammer

In [14]:
#We can see how many inverted hammer candlestick patterns are present
finance_df['InvertedHammer_pattern'].value_counts()

InvertedHammer_pattern
No     6266
Yes      52
Name: count, dtype: int64

**3) Identify the Bullish Engulfing Pattern**

Remember my following definition of the bullish engulfing candlestick pattern:

_Looking for a small bearish candle followed by a large bullish candle that completely engulfs the range (high and low prices) of the first candle (two-candle pattern). The second candle's body (open and close price) must be larger by at least 2 times the body of the first candle and fully engulf the body of the first candle. This pattern must occur during a downtrend; to confirm a downtrend, I will fit a regression line (line of best fit) through the closing prices for the previous five candles-if the slope is negative or equal to zero, this means that a downtrend is present. Note that the first candle (the bearish candle) in the identified pattern will be treated as the fifth candle in the slope calculation for confirming the downtrend._

In [15]:
bullish_engulfing = [] #initialize empty list
slope = np.array([])
for index, row in finance_df.iterrows():
    start_index = row['Row_index'] - 5 #Get the starting index of the fifth previous candle
    end_index = row['Row_index'] - 1 #subtacting 1 because only getting previous 5 candlestick closing prices, not current candle's close
    
    if (start_index < 1): #since we are subtracting to get starting index, it will be a negative number; skip these
        bullish_engulfing.append("No")
        continue
        
    #get data for five previous days worth of candlestick data
    temp_df = finance_df[(finance_df['Row_index'] >= start_index) & (finance_df['Row_index'] <= end_index)]
    closing_prices = temp_df['Close'].values

    #check to see if the directly previous candle is a bearish candle, this is a requirement for identifying bullish engulfing pattern
    check_df = finance_df[finance_df['Row_index'] == end_index]
    if (check_df['Candle'].iloc[0] == 'Bullish'):
        bullish_engulfing.append("No")
        continue
        

    #Fit a regression line (line of best fit) through the closing prices; negative slope represents current downtrend
    x = np.arange(len(closing_prices))
    slope, intercept = np.polyfit(x, closing_prices, 1) 

    #if slope <= 0, then in a current downtrend the previous 5 days in terms of closing prices
    #to clarify, although slope may be zero which means no change, I will still be counting this as a downtrend.
    #The final candle in the two-candlestick pattern must be bullish
    if ((slope <= 0) & (row['Candle'] == 'Bullish')):
        
        #Looking for a small bearish candle followed by a large bullish candle that completely engulfs the range (high and low prices) of the first candle (two-candle pattern)
        if ((row['High'] > check_df['High'].iloc[0]) & (row['Low'] < check_df['Low'].iloc[0])):
        
            if ((row['Body_length']) >= (check_df['Body_length'].iloc[0] * 2.0)): #the second candle's body must be larger by at least 2 times the body of the previous candle
                
                #The second candle's body must fully engulf the body of the first candle.
                #Specifically, the second candle's close must be larger than the previous candle's open and the second candle's open must be less
                #than the second candle's close. This is because the second candle is a bullish candle and the previous candle is a bearish candle.
                if ((row['Close'] > check_df['Open'].iloc[0]) & (row['Open'] < check_df['Close'].iloc[0])): 
                    bullish_engulfing.append("Yes")
                else:
                    bullish_engulfing.append("No")
            else:
                    bullish_engulfing.append("No")
        else:
            bullish_engulfing.append("No")
                
    else:
        bullish_engulfing.append("No")

#Create new column
finance_df['BullishEngulfing_pattern'] = bullish_engulfing

In [16]:
#We can see how many bullish engulfing candlestick patterns are present
finance_df['BullishEngulfing_pattern'].value_counts()

BullishEngulfing_pattern
No     6294
Yes      24
Name: count, dtype: int64

**4) Identify the Bullish Harami Pattern**

Remember my following definition of the bullish harami candlestick pattern:

_Looking for a large bearish candle followed by a small bullish candle that is entirely within the range of the first candle; this means that the small bullish candle should be fully contained within the previous candle's high and low. The large bearish candle will be at least twice the entire length of the following bullish candle. The body of the large bearish candle will also completely engulf the body of the small bullish candle. This pattern must occur during a downtrend; to confirm a downtrend, I will fit a regression line (line of best fit) through the closing prices for the previous five candles-if the slope is negative or equal to zero, this means that a downtrend is present. Note that the first candle (the bearish candle) in the identified pattern will be treated as the fifth candle in the slope calculation for confirming the downtrend._

In [17]:
bullish_harami = [] #initialize empty list
slope = np.array([])
for index, row in finance_df.iterrows():
    start_index = row['Row_index'] - 5 #Get the starting index of the fifth previous candle
    end_index = row['Row_index'] - 1 #subtacting 1 because only getting previous 5 candlestick closing prices, not current candle's close
    
    if (start_index < 1): #since we are subtracting to get starting index, it will be a negative number; skip these
        bullish_harami.append("No")
        continue
        
    #get data for five previous days worth of candlestick data
    temp_df = finance_df[(finance_df['Row_index'] >= start_index) & (finance_df['Row_index'] <= end_index)]
    closing_prices = temp_df['Close'].values

    #check to see if the directly previous candle is a bearish candle, this is a requirement for identifying bullish harami pattern
    check_df = finance_df[finance_df['Row_index'] == end_index]
    if (check_df['Candle'].iloc[0] == 'Bullish'):
        bullish_harami.append("No")
        continue
        

    #Fit a regression line (line of best fit) through the closing prices; negative slope represents current downtrend
    x = np.arange(len(closing_prices))
    slope, intercept = np.polyfit(x, closing_prices, 1) 

    #if slope <= 0, then in a current downtrend the previous 5 days in terms of closing prices
    #to clarify, although slope may be zero which means no change, I will still be counting this as a downtrend.
    #The final candle in the two-candlestick pattern must be bullish.
    if ((slope <= 0) & (row['Candle'] == 'Bullish')):
        if ((check_df['Total_candle_length'].iloc[0]) >= (row['Total_candle_length'] * 2.0)): #the bearish candle will be at least twice the length of the bullish candle
            
            #The second candle must be fully contained within the previous candle's high and low.
            if ((row['High'] < check_df['High'].iloc[0]) & (row['Low'] > check_df['Low'].iloc[0])): 

                #The second candle's body must also be fully contained within the previous candle's body (close and open).
                if ((row['Open'] > check_df['Close'].iloc[0]) & (row['Close'] < check_df['Open'].iloc[0])): 
                    bullish_harami.append("Yes")
                else:
                    bullish_harami.append("No")
            
            else:
                bullish_harami.append("No")
        else:
                bullish_harami.append("No")
                
    else:
        bullish_harami.append("No")

#Create new column
finance_df['BullishHarami_pattern'] = bullish_harami

In [18]:
#We can see how many bullish harami candlestick patterns are present
finance_df['BullishHarami_pattern'].value_counts()

BullishHarami_pattern
No     6288
Yes      30
Name: count, dtype: int64

**5) Identify the Three White Soldiers Pattern**

Remember my following definition of the three white soliders candlestick pattern:

_Looking for three consecutive bullish candles with each one closing higher than the previous candle. The candles should show a steady upward movement without large wicks. The upper and lower wicks should each be no more than 20% of the total candle length. Unlike the other patterns, this pattern does not need to occur during a downtrend. The visual below is a representation of how I am going to identify the three white soliders pattern as highlighted with blue dots._

In [19]:
three_white_soldiers = [] #initialize empty list
slope = np.array([])
for index, row in finance_df.iterrows():
    start_index = row['Row_index'] - 2 #Get the starting index of the second previous candle
    end_index = row['Row_index'] - 1 #subtacting 1 because only getting previous 2 candlestick closing prices, not current candle's close
    
    if (start_index < 1): #since we are subtracting to get starting index, it will be a negative number; skip these
        three_white_soldiers.append("No")
        continue
        
    #get data for five previous days worth of candlestick data. Check to see if previous two candles are bullish; since need three total bullish candles
    #for this pattern.
    check_df = finance_df[(finance_df['Row_index'] >= start_index) & (finance_df['Row_index'] <= end_index)]
    if ((check_df['Candle'].iloc[0] == 'Bearish') or (check_df['Candle'].iloc[1] == 'Bearish')):
        three_white_soldiers.append("No")
        continue
    

    #The final candle in the three-candlestick pattern must be bullish.
    if (row['Candle'] == 'Bullish'):

        #the first candle in the pattern should have upper and lower wicks no more than 20% of the total candle length
        if (((check_df['Lower_shadow_length'].iloc[0]) <= (check_df['Total_candle_length'].iloc[0] * 0.20)) & ((check_df['Upper_shadow_length'].iloc[0]) <= (check_df['Total_candle_length'].iloc[0] * 0.20))):

            #the second candle in the pattern should have upper and lower wicks no more than 20% of the total candle length
            if (((check_df['Lower_shadow_length'].iloc[1]) <= (check_df['Total_candle_length'].iloc[1] * 0.20)) & ((check_df['Upper_shadow_length'].iloc[1]) <= (check_df['Total_candle_length'].iloc[1] * 0.20))): 
                
                #The three candles should have increasing closing prices
                if ((row['Close'] > (check_df['Close'].iloc[1])) & ((check_df['Close'].iloc[1]) > (check_df['Close'].iloc[0]))): 
                    three_white_soldiers.append("Yes")
                else:
                    three_white_soldiers.append("No")
            else:
                three_white_soldiers.append("No")
        else:
            three_white_soldiers.append("No")
                
    else:
        three_white_soldiers.append("No")

#Create new column
finance_df['ThreeWhiteSoldiers_pattern'] = three_white_soldiers

In [20]:
#We can see how many three white soldier candlestick patterns are present
finance_df['ThreeWhiteSoldiers_pattern'].value_counts()

ThreeWhiteSoldiers_pattern
No     6284
Yes      34
Name: count, dtype: int64

### Variable Cleaning

**6) Perform Further Cleaning on Variables**

I will also perform further cleaning below to calculate the following values needed to train my model:

* MACD and Signal Line
* RSI
* MFI
* Normalize stock prices two different ways: Log transform & Sklearn's scaler function
* Creating the 'random' column in the dataset with randomly assigned "Yes" and "No" values

In [21]:
from sklearn.preprocessing import MinMaxScaler #Used to normalize data

###Calculate MACD and Signal Line
# Calculate the 12-day EMA
finance_df['EMA12'] = finance_df['Close'].ewm(span=12, adjust=False).mean()

# Calculate the 26-day EMA
finance_df['EMA26'] = finance_df['Close'].ewm(span=26, adjust=False).mean()

# Calculate the MACD (12-day EMA - 26-day EMA)
finance_df['MACD'] = finance_df['EMA12'] - finance_df['EMA26']

# Calculate the Signal Line (9-day EMA of MACD)
finance_df['Signal_Line'] = finance_df['MACD'].ewm(span=9, adjust=False).mean()



###RSI
# Calculate the daily price changes
finance_df['Price_Change'] = finance_df['Close'].diff()

# Separate gains and losses
finance_df['Gain'] = finance_df['Price_Change'].apply(lambda x: x if x > 0 else 0)
finance_df['Loss'] = finance_df['Price_Change'].apply(lambda x: -x if x < 0 else 0)

# Calculate the average gain and loss over a 14-day period
period = 14
finance_df['Avg_Gain'] = finance_df['Gain'].rolling(window=period, min_periods=1).mean()
finance_df['Avg_Loss'] = finance_df['Loss'].rolling(window=period, min_periods=1).mean()

# Calculate the relative strength (RS)
finance_df['RS'] = finance_df['Avg_Gain'] / finance_df['Avg_Loss']

# Calculate the RSI
finance_df['RSI'] = 100 - (100 / (1 + finance_df['RS']))


####Used to calculate MFI
# Step 1: Calculate the Typical Price (TP)
finance_df['TP'] = (finance_df['High'] + finance_df['Low'] + finance_df['Close']) / 3

# Step 2: Calculate the Money Flow (MF)
finance_df['MF'] = finance_df['TP'] * finance_df['Volume']

# Step 3: Calculate Positive and Negative Money Flow
finance_df['Positive_MF'] = finance_df['MF'].where(finance_df['TP'] > finance_df['TP'].shift(1), 0)
finance_df['Negative_MF'] = finance_df['MF'].where(finance_df['TP'] < finance_df['TP'].shift(1), 0)

# Step 4: Calculate the rolling sum of Positive and Negative Money Flow over the specified period (e.g., 14 periods)
window = 14
finance_df['Positive_MF_sum'] = finance_df['Positive_MF'].rolling(window=window).sum()
finance_df['Negative_MF_sum'] = finance_df['Negative_MF'].rolling(window=window).sum()

# Step 5: Calculate the Money Flow Ratio
finance_df['Money_Flow_Ratio'] = finance_df['Positive_MF_sum'] / finance_df['Negative_MF_sum']

# Step 6: Calculate the Money Flow Index (MFI)
finance_df['MFI'] = 100 - (100 / (1 + finance_df['Money_Flow_Ratio']))


###Normalize stock price variables
#normalize via log transform
finance_df['Log_Close'] = np.log(finance_df['Close'])
finance_df['Log_Open'] = np.log(finance_df['Open'])
finance_df['Log_High'] = np.log(finance_df['High'])
finance_df['Log_Low'] = np.log(finance_df['Low'])

#normalize via Sklearn's scaler function
scaler_close = MinMaxScaler()
scaler_open = MinMaxScaler()
scaler_high = MinMaxScaler()
scaler_low = MinMaxScaler()
finance_df['Normalized_Close'] = scaler_close.fit_transform(finance_df[['Close']])
finance_df['Normalized_Open'] = scaler_open.fit_transform(finance_df[['Open']])
finance_df['Normalized_High'] = scaler_high.fit_transform(finance_df[['High']])
finance_df['Normalized_Low'] = scaler_low.fit_transform(finance_df[['Low']])


####Used to create a new column to test random values of 'yes' to simulate presence of a random pattern
# Specify the number of "Yes" values you want, may show up as less during training due to location of the "Yes" value, as need at least 30 days
#of data for the 30-day sequence, or if the future closing price is not available (only have data to 2/14)
num_yes = 200

# Create a list of "Yes" and "No" values
yes_no_list = ["Yes"] * num_yes + ["No"] * (len(finance_df) - num_yes)

#set seed for reproducibility
np.random.seed(6) 

# Shuffle the list to randomize the order
np.random.shuffle(yes_no_list)

# Add the list as a new column in the DataFrame
finance_df['Random_Yes_No'] = yes_no_list

#exclude the first 26 rows because calculations from MACD needs at least 26 days to calculate
finance_df = finance_df.iloc[26:].reset_index(drop=True)

#write out CSV of cleaned dataset
finance_df.to_csv(f'{ticker_symbol}_financials_cleaned_output.csv', index=False)  # `index=False` avoids writing the index column

#### Advantages and Disadvantages During Data Collection Phase

An advantage of collecting the data is that there is built in stock packages to collect stock market data. I used the 'yfinance' package to easily get stock price data for the ticker 'SPY'. A disadvantage would be that after collecting the data for the candle stick patterns, I realized that they occur very rarely. For that reason I have decided to not implement the bullish harami, bullish engulfing, and three white soldiers patterns as they are the most rarely occurring patterns and I want to make sure I have enough data to train a reliable model.

## Data Modeling / Machine Learning - _D_



#### LSTM Classification Model

Prior to running my machine learning model, the code below is designed to analyze a financial dataset (finance_df) by identifying a specific trading pattern, such as the "Hammer pattern," and subsequently predicting whether the stock's price will increase based on the pattern's occurrence. The code first filters the dataset to only include rows where the 'Hammer_pattern' is marked as "Yes." The pattern_df dataframe contains these filtered rows, and the code then iterates through these rows to collect a sequence of independent variables that can be used for prediction. These independent variables include various stock metrics like open, close, high, low prices, as well as technical indicators such as RSI (Relative Strength Index), MFI (Money Flow Index), and MACD (Moving Average Convergence Divergence). The code ensures that for each identified pattern, a series of previous days' data (up to 30 days) is gathered to build these feature sets, which are stored in separate lists. Additionally, a dependent variable is created to capture whether the stock price increases by a specified percentage (e.g., 1% or more) in the days following the identified pattern.

The code also handles the logic of determining whether the stock's price has increased by the desired percentage after the pattern appears. For each identified pattern, the code checks if the stock price on the following day (determined by 'days_out') exceeds the original price by the specified percentage. If the price increase criterion is met, the dependent variable is set to 1 (indicating a positive class), and if it isn't, it is set to 0 (indicating a negative class). The independent variables, which include multiple arrays of stock data for each identified pattern, are stored in corresponding lists and then converted into NumPy arrays for further processing or machine learning tasks. This approach prepares the data for machine learning models to predict future stock price movements based on the identified pattern and accompanying technical indicators.

**Note!!!**: I have chosen not to implement machine learning models to predict the future prices for three candlestick patterns. The reason is because of the small sample size of data. These are the Bullish Engulfing, Bullish Harami, and Three White Soldiers pattern.

To summarize or expound on what I have just said, the code directly below will allow us to:

1. Select the candlestick pattern ("Random", "Hammer", "Inverted Hammer"): The random pattern is a column in the dataset with randomly assigned "Yes" and "No" values. These "Yes" and "No" values are distributed randomly, and my goal is to compare the model's performance using these random patterns labeled "Yes" versus patterns that are specifically identified as candlestick patterns. This will help me understand if the model behaves differently when dealing with randomly assigned patterns versus known candlestick patterns.
<br>
2. Generate our independent variables that can be used for prediction. There will be 15 different sets of independent variables. Each one of them has a different shape and includes a different subset of independent variables. This allows us to compare the performance of independent variables so that we can evaluate the best performing combination of independent variables.
<br>
3. Generate our dependent variable. Our value of the dependent variable is dependent on the following two model parameters that is specified by me:
    + My research question states that I will predict a stock's closing price sometime after the candlestick pattern concludes. I will try different values for this such as one days afterwards, three days, five, ten, and fifteen.
    + Another parameter for defining the dependent variable is the percentage increase required to classify a future stock closing price as a positive class. For example, if the parameter is set to '1.0', the future closing price only needs to be greater than the last closing price identified within the 30-day sequence. However, if set to '1.1', the future closing price must exceed the last closing price in the sequence by 1%, i.e., the last closing price multiplied by 1.01.


The result of the code will be a numpy array with values of from all of our cleaned variables. This array incorporates the idea that one observation (one sequence) is 30 days of stock data. Each of these 30 time-steps (remember, one time-step = one day), will represent daily data of my eight chosen independent variables. The dependent variable in my array will be future closing stock price.

Depending on my selected candlestick pattern, I may have more observations as some candlestick patterns occur more often than others. Here is an example on how my independent and dependent variables will look like:

-Shape: (30, 8)<br>
_independent_observation = np.array([ <br>
    [0.054, 0.062, 0.054, 0.059, 45.3, 1000000, 0.0012, 0.0010],  # Day 1 <br>
    [0.059, 0.063, 0.059, 0.062, 46.1, 1010000, 0.0013, 0.0011],  # Day 2 <br>
    [0.061, 0.064, 0.060, 0.063, 47.2, 1025000, 0.0014, 0.0012],  # Day 3 <br>
    # ... 27 more rows ... <br>
    [0.070, 0.074, 0.071, 0.073, 55.0, 1200000, 0.0020, 0.0018]   # Day 30 <br>
])_

-Binary label: 1 if the future closing price is higher, 0 otherwise <br>
_dependent_classification = np.array([1])_

-Future day's closing price (e.g., 0.075) <br>
_dependent_regression = np.array([0.075])_

In [127]:
#Subset data frame for desired pattern
pattern_df = finance_df[finance_df['Hammer_pattern'] == "Yes"]
#pattern_df = finance_df[finance_df['Random_Yes_No'] == "Yes"]

#How many days after the pattern is identified to use for the dependent variable
days_out = 1

#What percent increase from the current price is considered a positive class. For example 1.01 = 1% increase; 100 * 1.01 = 101. So if original
#price is $100, anything greater than $101 is considered a positive class.
pct_increase = 1.00

#Gather independent variables
independent_list1 = []
independent_list2 = []
independent_list3 = []
independent_list4 = []
independent_list5 = []
independent_list6 = []
independent_list7 = []
independent_list8 = []
independent_list9 = []
independent_list10 = []
independent_list11 = []
independent_list12 = []
independent_list13 = []
independent_list14 = []
independent_list15 = []

#gather dependent variables
dependent_list = []

pattern_index = list(pattern_df["Row_index"])
#pattern_index = [60, 62]
for i in pattern_index:
    #if (i == 62):
    #    break
    
    #unable to get 30 days worth of data if index is less than 56, because previously removed first 26 observations
    if (i < 56):
        continue

    #get 30 days worth of data to gather data for indpendent variables
    subset_df = finance_df[(finance_df["Row_index"] >= (i - 29)) & (finance_df["Row_index"] <= (i))]
    #subset_df = finance_df[(finance_df["Row_index"] >= (i - 13)) & (finance_df["Row_index"] <= (i))]
    
    #Get day after data to gather closing price for dependent variable
    dependent_df = finance_df[finance_df["Row_index"] == (i)]
    dependent2_df = finance_df[finance_df["Row_index"] == (i + days_out)]
    
    temp_list1 = []
    temp_list2 = []
    temp_list3 = []
    temp_list4 = []
    temp_list5 = []
    temp_list6 = []
    temp_list7 = []
    temp_list8 = []
    temp_list9 = []
    temp_list10 = []
    temp_list11 = []
    temp_list12 = []
    temp_list13 = []
    temp_list14 = []
    temp_list15 = []

    #append temp_list to independent_list
    if len(dependent2_df) > 0: #dependent2_df may have length of zero as it is a future date, data may not be available
    

        for index, row in subset_df.iterrows():
                
                test_array1 = np.array([row['Open'], row['Close'], row['High'], row['Low']])
                test_array2 = np.array([row['Log_Open'], row['Log_Close'], row['Log_High'], row['Log_Low']])
                test_array3 = np.array([row['Normalized_Open'], row['Normalized_Close'], row['Normalized_High'], row['Normalized_Low']])
        
                test_array4 = np.array([row['Open'], row['Close'], row['High'], row['Low'], row['RSI']])
                test_array5 = np.array([row['Log_Open'], row['Log_Close'], row['Log_High'], row['Log_Low'], row['RSI']])
                test_array6 = np.array([row['Normalized_Open'], row['Normalized_Close'], row['Normalized_High'], row['Normalized_Low'], row['RSI']])
        
                test_array7 = np.array([row['Open'], row['Close'], row['High'], row['Low'], row['MFI']])
                test_array8 = np.array([row['Log_Open'], row['Log_Close'], row['Log_High'], row['Log_Low'], row['MFI']])
                test_array9 = np.array([row['Normalized_Open'], row['Normalized_Close'], row['Normalized_High'], row['Normalized_Low'], row['MFI']])
        
                test_array10 = np.array([row['Open'], row['Close'], row['High'], row['Low'], row['MACD'], row['Signal_Line']])
                test_array11 = np.array([row['Log_Open'], row['Log_Close'], row['Log_High'], row['Log_Low'], row['MACD'], row['Signal_Line']])
                test_array12 = np.array([row['Normalized_Open'], row['Normalized_Close'], row['Normalized_High'], row['Normalized_Low'], row['MACD'], row['Signal_Line']])
        
                test_array13 = np.array([row['Open'], row['Close'], row['High'], row['Low'], row['RSI'], row['MFI'], row['MACD'], row['Signal_Line']])
                test_array14 = np.array([row['Log_Open'], row['Log_Close'], row['Log_High'], row['Log_Low'], row['RSI'], row['MFI'], row['MACD'], row['Signal_Line']])
                test_array15 = np.array([row['Normalized_Open'], row['Normalized_Close'], row['Normalized_High'], row['Normalized_Low'], row['RSI'], row['MFI'], row['MACD'], row['Signal_Line']])
        
                
                temp_list1.append(test_array1)
                temp_list2.append(test_array2)
                temp_list3.append(test_array3)
                temp_list4.append(test_array4)
                temp_list5.append(test_array5)
                temp_list6.append(test_array6)
                temp_list7.append(test_array7)
                temp_list8.append(test_array8)
                temp_list9.append(test_array9)
                temp_list10.append(test_array10)
                temp_list11.append(test_array11)
                temp_list12.append(test_array12)
                temp_list13.append(test_array13)
                temp_list14.append(test_array14)
                temp_list15.append(test_array15)
                
        independent_list1.append(temp_list1)
        independent_list2.append(temp_list2)
        independent_list3.append(temp_list3)
        independent_list4.append(temp_list4)
        independent_list5.append(temp_list5)
        independent_list6.append(temp_list6)
        independent_list7.append(temp_list7)
        independent_list8.append(temp_list8)
        independent_list9.append(temp_list9)
        independent_list10.append(temp_list10)
        independent_list11.append(temp_list11)
        independent_list12.append(temp_list12)
        independent_list13.append(temp_list13)
        independent_list14.append(temp_list14)
        independent_list15.append(temp_list15)
    
        if (dependent2_df['Close'].iloc[0] > dependent_df['Close'].iloc[0] * pct_increase):
            dependent_list.append(1)
        else:
            dependent_list.append(0)


independent_array1 = np.array(independent_list1)
independent_array2 = np.array(independent_list2)
independent_array3 = np.array(independent_list3)
independent_array4 = np.array(independent_list4)
independent_array5 = np.array(independent_list5)
independent_array6 = np.array(independent_list6)
independent_array7 = np.array(independent_list7)
independent_array8= np.array(independent_list8)
independent_array9 = np.array(independent_list9)
independent_array10 = np.array(independent_list10)
independent_array11 = np.array(independent_list11)
independent_array12 = np.array(independent_list12)
independent_array13 = np.array(independent_list13)
independent_array14 = np.array(independent_list14)
independent_array15 = np.array(independent_list15)
dependent_array = np.array(dependent_list)

In [128]:
print(np.shape(independent_array1)) # 30 time-steps and 4 features per time-step
print(np.shape(independent_array4)) # 30 time-steps and 5 features per time-step
print(np.shape(independent_array10)) # 30 time-steps and 6 features per time-step
print(np.shape(independent_array15)) # 30 time-steps and 8 features per time-step

(69, 30, 4)
(69, 30, 5)
(69, 30, 6)
(69, 30, 8)


In [129]:
#the number of observations of each class
np.unique(dependent_array, return_counts=True)

(array([0, 1]), array([37, 32]))

This is my LSTM classification model as shown below.

This model is designed for binary classification tasks, where the goal is to predict one of two possible outcomes ("Yes" or "No"). It uses an LSTM (Long Short-Term Memory) network, which is a type of Recurrent Neural Network (RNN) that works well with sequential data, like time series in my case. The model begins with an LSTM layer of 128 units, using the "tanh" activation function to capture patterns in the sequential input data. The return_sequences=True means this layer outputs sequences, which are passed on to the next LSTM layer. A dropout layer is added to reduce overfitting by randomly "dropping" some of the units during training.

The second LSTM layer, with 64 units, processes the sequence further, but without returning sequences (return_sequences=False). This makes the output a single vector, which is then passed through another dropout layer. The model ends with a dense output layer, which has one unit with a sigmoid activation function, giving a probability between 0 and 1 for binary classification. The model is compiled using the Adam optimizer, a commonly used optimization algorithm, with binary cross-entropy as the loss function, as this is suitable for binary classification problems. The performance of the model is evaluated using accuracy as the metric.

In [130]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from sklearn.model_selection import train_test_split
import numpy as np
from sklearn.preprocessing import StandardScaler

#Define the LSTM classification model
def create_lstm_classification(input_shape):
    model = Sequential()
    
    # LSTM layers
    model.add(LSTM(128, activation='tanh', return_sequences=True, input_shape=input_shape))
    model.add(Dropout(0.2))  # Dropout to reduce overfitting
    
    model.add(LSTM(64, activation='tanh', return_sequences=False))  # Final LSTM layer
    model.add(Dropout(0.2))
    
    # Dense output layer for binary classification
    model.add(Dense(1, activation='sigmoid'))  # Sigmoid for binary classification (probability)
    
    # Compile the model
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])  # Binary cross-entropy for classification
    
    return model

#independent_array(s) and dependent_array are already defined
#independent variables (features)
X = independent_array1  # Shape: (890, 30, [4, 5, 6, or 8])

#dependent variable (target)
y = dependent_array  # Shape: (890,) (binary labels, 0 or 1)

#split data into training and testing sets (80% training, 20% testing), random state for reproducible results
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=6)

#define the input shape based on your data; for example independent array_1 has input_shape of (30, 4); independent array_15's shape is (30,8)
input_shape = (30, X.shape[2])  # 30 time-steps and 8 features per time-step for independent_array_15

#create the LSTM model
classification_model = create_lstm_classification(input_shape)

#train the classification model and store history. verbose = 0 -> hides the training output
history = classification_model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_test, y_test), verbose = 0)

#get the validation accuracy from history object
val_accuracy = history.history['val_accuracy']

#find the highest validation accuracy achieved during training
best_val_accuracy = max(val_accuracy)
avg_val_accuracy = np.mean(val_accuracy)

# Evaluate the model on the test data, returning best validation accuracy and average validation accuracy throughout all epochs
print(f"Max Accuracy: {best_val_accuracy * 100:.2f}%; Average accuray: {avg_val_accuracy}")

  super().__init__(**kwargs)


Max Accuracy: 57.14%; Average accuray: 0.48571428954601287


The model above was ran using data from independent_array #1, which had the shape of (69, 30, 4) or in other words 69 observations, 30 time-steps and 4 features per time-step. Independent array #1 as shown in the code only includes non-normalized stock price variables (open, low, close, high).

The model was also run using 30-day sequences, with each sequence concluding on the 30th day featuring a hammer candlestick pattern on that 30th day. Lastly, I selected the two parameter combinations for the dependent variable: that the future closing price was one day in the future, the day directly after the 30 day sequence. The second parameter was that the dependent variable was labeled as a positive class if it's future closing price was higher than it was previously than the closingn price of the last candle in the 30 day sequence. 

We can see that the best accuracy score achieved by the model was 57.14% after 10 epochs. 

#### Exploring Model Performance with Parameter Variations and Stratified K-Fold Cross Validation - LSTM Classification Model

Now, using the same LSTM model as defined in the previous step I will try various combinations of parameters and variables for my model. 

For the parameter combinations, I will evaluate my model using three patterns: random days, the hammer pattern, and the inverted hammer pattern. I’ve chosen not to evaluate the other patterns due to insufficient observations in the dataset. When I refer to evaluating my model on random days, I mean that I previously created a column in the dataset with randomly assigned "Yes" values. These "Yes" values are distributed randomly, and my goal is to compare the model's performance using these random patterns versus patterns that are specifically identified as candlestick patterns. This will help me understand if the model behaves differently when dealing with randomly assigned patterns versus known candlestick patterns.

Other parameters I will test will be how many days out after the pattern is identified to use for the dependent variable. For example, if this is set to the value of "1", the closing price for the day directly after the candlestick pattern will be what influences the dependent variable. Another example, is if this parameter is "10", then the future closing price is associated with 10 days after the last candle of the candle stick pattern.

To build off from the previous paragraph, my last parameter will be what percent increase from the original price is considered a positive class. For example 1.01 = 1% increase; 100 * 1.01 = 101. So if the original price is 100 dollars, anything greater than 101 dollars is considered a positive class. In my analysis I have set these values to [1.0, 1.01, and 1.02]. Note, if the value is set to '1.0' then if the original price is 100 dollars, anything greater than 100 dollars is considered a positive class.

I will also use statified K-fold Cross validation. Stratified K-Fold Cross Validation is often preferred over regular K-Fold because it ensures that each fold of the data has a similar distribution of the target classes. This is particularly important when dealing with imbalanced datasets where some classes may be underrepresented. In regular K-Fold cross-validation, the data is randomly split, which could result in some folds having disproportionately many samples from one class and too few from another. This can lead to biased model performance estimates, as the model might not be exposed to enough of the minority class to learn effectively.

In contrast, Stratified K-Fold ensures that each fold contains roughly the same proportion of each class as in the original dataset. This helps the model train and validate on a more balanced representation of the target variable, leading to more reliable and generalizable performance metrics. Stratified K-Fold is particularly beneficial for classification tasks where the goal is to maintain fairness in model evaluation, and it can help prevent skewed results caused by class imbalances.

In my case, when splitting my dataset, I want to make sure I have the same ratio of positive and negative classes in each fold as in the original dataset. This will ensure that my model is consistently evaluated on balanced data and help improve its ability to predict both classes effectively.

The result of running the code below will output a CSV file which shows the accuracy scores of each parameter and variable combination. I am going to have to run this code below multiple times, each time changing the stock ticker (if I choose to analyze another stock, but I won't in this project; I will only evaluate the stock ticker "SPY" to save resources.) I want to evaluate and the selected pattern (the code below can only run one selected pattern and one stock ticker at a time).

In [133]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from sklearn.model_selection import KFold
from sklearn.model_selection import StratifiedKFold
import numpy as np
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.optimizers import RMSprop
import pandas as pd


### User inputs ###
selected_pattern = "InvertedHammer"   #choices: 'Random', 'Hammer', 'InvertedHammer'

#How many days after the pattern is identified to use for the dependent variable
days_out = [1, 3, 5, 10, 15]

#What percent increase from the current price is considered a positive class. For example 1.01 = 1% increase; 100 * 1.01 = 101. So if original
#price is $100, anything greater than $101 is considered a positive class.
pct_increase = [1.00, 1.01, 1.02]

######


# Define the classification model
def create_lstm_classification(input_shape):
    model = Sequential()
    
    # LSTM layers
    model.add(LSTM(128, activation='tanh', return_sequences=True, input_shape=input_shape))
    model.add(Dropout(0.2))  # Dropout to reduce overfitting
    
    model.add(LSTM(64, activation='tanh', return_sequences=False))  # Final LSTM layer
    model.add(Dropout(0.2))
    
    # Dense output layer for binary classification
    model.add(Dense(1, activation='sigmoid'))  # Sigmoid for binary classification (probability)
    
    # Compile the model
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])  # Binary cross-entropy for classification
    return model

    

#Subset data frame for desired pattern
if (selected_pattern == "Random"):
    pattern_df = finance_df[finance_df['Random_Yes_No'] == "Yes"]
elif (selected_pattern == "Hammer"):
    pattern_df = finance_df[finance_df['Hammer_pattern'] == "Yes"]
else:
    pattern_df = finance_df[finance_df['InvertedHammer_pattern'] == "Yes"]


#initialize an empty DataFrame with column names
accuracy_df = pd.DataFrame(columns=['ticker', 'pattern', 'independent_array', 'best_accuracy', 'avg_accuracy', 'days_out', 'Total_observations', 
                                   'Negative_observations', 'Positive_observations', 'Percent_increase_parameter'])


for percent in pct_increase:

    for day in days_out:
        #Gather independent variables
        independent_list1 = []
        independent_list2 = []
        independent_list3 = []
        independent_list4 = []
        independent_list5 = []
        independent_list6 = []
        independent_list7 = []
        independent_list8 = []
        independent_list9 = []
        independent_list10 = []
        independent_list11 = []
        independent_list12 = []
        independent_list13 = []
        independent_list14 = []
        independent_list15 = []
        
        #gather dependent variables
        dependent_list = []

        #these are the row indexes that have the identified patterns; loop through
        pattern_index = list(pattern_df["Row_index"])
        for i in pattern_index:
            #if (i == 62):
            #    break
            
            #unable to get 30 days worth of data if index is less than 56, because previously removed first 26 observations
            if (i < 56):
                continue
        
            #get 30 days worth of data to gather data for indpendent variables
            subset_df = finance_df[(finance_df["Row_index"] >= (i - 29)) & (finance_df["Row_index"] <= (i))]
            #subset_df = finance_df[(finance_df["Row_index"] >= (i - 13)) & (finance_df["Row_index"] <= (i))]
            
            #Get day after data to gather closing price for dependent variable
            dependent_df = finance_df[finance_df["Row_index"] == (i)]
            dependent2_df = finance_df[finance_df["Row_index"] == (i + day)]
            
            temp_list1 = []
            temp_list2 = []
            temp_list3 = []
            temp_list4 = []
            temp_list5 = []
            temp_list6 = []
            temp_list7 = []
            temp_list8 = []
            temp_list9 = []
            temp_list10 = []
            temp_list11 = []
            temp_list12 = []
            temp_list13 = []
            temp_list14 = []
            temp_list15 = []
        
            #append temp_list to independent_list
            if len(dependent2_df) > 0: #dependent2_df may have length of zero as it is a future date, data may not be available
            
        
                for index, row in subset_df.iterrows():
                        
                        test_array1 = np.array([row['Open'], row['Close'], row['High'], row['Low']])
                        test_array2 = np.array([row['Log_Open'], row['Log_Close'], row['Log_High'], row['Log_Low']])
                        test_array3 = np.array([row['Normalized_Open'], row['Normalized_Close'], row['Normalized_High'], row['Normalized_Low']])
                
                        test_array4 = np.array([row['Open'], row['Close'], row['High'], row['Low'], row['RSI']])
                        test_array5 = np.array([row['Log_Open'], row['Log_Close'], row['Log_High'], row['Log_Low'], row['RSI']])
                        test_array6 = np.array([row['Normalized_Open'], row['Normalized_Close'], row['Normalized_High'], row['Normalized_Low'], row['RSI']])
                
                        test_array7 = np.array([row['Open'], row['Close'], row['High'], row['Low'], row['MFI']])
                        test_array8 = np.array([row['Log_Open'], row['Log_Close'], row['Log_High'], row['Log_Low'], row['MFI']])
                        test_array9 = np.array([row['Normalized_Open'], row['Normalized_Close'], row['Normalized_High'], row['Normalized_Low'], row['MFI']])
                
                        test_array10 = np.array([row['Open'], row['Close'], row['High'], row['Low'], row['MACD'], row['Signal_Line']])
                        test_array11 = np.array([row['Log_Open'], row['Log_Close'], row['Log_High'], row['Log_Low'], row['MACD'], row['Signal_Line']])
                        test_array12 = np.array([row['Normalized_Open'], row['Normalized_Close'], row['Normalized_High'], row['Normalized_Low'], row['MACD'], row['Signal_Line']])
                
                        test_array13 = np.array([row['Open'], row['Close'], row['High'], row['Low'], row['RSI'], row['MFI'], row['MACD'], row['Signal_Line']])
                        test_array14 = np.array([row['Log_Open'], row['Log_Close'], row['Log_High'], row['Log_Low'], row['RSI'], row['MFI'], row['MACD'], row['Signal_Line']])
                        test_array15 = np.array([row['Normalized_Open'], row['Normalized_Close'], row['Normalized_High'], row['Normalized_Low'], row['RSI'], row['MFI'], row['MACD'], row['Signal_Line']])
                
                
                        temp_list1.append(test_array1)
                        temp_list2.append(test_array2)
                        temp_list3.append(test_array3)
                        temp_list4.append(test_array4)
                        temp_list5.append(test_array5)
                        temp_list6.append(test_array6)
                        temp_list7.append(test_array7)
                        temp_list8.append(test_array8)
                        temp_list9.append(test_array9)
                        temp_list10.append(test_array10)
                        temp_list11.append(test_array11)
                        temp_list12.append(test_array12)
                        temp_list13.append(test_array13)
                        temp_list14.append(test_array14)
                        temp_list15.append(test_array15)
                        
                independent_list1.append(temp_list1)
                independent_list2.append(temp_list2)
                independent_list3.append(temp_list3)
                independent_list4.append(temp_list4)
                independent_list5.append(temp_list5)
                independent_list6.append(temp_list6)
                independent_list7.append(temp_list7)
                independent_list8.append(temp_list8)
                independent_list9.append(temp_list9)
                independent_list10.append(temp_list10)
                independent_list11.append(temp_list11)
                independent_list12.append(temp_list12)
                independent_list13.append(temp_list13)
                independent_list14.append(temp_list14)
                independent_list15.append(temp_list15)
            
                if (dependent2_df['Close'].iloc[0] > dependent_df['Close'].iloc[0] * percent):
                    dependent_list.append(1)
                else:
                    dependent_list.append(0)
        
        independent_array1 = np.array(independent_list1)
        independent_array2 = np.array(independent_list2)
        independent_array3 = np.array(independent_list3)
        independent_array4 = np.array(independent_list4)
        independent_array5 = np.array(independent_list5)
        independent_array6 = np.array(independent_list6)
        independent_array7 = np.array(independent_list7)
        independent_array8= np.array(independent_list8)
        independent_array9 = np.array(independent_list9)
        independent_array10 = np.array(independent_list10)
        independent_array11 = np.array(independent_list11)
        independent_array12 = np.array(independent_list12)
        independent_array13 = np.array(independent_list13)
        independent_array14 = np.array(independent_list14)
        independent_array15 = np.array(independent_list15)
        dependent_array = np.array(dependent_list)
    
    
        y = dependent_array
        independent_array = []
        best_accuracy = []
        avg_accuracy = []
        counter_independentarray = 0
        for i in range(1, 16):
            #if i != 12: #testing what seems is the most well performing model
                #continue
            
            # Select which independent_array to use
            if i == 1:
                X = independent_array1  # Shape: (890, 30, 4)
                independent_array.append("independent_array1")
            if i == 2:
                X = independent_array2  # Shape: (890, 30, 4)
                independent_array.append("independent_array2")
            if i == 3:
                X = independent_array3  # Shape: (890, 30, 4)
                independent_array.append("independent_array3")
            if i == 4:
                X = independent_array4  # Shape: (890, 30, 5)
                independent_array.append("independent_array4")
            if i == 5:
                X = independent_array5  # Shape: (890, 30, 5)
                independent_array.append("independent_array5")
            if i == 6:
                X = independent_array6  # Shape: (890, 30, 5)
                independent_array.append("independent_array6")
            if i == 7:
                X = independent_array7  # Shape: (890, 30, 5)
                independent_array.append("independent_array7")
            if i == 8:
                X = independent_array8  # Shape: (890, 30, 5)
                independent_array.append("independent_array8")
            if i == 9:
                X = independent_array9  # Shape: (890, 30, 5)
                independent_array.append("independent_array9")
            if i == 10:
                X = independent_array10  # Shape: (890, 30, 6)
                independent_array.append("independent_array10")
            if i == 11:
                X = independent_array11  # Shape: (890, 30, 6)
                independent_array.append("independent_array11")
            if i == 12:
                X = independent_array12  # Shape: (890, 30, 6)
                independent_array.append("independent_array12")
            if i == 13:
                X = independent_array13  # Shape: (890, 30, 8)
                independent_array.append("independent_array13")
            if i == 14:
                X = independent_array14  # Shape: (890, 30, 8)
                independent_array.append("independent_array14")
            if i == 15:
                X = independent_array15  # Shape: (890, 30, 8)
                independent_array.append("independent_array15")
        
            counter_independentarray = counter_independentarray + 1
            
            # Define the input shape based on the number of features
            input_shape = (30, X.shape[2])  # 30 time-steps and `X.shape[2]` features per time-step
            
            # Create the LSTM model
            classification_model = create_lstm_classification(input_shape)
            
            # Initialize k-fold cross-validation
            #kf = KFold(n_splits=5, shuffle=True, random_state=6)  #regular 5-fold cross-validation w/out stratification
            kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=6)  # 5-fold cross-validation with stratification

            #initialize to gather all the accuracy scores at each epoch for all 5 folds
            fold_accuracies = []
            
            #stratified K-fold Cross-Validation
            counter_kfold = 0
            for train_index, val_index in kf.split(X, y): #used for stratified k-fold
            #for train_index, val_index in kf.split(X): #used for regular k-fold
                
                counter_kfold = counter_kfold + 1
                print(f"Now running, pct_increase: {percent}; days out: {day}; independent_array: {counter_independentarray}; K-fold: {counter_kfold}")
                
                X_train, X_val = X[train_index], X[val_index]
                y_train, y_val = y[train_index], y[val_index]
                
                # Train the classification model and store the history; verbose = 0 to hide epoch running info in cell output
                history = classification_model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_val, y_val), verbose=0)
                
                # Get the validation accuracies for this fold. What this does is that an accuracy score is calculated at each epoch,
                #and in this list I am getting all the accuracy scores from all five folds
                val_accuracy = history.history['val_accuracy']
                fold_accuracies.append(val_accuracy)
        
        
            
            # Calculate the best and average validation accuracy across all folds
            best_val_accuracy = np.max(fold_accuracies) #get the max accuracy across all epochs across all five folds
            avg_val_accuracy = np.mean(fold_accuracies) #get the mean accuracy across all epochs across all five folds
            best_accuracy.append(best_val_accuracy)
            avg_accuracy.append(avg_val_accuracy)
        
        
        # Example of new data to add
        df_new = pd.DataFrame({
            'ticker': ticker_symbol,
            'pattern': selected_pattern,
            'independent_array': independent_array,
            'best_accuracy': best_accuracy,
            'avg_accuracy': avg_accuracy,
            'days_out': day,
            'Total_observations': sum(np.unique(dependent_array, return_counts=True)[1]),
            'Negative_observations': np.unique(dependent_array, return_counts=True)[1][0],
            'Positive_observations': np.unique(dependent_array, return_counts=True)[1][1],
            'Percent_increase_parameter': percent
        })
    
        # Concatenate the new data to the empty DataFrame
        accuracy_df = pd.concat([accuracy_df, df_new], ignore_index=True)
    

  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 1; independent_array: 1; K-fold: 1
Now running, pct_increase: 1.0; days out: 1; independent_array: 1; K-fold: 2
Now running, pct_increase: 1.0; days out: 1; independent_array: 1; K-fold: 3
Now running, pct_increase: 1.0; days out: 1; independent_array: 1; K-fold: 4
Now running, pct_increase: 1.0; days out: 1; independent_array: 1; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 1; independent_array: 2; K-fold: 1
Now running, pct_increase: 1.0; days out: 1; independent_array: 2; K-fold: 2
Now running, pct_increase: 1.0; days out: 1; independent_array: 2; K-fold: 3
Now running, pct_increase: 1.0; days out: 1; independent_array: 2; K-fold: 4
Now running, pct_increase: 1.0; days out: 1; independent_array: 2; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 1; independent_array: 3; K-fold: 1
Now running, pct_increase: 1.0; days out: 1; independent_array: 3; K-fold: 2
Now running, pct_increase: 1.0; days out: 1; independent_array: 3; K-fold: 3
Now running, pct_increase: 1.0; days out: 1; independent_array: 3; K-fold: 4
Now running, pct_increase: 1.0; days out: 1; independent_array: 3; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 1; independent_array: 4; K-fold: 1
Now running, pct_increase: 1.0; days out: 1; independent_array: 4; K-fold: 2
Now running, pct_increase: 1.0; days out: 1; independent_array: 4; K-fold: 3
Now running, pct_increase: 1.0; days out: 1; independent_array: 4; K-fold: 4
Now running, pct_increase: 1.0; days out: 1; independent_array: 4; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 1; independent_array: 5; K-fold: 1
Now running, pct_increase: 1.0; days out: 1; independent_array: 5; K-fold: 2
Now running, pct_increase: 1.0; days out: 1; independent_array: 5; K-fold: 3
Now running, pct_increase: 1.0; days out: 1; independent_array: 5; K-fold: 4
Now running, pct_increase: 1.0; days out: 1; independent_array: 5; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 1; independent_array: 6; K-fold: 1
Now running, pct_increase: 1.0; days out: 1; independent_array: 6; K-fold: 2
Now running, pct_increase: 1.0; days out: 1; independent_array: 6; K-fold: 3
Now running, pct_increase: 1.0; days out: 1; independent_array: 6; K-fold: 4
Now running, pct_increase: 1.0; days out: 1; independent_array: 6; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 1; independent_array: 7; K-fold: 1
Now running, pct_increase: 1.0; days out: 1; independent_array: 7; K-fold: 2
Now running, pct_increase: 1.0; days out: 1; independent_array: 7; K-fold: 3
Now running, pct_increase: 1.0; days out: 1; independent_array: 7; K-fold: 4
Now running, pct_increase: 1.0; days out: 1; independent_array: 7; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 1; independent_array: 8; K-fold: 1
Now running, pct_increase: 1.0; days out: 1; independent_array: 8; K-fold: 2
Now running, pct_increase: 1.0; days out: 1; independent_array: 8; K-fold: 3
Now running, pct_increase: 1.0; days out: 1; independent_array: 8; K-fold: 4
Now running, pct_increase: 1.0; days out: 1; independent_array: 8; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 1; independent_array: 9; K-fold: 1
Now running, pct_increase: 1.0; days out: 1; independent_array: 9; K-fold: 2
Now running, pct_increase: 1.0; days out: 1; independent_array: 9; K-fold: 3
Now running, pct_increase: 1.0; days out: 1; independent_array: 9; K-fold: 4
Now running, pct_increase: 1.0; days out: 1; independent_array: 9; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 1; independent_array: 10; K-fold: 1
Now running, pct_increase: 1.0; days out: 1; independent_array: 10; K-fold: 2
Now running, pct_increase: 1.0; days out: 1; independent_array: 10; K-fold: 3
Now running, pct_increase: 1.0; days out: 1; independent_array: 10; K-fold: 4
Now running, pct_increase: 1.0; days out: 1; independent_array: 10; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 1; independent_array: 11; K-fold: 1
Now running, pct_increase: 1.0; days out: 1; independent_array: 11; K-fold: 2
Now running, pct_increase: 1.0; days out: 1; independent_array: 11; K-fold: 3
Now running, pct_increase: 1.0; days out: 1; independent_array: 11; K-fold: 4
Now running, pct_increase: 1.0; days out: 1; independent_array: 11; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 1; independent_array: 12; K-fold: 1
Now running, pct_increase: 1.0; days out: 1; independent_array: 12; K-fold: 2
Now running, pct_increase: 1.0; days out: 1; independent_array: 12; K-fold: 3
Now running, pct_increase: 1.0; days out: 1; independent_array: 12; K-fold: 4
Now running, pct_increase: 1.0; days out: 1; independent_array: 12; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 1; independent_array: 13; K-fold: 1
Now running, pct_increase: 1.0; days out: 1; independent_array: 13; K-fold: 2
Now running, pct_increase: 1.0; days out: 1; independent_array: 13; K-fold: 3
Now running, pct_increase: 1.0; days out: 1; independent_array: 13; K-fold: 4
Now running, pct_increase: 1.0; days out: 1; independent_array: 13; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 1; independent_array: 14; K-fold: 1
Now running, pct_increase: 1.0; days out: 1; independent_array: 14; K-fold: 2
Now running, pct_increase: 1.0; days out: 1; independent_array: 14; K-fold: 3
Now running, pct_increase: 1.0; days out: 1; independent_array: 14; K-fold: 4
Now running, pct_increase: 1.0; days out: 1; independent_array: 14; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 1; independent_array: 15; K-fold: 1
Now running, pct_increase: 1.0; days out: 1; independent_array: 15; K-fold: 2
Now running, pct_increase: 1.0; days out: 1; independent_array: 15; K-fold: 3
Now running, pct_increase: 1.0; days out: 1; independent_array: 15; K-fold: 4
Now running, pct_increase: 1.0; days out: 1; independent_array: 15; K-fold: 5


  accuracy_df = pd.concat([accuracy_df, df_new], ignore_index=True)
  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 3; independent_array: 1; K-fold: 1
Now running, pct_increase: 1.0; days out: 3; independent_array: 1; K-fold: 2
Now running, pct_increase: 1.0; days out: 3; independent_array: 1; K-fold: 3
Now running, pct_increase: 1.0; days out: 3; independent_array: 1; K-fold: 4
Now running, pct_increase: 1.0; days out: 3; independent_array: 1; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 3; independent_array: 2; K-fold: 1
Now running, pct_increase: 1.0; days out: 3; independent_array: 2; K-fold: 2
Now running, pct_increase: 1.0; days out: 3; independent_array: 2; K-fold: 3
Now running, pct_increase: 1.0; days out: 3; independent_array: 2; K-fold: 4
Now running, pct_increase: 1.0; days out: 3; independent_array: 2; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 3; independent_array: 3; K-fold: 1
Now running, pct_increase: 1.0; days out: 3; independent_array: 3; K-fold: 2
Now running, pct_increase: 1.0; days out: 3; independent_array: 3; K-fold: 3
Now running, pct_increase: 1.0; days out: 3; independent_array: 3; K-fold: 4
Now running, pct_increase: 1.0; days out: 3; independent_array: 3; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 3; independent_array: 4; K-fold: 1
Now running, pct_increase: 1.0; days out: 3; independent_array: 4; K-fold: 2
Now running, pct_increase: 1.0; days out: 3; independent_array: 4; K-fold: 3
Now running, pct_increase: 1.0; days out: 3; independent_array: 4; K-fold: 4
Now running, pct_increase: 1.0; days out: 3; independent_array: 4; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 3; independent_array: 5; K-fold: 1
Now running, pct_increase: 1.0; days out: 3; independent_array: 5; K-fold: 2
Now running, pct_increase: 1.0; days out: 3; independent_array: 5; K-fold: 3
Now running, pct_increase: 1.0; days out: 3; independent_array: 5; K-fold: 4
Now running, pct_increase: 1.0; days out: 3; independent_array: 5; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 3; independent_array: 6; K-fold: 1
Now running, pct_increase: 1.0; days out: 3; independent_array: 6; K-fold: 2
Now running, pct_increase: 1.0; days out: 3; independent_array: 6; K-fold: 3
Now running, pct_increase: 1.0; days out: 3; independent_array: 6; K-fold: 4
Now running, pct_increase: 1.0; days out: 3; independent_array: 6; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 3; independent_array: 7; K-fold: 1
Now running, pct_increase: 1.0; days out: 3; independent_array: 7; K-fold: 2
Now running, pct_increase: 1.0; days out: 3; independent_array: 7; K-fold: 3
Now running, pct_increase: 1.0; days out: 3; independent_array: 7; K-fold: 4
Now running, pct_increase: 1.0; days out: 3; independent_array: 7; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 3; independent_array: 8; K-fold: 1
Now running, pct_increase: 1.0; days out: 3; independent_array: 8; K-fold: 2
Now running, pct_increase: 1.0; days out: 3; independent_array: 8; K-fold: 3
Now running, pct_increase: 1.0; days out: 3; independent_array: 8; K-fold: 4
Now running, pct_increase: 1.0; days out: 3; independent_array: 8; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 3; independent_array: 9; K-fold: 1
Now running, pct_increase: 1.0; days out: 3; independent_array: 9; K-fold: 2
Now running, pct_increase: 1.0; days out: 3; independent_array: 9; K-fold: 3
Now running, pct_increase: 1.0; days out: 3; independent_array: 9; K-fold: 4
Now running, pct_increase: 1.0; days out: 3; independent_array: 9; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 3; independent_array: 10; K-fold: 1
Now running, pct_increase: 1.0; days out: 3; independent_array: 10; K-fold: 2
Now running, pct_increase: 1.0; days out: 3; independent_array: 10; K-fold: 3
Now running, pct_increase: 1.0; days out: 3; independent_array: 10; K-fold: 4
Now running, pct_increase: 1.0; days out: 3; independent_array: 10; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 3; independent_array: 11; K-fold: 1
Now running, pct_increase: 1.0; days out: 3; independent_array: 11; K-fold: 2
Now running, pct_increase: 1.0; days out: 3; independent_array: 11; K-fold: 3
Now running, pct_increase: 1.0; days out: 3; independent_array: 11; K-fold: 4
Now running, pct_increase: 1.0; days out: 3; independent_array: 11; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 3; independent_array: 12; K-fold: 1
Now running, pct_increase: 1.0; days out: 3; independent_array: 12; K-fold: 2
Now running, pct_increase: 1.0; days out: 3; independent_array: 12; K-fold: 3
Now running, pct_increase: 1.0; days out: 3; independent_array: 12; K-fold: 4
Now running, pct_increase: 1.0; days out: 3; independent_array: 12; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 3; independent_array: 13; K-fold: 1
Now running, pct_increase: 1.0; days out: 3; independent_array: 13; K-fold: 2
Now running, pct_increase: 1.0; days out: 3; independent_array: 13; K-fold: 3
Now running, pct_increase: 1.0; days out: 3; independent_array: 13; K-fold: 4
Now running, pct_increase: 1.0; days out: 3; independent_array: 13; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 3; independent_array: 14; K-fold: 1
Now running, pct_increase: 1.0; days out: 3; independent_array: 14; K-fold: 2
Now running, pct_increase: 1.0; days out: 3; independent_array: 14; K-fold: 3
Now running, pct_increase: 1.0; days out: 3; independent_array: 14; K-fold: 4
Now running, pct_increase: 1.0; days out: 3; independent_array: 14; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 3; independent_array: 15; K-fold: 1
Now running, pct_increase: 1.0; days out: 3; independent_array: 15; K-fold: 2
Now running, pct_increase: 1.0; days out: 3; independent_array: 15; K-fold: 3
Now running, pct_increase: 1.0; days out: 3; independent_array: 15; K-fold: 4
Now running, pct_increase: 1.0; days out: 3; independent_array: 15; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 5; independent_array: 1; K-fold: 1
Now running, pct_increase: 1.0; days out: 5; independent_array: 1; K-fold: 2
Now running, pct_increase: 1.0; days out: 5; independent_array: 1; K-fold: 3
Now running, pct_increase: 1.0; days out: 5; independent_array: 1; K-fold: 4
Now running, pct_increase: 1.0; days out: 5; independent_array: 1; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 5; independent_array: 2; K-fold: 1
Now running, pct_increase: 1.0; days out: 5; independent_array: 2; K-fold: 2
Now running, pct_increase: 1.0; days out: 5; independent_array: 2; K-fold: 3
Now running, pct_increase: 1.0; days out: 5; independent_array: 2; K-fold: 4
Now running, pct_increase: 1.0; days out: 5; independent_array: 2; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 5; independent_array: 3; K-fold: 1
Now running, pct_increase: 1.0; days out: 5; independent_array: 3; K-fold: 2
Now running, pct_increase: 1.0; days out: 5; independent_array: 3; K-fold: 3
Now running, pct_increase: 1.0; days out: 5; independent_array: 3; K-fold: 4
Now running, pct_increase: 1.0; days out: 5; independent_array: 3; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 5; independent_array: 4; K-fold: 1
Now running, pct_increase: 1.0; days out: 5; independent_array: 4; K-fold: 2
Now running, pct_increase: 1.0; days out: 5; independent_array: 4; K-fold: 3
Now running, pct_increase: 1.0; days out: 5; independent_array: 4; K-fold: 4
Now running, pct_increase: 1.0; days out: 5; independent_array: 4; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 5; independent_array: 5; K-fold: 1
Now running, pct_increase: 1.0; days out: 5; independent_array: 5; K-fold: 2
Now running, pct_increase: 1.0; days out: 5; independent_array: 5; K-fold: 3
Now running, pct_increase: 1.0; days out: 5; independent_array: 5; K-fold: 4
Now running, pct_increase: 1.0; days out: 5; independent_array: 5; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 5; independent_array: 6; K-fold: 1
Now running, pct_increase: 1.0; days out: 5; independent_array: 6; K-fold: 2
Now running, pct_increase: 1.0; days out: 5; independent_array: 6; K-fold: 3
Now running, pct_increase: 1.0; days out: 5; independent_array: 6; K-fold: 4
Now running, pct_increase: 1.0; days out: 5; independent_array: 6; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 5; independent_array: 7; K-fold: 1
Now running, pct_increase: 1.0; days out: 5; independent_array: 7; K-fold: 2
Now running, pct_increase: 1.0; days out: 5; independent_array: 7; K-fold: 3
Now running, pct_increase: 1.0; days out: 5; independent_array: 7; K-fold: 4
Now running, pct_increase: 1.0; days out: 5; independent_array: 7; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 5; independent_array: 8; K-fold: 1
Now running, pct_increase: 1.0; days out: 5; independent_array: 8; K-fold: 2
Now running, pct_increase: 1.0; days out: 5; independent_array: 8; K-fold: 3
Now running, pct_increase: 1.0; days out: 5; independent_array: 8; K-fold: 4
Now running, pct_increase: 1.0; days out: 5; independent_array: 8; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 5; independent_array: 9; K-fold: 1
Now running, pct_increase: 1.0; days out: 5; independent_array: 9; K-fold: 2
Now running, pct_increase: 1.0; days out: 5; independent_array: 9; K-fold: 3
Now running, pct_increase: 1.0; days out: 5; independent_array: 9; K-fold: 4
Now running, pct_increase: 1.0; days out: 5; independent_array: 9; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 5; independent_array: 10; K-fold: 1
Now running, pct_increase: 1.0; days out: 5; independent_array: 10; K-fold: 2
Now running, pct_increase: 1.0; days out: 5; independent_array: 10; K-fold: 3
Now running, pct_increase: 1.0; days out: 5; independent_array: 10; K-fold: 4
Now running, pct_increase: 1.0; days out: 5; independent_array: 10; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 5; independent_array: 11; K-fold: 1
Now running, pct_increase: 1.0; days out: 5; independent_array: 11; K-fold: 2
Now running, pct_increase: 1.0; days out: 5; independent_array: 11; K-fold: 3
Now running, pct_increase: 1.0; days out: 5; independent_array: 11; K-fold: 4
Now running, pct_increase: 1.0; days out: 5; independent_array: 11; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 5; independent_array: 12; K-fold: 1
Now running, pct_increase: 1.0; days out: 5; independent_array: 12; K-fold: 2
Now running, pct_increase: 1.0; days out: 5; independent_array: 12; K-fold: 3
Now running, pct_increase: 1.0; days out: 5; independent_array: 12; K-fold: 4
Now running, pct_increase: 1.0; days out: 5; independent_array: 12; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 5; independent_array: 13; K-fold: 1
Now running, pct_increase: 1.0; days out: 5; independent_array: 13; K-fold: 2
Now running, pct_increase: 1.0; days out: 5; independent_array: 13; K-fold: 3
Now running, pct_increase: 1.0; days out: 5; independent_array: 13; K-fold: 4
Now running, pct_increase: 1.0; days out: 5; independent_array: 13; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 5; independent_array: 14; K-fold: 1
Now running, pct_increase: 1.0; days out: 5; independent_array: 14; K-fold: 2
Now running, pct_increase: 1.0; days out: 5; independent_array: 14; K-fold: 3
Now running, pct_increase: 1.0; days out: 5; independent_array: 14; K-fold: 4
Now running, pct_increase: 1.0; days out: 5; independent_array: 14; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 5; independent_array: 15; K-fold: 1
Now running, pct_increase: 1.0; days out: 5; independent_array: 15; K-fold: 2
Now running, pct_increase: 1.0; days out: 5; independent_array: 15; K-fold: 3
Now running, pct_increase: 1.0; days out: 5; independent_array: 15; K-fold: 4
Now running, pct_increase: 1.0; days out: 5; independent_array: 15; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 10; independent_array: 1; K-fold: 1
Now running, pct_increase: 1.0; days out: 10; independent_array: 1; K-fold: 2
Now running, pct_increase: 1.0; days out: 10; independent_array: 1; K-fold: 3
Now running, pct_increase: 1.0; days out: 10; independent_array: 1; K-fold: 4
Now running, pct_increase: 1.0; days out: 10; independent_array: 1; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 10; independent_array: 2; K-fold: 1
Now running, pct_increase: 1.0; days out: 10; independent_array: 2; K-fold: 2
Now running, pct_increase: 1.0; days out: 10; independent_array: 2; K-fold: 3
Now running, pct_increase: 1.0; days out: 10; independent_array: 2; K-fold: 4
Now running, pct_increase: 1.0; days out: 10; independent_array: 2; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 10; independent_array: 3; K-fold: 1
Now running, pct_increase: 1.0; days out: 10; independent_array: 3; K-fold: 2
Now running, pct_increase: 1.0; days out: 10; independent_array: 3; K-fold: 3
Now running, pct_increase: 1.0; days out: 10; independent_array: 3; K-fold: 4
Now running, pct_increase: 1.0; days out: 10; independent_array: 3; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 10; independent_array: 4; K-fold: 1
Now running, pct_increase: 1.0; days out: 10; independent_array: 4; K-fold: 2
Now running, pct_increase: 1.0; days out: 10; independent_array: 4; K-fold: 3
Now running, pct_increase: 1.0; days out: 10; independent_array: 4; K-fold: 4
Now running, pct_increase: 1.0; days out: 10; independent_array: 4; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 10; independent_array: 5; K-fold: 1
Now running, pct_increase: 1.0; days out: 10; independent_array: 5; K-fold: 2
Now running, pct_increase: 1.0; days out: 10; independent_array: 5; K-fold: 3
Now running, pct_increase: 1.0; days out: 10; independent_array: 5; K-fold: 4
Now running, pct_increase: 1.0; days out: 10; independent_array: 5; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 10; independent_array: 6; K-fold: 1
Now running, pct_increase: 1.0; days out: 10; independent_array: 6; K-fold: 2
Now running, pct_increase: 1.0; days out: 10; independent_array: 6; K-fold: 3
Now running, pct_increase: 1.0; days out: 10; independent_array: 6; K-fold: 4
Now running, pct_increase: 1.0; days out: 10; independent_array: 6; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 10; independent_array: 7; K-fold: 1
Now running, pct_increase: 1.0; days out: 10; independent_array: 7; K-fold: 2
Now running, pct_increase: 1.0; days out: 10; independent_array: 7; K-fold: 3
Now running, pct_increase: 1.0; days out: 10; independent_array: 7; K-fold: 4
Now running, pct_increase: 1.0; days out: 10; independent_array: 7; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 10; independent_array: 8; K-fold: 1
Now running, pct_increase: 1.0; days out: 10; independent_array: 8; K-fold: 2
Now running, pct_increase: 1.0; days out: 10; independent_array: 8; K-fold: 3
Now running, pct_increase: 1.0; days out: 10; independent_array: 8; K-fold: 4
Now running, pct_increase: 1.0; days out: 10; independent_array: 8; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 10; independent_array: 9; K-fold: 1
Now running, pct_increase: 1.0; days out: 10; independent_array: 9; K-fold: 2
Now running, pct_increase: 1.0; days out: 10; independent_array: 9; K-fold: 3
Now running, pct_increase: 1.0; days out: 10; independent_array: 9; K-fold: 4
Now running, pct_increase: 1.0; days out: 10; independent_array: 9; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 10; independent_array: 10; K-fold: 1
Now running, pct_increase: 1.0; days out: 10; independent_array: 10; K-fold: 2
Now running, pct_increase: 1.0; days out: 10; independent_array: 10; K-fold: 3
Now running, pct_increase: 1.0; days out: 10; independent_array: 10; K-fold: 4
Now running, pct_increase: 1.0; days out: 10; independent_array: 10; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 10; independent_array: 11; K-fold: 1
Now running, pct_increase: 1.0; days out: 10; independent_array: 11; K-fold: 2
Now running, pct_increase: 1.0; days out: 10; independent_array: 11; K-fold: 3
Now running, pct_increase: 1.0; days out: 10; independent_array: 11; K-fold: 4
Now running, pct_increase: 1.0; days out: 10; independent_array: 11; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 10; independent_array: 12; K-fold: 1
Now running, pct_increase: 1.0; days out: 10; independent_array: 12; K-fold: 2
Now running, pct_increase: 1.0; days out: 10; independent_array: 12; K-fold: 3
Now running, pct_increase: 1.0; days out: 10; independent_array: 12; K-fold: 4
Now running, pct_increase: 1.0; days out: 10; independent_array: 12; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 10; independent_array: 13; K-fold: 1
Now running, pct_increase: 1.0; days out: 10; independent_array: 13; K-fold: 2
Now running, pct_increase: 1.0; days out: 10; independent_array: 13; K-fold: 3
Now running, pct_increase: 1.0; days out: 10; independent_array: 13; K-fold: 4
Now running, pct_increase: 1.0; days out: 10; independent_array: 13; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 10; independent_array: 14; K-fold: 1
Now running, pct_increase: 1.0; days out: 10; independent_array: 14; K-fold: 2
Now running, pct_increase: 1.0; days out: 10; independent_array: 14; K-fold: 3
Now running, pct_increase: 1.0; days out: 10; independent_array: 14; K-fold: 4
Now running, pct_increase: 1.0; days out: 10; independent_array: 14; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 10; independent_array: 15; K-fold: 1
Now running, pct_increase: 1.0; days out: 10; independent_array: 15; K-fold: 2
Now running, pct_increase: 1.0; days out: 10; independent_array: 15; K-fold: 3
Now running, pct_increase: 1.0; days out: 10; independent_array: 15; K-fold: 4
Now running, pct_increase: 1.0; days out: 10; independent_array: 15; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 15; independent_array: 1; K-fold: 1
Now running, pct_increase: 1.0; days out: 15; independent_array: 1; K-fold: 2
Now running, pct_increase: 1.0; days out: 15; independent_array: 1; K-fold: 3
Now running, pct_increase: 1.0; days out: 15; independent_array: 1; K-fold: 4
Now running, pct_increase: 1.0; days out: 15; independent_array: 1; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 15; independent_array: 2; K-fold: 1
Now running, pct_increase: 1.0; days out: 15; independent_array: 2; K-fold: 2
Now running, pct_increase: 1.0; days out: 15; independent_array: 2; K-fold: 3
Now running, pct_increase: 1.0; days out: 15; independent_array: 2; K-fold: 4
Now running, pct_increase: 1.0; days out: 15; independent_array: 2; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 15; independent_array: 3; K-fold: 1
Now running, pct_increase: 1.0; days out: 15; independent_array: 3; K-fold: 2
Now running, pct_increase: 1.0; days out: 15; independent_array: 3; K-fold: 3
Now running, pct_increase: 1.0; days out: 15; independent_array: 3; K-fold: 4
Now running, pct_increase: 1.0; days out: 15; independent_array: 3; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 15; independent_array: 4; K-fold: 1
Now running, pct_increase: 1.0; days out: 15; independent_array: 4; K-fold: 2
Now running, pct_increase: 1.0; days out: 15; independent_array: 4; K-fold: 3
Now running, pct_increase: 1.0; days out: 15; independent_array: 4; K-fold: 4
Now running, pct_increase: 1.0; days out: 15; independent_array: 4; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 15; independent_array: 5; K-fold: 1
Now running, pct_increase: 1.0; days out: 15; independent_array: 5; K-fold: 2
Now running, pct_increase: 1.0; days out: 15; independent_array: 5; K-fold: 3
Now running, pct_increase: 1.0; days out: 15; independent_array: 5; K-fold: 4
Now running, pct_increase: 1.0; days out: 15; independent_array: 5; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 15; independent_array: 6; K-fold: 1
Now running, pct_increase: 1.0; days out: 15; independent_array: 6; K-fold: 2
Now running, pct_increase: 1.0; days out: 15; independent_array: 6; K-fold: 3
Now running, pct_increase: 1.0; days out: 15; independent_array: 6; K-fold: 4
Now running, pct_increase: 1.0; days out: 15; independent_array: 6; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 15; independent_array: 7; K-fold: 1
Now running, pct_increase: 1.0; days out: 15; independent_array: 7; K-fold: 2
Now running, pct_increase: 1.0; days out: 15; independent_array: 7; K-fold: 3
Now running, pct_increase: 1.0; days out: 15; independent_array: 7; K-fold: 4
Now running, pct_increase: 1.0; days out: 15; independent_array: 7; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 15; independent_array: 8; K-fold: 1
Now running, pct_increase: 1.0; days out: 15; independent_array: 8; K-fold: 2
Now running, pct_increase: 1.0; days out: 15; independent_array: 8; K-fold: 3
Now running, pct_increase: 1.0; days out: 15; independent_array: 8; K-fold: 4
Now running, pct_increase: 1.0; days out: 15; independent_array: 8; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 15; independent_array: 9; K-fold: 1
Now running, pct_increase: 1.0; days out: 15; independent_array: 9; K-fold: 2
Now running, pct_increase: 1.0; days out: 15; independent_array: 9; K-fold: 3
Now running, pct_increase: 1.0; days out: 15; independent_array: 9; K-fold: 4
Now running, pct_increase: 1.0; days out: 15; independent_array: 9; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 15; independent_array: 10; K-fold: 1
Now running, pct_increase: 1.0; days out: 15; independent_array: 10; K-fold: 2
Now running, pct_increase: 1.0; days out: 15; independent_array: 10; K-fold: 3
Now running, pct_increase: 1.0; days out: 15; independent_array: 10; K-fold: 4
Now running, pct_increase: 1.0; days out: 15; independent_array: 10; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 15; independent_array: 11; K-fold: 1
Now running, pct_increase: 1.0; days out: 15; independent_array: 11; K-fold: 2
Now running, pct_increase: 1.0; days out: 15; independent_array: 11; K-fold: 3
Now running, pct_increase: 1.0; days out: 15; independent_array: 11; K-fold: 4
Now running, pct_increase: 1.0; days out: 15; independent_array: 11; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 15; independent_array: 12; K-fold: 1
Now running, pct_increase: 1.0; days out: 15; independent_array: 12; K-fold: 2
Now running, pct_increase: 1.0; days out: 15; independent_array: 12; K-fold: 3
Now running, pct_increase: 1.0; days out: 15; independent_array: 12; K-fold: 4
Now running, pct_increase: 1.0; days out: 15; independent_array: 12; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 15; independent_array: 13; K-fold: 1
Now running, pct_increase: 1.0; days out: 15; independent_array: 13; K-fold: 2
Now running, pct_increase: 1.0; days out: 15; independent_array: 13; K-fold: 3
Now running, pct_increase: 1.0; days out: 15; independent_array: 13; K-fold: 4
Now running, pct_increase: 1.0; days out: 15; independent_array: 13; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 15; independent_array: 14; K-fold: 1
Now running, pct_increase: 1.0; days out: 15; independent_array: 14; K-fold: 2
Now running, pct_increase: 1.0; days out: 15; independent_array: 14; K-fold: 3
Now running, pct_increase: 1.0; days out: 15; independent_array: 14; K-fold: 4
Now running, pct_increase: 1.0; days out: 15; independent_array: 14; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 15; independent_array: 15; K-fold: 1
Now running, pct_increase: 1.0; days out: 15; independent_array: 15; K-fold: 2
Now running, pct_increase: 1.0; days out: 15; independent_array: 15; K-fold: 3
Now running, pct_increase: 1.0; days out: 15; independent_array: 15; K-fold: 4
Now running, pct_increase: 1.0; days out: 15; independent_array: 15; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 1; independent_array: 1; K-fold: 1
Now running, pct_increase: 1.01; days out: 1; independent_array: 1; K-fold: 2
Now running, pct_increase: 1.01; days out: 1; independent_array: 1; K-fold: 3
Now running, pct_increase: 1.01; days out: 1; independent_array: 1; K-fold: 4
Now running, pct_increase: 1.01; days out: 1; independent_array: 1; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 1; independent_array: 2; K-fold: 1
Now running, pct_increase: 1.01; days out: 1; independent_array: 2; K-fold: 2
Now running, pct_increase: 1.01; days out: 1; independent_array: 2; K-fold: 3
Now running, pct_increase: 1.01; days out: 1; independent_array: 2; K-fold: 4
Now running, pct_increase: 1.01; days out: 1; independent_array: 2; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 1; independent_array: 3; K-fold: 1
Now running, pct_increase: 1.01; days out: 1; independent_array: 3; K-fold: 2
Now running, pct_increase: 1.01; days out: 1; independent_array: 3; K-fold: 3
Now running, pct_increase: 1.01; days out: 1; independent_array: 3; K-fold: 4
Now running, pct_increase: 1.01; days out: 1; independent_array: 3; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 1; independent_array: 4; K-fold: 1
Now running, pct_increase: 1.01; days out: 1; independent_array: 4; K-fold: 2
Now running, pct_increase: 1.01; days out: 1; independent_array: 4; K-fold: 3
Now running, pct_increase: 1.01; days out: 1; independent_array: 4; K-fold: 4
Now running, pct_increase: 1.01; days out: 1; independent_array: 4; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 1; independent_array: 5; K-fold: 1
Now running, pct_increase: 1.01; days out: 1; independent_array: 5; K-fold: 2
Now running, pct_increase: 1.01; days out: 1; independent_array: 5; K-fold: 3
Now running, pct_increase: 1.01; days out: 1; independent_array: 5; K-fold: 4
Now running, pct_increase: 1.01; days out: 1; independent_array: 5; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 1; independent_array: 6; K-fold: 1
Now running, pct_increase: 1.01; days out: 1; independent_array: 6; K-fold: 2
Now running, pct_increase: 1.01; days out: 1; independent_array: 6; K-fold: 3
Now running, pct_increase: 1.01; days out: 1; independent_array: 6; K-fold: 4
Now running, pct_increase: 1.01; days out: 1; independent_array: 6; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 1; independent_array: 7; K-fold: 1
Now running, pct_increase: 1.01; days out: 1; independent_array: 7; K-fold: 2
Now running, pct_increase: 1.01; days out: 1; independent_array: 7; K-fold: 3
Now running, pct_increase: 1.01; days out: 1; independent_array: 7; K-fold: 4
Now running, pct_increase: 1.01; days out: 1; independent_array: 7; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 1; independent_array: 8; K-fold: 1
Now running, pct_increase: 1.01; days out: 1; independent_array: 8; K-fold: 2
Now running, pct_increase: 1.01; days out: 1; independent_array: 8; K-fold: 3
Now running, pct_increase: 1.01; days out: 1; independent_array: 8; K-fold: 4
Now running, pct_increase: 1.01; days out: 1; independent_array: 8; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 1; independent_array: 9; K-fold: 1
Now running, pct_increase: 1.01; days out: 1; independent_array: 9; K-fold: 2
Now running, pct_increase: 1.01; days out: 1; independent_array: 9; K-fold: 3
Now running, pct_increase: 1.01; days out: 1; independent_array: 9; K-fold: 4
Now running, pct_increase: 1.01; days out: 1; independent_array: 9; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 1; independent_array: 10; K-fold: 1
Now running, pct_increase: 1.01; days out: 1; independent_array: 10; K-fold: 2
Now running, pct_increase: 1.01; days out: 1; independent_array: 10; K-fold: 3
Now running, pct_increase: 1.01; days out: 1; independent_array: 10; K-fold: 4
Now running, pct_increase: 1.01; days out: 1; independent_array: 10; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 1; independent_array: 11; K-fold: 1
Now running, pct_increase: 1.01; days out: 1; independent_array: 11; K-fold: 2
Now running, pct_increase: 1.01; days out: 1; independent_array: 11; K-fold: 3
Now running, pct_increase: 1.01; days out: 1; independent_array: 11; K-fold: 4
Now running, pct_increase: 1.01; days out: 1; independent_array: 11; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 1; independent_array: 12; K-fold: 1
Now running, pct_increase: 1.01; days out: 1; independent_array: 12; K-fold: 2
Now running, pct_increase: 1.01; days out: 1; independent_array: 12; K-fold: 3
Now running, pct_increase: 1.01; days out: 1; independent_array: 12; K-fold: 4
Now running, pct_increase: 1.01; days out: 1; independent_array: 12; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 1; independent_array: 13; K-fold: 1
Now running, pct_increase: 1.01; days out: 1; independent_array: 13; K-fold: 2
Now running, pct_increase: 1.01; days out: 1; independent_array: 13; K-fold: 3
Now running, pct_increase: 1.01; days out: 1; independent_array: 13; K-fold: 4
Now running, pct_increase: 1.01; days out: 1; independent_array: 13; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 1; independent_array: 14; K-fold: 1
Now running, pct_increase: 1.01; days out: 1; independent_array: 14; K-fold: 2
Now running, pct_increase: 1.01; days out: 1; independent_array: 14; K-fold: 3
Now running, pct_increase: 1.01; days out: 1; independent_array: 14; K-fold: 4
Now running, pct_increase: 1.01; days out: 1; independent_array: 14; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 1; independent_array: 15; K-fold: 1
Now running, pct_increase: 1.01; days out: 1; independent_array: 15; K-fold: 2
Now running, pct_increase: 1.01; days out: 1; independent_array: 15; K-fold: 3
Now running, pct_increase: 1.01; days out: 1; independent_array: 15; K-fold: 4
Now running, pct_increase: 1.01; days out: 1; independent_array: 15; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 3; independent_array: 1; K-fold: 1
Now running, pct_increase: 1.01; days out: 3; independent_array: 1; K-fold: 2
Now running, pct_increase: 1.01; days out: 3; independent_array: 1; K-fold: 3
Now running, pct_increase: 1.01; days out: 3; independent_array: 1; K-fold: 4
Now running, pct_increase: 1.01; days out: 3; independent_array: 1; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 3; independent_array: 2; K-fold: 1
Now running, pct_increase: 1.01; days out: 3; independent_array: 2; K-fold: 2
Now running, pct_increase: 1.01; days out: 3; independent_array: 2; K-fold: 3
Now running, pct_increase: 1.01; days out: 3; independent_array: 2; K-fold: 4
Now running, pct_increase: 1.01; days out: 3; independent_array: 2; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 3; independent_array: 3; K-fold: 1
Now running, pct_increase: 1.01; days out: 3; independent_array: 3; K-fold: 2
Now running, pct_increase: 1.01; days out: 3; independent_array: 3; K-fold: 3
Now running, pct_increase: 1.01; days out: 3; independent_array: 3; K-fold: 4
Now running, pct_increase: 1.01; days out: 3; independent_array: 3; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 3; independent_array: 4; K-fold: 1
Now running, pct_increase: 1.01; days out: 3; independent_array: 4; K-fold: 2
Now running, pct_increase: 1.01; days out: 3; independent_array: 4; K-fold: 3
Now running, pct_increase: 1.01; days out: 3; independent_array: 4; K-fold: 4
Now running, pct_increase: 1.01; days out: 3; independent_array: 4; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 3; independent_array: 5; K-fold: 1
Now running, pct_increase: 1.01; days out: 3; independent_array: 5; K-fold: 2
Now running, pct_increase: 1.01; days out: 3; independent_array: 5; K-fold: 3
Now running, pct_increase: 1.01; days out: 3; independent_array: 5; K-fold: 4
Now running, pct_increase: 1.01; days out: 3; independent_array: 5; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 3; independent_array: 6; K-fold: 1
Now running, pct_increase: 1.01; days out: 3; independent_array: 6; K-fold: 2
Now running, pct_increase: 1.01; days out: 3; independent_array: 6; K-fold: 3
Now running, pct_increase: 1.01; days out: 3; independent_array: 6; K-fold: 4
Now running, pct_increase: 1.01; days out: 3; independent_array: 6; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 3; independent_array: 7; K-fold: 1
Now running, pct_increase: 1.01; days out: 3; independent_array: 7; K-fold: 2
Now running, pct_increase: 1.01; days out: 3; independent_array: 7; K-fold: 3
Now running, pct_increase: 1.01; days out: 3; independent_array: 7; K-fold: 4
Now running, pct_increase: 1.01; days out: 3; independent_array: 7; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 3; independent_array: 8; K-fold: 1
Now running, pct_increase: 1.01; days out: 3; independent_array: 8; K-fold: 2
Now running, pct_increase: 1.01; days out: 3; independent_array: 8; K-fold: 3
Now running, pct_increase: 1.01; days out: 3; independent_array: 8; K-fold: 4
Now running, pct_increase: 1.01; days out: 3; independent_array: 8; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 3; independent_array: 9; K-fold: 1
Now running, pct_increase: 1.01; days out: 3; independent_array: 9; K-fold: 2
Now running, pct_increase: 1.01; days out: 3; independent_array: 9; K-fold: 3
Now running, pct_increase: 1.01; days out: 3; independent_array: 9; K-fold: 4
Now running, pct_increase: 1.01; days out: 3; independent_array: 9; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 3; independent_array: 10; K-fold: 1
Now running, pct_increase: 1.01; days out: 3; independent_array: 10; K-fold: 2
Now running, pct_increase: 1.01; days out: 3; independent_array: 10; K-fold: 3
Now running, pct_increase: 1.01; days out: 3; independent_array: 10; K-fold: 4
Now running, pct_increase: 1.01; days out: 3; independent_array: 10; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 3; independent_array: 11; K-fold: 1
Now running, pct_increase: 1.01; days out: 3; independent_array: 11; K-fold: 2
Now running, pct_increase: 1.01; days out: 3; independent_array: 11; K-fold: 3
Now running, pct_increase: 1.01; days out: 3; independent_array: 11; K-fold: 4
Now running, pct_increase: 1.01; days out: 3; independent_array: 11; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 3; independent_array: 12; K-fold: 1
Now running, pct_increase: 1.01; days out: 3; independent_array: 12; K-fold: 2
Now running, pct_increase: 1.01; days out: 3; independent_array: 12; K-fold: 3
Now running, pct_increase: 1.01; days out: 3; independent_array: 12; K-fold: 4
Now running, pct_increase: 1.01; days out: 3; independent_array: 12; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 3; independent_array: 13; K-fold: 1
Now running, pct_increase: 1.01; days out: 3; independent_array: 13; K-fold: 2
Now running, pct_increase: 1.01; days out: 3; independent_array: 13; K-fold: 3
Now running, pct_increase: 1.01; days out: 3; independent_array: 13; K-fold: 4
Now running, pct_increase: 1.01; days out: 3; independent_array: 13; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 3; independent_array: 14; K-fold: 1
Now running, pct_increase: 1.01; days out: 3; independent_array: 14; K-fold: 2
Now running, pct_increase: 1.01; days out: 3; independent_array: 14; K-fold: 3
Now running, pct_increase: 1.01; days out: 3; independent_array: 14; K-fold: 4
Now running, pct_increase: 1.01; days out: 3; independent_array: 14; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 3; independent_array: 15; K-fold: 1
Now running, pct_increase: 1.01; days out: 3; independent_array: 15; K-fold: 2
Now running, pct_increase: 1.01; days out: 3; independent_array: 15; K-fold: 3
Now running, pct_increase: 1.01; days out: 3; independent_array: 15; K-fold: 4
Now running, pct_increase: 1.01; days out: 3; independent_array: 15; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 5; independent_array: 1; K-fold: 1
Now running, pct_increase: 1.01; days out: 5; independent_array: 1; K-fold: 2
Now running, pct_increase: 1.01; days out: 5; independent_array: 1; K-fold: 3
Now running, pct_increase: 1.01; days out: 5; independent_array: 1; K-fold: 4
Now running, pct_increase: 1.01; days out: 5; independent_array: 1; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 5; independent_array: 2; K-fold: 1
Now running, pct_increase: 1.01; days out: 5; independent_array: 2; K-fold: 2
Now running, pct_increase: 1.01; days out: 5; independent_array: 2; K-fold: 3
Now running, pct_increase: 1.01; days out: 5; independent_array: 2; K-fold: 4
Now running, pct_increase: 1.01; days out: 5; independent_array: 2; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 5; independent_array: 3; K-fold: 1
Now running, pct_increase: 1.01; days out: 5; independent_array: 3; K-fold: 2
Now running, pct_increase: 1.01; days out: 5; independent_array: 3; K-fold: 3
Now running, pct_increase: 1.01; days out: 5; independent_array: 3; K-fold: 4
Now running, pct_increase: 1.01; days out: 5; independent_array: 3; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 5; independent_array: 4; K-fold: 1
Now running, pct_increase: 1.01; days out: 5; independent_array: 4; K-fold: 2
Now running, pct_increase: 1.01; days out: 5; independent_array: 4; K-fold: 3
Now running, pct_increase: 1.01; days out: 5; independent_array: 4; K-fold: 4
Now running, pct_increase: 1.01; days out: 5; independent_array: 4; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 5; independent_array: 5; K-fold: 1
Now running, pct_increase: 1.01; days out: 5; independent_array: 5; K-fold: 2
Now running, pct_increase: 1.01; days out: 5; independent_array: 5; K-fold: 3
Now running, pct_increase: 1.01; days out: 5; independent_array: 5; K-fold: 4
Now running, pct_increase: 1.01; days out: 5; independent_array: 5; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 5; independent_array: 6; K-fold: 1
Now running, pct_increase: 1.01; days out: 5; independent_array: 6; K-fold: 2
Now running, pct_increase: 1.01; days out: 5; independent_array: 6; K-fold: 3
Now running, pct_increase: 1.01; days out: 5; independent_array: 6; K-fold: 4
Now running, pct_increase: 1.01; days out: 5; independent_array: 6; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 5; independent_array: 7; K-fold: 1
Now running, pct_increase: 1.01; days out: 5; independent_array: 7; K-fold: 2
Now running, pct_increase: 1.01; days out: 5; independent_array: 7; K-fold: 3
Now running, pct_increase: 1.01; days out: 5; independent_array: 7; K-fold: 4
Now running, pct_increase: 1.01; days out: 5; independent_array: 7; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 5; independent_array: 8; K-fold: 1
Now running, pct_increase: 1.01; days out: 5; independent_array: 8; K-fold: 2
Now running, pct_increase: 1.01; days out: 5; independent_array: 8; K-fold: 3
Now running, pct_increase: 1.01; days out: 5; independent_array: 8; K-fold: 4
Now running, pct_increase: 1.01; days out: 5; independent_array: 8; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 5; independent_array: 9; K-fold: 1
Now running, pct_increase: 1.01; days out: 5; independent_array: 9; K-fold: 2
Now running, pct_increase: 1.01; days out: 5; independent_array: 9; K-fold: 3
Now running, pct_increase: 1.01; days out: 5; independent_array: 9; K-fold: 4
Now running, pct_increase: 1.01; days out: 5; independent_array: 9; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 5; independent_array: 10; K-fold: 1
Now running, pct_increase: 1.01; days out: 5; independent_array: 10; K-fold: 2
Now running, pct_increase: 1.01; days out: 5; independent_array: 10; K-fold: 3
Now running, pct_increase: 1.01; days out: 5; independent_array: 10; K-fold: 4
Now running, pct_increase: 1.01; days out: 5; independent_array: 10; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 5; independent_array: 11; K-fold: 1
Now running, pct_increase: 1.01; days out: 5; independent_array: 11; K-fold: 2
Now running, pct_increase: 1.01; days out: 5; independent_array: 11; K-fold: 3
Now running, pct_increase: 1.01; days out: 5; independent_array: 11; K-fold: 4
Now running, pct_increase: 1.01; days out: 5; independent_array: 11; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 5; independent_array: 12; K-fold: 1
Now running, pct_increase: 1.01; days out: 5; independent_array: 12; K-fold: 2
Now running, pct_increase: 1.01; days out: 5; independent_array: 12; K-fold: 3
Now running, pct_increase: 1.01; days out: 5; independent_array: 12; K-fold: 4
Now running, pct_increase: 1.01; days out: 5; independent_array: 12; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 5; independent_array: 13; K-fold: 1
Now running, pct_increase: 1.01; days out: 5; independent_array: 13; K-fold: 2
Now running, pct_increase: 1.01; days out: 5; independent_array: 13; K-fold: 3
Now running, pct_increase: 1.01; days out: 5; independent_array: 13; K-fold: 4
Now running, pct_increase: 1.01; days out: 5; independent_array: 13; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 5; independent_array: 14; K-fold: 1
Now running, pct_increase: 1.01; days out: 5; independent_array: 14; K-fold: 2
Now running, pct_increase: 1.01; days out: 5; independent_array: 14; K-fold: 3
Now running, pct_increase: 1.01; days out: 5; independent_array: 14; K-fold: 4
Now running, pct_increase: 1.01; days out: 5; independent_array: 14; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 5; independent_array: 15; K-fold: 1
Now running, pct_increase: 1.01; days out: 5; independent_array: 15; K-fold: 2
Now running, pct_increase: 1.01; days out: 5; independent_array: 15; K-fold: 3
Now running, pct_increase: 1.01; days out: 5; independent_array: 15; K-fold: 4
Now running, pct_increase: 1.01; days out: 5; independent_array: 15; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 10; independent_array: 1; K-fold: 1
Now running, pct_increase: 1.01; days out: 10; independent_array: 1; K-fold: 2
Now running, pct_increase: 1.01; days out: 10; independent_array: 1; K-fold: 3
Now running, pct_increase: 1.01; days out: 10; independent_array: 1; K-fold: 4
Now running, pct_increase: 1.01; days out: 10; independent_array: 1; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 10; independent_array: 2; K-fold: 1
Now running, pct_increase: 1.01; days out: 10; independent_array: 2; K-fold: 2
Now running, pct_increase: 1.01; days out: 10; independent_array: 2; K-fold: 3
Now running, pct_increase: 1.01; days out: 10; independent_array: 2; K-fold: 4
Now running, pct_increase: 1.01; days out: 10; independent_array: 2; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 10; independent_array: 3; K-fold: 1
Now running, pct_increase: 1.01; days out: 10; independent_array: 3; K-fold: 2
Now running, pct_increase: 1.01; days out: 10; independent_array: 3; K-fold: 3
Now running, pct_increase: 1.01; days out: 10; independent_array: 3; K-fold: 4
Now running, pct_increase: 1.01; days out: 10; independent_array: 3; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 10; independent_array: 4; K-fold: 1
Now running, pct_increase: 1.01; days out: 10; independent_array: 4; K-fold: 2
Now running, pct_increase: 1.01; days out: 10; independent_array: 4; K-fold: 3
Now running, pct_increase: 1.01; days out: 10; independent_array: 4; K-fold: 4
Now running, pct_increase: 1.01; days out: 10; independent_array: 4; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 10; independent_array: 5; K-fold: 1
Now running, pct_increase: 1.01; days out: 10; independent_array: 5; K-fold: 2
Now running, pct_increase: 1.01; days out: 10; independent_array: 5; K-fold: 3
Now running, pct_increase: 1.01; days out: 10; independent_array: 5; K-fold: 4
Now running, pct_increase: 1.01; days out: 10; independent_array: 5; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 10; independent_array: 6; K-fold: 1
Now running, pct_increase: 1.01; days out: 10; independent_array: 6; K-fold: 2
Now running, pct_increase: 1.01; days out: 10; independent_array: 6; K-fold: 3
Now running, pct_increase: 1.01; days out: 10; independent_array: 6; K-fold: 4
Now running, pct_increase: 1.01; days out: 10; independent_array: 6; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 10; independent_array: 7; K-fold: 1
Now running, pct_increase: 1.01; days out: 10; independent_array: 7; K-fold: 2
Now running, pct_increase: 1.01; days out: 10; independent_array: 7; K-fold: 3
Now running, pct_increase: 1.01; days out: 10; independent_array: 7; K-fold: 4
Now running, pct_increase: 1.01; days out: 10; independent_array: 7; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 10; independent_array: 8; K-fold: 1
Now running, pct_increase: 1.01; days out: 10; independent_array: 8; K-fold: 2
Now running, pct_increase: 1.01; days out: 10; independent_array: 8; K-fold: 3
Now running, pct_increase: 1.01; days out: 10; independent_array: 8; K-fold: 4
Now running, pct_increase: 1.01; days out: 10; independent_array: 8; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 10; independent_array: 9; K-fold: 1
Now running, pct_increase: 1.01; days out: 10; independent_array: 9; K-fold: 2
Now running, pct_increase: 1.01; days out: 10; independent_array: 9; K-fold: 3
Now running, pct_increase: 1.01; days out: 10; independent_array: 9; K-fold: 4
Now running, pct_increase: 1.01; days out: 10; independent_array: 9; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 10; independent_array: 10; K-fold: 1
Now running, pct_increase: 1.01; days out: 10; independent_array: 10; K-fold: 2
Now running, pct_increase: 1.01; days out: 10; independent_array: 10; K-fold: 3
Now running, pct_increase: 1.01; days out: 10; independent_array: 10; K-fold: 4
Now running, pct_increase: 1.01; days out: 10; independent_array: 10; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 10; independent_array: 11; K-fold: 1
Now running, pct_increase: 1.01; days out: 10; independent_array: 11; K-fold: 2
Now running, pct_increase: 1.01; days out: 10; independent_array: 11; K-fold: 3
Now running, pct_increase: 1.01; days out: 10; independent_array: 11; K-fold: 4
Now running, pct_increase: 1.01; days out: 10; independent_array: 11; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 10; independent_array: 12; K-fold: 1
Now running, pct_increase: 1.01; days out: 10; independent_array: 12; K-fold: 2
Now running, pct_increase: 1.01; days out: 10; independent_array: 12; K-fold: 3
Now running, pct_increase: 1.01; days out: 10; independent_array: 12; K-fold: 4
Now running, pct_increase: 1.01; days out: 10; independent_array: 12; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 10; independent_array: 13; K-fold: 1
Now running, pct_increase: 1.01; days out: 10; independent_array: 13; K-fold: 2
Now running, pct_increase: 1.01; days out: 10; independent_array: 13; K-fold: 3
Now running, pct_increase: 1.01; days out: 10; independent_array: 13; K-fold: 4
Now running, pct_increase: 1.01; days out: 10; independent_array: 13; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 10; independent_array: 14; K-fold: 1
Now running, pct_increase: 1.01; days out: 10; independent_array: 14; K-fold: 2
Now running, pct_increase: 1.01; days out: 10; independent_array: 14; K-fold: 3
Now running, pct_increase: 1.01; days out: 10; independent_array: 14; K-fold: 4
Now running, pct_increase: 1.01; days out: 10; independent_array: 14; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 10; independent_array: 15; K-fold: 1
Now running, pct_increase: 1.01; days out: 10; independent_array: 15; K-fold: 2
Now running, pct_increase: 1.01; days out: 10; independent_array: 15; K-fold: 3
Now running, pct_increase: 1.01; days out: 10; independent_array: 15; K-fold: 4
Now running, pct_increase: 1.01; days out: 10; independent_array: 15; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 15; independent_array: 1; K-fold: 1
Now running, pct_increase: 1.01; days out: 15; independent_array: 1; K-fold: 2
Now running, pct_increase: 1.01; days out: 15; independent_array: 1; K-fold: 3
Now running, pct_increase: 1.01; days out: 15; independent_array: 1; K-fold: 4
Now running, pct_increase: 1.01; days out: 15; independent_array: 1; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 15; independent_array: 2; K-fold: 1
Now running, pct_increase: 1.01; days out: 15; independent_array: 2; K-fold: 2
Now running, pct_increase: 1.01; days out: 15; independent_array: 2; K-fold: 3
Now running, pct_increase: 1.01; days out: 15; independent_array: 2; K-fold: 4
Now running, pct_increase: 1.01; days out: 15; independent_array: 2; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 15; independent_array: 3; K-fold: 1
Now running, pct_increase: 1.01; days out: 15; independent_array: 3; K-fold: 2
Now running, pct_increase: 1.01; days out: 15; independent_array: 3; K-fold: 3
Now running, pct_increase: 1.01; days out: 15; independent_array: 3; K-fold: 4
Now running, pct_increase: 1.01; days out: 15; independent_array: 3; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 15; independent_array: 4; K-fold: 1
Now running, pct_increase: 1.01; days out: 15; independent_array: 4; K-fold: 2
Now running, pct_increase: 1.01; days out: 15; independent_array: 4; K-fold: 3
Now running, pct_increase: 1.01; days out: 15; independent_array: 4; K-fold: 4
Now running, pct_increase: 1.01; days out: 15; independent_array: 4; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 15; independent_array: 5; K-fold: 1
Now running, pct_increase: 1.01; days out: 15; independent_array: 5; K-fold: 2
Now running, pct_increase: 1.01; days out: 15; independent_array: 5; K-fold: 3
Now running, pct_increase: 1.01; days out: 15; independent_array: 5; K-fold: 4
Now running, pct_increase: 1.01; days out: 15; independent_array: 5; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 15; independent_array: 6; K-fold: 1
Now running, pct_increase: 1.01; days out: 15; independent_array: 6; K-fold: 2
Now running, pct_increase: 1.01; days out: 15; independent_array: 6; K-fold: 3
Now running, pct_increase: 1.01; days out: 15; independent_array: 6; K-fold: 4
Now running, pct_increase: 1.01; days out: 15; independent_array: 6; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 15; independent_array: 7; K-fold: 1
Now running, pct_increase: 1.01; days out: 15; independent_array: 7; K-fold: 2
Now running, pct_increase: 1.01; days out: 15; independent_array: 7; K-fold: 3
Now running, pct_increase: 1.01; days out: 15; independent_array: 7; K-fold: 4
Now running, pct_increase: 1.01; days out: 15; independent_array: 7; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 15; independent_array: 8; K-fold: 1
Now running, pct_increase: 1.01; days out: 15; independent_array: 8; K-fold: 2
Now running, pct_increase: 1.01; days out: 15; independent_array: 8; K-fold: 3
Now running, pct_increase: 1.01; days out: 15; independent_array: 8; K-fold: 4
Now running, pct_increase: 1.01; days out: 15; independent_array: 8; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 15; independent_array: 9; K-fold: 1
Now running, pct_increase: 1.01; days out: 15; independent_array: 9; K-fold: 2
Now running, pct_increase: 1.01; days out: 15; independent_array: 9; K-fold: 3
Now running, pct_increase: 1.01; days out: 15; independent_array: 9; K-fold: 4
Now running, pct_increase: 1.01; days out: 15; independent_array: 9; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 15; independent_array: 10; K-fold: 1
Now running, pct_increase: 1.01; days out: 15; independent_array: 10; K-fold: 2
Now running, pct_increase: 1.01; days out: 15; independent_array: 10; K-fold: 3
Now running, pct_increase: 1.01; days out: 15; independent_array: 10; K-fold: 4
Now running, pct_increase: 1.01; days out: 15; independent_array: 10; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 15; independent_array: 11; K-fold: 1
Now running, pct_increase: 1.01; days out: 15; independent_array: 11; K-fold: 2
Now running, pct_increase: 1.01; days out: 15; independent_array: 11; K-fold: 3
Now running, pct_increase: 1.01; days out: 15; independent_array: 11; K-fold: 4
Now running, pct_increase: 1.01; days out: 15; independent_array: 11; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 15; independent_array: 12; K-fold: 1
Now running, pct_increase: 1.01; days out: 15; independent_array: 12; K-fold: 2
Now running, pct_increase: 1.01; days out: 15; independent_array: 12; K-fold: 3
Now running, pct_increase: 1.01; days out: 15; independent_array: 12; K-fold: 4
Now running, pct_increase: 1.01; days out: 15; independent_array: 12; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 15; independent_array: 13; K-fold: 1
Now running, pct_increase: 1.01; days out: 15; independent_array: 13; K-fold: 2
Now running, pct_increase: 1.01; days out: 15; independent_array: 13; K-fold: 3
Now running, pct_increase: 1.01; days out: 15; independent_array: 13; K-fold: 4
Now running, pct_increase: 1.01; days out: 15; independent_array: 13; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 15; independent_array: 14; K-fold: 1
Now running, pct_increase: 1.01; days out: 15; independent_array: 14; K-fold: 2
Now running, pct_increase: 1.01; days out: 15; independent_array: 14; K-fold: 3
Now running, pct_increase: 1.01; days out: 15; independent_array: 14; K-fold: 4
Now running, pct_increase: 1.01; days out: 15; independent_array: 14; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.01; days out: 15; independent_array: 15; K-fold: 1
Now running, pct_increase: 1.01; days out: 15; independent_array: 15; K-fold: 2
Now running, pct_increase: 1.01; days out: 15; independent_array: 15; K-fold: 3
Now running, pct_increase: 1.01; days out: 15; independent_array: 15; K-fold: 4
Now running, pct_increase: 1.01; days out: 15; independent_array: 15; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 1; independent_array: 1; K-fold: 1
Now running, pct_increase: 1.02; days out: 1; independent_array: 1; K-fold: 2
Now running, pct_increase: 1.02; days out: 1; independent_array: 1; K-fold: 3
Now running, pct_increase: 1.02; days out: 1; independent_array: 1; K-fold: 4
Now running, pct_increase: 1.02; days out: 1; independent_array: 1; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 1; independent_array: 2; K-fold: 1
Now running, pct_increase: 1.02; days out: 1; independent_array: 2; K-fold: 2
Now running, pct_increase: 1.02; days out: 1; independent_array: 2; K-fold: 3
Now running, pct_increase: 1.02; days out: 1; independent_array: 2; K-fold: 4
Now running, pct_increase: 1.02; days out: 1; independent_array: 2; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 1; independent_array: 3; K-fold: 1
Now running, pct_increase: 1.02; days out: 1; independent_array: 3; K-fold: 2
Now running, pct_increase: 1.02; days out: 1; independent_array: 3; K-fold: 3
Now running, pct_increase: 1.02; days out: 1; independent_array: 3; K-fold: 4
Now running, pct_increase: 1.02; days out: 1; independent_array: 3; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 1; independent_array: 4; K-fold: 1
Now running, pct_increase: 1.02; days out: 1; independent_array: 4; K-fold: 2
Now running, pct_increase: 1.02; days out: 1; independent_array: 4; K-fold: 3
Now running, pct_increase: 1.02; days out: 1; independent_array: 4; K-fold: 4
Now running, pct_increase: 1.02; days out: 1; independent_array: 4; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 1; independent_array: 5; K-fold: 1
Now running, pct_increase: 1.02; days out: 1; independent_array: 5; K-fold: 2
Now running, pct_increase: 1.02; days out: 1; independent_array: 5; K-fold: 3
Now running, pct_increase: 1.02; days out: 1; independent_array: 5; K-fold: 4
Now running, pct_increase: 1.02; days out: 1; independent_array: 5; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 1; independent_array: 6; K-fold: 1
Now running, pct_increase: 1.02; days out: 1; independent_array: 6; K-fold: 2
Now running, pct_increase: 1.02; days out: 1; independent_array: 6; K-fold: 3
Now running, pct_increase: 1.02; days out: 1; independent_array: 6; K-fold: 4
Now running, pct_increase: 1.02; days out: 1; independent_array: 6; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 1; independent_array: 7; K-fold: 1
Now running, pct_increase: 1.02; days out: 1; independent_array: 7; K-fold: 2
Now running, pct_increase: 1.02; days out: 1; independent_array: 7; K-fold: 3
Now running, pct_increase: 1.02; days out: 1; independent_array: 7; K-fold: 4
Now running, pct_increase: 1.02; days out: 1; independent_array: 7; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 1; independent_array: 8; K-fold: 1
Now running, pct_increase: 1.02; days out: 1; independent_array: 8; K-fold: 2
Now running, pct_increase: 1.02; days out: 1; independent_array: 8; K-fold: 3
Now running, pct_increase: 1.02; days out: 1; independent_array: 8; K-fold: 4
Now running, pct_increase: 1.02; days out: 1; independent_array: 8; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 1; independent_array: 9; K-fold: 1
Now running, pct_increase: 1.02; days out: 1; independent_array: 9; K-fold: 2
Now running, pct_increase: 1.02; days out: 1; independent_array: 9; K-fold: 3
Now running, pct_increase: 1.02; days out: 1; independent_array: 9; K-fold: 4
Now running, pct_increase: 1.02; days out: 1; independent_array: 9; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 1; independent_array: 10; K-fold: 1
Now running, pct_increase: 1.02; days out: 1; independent_array: 10; K-fold: 2
Now running, pct_increase: 1.02; days out: 1; independent_array: 10; K-fold: 3
Now running, pct_increase: 1.02; days out: 1; independent_array: 10; K-fold: 4
Now running, pct_increase: 1.02; days out: 1; independent_array: 10; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 1; independent_array: 11; K-fold: 1
Now running, pct_increase: 1.02; days out: 1; independent_array: 11; K-fold: 2
Now running, pct_increase: 1.02; days out: 1; independent_array: 11; K-fold: 3
Now running, pct_increase: 1.02; days out: 1; independent_array: 11; K-fold: 4
Now running, pct_increase: 1.02; days out: 1; independent_array: 11; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 1; independent_array: 12; K-fold: 1
Now running, pct_increase: 1.02; days out: 1; independent_array: 12; K-fold: 2
Now running, pct_increase: 1.02; days out: 1; independent_array: 12; K-fold: 3
Now running, pct_increase: 1.02; days out: 1; independent_array: 12; K-fold: 4
Now running, pct_increase: 1.02; days out: 1; independent_array: 12; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 1; independent_array: 13; K-fold: 1
Now running, pct_increase: 1.02; days out: 1; independent_array: 13; K-fold: 2
Now running, pct_increase: 1.02; days out: 1; independent_array: 13; K-fold: 3
Now running, pct_increase: 1.02; days out: 1; independent_array: 13; K-fold: 4
Now running, pct_increase: 1.02; days out: 1; independent_array: 13; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 1; independent_array: 14; K-fold: 1
Now running, pct_increase: 1.02; days out: 1; independent_array: 14; K-fold: 2
Now running, pct_increase: 1.02; days out: 1; independent_array: 14; K-fold: 3
Now running, pct_increase: 1.02; days out: 1; independent_array: 14; K-fold: 4
Now running, pct_increase: 1.02; days out: 1; independent_array: 14; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 1; independent_array: 15; K-fold: 1
Now running, pct_increase: 1.02; days out: 1; independent_array: 15; K-fold: 2
Now running, pct_increase: 1.02; days out: 1; independent_array: 15; K-fold: 3
Now running, pct_increase: 1.02; days out: 1; independent_array: 15; K-fold: 4
Now running, pct_increase: 1.02; days out: 1; independent_array: 15; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 3; independent_array: 1; K-fold: 1
Now running, pct_increase: 1.02; days out: 3; independent_array: 1; K-fold: 2
Now running, pct_increase: 1.02; days out: 3; independent_array: 1; K-fold: 3
Now running, pct_increase: 1.02; days out: 3; independent_array: 1; K-fold: 4
Now running, pct_increase: 1.02; days out: 3; independent_array: 1; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 3; independent_array: 2; K-fold: 1
Now running, pct_increase: 1.02; days out: 3; independent_array: 2; K-fold: 2
Now running, pct_increase: 1.02; days out: 3; independent_array: 2; K-fold: 3
Now running, pct_increase: 1.02; days out: 3; independent_array: 2; K-fold: 4
Now running, pct_increase: 1.02; days out: 3; independent_array: 2; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 3; independent_array: 3; K-fold: 1
Now running, pct_increase: 1.02; days out: 3; independent_array: 3; K-fold: 2
Now running, pct_increase: 1.02; days out: 3; independent_array: 3; K-fold: 3
Now running, pct_increase: 1.02; days out: 3; independent_array: 3; K-fold: 4
Now running, pct_increase: 1.02; days out: 3; independent_array: 3; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 3; independent_array: 4; K-fold: 1
Now running, pct_increase: 1.02; days out: 3; independent_array: 4; K-fold: 2
Now running, pct_increase: 1.02; days out: 3; independent_array: 4; K-fold: 3
Now running, pct_increase: 1.02; days out: 3; independent_array: 4; K-fold: 4
Now running, pct_increase: 1.02; days out: 3; independent_array: 4; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 3; independent_array: 5; K-fold: 1
Now running, pct_increase: 1.02; days out: 3; independent_array: 5; K-fold: 2
Now running, pct_increase: 1.02; days out: 3; independent_array: 5; K-fold: 3
Now running, pct_increase: 1.02; days out: 3; independent_array: 5; K-fold: 4
Now running, pct_increase: 1.02; days out: 3; independent_array: 5; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 3; independent_array: 6; K-fold: 1
Now running, pct_increase: 1.02; days out: 3; independent_array: 6; K-fold: 2
Now running, pct_increase: 1.02; days out: 3; independent_array: 6; K-fold: 3
Now running, pct_increase: 1.02; days out: 3; independent_array: 6; K-fold: 4
Now running, pct_increase: 1.02; days out: 3; independent_array: 6; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 3; independent_array: 7; K-fold: 1
Now running, pct_increase: 1.02; days out: 3; independent_array: 7; K-fold: 2
Now running, pct_increase: 1.02; days out: 3; independent_array: 7; K-fold: 3
Now running, pct_increase: 1.02; days out: 3; independent_array: 7; K-fold: 4
Now running, pct_increase: 1.02; days out: 3; independent_array: 7; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 3; independent_array: 8; K-fold: 1
Now running, pct_increase: 1.02; days out: 3; independent_array: 8; K-fold: 2
Now running, pct_increase: 1.02; days out: 3; independent_array: 8; K-fold: 3
Now running, pct_increase: 1.02; days out: 3; independent_array: 8; K-fold: 4
Now running, pct_increase: 1.02; days out: 3; independent_array: 8; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 3; independent_array: 9; K-fold: 1
Now running, pct_increase: 1.02; days out: 3; independent_array: 9; K-fold: 2
Now running, pct_increase: 1.02; days out: 3; independent_array: 9; K-fold: 3
Now running, pct_increase: 1.02; days out: 3; independent_array: 9; K-fold: 4
Now running, pct_increase: 1.02; days out: 3; independent_array: 9; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 3; independent_array: 10; K-fold: 1
Now running, pct_increase: 1.02; days out: 3; independent_array: 10; K-fold: 2
Now running, pct_increase: 1.02; days out: 3; independent_array: 10; K-fold: 3
Now running, pct_increase: 1.02; days out: 3; independent_array: 10; K-fold: 4
Now running, pct_increase: 1.02; days out: 3; independent_array: 10; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 3; independent_array: 11; K-fold: 1
Now running, pct_increase: 1.02; days out: 3; independent_array: 11; K-fold: 2
Now running, pct_increase: 1.02; days out: 3; independent_array: 11; K-fold: 3
Now running, pct_increase: 1.02; days out: 3; independent_array: 11; K-fold: 4
Now running, pct_increase: 1.02; days out: 3; independent_array: 11; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 3; independent_array: 12; K-fold: 1
Now running, pct_increase: 1.02; days out: 3; independent_array: 12; K-fold: 2
Now running, pct_increase: 1.02; days out: 3; independent_array: 12; K-fold: 3
Now running, pct_increase: 1.02; days out: 3; independent_array: 12; K-fold: 4
Now running, pct_increase: 1.02; days out: 3; independent_array: 12; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 3; independent_array: 13; K-fold: 1
Now running, pct_increase: 1.02; days out: 3; independent_array: 13; K-fold: 2
Now running, pct_increase: 1.02; days out: 3; independent_array: 13; K-fold: 3
Now running, pct_increase: 1.02; days out: 3; independent_array: 13; K-fold: 4
Now running, pct_increase: 1.02; days out: 3; independent_array: 13; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 3; independent_array: 14; K-fold: 1
Now running, pct_increase: 1.02; days out: 3; independent_array: 14; K-fold: 2
Now running, pct_increase: 1.02; days out: 3; independent_array: 14; K-fold: 3
Now running, pct_increase: 1.02; days out: 3; independent_array: 14; K-fold: 4
Now running, pct_increase: 1.02; days out: 3; independent_array: 14; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 3; independent_array: 15; K-fold: 1
Now running, pct_increase: 1.02; days out: 3; independent_array: 15; K-fold: 2
Now running, pct_increase: 1.02; days out: 3; independent_array: 15; K-fold: 3
Now running, pct_increase: 1.02; days out: 3; independent_array: 15; K-fold: 4
Now running, pct_increase: 1.02; days out: 3; independent_array: 15; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 5; independent_array: 1; K-fold: 1
Now running, pct_increase: 1.02; days out: 5; independent_array: 1; K-fold: 2
Now running, pct_increase: 1.02; days out: 5; independent_array: 1; K-fold: 3
Now running, pct_increase: 1.02; days out: 5; independent_array: 1; K-fold: 4
Now running, pct_increase: 1.02; days out: 5; independent_array: 1; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 5; independent_array: 2; K-fold: 1
Now running, pct_increase: 1.02; days out: 5; independent_array: 2; K-fold: 2
Now running, pct_increase: 1.02; days out: 5; independent_array: 2; K-fold: 3
Now running, pct_increase: 1.02; days out: 5; independent_array: 2; K-fold: 4
Now running, pct_increase: 1.02; days out: 5; independent_array: 2; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 5; independent_array: 3; K-fold: 1
Now running, pct_increase: 1.02; days out: 5; independent_array: 3; K-fold: 2
Now running, pct_increase: 1.02; days out: 5; independent_array: 3; K-fold: 3
Now running, pct_increase: 1.02; days out: 5; independent_array: 3; K-fold: 4
Now running, pct_increase: 1.02; days out: 5; independent_array: 3; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 5; independent_array: 4; K-fold: 1
Now running, pct_increase: 1.02; days out: 5; independent_array: 4; K-fold: 2
Now running, pct_increase: 1.02; days out: 5; independent_array: 4; K-fold: 3
Now running, pct_increase: 1.02; days out: 5; independent_array: 4; K-fold: 4
Now running, pct_increase: 1.02; days out: 5; independent_array: 4; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 5; independent_array: 5; K-fold: 1
Now running, pct_increase: 1.02; days out: 5; independent_array: 5; K-fold: 2
Now running, pct_increase: 1.02; days out: 5; independent_array: 5; K-fold: 3
Now running, pct_increase: 1.02; days out: 5; independent_array: 5; K-fold: 4
Now running, pct_increase: 1.02; days out: 5; independent_array: 5; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 5; independent_array: 6; K-fold: 1
Now running, pct_increase: 1.02; days out: 5; independent_array: 6; K-fold: 2
Now running, pct_increase: 1.02; days out: 5; independent_array: 6; K-fold: 3
Now running, pct_increase: 1.02; days out: 5; independent_array: 6; K-fold: 4
Now running, pct_increase: 1.02; days out: 5; independent_array: 6; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 5; independent_array: 7; K-fold: 1
Now running, pct_increase: 1.02; days out: 5; independent_array: 7; K-fold: 2
Now running, pct_increase: 1.02; days out: 5; independent_array: 7; K-fold: 3
Now running, pct_increase: 1.02; days out: 5; independent_array: 7; K-fold: 4
Now running, pct_increase: 1.02; days out: 5; independent_array: 7; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 5; independent_array: 8; K-fold: 1
Now running, pct_increase: 1.02; days out: 5; independent_array: 8; K-fold: 2
Now running, pct_increase: 1.02; days out: 5; independent_array: 8; K-fold: 3
Now running, pct_increase: 1.02; days out: 5; independent_array: 8; K-fold: 4
Now running, pct_increase: 1.02; days out: 5; independent_array: 8; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 5; independent_array: 9; K-fold: 1
Now running, pct_increase: 1.02; days out: 5; independent_array: 9; K-fold: 2
Now running, pct_increase: 1.02; days out: 5; independent_array: 9; K-fold: 3
Now running, pct_increase: 1.02; days out: 5; independent_array: 9; K-fold: 4
Now running, pct_increase: 1.02; days out: 5; independent_array: 9; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 5; independent_array: 10; K-fold: 1
Now running, pct_increase: 1.02; days out: 5; independent_array: 10; K-fold: 2
Now running, pct_increase: 1.02; days out: 5; independent_array: 10; K-fold: 3
Now running, pct_increase: 1.02; days out: 5; independent_array: 10; K-fold: 4
Now running, pct_increase: 1.02; days out: 5; independent_array: 10; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 5; independent_array: 11; K-fold: 1
Now running, pct_increase: 1.02; days out: 5; independent_array: 11; K-fold: 2
Now running, pct_increase: 1.02; days out: 5; independent_array: 11; K-fold: 3
Now running, pct_increase: 1.02; days out: 5; independent_array: 11; K-fold: 4
Now running, pct_increase: 1.02; days out: 5; independent_array: 11; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 5; independent_array: 12; K-fold: 1
Now running, pct_increase: 1.02; days out: 5; independent_array: 12; K-fold: 2
Now running, pct_increase: 1.02; days out: 5; independent_array: 12; K-fold: 3
Now running, pct_increase: 1.02; days out: 5; independent_array: 12; K-fold: 4
Now running, pct_increase: 1.02; days out: 5; independent_array: 12; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 5; independent_array: 13; K-fold: 1
Now running, pct_increase: 1.02; days out: 5; independent_array: 13; K-fold: 2
Now running, pct_increase: 1.02; days out: 5; independent_array: 13; K-fold: 3
Now running, pct_increase: 1.02; days out: 5; independent_array: 13; K-fold: 4
Now running, pct_increase: 1.02; days out: 5; independent_array: 13; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 5; independent_array: 14; K-fold: 1
Now running, pct_increase: 1.02; days out: 5; independent_array: 14; K-fold: 2
Now running, pct_increase: 1.02; days out: 5; independent_array: 14; K-fold: 3
Now running, pct_increase: 1.02; days out: 5; independent_array: 14; K-fold: 4
Now running, pct_increase: 1.02; days out: 5; independent_array: 14; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 5; independent_array: 15; K-fold: 1
Now running, pct_increase: 1.02; days out: 5; independent_array: 15; K-fold: 2
Now running, pct_increase: 1.02; days out: 5; independent_array: 15; K-fold: 3
Now running, pct_increase: 1.02; days out: 5; independent_array: 15; K-fold: 4
Now running, pct_increase: 1.02; days out: 5; independent_array: 15; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 10; independent_array: 1; K-fold: 1
Now running, pct_increase: 1.02; days out: 10; independent_array: 1; K-fold: 2
Now running, pct_increase: 1.02; days out: 10; independent_array: 1; K-fold: 3
Now running, pct_increase: 1.02; days out: 10; independent_array: 1; K-fold: 4
Now running, pct_increase: 1.02; days out: 10; independent_array: 1; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 10; independent_array: 2; K-fold: 1
Now running, pct_increase: 1.02; days out: 10; independent_array: 2; K-fold: 2
Now running, pct_increase: 1.02; days out: 10; independent_array: 2; K-fold: 3
Now running, pct_increase: 1.02; days out: 10; independent_array: 2; K-fold: 4
Now running, pct_increase: 1.02; days out: 10; independent_array: 2; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 10; independent_array: 3; K-fold: 1
Now running, pct_increase: 1.02; days out: 10; independent_array: 3; K-fold: 2
Now running, pct_increase: 1.02; days out: 10; independent_array: 3; K-fold: 3
Now running, pct_increase: 1.02; days out: 10; independent_array: 3; K-fold: 4
Now running, pct_increase: 1.02; days out: 10; independent_array: 3; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 10; independent_array: 4; K-fold: 1
Now running, pct_increase: 1.02; days out: 10; independent_array: 4; K-fold: 2
Now running, pct_increase: 1.02; days out: 10; independent_array: 4; K-fold: 3
Now running, pct_increase: 1.02; days out: 10; independent_array: 4; K-fold: 4
Now running, pct_increase: 1.02; days out: 10; independent_array: 4; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 10; independent_array: 5; K-fold: 1
Now running, pct_increase: 1.02; days out: 10; independent_array: 5; K-fold: 2
Now running, pct_increase: 1.02; days out: 10; independent_array: 5; K-fold: 3
Now running, pct_increase: 1.02; days out: 10; independent_array: 5; K-fold: 4
Now running, pct_increase: 1.02; days out: 10; independent_array: 5; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 10; independent_array: 6; K-fold: 1
Now running, pct_increase: 1.02; days out: 10; independent_array: 6; K-fold: 2
Now running, pct_increase: 1.02; days out: 10; independent_array: 6; K-fold: 3
Now running, pct_increase: 1.02; days out: 10; independent_array: 6; K-fold: 4
Now running, pct_increase: 1.02; days out: 10; independent_array: 6; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 10; independent_array: 7; K-fold: 1
Now running, pct_increase: 1.02; days out: 10; independent_array: 7; K-fold: 2
Now running, pct_increase: 1.02; days out: 10; independent_array: 7; K-fold: 3
Now running, pct_increase: 1.02; days out: 10; independent_array: 7; K-fold: 4
Now running, pct_increase: 1.02; days out: 10; independent_array: 7; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 10; independent_array: 8; K-fold: 1
Now running, pct_increase: 1.02; days out: 10; independent_array: 8; K-fold: 2
Now running, pct_increase: 1.02; days out: 10; independent_array: 8; K-fold: 3
Now running, pct_increase: 1.02; days out: 10; independent_array: 8; K-fold: 4
Now running, pct_increase: 1.02; days out: 10; independent_array: 8; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 10; independent_array: 9; K-fold: 1
Now running, pct_increase: 1.02; days out: 10; independent_array: 9; K-fold: 2
Now running, pct_increase: 1.02; days out: 10; independent_array: 9; K-fold: 3
Now running, pct_increase: 1.02; days out: 10; independent_array: 9; K-fold: 4
Now running, pct_increase: 1.02; days out: 10; independent_array: 9; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 10; independent_array: 10; K-fold: 1
Now running, pct_increase: 1.02; days out: 10; independent_array: 10; K-fold: 2
Now running, pct_increase: 1.02; days out: 10; independent_array: 10; K-fold: 3
Now running, pct_increase: 1.02; days out: 10; independent_array: 10; K-fold: 4
Now running, pct_increase: 1.02; days out: 10; independent_array: 10; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 10; independent_array: 11; K-fold: 1
Now running, pct_increase: 1.02; days out: 10; independent_array: 11; K-fold: 2
Now running, pct_increase: 1.02; days out: 10; independent_array: 11; K-fold: 3
Now running, pct_increase: 1.02; days out: 10; independent_array: 11; K-fold: 4
Now running, pct_increase: 1.02; days out: 10; independent_array: 11; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 10; independent_array: 12; K-fold: 1
Now running, pct_increase: 1.02; days out: 10; independent_array: 12; K-fold: 2
Now running, pct_increase: 1.02; days out: 10; independent_array: 12; K-fold: 3
Now running, pct_increase: 1.02; days out: 10; independent_array: 12; K-fold: 4
Now running, pct_increase: 1.02; days out: 10; independent_array: 12; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 10; independent_array: 13; K-fold: 1
Now running, pct_increase: 1.02; days out: 10; independent_array: 13; K-fold: 2
Now running, pct_increase: 1.02; days out: 10; independent_array: 13; K-fold: 3
Now running, pct_increase: 1.02; days out: 10; independent_array: 13; K-fold: 4
Now running, pct_increase: 1.02; days out: 10; independent_array: 13; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 10; independent_array: 14; K-fold: 1
Now running, pct_increase: 1.02; days out: 10; independent_array: 14; K-fold: 2
Now running, pct_increase: 1.02; days out: 10; independent_array: 14; K-fold: 3
Now running, pct_increase: 1.02; days out: 10; independent_array: 14; K-fold: 4
Now running, pct_increase: 1.02; days out: 10; independent_array: 14; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 10; independent_array: 15; K-fold: 1
Now running, pct_increase: 1.02; days out: 10; independent_array: 15; K-fold: 2
Now running, pct_increase: 1.02; days out: 10; independent_array: 15; K-fold: 3
Now running, pct_increase: 1.02; days out: 10; independent_array: 15; K-fold: 4
Now running, pct_increase: 1.02; days out: 10; independent_array: 15; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 15; independent_array: 1; K-fold: 1
Now running, pct_increase: 1.02; days out: 15; independent_array: 1; K-fold: 2
Now running, pct_increase: 1.02; days out: 15; independent_array: 1; K-fold: 3
Now running, pct_increase: 1.02; days out: 15; independent_array: 1; K-fold: 4
Now running, pct_increase: 1.02; days out: 15; independent_array: 1; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 15; independent_array: 2; K-fold: 1
Now running, pct_increase: 1.02; days out: 15; independent_array: 2; K-fold: 2
Now running, pct_increase: 1.02; days out: 15; independent_array: 2; K-fold: 3
Now running, pct_increase: 1.02; days out: 15; independent_array: 2; K-fold: 4
Now running, pct_increase: 1.02; days out: 15; independent_array: 2; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 15; independent_array: 3; K-fold: 1
Now running, pct_increase: 1.02; days out: 15; independent_array: 3; K-fold: 2
Now running, pct_increase: 1.02; days out: 15; independent_array: 3; K-fold: 3
Now running, pct_increase: 1.02; days out: 15; independent_array: 3; K-fold: 4
Now running, pct_increase: 1.02; days out: 15; independent_array: 3; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 15; independent_array: 4; K-fold: 1
Now running, pct_increase: 1.02; days out: 15; independent_array: 4; K-fold: 2
Now running, pct_increase: 1.02; days out: 15; independent_array: 4; K-fold: 3
Now running, pct_increase: 1.02; days out: 15; independent_array: 4; K-fold: 4
Now running, pct_increase: 1.02; days out: 15; independent_array: 4; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 15; independent_array: 5; K-fold: 1
Now running, pct_increase: 1.02; days out: 15; independent_array: 5; K-fold: 2
Now running, pct_increase: 1.02; days out: 15; independent_array: 5; K-fold: 3
Now running, pct_increase: 1.02; days out: 15; independent_array: 5; K-fold: 4
Now running, pct_increase: 1.02; days out: 15; independent_array: 5; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 15; independent_array: 6; K-fold: 1
Now running, pct_increase: 1.02; days out: 15; independent_array: 6; K-fold: 2
Now running, pct_increase: 1.02; days out: 15; independent_array: 6; K-fold: 3
Now running, pct_increase: 1.02; days out: 15; independent_array: 6; K-fold: 4
Now running, pct_increase: 1.02; days out: 15; independent_array: 6; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 15; independent_array: 7; K-fold: 1
Now running, pct_increase: 1.02; days out: 15; independent_array: 7; K-fold: 2
Now running, pct_increase: 1.02; days out: 15; independent_array: 7; K-fold: 3
Now running, pct_increase: 1.02; days out: 15; independent_array: 7; K-fold: 4
Now running, pct_increase: 1.02; days out: 15; independent_array: 7; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 15; independent_array: 8; K-fold: 1
Now running, pct_increase: 1.02; days out: 15; independent_array: 8; K-fold: 2
Now running, pct_increase: 1.02; days out: 15; independent_array: 8; K-fold: 3
Now running, pct_increase: 1.02; days out: 15; independent_array: 8; K-fold: 4
Now running, pct_increase: 1.02; days out: 15; independent_array: 8; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 15; independent_array: 9; K-fold: 1
Now running, pct_increase: 1.02; days out: 15; independent_array: 9; K-fold: 2
Now running, pct_increase: 1.02; days out: 15; independent_array: 9; K-fold: 3
Now running, pct_increase: 1.02; days out: 15; independent_array: 9; K-fold: 4
Now running, pct_increase: 1.02; days out: 15; independent_array: 9; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 15; independent_array: 10; K-fold: 1
Now running, pct_increase: 1.02; days out: 15; independent_array: 10; K-fold: 2
Now running, pct_increase: 1.02; days out: 15; independent_array: 10; K-fold: 3
Now running, pct_increase: 1.02; days out: 15; independent_array: 10; K-fold: 4
Now running, pct_increase: 1.02; days out: 15; independent_array: 10; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 15; independent_array: 11; K-fold: 1
Now running, pct_increase: 1.02; days out: 15; independent_array: 11; K-fold: 2
Now running, pct_increase: 1.02; days out: 15; independent_array: 11; K-fold: 3
Now running, pct_increase: 1.02; days out: 15; independent_array: 11; K-fold: 4
Now running, pct_increase: 1.02; days out: 15; independent_array: 11; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 15; independent_array: 12; K-fold: 1
Now running, pct_increase: 1.02; days out: 15; independent_array: 12; K-fold: 2
Now running, pct_increase: 1.02; days out: 15; independent_array: 12; K-fold: 3
Now running, pct_increase: 1.02; days out: 15; independent_array: 12; K-fold: 4
Now running, pct_increase: 1.02; days out: 15; independent_array: 12; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 15; independent_array: 13; K-fold: 1
Now running, pct_increase: 1.02; days out: 15; independent_array: 13; K-fold: 2
Now running, pct_increase: 1.02; days out: 15; independent_array: 13; K-fold: 3
Now running, pct_increase: 1.02; days out: 15; independent_array: 13; K-fold: 4
Now running, pct_increase: 1.02; days out: 15; independent_array: 13; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 15; independent_array: 14; K-fold: 1
Now running, pct_increase: 1.02; days out: 15; independent_array: 14; K-fold: 2
Now running, pct_increase: 1.02; days out: 15; independent_array: 14; K-fold: 3
Now running, pct_increase: 1.02; days out: 15; independent_array: 14; K-fold: 4
Now running, pct_increase: 1.02; days out: 15; independent_array: 14; K-fold: 5


  super().__init__(**kwargs)


Now running, pct_increase: 1.02; days out: 15; independent_array: 15; K-fold: 1
Now running, pct_increase: 1.02; days out: 15; independent_array: 15; K-fold: 2
Now running, pct_increase: 1.02; days out: 15; independent_array: 15; K-fold: 3
Now running, pct_increase: 1.02; days out: 15; independent_array: 15; K-fold: 4
Now running, pct_increase: 1.02; days out: 15; independent_array: 15; K-fold: 5


In [134]:
#will output multiple CSV files with training results for classification model
accuracy_df.to_csv(f'{ticker_symbol}_{selected_pattern}_classification_output.csv', index=False)  # `index=False` avoids writing the index column

After I have ran the above code, multiple times for each pattern (random, hammer, and inverted hammer) and received a CSV output for each, I will determine the results in the reporting section of this project.

#### LSTM Regression Model

Now, I am going to transition from classification to the regression model. As the second part of my research question is aimed at comparing the performance between these two models.

Below, as we did with prior to running the classification model, the first thing we need to do is get the data to run the regression model. The independent variables will be the same as when they were run through the classification model. What changes is the dependent variable. Instead of a binary variable, we are going to have a continous dependent variable that represents the stock's closing price on a future day.

In [135]:
#Subset data frame for desired pattern
pattern_df = finance_df[finance_df['Hammer_pattern'] == "Yes"]
#pattern_df = finance_df[finance_df['Random_Yes_No'] == "Yes"]

#How many days after the pattern is identified to use for the dependent variable
days_out = 1

#What percent increase from the current price is considered a positive class. For example 1.01 = 1% increase; 100 * 1.01 = 101. So if original
#price is $100, anything greater than $101 is considered a positive class.
pct_increase = 1.00

#Gather independent variables
independent_list1 = []
independent_list2 = []
independent_list3 = []
independent_list4 = []
independent_list5 = []
independent_list6 = []
independent_list7 = []
independent_list8 = []
independent_list9 = []
independent_list10 = []
independent_list11 = []
independent_list12 = []
independent_list13 = []
independent_list14 = []
independent_list15 = []

#gather dependent variables
dependent_list = [] #this is for classification tasks
dependent_list_regression = [] #this is for regression tasks
dependent_list_regression_log = []
dependent_list_regression_normalized = []

pattern_index = list(pattern_df["Row_index"])
#pattern_index = [60, 62]
for i in pattern_index:
    #if (i == 62):
    #    break
    
    #unable to get 30 days worth of data if index is less than 56, because previously removed first 26 observations
    if (i < 56):
        continue

    #get 30 days worth of data to gather data for indpendent variables
    subset_df = finance_df[(finance_df["Row_index"] >= (i - 29)) & (finance_df["Row_index"] <= (i))]
    
    #Get day after data to gather closing price for dependent variable
    dependent_df = finance_df[finance_df["Row_index"] == (i)]
    dependent2_df = finance_df[finance_df["Row_index"] == (i + days_out)]
    
    temp_list1 = []
    temp_list2 = []
    temp_list3 = []
    temp_list4 = []
    temp_list5 = []
    temp_list6 = []
    temp_list7 = []
    temp_list8 = []
    temp_list9 = []
    temp_list10 = []
    temp_list11 = []
    temp_list12 = []
    temp_list13 = []
    temp_list14 = []
    temp_list15 = []

    #append temp_list to independent_list
    if len(dependent2_df) > 0: #dependent2_df may have length of zero as it is a future date, data may not be available
    

        for index, row in subset_df.iterrows():
                
                test_array1 = np.array([row['Open'], row['Close'], row['High'], row['Low']])
                test_array2 = np.array([row['Log_Open'], row['Log_Close'], row['Log_High'], row['Log_Low']])
                test_array3 = np.array([row['Normalized_Open'], row['Normalized_Close'], row['Normalized_High'], row['Normalized_Low']])
        
                test_array4 = np.array([row['Open'], row['Close'], row['High'], row['Low'], row['RSI']])
                test_array5 = np.array([row['Log_Open'], row['Log_Close'], row['Log_High'], row['Log_Low'], row['RSI']])
                test_array6 = np.array([row['Normalized_Open'], row['Normalized_Close'], row['Normalized_High'], row['Normalized_Low'], row['RSI']])
        
                test_array7 = np.array([row['Open'], row['Close'], row['High'], row['Low'], row['MFI']])
                test_array8 = np.array([row['Log_Open'], row['Log_Close'], row['Log_High'], row['Log_Low'], row['MFI']])
                test_array9 = np.array([row['Normalized_Open'], row['Normalized_Close'], row['Normalized_High'], row['Normalized_Low'], row['MFI']])
        
                test_array10 = np.array([row['Open'], row['Close'], row['High'], row['Low'], row['MACD'], row['Signal_Line']])
                test_array11 = np.array([row['Log_Open'], row['Log_Close'], row['Log_High'], row['Log_Low'], row['MACD'], row['Signal_Line']])
                test_array12 = np.array([row['Normalized_Open'], row['Normalized_Close'], row['Normalized_High'], row['Normalized_Low'], row['MACD'], row['Signal_Line']])
        
                test_array13 = np.array([row['Open'], row['Close'], row['High'], row['Low'], row['RSI'], row['MFI'], row['MACD'], row['Signal_Line']])
                test_array14 = np.array([row['Log_Open'], row['Log_Close'], row['Log_High'], row['Log_Low'], row['RSI'], row['MFI'], row['MACD'], row['Signal_Line']])
                test_array15 = np.array([row['Normalized_Open'], row['Normalized_Close'], row['Normalized_High'], row['Normalized_Low'], row['RSI'], row['MFI'], row['MACD'], row['Signal_Line']])
        
                
                temp_list1.append(test_array1)
                temp_list2.append(test_array2)
                temp_list3.append(test_array3)
                temp_list4.append(test_array4)
                temp_list5.append(test_array5)
                temp_list6.append(test_array6)
                temp_list7.append(test_array7)
                temp_list8.append(test_array8)
                temp_list9.append(test_array9)
                temp_list10.append(test_array10)
                temp_list11.append(test_array11)
                temp_list12.append(test_array12)
                temp_list13.append(test_array13)
                temp_list14.append(test_array14)
                temp_list15.append(test_array15)
               
        independent_list1.append(temp_list1)
        independent_list2.append(temp_list2)
        independent_list3.append(temp_list3)
        independent_list4.append(temp_list4)
        independent_list5.append(temp_list5)
        independent_list6.append(temp_list6)
        independent_list7.append(temp_list7)
        independent_list8.append(temp_list8)
        independent_list9.append(temp_list9)
        independent_list10.append(temp_list10)
        independent_list11.append(temp_list11)
        independent_list12.append(temp_list12)
        independent_list13.append(temp_list13)
        independent_list14.append(temp_list14)
        independent_list15.append(temp_list15)
    
        dependent_list_regression.append(dependent2_df['Close'].iloc[0])
        dependent_list_regression_log.append(dependent2_df['Log_Close'].iloc[0])
        dependent_list_regression_normalized.append(dependent2_df['Normalized_Close'].iloc[0])

        if (dependent2_df['Close'].iloc[0] > dependent_df['Close'].iloc[0] * pct_increase):
            dependent_list.append(1)
        else:
            dependent_list.append(0)


independent_array1 = np.array(independent_list1)
independent_array2 = np.array(independent_list2)
independent_array3 = np.array(independent_list3)
independent_array4 = np.array(independent_list4)
independent_array5 = np.array(independent_list5)
independent_array6 = np.array(independent_list6)
independent_array7 = np.array(independent_list7)
independent_array8= np.array(independent_list8)
independent_array9 = np.array(independent_list9)
independent_array10 = np.array(independent_list10)
independent_array11 = np.array(independent_list11)
independent_array12 = np.array(independent_list12)
independent_array13 = np.array(independent_list13)
independent_array14 = np.array(independent_list14)
independent_array15 = np.array(independent_list15)
dependent_array = np.array(dependent_list) #used to see how many positive and negative classes
dependent_array_regression = np.array(dependent_list_regression)
dependent_array_regression_log = np.array(dependent_list_regression_log)
dependent_array_regression_normalized = np.array(dependent_list_regression_normalized)

This code below defines and trains a regression model using an LSTM (Long Short-Term Memory) network, a type of recurrent neural network (RNN) designed for sequence prediction tasks. The regression model aims to predict a continuous target variable based on a series of input features. First, the code imports the necessary libraries, including Keras for building and training the model, scikit-learn for data preprocessing and splitting, and other utility libraries like numpy and pandas. It defines a function create_lstm_regression() to create the LSTM-based regression model. The model consists of two LSTM layers, each followed by a Dropout layer to reduce the risk of overfitting. The first LSTM layer has 128 units and returns sequences of data, allowing the next LSTM layer to process the sequence. The second LSTM layer has 64 units and does not return sequences. The final layer is a Dense layer with a single neuron and a linear activation function, which outputs a continuous value suitable for regression tasks. The model is compiled using the Adam optimizer and mean squared error (MSE) loss function, which is commonly used for regression.

In the second part of the code, the input data (X) and target labels (y) are prepared for training. The independent variables (X) are chosen from the earlier dataset (e.g., independent_array1), which contains time-series features such as stock prices and technical indicators, while the dependent variable (y) contains the regression targets (e.g., future stock price movements). The data is then split into training and testing sets using train_test_split(), where 80% is used for training and 20% for testing. The input_shape for the LSTM model is determined by the number of time steps (30) and the number of features per time-step (e.g., 8 for independent_array_15). The LSTM model is then created by calling create_lstm_regression() with the input_shape, and the model is trained using the fit() method. The training process runs for 10 epochs, with a batch size of 32, and the model's performance is evaluated on the test set using the validation data. The verbose = 1 option displays the training progress during each epoch.

In [136]:
#code for regression
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from sklearn.model_selection import KFold
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import train_test_split
import numpy as np
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.optimizers import RMSprop
import pandas as pd

# Define the regression model
def create_lstm_regression(input_shape):
    model = Sequential()
    
    # LSTM layers
    model.add(LSTM(128, activation='tanh', return_sequences=True, input_shape=input_shape))
    model.add(Dropout(0.2))  # Dropout to reduce overfitting
    
    model.add(LSTM(64, activation='tanh', return_sequences=False))  # Final LSTM layer
    model.add(Dropout(0.2))
    
    # Dense output layer for regression
    model.add(Dense(1, activation='linear'))  # Predicting a continuous value
    
    # Compile the model
    model.compile(optimizer='adam', loss='mse', metrics=['mae'])  # MSE for regression tasks
    
    return model


# Independent variables (features)
X = independent_array1  # Shape: (890, 30, )

# Dependent variable (target); make sure to match the dependent array with the correct independent variables-- if X == independent_array3 then y should be set to dependent_array_regression_normalized 
y = dependent_array_regression  # Shape: (890,)

#split data into training and testing sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=6)

#define the input shape based on your data; for example independent array_1 has input_shape of (30, 4); independent array_15's shape is (30,8)
input_shape = (30, X.shape[2])  # 30 time-steps and 8 features per time-step for independent_array_15

#create the LSTM model
regression_model = create_lstm_regression(input_shape)

#train the classification model and store history. verbose = 0 -> hides the training output
history = regression_model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_test, y_test), verbose = 1)

Epoch 1/10


  super().__init__(**kwargs)


[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 758ms/step - loss: 65740.9219 - mae: 203.2404 - val_loss: 52114.9180 - val_mae: 187.3432
Epoch 2/10
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 159ms/step - loss: 59008.4766 - mae: 191.7096 - val_loss: 51820.3789 - val_mae: 186.5379
Epoch 3/10
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 161ms/step - loss: 59213.7461 - mae: 189.7480 - val_loss: 51560.4414 - val_mae: 185.8472
Epoch 4/10
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 163ms/step - loss: 61183.7383 - mae: 196.5097 - val_loss: 51328.4375 - val_mae: 185.2185
Epoch 5/10
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 167ms/step - loss: 64968.6641 - mae: 201.8303 - val_loss: 51115.5391 - val_mae: 184.6420
Epoch 6/10
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 181ms/step - loss: 65988.2812 - mae: 202.6831 - val_loss: 50930.0391 - val_mae: 184.1364
Epoch 7/10
[1m2/2[0m [32m━━━━━━━━━━━

In [137]:
#get the predictions of the model when applied on the test set
y_pred = regression_model.predict(X_test)

#Get the actual closing prices from the test set; we know the closing price on the final day of the candlestick pattern, is always the 30th day, and 2nd item in the array
last_closing_price = X_test[:, 29, 1]

comparison = (y_pred.flatten() > last_closing_price * pct_increase) #are the predictions greater than the actual closing prices
comparison_2 = (y_pred.flatten() <= last_closing_price * pct_increase) #are the predictions less than or equal to the actual closing prices

comparison_3 = (y_test > last_closing_price * pct_increase) #are the actual closing prices from the test set greater than the actual closing prices
comparison_4 = (y_test <= last_closing_price * pct_increase) #are the actual closing prices from the test set less than or equal to the actual closing prices

# Case 1: When both predicted and actual values are greater than the closing price
correct_greater = comparison & comparison_3
# Case 2: When both predicted and actual values are less than or equal to the closing price
correct_lesser_or_equal = comparison_2 & comparison_4

#total correct predictions
print(f'Total correct predictions: {np.sum(correct_greater) + np.sum(correct_lesser_or_equal)}; out of {len(y_test)} observations in test set')
print(f'Total observations: {sum(np.unique(dependent_array, return_counts=True)[1])}')
print(f'Number of observations each class from dependent variable: {np.unique(dependent_array, return_counts=True)}')

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 332ms/step
Total correct predictions: 7; out of 14 observations in test set
Total observations: 69
Number of observations each class from dependent variable: (array([0, 1]), array([37, 32]))


A regression model is designed to predict continuous values, not for classification tasks. However, I needed a way to compare the performance of both my regression and classification models. To do this, I evaluated the predicted values from my regression model using the following approach:

1. If the predicted closing price is greater than the closing price of the last candle in the identified candlestick pattern, and the actual closing price (from the test set) is also greater than the last candle’s closing price, this is considered a correct prediction.
2. If the predicted closing price is less than or equal to the closing price of the last candle in the identified candlestick pattern, and the actual closing price (from the test set) is also less than or equal to the last candle’s closing price, this is also considered a correct prediction.
3. I then count the number of correct predictions from both cases (steps 1 and 2), and divide this by the total number of observations in the test set to calculate the accuracy.

In this example, we have a total of 69 observations, with 37 of those observations belonging to the negative class, meaning the future closing price was less than the closing price from the last candle in the identified candlestick pattern. Dividing 37 by 69 gives us a percentage of 53.6%. This percentage represents the proportion of negative class observations in the test set.

When comparing this to the performance of our regression model, we see that it correctly classified 7 out of 14 total in the test set observations, or 50.0%. This means that if I had predicted every observation to belong to the negative class, my model would have performed better than the regression model (since 53.6% of the observations were negative class).

#### Exploring Model Performance with Parameter Variations and Stratified K-Fold Cross Validation - LSTM Regression Model

I am going to repeat the analysis that I did with various parameter combinations and stratified K-fold cross validation from when I performed the training of the LSTM classification model.

This time, as mentioned before, I will have a different dependent variable as instead of having binary dependent variables, I will have a continuous dependent variable.

For the parameter combinations, I will evaluate my model using three patterns: random days, the hammer pattern, and the inverted hammer pattern. I’ve chosen not to evaluate the other patterns due to insufficient observations in the dataset. When I refer to evaluating my model on random days, I mean that I previously created a column in the dataset with randomly assigned "Yes" values. These "Yes" values are distributed randomly, and my goal is to compare the model's performance using these random patterns versus patterns that are specifically identified as candlestick patterns. This will help me understand if the model behaves differently when dealing with randomly assigned patterns versus known candlestick patterns.

Other parameters I will test will be how many days out after the pattern is identified to use for the dependent variable. For example, if this is set to the value of "1", the closing price for the day directly after the candlestick pattern will be the dependent variable. If set to "10" for example, the closing price 10 days after the last candle in the candlestick pattern will be the dependent variable.

My last parameter combination will be what percent increase from the original price is considered a positive class. For example 1.01 = 1% increase; 100 * 1.01 = 101. So if the original price is 100 dollars, anything greater than 101 dollars is considered a positive class.

I will also use statified K-fold Cross validation as I previously did when training the classification model. 

The result of running the code below will output a CSV file which shows the accuracy scores of each parameter and variable combination. I am going to have to run this code below multiple times, each time for the selected candle stick pattern (the code below can only run one selected pattern and one stock ticker at a time). Again, I will only run the stock ticker "SPY" in an effort to save resources.

In [140]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from sklearn.model_selection import KFold
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import train_test_split
import numpy as np
from sklearn.preprocessing import StandardScaler
import pandas as pd


### User inputs ###
selected_pattern = "InvertedHammer"   #choices: 'Random', 'Hammer', 'InvertedHammer'

#How many days after the pattern is identified to use for the dependent variable
days_out = [1, 3, 5, 10, 15]

#What percent increase from the current price is considered a positive class. For example 1.01 = 1% increase; 100 * 1.01 = 101. So if original
#price is $100, anything greater than $101 is considered a positive class.
pct_increase = [1.00, 1.01, 1.02]

######


# Define the regression model
def create_lstm_regression(input_shape):
    model = Sequential()
    
    # LSTM layers
    model.add(LSTM(128, activation='tanh', return_sequences=True, input_shape=input_shape))
    model.add(Dropout(0.2))  # Dropout to reduce overfitting
    
    model.add(LSTM(64, activation='tanh', return_sequences=False))  # Final LSTM layer
    model.add(Dropout(0.2))
    
    # Dense output layer for regression
    model.add(Dense(1, activation='linear'))  # Predicting a continuous value
    
    # Compile the model
    model.compile(optimizer='adam', loss='mse', metrics=['mae'])  # MSE for regression tasks
    
    return model

    

#Subset data frame for desired pattern
if (selected_pattern == "Random"):
    pattern_df = finance_df[finance_df['Random_Yes_No'] == "Yes"]
elif (selected_pattern == "Hammer"):
    pattern_df = finance_df[finance_df['Hammer_pattern'] == "Yes"]
else:
    pattern_df = finance_df[finance_df['InvertedHammer_pattern'] == "Yes"]


#initialize an empty DataFrame with column names
accuracy_df = pd.DataFrame(columns=['ticker', 'pattern', 'independent_array', 'best_accuracy', 'avg_accuracy', 'days_out', 'Total_observations', 
                                   'Negative_observations', 'Positive_observations', 'Percent_increase_parameter'])


for percent in pct_increase:

    for day in days_out:
        #Gather independent variables
        independent_list1 = []
        independent_list2 = []
        independent_list3 = []
        independent_list4 = []
        independent_list5 = []
        independent_list6 = []
        independent_list7 = []
        independent_list8 = []
        independent_list9 = []
        independent_list10 = []
        independent_list11 = []
        independent_list12 = []
        independent_list13 = []
        independent_list14 = []
        independent_list15 = []
        dependent_list_regression = []
        dependent_list_regression_log = [] 
        dependent_list_regression_normalized = [] 
        
        #gather dependent variables
        dependent_list = []

        #these are the row indexes that have the identified patterns; loop through
        pattern_index = list(pattern_df["Row_index"])
        for i in pattern_index:
            #if (i == 62):
            #    break
            
            #unable to get 30 days worth of data if index is less than 56, because previously removed first 26 observations
            if (i < 56):
                continue
        
            #get 30 days worth of data to gather data for indpendent variables
            subset_df = finance_df[(finance_df["Row_index"] >= (i - 29)) & (finance_df["Row_index"] <= (i))]
            #subset_df = finance_df[(finance_df["Row_index"] >= (i - 13)) & (finance_df["Row_index"] <= (i))]
            
            #Get day after data to gather closing price for dependent variable
            dependent_df = finance_df[finance_df["Row_index"] == (i)]
            dependent2_df = finance_df[finance_df["Row_index"] == (i + day)]
            
            temp_list1 = []
            temp_list2 = []
            temp_list3 = []
            temp_list4 = []
            temp_list5 = []
            temp_list6 = []
            temp_list7 = []
            temp_list8 = []
            temp_list9 = []
            temp_list10 = []
            temp_list11 = []
            temp_list12 = []
            temp_list13 = []
            temp_list14 = []
            temp_list15 = []
        
            #append temp_list to independent_list
            if len(dependent2_df) > 0: #dependent2_df may have length of zero as it is a future date, data may not be available
            
        
                for index, row in subset_df.iterrows():
                        
                        test_array1 = np.array([row['Open'], row['Close'], row['High'], row['Low']])
                        test_array2 = np.array([row['Log_Open'], row['Log_Close'], row['Log_High'], row['Log_Low']])
                        test_array3 = np.array([row['Normalized_Open'], row['Normalized_Close'], row['Normalized_High'], row['Normalized_Low']])
                
                        test_array4 = np.array([row['Open'], row['Close'], row['High'], row['Low'], row['RSI']])
                        test_array5 = np.array([row['Log_Open'], row['Log_Close'], row['Log_High'], row['Log_Low'], row['RSI']])
                        test_array6 = np.array([row['Normalized_Open'], row['Normalized_Close'], row['Normalized_High'], row['Normalized_Low'], row['RSI']])
                
                        test_array7 = np.array([row['Open'], row['Close'], row['High'], row['Low'], row['MFI']])
                        test_array8 = np.array([row['Log_Open'], row['Log_Close'], row['Log_High'], row['Log_Low'], row['MFI']])
                        test_array9 = np.array([row['Normalized_Open'], row['Normalized_Close'], row['Normalized_High'], row['Normalized_Low'], row['MFI']])
                
                        test_array10 = np.array([row['Open'], row['Close'], row['High'], row['Low'], row['MACD'], row['Signal_Line']])
                        test_array11 = np.array([row['Log_Open'], row['Log_Close'], row['Log_High'], row['Log_Low'], row['MACD'], row['Signal_Line']])
                        test_array12 = np.array([row['Normalized_Open'], row['Normalized_Close'], row['Normalized_High'], row['Normalized_Low'], row['MACD'], row['Signal_Line']])
                
                        test_array13 = np.array([row['Open'], row['Close'], row['High'], row['Low'], row['RSI'], row['MFI'], row['MACD'], row['Signal_Line']])
                        test_array14 = np.array([row['Log_Open'], row['Log_Close'], row['Log_High'], row['Log_Low'], row['RSI'], row['MFI'], row['MACD'], row['Signal_Line']])
                        test_array15 = np.array([row['Normalized_Open'], row['Normalized_Close'], row['Normalized_High'], row['Normalized_Low'], row['RSI'], row['MFI'], row['MACD'], row['Signal_Line']])
                
                
                        temp_list1.append(test_array1)
                        temp_list2.append(test_array2)
                        temp_list3.append(test_array3)
                        temp_list4.append(test_array4)
                        temp_list5.append(test_array5)
                        temp_list6.append(test_array6)
                        temp_list7.append(test_array7)
                        temp_list8.append(test_array8)
                        temp_list9.append(test_array9)
                        temp_list10.append(test_array10)
                        temp_list11.append(test_array11)
                        temp_list12.append(test_array12)
                        temp_list13.append(test_array13)
                        temp_list14.append(test_array14)
                        temp_list15.append(test_array15)
                        
                independent_list1.append(temp_list1)
                independent_list2.append(temp_list2)
                independent_list3.append(temp_list3)
                independent_list4.append(temp_list4)
                independent_list5.append(temp_list5)
                independent_list6.append(temp_list6)
                independent_list7.append(temp_list7)
                independent_list8.append(temp_list8)
                independent_list9.append(temp_list9)
                independent_list10.append(temp_list10)
                independent_list11.append(temp_list11)
                independent_list12.append(temp_list12)
                independent_list13.append(temp_list13)
                independent_list14.append(temp_list14)
                independent_list15.append(temp_list15)
            
                dependent_list_regression.append(dependent2_df['Close'].iloc[0])
                dependent_list_regression_log.append(dependent2_df['Log_Close'].iloc[0])
                dependent_list_regression_normalized.append(dependent2_df['Normalized_Close'].iloc[0])
        
                if (dependent2_df['Close'].iloc[0] > dependent_df['Close'].iloc[0] * percent):
                    dependent_list.append(1)
                else:
                    dependent_list.append(0)
        
        independent_array1 = np.array(independent_list1)
        independent_array2 = np.array(independent_list2)
        independent_array3 = np.array(independent_list3)
        independent_array4 = np.array(independent_list4)
        independent_array5 = np.array(independent_list5)
        independent_array6 = np.array(independent_list6)
        independent_array7 = np.array(independent_list7)
        independent_array8= np.array(independent_list8)
        independent_array9 = np.array(independent_list9)
        independent_array10 = np.array(independent_list10)
        independent_array11 = np.array(independent_list11)
        independent_array12 = np.array(independent_list12)
        independent_array13 = np.array(independent_list13)
        independent_array14 = np.array(independent_list14)
        independent_array15 = np.array(independent_list15)
        dependent_array = np.array(dependent_list)
        dependent_array_regression = np.array(dependent_list_regression)
        dependent_array_regression_log = np.array(dependent_list_regression_log)
        dependent_array_regression_normalized = np.array(dependent_list_regression_normalized)
    
    
        independent_array = []
        best_accuracy = []
        avg_accuracy = []
        counter_independentarray = 0
        for i in range(1, 16):
            #if i != 12: #testing what seems is the most well performing model
                #continue
            
            # Select which independent_array to use
            if i == 1:
                X = independent_array1  # Shape: (890, 30, 4)
                independent_array.append("independent_array1")
                y = dependent_array_regression
            if i == 2:
                X = independent_array2  # Shape: (890, 30, 4)
                independent_array.append("independent_array2")
                y = dependent_array_regression_log
            if i == 3:
                X = independent_array3  # Shape: (890, 30, 4)
                independent_array.append("independent_array3")
                y = dependent_array_regression_normalized
            if i == 4:
                X = independent_array4  # Shape: (890, 30, 5)
                independent_array.append("independent_array4")
                y = dependent_array_regression
            if i == 5:
                X = independent_array5  # Shape: (890, 30, 5)
                independent_array.append("independent_array5")
                y = dependent_array_regression_log
            if i == 6:
                X = independent_array6  # Shape: (890, 30, 5)
                independent_array.append("independent_array6")
                y = dependent_array_regression_normalized
            if i == 7:
                X = independent_array7  # Shape: (890, 30, 5)
                independent_array.append("independent_array7")
                y = dependent_array_regression
            if i == 8:
                X = independent_array8  # Shape: (890, 30, 5)
                independent_array.append("independent_array8")
                y = dependent_array_regression_log
            if i == 9:
                X = independent_array9  # Shape: (890, 30, 5)
                independent_array.append("independent_array9")
                y = dependent_array_regression_normalized
            if i == 10:
                X = independent_array10  # Shape: (890, 30, 6)
                independent_array.append("independent_array10")
                y = dependent_array_regression
            if i == 11:
                X = independent_array11  # Shape: (890, 30, 6)
                independent_array.append("independent_array11")
                y = dependent_array_regression_log
            if i == 12:
                X = independent_array12  # Shape: (890, 30, 6)
                independent_array.append("independent_array12")
                y = dependent_array_regression_normalized
            if i == 13:
                X = independent_array13  # Shape: (890, 30, 8)
                independent_array.append("independent_array13")
                y = dependent_array_regression
            if i == 14:
                X = independent_array14  # Shape: (890, 30, 8)
                independent_array.append("independent_array14")
                y = dependent_array_regression_log
            if i == 15:
                X = independent_array15  # Shape: (890, 30, 8)
                independent_array.append("independent_array15")
                y = dependent_array_regression_normalized
        
            counter_independentarray = counter_independentarray + 1
            
            # Define the input shape based on the number of features
            input_shape = (30, X.shape[2])  # 30 time-steps and `X.shape[2]` features per time-step
            
            # Create the LSTM model
            regression_model = create_lstm_regression(input_shape)
            
            # Initialize k-fold cross-validation
            kf = KFold(n_splits=5, shuffle=True, random_state=6)  #regular 5-fold cross-validation w/out stratification, for regression tasks because no class imbalance
            #kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=6)  # 5-fold cross-validation with stratification
            
            #create list to gather accuracy scores after training each fold
            fold_accuracies = []
            
            #K-fold Cross-Validation
            counter_kfold = 0
            #for train_index, val_index in kf.split(X, y): #used for stratified k-fold
            for train_index, val_index in kf.split(X): #used for regular k-fold
                
                counter_kfold = counter_kfold + 1
                print(f"Now running, pct_increase: {percent}; days out: {day}; independent_array: {counter_independentarray}; K-fold: {counter_kfold}")
                
                X_train, X_val = X[train_index], X[val_index]
                y_train, y_val = y[train_index], y[val_index]
                
                # Train the classification model and store the history; verbose = 0 to hide epoch running info in cell output
                history = regression_model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_val, y_val), verbose=0)

                #get the predictions of the model when applied on the test set
                y_pred = regression_model.predict(X_val)
                
                #Get the actual closing prices from the test set; we know the closing price on the final day of the candlestick pattern, 
                #is always the 30th day, and 2nd item in the array
                last_closing_price = X_val[:, 29, 1]
                
                comparison = (y_pred.flatten() > last_closing_price * percent) #are the predictions greater than the actual closing prices
                comparison_2 = (y_pred.flatten() <= last_closing_price * percent) #are the predictions less than or equal to the actual closing prices
                
                comparison_3 = (y_val > last_closing_price * percent) #are the actual closing prices from the test set greater than the actual closing prices
                comparison_4 = (y_val <= last_closing_price * percent) #are the actual closing prices from the test set less than or equal to the actual closing prices
                
                # Case 1: When both predicted and actual values are greater than the closing price
                correct_greater = comparison & comparison_3
                # Case 2: When both predicted and actual values are less than or equal to the closing price
                correct_lesser_or_equal = comparison_2 & comparison_4
                
                #total correct predictions
                print(f'Total correct predictions: {np.sum(correct_greater) + np.sum(correct_lesser_or_equal)}; out of {len(y_val)} observations in test set')
                print(f'Total observations (combined train and test sets): {sum(np.unique(dependent_array, return_counts=True)[1])}')
                print(f'Number of observations each class from dependent variable (combined): {np.unique(dependent_array, return_counts=True)}')

                #the accuracy score at the 10th epoch for each fold is appended to this list; this is different than gathering the accuracy
                #scores for the classification model, because the classification model gets all accuracy scores from each epoch
                fold_accuracies.append((np.sum(correct_greater) + np.sum(correct_lesser_or_equal)) / len(y_val))
        
        
            
            # Calculate the best and average validation accuracy across all folds
            best_val_accuracy = np.max(fold_accuracies) #from the 10th epoch for each of the five folds, the accuracy is collected, and the max accuracy is stored
            avg_val_accuracy = np.mean(fold_accuracies) #from the 10th epoch for each of the five folds, the accuracy is collected, and the mean accuracy is stored
            best_accuracy.append(best_val_accuracy)
            avg_accuracy.append(avg_val_accuracy)
        
        
        # Example of new data to add
        df_new = pd.DataFrame({
            'ticker': ticker_symbol,
            'pattern': selected_pattern,
            'independent_array': independent_array,
            'best_accuracy': best_accuracy,
            'avg_accuracy': avg_accuracy,
            'days_out': day,
            'Total_observations': sum(np.unique(dependent_array, return_counts=True)[1]),
            'Negative_observations': np.unique(dependent_array, return_counts=True)[1][0],
            'Positive_observations': np.unique(dependent_array, return_counts=True)[1][1],
            'Percent_increase_parameter': percent
        })
    
        # Concatenate the new data to the empty DataFrame
        accuracy_df = pd.concat([accuracy_df, df_new], ignore_index=True)
    

  super().__init__(**kwargs)


Now running, pct_increase: 1.0; days out: 1; independent_array: 1; K-fold: 1
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 245ms/step
Total correct predictions: 4; out of 11 observations in test set
Total observations (combined train and test sets): 52
Number of observations each class from dependent variable (combined): (array([0, 1]), array([22, 30]))
Now running, pct_increase: 1.0; days out: 1; independent_array: 1; K-fold: 2
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 47ms/step
Total correct predictions: 6; out of 11 observations in test set
Total observations (combined train and test sets): 52
Number of observations each class from dependent variable (combined): (array([0, 1]), array([22, 30]))
Now running, pct_increase: 1.0; days out: 1; independent_array: 1; K-fold: 3
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 346ms/step
Total correct predictions: 4; out of 10 observations in test set
Total observations (combined train and tes

  accuracy_df = pd.concat([accuracy_df, df_new], ignore_index=True)


Now running, pct_increase: 1.0; days out: 3; independent_array: 1; K-fold: 1
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 312ms/step
Total correct predictions: 3; out of 11 observations in test set
Total observations (combined train and test sets): 52
Number of observations each class from dependent variable (combined): (array([0, 1]), array([19, 33]))
Now running, pct_increase: 1.0; days out: 3; independent_array: 1; K-fold: 2
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 42ms/step
Total correct predictions: 5; out of 11 observations in test set
Total observations (combined train and test sets): 52
Number of observations each class from dependent variable (combined): (array([0, 1]), array([19, 33]))
Now running, pct_increase: 1.0; days out: 3; independent_array: 1; K-fold: 3
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 251ms/step
Total correct predictions: 1; out of 10 observations in test set
Total observations (combined train and tes

In [141]:
#will output multiple CSV files with training results for classification model
accuracy_df.to_csv(f'{ticker_symbol}_{selected_pattern}_regression_output.csv', index=False)  # `index=False` avoids writing the index column

#### Increasing number of observations for best performing model (Chose Classification Over Regression as Best Performing Model)

After reviewing all of the CSV outputs with the training result data and respective accuracy scores, I have determined that the classification model performs better and more consistent with independent array #15 as the best performing set of independent variables. **I further explain why I chose the classification model over the regression model in the 'Reporting' section of this document.**

**At this point, I have enough data to answer my research question, however, now I am going to increase the number of randomly generated sequences in order to make my model more adaptable for use on any given day as the hammer and inverted candlestick patterns have around a 1% occurrence rate. My hope is that even when a true candlestick pattern is not present, the model could still make reliable predictions for future closing prices.**

Now, I am going to see if increasing the number of observations will change the accuracy scores for my best performing set of independent variables. While I cannot increase the number of observations when a candlestick pattern is identified, as that number is already fixed, I can increase the number of observations for my random "Yes" and "No" values. By simulating the presence of a random pattern with more data, I will test whether adding additional random observations (via the newly generated 'Random_Yes_No_2' column) will influence the accuracy scores. This approach will allow me to assess the impact of a larger dataset on model performance using the same model architecture.

In [142]:
####Used to create another new column to test random values of 'yes' to simulate presence of a random pattern
# Specify the number of "Yes" values you want, may show up as less during training due to location of the "Yes" value, as need at least 30 days
#of data for the 30-day sequence, or if the future closing price is not available (only have data to 2/14)
num_yes = 2200

# Create a list of "Yes" and "No" values
yes_no_list = ["Yes"] * num_yes + ["No"] * (len(finance_df) - num_yes)

#set seed for reproducibility
np.random.seed(6) 

# Shuffle the list to randomize the order
np.random.shuffle(yes_no_list)

# Add the list as a new column in the DataFrame; we already have a column 'Random_Yes_No' which was used to train the occurence of a random pattern
#this mimics that idea but will now be a new column 'Random_Yes_No_2', but this time with more generated random observations
finance_df['Random_Yes_No_2'] = yes_no_list

In [143]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from sklearn.model_selection import KFold
from sklearn.model_selection import StratifiedKFold
import numpy as np
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.optimizers import RMSprop
import pandas as pd


### User inputs ###
selected_pattern = "Random"   #choices: 'Random', 'Hammer', 'InvertedHammer'

#How many days after the pattern is identified to use for the dependent variable
days_out = [1, 3, 5, 10, 15]

#What percent increase from the current price is considered a positive class. For example 1.01 = 1% increase; 100 * 1.01 = 101. So if original
#price is $100, anything greater than $101 is considered a positive class.
pct_increase = [1.00, 1.01, 1.02]

######


# Define the classification model
def create_lstm_classification(input_shape):
    model = Sequential()
    
    # LSTM layers
    model.add(LSTM(128, activation='tanh', return_sequences=True, input_shape=input_shape))
    model.add(Dropout(0.2))  # Dropout to reduce overfitting
    
    model.add(LSTM(64, activation='tanh', return_sequences=False))  # Final LSTM layer
    model.add(Dropout(0.2))
    
    # Dense output layer for binary classification
    model.add(Dense(1, activation='sigmoid'))  # Sigmoid for binary classification (probability)
    
    # Compile the model
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])  # Binary cross-entropy for classification
    return model

    

#Subset data frame for desired pattern
if (selected_pattern == "Random"):
    pattern_df = finance_df[finance_df['Random_Yes_No_2'] == "Yes"] #this time, will select the newly created column with more observations
elif (selected_pattern == "Hammer"):
    pattern_df = finance_df[finance_df['Hammer_pattern'] == "Yes"]
else:
    pattern_df = finance_df[finance_df['InvertedHammer_pattern'] == "Yes"]


#initialize an empty DataFrame with column names
accuracy_df = pd.DataFrame(columns=['ticker', 'pattern', 'independent_array', 'best_accuracy', 'avg_accuracy', 'days_out', 'Total_observations', 
                                   'Negative_observations', 'Positive_observations', 'Percent_increase_parameter'])


for percent in pct_increase:

    for day in days_out:
        #Gather independent variables
        independent_list1 = []
        independent_list2 = []
        independent_list3 = []
        independent_list4 = []
        independent_list5 = []
        independent_list6 = []
        independent_list7 = []
        independent_list8 = []
        independent_list9 = []
        independent_list10 = []
        independent_list11 = []
        independent_list12 = []
        independent_list13 = []
        independent_list14 = []
        independent_list15 = []
        
        #gather dependent variables
        dependent_list = []

        #these are the row indexes that have the identified patterns; loop through
        pattern_index = list(pattern_df["Row_index"])
        for i in pattern_index:
            #if (i == 62):
            #    break
            
            #unable to get 30 days worth of data if index is less than 56, because previously removed first 26 observations
            if (i < 56):
                continue
        
            #get 30 days worth of data to gather data for indpendent variables
            subset_df = finance_df[(finance_df["Row_index"] >= (i - 29)) & (finance_df["Row_index"] <= (i))]
            #subset_df = finance_df[(finance_df["Row_index"] >= (i - 13)) & (finance_df["Row_index"] <= (i))]
            
            #Get day after data to gather closing price for dependent variable
            dependent_df = finance_df[finance_df["Row_index"] == (i)]
            dependent2_df = finance_df[finance_df["Row_index"] == (i + day)]
            
            temp_list1 = []
            temp_list2 = []
            temp_list3 = []
            temp_list4 = []
            temp_list5 = []
            temp_list6 = []
            temp_list7 = []
            temp_list8 = []
            temp_list9 = []
            temp_list10 = []
            temp_list11 = []
            temp_list12 = []
            temp_list13 = []
            temp_list14 = []
            temp_list15 = []
        
            #append temp_list to independent_list
            if len(dependent2_df) > 0: #dependent2_df may have length of zero as it is a future date, data may not be available
            
        
                for index, row in subset_df.iterrows():
                        
                        test_array1 = np.array([row['Open'], row['Close'], row['High'], row['Low']])
                        test_array2 = np.array([row['Log_Open'], row['Log_Close'], row['Log_High'], row['Log_Low']])
                        test_array3 = np.array([row['Normalized_Open'], row['Normalized_Close'], row['Normalized_High'], row['Normalized_Low']])
                
                        test_array4 = np.array([row['Open'], row['Close'], row['High'], row['Low'], row['RSI']])
                        test_array5 = np.array([row['Log_Open'], row['Log_Close'], row['Log_High'], row['Log_Low'], row['RSI']])
                        test_array6 = np.array([row['Normalized_Open'], row['Normalized_Close'], row['Normalized_High'], row['Normalized_Low'], row['RSI']])
                
                        test_array7 = np.array([row['Open'], row['Close'], row['High'], row['Low'], row['MFI']])
                        test_array8 = np.array([row['Log_Open'], row['Log_Close'], row['Log_High'], row['Log_Low'], row['MFI']])
                        test_array9 = np.array([row['Normalized_Open'], row['Normalized_Close'], row['Normalized_High'], row['Normalized_Low'], row['MFI']])
                
                        test_array10 = np.array([row['Open'], row['Close'], row['High'], row['Low'], row['MACD'], row['Signal_Line']])
                        test_array11 = np.array([row['Log_Open'], row['Log_Close'], row['Log_High'], row['Log_Low'], row['MACD'], row['Signal_Line']])
                        test_array12 = np.array([row['Normalized_Open'], row['Normalized_Close'], row['Normalized_High'], row['Normalized_Low'], row['MACD'], row['Signal_Line']])
                
                        test_array13 = np.array([row['Open'], row['Close'], row['High'], row['Low'], row['RSI'], row['MFI'], row['MACD'], row['Signal_Line']])
                        test_array14 = np.array([row['Log_Open'], row['Log_Close'], row['Log_High'], row['Log_Low'], row['RSI'], row['MFI'], row['MACD'], row['Signal_Line']])
                        test_array15 = np.array([row['Normalized_Open'], row['Normalized_Close'], row['Normalized_High'], row['Normalized_Low'], row['RSI'], row['MFI'], row['MACD'], row['Signal_Line']])
                
                
                        temp_list1.append(test_array1)
                        temp_list2.append(test_array2)
                        temp_list3.append(test_array3)
                        temp_list4.append(test_array4)
                        temp_list5.append(test_array5)
                        temp_list6.append(test_array6)
                        temp_list7.append(test_array7)
                        temp_list8.append(test_array8)
                        temp_list9.append(test_array9)
                        temp_list10.append(test_array10)
                        temp_list11.append(test_array11)
                        temp_list12.append(test_array12)
                        temp_list13.append(test_array13)
                        temp_list14.append(test_array14)
                        temp_list15.append(test_array15)
                        
                independent_list1.append(temp_list1)
                independent_list2.append(temp_list2)
                independent_list3.append(temp_list3)
                independent_list4.append(temp_list4)
                independent_list5.append(temp_list5)
                independent_list6.append(temp_list6)
                independent_list7.append(temp_list7)
                independent_list8.append(temp_list8)
                independent_list9.append(temp_list9)
                independent_list10.append(temp_list10)
                independent_list11.append(temp_list11)
                independent_list12.append(temp_list12)
                independent_list13.append(temp_list13)
                independent_list14.append(temp_list14)
                independent_list15.append(temp_list15)
            
                if (dependent2_df['Close'].iloc[0] > dependent_df['Close'].iloc[0] * percent):
                    dependent_list.append(1)
                else:
                    dependent_list.append(0)
        
        independent_array1 = np.array(independent_list1)
        independent_array2 = np.array(independent_list2)
        independent_array3 = np.array(independent_list3)
        independent_array4 = np.array(independent_list4)
        independent_array5 = np.array(independent_list5)
        independent_array6 = np.array(independent_list6)
        independent_array7 = np.array(independent_list7)
        independent_array8= np.array(independent_list8)
        independent_array9 = np.array(independent_list9)
        independent_array10 = np.array(independent_list10)
        independent_array11 = np.array(independent_list11)
        independent_array12 = np.array(independent_list12)
        independent_array13 = np.array(independent_list13)
        independent_array14 = np.array(independent_list14)
        independent_array15 = np.array(independent_list15)
        dependent_array = np.array(dependent_list)
    
    
        y = dependent_array
        independent_array = []
        best_accuracy = []
        avg_accuracy = []
        counter_independentarray = 0
        for i in range(1, 16):
            if i != 15: #testing what seems is the most well performing model
                continue
            
            # Select which independent_array to use
            if i == 1:
                X = independent_array1  # Shape: (890, 30, 4)
                independent_array.append("independent_array1")
            if i == 2:
                X = independent_array2  # Shape: (890, 30, 4)
                independent_array.append("independent_array2")
            if i == 3:
                X = independent_array3  # Shape: (890, 30, 4)
                independent_array.append("independent_array3")
            if i == 4:
                X = independent_array4  # Shape: (890, 30, 5)
                independent_array.append("independent_array4")
            if i == 5:
                X = independent_array5  # Shape: (890, 30, 5)
                independent_array.append("independent_array5")
            if i == 6:
                X = independent_array6  # Shape: (890, 30, 5)
                independent_array.append("independent_array6")
            if i == 7:
                X = independent_array7  # Shape: (890, 30, 5)
                independent_array.append("independent_array7")
            if i == 8:
                X = independent_array8  # Shape: (890, 30, 5)
                independent_array.append("independent_array8")
            if i == 9:
                X = independent_array9  # Shape: (890, 30, 5)
                independent_array.append("independent_array9")
            if i == 10:
                X = independent_array10  # Shape: (890, 30, 6)
                independent_array.append("independent_array10")
            if i == 11:
                X = independent_array11  # Shape: (890, 30, 6)
                independent_array.append("independent_array11")
            if i == 12:
                X = independent_array12  # Shape: (890, 30, 6)
                independent_array.append("independent_array12")
            if i == 13:
                X = independent_array13  # Shape: (890, 30, 8)
                independent_array.append("independent_array13")
            if i == 14:
                X = independent_array14  # Shape: (890, 30, 8)
                independent_array.append("independent_array14")
            if i == 15:
                X = independent_array15  # Shape: (890, 30, 8)
                independent_array.append("independent_array15")
        
            #counter_independentarray = counter_independentarray + 1
            
            # Define the input shape based on the number of features
            input_shape = (30, X.shape[2])  # 30 time-steps and `X.shape[2]` features per time-step
            
            # Create the LSTM model
            classification_model = create_lstm_classification(input_shape)
            
            # Initialize k-fold cross-validation
            #kf = KFold(n_splits=5, shuffle=True, random_state=6)  #regular 5-fold cross-validation w/out stratification
            kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=6)  # 5-fold cross-validation with stratification

            #initialize to gather all the accuracy scores at each epoch for all 5 folds
            fold_accuracies = []
            
            #stratified K-fold Cross-Validation
            counter_kfold = 0
            for train_index, val_index in kf.split(X, y): #used for stratified k-fold
            #for train_index, val_index in kf.split(X): #used for regular k-fold
                
                counter_kfold = counter_kfold + 1
                print(f"Now running, pct_increase: {percent}; days out: {day}; independent_array: {independent_array[(len(independent_array) - 1)]}; K-fold: {counter_kfold}")
                
                X_train, X_val = X[train_index], X[val_index]
                y_train, y_val = y[train_index], y[val_index]
                
                # Train the classification model and store the history; verbose = 0 to hide epoch running info in cell output
                history = classification_model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_val, y_val), verbose=1)
                
                # Get the validation accuracies for this fold. What this does is that an accuracy score is calculated at each epoch,
                #and in this list I am getting all the accuracy scores from all five folds
                val_accuracy = history.history['val_accuracy']
                fold_accuracies.append(val_accuracy)
        
        
            
            # Calculate the best and average validation accuracy across all folds
            best_val_accuracy = np.max(fold_accuracies) #get the max accuracy across all epochs across all five folds
            avg_val_accuracy = np.mean(fold_accuracies) #get the mean accuracy across all epochs across all five folds
            best_accuracy.append(best_val_accuracy)
            avg_accuracy.append(avg_val_accuracy)
        
        
        # Example of new data to add
        df_new = pd.DataFrame({
            'ticker': ticker_symbol,
            'pattern': selected_pattern,
            'independent_array': independent_array,
            'best_accuracy': best_accuracy,
            'avg_accuracy': avg_accuracy,
            'days_out': day,
            'Total_observations': sum(np.unique(dependent_array, return_counts=True)[1]),
            'Negative_observations': np.unique(dependent_array, return_counts=True)[1][0],
            'Positive_observations': np.unique(dependent_array, return_counts=True)[1][1],
            'Percent_increase_parameter': percent
        })
    
        # Concatenate the new data to the empty DataFrame
        accuracy_df = pd.concat([accuracy_df, df_new], ignore_index=True)
    

Now running, pct_increase: 1.0; days out: 1; independent_array: independent_array15; K-fold: 1
Epoch 1/10


  super().__init__(**kwargs)


[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m9s[0m 92ms/step - accuracy: 0.5199 - loss: 0.7155 - val_accuracy: 0.4521 - val_loss: 0.6954
Epoch 2/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m4s[0m 72ms/step - accuracy: 0.5184 - loss: 0.6940 - val_accuracy: 0.5000 - val_loss: 0.6950
Epoch 3/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m6s[0m 86ms/step - accuracy: 0.5173 - loss: 0.6976 - val_accuracy: 0.5297 - val_loss: 0.6909
Epoch 4/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 80ms/step - accuracy: 0.5402 - loss: 0.6860 - val_accuracy: 0.5388 - val_loss: 0.6906
Epoch 5/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 83ms/step - accuracy: 0.5505 - loss: 0.6877 - val_accuracy: 0.5342 - val_loss: 0.6907
Epoch 6/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m4s[0m 62ms/step - accuracy: 0.5293 - loss: 0.6888 - val_accuracy: 0.5000 - val_loss: 0.6941
Epoch 7/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━

  accuracy_df = pd.concat([accuracy_df, df_new], ignore_index=True)


Now running, pct_increase: 1.0; days out: 3; independent_array: independent_array15; K-fold: 1
Epoch 1/10


  super().__init__(**kwargs)


[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m7s[0m 70ms/step - accuracy: 0.5491 - loss: 0.7074 - val_accuracy: 0.5753 - val_loss: 0.6837
Epoch 2/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 83ms/step - accuracy: 0.5599 - loss: 0.6887 - val_accuracy: 0.5753 - val_loss: 0.6815
Epoch 3/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 81ms/step - accuracy: 0.5764 - loss: 0.6848 - val_accuracy: 0.5753 - val_loss: 0.6811
Epoch 4/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 81ms/step - accuracy: 0.5442 - loss: 0.6902 - val_accuracy: 0.5753 - val_loss: 0.6847
Epoch 5/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 81ms/step - accuracy: 0.5657 - loss: 0.6843 - val_accuracy: 0.5753 - val_loss: 0.6809
Epoch 6/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 77ms/step - accuracy: 0.5926 - loss: 0.6824 - val_accuracy: 0.5753 - val_loss: 0.6806
Epoch 7/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━

  super().__init__(**kwargs)


[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m6s[0m 61ms/step - accuracy: 0.5584 - loss: 0.6965 - val_accuracy: 0.5936 - val_loss: 0.6938
Epoch 2/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 53ms/step - accuracy: 0.5735 - loss: 0.6907 - val_accuracy: 0.5936 - val_loss: 0.6763
Epoch 3/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 51ms/step - accuracy: 0.5904 - loss: 0.6795 - val_accuracy: 0.5936 - val_loss: 0.6760
Epoch 4/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 53ms/step - accuracy: 0.6004 - loss: 0.6717 - val_accuracy: 0.5936 - val_loss: 0.6762
Epoch 5/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 51ms/step - accuracy: 0.5798 - loss: 0.6759 - val_accuracy: 0.5936 - val_loss: 0.6775
Epoch 6/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 53ms/step - accuracy: 0.5948 - loss: 0.6779 - val_accuracy: 0.5913 - val_loss: 0.6761
Epoch 7/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━

  super().__init__(**kwargs)


[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m6s[0m 57ms/step - accuracy: 0.5815 - loss: 0.6986 - val_accuracy: 0.6224 - val_loss: 0.6612
Epoch 2/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 50ms/step - accuracy: 0.6310 - loss: 0.6637 - val_accuracy: 0.6224 - val_loss: 0.6590
Epoch 3/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 51ms/step - accuracy: 0.6299 - loss: 0.6563 - val_accuracy: 0.6087 - val_loss: 0.6686
Epoch 4/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 50ms/step - accuracy: 0.6116 - loss: 0.6708 - val_accuracy: 0.6224 - val_loss: 0.6652
Epoch 5/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 51ms/step - accuracy: 0.6076 - loss: 0.6710 - val_accuracy: 0.6224 - val_loss: 0.6640
Epoch 6/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 52ms/step - accuracy: 0.6364 - loss: 0.6571 - val_accuracy: 0.6201 - val_loss: 0.6635
Epoch 7/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━

  super().__init__(**kwargs)


[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m6s[0m 55ms/step - accuracy: 0.6381 - loss: 0.6637 - val_accuracy: 0.6362 - val_loss: 0.6484
Epoch 2/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 49ms/step - accuracy: 0.6494 - loss: 0.6584 - val_accuracy: 0.6362 - val_loss: 0.6449
Epoch 3/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 50ms/step - accuracy: 0.6283 - loss: 0.6575 - val_accuracy: 0.6362 - val_loss: 0.6496
Epoch 4/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 50ms/step - accuracy: 0.6382 - loss: 0.6529 - val_accuracy: 0.6362 - val_loss: 0.6407
Epoch 5/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 49ms/step - accuracy: 0.6261 - loss: 0.6523 - val_accuracy: 0.6362 - val_loss: 0.6644
Epoch 6/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 50ms/step - accuracy: 0.6541 - loss: 0.6382 - val_accuracy: 0.6384 - val_loss: 0.6728
Epoch 7/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━

  super().__init__(**kwargs)


[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 50ms/step - accuracy: 0.7796 - loss: 0.4872 - val_accuracy: 0.8539 - val_loss: 0.4109
Epoch 2/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 50ms/step - accuracy: 0.8573 - loss: 0.4031 - val_accuracy: 0.8539 - val_loss: 0.3968
Epoch 3/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 50ms/step - accuracy: 0.8617 - loss: 0.3908 - val_accuracy: 0.8562 - val_loss: 0.3964
Epoch 4/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 50ms/step - accuracy: 0.8634 - loss: 0.3899 - val_accuracy: 0.8562 - val_loss: 0.3921
Epoch 5/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 52ms/step - accuracy: 0.8611 - loss: 0.3836 - val_accuracy: 0.8539 - val_loss: 0.4006
Epoch 6/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 49ms/step - accuracy: 0.8469 - loss: 0.4110 - val_accuracy: 0.8539 - val_loss: 0.3916
Epoch 7/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━

  super().__init__(**kwargs)


[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 49ms/step - accuracy: 0.7132 - loss: 0.6077 - val_accuracy: 0.7215 - val_loss: 0.5856
Epoch 2/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 45ms/step - accuracy: 0.7120 - loss: 0.5978 - val_accuracy: 0.7260 - val_loss: 0.5850
Epoch 3/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 43ms/step - accuracy: 0.7046 - loss: 0.6058 - val_accuracy: 0.7215 - val_loss: 0.5827
Epoch 4/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 47ms/step - accuracy: 0.7100 - loss: 0.6011 - val_accuracy: 0.7215 - val_loss: 0.5847
Epoch 5/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 49ms/step - accuracy: 0.7077 - loss: 0.6101 - val_accuracy: 0.7215 - val_loss: 0.5834
Epoch 6/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 48ms/step - accuracy: 0.7219 - loss: 0.5840 - val_accuracy: 0.7215 - val_loss: 0.5826
Epoch 7/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━

  super().__init__(**kwargs)


[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 49ms/step - accuracy: 0.6005 - loss: 0.6756 - val_accuracy: 0.6370 - val_loss: 0.6490
Epoch 2/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 44ms/step - accuracy: 0.6172 - loss: 0.6624 - val_accuracy: 0.6393 - val_loss: 0.6546
Epoch 3/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 44ms/step - accuracy: 0.6254 - loss: 0.6625 - val_accuracy: 0.6393 - val_loss: 0.6514
Epoch 4/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 46ms/step - accuracy: 0.6432 - loss: 0.6473 - val_accuracy: 0.6370 - val_loss: 0.6465
Epoch 5/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 48ms/step - accuracy: 0.6174 - loss: 0.6613 - val_accuracy: 0.6393 - val_loss: 0.6550
Epoch 6/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 50ms/step - accuracy: 0.6498 - loss: 0.6416 - val_accuracy: 0.6393 - val_loss: 0.6472
Epoch 7/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━

  super().__init__(**kwargs)


[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 49ms/step - accuracy: 0.5340 - loss: 0.7088 - val_accuracy: 0.5515 - val_loss: 0.6919
Epoch 2/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 42ms/step - accuracy: 0.5699 - loss: 0.6843 - val_accuracy: 0.5057 - val_loss: 0.6932
Epoch 3/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 45ms/step - accuracy: 0.5442 - loss: 0.6878 - val_accuracy: 0.4943 - val_loss: 0.6917
Epoch 4/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 45ms/step - accuracy: 0.5501 - loss: 0.6863 - val_accuracy: 0.5080 - val_loss: 0.6974
Epoch 5/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 44ms/step - accuracy: 0.5396 - loss: 0.6860 - val_accuracy: 0.5011 - val_loss: 0.6994
Epoch 6/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 41ms/step - accuracy: 0.5438 - loss: 0.6911 - val_accuracy: 0.5126 - val_loss: 0.6951
Epoch 7/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━

  super().__init__(**kwargs)


[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 48ms/step - accuracy: 0.5054 - loss: 0.7049 - val_accuracy: 0.5172 - val_loss: 0.6907
Epoch 2/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 42ms/step - accuracy: 0.4985 - loss: 0.6967 - val_accuracy: 0.5080 - val_loss: 0.6913
Epoch 3/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 43ms/step - accuracy: 0.5082 - loss: 0.6971 - val_accuracy: 0.5263 - val_loss: 0.6885
Epoch 4/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 44ms/step - accuracy: 0.5215 - loss: 0.6908 - val_accuracy: 0.5400 - val_loss: 0.6886
Epoch 5/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 44ms/step - accuracy: 0.5394 - loss: 0.6863 - val_accuracy: 0.5149 - val_loss: 0.6922
Epoch 6/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 41ms/step - accuracy: 0.5293 - loss: 0.6925 - val_accuracy: 0.5584 - val_loss: 0.6885
Epoch 7/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━

  super().__init__(**kwargs)


[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m6s[0m 52ms/step - accuracy: 0.8981 - loss: 0.2653 - val_accuracy: 0.9703 - val_loss: 0.1295
Epoch 2/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 43ms/step - accuracy: 0.9720 - loss: 0.1297 - val_accuracy: 0.9703 - val_loss: 0.1181
Epoch 3/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 43ms/step - accuracy: 0.9703 - loss: 0.1282 - val_accuracy: 0.9703 - val_loss: 0.1107
Epoch 4/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 42ms/step - accuracy: 0.9730 - loss: 0.1163 - val_accuracy: 0.9703 - val_loss: 0.1097
Epoch 5/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 44ms/step - accuracy: 0.9674 - loss: 0.1371 - val_accuracy: 0.9703 - val_loss: 0.1138
Epoch 6/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 46ms/step - accuracy: 0.9671 - loss: 0.1298 - val_accuracy: 0.9703 - val_loss: 0.1090
Epoch 7/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━

  super().__init__(**kwargs)


[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 49ms/step - accuracy: 0.8883 - loss: 0.3802 - val_accuracy: 0.8881 - val_loss: 0.3406
Epoch 2/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 40ms/step - accuracy: 0.8957 - loss: 0.3192 - val_accuracy: 0.8881 - val_loss: 0.3387
Epoch 3/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 42ms/step - accuracy: 0.8772 - loss: 0.3444 - val_accuracy: 0.8881 - val_loss: 0.3409
Epoch 4/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 43ms/step - accuracy: 0.8918 - loss: 0.3181 - val_accuracy: 0.8881 - val_loss: 0.3466
Epoch 5/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 44ms/step - accuracy: 0.8809 - loss: 0.3351 - val_accuracy: 0.8881 - val_loss: 0.3409
Epoch 6/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 43ms/step - accuracy: 0.8918 - loss: 0.3279 - val_accuracy: 0.8881 - val_loss: 0.3477
Epoch 7/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━

  super().__init__(**kwargs)


[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 48ms/step - accuracy: 0.8427 - loss: 0.4823 - val_accuracy: 0.8333 - val_loss: 0.4301
Epoch 2/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 41ms/step - accuracy: 0.8293 - loss: 0.4393 - val_accuracy: 0.8333 - val_loss: 0.4267
Epoch 3/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 41ms/step - accuracy: 0.8364 - loss: 0.4388 - val_accuracy: 0.8333 - val_loss: 0.4273
Epoch 4/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 41ms/step - accuracy: 0.8482 - loss: 0.3977 - val_accuracy: 0.8333 - val_loss: 0.4291
Epoch 5/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 44ms/step - accuracy: 0.8257 - loss: 0.4405 - val_accuracy: 0.8333 - val_loss: 0.4274
Epoch 6/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 41ms/step - accuracy: 0.8384 - loss: 0.4257 - val_accuracy: 0.8333 - val_loss: 0.4419
Epoch 7/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━

  super().__init__(**kwargs)


[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m8s[0m 72ms/step - accuracy: 0.6856 - loss: 0.6356 - val_accuracy: 0.7323 - val_loss: 0.5729
Epoch 2/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 60ms/step - accuracy: 0.7426 - loss: 0.5660 - val_accuracy: 0.7323 - val_loss: 0.5691
Epoch 3/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m4s[0m 44ms/step - accuracy: 0.7261 - loss: 0.5779 - val_accuracy: 0.7323 - val_loss: 0.5682
Epoch 4/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 43ms/step - accuracy: 0.7295 - loss: 0.5750 - val_accuracy: 0.7323 - val_loss: 0.5640
Epoch 5/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 42ms/step - accuracy: 0.7196 - loss: 0.5886 - val_accuracy: 0.7300 - val_loss: 0.5683
Epoch 6/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 38ms/step - accuracy: 0.7472 - loss: 0.5589 - val_accuracy: 0.7323 - val_loss: 0.5690
Epoch 7/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━

  super().__init__(**kwargs)


[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m9s[0m 74ms/step - accuracy: 0.6253 - loss: 0.6607 - val_accuracy: 0.6384 - val_loss: 0.6517
Epoch 2/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m4s[0m 61ms/step - accuracy: 0.6276 - loss: 0.6496 - val_accuracy: 0.6293 - val_loss: 0.6449
Epoch 3/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 64ms/step - accuracy: 0.6516 - loss: 0.6359 - val_accuracy: 0.6201 - val_loss: 0.6469
Epoch 4/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 62ms/step - accuracy: 0.6398 - loss: 0.6461 - val_accuracy: 0.6339 - val_loss: 0.6533
Epoch 5/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m4s[0m 67ms/step - accuracy: 0.6677 - loss: 0.6314 - val_accuracy: 0.6384 - val_loss: 0.6494
Epoch 6/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m4s[0m 70ms/step - accuracy: 0.6525 - loss: 0.6318 - val_accuracy: 0.6247 - val_loss: 0.6463
Epoch 7/10
[1m55/55[0m [32m━━━━━━━━━━━━━━━

In [144]:
#will output one CSV files with training results for classification model
accuracy_df.to_csv(f'{ticker_symbol}_{selected_pattern}_classification_2200_Observations_output.csv', index=False)  # `index=False` avoids writing the index column

#### Final LSTM Classification Model with: Increased Observations, Stratified K-fold Cross Validation, Ensemble Learning Approach

To build off from the previous code, I will now implement ensemble methods. Below, I am going to create and train 9 models, all of which have the same architecture and train them all on my best performing set of independent variables, independent array #15.  Each of these 9 models will undergo Stratified 5-fold cross validation as we did previously, and I will save the best weights for each model during training using a ModelCheckpoint callback - the epoch with the highest validation accuracy across all five folds will have its model weights saved. 

After the models have been trained and their best weights saved, I will load these models and combine their predictions using an ensemble approach. Specifically, I will take each model and output its predictions; I will use a voting-based system where if any 5 of the 9 models (the majority) predict the class as a positive class, then its final prediction will be classified as positive, and if less than 5 models predict it as positive, it will be classified as negative. This approach ensures that the final decision is based on the collective judgment of the majority of the models, reducing the likelihood of errors from individual models. 

Lastly, the accuracy of the ensemble model will be evaluated on a separate test dataset to assess the effectiveness of this combined approach. This ensemble method aims to leverage the strengths of each individual model and provide a more robust prediction.

For this example below, I will just set the 'days_out' parameter to 10 and the 'pct_increase' parameter to 1.01. I am currently choosing this parameter combination because when analyzing the model results from the last step, there appeared to be a noticeable increase in the accuracy score from my model (67.3%) when compared to always guessing the majority class (55.3%).

In [145]:
#Subset data frame for desired pattern
pattern_df = finance_df[finance_df['Random_Yes_No_2'] == "Yes"]

#How many days after the pattern is identified to use for the dependent variable
days_out = 10

#What percent increase from the current price is considered a positive class. For example 1.01 = 1% increase; 100 * 1.01 = 101. So if original
#price is $100, anything greater than $101 is considered a positive class.
pct_increase = 1.01

#Gather independent variables
independent_list15 = []

#gather dependent variables
dependent_list = []

pattern_index = list(pattern_df["Row_index"])
#pattern_index = [60, 62]
for i in pattern_index:
    #if (i == 62):
    #    break
    
    #unable to get 30 days worth of data if index is less than 56, because previously removed first 26 observations
    if (i < 56):
        continue

    #get 30 days worth of data to gather data for indpendent variables
    subset_df = finance_df[(finance_df["Row_index"] >= (i - 29)) & (finance_df["Row_index"] <= (i))]
    #subset_df = finance_df[(finance_df["Row_index"] >= (i - 13)) & (finance_df["Row_index"] <= (i))]
    
    #Get day after data to gather closing price for dependent variable
    dependent_df = finance_df[finance_df["Row_index"] == (i)]
    dependent2_df = finance_df[finance_df["Row_index"] == (i + days_out)]
    
    temp_list15 = []

    #append temp_list to independent_list
    if len(dependent2_df) > 0: #dependent2_df may have length of zero as it is a future date, data may not be available
    

        for index, row in subset_df.iterrows():
                
                test_array15 = np.array([row['Normalized_Open'], row['Normalized_Close'], row['Normalized_High'], row['Normalized_Low'], row['RSI'], row['MFI'], row['MACD'], row['Signal_Line']])
            
                temp_list15.append(test_array15)
                
        independent_list15.append(temp_list15)
    
        if (dependent2_df['Close'].iloc[0] > dependent_df['Close'].iloc[0] * pct_increase):
            dependent_list.append(1)
        else:
            dependent_list.append(0)

independent_array15 = np.array(independent_list15)
dependent_array = np.array(dependent_list)

The resulting shape of the independent variable dataset is: 30 time-steps and 8 features per time-step; with a total of 2185 observations.

In [146]:
print(np.shape(independent_array15)) # 30 time-steps and 8 features per time-step; with a total of 2185 observations

(2185, 30, 8)


Now, after my independent variables associated with independent array #15 has been gathered above and my dependent array has also been gathered above, I will train 9 models with the same architecture on the data, applying 5-fold stratified cross-validation to each model. For each of the 9 models, the epoch with the best performance (highest validation accuracy) will have its weights saved.

In [147]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.callbacks import ModelCheckpoint
from sklearn.model_selection import StratifiedKFold
import numpy as np
from sklearn.preprocessing import StandardScaler

# Define the LSTM classification model
def create_lstm_classification(input_shape):
    model = Sequential()
    
    # LSTM layers
    model.add(LSTM(128, activation='tanh', return_sequences=True, input_shape=input_shape))
    model.add(Dropout(0.2))  # Dropout to reduce overfitting
    
    model.add(LSTM(64, activation='tanh', return_sequences=False))  # Final LSTM layer
    model.add(Dropout(0.2))
    
    # Dense output layer for binary classification
    model.add(Dense(1, activation='sigmoid'))  # Sigmoid for binary classification (probability)
    
    # Compile the model
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])  # Binary cross-entropy for classification
    
    return model

# Define independent and dependent variables
X = independent_array15  # Shape: (890, 30, [4, 5, 6, or 8]) 
y = dependent_array  # Shape: (890,) (binary labels, 0 or 1)

# Initialize list to hold models and their weights
models = []

# Loop for training 9 identical models
for model_num in range(9):
    print(f"Training model {model_num + 1}/9...")

    # Initialize the KFold
    kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=(model_num+1))

    # Create and compile the model once per model_num (outside of the KFold loop)
    # This model will be reused for each fold
    input_shape = (30, X.shape[2])  # Define input shape based on your data
    model = create_lstm_classification(input_shape)

    # Initialize an empty list for storing fold accuracies
    fold_accuracies = []

    # Set up a ModelCheckpoint callback to save the model's weights when validation accuracy is improved
    checkpoint = ModelCheckpoint(f'model_{model_num + 1}_best_weights.keras', 
                                 save_best_only=True, 
                                 monitor='val_accuracy', 
                                 mode='max', 
                                 verbose=1)

    K_fold_counter = 0
    # Perform stratified K-fold cross-validation
    for train_index, val_index in kf.split(X, y):
        print(f'Model: {model_num + 1}; K-fold: {(K_fold_counter + 1)}')
        K_fold_counter = K_fold_counter + 1
        
        X_train, X_val = X[train_index], X[val_index]
        y_train, y_val = y[train_index], y[val_index]

        # Train the model
        model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_val, y_val), verbose=0, callbacks=[checkpoint])

    
    # After training all folds, load the best weights into the model
    #best_model = create_lstm_classification(input_shape)
    #best_model.load_weights(f'model_{model_num + 1}_best_weights.keras')

    # Append the model to the list of trained models
    #models.append(best_model)

Training model 1/9...
Model: 1; K-fold: 1


  super().__init__(**kwargs)



Epoch 1: val_accuracy improved from -inf to 0.55149, saving model to model_1_best_weights.keras

Epoch 2: val_accuracy did not improve from 0.55149

Epoch 3: val_accuracy improved from 0.55149 to 0.55378, saving model to model_1_best_weights.keras

Epoch 4: val_accuracy improved from 0.55378 to 0.56293, saving model to model_1_best_weights.keras

Epoch 5: val_accuracy did not improve from 0.56293

Epoch 6: val_accuracy did not improve from 0.56293

Epoch 7: val_accuracy improved from 0.56293 to 0.56979, saving model to model_1_best_weights.keras

Epoch 8: val_accuracy improved from 0.56979 to 0.57895, saving model to model_1_best_weights.keras

Epoch 9: val_accuracy did not improve from 0.57895

Epoch 10: val_accuracy did not improve from 0.57895
Model: 1; K-fold: 2

Epoch 1: val_accuracy did not improve from 0.57895

Epoch 2: val_accuracy did not improve from 0.57895

Epoch 3: val_accuracy did not improve from 0.57895

Epoch 4: val_accuracy did not improve from 0.57895

Epoch 5: val_

  super().__init__(**kwargs)



Epoch 1: val_accuracy improved from -inf to 0.55835, saving model to model_2_best_weights.keras

Epoch 2: val_accuracy did not improve from 0.55835

Epoch 3: val_accuracy did not improve from 0.55835

Epoch 4: val_accuracy improved from 0.55835 to 0.56979, saving model to model_2_best_weights.keras

Epoch 5: val_accuracy did not improve from 0.56979

Epoch 6: val_accuracy did not improve from 0.56979

Epoch 7: val_accuracy did not improve from 0.56979

Epoch 8: val_accuracy did not improve from 0.56979

Epoch 9: val_accuracy did not improve from 0.56979

Epoch 10: val_accuracy did not improve from 0.56979
Model: 2; K-fold: 2

Epoch 1: val_accuracy improved from 0.56979 to 0.57895, saving model to model_2_best_weights.keras

Epoch 2: val_accuracy did not improve from 0.57895

Epoch 3: val_accuracy did not improve from 0.57895

Epoch 4: val_accuracy did not improve from 0.57895

Epoch 5: val_accuracy did not improve from 0.57895

Epoch 6: val_accuracy improved from 0.57895 to 0.59954, s

  super().__init__(**kwargs)



Epoch 1: val_accuracy improved from -inf to 0.46453, saving model to model_3_best_weights.keras

Epoch 2: val_accuracy improved from 0.46453 to 0.55378, saving model to model_3_best_weights.keras

Epoch 3: val_accuracy did not improve from 0.55378

Epoch 4: val_accuracy did not improve from 0.55378

Epoch 5: val_accuracy did not improve from 0.55378

Epoch 6: val_accuracy did not improve from 0.55378

Epoch 7: val_accuracy did not improve from 0.55378

Epoch 8: val_accuracy did not improve from 0.55378

Epoch 9: val_accuracy improved from 0.55378 to 0.55606, saving model to model_3_best_weights.keras

Epoch 10: val_accuracy did not improve from 0.55606
Model: 3; K-fold: 2

Epoch 1: val_accuracy improved from 0.55606 to 0.58352, saving model to model_3_best_weights.keras

Epoch 2: val_accuracy did not improve from 0.58352

Epoch 3: val_accuracy did not improve from 0.58352

Epoch 4: val_accuracy improved from 0.58352 to 0.59039, saving model to model_3_best_weights.keras

Epoch 5: val_

  super().__init__(**kwargs)



Epoch 1: val_accuracy improved from -inf to 0.57437, saving model to model_4_best_weights.keras

Epoch 2: val_accuracy did not improve from 0.57437

Epoch 3: val_accuracy did not improve from 0.57437

Epoch 4: val_accuracy improved from 0.57437 to 0.58810, saving model to model_4_best_weights.keras

Epoch 5: val_accuracy did not improve from 0.58810

Epoch 6: val_accuracy did not improve from 0.58810

Epoch 7: val_accuracy did not improve from 0.58810

Epoch 8: val_accuracy did not improve from 0.58810

Epoch 9: val_accuracy did not improve from 0.58810

Epoch 10: val_accuracy did not improve from 0.58810
Model: 4; K-fold: 2

Epoch 1: val_accuracy did not improve from 0.58810

Epoch 2: val_accuracy did not improve from 0.58810

Epoch 3: val_accuracy did not improve from 0.58810

Epoch 4: val_accuracy did not improve from 0.58810

Epoch 5: val_accuracy did not improve from 0.58810

Epoch 6: val_accuracy improved from 0.58810 to 0.60641, saving model to model_4_best_weights.keras

Epoch

  super().__init__(**kwargs)



Epoch 1: val_accuracy improved from -inf to 0.56751, saving model to model_5_best_weights.keras

Epoch 2: val_accuracy did not improve from 0.56751

Epoch 3: val_accuracy did not improve from 0.56751

Epoch 4: val_accuracy did not improve from 0.56751

Epoch 5: val_accuracy did not improve from 0.56751

Epoch 6: val_accuracy did not improve from 0.56751

Epoch 7: val_accuracy did not improve from 0.56751

Epoch 8: val_accuracy did not improve from 0.56751

Epoch 9: val_accuracy did not improve from 0.56751

Epoch 10: val_accuracy did not improve from 0.56751
Model: 5; K-fold: 2

Epoch 1: val_accuracy did not improve from 0.56751

Epoch 2: val_accuracy did not improve from 0.56751

Epoch 3: val_accuracy did not improve from 0.56751

Epoch 4: val_accuracy did not improve from 0.56751

Epoch 5: val_accuracy did not improve from 0.56751

Epoch 6: val_accuracy did not improve from 0.56751

Epoch 7: val_accuracy did not improve from 0.56751

Epoch 8: val_accuracy did not improve from 0.5675

  super().__init__(**kwargs)



Epoch 1: val_accuracy improved from -inf to 0.50114, saving model to model_6_best_weights.keras

Epoch 2: val_accuracy improved from 0.50114 to 0.54233, saving model to model_6_best_weights.keras

Epoch 3: val_accuracy did not improve from 0.54233

Epoch 4: val_accuracy did not improve from 0.54233

Epoch 5: val_accuracy did not improve from 0.54233

Epoch 6: val_accuracy did not improve from 0.54233

Epoch 7: val_accuracy did not improve from 0.54233

Epoch 8: val_accuracy did not improve from 0.54233

Epoch 9: val_accuracy did not improve from 0.54233

Epoch 10: val_accuracy did not improve from 0.54233
Model: 6; K-fold: 2

Epoch 1: val_accuracy improved from 0.54233 to 0.60412, saving model to model_6_best_weights.keras

Epoch 2: val_accuracy did not improve from 0.60412

Epoch 3: val_accuracy did not improve from 0.60412

Epoch 4: val_accuracy did not improve from 0.60412

Epoch 5: val_accuracy did not improve from 0.60412

Epoch 6: val_accuracy improved from 0.60412 to 0.60870, s

  super().__init__(**kwargs)



Epoch 1: val_accuracy improved from -inf to 0.54920, saving model to model_7_best_weights.keras

Epoch 2: val_accuracy improved from 0.54920 to 0.55378, saving model to model_7_best_weights.keras

Epoch 3: val_accuracy improved from 0.55378 to 0.57895, saving model to model_7_best_weights.keras

Epoch 4: val_accuracy did not improve from 0.57895

Epoch 5: val_accuracy did not improve from 0.57895

Epoch 6: val_accuracy did not improve from 0.57895

Epoch 7: val_accuracy did not improve from 0.57895

Epoch 8: val_accuracy did not improve from 0.57895

Epoch 9: val_accuracy did not improve from 0.57895

Epoch 10: val_accuracy did not improve from 0.57895
Model: 7; K-fold: 2

Epoch 1: val_accuracy improved from 0.57895 to 0.58810, saving model to model_7_best_weights.keras

Epoch 2: val_accuracy did not improve from 0.58810

Epoch 3: val_accuracy did not improve from 0.58810

Epoch 4: val_accuracy improved from 0.58810 to 0.59039, saving model to model_7_best_weights.keras

Epoch 5: val_

  super().__init__(**kwargs)



Epoch 1: val_accuracy improved from -inf to 0.53547, saving model to model_8_best_weights.keras

Epoch 2: val_accuracy improved from 0.53547 to 0.54233, saving model to model_8_best_weights.keras

Epoch 3: val_accuracy did not improve from 0.54233

Epoch 4: val_accuracy improved from 0.54233 to 0.55835, saving model to model_8_best_weights.keras

Epoch 5: val_accuracy improved from 0.55835 to 0.58352, saving model to model_8_best_weights.keras

Epoch 6: val_accuracy did not improve from 0.58352

Epoch 7: val_accuracy did not improve from 0.58352

Epoch 8: val_accuracy did not improve from 0.58352

Epoch 9: val_accuracy did not improve from 0.58352

Epoch 10: val_accuracy did not improve from 0.58352
Model: 8; K-fold: 2

Epoch 1: val_accuracy improved from 0.58352 to 0.59725, saving model to model_8_best_weights.keras

Epoch 2: val_accuracy did not improve from 0.59725

Epoch 3: val_accuracy did not improve from 0.59725

Epoch 4: val_accuracy did not improve from 0.59725

Epoch 5: val_

  super().__init__(**kwargs)



Epoch 1: val_accuracy improved from -inf to 0.55606, saving model to model_9_best_weights.keras

Epoch 2: val_accuracy improved from 0.55606 to 0.57437, saving model to model_9_best_weights.keras

Epoch 3: val_accuracy did not improve from 0.57437

Epoch 4: val_accuracy did not improve from 0.57437

Epoch 5: val_accuracy did not improve from 0.57437

Epoch 6: val_accuracy did not improve from 0.57437

Epoch 7: val_accuracy did not improve from 0.57437

Epoch 8: val_accuracy did not improve from 0.57437

Epoch 9: val_accuracy improved from 0.57437 to 0.57895, saving model to model_9_best_weights.keras

Epoch 10: val_accuracy did not improve from 0.57895
Model: 9; K-fold: 2

Epoch 1: val_accuracy did not improve from 0.57895

Epoch 2: val_accuracy did not improve from 0.57895

Epoch 3: val_accuracy did not improve from 0.57895

Epoch 4: val_accuracy did not improve from 0.57895

Epoch 5: val_accuracy did not improve from 0.57895

Epoch 6: val_accuracy did not improve from 0.57895

Epoch

Now that I have trained my 9 models of the same architecture, using the best performing set of independent variables (independent array #15), I will now evaluate the accuracy of the ensemble model on a separate test dataset to assess the effectiveness of this combined approach. This ensemble method aims to leverage the strengths of each individual model and provide a more robust prediction.

I am going to create a test dataset by creating a list of new random values to evaluate the ensemble model on. I will do this by changing the random seed. Previously it was 6, now it is 7. What this does is ensure that the new dataset is generated deterministically with a different sequence of random numbers, providing a new set of test data. This change in the random seed helps evaluate how the ensemble model performs on different variations of the test data, enabling better validation of its generalization capability.

In [22]:
# Specify the number of "Yes" values you want, may show up as less during training due to location of the "Yes" value, as need at least 30 days
#of data for the 30-day sequence, or if the future closing price is not available (only have data to 2/14)
num_yes = 2200

# Create a list of "Yes" and "No" values
yes_no_list = ["Yes"] * num_yes + ["No"] * (len(finance_df) - num_yes)

#set seed for reproducibility; previously this seed was set to 6, so it is going to create an entirely new list of random "Yes" and "No" values
np.random.seed(7) 

# Shuffle the list to randomize the order
np.random.shuffle(yes_no_list)

# Create a new column in the dataframe; we already have a column 'Random_Yes_No_2' which was used to train the occurence of a random pattern
finance_df['Random_Yes_No_3'] = yes_no_list

In [23]:
#Subset data frame for desired pattern
pattern_df = finance_df[finance_df['Random_Yes_No_3'] == "Yes"]

#How many days after the pattern is identified to use for the dependent variable
days_out = 10

#What percent increase from the current price is considered a positive class. For example 1.01 = 1% increase; 100 * 1.01 = 101. So if original
#price is $100, anything greater than $101 is considered a positive class.
pct_increase = 1.01

#Gather independent variables
independent_list15 = []

#gather dependent variables
dependent_list = []

pattern_index = list(pattern_df["Row_index"])
#pattern_index = [60, 62]
for i in pattern_index:
    #if (i == 62):
    #    break
    
    #unable to get 30 days worth of data if index is less than 56, because previously removed first 26 observations
    if (i < 56):
        continue

    #get 30 days worth of data to gather data for indpendent variables
    subset_df = finance_df[(finance_df["Row_index"] >= (i - 29)) & (finance_df["Row_index"] <= (i))]
    #subset_df = finance_df[(finance_df["Row_index"] >= (i - 13)) & (finance_df["Row_index"] <= (i))]
    
    #Get day after data to gather closing price for dependent variable
    dependent_df = finance_df[finance_df["Row_index"] == (i)]
    dependent2_df = finance_df[finance_df["Row_index"] == (i + days_out)]
    
    temp_list15 = []

    #append temp_list to independent_list
    if len(dependent2_df) > 0: #dependent2_df may have length of zero as it is a future date, data may not be available
    

        for index, row in subset_df.iterrows():
                
                test_array15 = np.array([row['Normalized_Open'], row['Normalized_Close'], row['Normalized_High'], row['Normalized_Low'], row['RSI'], row['MFI'], row['MACD'], row['Signal_Line']])
            
                temp_list15.append(test_array15)
                
        independent_list15.append(temp_list15)
    
        if (dependent2_df['Close'].iloc[0] > dependent_df['Close'].iloc[0] * pct_increase):
            dependent_list.append(1)
        else:
            dependent_list.append(0)

independent_array15 = np.array(independent_list15)
dependent_array = np.array(dependent_list)
X = independent_array15
y = dependent_array

After collecting my new set of independent and dependent variables above based on the new random split, I am ready to load my previously trained models and perform ensemble learning using those 9 models as coded below.

In [29]:
from sklearn.model_selection import train_test_split
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

# Assuming you already have your trained models and their weights
models = []

#Define the LSTM classification model; already defined it previously, so don't have to but for testing purposes if I skip chunks, I will include it here
def create_lstm_classification(input_shape):
    model = Sequential()
    
    # LSTM layers
    model.add(LSTM(128, activation='tanh', return_sequences=True, input_shape=input_shape))
    model.add(Dropout(0.2))  # Dropout to reduce overfitting
    
    model.add(LSTM(64, activation='tanh', return_sequences=False))  # Final LSTM layer
    model.add(Dropout(0.2))
    
    # Dense output layer for binary classification
    model.add(Dense(1, activation='sigmoid'))  # Sigmoid for binary classification (probability)
    
    # Compile the model
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])  # Binary cross-entropy for classification
    
    return model

#Specify input shape
input_shape = (30, X.shape[2]) 

# Load models and weights (adjust the range to include all your models)
for model_num in range(0, 9):  # Change this range to include more models if necessary
    best_model = create_lstm_classification(input_shape)
    best_model.load_weights(f'model_{model_num + 1}_best_weights.keras')
    models.append(best_model)

# Split data for final training/testing; only using X_test_final for predicting and y_test_final for comparing my predictions
X_train_final, X_test_final, y_train_final, y_test_final = train_test_split(X, y, test_size=0.2, random_state=8)

# Initialize a list to store binary predictions from each model
ensemble_predictions = []

#initialize to get votes from each model
votes = np.zeros(len(X_test_final), dtype=int)

# Generate predictions from each model
for model in models:
    # Get the predictions from the model (probabilities)
    predictions = model.predict(X_test_final)
    
    # Convert predictions to binary (0 or 1)
    binary_predictions = (predictions > 0.5).astype(int)  # Convert to 0 or 1

    counter = 0
    for value in binary_predictions: #for each value in the model's predictions, if the value is predicted 0, then no votes added, if 1, then add 1 vote
        if (value == 0):
            votes[counter] = votes[counter] + 0
        else:
            votes[counter] = votes[counter] + 1
        counter = counter + 1

#because there are 9 different models, if the vote is greater than 5, then it is a positive class
results = []
for vote in votes:
    if (vote >= 5):
        results.append(1)
    else:
        results.append(0)

# Create a pandas Series for comparison
comparison = pd.Series(results == y_test_final)

# Use value_counts() to count True/False occurrences
counts = comparison.value_counts()
num_true = counts.get(True, 0)  # 0 if True is not found
num_false = counts.get(False, 0)  # 0 if False is not found

#Get total amount of observations in training/validation dataset
counts_2 = pd.Series(dependent_array).value_counts()
total_true_labels = counts_2.get(1, 0)  # Will return 0 if no '1' exists
total_false_labels = counts_2.get(0, 0)  # Will return 0 if no '0' exists

# Convert results and y_test_final to numpy arrays for easier handling if they aren't already
results = np.array(results)
y_test_final = np.array(y_test_final)

# Count the number of true (1) and false (0) labels
label_counts = pd.Series(y_test_final).value_counts()

# Get counts of True (1) and False (0) specifically for the test set
total_true_labels_test = label_counts.get(1, 0)  # Will return 0 if no '1' exists
total_false_labels_test = label_counts.get(0, 0)  # Will return 0 if no '0' exists

#compute accuracy and most_frequent_class_test_pct
accuracy = float(num_true) / (len(y_test_final))
if (total_true_labels_test > total_false_labels_test):
    most_frequent_class_test_pct = float(total_true_labels_test) / float(len(y_test_final))
else:
    most_frequent_class_test_pct = float(total_false_labels_test) / float(len(y_test_final))


#print out final results statements
print(f'Number of total observations in entire dependent array dataset: {len(dependent_array)}')

print(f'Number of actual false labels in entire dependent array dataset: {total_false_labels}')
print(f'Number of actual true labels in entire dependent array dataset: {total_true_labels}')

print(f'Number of total observations in test dataset: {len(y_test_final)}')
print(f'Number of incorrect predictions on test dataset: {num_false}')
print(f'Number of correct predictions on test dataset: {num_true}')
print(f'Prediction accuracy on test dataset: {accuracy}')

print(f'Number of actual false labels in test dataset: {total_false_labels_test}')
print(f'Number of actual true labels test dataset: {total_true_labels_test}')
print(f'Majority class label percent in test dataset: {most_frequent_class_test_pct}')

  super().__init__(**kwargs)
  saveable.load_own_variables(weights_store.get(inner_path))


[1m14/14[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 20ms/step 
[1m14/14[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 19ms/step 
[1m14/14[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 18ms/step 
[1m14/14[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 18ms/step 
[1m14/14[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 19ms/step 
[1m14/14[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 26ms/step 
[1m14/14[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 18ms/step 
[1m14/14[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 18ms/step 
[1m14/14[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 18ms/step 
Number of total observations in entire dependent array dataset: 2188
Number of actual false labels in entire dependent array dataset: 1230
Number of actual true labels in entire dependent array dataset: 958
Number of total observations in test dataset: 438
Number of incorrect predictions on test dataset: 137
Number of corre

The results of using ensemble methods are significant as we can see the accuracy scores. Based on this data sample, if we always predicted that the future closing price would be false, we would be correct 1230 out of 2188 times or 56.22% of the time. However, our model performs better, as our model has 301 correct predictions out of 438 observations or a correct prediction rate of 68.72%. Also, when looking at the actual false labels in the test dataset, which is the majority class, we see 256 observations; 256 out of 438 is 58.45%. Our model clearly performs better when compared against the actual labels in the entire dataset and when compared against the actual labels in the test dataset.

The results are significant, as we can outperform the expected market outcome (which assumes predicting the future price as false) by around 10% with this parameter combination when comparing against the actual values of the test set labels. This demonstrates the effectiveness of the model in identifying patterns and making predictions that exceed a baseline strategy of predicting no price increase of greater than 1% over the next ten days (which was the parameter combination I used for this ensemble model (pct_increase=1.01)).

I have demonstrated the implementation of the ensemble learning method (with stratified 5-fold cross-validation used to train these models) using the following parameters: The dependent variable represents the closing price 10 days in the future. The label for the dependent variable is assigned as 0 (false) if the future closing price is less than or equal to a 1% increase from the closing price of the last identified candle. It is labeled as 1 (true) if the future closing price is greater than a 1% increase from the closing price of the last identified candle.

I will run this implementation multiple times through multiple parameter combinations using independent array #15 as mentioned before, I found it to be the best performing combination of independent variables. I will run this separately outside of this document, for the submission for the next part of this project. 

# Reporting - *E*

To restate my research question:

_Using historical market stock prices and technical indicators (RSI, MACD, MFI), how accurately can a neural network model, specifically an LSTM-based Recurrent Neural Network, predict stock price movements after the occurrence of a bullish candlestick pattern (e.g., 1 day, 3 days, 5 days, 10 days, and 15 days afterwards)? Additionally, how does the performance differ when using binary classification versus regression for price prediction?_

To directly answer the first question, my LSTM-based Recurrent Neural Network can accurately predict stock price movements following the occurrence of a bullish candlestick pattern. Specifically, it performs about 9-10% better in predicting the outcome compared to when a random occurrence is used. By "random occurrence," I mean a randomly generated set of 30-day sequences that do not depend on the presence of a candlestick pattern on the 30th day of the sequence.

To directly answer the second question, the performance is more consistent and predictable while using a classification model in predicting price fluctuations.

Below, I will answer in high detail the first and second part of the research question and how I came to each conclusion.


#### Answering the second part of my research question

To answer the second part of my research question, I conclude that the LSTM classification model was the better model overall for my purposes of this research project. The reason for this is as follows:

When comparing the results from the classification and regression models, it’s easier to identify the most effective independent variables for the classification model. It seems that the top-performing independent variables for the regression models were somewhat random. This is because, for the classification model, independent array #15 yielded the best results, showing the highest average and maximum accuracy scores across different combinations of model parameters.

These model parameters were used to define the dependent variable. Specifically, I had two key parameters when training the models: the number of days ahead from the last closing price in the 30-day sequence that I am predicting, and the percentage change from the last closing price of the 30-day sequence (e.g., whether the closing price on the specified future day is greater than, 1% higher than, or 2% higher than the last closing price on the 30th day of the sequence).

The classification results indicate that when the candlestick pattern is set to "Random" (**I am using the "Random" pattern because the actual candlestick patterns have too small a data size**) — meaning that random 30-day sequences are generated without relying on a specific candlestick pattern — independent array #15 produces the best outcomes, achieving the top spot 7 times. Independent array #14 follows closely in second place, with 6 occurrences at the top for yielding the highest average of maximum and average accuracy scores. I selected the candlestick pattern to "Random" because I wanted to analyze a larger sample size. Both arrays use all available independent variables (open, close, low, high, RSI, MFI, MACD, Signal Line), but there is a difference in how they process the data. Independent array #14 applies a log transformation to the open, close, low, and high prices, while independent array #15 uses Sklearn’s scaler function to normalize these prices. The classification results are as shown below:

* The classification results table is self-explanatory, but to clarify:
    + "Most Frequent Class": This represents the class (negative or positive) that occurs more frequently within the entire dependent variable dataset.
    + "Highest Frequency": This is calculated as the ratio of the "Most Frequent Class" frequency to the "Total Observations" (i.e., Most Frequent Class / Total Observations). This metric is used to evaluate whether the model's accuracy scores are better than simply predicting the most frequent class.
    + "Weighted Score": The weighted score is calculated as the average of the "Best Accuracy" and "Avg Accuracy" (i.e., (Best Accuracy + Avg Accuracy) / 2). The purpose of this metric is to account for the possibility that the "Best Accuracy" might be an outlier or lucky result, and by combining it with the "Avg Accuracy," we get a more balanced measure.

| Ticker | Pattern | Independent Array | Best Accuracy | Avg Accuracy | Days Out | Total Observations | Negative Observations | Positive Observations | Percent Increase Parameter | Most Frequent Class | Highest Frequency | Weighted Score |
|--------|---------|-------------------|---------------|--------------|----------|--------------------|-----------------------|------------------------|---------------------------|---------------------|-------------------|----------------|
| SPY    | Random  | independent_array12 | 0.850000024   | 0.586346157  | 1        | 199                | 89                    | 110                    | 1                         | 110                 | 0.552763819       | 0.71817309     |
| SPY    | Random  | independent_array15 | 0.899999976   | 0.832910252  | 1        | 199                | 165                   | 34                     | 1.01                      | 165                 | 0.829145729       | 0.866455114    |
| SPY    | Random  | independent_array1  | 0.974358976   | 0.954871786  | 1        | 199                | 190                   | 9                      | 1.02                      | 190                 | 0.954773869       | 0.964615381    |
| SPY    | Random  | independent_array14 | 0.824999988   | 0.614487181  | 3        | 199                | 89                    | 110                    | 1                         | 110                 | 0.552763819       | 0.719743585    |
| SPY    | Random  | independent_array14 | 0.820512831   | 0.699025648  | 3        | 199                | 138                   | 61                     | 1.01                      | 138                 | 0.693467337       | 0.759769239    |
| SPY    | Random  | independent_array15 | 0.925000012   | 0.890512816  | 3        | 199                | 176                   | 23                     | 1.02                      | 176                 | 0.884422111       | 0.907756414    |
| SPY    | Random  | independent_array15 | 0.824999988   | 0.647256423  | 5        | 199                | 76                    | 123                    | 1                         | 123                 | 0.618090452       | 0.736128206    |
| SPY    | Random  | independent_array15 | 0.794871807   | 0.651641024  | 5        | 199                | 122                   | 77                     | 1.01                      | 122                 | 0.613065327       | 0.723256416    |
| SPY    | Random  | independent_array14 | 0.948717952   | 0.852961555  | 5        | 199                | 169                   | 30                     | 1.02                      | 169                 | 0.849246231       | 0.900839753    |
| SPY    | Random  | independent_array15 | 0.850000024   | 0.662666665  | 10       | 199                | 71                    | 128                    | 1                         | 128                 | 0.64321608        | 0.756333345    |
| SPY    | Random  | independent_array14 | 0.800000012   | 0.632756413  | 10       | 199                | 105                   | 94                     | 1.01                      | 105                 | 0.527638191       | 0.716378213    |
| SPY    | Random  | independent_array15 | 0.820512831   | 0.739974356  | 10       | 199                | 138                   | 61                     | 1.02                      | 138                 | 0.693467337       | 0.780243593    |
| SPY    | Random  | independent_array15 | 0.820512831   | 0.64825641   | 15       | 199                | 73                    | 126                    | 1                         | 126                 | 0.633165829       | 0.734384621    |
| SPY    | Random  | independent_array14 | 0.800000012   | 0.613717952  | 15       | 199                | 100                   | 99                     | 1.01                      | 100                 | 0.502512563       | 0.706858982    |
| SPY    | Random  | independent_array14 | 0.846153855   | 0.680448722  | 15       | 199                | 127                   | 72                     | 1.02                      | 127                 | 0.638190955       | 0.763301288    |


<br>

The regression results indicate that when the candlestick pattern is set to "Random" — meaning that random 30-day sequences are generated without relying on a specific candlestick pattern — independent array #1 produces the best outcomes, achieving the top spot 5 times. This is interesting because independent array #1 solely includes price action (open, close, high, low) independent variables. These variables are also not normalized. The classification results are as shown below:

* The regression results table is as follows:

| Ticker | Pattern | Independent Array   | Best Accuracy | Avg Accuracy | Days Out | Total Observations | Negative Observations | Positive Observations | Percent Increase Parameter | Most Frequent Class | Highest Frequency | Weighted Score |
|--------|---------|----------------------|---------------|--------------|----------|--------------------|-----------------------|------------------------|----------------------------|---------------------|-------------------|----------------|
| SPY    | Random  | independent_array15   | 0.846153846   | 0.619230769  | 1        | 199                | 89                    | 110                    | 1                          | 110                 | 0.552763819       | 0.732692308    |
| SPY    | Random  | independent_array1    | 0.925         | 0.828461538  | 1        | 199                | 165                   | 34                     | 1.01                       | 165                 | 0.829145729       | 0.876730769    |
| SPY    | Random  | independent_array1    | 1.0           | 0.954487179  | 1        | 199                | 190                   | 9                      | 1.02                       | 190                 | 0.954773869       | 0.97724359     |
| SPY    | Random  | independent_array3    | 0.675         | 0.542564103  | 3        | 199                | 89                    | 110                    | 1                          | 110                 | 0.552763819       | 0.608782051    |
| SPY    | Random  | independent_array11   | 0.923076923   | 0.709615385  | 3        | 199                | 138                   | 61                     | 1.01                       | 138                 | 0.693467337       | 0.816346154    |
| SPY    | Random  | independent_array1    | 0.95          | 0.884102564  | 3        | 199                | 176                   | 23                     | 1.02                       | 176                 | 0.884422111       | 0.917051282    |
| SPY    | Random  | independent_array6    | 0.775         | 0.582692308  | 5        | 199                | 76                    | 123                    | 1                          | 123                 | 0.618090452       | 0.678846154    |
| SPY    | Random  | independent_array1    | 0.7           | 0.613076923  | 5        | 199                | 122                   | 77                     | 1.01                       | 122                 | 0.613065327       | 0.656538462    |
| SPY    | Random  | independent_array1    | 0.925         | 0.849102564  | 5        | 199                | 169                   | 30                     | 1.02                       | 169                 | 0.849246231       | 0.887051282    |
| SPY    | Random  | independent_array3    | 0.7           | 0.607948718  | 10       | 199                | 71                    | 128                    | 1                          | 128                 | 0.64321608        | 0.653974359    |
| SPY    | Random  | independent_array14   | 0.775         | 0.558461538  | 10       | 199                | 105                   | 94                     | 1.01                       | 105                 | 0.527638191       | 0.666730769    |
| SPY    | Random  | independent_array2    | 0.948717949   | 0.74474359   | 10       | 199                | 138                   | 61                     | 1.02                       | 138                 | 0.693467337       | 0.846730769    |
| SPY    | Random  | independent_array6    | 0.675         | 0.573205128  | 15       | 199                | 73                    | 126                    | 1                          | 126                 | 0.633165829       | 0.624102564    |
| SPY    | Random  | independent_array2    | 0.95          | 0.643846154  | 15       | 199                | 100                   | 99                     | 1.01                       | 100                 | 0.502512563       | 0.796923077    |
| SPY    | Random  | independent_array11   | 0.95          | 0.794487179  | 15       | 199                | 127                   | 72                     | 1.02                       | 127                 | 0.638190955       | 0.87224359     |

<br>

Looking at the results, it seems that both the classification and regression models outperform simply predicting the most frequently occurring class. To determine which model performs better—classification or regression—I will calculate the difference between the 'Weighted Score' and 'Highest Frequency' for each entry in the table, and then sum these differences across all observations. The model with the highest total of these differences will be considered the best performing, as it indicates that the model is yielding a greater improvement over the baseline of predicting the most frequent class, effectively demonstrating stronger overall performance.

| Classification_Highest_frequency | Classification_Weighted_score | Classification_Differences |
|----------------------------------|-------------------------------|----------------------------|
| 0.552763819                      | 0.71817309                    | 0.165409271                |
| 0.829145729                      | 0.866455114                   | 0.037309385                |
| 0.954773869                      | 0.964615381                   | 0.009841511                |
| 0.552763819                      | 0.719743585                   | 0.166979766                |
| 0.693467337                      | 0.759769239                   | 0.066301903                |
| 0.884422111                      | 0.907756414                   | 0.023334303                |
| 0.618090452                      | 0.736128206                   | 0.118037753                |
| 0.613065327                      | 0.723256416                   | 0.110191089                |
| 0.849246231                      | 0.900839753                   | 0.051593522                |
| 0.64321608                       | 0.756333345                   | 0.113117264                |
| 0.527638191                      | 0.716378213                   | 0.188740022                |
| 0.693467337                      | 0.780243593                   | 0.086776257                |
| 0.633165829                      | 0.734384621                   | 0.101218792                |
| 0.502512563                      | 0.706858982                   | 0.204346419                |
| 0.638190955                      | 0.763301288                   | 0.125110334                |

* The resulting sum of the differences for the classification table is: 1.568307591
<br>

| Regression_Highest_frequency | Regression_Weighted_score | Regression_Differences |
|------------------------------|---------------------------|------------------------|
| 0.552763819                   | 0.732692308               | 0.179928489            |
| 0.829145729                   | 0.876730769               | 0.047585041            |
| 0.954773869                   | 0.97724359                | 0.02246972             |
| 0.552763819                   | 0.608782051               | 0.056018232            |
| 0.693467337                   | 0.816346154               | 0.122878817            |
| 0.884422111                   | 0.917051282               | 0.032629171            |
| 0.618090452                   | 0.678846154               | 0.060755702            |
| 0.613065327                   | 0.656538462               | 0.043473135            |
| 0.849246231                   | 0.887051282               | 0.037805051            |
| 0.64321608                    | 0.653974359               | 0.010758279            |
| 0.527638191                   | 0.666730769               | 0.139092578            |
| 0.693467337                   | 0.846730769               | 0.153263433            |
| 0.633165829                   | 0.624102564               | -0.009063265           |
| 0.502512563                   | 0.796923077               | 0.294410514            |
| 0.638190955                   | 0.87224359                | 0.234052635            |

* The resulting sum of the differences for the regression table is: 1.426057531
<br>

To conclude, the classification model emerged as the best performing model overall. However, it is important to acknowledge that the regression model outperformed the classification model in several specific parameter combinations. Despite this, I believe the classification model is the superior choice for this task due to its greater consistency. The best performing independent variables for the classification model consistently yield high accuracy scores (independent array #15), regardless of the parameter combinations (e.g., days out and percent increase). 

In contrast, it is more challenging to identify which independent variables in the regression model contribute to its accuracy, as the performance tends to fluctuate more significantly based on different parameter settings. This variability in the regression model’s performance makes it less predictable and harder to interpret, whereas the classification model’s stability provides a more reliable foundation for decision-making.

**Lastly, and perhaps most importantly, while we are not analyzing a specific candlestick pattern here, but instead working with a randomly generated occurrence that simulates the presence of random 30-day sequences, we can clearly observe that our trained models—whether for classification or regression—outperform the "Highest Frequency" column. This metric is used to evaluate the accuracy of predicting the most frequent class. When compared to the "Weighted Score" column—representing the accuracy score produced by our model—we consistently find that the "Weighted Score" is higher in nearly all cases (and sometimes a lot more), regardless of the parameter combinations used. This shows us the efficacy of our model.**




#### Answering the first part of my research question

Since I have decided that classification was the better performing model with independent array #15 being the best combination of indepdendent variables, I am going to use that model with that set of independent variables to answer this question. I have decided to not use the bullish engulfing, bullish harami, and three white soldiers patterns for this analysis because of their small sample size. Although the hammer and inverted patterns had small sample sizes as well, they were a bit larger of a sample size than the other three.


Below I am going to compare all three results: for a randomly generated occurrence, the hammer pattern occurrence, and the inverted hammer occurrence.

* Random Occurrence

| Ticker | Pattern | Independent Array    | Best Accuracy | Avg Accuracy  | Days Out | Total Observations | Negative Observations | Positive Observations | Percent Increase Parameter | Most Frequent Class | Classification Highest Frequency | Classification Weighted Score |
|--------|---------|----------------------|---------------|---------------|----------|--------------------|-----------------------|-----------------------|----------------------------|---------------------|---------------------------------|------------------------------|
| SPY    | Random  | independent_array15   | 0.75          | 0.569756417   | 1        | 199                | 89                    | 110                   | 1                          | 110                 | 0.552763819                      | 0.659878208                 |
| SPY    | Random  | independent_array15   | 0.794871807   | 0.59938462    | 3        | 199                | 89                    | 110                   | 1                          | 110                 | 0.552763819                      | 0.697128213                 |
| SPY    | Random  | independent_array15   | 0.824999988   | 0.647256423   | 5        | 199                | 76                    | 123                   | 1                          | 123                 | 0.618090452                      | 0.736128206                 |
| SPY    | Random  | independent_array15   | 0.850000024   | 0.662666665   | 10       | 199                | 71                    | 128                   | 1                          | 128                 | 0.64321608                       | 0.756333345                 |
| SPY    | Random  | independent_array15   | 0.820512831   | 0.64825641    | 15       | 199                | 73                    | 126                   | 1                          | 126                 | 0.633165829                      | 0.734384621                 |
| SPY    | Random  | independent_array15   | 0.899999976   | 0.832910252   | 1        | 199                | 165                   | 34                    | 1.01                       | 165                 | 0.829145729                      | 0.866455114                 |
| SPY    | Random  | independent_array15   | 0.794871807   | 0.708025651   | 3        | 199                | 138                   | 61                    | 1.01                       | 138                 | 0.693467337                      | 0.751448729                 |
| SPY    | Random  | independent_array15   | 0.794871807   | 0.651641024   | 5        | 199                | 122                   | 77                    | 1.01                       | 122                 | 0.613065327                      | 0.723256416                 |
| SPY    | Random  | independent_array15   | 0.774999976   | 0.635089748   | 10       | 199                | 105                   | 94                    | 1.01                       | 105                 | 0.527638191                      | 0.705044862                 |
| SPY    | Random  | independent_array15   | 0.800000012   | 0.604820516   | 15       | 199                | 100                   | 99                    | 1.01                       | 100                 | 0.502512563                      | 0.702410264                 |
| SPY    | Random  | independent_array15   | 0.974358976   | 0.954371786   | 1        | 199                | 190                   | 9                     | 1.02                       | 190                 | 0.954773869                      | 0.964365381                 |
| SPY    | Random  | independent_array15   | 0.925000012   | 0.890512816   | 3        | 199                | 176                   | 23                    | 1.02                       | 176                 | 0.884422111                      | 0.907756414                 |
| SPY    | Random  | independent_array15   | 0.899999976   | 0.859897449   | 5        | 199                | 169                   | 30                    | 1.02                       | 169                 | 0.849246231                      | 0.879948713                 |
| SPY    | Random  | independent_array15   | 0.820512831   | 0.739974356   | 10       | 199                | 138                   | 61                    | 1.02                       | 138                 | 0.693467337                      | 0.780243593                 |
| SPY    | Random  | independent_array15   | 0.794871807   | 0.670179485   | 15       | 199                | 127                   | 72                    | 1.02                       | 127                 | 0.638190955                      | 0.732525646                 |


* Hammer Pattern Occurrence

| Ticker | Pattern | Independent Array    | Best Accuracy | Avg Accuracy   | Days Out | Total Observations | Negative Observations | Positive Observations | Percent Increase Parameter | Most Frequent Class | Classification Highest Frequency | Classification Weighted Score |
|--------|---------|----------------------|---------------|----------------|----------|--------------------|-----------------------|-----------------------|----------------------------|---------------------|---------------------------------|------------------------------|
| SPY    | Hammer  | independent_array15   | 1             | 0.733736265    | 1        | 69                 | 37                    | 32                    | 1                          | 37                  | 0.536231884                      | 0.866868132                 |
| SPY    | Hammer  | independent_array15   | 0.923076928   | 0.659780228    | 3        | 69                 | 25                    | 44                    | 1                          | 44                  | 0.637681159                      | 0.791428578                 |
| SPY    | Hammer  | independent_array15   | 0.928571403   | 0.742307696    | 5        | 69                 | 25                    | 44                    | 1                          | 44                  | 0.637681159                      | 0.83543955                  |
| SPY    | Hammer  | independent_array15   | 0.923076928   | 0.685164844    | 10       | 68                 | 27                    | 41                    | 1                          | 41                  | 0.602941176                      | 0.804120886                 |
| SPY    | Hammer  | independent_array15   | 0.857142866   | 0.707362645    | 15       | 68                 | 31                    | 37                    | 1                          | 37                  | 0.544117647                      | 0.782252755                 |
| SPY    | Hammer  | independent_array15   | 1             | 0.868571429    | 1        | 69                 | 57                    | 12                    | 1.01                       | 57                  | 0.826086957                      | 0.934285715                 |
| SPY    | Hammer  | independent_array15   | 0.785714269   | 0.669450558    | 3        | 69                 | 43                    | 26                    | 1.01                       | 43                  | 0.623188406                      | 0.727582414                 |
| SPY    | Hammer  | independent_array15   | 1             | 0.640659345    | 5        | 69                 | 37                    | 32                    | 1.01                       | 37                  | 0.536231884                      | 0.820329673                 |
| SPY    | Hammer  | independent_array15   | 0.928571403   | 0.700769237    | 10       | 68                 | 35                    | 33                    | 1.01                       | 35                  | 0.514705882                      | 0.81467032                  |
| SPY    | Hammer  | independent_array15   | 1             | 0.742417589    | 15       | 68                 | 36                    | 32                    | 1.01                       | 36                  | 0.529411765                      | 0.871208794                 |
| SPY    | Hammer  | independent_array15   | 1             | 0.957142842    | 1        | 69                 | 66                    | 3                     | 1.02                       | 66                  | 0.956521739                      | 0.978571421                 |
| SPY    | Hammer  | independent_array15   | 1             | 0.858351647    | 3        | 69                 | 58                    | 11                    | 1.02                       | 58                  | 0.84057971                       | 0.929175823                 |
| SPY    | Hammer  | independent_array15   | 0.785714269   | 0.728021987    | 5        | 69                 | 51                    | 18                    | 1.02                       | 51                  | 0.739130435                      | 0.756868128                 |
| SPY    | Hammer  | independent_array15   | 1             | 0.736593409    | 10       | 68                 | 46                    | 22                    | 1.02                       | 46                  | 0.676470588                      | 0.868296704                 |
| SPY    | Hammer  | independent_array15   | 1             | 0.67626375     | 15       | 68                 | 41                    | 27                    | 1.02                       | 41                  | 0.602941176                      | 0.838131875                 |


* Inverted Hammer Occurrence

| Ticker | Pattern       | Independent Array   | Best Accuracy | Avg Accuracy   | Days Out | Total Observations | Negative Observations | Positive Observations | Percent Increase Parameter | Most Frequent Class | Classification Highest Frequency | Classification Weighted Score |
|--------|---------------|---------------------|---------------|----------------|----------|--------------------|-----------------------|-----------------------|----------------------------|---------------------|---------------------------------|------------------------------|
| SPY    | InvertedHammer| independent_array15  | 0.818181813   | 0.656545464    | 1        | 52                 | 22                    | 30                    | 1                          | 30                  | 0.576923077                      | 0.737363639                 |
| SPY    | InvertedHammer| independent_array15  | 0.899999976   | 0.705818185    | 3        | 52                 | 19                    | 33                    | 1                          | 33                  | 0.634615385                      | 0.802909081                 |
| SPY    | InvertedHammer| independent_array15  | 0.899999976   | 0.707454543    | 5        | 52                 | 18                    | 34                    | 1                          | 34                  | 0.653846154                      | 0.803727259                 |
| SPY    | InvertedHammer| independent_array15  | 0.899999976   | 0.672545449    | 10       | 52                 | 18                    | 34                    | 1                          | 34                  | 0.653846154                      | 0.786272712                 |
| SPY    | InvertedHammer| independent_array15  | 0.800000012   | 0.69054545     | 15       | 52                 | 19                    | 33                    | 1                          | 33                  | 0.634615385                      | 0.745272731                 |
| SPY    | InvertedHammer| independent_array15  | 1             | 0.850181819    | 1        | 52                 | 40                    | 12                    | 1.01                       | 40                  | 0.769230769                      | 0.92509091                  |
| SPY    | InvertedHammer| independent_array15  | 0.899999976   | 0.656181821    | 3        | 52                 | 28                    | 24                    | 1.01                       | 28                  | 0.538461538                      | 0.778090898                 |
| SPY    | InvertedHammer| independent_array15  | 0.899999976   | 0.688181824    | 5        | 52                 | 26                    | 26                    | 1.01                       | 26                  | 0.5                             | 0.7940909                   |
| SPY    | InvertedHammer| independent_array15  | 0.899999976   | 0.618181826    | 10       | 52                 | 21                    | 31                    | 1.01                       | 31                  | 0.596153846                      | 0.759090901                 |
| SPY    | InvertedHammer| independent_array15  | 0.899999976   | 0.649636369    | 15       | 52                 | 22                    | 30                    | 1.01                       | 30                  | 0.576923077                      | 0.774818172                 |
| SPY    | InvertedHammer| independent_array15  | 1             | 0.941818187    | 1        | 52                 | 47                    | 5                     | 1.02                       | 47                  | 0.903846154                      | 0.970909094                 |
| SPY    | InvertedHammer| independent_array15  | 1             | 0.75036364     | 3        | 52                 | 39                    | 13                    | 1.02                       | 39                  | 0.75                            | 0.87518182                  |
| SPY    | InvertedHammer| independent_array15  | 1             | 0.726363633    | 5        | 52                 | 35                    | 17                    | 1.02                       | 35                  | 0.673076923                      | 0.863181816                 |
| SPY    | InvertedHammer| independent_array15  | 0.899999976   | 0.605454546    | 10       | 52                 | 26                    | 26                    | 1.02                       | 26                  | 0.5                             | 0.752727261                 |
| SPY    | InvertedHammer| independent_array15  | 1             | 0.677272734    | 15       | 52                 | 26                    | 26                    | 1.02                       | 26                  | 0.5                             | 0.838636367                 |



Based on the tables above, it appears that the model returns higher accuracy scores when trained on candlestick patterns. For example, for each table if you take the sum of the "Classification Highest Frequency" and the "Classification Weighted Score" columns and subtract them this will give us the distance of our weighted accuracy to predicting the majority class. For each of these tables, the values are as follows:

* Random Occurrence: SUM(Classification_Weighted_score) - SUM(Classification_Highest_frequency) = 1.411378076
* Hammer Occurrence: SUM(Classification_Weighted_score) - SUM(Classification_Highest_frequency) = 2.815309199
* Inverted Hammer Occurrence: SUM(Classification_Weighted_score) - SUM(Classification_Highest_frequency) = 2.7458251

Now, if I take each of these values and divide them by 15 (which is the total number of parameter combinations), I get the average weighted average accuracy scores across all 15 parameter combinations.

* Random Occurrence: 1.411378076 / 15 = 0.09409187173
* Hammer Occurrence: 2.815309199 / 15 = 0.18768727993
* Inverted Hammer Occurrence: 2.7458251 / 15 = 0.18305500666

**This shows that, for each parameter combination where a true candlestick pattern is present, the model performs approximately 9-10% better in predicting the outcome compared to when a random occurrence is used.**

It is possible that because the total number of observations for random occurrences was set higher than for the hammer and inverted hammer patterns (which are fixed values), the model may have struggled to learn the patterns effectively due to the larger data set for random occurrences. However, upon reviewing the results, it seems that when a true candlestick pattern is present, the model is able to predict the future closing price with significantly higher accuracy compared to when a candlestick pattern is not present or not likely to be present (which represents the randomly generated occurrence).



#### Building off my research question

Because there was a small sample size for my true candlestick patterns, I was a little disappointed. The reason being is that I wanted to deploy my model for real-world scenarios, especially for swing trading, where accurate predictions based on candlestick patterns could lead to more informed and timely trading decisions. However, as I have shown although I have received high accuracy scores, the frequency for these patterns are quite rare, occurring once every hundred days, or about a 1% occurrence. 

Later in this document, I increased the number of randomly generated sequences in order to make my model more adaptable for use on any given day. The goal was to train the model on over 2,000 randomly generated sequences, hoping that even when a true candlestick pattern is not present, the model could still make reliable predictions for future closing prices.

Through this approach, I found that by increasing the sample size for training, using stratified 5-fold cross-validation, and incorporating ensemble learning methods, my model was able to deliver reliable results. These results were significantly more accurate than simply predicting the majority class. For these predictions, I used a single parameter combination: the closing price 10 days out, with a price increase greater than 1% on that day to be considered a positive class. I used independent array #15 as the basis for my independent variables which has 8 different features (Normalized_close, Normalized_open, Normalized_low, Normalized_high, RSI, MFI, MACD, Signal Line).

The results were as follows:

* Number of total observations in entire dependent array dataset: 2188
* Number of actual false labels in entire dependent array dataset: 1230
* Number of actual true labels in entire dependent array dataset: 958
* Number of total observations in test dataset: 438
* Number of incorrect predictions on test dataset: 137
* Number of correct predictions on test dataset: 301
* Prediction accuracy on test dataset: 68.72%
* Number of actual false labels in test dataset: 256
* Number of actual true labels test dataset: 182
* Majority class label percent in test dataset: 58.45%

The results of using ensemble methods are significant as we can see the accuracy scores. Based on this data sample, if we always predicted that the future closing price would be false, we would be correct 1230 out of 2188 times or 56.22% of the time. However, our model performs better, as our model has 301 correct predictions out of 438 observations or a correct prediction rate of 68.72%. Also, when looking at the actual false labels in the test dataset, which is the majority class, we see 256 observations; 256 out of 438 is 58.45%. Our model clearly performs better when compared against the actual labels in the entire dependent array (dataset used for training/validation) dataset and when compared against the actual labels in the test dataset.

The results are significant, as we can outperform the expected market outcome (which assumes predicting the future price as false) by around 10% with this parameter combination when comparing against the actual values of the test set labels. This demonstrates the effectiveness of the model in identifying patterns and making predictions that exceed a baseline strategy of predicting no price increase of greater than 1% over the next ten days (which was the parameter combination I used for this ensemble model (pct_increase=1.01)).

I have demonstrated the implementation of the ensemble learning method (with stratified 5-fold cross-validation used to train these models) using the following parameters: The dependent variable represents the closing price 10 days in the future. The label for the dependent variable is assigned as 0 (false) if the future closing price is less than or equal to a 1% increase from the closing price of the last identified candle. It is labeled as 1 (true) if the future closing price is greater than a 1% increase from the closing price of the last identified candle.

I will run this implementation multiple times through multiple parameter combinations using independent array #15 as mentioned before, I found it to be the best performing combination of independent variables. I will run this separately outside of this document, for the submission for the next part of this project. Specifically, I will build a web application which will allow a user to easily implement this ensemble model, using the same independent variables as found on independent array #15.

So, to answer the last part on the WGU grading rubric in which I am to propose a directions or approach for future study of the data set. The main thing I want to do is train an ensemble model for other stock tickers rather than just 'SPY' as shown in this project, and run those models on different parameter combinations as well to test the results. Again, this will be accomplished by creating a web application which allows me and any user to easily implement a trained model to predict stock price, for any stock ticker, for any parameter combination that is requested.


# Sources

Fidelity. (n.d.). RSI: Relative strength index. Fidelity. Retrieved December 26, 2024, from https://www.fidelity.com/learning-center/trading-investing/technical-analysis/technical-indicator-guide/RSI

Wilder, J. W. (1978). New concepts in technical trading systems. Trend Research.

Fidelity. (n.d.). Money Flow Index (MFI). Fidelity. Retrieved December 26, 2024, from https://www.fidelity.com/learning-center/trading-investing/technical-analysis/technical-indicator-guide/mfi

Fidelity. (n.d.). MACD: Moving average convergence divergence. Fidelity. Retrieved December 26, 2024, from https://www.fidelity.com/learning-center/trading-investing/technical-analysis/technical-indicator-guide/macd