![Image](./Images/stock_market_prediction.jpeg)

# Stock Market Prediction Model

The stock market plays a crucial role in the global economy, influencing investment strategies, economic policies, and financial stability. With the increasing complexity and volume of financial data, accurate stock market predictions have become more critical than ever. According to the World Bank, global stock market capitalization has seen significant growth, emphasizing the importance of effective market analysis and prediction systems.

Despite advancements in data analytics and machine learning, many stock market platforms still lack robust predictive models that can help investors make informed decisions. In this project, we aim to develop a comprehensive stock market prediction system that leverages historical data and advanced machine learning techniques to forecast stock prices accurately.

## 1. Data

The dataset for this project was downloaded from Kaggle, then filtered and cleaned to include only four beauty and wellness stocks. The primary goal is to develop a predictive model that analyzes large amounts of beauty and wellness stock data and reacts to market changes much faster than humans can, potentially leading to improved trading performance. However, predicting stock prices is inherently challenging due to the complexity and randomness of financial markets. While stock predictor models can provide valuable insights, they are not guaranteed to forecast future prices accurately. Investors should consider various factors, including market conditions, economic indicators, and company fundamentals, when making investment decisions.

Kaggle Dataset link: https://www.kaggle.com/datasets/footballjoe789/us-stock-dataset/data?select=Data

**Stock Selections**

We analyzed the following four beauty and wellness stocks:

| # | Stock Name/Ref |
|---|----------------|
| 1 | The Estée Lauder Companies Inc. (EL) |
| 2 | Ulta Beauty, Inc. (ULTA) |
| 3 | Coty Inc. (COTY) |
| 4 | e.l.f. Beauty, Inc. (ELF) |

**Stock Selections Reasoning**

1) These four companies collectively represent a broad spectrum of the beauty and wellness market. ULTA is a leading retailer, EL is a renowned luxury brand, ELF targets affordable beauty products, and COTY encompasses both luxury and mass-market segments. This diversity ensures a comprehensive analysis of the beauty industry.
2) Each of these companies is a significant player in the beauty industry, making them highly relevant for analysis. Their performance can serve as a proxy for broader market trends within the beauty and wellness sector.

## 2. Data Cleaning

[Data Cleaning Notebook](./1.%20Stock_Predictor_Data_Wrangling.ipynb)

In developing an effective stock market prediction model, it is essential to ensure that the data used for training and evaluation is clean and well-structured. This process involves addressing issues such as missing values, outliers, and inconsistent data entries. Here, we outline the steps taken to clean and prepare the dataset for our machine learning models.

**Problem 1: Data Filtering**
- **Issue:** The dataset contained thousands of stock symbols, which would have made the project too broad and difficult to conduct.

- **Solution:** To address this, the dataset was filtered to include only four beauty and wellness stock symbols.
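
The filtering step can be sketched with pandas as follows. This is a minimal illustration, not the notebook's actual code: the column names (`Symbol`, `Date`, `Close`) and the small in-memory frame are assumptions, since the layout of the Kaggle files is not described here.

```python
import pandas as pd

# Hypothetical sample standing in for the full Kaggle dataset.
# Column names and all values are made up for illustration.
raw = pd.DataFrame({
    "Symbol": ["EL", "AAPL", "ULTA", "COTY", "MSFT", "ELF"],
    "Date": pd.to_datetime(["2024-01-02"] * 6),
    "Close": [140.0, 185.0, 480.0, 12.0, 370.0, 145.0],
})

# The four beauty and wellness tickers selected for this project.
SELECTED = ["EL", "ULTA", "COTY", "ELF"]

# Keep only rows whose symbol is in the selected list.
beauty = raw[raw["Symbol"].isin(SELECTED)].reset_index(drop=True)
print(beauty)
```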

**Problem 2: Missing Values**
- **Issue:** The dataset contained missing values in several columns, including Open, High, Low, Close, and Volume. Missing data can lead to inaccuracies in model predictions and skew the results.

- **Solution:** To handle missing values, we employed the following strategies:

    - **Forward Fill and Backward Fill:** For time-series data, we used forward fill and backward fill methods to propagate the last valid observation forward or backward.

    - **Mean/Median Imputation:** For other numeric columns, we filled missing values with the mean or median of the respective column.
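
The two strategies above can be sketched as follows. Which columns received which treatment is not stated here, so applying forward/backward fill to `Close` and median imputation to `Volume` is an assumption made for illustration, and all values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical price history for one symbol with gaps (made-up values).
prices = pd.DataFrame(
    {
        "Close": [np.nan, 10.0, np.nan, 12.0, 13.0],
        "Volume": [100.0, np.nan, 300.0, 400.0, np.nan],
    },
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
)

cleaned = prices.copy()

# Forward fill carries the last valid price forward; backward fill then
# covers any gap left at the very start of the series.
cleaned["Close"] = cleaned["Close"].ffill().bfill()

# Median imputation for another numeric column (assumed to be Volume here).
cleaned["Volume"] = cleaned["Volume"].fillna(cleaned["Volume"].median())

print(cleaned)
```

When several symbols share one frame, these fills would typically be applied per symbol (for example with `groupby`) so that values from one stock do not leak into another.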

## 3. EDA

[EDA Notebook](./2.%20Stock_Predictor_Model_EDA.ipynb)

Exploratory Data Analysis (EDA) is a critical step in understanding the structure and nuances of the dataset used for stock market prediction. This section details the insights gained from the EDA, highlighting key patterns, trends, and relationships within the data.

- **Distribution of Closing Prices:** We examined the closing prices of the four stocks. The time series plot highlights their historical performance and trends over the period from 1996 to 2024. The data indicates significant growth for ULTA, ELF, and EL, especially post-2009, with recent years showing increased volatility, likely due to global economic events. In contrast, COTY's stock price has remained relatively stable with minor fluctuations.

![EDA Visualization](./Images/EDA_1.png)

## 4. Modeling

[Pre-Processing Notebook](./3.%20Stock_Predictor_Model_Pre_Processing.ipynb)

[Modeling Notebook](./4.%20Stock_Predictor_Model-Modeling.ipynb)

LSTM (Long Short-Term Memory) networks are a type of recurrent neural network (RNN) capable of learning long-term dependencies, making them suitable for time series forecasting. Here, we delve into the mathematical foundation and rationale behind using LSTM for stock price prediction.

### Why LSTM?
Rationale for Using LSTM:

- **Capturing Long-Term Dependencies:** Stock prices are influenced by long-term trends and patterns, which LSTM networks are well-equipped to capture.
- **Handling Sequential Data:** LSTM networks are designed to process and predict time series data, making them well suited to stock price prediction.
- **Avoiding Short-Term Memory Issues:** Traditional RNNs suffer from the vanishing gradient problem, making them less effective for long sequences. LSTMs overcome this issue through their gating mechanisms.
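
To make "sequential data" concrete, the sketch below shows one common way to turn a price series into the fixed-length windows an LSTM consumes. The function name `make_windows` and the window length of 3 are hypothetical, chosen for illustration; the pre-processing notebook may prepare its inputs differently.

```python
import numpy as np

def make_windows(series, window):
    """Split a 1-D series into (inputs, targets) for next-step prediction.

    Each input row holds `window` consecutive values, and its target is
    the value that immediately follows them.
    """
    series = np.asarray(series, dtype=float)
    inputs = np.array([series[i:i + window] for i in range(len(series) - window)])
    targets = series[window:]
    # Add a trailing feature axis: (samples, timesteps, features).
    return inputs[..., np.newaxis], targets

closes = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]  # made-up closing prices
X, y = make_windows(closes, window=3)
print(X.shape, y)  # (3, 3, 1) [13. 14. 15.]
```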

### The Three Gates in LSTM
Imagine an LSTM unit as a smart memory box that decides what to remember and what to forget. This box has three gates, which act like filters or doors, controlling the flow of information. These gates are:

- Forget Gate - Cleans up old, unneeded information
- Input Gate - Decides what new information to add
- Output Gate - Chooses what part of the current memory to output

**1. Forget Gate**

Purpose: Decides what information to throw away from the memory.

How it works: It looks at the previous hidden state (Short-Term Memory) and the current input, and uses the **Sigmoid Activation Function** to output a number between 0 and 1 for each piece of information in the cell state. The **cell state** represents the **Long-Term Memory**, so the Forget Gate essentially determines what percentage of the Long-Term Memory will be remembered:

- 0 means "completely forget this"
- 1 means "completely keep this"

**2. Input Gate**

Purpose: Decides what new information to store in the memory.

How it works: It looks at the previous hidden state (Short-Term Memory) combined with the current input. The Tanh Activation Function creates a vector of candidate values, the **Potential Long-Term Memory**, that could be added to the cell state (Long-Term Memory). The Sigmoid Activation Function (the same method used by the Forget Gate) determines what percentage of that Potential Long-Term Memory is saved. The cell state is then updated by adding the saved portion of the candidate values to the portion of the old Long-Term Memory kept by the Forget Gate.

**3. Output Gate**

Purpose: Decides what part of the updated Long-Term Memory to output as the new Short-Term Memory.

How it works: It first applies the Tanh Activation Function to the updated cell state, which keeps the values between -1 and 1, to obtain the **Potential Short-Term Memory**. Next, the Sigmoid Activation Function is again used to calculate the percentage of that potential memory to keep. Multiplying the two gives the **New Short-Term Memory**, which becomes the starting hidden state (Short-Term Memory) for the next LSTM unit. The updated cell state becomes the **New Long-Term Memory** and the starting Long-Term Memory for the next LSTM unit.


**Summary of Activation Functions in LSTM**

- Sigmoid Activation Function: Used in the Forget Gate, Input Gate, and Output Gate to decide what percentage of information to keep.
- Tanh Activation Function: Used to create candidate values for the Input Gate and to scale the updated cell state to between -1 and 1 for the Output Gate.
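
The three gates can be sketched in a few lines of NumPy. This is an illustrative single-unit step under stated assumptions, not the code used in the modeling notebook: the function name `lstm_step`, the stacked weight layout (forget, input, candidate, output), and the random weights are all hypothetical, and deep learning libraries organize their parameters in their own ways.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """Run one LSTM time step.

    W maps the concatenated [h_prev, x_t] to four pre-activations
    (forget, input, candidate, output) stacked along the first axis.
    """
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b

    f = sigmoid(z[0 * hidden:1 * hidden])  # forget gate: share of old long-term memory kept
    i = sigmoid(z[1 * hidden:2 * hidden])  # input gate: share of candidate values stored
    g = np.tanh(z[2 * hidden:3 * hidden])  # candidate values (potential long-term memory)
    o = sigmoid(z[3 * hidden:4 * hidden])  # output gate: share of squashed cell state exposed

    c_t = f * c_prev + i * g               # new long-term memory (cell state)
    h_t = o * np.tanh(c_t)                 # new short-term memory (hidden state)
    return h_t, c_t

# Hypothetical sizes and untrained random weights, for illustration only.
rng = np.random.default_rng(0)
hidden_size, input_size = 4, 1
W = rng.normal(scale=0.1, size=(4 * hidden_size, hidden_size + input_size))
b = np.zeros(4 * hidden_size)

h, c = np.zeros(hidden_size), np.zeros(hidden_size)
for price in [0.1, 0.2, 0.3]:  # made-up, already-scaled inputs
    h, c = lstm_step(np.array([price]), h, c, W, b)

print(h.shape, c.shape)
```

Note that the hidden state always lies strictly between -1 and 1, because it is a sigmoid output (between 0 and 1) multiplied by a tanh output (between -1 and 1).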

![LSTM Visualization](./Images/report_1.webp)

**Evaluation Metric:**
We chose Root Mean Squared Error (RMSE) as the primary evaluation metric. RMSE was selected over Mean Absolute Error (MAE) because it gives higher weight to larger errors, which is crucial in stock price prediction where large errors are particularly undesirable. The lower the RMSE, the more accurate the model's predictions.
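
A small worked example shows how RMSE penalizes large errors more heavily than MAE. The prices below are made up for illustration, and the helper names `rmse` and `mae` are hypothetical: both prediction sets have the same MAE, but the one containing a single large miss has double the RMSE.

```python
import math

def rmse(actual, predicted):
    """Root Mean Squared Error: square root of the mean squared difference."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean Absolute Error: mean of the absolute differences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

actual = [100.0, 100.0, 100.0, 100.0]
even_errors = [102.0, 98.0, 102.0, 98.0]      # four errors of size 2
one_big_error = [100.0, 100.0, 100.0, 108.0]  # one error of size 8

print(mae(actual, even_errors), rmse(actual, even_errors))      # 2.0 2.0
print(mae(actual, one_big_error), rmse(actual, one_big_error))  # 2.0 4.0
```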

**LSTM Model:**
The LSTM model's performance varies across different stocks, with the lowest RMSE observed for COTY (0.4803) and the highest for ULTA (12.5352). This indicates that the model predicts COTY's stock prices most accurately, while ULTA's predictions show the largest errors. EL and ELF have moderate RMSE values, suggesting decent prediction accuracy but with potential for further enhancement. These results highlight the model's strengths and areas for improvement, guiding future efforts in refining the predictive capabilities for different stocks.

![Model Visualization](./Images/EDA_8.png)

## 5. LSTM Model Results

The provided graphs show the actual versus predicted closing prices for EL, ULTA, COTY, and ELF:

**EL:** 
- **Prediction Accuracy:** The model demonstrates strong predictive performance, closely following the actual price movements over time. This indicates the LSTM model effectively captures the trends and patterns in the historical data.
- **Handling of Volatility:** Despite periods of significant volatility, particularly from 2020 onwards, the model maintains accuracy, indicating its robustness in handling fluctuating market conditions.
- **Performance Over Time:** The model's accuracy appears consistent across different time periods, including both stable and volatile phases. The close alignment of the predicted prices with actual prices suggests that the LSTM model has learned the temporal dependencies well.

![Model Visualization](./Images/EDA_9.png)

**ULTA:** 
- **Prediction Accuracy:** The LSTM model demonstrates high predictive accuracy, with the predicted values closely aligning with the actual prices. This suggests the model effectively captures the underlying patterns and trends in ULTA's stock prices.
- **Handling of Volatility:** The model maintains accuracy even during periods of significant volatility, particularly from 2020 onwards. This indicates the model's robustness in handling fluctuating market conditions and its ability to adapt to rapid changes in stock prices.
- **Performance Over Time:** The model performs consistently well across different time periods, including both stable and volatile phases. The close alignment between actual and predicted prices highlights the LSTM model's ability to learn and predict temporal dependencies accurately.

![Model Visualization](./Images/EDA_10.png)

**COTY:** 
- **Prediction Accuracy:** The LSTM model demonstrates high predictive accuracy, with the predicted values closely aligning with the actual prices. This indicates that the model effectively captures the trends and patterns in COTY's stock prices.
- **Handling of Volatility:** The model maintains its accuracy during periods of significant volatility, particularly noticeable from 2014 to 2016 and around 2020. This suggests that the LSTM model is robust in handling fluctuating market conditions and can adapt to rapid changes in stock prices.
- **Performance Over Time:** The model's performance is consistent across different time periods, including both stable and volatile phases. The close alignment between actual and predicted prices highlights the LSTM model's ability to learn and predict temporal dependencies accurately.

![Model Visualization](./Images/EDA_11.png)

**ELF:** 
- **Prediction Accuracy:** The LSTM model demonstrates high predictive accuracy, with the predicted values closely following the actual prices. This indicates that the model effectively captures the trends and patterns in ELF's stock prices.
- **Handling of Volatility:** The model maintains accuracy even during periods of significant volatility, such as from 2017 to 2020. This suggests the LSTM model is robust in handling fluctuating market conditions and can adapt to rapid changes in stock prices.
- **Performance Over Time:** The model's performance is consistent across different time periods, including both stable and volatile phases. The close alignment between actual and predicted prices highlights the LSTM model's ability to learn and predict temporal dependencies accurately.

![Model Visualization](./Images/EDA_12.png)

## 6. Future Predictions

The graph below illustrates the predicted closing prices over the next three quarters for the four stocks: EL, ULTA, COTY, and ELF. The predictions were generated using the LSTM model and suggest the following key insights:

- **Estée Lauder Companies Inc. (EL) - Blue Line:**
    - The predicted closing prices for EL show a slight decline initially, followed by stabilization around the 350-400 range. This suggests that the model anticipates a minor adjustment before prices level off.

- **Ulta Beauty Inc. (ULTA) - Orange Line:**
    - ULTA's predicted closing prices indicate a gradual decline from around 800 to about 700. The model forecasts a steady decrease over the next three quarters, suggesting possible market corrections or reduced growth expectations.

- **Coty Inc. (COTY) - Green Line:**
    - COTY's predicted prices remain relatively stable around the 10-20 range. This indicates that the model does not foresee significant changes in COTY's stock price, suggesting a period of stability.

- **e.l.f. Beauty Inc. (ELF) - Red Line:**
    - The predicted prices for ELF also show a stable trend, hovering around the 80-100 range. This stability indicates that the model expects ELF's stock to maintain its current performance levels without significant fluctuations.

Because these forecasts are generated by the LSTM model from historical data alone, the declines in the predicted prices shown below likely reflect patterns in that historical data.

![Prediction Visualization](./Images/EDA_13.png)
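
This section does not describe how the multi-quarter forecasts were produced. One common way to forecast beyond the last observation is recursive forecasting, where each prediction is fed back in as input for the next step. The sketch below illustrates that idea only; `recursive_forecast` is a hypothetical helper, and `naive_model` is a stand-in predictor, not the trained LSTM.

```python
from typing import Callable, List, Sequence

def recursive_forecast(
    history: Sequence[float],
    predict_next: Callable[[Sequence[float]], float],
    window: int,
    steps: int,
) -> List[float]:
    """Forecast `steps` values ahead by feeding predictions back as inputs."""
    buffer = list(history[-window:])
    forecasts = []
    for _ in range(steps):
        next_value = predict_next(buffer)
        forecasts.append(next_value)
        # Slide the window: drop the oldest value, append the new prediction.
        buffer = buffer[1:] + [next_value]
    return forecasts

def naive_model(window_values):
    # Hypothetical stand-in for a trained model: average of the window.
    return sum(window_values) / len(window_values)

history = [10.0, 12.0, 14.0]  # made-up closing prices
print(recursive_forecast(history, naive_model, window=3, steps=2))
```

Because each step builds on earlier predictions, errors compound as the horizon grows, which is one reason long-range forecasts should be read with caution.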

## 7. Future Improvements

To enhance the model further, several improvements can be made:

- Feature Engineering: Incorporate more advanced technical indicators and macroeconomic factors to improve prediction accuracy. Examples of Economic Indicators for Feature Engineering:
    - Gross Domestic Product (GDP) Growth Rate:
        - Reflects the overall economic health and can influence investor confidence and stock market performance.
    - Unemployment Rate:
        - High unemployment can signal economic distress, affecting consumer spending and business revenues.
    - Inflation Rate:
        - Rising inflation can lead to increased costs for businesses and reduced purchasing power for consumers, impacting company profits and stock prices.
    - Interest Rates:
        - Higher interest rates can increase borrowing costs for companies and reduce consumer spending, negatively affecting stock prices.
    - Consumer Confidence Index:
        - Measures consumer optimism about the economy. Higher confidence can lead to increased spending and investment.
    - Retail Sales Data:
        - Indicates consumer spending trends, which are crucial for companies in the retail sector.
    - Industrial Production Index:
        - Reflects the output of the industrial sector, providing insight into economic activity.

- Model Optimization: Utilize hyperparameter tuning techniques such as grid search and Bayesian optimization to fine-tune the model.
- Real-Time Data Integration: Implement real-time data feeds to update the model continuously and provide up-to-date predictions.
- Resource Scaling: Use cloud-based solutions to handle larger datasets and more complex models without resource limitations.
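
As a sketch of the feature engineering idea above, the snippet below derives a few simple technical indicators from closing prices with pandas. The chosen indicators (daily return, 3-day simple moving average, 3-day rolling volatility), the column names, and the prices are all assumptions made for illustration; macroeconomic series such as GDP or interest rates would need to come from an external source and are not shown.

```python
import pandas as pd

# Made-up closing prices for a single symbol.
closes = pd.Series(
    [10.0, 11.0, 12.0, 11.0, 13.0, 14.0],
    index=pd.date_range("2024-01-01", periods=6, freq="D"),
    name="Close",
)

features = pd.DataFrame({"Close": closes})

# Day-over-day percentage change.
features["Return"] = closes.pct_change()

# 3-day simple moving average of the closing price.
features["SMA_3"] = closes.rolling(window=3).mean()

# 3-day rolling standard deviation of returns as a volatility proxy.
features["Volatility_3"] = features["Return"].rolling(window=3).std()

# Drop the warm-up rows where the rolling windows are not yet full.
features = features.dropna()
print(features)
```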