![Image](Picture%20References/stock_market_prediction.jpeg)

# Stock Market Prediction Model

The stock market plays a crucial role in the global economy, influencing investment strategies, economic policies, and financial stability. With the increasing complexity and volume of financial data, accurate stock market predictions have become more critical than ever. According to the World Bank, global stock market capitalization has seen significant growth, emphasizing the importance of effective market analysis and prediction systems.

Despite advancements in data analytics and machine learning, many stock market platforms still lack robust predictive models that can help investors make informed decisions. In this project, we aim to develop a comprehensive stock market prediction system that leverages historical data and advanced machine learning techniques to forecast stock prices accurately.

## 1. Data

The dataset for this project was downloaded from Kaggle, then filtered and cleaned to include only four beauty and wellness stocks: EL, ULTA, COTY, and ELF. The primary goal is to develop a predictive model that analyzes large amounts of beauty and wellness stock data and reacts to market changes much faster than a human can, potentially improving trading performance. However, predicting stock prices is inherently challenging because financial markets are complex and noisy. Stock predictor models can provide valuable insights, but they are not guaranteed to forecast future prices accurately. Investors should weigh other factors as well, including market conditions, economic indicators, and company fundamentals, when making investment decisions.

Kaggle Dataset link: https://www.kaggle.com/datasets/footballjoe789/us-stock-dataset/data?select=Data

## 2. Data Cleaning

[Data Cleaning Notebook](./1.%20Stock_Predictor_Data_Wrangling.ipynb)

In developing an effective stock market prediction model, it is essential to ensure that the data used for training and evaluation is clean and well-structured. This process involves addressing issues such as missing values, outliers, and inconsistent data entries. Here, we outline the steps taken to clean and prepare the dataset for our machine learning models.

**Problem 1: Data Filtering**
- **Issue:** The dataset contained thousands of stock symbols, which would have made the project too broad and difficult to conduct.

- **Solution:** To address this, the dataset was filtered to include only four beauty and wellness stock symbols: EL, ULTA, COTY, and ELF.
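
The filtering step can be sketched in pandas as follows. This is a minimal illustration, not the notebook's actual code: the `Symbol` column name and the toy rows are assumptions, and the real Kaggle files may be organized differently (for example, one file per ticker).

```python
import pandas as pd

# The four tickers analyzed in this project.
SYMBOLS = ("EL", "ULTA", "COTY", "ELF")


def filter_symbols(df: pd.DataFrame, symbols=SYMBOLS, column: str = "Symbol") -> pd.DataFrame:
    """Keep only the rows whose ticker is one of `symbols`.

    `column` is a hypothetical column name; adjust it to the real schema.
    """
    mask = df[column].isin(list(symbols))
    return df[mask].reset_index(drop=True)


# Toy rows for illustration only, not taken from the dataset.
raw = pd.DataFrame(
    {
        "Symbol": ["EL", "AAPL", "ULTA", "MSFT", "COTY", "ELF"],
        "Close": [100.0, 190.0, 400.0, 410.0, 10.0, 150.0],
    }
)
beauty = filter_symbols(raw)
print(sorted(beauty["Symbol"].unique()))
```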

**Problem 2: Missing Values**
- **Issue:** The dataset contained missing values in several columns, including Open, High, Low, Close, and Volume. Missing data can lead to inaccuracies in model predictions and skew the results.

- **Solution:** To handle missing values, we employed the following strategies:

  - **Forward Fill and Backward Fill:** For time-series data, we used forward fill to propagate the last valid observation forward and backward fill to propagate the next valid observation backward.

  - **Mean/Median Imputation:** For other numeric columns, we filled missing values with the mean or median of the respective column.
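
A minimal sketch of these two strategies in pandas is shown below. Which columns received which treatment is an assumption here (price columns get forward/backward fill, `Volume` gets the median), and the toy values are made up for illustration.

```python
import numpy as np
import pandas as pd


def fill_missing(df, price_cols=("Open", "High", "Low", "Close"), other_cols=("Volume",)):
    """Fill gaps in a date-indexed frame.

    Assumed split: price columns use forward fill then backward fill,
    the remaining numeric columns use the column median.
    """
    out = df.sort_index().copy()
    price_cols = list(price_cols)
    # Forward fill carries the last valid value forward; backward fill
    # then covers any gaps left at the very start of the series.
    out[price_cols] = out[price_cols].ffill().bfill()
    for col in other_cols:
        out[col] = out[col].fillna(out[col].median())
    return out


# Toy data for illustration only.
dates = pd.date_range("2024-01-01", periods=4, freq="D")
toy = pd.DataFrame(
    {
        "Open": [np.nan, 10.0, np.nan, 12.0],
        "High": [11.0, np.nan, 11.5, 13.0],
        "Low": [9.0, 9.5, np.nan, 11.0],
        "Close": [np.nan, np.nan, 10.5, 12.5],
        "Volume": [100.0, np.nan, 300.0, 200.0],
    },
    index=dates,
)
clean = fill_missing(toy)
print(clean)
```

Forward fill is applied first so that gaps are filled only from past values wherever possible, which avoids leaking future prices into earlier rows except at the very start of the series.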

## 3. EDA

[EDA Notebook](./2.%20Stock_Predictor_Model_EDA.ipynb)

Exploratory Data Analysis (EDA) is a critical step in understanding the structure and nuances of the dataset used for stock market prediction. This section details the insights gained from the EDA, highlighting key patterns, trends, and relationships within the data.

- **Distribution of Closing Prices:** The distribution of closing prices for various stocks was examined. The time series plot effectively highlights the historical performance and trends of the four stocks over the period from 1996 to 2024. The data indicates significant growth for ULTA, ELF, and EL, especially post-2009, with recent years showing increased volatility likely due to global economic events. In contrast, COTY's stock price has remained relatively stable with minor fluctuations.

![EDA Visualization](EDA_1.png)

### Correlation Matrices

Heatmaps of the correlation matrices helped visualize the relationships between features. The findings for each stock symbol are as follows:

**EL**: The correlation heatmap for EL (Estée Lauder Companies Inc.) reveals strong positive correlations among the Open, High, Low, and Close prices, as well as various technical indicators, indicating these features move together. Trading volume exhibits a low negative correlation with these price features, suggesting different underlying dynamics. Dividends and stock splits show minimal correlation with other variables, indicating they neither significantly affect nor are affected by stock prices.

![EDA Visualization](EDA_2.png)

**ULTA**: The correlation heatmap for ULTA shows strong positive correlations among the Open, High, Low, and Close prices, with coefficients displayed as 1.00. This indicates that these price features move together consistently. Technical indicators like EMA_10, PSARI_0.02_0.2, and BBM_5_2.0 also exhibit very high positive correlations with these price features, which is expected because they are computed from prices. Trading volume shows a low positive correlation with price features, suggesting it varies largely independently of price changes. Dividends and stock splits have negligible correlations with other features, indicating they neither significantly affect nor are affected by stock prices.

![EDA Visualization](EDA_3.png)

**COTY**: The correlation heatmap for COTY shows a strong positive correlation among the Open, High, Low, and Close prices, indicating these features move together. There is a moderate negative correlation between trading volume and price features, suggesting that higher trading volumes are associated with lower prices. Technical indicators such as EMA_10 and PSARI_0.02_0.2 also exhibit strong positive correlations with price features, which is expected because they are computed from prices. Dividends and stock splits have minimal correlation with other features, indicating they neither significantly affect nor are affected by stock prices.

![EDA Visualization](EDA_4.png)

**ELF**: The correlation heatmap for ELF shows strong positive correlations among the Open, High, Low, and Close prices, with coefficients displayed as 1.00, indicating these features move together consistently. Technical indicators such as EMA_10, PSARI_0.02_0.2, and others also show high positive correlations with the price features, which is expected because they are computed from prices. Trading volume exhibits a moderate positive correlation with price features, around 0.27, suggesting some degree of co-movement.

![EDA Visualization](EDA_5.png)
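
As a rough illustration of how such a correlation matrix can be computed, the sketch below builds a synthetic price series, derives a 10-period exponential moving average from it, and calls `DataFrame.corr()`. The data are randomly generated and are not the project's data; the `EMA_10` column here is computed with pandas' `ewm`, which may differ from how the notebook's indicators were produced.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic drifting random walk standing in for a closing-price series.
close = pd.Series(100 + rng.normal(0.1, 1.0, 500).cumsum(), name="Close")

frame = pd.DataFrame(
    {
        "Open": close.shift(1).fillna(close.iloc[0]),
        "Close": close,
        # Volume is drawn independently of price in this toy example.
        "Volume": rng.integers(1_000, 5_000, 500).astype(float),
    }
)
frame["High"] = frame[["Open", "Close"]].max(axis=1) + 0.5
frame["Low"] = frame[["Open", "Close"]].min(axis=1) - 0.5

# A 10-period EMA is a smoothed copy of Close, so it tracks Close closely.
frame["EMA_10"] = frame["Close"].ewm(span=10, adjust=False).mean()

# Pearson correlation between every pair of columns.
corr = frame.corr()
print(corr.round(2))
```

Because indicators such as an EMA are derived directly from price, their near-perfect correlation with Open, High, Low, and Close is expected and means they add little independent information on their own.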

### Interpretation and Summary of Stock Prices with Volume Traded
- **ULTA**: The closing price has steadily increased, particularly from around 2009, reaching peaks above 500 in recent years. Volume traded, depicted in red, shows sporadic spikes indicating periods of high trading activity. The general trend of increasing prices is accompanied by relatively stable volume, suggesting consistent investor interest with occasional high-volume trading periods.
- **ELF**: ELF's closing prices started to rise sharply around 2016, reaching above 200. Trading volumes also show significant spikes, especially during the rapid price increases, indicating heightened investor activity. The correlation between price increase and volume spikes suggests strong market interest during ELF's growth phases.
- **EL**: EL's stock price exhibits a steady upward trend, particularly accelerating post-2009 and peaking above 350 before a recent decline. Volume traded shows consistent activity with occasional high spikes. The volume spikes correspond to significant price movements, indicating periods of heightened trading and possible investor reactions to market events. 
- **COTY**: COTY's closing prices have been more volatile compared to the other stocks, with significant price drops and subsequent recoveries. Trading volumes show high spikes during periods of price volatility, particularly around 2016 and 2020, indicating strong market reactions and investor activity during these times. The correlation between price volatility and volume spikes suggests that market sentiment heavily influences COTY's trading behavior.

![EDA Visualization](EDA_7.png)

## 4. Modeling

[Pre-Processing Notebook](./3.%20Stock_Predictor_Model_Pre_Processing.ipynb)

[Modeling Notebook](./4.%20Stock_Predictor_Model-Modeling.ipynb)

For this project, we employed Long Short-Term Memory (LSTM) networks. An LSTM is a type of recurrent neural network (RNN) that is well suited to time series data such as stock prices because it can capture temporal dependencies and long-term trends. Although LSTMs are computationally expensive to train, the model offered the best balance between predictive accuracy and computational cost for this project.
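
An LSTM learns temporal dependencies from fixed-length windows of past values. The sketch below shows one common way to turn a price series into such windows with NumPy. The `lookback` length and the helper name `make_windows` are illustrative assumptions, not values taken from the notebooks.

```python
import numpy as np


def make_windows(series, lookback):
    """Split a 1-D series into (samples, lookback, 1) inputs and next-step targets.

    Each input row holds `lookback` consecutive values and the target is
    the value that immediately follows them.
    """
    values = np.asarray(series, dtype=float)
    if len(values) <= lookback:
        raise ValueError("series must be longer than lookback")
    X = np.stack([values[i : i + lookback] for i in range(len(values) - lookback)])
    y = values[lookback:]
    # Recurrent layers usually expect a trailing feature axis.
    return X[..., np.newaxis], y


# Toy series 0..9 for illustration only.
prices = np.arange(10, dtype=float)
X, y = make_windows(prices, lookback=3)
print(X.shape, y.shape)  # (7, 3, 1) (7,)
```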

**Evaluation Metric:**
We chose Root Mean Squared Error (RMSE) as the primary evaluation metric. RMSE was selected over Mean Absolute Error (MAE) because it gives higher weight to larger errors, which is crucial in stock price prediction where large errors are particularly undesirable. The lower the RMSE, the more accurate the model's predictions.
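
The difference between the two metrics can be seen in a small NumPy example. The numbers below are made up for illustration: both prediction sets have the same MAE, but the one with a single large miss has a higher RMSE.

```python
import numpy as np


def rmse(actual, predicted):
    """Root mean squared error."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


def mae(actual, predicted):
    """Mean absolute error."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return float(np.mean(np.abs(actual - predicted)))


actual = np.array([100.0, 100.0, 100.0, 100.0])
even = actual + np.array([2.0, -2.0, 2.0, -2.0])   # four small errors
spiky = actual + np.array([0.0, 0.0, 0.0, 8.0])    # one large error

print(mae(actual, even), mae(actual, spiky))    # 2.0 2.0
print(rmse(actual, even), rmse(actual, spiky))  # 2.0 4.0
```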

**LSTM Model:**
The LSTM model's performance varies across stocks, with the lowest RMSE observed for COTY (0.4803) and the highest for ULTA (12.5352). In absolute terms, the model's errors are smallest for COTY and largest for ULTA. Because RMSE is expressed in the same units as the price, part of this gap likely reflects the stocks' different price levels rather than model quality alone, so the values are not directly comparable across stocks. EL and ELF have moderate RMSE values, suggesting reasonable prediction accuracy with room for improvement. These results highlight the model's strengths and weaknesses and can guide future refinement for each stock.

![Model Visualization](EDA_8.png)
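
When comparing RMSE across stocks that trade at very different price levels, one option is to divide the RMSE by the mean actual price. This normalized RMSE is not a metric reported in this project; the sketch below uses made-up numbers purely to show the effect of price scale.

```python
import numpy as np


def rmse(actual, predicted):
    """Root mean squared error in price units."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


def normalized_rmse(actual, predicted):
    """RMSE divided by the mean actual price (unitless)."""
    return rmse(actual, predicted) / float(np.mean(actual))


# Hypothetical low-priced and high-priced stocks, each predicted 5% too high.
low_actual = np.array([10.0, 11.0, 12.0])
high_actual = low_actual * 10
low_pred = low_actual * 1.05
high_pred = high_actual * 1.05

print(rmse(low_actual, low_pred), rmse(high_actual, high_pred))
print(normalized_rmse(low_actual, low_pred), normalized_rmse(high_actual, high_pred))
```

In this toy case the raw RMSE of the high-priced stock is ten times larger even though both predictions are off by the same percentage, while the normalized values match.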

## 5. LSTM Model Results

The provided graphs show the actual versus predicted closing prices for EL, ULTA, COTY, and ELF:

**EL:** 
- **Prediction Accuracy:** The model demonstrates strong predictive performance, closely following the actual price movements over time. This indicates the LSTM model effectively captures the trends and patterns in the historical data.
- **Handling of Volatility:** Despite periods of significant volatility, particularly from 2020 onwards, the model maintains accuracy, indicating its robustness in handling fluctuating market conditions.
- **Performance Over Time:** The model's accuracy appears consistent across different time periods, including both stable and volatile phases. The close alignment of the predicted prices with actual prices suggests that the LSTM model has learned the temporal dependencies well.

![Model Visualization](EDA_9.png)

**ULTA:** 
- **Prediction Accuracy:** The LSTM model demonstrates high predictive accuracy, with the predicted values closely aligning with the actual prices. This suggests the model effectively captures the underlying patterns and trends in ULTA's stock prices.
- **Handling of Volatility:** The model maintains accuracy even during periods of significant volatility, particularly from 2020 onwards. This indicates the model's robustness in handling fluctuating market conditions and its ability to adapt to rapid changes in stock prices.
- **Performance Over Time:** The model performs consistently well across different time periods, including both stable and volatile phases. The close alignment between actual and predicted prices highlights the LSTM model's ability to learn and predict temporal dependencies accurately.

![Model Visualization](EDA_10.png)

**COTY:** 
- **Prediction Accuracy:** The LSTM model demonstrates high predictive accuracy, with the predicted values closely aligning with the actual prices. This indicates that the model effectively captures the trends and patterns in COTY's stock prices.
- **Handling of Volatility:** The model maintains its accuracy during periods of significant volatility, particularly noticeable from 2014 to 2016 and around 2020. This suggests that the LSTM model is robust in handling fluctuating market conditions and can adapt to rapid changes in stock prices.
- **Performance Over Time:** The model's performance is consistent across different time periods, including both stable and volatile phases. The close alignment between actual and predicted prices highlights the LSTM model's ability to learn and predict temporal dependencies accurately.

![Model Visualization](EDA_11.png)

**ELF:** 
- **Prediction Accuracy:** The LSTM model demonstrates high predictive accuracy, with the predicted values closely following the actual prices. This indicates that the model effectively captures the trends and patterns in ELF's stock prices.
- **Handling of Volatility:** The model maintains accuracy even during periods of significant volatility, such as from 2017 to 2020. This suggests the LSTM model is robust in handling fluctuating market conditions and can adapt to rapid changes in stock prices.
- **Performance Over Time:** The model's performance is consistent across different time periods, including both stable and volatile phases. The close alignment between actual and predicted prices highlights the LSTM model's ability to learn and predict temporal dependencies accurately.

![Model Visualization](EDA_12.png)

## 6. Future Predictions

The graph below illustrates the predicted closing prices over the next three quarters for the four stocks: EL, ULTA, COTY, and ELF. The predictions were generated using the LSTM model, and the key insights are as follows:

- **Estée Lauder Companies Inc. (EL) - Blue Line:**
    - The predicted closing prices for EL show a slight decline initially, followed by stabilization around the 350-400 range. This suggests that the model anticipates a minor adjustment before prices level off.

- **Ulta Beauty Inc. (ULTA) - Orange Line:**
    - ULTA's predicted closing prices indicate a gradual decline from around 800 to about 700. The model forecasts a steady decrease over the next three quarters, suggesting possible market corrections or reduced growth expectations.

- **Coty Inc. (COTY) - Green Line:**
    - COTY's predicted prices remain relatively stable around the 10-20 range. This indicates that the model does not foresee significant changes in COTY's stock price, suggesting a period of stability.

- **e.l.f. Beauty Inc. (ELF) - Red Line:**
    - The predicted prices for ELF also show a stable trend, hovering around the 80-100 range. This stability indicates that the model expects ELF's stock to maintain its current performance levels without significant fluctuations.

![Prediction Visualization](EDA_13.png)
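
This README does not state how the multi-step forecasts were rolled forward. One common approach, sketched below as an assumption, is recursive forecasting: predict one step, append the prediction to the input window, and repeat. The `predict_next` callable is a hypothetical stand-in for a trained model's one-step prediction.

```python
from typing import Callable

import numpy as np


def recursive_forecast(
    history,
    predict_next: Callable[[np.ndarray], float],
    lookback: int,
    steps: int,
) -> np.ndarray:
    """Forecast `steps` values by feeding each prediction back into the window."""
    window = list(np.asarray(history, dtype=float)[-lookback:])
    predictions = []
    for _ in range(steps):
        next_value = float(predict_next(np.array(window)))
        predictions.append(next_value)
        # Slide the window: drop the oldest value, append the new prediction.
        window = window[1:] + [next_value]
    return np.array(predictions)


# Stand-in "model" that predicts the mean of the window (illustration only).
forecast = recursive_forecast([1.0, 2.0, 3.0], lambda w: w.mean(), lookback=3, steps=2)
print(forecast)
```

Because each step consumes earlier predictions, errors compound over the horizon, which is one reason long-range forecasts such as three quarters ahead should be read with caution.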

## 7. Future Improvements

To enhance the model further, several improvements can be made:

- **Feature Engineering:** Incorporate more advanced technical indicators and macroeconomic factors to improve prediction accuracy.
- **Model Optimization:** Utilize hyperparameter tuning techniques such as grid search and Bayesian optimization to fine-tune the model.
- **Real-Time Data Integration:** Implement real-time data feeds to update the model continuously and provide up-to-date predictions.
- **Resource Scaling:** Use cloud-based solutions to handle larger datasets and more complex models without resource limitations.
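
As a rough sketch of the grid search idea mentioned above, the loop below tries every combination in a parameter grid and keeps the one with the lowest score. The parameter names (`units`, `lookback`) and the `toy_rmse` function are hypothetical; in practice the scoring function would train the LSTM with the given settings and return its validation RMSE.

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Return the parameter combination with the lowest score.

    `param_grid` maps parameter names to lists of candidate values and
    `evaluate` maps a parameter dict to a score where lower is better.
    """
    names = list(param_grid)
    best_params, best_score = None, float("inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = evaluate(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_rmse(params):
    # Stand-in for "train the model and return validation RMSE".
    return abs(params["units"] - 64) / 64 + abs(params["lookback"] - 30) / 30


grid = {"units": [32, 64, 128], "lookback": [10, 30, 60]}
best, score = grid_search(grid, toy_rmse)
print(best, score)
```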