# Title: Attention-based CNN-LSTM and XGBoost Hybrid Model for Stock Prediction

#### Group Member Names: Paul Udayan Gomez Jayaprakash, Udaya Kumar Siva Kumar



### INTRODUCTION:
The stock market is highly volatile and influenced by multiple factors, making prediction a challenging task. Traditional models like ARIMA often fail to capture non-linear patterns in stock data. To address this, deep learning and hybrid approaches combining multiple models are gaining popularity. This project focuses on reproducing and analyzing a research paper that introduces a hybrid model combining CNN-LSTM with XGBoost to improve the accuracy of stock price prediction.
*********************************************************************************************************************
#### AIM :
To implement and evaluate an Attention-based CNN-LSTM and XGBoost hybrid model for stock price prediction, demonstrating its effectiveness in capturing time-series dependencies and improving prediction accuracy compared to traditional methods.

*********************************************************************************************************************
#### GitHub Repo:
https://github.com/zshicode/Attention-CLX-stock-prediction

*********************************************************************************************************************
#### DESCRIPTION OF PAPER:
The paper titled “Attention-based CNN-LSTM and XGBoost hybrid model for stock prediction” (Shi et al., 2022) proposes a hybrid approach for predicting stock prices. The model integrates three key components:

1. ARIMA for preprocessing the data and removing trends.
2. CNN-LSTM with an attention mechanism to capture deep temporal and sequential patterns in the stock data.
3. XGBoost for fine-tuning the predictions by handling nonlinear relationships.

This hybrid design leverages the strengths of both statistical time series models and deep learning methods, resulting in improved accuracy and robustness in stock price forecasting.
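To illustrate the second component, the following numpy sketch runs one forward pass of a CNN-LSTM. A width-3 1-D convolution with ReLU extracts short-range patterns from a window of features, and a single-layer LSTM then carries information across the convolved sequence. The window length (20), the feature count (5, for OHLCV), the channel and hidden sizes, and the random untrained weights are all illustrative assumptions, not the paper's configuration. A real model would be built and trained in a deep learning framework.

```python
import numpy as np

def conv1d_relu(x, kernels, bias):
    """Valid 1-D convolution over time followed by ReLU.

    x: (T, F) window of T time steps with F features.
    kernels: (K, F, C) filters of width K producing C channels; bias: (C,).
    Returns an array of shape (T - K + 1, C).
    """
    k = kernels.shape[0]
    steps = x.shape[0] - k + 1
    out = np.empty((steps, kernels.shape[2]))
    for t in range(steps):
        # Each output step sees K consecutive time steps of all features.
        out[t] = np.einsum("kf,kfc->c", x[t:t + k], kernels) + bias
    return np.maximum(out, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(seq, W, U, b):
    """Single-layer LSTM over seq (T, D); returns all hidden states (T, H).

    W: (D, 4H), U: (H, 4H), b: (4H,); gate order: input, forget, cell, output.
    """
    hidden = U.shape[0]
    h, c = np.zeros(hidden), np.zeros(hidden)
    states = []
    for x_t in seq:
        z = x_t @ W + h @ U + b
        i = sigmoid(z[:hidden])                   # input gate
        f = sigmoid(z[hidden:2 * hidden])         # forget gate
        g = np.tanh(z[2 * hidden:3 * hidden])     # candidate cell state
        o = sigmoid(z[3 * hidden:])               # output gate
        c = f * c + i * g                         # long-term memory update
        h = o * np.tanh(c)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
window = rng.normal(size=(20, 5))                 # synthetic 20-step x 5-feature window
kernels = rng.normal(scale=0.1, size=(3, 5, 8))
conv_out = conv1d_relu(window, kernels, np.zeros(8))
H = 16
W = rng.normal(scale=0.1, size=(8, 4 * H))
U = rng.normal(scale=0.1, size=(H, 4 * H))
hidden_states = lstm_forward(conv_out, W, U, np.zeros(4 * H))
print(conv_out.shape, hidden_states.shape)        # (18, 8) (18, 16)
```

The LSTM's hidden states (one row per time step) are what an attention layer would then weight.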

*********************************************************************************************************************
#### PROBLEM STATEMENT :
Stock market prediction remains a complex problem due to the non-linear and dynamic nature of financial data. Traditional models like ARIMA lack the ability to model nonlinear relationships, while standalone deep learning models may overfit or fail to generalize. The problem addressed is how to design a hybrid architecture that combines different modeling strengths to achieve higher accuracy in predicting stock movements.

*********************************************************************************************************************
#### CONTEXT OF THE PROBLEM:
1. The stock market influences global economic stability and investor decisions.
2. Inaccurate predictions can lead to significant financial risks.
3. Robust models that can capture both sequential dependencies and nonlinear patterns are therefore crucial.
*********************************************************************************************************************
#### SOLUTION:
The solution is a three-stage hybrid model. First, ARIMA preprocesses the stock data by removing noise and trends. Next, a CNN-LSTM model with an attention mechanism captures both temporal and feature-level dependencies. Finally, XGBoost fine-tunes the predictions. By combining statistical and deep learning methods, the approach handles nonlinear relationships, captures both short-term and long-term patterns, and improves stock prediction accuracy compared with any of the individual models used alone.
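The final refinement stage can be sketched as residual boosting: a boosted model is fit on the errors left by a base prediction, and its output is added back. This is only one plausible reading of "fine-tune"; the paper's exact wiring between the CNN-LSTM and XGBoost may differ. To stay self-contained, the sketch swaps XGBoost for a small hand-written gradient-boosting loop with depth-1 trees (stumps) and squared loss. XGBoost adds second-order gradients, regularization and deeper trees, and in practice `xgboost.XGBRegressor` would take the place of `boost_residuals`. The synthetic data and the base model that misses the `sin` term are hypothetical.

```python
import numpy as np

def fit_stump(X, r):
    """Best single split (feature j, threshold thr) for residuals r under squared error."""
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j])[:-1]:   # skip the max so both sides are non-empty
            left = X[:, j] <= thr
            lv, rv = r[left].mean(), r[~left].mean()
            sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, thr, lv, rv)
    return best[1:]                           # (j, thr, left value, right value)

def predict_stump(stump, X):
    j, thr, lv, rv = stump
    return np.where(X[:, j] <= thr, lv, rv)

def boost_residuals(X, residual, n_rounds=50, lr=0.1):
    """Squared-loss gradient boosting: each stump fits what is still unexplained."""
    stumps, fitted = [], np.zeros_like(residual)
    for _ in range(n_rounds):
        stump = fit_stump(X, residual - fitted)
        fitted += lr * predict_stump(stump, X)
        stumps.append(stump)
    return stumps

def predict_boost(stumps, X, lr=0.1):
    """Sum of shrunken stump outputs; lr must match the value used in training."""
    return lr * sum(predict_stump(s, X) for s in stumps)

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 3))         # hypothetical input features
y = 2.0 * X[:, 0] + np.sin(3 * X[:, 1])       # synthetic target
base = 2.0 * X[:, 0]                          # base model misses the sin term
stumps = boost_residuals(X, y - base)
refined = base + predict_boost(stumps, X)
mse_base = np.mean((y - base) ** 2)
mse_refined = np.mean((y - refined) ** 2)
print(f"base MSE {mse_base:.3f} -> refined MSE {mse_refined:.3f}")
```

Because each stump is a least-squares fit to the current residuals, every boosting round can only lower the training error. A held-out, chronologically later test set is needed to check that the refinement actually generalizes.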



# Background
*********************************************************************************************************************
#### Reference:
The paper introduces a three-stage hybrid model for stock prediction that combines ARIMA, CNN-LSTM with attention, and XGBoost.

#### Explanation:
The approach begins with ARIMA preprocessing to remove noise and trends from stock data. Then, a CNN-LSTM model with attention is applied to capture both short-term patterns and long-term dependencies while focusing on the most important features. Finally, XGBoost refines the CNN-LSTM outputs, improving prediction accuracy by handling nonlinear relationships.
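The "focusing on the most important features" step can be illustrated with additive attention pooling. Each LSTM hidden state gets a relevance score from a small tanh layer, a softmax turns the scores into weights that sum to 1, and the context vector is the weighted average of the hidden states. This numpy sketch assumes 18 time steps of 16-dimensional hidden states and random parameters `W_a` and `v_a`. It attends over time steps, and the paper's attention layer may be arranged differently (for example, over feature channels).

```python
import numpy as np

def softmax(z):
    z = z - z.max()                              # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def temporal_attention(hidden_states, W_a, v_a):
    """Additive attention pooling over time.

    hidden_states: (T, H) LSTM outputs, one row per time step.
    W_a: (H, A) and v_a: (A,) are the scoring parameters.
    Returns the context vector (H,) and the attention weights (T,).
    """
    scores = np.tanh(hidden_states @ W_a) @ v_a  # one relevance score per time step
    weights = softmax(scores)                    # non-negative, sums to 1
    context = weights @ hidden_states            # weighted average of hidden states
    return context, weights

rng = np.random.default_rng(2)
hidden_states = rng.normal(size=(18, 16))        # stand-in for LSTM outputs
W_a = rng.normal(scale=0.1, size=(16, 8))
v_a = rng.normal(size=8)
context, weights = temporal_attention(hidden_states, W_a, v_a)
print(context.shape, int(weights.argmax()))      # shape and the most-attended step
```

In a trained model, the learned weights indicate which days in the window the prediction relies on most.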

#### Dataset/Input:
The dataset consists of historical stock data, including the open, high, low, and close prices and the trading volume.
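Before the models can use these columns, the price history has to be cut into supervised samples. Here is a minimal numpy sketch. It assumes a 20-step window, the close price as the target, and a synthetic 30-row OHLCV array; the column order (open, high, low, close, volume) is also an assumption.

```python
import numpy as np

def make_windows(data, window, target_col):
    """Turn a (steps, features) array into supervised samples.

    Each sample holds `window` consecutive rows of all features; the target is
    the value of `target_col` in the following row.
    Returns X with shape (N, window, features) and y with shape (N,).
    """
    n = data.shape[0] - window
    X = np.stack([data[i:i + window] for i in range(n)])
    y = data[window:, target_col]
    return X, y

# Synthetic OHLCV rows: open, high, low, close, volume.
steps = 30
close = 100 + np.cumsum(np.random.default_rng(3).normal(size=steps))
ohlcv = np.column_stack([close - 0.5, close + 1.0, close - 1.0, close, np.full(steps, 1e6)])
X, y = make_windows(ohlcv, window=20, target_col=3)
print(X.shape, y.shape)   # (10, 20, 5) (10,)
```

Splitting `X` and `y` chronologically (earlier windows for training, later ones for testing) keeps future prices out of the training set.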

#### Weakness:
Although the model achieves higher accuracy than the individual models, it requires substantial computational resources, is sensitive to parameter tuning, and may not perform consistently in highly volatile markets.



*********************************************************************************************************************






# Paper Code Implementation :
https://github.com/Uday-Kumar01/MLP_Final-Project/blob/main/Research%20Paper%20Code.ipynb
*********************************************************************************************************************




*********************************************************************************************************************
### Contributed Code :
https://github.com/Uday-Kumar01/MLP_Final-Project/blob/main/Contributed%20Code.ipynb


### Results :
An ARIMA(2,1,0) model was applied to the stock dataset, and its predictions closely followed the actual price trend. The residuals were less noisy, and the error values were smaller than those of the basic model.
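For reference, an ARIMA(2,1,0) model is an AR(2) model on the first differences of the price, so it can be fit by conditional least squares with plain numpy. The sketch below does that on a simulated series with known coefficients (0.5 and -0.3) and includes a constant drift term. It compares one-step forecasts against a naive "tomorrow equals today" forecast, which is a hypothetical stand-in for the basic model. The project may instead use a library such as statsmodels, whose `ARIMA` class estimates by maximum likelihood, so its coefficients would differ slightly.

```python
import numpy as np

def fit_arima_210(series):
    """Conditional least-squares fit of ARIMA(2,1,0) with a constant.

    With d=1 and q=0 the model is AR(2) on the first differences:
        dy_t = c + phi1 * dy_{t-1} + phi2 * dy_{t-2} + e_t
    """
    dy = np.diff(series)
    X = np.column_stack([np.ones(len(dy) - 2), dy[1:-1], dy[:-2]])
    coef, *_ = np.linalg.lstsq(X, dy[2:], rcond=None)
    return coef                                    # [c, phi1, phi2]

def one_step_forecasts(series, coef):
    """Forecast series[3:], each value from the three preceding observations."""
    c, phi1, phi2 = coef
    dy = np.diff(series)
    dy_hat = c + phi1 * dy[1:-1] + phi2 * dy[:-2]  # forecasts of dy[2:]
    return series[2:-1] + dy_hat                   # undo the differencing

def rmse(actual, pred):
    return float(np.sqrt(np.mean((actual - pred) ** 2)))

def mae(actual, pred):
    return float(np.mean(np.abs(actual - pred)))

# Simulated prices whose differences follow AR(2) with phi1=0.5, phi2=-0.3.
rng = np.random.default_rng(4)
n = 2000
shocks = rng.normal(size=n)
dy = np.zeros(n)
for t in range(2, n):
    dy[t] = 0.5 * dy[t - 1] - 0.3 * dy[t - 2] + shocks[t]
prices = 100.0 + np.concatenate([[0.0], np.cumsum(dy)])

coef = fit_arima_210(prices)
pred = one_step_forecasts(prices, coef)
actual, naive = prices[3:], prices[2:-1]           # naive: tomorrow equals today
print("coefficients:", np.round(coef, 2))
print("ARIMA RMSE/MAE:", rmse(actual, pred), mae(actual, pred))
print("naive RMSE/MAE:", rmse(actual, naive), mae(actual, naive))
```

The naive forecast is the special case c = phi1 = phi2 = 0, so the least-squares fit can never do worse than it in-sample. Only out-of-sample errors show a genuine improvement.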
*******************************************************************************************************************************


#### Observations :
1. Differencing made the data more stable.
2. Predictions matched the test data well.
3. Residual plots looked smoother, with less variation.
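Observation 1, along with mapping ARIMA forecasts back to prices, rests on first differencing and its inverse. The sketch below uses made-up prices. The random-walk comparison at the end is a simulated illustration of why the differenced series is more stable, not project data.

```python
import numpy as np

def difference(series):
    """First difference d_t = y_t - y_{t-1}; the output is one element shorter."""
    return np.diff(series)

def undifference(first_value, diffs):
    """Invert first differencing by cumulatively summing from the starting value."""
    return np.concatenate([[first_value], first_value + np.cumsum(diffs)])

prices = np.array([100.0, 101.5, 101.0, 102.5, 104.0])   # made-up prices
d = difference(prices)
restored = undifference(prices[0], d)
print(d)
print(np.allclose(restored, prices))  # True

# A random walk wanders without a fixed level, so its sample variance is large;
# its differences are just the i.i.d. shocks, with variance near 1.
rng = np.random.default_rng(5)
walk = np.cumsum(rng.normal(size=1000))
print(round(walk.var(), 1), round(difference(walk).var(), 2))
```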
*******************************************************************************************************************************


### Conclusion and Future Direction :
*******************************************************************************************************************************
#### Learnings :
We learnt that using ARIMA before deep learning helps to clean the data and improve predictions. Tuning the parameters is also very important for better results.
*******************************************************************************************************************************
#### Results Discussion :
The updated method produced more accurate results than the original approach. ARIMA preprocessing improved the CNN-LSTM model's performance by reducing noise in the data.


*******************************************************************************************************************************
#### Limitations :
The method still depends on parameter tuning and may not fully capture sudden market changes. It has also been tested on only this dataset, so results may differ for other stocks.



*******************************************************************************************************************************
#### Future Extension :
In the future, we can test this approach on more datasets, try seasonal models such as SARIMA, or add external factors such as news and economic data to improve prediction accuracy.
