# Dataset Characteristics
This notebook summarizes the characteristics of the dataset used for stock price prediction.

## Source
Historical stock data is downloaded from Yahoo Finance using the yfinance library.

## Time Frame
The data spans from January 1, 2021, to January 1, 2024.
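
A minimal sketch of the download step, assuming the `yfinance` package is installed. The helper name `download_history` and the ticker in the usage comment are hypothetical, since the source does not name a ticker. Note that yfinance treats the `end` date as exclusive, so the last row returned is the final trading day of 2023.

```python
START = "2021-01-01"
END = "2024-01-01"  # exclusive in yfinance: data stops at the last trading day before this date


def download_history(ticker: str, start: str = START, end: str = END):
    """Download daily OHLCV rows for one ticker (hypothetical helper)."""
    # Imported lazily so this module can be loaded without yfinance or network access.
    import yfinance as yf

    # Depending on the yfinance version, the returned columns may be a MultiIndex.
    return yf.download(ticker, start=start, end=end)


# Example (requires network access; "AAPL" is only an illustrative ticker):
# df = download_history("AAPL")
# print(df[["Open", "High", "Low", "Close", "Volume"]].head())
```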

## Attributes/Features
- **Date**: The specific trading day.
- **Open**: The opening price of the stock on the trading day.
- **High**: The highest price of the stock during the trading day.
- **Low**: The lowest price of the stock during the trading day.
- **Close**: The closing price of the stock on the trading day.
- **Volume**: The number of shares traded during the trading day.

### Technical Indicators
- **SMA (Simple Moving Average)**: Calculated over a 15-day window.
- **EMA (Exponential Moving Average)**: Calculated over a 15-day window.
- **RSI (Relative Strength Index)**: Calculated over a 14-day window.
- **MACD (Moving Average Convergence Divergence) Diff**: The difference between the MACD line and the signal line.
- **Bollinger High Band**: Upper band of Bollinger Bands calculated over a specific window.
- **Bollinger Low Band**: Lower band of Bollinger Bands calculated over a specific window.
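
The indicators above can be sketched with plain pandas. This is an illustration under stated assumptions, not necessarily the notebook's own code: the column names are hypothetical, the MACD spans (12/26/9) and the Bollinger settings (20-day window, 2 standard deviations) are conventional defaults that the source does not confirm, and the RSI uses Wilder-style exponential smoothing.

```python
import numpy as np
import pandas as pd


def add_indicators(df, ma_window=15, rsi_window=14, bb_window=20, bb_std=2.0):
    """Return a copy of df with indicator columns derived from 'Close'."""
    out = df.copy()
    close = out["Close"]

    # Simple and exponential moving averages over the same 15-day window.
    out["SMA"] = close.rolling(ma_window).mean()
    out["EMA"] = close.ewm(span=ma_window, adjust=False).mean()

    # RSI: ratio of smoothed average gains to smoothed average losses.
    delta = close.diff()
    gain = delta.clip(lower=0)
    loss = -delta.clip(upper=0)
    avg_gain = gain.ewm(alpha=1 / rsi_window, adjust=False, min_periods=rsi_window).mean()
    avg_loss = loss.ewm(alpha=1 / rsi_window, adjust=False, min_periods=rsi_window).mean()
    out["RSI"] = 100 - 100 / (1 + avg_gain / avg_loss)

    # MACD diff: MACD line minus its signal line (assumed 12/26/9 spans).
    macd = close.ewm(span=12, adjust=False).mean() - close.ewm(span=26, adjust=False).mean()
    signal = macd.ewm(span=9, adjust=False).mean()
    out["MACD_diff"] = macd - signal

    # Bollinger Bands: rolling mean plus/minus a multiple of the rolling std (assumed 20, 2).
    mid = close.rolling(bb_window).mean()
    std = close.rolling(bb_window).std(ddof=0)
    out["BB_high"] = mid + bb_std * std
    out["BB_low"] = mid - bb_std * std
    return out


# Synthetic random-walk prices stand in for real data.
rng = np.random.default_rng(0)
prices = pd.DataFrame({"Close": 100 + rng.normal(0, 1, 100).cumsum()})
enriched = add_indicators(prices).dropna()
```

The rolling windows leave missing values in the first rows, which is why the rows are dropped with `dropna()` before any further processing.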

## Data Preparation
- The dataset is preprocessed to include technical indicators and drop any rows with missing values.
- **Feature Scaling**: RobustScaler scales each feature using its median and interquartile range, which makes the scaling less sensitive to outliers than mean/variance scaling.
- **Training Data**: Built from sliding windows of 60 days; each window is a sequence of feature rows used to predict the next day's closing price.
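
To illustrate why robust scaling helps with outliers, here is a small NumPy sketch of the computation that scikit-learn's `RobustScaler` performs with its default settings (center on the median, divide by the 25th to 75th percentile range). The function name `robust_scale` and the sample values are hypothetical.

```python
import numpy as np


def robust_scale(X):
    """Scale each column as (x - median) / IQR, mirroring RobustScaler's defaults."""
    X = np.asarray(X, dtype=float)
    median = np.median(X, axis=0)
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    iqr = q3 - q1
    # Avoid division by zero for constant columns.
    iqr = np.where(iqr == 0, 1.0, iqr)
    return (X - median) / iqr


# One feature column with a single extreme value.
volume = np.array([[10.0], [12.0], [11.0], [13.0], [1000.0]])
scaled = robust_scale(volume)
print(scaled.ravel())
```

The extreme value stays extreme after scaling, but it does not distort the scale of the typical values, because the median and interquartile range ignore it.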

## Training and Test Split
- The training dataset consists of windowed sequences built from the scaled historical data and is used to fit the LSTM model.
- The test dataset consists of sequences built the same way and is used to evaluate the model's predictions.
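
The source does not state how the split is made or what ratio is used. A common choice for time series is a chronological split with no shuffling, so that the test windows come strictly after the training windows. The sketch below assumes that approach; the 80% training fraction and the helper name `chronological_split` are assumptions.

```python
import numpy as np


def chronological_split(X, y, train_fraction=0.8):
    """Split windowed samples in time order: earliest part for training, the rest for testing."""
    split = int(len(X) * train_fraction)
    return X[:split], X[split:], y[:split], y[split:]


# Ten dummy samples, each a window of 3 time steps with 2 features.
X = np.arange(10 * 3 * 2).reshape(10, 3, 2)
y = np.arange(10)
X_train, X_test, y_train, y_test = chronological_split(X, y)
print(X_train.shape, X_test.shape)
```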

## Data Preprocessing Steps
- **Download Data**: Historical stock data is downloaded for the given time frame.
- **Add Technical Indicators**: Calculate SMA, EMA, RSI, MACD, Bollinger Bands, etc., and add them as new columns to the dataset.
- **Handle Missing Values**: Drop rows with missing values resulting from technical indicator calculations.
- **Feature Scaling**: Apply RobustScaler so that the features share a comparable range and outliers have less influence.
- **Windowing**: Create sequences of `window_size` days to use as input features for the LSTM model, with the target being the closing price of the next day.
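
The windowing step can be sketched as follows. This is a minimal illustration, assuming the scaled features are a 2-D array with one row per day and the target is the (scaled) closing price; the function name `make_windows` is hypothetical.

```python
import numpy as np


def make_windows(features, target, window_size=60):
    """Pair each run of `window_size` consecutive rows with the target of the following row."""
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    X, y = [], []
    for end in range(window_size, len(features)):
        X.append(features[end - window_size:end])  # rows end-window_size .. end-1
        y.append(target[end])                      # value on the next day
    return np.array(X), np.array(y)


# 100 dummy days with 2 features; the first feature doubles as the target.
features = np.arange(200).reshape(100, 2)
X, y = make_windows(features, features[:, 0], window_size=60)
print(X.shape, y.shape)
```

With `n` rows and a window of 60, this yields `n - 60` samples shaped `(60, n_features)`, which matches the `(samples, time steps, features)` input layout that LSTM layers typically expect.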