# Stock Market Analysis and Prediction: Leveraging Data Science for Insights

### Table of Contents

By following this roadmap, readers can gain a comprehensive understanding of the stock market, learn how to leverage Data Science for financial analysis, and explore how to optimize investment strategies for better risk-adjusted returns. Let's get started!

1. [Introduction](#introduction)  

2. [Data Collection and Preprocessing](#data-collection-and-preprocessing)  
   - [Key Libraries](#key-libraries)  
   - [Stock Data Retrieval with APIs](#data-retrieval-with-apis-stock-daily-returns)  
   - [Data Cleaning and Formatting](#data-cleaning-and-formatting-stock-daily-returns)  
   - [Feature Engineering for Machine Learning](#feature-engineering-for-machine-learning)  
     - [Technical Indicators](#technical-indicators)  
     - [Fundamental Metrics](#fundamental-metrics)  
   - [Exploratory Data Analysis (EDA)](#exploratory-data-analysis-eda)  
     - [Cumulative Returns](#cumulative-returns)  
     - [Skewness and Kurtosis](#skewness-and-kurtosis)  
     - [Pairplots and Correlation Matrix](#pairplots-and-correlation-matrix)  

3. [Descriptive Financial Metrics](#descriptive-financial-metrics)  
   - [Performance Metrics](#performance-metrics)  
     - [Beta and Alpha](#beta-and-alpha)  
     - [Sharpe Ratio](#sharpe-ratio)  
   - [Risk Analysis and Volatility](#risk-analysis-and-volatility)  

4. [Machine Learning for Financial Insights](#machine-learning-for-financial-insights)  
   - [Predictive Modeling](#predictive-modeling)  
     - [Stock Price Forecasting](#stock-price-forecasting)  
     - [Volatility Prediction](#volatility-prediction)  
   - [Classification Tasks](#classification-tasks)  
     - [Stock Movement Prediction](#stock-movement-prediction)  
     - [Risk Categorization](#risk-categorization)  
   - [Clustering for Stock Grouping](#clustering-for-stock-grouping)  
   - [Model Evaluation Metrics](#model-evaluation-metrics)  

5. [Portfolio Optimization](#portfolio-optimization)  
   - [What is a Portfolio?](#what-is-a-portfolio)  
   - [Markowitz Mean-Variance Optimization](#markowitz-mean-variance-optimization)  
   - [Black-Litterman Allocation Model](#black-litterman-allocation-model)  
     - [Prior](#prior)  
     - [Views](#views)  
     - [Confidences](#confidences)  
   - [Reinforcement Learning for Portfolio Optimization](#reinforcement-learning-for-portfolio-optimization)  

6. [Backtesting Investment Strategies](#backtesting-investment-strategies)  
   - [Technical Strategy Backtesting](#technical-strategy-backtesting)  
     - [RSI and Moving Average Crossover](#rsi-and-moving-average-crossover)  
     - [Hourly, Daily, and Weekly Data](#hourly-daily-and-weekly-data)  
   - [Comparing ML-Based vs. Traditional Approaches](#comparing-ml-based-vs-traditional-approaches)  

7. [Advanced Machine Learning Applications](#advanced-machine-learning-applications)  
   - [Deep Learning for Sequential Data](#deep-learning-for-sequential-data)  
   - [Anomaly Detection in Stock Behavior](#anomaly-detection-in-stock-behavior)  
   - [Reinforcement Learning for Dynamic Strategies](#reinforcement-learning-for-dynamic-strategies)  

8. [Insights and Conclusions](#insights-and-conclusions)  
   - [Summary of Findings](#summary-of-findings)  
   - [Actionable Insights for Investors](#actionable-insights-for-investors)  
   - [Limitations and Future Work](#limitations-and-future-work)  


# Introduction
____

This notebook explores the intersection of Data Science and finance through the analysis of stock data for four prominent technology companies: Apple (AAPL), Microsoft (MSFT), Google (GOOGL), and Amazon (AMZN).

**Objective:**

The primary objective of this project is to demonstrate how Python and Data Science techniques can be effectively utilized to analyze stock market trends, evaluate investment performance, and ultimately optimize investment strategies.

**Problem Statement:**

The stock market presents a complex and dynamic environment. Investors face numerous challenges, including:

* **Identifying profitable investment opportunities:** Understanding market trends, evaluating company performance, and predicting future stock prices are crucial for making informed investment decisions.
* **Managing risk:** Effectively assessing and mitigating investment risks is essential to protect capital and achieve long-term financial goals.
* **Optimizing portfolio allocation:** Determining the optimal allocation of assets across different stocks and asset classes is a critical aspect of portfolio management.

This project aims to address these challenges by:

* **Collecting and analyzing historical stock data:** Utilizing APIs to retrieve stock prices, financial statements, and other relevant data.

* **Employing data preprocessing and feature engineering techniques:** Cleaning, transforming, and creating new features from raw data to enhance analysis and model performance.

* **Calculating key financial metrics:** Evaluating stock performance using metrics such as beta, alpha, Sharpe Ratio, and volatility.

* **Implementing machine learning models:** Developing predictive models for stock price forecasting, volatility prediction, and stock movement classification.

* **Optimizing portfolio allocation:** Applying portfolio optimization techniques, including Markowitz Mean-Variance Optimization and Black-Litterman allocation, to construct efficient portfolios.

* **Backtesting investment strategies:** Evaluating the performance of different trading strategies, including technical analysis-based and machine learning-based approaches, using historical data.

By following the steps outlined in this notebook, readers will gain a practical understanding of how Data Science can be applied to the financial domain, empowering them to make more informed investment decisions and potentially improve their investment outcomes.





For a finance data analyst, knowing how to analyze the stock market is a fundamental skill that bridges the gap between raw data and actionable financial insights. This notebook demonstrates essential financial-analysis skills, using Python and data science techniques to extract meaningful insights from stock market data. Through practical examples, it shows how data-driven decisions can be applied to optimize investments and improve portfolio management.

For this analysis, I have carefully chosen four prominent stocks: **Apple (AAPL)**, **Microsoft (MSFT)**, **Google (Alphabet, GOOGL)**, and **Amazon (AMZN)**. These companies share key characteristics that make them highly relevant for financial analysis:

1. **Market Leaders**: These stocks belong to some of the largest companies in the world by market capitalization, dominating their respective industries: technology, e-commerce, and cloud computing.
   
2. **Innovation and Growth**: They are known for their consistent innovation and ability to adapt, making them pivotal players in driving technological and economic trends globally.

3. **Broad Investor Interest**: Their stocks are widely traded and held by a diverse range of institutional and retail investors, making them representative of broader market movements.

4. **Global Influence**: These companies operate on a global scale, impacting multiple sectors and economies, which adds complexity and richness to their market behavior.

Through this analysis, I aim to provide a detailed exploration of their historical stock performance, identify key financial metrics, and develop predictive insights. The focus is not just on theoretical concepts but on practical, actionable results that showcase the power of combining finance and data science.


# Data Collection and Preprocessing
____

## Key Libraries

The success of this project hinges on leveraging powerful Python libraries that enable financial analysis, portfolio optimization, and technical analysis. These libraries form the backbone of the notebook, facilitating data retrieval, manipulation, visualization, and modeling. Below is an overview of the key libraries used and their specific contributions to the project:


- **`yfinance`** 
  A popular library that provides access to historical stock price data, financial statements, and other key metrics for a wide range of stocks. It is a valuable resource for extracting stock data directly from Yahoo Finance for analysis.
  ```python
  !pip install yfinance
  import yfinance as yf
  ```

- **`QuantStats`** 
  This library specializes in quantitative portfolio analytics. Given a series of returns (for example, from a stock or from a backtested strategy), it computes performance and risk metrics such as the Sharpe ratio and drawdowns, and produces plots and tear-sheet reports. It also includes helpers for downloading returns data.
  ```python
  !pip install quantstats
  import quantstats as qs
  ```


- **`PyPortfolioOpt`**
  This library focuses on portfolio optimization, enabling users to construct optimal portfolios based on various criteria such as risk, return, and constraints. It is a powerful tool for optimizing investment strategies, including mean-variance optimization and Black-Litterman models.

  ```python
  !pip install PyPortfolioOpt
  from pypfopt.efficient_frontier import EfficientFrontier
  from pypfopt import risk_models
  from pypfopt import expected_returns
  from pypfopt import plotting
  ```


- **`TA-Lib`** 
  TA-Lib (Technical Analysis Library) offers a wide range of technical indicators for analyzing stock price data, including moving averages, RSI, MACD, Bollinger Bands, and many other commonly used indicators. The `TA-Lib` Python package is a wrapper around the TA-Lib C library, which may need to be installed separately before `pip install TA-Lib` succeeds.

  ```python
  !pip install TA-Lib
  import talib
  ```

- **`Plotly`**
  This library offers interactive visualization capabilities, allowing users to create dynamic and engaging plots for exploring stock data. It provides tools for creating interactive charts, dashboards, and visualizations.

  ```python
  !pip install plotly
  import plotly.express as px
  import plotly.graph_objects as go
  ```

Other commonly used libraries: 

- **`Pandas`**
  This library is essential for data manipulation and analysis, allowing us to handle and preprocess stock data efficiently. It provides powerful data structures and functions for cleaning, transforming, and analyzing financial data.

- **`NumPy`**
  A fundamental library for numerical computing, NumPy provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays.

- **`Matplotlib and Seaborn`**
  This combination of libraries is used for data visualization, enabling the creation of informative plots, charts, and graphs to visualize trends, patterns, and relationships in the stock data.

- **`scikit-learn`**
  A machine learning library that provides a wide range of tools for building predictive models, evaluating performance, and optimizing parameters. It includes functions for regression, classification, clustering, and model evaluation.

By combining these libraries with Python's robust data science capabilities, we can unlock the full potential of financial analysis and stock market prediction. The subsequent sections will delve into the process of collecting, preprocessing, and analyzing stock data to derive actionable insights for investors.

In [1]:
# Data Handling and Statistical Analysis
import pandas as pd
from pandas_datareader import data
import numpy as np
from scipy import stats
import skimpy as sp
pd.set_option('display.max_columns', None)


In [2]:
# Data Visualization
# Standard visualization libraries
import matplotlib.pyplot as plt
import seaborn as sns

# Interactive visualization libraries
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from plotly.offline import init_notebook_mode
init_notebook_mode(connected=True)  # Enable Plotly offline


In [3]:
# Financial Data and Analysis
import ta
import talib
import quantstats as qs
import yfinance as yf
from pypfopt.efficient_frontier import EfficientFrontier
from pypfopt import risk_models, expected_returns
from pypfopt import black_litterman, BlackLittermanModel


In [4]:
# Machine Learning and Optimization
import optuna
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.datasets import make_classification
# HalvingGridSearchCV is experimental: this enabling import must run before it is imported
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, HalvingGridSearchCV
from sklearn.linear_model import SGDClassifier

## Data Retrieval with APIs: Stock Daily Returns

To initiate our analysis, we will retrieve historical daily returns for four prominent technology companies:
- Apple (AAPL)
- Microsoft (MSFT)
- Google/Alphabet (GOOGL)
- Amazon (AMZN)

We will use the `QuantStats` library to download these returns; it retrieves them from Yahoo Finance (via `yfinance`), a popular source of financial data.

These companies were selected for their large market capitalizations, technological innovation, and global influence, which make them representative of the technology sector and attractive for investment analysis. This first dataset contains **daily returns** only; daily prices (open, high, low, close) and trading volume are retrieved separately with `yfinance` in a later step. Let's begin by downloading the returns.

The timeframe for this analysis will be from January 1, 2010, to December 31, 2021, covering over a decade of historical stock performance. This extended period will allow us to capture long-term trends, volatility, and key events that have shaped the stock market landscape. By analyzing this data, we can gain valuable insights into the historical performance of these companies and identify patterns that may inform future investment decisions.
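The values returned by `qs.utils.download_returns` are consistent with daily simple returns, $r_t = P_t / P_{t-1} - 1$. As a quick sanity check, the sketch below applies `pct_change` to the first two AAPL adjusted closes, copied from the `aapl_price.head()` output of the price data. It only illustrates the formula; it does not claim which price column QuantStats uses internally.

```python
import pandas as pd

# First two AAPL adjusted closes, copied from the aapl_price.head() output
adj_close = pd.Series(
    [6.447412490844727, 6.458560466766357],
    index=pd.to_datetime(["2010-01-04", "2010-01-05"]),
)

# Daily simple return: r_t = P_t / P_{t-1} - 1 (the first value has no prior price, so it is NaN)
daily_returns = adj_close.pct_change()
print(daily_returns.round(6))
```

The second value rounds to 0.001729, the same 2010-01-05 return that `aapl_returns.head()` displays.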


In [5]:
# Define the time window
start = '2010-01-01'
end = '2021-12-31'

# Get returns for each stock within the defined time window
aapl = qs.utils.download_returns('AAPL').loc[start:end]
msft = qs.utils.download_returns('MSFT').loc[start:end]
googl = qs.utils.download_returns('GOOGL').loc[start:end]
amzn = qs.utils.download_returns('AMZN').loc[start:end]

# Save data to CSV files
aapl.to_csv('aapl_returns.csv')
msft.to_csv('msft_returns.csv')
googl.to_csv('googl_returns.csv')
amzn.to_csv('amzn_returns.csv')



In [6]:
# Load data from CSV files
aapl_returns = pd.read_csv('aapl_returns.csv', index_col=0, parse_dates=True) 
msft_returns = pd.read_csv('msft_returns.csv', index_col=0, parse_dates=True) 
googl_returns = pd.read_csv('googl_returns.csv', index_col=0, parse_dates=True) 
amzn_returns = pd.read_csv('amzn_returns.csv', index_col=0, parse_dates=True) 

## Data Cleaning and Formatting: Stock Daily Returns

Timezones in DatetimeIndex:    
When working with financial data, it is crucial to be aware of timezones. Our returns data is indexed by timezone-aware dates in UTC, so we convert the timezone-aware `DatetimeIndex` to a timezone-naive `DatetimeIndex`. Because every timestamp falls at midnight UTC, the calendar dates are unchanged by the conversion.

A timezone-naive DatetimeIndex does not have any timezone information associated with it. This conversion is essential for consistency and compatibility with various financial analysis tools and libraries.
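To see what `tz_convert(None)` does, the sketch below compares it with `tz_localize(None)` on two tiny indexes. The New York example is purely illustrative and is not taken from this dataset. For midnight-UTC timestamps like ours, both methods give the same dates; for a non-UTC index they differ, which is worth checking before combining data from different sources.

```python
import pandas as pd

# Midnight-UTC timestamps, like the index read back from the returns CSVs
idx_utc = pd.DatetimeIndex(["2010-01-04", "2010-01-05"], tz="UTC")

# tz_convert(None): convert to UTC, then drop the timezone
naive_convert = idx_utc.tz_convert(None)
# tz_localize(None): drop the timezone, keeping the local wall-clock time
naive_localize = idx_utc.tz_localize(None)
print(naive_convert.equals(naive_localize))  # identical for an index already in UTC

# A non-UTC index (midnight in New York) shows the difference
idx_ny = pd.DatetimeIndex(["2010-01-04"], tz="America/New_York")
print(idx_ny.tz_convert(None)[0])   # shifted to UTC wall time (05:00)
print(idx_ny.tz_localize(None)[0])  # New York wall time kept (00:00)
```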





In [7]:
# view the data
aapl_returns.head()

| Date | AAPL |
|---|---|
| 2010-01-04 00:00:00+00:00 | 0.015565 |
| 2010-01-05 00:00:00+00:00 | 0.001729 |
| 2010-01-06 00:00:00+00:00 | -0.015906 |
| 2010-01-07 00:00:00+00:00 | -0.001849 |
| 2010-01-08 00:00:00+00:00 | 0.006648 |


In [8]:
# converting time zone to none
aapl_returns.index = aapl_returns.index.tz_convert(None)
msft_returns.index = msft_returns.index.tz_convert(None)
googl_returns.index = googl_returns.index.tz_convert(None)
amzn_returns.index = amzn_returns.index.tz_convert(None)

In [9]:
# view the result of the conversion
aapl_returns.head()

| Date | AAPL |
|---|---|
| 2010-01-04 | 0.015565 |
| 2010-01-05 | 0.001729 |
| 2010-01-06 | -0.015906 |
| 2010-01-07 | -0.001849 |
| 2010-01-08 | 0.006648 |


In [10]:
# Rename the columns
aapl_returns.columns = ['returns'] 
msft_returns.columns = ['returns']
googl_returns.columns = ['returns']
amzn_returns.columns = ['returns']

In [11]:
# Display the first few rows of aapl_returns
aapl_returns.head()

| Date | returns |
|---|---|
| 2010-01-04 | 0.015565 |
| 2010-01-05 | 0.001729 |
| 2010-01-06 | -0.015906 |
| 2010-01-07 | -0.001849 |
| 2010-01-08 | 0.006648 |


## Data Retrieval with APIs: Historical Stock Prices

The next step is to retrieve historical stock price data for the selected companies using the `yfinance` library.    

We will extract daily stock prices, including the opening, high, low, closing prices, and trading volume, for the specified timeframe. This data will serve as the foundation for calculating technical indicators and fundamental metrics.

Because `yf.download` returns its columns under a two-level (Price, Ticker) header, the saved CSV files need some cleaning when they are read back. As with the returns, we will also convert the timezone-aware DatetimeIndex to a timezone-naive DatetimeIndex for consistency and compatibility with various financial analysis tools and libraries.

In [12]:
#import yfinance
import yfinance as yf

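# Define the time window
# Note: yf.download treats `end` as exclusive, so the last session downloaded here is 2021-12-30
# (pass end='2022-01-01' to include 2021-12-31)
start = '2010-01-01'
end = '2021-12-31'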

# Get historical data for each stock within the defined time window
aapl_historical = yf.download('AAPL', start=start, end=end)
msft_historical = yf.download('MSFT', start=start, end=end)
googl_historical = yf.download('GOOGL', start=start, end=end)
amzn_historical = yf.download('AMZN', start=start, end=end)

# Save data to CSV files
aapl_historical.to_csv('aapl_price.csv')
msft_historical.to_csv('msft_price.csv')
googl_historical.to_csv('googl_price.csv')
amzn_historical.to_csv('amzn_price.csv')



In [13]:
# Load data from CSV files
aapl_price = pd.read_csv('aapl_price.csv', index_col=0, parse_dates=True) 
msft_price = pd.read_csv('msft_price.csv', index_col=0, parse_dates=True) 
googl_price = pd.read_csv('googl_price.csv', index_col=0, parse_dates=True) 
amzn_price = pd.read_csv('amzn_price.csv', index_col=0, parse_dates=True) 

## Data Cleaning and Formatting: Historical Stock Data

In [14]:
# view the data to check if it was loaded correctly
aapl_price.head()

| Price | Adj Close | Close | High | Low | Open | Volume |
|---|---|---|---|---|---|---|
| Ticker | AAPL | AAPL | AAPL | AAPL | AAPL | AAPL |
| Date | | | | | | |
| 2010-01-04 00:00:00+00:00 | 6.447412490844727 | 7.643214225769043 | 7.660714149475098 | 7.585000038146973 | 7.622499942779541 | 493729600 |
| 2010-01-05 00:00:00+00:00 | 6.458560466766357 | 7.656428813934326 | 7.699643135070801 | 7.6160712242126465 | 7.664286136627197 | 601904800 |
| 2010-01-06 00:00:00+00:00 | 6.355825901031494 | 7.534643173217773 | 7.68678617477417 | 7.526785850524902 | 7.656428813934326 | 552160000 |
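The 'Ticker' and 'Date' rows in this preview are not data: `yf.download` returned a two-level (Price, Ticker) column header, and reading the CSV back with a single header row turns those levels into rows and leaves every column as `object`. As an alternative to the row-dropping and type-conversion steps that follow, the sketch below reads both header rows directly with `header=[0, 1]`. It uses a hypothetical two-row sample with the same layout so that it runs without a download; treat it as a sketch, not a drop-in replacement that has been tested against these exact files.

```python
import io

import pandas as pd

# Hypothetical sample with the same (Price, Ticker) column layout that yf.download produced here
cols = pd.MultiIndex.from_product(
    [["Adj Close", "Close", "Volume"], ["AAPL"]], names=["Price", "Ticker"]
)
sample = pd.DataFrame(
    [[6.447412, 7.643214, 493729600], [6.458560, 7.656429, 601904800]],
    index=pd.DatetimeIndex(["2010-01-04", "2010-01-05"], name="Date"),
    columns=cols,
)
csv_text = sample.to_csv()  # writes the Price / Ticker / Date header rows, as in the saved files

# Read both header rows as a column MultiIndex instead of as data rows
prices = pd.read_csv(io.StringIO(csv_text), header=[0, 1], index_col=0, parse_dates=True)
# Drop the second ('Ticker') level to get flat column names with numeric dtypes
prices.columns = prices.columns.droplevel(1)
print(prices.dtypes)
```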


In [15]:
# Drop 'Adj Close'; the indicators below use 'Close', which Yahoo Finance adjusts for splits but not for dividends
aapl_price = aapl_price.drop(['Adj Close'], axis=1)
msft_price = msft_price.drop(['Adj Close'], axis=1)
googl_price = googl_price.drop(['Adj Close'], axis=1)
amzn_price = amzn_price.drop(['Adj Close'], axis=1)

In [16]:
# Move the index into a regular 'Price' column (it holds the dates plus the 'Ticker' and 'Date' header rows read as data)
aapl_price.reset_index(inplace=True)
msft_price.reset_index(inplace=True)
googl_price.reset_index(inplace=True)
amzn_price.reset_index(inplace=True)

In [17]:
# clean the columns headers to make them uniform
def clean_columns_headers(df):
    df.columns = df.columns.str.strip()
    df.columns = df.columns.str.lower()
    df.columns = df.columns.str.replace(' ', '_')
    return df

aapl_price = clean_columns_headers(aapl_price)
msft_price = clean_columns_headers(msft_price)
googl_price = clean_columns_headers(googl_price)
amzn_price = clean_columns_headers(amzn_price)


In [18]:
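# Drop the header rows that were read as data: dropna removes the 'Date' row (index 1, empty apart
# from its label), then the 'Ticker' row (index 0) is dropped explicitly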
def drop_rows(df):
    df.dropna(inplace=True)
    df.drop(index=0, inplace=True)
    return df

aapl_price = drop_rows(aapl_price)
msft_price = drop_rows(msft_price)
googl_price = drop_rows(googl_price)
amzn_price = drop_rows(amzn_price)

In [19]:
# Rename the first column to 'date'
aapl_price.rename(columns={'price':'date'}, inplace=True)
msft_price.rename(columns={'price':'date'}, inplace=True)
googl_price.rename(columns={'price':'date'}, inplace=True)
amzn_price.rename(columns={'price':'date'}, inplace=True)

In [20]:
# view the data to check the changes
aapl_price.head()

| | date | close | high | low | open | volume |
|---|---|---|---|---|---|---|
| 2 | 2010-01-04 00:00:00+00:00 | 7.643214225769043 | 7.660714149475098 | 7.585000038146973 | 7.622499942779541 | 493729600 |
| 3 | 2010-01-05 00:00:00+00:00 | 7.656428813934326 | 7.699643135070801 | 7.6160712242126465 | 7.664286136627197 | 601904800 |
| 4 | 2010-01-06 00:00:00+00:00 | 7.534643173217773 | 7.68678617477417 | 7.526785850524902 | 7.656428813934326 | 552160000 |
| 5 | 2010-01-07 00:00:00+00:00 | 7.520713806152344 | 7.5714287757873535 | 7.466071128845215 | 7.5625 | 477131200 |
| 6 | 2010-01-08 00:00:00+00:00 | 7.570713996887207 | 7.5714287757873535 | 7.466429233551025 | 7.510714054107666 | 447610800 |


In [21]:
# check the data types of the columns
aapl_price.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3020 entries, 2 to 3021
Data columns (total 6 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   date    3020 non-null   object
 1   close   3020 non-null   object
 2   high    3020 non-null   object
 3   low     3020 non-null   object
 4   open    3020 non-null   object
 5   volume  3020 non-null   object
dtypes: object(6)
memory usage: 165.2+ KB


In [22]:
# Convert the price and volume columns to float and the 'date' column to datetime
def convert_data_types(df):
    df['open'] = df['open'].astype(float)
    df['high'] = df['high'].astype(float)
    df['low'] = df['low'].astype(float)
    df['close'] = df['close'].astype(float)
    df['volume'] = df['volume'].astype(float)
    df['date'] = pd.to_datetime(df['date'])
    return df


# Apply the conversion to each stock's DataFrame
aapl_price = convert_data_types(aapl_price)
msft_price = convert_data_types(msft_price)
googl_price = convert_data_types(googl_price)
amzn_price = convert_data_types(amzn_price)

In [23]:
# Setting the date as the index
aapl_price.set_index('date', inplace=True)
msft_price.set_index('date', inplace=True)
googl_price.set_index('date', inplace=True)
amzn_price.set_index('date', inplace=True)

In [24]:
# convert the time zone to none
aapl_price.index = aapl_price.index.tz_convert(None)
msft_price.index = msft_price.index.tz_convert(None)
googl_price.index = googl_price.index.tz_convert(None)
amzn_price.index = amzn_price.index.tz_convert(None)


In [25]:
# view the data to check the changes
aapl_price.head()

| date | close | high | low | open | volume |
|---|---|---|---|---|---|
| 2010-01-04 | 7.643214 | 7.660714 | 7.585 | 7.6225 | 493729600.0 |
| 2010-01-05 | 7.656429 | 7.699643 | 7.616071 | 7.664286 | 601904800.0 |
| 2010-01-06 | 7.534643 | 7.686786 | 7.526786 | 7.656429 | 552160000.0 |
| 2010-01-07 | 7.520714 | 7.571429 | 7.466071 | 7.5625 | 477131200.0 |
| 2010-01-08 | 7.570714 | 7.571429 | 7.466429 | 7.510714 | 447610800.0 |


Thus, we have the daily returns and the historical prices of the four stocks. We can now proceed to derive additional financial features and perform exploratory data analysis to gain insights into stock market trends.

## Feature Engineering for Machine Learning

The success of machine learning models in financial analysis hinges on the quality and relevance of the features used for prediction. Feature engineering plays a critical role in extracting meaningful insights from raw data and enhancing the performance of predictive models. In this section, we will create a set of features that capture essential aspects of stock market behavior, including `technical indicators` and `fundamental metrics`.

The focus is on generating features that are relevant, informative, and predictive of future stock price movements. By combining technical indicators, fundamental metrics, and other relevant features, we aim to develop a comprehensive feature set that can be used for machine learning-based stock price forecasting, volatility prediction, and classification tasks.

### Technical Indicators

Technical indicators are mathematical calculations derived from historical price, volume, or other market data to provide insights into market trends, momentum, volatility, and volume. These indicators are essential for understanding market behavior and identifying potential opportunities for trading or investment.

* Categories of Technical Indicators
	1.	**Trend Indicators**: These indicators identify the direction of market movements and help traders determine whether a market is in an uptrend, downtrend, or consolidating. Examples include:
		- `Moving Averages (MA)`: Smooths out price data to identify trends over time.
		- `Exponential Moving Average (EMA)`: Gives more weight to recent prices for faster responses to price changes.
		- `MACD (Moving Average Convergence Divergence)`: Highlights changes in momentum and trend direction.
		- `Parabolic SAR`: Indicates potential reversals in market trends.	

	2.	**Momentum Indicators**: These indicators measure the speed and strength of price movements, helping traders identify overbought or oversold conditions. Examples include:
		- `Relative Strength Index (RSI)`: Measures the magnitude of recent price changes to evaluate overbought or oversold conditions.
		- `Stochastic Oscillator`: Compares a security's closing price to its price range over a specific period.
		- `Rate of Change (ROC)`: Measures the percentage change in price between the current price and a past price.

	3.	**Volatility Indicators**: These indicators quantify the degree of price fluctuations in the market, helping traders assess risk and potential price movements. Examples include:
		- `Bollinger Bands`: Consist of a moving average with upper and lower bands placed a multiple (typically two) of the rolling standard deviation above and below it.
		- `Average True Range (ATR)`: Measures market volatility by averaging the true range, the largest of high minus low, |high minus previous close|, and |low minus previous close|.
		- `Keltner Channels`: Similar to Bollinger Bands, but use average true range to set channel boundaries.

	4.	**Volume Indicators**: These indicators analyze trading volume to assess the strength of price movements and identify potential reversals. Examples include:
		- `On-Balance Volume (OBV)`: Adds volume on up days and subtracts it on down days, giving a running total used to confirm price trends.
		- `Accumulation/Distribution Line`: Combines price and volume data to assess the flow of money in and out of a security.
		- `Chaikin Money Flow (CMF)`: Measures the buying and selling pressure for a security.
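The volatility and volume definitions above can be made concrete with plain pandas. The sketch below runs on synthetic data, and its function names are hypothetical helpers, not TA-Lib's API. TA-Lib provides `BBANDS`, `ATR` and `OBV` for the same quantities; its warm-up handling and starting values may differ slightly from this version.

```python
import numpy as np
import pandas as pd


def bollinger_bands(close: pd.Series, window: int = 20, k: float = 2.0) -> pd.DataFrame:
    """Middle band = rolling mean; outer bands = mean +/- k rolling standard deviations."""
    mid = close.rolling(window).mean()
    std = close.rolling(window).std(ddof=0)  # population standard deviation
    return pd.DataFrame({"bb_mid": mid, "bb_upper": mid + k * std, "bb_lower": mid - k * std})


def average_true_range(high: pd.Series, low: pd.Series, close: pd.Series, window: int = 14) -> pd.Series:
    """ATR: Wilder-smoothed (alpha = 1/window) average of the true range."""
    prev_close = close.shift(1)
    true_range = pd.concat(
        [high - low, (high - prev_close).abs(), (low - prev_close).abs()], axis=1
    ).max(axis=1)  # NaNs on the first row are skipped, leaving high - low
    return true_range.ewm(alpha=1 / window, adjust=False, min_periods=window).mean()


def on_balance_volume(close: pd.Series, volume: pd.Series) -> pd.Series:
    """Running total of volume: added on up days, subtracted on down days, starting at 0."""
    direction = np.sign(close.diff()).fillna(0)  # +1 up, -1 down, 0 unchanged or first day
    return (direction * volume).cumsum()


# Synthetic OHLCV data for illustration only
rng = np.random.default_rng(42)
close = pd.Series(100 + rng.standard_normal(60).cumsum())
high = close + rng.uniform(0.1, 1.0, 60)
low = close - rng.uniform(0.1, 1.0, 60)
volume = pd.Series(rng.integers(1_000, 5_000, 60), dtype=float)

bands = bollinger_bands(close)
atr = average_true_range(high, low, close)
obv = on_balance_volume(close, volume)
print(pd.concat([bands, atr.rename("atr"), obv.rename("obv")], axis=1).tail())
```

The starting level of OBV is arbitrary; only its changes and direction are interpreted.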



#### Trend Indicators

- **Moving Averages (MA):**    
A moving average is a widely used technical indicator that smooths out price data to identify trends over time. It calculates the average price of a security over a specified period, providing a clearer picture of the underlying trend. Moving averages are commonly used to identify support and resistance levels, trend direction, and potential entry or exit points for trades.

In [26]:
# Simple Moving Average (SMA)
aapl_price['sma_20'] = aapl_price['close'].rolling(window=20).mean()
msft_price['sma_20'] = msft_price['close'].rolling(window=20).mean()
googl_price['sma_20'] = googl_price['close'].rolling(window=20).mean()
amzn_price['sma_20'] = amzn_price['close'].rolling(window=20).mean()

- **Exponential Moving Average (EMA):**   
The exponential moving average is a type of moving average that gives more weight to recent prices, making it more responsive to price changes. It is calculated by applying a smoothing factor to the previous period's EMA and the current price. The EMA reacts faster to price movements than the simple moving average, making it popular among traders looking for timely signals.

In [27]:
# Exponential Moving Average (EMA)
aapl_price['ema_20'] = aapl_price['close'].ewm(span=20, adjust=False).mean()
msft_price['ema_20'] = msft_price['close'].ewm(span=20, adjust=False).mean()
googl_price['ema_20'] = googl_price['close'].ewm(span=20, adjust=False).mean()
amzn_price['ema_20'] = amzn_price['close'].ewm(span=20, adjust=False).mean()
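The recursion described above can be checked by hand. With `span=20`, pandas uses the smoothing factor $\alpha = 2/(\text{span} + 1) \approx 0.0952$, and with `adjust=False` it computes $\text{EMA}_t = \alpha P_t + (1 - \alpha)\,\text{EMA}_{t-1}$, starting from the first price. The sketch below recomputes the first three values from the AAPL closes shown in the `aapl_price.head()` output.

```python
import pandas as pd

# First three AAPL closes, copied from the aapl_price.head() output
close = pd.Series([7.643214, 7.656429, 7.534643])
span = 20
alpha = 2 / (span + 1)  # smoothing factor implied by span=20

# Recursive definition: EMA_0 = P_0, EMA_t = alpha * P_t + (1 - alpha) * EMA_{t-1}
ema_manual = [close.iloc[0]]
for price in close.iloc[1:]:
    ema_manual.append(alpha * price + (1 - alpha) * ema_manual[-1])

# pandas with adjust=False applies the same recursion
ema_pandas = close.ewm(span=span, adjust=False).mean()
print(pd.DataFrame({"manual": ema_manual, "pandas": ema_pandas}).round(6))
```

Both columns agree with the 7.643214, 7.644473 and 7.634013 that appear in the `ema_20` column of the `aapl_price.head()` output.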

- **MACD (Moving Average Convergence Divergence)**  
The Moving Average Convergence Divergence (MACD) is a trend-following momentum indicator that shows the relationship between two moving averages of a security's price. It consists of the MACD line, signal line, and histogram. The MACD line is calculated by subtracting the 26-period EMA from the 12-period EMA, while the signal line is the 9-period EMA of the MACD line. The histogram represents the difference between the MACD line and the signal line. The MACD is used to identify changes in trend direction, momentum, and potential buy or sell signals.

In [28]:
# Define the parameters for the MACD calculation
fastperiod = 12
slowperiod = 26
signalperiod = 9

aapl_price['macd'], aapl_price['macd_signal'], _ = talib.MACD(aapl_price['close'], fastperiod, slowperiod, signalperiod)
msft_price['macd'], msft_price['macd_signal'], _ = talib.MACD(msft_price['close'], fastperiod, slowperiod, signalperiod)
googl_price['macd'], googl_price['macd_signal'], _ = talib.MACD(googl_price['close'], fastperiod, slowperiod, signalperiod)
amzn_price['macd'], amzn_price['macd_signal'], _ = talib.MACD(amzn_price['close'], fastperiod, slowperiod, signalperiod)
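`talib.MACD` returns three outputs; the third, discarded as `_` above, is the histogram, i.e. the MACD line minus the signal line. The pandas-only sketch below makes all three pieces explicit. It seeds its EMAs with the first price (`adjust=False`), whereas TA-Lib initialises its averages differently and leaves the first rows as NaN, so early values will not match TA-Lib exactly. The function name is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def macd_pandas(close: pd.Series, fast: int = 12, slow: int = 26, signal: int = 9) -> pd.DataFrame:
    """MACD line, signal line and histogram built from exponential moving averages."""
    ema_fast = close.ewm(span=fast, adjust=False).mean()
    ema_slow = close.ewm(span=slow, adjust=False).mean()
    macd_line = ema_fast - ema_slow                                # 12-period EMA minus 26-period EMA
    signal_line = macd_line.ewm(span=signal, adjust=False).mean()  # 9-period EMA of the MACD line
    return pd.DataFrame({
        "macd": macd_line,
        "macd_signal": signal_line,
        "macd_hist": macd_line - signal_line,  # what talib.MACD returns as its third output
    })


# Synthetic prices for illustration only
rng = np.random.default_rng(0)
close = pd.Series(100 + rng.standard_normal(100).cumsum())
macd_df = macd_pandas(close)
print(macd_df.tail())
```

A positive histogram means the MACD line is above its signal line.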

- **Parabolic SAR (SAR):**   
The parabolic SAR (stop and reverse) is a trend-following indicator that provides potential entry and exit points for trades. It appears as dots above or below the price chart, indicating the direction of the trend. When the dots are below the price, it suggests an uptrend, while dots above the price indicate a downtrend. The parabolic SAR is used to set trailing stop-loss orders and identify potential trend reversals.

In [33]:
aapl_price['sar'] = talib.SAR(aapl_price['high'], aapl_price['low'], acceleration=0.02, maximum=0.2)
msft_price['sar'] = talib.SAR(msft_price['high'], msft_price['low'], acceleration=0.02, maximum=0.2)
googl_price['sar'] = talib.SAR(googl_price['high'], googl_price['low'], acceleration=0.02, maximum=0.2)
amzn_price['sar'] = talib.SAR(amzn_price['high'], amzn_price['low'], acceleration=0.02, maximum=0.2)
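The `acceleration=0.02` and `maximum=0.2` arguments above control Wilder's update rule. In an uptrend, with $\mathrm{EP}_t$ the extreme point (the highest high of the current trend) and $\mathrm{AF}_t$ the acceleration factor, the stop moves toward the extreme point each period:

$$
\mathrm{SAR}_{t+1} = \mathrm{SAR}_t + \mathrm{AF}_t\left(\mathrm{EP}_t - \mathrm{SAR}_t\right),
\qquad
\mathrm{AF}_t = \min\left(0.02 + 0.02\,n_t,\ 0.2\right)
$$

where $n_t$ is the number of new extreme points recorded since the trend began. In a downtrend, $\mathrm{EP}_t$ is the lowest low and the same formula applies. When the price crosses the SAR, the trend flips, the SAR resets to the last extreme point, and the factor restarts at 0.02. This is a simplified statement of the rule: Wilder also caps an uptrend's SAR at the lows of the previous two bars (and a downtrend's at their highs). In `talib.SAR`, `acceleration` serves as both the starting factor and the step, while `talib.SAREXT` lets them be set separately.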

In [34]:
aapl_price.head()

| date | close | high | low | open | volume | sma_20 | ema_20 | macd | macd_signal | sar |
|---|---|---|---|---|---|---|---|---|---|---|
| 2010-01-04 | 7.643214 | 7.660714 | 7.585 | 7.6225 | 493729600.0 | NaN | 7.643214 | NaN | NaN | NaN |
| 2010-01-05 | 7.656429 | 7.699643 | 7.616071 | 7.664286 | 601904800.0 | NaN | 7.644473 | NaN | NaN | 7.585 |
| 2010-01-06 | 7.534643 | 7.686786 | 7.526786 | 7.656429 | 552160000.0 | NaN | 7.634013 | NaN | NaN | 7.699643 |
| 2010-01-07 | 7.520714 | 7.571429 | 7.466071 | 7.5625 | 477131200.0 | NaN | 7.623222 | NaN | NaN | 7.699643 |
| 2010-01-08 | 7.570714 | 7.571429 | 7.466429 | 7.510714 | 447610800.0 | NaN | 7.618222 | NaN | NaN | 7.6903 |


#### Momentum Indicators

* **Relative Strength Index (RSI):**
The Relative Strength Index (RSI) is a momentum oscillator that measures the speed and change of price movements. It ranges from 0 to 100 and is used to identify overbought or oversold conditions in a security. A high RSI value (above 70) indicates overbought conditions, while a low RSI value (below 30) suggests oversold conditions. The RSI is used to assess the strength of price movements and potential trend reversals.

```python
aapl_price['RSI'] = talib.RSI(aapl_price['close'], timeperiod=14)
msft_price['RSI'] = talib.RSI(msft_price['close'], timeperiod=14)
googl_price['RSI'] = talib.RSI(googl_price['close'], timeperiod=14)
amzn_price['RSI'] = talib.RSI(amzn_price['close'], timeperiod=14)
```
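To make the RSI calculation concrete, here is an illustrative re-implementation using Wilder's smoothing: the average gain and average loss are seeded with a simple mean of the first 14 price changes and then updated recursively. The function name `wilder_rsi` is ours. We expect it to track `talib.RSI` closely but have not verified it digit for digit, so treat it as a sketch for understanding rather than a drop-in replacement.

```python
import numpy as np
import pandas as pd


def wilder_rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """RSI with Wilder's smoothing (illustrative re-implementation)."""
    delta = close.diff()
    gains = delta.clip(lower=0.0).to_numpy()
    losses = (-delta.clip(upper=0.0)).to_numpy()

    avg_gain = np.full(len(close), np.nan)
    avg_loss = np.full(len(close), np.nan)
    if len(close) <= period:
        return pd.Series(avg_gain, index=close.index)

    # Seed: simple mean of the first `period` price changes (bars 1..period)
    avg_gain[period] = gains[1 : period + 1].mean()
    avg_loss[period] = losses[1 : period + 1].mean()

    # Wilder's recursion: new_avg = (prev_avg * (period - 1) + current) / period
    for i in range(period + 1, len(close)):
        avg_gain[i] = (avg_gain[i - 1] * (period - 1) + gains[i]) / period
        avg_loss[i] = (avg_loss[i - 1] * (period - 1) + losses[i]) / period

    # RSI = 100 - 100 / (1 + RS); an average loss of zero gives RS = inf and RSI = 100
    with np.errstate(divide="ignore", invalid="ignore"):
        rs = avg_gain / avg_loss
        rsi = 100.0 - 100.0 / (1.0 + rs)
    return pd.Series(rsi, index=close.index)


# A steadily rising series has no losses, so its RSI pins at 100 after the warm-up
rising = pd.Series(np.arange(1.0, 31.0))
print(wilder_rsi(rising).tail())
```

On the notebook's data, a comparison such as `np.allclose(wilder_rsi(aapl_price['close']).dropna(), aapl_price['RSI'].dropna())` is one way to check how closely it agrees with TA-Lib.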

* **Stochastic Oscillator:**   
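The Stochastic Oscillator is a momentum indicator that locates the latest close within the high-low range of a look-back window (14 periods here). Its two lines, %K and %D, both range from 0 to 100: %K measures where the close sits in that range (100 at the period high, 0 at the period low), and %D is a moving average of %K. `talib.STOCH` returns the "slow" variant, in which %K is itself smoothed with a 3-period moving average (`slowk`) and %D is a 3-period moving average of that line (`slowd`); `matype=0` selects a simple moving average. Like RSI, the oscillator is used to flag overbought (conventionally above 80) or oversold (below 20) conditions and potential trend reversals.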

```python
# Calculate Stochastic Oscillator for AAPL
aapl_price['slowk'], aapl_price['slowd'] = talib.STOCH(aapl_price['high'], aapl_price['low'], aapl_price['close'],
                                                       fastk_period=14, slowk_period=3, slowk_matype=0,
                                                       slowd_period=3, slowd_matype=0)

# Calculate Stochastic Oscillator for MSFT
msft_price['slowk'], msft_price['slowd'] = talib.STOCH(msft_price['high'], msft_price['low'], msft_price['close'],
                                                       fastk_period=14, slowk_period=3, slowk_matype=0,
                                                       slowd_period=3, slowd_matype=0)

# Calculate Stochastic Oscillator for GOOGL
googl_price['slowk'], googl_price['slowd'] = talib.STOCH(googl_price['high'], googl_price['low'], googl_price['close'],
                                                         fastk_period=14, slowk_period=3, slowk_matype=0,
                                                         slowd_period=3, slowd_matype=0)

# Calculate Stochastic Oscillator for AMZN
amzn_price['slowk'], amzn_price['slowd'] = talib.STOCH(amzn_price['high'], amzn_price['low'], amzn_price['close'],
                                                       fastk_period=14, slowk_period=3, slowk_matype=0,
                                                       slowd_period=3, slowd_matype=0)
```
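The slow stochastic can be written directly with pandas rolling windows, which shows what the `STOCH` parameters control. This is a minimal sketch under our own assumptions: the function name `slow_stochastic` is hypothetical, and flat windows (highest high equal to lowest low) are left as NaN here, which TA-Lib may handle differently. One visible difference: TA-Lib aligns both outputs to the same start, which is why `slowk` and `slowd` each show 17 missing values in the `isnull()` summary further down, whereas this version leaves `slowk` with only 15.

```python
import numpy as np
import pandas as pd


def slow_stochastic(high, low, close, fastk_period=14, slowk_period=3, slowd_period=3):
    """Slow stochastic with simple moving averages (mirrors matype=0)."""
    lowest_low = low.rolling(fastk_period).min()
    highest_high = high.rolling(fastk_period).max()
    # Guard against a flat window (highest high == lowest low): leave %K as NaN there
    price_range = (highest_high - lowest_low).replace(0, np.nan)
    fastk = 100 * (close - lowest_low) / price_range
    slowk = fastk.rolling(slowk_period).mean()  # smoothed %K
    slowd = slowk.rolling(slowd_period).mean()  # %D: moving average of slow %K
    return slowk, slowd


# Synthetic steady uptrend: close rises by 1 per bar, high/low sit 1 above/below
close = pd.Series(np.arange(40, dtype=float))
high, low = close + 1, close - 1
slowk, slowd = slow_stochastic(high, low, close)
print(slowd.dropna().head())
```

In this steady uptrend the close always sits 14 points above the 14-bar low inside a 15-point range, so both lines settle at about 93.3.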


* **Rate of Change (ROC):**   
The Rate of Change (ROC) is a momentum oscillator that measures the percentage change between the current close and the close a fixed number of periods earlier (10 here): `ROC = (close_t / close_(t-10) - 1) * 100`. Positive values indicate upward momentum and negative values downward momentum, so ROC is used to identify trends, momentum shifts, and potential buy or sell signals such as zero-line crossings.

```python
# Calculate Rate of Change (ROC) for each stock
aapl_price['ROC'] = talib.ROC(aapl_price['close'], timeperiod=10)
msft_price['ROC'] = talib.ROC(msft_price['close'], timeperiod=10)
googl_price['ROC'] = talib.ROC(googl_price['close'], timeperiod=10)
amzn_price['ROC'] = talib.ROC(amzn_price['close'], timeperiod=10)
```
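The ROC formula is a one-liner in pandas. A minimal sketch; the function name `rate_of_change` is hypothetical, and it should produce the same leading NaNs (10 for a 10-period window) that the `isnull()` summary further down reports for `ROC`.

```python
import numpy as np
import pandas as pd


def rate_of_change(close: pd.Series, period: int = 10) -> pd.Series:
    """ROC in percent: (close_t / close_{t-period} - 1) * 100."""
    return (close / close.shift(period) - 1.0) * 100.0


prices = pd.Series(np.linspace(100.0, 110.0, 11))  # 100, 101, ..., 110
roc = rate_of_change(prices)
print(roc.iloc[-1])  # 110 versus 100 ten bars earlier: about 10 (percent)
```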


### Volatility Indicators

* **Bollinger Bands:**   
Bollinger Bands consist of a moving average (the middle band, a 20-period simple moving average here) and two bands placed a set number of standard deviations (two here) above and below it. Because the standard deviation grows with volatility, the bands widen in volatile periods and narrow in calm ones. Bollinger Bands are used to identify overbought or oversold conditions, potential trend reversals, and changes in volatility.

```python
aapl_price['upper_band'], aapl_price['middle_band'], aapl_price['lower_band'] = talib.BBANDS(
    aapl_price['close'], timeperiod=20, nbdevup=2, nbdevdn=2, matype=0)

msft_price['upper_band'], msft_price['middle_band'], msft_price['lower_band'] = talib.BBANDS(
    msft_price['close'], timeperiod=20, nbdevup=2, nbdevdn=2, matype=0)

googl_price['upper_band'], googl_price['middle_band'], googl_price['lower_band'] = talib.BBANDS(
    googl_price['close'], timeperiod=20, nbdevup=2, nbdevdn=2, matype=0)

amzn_price['upper_band'], amzn_price['middle_band'], amzn_price['lower_band'] = talib.BBANDS(
    amzn_price['close'], timeperiod=20, nbdevup=2, nbdevdn=2, matype=0)
```
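The bands can also be computed with pandas rolling windows. A minimal sketch; the function name `bollinger_bands` is hypothetical. As far as we know, TA-Lib's `BBANDS` uses the population standard deviation (`ddof=0`) rather than pandas' default sample version (`ddof=1`); treat that as an assumption and compare against the `upper_band` column if exact agreement matters.

```python
import numpy as np
import pandas as pd


def bollinger_bands(close: pd.Series, period: int = 20, num_std: float = 2.0):
    """Return (upper, middle, lower) Bollinger Bands."""
    middle = close.rolling(period).mean()  # simple moving average (matype=0)
    # Population standard deviation (ddof=0); pandas defaults to the sample version (ddof=1)
    std = close.rolling(period).std(ddof=0)
    return middle + num_std * std, middle, middle - num_std * std


prices = pd.Series(np.arange(1.0, 26.0))  # 1, 2, ..., 25
upper, middle, lower = bollinger_bands(prices)
# For the window 1..20: mean = 10.5, population variance = (20**2 - 1) / 12 = 33.25
print(upper.iloc[19], 10.5 + 2 * np.sqrt(33.25))
```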


* **Average True Range (ATR):**  
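The Average True Range (ATR) is a volatility indicator that averages the *true range* over a specified period (14 here). The true range of a bar is the largest of three values: the current high minus the current low, the absolute difference between the current high and the previous close, and the absolute difference between the current low and the previous close, so overnight gaps are captured as well as intraday swings. ATR tells traders how much a security typically moves, which is why it is used to set stop-loss distances, size positions, and compare volatility across periods.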




```python
# Calculate Average True Range (ATR) for each stock
aapl_price['ATR'] = talib.ATR(aapl_price['high'], aapl_price['low'], aapl_price['close'], timeperiod=14)
msft_price['ATR'] = talib.ATR(msft_price['high'], msft_price['low'], msft_price['close'], timeperiod=14)
googl_price['ATR'] = talib.ATR(googl_price['high'], googl_price['low'], googl_price['close'], timeperiod=14)
amzn_price['ATR'] = talib.ATR(amzn_price['high'], amzn_price['low'], amzn_price['close'], timeperiod=14)
```
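The true range and its Wilder-smoothed average can be sketched as follows. The function names `true_range` and `wilder_atr` are hypothetical; seeding with a simple mean of the first 14 true ranges is our reading of the standard definition and is consistent with the 14 missing `ATR` values in the `isnull()` summary further down, but this is an illustrative re-implementation, not TA-Lib's code.

```python
import numpy as np
import pandas as pd


def true_range(high: pd.Series, low: pd.Series, close: pd.Series) -> pd.Series:
    """Largest of: high - low, |high - previous close|, |low - previous close|."""
    prev_close = close.shift(1)
    candidates = pd.concat(
        [high - low, (high - prev_close).abs(), (low - prev_close).abs()], axis=1
    )
    # The first bar has no previous close, so max() falls back to high - low
    return candidates.max(axis=1)


def wilder_atr(high, low, close, period: int = 14) -> pd.Series:
    """ATR: simple mean of the first `period` true ranges, then Wilder smoothing."""
    tr = true_range(high, low, close).to_numpy()
    atr = np.full(len(tr), np.nan)
    if len(tr) <= period:
        return pd.Series(atr, index=close.index)
    # Seed from bars 1..period, skipping bar 0, which has no previous close
    atr[period] = tr[1 : period + 1].mean()
    for i in range(period + 1, len(tr)):
        atr[i] = (atr[i - 1] * (period - 1) + tr[i]) / period
    return pd.Series(atr, index=close.index)


# A gap day: previous close 10, today's range 13 to 15 -> true range 5, not 2
gap = true_range(pd.Series([11.0, 15.0]), pd.Series([9.0, 13.0]), pd.Series([10.0, 14.0]))
print(gap.tolist())  # [2.0, 5.0]
```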


* **Keltner Channels:**  
Keltner Channels are volatility-based envelopes built from a moving average of the price and two bands set a multiple of the Average True Range (ATR) above and below it. The common modern definition uses a 20-period exponential moving average (EMA) as the middle line; the code below uses a 20-day simple moving average of the close instead, with the bands at twice the ATR. Because ATR rises with volatility, the channel widens and narrows with it. Keltner Channels are used to identify overbought or oversold conditions, trend direction, and potential entry or exit points for trades.

```python
# Calculate Keltner Channels for AAPL
aapl_price['Keltner_middle'] = aapl_price['close'].rolling(window=20).mean()
aapl_price['Keltner_upper'] = aapl_price['Keltner_middle'] + (2 * aapl_price['ATR'])
aapl_price['Keltner_lower'] = aapl_price['Keltner_middle'] - (2 * aapl_price['ATR'])

# Calculate Keltner Channels for MSFT
msft_price['Keltner_middle'] = msft_price['close'].rolling(window=20).mean()
msft_price['Keltner_upper'] = msft_price['Keltner_middle'] + (2 * msft_price['ATR'])
msft_price['Keltner_lower'] = msft_price['Keltner_middle'] - (2 * msft_price['ATR'])

# Calculate Keltner Channels for GOOGL
googl_price['Keltner_middle'] = googl_price['close'].rolling(window=20).mean()
googl_price['Keltner_upper'] = googl_price['Keltner_middle'] + (2 * googl_price['ATR'])
googl_price['Keltner_lower'] = googl_price['Keltner_middle'] - (2 * googl_price['ATR'])

# Calculate Keltner Channels for AMZN
amzn_price['Keltner_middle'] = amzn_price['close'].rolling(window=20).mean()
amzn_price['Keltner_upper'] = amzn_price['Keltner_middle'] + (2 * amzn_price['ATR'])
amzn_price['Keltner_lower'] = amzn_price['Keltner_middle'] - (2 * amzn_price['ATR'])
```
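The description above mentions an EMA middle line, while the code uses a 20-day simple moving average. If you prefer the EMA variant, a minimal sketch could look like this; the function name `keltner_channels_ema` is hypothetical, and it takes any ATR series, such as the `ATR` column computed earlier (for example `keltner_channels_ema(aapl_price['close'], aapl_price['ATR'])`). pandas' recursive EMA (`adjust=False`) starts from the first close, so the earliest values depend on that starting point; `min_periods` masks the first `span - 1` bars.

```python
import numpy as np
import pandas as pd


def keltner_channels_ema(close: pd.Series, atr: pd.Series, span: int = 20, multiplier: float = 2.0):
    """Keltner Channels with an EMA middle line and ATR-based bands."""
    # Recursive EMA starting from the first close; the first span - 1 values are masked
    middle = close.ewm(span=span, adjust=False, min_periods=span).mean()
    upper = middle + multiplier * atr
    lower = middle - multiplier * atr
    return upper, middle, lower


close = pd.Series(np.full(30, 100.0))
atr = pd.Series(np.full(30, 1.5))
upper, middle, lower = keltner_channels_ema(close, atr)
print(upper.iloc[-1], middle.iloc[-1], lower.iloc[-1])
```

With a flat price of 100 and an ATR of 1.5, the channel sits at roughly 97 to 103 around a middle line of 100.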


### Volume Indicators

* **On-Balance Volume (OBV):**   
On-Balance Volume (OBV) is a cumulative volume indicator: each day's volume is added to the running total when the close rises, subtracted when the close falls, and ignored when the close is unchanged. A rising OBV suggests buying pressure and a falling OBV suggests selling pressure, so OBV is used to confirm price trends, spot divergences that may precede reversals, and gauge the flow of money into and out of a security.

```python
aapl_price['OBV'] = talib.OBV(aapl_price['close'], aapl_price['volume'])
msft_price['OBV'] = talib.OBV(msft_price['close'], msft_price['volume'])
googl_price['OBV'] = talib.OBV(googl_price['close'], googl_price['volume'])
amzn_price['OBV'] = talib.OBV(amzn_price['close'], amzn_price['volume'])
```
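The OBV rule is short enough to write by hand, which makes it easy to check. A minimal sketch; the function name `on_balance_volume` is hypothetical, and starting the running total at the first bar's volume is chosen to match the first `OBV` value in the `aapl_price.head()` output further down. The sample closes and volumes are the five AAPL rows from the table above.

```python
import numpy as np
import pandas as pd


def on_balance_volume(close: pd.Series, volume: pd.Series) -> pd.Series:
    """Running total of volume signed by the direction of the close."""
    direction = np.sign(close.diff())  # +1 up, -1 down, 0 unchanged
    # The first bar has no prior close; start the running total at its volume
    direction.iloc[0] = 1.0
    return (direction * volume).cumsum()


# The five AAPL rows from the table above
close = pd.Series([7.643214, 7.656429, 7.534643, 7.520714, 7.570714])
volume = pd.Series([493729600.0, 601904800.0, 552160000.0, 477131200.0, 447610800.0])
obv = on_balance_volume(close, volume)
print(obv.tolist())
```

These values line up with the `OBV` column printed later for AAPL; that table rounds them for display (for example, 1095634400 appears as 1095634000.0).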


* **Accumulation/Distribution Line:**   
The Accumulation/Distribution (A/D) Line is a volume indicator that weights each day's volume by where the close falls within the day's range. The close location value, `CLV = ((close - low) - (high - close)) / (high - low)`, ranges from +1 (close at the high) to -1 (close at the low); multiplying it by volume gives the day's money-flow volume, and the A/D Line is the running sum of those values. It is used to confirm price trends, spot potential reversals through divergences, and assess buying versus selling pressure.

```python
# Money-flow volume per day: close location value ((C - L) - (H - C)) / (H - L) times volume.
# The A/D line is its running total. (The division yields NaN on any day where high == low.)
aapl_price['AD'] = ((aapl_price['close'] - aapl_price['low']) - (aapl_price['high'] - aapl_price['close'])) / (aapl_price['high'] - aapl_price['low']) * aapl_price['volume']
aapl_price['AD_line'] = aapl_price['AD'].cumsum()

msft_price['AD'] = ((msft_price['close'] - msft_price['low']) - (msft_price['high'] - msft_price['close'])) / (msft_price['high'] - msft_price['low']) * msft_price['volume']
msft_price['AD_line'] = msft_price['AD'].cumsum()

googl_price['AD'] = ((googl_price['close'] - googl_price['low']) - (googl_price['high'] - googl_price['close'])) / (googl_price['high'] - googl_price['low']) * googl_price['volume']
googl_price['AD_line'] = googl_price['AD'].cumsum()

amzn_price['AD'] = ((amzn_price['close'] - amzn_price['low']) - (amzn_price['high'] - amzn_price['close'])) / (amzn_price['high'] - amzn_price['low']) * amzn_price['volume']
amzn_price['AD_line'] = amzn_price['AD'].cumsum()
```
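The A/D calculation divides by `high - low`, so a bar with no range (high equal to low) produces NaN. A small helper can guard against that and be reused for every ticker; the name `ad_line` and the choice to let zero-range bars contribute nothing are our assumptions. TA-Lib also provides `talib.AD(high, low, close, volume)`, which computes the Chaikin A/D line directly.

```python
import numpy as np
import pandas as pd


def ad_line(high, low, close, volume) -> pd.Series:
    """Accumulation/Distribution line with a guard for zero-range bars."""
    price_range = high - low
    # Close location value: +1 when close == high, -1 when close == low
    clv = ((close - low) - (high - close)) / price_range.where(price_range != 0)
    clv = clv.where(price_range != 0, 0.0)  # zero-range bars add nothing
    money_flow_volume = clv * volume
    return money_flow_volume.cumsum()


high = pd.Series([10.0, 10.0, 10.0])
low = pd.Series([8.0, 8.0, 10.0])
close = pd.Series([10.0, 8.0, 10.0])  # closes at the high, at the low, then a flat bar
volume = pd.Series([100.0, 200.0, 300.0])
print(ad_line(high, low, close, volume).tolist())  # [100.0, -100.0, -100.0]
```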


* **Chaikin Money Flow (CMF):**   
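Chaikin Money Flow (CMF) measures buying and selling pressure over a fixed window (20 periods here). It uses the same money-flow volume as the A/D Line, but instead of a running total it divides the 20-day sum of money-flow volume by the 20-day sum of volume, so its values range from -1 to +1. Readings above zero suggest accumulation (buying pressure) and readings below zero suggest distribution (selling pressure); CMF is used to confirm price trends, identify potential reversals, and assess the strength of price movements.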

```python
# Calculate Chaikin Money Flow (CMF) for AAPL
aapl_price['MF_multiplier'] = ((aapl_price['close'] - aapl_price['low']) - (aapl_price['high'] - aapl_price['close'])) / (aapl_price['high'] - aapl_price['low'])
aapl_price['MF_volume'] = aapl_price['MF_multiplier'] * aapl_price['volume']
aapl_price['CMF'] = aapl_price['MF_volume'].rolling(window=20).sum() / aapl_price['volume'].rolling(window=20).sum()

# Calculate Chaikin Money Flow (CMF) for MSFT
msft_price['MF_multiplier'] = ((msft_price['close'] - msft_price['low']) - (msft_price['high'] - msft_price['close'])) / (msft_price['high'] - msft_price['low'])
msft_price['MF_volume'] = msft_price['MF_multiplier'] * msft_price['volume']
msft_price['CMF'] = msft_price['MF_volume'].rolling(window=20).sum() / msft_price['volume'].rolling(window=20).sum()

# Calculate Chaikin Money Flow (CMF) for GOOGL
googl_price['MF_multiplier'] = ((googl_price['close'] - googl_price['low']) - (googl_price['high'] - googl_price['close'])) / (googl_price['high'] - googl_price['low'])
googl_price['MF_volume'] = googl_price['MF_multiplier'] * googl_price['volume']
googl_price['CMF'] = googl_price['MF_volume'].rolling(window=20).sum() / googl_price['volume'].rolling(window=20).sum()

# Calculate Chaikin Money Flow (CMF) for AMZN
amzn_price['MF_multiplier'] = ((amzn_price['close'] - amzn_price['low']) - (amzn_price['high'] - amzn_price['close'])) / (amzn_price['high'] - amzn_price['low'])
amzn_price['MF_volume'] = amzn_price['MF_multiplier'] * amzn_price['volume']
amzn_price['CMF'] = amzn_price['MF_volume'].rolling(window=20).sum() / amzn_price['volume'].rolling(window=20).sum()
```
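The four near-identical CMF blocks can be collapsed into one function applied to each frame. A minimal sketch; the function name `chaikin_money_flow` is hypothetical, it expects `high`, `low`, `close`, and `volume` columns, and treating zero-range bars as contributing no money flow is our assumption. A quick sanity check: if every bar closes at its high, CMF should be exactly +1.

```python
import numpy as np
import pandas as pd


def chaikin_money_flow(df: pd.DataFrame, window: int = 20) -> pd.Series:
    """CMF over `window` bars from a DataFrame with high/low/close/volume columns."""
    price_range = df["high"] - df["low"]
    clv = ((df["close"] - df["low"]) - (df["high"] - df["close"])) / price_range.where(price_range != 0)
    clv = clv.where(price_range != 0, 0.0)  # zero-range bars contribute no money flow
    money_flow_volume = clv * df["volume"]
    return money_flow_volume.rolling(window).sum() / df["volume"].rolling(window).sum()


# Hypothetical usage on the notebook's frames:
# for frame in (aapl_price, msft_price, googl_price, amzn_price):
#     frame["CMF"] = chaikin_money_flow(frame)

n = 30
bars = pd.DataFrame({"high": np.full(n, 11.0), "low": np.full(n, 9.0),
                     "close": np.full(n, 11.0), "volume": np.full(n, 1_000.0)})
cmf = chaikin_money_flow(bars)  # every bar closes at its high
print(cmf.dropna().unique())
```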


```python
aapl_price.head()
```

| date | close | high | low | open | volume | sma_20 | ema_20 | macd | macd_signal | sar | RSI | slowk | slowd | ROC | upper_band | middle_band | lower_band | ATR | Keltner_middle | Keltner_upper | Keltner_lower | OBV | AD | AD_line | MF_multiplier | MF_volume | CMF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2010-01-04 | 7.643214 | 7.660714 | 7.585 | 7.6225 | 493729600.0 | NaN | 7.643214 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 493729600.0 | 265496600.0 | 265496600.0 | 0.537737 | 265496600.0 | NaN |
| 2010-01-05 | 7.656429 | 7.699643 | 7.616071 | 7.664286 | 601904800.0 | NaN | 7.644473 | NaN | NaN | 7.585 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1095634000.0 | -20574860.0 | 244921700.0 | -0.034183 | -20574860.0 | NaN |
| 2010-01-06 | 7.534643 | 7.686786 | 7.526786 | 7.656429 | 552160000.0 | NaN | 7.634013 | NaN | NaN | 7.699643 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 543474400.0 | -497928900.0 | -253007200.0 | -0.901784 | -497928900.0 | NaN |
| 2010-01-07 | 7.520714 | 7.571429 | 7.466071 | 7.5625 | 477131200.0 | NaN | 7.623222 | NaN | NaN | 7.699643 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 66343200.0 | 17787340.0 | -235219800.0 | 0.03728 | 17787340.0 | NaN |
| 2010-01-08 | 7.570714 | 7.571429 | 7.466429 | 7.510714 | 447610800.0 | NaN | 7.618222 | NaN | NaN | 7.6903 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 513954000.0 | 441516600.0 | 206296800.0 | 0.986385 | 441516600.0 | NaN |


```python
aapl_price.isnull().sum()
```

```
close              0
high               0
low                0
open               0
volume             0
sma_20            19
ema_20             0
macd              33
macd_signal       33
sar                1
RSI               14
slowk             17
slowd             17
ROC               10
upper_band        19
middle_band       19
lower_band        19
ATR               14
Keltner_middle    19
Keltner_upper     19
Keltner_lower     19
OBV                0
AD                 0
AD_line            0
MF_multiplier      0
MF_volume          0
CMF               19
dtype: int64
```

#### Fundamental Metrics

Fundamental metrics are quantitative measures derived from a company's financial statements, earnings reports, and other fundamental data sources. Where technical indicators describe price and volume behavior, fundamental metrics describe a company's financial health, profitability, valuation, and growth prospects, which investors use to estimate intrinsic value, compare a company with its competitors, and identify potential investment opportunities. Adding them to our feature set gives the machine learning models information that price history alone does not contain, which may improve predictive power and help explain what drives stock price movements.


* Categories of Fundamental Metrics
    1. **Valuation Metrics**: These metrics compare a company's market price with fundamentals such as earnings, book value, revenue, cash flow, or dividends. Examples include:
        - `Price-to-Earnings (P/E) Ratio`: Share price divided by earnings per share (EPS).
        - `Price-to-Book (P/B) Ratio`: Share price divided by book value (shareholders' equity) per share.
        - `Price-to-Sales (P/S) Ratio`: Share price divided by revenue per share.
        - `Dividend Yield`: Annual dividends per share as a percentage of the share price.

    2. **Profitability Metrics**: These metrics measure a company's ability to generate profits and control costs. Examples include:
        - `Return on Equity (ROE)`: Net income divided by shareholders' equity, i.e. the return generated on shareholders' capital.
        - `Net Profit Margin`: Net income as a percentage of revenue, after all expenses, interest, and taxes.
        - `Operating Margin`: Operating income as a percentage of revenue, after operating expenses but before interest and taxes.

    3. **Growth Metrics**: These metrics track how quickly the business is expanding. Examples include:
        - `Revenue Growth Rate`: Percentage change in revenue over a specific period (for example, year over year).
        - `Earnings Growth Rate`: Percentage change in earnings (often EPS) over a specific period.

    4. **Financial Health Metrics**: These metrics evaluate a company's financial stability, liquidity, and debt levels. Examples include:
        - `Debt-to-Equity Ratio`: Total debt divided by shareholders' equity, a measure of financial leverage.
        - `Current Ratio`: Current assets divided by current liabilities, a measure of the ability to cover short-term obligations.
        - `Interest Coverage Ratio`: Earnings before interest and taxes (EBIT) divided by interest expense, showing how comfortably earnings cover interest payments.
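Most of these metrics are simple ratios of financial-statement line items. The sketch below computes them from a handful of inputs; the `Fundamentals` field names and every number in the example are made up for illustration. Definitions vary between data providers (for example, ROE is often computed on average rather than ending equity, and some debt-to-equity figures use total liabilities instead of total debt), so treat these as one reasonable convention, not the only one.

```python
from dataclasses import dataclass


@dataclass
class Fundamentals:
    """Hypothetical annual figures for one company."""
    price: float
    shares_outstanding: float
    revenue: float
    operating_income: float
    net_income: float
    ebit: float
    interest_expense: float
    total_equity: float
    total_debt: float
    current_assets: float
    current_liabilities: float
    dividends_per_share: float


def fundamental_ratios(f: Fundamentals) -> dict:
    eps = f.net_income / f.shares_outstanding  # earnings per share
    book_per_share = f.total_equity / f.shares_outstanding
    sales_per_share = f.revenue / f.shares_outstanding
    return {
        # Valuation (price-based)
        "pe_ratio": f.price / eps,
        "pb_ratio": f.price / book_per_share,
        "ps_ratio": f.price / sales_per_share,
        "dividend_yield": f.dividends_per_share / f.price,
        # Profitability
        "roe": f.net_income / f.total_equity,
        "net_profit_margin": f.net_income / f.revenue,
        "operating_margin": f.operating_income / f.revenue,
        # Financial health
        "debt_to_equity": f.total_debt / f.total_equity,
        "current_ratio": f.current_assets / f.current_liabilities,
        "interest_coverage": f.ebit / f.interest_expense,
    }


def growth_rate(previous: float, current: float) -> float:
    """Period-over-period growth, e.g. revenue or earnings growth."""
    return (current - previous) / previous


# Made-up example figures
example = Fundamentals(
    price=50.0, shares_outstanding=100.0, revenue=2_000.0, operating_income=300.0,
    net_income=200.0, ebit=300.0, interest_expense=30.0, total_equity=1_000.0,
    total_debt=500.0, current_assets=600.0, current_liabilities=400.0,
    dividends_per_share=1.0,
)
ratios = fundamental_ratios(example)
print(ratios["pe_ratio"], growth_rate(1_600.0, 2_000.0))  # 25.0 0.25
```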

____