# Can we develop a model to predict short-term price movements for FAANG stocks based on historical price patterns, volume, and volatility indicators?

<h2>Data Collection and Cleaning</h2>

In [8]:
import pandas as pd

# Load the dataset
data = pd.read_csv('FAANG.csv')
data.head()

,Company,Ticker,Date,Open,High,Low,Close,Adj Close,Volume,Market Cap,...,Price to Book Ratio,Enterprise Value,Total Debt,Total Assets,Total Equity,Beta (5Y),Annual Dividend Rate,Trailing Twelve Months (TTM) Revenue,Trailing Twelve Months (TTM) EBITDA,Trailing Twelve Months (TTM) Earnings
0,Apple,AAPL,2005-01-03,1.156786,1.162679,1.117857,1.130179,0.954409,691992000,3575092084736,...,53.66043,3569143513088,101304000512,,,1.239,1.0,,,
1,Apple,AAPL,2005-01-04,1.139107,1.169107,1.124464,1.141786,0.96421,1096810400,3575092084736,...,53.66043,3569143513088,101304000512,,,1.239,1.0,,,
2,Apple,AAPL,2005-01-05,1.151071,1.165179,1.14375,1.151786,0.972655,680433600,3575092084736,...,53.66043,3569143513088,101304000512,,,1.239,1.0,,,
3,Apple,AAPL,2005-01-06,1.154821,1.159107,1.130893,1.152679,0.973409,705555200,3575092084736,...,53.66043,3569143513088,101304000512,,,1.239,1.0,,,
4,Apple,AAPL,2005-01-07,1.160714,1.243393,1.15625,1.236607,1.044284,2227450400,3575092084736,...,53.66043,3569143513088,101304000512,,,1.239,1.0,,,


In [10]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 23055 entries, 0 to 23054
Data columns (total 41 columns):
 #   Column                                 Non-Null Count  Dtype  
---  ------                                 --------------  -----  
 0   Company                                23055 non-null  object 
 1   Ticker                                 23055 non-null  object 
 2   Date                                   23055 non-null  object 
 3   Open                                   23055 non-null  float64
 4   High                                   23055 non-null  float64
 5   Low                                    23055 non-null  float64
 6   Close                                  23055 non-null  float64
 7   Adj Close                              23055 non-null  float64
 8   Volume                                 23055 non-null  int64  
 9   Market Cap                             23055 non-null  int64  
 10  PE Ratio                               23055 non-null  float64
 11  Be

Nine columns (Revenue, Gross Profit, Operating Income, Cash Ratio, Total Assets, Total Equity, TTM Revenue, TTM EBITDA, and TTM Earnings) have 0 non-null entries, so they carry no information and can be dropped. Beta and Beta (5Y) are only partly missing and may still hold some value.
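
The split between fully empty and partly missing columns can also be derived from the data rather than read off the `info()` listing. The following is a minimal sketch on a toy frame: the column names mirror the dataset, but `toy` and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-in for `data`; values are illustrative, not taken from FAANG.csv
toy = pd.DataFrame({
    "Close": [1.13, 1.14, 1.15, 1.15],
    "Beta": [1.239, np.nan, 1.239, np.nan],
    "Revenue": [np.nan] * 4,
    "Total Assets": [np.nan] * 4,
})

# Fraction of missing values in each column
missing_share = toy.isna().mean()

# Columns with no data at all, and columns that are only partly missing
empty_cols = missing_share[missing_share == 1.0].index.tolist()
partial_cols = missing_share[(missing_share > 0) & (missing_share < 1)].index.tolist()

print("Empty:", empty_cols)
print("Partly missing:", partial_cols)
```

Run against the real `data` frame, the same two expressions would list the columns to drop and the columns to consider for imputation.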

For our research goal, we will utilize the columns that help us understand historical price patterns, volume, and volatility indicators.

These are: 
- Price and Volume: Open, High, Low, Close, Adj Close, Volume
- Volatility Indicators: Beta, Beta (5Y)
- Basic Financial Ratios: PE Ratio, Debt to Equity, Return on Equity (ROE), Current Ratio, Quick Ratio, Price to Book Ratio

<h2>Column Relevance Analysis</h2>

1. <b>Price and Volume</b>: Open, High, Low, Close, Adj Close, Volume
- Highly Relevant: These columns are essential for understanding historical price trends and calculating various technical indicators (e.g., moving averages, RSI, MACD). Volume, in particular, is crucial for gauging trading activity and potential price momentum.

2. <b>Volatility Indicators</b>: Beta, Beta (5Y)
- Moderately Relevant: Beta is a measure of a stock's volatility relative to the overall market. While it can provide insight into how sensitive a stock is to market movements, its impact may be more noticeable over longer periods. For short-term predictions, it could be used as a contextual feature, but it is not as impactful as technical indicators derived from historical prices.

3. <b>Basic Financial Ratios</b>: PE Ratio, Debt to Equity, Return on Equity (ROE), Current Ratio, Quick Ratio, Price to Book Ratio
- Less Relevant for Short-Term Analysis: These financial ratios are more suited for evaluating the company's fundamental strength and long-term value. While they can provide context for understanding stock performance, they are not primary indicators of short-term price movements, which are often driven by market sentiment, technical patterns, and trading volume.
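
The technical indicators mentioned above can be derived from the price columns with plain pandas. The sketch below is one possible implementation, not the notebook's own method: `add_indicators` is a hypothetical helper, the 14-day window and the 12/26/9 MACD spans are conventional defaults assumed here, and the RSI uses simple rolling averages rather than Wilder's smoothing. It runs on synthetic prices.

```python
import numpy as np
import pandas as pd

def add_indicators(prices: pd.DataFrame, window: int = 14) -> pd.DataFrame:
    """Add return, rolling volatility, SMA, RSI and MACD columns for ONE ticker."""
    out = prices.sort_index().copy()
    close = out["Adj Close"]

    # Daily return and its rolling standard deviation (a simple volatility measure)
    out["Return"] = close.pct_change()
    out["Volatility"] = out["Return"].rolling(window).std()

    # Simple moving average of the price
    out["SMA"] = close.rolling(window).mean()

    # RSI from simple rolling averages of gains and losses (assumed variant)
    delta = close.diff()
    avg_gain = delta.clip(lower=0).rolling(window).mean()
    avg_loss = (-delta.clip(upper=0)).rolling(window).mean()
    out["RSI"] = 100 - 100 / (1 + avg_gain / avg_loss)

    # MACD: fast EMA minus slow EMA, plus a signal line
    ema_fast = close.ewm(span=12, adjust=False).mean()
    ema_slow = close.ewm(span=26, adjust=False).mean()
    out["MACD"] = ema_fast - ema_slow
    out["MACD Signal"] = out["MACD"].ewm(span=9, adjust=False).mean()
    return out

# Synthetic random-walk prices, for illustration only
rng = np.random.default_rng(0)
dates = pd.bdate_range("2024-01-01", periods=60)
demo = pd.DataFrame({"Adj Close": 100 + rng.normal(0, 1, 60).cumsum()}, index=dates)

features = add_indicators(demo)
print(features.tail(3).round(2))
```

Because the dataset stacks several companies in one frame, a helper like this would need to be applied to each ticker's rows separately; otherwise the rolling windows would run across company boundaries.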

In [22]:
data.describe()

,Open,High,Low,Close,Adj Close,Volume,Market Cap,PE Ratio,Beta,EPS,...,Price to Book Ratio,Enterprise Value,Total Debt,Total Assets,Total Equity,Beta (5Y),Annual Dividend Rate,Trailing Twelve Months (TTM) Revenue,Trailing Twelve Months (TTM) EBITDA,Trailing Twelve Months (TTM) Earnings
count,23055.0,23055.0,23055.0,23055.0,23055.0,23055.0,23055.0,23055.0,18073.0,23055.0,...,23055.0,23055.0,23055.0,0.0,0.0,18073.0,13091.0,0.0,0.0,0.0
mean,93.647661,94.863101,92.420934,93.672274,93.206572,138444500.0,1910403000000.0,35.912447,1.154244,10.29935,...,19.221725,1889730000000.0,70813890000.0,,,1.154244,1.162585,,,
std,126.060231,127.749769,124.330704,126.069016,126.123574,243000100.0,1083314000000.0,8.300362,0.079878,6.07117,...,18.271,1089725000000.0,54922110000.0,,,0.079878,0.476994,,,
min,1.139107,1.159107,1.117857,1.130179,0.954409,1144000.0,324753000000.0,23.492826,1.038,4.18,...,6.708661,301236000000.0,15981330000.0,,,1.038,0.8,,,
25%,11.728979,11.864486,11.587829,11.720929,11.366614,21315000.0,1465347000000.0,29.612986,1.038,6.57,...,8.437223,1439438000000.0,28719000000.0,,,1.038,0.8,,,
50%,38.584999,38.983002,38.297501,38.598499,38.050781,55763800.0,1996001000000.0,35.789955,1.147,6.97,...,9.359326,1933662000000.0,37991000000.0,,,1.147,1.0,,,
75%,134.849998,136.550003,133.449997,134.970001,134.17395,130289600.0,2024576000000.0,42.8245,1.239,17.67,...,14.262457,2036984000000.0,101304000000.0,,,1.239,1.0,,,
max,734.900024,736.0,722.5,730.289978,730.289978,3372970000.0,3575092000000.0,45.496414,1.239,19.56,...,53.66043,3569144000000.0,157842000000.0,,,1.239,2.0,,,


<h3>Handling Missing Data</h3>

In [14]:
missing_values = data.isnull().sum()
print(missing_values[missing_values > 0])

Beta                                      4982
Revenue                                  23055
Gross Profit                             23055
Operating Income                         23055
Dividends Paid                            9964
Dividend Yield                            9964
Cash Ratio                               23055
Total Assets                             23055
Total Equity                             23055
Beta (5Y)                                 4982
Annual Dividend Rate                      9964
Trailing Twelve Months (TTM) Revenue     23055
Trailing Twelve Months (TTM) EBITDA      23055
Trailing Twelve Months (TTM) Earnings    23055
dtype: int64


<h3>Imputing Values for the Beta Columns</h3>

In [33]:
# Fill missing Beta values with the median across all rows (all companies pooled)
data['Beta'] = data['Beta'].fillna(data['Beta'].median())
data['Beta (5Y)'] = data['Beta (5Y)'].fillna(data['Beta (5Y)'].median())
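
Before filling with a pooled median, it can help to check how the gaps are distributed across companies: if the missing values are concentrated in one ticker, that company ends up with a Beta borrowed entirely from the others. The sketch below shows one way to run that check and to keep a flag marking imputed rows. The tickers `AAA`, `BBB`, `CCC`, the Beta values, and the `Beta Imputed` column are all hypothetical.

```python
import numpy as np
import pandas as pd

# Toy frame with made-up tickers and Beta values
toy = pd.DataFrame({
    "Ticker": ["AAA", "AAA", "BBB", "BBB", "CCC", "CCC"],
    "Beta": [1.0, 1.0, 1.2, 1.2, np.nan, np.nan],
})

# Share of missing Beta values within each ticker
missing_by_ticker = toy["Beta"].isna().groupby(toy["Ticker"]).mean()
print(missing_by_ticker)

# Record which rows are about to be imputed, then fill with the pooled median
toy["Beta Imputed"] = toy["Beta"].isna()
toy["Beta"] = toy["Beta"].fillna(toy["Beta"].median())
print(toy)
```

Keeping the flag lets a later model distinguish observed Beta values from filled ones.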

In [35]:
# Treat missing dividend fields as zero (assumes no dividend was paid)
data['Dividends Paid'] = data['Dividends Paid'].fillna(0)
data['Dividend Yield'] = data['Dividend Yield'].fillna(0)
data['Annual Dividend Rate'] = data['Annual Dividend Rate'].fillna(0)

In [39]:
# Drop columns with 0 non-null values
columns_to_drop = [
    'Revenue', 'Gross Profit', 'Operating Income', 'Cash Ratio',
    'Total Assets', 'Total Equity', 'Trailing Twelve Months (TTM) Revenue',
    'Trailing Twelve Months (TTM) EBITDA', 'Trailing Twelve Months (TTM) Earnings'
]
data = data.drop(columns=columns_to_drop)

# Verify data structure
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 23055 entries, 0 to 23054
Data columns (total 32 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Company                   23055 non-null  object 
 1   Ticker                    23055 non-null  object 
 2   Date                      23055 non-null  object 
 3   Open                      23055 non-null  float64
 4   High                      23055 non-null  float64
 5   Low                       23055 non-null  float64
 6   Close                     23055 non-null  float64
 7   Adj Close                 23055 non-null  float64
 8   Volume                    23055 non-null  int64  
 9   Market Cap                23055 non-null  int64  
 10  PE Ratio                  23055 non-null  float64
 11  Beta                      23055 non-null  float64
 12  EPS                       23055 non-null  float64
 13  Forward PE                23055 non-null  float64
 14  Net In

<h2>Exploratory Data Analysis</h2>

In [42]:
# Convert the Date column from string to datetime
data['Date'] = pd.to_datetime(data['Date'])

In [44]:
# Use Date as the index (dates repeat across tickers, so the index is not unique)
data.set_index('Date', inplace=True)
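
Because every company contributes a row for the same trading day, a Date-only index contains duplicates. One alternative, sketched below on a toy frame, is a `(Ticker, Date)` MultiIndex, which is unique and makes per-company slicing straightforward. The AAPL closes are rounded from the rows shown earlier; the AMZN values are made up for illustration.

```python
import pandas as pd

# Toy frame: two tickers sharing the same dates (AMZN prices are made up)
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2005-01-03", "2005-01-04", "2005-01-03", "2005-01-04"]),
    "Ticker": ["AAPL", "AAPL", "AMZN", "AMZN"],
    "Close": [1.13, 1.14, 2.20, 2.10],
})

# A Date-only index repeats each date once per ticker
by_date = toy.set_index("Date")
print(by_date.index.is_unique)

# A (Ticker, Date) MultiIndex identifies each row uniquely
by_ticker_date = toy.set_index(["Ticker", "Date"]).sort_index()
print(by_ticker_date.index.is_unique)

# Selecting one company returns a frame indexed by Date alone
aapl = by_ticker_date.loc["AAPL"]
print(aapl)
```

Whether to switch is a design choice: the Date-only index is convenient for plotting a single company, while the MultiIndex guards against accidentally mixing rows from different companies in rolling or shifted calculations.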

In [46]:
# Summary statistics for the numerical columns
print(data.describe())

               Open          High           Low         Close     Adj Close  \
count  23055.000000  23055.000000  23055.000000  23055.000000  23055.000000   
mean      93.647661     94.863101     92.420934     93.672274     93.206572   
std      126.060231    127.749769    124.330704    126.069016    126.123574   
min        1.139107      1.159107      1.117857      1.130179      0.954409   
25%       11.728979     11.864486     11.587829     11.720929     11.366614   
50%       38.584999     38.983002     38.297501     38.598499     38.050781   
75%      134.849998    136.550003    133.449997    134.970001    134.173950   
max      734.900024    736.000000    722.500000    730.289978    730.289978   

             Volume    Market Cap      PE Ratio          Beta          EPS  \
count  2.305500e+04  2.305500e+04  23055.000000  23055.000000  23055.00000   
mean   1.384445e+08  1.910403e+12     35.912447      1.152678     10.29935   
std    2.430001e+08  1.083314e+12      8.300362      0

<h2>Segment the data by Company and perform EDA for each subset individually</h2>

In [None]:
# perform EDA and visualizations separately for each company
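
A possible starting point for the per-company step is sketched below. It loops over the groups produced by `groupby("Ticker")`, computes daily returns within each company, and collects a small summary table plus a cross-company return correlation. The prices and volumes are synthetic, and the summary fields (`start`, `end`, `rows`, `mean_return`, `return_std`, `mean_volume`) are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for `data`: two tickers, 30 business days each
rng = np.random.default_rng(42)
dates = pd.bdate_range("2024-01-01", periods=30).rename("Date")
frames = []
for ticker, start_price in [("AAPL", 180.0), ("AMZN", 150.0)]:
    close = start_price * np.exp(rng.normal(0, 0.01, len(dates)).cumsum())
    volume = rng.integers(1_000_000, 5_000_000, len(dates))
    frames.append(pd.DataFrame(
        {"Ticker": ticker, "Adj Close": close, "Volume": volume}, index=dates
    ))
toy = pd.concat(frames)

rows = {}
returns = {}
for ticker, subset in toy.groupby("Ticker"):
    subset = subset.sort_index()
    # Returns are computed inside the group so they never span two companies
    daily_return = subset["Adj Close"].pct_change()
    returns[ticker] = daily_return
    rows[ticker] = {
        "start": subset.index.min(),
        "end": subset.index.max(),
        "rows": len(subset),
        "mean_return": daily_return.mean(),
        "return_std": daily_return.std(),
        "mean_volume": subset["Volume"].mean(),
    }

# One row per company
summary = pd.DataFrame.from_dict(rows, orient="index")
print(summary)

# Correlation of daily returns between companies, aligned on Date
returns_wide = pd.DataFrame(returns)
corr = returns_wide.corr()
print(corr.round(2))
```

The same loop body is a natural place to add per-company plots, since `subset` already holds a single ticker's rows in date order.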
