Problem statement:
    
    A stock price always fluctuates, but is there a relation between a closing price and its moving averages?
    
    An analyst wants to showcase how the stock would move. He has 5 years of stock price data on which he needs to perform the analysis.

Objective:

    Find the possible future movement of the stock 'GOOG' based on its price over the last 5 years.

Variables:
    
    - Date: date of the stock price
    - Open: opening price of the stock on that day
    - High: peak price of the stock on that day
    - Low: lowest price of the stock on that day
    - Close: closing price of the stock on that day
    - Volume: total volume traded on that day

Perform the following steps:
    
    1. Get 5 years of data from the finance package (yfinance) with the ticker symbol 'GOOG'
    2. Create the 50-day and 200-day moving averages (50DMA and 200DMA)
    3. Plot them with the actual price
    4. Perform basic EDA on the data
    5. Fit a statsmodels OLS model to find the best possible features

In [12]:
import piplite
await piplite.install('seaborn')

In [13]:
await piplite.install('yfinance')

In [16]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import yfinance as yahooFinance

%matplotlib inline
sns.set_style('darkgrid')

import warnings
warnings.filterwarnings('ignore')

In [26]:
proxies = {'http':'127.0.0.1:58591', 'https':'127.0.0.1:58591'}
#data = yahooFinance.download('GOOG', start="2017-01-01", end="2022-12-31", proxy=proxies)
data = yahooFinance.download("GOOG", start="2017-01-01", end="2022-12-31", threads=False)

[*********************100%***********************]  1 of 1 completed

1 Failed download:
- GOOG: No timezone found, symbol may be delisted


In [9]:
# 5 years of data
#data = google.history('5Y')

GOOG: No data found for this date range, symbol may be delisted


In [6]:
data.head()

       Date    Open    High     Low   Close    Volume
0  1/3/2012  325.25  332.83  324.97  663.59   7380500
1  1/4/2012  331.27  333.87  329.08  666.45   5749400
2  1/5/2012  329.83  330.75  326.89  657.21   6590300
3  1/6/2012  328.34  328.77  323.68  648.24   5405900
4  1/9/2012  322.04  322.29  309.46  620.76  11688800


In [None]:
###SOLUTION
import yfinance as yahooFinance
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

warnings.filterwarnings('ignore')
%matplotlib inline

In [None]:
google = yahooFinance.Ticker('GOOG')

In [None]:
# 5 Years of data
df = google.history(period='5y')

In [None]:
df.head()

In [None]:
df.tail()

In [None]:
df.shape

In [None]:
df.describe()

In [None]:
#Overview of closing price before calculating DMAs
sns.set_style('darkgrid')
plt.figure(figsize = (7,5), dpi = 150)
plt.title('Closing Price')
plt.plot(df['Close'])

In [None]:
#Calculate 50DMA
df['fiftyDMA'] = df['Close'].rolling(50).mean()

In [None]:
#Calculate 200DMA
df['thDMA'] = df['Close'].rolling(200).mean()
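The two rolling-mean cells above can be checked on a small series. This sketch uses synthetic prices (not GOOG data) and shows that `rolling(window).mean()` returns NaN until a full window of observations exists, which is why the first 49 rows of `fiftyDMA` and the first 199 rows of `thDMA` are empty.

```python
import numpy as np
import pandas as pd

# Synthetic closing prices (illustrative only, not real GOOG data)
rng = np.random.default_rng(0)
close = pd.Series(100 + rng.normal(0, 1, 300).cumsum(), name="Close")

# Same computation as the notebook: trailing 50- and 200-day simple moving averages
fifty_dma = close.rolling(50).mean()
two_hundred_dma = close.rolling(200).mean()

# A window of size w yields NaN for the first w - 1 rows
print(fifty_dma.isna().sum())        # 49
print(two_hundred_dma.isna().sum())  # 199

# Each value is the plain average of the last w closes, including the current day
print(np.isclose(fifty_dma.iloc[49], close.iloc[:50].mean()))  # True
```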

In [None]:
df.describe()

In [None]:
#Remove dividends and stock splits
df.drop(columns = ['Dividends', 'Stock Splits'], inplace = True)

In [None]:
df.head()
#The first 49 rows of fiftyDMA and the first 199 rows of thDMA are null (NaN), because a rolling mean needs a full window of 50 (or 200) closing prices.

In [None]:
df.tail()

In [None]:
#Plot closing price vs 50DMA vs 200DMA
sns.set_style('darkgrid')
plt.figure(figsize = (7,5), dpi = 150)
plt.title('Closing price vs 50DMA vs 200DMA')
plt.plot(df['Close'], label = 'Close')
plt.plot(df['fiftyDMA'], label = '50DMA')
plt.plot(df['thDMA'], label = '200DMA')
plt.legend()

From the graph above, the 50DMA and 200DMA are fairly good indicators of how the stock is moving.

If the closing price is above the 50DMA or 200DMA, the stock is usually in an uptrend.

If the closing price is below the 50DMA or 200DMA, the stock is usually in a downtrend.
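The above/below rule can be turned into a column. This is a minimal sketch assuming a DataFrame with the notebook's `Close`, `fiftyDMA` and `thDMA` columns; `label_trend` is a hypothetical helper, and requiring the close to sit above (or below) *both* averages, with anything else labelled "mixed", is a stricter variant of the rule chosen for illustration. A label describes the current trend; it is not a forecast.

```python
import numpy as np
import pandas as pd


def label_trend(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: label each row by where Close sits relative to both DMAs."""
    above = (df["Close"] > df["fiftyDMA"]) & (df["Close"] > df["thDMA"])
    below = (df["Close"] < df["fiftyDMA"]) & (df["Close"] < df["thDMA"])

    labels = pd.Series("mixed", index=df.index, dtype=object)
    labels[above] = "uptrend"
    labels[below] = "downtrend"
    # Rows where either moving average is still NaN cannot be labelled
    labels[df[["fiftyDMA", "thDMA"]].isna().any(axis=1)] = None
    return labels


# Toy values chosen by hand to hit each case
demo = pd.DataFrame({
    "Close":    [10.0, 12.0, 8.0, 11.0],
    "fiftyDMA": [np.nan, 11.0, 9.0, 10.0],
    "thDMA":    [np.nan, 10.0, 9.5, 11.5],
})
labels = label_trend(demo)
# Row 0 is unlabelled; rows 1-3 are uptrend, downtrend and mixed
print(labels.tolist())
```

On the notebook's data this would be called as `label_trend(df)`.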

At points where the 50DMA and 200DMA intersect, the market usually reverses its trend (high to low, or low to high).
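The intersection points can be located by tracking the sign of `fiftyDMA - thDMA` and flagging the rows where it changes. This is a sketch assuming the notebook's column names; `find_crossovers` is a hypothetical helper, and a crossover only marks a candidate reversal, not a guaranteed one.

```python
import numpy as np
import pandas as pd


def find_crossovers(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: rows where the 50DMA crosses the 200DMA."""
    valid = df.dropna(subset=["fiftyDMA", "thDMA"])
    # +1 while the 50DMA is above the 200DMA, -1 while below, 0 if exactly equal
    side = np.sign(valid["fiftyDMA"] - valid["thDMA"])
    # A change of side between consecutive rows marks a crossover
    change = side.diff().fillna(0)
    is_cross = change != 0
    crosses = valid.loc[is_cross, ["fiftyDMA", "thDMA"]].copy()
    # Positive change: the 50DMA moved above the 200DMA; negative: it moved below
    crosses["direction"] = np.where(change[is_cross] > 0, "up", "down")
    return crosses


# Toy values: the 50DMA crosses above at row 2 and back below at row 4
demo = pd.DataFrame({
    "fiftyDMA": [np.nan, 9.0, 10.5, 11.0, 9.0],
    "thDMA":    [np.nan, 10.0, 10.0, 10.0, 10.0],
})
crosses = find_crossovers(demo)
print(crosses)
```

If the two averages are ever exactly equal on one day, that touch is reported as two changes (into and out of 0); with real floating-point prices such ties are rare.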

The huge dip in March 2020 reflects the COVID-19 market crash.

# Finding the possible future movement of the stock

In [None]:
#Analyse the correlation between each pair of variables
plt.figure(figsize=(7,7), dpi = 100)
sns.heatmap(df.corr(), annot = True)

In [None]:
#The closing price is highly correlated with the 50DMA and, in fact, with almost every variable except Volume.
#There is also strong multicollinearity among all the other variables.

In [None]:
#Plot the distribution of the 50DMA
sns.set_style('darkgrid')
plt.figure(figsize=(7,5), dpi = 150)
plt.title('Distribution of 50DMA')
sns.histplot(df['fiftyDMA'].dropna(), kde=True, stat='density')

In [None]:
#Plot the distribution of the closing price
sns.set_style('darkgrid')
plt.figure(figsize=(7,5), dpi = 150)
plt.title('Distribution of Close Price')
sns.histplot(df['Close'], kde=True, stat='density')

In [None]:
#The 50DMA and the closing price seem to follow the same distribution
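The visual comparison can be backed by a number. One option, swapped in here for illustration, is the two-sample Kolmogorov-Smirnov statistic from SciPy: the largest vertical gap between the two empirical CDFs. This sketch uses a synthetic random walk, not GOOG data. Daily prices are serially dependent and the 50DMA is derived from Close, so the p-value's independence assumptions do not hold; treat the statistic as a descriptive distance only.

```python
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp

# Synthetic random walk with drift (illustrative only)
rng = np.random.default_rng(2)
close = pd.Series(100 + rng.normal(0.1, 1, 1000).cumsum())
fifty_dma = close.rolling(50).mean()

# Compare over the same dates: drop the 49 warm-up rows from both series
result = ks_2samp(close.iloc[49:], fifty_dma.iloc[49:])

# statistic = largest gap between the two empirical CDFs (0 means identical)
print(f"KS statistic = {result.statistic:.3f}, p-value = {result.pvalue:.3g}")
```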

In [None]:
import statsmodels.formula.api as smf

In [None]:
#The 200DMA was not used, even though it is highly correlated with Close, because the 50DMA and 200DMA are also closely related to each other.
#Using only one of them avoids multicollinearity when choosing the features for the model.

model = smf.ols(formula = 'Close ~ fiftyDMA', data = df)
model = model.fit()
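Because `fiftyDMA` is NaN for the first 49 rows, it is worth checking how many rows the model actually uses. With statsmodels' formula interface, rows with missing values in the formula's columns are dropped by default, so `nobs` should be the row count minus 49. A sketch on synthetic data (the `df_demo` frame is made up for illustration):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic closes (illustrative only)
rng = np.random.default_rng(3)
n = 500
df_demo = pd.DataFrame({"Close": 100 + rng.normal(0.2, 1, n).cumsum()})
df_demo["fiftyDMA"] = df_demo["Close"].rolling(50).mean()

# Same specification as the notebook; rows with NaN in Close or fiftyDMA are dropped
results = smf.ols(formula="Close ~ fiftyDMA", data=df_demo).fit()

print(int(results.nobs))   # 451 = 500 rows minus the 49 NaN warm-up rows
print(results.params)      # Intercept and fiftyDMA coefficient
```

On the notebook's model, the equivalent check is `model.nobs` compared with `len(df)`.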

In [None]:
model.summary()

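The model has a high R-squared (r2) score and a significant fiftyDMA coefficient.

Since Close and the 50DMA are very highly correlated (~1), and the 50DMA is itself computed from past closing prices, it isn't wise to rely on a model fitted on these parameters in real life.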
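One way to see why a near-perfect fit can mislead: on any trending series, the close and its own moving average move together almost by construction. The sketch below simulates a random walk with drift, whose daily changes are independent and therefore unpredictable beyond the drift, and compares the notebook's level regression with a hypothetical predictive check that regresses tomorrow's price change on today's gap between Close and fiftyDMA. The data and the `gap` and `next_change` columns are illustrative assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Random walk with drift (illustrative only): each daily change is independent noise around 0.5
rng = np.random.default_rng(4)
n = 1000
sim = pd.DataFrame({"Close": 100 + (0.5 + rng.normal(0, 1, n)).cumsum()})
sim["fiftyDMA"] = sim["Close"].rolling(50).mean()

# Level regression, as in the notebook
level_fit = smf.ols("Close ~ fiftyDMA", data=sim).fit()

# Hypothetical predictive check: does today's gap to the 50DMA explain tomorrow's change?
sim["gap"] = sim["Close"] - sim["fiftyDMA"]
sim["next_change"] = sim["Close"].diff().shift(-1)
change_fit = smf.ols("next_change ~ gap", data=sim).fit()

print(f"level R^2:       {level_fit.rsquared:.3f}")   # close to 1
print(f"next-change R^2: {change_fit.rsquared:.3f}")  # close to 0
```

The level R-squared is near 1 even though, by construction, nothing about tomorrow's move is predictable; price changes are used instead of percentage returns so that the shrinking relative volatility of a rising price does not create a spurious link.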

In [None]:
sns.set_style('darkgrid')
plt.figure(figsize = (7,5), dpi = 150)
plt.title('Closing price vs 50DMA vs 200DMA')
plt.plot(df['Close'], label = 'Close')
plt.plot(df['fiftyDMA'], label = '50DMA')
plt.plot(df['thDMA'], label = '200DMA')
plt.legend()

In [None]:
#In the graph, the closing price is below both the 50DMA and the 200DMA, which suggests that the price may go down further.
#However, the steep increase at the end means the 50DMA and 200DMA will change over time and there will be some resistance
#with respect to the price, so it is unlikely that the price will fall very steeply.