# **Core Stock Data EDA for GOOG Ticker**
## In this notebook we will examine only the Google stock (GOOG) for the period we have selected for this project (03-14-2019 through 08-15-2024), and see what we can derive from it through our plots.  We will look at each of our core stock tickers separately in order to analyze each one more closely.


#### Let's start by bringing in the libraries and logic necessary for reading in our file.

In [1]:
import sys
import os

project_root = os.path.abspath(os.path.join(os.getcwd(), '..'))
if project_root not in sys.path:
    sys.path.append(project_root)

import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objs as go
from scipy.stats import linregress
from scipy.stats import gaussian_kde

#### Now let's read in our data that we need for this notebook.

In [2]:
# Now let's access the main core_stock_data.csv file
csv_path = os.path.join(project_root, 'data', 'core_stock_data.csv')
core_stock_data = pd.read_csv(csv_path, parse_dates=['Date'], index_col= 'Date')
core_stock_data.head()

| Date | Close_core | Volume_core | Open_core | High_core | Low_core | SMA_core | EMA_core | RSI_core | BBM_core | BBU_core | ... | ATR_14_core | Stoch_K_core | Stoch_D_core | Momentum_1_core | Momentum_3_core | Momentum_7_core | Momentum_30_core | Momentum_50_core | OBV_core | Ticker |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2019-03-14 | 45.932499 | 94318000 | 45.974998 | 46.025002 | 45.639999 | 41.35925 | 42.219051 | 75.741602 | 41.35925 | 46.695085 | ... | 0.700179 | 97.465683 | 90.860103 | 0.504997 | 1.2075 | 2.049999 | 4.619999 | 7.049999 | 1592190800 | AAPL |
| 2019-03-15 | 46.529999 | 156171600 | 46.212502 | 46.8325 | 45.935001 | 41.50025 | 42.388107 | 76.98591 | 41.50025 | 47.003365 | ... | 0.712679 | 93.213648 | 93.05254 | 0.5975 | 1.302498 | 2.899998 | 4.919998 | 7.049999 | 1748362400 | AAPL |
| 2019-03-18 | 47.005001 | 104879200 | 46.450001 | 47.0975 | 46.447498 | 41.7294 | 42.569162 | 78.724282 | 41.7294 | 47.174667 | ... | 0.721072 | 98.041317 | 96.240216 | 0.475002 | 1.577499 | 3.880001 | 5.375 | 11.4575 | 1853241600 | AAPL |
| 2019-03-19 | 46.6325 | 126585600 | 47.087502 | 47.247501 | 46.48 | 41.92075 | 42.728509 | 73.527018 | 41.92075 | 47.369412 | ... | 0.735358 | 87.378112 | 92.877692 | -0.372501 | 0.700001 | 3.404999 | 3.82 | 9.567501 | 1726656000 | AAPL |
| 2019-03-20 | 47.040001 | 124140800 | 46.557499 | 47.372501 | 46.182499 | 42.1219 | 42.897587 | 80.396901 | 42.1219 | 47.569044 | ... | 0.784822 | 93.346666 | 92.922032 | 0.407501 | 0.510002 | 2.315002 | 3.495003 | 10.057503 | 1850796800 | AAPL |


In [3]:
# Now let's select only our subject stock's rows, the GOOG ticker.
goog_data = core_stock_data[core_stock_data['Ticker'] == 'GOOG']
print(goog_data.head())

            Close_core  Volume_core  Open_core  High_core   Low_core  \
Date                                                                   
2019-03-14   59.277500     23456000  59.725498  59.894001  59.223999   
2019-03-15   59.223000     49236000  59.668999  59.828499  59.130501   
2019-03-18   59.213001     25852000  59.165001  59.500000  58.871052   
2019-03-19   59.942501     30414000  59.440498  60.000000  59.293499   
2019-03-20   61.198502     44548000  59.867500  61.356998  59.808498   

            SMA_core   EMA_core   RSI_core  BBM_core   BBU_core  ...  \
Date                                                             ...   
2019-03-14  55.33028  55.750025  78.460015  55.33028  59.234571  ...   
2019-03-15  55.46889  55.886221  78.388790  55.46889  59.424741  ...   
2019-03-18  55.63709  56.016683  77.287448  55.63709  59.497201  ...   
2019-03-19  55.76523  56.170636  79.499789  55.76523  59.763490  ...   
2019-03-20  55.92081  56.367807  82.195692  55.92081  60.145492

#### Note above that the data starts on 03-14.  This is because of the rolling window we used to calculate our SMA (Simple Moving Average).  With a 50-day window and raw data beginning on 01-01-2019, the SMA could not be computed until a full window of 50 trading days was available, so the earlier rows were dropped.  03-14 is the first date with a complete window.
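
#### As a minimal sketch of why the first rows disappear, the example below applies a 50-day rolling mean to synthetic prices.  The price series is hypothetical, and `pd.bdate_range` skips weekends but not market holidays, so its dates only approximate the real trading calendar.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices on a weekday-only calendar (no market holidays), for illustration only
dates = pd.bdate_range('2019-01-02', periods=120)
close = pd.Series(np.linspace(50.0, 60.0, len(dates)), index=dates)

# A 50-day simple moving average needs 50 observations before it produces a value
sma_50 = close.rolling(window=50).mean()

print(sma_50.isna().sum())          # 49 rows have no full window yet
print(sma_50.first_valid_index())   # first date with a complete 50-day window

# Dropping the incomplete-window rows makes the dataset start at that date
trimmed = sma_50.dropna()
```

#### Here the first complete window lands on 2019-03-12, the 50th weekday of 2019.  The real data starts two trading days later, on 03-14, because the MLK Day and Presidents' Day market holidays fall on weekdays.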

#### Let's begin our EDA with a simple look at the Closing price over time.  We will be using Plotly a lot here, since its interactivity lets us draw multiple insights from a single plot.

In [4]:
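# Regress the closing price on a simple trading-day counter (0, 1, 2, ...)
x = np.arange(len(goog_data))
y = goog_data['Close_core'].values

slope, intercept, r_value, p_value, std_err = linregress(x, y)

# Fitted trend value for each trading day
regression_line = slope * x + intercept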


fig = go.Figure()

fig.add_trace(go.Scatter(x = goog_data.index, y = goog_data['Close_core'], mode = 'lines', name = 'Close Price'))

fig.add_trace(go.Scatter(x = goog_data.index, y = regression_line, mode = 'lines', name = 'Linear Trend', line = dict(color = 'red', dash = 'dash')))

fig.update_layout(title = 'GOOG Closing Price with Linear Trend Line', xaxis_title = 'Date', yaxis_title = 'Price', template = 'plotly_dark')

fig.show()

#### Key Takeaways:  GOOG's price fluctuated considerably during our observation period.  Looking into historical events we can see:
#### (Early 2020) Around March 2020, GOOG was hit by the COVID-19 pandemic, as was almost the entire market.  The result was a notable drop in its stock price.

#### (Throughout 2021) Coming out of the pandemic, GOOG began to recover rapidly as people returned to work.  Google's ability to capitalize on the accelerating digital transformation was key in pushing its stock price back up during this time.

#### (Feb 2022) Alphabet (Google's parent company) announced a 20-for-1 stock split, which helped drive demand and the price upward.  Note that the prices in this dataset appear to be split-adjusted (the ~$59 close in March 2019 is about one-twentieth of GOOG's pre-split price at the time), so the split itself, which took effect in July 2022, does not show up as a price drop in the chart.
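
#### The trend line above regresses price on a row counter, so its slope is measured in price per *trading day*, and `r_value ** 2` gives the share of price variance the straight line explains.  Below is a minimal sketch on synthetic data; the drift, the noise level, and the 252-trading-days-per-year convention are assumptions for illustration.

```python
import numpy as np
from scipy.stats import linregress

# Hypothetical prices: linear drift of 0.05 per trading day plus noise (illustration only)
rng = np.random.default_rng(42)
x = np.arange(500)                      # trading-day counter, as in the notebook cell
y = 60 + 0.05 * x + rng.normal(0, 2, size=x.size)

result = linregress(x, y)

# Slope is in price units per trading day because x counts rows, not calendar days
per_day = result.slope
per_year = per_day * 252                # assumes ~252 trading days per year
r_squared = result.rvalue ** 2          # share of price variance explained by the line

print(f"slope/day={per_day:.4f}, slope/year={per_year:.2f}, R^2={r_squared:.3f}")
```

#### In the notebook, `slope * 252` and `r_value ** 2` from the regression cell would give the same two summaries for GOOG.  A high R² only says the line fits past prices; it is descriptive, not a forecast.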

#### Now let's look at Volume over the same period for GOOG.

In [5]:
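# Converting to monthly totals for a smoother plot ('ME' = month-end; requires pandas >= 2.2, older versions use 'M')
# Only Volume is summed: summing prices or the Ticker text column would be meaningless
goog_monthly = goog_data[['Volume_core']].resample('ME').sum()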

fig = go.Figure()

fig.add_trace(go.Bar(x = goog_monthly.index, y = goog_monthly['Volume_core'], name = 'Monthly Volume', marker_color = 'cyan'))

fig.update_layout(title = 'GOOG Monthly Volume Over Time', xaxis_title = 'Date', yaxis_title = 'Volume', template = 'plotly_dark')

fig.show()

#### Looking at GOOG's monthly volume above, a couple of things stand out.  The 2020 - 2021 period is again noticeable, this time as GOOG's heaviest trading period, though the increase is far less pronounced than for some of the other core stocks in this project.  After 2021, GOOG's trading volume hits its low for this dataset, then normalizes before declining again in 2024.  Keep in mind that these bars are monthly totals, so months with fewer trading days look smaller, and August 2024 only runs through 08-15.

#### Now let's look at the SMA and EMA (Simple Moving Average and Exponential Moving Average, respectively) for GOOG, to see what trends exist in this time period.

In [6]:
fig = go.Figure()

# We will again use Close Price here as a starting figure for our SMA and EMA
fig.add_trace(go.Scatter(x = goog_data.index, y = goog_data['Close_core'], mode = 'lines', name = 'Close Price'))

# Plot our SMA
if 'SMA_core' in goog_data.columns:
    fig.add_trace(go.Scatter(x = goog_data.index, y = goog_data['SMA_core'], mode = 'lines', name = 'SMA 50'))

# Plot our EMA
if 'EMA_core' in goog_data.columns:
    fig.add_trace(go.Scatter(x = goog_data.index, y = goog_data['EMA_core'], mode = 'lines', name = 'EMA 50'))

fig.update_layout(title = 'GOOG Closing Price with SMA and EMA', xaxis_title = 'Date', yaxis_title = 'Price', template = 'plotly_dark')
fig.show()

#### Let's take a closer look at this one, as it introduces some new concepts.  Looking only at the SMA and EMA lines for a moment (the red and green lines), if both are sloping upward it can indicate an uptrend in the price and confirm bullish momentum.  We can see this clearly in several places on our plot, notably through most of 2020 - 2022 and again from the start of 2023 through Aug 2024.  There are smaller examples, but these two are the most obvious.  Conversely, if both lines are moving down together, it can indicate a downtrend or bearish momentum.

#### If the Closing Price is above both the SMA and EMA lines, the stock is trading above its recent average, which suggests continued strength.  This can also be seen in the same periods noted above.

#### Comparing the two averages: because the EMA weights recent prices more heavily than the SMA, an SMA *above* the EMA means recent prices are weaker than older ones in the window, which can signal slowing prices or a downshift in momentum.  Conversely, an EMA above the SMA means recent prices are stronger, which can indicate a positive price shift and a momentum upturn.
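
#### To make the SMA vs EMA comparison concrete, here is a minimal sketch that computes both averages and flags when the EMA sits above the SMA.  The notebook does not show how `EMA_core` was built, so `ewm(span=50, adjust=False)` is an assumption, and `ma_regime` is a hypothetical helper.  The synthetic path steps up and then down so the faster EMA reaction is easy to see.

```python
import numpy as np
import pandas as pd

def ma_regime(close: pd.Series, window: int = 50) -> pd.DataFrame:
    """Hypothetical helper: compare a simple and an exponential moving average of the same length.

    Assumes the EMA is ewm(span=window, adjust=False); the dataset's exact EMA formula is not shown.
    """
    sma = close.rolling(window=window).mean()
    ema = close.ewm(span=window, adjust=False).mean()
    out = pd.DataFrame({'close': close, 'sma': sma, 'ema': ema})
    # EMA above SMA: recent prices outweigh older ones in the window (upward momentum), and vice versa
    out['ema_above_sma'] = out['ema'] > out['sma']
    # Drop rows where the SMA does not yet have a full window
    return out.dropna()

# Hypothetical price path: flat at 50, steps up to 60, then steps down to 45
prices = pd.Series(np.concatenate([np.full(100, 50.0), np.full(100, 60.0), np.full(100, 45.0)]))
regime = ma_regime(prices)

# Shortly after each step, the EMA has moved further toward the new price than the SMA
print(regime.loc[[105, 205], ['sma', 'ema', 'ema_above_sma']])
```

#### On GOOG, the same flag could be read directly from the existing columns as `goog_data['EMA_core'] > goog_data['SMA_core']`.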

#### Now let's look at our RSI (Relative Strength Index) for our GOOG data.

In [7]:
# Converting to monthly data for a smoother plot
goog_monthly = goog_data.resample('ME').agg({
    'Close_core' : 'last',
    'RSI_core' : 'last'
})

fig = go.Figure()

# Plotting our RSI line
fig.add_trace(go.Scatter(x = goog_monthly.index, y = goog_monthly['RSI_core'], mode = 'lines', name = 'RSI'))

# Now adding reference lines for Overbought and Oversold at 70 and 30, respectively
fig.add_trace(go.Scatter(x = goog_monthly.index, y = [70]*len(goog_monthly), mode = 'lines', name = 'Overbought (70)', line = dict(dash = 'dash', color = 'red')))
fig.add_trace(go.Scatter(x = goog_monthly.index, y = [30]*len(goog_monthly), mode = 'lines', name = 'Oversold (30)', line = dict(dash = 'dash', color = 'green')))

fig.update_layout(title = 'GOOG Monthly RSI Over Time', xaxis_title = 'Date', yaxis_title = 'RSI', template = 'plotly_dark')

fig.show()

#### The plot above is useful because it can flag potential price shifts.  An RSI reading above the red Overbought line indicates that, while the price has been strong, it could be due for a reversal or a decrease.  You can see this happen multiple times in the plot: the RSI spikes above the red line, then dips back below it.  This can take a while to happen, but in this plot it eventually does.

#### Let's also look at the Oversold line.  Mirroring the Overbought line, an RSI reading below the green line suggests a positive price shift may be coming, so the signs point toward an increase in price.  This can also be seen in the plot several times, especially in Aug - Oct 2022 and Feb 2023.  Because each point is the RSI on the last trading day of the month, short-lived crossings within a month will not appear here.
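
#### For reference, RSI compares average gains with average losses: RSI = 100 - 100 / (1 + RS), where RS = average gain / average loss.  The notebook does not show how `RSI_core` was computed, so this sketch assumes the common 14-period setting with Wilder-style smoothing, approximated by `ewm(alpha=1/14, adjust=False)` (Wilder's original seeds with a simple average, so early values differ slightly).  `wilder_rsi` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

def wilder_rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """Hypothetical helper: RSI with Wilder-style smoothing approximated by an EMA (alpha = 1/period)."""
    delta = close.diff()
    gains = delta.clip(lower=0)        # up moves, 0 on down days
    losses = (-delta).clip(lower=0)    # size of down moves, 0 on up days
    avg_gain = gains.ewm(alpha=1 / period, adjust=False, min_periods=period).mean()
    avg_loss = losses.ewm(alpha=1 / period, adjust=False, min_periods=period).mean()
    rs = avg_gain / avg_loss           # inf when there were no losses, giving RSI = 100
    return 100 - 100 / (1 + rs)

# Hypothetical random-walk prices, for illustration only
rng = np.random.default_rng(3)
rsi_example = wilder_rsi(pd.Series(100 + rng.normal(0, 1, 300).cumsum()))

# Readings above 70 / below 30 would be the Overbought / Oversold zones plotted above
print((rsi_example > 70).sum(), (rsi_example < 30).sum())
```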

#### Now let's make use of some of the other features in our dataset and make a Candlestick Chart.

In [8]:
# We will again be using monthly sampling for interpretability
goog_monthly_candles = goog_data.resample('ME').agg({
    'Open_core' : 'first',
    'High_core' : 'max',
    'Low_core' : 'min',
    'Close_core' : 'last',
    'Volume_core' : 'sum',
    'SMA_core' : 'last',
    'EMA_core' : 'last'
})

# Building the monthly candlesticks from the aggregated OHLC columns
fig = go.Figure(data = [go.Candlestick(x = goog_monthly_candles.index,
                open = goog_monthly_candles['Open_core'],
                high = goog_monthly_candles['High_core'],
                low = goog_monthly_candles['Low_core'],
                close = goog_monthly_candles['Close_core'],
                name = 'Candlesticks')])

# Now adding in the SMA again
fig.add_trace(go.Scatter(x = goog_monthly_candles.index, y = goog_monthly_candles['SMA_core'], mode = 'lines', name = 'SMA 50'))

# Adding in the EMA as well
fig.add_trace(go.Scatter(x = goog_monthly_candles.index, y = goog_monthly_candles['EMA_core'], mode = 'lines', name = 'EMA 50'))

fig.update_layout(title = 'GOOG Candlestick Chart with SMA and EMA', xaxis_title = 'Date', yaxis_title = 'Price', template = 'plotly_dark')

fig.show()

#### Candlestick charts pack a lot of information into each candle.  The full length of a candle, including its thin wicks, spans the price range (high to low) for the given window, in our case a month, while the thick body spans the open and close.  The color shows the direction of the change: green when the month's final close is above its opening price, red when it is below.

#### You can then begin to spot trends just by noticing the colors, and successive candles of the same color also hint at buyer/seller behavior.  Several long red candles in a row can mean sellers are pushing prices lower, as seen in GOOG from Aug - Oct 2022.  Conversely, successive green candles can show buyers driving prices up, as seen in several places, especially from Oct 2020 - Aug 2021.
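
#### Runs of same-colored candles can also be counted rather than eyeballed.  This minimal sketch labels each monthly candle and measures the length of each consecutive same-color run; `candle_streaks` is a hypothetical helper, and treating flat months (close equal to open) as red is an arbitrary choice.

```python
import pandas as pd

def candle_streaks(monthly: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: label candles green/red and attach the length of each same-color run.

    Expects 'Open_core' and 'Close_core' columns, like goog_monthly_candles in the notebook.
    """
    out = monthly[['Open_core', 'Close_core']].copy()
    # Green when the period closed above its open; flat periods count as red here
    out['color'] = (out['Close_core'] > out['Open_core']).map({True: 'green', False: 'red'})
    # A new run starts whenever the color differs from the previous candle
    run_id = (out['color'] != out['color'].shift()).cumsum()
    out['streak_length'] = out.groupby(run_id)['color'].transform('count')
    return out

# Hypothetical five months of opens and closes
example = pd.DataFrame({'Open_core': [10, 11, 12, 11, 10], 'Close_core': [11, 12, 11, 10, 11]})
streaks = candle_streaks(example)
print(streaks)
```

#### In the notebook this would be called as `candle_streaks(goog_monthly_candles)`, after which filtering on `streak_length >= 3` lists the longer buyer- or seller-driven runs.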

#### Now let's look at a correlation heatmap to see which indicators are most closely related to price movements for our GOOG stock.

In [10]:
corr_features = ['Open_core', 'High_core', 'Low_core', 'Volume_core', 'Close_core',
                 'EMA_core', 'SMA_core', 'RSI_core', 'BBM_core', 'BBL_core', 'BBU_core',
                 'MACD_core', 'MACD_Signal_core', 'MACD_Hist_core', 'ADX_14_core', 'CCI_20_core',
                 'ATR_14_core', 'Stoch_K_core', 'Stoch_D_core', 'Momentum_1_core', 'Momentum_3_core',
                 'Momentum_7_core', 'Momentum_30_core', 'Momentum_50_core', 'OBV_core']
corr_matrix = goog_data[corr_features].corr()

fig = px.imshow(corr_matrix, text_auto = True, aspect = 'auto', color_continuous_scale= 'Viridis')

fig.update_layout(title = 'Correlation Matrix of GOOG Features', template = 'plotly_dark')
fig.show()

#### For the correlation plot above, we are looking at which features correlate most strongly with our Close price, as that is the main target for this project.  On the colorbar a score of 1 is a perfect positive correlation, and the chart shows that EMA_core and OBV_core are very strongly correlated with our Close price, so they may help us in predicting future values.  Note that EMA_core (like SMA_core and the Bollinger Bands) is calculated from Close itself, so a near-perfect correlation is partly built in.  Additionally, while Volume and RSI_core provide useful information in other areas, they add little direct information about our Close price.  We will still try to incorporate them in our modeling where possible.
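
#### Correlations between trending price levels tend to look stronger than the information they carry.  This minimal sketch builds an EMA from a synthetic random walk (which has no predictable structure) and compares the correlation of the levels with the correlation of day-over-day changes.  The random walk and `span=50` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical random-walk price with no built-in predictability (illustration only)
rng = np.random.default_rng(0)
close = pd.Series(100 + rng.normal(0, 1, size=2000).cumsum())

# An EMA is computed from the close itself (span=50 is an assumption)
ema = close.ewm(span=50, adjust=False).mean()

# Correlating price levels: inflated, because both series share the same wandering trend
corr_levels = close.corr(ema)
# Correlating day-over-day changes: a more honest view of shared information
corr_changes = close.diff().corr(ema.diff())

print(f"levels: {corr_levels:.2f}, changes: {corr_changes:.2f}")
```

#### Running the same comparison on GOOG, for example `goog_data[corr_features].diff().corr()`, would show which features still move with Close once the shared trend is removed.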

#### Now let's look at a distribution of daily returns for GOOG.

In [11]:
# Calculating the daily percentage change of the Close column
goog_data = goog_data.copy()
goog_data.loc[:, 'daily_return'] = goog_data['Close_core'].pct_change()

# Dropping the first row, which has no previous close to compare against
daily_returns = goog_data['daily_return'].dropna().copy()

# Calculating the KDE for the daily returns
kde = gaussian_kde(daily_returns)
x_vals = np.linspace(daily_returns.min(), daily_returns.max(), 1000)
kde_vals = kde(x_vals)

# Calculating the histogram (50 bins) with NumPy so the bars and the KDE scaling share the same bins
hist_values, bin_edges = np.histogram(daily_returns, bins = 50)
bin_width = bin_edges[1] - bin_edges[0]
bin_centers = (bin_edges[:-1] + bin_edges[1:]) / 2

# Plotting histogram of daily returns as bars over those exact bins
fig = go.Figure(data = [go.Bar(x = bin_centers, y = hist_values, width = bin_width, name = 'Histogram', marker_color = 'blue', opacity = 0.6)])

# The KDE is a density (total area = 1); multiplying by N * bin_width converts it to expected counts per bin
kde_vals_scaled = kde_vals * len(daily_returns) * bin_width

# Plotting the KDE Curve
fig.add_trace(go.Scatter(x = x_vals, y = kde_vals_scaled, mode = 'lines', name = 'KDE', line = dict(color = 'red')))

fig.update_layout(title = 'Distribution of GOOG Daily Returns with KDE', xaxis_title = 'Daily Return', yaxis_title = 'Frequency', template = 'plotly_dark')
fig.show()

#### There are some key takeaways from our distribution chart.  First, the distribution is centered near 0 with very little spread.  Its near symmetry suggests that positive and negative returns are roughly equally likely, with no strong bias in the direction of returns.  The bell shape resembles a normal distribution, suggesting most days see modest moves and extreme returns are rare, although daily stock returns typically have fatter tails than a true normal curve, so large moves happen somewhat more often than a normal distribution would predict.  The narrow spread around 0 also indicates relatively low day-to-day volatility.
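
#### How close to normal the returns really are can be checked numerically with skewness and excess kurtosis.  SciPy's `kurtosis` uses Fisher's definition by default, so a normal distribution scores about 0 and fat tails score above 0.  This is a minimal sketch; `describe_returns` is a hypothetical helper, and the Student-t draws stand in for fat-tailed returns.

```python
import numpy as np
import pandas as pd
from scipy.stats import kurtosis, skew

def describe_returns(returns: pd.Series) -> dict:
    """Hypothetical helper: summarize the shape of a return distribution."""
    r = returns.dropna()
    return {
        'mean': r.mean(),
        'std': r.std(),
        'skew': skew(r),                 # ~0 for a symmetric distribution
        'excess_kurtosis': kurtosis(r),  # ~0 for normal, > 0 for fatter tails
    }

# Hypothetical comparison: normal draws vs. fat-tailed Student-t draws (df=3)
rng = np.random.default_rng(1)
normal_like = describe_returns(pd.Series(rng.normal(0, 0.02, 5000)))
fat_tailed = describe_returns(pd.Series(0.01 * rng.standard_t(df=3, size=5000)))

print(normal_like)
print(fat_tailed)
```

#### For GOOG, `describe_returns(goog_data['daily_return'])` would show whether its excess kurtosis sits near 0 or, as is common for stocks, well above it.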

#### Let's now look at a Rolling Mean and Volatility plot.  We will use this to understand the stability of price movements over time, as it is helpful to identify periods of high uncertainty and/or strong trends in our pricing.

In [12]:
# First let's create the rolling mean and rolling std needed for this plot.
# We will keep the same 50-day window as our SMA and EMA for consistency and to help with the long-term view.
goog_data = goog_data.copy()

goog_data.loc[:, 'Rolling_Mean'] = goog_data['Close_core'].rolling(window = 50).mean()
goog_data.loc[:, 'Rolling_Std'] = goog_data['Close_core'].rolling(window = 50).std()

fig = go.Figure()

# Plot the Rolling Mean on primary y-axis
fig.add_trace(go.Scatter(x = goog_data.index, y = goog_data['Rolling_Mean'], mode = 'lines', name = 'Rolling Mean'))

# Plot the Rolling Std (Volatility) on secondary y-axis
fig.add_trace(go.Scatter(x = goog_data.index, y = goog_data['Rolling_Std'], mode = 'lines', name = 'Rolling Std (Volatility)', line = dict(dash = 'dash'), yaxis = 'y2'))

fig.update_layout(title = 'GOOG Rolling Mean and Volatility',
                xaxis_title = 'Date',
                yaxis_title = 'Price',
                yaxis2 = dict(
                    title = 'Volatility (Rolling Std)',
                    overlaying = 'y',
                    side = 'right'
                ),
                template = 'plotly_dark'
)

fig.show()

#### In the plot above, the blue Rolling Mean line again shows the price rising over time, though not at a steady pace.  Our previous plots have shown this as well; this view simply reinforces it more clearly.  The red Rolling Std line is more important here, as it displays volatility.  Our 50-day window plays a part in this too, since it smooths out short-term swings and gives a longer-term view of price stability.  Note that this is the standard deviation of the price level, so it naturally grows as the price itself rises.  As we can see, GOOG's price does increase over time, but not without fluctuating periods of volatility and shifts in pricing.
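
#### Because a price-level standard deviation scales with the price, one common alternative is the rolling standard deviation of daily *returns*, often annualized by multiplying by the square root of 252 trading days.  This minimal sketch contrasts the two on a hypothetical price that grows exactly 1% per day; `rolling_return_vol` is a hypothetical helper and the annualization convention is an assumption.

```python
import numpy as np
import pandas as pd

def rolling_return_vol(close: pd.Series, window: int = 50, periods_per_year: int = 252) -> pd.Series:
    """Hypothetical helper: rolling std of daily returns, annualized with the sqrt-of-time convention."""
    returns = close.pct_change()
    return returns.rolling(window=window).std() * np.sqrt(periods_per_year)

# Hypothetical price growing exactly 1% per day: perfectly steady in percentage terms
close = pd.Series(100 * 1.01 ** np.arange(200))

price_std = close.rolling(window=50).std()   # keeps growing as the price level grows
return_vol = rolling_return_vol(close)       # stays ~0 because every daily return is identical

print(price_std.iloc[[60, -1]].round(2).tolist(), float(return_vol.dropna().abs().max()))
```

#### On GOOG, `rolling_return_vol(goog_data['Close_core'])` would give a volatility line that is comparable across early (lower-priced) and late (higher-priced) parts of the period.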

#### We will now plot the features that correlated most closely with our target Price.  These last two plots cover On-Balance Volume and Bollinger Bands for our GOOG data.

In [13]:
obv_trace = go.Scatter(x = goog_data.index, y = goog_data['OBV_core'], mode = 'lines', name = 'OBV')

close_trace = go.Scatter(x = goog_data.index, y = goog_data['Close_core'], mode = 'lines', name = 'Close Price', yaxis = 'y2')

layout = go.Layout(
    title = "On-Balance Volume (OBV) and Close Price for GOOG",
    xaxis = dict(title = 'Date'),
    yaxis = dict(title = 'OBV'),
    yaxis2 = dict(title = 'Close Price', overlaying = 'y', side = 'right'),
    legend = dict(x = 0.1, y = 0.9),
    template = 'plotly_dark'
)

fig = go.Figure(data = [obv_trace, close_trace], layout = layout)

fig.show()

#### OBV (On-Balance Volume) can provide valuable insight into the relationship between price movements and trading volume.  When the OBV line moves in the same direction as the Price line (mostly in the second half of the plot), it confirms the trend: the price move is backed by volume, indicating the direction of momentum.
#### OBV is often described as a leading indicator of price, and in large portions of the plot above it mirrors our Price movements and appears to turn just ahead of them.  Keep in mind that OBV's direction each day is set by whether the close rose or fell, so some of this mirroring is built in.  For GOOG over this time period, OBV does a great job overall of mirroring the Price.
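
#### For reference, classic OBV adds the day's volume when the close rises, subtracts it when the close falls, and leaves it unchanged on flat days.  This minimal sketch assumes that textbook definition; `on_balance_volume` is a hypothetical helper.  It starts at 0, while the first printed `OBV_core` value is already 1,592,190,800, so the dataset's running total evidently starts earlier; only the shape, not the level, is comparable.

```python
import numpy as np
import pandas as pd

def on_balance_volume(close: pd.Series, volume: pd.Series) -> pd.Series:
    """Hypothetical helper: cumulative volume signed by the direction of each close-to-close move."""
    # +1 on up days, -1 on down days, 0 on flat days and on the first day (no prior close)
    direction = np.sign(close.diff()).fillna(0)
    return (direction * volume).cumsum()

# Hypothetical five-day example
close = pd.Series([10.0, 11.0, 10.5, 10.5, 12.0])
volume = pd.Series([100, 200, 150, 120, 300])
obv = on_balance_volume(close, volume)   # 0, 200, 50, 50, 350
print(obv.tolist())
```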

#### For our last plot in our EDA notebook we will look at all 3 of the Bollinger Bands with a Close price overlay.  The Bollinger Bands were our other high correlator in our correlation matrix above.  Let's see what we can derive from it.

In [14]:
trace_bbu = go.Scatter(x = goog_data.index, y = goog_data['BBU_core'], mode = 'lines', name = 'Upper Bollinger Band', line = dict(width = 2,color = 'green'), opacity = 0.4)
trace_bbm = go.Scatter(x = goog_data.index, y = goog_data['BBM_core'], mode = 'lines', name = 'Middle Bollinger Band', line = dict(width = 2, color = 'orange'), opacity = 0.6)
trace_bbl = go.Scatter(x = goog_data.index, y = goog_data['BBL_core'], mode = 'lines', name = 'Lower Bollinger Band', line = dict(width = 2, color = 'red'), opacity = 0.4)
trace_close = go.Scatter(x = goog_data.index, y = goog_data['Close_core'], mode = 'lines', name = 'Close Price', line = dict(width = 3, color = 'blue'))

data = [trace_bbu, trace_bbm, trace_bbl, trace_close]

layout = go.Layout(title = "Bollinger Bands with Close Price for GOOG",
                xaxis = dict(title = "Date"),
                yaxis = dict(title = "Price"),
                template = 'plotly_dark')

fig = go.Figure(data = data, layout = layout)

fig.show()

#### The Bollinger Bands plot above combines many of the features we have seen in this notebook to show clearly whether the price is relatively high or low compared with its recent range.  The Close price tends to oscillate between the Upper and Lower bands (green and red, respectively).  When the Close price touches or moves through these bands, it can indicate Overbought or Oversold conditions (much like our RSI plot above).
#### The width of the bands also reflects volatility: wider bands indicate increased volatility, while narrower bands indicate less volatility for that time period.
#### What to watch for is when the Price actually breaks outside the bands, as that is when something unusual is happening.  While GOOG's price touches the outer bands many times, it only exceeds them a few times, for example Jan 21 - 28, 2022, Nov 2 - 7, 2022, and Apr 26 - 30, 2024.
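
#### Bollinger Bands are a rolling mean plus and minus a multiple of the rolling standard deviation.  This minimal sketch rebuilds them and flags closes outside the bands.  A 50-day window is consistent with `BBM_core` equalling `SMA_core` in the printed rows, but `num_std=2` and `ddof=0` are assumptions (libraries differ on the standard-deviation convention), and `bollinger_bands` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

def bollinger_bands(close: pd.Series, window: int = 50, num_std: float = 2.0, ddof: int = 0) -> pd.DataFrame:
    """Hypothetical helper: middle band = rolling mean; upper/lower = mean +/- num_std rolling stds."""
    mid = close.rolling(window=window).mean()
    std = close.rolling(window=window).std(ddof=ddof)
    bands = pd.DataFrame({'BBL': mid - num_std * std, 'BBM': mid, 'BBU': mid + num_std * std})
    # Flag closes that finish outside the bands (not merely touching them)
    bands['above_upper'] = close > bands['BBU']
    bands['below_lower'] = close < bands['BBL']
    # Band width relative to the middle band: a simple volatility gauge
    bands['width'] = (bands['BBU'] - bands['BBL']) / bands['BBM']
    return bands

# Hypothetical series: flat at 100 with a one-day spike to 120
example = pd.Series([100.0] * 60 + [120.0] + [100.0] * 10)
bands = bollinger_bands(example)
print(bands.loc[58:62])
```

#### On GOOG, `bollinger_bands(goog_data['Close_core']).query('above_upper or below_lower')` would list candidate breakout dates to compare against the ones noted above.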

## Summary of Findings for GOOG
#### We have taken a good look at our GOOG stock over the given time period.  Here are some of the key takeaways from our plots:

#### -GOOG's stock price has been a moderately consistent performer, although it has also been prone to periods of volatility.
#### -GOOG's Closing Price is strongly correlated with the SMA (Simple Moving Average) and EMA (Exponential Moving Average), as well as the Bollinger Bands data.
#### -GOOG's price has behaved in a relatively predictable manner, with OBV consistently mirroring the Close Price over time.