
In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

# Predicting GOOGL Stock Prices with Facebook Prophet

## Introduction

Forecasting stock prices is a core task in finance that informs investment decisions and risk management. In this notebook, we use time series forecasting to predict future prices of Alphabet Inc. (GOOGL) stock with the Facebook Prophet model.

## Dataset and Preprocessing

The historical stock prices of GOOGL and other selected stocks were sourced from the Kaggle dataset [S&P 100 Stock Prices Forecast](https://www.kaggle.com/tarunpaparaju/all-s-and-p100-open-price-stocks-forecast). After loading the dataset, we convert the "Date" column to datetime format and drop rows with missing values, two preprocessing steps that time series forecasting depends on.

## Exploratory Data Analysis (EDA)

Before we dive into building the forecasting model, we conduct an exploratory data analysis (EDA) to gain insights into the historical stock price trends of GOOGL and other selected stocks. Through visualization, we aim to identify any patterns, seasonality, or trends present in the data, which will guide our model selection and parameter tuning.

## Facebook Prophet Model

To predict stock prices, we use Facebook Prophet, a forecasting tool designed for time series with trends and strong seasonal patterns. It exposes a small set of interpretable parameters and tolerates missing values, which makes it a convenient choice for this task.
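
As a rough orientation, and not a full statement of the model, Prophet treats the series as a combination of a trend $g(t)$, a seasonal component $s(t)$, holiday effects $h(t)$, and an error term $\varepsilon_t$. In the default additive mode:

$$
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
$$

With `seasonality_mode="multiplicative"`, which this notebook uses when evaluating the model, the seasonal effect scales with the trend instead of being added to it. Ignoring holidays, this is approximately:

$$
y(t) = g(t)\,\bigl(1 + s(t)\bigr) + \varepsilon_t
$$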

## Model Training and Evaluation

The dataset is split into training and testing sets, with the training data used to fit the Prophet model. We evaluate the model's performance on the testing data, using metrics such as Mean Squared Error (MSE) and R-squared (R2) to gauge its accuracy in predicting stock prices.
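
To make the two metrics concrete, here is a minimal sketch that computes them by hand with NumPy. The helper names `mse` and `r2` and the toy values are assumptions made for illustration. The notebook itself uses `mean_squared_error` and `r2_score` from scikit-learn.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of the squared differences between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def r2(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares around the mean)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Toy values, invented for illustration only
y_true = [100.0, 102.0, 104.0, 106.0]
good = [101.0, 101.0, 105.0, 105.0]   # each prediction is off by 1
bad = [106.0, 104.0, 102.0, 100.0]    # predictions move in the wrong direction

print(mse(y_true, good), r2(y_true, good))  # 1.0 0.8
print(r2(y_true, bad))                      # -3.0
```

The second case shows that R-squared can be negative when predictions are worse than simply predicting the mean, so it should not be read as a percentage accuracy.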

## GOOGL Stock Price Forecast

After training the Prophet model on the historical stock prices, we harness its forecasting capabilities to predict the future prices of GOOGL stock. The model's forecasted results are then compared with the actual prices to visualize its predictive capabilities and unveil potential trends.

In [None]:
pip install -q ptitprince

In [None]:
pip install summarytools

# Importing Libraries

In [None]:
import pandas as pd
import warnings
warnings.filterwarnings("ignore")
import ptitprince as pt
import numpy as np 
import seaborn as sns
import matplotlib.pyplot as plt

from prophet import Prophet
from sklearn.metrics import mean_squared_error, r2_score
from summarytools import dfSummary  # used below to summarize the DataFrame

In [None]:
df=pd.read_csv("/kaggle/input/all-s-and-p100-open-price-stocks-forecast/sep100.csv")
df.head()

We convert the "Date" column from its original string format to datetime format using the `pd.to_datetime()` function.

In [None]:
df["Date"] = pd.to_datetime(df["Date"])

The code `dfSummary(df)` uses the `dfSummary` function from the `summarytools` library to generate a summary of the DataFrame `df`. The `dfSummary` function provides a comprehensive summary of various statistics and characteristics of the DataFrame.

The summary typically includes the following information:

- Number of rows and columns in the DataFrame.
- Data type of each column.
- Count of non-missing values in each column.
- Count of missing values in each column.
- Basic descriptive statistics for numeric columns (mean, median, standard deviation, min, max, quartiles, etc.).
- Count of unique values for categorical columns.
- Frequency of top values in each column.

The `dfSummary` function is a handy tool for quick data exploration and getting an overview of the data's structure and content. It can be particularly useful when working with large datasets or when you need to understand the data's distribution and summary statistics at a glance.
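
If `summarytools` is not available, much of the same information can be approximated with plain pandas. The sketch below is an assumption-laden stand-in, not a reimplementation of `dfSummary`. The toy DataFrame and the `overview` name are invented for illustration.

```python
import pandas as pd

# Toy data with one missing value per column, invented for illustration only
toy = pd.DataFrame({
    "price": [1.0, 2.0, None, 4.0],
    "ticker": ["A", "B", "B", None],
})

# One row per column: data type, non-missing count, missing count, distinct values
overview = pd.DataFrame({
    "dtype": toy.dtypes.astype(str),
    "non_missing": toy.notna().sum(),
    "missing": toy.isna().sum(),
    "unique": toy.nunique(),
})

print(toy.shape)       # number of rows and columns
print(overview)
print(toy.describe())  # descriptive statistics for the numeric columns
```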

In [None]:
dfSummary(df)

The code computes the pairwise correlation coefficients between the columns of the dataset, then keeps the stocks whose absolute correlation with "GOOGL" is at least 0.97. These stocks have a strong linear relationship with "GOOGL" (positive or negative) and are carried forward for further analysis.

In [None]:
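# numeric_only=True excludes the datetime "Date" column, which newer pandas versions no longer drop silently
correlation=df.corr(numeric_only=True)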
correlated_feature=correlation[abs(correlation["GOOGL"])>=0.97].index.tolist()

In [None]:
plt.figure(figsize=(10,6))
sns.heatmap(df[correlated_feature].corr(),annot=True,cmap="mako")
plt.show()

We create a new DataFrame containing only the date and the stocks most highly correlated with GOOGL.

In [None]:
df=df[["Date","AAPL","DHR","GOOGL","GOOG","TMO"]]
df

In [None]:
# Count missing values in each column
df.isnull().sum()

In [None]:
# Drop rows that contain missing values
df=df.dropna()

In [None]:
for i in df.columns:
    if i != "Date":
        sns.displot(data=df, x=i, kind="kde")
        plt.title(f"Distribution of {i}")
        plt.show()

In [None]:
plt.figure(figsize=(10,6))
pt.RainCloud(data=df,orient="h",jitter=1)
plt.show()

In [None]:
for i in df.columns:
    if i != "Date":
        plt.figure(figsize=(10, 2))
        plt.plot(df["Date"], df[i])
        plt.xlabel("Date")
        plt.ylabel(i)
        plt.title(f"{i} over time")
        plt.show()

## Moving Average

In [None]:
# Add 15-, 30-, and 45-day simple moving averages for each stock
for ticker in ["AAPL", "DHR", "GOOGL", "GOOG", "TMO"]:
    for window in (15, 30, 45):
        df[f"{ticker}_{window}"] = df[ticker].rolling(window).mean()
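
The first `window - 1` rows of each rolling mean are `NaN`, because a full window is not yet available at that point. A toy series (values invented for illustration) makes this visible:

```python
import pandas as pd

# Toy prices, invented for illustration only
prices = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0])

# 3-period simple moving average: mean of the current row and the two rows before it
sma_3 = prices.rolling(3).mean()

print(sma_3.tolist())  # [nan, nan, 20.0, 30.0, 40.0]
```

This is why the moving-average lines in the plots start later than the price line. Passing `min_periods=1` to `rolling` would instead average over whatever rows are available.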

In [None]:
fig, axes = plt.subplots(5, sharex=True, sharey=True)
fig.set_figheight(15)
fig.set_figwidth(15)

df[['AAPL', 'AAPL_15', 'AAPL_30', 'AAPL_45']].plot(ax=axes[0])
axes[0].set_title('AAPL')

df[['DHR', 'DHR_15', 'DHR_30', 'DHR_45']].plot(ax=axes[1])
axes[1].set_title('DHR')

df[['GOOGL', 'GOOGL_15', 'GOOGL_30', 'GOOGL_45']].plot(ax=axes[2])
axes[2].set_title('GOOGL')

df[['GOOG', 'GOOG_15', 'GOOG_30', 'GOOG_45']].plot(ax=axes[3])
axes[3].set_title('GOOG')

df[['TMO', 'TMO_15', 'TMO_30', 'TMO_45']].plot(ax=axes[4])
axes[4].set_title('TMO')

fig.tight_layout()
plt.show()

## Daily Returns

In [None]:
df["AAPL_daily_return"] = df["AAPL"].pct_change()
df["DHR_daily_return"] = df["DHR"].pct_change()
df["GOOGL_daily_return"] = df["GOOGL"].pct_change()
df["GOOG_daily_return"] = df["GOOG"].pct_change()
df["TMO_daily_return"] = df["TMO"].pct_change()
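
As a quick check of what `pct_change()` computes, the sketch below compares it with the explicit formula `price / previous_price - 1` on toy prices (invented for illustration):

```python
import pandas as pd

# Toy prices, invented for illustration only
prices = pd.Series([100.0, 110.0, 99.0])

# pct_change() computes (p_t / p_{t-1}) - 1; the first row has no predecessor, so it is NaN
returns = prices.pct_change()
manual = prices / prices.shift(1) - 1

print(returns.round(4).tolist())  # [nan, 0.1, -0.1]
```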

In [None]:
fig,(axes)= plt.subplots(5,sharex=True,sharey=True)
fig.set_figheight(10)
fig.set_figwidth(15)


df["AAPL_daily_return"].plot(ax=axes[0],legend=True,linestyle="--",marker="o")
axes[0].set_title("AAPL_Daily_Return")
df["DHR_daily_return"].plot(ax=axes[1],legend=True,linestyle="--",marker="o")
axes[1].set_title("DHR_daily_return")
df["GOOGL_daily_return"].plot(ax=axes[2],legend=True,linestyle="--",marker="o")
axes[2].set_title("GOOGL_daily_return")
df["GOOG_daily_return"].plot(ax=axes[3],legend=True,linestyle="--",marker="o")
axes[3].set_title("GOOG_daily_return")
df["TMO_daily_return"].plot(ax=axes[4],legend=True,linestyle="--",marker="o")
axes[4].set_title("TMO_daily_return")

fig.tight_layout()

In [None]:
dailyreturns=['AAPL_daily_return', 'DHR_daily_return', 'GOOGL_daily_return',
       'GOOG_daily_return', 'TMO_daily_return']
plt.figure(figsize=(10,10))

for i,dailyreturn in enumerate(dailyreturns,1):
    plt.subplot(3,2,i)
    df[dailyreturn].hist(bins=50)
    plt.xlabel("Daily Return")
    plt.ylabel("Count")
    plt.title(f"{dailyreturn}")
    

In [None]:
dailyreturn=df[['AAPL_daily_return', 'DHR_daily_return', 'GOOGL_daily_return',
       'GOOG_daily_return', 'TMO_daily_return']]
dailyreturn

In [None]:
sns.jointplot(y="AAPL_daily_return",x="GOOGL_daily_return",data=dailyreturn,color="seagreen")

In [None]:
sns.pairplot(dailyreturn)

In [None]:
plt.figure(figsize=(6,4))
sns.heatmap(dailyreturn.corr(),annot=True,cmap="summer")
plt.title("Correlation between Daily Returns")

## Volatile Stocks

In [None]:
# Work on a copy so that adding return columns does not touch a view of df
returns=df[['AAPL', 'DHR', 'GOOGL', 'GOOG', 'TMO']].copy()

returns["AAPL_return"]=(returns["AAPL"]/returns["AAPL"].shift(1))-1
returns["DHR_return"]=(returns["DHR"]/returns["DHR"].shift(1))-1
returns["GOOGL_return"]=(returns["GOOGL"]/returns["GOOGL"].shift(1))-1
returns["GOOG_return"]=(returns["GOOG"]/returns["GOOG"].shift(1))-1
returns["TMO_return"]=(returns["TMO"]/returns["TMO"].shift(1))-1

returns["AAPL_return"].hist(label="AAPL",bins=100,alpha=1,figsize=(12,6))
returns["DHR_return"].hist(label="DHR",bins=100,color="g",alpha=0.8)
returns["GOOGL_return"].hist(label="GOOGL",bins=100,color="r",alpha=0.6)
returns["GOOG_return"].hist(label="GOOG",bins=100,color="y",alpha=0.4)
returns["TMO_return"].hist(label="TMO",bins=100,color="purple",alpha=0.2)
plt.xlim(-0.10,0.10)
plt.legend()
plt.show()
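
The overlaid histograms compare volatility visually: a wider spread of returns means a more volatile stock. One common way to put a number on this, offered here as an assumption and not as part of the original analysis, is the standard deviation of daily returns, optionally annualized with the conventional 252 trading days per year. The column names and values below are invented for illustration.

```python
import numpy as np
import pandas as pd

# Toy daily returns, invented for illustration only
toy_returns = pd.DataFrame({
    "CALM": [0.01, -0.01, 0.01, -0.01],
    "WILD": [0.05, -0.05, 0.05, -0.05],
})

# Daily volatility: sample standard deviation of the daily returns
daily_vol = toy_returns.std()

# Annualized volatility, assuming 252 trading days per year (a common convention)
annual_vol = daily_vol * np.sqrt(252)

print(daily_vol.sort_values(ascending=False).index.tolist())  # ['WILD', 'CALM']
```

On the notebook's data, the same idea would apply to the `*_return` columns of `returns`, for example via `returns.filter(like="_return").std()`.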

# Model Building

In [None]:
df_new = df[["Date", "GOOGL"]] 
df_new

In [None]:
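# Prophet expects the date column to be named "ds" and the target column "y"
df_new = df_new.assign(ds=df_new["Date"], y=df_new["GOOGL"])

# Chronological 80/20 split: the earliest 80% of rows train the model, the latest 20% test it
split_idx = int(.8 * len(df_new))
train, test = df_new.iloc[:split_idx], df_new.iloc[split_idx:]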
print(f'Training data size : {train.shape}')
print(f'Testing data size : {test.shape}')

In [None]:
model=Prophet(growth="linear",daily_seasonality=True,seasonality_mode="multiplicative")
# Fit on the training set only, so the test set stays unseen during training
model.fit(train)
y_actual=test["y"]
prediction=model.predict(pd.DataFrame({"ds":test["ds"]}))
# Keep predictions as floats; casting to int would truncate them and distort the metrics
y_predicted=prediction["yhat"]
print("MSE:",mean_squared_error(y_actual,y_predicted))

# R-squared measures goodness of fit; it is not an accuracy percentage and can be negative
print("R-squared of the model is :",r2_score(y_actual,y_predicted))

plt.plot(test['ds'], y_predicted, 'k', label="Predicted")
plt.plot(test['ds'], y_actual, 'b', label="Actual")
plt.xlabel("Date")
plt.ylabel("GOOGL Price")
plt.title("GOOGL Price: Predicted vs Actual")
plt.legend()
plt.show()

In [None]:
# Refit on the full history, then forecast 30 days past the last observed date
model = Prophet()
model.fit(df_new)
future = model.make_future_dataframe(periods=30)
forecast = model.predict(future)
forecast[["ds","yhat","yhat_lower","yhat_upper"]].tail(30)
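
`make_future_dataframe` defaults to a daily frequency, so the 30 future rows are calendar days and include weekends, when the market is closed. The sketch below uses only pandas and a hypothetical last observation date (not taken from the dataset) to show how many of those rows fall on weekends and how a business-day range differs.

```python
import pandas as pd

# Hypothetical last date in the history (Thursday, 2020-12-31); not taken from the dataset
last_date = pd.Timestamp("2020-12-31")
first_future = last_date + pd.Timedelta(days=1)

# 30 calendar days after the last observation, mirroring a daily-frequency horizon
calendar_days = pd.date_range(first_future, periods=30, freq="D")
weekend_rows = int((calendar_days.dayofweek >= 5).sum())

# 30 business days (Monday to Friday) after the last observation
business_days = pd.bdate_range(first_future, periods=30)

print(weekend_rows)                          # 9
print(calendar_days[-1], business_days[-1])  # the business-day range reaches further ahead
```

If weekend rows are not wanted, one option is to pass `freq="B"` to `make_future_dataframe`; another is to filter the forecast with `forecast["ds"].dt.dayofweek < 5`. Neither excludes market holidays.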

In [None]:
model.plot(forecast)
plt.show()  

## Conclusion

Stock price forecasting is a challenging and dynamic task. In this notebook, we used the Facebook Prophet model to forecast GOOGL stock prices, evaluated it with MSE and R-squared, and produced a 30-day forecast. Investors and financial analysts can use these predictions as one input to their decisions.

# Thank You