# **Alphabet Inc. (GOOGL) Stock Historical Prices & Data**



**Step-1: Import the necessary libraries.**

**Step-2: Data Preprocessing**

1-Download the data from Yahoo Finance.

2-Check for missing values

3-Resample the Data

4-Normalization and Scaling

5-Train-Test Split

6-Reshape Data for RNN/LSTM

**Step-3: Build Model**

1st: Simple RNN Model

2nd: LSTM Model

3rd: ARIMA Model

4th: SARIMA Model

**Step-4: Prediction and RMSE**




# **Step-1: Import the necessary libraries.**

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import datetime as dt
import yfinance as yf
import math
import statsmodels.api as sm
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from keras.models import Sequential
from keras.layers import Dense, LSTM,SimpleRNN
from keras.callbacks import EarlyStopping
from keras.optimizers import Adam
from statsmodels.tsa.arima.model import ARIMA
from sklearn.model_selection import TimeSeriesSplit


# **Step-2: Data Preprocessing**

**1- Download the data from Yahoo Finance.**

In [None]:
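start = dt.datetime(2015, 1, 1)
end = dt.datetime(2020, 1, 1)

# Daily GOOGL prices; yfinance treats `end` as exclusive, so the data runs through the end of 2019
df = yf.download("GOOGL", start=start, end=end)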

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 5)
df

**2- Check for missing values**

In [None]:
missing_per_column = df.isnull().sum()
print("Missing values per column:\n", missing_per_column)
print("Total missing values:", missing_per_column.sum())
print("\nDataFrame info:")
df.info()
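A minimal sketch of the same check on a small hypothetical frame (not real GOOGL data), showing the difference between per-column counts and the grand total. It also shows one common way to fill gaps in a price series, carrying the last observed value forward with `ffill()`; this fill step is an assumption for illustration and is not part of the notebook's pipeline.

```python
import numpy as np
import pandas as pd

# Hypothetical daily prices with one gap in each column
prices = pd.DataFrame(
    {"Close": [100.0, np.nan, 102.0, 103.0], "Volume": [10, 12, np.nan, 11]},
    index=pd.date_range("2015-01-05", periods=4, freq="D"),
)

per_column = prices.isnull().sum()   # one count per column
total = int(per_column.sum())        # a single number across the whole frame
print(per_column)
print("Total:", total)

# Forward-fill: each gap takes the most recent earlier value in its column
filled = prices.ffill()
print(filled)
```

Forward-filling is a reasonable default for prices because it never uses future information; interpolation would.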


**3- Resample the Data**

In [None]:
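# 'W' = calendar weeks ending on Sunday; 'M' = calendar months, labeled by month end
# (pandas >= 2.2 prefers the alias 'ME' and warns that 'M' is deprecated)
weekly_df = df.resample('W').mean()
monthly_df = df.resample('M').mean()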

weekly_df.to_csv('weekly_df.csv')
monthly_df.to_csv('monthly_df.csv')

print("Weekly Data:")
print(weekly_df.head())

print("\nMonthly Data:")
print(monthly_df.head())
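To make the weekly rule concrete, here is a small sketch on made-up prices. In pandas, `'W'` is an alias for `'W-SUN'`, so each bin is a calendar week ending on Sunday and is labeled with that Sunday; `.mean()` then averages whatever trading days fall inside the bin. The monthly rule works the same way, with bins labeled by month end.

```python
import pandas as pd

# Ten hypothetical business-day closes, Mon 2015-01-05 .. Fri 2015-01-16
days = pd.bdate_range("2015-01-05", periods=10)
close = pd.Series(range(1, 11), index=days, dtype=float)

# Week 1 (Jan 5-9) holds 1..5 and week 2 (Jan 12-16) holds 6..10
weekly = close.resample("W").mean()
print(weekly)  # labeled 2015-01-11 and 2015-01-18, with means 3.0 and 8.0
```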

**4- Normalization and Scaling**

In [None]:
scaler = MinMaxScaler()
# Note: fitting on the full monthly series lets the test period's min/max influence the scaling
scaled_df = scaler.fit_transform(monthly_df[["Close"]])

scaled_df = pd.DataFrame(scaled_df, index=monthly_df.index, columns=["Close"])

print("Scaled Data:")
print(scaled_df.head())
print("Length of scaled data:")
print(len(scaled_df))
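In the cell above the scaler is fit on the full series, so the minimum and maximum of the test period shape the scaled training data. A sketch of a leakage-free alternative, assuming the same 80/20 chronological split used in the next step: learn min and max from the training slice only, then apply them to every row. It is written in plain NumPy to show the arithmetic; with the default `feature_range=(0, 1)`, fitting `MinMaxScaler` on the training rows and calling `.transform` on all rows computes the same values. The prices and the helper name are hypothetical.

```python
import numpy as np

def minmax_fit_on_train(values, train_fraction=0.8):
    """Scale a 1-D series using min/max learned from the first train_fraction of rows."""
    values = np.asarray(values, dtype=float)
    train_size = int(len(values) * train_fraction)
    lo, hi = values[:train_size].min(), values[:train_size].max()
    scaled = (values - lo) / (hi - lo)
    return scaled, lo, hi, train_size

# Hypothetical monthly closes that rise above the training range in the last two months
prices = np.array([50, 55, 60, 58, 62, 65, 70, 68, 75, 80], dtype=float)
scaled, lo, hi, train_size = minmax_fit_on_train(prices)

print(train_size, lo, hi)                 # 8 training rows, min 50, max 70
print(scaled[:train_size])                # training rows lie in [0, 1]
print(scaled[train_size:])                # test rows can exceed 1.0 (here 1.25 and 1.5)
```

Values above 1.0 in the test rows are expected: they signal prices outside the range the model saw during training, which the full-series fit would otherwise hide.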

**5- Train-Test Split**

In [None]:
train_size = int(len(scaled_df) * 0.8)
train_df = scaled_df[:train_size]
test_df = scaled_df[train_size:]


In [None]:
print("Train Data:")
print(train_df)
print("\nTest Data:")
print(test_df)

**6- Reshape Data for RNN/LSTM**

In [None]:
len(train_df)

In [None]:
X_train = []
y_train = []
# Each sample: the previous 6 months as input (X) and the following month as target (y)
for i in range(6, len(train_df)):
    X_train.append(train_df.iloc[i-6:i])
    y_train.append(train_df.iloc[i])

for idx, (x, y) in enumerate(zip(X_train, y_train)):
    print(f"Sample {idx + 1}:")
    print(f"X_train:\n{x}\n")
    print(f"y_train:\n{y}\n")

In [None]:
len(test_df)

In [None]:
X_test = []
y_test = []
# The first 6 test months serve only as inputs, so there are len(test_df) - 6 test samples
for i in range(6, len(test_df)):
    X_test.append(test_df.iloc[i-6:i])
    y_test.append(test_df.iloc[i])

for idx, (x, y) in enumerate(zip(X_test, y_test)):
    print(f"Sample {idx + 1}:")
    print(f"X_test:\n{x}\n")
    print(f"y_test:\n{y}\n")

In [None]:
X_train, y_train = np.array(X_train), np.array(y_train)
X_test, y_test = np.array(X_test), np.array(y_test)
print("X_train shape:", X_train.shape)
print("y_train shape:", y_train.shape)
print("X_test shape:", X_test.shape)
print("y_test shape:", y_test.shape)


In [None]:
# Keras recurrent layers expect inputs shaped (samples, timesteps, features)
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], 1)
y_train = y_train.reshape(y_train.shape[0], 1)
y_test = y_test.reshape(y_test.shape[0], 1)
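The two windowing loops above can be expressed as one helper. The sketch below (the name `make_windows` is hypothetical) returns arrays already shaped `(samples, timesteps, 1)`. It also shows a variant that is an assumption, not what this notebook does: prepending the last 6 training values to the test series so that every test month, not only those after the first 6, becomes a prediction target. The input is a synthetic series standing in for 60 scaled monthly closes.

```python
import numpy as np

def make_windows(series, window=6):
    """Return (X, y) with X[k] = series[k:k+window] and y[k] = series[k+window]."""
    series = np.asarray(series, dtype=float).ravel()
    X = np.array([series[i - window:i] for i in range(window, len(series))])
    y = series[window:]
    # Shape for Keras recurrent layers: (samples, timesteps, features)
    return X.reshape(-1, window, 1), y.reshape(-1, 1)

# Synthetic stand-in: 48 training months and 12 test months (an 80/20 split of 60)
series = np.linspace(0.0, 1.0, 60)
train, test = series[:48], series[48:]

X_tr, y_tr = make_windows(train)
X_te, y_te = make_windows(test)   # as in the notebook: first 6 test months are inputs only
print(X_tr.shape, y_tr.shape, X_te.shape, y_te.shape)

# Variant: borrow the last 6 training months as context so all 12 test months are targets
X_te_full, y_te_full = make_windows(np.concatenate([train[-6:], test]))
print(X_te_full.shape)
```

The variant only uses past values as inputs, so it does not leak test information; it simply gives the RNN and LSTM as many test targets as there are test months.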

# **Step-3: Build Model**

**1st: Simple RNN Model**

In [None]:
from keras.layers import Dropout

rnn_model = Sequential()
rnn_model.add(SimpleRNN(units=50, return_sequences=True, input_shape=(X_train.shape[1], 1)))
rnn_model.add(Dropout(0.2))
rnn_model.add(SimpleRNN(units=50))
rnn_model.add(Dropout(0.2))
rnn_model.add(Dense(units=1))
rnn_model.compile(optimizer='adam', loss='mean_squared_error')
# No validation split is used, so early stopping watches the training loss
rnn_model.fit(X_train, y_train, epochs=100, batch_size=32, verbose=1,
              callbacks=[EarlyStopping(monitor='loss', patience=10, verbose=1)])
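What a `SimpleRNN` layer computes, sketched in NumPy for one input window: with Keras' default `tanh` activation, each step updates the hidden state as `h_t = tanh(x_t W + h_{t-1} U + b)`, starting from a zero state. With `return_sequences=True` (the first layer above) every `h_t` is passed on; otherwise only the last one is (the second layer). The weights are random placeholders, not the trained model's, and dropout is ignored.

```python
import numpy as np

def simple_rnn_forward(x_seq, W, U, b, return_sequences=False):
    """x_seq: (timesteps, features); W: (features, units); U: (units, units); b: (units,)."""
    h = np.zeros(U.shape[0])                 # zero initial state
    states = []
    for x_t in x_seq:
        h = np.tanh(x_t @ W + h @ U + b)     # mix current input with previous state
        states.append(h)
    return np.stack(states) if return_sequences else h

rng = np.random.default_rng(0)
units, features, timesteps = 50, 1, 6        # mirrors units=50 and a 6-month window
W = rng.normal(scale=0.1, size=(features, units))
U = rng.normal(scale=0.1, size=(units, units))
b = np.zeros(units)

window = rng.random((timesteps, features))   # one hypothetical input window
seq = simple_rnn_forward(window, W, U, b, return_sequences=True)
last = simple_rnn_forward(window, W, U, b)
print(seq.shape, last.shape)                 # (6, 50) (50,)
```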


**2nd: LSTM Model**

In [None]:
lstm_model = Sequential()
lstm_model.add(LSTM(units=50, return_sequences=True, input_shape=(X_train.shape[1], 1)))
lstm_model.add(Dropout(0.2))
lstm_model.add(LSTM(units=50))
lstm_model.add(Dropout(0.2))
lstm_model.add(Dense(units=1))
lstm_model.compile(optimizer='adam', loss='mean_squared_error')
# No validation split is used, so early stopping watches the training loss
lstm_model.fit(X_train, y_train, epochs=100, batch_size=32, verbose=1,
               callbacks=[EarlyStopping(monitor='loss', patience=10, verbose=1)])
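For comparison, a NumPy sketch of the standard LSTM step (Keras' defaults are `tanh` activation and `sigmoid` recurrent activation). The input, forget, and output gates decide how much of a candidate update enters the cell state `c_t` and how much of that state is exposed as `h_t`; the additive update of `c_t` is what helps an LSTM keep information over more steps than a `SimpleRNN`. Weights are random placeholders, and the order in which the four gate blocks are stacked is an assumption of this sketch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (features, 4*units), U: (units, 4*units), b: (4*units,),
    with the four blocks stacked as [input, forget, candidate, output]."""
    z = x_t @ W + h_prev @ U + b
    i, f, g, o = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)
    c = f * c_prev + i * g          # keep part of the old cell state, add part of the new
    h = o * np.tanh(c)              # expose a gated view of the cell state
    return h, c

rng = np.random.default_rng(0)
units, features = 50, 1
W = rng.normal(scale=0.1, size=(features, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))
b = np.zeros(4 * units)

h, c = np.zeros(units), np.zeros(units)
for x_t in rng.random((6, features)):   # one hypothetical 6-month window
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape, c.shape)                 # (50,) (50,)
```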

**3rd: ARIMA Model**

In [None]:
p, d, q = 1, 1, 1

arima_model = ARIMA(train_df['Close'], order=(p, d, q))
arima_model_fit = arima_model.fit(method_kwargs={"maxiter": 500})


arima_pred = arima_model_fit.forecast(steps=len(test_df))

print("ARIMA Predictions:")
print(arima_pred)
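Why an ARIMA(1, 1, 1) multi-step forecast tends to flatten out: with d = 1 the model describes the monthly changes `Δy_t = y_t - y_{t-1}` as `Δy_t = c + φ·Δy_{t-1} + ε_t + θ·ε_{t-1}`. When forecasting several steps ahead, future shocks are set to zero, so the MA term only affects the first step and the predicted changes then shrink by a factor φ each month; the level forecast is the last observed value plus the running sum of predicted changes. The sketch hand-codes that recursion with hypothetical φ, θ and inputs. statsmodels estimates its own parameters (see `arima_model_fit.params`), and for d = 1 its `ARIMA` omits the constant by default, hence `c = 0` here.

```python
import numpy as np

def arima111_forecast(last_level, last_diff, last_resid, phi, theta, c=0.0, steps=3):
    """Multi-step ARIMA(1,1,1) point forecast, assuming future shocks are zero."""
    diffs = []
    prev_diff, prev_resid = last_diff, last_resid
    for _ in range(steps):
        d_hat = c + phi * prev_diff + theta * prev_resid
        diffs.append(d_hat)
        prev_diff, prev_resid = d_hat, 0.0   # the MA term only enters the first step
    # Undo the differencing: last observed level + cumulative predicted changes
    return last_level + np.cumsum(diffs)

# Hypothetical inputs: last scaled close 0.80, last monthly change 0.04, last residual 0.01
fc = arima111_forecast(0.80, 0.04, 0.01, phi=0.5, theta=0.2, steps=4)
print(fc)   # predicted changes are 0.022, 0.011, 0.0055, 0.00275, so the path levels off
```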

**4th: SARIMA Model**

In [None]:

sarima_model = sm.tsa.statespace.SARIMAX(train_df['Close'], order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
# SARIMAX.fit takes maxiter directly (method_kwargs is an ARIMA.fit argument)
sarima_model_fit = sarima_model.fit(maxiter=500, disp=False)

sarima_pred = sarima_model_fit.forecast(steps=len(test_df))

print("SARIMA Predictions:")
print(sarima_pred)
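The order `(1, 1, 1)` with `seasonal_order=(1, 1, 1, 12)` applies both a first difference (d = 1) and a lag-12 seasonal difference (D = 1) before the ARMA terms are fitted. A NumPy sketch of that transform on a hypothetical series built from a linear trend plus a repeating 12-month pattern: the combined differencing removes both components, and it consumes 1 + 12 = 13 observations, which matters when the training set covers only a few years of monthly data.

```python
import numpy as np

months = np.arange(48)                                       # four hypothetical years
pattern = np.array([0, 1, 3, 2, 0, -1, -2, -1, 0, 1, 2, 1], dtype=float)
y = 0.5 * months + np.tile(pattern, 4)                       # trend + 12-month seasonality

d1 = np.diff(y)                  # (1 - B) y: removes the linear trend
d1_d12 = d1[12:] - d1[:-12]      # (1 - B^12) applied to d1: removes the seasonal pattern

print(len(y), len(d1), len(d1_d12))   # 48 47 35
print(np.allclose(d1_d12, 0.0))       # True
```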


# **Step-4: Prediction and RMSE**

**Function to calculate RMSE**

In [None]:
def calculate_rmse(y_true, y_pred):
    return np.sqrt(mean_squared_error(y_true, y_pred))
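The helper computes `RMSE = sqrt(mean((y_true - y_pred)^2))`. Because every series here was min-max scaled, the RMSE values are in scaled units, not dollars. Min-max scaling is linear, `x_scaled = (x - min) / (max - min)`, so a scaled RMSE converts back to price units by multiplying by `(max - min)`; for the notebook's fitted scaler that factor should be `scaler.data_range_[0]`. The sketch checks both facts on made-up numbers, using a hypothetical NumPy-only `rmse` helper.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error, the same quantity as np.sqrt(mean_squared_error(...))."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Hypothetical prices (dollars) and a forecast that misses the last one by 20
actual = np.array([1000.0, 1100.0, 1200.0])
forecast = np.array([1000.0, 1100.0, 1220.0])
print(rmse(actual, forecast))     # sqrt(20**2 / 3), about 11.55

# Scale both with min/max taken from `actual`, as a MinMaxScaler fit on `actual` would
lo, hi = actual.min(), actual.max()
scaled_rmse = rmse((actual - lo) / (hi - lo), (forecast - lo) / (hi - lo))
print(scaled_rmse * (hi - lo))    # back in dollars, equal to the RMSE above
```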

**RNN Predictions on Training Data and on Test Data**

In [None]:
y_train_rnn_pred = rnn_model.predict(X_train)
rnn_train_rmse = calculate_rmse(y_train, y_train_rnn_pred)


rnn_test_pred = rnn_model.predict(X_test)
rnn_test_rmse = calculate_rmse(y_test, rnn_test_pred)

print("RNN Training RMSE:", rnn_train_rmse)
print("RNN Test RMSE:", rnn_test_rmse)

**LSTM Predictions on Training Data and on Test Data**

In [None]:
y_train_lstm_pred = lstm_model.predict(X_train)
lstm_train_rmse = calculate_rmse(y_train, y_train_lstm_pred)

lstm_test_pred = lstm_model.predict(X_test)
lstm_test_rmse = calculate_rmse(y_test, lstm_test_pred)

print("LSTM Training RMSE:", lstm_train_rmse)
print("LSTM Test RMSE:", lstm_test_rmse)

**ARIMA Predictions on Training Data and on Test Data**

In [None]:
y_train_arima_pred = arima_model_fit.predict(start=0, end=len(train_df)-1)
arima_train_rmse = calculate_rmse(train_df['Close'], y_train_arima_pred)


y_test_arima_pred = arima_model_fit.forecast(steps=len(test_df))
arima_test_rmse = calculate_rmse(test_df['Close'], y_test_arima_pred)

print("ARIMA Training RMSE:", arima_train_rmse)
print("ARIMA Test RMSE:", arima_test_rmse)

**SARIMA Predictions on Training Data and on Test Data**

In [None]:
y_train_sarima_pred = sarima_model_fit.predict(start=0, end=len(train_df)-1)
sarima_train_rmse = calculate_rmse(train_df['Close'], y_train_sarima_pred)

y_test_sarima_pred = sarima_model_fit.forecast(steps=len(test_df))
sarima_test_rmse = calculate_rmse(test_df['Close'], y_test_sarima_pred)

print("SARIMA Training RMSE:", sarima_train_rmse)
print("SARIMA Test RMSE:", sarima_test_rmse)
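The RNN and LSTM test RMSE cover only the test months after the first 6 (those months are used as input windows), while the ARIMA and SARIMA test RMSE cover every test month. A sketch of one way to score the statistical models on the same final months, assuming `actual` and `forecast` are aligned sequences spanning the whole test period; the helper name is hypothetical. This aligns the months being scored but not the forecast horizon: ARIMA and SARIMA still forecast several steps ahead, while the neural models predict one step ahead from true inputs.

```python
import numpy as np

def rmse_last_n(actual, forecast, n):
    """RMSE over the final n points of two aligned sequences (n >= 1)."""
    actual = np.asarray(actual, dtype=float)[-n:]
    forecast = np.asarray(forecast, dtype=float)[-n:]
    return float(np.sqrt(np.mean((actual - forecast) ** 2)))

# In the notebook this could be called as, e.g.:
# rmse_last_n(test_df['Close'], y_test_arima_pred, len(y_test))

# Self-check on placeholder numbers: only the last two points are scored
actual = np.array([0.9, 1.0, 1.1, 1.2])
forecast = np.array([0.9, 1.0, 1.0, 1.0])
print(rmse_last_n(actual, forecast, 2))   # sqrt((0.1**2 + 0.2**2) / 2)
```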

**Summarizing results**

In [None]:
print("\nSummary of RMSE:")
print("RNN Training RMSE:", rnn_train_rmse, "RNN Test RMSE:", rnn_test_rmse)
print("LSTM Training RMSE:", lstm_train_rmse, "LSTM Test RMSE:", lstm_test_rmse)
print("ARIMA Training RMSE:", arima_train_rmse, "ARIMA Test RMSE:", arima_test_rmse)
print("SARIMA Training RMSE:", sarima_train_rmse, "SARIMA Test RMSE:", sarima_test_rmse)
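The four print lines can also be collected into a small table sorted by test RMSE, which makes the ranking explicit. A sketch, with `rmse_table` as a hypothetical helper; the commented usage refers to the notebook's own variables, and the numbers in the self-check are placeholders, not results.

```python
import pandas as pd

def rmse_table(results):
    """results: {model_name: (train_rmse, test_rmse)} -> DataFrame sorted by test RMSE."""
    table = pd.DataFrame.from_dict(results, orient="index", columns=["Train RMSE", "Test RMSE"])
    return table.sort_values("Test RMSE")

# Usage with the notebook's variables:
# rmse_table({
#     "RNN": (rnn_train_rmse, rnn_test_rmse),
#     "LSTM": (lstm_train_rmse, lstm_test_rmse),
#     "ARIMA": (arima_train_rmse, arima_test_rmse),
#     "SARIMA": (sarima_train_rmse, sarima_test_rmse),
# })

demo = rmse_table({"A": (0.1, 0.3), "B": (0.2, 0.1)})   # placeholder numbers
print(demo.index[0])   # B, the entry with the lowest test RMSE
```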

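# **Best Model: RNN**

**Performance** (RMSE on the min-max scaled closing price):

1. Training RMSE: 0.0659
2. Test RMSE: 0.0955

**Key Points:**

**Generalization:** The RNN model has the lowest test RMSE of the four models in this run, suggesting that it generalizes best to unseen data. The comparison is not strictly like-for-like: the RNN and LSTM are scored on one-step-ahead predictions for the test months after the first 6, while ARIMA and SARIMA are scored on a single multi-step forecast over all test months.

**Balanced Performance:** The gap between the RNN's training RMSE (0.0659) and test RMSE (0.0955) is modest, suggesting that it does not heavily overfit the training data.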