<a href="https://colab.research.google.com/github/ai-ivision/Meta-Stock-Price-Data-Prediction/blob/main/Meta_Stock_Price_Data_Prediction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Meta Stock Price Data Prediction**


### Meta Platforms Stock Prices (Oct 28, 2021 - May 7, 2024)
This dataset contains daily stock price data for Meta Platforms (formerly Facebook) from October 28, 2021, to May 7, 2024. The data was collected from Yahoo Finance.

### Columns:
- Date: Date (DD/MM/YYYY)
- Open: The opening price of the stock on that day
- High: The highest price of the stock on that day
- Low: The lowest price of the stock on that day
- Close: The closing price of the stock on that day
- Adj Close: Adjusted closing price of the stock on that day (adjusted for stock splits and dividends)
- Volume: Number of shares traded on that day


### Acknowledgements

We would like to extend our deepest gratitude to all the contributors and curators of the dataset that made this project possible. The availability of high-quality datasets is crucial for the development and benchmarking of machine learning models, and your hard work is greatly appreciated.

### Source

This dataset was sourced from [Kaggle](https://www.kaggle.com). We acknowledge and thank the original authors for making this dataset publicly available. The dataset can be accessed at the following link:

[Kaggle Dataset Link](https://www.kaggle.com/datasets/saadatkhalid/meta-platforms-stock-price-data/data)

### Ignore Warnings

In [35]:
import warnings

warnings.filterwarnings('ignore')

### Downloading Dataset Directly From Kaggle

In [36]:
!pip install kaggle



In [37]:
!mkdir -p ~/.kaggle


In [38]:
!cp kaggle.json ~/.kaggle

In [39]:
!chmod 600 ~/.kaggle/kaggle.json

In [40]:
!kaggle datasets download -d saadatkhalid/meta-platforms-stock-price-data

Dataset URL: https://www.kaggle.com/datasets/saadatkhalid/meta-platforms-stock-price-data
License(s): CC0-1.0
meta-platforms-stock-price-data.zip: Skipping, found more recently modified local copy (use --force to force download)


In [41]:
!unzip -o meta-platforms-stock-price-data.zip

Archive:  meta-platforms-stock-price-data.zip

### Import Libraries for Data Loading, Reading and Visualization

In [42]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objs as go
from plotly.subplots import make_subplots

### Loading and Studying the Dataset

In [43]:
df = pd.read_csv('META.csv')

In [44]:
df.head()

| | Date | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|---|
| 0 | 28/10/2021 | 312.98999 | 325.519989 | 308.109985 | 316.920013 | 316.584106 | 50806800 |
| 1 | 29/10/2021 | 320.190002 | 326.0 | 319.600006 | 323.570007 | 323.227051 | 37059400 |
| 2 | 01/11/2021 | 326.040009 | 333.450012 | 326.0 | 329.980011 | 329.63028 | 31518900 |
| 3 | 02/11/2021 | 331.380005 | 334.790009 | 323.799988 | 328.079987 | 327.732269 | 28353000 |
| 4 | 03/11/2021 | 327.48999 | 332.149994 | 323.200012 | 331.619995 | 331.268524 | 20786500 |


In [45]:
df.shape

(633, 7)

In [46]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 633 entries, 0 to 632
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Date       633 non-null    object 
 1   Open       633 non-null    float64
 2   High       633 non-null    float64
 3   Low        633 non-null    float64
 4   Close      633 non-null    float64
 5   Adj Close  633 non-null    float64
 6   Volume     633 non-null    int64  
dtypes: float64(5), int64(1), object(1)
memory usage: 34.7+ KB


In [47]:
df.describe().T

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Open | 633.0 | 261.0611 | 107.4414 | 90.08 | 172.75 | 238.45 | 326.2 | 529.28 |
| High | 633.0 | 265.2208 | 108.4067 | 90.46 | 176.49 | 241.69 | 332.33 | 531.49 |
| Low | 633.0 | 257.388 | 106.2366 | 88.09 | 171.43 | 235.52 | 322.72 | 518.89 |
| Close | 633.0 | 261.2939 | 107.3565 | 88.91 | 173.42 | 238.56 | 327.64 | 527.34 |
| Adj Close | 633.0 | 261.0604 | 107.3358 | 88.81576 | 173.2362 | 238.3071 | 327.2928 | 527.34 |
| Volume | 633.0 | 27713920.0 | 19004350.0 | 5467500.0 | 17724300.0 | 23066000.0 | 31423900.0 | 232316600.0 |


In [48]:
df.isnull().sum().sum()

0

In [49]:
df.isnull().sum()

Date         0
Open         0
High         0
Low          0
Close        0
Adj Close    0
Volume       0
dtype: int64

In [50]:
df.dtypes

Date          object
Open         float64
High         float64
Low          float64
Close        float64
Adj Close    float64
Volume         int64
dtype: object

### Data Visualization with *plotly*

In [51]:
fig = make_subplots(rows=1, cols=1)

fig.add_trace(
    go.Scatter(x=df['Date'], y=df['Close'], mode='lines',
               name='Closing Price', line=dict(color='orange'))
)

fig.update_layout(
    title='Stock Price Over Time',
    xaxis_title='Date',
    yaxis_title='Closing Price',
    showlegend=True
)

fig.show()

In [52]:
fig = make_subplots(rows=1, cols=1)

fig.add_trace(
    go.Scatter(x=df['Date'], y=df['Volume'], mode='lines',
               name='Volume', line=dict(color='red'))
)

fig.update_layout(
    title='Volume Over Time',
    xaxis_title='Date',
    yaxis_title='Volume',
    showlegend=True
)

fig.show()

In [53]:
figure = go.Figure(data=[go.Candlestick(
    x=df['Date'],
    open=df['Open'],
    high=df['High'],
    low=df['Low'],
    close=df['Close']
)])

figure.update_layout(
    title="Price Analysis Over Time",
)

figure.show()

### Feature Engineering

In [54]:
# Convert the Date column (DD/MM/YYYY strings) to pandas datetime
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')

In [55]:
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month
df['Day'] = df['Date'].dt.day

df.drop(columns=['Date'], inplace=True)

In [56]:
df.head()

| | Open | High | Low | Close | Adj Close | Volume | Year | Month | Day |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 312.98999 | 325.519989 | 308.109985 | 316.920013 | 316.584106 | 50806800 | 2021 | 10 | 28 |
| 1 | 320.190002 | 326.0 | 319.600006 | 323.570007 | 323.227051 | 37059400 | 2021 | 10 | 29 |
| 2 | 326.040009 | 333.450012 | 326.0 | 329.980011 | 329.63028 | 31518900 | 2021 | 11 | 1 |
| 3 | 331.380005 | 334.790009 | 323.799988 | 328.079987 | 327.732269 | 28353000 | 2021 | 11 | 2 |
| 4 | 327.48999 | 332.149994 | 323.200012 | 331.619995 | 331.268524 | 20786500 | 2021 | 11 | 3 |


In [57]:
df.shape

(633, 9)
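Because the `Date` column is stored as DD/MM/YYYY, the parsing format matters for any day numbered 12 or below. The minimal sketch below uses only the standard library to show how the same string yields two different dates under the two conventions. The helper name `parse_day_first` is hypothetical and not part of the notebook; pandas accepts the same format directives through the `format` argument of `pd.to_datetime`.

```python
from datetime import datetime


def parse_day_first(text):
    """Parse a DD/MM/YYYY string, the layout used by the Date column."""
    return datetime.strptime(text, "%d/%m/%Y")


sample = "01/11/2021"  # third row of the dataset: 1 November 2021

day_first = parse_day_first(sample)
month_first = datetime.strptime(sample, "%m/%d/%Y")  # US-style reading

print(day_first.date())    # 2021-11-01
print(month_first.date())  # 2021-01-11
```

Stating the format explicitly removes the ambiguity regardless of how a given library version infers it.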

In [59]:
yearly_stock_price = df.groupby('Year')[['Open', 'Close']].mean().reset_index()

fig = px.bar(yearly_stock_price, x='Year', y=['Open', 'Close'],
             barmode='group', title="Yearly Comparison of Price Open and Close")

fig.show()

### Importing Libraries for Data Splitting and Model Building

In [61]:
!pip install tensorflow



In [65]:
from sklearn.model_selection import train_test_split

from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Dropout
from tensorflow.keras.optimizers import Adam

from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score



> Splitting the data



In [62]:
# NOTE: X keeps 'Adj Close' (nearly identical to the target) and the same-day Open/High/Low
X = df.drop('Close', axis=1)
y = df['Close']

In [66]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [67]:
print(f"The Shape of X_train  : {X_train.shape}")
print(f"The Shape of X_test  : {X_test.shape}")
print(f"The Shape of y_train  : {y_train.shape}")
print(f"The Shape of y_test  : {y_test.shape}")

The Shape of X_train  : (506, 8)
The Shape of X_test  : (127, 8)
The Shape of y_train  : (506,)
The Shape of y_test  : (127,)
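`train_test_split` shuffles rows by default, so with daily prices the training set contains days that fall after some of the test days. A common alternative for time-ordered data is to hold out the most recent rows instead. The sketch below illustrates that alternative and is not what this notebook does; `chronological_split` is a hypothetical helper that assumes its input is already sorted by date.

```python
def chronological_split(rows, test_fraction=0.2):
    """Split a date-sorted sequence into (train, test) without shuffling.

    The last `test_fraction` of the rows becomes the test set, so every
    training row precedes every test row in time.
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]


# Stand-in for the 633 daily rows, represented by their positions in time.
days = list(range(633))
train, test = chronological_split(days, test_fraction=0.2)

print(len(train), len(test))   # 506 127
print(max(train) < min(test))  # True
```

For a pandas DataFrame sorted by date, `df.iloc[:cut]` and `df.iloc[cut:]` give the same kind of split.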


In [113]:
results = []



> Neural Network



In [68]:
model = Sequential(
    [
        Dense(units=16, input_dim=8, activation='relu'),
        Dense(units=24, activation='relu'),
        Dropout(0.5), # To avoid Overfitting
        Dense(units=20, activation='relu'),
        Dense(units=24, activation='relu'),
        Dense(units=1, activation='sigmoid')  # NOTE: sigmoid limits predictions to (0, 1); a linear output suits price regression
    ]
)
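The activation of the final layer determines which values the network can output. A sigmoid maps any input into the interval from 0 to 1, so a model ending in it cannot produce a price in the hundreds. The standard-library sketch below shows the smallest error such an output could reach on this target. The `sigmoid` helper is written here for illustration only, and 88.91 is the minimum `Close` reported by `df.describe()`.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function, always within [0, 1]."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# However large the pre-activation, the output never exceeds 1.
outputs = [sigmoid(z) for z in (-50.0, 0.0, 5.0, 50.0, 500.0)]
print(max(outputs))

# Lowest closing price in the dataset, taken from df.describe().
min_close = 88.91

# Even a fully saturated sigmoid misses the smallest target by this much.
best_case_error = min_close - max(outputs)
print(round(best_case_error, 2))
```

For this reason a regression head typically uses a linear output, which in Keras means `activation='linear'` or no activation at all.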

In [69]:
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 16)                144       
                                                                 
 dense_1 (Dense)             (None, 24)                408       
                                                                 
 dropout (Dropout)           (None, 24)                0         
                                                                 
 dense_2 (Dense)             (None, 20)                500       
                                                                 
 dense_3 (Dense)             (None, 24)                504       
                                                                 
 dense_4 (Dense)             (None, 1)                 25        
                                                                 
Total params: 1581 (6.18 KB)
Trainable params: 1581 (6.1

In [83]:
model.compile(
    optimizer=Adam(learning_rate=0.001),
    loss='mean_squared_error',
)

model.fit(X_train, y_train, batch_size=15, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.src.callbacks.History at 0x7bb677a833a0>
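The network above is trained on raw feature values, where `Volume` is in the tens of millions and prices are in the hundreds. Standardizing each column to zero mean and unit variance is a common preprocessing step for neural networks. The sketch below illustrates the arithmetic with the standard library on the first five `Volume` values from `df.head()`. `standardize` is a hypothetical helper, and in practice the mean and standard deviation would be computed on the training set only.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant column")
    return [(v - mu) / sigma for v in values]


# First five Volume values shown by df.head().
volume = [50806800, 37059400, 31518900, 28353000, 20786500]

scaled = standardize(volume)
print([round(s, 3) for s in scaled])
```

After scaling, every feature contributes on a comparable numeric range, which tends to make gradient-based training better behaved.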

In [114]:
y_pred = model.predict(X_test)



In [115]:
dnn_mse = mean_squared_error(y_test, y_pred)
dnn_mae = mean_absolute_error(y_test, y_pred)
dnn_rmse = np.sqrt(dnn_mse)
dnn_r2 = r2_score(y_test, y_pred)


print(f'Mean Squared Error on Test Set: {dnn_mse:.4f}')
print(f'Mean Absolute Error on Test Set: {dnn_mae:.4f}')
print(f'Root Mean Squared Error on Test Set: {dnn_rmse:.4f}')
print(f"R2 Score on Test Set :  {dnn_r2:.4f}")

Mean Squared Error on Test Set: 70574.2433
Mean Absolute Error on Test Set: 242.3957
Root Mean Squared Error on Test Set: 265.6581
R2 Score on Test Set :  -4.9715


In [116]:
results.append(
    {
        'Model': 'DNN',
        'MSE': dnn_mse,
        'MAE': dnn_mae,
        'RMSE': dnn_rmse,
        'R2': dnn_r2
    }
)



> LinearRegression



In [74]:
lr = LinearRegression()

In [87]:
lr.fit(X_train, y_train)

In [117]:
y_pred = lr.predict(X_test)

In [118]:
lr_mse = mean_squared_error(y_test, y_pred)
lr_mae = mean_absolute_error(y_test, y_pred)
lr_rmse = np.sqrt(lr_mse)
lr_r2 = r2_score(y_test, y_pred)


print(f'Mean Squared Error on Test Set: {lr_mse:.4f}')
print(f'Mean Absolute Error on Test Set: {lr_mae:.4f}')
print(f'Root Mean Squared Error on Test Set: {lr_rmse:.4f}')
print(f"R2 Score on Test Set :  {lr_r2:.4f}")

Mean Squared Error on Test Set: 0.0119
Mean Absolute Error on Test Set: 0.0766
Root Mean Squared Error on Test Set: 0.1092
R2 Score on Test Set :  1.0000


In [119]:
results.append(
    {
        'Model': 'LinearRegression',
        'MSE': lr_mse,
        'MAE': lr_mae,
        'RMSE': lr_rmse,
        'R2': lr_r2
    }
)



> RandomForest



In [78]:
rf = RandomForestRegressor()

In [94]:
rf.fit(X_train, y_train)

In [120]:
y_pred = rf.predict(X_test)

In [121]:
rf_mse = mean_squared_error(y_test, y_pred)
rf_mae = mean_absolute_error(y_test, y_pred)
rf_rmse = np.sqrt(rf_mse)
rf_r2 = r2_score(y_test, y_pred)


print(f'Mean Squared Error on Test Set: {rf_mse:.4f}')
print(f'Mean Absolute Error on Test Set: {rf_mae:.4f}')
print(f'Root Mean Squared Error on Test Set: {rf_rmse:.4f}')
print(f"R2 Score on Test Set :  {rf_r2:.4f}")

Mean Squared Error on Test Set: 2.2395
Mean Absolute Error on Test Set: 0.8809
Root Mean Squared Error on Test Set: 1.4965
R2 Score on Test Set :  0.9998


In [122]:
results.append(
    {
        'Model': 'RandomForestRegressor',
        'MSE': rf_mse,
        'MAE': rf_mae,
        'RMSE': rf_rmse,
        'R2': rf_r2
    }
)

### Model Performance Metrics

In [123]:
results

[{'Model': 'DNN',
  'MSE': 70574.24331227689,
  'MAE': 242.395748503937,
  'RMSE': 265.6581324038037,
  'R2': -4.971483526844024},
 {'Model': 'LinearRegression',
  'MSE': 0.011925979060590942,
  'MAE': 0.07657978387433403,
  'RMSE': 0.10920613105769722,
  'R2': 0.9999989909096554},
 {'Model': 'RandomForestRegressor',
  'MSE': 2.2394693794043192,
  'MAE': 0.8808980479527937,
  'RMSE': 1.4964856763111096,
  'R2': 0.999810512250924}]

In [124]:
results_df = pd.DataFrame(results)

In [125]:
results_df

| | Model | MSE | MAE | RMSE | R2 |
|---|---|---|---|---|---|
| 0 | DNN | 70574.243312 | 242.395749 | 265.658132 | -4.971484 |
| 1 | LinearRegression | 0.011926 | 0.07658 | 0.109206 | 0.999999 |
| 2 | RandomForestRegressor | 2.239469 | 0.880898 | 1.496486 | 0.999811 |







### Summary

1. Deep Neural Network (DNN)
- Mean Squared Error (MSE): 70574.243
- Mean Absolute Error (MAE): 242.396
- Root Mean Squared Error (RMSE): 265.658
- R-squared (R²): -4.971

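The DNN model performed far worse than the other models on every metric. The negative R² value means its predictions are worse than a horizontal line through the mean of the target variable. A likely cause is the architecture itself: the output layer uses a sigmoid activation, which confines predictions to the range 0 to 1, while closing prices range from 88.91 to 527.34, and the input features were not scaled before training.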

2. Linear Regression
- Mean Squared Error (MSE): 0.0119
- Mean Absolute Error (MAE): 0.0766
- Root Mean Squared Error (RMSE): 0.1092
- R-squared (R²): 0.999999

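The Linear Regression model achieved the lowest MSE, MAE, and RMSE of the three models, and its R² value is almost 1, indicating that it explains nearly all the variability of the target variable on the test set. These figures should be read with care: the feature matrix still contains `Adj Close`, which is nearly identical to `Close`, together with the same day's `Open`, `High`, and `Low`, so the model is largely reconstructing the target from closely related inputs.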

3. Random Forest Regressor
- Mean Squared Error (MSE): 2.2395
- Mean Absolute Error (MAE): 0.8809
- Root Mean Squared Error (RMSE): 1.4965
- R-squared (R²): 0.999811

The Random Forest Regressor also performed well, with low MSE, MAE, and RMSE values. The R² value is very close to 1, suggesting that the model captures most of the variability in the data, though not as perfectly as the Linear Regression model.
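R² compares a model's squared error with that of always predicting the mean: R² = 1 − SS_res / SS_tot. It is negative whenever the model does worse than the mean and approaches 1 when predictions track the target closely. As a sanity check on the near-perfect scores, the standard-library sketch below uses `Adj Close` itself as the prediction for `Close` on the five rows shown by `df.head()`, with no model fitted at all. `r2` is a hypothetical helper that mirrors the usual definition.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Close and Adj Close for the first five rows shown by df.head().
close = [316.920013, 323.570007, 329.980011, 328.079987, 331.619995]
adj_close = [316.584106, 323.227051, 329.630280, 327.732269, 331.268524]

# Using an input column directly as the "prediction" already scores near 1.
print(round(r2(close, adj_close), 4))

# A constant prediction far below every target gives a negative score.
print(r2(close, [1.0] * len(close)) < 0)
```

Since a feature this close to the target remains in `X`, a comparison with `Adj Close` removed would likely say more about how well each model actually predicts.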

### Conclusion

Among the models evaluated, the Linear Regression model achieved the best performance metrics on this dataset. The Random Forest Regressor also performed well, while the DNN model did not perform adequately in this context. Further tuning of the DNN model, or a different architecture, might improve its performance.