## **Problem Statement**
Saudi Aramco is one of the world's largest publicly traded companies, and its stock price is influenced by market trends, oil prices, economic policies, and investor sentiment. Accurate price predictions help investors, financial analysts, and other stakeholders make informed decisions. However, stock prices are highly volatile, which makes accurate prediction difficult.

This project aims to develop a machine learning model that forecasts Saudi Aramco’s closing stock price from historical price data and technical indicators such as lagged prices, rolling statistics, RSI, MACD, and Bollinger Bands. The goal is to identify patterns and trends that help predict future prices and support investment decisions.

## **Project Goals & Objectives**
**1 - Analyze Stock Price Trends**
- Examine historical Saudi Aramco stock data to identify key patterns.
- Study the impact of trading volume, market trends, and technical indicators on stock prices.

**2 - Build a Predictive Regression Model**
- Use machine learning and statistical regression techniques to forecast future stock prices.
- Train the model on historical data to improve prediction accuracy.

**3 - Support Data-Driven Investment Decisions**
- Provide investors and analysts with insights to make informed decisions.
- Reduce uncertainty by predicting potential stock price movements.

**4 - Ensure Model Accuracy & Reliability**
- Validate the model using real-world data and key performance metrics.
- Measure accuracy using RMSE, MAE, and R² scores.

**5 - Make Predictions Accessible & Understandable**
- Visualize predictions using charts, graphs, and trend reports.
- Optionally, develop a simple dashboard for interactive forecasting.
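
The three metrics named in objective 4 can be made concrete with a small worked example. The sketch below computes RMSE, MAE, and R² directly with NumPy; the helper name `regression_metrics` and the price values are hypothetical and chosen only for illustration. The results should agree with `mean_squared_error`, `mean_absolute_error`, and `r2_score` from `sklearn.metrics`, which the notebook imports.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE and R^2 for two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred

    rmse = float(np.sqrt(np.mean(errors ** 2)))  # penalizes large errors more
    mae = float(np.mean(np.abs(errors)))         # average absolute error
    ss_res = float(np.sum(errors ** 2))          # residual sum of squares
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                   # share of variance explained
    return {"rmse": rmse, "mae": mae, "r2": r2}


# Made-up closing prices and predictions; NOT taken from the Aramco dataset.
actual = [26.0, 27.0, 28.0, 29.0]
predicted = [26.5, 26.5, 28.5, 28.5]

metrics = regression_metrics(actual, predicted)
print(metrics)
```

RMSE and MAE are expressed in the same unit as the closing price, so they can be read as a typical error in price terms, while R² is unitless.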

In [1]:
# Numerical computing, data handling, and plotting
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Regression models
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor

# Data splitting, cross-validation, and hyperparameter search
from sklearn.model_selection import train_test_split, cross_val_score, RandomizedSearchCV, KFold

# Evaluation metrics
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error

# Preprocessing and pipelines
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

# Model persistence
import pickle as pkl

# Suppress all warnings to keep the notebook output readable
import warnings
warnings.filterwarnings('ignore')

#### Loading the dataset

In [4]:
df = pd.read_csv('aramco_stock_price_dataset.csv', encoding='utf-8')

#### Understanding the data

In [5]:
df.head(3)

| | Date | Open | High | Low | Close | Volume | Dividends | Stock Splits | Close_diff | Lag_Close | ... | BB_Middle_Band | BB_Upper_Band | BB_Lower_Band | Change_Close | Change_Volume | Weekday | Month | Year | Quarter | Volume_Normalized |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019-12-11 00:00:00+03:00 | 25.485229 | 25.485229 | 25.485229 | 25.485229 | 38289394 | 0.0 | 0.0 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | 2 | 12 | 2019 | 4 | 1.580604 |
| 1 | 2019-12-12 00:00:00+03:00 | 28.019275 | 28.019275 | 26.064442 | 26.643652 | 505692621 | 0.0 | 0.0 | 1.158422 | 25.485229 | ... | NaN | NaN | NaN | 1.158422 | 467403227.0 | 3 | 12 | 2019 | 4 | 27.146985 |
| 2 | 2019-12-15 00:00:00+03:00 | 26.860858 | 27.150462 | 26.643654 | 27.07806 | 98349281 | 0.0 | 0.0 | 0.434408 | 26.643652 | ... | NaN | NaN | NaN | 0.434408 | -407343340.0 | 6 | 12 | 2019 | 4 | 4.865806 |


In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1095 entries, 0 to 1094
Data columns (total 28 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Date               1095 non-null   object 
 1   Open               1095 non-null   float64
 2   High               1095 non-null   float64
 3   Low                1095 non-null   float64
 4   Close              1095 non-null   float64
 5   Volume             1095 non-null   int64  
 6   Dividends          1095 non-null   float64
 7   Stock Splits       1095 non-null   float64
 8   Close_diff         1094 non-null   float64
 9   Lag_Close          1094 non-null   float64
 10  Lag_High           1094 non-null   float64
 11  Lag_Low            1094 non-null   float64
 12  Rolling_Mean_7     1089 non-null   float64
 13  Rolling_Std_7      1089 non-null   float64
 14  Rolling_Mean_30    1066 non-null   float64
 15  Rolling_Std_30     1066 non-null   float64
 16  RSI                1082 non-null   float64

In [8]:
df.describe().T

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Open | 1095.0 | 28.85776 | 3.472621 | 19.54834 | 26.54623 | 28.43185 | 31.84231 | 36.4111 |
| High | 1095.0 | 29.03354 | 3.503517 | 20.48955 | 26.65929 | 28.56368 | 32.03948 | 37.13932 |
| Low | 1095.0 | 28.67893 | 3.421617 | 19.54834 | 26.43623 | 28.21213 | 31.69444 | 36.15408 |
| Close | 1095.0 | 28.86365 | 3.458323 | 20.12754 | 26.54817 | 28.47579 | 31.84231 | 36.4111 |
| Volume | 1095.0 | 9392873.0 | 18290300.0 | 0.0 | 3777274.0 | 6202220.0 | 11086380.0 | 505692600.0 |
| Dividends | 1095.0 | 0.00369511 | 0.0315886 | 0.0 | 0.0 | 0.0 | 0.0 | 0.3024 |
| Stock Splits | 1095.0 | 0.004018265 | 0.06639255 | 0.0 | 0.0 | 0.0 | 0.0 | 1.1 |
| Close_diff | 1094.0 | 0.004081144 | 0.319654 | -2.172035 | -0.1134429 | 0.0 | 0.1157408 | 2.027229 |
| Lag_Close | 1094.0 | 28.86265 | 3.459748 | 20.12754 | 26.54735 | 28.47579 | 31.84231 | 36.4111 |
| Lag_High | 1094.0 | 29.03257 | 3.504971 | 20.48955 | 26.65837 | 28.55394 | 32.03948 | 37.13932 |


In [9]:
df.isnull().sum()

Date                  0
Open                  0
High                  0
Low                   0
Close                 0
Volume                0
Dividends             0
Stock Splits          0
Close_diff            1
Lag_Close             1
Lag_High              1
Lag_Low               1
Rolling_Mean_7        6
Rolling_Std_7         6
Rolling_Mean_30      29
Rolling_Std_30       29
RSI                  13
MACD                  0
BB_Middle_Band       19
BB_Upper_Band        19
BB_Lower_Band        19
Change_Close          1
Change_Volume         1
Weekday               0
Month                 0
Year                  0
Quarter               0
Volume_Normalized     0
dtype: int64
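
The missing-value counts above look like the warm-up rows that lag and rolling-window features typically leave behind: one missing value for a one-step lag, 6 for a 7-row window, 29 for a 30-row window (the 19 missing Bollinger Band values would likewise fit a 20-row window). How the dataset's columns were actually built is not shown here, so the sketch below is an assumption: it reproduces the same pattern on a synthetic price series using `shift` and `rolling`, then drops the incomplete rows.

```python
import numpy as np
import pandas as pd

# Synthetic closing prices (random walk); NOT the Aramco data.
rng = np.random.default_rng(0)
close = pd.Series(28 + rng.normal(0, 0.3, size=100).cumsum(), name="Close")

# Assumed feature construction: a one-step lag and two rolling means.
features = pd.DataFrame({
    "Close": close,
    "Lag_Close": close.shift(1),                        # first row has no predecessor
    "Rolling_Mean_7": close.rolling(window=7).mean(),   # first 6 rows incomplete
    "Rolling_Mean_30": close.rolling(window=30).mean(), # first 29 rows incomplete
})

null_counts = features.isnull().sum()
print(null_counts)

# One simple option: drop every row that still contains a warm-up NaN.
clean = features.dropna().reset_index(drop=True)
print(clean.shape)
```

Dropping the warm-up rows loses as many rows as the longest window minus one, which is a small share of a series this long; imputing them instead would put values into the features that the indicator never produced.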

In [10]:
df.shape

(1095, 28)

In [11]:
df.dtypes

Date                  object
Open                 float64
High                 float64
Low                  float64
Close                float64
Volume                 int64
Dividends            float64
Stock Splits         float64
Close_diff           float64
Lag_Close            float64
Lag_High             float64
Lag_Low              float64
Rolling_Mean_7       float64
Rolling_Std_7        float64
Rolling_Mean_30      float64
Rolling_Std_30       float64
RSI                  float64
MACD                 float64
BB_Middle_Band       float64
BB_Upper_Band        float64
BB_Lower_Band        float64
Change_Close         float64
Change_Volume        float64
Weekday                int64
Month                  int64
Year                   int64
Quarter                int64
Volume_Normalized    float64
dtype: object
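
The `Date` column is stored as `object` (plain strings), so it cannot be used for time-based indexing or sorting until it is parsed. The sketch below shows one possible conversion on the three date strings from the `df.head(3)` output; the `sample` frame is a stand-in for `df`, and dropping the `+03:00` offset to keep local wall-clock dates is an assumption about what the later analysis needs.

```python
import pandas as pd

# First three Date strings from the df.head(3) output; stand-in for df.
sample = pd.DataFrame({
    "Date": [
        "2019-12-11 00:00:00+03:00",
        "2019-12-12 00:00:00+03:00",
        "2019-12-15 00:00:00+03:00",
    ]
})

# Parse the strings (all share the same +03:00 offset), then drop the
# offset so the column holds naive timestamps in local time.
parsed = pd.to_datetime(sample["Date"]).dt.tz_localize(None)

sample["Date"] = parsed
sample["Weekday"] = parsed.dt.dayofweek  # Monday=0 ... Sunday=6
sample["Month"] = parsed.dt.month
sample["Quarter"] = parsed.dt.quarter

print(sample.dtypes)
print(sample)
```

The derived weekday values for these three rows are 2, 3, and 6, which matches the `Weekday` column in the dataset and suggests it uses the same Monday=0 convention.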