<a href="https://colab.research.google.com/github/ai-ivision/Apple-Stock-Price-Prediction/blob/main/Apple_Stock_Price_Prediction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# *Apple Stock Price*

Apple Inc., founded by Steve Jobs, Steve Wozniak, and Ronald Wayne in 1976, is a technology company known for products including the Macintosh, iPod, iPhone, iPad, and MacBook. Starting in a garage in Los Altos, California, Apple has grown into one of the world's most valuable companies and has reshaped the personal computing, smartphone, and digital music industries. Its emphasis on design, quality, and user experience has set it apart as a leader in technology.

This dataset provides a comprehensive record of Apple Inc.'s stock price changes over the past 44 years. It includes essential columns such as the date, opening price, highest price of the day, lowest price of the day, closing price, adjusted closing price, and trading volume.

This extensive data is invaluable for conducting historical analyses, forecasting future stock performance, and understanding long-term market trends related to Apple's stock.

## About Dataset

The dataset contains the following columns:
- **Date :** The date of the trading day.
- **Open :** The opening price of Apple stock on the given day.
- **High :** The highest price of Apple stock during the trading day.
- **Low :** The lowest price of Apple stock during the trading day.
- **Close :** The closing price of Apple stock on the given day.
- **Adj Close :** The closing price of Apple stock adjusted for corporate actions such as stock splits and dividends.
- **Volume :** The trading volume of Apple stock on the given day.

### Acknowledgements

We would like to extend our deepest gratitude to all the contributors and curators of the dataset that made this project possible. The availability of high-quality datasets is crucial for the development and benchmarking of machine learning models, and your hard work is greatly appreciated.

### Source

This dataset was sourced from [Kaggle](https://www.kaggle.com). We acknowledge and thank the original authors for making this dataset publicly available. The dataset can be accessed at the following link:

[Kaggle Dataset Link](https://www.kaggle.com/datasets/mayankanand2701/apple-stock-price-dataset/data)

### To Ignore Warnings

In [1]:
import warnings

warnings.filterwarnings('ignore')
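
`warnings.filterwarnings('ignore')` silences every warning for the rest of the session, including deprecation notices that may matter later. A minimal sketch of a narrower alternative using the standard library's `warnings.catch_warnings()` context manager; `noisy_step` is a hypothetical stand-in for any call that emits a warning:

```python
import warnings


def noisy_step():
    """Hypothetical stand-in for a library call that emits a warning."""
    warnings.warn("this API will change", FutureWarning)
    return 42


# Suppress only FutureWarning, and only inside this block.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=FutureWarning)
    result = noisy_step()

# Outside the block the previous filters are restored, so the warning is visible again.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    noisy_step()

print(result, len(caught))
```

This keeps unrelated warnings visible while still quieting a known, noisy one.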

### Downloading the Dataset directly from **[Kaggle](https://www.kaggle.com/)**

In [2]:
! mkdir ~/.kaggle

In [3]:
! cp kaggle.json ~/.kaggle

In [4]:
! chmod 600 ~/.kaggle/kaggle.json

In [6]:
! kaggle datasets download -d mayankanand2701/apple-stock-price-dataset

Dataset URL: https://www.kaggle.com/datasets/mayankanand2701/apple-stock-price-dataset
License(s): MIT
Downloading apple-stock-price-dataset.zip to /content
  0% 0.00/218k [00:00<?, ?B/s]
100% 218k/218k [00:00<00:00, 72.6MB/s]


In [7]:
! ls

apple-stock-price-dataset.zip  kaggle.json  sample_data


In [8]:
! unzip apple-stock-price-dataset.zip

Archive:  apple-stock-price-dataset.zip
  inflating: Apple Dataset.csv       
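
The archive extracts a file named `Apple Dataset.csv` (with a space), while the next listing and the later `pd.read_csv` call use `AppleDataset.csv`. The rename step is not shown in the notebook. A minimal sketch of one way to do it, using a hypothetical helper `rename_if_present`:

```python
from pathlib import Path


def rename_if_present(src: str, dst: str) -> bool:
    """Rename src to dst if src exists; return True when a rename happened."""
    src_path, dst_path = Path(src), Path(dst)
    if not src_path.exists():
        return False
    src_path.rename(dst_path)
    return True


# File names taken from the unzip output and the later read_csv call.
renamed = rename_if_present("Apple Dataset.csv", "AppleDataset.csv")
print("renamed" if renamed else "nothing to rename")
```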


In [12]:
!ls

AppleDataset.csv  apple-stock-price-dataset.zip  kaggle.json  sample_data


### Import The Libraries for Data Loading, Data Studying and Visualization

In [13]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objs as go
from plotly.subplots import make_subplots

### Load and Read the DataSet

In [14]:
apple_df = pd.read_csv('AppleDataset.csv')
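
`pd.read_csv` leaves the `Date` column as strings (dtype `object`) unless told otherwise. A minimal sketch of parsing it at load time with `parse_dates`; the inline CSV reuses the first three rows of the dataset so the example runs without the downloaded file, and `sample_df` is an illustrative name:

```python
import io

import pandas as pd

# First three rows of the dataset, inlined so the sketch runs standalone.
sample_csv = io.StringIO(
    "Date,Open,High,Low,Close,Adj Close,Volume\n"
    "1980-12-12,0.128348,0.128906,0.128348,0.128348,0.099058,469033600\n"
    "1980-12-15,0.12221,0.12221,0.121652,0.121652,0.09389,175884800\n"
    "1980-12-16,0.113281,0.113281,0.112723,0.112723,0.086999,105728000\n"
)

# parse_dates converts the named column to datetime64 while reading.
sample_df = pd.read_csv(sample_csv, parse_dates=["Date"])
print(sample_df.dtypes)
```

With the dates parsed up front, the `.dt` accessors can be used without a separate `pd.to_datetime` call.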

In [15]:
apple_df.head()

| | Date | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|---|
| 0 | 1980-12-12 | 0.128348 | 0.128906 | 0.128348 | 0.128348 | 0.099058 | 469033600 |
| 1 | 1980-12-15 | 0.12221 | 0.12221 | 0.121652 | 0.121652 | 0.09389 | 175884800 |
| 2 | 1980-12-16 | 0.113281 | 0.113281 | 0.112723 | 0.112723 | 0.086999 | 105728000 |
| 3 | 1980-12-17 | 0.115513 | 0.116071 | 0.115513 | 0.115513 | 0.089152 | 86441600 |
| 4 | 1980-12-18 | 0.118862 | 0.11942 | 0.118862 | 0.118862 | 0.091737 | 73449600 |


In [16]:
apple_df.shape

(10954, 7)

In [17]:
apple_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10954 entries, 0 to 10953
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Date       10954 non-null  object 
 1   Open       10954 non-null  float64
 2   High       10954 non-null  float64
 3   Low        10954 non-null  float64
 4   Close      10954 non-null  float64
 5   Adj Close  10954 non-null  float64
 6   Volume     10954 non-null  int64  
dtypes: float64(5), int64(1), object(1)
memory usage: 599.2+ KB


In [19]:
apple_df.describe().T

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Open | 10954.0 | 21.53088 | 44.45839 | 0.049665 | 0.296875 | 0.522321 | 19.7675 | 198.02 |
| High | 10954.0 | 21.7619 | 44.93186 | 0.049665 | 0.303571 | 0.533482 | 19.88857 | 199.62 |
| Low | 10954.0 | 21.30822 | 44.01358 | 0.049107 | 0.290179 | 0.513393 | 19.45777 | 197.0 |
| Close | 10954.0 | 21.54407 | 44.49248 | 0.049107 | 0.296875 | 0.524554 | 19.68268 | 198.11 |
| Adj Close | 10954.0 | 20.74751 | 44.03894 | 0.0379 | 0.241624 | 0.427333 | 17.04805 | 197.5895 |
| Volume | 10954.0 | 319079200.0 | 335744600.0 | 0.0 | 113993600.0 | 206712800.0 | 399344400.0 | 7421641000.0 |


In [20]:
apple_df.isnull().sum()

Date         0
Open         0
High         0
Low          0
Close        0
Adj Close    0
Volume       0
dtype: int64

In [22]:
apple_df.duplicated().sum()

0

In [29]:
for column in apple_df.columns:

  val_counts = apple_df[column].value_counts()
  filtered_val_counts = val_counts[val_counts > 20]

  if not filtered_val_counts.empty:
    print(f"Column: {column}")
    print(filtered_val_counts)
    print("\n")

Column: Open
Open
0.354911    38
0.401786    37
0.366071    36
0.357143    34
0.397321    34
0.372768    33
0.363839    31
0.383929    31
0.361607    31
0.348214    30
0.375000    30
0.341518    29
0.399554    29
0.352679    29
0.392857    29
0.368304    27
0.343750    27
0.359375    26
0.377232    26
0.299107    25
0.319196    25
0.325893    24
0.323661    24
0.328125    24
0.370536    24
0.350446    24
0.332589    24
0.386161    23
0.406250    22
0.316964    22
0.390625    21
0.114955    21
0.301339    21
0.410714    21
0.379464    21
0.068080    21
0.330357    21
0.345982    21
0.339286    21
Name: count, dtype: int64


Column: High
High
0.372768    35
0.375000    34
0.370536    32
0.363839    32
0.334821    31
0.383929    31
0.361607    31
0.404018    31
0.366071    30
0.401786    30
0.357143    30
0.330357    29
0.410714    28
0.352679    28
0.406250    27
0.379464    27
0.354911    27
0.386161    26
0.368304    26
0.381696    25
0.392857    23
0.399554    23
0.343750    23
0.3883

### Data Visualization

In [30]:
# Plot the closing price over time
fig = make_subplots(rows=1, cols=1)

fig.add_trace(
    go.Scatter(x=apple_df['Date'], y=apple_df['Close'], mode='lines',
               name='Closing Price', line=dict(color='orange'))
)

fig.update_layout(
    title="Apple Stock Price Over Time",
    xaxis_title="Date",
    yaxis_title="Closing Price",
    showlegend=True
)

fig.show()

In [32]:
# plotting stock volume over time
fig = make_subplots(rows=1, cols=1)

fig.add_trace(
    go.Scatter(x=apple_df['Date'], y=apple_df['Volume'], mode='lines',
               name='Volume', line=dict(color='brown'))
)

fig.update_layout(
    title="Apple Stock Volume Over Time",
    xaxis_title="Date",
    yaxis_title="Volume",
    showlegend=True
)

fig.show()

In [33]:
# plotting prices over time in a candlestick

figure = go.Figure(
    data = [go.Candlestick(
        x=apple_df['Date'],
        open=apple_df['Open'],
        high=apple_df['High'],
        low=apple_df['Low'],
        close=apple_df['Close']
    )]
)

figure.update_layout(
    title="Price CandleSticks Over Time"
)

figure.show()
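
With roughly 11,000 daily rows, a candlestick chart can be too dense to read at full zoom. One option is to aggregate to monthly candles first. A minimal sketch with pandas `resample`, on a small hypothetical frame (`daily`) whose values are made up for illustration:

```python
import pandas as pd

# Hypothetical daily prices spanning two months (illustrative values only).
daily = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2024-01-02", "2024-01-03", "2024-01-31", "2024-02-01", "2024-02-02"]
        ),
        "Open": [10.0, 11.0, 12.0, 13.0, 14.0],
        "High": [11.5, 12.5, 12.8, 14.2, 15.0],
        "Low": [9.5, 10.5, 11.0, 12.5, 13.5],
        "Close": [11.0, 12.0, 12.5, 14.0, 14.5],
        "Volume": [100, 200, 300, 400, 500],
    }
)

# Monthly candle: first open, highest high, lowest low, last close, summed volume.
monthly = (
    daily.set_index("Date")
    .resample("MS")  # "MS" labels each bin by the first day of the month
    .agg({"Open": "first", "High": "max", "Low": "min", "Close": "last", "Volume": "sum"})
)
print(monthly)
```

The aggregated frame can be passed to `go.Candlestick` in the same way as the daily one, using `monthly.index` for `x`. Note that this requires the `Date` column to be a datetime type before resampling.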

### Feature Engineering

In [34]:
apple_df['Date'] = pd.to_datetime(apple_df['Date'])

apple_df['Year'] = apple_df['Date'].dt.year
apple_df['Month'] = apple_df['Date'].dt.month
apple_df['Day'] = apple_df['Date'].dt.day

apple_df.drop(columns=['Date'], inplace=True)

In [35]:
apple_df.head()

| | Open | High | Low | Close | Adj Close | Volume | Year | Month | Day |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.128348 | 0.128906 | 0.128348 | 0.128348 | 0.099058 | 469033600 | 1980 | 12 | 12 |
| 1 | 0.12221 | 0.12221 | 0.121652 | 0.121652 | 0.09389 | 175884800 | 1980 | 12 | 15 |
| 2 | 0.113281 | 0.113281 | 0.112723 | 0.112723 | 0.086999 | 105728000 | 1980 | 12 | 16 |
| 3 | 0.115513 | 0.116071 | 0.115513 | 0.115513 | 0.089152 | 86441600 | 1980 | 12 | 17 |
| 4 | 0.118862 | 0.11942 | 0.118862 | 0.118862 | 0.091737 | 73449600 | 1980 | 12 | 18 |


In [36]:
apple_df.shape

(10954, 9)

### Importing Libraries for Data Splitting and Model Building

In [45]:
from sklearn.model_selection import train_test_split

from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Ridge
from sklearn.linear_model import Lasso
from sklearn.linear_model import ElasticNet
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.ensemble import RandomForestRegressor


from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score



### Splitting the Dataset



In [37]:
X = apple_df.drop('Close', axis=1)
y = apple_df['Close']
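
With `X` defined this way, the features include the same day's `Open`, `High`, `Low`, and `Adj Close`, so the models estimate a close from prices recorded on that same day. If the aim is to forecast, one common alternative is to predict the next trading day's close from the current row. A minimal sketch under that assumption, using a hypothetical `Next_Close` column and made-up values:

```python
import pandas as pd

# Hypothetical frame in chronological order (illustrative values only).
prices = pd.DataFrame(
    {
        "Open": [10.0, 11.0, 12.0, 13.0],
        "Close": [10.5, 11.5, 12.5, 13.5],
        "Volume": [100, 200, 300, 400],
    }
)

# shift(-1) pulls the following row's Close onto the current row.
prices["Next_Close"] = prices["Close"].shift(-1)

# The last row has no next day, so its target is NaN and is dropped.
prices = prices.dropna(subset=["Next_Close"])

X_next = prices.drop(columns=["Next_Close"])
y_next = prices["Next_Close"]
print(X_next.shape, y_next.tolist())
```

This variant keeps today's `Close` as a feature, which is legitimate because it is known before the next day's close.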

In [43]:
def FitAndEvaluateModels(X, y, models):
  """
    Splits the data, fits each model on the training split, and evaluates it on the test split.

    Parameters:
    X (pd.DataFrame or np.ndarray): The feature data.
    y (pd.Series or np.ndarray): The target data.
    models (list): A list of tuples where each tuple contains the model name and the model instance.

    Returns:
    dict: A dictionary with model names as keys and their evaluation metrics (MSE, MAE, RMSE, R2) as values.
  """

  results = {}

  # Hold out 20% of the rows as a test set (shuffled, fixed seed for reproducibility)
  X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

  print(f"The Shape of X_train  : {X_train.shape}")
  print(f"The Shape of X_test  : {X_test.shape}")
  print(f"The Shape of y_train  : {y_train.shape}")
  print(f"The Shape of y_test  : {y_test.shape}")

  for name, model in models:
    # Fit on the training split and predict on the held-out split
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    # Error metrics on the test split
    mse = mean_squared_error(y_test, y_pred)
    mae = mean_absolute_error(y_test, y_pred)
    rmse = np.sqrt(mse)
    r2 = r2_score(y_test, y_pred)

    results[name] = {
        'MSE': mse,
        'MAE': mae,
        'RMSE': rmse,
        'R2': r2
    }

  return results
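
The four metrics computed in the loop have simple closed forms. A minimal sketch that reproduces them with NumPy on made-up arrays, which can help when reading the results; `regression_metrics` is a hypothetical helper, not part of the notebook:

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Compute MSE, MAE, RMSE and R2 from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred

    mse = np.mean(errors ** 2)      # mean squared error
    mae = np.mean(np.abs(errors))   # mean absolute error
    rmse = np.sqrt(mse)             # same units as the target
    ss_res = np.sum(errors ** 2)    # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # spread around the mean
    r2 = 1.0 - ss_res / ss_tot      # negative when worse than predicting the mean

    return {"MSE": mse, "MAE": mae, "RMSE": rmse, "R2": r2}


# Illustrative values only.
demo_metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
print(demo_metrics)
```

Because R2 compares the model against a constant prediction of the mean, a negative value means the model did worse than that baseline.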


In [48]:
models = [
    ('LinearRegression', LinearRegression()),
    ('Ridge', Ridge(alpha=1.0)),
    ('Lasso', Lasso(alpha=1.0)),
    ('ElasticNet', ElasticNet(alpha=1.0, l1_ratio=0.5)),
    ('SVR', SVR(kernel='rbf', C=1.0, epsilon=0.1)),
    ('DecisionTreeRegressor', DecisionTreeRegressor()),
    ('GradientBoostingRegressor', GradientBoostingRegressor(n_estimators=100)),
    ('RandomForestRegressor', RandomForestRegressor(n_estimators=100))
]

In [49]:
# Call FitAndEvaluateModels with the models and data
res = FitAndEvaluateModels(X, y, models)

res_df = pd.DataFrame(res)

The Shape of X_train  : (8763, 8)
The Shape of X_test  : (2191, 8)
The Shape of y_train  : (8763,)
The Shape of y_test  : (2191,)
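
`train_test_split` shuffles rows by default, so the test set here is a random 20% of days drawn from the whole history. For time-ordered data, an alternative is to hold out the most recent rows instead. A minimal sketch of that split on hypothetical data; `chronological_split` is an illustrative helper and assumes the rows are already in date order:

```python
import numpy as np
import pandas as pd


def chronological_split(X, y, test_size=0.2):
    """Hold out the last `test_size` fraction of rows; assumes rows are in date order."""
    n_test = int(np.ceil(len(X) * test_size))
    split_at = len(X) - n_test
    return X.iloc[:split_at], X.iloc[split_at:], y.iloc[:split_at], y.iloc[split_at:]


# Hypothetical data: ten rows already sorted by date.
X_demo = pd.DataFrame({"feature": np.arange(10)})
y_demo = pd.Series(np.arange(10) * 2.0)

X_tr, X_te, y_tr, y_te = chronological_split(X_demo, y_demo)
print(len(X_tr), len(X_te))
```

Every test row then comes after every training row, which is closer to how a forecasting model would be used.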


In [51]:
res

{'LinearRegression': {'MSE': 0.07458333329353117,
  'MAE': 0.15556747141953112,
  'RMSE': 0.2730994933966945,
  'R2': 0.9999625546775321},
 'Ridge': {'MSE': 0.07459754915115534,
  'MAE': 0.15568252034437796,
  'RMSE': 0.27312551904052346,
  'R2': 0.999962547540316},
 'Lasso': {'MSE': 0.3886627662383276,
  'MAE': 0.2382482416093752,
  'RMSE': 0.6234282366386107,
  'R2': 0.9998048678978216},
 'ElasticNet': {'MSE': 0.31392471072527467,
  'MAE': 0.2100323038869795,
  'RMSE': 0.5602898452812389,
  'R2': 0.9998423909001564},
 'SVR': {'MSE': 2422.369519840327,
  'MAE': 20.91150928205772,
  'RMSE': 49.2175732827242,
  'R2': -0.21617530085019765},
 'DecisionTreeRegressor': {'MSE': 0.05778007939541265,
  'MAE': 0.08237487950707442,
  'RMSE': 0.24037487263733007,
  'R2': 0.9999709909223732},
 'GradientBoostingRegressor': {'MSE': 0.08809977600978687,
  'MAE': 0.13856329993915828,
  'RMSE': 0.2968160642717757,
  'R2': 0.9999557686097369},
 'RandomForestRegressor': {'MSE': 0.035726584537333156,
  'M

In [50]:
res_df

| | LinearRegression | Ridge | Lasso | ElasticNet | SVR | DecisionTreeRegressor | GradientBoostingRegressor | RandomForestRegressor |
|---|---|---|---|---|---|---|---|---|
| MSE | 0.074583 | 0.074598 | 0.388663 | 0.313925 | 2422.36952 | 0.05778 | 0.0881 | 0.035727 |
| MAE | 0.155567 | 0.155683 | 0.238248 | 0.210032 | 20.911509 | 0.082375 | 0.138563 | 0.067442 |
| RMSE | 0.273099 | 0.273126 | 0.623428 | 0.56029 | 49.217573 | 0.240375 | 0.296816 | 0.189015 |
| R2 | 0.999963 | 0.999963 | 0.999805 | 0.999842 | -0.216175 | 0.999971 | 0.999956 | 0.999982 |
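
The table is easier to compare when each model is a row and the rows are sorted by error. A minimal sketch that rebuilds the RMSE and R2 rows from the values above and ranks them; with the notebook's own frame, `res_df.T.sort_values("RMSE")` should give the same layout. The names `scores` and `ranking` are illustrative:

```python
import pandas as pd

# RMSE and R2 values copied from the results table above.
scores = pd.DataFrame(
    {
        "RMSE": [0.273099, 0.273126, 0.623428, 0.56029, 49.217573, 0.240375, 0.296816, 0.189015],
        "R2": [0.999963, 0.999963, 0.999805, 0.999842, -0.216175, 0.999971, 0.999956, 0.999982],
    },
    index=[
        "LinearRegression", "Ridge", "Lasso", "ElasticNet", "SVR",
        "DecisionTreeRegressor", "GradientBoostingRegressor", "RandomForestRegressor",
    ],
)

# One row per model, best (lowest RMSE) first.
ranking = scores.sort_values("RMSE")
print(ranking)
```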


### Results by Model

1. **Linear Regression**
- MSE: 0.0746
- MAE: 0.1556
- RMSE: 0.2731
- R2: 0.99996

*Linear Regression performed very well with an excellent R-squared score indicating a strong fit to the data and low error metrics.*

2. **Ridge Regression**
- MSE: 0.0746
- MAE: 0.1557
- RMSE: 0.2731
- R2: 0.99996

*Ridge Regression performed almost identically to Linear Regression, which suggests that the L2 penalty at alpha=1.0 had a negligible effect on the fit.*

3. **Lasso Regression**
- MSE: 0.3887
- MAE: 0.2382
- RMSE: 0.6234
- R2: 0.99980

*Lasso Regression had higher error metrics than Linear and Ridge Regression but still maintained a high R-squared. Its L1 penalty at alpha=1.0 shrinks coefficients and can set some to zero, at a small cost in accuracy here.*

4. **ElasticNet Regression**
- MSE: 0.3139
- MAE: 0.2100
- RMSE: 0.5603
- R2: 0.99984

*ElasticNet Regression, combining Lasso and Ridge penalties, showed improved performance over Lasso alone but did not outperform Ridge and Linear Regression.*

5. **Support Vector Regression (SVR)**
- MSE: 2422.3695
- MAE: 20.9115
- RMSE: 49.2176
- R2: -0.2162

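*SVR significantly underperformed the other models, with high error metrics and a negative R-squared, meaning it did worse than predicting the mean of the test target. The features were not scaled before fitting, and RBF-kernel SVR is sensitive to feature scale, which likely contributed to the poor fit.*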

6. **Decision Tree Regression**
- MSE: 0.0578
- MAE: 0.0824
- RMSE: 0.2404
- R2: 0.99997

*Decision Tree Regression performed exceptionally well with very low error metrics and a high R-squared, suggesting it captured the data patterns effectively.*

7. **Gradient Boosting Regression**
- MSE: 0.0881
- MAE: 0.1386
- RMSE: 0.2968
- R2: 0.99996

*Gradient Boosting Regression also showed strong performance with slightly higher error metrics than Decision Tree but still maintained a very high R-squared.*

8. **Random Forest Regression**
- MSE: 0.0357
- MAE: 0.0674
- RMSE: 0.1890
- R2: 0.99998

*Random Forest Regression achieved the best overall performance with the lowest error metrics and the highest R-squared, indicating it provided the most accurate predictions.*

### Summary

- **Best Performers:** Random Forest Regression and Decision Tree Regression exhibited the best performance with the lowest error metrics and highest R-squared values, indicating they captured the data patterns very effectively.

- **Strong Performers:** Linear Regression, Ridge Regression, and Gradient Boosting Regression also performed very well with high R-squared values and low error metrics.

- **Moderate Performers:** Lasso Regression and ElasticNet Regression performed moderately well with higher error metrics but still high R-squared values.

- **Underperformer:** Support Vector Regression significantly underperformed, with high error metrics and a negative R-squared value, indicating that it was not suitable for this dataset as configured (RBF kernel, unscaled features).
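
One likely factor in the SVR result is feature scale: `Volume` is many orders of magnitude larger than the price columns, and an RBF kernel measures distance across all features at once. A minimal sketch of the effect on synthetic data (not the Apple dataset), comparing the same SVR settings with and without `StandardScaler`; the variable names and data are assumptions for illustration:

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic data: one small-scale feature that drives the target and
# one large-scale feature that is pure noise.
rng = np.random.default_rng(0)
price_like = rng.uniform(0.0, 1.0, size=200)
volume_like = rng.uniform(0.0, 1.0, size=200) * 1e8
X_syn = np.column_stack([price_like, volume_like])
y_syn = 2.0 * price_like

X_fit, X_hold = X_syn[:150], X_syn[150:]
y_fit, y_hold = y_syn[:150], y_syn[150:]

# Same SVR settings as in the notebook, fitted on raw features.
raw_svr = SVR(kernel="rbf", C=1.0, epsilon=0.1).fit(X_fit, y_fit)
r2_raw = r2_score(y_hold, raw_svr.predict(X_hold))

# Same settings, but features are standardized inside a pipeline.
scaled_svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0, epsilon=0.1))
scaled_svr.fit(X_fit, y_fit)
r2_scaled = r2_score(y_hold, scaled_svr.predict(X_hold))

print(f"R2 raw: {r2_raw:.3f}, R2 scaled: {r2_scaled:.3f}")
```

Whether scaling alone is enough on the Apple data is not tested here; the target's range may also call for a larger `C` or a scaled target.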

### Conclusion

Based on these results, *Random Forest Regression* is recommended as the best model for this dataset, followed closely by *Decision Tree Regression*.