<a href="https://colab.research.google.com/github/ChintPatel/CMPE-255-Data-preparation-and-EDA/blob/main/Timeseries_Microsoft_Stock.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Install all dependencies once, before anything is imported
!pip install pandas matplotlib seaborn pycaret kaleido



In [None]:
# Step 1: Import Libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pandas.plotting import lag_plot
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler
# Import only the PyCaret functions used below instead of a wildcard import
from pycaret.time_series import (
    setup, compare_models, finalize_model, predict_model, save_model
)

In [None]:
# Step 2: Load the Dataset
file_path = 'Microsoft_Stock.csv'
time_series_data = pd.read_csv(file_path)

# Inspect the dataset
print("Dataset Head:\n", time_series_data.head())  # Display the first few rows
print("\nDataset Info:\n")
time_series_data.info()  # Check data types and non-null counts
print("\nMissing Values:\n", time_series_data.isnull().sum())  # Check for missing values

# Step 3: Data Cleaning and Preparation
# Convert the 'Date' column to datetime format
time_series_data['Date'] = pd.to_datetime(time_series_data['Date'])

# Set the 'Date' column as the index
time_series_data = time_series_data.set_index('Date')

# Ensure the index has a frequency
time_series_data = time_series_data.asfreq('B')  # 'B' for business days

# Forward-fill the gaps (non-trading business days) introduced by asfreq('B');
# Series.ffill() replaces the deprecated fillna(method='ffill')
time_series_data['Close'] = time_series_data['Close'].ffill()

# Step 4: Time Series Forecasting with PyCaret
# Initialize PyCaret setup
time_series_setup = setup(
    data=time_series_data['Close'],  # Use the 'Close' column for modeling
    session_id=42,  # Random seed for reproducibility
    seasonal_period=30  # 30 business days; a calendar month spans only about 21-22 business days
)

# Compare models to identify the best-performing one
best_model = compare_models()

# Finalize the best model for forecasting
final_model = finalize_model(best_model)

# Forecast future values
future_forecast = predict_model(final_model, fh=30)  # Forecast the next 30 business days


Dataset Head:
                 Date   Open   High    Low  Close    Volume
0  4/1/2015 16:00:00  40.60  40.76  40.31  40.72  36865322
1  4/2/2015 16:00:00  40.66  40.74  40.12  40.29  37487476
2  4/6/2015 16:00:00  40.34  41.78  40.18  41.55  39223692
3  4/7/2015 16:00:00  41.61  41.91  41.31  41.53  28809375
4  4/8/2015 16:00:00  41.48  41.69  41.04  41.42  24753438

Dataset Info:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1511 entries, 0 to 1510
Data columns (total 6 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Date    1511 non-null   object 
 1   Open    1511 non-null   float64
 2   High    1511 non-null   float64
 3   Low     1511 non-null   float64
 4   Close   1511 non-null   float64
 5   Volume  1511 non-null   int64  
dtypes: float64(4), int64(1), object(1)
memory usage: 71.0+ KB

Missing Values:
 Date      0
Open      0
High      0
Low       0
Close     0
Volume    0
dtype: int64
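
The raw file reports no missing values across its 1511 rows, yet the PyCaret setup summary for this notebook lists 1566 rows. The 55 extra rows come from `asfreq('B')`, which inserts every business day absent from the file (market holidays) as NaN before the forward fill. A minimal sketch of that behaviour on a made-up three-row series (the dates and prices are illustrative assumptions, not taken from Microsoft_Stock.csv):

```python
import pandas as pd

# Hypothetical closes for Thu 2021-01-14, Fri 2021-01-15 and Tue 2021-01-19.
# Mon 2021-01-18 is deliberately absent, as a market holiday would be.
dates = pd.to_datetime(["2021-01-14", "2021-01-15", "2021-01-19"])
close = pd.Series([10.0, 11.0, 12.0], index=dates, name="Close")

# Reindex to business-day frequency: the missing Monday appears as NaN,
# while the weekend is skipped entirely.
on_business_days = close.asfreq("B")

# Forward-fill carries Friday's close into the inserted Monday row.
filled = on_business_days.ffill()

print(on_business_days)
print(filled)
```

Forward-filling treats a holiday as a day on which the price did not move, which keeps the index regular but adds flat segments to the series.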


   Description                  Value
0  session_id                   42
1  Target                       Close
2  Approach                     Univariate
3  Exogenous Variables          Not Present
4  Original data shape          (1566, 1)
5  Transformed data shape       (1566, 1)
6  Transformed train set shape  (1565, 1)
7  Transformed test set shape   (1, 1)
8  Rows with missing values     0.0%
9  Fold Generator               ExpandingWindowSplitter

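The summary shows a test set of shape (1, 1) because `setup` was called without an `fh` argument, and PyCaret's default forecast horizon is a single step. The idea behind an expanding-window fold generator can be sketched in plain Python. The helper below is hypothetical and only illustrates the concept; PyCaret delegates to sktime's `ExpandingWindowSplitter`, whose exact window positions may differ, and the fold count of 3 is an assumption.

```python
def expanding_window_splits(n_obs, n_folds=3, fh=1):
    """Yield (train_range, test_range) pairs with a growing training window.

    Hypothetical helper for illustration only, not PyCaret's implementation.
    """
    if n_folds * fh >= n_obs:
        raise ValueError("not enough observations for the requested folds")
    first_cutoff = n_obs - n_folds * fh
    for k in range(n_folds):
        train_end = first_cutoff + k * fh
        # Training always starts at 0 and grows; the test window follows it.
        yield range(0, train_end), range(train_end, train_end + fh)


# 1565 training rows as in the setup summary, one-step horizon
for train, test in expanding_window_splits(1565, n_folds=3, fh=1):
    print(len(train), list(test))
```

With a one-step horizon, every fold scores a single observation, so the cross-validated metrics only describe next-day accuracy even though the notebook later forecasts 30 steps ahead.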

              Model                                                              MASE    RMSSE   MAE     RMSE    MAPE    SMAPE   TT (Sec)
stlf          STLF                                                               0.2428  0.1629  1.4732  1.4732  0.0063  0.0063  0.1333
croston       Croston                                                            0.3061  0.2054  1.858   1.858   0.0079  0.0079  0.0367
exp_smooth    Exponential Smoothing                                              0.3334  0.2237  2.024   2.024   0.0087  0.0086  1.23
ets           ETS                                                                0.3334  0.2237  2.0236  2.0236  0.0087  0.0086  2.7333
knn_cds_dt    K Neighbors w/ Cond. Deseasonalize & Detrending                    0.3378  0.2267  2.0508  2.0508  0.0088  0.0087  0.19
en_cds_dt     Elastic Net w/ Cond. Deseasonalize & Detrending                    0.3512  0.2356  2.1317  2.1317  0.0091  0.0091  0.1867
lasso_cds_dt  Lasso w/ Cond. Deseasonalize & Detrending                          0.3652  0.2451  2.2168  2.2168  0.0095  0.0095  0.1867
llar_cds_dt   Lasso Least Angular Regressor w/ Cond. Deseasonalize & Detrending  0.3655  0.2452  2.2183  2.2183  0.0095  0.0095  0.1733
theta         Theta Forecaster                                                   0.3663  0.2458  2.2236  2.2236  0.0095  0.0095  0.07
omp_cds_dt    Orthogonal Matching Pursuit w/ Cond. Deseasonalize & Detrending    0.3781  0.2537  2.295   2.295   0.0098  0.0098  0.17

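In the comparison table, MAE and RMSE are identical in every row. That is what one would expect when each test window holds a single observation: with one error term, the mean absolute error and the root mean squared error both reduce to the absolute error. The sketch below uses the textbook definitions of MAE, RMSE and MASE with made-up numbers; the function names are hypothetical, and PyCaret's own metric code may differ in detail.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mase(y_true, y_pred, y_train, sp=1):
    """MAE scaled by the in-sample MAE of a naive forecast lagged by sp steps."""
    naive_errors = [abs(y_train[i] - y_train[i - sp]) for i in range(sp, len(y_train))]
    return mae(y_true, y_pred) / (sum(naive_errors) / len(naive_errors))


# One-point test window (illustrative values): MAE and RMSE coincide
print(mae([100.0], [98.5]), rmse([100.0], [98.5]))

# Two-point window: RMSE weights the larger error more heavily
print(mae([100.0, 102.0], [98.0, 102.0]), rmse([100.0, 102.0], [98.0, 102.0]))

# MASE below 1 means the model beat the naive in-sample benchmark
print(mase([100.0], [98.5], y_train=[1.0, 2.0, 4.0, 7.0], sp=1))
```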

TypeError: argument of type 'NoneType' is not iterable
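
The `setup` call passes `seasonal_period=30` on a business-day index. A calendar month contains noticeably fewer than 30 business days, so a 30-step period corresponds to roughly six calendar weeks. A quick way to check how many business days a month holds, using 2020 as an arbitrary example year (note that `pd.bdate_range` counts all weekdays and does not exclude exchange holidays):

```python
import pandas as pd

# Every Monday-to-Friday date in 2020; exchange holidays are NOT removed
business_days = pd.bdate_range("2020-01-01", "2020-12-31")

# Count the business days falling in each calendar month
per_month = pd.Series(1, index=business_days).groupby(business_days.month).sum()

print(per_month)
print("mean business days per month:", round(per_month.mean(), 1))
```

If monthly seasonality is the intent, a period of about 21 would be the closer match. Whether any such seasonality exists in the closing price is a separate question that this sketch does not answer.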

In [None]:
# Step 5: Display Forecast Results
print("\nFuture Forecast:\n", future_forecast)

# Save the model for future use
save_model(final_model, 'microsoft_stock_forecast_model')



Future Forecast:
               y_pred
2021-04-01  231.9769
2021-04-02  231.1013
2021-04-05  226.3659
2021-04-06  226.0987
2021-04-07  226.3738
2021-04-08  221.9805
2021-04-09  223.9771
2021-04-12  226.4177
2021-04-13  225.9015
2021-04-14  225.1463
2021-04-15  224.6501
2021-04-16  228.9177
2021-04-19  227.7630
2021-04-20  232.3602
2021-04-21  232.5916
2021-04-22  237.4190
2021-04-23  233.1904
2021-04-26  236.2181
2021-04-27  238.6168
2021-04-28  238.7048
2021-04-29  234.0419
2021-04-30  234.5638
2021-05-03  238.3956
2021-05-04  240.2842
2021-05-05  237.8716
2021-05-06  236.4833
2021-05-07  238.8703
2021-05-10  237.1088
2021-05-11  234.3454
2021-05-12  236.3352
Transformation Pipeline and Model Successfully Saved


(ForecastingPipeline(steps=[('forecaster',
                             TransformedTargetForecaster(steps=[('model',
                                                                 ForecastingPipeline(steps=[('forecaster',
                                                                                             TransformedTargetForecaster(steps=[('model',
                                                                                                                                 STLForecaster(sp=30))]))]))]))]),
 'microsoft_stock_forecast_model.pkl')