**-------------------------------------------------------------------------------------------------------------------------------------------------------------------**

# **Problem Statement**

Yes Bank is a well-known bank in the Indian financial sector. Since 2018, it has been in the news because of the fraud case involving Rana Kapoor. This makes it interesting to see how the case affected the company's stock price, and whether time series models or other predictive models can handle such a situation. The dataset contains the bank's monthly stock prices since its inception, including the opening, highest, lowest, and closing price for each month. The main objective is to predict the stock's closing price for the month.

**-------------------------------------------------------------------------------------------------------------------------------------------------------------------**

In [23]:
# Importing necessary libraries.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import math

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Lasso
from sklearn.linear_model import Ridge
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import *
from statsmodels.stats.outliers_influence import variance_inflation_factor
from datetime import datetime

import warnings
warnings.filterwarnings('ignore')

In [3]:
# Mounting drive.
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [4]:
# Accessing the dataset.
path = '/content/drive/MyDrive/Capstone Projects/Machine Learning (Regression)/Yes Bank Stock Closing Price Prediction/data_YesBank_StockPrices.csv'
df = pd.read_csv(path)

### **> DATA INSPECTION & PREPROCESSING**

In [30]:
# Glimpse of the dataset.
df.head()

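| | Date | Open | High | Low | Close |
|---|---|---|---|---|---|
| 0 | Jul-05 | 13.00 | 14.00 | 11.25 | 12.46 |
| 1 | Aug-05 | 12.58 | 14.88 | 12.55 | 13.42 |
| 2 | Sep-05 | 13.48 | 14.87 | 12.27 | 13.30 |
| 3 | Oct-05 | 13.20 | 14.47 | 12.40 | 12.99 |
| 4 | Nov-05 | 13.35 | 13.88 | 12.88 | 13.41 |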


In [14]:
# Columns.
df.columns

Index(['Date', 'Open', 'High', 'Low', 'Close'], dtype='object')

**Date**: The month and year to which the prices in the row refer.

**Open**: The price at which the stock started trading that month.

**High**: The highest price the stock reached during that month.

**Low**: The lowest price the stock reached during that month.

**Close**: The final trading price of the stock for that month.

**"Close" is the dependent (target) variable in this case; all the other columns are independent (input) variables.**
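The column definitions above imply a simple consistency rule: within a month, `Low` should not exceed `Open` or `Close`, and `High` should not fall below them. A minimal sketch of such a check follows. It uses only the five sample rows shown in the `df.head()` output rather than the full dataset, and the helper name `ohlc_violations` is hypothetical, not part of the original notebook.

```python
import pandas as pd

# First five rows of the dataset, copied from the df.head() output above.
sample = pd.DataFrame({
    'Date':  ['Jul-05', 'Aug-05', 'Sep-05', 'Oct-05', 'Nov-05'],
    'Open':  [13.00, 12.58, 13.48, 13.20, 13.35],
    'High':  [14.00, 14.88, 14.87, 14.47, 13.88],
    'Low':   [11.25, 12.55, 12.27, 12.40, 12.88],
    'Close': [12.46, 13.42, 13.30, 12.99, 13.41],
})

def ohlc_violations(frame):
    """Return the rows where Low/High do not bracket Open and Close."""
    low_ok = frame['Low'] <= frame[['Open', 'Close']].min(axis=1)
    high_ok = frame['High'] >= frame[['Open', 'Close']].max(axis=1)
    return frame[~(low_ok & high_ok)]

violations = ohlc_violations(sample)
print(len(violations))  # number of inconsistent rows in the sample
```

Running the same helper on the full `df` would flag any month whose recorded prices contradict the definitions.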

In [9]:
# Shape of the data.
df.shape

# Rows = 185 & Columns = 5.

(185, 5)

In [13]:
# Dataset information.
df.info()

# Dataset has no null values.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 185 entries, 0 to 184
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Date    185 non-null    object 
 1   Open    185 non-null    float64
 2   High    185 non-null    float64
 3   Low     185 non-null    float64
 4   Close   185 non-null    float64
dtypes: float64(4), object(1)
memory usage: 7.4+ KB


In [31]:
# Checking for missing values.
df.isna().sum()

# Dataset has no missing values.

Date     0
Open     0
High     0
Low      0
Close    0
dtype: int64

In [21]:
# Checking the presence of duplicate data.
df.duplicated().sum()

# Dataset has no duplicate data.

0
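The missing-value and duplicate checks above can also be gathered into one small helper, which is convenient when the same checks are repeated after each preprocessing step. This is only a sketch: the function name `quality_summary` and the toy frame are assumptions made for illustration, and the toy values are not taken from the real dataset.

```python
import pandas as pd

def quality_summary(frame):
    """Collect row count, missing-value count, and duplicate-row count."""
    return {
        'rows': len(frame),
        'missing_values': int(frame.isna().sum().sum()),
        'duplicate_rows': int(frame.duplicated().sum()),
    }

# Toy frame (not the real data) with one missing value and one repeated row.
toy = pd.DataFrame({
    'Open':  [13.00, 12.58, 12.58, None],
    'Close': [12.46, 13.42, 13.42, 13.30],
})

summary = quality_summary(toy)
print(summary)
```

For the Yes Bank data, the outputs above suggest that `quality_summary(df)` would report 185 rows with no missing values and no duplicate rows.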

In [36]:
bank_df = df.copy()

# Created a copy of the data to work with, so as to keep the original dataset intact.

In [37]:
# Converting the "Date" column from 'Mon-YY' strings (e.g. 'Jul-05') to datetime.
bank_df['Date'] = pd.to_datetime(bank_df['Date'], format='%b-%y')

In [40]:
bank_df.info()

# The "Date" column is successfully converted to datetime.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 185 entries, 0 to 184
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   Date    185 non-null    datetime64[ns]
 1   Open    185 non-null    float64       
 2   High    185 non-null    float64       
 3   Low     185 non-null    float64       
 4   Close   185 non-null    float64       
dtypes: datetime64[ns](1), float64(4)
memory usage: 7.4 KB
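To make the date conversion concrete, here is a small sketch on the five sample labels from the `df.head()` output. In the format string, `%b` is the abbreviated month name and `%y` is the two-digit year, so each label is parsed to the first day of its month. The continuity check at the end is an addition for illustration, not a step from the original notebook, and the sketch assumes an English locale for the month abbreviations.

```python
import pandas as pd

# Sample labels in the same 'Mon-YY' form as the dataset's Date column.
labels = pd.Series(['Jul-05', 'Aug-05', 'Sep-05', 'Oct-05', 'Nov-05'])

# '%b' = abbreviated month name, '%y' = two-digit year.
dates = pd.to_datetime(labels, format='%b-%y')

# Each label maps to the first day of its month.
print(dates.iloc[0])

# Consecutive rows should be exactly one calendar month apart.
months = dates.dt.year * 12 + dates.dt.month
is_continuous = bool((months.diff().dropna() == 1).all())
print(is_continuous)
```

The same month-difference check on `bank_df['Date']` would reveal any skipped or repeated months in the full series.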


In [43]:
# Separating the dataset into independent variables (x) and the dependent variable (y).
x = bank_df.drop(['Close'],axis=1)         # Independent variables.
y = bank_df['Close']                       # Dependent variable.
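After this split, `x` still contains the `Date` column alongside the three price columns. One common option before fitting a regression model is to keep only the numeric predictors and leave `Date` for plotting. The sketch below shows this on three sample rows. Whether the notebook handles `Date` this way later is not shown in this section, so treat the names `x_sample`, `y_sample`, and `x_numeric` as hypothetical.

```python
import pandas as pd

# Three sample rows from the df.head() output, with Date already parsed.
sample = pd.DataFrame({
    'Date':  pd.to_datetime(['Jul-05', 'Aug-05', 'Sep-05'], format='%b-%y'),
    'Open':  [13.00, 12.58, 13.48],
    'High':  [14.00, 14.88, 14.87],
    'Low':   [11.25, 12.55, 12.27],
    'Close': [12.46, 13.42, 13.30],
})

x_sample = sample.drop(columns=['Close'])   # independent variables
y_sample = sample['Close']                  # dependent variable

# Keep only numeric predictors; Date remains available in x_sample.
x_numeric = x_sample.select_dtypes(include='number')
print(list(x_numeric.columns))
```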

### **> Exploratory Data Analysis (EDA)**