<a href="https://colab.research.google.com/github/arshhad45/EDA/blob/main/Copy_of_Sample_EDA_Submission_Template.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Yes Bank Stock Price Analysis



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Team Member  -** Syed Arshad A


# **Project Summary -**

📈 Yes Bank Stock Price Analysis

🔍 Overview

This project examines the historical stock prices of Yes Bank to uncover patterns, trends, and volatility. Using Exploratory Data Analysis (EDA) and visualizations, it turns raw prices into insights that can help traders, investors, and financial analysts make data-backed decisions.

🧩 Steps in the Analysis

1. Data Collection & Preparation
   * Loaded and cleaned the Yes Bank stock price dataset.
   * Converted the ‘Date’ column to a datetime format for accurate time-series analysis.
   * Checked for missing values and duplicate entries to ensure data quality.

2. Exploratory Data Analysis (EDA)
   * Explored summary statistics of key variables such as Open, High, Low, and Close prices.
   * Conducted correlation analysis to see how the price columns relate to each other.
   * Measured price volatility to understand how often and how sharply prices change.

3. Data Visualization (20+ Charts)
   * Time-series graphs to track how the stock moved over the years.
   * Moving averages and rolling statistics to highlight long-term trends.
   * Box plots and histograms to understand how stock prices are distributed.
   * Correlation heatmaps and scatter plots to find meaningful relationships in the data.

4. Key Insights & Recommendations
   * The stock showed high volatility, signaling the need for proper risk management.
   * Short-term traders can watch month-to-month price movements for opportunities.
   * Long-term investors should focus on patterns in moving averages to guide their strategies.

🎯 Business Value

* Supports smarter investment decisions through data-driven insights.
* Reduces risk exposure by understanding historical volatility patterns.
* Informs trading strategies using technical indicators and price trends.

This project bridges the gap between raw stock data and strategic market decisions, providing real-world value to anyone looking to understand or trade Yes Bank stock more effectively.



# **GitHub Link -**

https://github.com/arshhad45

# **Problem Statement**


Yes Bank, a major financial institution, has experienced fluctuations in its stock prices over time. Investors and stakeholders need data-driven insights to understand market trends, assess risks, and make informed trading decisions. The key challenge is to analyze historical stock price data to identify patterns, trends, volatility, and correlations that can help optimize investment strategies.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Every piece of logic should have proper comments.
4. You may add as many charts as you want. Make sure each chart answers the following format.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 15 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





6. You may add more ML algorithms for model creation. Make sure each algorithm answers the following format.


*   Explain the ML Model used and its performance using Evaluation metric Score Chart.


*   Cross- Validation & Hyperparameter Tuning

*   Have you seen any improvement? Note down the improvement with updated Evaluation metric Score Chart.

*   Explain each evaluation metric's indication towards business and the business impact of the ML model used.

# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import datetime as dt
from datetime import datetime

### Dataset Loading

In [None]:

df = pd.read_csv('data_YesBank_StockPrices.csv') #to read the data
df.head() #to display first five rows

### Dataset First View

In [None]:
df  # display the full DataFrame

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape  # returns a (rows, columns) tuple

### Dataset Information

In [None]:
# Dataset Info
df.info()  # column dtypes, non-null counts, and memory usage

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
duplicate_rows = df.duplicated().sum() # count the duplicate values
print("\n number of duplicate rows =",duplicate_rows) # display it


#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isnull().sum() #Returns the count of missing values in each column

In [None]:
# Visualizing the missing values
import seaborn as sns #for visualizing clearly
import matplotlib.pyplot as plt #to draw graph
sns.heatmap(df.isnull(), cbar=False, cmap="RdBu" ) #to make grid for missing values
plt.title("missing values")
plt.show()


### What did you know about your dataset?

This section gave me a basic overview of the dataset and hands-on practice with the first steps of data visualization.
I learned how to load a dataset, preview the DataFrame, and check its shape (row and column counts).
For data handling, I learned to inspect the data types in a DataFrame and to find missing and duplicate values.
The most useful part was visualizing the missing values as a heatmap.
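
The checks in this section can also be bundled into one helper. The sketch below is only an illustration: the function name `quality_report` and the toy frame are made up for the example and are not part of the notebook or the Yes Bank data.

```python
import pandas as pd


def quality_report(frame: pd.DataFrame) -> dict:
    """Bundle the shape, duplicate, and missing-value checks into one dict."""
    return {
        "rows": frame.shape[0],
        "columns": frame.shape[1],
        "duplicate_rows": int(frame.duplicated().sum()),
        "missing_per_column": frame.isnull().sum().to_dict(),
    }


# Made-up toy frame for illustration only (not the Yes Bank data)
toy = pd.DataFrame({
    "Date": ["Jul-05", "Aug-05", "Aug-05", "Sep-05"],
    "Close": [12.46, 13.42, 13.42, None],
})
report = quality_report(toy)
print(report)
```

Calling `quality_report(df)` on the loaded dataset would return the same four checks in one place.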

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
column_names = df.columns.tolist()  # column labels as a plain list
print("\n column names=", column_names)
df.dtypes  # data type of each column


In [None]:
# Dataset Describe
df.describe()

### Variables Description

Date - The month and year of the stock price record (stored as text in `Mon-YY` form).

Open - The opening price of the stock for the month.

High - The highest price of the stock during the month.

Low - The lowest price of the stock during the month.

Close - The closing price of the stock for the month.
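
These definitions imply a simple consistency rule: High should be at least as large as Open and Close, and Low should be no larger than either. A minimal sketch of that check follows. The helper name `ohlc_violations` and the toy rows are assumptions made for the example.

```python
import pandas as pd


def ohlc_violations(frame: pd.DataFrame) -> pd.DataFrame:
    """Return rows where High or Low fails to bound Open and Close."""
    high_ok = frame["High"] >= frame[["Open", "Close"]].max(axis=1)
    low_ok = frame["Low"] <= frame[["Open", "Close"]].min(axis=1)
    return frame[~(high_ok & low_ok)]


# Made-up toy rows; the third row has a High below its Open on purpose
toy = pd.DataFrame({
    "Open":  [10.0, 12.0, 15.0],
    "High":  [12.5, 14.0, 14.5],
    "Low":   [9.5, 11.0, 13.0],
    "Close": [12.0, 13.5, 14.0],
})
bad = ohlc_violations(toy)
print(bad)
```

An empty result on the real data would confirm that the four price columns are internally consistent.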

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for column in df.columns:  # iterate over every column
    print(f"Column: {column}")  # column name
    print(f"Unique Values: {df[column].unique()}")  # distinct values in that column
    print("\n")

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
import pandas as pd
df = pd.read_csv('data_YesBank_StockPrices.csv')  # reload the raw data
df['Date'] = pd.to_datetime(df['Date'], format='%b-%y')  # parse 'Mon-YY' text into datetimes
df = df.set_index('Date').sort_index()  # index by date, in chronological order
missing_values = df.isnull().sum()  # missing values per column
data_types = df.dtypes  # data type of each column
df['Range'] = df['High'] - df['Low']  # high-low spread for each record
df.head(185), missing_values, data_types


### What all manipulations have you done and insights you found?

Data Manipulations and Insights

We performed several data wrangling and exploratory steps to make the dataset clean, structured, and ready for analysis. Below is a summary of the manipulations and the key insights derived.

Data Manipulations Performed:

1️⃣ Data Loading and Inspection

*   Loaded the dataset using pandas (`pd.read_csv()`).
*   Displayed the first few rows using `df.head()`.
*   Checked dataset structure using `df.info()`.
*   Generated summary statistics using `df.describe()`.

🔎 Insights:

*   ✔ The dataset has 185 rows and 5 columns: Date, Open, High, Low, and Close.
*   ✔ Date was stored as a string (object) instead of a datetime format.
*   ✔ Other columns were numerical.

2️⃣ Data Cleaning

*   Converted the Date column to datetime format.
*   Checked for missing values. Result: no missing values were found.
*   Checked for duplicate records. Result: no duplicate rows were found.
*   Sorted the dataset by Date.
*   Set the Date column as the index for time-series analysis.

Insights:

*   ✔ No missing or duplicate values were found.
*   ✔ Sorting by Date ensures correct time-series ordering.

3️⃣ Handling Outliers

*   The Interquartile Range (IQR) method can flag extreme values. The wrangling code above does not drop any rows, so the high-price outliers remain visible in the later box plots.

Insights:

*   ✔ Flagging outliers makes it easier to avoid misleading statistical trends.
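
The wrangling cell above does not show the IQR step, so here is a minimal sketch of how it might look. The helper name `iqr_outlier_mask`, the conventional multiplier `k=1.5`, and the toy values are assumptions for illustration, not the notebook's own code.

```python
import pandas as pd


def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask that is True where a value lies outside the IQR fences."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values < lower) | (values > upper)


# Made-up closing prices with one obvious spike
toy_close = pd.Series([10, 12, 11, 13, 12, 14, 13, 90], dtype=float)
mask = iqr_outlier_mask(toy_close)
print(toy_close[mask])   # values flagged as outliers
print(toy_close[~mask])  # values kept if outliers are dropped
```

For a price series with a genuine boom and crash, flagging may be safer than dropping, because the extreme months are part of the story being analyzed.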

Final Summary of Insights

🔹 No missing or duplicate values, ensuring data integrity.

🔹 Added a new column (`Range`) to track the gap between each record's high and low price.

🔹 Stock prices fluctuate, requiring trend analysis.

🔹 Outliers can distort trends and need to be treated with care.

🔹 High correlation between stock prices indicates synchronized movements.

🔹 Moving averages help in identifying trends for better decision-making.

🔹 Monthly returns provide insight into market volatility.

## ***4. Data Vizualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
plt.figure(figsize=(12, 6))
plt.plot(df.index, df['Close'], color='blue', label='Closing Price')
plt.title('YES Bank Stock Closing Price Over Time')
plt.xlabel('Date')
plt.ylabel('Closing Price (INR)')
plt.grid(True)
plt.legend()
plt.show()


##### 1. Why did you pick the specific chart?

I chose a line chart to visualize the closing price trend of YES Bank over time because:

*   It clearly shows how the stock price has changed month by month over the years.
*   A line connects consecutive observations, so individual peaks and dips are easy to read.
*   It makes long-term trends easy to identify.
*   It helps reveal volatility and supports later steps such as moving-average smoothing or forecasting.

##### 2. What is/are the insight(s) found from the chart?

YES Bank’s stock showed steady growth in the early years.

It reached a peak with a sharp rise in price, followed by a sudden crash.

In recent years, the stock price has stabilized at a low level.

This indicates high volatility, with a strong rise and steep fall in value.
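
The rise-and-crash pattern can be put into numbers with a peak date and a maximum drawdown (the largest fall from a running high). The sketch below assumes a hypothetical helper `peak_and_drawdown` and uses made-up prices rather than the real series.

```python
import pandas as pd


def peak_and_drawdown(close: pd.Series):
    """Return the date of the highest close and the worst fall from a running peak."""
    peak_date = close.idxmax()
    running_max = close.cummax()          # highest close seen so far
    drawdown = close / running_max - 1.0  # 0 at a new high, negative below it
    return peak_date, drawdown.min()


# Made-up monthly closes: a rise to 100 followed by a fall to 25
toy = pd.Series(
    [20.0, 50.0, 100.0, 40.0, 25.0],
    index=pd.to_datetime(
        ["2018-01-01", "2018-02-01", "2018-03-01", "2018-04-01", "2018-05-01"]
    ),
)
peak_date, max_dd = peak_and_drawdown(toy)
print(peak_date.date(), f"{max_dd:.0%}")
```

On the notebook's data, `peak_and_drawdown(df['Close'])` would give the month of the peak and the size of the subsequent collapse.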

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. Understanding the long-term price trend helps investors and analysts time their decisions, and the series can serve as input for training ML forecasting models.
On the negative side, after reaching its peak, YES Bank's stock price crashed sharply and continued to decline to very low levels.
This drop reflects negative growth in the company’s performance and a loss of market trust.

#### Chart - 2

In [None]:
# Chart - 2 month-over-month change in closing price
# The data has one row per month, so diff() gives the monthly change
df['Monthly_Change'] = df['Close'].diff()
# Bar chart of the monthly change (width is in days because the x-axis holds dates)
plt.figure(figsize=(12, 6))
plt.bar(df.index, df['Monthly_Change'], width=20, color='black', label='Monthly Change')
plt.title('YES Bank Monthly Change in Closing Price')
plt.xlabel('Date')
plt.ylabel('Monthly Change (INR)')
plt.grid(True)
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?


This bar chart was chosen to visualize the month-over-month change in YES Bank’s closing stock price. It highlights sharp rises or falls over time and is well suited to identifying volatility and unusual movements in the stock.

##### 2. What is/are the insight(s) found from the chart?

The stock price was relatively steady, with small month-to-month fluctuations, for many years.
There were sharp, large moves (both positive and negative), particularly between 2018 and 2020, which point to intense market reactions.
The largest decrease stands out clearly and may indicate a financial crisis or investor panic at that time.
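
To name the months behind the tallest bars, the changes can be ranked directly. This is a small sketch with a hypothetical helper `extreme_changes` and made-up prices.

```python
import pandas as pd


def extreme_changes(close: pd.Series, n: int = 3):
    """Return the n largest falls and the n largest rises between consecutive rows."""
    change = close.diff().dropna()
    return change.nsmallest(n), change.nlargest(n)


# Made-up monthly closes for illustration
toy = pd.Series(
    [100.0, 110.0, 90.0, 95.0, 60.0, 70.0],
    index=pd.date_range("2019-01-01", periods=6, freq="MS"),
)
drops, rises = extreme_changes(toy, n=2)
print(drops)  # most negative changes first
print(rises)  # most positive changes first
```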

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. These insights help businesses and investors:

*   Recognize periods of high risk or instability.
*   Build risk mitigation strategies.
*   Train ML models to detect patterns.

#### Chart - 3

In [None]:
# Chart - 3 visualization code
# Relationship between monthly high and low prices
plt.figure(figsize=(10, 6))
# Scatter plot of high vs. low prices (one point per month)
plt.scatter(df['High'], df['Low'], color='red')
plt.title('Relationship between Monthly High and Low Prices')
plt.xlabel('Monthly High Price (INR)')
plt.ylabel('Monthly Low Price (INR)')
plt.grid(True)
plt.show()

##### 1. Why did you pick the specific chart?

This scatter plot was chosen to visualize the relationship between the monthly high and low prices of YES Bank stock.
It shows how closely the two are correlated and helps identify months with unusual volatility.

##### 2. What is/are the insight(s) found from the chart?

There is a strong positive correlation between monthly high and low prices: as high prices increase, low prices also rise.

Most points are closely clustered, showing stable movement within most months.

A few outliers exist where the difference between high and low is large, indicating high volatility or abnormal trading in those months.
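
Both observations can be checked numerically: the Pearson correlation measures how tightly the two columns move together, and the relative high-low gap points to the widest record. The helper name `high_low_summary` and the toy numbers below are assumptions for the example.

```python
import pandas as pd


def high_low_summary(frame: pd.DataFrame):
    """Return the High/Low Pearson correlation and the label of the widest relative range."""
    r = frame["High"].corr(frame["Low"])  # Pearson correlation by default
    relative_range = (frame["High"] - frame["Low"]) / frame["High"]
    return r, relative_range.idxmax()


# Made-up prices; the last row has an unusually wide high-low gap
toy = pd.DataFrame({
    "High": [10.0, 20.0, 30.0, 40.0],
    "Low":  [8.0, 18.0, 27.0, 20.0],
})
r, widest = high_low_summary(toy)
print(round(r, 3), widest)
```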

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

These observations help traders and analysts recognize volatility patterns and choose lower-risk trading periods.

A strong correlation implies a stable trading range, which supports algorithmic trading and risk management methods.

It also helps in building dependable ML models by validating the association among features.

On the negative side, the outliers where the low price is considerably below the high reflect sharp sell-offs or panic selling within the month.

These indicate investor uncertainty, low earnings, or market shocks, which are symptomatic of adverse business events or sentiment.

#### Chart - 4

In [None]:
# Chart - 4 visualization code
# Distribution of monthly closing prices
plt.figure(figsize=(10, 6))
# Box plot of monthly closing prices
sns.boxplot(data=df, y='Close', color='blue')
plt.title('Distribution of Monthly Closing Prices')
plt.ylabel('Closing Price (INR)')
plt.grid(True)
plt.show()

##### 1. Why did you pick the specific chart?

This box plot was chosen to visualize the distribution and spread of YES Bank’s monthly closing prices.

##### 2. What is/are the insight(s) found from the chart?

The median closing price is around ₹60–70.
A large portion of the data lies below ₹150, showing that YES Bank mostly traded in a low price range.
Several outliers above ₹300 show a period of sharp, unusual growth, possibly due to market hype or short-term events.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Identifying the normal price range helps in setting realistic price targets.

The outliers above ₹300, set against a typical range below ₹150, indicate unstable long-term performance and a possible loss of investor confidence, which reflects negative business outcomes.

#### Chart - 5

In [None]:
# Chart - 5 visualization code
#Relationship between Opening and Closing Prices
plt.figure(figsize=(10, 6))
# Create a scatter plot of opening vs. closing prices
plt.scatter(df['Open'], df['Close'], color='blue')
plt.title('Relationship between Opening and Closing Prices')
plt.xlabel('Opening Price (INR)')
plt.ylabel('Closing Price (INR)')
plt.grid(True)
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

I used a scatter plot to check if there's a clear relationship between opening and closing prices. It's a direct way to see how strongly they move together within each month.

##### 2. What is/are the insight(s) found from the chart?

There's a strong positive correlation: when the stock opens higher, it usually closes higher.

Most points lie near a straight line, meaning the stock usually doesn't fluctuate wildly within a month.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes.
The strong relationship shows some predictability, which helps investors and models form an early estimate of the month's closing price.
On the negative side, a few outliers show a big gap between opening and closing prices, which could point to market panic or sudden bad news.
Such patterns warn investors to stay cautious during those periods.

#### Chart - 6

In [None]:
# Chart - 6 visualization code
# Distribution of monthly percentage change in closing price
df['Monthly_Percentage_Change'] = df['Close'].pct_change() * 100
# Histogram of the monthly percentage change (first row is NaN, so drop it)
plt.figure(figsize=(10, 6))
plt.hist(df['Monthly_Percentage_Change'].dropna(), bins=30, color='black', edgecolor='red')
plt.title('Distribution of Monthly Percentage Change in Closing Price')
plt.xlabel('Monthly Percentage Change (%)')
plt.ylabel('Frequency')
plt.grid(True)
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

I chose a histogram to visualize how often the stock’s closing price changes by certain percentages. It’s well suited to showing how volatile the stock is from month to month.

##### 2. What is/are the insight(s) found from the chart?

Most monthly changes are small, clustering around 0%, meaning the stock is usually stable.
A few months show big positive or negative changes, pointing to high-risk or high-impact events.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Helps investors understand risk and set expectations for monthly price swings.

The long tail on the left side indicates months with large percentage drops, often caused by bad news.
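
The shape of the tail can be summarized with skewness (negative when the left tail is longer) and excess kurtosis (larger when extreme values are more frequent). A minimal sketch, assuming a hypothetical helper `return_shape` and made-up prices:

```python
import pandas as pd


def return_shape(close: pd.Series):
    """Return sample skewness and excess kurtosis of percentage changes."""
    returns = close.pct_change().dropna() * 100
    return returns.skew(), returns.kurt()


# Made-up closes: four small gains followed by one crash
toy = pd.Series([100.0, 101.0, 102.0, 103.0, 104.0, 52.0])
skewness, excess_kurtosis = return_shape(toy)
print(skewness, excess_kurtosis)
```

A negative skewness on the real returns would back up the reading of a long left tail.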

#### Chart - 7

In [None]:
# Chart - 7 visualization code
#Stock Trend (Area Chart)
plt.figure(figsize=(12, 6))
df["Close"].plot(kind="area",color="maroon")
plt.title("Stock Trend (Area Chart)")
plt.xlabel("Date")
plt.ylabel("Closing Price")
plt.show()

##### 1. Why did you pick the specific chart?

An area chart was chosen because it effectively represents the trend of closing prices over time while also highlighting the magnitude of changes.

The filled area provides a clear visual of stock fluctuations, making it easy to spot trends, peaks, and dips.

This chart is useful for identifying long-term patterns, such as uptrends, downtrends, and periods of stability.

##### 2. What is/are the insight(s) found from the chart?

The general trend of the stock price is visible—whether it is increasing, decreasing, or fluctuating.

If the area steadily rises, it suggests positive growth and strong investor confidence. A declining area may indicate a bearish trend, meaning the stock is losing value over time.

If the chart shows high volatility with frequent peaks and dips, it suggests market instability or speculative trading activity.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact:

If the stock is in an uptrend, it reassures investors and encourages further investment. Recognizing long-term trends helps businesses and traders make informed investment decisions.

Negative Growth Insights:

A continuous downward trend signals declining investor confidence, possibly leading to lower market valuation. High volatility without a clear trend can indicate market uncertainty, making it risky for long-term investors.

#### Chart - 8

In [None]:
# Chart - 8 visualization code
#price spread for distribution of prices
plt.figure(figsize=(12, 8))
sns.violinplot(data=df[["Open", "High", "Low", "Close"]])
plt.title("Price Spread (Violin Plot)")
plt.ylabel("Price")
plt.show()

##### 1. Why did you pick the specific chart?

This violin plot was chosen because it combines the benefits of a box plot and a density plot, allowing us to see both the distribution and spread of stock prices (Open, High, Low, Close).

It provides insights into price variation, volatility, and the probability density of stock prices, helping investors understand how prices are distributed.

Unlike a box plot, a violin plot also shows the shape of the distribution, making it easier to detect multimodal distributions (multiple peaks).

##### 2. What is/are the insight(s) found from the chart?

The width of the violin plot at different price levels represents the frequency of price occurrences.

A wider section means prices frequently stay around that value, while a narrower section indicates fewer occurrences.

If the distributions for Open, High, Low, and Close prices differ significantly, it suggests high volatility in stock performance.

The presence of long tails indicates extreme values or outliers, suggesting significant price swings in certain months.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact:

Helps traders understand how stock prices fluctuate, allowing them to adjust their strategies.

If the violin plot is narrow and concentrated, it indicates stable price movements, which is reassuring for investors.

Negative Growth Insights:

If the violin plot shows high variability with long tails, it suggests unpredictability and high risk, discouraging risk-averse investors. A very asymmetrical distribution in Open vs. Close prices may indicate a bearish or highly speculative market.

#### Chart - 9

In [None]:
# Chart - 9 visualization code
# Closing Price Density (KDE)
sns.kdeplot(df["Close"],fill=True,color="green",edgecolor="black")
plt.title("Closing Price Density (KDE)")
plt.xlabel("Closing Price")
plt.show()

##### 1. Why did you pick the specific chart?

A Kernel Density Estimate (KDE) plot was chosen because it provides a smooth estimate of the distribution of closing prices.

Unlike a histogram, a KDE plot does not depend on bin edges, although its smoothness depends on the bandwidth, so it can offer a clearer view of how frequently different prices occur.

It helps in identifying the most common closing prices and the peaks in the distribution.

##### 2. What is/are the insight(s) found from the chart?

The peak(s) in the KDE plot indicate price levels where the stock frequently closes, suggesting areas of support or resistance.

A wide spread suggests high volatility, while a narrow, tall distribution indicates stability in stock performance.

If the KDE is right-skewed (long tail on the right), most closes sit at lower prices with occasional high closing prices, possibly due to sudden rallies.

If the KDE is left-skewed (long tail on the left), most closes sit at higher prices with occasional low closing prices.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact:

Identifying common closing price ranges helps traders make informed buy/sell decisions. Recognizing price concentration areas can help investors set realistic entry and exit points.

Negative Growth Insights:

If the KDE plot shows multiple peaks, the stock has traded in several distinct price ranges, which may signal erratic behavior and discourage long-term investment. A long left tail points to periods of unusually low prices, which could reduce investor confidence.

#### Chart - 10

In [None]:
# Chart - 10 visualization code
#Five month moving Average
df["Close"].rolling(window=5).mean().plot(label="5-Month MA", color="black")
df["Close"].plot(alpha=0.6,color="red")
plt.legend()
plt.title("5-Month Moving Average")
plt.xlabel("Date")
plt.ylabel("Closing Price")
plt.show()

##### 1. Why did you pick the specific chart?

This line chart with a moving average overlay was chosen because it helps identify trends in closing prices while smoothing out short-term fluctuations.

The 5-month moving average (MA) provides a clearer view of the stock's general direction, reducing noise from month-to-month price changes.

This is a useful trend-following indicator that helps investors and traders assess the momentum and stability of the stock.

##### 2. What is/are the insight(s) found from the chart?

If the moving average is sloping upward, it indicates a bullish trend, meaning the stock price is generally increasing.

A downward-sloping MA suggests a bearish trend, meaning the stock is losing value over time.

If the closing price consistently stays above the moving average, it signals strong market sentiment and potential price growth.

If the price crosses below the moving average, it may indicate a trend reversal or weakening stock momentum.
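
The crossover signal described above can be located programmatically. The sketch below is one possible approach, not a trading rule: the helper name `ma_crossovers`, the 3-row window, and the prices are made up for illustration.

```python
import pandas as pd


def ma_crossovers(close: pd.Series, window: int = 5) -> pd.Series:
    """Return 'up'/'down' labels at rows where the close crosses its moving average."""
    ma = close.rolling(window=window).mean()
    valid = ma.notna()   # the first window-1 rows have no average yet
    above = close > ma   # False wherever the average is NaN
    prev_above = above.shift(1, fill_value=False)
    prev_valid = valid.shift(1, fill_value=False)
    # A crossover needs a defined average on both the current and previous row
    flipped = (above != prev_above) & valid & prev_valid
    direction = above.map({True: "up", False: "down"})
    return direction[flipped]


# Made-up closes: rising, then a dip below the average, then a recovery
toy = pd.Series([10.0, 11.0, 12.0, 13.0, 9.0, 8.0, 12.0])
crosses = ma_crossovers(toy, window=3)
print(crosses)
```

With the notebook's data, `ma_crossovers(df['Close'], window=5)` would list the months where the close moved across the 5-month average.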

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact:

Identifying trend direction allows investors to make informed decisions about buying, holding, or selling the stock. A sustained upward trend boosts investor confidence, leading to more capital inflow into the stock.

Negative Growth Insights:

If the stock price is consistently below the moving average, it suggests weak momentum and declining investor confidence. Frequent crossovers (price moving above and below the MA) indicate high volatility, making it difficult for investors to predict future trends reliably.

#### Chart - 11

In [None]:
# Chart - 11 visualization code
#outliers in closing price
plt.figure(figsize=(8, 6))
sns.boxplot(y=df["Close"],color="green")
plt.title("Outliers in Closing Prices")
plt.ylabel("Closing Price")
plt.show()

##### 1. Why did you pick the specific chart?

This box plot was chosen because it is one of the best ways to identify outliers and understand the distribution of closing prices.

It provides key statistical insights such as minimum, first quartile (Q1), median, third quartile (Q3), and maximum values, helping to assess price variations.

The presence of outliers (data points outside the whiskers) can indicate unusual price movements due to extreme market events.

##### 2. What is/are the insight(s) found from the chart?

If there are outliers above the upper whisker, it suggests spikes in stock price, which could be due to positive news, earnings reports, or strong market performance.

If there are outliers below the lower whisker, it signals sharp price drops, possibly due to market crashes, poor earnings, or negative sentiment.

A narrow interquartile range (IQR) indicates low volatility, while a wide IQR suggests high price fluctuations over time.

The median position in the box helps understand whether the stock tends to stay in the upper or lower price range.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact:

Identifying outliers helps investors understand abnormal price movements, allowing them to anticipate market shifts.

If the majority of prices fall within a stable range, it boosts investor confidence by showing low volatility.

Negative Growth Insights:

A high number of downward outliers could indicate frequent crashes, discouraging investors from holding long-term positions.

If the stock price shows extreme variability, it suggests market uncertainty, which can lead to lower investor trust and speculative trading.

#### Chart - 12

In [None]:
# Chart - 12 visualization code
#Monthly return chart bar
df["Monthly Return"] = df["Close"].pct_change() * 100
plt.figure(figsize=(12, 6))
sns.barplot(x=df.index.strftime('%b-%y'), y=df["Monthly Return"],color="red")
plt.xticks(ticks=range(0, len(df), 12), labels=df.index.strftime('%b-%Y')[::12])
plt.title("Monthly Returns (%)")
plt.xlabel("Month-Year")
plt.ylabel("Return (%)")
plt.xticks(rotation=90)
plt.show()

##### 1. Why did you pick the specific chart?

This bar chart was chosen because it effectively displays the monthly return percentages over time, making it easy to compare performance across different months.

This visualization helps identify periods of strong growth and decline, revealing seasonal trends or significant market movements.

The use of percentage changes (rather than absolute values) allows for a better understanding of relative stock performance.

##### 2. What is/are the insight(s) found from the chart?

Months with high positive returns indicate strong stock performance, possibly due to market optimism, earnings reports, or external factors.

Months with negative returns highlight downturns, which may be caused by economic conditions, company-specific issues, or broader market trends.

If the returns fluctuate significantly, it suggests high volatility, which could be risky for investors. A pattern of consistent positive or negative returns can indicate a long-term trend in the stock's performance.
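
One way to look for seasonal patterns is to average the returns by calendar month. The sketch below uses a hypothetical helper `mean_return_by_month` and made-up prices. With only 185 monthly rows, each calendar month has roughly 15 observations, so any pattern found this way should be treated as tentative.

```python
import pandas as pd


def mean_return_by_month(close: pd.Series) -> pd.Series:
    """Average percentage return for each calendar month (1 = January)."""
    returns = close.pct_change() * 100
    return returns.groupby(returns.index.month).mean()


# Made-up, steadily rising monthly closes over two years
toy = pd.Series(
    [100.0 + i for i in range(24)],
    index=pd.date_range("2019-01-01", periods=24, freq="MS"),
)
seasonal = mean_return_by_month(toy)
print(seasonal.round(2))
```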

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact:

Understanding which months tend to perform well can help investors time their buying and selling decisions strategically.

If the company identifies reasons for strong months, it can use that insight to optimize business operations or marketing strategies.

Negative Growth Insights:

A trend of declining monthly returns suggests weakening stock performance, possibly leading to a loss of investor confidence.

High volatility in monthly returns may discourage risk-averse investors, making it harder to attract stable long-term investments.

#### Chart - 13

In [None]:
# Chart - 13 visualization code
#Autocorrelation Plot (Stock Price Dependencies)
plt.figure(figsize=(12, 6))
pd.plotting.autocorrelation_plot(df["Close"])
plt.title("Autocorrelation of Closing Prices")
plt.show()

##### 1. Why did you pick the specific chart?

An autocorrelation plot was chosen because it helps analyze the relationship between a stock’s closing prices over time.

It shows how past prices influence future prices, which is crucial for trend analysis and forecasting.

This type of chart helps identify seasonal trends, momentum, or mean-reverting behavior in stock prices.

##### 2. What is/are the insight(s) found from the chart?

High positive autocorrelation at short lags (e.g., 1–5 months) suggests strong momentum, meaning past prices have a significant impact on future prices.

Low or negative autocorrelation indicates that price changes are more random, making predictions harder.

If periodic peaks appear, it could signal seasonal trends, where prices follow a recurring pattern over months or years.

A gradual decline in autocorrelation suggests that past trends fade over time, meaning price movements are less predictable as the time gap increases.
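
The values behind the plot can also be read off at specific lags with `Series.autocorr`. In this sketch the helper name `autocorr_by_lag` and both toy series are assumptions chosen to show the two extremes.

```python
import pandas as pd


def autocorr_by_lag(close: pd.Series, lags=(1, 3, 6, 12)) -> pd.Series:
    """Pearson autocorrelation of the series at each requested lag (in rows)."""
    return pd.Series({lag: close.autocorr(lag=lag) for lag in lags})


# Made-up series: a steady trend and a strict zigzag
trend = pd.Series([float(i) for i in range(1, 31)])
zigzag = pd.Series([1.0, -1.0] * 5)

ac_trend = autocorr_by_lag(trend, lags=(1, 3, 6))
ac_zigzag = autocorr_by_lag(zigzag, lags=(1,))
print(ac_trend)   # a perfectly linear trend stays fully correlated with its past
print(ac_zigzag)  # a strict zigzag is perfectly negatively correlated at lag 1
```

Since the data is monthly, a lag of 12 rows corresponds to one year.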

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact:

If autocorrelation is high, traders can use trend-following strategies to profit from momentum.

Identifying seasonal cycles helps businesses and investors time their trades more effectively.

Negative Growth Insights:

If the stock shows no autocorrelation, it suggests a highly unpredictable price pattern, making it risky for long-term investors.

A sudden drop in autocorrelation may indicate market instability, causing uncertainty and reducing investor confidence.

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
plt.figure(figsize=(8, 6))
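# numeric_only=True skips non-numeric columns, which recent pandas versions no longer drop automatically
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap="coolwarm", fmt=".2f", linewidths=0.5)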
plt.title("Correlation Heatmap of Stock Prices")
plt.show()

##### 1. Why did you pick the specific chart?

A correlation heatmap was chosen because it provides a clear visual representation of the relationships between different stock price attributes (Open, High, Low, Close, Volume, etc.).

It helps identify strong positive or negative correlations, allowing for better analysis of how different factors influence stock price movements. The color gradient makes it easier to spot patterns compared to a traditional correlation matrix.

##### 2. What is/are the insight(s) found from the chart?

A high correlation (close to +1) between Open, High, Low, and Close prices indicates that these variables move together, which is expected because all four are recorded for the same trading period and share the same overall price level.

A negative correlation (close to -1) with volume could suggest that price increases when trading volume is low, or vice versa.

A weak correlation (close to 0) between volume and closing price indicates that trading activity does not directly impact price changes.

If High and Close prices are almost perfectly correlated, it means the two rise and fall together across periods. On its own this does not show that the stock closes near its high; that depends on the gap between High and Close in each period.
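Whether the stock tends to close near the top of its range can be measured directly rather than inferred from the correlation. The sketch below uses a small hypothetical table (`sample`) in place of `df`, and the 0.75 cut-off for "near the high" is an assumption chosen for illustration.

```python
import pandas as pd

# Hypothetical sample rows; in the notebook, df would be used instead.
sample = pd.DataFrame({
    "High":  [12.0, 15.0, 20.0, 18.0],
    "Low":   [10.0, 11.0, 14.0, 12.0],
    "Close": [11.5, 12.0, 19.0, 13.0],
})

# Position of the close within the period's range: 0 = at the low, 1 = at the high.
price_range = sample["High"] - sample["Low"]
close_position = (sample["Close"] - sample["Low"]) / price_range

# Share of periods in which the close sits in the top quarter of the range.
share_near_high = (close_position >= 0.75).mean()

print(close_position.round(2).tolist())
print("Share of periods closing in the top quarter of the range:", share_near_high)
```

Rows where High equals Low would produce a division by zero and need to be filtered out first.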

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
# sns.pairplot creates its own figure, so a separate plt.figure() call would only add an empty figure
sns.pairplot(df[["Open", "High", "Low", "Close"]])
plt.show()

##### 1. Why did you pick the specific chart?

This pairplot was chosen because it provides a detailed overview of relationships between multiple numerical variables (Open, High, Low, Close).

It shows a scatter plot for each pair of variables and, on the diagonal, a histogram of each variable's distribution.

This allows for spotting linear relationships, clusters, and potential anomalies in stock price movements.

##### 2. What is/are the insight(s) found from the chart?

Strong positive relationships between Open, High, Low, and Close prices indicate that these variables move together, which is expected in stock price behavior.

The scatter plots can reveal price trends, such as whether higher opening prices tend to lead to higher closing prices.

Non-linear relationships or clusters may indicate periods of volatility or shifts in market behavior.

The diagonal histograms show the distribution of each variable, highlighting whether prices follow a normal distribution or have skewness.
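The skewness visible in the diagonal histograms can also be quantified. This is a minimal sketch on synthetic log-normal prices (the `prices` frame is a stand-in for `df[["Open", "High", "Low", "Close"]]`, not the real data), using `DataFrame.skew()`.

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed prices (log-normal) standing in for the notebook's price columns.
rng = np.random.default_rng(1)
prices = pd.DataFrame({
    "Open": rng.lognormal(mean=4.0, sigma=0.8, size=300),
    "Close": rng.lognormal(mean=4.0, sigma=0.8, size=300),
})

# Sample skewness: near 0 for symmetric data, > 0 for a long right tail, < 0 for a long left tail.
skewness = prices.skew()
print(skewness.round(2))

# A log transform often brings right-skewed prices closer to symmetric.
log_skewness = np.log(prices).skew()
print(log_skewness.round(2))
```

A clearly positive value would support reading the histograms as right-skewed rather than normal.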

## ***5. Hypothesis Testing***

### Based on your chart experiments, define three hypothetical statements from the dataset. In the next three questions, perform hypothesis testing to obtain final conclusion about the statements through your code and statistical testing.


### Hypothetical Statement - 1

#### 1. State Your research hypothesis as a null hypothesis and alternate hypothesis.

My first hypothesis for this YES Bank dataset is: “If the stock opens high, it probably closes high too.”

H0: There is no linear relationship between the opening and closing prices of the stock.

H1: There is a linear relationship between the opening and closing prices of the stock.

#### 2. Perform an appropriate statistical test.

In [None]:
# Perform Statistical Test to obtain P-Value
from scipy.stats import pearsonr
df_clean = df[['Open', 'Close']].dropna()

# Perform Pearson correlation test
correlation, p_value = pearsonr(df_clean['Open'], df_clean['Close'])

# Print the results
print("Pearson Correlation Coefficient (r):", correlation)
print("P-value:", p_value)

# Interpret the result at the 5% significance level
alpha = 0.05
if p_value < alpha:
    print("P-value is less than 0.05 → Reject the null hypothesis.")
    print("Conclusion: There is a statistically significant linear relationship between Open and Close prices.")
else:
    print("P-value is greater than or equal to 0.05 → Fail to reject the null hypothesis.")
    print("Conclusion: No significant linear relationship between Open and Close prices.")


##### Which statistical test have you done to obtain P-Value?

Pearson Correlation Coefficient Test was used.

##### Why did you choose the specific statistical test?

Because it directly measures the strength and direction of the linear relationship between two continuous variables.
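For reference, the coefficient can be reproduced from its definition: the sum of products of the centered values divided by the square root of the product of their sums of squares. The sketch below uses hypothetical Open and Close values (in the notebook these would come from `df_clean`) and compares the manual result with `scipy.stats.pearsonr`.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical Open/Close values; in the notebook these come from df_clean.
open_prices = np.array([10.0, 12.0, 13.5, 15.0, 14.0, 18.0])
close_prices = np.array([11.0, 12.5, 13.0, 16.0, 15.5, 17.0])

# r = sum(xc * yc) / sqrt(sum(xc^2) * sum(yc^2)), where xc and yc are the centered values.
x_centered = open_prices - open_prices.mean()
y_centered = close_prices - close_prices.mean()
r_manual = (x_centered * y_centered).sum() / np.sqrt(
    (x_centered ** 2).sum() * (y_centered ** 2).sum()
)

# SciPy returns the same coefficient together with a two-sided p-value.
r_scipy, p_value = pearsonr(open_prices, close_prices)

print("manual r:", round(r_manual, 4))
print("scipy r: ", round(r_scipy, 4))
print("p-value: ", round(p_value, 4))
```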

### Hypothetical Statement - 2

#### 1. State Your research hypothesis as a null hypothesis and alternate hypothesis.

“There must have been a few months where the stock price either shot up or crashed badly — I can feel the volatility.”

H0: Monthly returns are consistent and not highly variable (low volatility).

H1: Monthly returns show significant variability (high volatility).

#### 2. Perform an appropriate statistical test.

In [None]:
# Perform Statistical Test to obtain P-Value
from scipy.stats import ttest_1samp

# Calculate monthly returns in percent (assumes one row per month, sorted by date)
df['Monthly Return'] = df['Close'].pct_change() * 100

# Drop missing values (the first row has no previous close)
returns = df['Monthly Return'].dropna()

# Volatility itself is summarized by the standard deviation of the returns
print("Std. Dev. of Monthly Returns (%):", returns.std())

# Perform a One-Sample T-test of the mean return against zero
t_stat, p_value = ttest_1samp(returns, 0)

# Display results
print("T-Statistic:", t_stat)
print("P-Value:", p_value)

# Interpret the result: the t-test is about the mean return, not its spread
alpha = 0.05
if p_value < alpha:
    print("P-value < 0.05 → Reject the null hypothesis.")
    print("Conclusion: The mean monthly return is significantly different from zero.")
else:
    print("P-value ≥ 0.05 → Fail to reject the null hypothesis.")
    print("Conclusion: The mean monthly return is not significantly different from zero.")


##### Which statistical test have you done to obtain P-Value?

I have performed a One-Sample T-Test using the scipy.stats.ttest_1samp() function.

##### Why did you choose the specific statistical test?

I wanted to check whether YES Bank's monthly returns hover around a stable average or drift away from it.

So I used a one-sample t-test to see if the average monthly return is meaningfully different from zero.

This test is about the mean of the returns, not their spread. A mean far from zero points to a persistent upward or downward drift, while volatility, meaning the chance of big gains or losses, is captured by the standard deviation of the returns.
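Since volatility concerns the spread of returns, one way to test it directly is a one-sample chi-square test on the variance. This is a minimal sketch under stated assumptions, not the test used in this notebook: the returns are synthetic, they are assumed to be roughly normal, and `sigma0` is a hypothetical benchmark for what a "calm" stock's monthly standard deviation would be.

```python
import numpy as np
from scipy.stats import chi2

# Synthetic monthly returns in percent; in the notebook, `returns` would be used instead.
rng = np.random.default_rng(42)
sample_returns = rng.normal(loc=0.5, scale=12.0, size=120)

# Hypothetical benchmark: a "calm" stock is assumed to have a monthly standard deviation of 5%.
sigma0 = 5.0

n = sample_returns.size
sample_var = sample_returns.var(ddof=1)

# Under H0 (variance == sigma0**2) with normal returns, this statistic is chi-square with n - 1 df.
chi2_stat = (n - 1) * sample_var / sigma0 ** 2

# One-sided p-value for H1: variance > sigma0**2.
p_value_var = chi2.sf(chi2_stat, df=n - 1)

print("Sample std (%):", round(float(np.sqrt(sample_var)), 2))
print("Chi-square statistic:", round(float(chi2_stat), 2))
print("P-value:", p_value_var)
```

The conclusion depends entirely on the chosen benchmark, so `sigma0` should come from a justified reference such as a market index or peer banks.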

### Hypothetical Statement - 3

#### 1. State Your research hypothesis as a null hypothesis and alternate hypothesis.

My third hypothesis is: “I feel like most of the time, YES Bank’s closing price stayed within a regular range, but there were a few times where the price spiked really high or crashed really low.”

H0: There are no significant outliers in the closing prices.

H1: There are significant outliers in the closing prices.

#### 2. Perform an appropriate statistical test.

In [None]:
# Perform Statistical Test to obtain P-Value
from scipy.stats import zscore, norm

# Step 1: Calculate Z-scores for Closing Price
df['Z_Close'] = zscore(df['Close'])

# Step 2: Calculate two-tailed p-values under a normal distribution
df['P_Value'] = 2 * norm.sf(abs(df['Z_Close']))

# Step 3: Select the outliers (p-value < 0.05)
alpha = 0.05
outliers = df[df['P_Value'] < alpha]

# Step 4: Show results
print("Total Outliers Detected:", len(outliers))
print(outliers[['Close', 'Z_Close', 'P_Value']].head())

# Interpret the result using the per-row p-values computed above
# (not the p_value variable left over from the previous test)
if len(outliers) > 0:
    print("At least one P-value < 0.05 → Reject the null hypothesis.")
    print("Conclusion: There are significant outliers in the closing prices.")
else:
    print("All P-values ≥ 0.05 → Fail to reject the null hypothesis.")
    print("Conclusion: There are no significant outliers in the closing prices.")

##### Which statistical test have you done to obtain P-Value?

I performed a Z-test using the Z-score method to detect statistically significant outliers in the closing prices of YES Bank stock.

##### Why did you choose the specific statistical test?

I wanted to find out if any closing prices were unusually high or low compared to the rest of the data.

The Z-test calculates how far each value is from the average.

Then I used the p-value to measure how rare or extreme that value is under a normal distribution.
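The p-value rule is equivalent to a fixed cut-off on the Z-score: with a two-tailed test at the 5% level, a value is flagged exactly when its absolute Z-score exceeds roughly 1.96. The sketch below illustrates this on a small hypothetical array of closing prices (`closes`), not on the actual dataset.

```python
import numpy as np
from scipy.stats import norm, zscore

# Hypothetical closing prices with one extreme value.
closes = np.array([50.0, 52.0, 49.0, 51.0, 48.0, 53.0, 50.0, 47.0, 52.0, 120.0])

z = zscore(closes)              # (value - mean) / population standard deviation
p = 2 * norm.sf(np.abs(z))      # two-tailed p-value under a normal distribution

alpha = 0.05
z_critical = norm.ppf(1 - alpha / 2)   # about 1.96 for alpha = 0.05

# Both rules flag the same observations.
flag_by_p = p < alpha
flag_by_z = np.abs(z) > z_critical

print("Critical |z|:", round(float(z_critical), 2))
print("Flagged values:", closes[flag_by_p])
```

For data that really is normal, about 5% of points exceed this cut-off by chance, so flagged values are candidates for review rather than confirmed anomalies.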

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective?
Explain briefly.

- Monitor Volatility and Outliers Regularly.
- Track Monthly and Seasonal Trends.
- Use Moving Averages for Trend Analysis.
- Focus on Stability and Risk Zones.
- Maintain a Clean and Structured Historical Data Repository
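The moving-average and volatility recommendations above can be sketched with pandas rolling windows. The example uses synthetic monthly prices in place of `df["Close"]`, and the 6-month window and the column names `MA`, `Volatility`, and `Below_MA` are hypothetical choices for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic monthly closing prices standing in for df["Close"].
rng = np.random.default_rng(7)
dates = pd.date_range("2015-01-01", periods=48, freq="MS")
close = pd.Series(100 + rng.normal(0, 2, 48).cumsum(), index=dates, name="Close")

window = 6  # hypothetical window length, in months

monitor = pd.DataFrame({
    "Close": close,
    "MA": close.rolling(window).mean(),                      # moving average (trend)
    "Volatility": close.pct_change().rolling(window).std(),  # rolling std of monthly returns
})

# Flag months whose close falls below the moving average.
monitor["Below_MA"] = monitor["Close"] < monitor["MA"]

print(monitor.tail())
```

The first few rows are `NaN` until a full window of data is available, so alerts based on these columns should start only after that warm-up period.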

# **Conclusion**

In this data science project, I performed an in-depth **Exploratory Data Analysis (EDA)** on YES Bank's historical stock price data.

Key steps included:
- Cleaning and preparing the dataset
- Creating **15+ visualizations** to uncover trends, patterns, and anomalies in stock movement
- Handling **missing values and outliers**
- Extracting business-relevant insights from various charts such as line plots, box plots, violin plots, KDE, moving averages, and correlation heatmaps

Through this analysis, I was able to:
- Understand the **volatility and growth pattern** of YES Bank stock over time
- Identify **significant outliers** and price fluctuations
- Observe **monthly and yearly trends** that could help in future decision-making


### ***Hurrah! You have successfully completed your Data Science Capstone Project !!!***