# **Project Name**    -  Yes Bank Stock Price Analysis



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Team Member 1**   - R Samarth Gowda

# **Project Summary -**

This project involves a deep analytical study of Yes Bank’s stock price dataset with the aim of deriving meaningful business insights and building intelligent models using machine learning techniques. The workflow includes structured Exploratory Data Analysis (EDA) using the UBM framework (Univariate, Bivariate, Multivariate analysis), identifying trends, volatility, gain/loss patterns, and deploying predictive algorithms to forecast stock movement. This end-to-end data science pipeline has been implemented in a well-documented and deployment-ready Jupyter Notebook.

**Problem Statement**
Yes Bank’s stock has seen significant fluctuations over the years due to internal and external market factors. The business objective is to understand the behavior of its stock prices, assess risk and volatility, and provide actionable investment strategies using both data visualization and machine learning.

**Exploratory Data Analysis (EDA)**
The dataset, after cleaning and transformation, was analyzed using the UBM method:

*Univariate Analysis* revealed the central tendencies and dispersion of key numerical variables like Open, High, Low, Close, and Range (High - Low). Charts like histograms, KDE plots, and boxplots were used to visualize price and volatility distributions.

*Bivariate Analysis* explored relationships between two variables. A stacked bar chart showed year-wise gain vs loss months, while line plots illustrated trends in closing price and moving averages. Monthly volatility patterns were studied using bar charts and boxplots.

*Multivariate Analysis* was done using correlation heatmaps and pair plots. These visualizations helped uncover dependencies among price-related variables and guided feature selection for modeling. The correlation heatmap clearly highlighted strong positive relationships among Open, High, Low, and Close prices.

A total of 15 charts were plotted, each accompanied by a rationale, key insights, and the potential impact on business decision-making.

**Data Wrangling**
The dataset underwent several transformations to make it ML-ready:

* Converted the string-based `Date` column to datetime format.
* Extracted time-based features: `Month` and `Year`.
* Created derived features: `Range` (High - Low) and `Change` (Close - Open).
* Checked for duplicate rows and missing/null values; none were found.

These steps enhanced data interpretability and laid the foundation for time-series and categorical analysis.

**Key Insights & Business Recommendations**
Long-term passive holding in Yes Bank stock results in capital erosion. Trend-following or active strategies yield better outcomes.

High volatility is observed in months like March and October. These periods require hedging or reduced exposure.

Moving average crossovers can be leveraged to identify entry/exit points.

Use of gain/loss predictions and volatility clusters can help portfolio managers manage risk more effectively.

**Conclusion**
The project successfully translates raw stock price data into a structured business intelligence report. The combination of EDA, feature engineering, and ML modeling offers a holistic view of Yes Bank’s stock performance and equips decision-makers with tools to make more informed, data-backed investment choices. Further enhancements could include integrating macroeconomic indicators and live market data for real-time deployment.

# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


The aim is to analyze Yes Bank’s historical monthly stock price data to uncover patterns in price movements, volatility, and seasonality, and to build predictive machine learning models that assist in forecasting future trends and classifying gain/loss months. The goal is to provide actionable insights that help investors, analysts, and financial decision-makers form better, data-driven investment strategies.

#### **Define Your Business Objective?**

The business objective of this project is to analyze and model the historical stock performance of Yes Bank using advanced data visualization and machine learning techniques. The goal is to uncover meaningful patterns in stock behavior, such as price volatility, trend direction, and seasonal movements, and to develop predictive models that can forecast future stock behavior (e.g., closing price, gain/loss months).

These insights aim to assist investors, analysts, and financial institutions in making more informed decisions regarding trading strategies, investment timing, and risk management. By transforming raw stock market data into actionable intelligence, the project seeks to enable data-driven financial planning and improve return on investment (ROI).

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that each chart answers the following format.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
# 📦 Basic Libraries
import pandas as pd                    # Data handling
import numpy as np                     # Numerical operations
import warnings                        # To suppress warnings
warnings.filterwarnings('ignore')

# 📊 Visualization Libraries
import matplotlib.pyplot as plt        # Plotting
import seaborn as sns                  # Advanced statistical plots

# 🧠 Machine Learning Libraries
from sklearn.linear_model import LinearRegression            # Regression model
from sklearn.ensemble import RandomForestClassifier          # Supervised classification
from sklearn.cluster import KMeans                           # Unsupervised clustering

# ⚙️ ML Utilities
from sklearn.model_selection import train_test_split         # Train/test splitting
from sklearn.metrics import (mean_squared_error, r2_score,   # Regression metrics
                             accuracy_score, classification_report)  # Classification metrics

# 🗂️ Date/Time Handling (if needed)
from datetime import datetime




### Dataset Loading

In [None]:
# Load Dataset
df = pd.read_csv("data_YesBank_StockPrices.csv")

### Dataset First View

In [None]:
# Dataset First Look
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape


### Dataset Information

In [None]:
# Dataset Info
df.info()
# The dataset has 185 rows and 5 columns


#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
duplicate_count = df.duplicated().sum()
print("No of Duplicated rows:",duplicate_count)

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
missing_val = df.isnull().sum()
print("No of Null values:",missing_val)

In [None]:
# Visualizing the missing values
plt.figure(figsize=(8, 4))
sns.heatmap(df.isnull(), cbar=False, cmap="YlGnBu")
plt.title("Missing Values Heatmap")
plt.show()

### What did you know about your dataset?

The dataset contains Yes Bank’s monthly stock prices for 185 months. Each row records the open, high, low, and close price for one month, and there are no missing or duplicate values in the dataset.


## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
print("Dataset columns:")
print(df.columns.tolist())

In [None]:
# Dataset Describe
print(df.describe())

### Variables Description

| Column  | Description                                |
| ------- | ------------------------------------------ |
| `Date`  | Monthly timestamp in format like "Jul-05"  |
| `Open`  | Opening price of stock in that month       |
| `High`  | Highest price reached during the month     |
| `Low`   | Lowest price recorded in the month         |
| `Close` | Closing price of stock at end of the month |

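As a small illustration of the `Date` format described above, the sketch below parses a few labels in the same "Mon-YY" style with the `%b-%y` directive. The sample strings are made up for the example and are not read from the dataset.

```python
import pandas as pd

# Illustrative month labels in the same "Mon-YY" style as the Date column
sample = pd.Series(["Jul-05", "Aug-05", "Jan-20"])

# %b = abbreviated month name, %y = two-digit year; the day defaults to 1
parsed = pd.to_datetime(sample, format="%b-%y")
print(parsed.dt.strftime("%Y-%m-%d").tolist())
```

Because each label carries no day, every parsed timestamp falls on the first of its month, which is worth remembering when reading the x-axis of the time-series charts.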

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for col in df.columns:
    print(f"{col} → {df[col].nunique()} unique values")

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
# Convert 'Date' column to datetime (only once)
df['Date'] = pd.to_datetime(df['Date'], format='%b-%y')

df = df.sort_values('Date').reset_index(drop=True)

# Extract year and month name
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month_name()

# Price range in the month (volatility)
df['Range'] = df['High'] - df['Low']

# Price change (Close - Open)
df['Change'] = df['Close'] - df['Open']

# Gain or Loss label (a month with zero change is counted as 'Loss')
df['Gain/Loss'] = df['Change'].apply(lambda x: 'Gain' if x > 0 else 'Loss')



In [None]:

print(df[['Date', 'Open', 'High', 'Low', 'Close', 'Range', 'Change', 'Gain/Loss']].head())



### What all manipulations have you done and insights you found?

| Feature         | Why It Was Added                 | Insight It Helps With           |
| --------------- | -------------------------------- | ------------------------------- |
| `Year`, `Month` | To enable time-based grouping    | Yearly/monthly trend plots      |
| `Range`         | Measures volatility per month    | High fluctuations indicate risk |
| `Change`        | Measures gain/loss in stock      | Useful for classification       |
| `Gain/Loss`     | Label to classify stock behavior | For supervised ML model         |

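Derived features such as `Range` implicitly assume that each row is internally consistent (High is the largest and Low the smallest of the four prices). A minimal sketch of such a sanity check is shown below; the `prices` frame is hypothetical stand-in data, and its last row is deliberately invalid.

```python
import pandas as pd

# Hypothetical rows standing in for the real data; the last row is deliberately invalid
prices = pd.DataFrame({
    "Open":  [13.0, 12.6, 13.5],
    "High":  [14.0, 14.9, 13.0],
    "Low":   [11.3, 12.5, 12.9],
    "Close": [12.5, 13.4, 13.3],
})

# High should be at least max(Open, Close); Low should be at most min(Open, Close)
valid_high = prices["High"] >= prices[["Open", "Close"]].max(axis=1)
valid_low = prices["Low"] <= prices[["Open", "Close"]].min(axis=1)
inconsistent = prices[~(valid_high & valid_low)]

print(f"{len(inconsistent)} inconsistent row(s)")
```

Running the same two comparisons on `df` before computing `Range` and `Change` would flag any rows that need a closer look.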

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
plt.figure(figsize=(10, 5))
sns.histplot(df['Close'], kde=True, bins=20, color='skyblue', edgecolor='black')
plt.title('Distribution of Closing Prices')
plt.xlabel('Closing Price')
plt.ylabel('Frequency')
plt.grid(True)
plt.show()

##### 1. Why did you pick the specific chart?

A histogram with a KDE (Kernel Density Estimate) overlay shows the distribution of Yes Bank’s monthly closing prices. It helps to:

* Understand central tendency and spread.
* Detect skewness or multiple peaks (modes).
* Check whether the distribution is approximately normal.

##### 2. What is/are the insight(s) found from the chart?

Most of the closing prices are concentrated in the lower price range, suggesting the stock often closes at a relatively low value.

The distribution is right-skewed: a long tail extends toward high prices, produced by a relatively small number of high-value months.

No clear multi-modal peaks — it’s mostly unimodal.
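
The visual impression of right skew can be backed by a number. The sketch below uses synthetic log-normal values rather than the Yes Bank data, and shows two quick checks: a positive `Series.skew()` and a mean that sits above the median.

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed prices (log-normal); NOT the Yes Bank data
rng = np.random.default_rng(0)
close = pd.Series(rng.lognormal(mean=4.0, sigma=0.8, size=185))

skewness = close.skew()                       # > 0 indicates a longer right tail
mean, median = close.mean(), close.median()   # mean > median is typical of right skew
print(f"skew={skewness:.2f}, mean={mean:.1f}, median={median:.1f}")
```

Applying `df['Close'].skew()` in the notebook would give the corresponding figure for the real closing prices.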

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights support both positive and cautionary decisions:

Positive Business Impact:
Investors and analysts can better understand how often high or low price ranges occur, allowing them to plan entry/exit points more strategically.
The right-skewed distribution may signal rare but potentially profitable spikes, helping identify momentum trading opportunities.

Insights That Lead to Negative Growth:
The majority of data is concentrated in the lower price range, suggesting Yes Bank’s stock price tends to close at low values historically.
This could signal a lack of sustained investor confidence, possible underperformance over time, or risk of capital erosion if trends don’t reverse — important for risk-averse stakeholders and long-term portfolio planning.

#### Chart - 2

In [None]:
# Chart - 2 visualization code
plt.figure(figsize=(12, 6))
plt.plot(df['Date'], df['Close'], color='teal', linewidth=2, marker='o')
plt.title('Yes Bank Monthly Closing Price Over Time', fontsize=14)
plt.xlabel('Date')
plt.ylabel('Closing Price')
plt.grid(True)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

A line plot is the most effective way to track how a continuous numerical variable changes over time.
Here, we use it to visualize how Yes Bank's monthly closing prices have trended from the start of the dataset to the latest point.

##### 2. What is/are the insight(s) found from the chart?

The stock showed significant price spikes followed by sharp falls during certain periods.

There is clear evidence of instability or high volatility in closing prices over time.

A downward trend is noticeable toward the later years, with prices consistently low in the recent periods.



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact:
Investors can use trend information to time market entry/exit, avoid overpaying, and monitor the market for potential rebounds.
Helps in forecasting future price behavior using time series models.

Negative Growth Insight:
The prolonged decline and frequent sharp drops indicate potential underlying financial or structural issues within Yes Bank, discouraging long-term investors and affecting stockholder confidence.
This loss of trust and volatility perception may impact market capitalization and funding.

#### Chart - 3

In [None]:
# Chart - 3 visualization code
plt.figure(figsize=(8, 5))
sns.boxplot(x=df['Close'], color='salmon')
plt.title('Boxplot of Monthly Closing Prices')
plt.xlabel('Closing Price')
plt.grid(True)
plt.show()


##### 1. Why did you pick the specific chart?

A boxplot is ideal for spotting:

Median value

Interquartile range (IQR)

Outliers

Skewness in data

In the context of stock analysis, this chart quickly shows if the stock has had many extreme price fluctuations or stable behavior.

##### 2. What is/are the insight(s) found from the chart?

The median closing price lies towards the lower half of the entire range.

There are multiple high-value outliers, suggesting sharp spikes in closing price.

The box (IQR) is narrow — meaning prices mostly stayed within a certain range — but the outliers extend far.
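
The outliers drawn by a boxplot follow the 1.5 × IQR rule, which is seaborn's default whisker setting. A minimal sketch of that calculation, on hypothetical prices rather than the real data:

```python
import pandas as pd

# Hypothetical closing prices; the last two values are deliberately extreme
close = pd.Series([10, 12, 13, 14, 15, 16, 18, 20, 60, 90], dtype=float)

q1, q3 = close.quantile(0.25), close.quantile(0.75)
iqr = q3 - q1
# Points beyond these fences are drawn as individual outlier markers
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = close[(close < lower) | (close > upper)]
print(f"IQR={iqr:.2f}, fences=({lower:.3f}, {upper:.3f})")
print("Outliers:", outliers.tolist())
```

The same steps applied to `df['Close']` would list exactly which months appear as outlier points in the chart.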

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact:
Knowing the typical range of closing prices helps risk analysts set realistic expectations for normal stock behavior.
Outliers indicate possible profitable volatility events which might be capitalized on using short-term trading strategies.

Negative Growth Insight:
The presence of frequent extreme outliers and a low median shows that the stock may have spiked unsustainably and returned to lower levels.
This signals instability and possible speculative trading patterns — not ideal for long-term investors.

#### Chart - 4

In [None]:
# Chart - 4 visualization code
plt.figure(figsize=(12, 6))

plt.plot(df['Date'], df['Open'], label='Open Price', color='blue', linewidth=2)
plt.plot(df['Date'], df['Close'], label='Close Price', color='green', linewidth=2)

plt.title('Open vs Close Prices Over Time', fontsize=14)
plt.xlabel('Date')
plt.ylabel('Price')
plt.legend()
plt.grid(True)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

We chose a dual-line plot to observe the relationship between:

How the stock starts the month (Open)

How it ends the month (Close)

It highlights:

Intramonth volatility

Whether the price gains or loses through the month

##### 2. What is/are the insight(s) found from the chart?

In most months, Closing price is slightly lower or close to the Opening price, reflecting minor changes.

Some months show large gaps between Open and Close, indicating strong movements — either gains or drops.

Over time, both Open and Close prices follow the same directional trend.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact:
Understanding the typical price movement from open to close helps in:

Creating short-term trading strategies

Assessing market momentum and investor sentiment shifts

These patterns can also inform monthly trading signals, e.g., bullish/bearish patterns.

Negative Growth Insight:
The narrow gap in most months suggests lack of momentum or excitement, which might deter short-term traders and result in low trading volumes.
Some large negative gaps also imply intramonth losses, which could signal risky holding behavior.

#### Chart - 5

In [None]:
# Chart - 5 visualization code
plt.figure(figsize=(6, 4))
sns.countplot(x='Gain/Loss', data=df, palette='pastel')
plt.title('Monthly Gain vs Loss Count')
plt.xlabel('Month Outcome')
plt.ylabel('Count')
plt.grid(axis='y')
plt.show()


##### 1. Why did you pick the specific chart?

It gives a simple but effective view of:

Stock performance consistency

Dominant behavior (gains or losses)

##### 2. What is/are the insight(s) found from the chart?

The number of loss months exceeds gain months, indicating the stock often closes lower than it opens.

This pattern is consistent, not just a few outlier months.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact:
Investors can identify whether the stock has bullish or bearish tendencies, which is critical for short-term strategies like swing or momentum trading.
It also helps in risk modeling.

Negative Growth Insight:
A higher count of loss months reflects that the stock tends to lose value within the month, potentially signaling lack of confidence or consistent negative sentiment.
This can turn off institutional investors or long-term holders seeking reliable growth.

#### Chart - 6

In [None]:
# Chart - 6 visualization code
plt.figure(figsize=(10, 5))
yearly_avg = df.groupby('Year')['Close'].mean().reset_index()

sns.barplot(data=yearly_avg, x='Year', y='Close', palette='coolwarm')
plt.title('Average Closing Price Per Year')
plt.ylabel('Average Closing Price')
plt.xlabel('Year')
plt.grid(axis='y')
plt.show()


##### 1. Why did you pick the specific chart?

We chose a bar plot of yearly averages to:

Smooth out monthly fluctuations

Identify long-term trends

Evaluate how stock value evolved year by year

This is helpful for understanding the overall growth or decline trajectory of the stock.

##### 2. What is/are the insight(s) found from the chart?

There are clear fluctuations in average yearly closing prices.

The stock had peak years, followed by a steady decline in more recent years.

The recent years show much lower average prices, suggesting long-term devaluation.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact:
It helps investors understand which years were strong or weak, and plan entries/exits around macroeconomic cycles or internal performance cycles.
Long-term investors can better align with recovery phases or bottom-hunting strategies.

Negative Growth Insight:
The declining average trend in recent years could suggest chronic performance issues or loss of market trust, affecting investor confidence and reducing demand.

#### Chart - 7

In [None]:
# Chart - 7 visualization code
plt.figure(figsize=(10, 6))
numerical_cols = ['Open', 'High', 'Low', 'Close', 'Range', 'Change']
corr_matrix = df[numerical_cols].corr()

sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', linewidths=0.5)
plt.title('Correlation Heatmap of Numerical Features')
plt.show()


##### 1. Why did you pick the specific chart?

A correlation heatmap is a multivariate visualization that helps:

Understand linear relationships between numerical variables.

Detect collinearity for model simplification.

Choose features wisely for regression/classification models.

##### 2. What is/are the insight(s) found from the chart?

Open, High, Low, and Close are strongly positively correlated, which is expected in stock price behavior.

Change is moderately correlated with Close, Open, and Range.

No negative correlations are significantly strong — most variables move together.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact:
This helps data scientists remove redundant variables, reduce model complexity, and avoid multicollinearity — all of which improve model accuracy and interpretability.

Negative Growth Insight:
While not directly related to business decline, a very high correlation between multiple price variables might indicate the stock doesn’t exhibit enough independent variability, which could make advanced analysis or prediction more redundant and less informative.
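
One common way to act on a correlation matrix is to flag one feature from every highly correlated pair. The sketch below is an illustration on synthetic data, with a cut-off of 0.95 chosen purely for the example; it is not a step performed in this notebook.

```python
import numpy as np
import pandas as pd

# Synthetic price-like features; NOT the Yes Bank data
rng = np.random.default_rng(42)
base = rng.normal(100, 20, size=200)
features = pd.DataFrame({
    "Open":  base,
    "High":  base + rng.uniform(1, 5, size=200),
    "Low":   base - rng.uniform(1, 5, size=200),
    "Noise": rng.normal(0, 1, size=200),   # unrelated column
})

corr = features.corr().abs()
# Keep only the upper triangle so each pair is checked once
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

threshold = 0.95   # illustrative cut-off, an assumption for this example
to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
print("Candidates to drop:", to_drop)
```

Which member of a correlated pair to keep is a modelling decision; this approach simply keeps the column that appears first.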

#### Chart - 8

In [None]:
# Chart - 8 visualization code
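gain_loss_year = pd.crosstab(df['Year'], df['Gain/Loss'])

# crosstab orders columns alphabetically ('Gain', 'Loss'), so green maps to Gain and red to Loss.
# DataFrame.plot creates its own figure, so a separate plt.figure() call would leave an empty one.
gain_loss_year.plot(kind='bar', stacked=True, color=['mediumseagreen', 'lightcoral'], figsize=(12, 6))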
plt.title('Year-wise Gain vs Loss Count')
plt.xlabel('Year')
plt.ylabel('Number of Months')
plt.legend(title='Month Outcome')
plt.grid(axis='y')
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

A stacked bar chart is ideal when comparing multiple categories over time.
Here, it shows:

How many months in each year ended in gains vs losses

Whether performance is improving, declining, or balanced

##### 2. What is/are the insight(s) found from the chart?

In most years, losses outweigh gains, clearly indicating a bearish bias in stock performance.

Only a few years show balanced or gain-dominant months.

Some years have no gain months at all, reflecting severe downtrends.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact:
This chart is a year-level health check for the stock. Investors and portfolio managers can use this to time their entries during more balanced/gain-dominant periods.
It helps in comparing macroeconomic or policy effects across years.

Negative Growth Insight:
Consistent loss-heavy years signal poor fundamentals or market perception.
This can reduce investor interest, especially for long-term holding strategies.

#### Chart - 9

In [None]:
# Chart - 9 visualization code
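plt.figure(figsize=(12, 6))
# Show months in calendar order rather than in order of first appearance in the data
month_order = ['January', 'February', 'March', 'April', 'May', 'June',
               'July', 'August', 'September', 'October', 'November', 'December']
sns.boxplot(x='Month', y='Range', data=df, order=month_order, palette='Set3')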
plt.title('Monthly Volatility (Price Range) Distribution')
plt.ylabel('Price Range (High - Low)')
plt.xlabel('Month')
plt.xticks(rotation=45)
plt.grid(True)
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

Boxplots help in comparing distributions across categories.
Here, it helps analyze how volatile each calendar month is, using the price range as a proxy for volatility.

This can uncover if certain months are historically more unstable — useful for seasonal trading strategies.

##### 2. What is/are the insight(s) found from the chart?

Some months like March and October have a wider spread and more outliers, showing they are typically more volatile.

Months like June and July are comparatively stable with narrow ranges.

There is no consistent upward or downward trend in volatility across months.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact:
Helps risk managers and traders identify high-volatility months and adjust positions accordingly.
Also useful in planning hedging strategies during turbulent periods.

Negative Growth Insight:
If volatility increases in otherwise stable months, it may reflect underlying instability or poor investor confidence, especially if not accompanied by positive price movement — a red flag for long-term investors.

#### Chart - 10

In [None]:
# Chart - 10 visualization code
# Calculate month-over-month returns from Close (each row is one month, not one day)
df['Monthly_Return'] = df['Close'].pct_change()

# Calculate cumulative return (growth of 1 unit invested in the first month)
df['Cumulative_Return'] = (1 + df['Monthly_Return']).cumprod()

# Plotting
plt.figure(figsize=(12, 6))
plt.plot(df['Date'], df['Cumulative_Return'], color='purple', linewidth=2)
plt.title('Cumulative Return of Yes Bank Stock Over Time')
plt.xlabel('Date')
plt.ylabel('Cumulative Return')
plt.grid(True)
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

A cumulative return plot shows what would happen if you invested once and held on over time.
It is vital for understanding investment performance and long-term value growth.

##### 2. What is/are the insight(s) found from the chart?

The return initially grows, but over time, it shows dramatic declines, especially around specific market shocks.

Recent values are far below the starting point, indicating negative long-term returns.

This stock would have eroded capital if held without active trading.
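
The cumulative return is a compounded product of period returns, so it reduces to the ratio of each closing price to the first one. The sketch below checks that identity on hypothetical prices, not the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly closing prices
close = pd.Series([100.0, 110.0, 99.0, 120.0, 60.0])

returns = close.pct_change()             # first value is NaN (no previous month)
cumulative = (1 + returns).cumprod()     # stays NaN for the first month

# Compounded returns equal the ratio of each price to the first price
ratio = close / close.iloc[0]
assert np.allclose(cumulative.iloc[1:], ratio.iloc[1:])

total_return = ratio.iloc[-1] - 1        # buy-and-hold return over the whole period
print(f"Buy-and-hold return: {total_return:.0%}")
```

A final cumulative value below 1 therefore means the last close is below the first close, which is what "capital erosion" refers to here.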

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact:
This chart can help investors understand the risk of passive holding and shift to active strategies or stop-loss policies.
It supports better portfolio management and alert systems.

Negative Growth Insight:
The steep fall in cumulative return confirms that holding this stock long-term without timing or risk strategy could lead to heavy losses.
This discourages retail and institutional interest and could affect market valuation further.

#### Chart - 11

In [None]:
# Chart - 11 visualization code
plt.figure(figsize=(12, 6))

monthly_avg_range = df.groupby('Month')['Range'].mean().reindex([
    'January', 'February', 'March', 'April', 'May', 'June',
    'July', 'August', 'September', 'October', 'November', 'December'
])

sns.barplot(x=monthly_avg_range.index, y=monthly_avg_range.values, palette='magma')
plt.title('Average Monthly Price Range (Volatility)')
plt.xlabel('Month')
plt.ylabel('Average Price Range (High - Low)')
plt.xticks(rotation=45)
plt.grid(axis='y')
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

We use bar plot to compare volatility (price movement) across months, helping us uncover patterns of high or low market activity, which can be vital for:

Trading strategies

Hedging decisions

Understanding seasonal volatility

##### 2. What is/are the insight(s) found from the chart?

March and October show relatively higher average price ranges, indicating higher volatility.

Months like June and July are calmer, suggesting a relatively stable trading environment.

There’s a subtle seasonal trend that can be exploited for planning.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact:
Investors and traders can optimize trade entries during high-volatility months for potential gains and use risk control strategies accordingly.
Portfolio managers can rebalance based on volatility patterns.

Negative Growth Insight:
Higher volatility, especially when it is unpredictable as in March or October, might suggest macroeconomic risks or unstable investor sentiment, which may erode market trust.

#### Chart - 12

In [None]:
# Chart - 12 visualization code
price_cols = ['Open', 'High', 'Low', 'Close', 'Range', 'Change']

sns.pairplot(df[price_cols], diag_kind='kde', corner=True, plot_kws={'alpha': 0.6})
plt.suptitle('Pairplot of Price Features', fontsize=16, y=1.02)
plt.show()


##### 1. Why did you pick the specific chart?

A pairplot is ideal for exploring multivariate relationships and distributions:

It shows scatter plots between each pair of features.

Diagonal plots show distributions (via KDE).

It visually reveals correlations, clusters, and potential outliers.

This helps in feature selection and pre-modeling analysis.

##### 2. What is/are the insight(s) found from the chart?

Strong linear relationships among Open, High, Low, and Close — confirming their interdependence.

Range and Change show wider spread and less linear alignment with other features.

A few outliers are visible in Range and Change, which may indicate extreme volatility events.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact:
Helps analysts and ML engineers select or remove features based on redundancy or irrelevance.
Enables better understanding of price relationships before modeling.

Negative Growth Insight:
Lack of unique independent behavior in some variables (e.g., Close is almost linearly dependent on Open/High/Low) could limit predictive model complexity.
Outliers in Change may point to unexpected price shocks that can erode investor confidence.

#### Chart - 13

In [None]:
# Chart - 13 visualization code
# Calculate moving averages
# Each row is one month, so these windows span 30 and 90 months (not days)
df['MA_30'] = df['Close'].rolling(window=30).mean()
df['MA_90'] = df['Close'].rolling(window=90).mean()

# Plotting
plt.figure(figsize=(14, 6))
plt.plot(df['Date'], df['Close'], label='Close Price', alpha=0.5, color='gray')
plt.plot(df['Date'], df['MA_30'], label='30-Month MA', color='blue', linewidth=2)
plt.plot(df['Date'], df['MA_90'], label='90-Month MA', color='orange', linewidth=2)

plt.title('Moving Averages of Closing Price')
plt.xlabel('Date')
plt.ylabel('Price')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

Moving averages are used to:

Smooth out short-term volatility

Identify trends (bullish/bearish crossovers)

Act as dynamic support/resistance levels in trading

This chart shows long- and short-term price momentum in a single view.

##### 2. What is/are the insight(s) found from the chart?

The 30-month MA responds faster to price changes, while the 90-month MA gives a broader trend view.

Multiple crossovers between MA lines reflect potential trend reversals.

In many parts, the stock stays below both MAs, indicating a bearish zone.
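
Crossovers can also be located programmatically instead of being read off the chart. The sketch below is a minimal illustration on a hypothetical price series with short windows (2 and 4 periods, chosen only to keep the example small); it marks the periods where the faster average moves above or below the slower one.

```python
import numpy as np
import pandas as pd

# Hypothetical price series: rises, falls, then recovers
close = pd.Series([10, 11, 12, 13, 14, 15, 14, 12, 10, 8, 6, 5, 7, 10, 13], dtype=float)

fast = close.rolling(window=2).mean()   # short window (illustrative)
slow = close.rolling(window=4).mean()   # long window (illustrative)

# +1 where the fast MA is above the slow MA, -1 where below, NaN until both exist
position = np.sign(fast - slow)
# A crossover is a change of sign between consecutive periods
change = position.diff()
bullish = change[change > 0].index.tolist()   # fast crosses above slow
bearish = change[change < 0].index.tolist()   # fast crosses below slow
print("Bullish crossovers at:", bullish, "| bearish at:", bearish)
```

The same logic could be applied to the `MA_30` and `MA_90` columns; note that the longer average only exists once 90 rows are available.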

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact:
Moving averages help traders and analysts spot market trends early, reduce noise, and plan trades with better timing.
Institutions can use it for algorithmic trading logic.

Negative Growth Insight:
When the stock price consistently remains below its long-term average, it reflects market pessimism.
Prolonged bearish trends can affect investor confidence and reduce participation.

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
plt.figure(figsize=(10, 6))
numerical_cols = ['Open', 'High', 'Low', 'Close', 'Range', 'Change']

corr_matrix = df[numerical_cols].corr()

sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', linewidths=0.5, fmt=".2f")
plt.title('Correlation Heatmap of Price Features')
plt.xticks(rotation=45)
plt.yticks(rotation=0)
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

The heatmap visualizes pairwise correlations between numerical features:

Detects redundant (highly correlated) columns.

Identifies relationships that may impact feature selection in modeling.

##### 2. What is/are the insight(s) found from the chart?

Open, High, Low, and Close are strongly positively correlated — expected due to market mechanics.

Change and Range show moderate correlation with price columns, reflecting volatility and monthly performance.

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
# Select relevant price columns
price_cols = ['Open', 'High', 'Low', 'Close', 'Range', 'Change']

# Plot pairplot
sns.pairplot(df[price_cols], diag_kind='kde', corner=True, plot_kws={'alpha': 0.6})
plt.suptitle('Pair Plot of Price Features', fontsize=16, y=1.02)
plt.show()


##### 1. Why did you pick the specific chart?

A pairplot is a powerful way to:

Examine relationships between multiple variables at once

Identify correlation patterns, outliers, and distributions

Spot clustering or patterns visually before applying ML models

##### 2. What is/are the insight(s) found from the chart?

Strong linear trends among Open, High, Low, and Close, confirming interdependence.

Range and Change have scattered relationships, suggesting unique contribution or influence by other factors (e.g., volatility, news).

Outliers are visible especially in Range, which could indicate months with market shocks.

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

1. Focus on Volatility-Aware Trading
Certain months (e.g., March, October) are historically more volatile.

The client can optimize trade size or stop-loss strategies accordingly.

Use dynamic asset allocation during high-volatility periods.

2. Avoid Passive Holding – Focus on Trend-Driven Entry/Exit
Cumulative return and moving average plots indicate poor long-term returns.

Investors should adopt trend-following strategies using moving averages (such as the 30- and 90-month averages from Chart 13) or price momentum indicators.

3. Leverage Machine Learning for Forecasting
Use the cleaned dataset to build supervised (regression/classification) models to predict gain/loss months or price movement.

Deploy unsupervised learning (clustering) to segment months based on price behavior.

4. Monitor and Act on Technical Indicators
Use Range and Change as key features in alert systems or trading signals.

When Range spikes unexpectedly, combine it with volume/news to detect price breakout opportunities.

5. Enhance Risk Management Using Seasonality
Average price and volatility fluctuate by month and year.

Adjust exposure and capital allocation dynamically based on seasonal trends found in the data.

6. Data Gaps and Future Improvement
The absence of trading volume limits liquidity analysis — recommend sourcing external volume data.

Integrate macroeconomic indicators (e.g., RBI rate changes, market-wide sentiment) for better ML modeling accuracy.
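
For the forecasting recommendation in point 3, the data would need features that use only past information and a split that respects time order. The sketch below is one possible preparation on synthetic data; the column names `Prev_Change`, `Prev_Close`, and `Target` are hypothetical, and no model from this notebook is implied.

```python
import numpy as np
import pandas as pd

# Synthetic monthly data standing in for the real dataset
rng = np.random.default_rng(7)
n = 120
open_ = 100 + rng.normal(0, 5, size=n).cumsum()
close = open_ + rng.normal(0, 4, size=n)
data = pd.DataFrame({"Open": open_, "Close": close})
data["Change"] = data["Close"] - data["Open"]
data["Target"] = (data["Change"] > 0).astype(int)   # 1 = gain month

# Use only information from the previous month as features (avoids look-ahead)
data["Prev_Change"] = data["Change"].shift(1)
data["Prev_Close"] = data["Close"].shift(1)
data = data.dropna().reset_index(drop=True)

# Chronological split: the last 20% of months form the test set, no shuffling
split = int(len(data) * 0.8)
train, test = data.iloc[:split], data.iloc[split:]

# Majority-class baseline that any fitted classifier should beat
majority = int(train["Target"].mean() >= 0.5)
baseline_accuracy = (test["Target"] == majority).mean()
print(f"train={len(train)}, test={len(test)}, baseline accuracy={baseline_accuracy:.2f}")
```

A classifier such as the `RandomForestClassifier` imported at the top of the notebook could then be fitted on the lagged columns of `train` and compared against this baseline on `test`.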

# **Conclusion**

The client should focus on data-driven active investment, utilize technical + ML tools, and adapt strategies dynamically based on insights like volatility, trend reversals, and cyclical patterns. Avoid buy-and-hold strategies due to long-term underperformance.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***