# **Project Name**    - Yes Bank Stock Closing Price Prediction



##### **Project Type**    - ML Project Regression



##### **Contribution**    - Individual


# **Project Summary -**

This project aims to predict the monthly closing prices of Yes Bank's stock. In the context of significant events involving the bank, we'll leverage time series models and data analysis to explore how these events impact stock prices. We'll use Python libraries like Pandas, Matplotlib, NumPy, and Scikit-Learn for data manipulation, visualization, and model building. The goal is to gain insights into stock price behavior and create predictive models for informed decision-making in financial markets.

# **GitHub Link -**

# **Problem Statement**


Our task is to develop a predictive model that can forecast the monthly closing prices of Yes Bank's stock. In the context of significant financial events, we aim to understand and quantify the impact of these events on stock prices. Leveraging historical data and machine learning techniques, we will provide a tool for stakeholders to make data-driven decisions in the dynamic world of financial markets.

#### **Define Your Business Objective?**

The primary goal of this project is to create a robust predictive model that can accurately forecast the monthly closing prices of Yes Bank's stock. By achieving this, we aim to provide valuable insights for investors, traders, and financial analysts, enabling them to make informed decisions in a dynamic and potentially volatile financial landscape. This predictive tool will help stakeholders anticipate market trends and navigate the impact of significant financial events, ultimately optimizing their investment strategies.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that each chart follows the format below.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Install contractions before importing it
!pip install contractions

# Import Libraries
# Standard library
import re
import string

# Data handling, statistics, and plotting
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from scipy.stats import chi2_contingency

# Text processing
import nltk
import spacy
import contractions
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.tokenize import word_tokenize, sent_tokenize

# scikit-learn: preprocessing, feature selection, models, and metrics
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.impute import SimpleImputer
from sklearn.metrics import silhouette_score, davies_bouldin_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder, MinMaxScaler

# Google Drive access
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive

# NLTK resources used by the tokenizer, lemmatizer, and stop-word list
nltk.download('punkt')
nltk.download('wordnet')
nltk.download('stopwords')

### Dataset Loading

In [None]:
# Load Dataset
from google.colab import drive
drive.mount('/content/drive')
csv_file_path = '/content/drive/MyDrive/Project/Yes bank stock closing price prediction/data_YesBank_StockPrices.csv'

dataset = pd.read_csv(csv_file_path)

### Dataset First View

In [None]:
# Dataset First Look
dataset.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
dataset.shape

### Dataset Information

In [None]:
# Dataset Info
dataset.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
len(dataset[dataset.duplicated()])

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
print(dataset.isnull().sum())

In [None]:
# Visualizing the missing values
# Checking Null Value by plotting Heatmap
sns.heatmap(dataset.isnull(), cbar=False)

### What did you know about your dataset?

I have gained valuable insights into our dataset through exploratory analysis. The dataset consists of historical monthly stock price data for Yes Bank, covering a total of 185 data points. It comprises five columns, including 'Date,' 'Open,' 'High,' 'Low,' and 'Close.' The 'Date' column is of object data type, while the other four columns are represented as float64, indicating numerical data with decimal points. Importantly, I have confirmed that there are no missing values in the dataset, as evidenced by both the summary statistics and a heatmap visualization. Furthermore, I found that there are no duplicate rows, signifying a clean and unique dataset. This foundational understanding of the dataset has laid the groundwork for subsequent data analysis and predictive modeling.
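The checks described above can be gathered into one small helper. This is only a sketch: `summarize_quality` is a hypothetical name, and the four rows below are made-up values that mimic the column layout of the real CSV, not actual Yes Bank prices.

```python
import pandas as pd

# Made-up rows (assumption): same column names and 'Mon-YY' date strings as the real file.
sample = pd.DataFrame({
    "Date": ["Jul-05", "Aug-05", "Sep-05", "Oct-05"],
    "Open": [10.0, 11.0, 12.0, 11.5],
    "High": [11.5, 12.5, 12.8, 13.0],
    "Low": [9.5, 10.5, 11.0, 11.0],
    "Close": [11.0, 12.0, 11.5, 12.5],
})


def summarize_quality(df: pd.DataFrame) -> dict:
    """Collect the basic facts reported in the overview: size, nulls, duplicates, dtypes."""
    return {
        "rows": df.shape[0],
        "columns": df.shape[1],
        "missing_values": int(df.isnull().sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
        "dtypes": df.dtypes.astype(str).to_dict(),
    }


report = summarize_quality(sample)
print(report)
```

Calling `summarize_quality(dataset)` on the loaded data should reproduce the counts quoted in the paragraph above, assuming the file matches that description.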

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
dataset.columns

In [None]:
# Dataset Describe
dataset.describe(include='all')

### Variables Description

Date:

* Count: There are 185 records in this column, each representing a unique date.
* Unique: All dates are distinct, indicating that there are no duplicate entries.
* Top: `describe` reports 'Jul-05' as the top value, but because every date occurs exactly once, this choice is arbitrary rather than a true mode.
* Frequency: Every date, including 'Jul-05', appears exactly once. Since 'Date' represents the time component in our time series data, statistics like mean and quartiles are not applicable.

Open:

* Count: There are 185 records in this column.
* Mean: The average opening price is approximately 105.54.
* Standard Deviation: Opening prices vary widely, with a standard deviation of about 98.88, nearly as large as the mean.
* Min: The lowest opening price recorded is 10.00.
* Max: The highest opening price observed is 369.95.
* Quartiles: The 25th, 50th, and 75th percentiles reported by `describe` show how opening prices are distributed between the minimum and the maximum.

High:

The same statistics are reported for the 'High' column: count, mean, standard deviation, minimum, maximum, and quartiles.

Low:

The same statistics are reported for the 'Low' column, describing the range and variability of monthly low prices.

Close:

The same statistics are reported for the 'Close' column, which is the target variable of this project.

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
unique_values = {}
for column in dataset.columns:
    unique_values[column] = dataset[column].unique()

# Print unique values
for column, values in unique_values.items():
    print(f"Unique values for '{column}':")
    print(values)
    print("\n")

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
dataset['Date'] = pd.to_datetime(dataset['Date'], format='%b-%y')
# Check for and handle duplicates (if any)
if len(dataset[dataset.duplicated()]) > 0:
    dataset.drop_duplicates(inplace=True)

### What all manipulations have you done and insights you found?

Data Type Conversion: I converted the 'Date' column from strings such as 'Jul-05' to a datetime data type using the format '%b-%y'. This conversion allows for time series analysis and visualization.

Duplicates: I checked for duplicate records and would have dropped any that existed. None were found, so each month appears exactly once.

Missing Values: There were no missing values in the dataset, so no imputation was needed.

Outliers: No outlier treatment was applied at this stage. Potential outliers, which may represent significant price fluctuations, are examined through the visualizations in the next section.

Feature Engineering: No new features were created in this step. Candidates for later steps include rolling averages, month-of-year indicators, and price differentials.

Insights: The 'Open,' 'High,' 'Low,' and 'Close' prices vary widely over time, indicating the volatility of Yes Bank's stock prices. With no missing values or duplicates, the dataset is clean and ready for analysis.
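As a rough illustration of the wrangling steps and the candidate features named above, the sketch below parses 'Mon-YY' dates and derives a month indicator, a high-low differential, and a three-month rolling average. The function name `prepare`, the new column names, the window length, and the sample values are all assumptions made for the example.

```python
import pandas as pd

# Made-up rows (assumption) in the same layout as the Yes Bank CSV.
sample = pd.DataFrame({
    "Date": ["Jul-05", "Aug-05", "Sep-05", "Oct-05", "Nov-05", "Dec-05"],
    "Open": [10.0, 11.0, 12.0, 11.5, 12.5, 13.0],
    "High": [11.5, 12.5, 12.8, 13.0, 13.5, 14.0],
    "Low": [9.5, 10.5, 11.0, 11.0, 12.0, 12.5],
    "Close": [11.0, 12.0, 11.5, 12.5, 13.0, 13.5],
})


def prepare(df: pd.DataFrame) -> pd.DataFrame:
    """Parse dates, drop duplicate rows, and add a few illustrative features."""
    out = df.copy()
    # '%b-%y' matches strings such as 'Jul-05' (abbreviated month, two-digit year).
    out["Date"] = pd.to_datetime(out["Date"], format="%b-%y")
    out = out.drop_duplicates().sort_values("Date").reset_index(drop=True)
    out["Month"] = out["Date"].dt.month              # month-of-year indicator
    out["Range"] = out["High"] - out["Low"]          # monthly price differential
    out["Close_MA3"] = out["Close"].rolling(window=3).mean()  # 3-month rolling average
    return out


prepared = prepare(sample)
print(prepared[["Date", "Month", "Range", "Close_MA3"]])
```

The first two values of `Close_MA3` are NaN because a three-month window is not yet full. Rows like these would need to be dropped or filled before model training.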

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
# Create a line plot for the closing price over time
plt.figure(figsize=(12, 6))
plt.plot(dataset['Date'], dataset['Close'], label='Closing Price', color='b')
plt.title('Closing Stock Price Over Time')
plt.xlabel('Date')
plt.ylabel('Closing Price')
plt.xticks(rotation=45)
plt.grid(True)
plt.legend()
plt.tight_layout()

# Show the plot
plt.show()

##### 1. Why did you pick the specific chart?

Time-Series Data Representation: A line chart is a suitable choice when working with time-series data, as it allows us to display how a numeric variable (closing stock price) changes over a continuous time period (months/years). This chart type effectively conveys the sequential nature of the data.

Trend Identification: A line chart is excellent for identifying trends, patterns, and fluctuations over time. In this case, it helps us assess the long-term movement in Yes Bank's stock price.

Comparative Analysis: With a line chart, we can compare the closing prices at different points in time, making it easier to understand how stock prices have evolved and whether there are any recurring patterns.

Readability: Line charts are easy to interpret, making them suitable for communicating insights to a wide audience. They provide a clear visual representation of the data.

##### 2. What is/are the insight(s) found from the chart?

Long-Term Trend: The chart reveals a clear long-term trend in the closing stock price. The stock price has shown both upward and downward movements over the years, indicating the overall volatility and the influence of various market factors on Yes Bank's stock.

Periodic Fluctuations: The chart suggests that there have been periodic fluctuations in the stock price. These fluctuations may be attributed to economic conditions, financial events, and other external factors that impact the stock market.

Historical Highs and Lows: By examining the peaks and troughs in the line chart, it is possible to identify historical highs and lows in the stock price. These points may correspond to significant events or milestones in the company's history.

Breakout Points: The chart can help identify breakout points, where the stock price significantly deviates from its previous trend. These breakout points may warrant further investigation to understand the underlying causes.

Key Inflection Points: Notable changes in the direction of the line, whether upward or downward, can be observed. These inflection points can be valuable for decision-making and investment strategies.

Seasonal Patterns: Depending on the data and the time frame, seasonal patterns may become apparent. These patterns can be useful for predicting future trends.

Volatility: The presence of sharp peaks and valleys in the chart indicates periods of high volatility. Understanding these volatile periods can be critical for risk management and investment decisions.
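The volatility observation can be made measurable rather than read off the line by eye. One common approach, sketched here on made-up closing prices, is to compute month-over-month returns and their rolling standard deviation. The three-month window and the variable names are assumptions for illustration.

```python
import pandas as pd

# Made-up monthly closing prices (assumption), indexed by month start.
close = pd.Series(
    [100.0, 110.0, 99.0, 104.0, 80.0, 88.0],
    index=pd.date_range("2010-01-01", periods=6, freq="MS"),
    name="Close",
)

# Month-over-month percentage change; the first value is NaN by construction.
monthly_return = close.pct_change()

# Rolling standard deviation of returns as a simple volatility measure.
rolling_volatility = monthly_return.rolling(window=3).std()

# Month with the steepest single-month decline.
worst_month = monthly_return.idxmin()

print(monthly_return.round(4))
print(rolling_volatility.round(4))
print("Worst month:", worst_month.strftime("%b-%y"))
```

Plotting `rolling_volatility` alongside the closing price would show whether the sharp peaks and valleys coincide with sustained high-volatility periods.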

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Business Impact:

Investment Decisions: The insights derived from the chart can aid in making informed investment decisions. By identifying historical lows and breakout points, investors and stakeholders can strategize to buy or sell stock at favorable times, potentially increasing profits.

Risk Management: Understanding the historical volatility can help in crafting risk management strategies. By recognizing the periods of high volatility, the company can take proactive measures to minimize risk and protect its investments.

Long-Term Planning: The chart reveals long-term trends. Positive upward trends can instill confidence in investors and potentially attract more investment, leading to capital growth.

Market Sentiment Analysis: The data can be used to gauge market sentiment. Understanding how external factors impact stock price can help the bank anticipate and respond to market shifts effectively.

Negative Growth Impact:

Economic Downturns: If the chart reveals a consistent downward trend or periods of negative growth in the stock price, this can be a cause for concern. It may indicate underlying issues with the bank's financial health, which could negatively impact investor confidence.

Volatility: While volatility can present opportunities for profit, it also poses risks. High volatility can result in sudden and severe price declines, leading to potential financial losses for the bank and its investors.

Negative Sentiment: Prolonged negative trends may contribute to negative sentiment in the market. This can lead to a loss of trust in the bank, making it more challenging to attract investors and maintain a positive image.

#### Chart - 2

In [None]:
# Chart - 2 visualization code
# Extract the year from the 'Date' column (already datetime after data wrangling)
dataset['Year'] = dataset['Date'].dt.year

# Calculate the average closing stock price for each year
average_closing_prices = dataset.groupby('Year')['Close'].mean()

# Create a bar chart
plt.figure(figsize=(12, 6))
plt.bar(average_closing_prices.index, average_closing_prices.values, color='skyblue')
plt.xlabel('Year')
plt.ylabel('Average Closing Stock Price')
plt.title('Average Closing Stock Price of Yes Bank by Year')
plt.xticks(average_closing_prices.index, rotation=45)
plt.tight_layout()

# Show the chart
plt.show()

##### 1. Why did you pick the specific chart?

Yearly Comparison: A bar chart is an effective choice when you want to compare a specific metric, in this case, the average closing stock price, across different years. Each bar represents a single year, making it easy to discern yearly trends and variations.

Clear Data Presentation: Bar charts are straightforward and easy to interpret, making them suitable for presenting summary statistics like averages. They provide a clear and concise way to communicate the data.

Categorical Data: Bar charts are well-suited for visualizing categorical data, in this case, the years. The categorical nature of the x-axis (years) makes it a suitable choice.

Readability: Bar charts are known for their readability, especially when there are multiple categories (years) to display. Each bar is distinct, and the chart is easy to understand at a glance.

##### 2. What is/are the insight(s) found from the chart?

Yearly Trends: The chart shows distinct annual trends in the average closing stock price. For instance, it reveals that the average closing stock price increased from 2005 to 2007, experienced a significant drop in 2008, and then fluctuated in subsequent years.

Volatility and Stability: Large year-over-year changes in the average price point to volatile periods, while similar averages in consecutive years point to stability. For instance, 2017 and 2018 saw higher volatility compared to 2015 and 2016. Because each bar is a yearly mean, swings within a year are hidden, so this is a coarse view of volatility.

Financial Performance: The chart helps in assessing the overall financial performance of Yes Bank over the years. Significant drops in average closing stock price may correspond to challenging periods for the bank, while upward trends can suggest growth and investor confidence.

Turning Points: The chart highlights potential turning points or critical years in the bank's stock performance. For example, a sharp increase in average stock price in 2016 followed by a decline in 2018 may indicate a significant event or change in market sentiment.

Investment Insights: Investors and stakeholders can use this chart to identify years with favorable average stock prices for potential investment or divestment decisions. Years with lower average prices may present opportunities to buy, while years with higher averages may be suitable for selling.

Long-Term Analysis: The chart supports long-term analysis of Yes Bank's stock performance, allowing stakeholders to evaluate its financial health and growth prospects over the years.
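The yearly comparison can be backed by numbers. The sketch below, on made-up values, computes the yearly mean that the bar chart shows, adds the within-year standard deviation that a mean hides, and derives the year-over-year change. The column name `yoy_change_pct` is an assumption for the example.

```python
import pandas as pd

# Made-up closing prices (assumption): two observations per year for brevity.
sample = pd.DataFrame({
    "Date": pd.to_datetime([
        "2015-01-01", "2015-07-01",
        "2016-01-01", "2016-07-01",
        "2017-01-01", "2017-07-01",
    ]),
    "Close": [100.0, 120.0, 150.0, 170.0, 90.0, 70.0],
})

# Mean and spread of the closing price per calendar year.
yearly = (
    sample.groupby(sample["Date"].dt.year)["Close"]
    .agg(["mean", "std"])
    .rename_axis("Year")
)

# Percentage change of the yearly mean relative to the previous year.
yearly["yoy_change_pct"] = yearly["mean"].pct_change() * 100

print(yearly.round(2))
```

A large `std` within a year signals volatility that the bar height alone cannot show, while `yoy_change_pct` puts a figure on the turning points.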

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Business Impact:

Informed Decision-Making: The insights allow stakeholders, including investors and management, to make more informed decisions. For example, they can identify years with historically strong stock performance and potentially take advantage of investment opportunities during those periods.

Risk Mitigation: Understanding the historical volatility in stock prices helps in risk assessment. This insight enables the bank to develop risk mitigation strategies and helps investors make decisions in line with their risk tolerance.

Strategic Planning: The chart reveals periods of growth and decline, aiding in long-term strategic planning. Positive trends can be leveraged for expansion plans, while declines can prompt corrective actions and financial planning.

Negative Business Impact:

Economic Downturn: Years with declining average stock prices, such as 2018, may indicate economic or financial challenges. This insight could be a red flag for the bank and may require closer examination to mitigate negative impacts.

Investor Confidence: Consistently declining stock prices over multiple years may erode investor confidence, potentially leading to reduced capital inflow or stock devaluation.

Market Sentiment: Negative trends in the chart may reflect broader market sentiment. The bank needs to consider how these trends align with market conditions and assess their impact on business operations.

Regulatory Scrutiny: A significant drop in stock prices may attract regulatory scrutiny and could lead to investigations or interventions, potentially impacting the bank negatively.

#### Chart - 3

In [None]:
# Chart - 3 visualization code
# Data for the line chart
dates = dataset['Date']
closing_prices = dataset['Close']

# Create the line chart
plt.figure(figsize=(12, 6))
plt.plot(dates, closing_prices, marker='o', linestyle='-', color='b')
plt.title('Trend of Yes Bank Closing Stock Prices Over Time')
plt.xlabel('Date')
plt.ylabel('Closing Stock Price')
plt.xticks(dates[::12], rotation=45)  # Show every 12th date for better readability
plt.grid(True)

# Display the chart
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

I chose a line chart for visualizing the trend of Yes Bank's closing stock prices over time because it is an effective way to illustrate the historical performance of a single variable (closing stock price) across a continuous timeframe (dates). Here are the reasons for selecting this specific chart:

Time Series Representation: A line chart is ideal for displaying time series data, which is the case here. We want to understand how the closing stock price has changed over different months and years, making it essential to maintain the chronological order of the data.

Trend Identification: A line chart allows for easy identification of trends, whether they are upward, downward, or stable. This is crucial for recognizing patterns and making informed decisions related to investments or business strategies.

Data Point Connectivity: By connecting data points with lines, we can see the continuous flow of stock prices. This helps in visualizing the smoothness or fluctuations in the data.

Comparison: A line chart enables straightforward comparisons between different time periods. We can see how the stock price has evolved from one month to the next, making it useful for performance analysis.

Clarity: Line charts are generally easy to interpret and are a common choice for conveying trends in financial and time series data. They are familiar to a wide range of audiences.

##### 2. What is/are the insight(s) found from the chart?

Overall Performance: The chart shows that Yes Bank's closing stock prices have experienced significant fluctuations over the years, reflecting the dynamic nature of the stock market.

Upward and Downward Trends: Periods of upward trends, where the stock prices increased, are followed by downward trends, indicating volatility in the stock's performance. This suggests that investors and stakeholders should be prepared for market fluctuations.

Historical Peaks and Valleys: The chart highlights specific points where the stock price reached its highest (peaks) and lowest (valleys). These historical extremes can offer insights into potential resistance and support levels, helping traders make decisions.

Short-Term Fluctuations: Within the overall trend, there are noticeable short-term fluctuations, with stock prices rising and falling over the course of a few months. These short-term variations may provide trading opportunities for investors.

Long-Term Growth and Decline: Observing the chart over the entire timeframe, it's evident that Yes Bank's stock price experienced a prolonged growth phase followed by a decline. This long-term perspective is crucial for understanding the stock's historical performance.

Cyclic Behavior: The line chart indicates that the stock price exhibits cyclical patterns, with recurring peaks and troughs. Recognizing these cycles can help investors make informed decisions about entry and exit points.

Market Response: Events such as financial crises, corporate developments, or economic conditions are likely to be responsible for the more significant price movements in the chart. Analyzing these price shifts in relation to real-world events can provide insights into the factors impacting Yes Bank's stock.

Potential Trading Opportunities: Traders and investors can use this historical data to identify patterns and develop trading strategies, such as buying at support levels and selling at resistance levels.

Risk Assessment: Understanding the volatility and trends in the stock's price history is essential for assessing risk and making informed investment decisions.
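Peaks, valleys, and the size of a decline can be located programmatically. A minimal sketch on made-up prices follows: it finds the dates of the highest and lowest close and measures drawdown, the percentage fall from the running peak. All variable names and values are assumptions for illustration.

```python
import pandas as pd

# Made-up monthly closing prices (assumption).
close = pd.Series(
    [50.0, 80.0, 120.0, 90.0, 60.0, 75.0],
    index=pd.date_range("2017-01-01", periods=6, freq="MS"),
    name="Close",
)

# Dates of the historical high and low.
peak_date = close.idxmax()
trough_date = close.idxmin()

# Drawdown: how far each month's close sits below the highest close seen so far.
running_peak = close.cummax()
drawdown = (close - running_peak) / running_peak
max_drawdown = drawdown.min()

print("Peak:", peak_date.strftime("%b-%y"), close.max())
print("Trough:", trough_date.strftime("%b-%y"), close.min())
print("Maximum drawdown:", f"{max_drawdown:.0%}")
```

Maximum drawdown is one way to express the "prolonged growth phase followed by a decline" as a single risk figure.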

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Business Impact:

Strategic Decision-Making: Understanding the historical trends and patterns in the stock's performance allows Yes Bank to make informed strategic decisions. They can use this information to plan investments, manage their portfolio, and devise strategies to capitalize on upward trends.

Risk Management: Recognizing the cyclic behavior and historical support and resistance levels can aid in risk management. The bank can implement risk mitigation strategies to protect its investments during downward trends.

Investor Confidence: If Yes Bank can demonstrate a thorough understanding of its stock's historical behavior, it can instill confidence in existing and potential investors. This can positively impact stock performance and attract more investments.

Market Timing: Traders and investors can use the insights to time their market entry and exit points more effectively, potentially leading to higher returns.

Negative Business Impact:

Market Volatility: The chart reveals significant volatility in Yes Bank's stock prices. This can lead to uncertainty and potential negative consequences for investors and stakeholders. If not managed properly, it may deter long-term investments.

Economic Factors: Negative economic factors or external events that contributed to downward trends in the past could recur, affecting the stock's performance adversely. The bank needs to be prepared for such contingencies.

Short-Term Focus: While short-term trading opportunities are evident from the chart, a focus on short-term gains may lead to higher risk and market speculation, potentially detrimental to the bank's long-term stability.

Market Sentiment: The chart reflects the influence of market sentiment on the stock's performance. Negative sentiment can lead to further downward trends, impacting the bank's reputation and investor trust.

#### Chart - 4

In [None]:
# Chart - 4 visualization code
# Scatter plot for Open and Close prices
plt.figure(figsize=(12, 6))
plt.scatter(dataset['Open'], dataset['Close'], alpha=0.5)
plt.title('Scatter Plot of Open and Close Prices')
plt.xlabel('Open Price')
plt.ylabel('Close Price')
plt.grid(True)
plt.show()

##### 1. Why did you pick the specific chart?

1. Relationship Visualization: A scatter plot is an excellent choice when you want to visualize the relationship between two continuous variables, in this case, the "Open" and "Close" prices. It allows us to see how these variables are related to each other, providing insights into whether there is a correlation or pattern in their movement.

2. Price Comparison: By plotting the "Open" prices against the "Close" prices for each data point, we can observe how the stock's price changes from the open to the close of each month, since every row in the dataset is one month. Points above the diagonal are months that closed higher than they opened, and points below it are months that closed lower.

3. Identification of Outliers: Scatter plots can also highlight any outliers or unusual data points where the "Open" and "Close" prices significantly differ. Outliers could indicate events or factors affecting stock prices in specific months.

4. Potential Trading Strategies: Traders and investors can use this visualization to identify potential trading opportunities. For instance, they might look for instances where the "Close" price is consistently higher or lower than the "Open" price and use this information in their trading strategies.

5. Quantitative Analysis: Scatter plots provide a quantitative way to assess the relationship between these two variables. It allows us to evaluate the dispersion of data points and identify any clusters or trends.

##### 2. What is/are the insight(s) found from the chart?

1. Positive Correlation: The majority of data points cluster along a diagonal line from the lower left to the upper right, indicating a strong positive correlation between the "Open" and "Close" prices. In other words, when the stock opens at a higher price, it tends to close at a higher price, and vice versa.

2. Few Outliers: While there is a clear positive correlation, some outliers exist. These are data points where the "Open" and "Close" prices deviate significantly from the main cluster. Because the data is monthly, these outliers represent months with unusual price movements, possibly due to external factors or market events.

3. Trading Range: The scatter plot helps visualize the trading range. The data points at the upper end of the plot represent months when the stock opened and closed at higher price levels. Whether a month was profitable for someone who bought at the open and sold at the close depends on whether its point lies above the diagonal, not on how far up the plot it sits.

4. Consistency in Price Movement: The data points are relatively close to the diagonal line, suggesting that the stock price movement is consistent. This consistency may be useful for traders and investors to develop strategies based on this stability.

5. Lack of Negative Growth: The plot doesn't reveal any strong negative growth pattern, that is, no region where the "Open" price is consistently higher than the "Close" price. This is a positive insight for investors, as it suggests that the stock doesn't typically experience significant losses within a single month.

6. Potential for Technical Analysis: Traders can use this scatter plot as a basis for technical analysis, such as identifying support and resistance levels, determining entry and exit points, and setting stop-loss orders.
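The correlation and the open-versus-close claims above can be checked numerically instead of visually. The sketch below uses made-up values. It computes the Pearson correlation between "Open" and "Close" and counts the months in which the close was below the open. Variable names are assumptions for illustration.

```python
import pandas as pd

# Made-up monthly open and close prices (assumption).
sample = pd.DataFrame({
    "Open": [10.0, 12.0, 15.0, 14.0, 20.0],
    "Close": [12.0, 15.0, 14.0, 20.0, 18.0],
})

# Pearson correlation between the two price columns.
correlation = sample["Open"].corr(sample["Close"])

# Open-to-close change per month; negative values are months that closed lower.
monthly_change = sample["Close"] - sample["Open"]
down_months = int((monthly_change < 0).sum())
share_down = down_months / len(sample)

print(f"Correlation: {correlation:.3f}")
print(f"Months closing below the open: {down_months} of {len(sample)} ({share_down:.0%})")
```

Running the same three lines on `dataset` would show how strong the correlation really is and how often a month ended lower than it began.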

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Business Impact:

Trading Strategy Development: The strong positive correlation between the "Open" and "Close" prices suggests that Yes Bank's stock tends to follow a consistent pattern. Traders and investors can use this insight to develop effective trading strategies, such as trend-following or mean reversion strategies, to potentially generate profits.

Risk Management: Understanding the stock's trading range and the lack of strong negative growth patterns can help in devising risk management strategies. Investors can set stop-loss orders and risk tolerance levels based on the historical data, reducing potential losses.

Investor Confidence: The consistency in price movement is a positive sign for investors. It indicates that the stock doesn't typically experience drastic declines within a single month, which can boost investor confidence and attract more investors to the stock.

Negative Growth Impact:

Outliers: While the plot shows a strong positive correlation, the presence of outliers signifies that there are months when the stock experienced significant price deviations from the norm. These outliers may lead to unexpected losses for traders who don't account for such extreme movements.

Overreliance on Historical Data: There is a risk that traders or investors may rely too heavily on historical data to make future predictions. Market conditions can change, and unexpected events can occur, leading to potential losses if traders solely base their decisions on past patterns.

#### Chart - 5

In [None]:
# Chart - 5 visualization code
# 'Date' is already datetime after data wrangling, so take the year directly.
# (Splitting the string form 'YYYY-MM-DD' on '-' and taking index 1 would return the month.)
dataset['Year'] = dataset['Date'].dt.year

# Creating the box plot
plt.figure(figsize=(12, 6))
sns.boxplot(data=dataset, x='Year', y='High', palette='Blues')
plt.title('Distribution of High Prices by Year (Box Plot)')
plt.xlabel('Year')
plt.ylabel('High Price')
plt.xticks(rotation=45)
plt.grid(axis='y', linestyle='--', alpha=0.7)

# Display the chart
plt.show()

##### 1. Why did you pick the specific chart?


I chose a box plot for this visualization because it is particularly useful for showing the distribution of a continuous variable (in this case, "High" stock prices) across categories (in this case, the years extracted from the "Date" column).

The box plot allows us to quickly understand the central tendency, spread, and presence of outliers in the distribution of high prices for each year. This is important for gaining insights into how high prices have evolved over time and whether there are significant variations or anomalies in different years. The box plot provides a clear visual representation of these statistics, making it an excellent choice for this analysis.


##### 2. What is/are the insight(s) found from the chart?

Yearly High Price Variation: The chart reveals the distribution of high prices for each year. It's evident that there is variation in high prices across different years, with some years having a wider range of prices than others.

Outliers: Outliers are visible as individual data points outside the "whiskers" of the boxes. These outliers represent extreme high prices that are significantly higher than the typical range for a given year. Identifying these outliers can be crucial for understanding unusual market behavior.

Trends and Patterns: The medians (represented by the horizontal lines inside the boxes) for each year provide insights into the central tendency of high prices. By comparing the medians for different years, you can identify trends or shifts in the median high price over time.

High Price Stability: The height of each box is the interquartile range (IQR), the spread of the middle 50% of that year's high prices. Smaller boxes suggest that high prices for a particular year were relatively stable and consistent, while larger boxes suggest greater variability.

Yearly Comparison: This chart allows for a quick visual comparison of high price distributions between different years, making it easier to spot anomalies or noteworthy changes in the data.

Potential Investment Insights: Investors can use this information to identify years with more stable high prices, potentially reducing investment risk. Conversely, years with higher price variability might present more significant opportunities or risks.
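
The quantities read off the box plot above (median, IQR, and points beyond the whiskers) can also be computed directly, which makes the year-by-year comparison less dependent on eyeballing the chart. The following is a minimal sketch, not part of the original notebook: the helper name `yearly_box_stats` is hypothetical, the sample values are made up, and outliers are counted with the 1.5 × IQR rule, which is the default whisker rule of `sns.boxplot`.

```python
import pandas as pd


def yearly_box_stats(df, value_col="High", year_col="Year"):
    """Summarise the quantities a box plot draws, one row per year."""
    rows = []
    for year, values in df.groupby(year_col)[value_col]:
        q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
        iqr = q3 - q1
        # Tukey's rule: points beyond 1.5 * IQR from the box count as outliers
        lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        n_outliers = int(((values < lower) | (values > upper)).sum())
        rows.append({year_col: year, "median": median, "q1": q1, "q3": q3,
                     "iqr": iqr, "outliers": n_outliers})
    return pd.DataFrame(rows).set_index(year_col)


# Made-up values for illustration only (not taken from the Yes Bank data)
sample = pd.DataFrame({
    "Year": ["05"] * 5 + ["06"] * 5,
    "High": [10, 11, 12, 13, 50, 20, 21, 22, 23, 24],
})
stats = yearly_box_stats(sample)
print(stats)
```

On the real data, the same helper could be called as `yearly_box_stats(dataset)` once the 'Year' column has been created. Sorting the result by `iqr` would then list the most and least variable years.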

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impacts:

Risk Mitigation: Understanding the variation in high prices by year allows businesses and investors to make informed decisions regarding risk management. Years with stable high prices indicate reduced risk, which can be beneficial for long-term investments or financial planning.

Opportunity Identification: On the flip side, years with higher price variability present potential opportunities for traders and investors. Identifying such years can lead to more significant profits for those willing to take on higher risk.

Diversification: Investors can use the insights when diversifying their portfolios. By analyzing how the stock's price variability changed from year to year, they can judge how volatile this holding has been historically and balance it against steadier investments.

Negative Impacts:

Loss Aversion: Businesses and investors might become overly cautious and risk-averse in years with high price volatility. This cautious approach may lead to missed investment opportunities, resulting in potential negative growth or underperformance compared to more proactive competitors.

Overlooking Outliers: Focusing solely on the central tendencies and variability may lead to overlooking valuable insights from outliers. Outliers could indicate significant market events, and ignoring them might result in negative consequences for investment strategies.

Short-Term Focus: High price variability might encourage a short-term focus, leading to more frequent trading and less commitment to long-term investments. While this approach can be profitable, it may also lead to higher transaction costs and reduced investment stability.

#### Chart - 6 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
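# Calculate the correlation matrix on numeric columns only
# (string columns such as 'Date' make DataFrame.corr() raise an error in pandas 2.x)
correlation_matrix = dataset.select_dtypes(include='number').corr()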

# Create a heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap="coolwarm", linewidths=0.5)
plt.title("Correlation Heatmap")
plt.show()

##### 1. Why did you pick the specific chart?

I chose to create a correlation heatmap because it's an effective way to visualize the pairwise correlations between the numeric variables in the dataset. With a heatmap, I can quickly identify which pairs of variables are strongly positively or negatively correlated, which is valuable when deciding which variables carry overlapping information. With the "coolwarm" colormap, warmer colors (e.g., red) mark higher coefficients and cooler colors (e.g., blue) mark lower ones. Because no explicit color limits are set here, the scale is fitted to the range of values present, so blue cells are not necessarily negative and the annotated numbers should be read alongside the colors. This chart provides a clear and concise summary of the relationships within the data, aiding in making data-driven decisions and predictions for our business objective.

##### 2. What is/are the insight(s) found from the chart?

High Correlation Between Open and Close Prices: There is a strong positive correlation between the "Open" and "Close" prices. This means that months that open at a high price level also tend to close at a high price level. It describes how the price levels move together across months and does not by itself measure how much the price changes within a month.

High Correlation Between High and Low Prices: The "High" and "Low" prices also exhibit a strong positive correlation. This implies that months with a high peak price also tend to have a high lowest price, so both track the same overall price level.

Limited Correlation with Date: The "Date" variable represents the month and year. If it is stored as text it is not a numeric column and does not enter the correlation matrix, so the heatmap says little about how prices relate to time. The time-series charts are the better place to read the effect of date on the stock prices.

No Significant Negative Correlations: There are no strong negative correlations between any of the variables, suggesting that changes in one variable do not have a consistently negative impact on another.
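
To back the observations above with numbers, the correlation matrix can be flattened into a ranked list of variable pairs. This is a minimal sketch under stated assumptions: the helper name `ranked_correlations` is hypothetical, the sample values are made up, and only numeric columns are used.

```python
import numpy as np
import pandas as pd


def ranked_correlations(df):
    """Return each pair of numeric columns once, sorted by correlation."""
    corr = df.select_dtypes(include="number").corr()
    # Keep only the upper triangle (k=1 skips the diagonal of 1.0 values)
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack().dropna()
    return pairs.sort_values(ascending=False)


# Made-up values for illustration only (not taken from the Yes Bank data)
sample = pd.DataFrame({
    "Date": ["Jul-05", "Aug-05", "Sep-05", "Oct-05", "Nov-05"],
    "Open": [13.0, 12.6, 13.5, 13.2, 13.4],
    "High": [14.0, 14.9, 14.0, 14.5, 13.9],
    "Low": [11.3, 12.6, 12.3, 12.4, 12.9],
    "Close": [12.5, 13.4, 13.3, 13.0, 13.4],
})
pairs = ranked_correlations(sample)
print(pairs)
```

Each row of the result is one pair such as ('Open', 'Close') with its coefficient, so the strongest and weakest relationships can be read from the top and bottom of the list.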

#### Chart - 7 - Pair Plot

In [None]:
# Pair Plot visualization code
# Select numeric columns for the Pair Plot
numeric_columns = ['Open', 'High', 'Low', 'Close']

# Create a Pair Plot
sns.pairplot(dataset[numeric_columns])
plt.show()

##### 1. Why did you pick the specific chart?

I chose to create a Pair Plot because it's an effective visualization for understanding the relationships and correlations between multiple numeric variables in a dataset. In this case, our dataset contains numeric columns like 'Open', 'High', 'Low', and 'Close,' and a Pair Plot allows us to simultaneously visualize the pairwise relationships between these columns.

By using a Pair Plot, we can quickly identify patterns, correlations, and potential outliers in the data, which is essential for our analysis. It helps us gain insights into how these variables interact with each other, providing a comprehensive view of the dataset's numerical characteristics. This visualization is particularly valuable for understanding how different stock price parameters, such as the high and low prices, are related to each other and to the closing price.

In summary, the Pair Plot is an appropriate choice to explore the multivariate relationships within the dataset and gain insights into how these variables impact each other, which is crucial for our analysis.

##### 2. What is/are the insight(s) found from the chart?

The Pair Plot visualization reveals several insights from the dataset:

Correlations: We can observe strong positive correlations between 'Open' and 'High' prices, as well as between 'Low' and 'Close' prices. This suggests that when the opening price of a month is high, the highest price during that month is likely to be high as well. Similarly, when the lowest price during the month is low, the closing price tends to be lower.

Scatter Patterns: The scatter plots indicate that the relationships between these price parameters are approximately linear. 'High' prices rise roughly in proportion to 'Open' prices, and 'Close' prices rise roughly in proportion to 'Low' prices. This is an association between the variables and does not by itself show that one drives the other.

Outliers: We can identify some potential outliers in the data, where the points do not follow the linear trends. These outliers may represent unusual price movements and could be of interest for further investigation.

Data Distribution: The histograms on the diagonal of the Pair Plot show the distribution of each variable. This information is helpful for understanding the data's central tendencies and spreads.
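
The linear patterns and outliers described above can be checked numerically by fitting a straight line to one pair of columns and flagging points whose residual is unusually large. This is a minimal sketch, not the notebook's own method: the helper name `fit_line_and_flag`, the threshold of 2.5 standard deviations, and the sample values are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd


def fit_line_and_flag(df, x_col, y_col, z_thresh=2.5):
    """Fit y = slope * x + intercept and flag rows far from the line."""
    x = df[x_col].to_numpy(dtype=float)
    y = df[y_col].to_numpy(dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)   # least-squares straight line
    residuals = y - (slope * x + intercept)
    r = np.corrcoef(x, y)[0, 1]                  # Pearson correlation
    # Flag rows whose residual exceeds z_thresh sample standard deviations
    is_far = np.abs(residuals) > z_thresh * residuals.std(ddof=1)
    return slope, intercept, r, df.index[is_far].tolist()


# Made-up values for illustration only: 'Close' follows 'Low' except at row 5
low = np.arange(10.0, 130.0, 10.0)               # 10, 20, ..., 120
close = 1.1 * low
close[5] += 40.0                                 # one deliberately unusual month
sample = pd.DataFrame({"Low": low, "Close": close})

slope, intercept, r, flagged = fit_line_and_flag(sample, "Low", "Close")
print(f"slope={slope:.3f}, r={r:.3f}, flagged rows={flagged}")
```

On the real data, calling the helper with `dataset` and a pair such as 'Open' and 'High' would return the row labels of the months that sit furthest from the linear trend, which are the candidates for further investigation.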

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

To help the client achieve the business objective of forecasting Yes Bank's monthly closing price and supporting better-informed investment decisions, I would suggest the following:

Risk Diversification: Encourage the client to spread their investment over time rather than concentrating it in a single period. The box plot shows that price volatility differed from year to year, so staggering entries across calmer and more volatile periods can help spread risk and stabilize the overall performance of the portfolio.

Strategic Planning: Create a strategic investment plan that aligns with the client's risk tolerance and financial goals. For instance, they can allocate a portion of their funds to long-term, stable investments in years with lower price variability and allocate another portion to more aggressive trading strategies in years with higher volatility.

Market Monitoring: Continuously monitor the stock market and adapt investment strategies based on the current year's price volatility and other market indicators. Regular reviews will help the client stay agile and responsive to market conditions.

Professional Guidance: Consider seeking professional advice from financial experts or portfolio managers who can provide more in-depth analysis and personalized investment strategies. A financial advisor can tailor the approach to the client's specific financial situation and objectives.

Education and Research: Encourage the client to invest time in learning about the stock market and stay informed about market trends and economic factors. Knowledge is a powerful tool in making informed investment decisions.

Embrace Technology: Utilize data analysis tools and algorithms to automate trading decisions and identify opportunities in real-time. AI-based trading systems can help react quickly to market changes.

Psychological Preparedness: Prepare the client for the psychological aspects of stock market investing. Emphasize the importance of discipline, patience, and a long-term perspective to avoid impulsive decisions during market fluctuations.

Regular Portfolio Reviews: Schedule regular portfolio reviews to assess performance, rebalance investments, and adjust strategies based on market conditions and the achievement of financial objectives.

Track and Measure Success: Implement a system to track and measure the success of different investment strategies. Regularly review the performance of the portfolio against predefined benchmarks and adjust as needed.

Emergency Fund: Ensure that the client has an emergency fund in place to cover unexpected financial needs, reducing the pressure to make impulsive investment decisions in times of market volatility.
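
The market monitoring and performance tracking suggested above can start from something as simple as a rolling volatility measure on the monthly closing price. This is a minimal sketch under stated assumptions: the helper name `rolling_volatility`, the window length, and the sample prices are illustrative choices, not values from the project.

```python
import pandas as pd


def rolling_volatility(close, window=3):
    """Rolling standard deviation of month-over-month percentage returns."""
    returns = close.pct_change()          # first value is NaN (no prior month)
    return returns.rolling(window).std()  # NaN until `window` returns exist


# Made-up prices for illustration only: a calm stretch, then large swings
close = pd.Series([100, 101, 102, 103, 104, 90, 110, 85, 115], dtype=float)
vol = rolling_volatility(close, window=3)
print(vol.round(4))
```

Applied to `dataset['Close']` with a longer window (for example 6 or 12 months), the resulting series could be reviewed at each portfolio check to see whether the stock is currently in a calm or a volatile phase.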

# **Conclusion**

In this analysis, we have delved into historical stock price data to gain a deeper understanding of market trends. The dataset, comprising 185 records, revealed valuable insights that can significantly impact our investment strategy. The absence of duplicates and missing values ensures data reliability.

Our exploration has provided critical information about the variables and allowed us to categorize data by year, facilitating a more strategic analysis. The visualizations, including line charts, bar charts, and box plots, have shed light on market dynamics.

We now have insights into price volatility, distribution patterns, and potential investment opportunities. To meet our business objectives, we'll diversify investments across years, implement strategic planning, and continuously monitor the market. Staying informed, utilizing technology, and having a well-prepared mindset are essential.

Regular portfolio reviews and maintaining an emergency fund are part of our investment strategy. By aligning these insights with our specific goals and risk tolerance, we can confidently navigate the stock market and anticipate positive impacts on our business objectives.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***