---
# Title: Exploratory Data Analysis (EDA) of NSE Fundamental Dataset
Author: [Anant Kacholia]

Description:
This notebook performs exploratory data analysis (EDA) on a comprehensive dataset of fundamental metrics for stocks listed on the National Stock Exchange (NSE). The dataset covers a four-year period from 2019 to 2022 and includes over 1800 stocks across various sectors.

The objective of this analysis is to gain insights into the financial performance of individual stocks and sectors, identify trends and patterns over time, and explore factors that influence stock returns and future price movements. The dataset contains a wide range of fundamental metrics, including Price/Earnings to Growth ratio, Profit Growth, Basic EPS, Revenue from Operations/Share, and more.

Key Tasks:
1. Descriptive statistics: Compute summary statistics for fundamental metrics.
2. Time series analysis: Explore trends and patterns over the four-year period.
3. Sector-wise comparison: Compare fundamental metrics across different sectors.
4. Growth analysis: Analyze annual growth rates of fundamental metrics.

Dataset Information:
- Dataset Name: NSE Fundamental Dataset
- Time Period: 2019-2022
- Number of Stocks: 1800+
- Features: Price/Earnings to Growth ratio, Profit Growth, Basic EPS, Revenue from Operations/Share, and more.
- Targets: Stock returns for each year and next year's close price.
- Train-Test Split: 3-year training data (2019-2021), 1-year test data (2022).
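
The year-based split listed above can be sketched as follows. This is a minimal illustration on made-up rows, not the notebook's own method: the `Year` column and the `split_by_year` helper are assumptions introduced here, since the dataset itself keeps each year in a separate folder.

```python
import pandas as pd


def split_by_year(df, test_year, year_col="Year"):
    """Return (train, test): rows from years before test_year, and rows from test_year."""
    train = df[df[year_col] < test_year].reset_index(drop=True)
    test = df[df[year_col] == test_year].reset_index(drop=True)
    return train, test


# Made-up rows, one per stock and year (not real NSE data)
toy = pd.DataFrame({
    "Stock Name": ["AAA"] * 4 + ["BBB"] * 4,
    "Year": [2019, 2020, 2021, 2022] * 2,
    "Returns (%)": [0.10, -0.05, 0.20, 0.07, 0.02, 0.15, -0.10, 0.30],
})

train, test = split_by_year(toy, test_year=2022)
print(len(train), "training rows,", len(test), "test rows")
```

Splitting by year rather than at random keeps 2022 information out of the training data.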



**Dataset Description:**

The NSE Fundamental Dataset is a comprehensive collection of fundamental metrics for stocks listed on the National Stock Exchange (NSE) over a four-year period from 2019 to 2022. This dataset encompasses over 1800 individual stocks across various sectors and segments of the market.

**Key Features:**

The dataset includes a wide range of fundamental metrics that provide insights into the financial performance and health of individual companies. Some of the key features include:

1. **Price/Earnings to Growth ratio (PEG):** A measure of a stock's valuation relative to its earnings growth potential.
2. **Profit Growth:** The percentage change in a company's profits over time, indicating its growth trajectory.
3. **Basic EPS (Earnings Per Share):** The portion of a company's profit allocated to each outstanding share of common stock.
4. **Revenue from Operations/Share:** The amount of revenue generated per share of stock outstanding.
5. **Net Profit Margin:** The percentage of revenue that remains as net profit after accounting for all expenses.
6. **Return on Equity (ROE):** A measure of a company's profitability relative to its shareholders' equity.
7. **Debt-to-Equity Ratio:** A measure of a company's financial leverage, calculated as total debt divided by total equity.
8. **Market Capitalization:** The total market value of a company's outstanding shares of stock.
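
The definitions above can be written out as small functions. This is a sketch using the textbook form of each ratio with made-up inputs; the function names are hypothetical, and the dataset's own columns may use slightly different conventions (for example, weighted-average share counts for EPS).

```python
def basic_eps(net_profit, shares_outstanding):
    """Profit attributable to each outstanding share (simplified: no preferred dividends)."""
    return net_profit / shares_outstanding


def net_profit_margin(net_profit, revenue):
    """Net profit as a percentage of revenue."""
    return 100.0 * net_profit / revenue


def return_on_equity(net_profit, shareholders_equity):
    """Net profit as a percentage of shareholders' equity."""
    return 100.0 * net_profit / shareholders_equity


def debt_to_equity(total_debt, total_equity):
    """Total debt divided by total equity."""
    return total_debt / total_equity


def peg_ratio(price, eps, eps_growth_pct):
    """Price/Earnings ratio divided by the earnings growth rate (in percent)."""
    return (price / eps) / eps_growth_pct


def market_cap(price, shares_outstanding):
    """Share price times the number of outstanding shares."""
    return price * shares_outstanding


# Made-up company: profit 50, revenue 500, equity 250, debt 125, 10 shares at price 100
eps = basic_eps(50.0, 10.0)
print("EPS:", eps)
print("Net profit margin (%):", net_profit_margin(50.0, 500.0))
print("ROE (%):", return_on_equity(50.0, 250.0))
print("Debt-to-equity:", debt_to_equity(125.0, 250.0))
print("PEG at 20% growth:", peg_ratio(100.0, eps, 20.0))
print("Market cap:", market_cap(100.0, 10.0))
```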

**Hybrid Features:**

Hybrid features combine multiple fundamental metrics to provide deeper insights into a company's financial condition and market position. These features leverage growth rates to assess the trajectory of key metrics over time and incorporate industry-relative performance to benchmark a company against its peers within the same sector. By integrating both historical trends and comparative analysis, hybrid features offer a holistic view of a company's competitive standing and growth potential.

1. Growth-to-Industry Ratio: This feature calculates the ratio of a company's growth rate in a specific metric (e.g., revenue, earnings) to the average growth rate of the industry sector to which it belongs. It indicates whether the company is growing faster or slower than its industry peers.
2. Relative Profit Margin: Combining a company's net profit margin with the average net profit margin of its industry sector, this feature assesses the company's profitability relative to its industry peers. A higher relative profit margin suggests that the company is more profitable compared to its competitors.
3. Valuation-to-Growth Ratio: By dividing a company's Price/Earnings ratio by its earnings growth rate, this feature evaluates the company's valuation relative to its growth prospects. It helps investors assess whether the stock is overvalued or undervalued based on its growth potential.
4. Industry Relative Return on Equity (ROE): This feature compares a company's return on equity (ROE) with the average ROE of its industry peers. It indicates how efficiently the company is utilizing its equity capital compared to its competitors.
5. Market Cap-to-Industry Revenue Ratio: This feature divides a company's market capitalization by the average revenue from operations of its industry sector. It offers insights into the company's valuation relative to its revenue-generating capacity compared to industry benchmarks.
6. Relative Earnings Yield: Calculated as the reciprocal of the Price/Earnings ratio, this feature compares a company's earnings yield with the average earnings yield of its industry sector. It provides a measure of the company's earnings relative to its market value compared to industry norms.
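
One way to build sector-relative features of this kind is a group-wise mean followed by a ratio. The sketch below uses made-up figures, and the `Sector` column and `add_sector_relative` helper are assumptions for illustration. The dataset's own `Industry Relative ... (%)` columns may be computed differently, for example as a percentage deviation from the sector average.

```python
import pandas as pd

# Made-up figures for two sectors (not real NSE data)
toy = pd.DataFrame({
    "Stock Name": ["A1", "A2", "A3", "B1", "B2"],
    "Sector": ["Auto", "Auto", "Auto", "IT", "IT"],
    "Revenue Growth (%)": [10.0, 20.0, 30.0, 5.0, 15.0],
    "ROE (%)": [12.0, 18.0, 24.0, 20.0, 30.0],
    "PE": [10.0, 20.0, 40.0, 25.0, 50.0],
})


def add_sector_relative(df, column, sector_col="Sector"):
    """Return a copy with '<column> vs sector' = company value / sector mean."""
    out = df.copy()
    sector_mean = out.groupby(sector_col)[column].transform("mean")
    out[f"{column} vs sector"] = out[column] / sector_mean
    return out


# Earnings yield is the reciprocal of the Price/Earnings ratio
toy["Earnings Yield"] = 1.0 / toy["PE"]

for col in ["Revenue Growth (%)", "ROE (%)", "Earnings Yield"]:
    toy = add_sector_relative(toy, col)

print(toy.filter(like="vs sector").round(3))
```

A value above 1 means the company is ahead of its sector average on that metric, and a value below 1 means it is behind.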

**Potential Uses:**

1. **Predictive Analysis:** The dataset can be used to build predictive models to forecast stock returns and future price movements. By analyzing historical fundamental metrics and stock performance, predictive algorithms can be trained to anticipate market trends and identify profitable investment opportunities.

2. **Analysis of Company Wellbeing:** Investors and financial analysts can utilize the dataset to assess the financial health and wellbeing of individual companies. By examining key financial ratios and metrics, such as profitability, liquidity, and solvency, stakeholders can evaluate a company's ability to generate returns and manage risks.

3. **Financial Health Metrics:** The dataset provides a wealth of financial health metrics that can be used to gauge the overall stability and performance of companies. By analyzing trends in metrics such as profit growth, earnings per share, and return on equity, analysts can assess a company's competitive position and growth prospects.

4. **Statistical Relations Between Features:** Statistical analysis can be performed to explore relationships and correlations between different fundamental metrics. By examining the pairwise correlations between features, analysts can identify patterns and dependencies that may influence stock performance and valuation.

5. **Sectoral Analysis:** The dataset allows for sector-wise analysis, enabling investors to compare the financial performance of companies within the same industry. By aggregating data at the sector level, analysts can identify sector-specific trends and investment opportunities.

6. **Risk Management:** The dataset can be used for risk management purposes by identifying companies with high levels of debt, low profitability, or other indicators of financial distress. By assessing risk factors and mitigating exposures, investors can optimize their portfolios and minimize potential losses.

7. **Decision Support:** In the complex and dynamic environment of financial markets, data-driven decision-making is essential for investors, portfolio managers, and financial institutions. The dataset's comprehensive features give decision-makers the information they need to optimize investment portfolios, mitigate risks, and improve returns.
---

**Code Description:**

1. **Importing Libraries:**
   - We begin by importing the necessary libraries for data analysis and visualization.
   - `numpy` (imported as `np`): This library is commonly used for numerical computing and provides support for handling arrays, matrices, and mathematical operations efficiently.
   - `pandas` (imported as `pd`): Pandas is a versatile library for data manipulation and analysis in Python. It offers data structures like DataFrame and Series, along with functions for reading, writing, and processing tabular data, including CSV file I/O.
   - `matplotlib.pyplot` (imported as `plt`): Matplotlib is a popular plotting library in Python used for creating static visualizations. We import the pyplot module, which provides a MATLAB-like interface for generating plots.
   - `plotly.express` (imported as `px`): Plotly is a library for interactive and web-based visualizations. We import the express module, which offers a high-level interface for creating complex and interactive plots with minimal code.
   - `seaborn` (imported as `sns`): Seaborn is another powerful visualization library built on top of Matplotlib. It provides a high-level interface for creating informative and visually appealing statistical graphics.

**Importance:**

- **Plotly Express:** Plotly Express is particularly useful for creating interactive and web-based visualizations, allowing for exploration and interaction with the data directly within the plot.
- **Seaborn:** Seaborn provides additional statistical plotting capabilities and offers a more aesthetically pleasing default style compared to Matplotlib.


In [1]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import plotly.express as px
import seaborn as sns

**Code Description:**

1. **Importing `os` Module:**
   - We import the `os` module, which provides a way to interact with the operating system. This module allows us to perform various file and directory operations, such as listing files in a directory, checking file existence, and navigating file paths.

2. **Defining Directory Path:**
   - We define the directory path where the dataset files are located. In this case, the directory path is set to '/kaggle/input/nse-fundamental-insights-fundamentals-2019-2022/mmdp dataset/2022'.
   - Adjust this path according to the actual location of your dataset files.

3. **Listing Files in the Directory:**
   - We use the `os.listdir()` function to list all the files present in the specified directory.
   - The `os.listdir()` function returns a list containing the names of all the files and directories in the specified directory.

4. **Printing Files in the Directory:**
   - We iterate over the list of files obtained from `os.listdir()` and print each file name.
   - This provides us with a simple way to verify the contents of the directory and ensure that we have access to the dataset files we need for analysis.

**Next Steps:**

By executing this code snippet, we can verify the files present in the specified directory and ensure that we have access to the dataset files required for further analysis. Once we have confirmed the presence of the dataset files, we can proceed with loading and processing the data for exploratory data analysis (EDA) and other tasks.


In [2]:
import os

directory = '/kaggle/input/nse-fundamental-insights-fundamentals-2019-2022/mmdp dataset/2022'

files = os.listdir(directory)

print("Files in the directory:")
for file in files:
    print(file)


Files in the directory:
Media_ 2022.csv
energy_ 2022.csv
services_ 2022.csv
Auto_ 2022.csv
utilities_ 2022.csv
commodities_ 2022.csv
IT_ 2022.csv
FMCG_ 2022.csv
Retail_ 2022.csv
health_ 2022.csv
Textile_ 2022.csv
Real_Estate_ 2022.csv
telecom_ 2022.csv
finserv_ 2022.csv
Ind_prod_ 2022.csv
Defense_ 2022.csv
consumber_durables_ 2022.csv
Leisure_ 2022.csv


**Code Description:**

This snippet extracts the sector names from the file names in the directory by stripping the year and file extension from each name. The resulting `sectors` list identifies the sectors represented in the dataset and is used to label the sector-wise results that follow.

In [3]:
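# Strip the trailing '_ 2022.csv' from each file name, e.g. 'Media_ 2022.csv' -> 'Media'
suffix = '_ 2022.csv'
sectors = []
for file in files:
    sectors.append(file[:-len(suffix)])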
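
Fixed-length slicing depends on every file name ending in exactly the same suffix. A regular expression is a slightly more defensive alternative. The sketch below assumes the `<sector>_ <year>.csv` pattern seen in the listing above; `parse_sector_file` is a hypothetical helper, not part of the notebook.

```python
import re

# Pattern assumed from the directory listing: '<sector>_ <year>.csv'
FILENAME_PATTERN = re.compile(r"^(?P<sector>.+?)_\s*(?P<year>\d{4})\.csv$")


def parse_sector_file(filename):
    """Split a name like 'Real_Estate_ 2022.csv' into ('Real_Estate', 2022)."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"Unexpected file name: {filename!r}")
    return match.group("sector"), int(match.group("year"))


for name in ["Media_ 2022.csv", "Real_Estate_ 2022.csv", "Ind_prod_ 2022.csv"]:
    print(parse_sector_file(name))
```

Unlike slicing, this raises an error for a file that does not follow the pattern instead of silently producing a truncated name.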

**Code Description:**

This snippet reads each sector file and records its row count, which is the number of stocks in that sector. The resulting `number_of_stocks` list shows how the stocks are distributed across sectors and is used for the sector-wise comparison below.

In [4]:
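# Each CSV has one row per stock, so its row count is the number of stocks in that sector
number_of_stocks = []
for file in files:
    number_of_stocks.append(len(pd.read_csv(os.path.join(directory, file))))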
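
The same count can be kept in a labelled pandas Series, which keeps file names and counts together and makes sorting straightforward. This is a sketch run against a throwaway directory with made-up files; `count_rows_per_file` is a hypothetical helper.

```python
import os
import tempfile

import pandas as pd


def count_rows_per_file(directory):
    """Map each CSV file name in `directory` to its number of data rows, largest first."""
    counts = {}
    for name in sorted(os.listdir(directory)):
        if name.lower().endswith(".csv"):
            counts[name] = len(pd.read_csv(os.path.join(directory, name)))
    return pd.Series(counts, name="Number of stocks").sort_values(ascending=False)


# Demonstration on temporary files (stand-ins for the Kaggle directory)
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"Stock Name": ["A", "B", "C"]}).to_csv(
        os.path.join(tmp, "Auto_ 2022.csv"), index=False)
    pd.DataFrame({"Stock Name": ["D"]}).to_csv(
        os.path.join(tmp, "IT_ 2022.csv"), index=False)
    with open(os.path.join(tmp, "notes.txt"), "w") as fh:
        fh.write("not a dataset file")
    demo_counts = count_rows_per_file(tmp)

print(demo_counts)
```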

**Code Description:**

This snippet draws an interactive bar plot of the number of stocks in each sector. The plot gives an overview of how stocks are distributed across sectors, and hovering over a bar shows the exact count for that sector.

In [5]:


fig = px.bar(x=sectors, y=number_of_stocks,
             title='Number of stocks in each sector')

fig.update_layout(
    xaxis=dict(title='Sector', showgrid=False),
    yaxis=dict(title='Number of stocks', showgrid=True),
    title_x=0.5,
    title_y=0.95,
    title_font=dict(size=20),
    plot_bgcolor='white',
    autosize=False,
    width=800,
    height=600
)

# Show the interactive plot
fig.show()


**Code Description:**

- **Sector-Specific Data:** By loading the data from the CSV file corresponding to the "Auto" sector and the year 2022, this code snippet focuses on a specific segment of the dataset.
- **Granularity:** The DataFrame `df_auto` contains granular information about companies within the "Auto" sector, including fundamental metrics and financial indicators for the year 2022.
- **Analysis Scope:** This sector-specific data can be analyzed independently or in conjunction with data from other sectors to derive insights, identify trends, and make informed decisions in the financial domain.

**Next Steps:**

- **Exploratory Data Analysis (EDA):** Further analysis can be performed on `df_auto` to uncover patterns, relationships, and anomalies within the sector-specific data.
- **Feature Engineering:** Additional features may be derived or engineered from the existing dataset to enhance predictive modeling or gain deeper insights into sector performance.
- **Visualization:** Visual representations such as plots, charts, and graphs can be utilized to present key findings and communicate insights effectively.

**Significance:**

By accessing and examining sector-specific data for the year 2022, this code snippet lays the groundwork for in-depth analysis and exploration within the "Auto" sector of the NSE Fundamental Dataset. It serves as a crucial step towards understanding sector dynamics, evaluating company performance, and informing investment decisions.

In [6]:
directory = '/kaggle/input/nse-fundamental-insights-fundamentals-2019-2022/mmdp dataset/2022'
year = '2022'
df_auto = pd.read_csv(directory+'/Auto_ 2022.csv')
df_auto.head()

Unnamed: 0.1,Unnamed: 0,Stock Name,Price Earnings to Growth ratio (X),Industry Relative Price Earnings to Growth ratio (%),Special Ratio 1 (X),Profit Growth (%),Industry Relative Profit Growth (%),Basic EPS (Rs.),Diluted EPS (Rs.),Special Ratio 2 (X),...,Industry Relative EV/EBITDA (%),Industry Relative MarketCap/Net Operating Revenue (%),Industry Relative Retention Ratios (%),Industry Relative Price/BV (%),Industry Relative Price/Net Operating Revenue (%),Industry Relative Earnings Yield (%),Industry Relative Close (%),Industry Relative PE ratio (%),Next Year Close (X),Returns (%)
0,0,ASHOKLEY,-1.7641,98.4695,-5746.2402,9.0535,-61.5285,-0.4906,-1.22,-1.22,...,49.0619,-42.7448,253.6765,37.3178,-42.7448,-300.0,-92.0118,-10.0,139.2,0.1956
1,1,ATULAUTO,-0.2113,87.2233,143.72,60.0293,155.0856,0.4621,-11.37,-11.37,...,-363.9976,-50.6119,-100.0,-62.6822,-50.6119,-1500.0,-88.9232,-10.0,313.4,0.9412
2,2,BAJAJ-AUTO,0.7772,103.474,-41303.4846,5.093,-78.358,-0.1841,213.2,213.2,...,18.4074,39.4231,-18.667,3.207,39.4231,1100.0,141.5544,10.0,3884.75,0.1034
3,3,EICHERMOT,2.029,101.3307,-29066.8522,14.4591,-38.5582,-0.826,61.33,61.26,...,82.5971,185.4021,71.4421,55.6851,185.4021,300.0,67.4668,10.0,2948.85,0.2081
4,4,ESCORTS,-0.9347,97.1113,-49661.0824,-7.7183,-132.798,-0.3924,74.06,73.73,...,39.2377,34.6154,113.425,-14.2857,34.6154,500.0,16.0066,10.0,1891.05,0.1184


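**Code Description:**

**Parameter Count Analysis:**
The code below counts the columns whose names contain `Industry Relative`, counts the columns whose names contain `Growth`, and treats the remainder (minus the two target variables) as independent parameters. These counts summarize the composition of the dataset: how much of it is industry-relative performance, how much is growth-oriented, and how much consists of other fundamental metrics.

The counts rely on substring matches, so a column such as `Industry Relative Profit Growth (%)` is counted in both the industry-relative and the growth group, and the identifier columns (`Unnamed: 0.1`, `Unnamed: 0`, `Stock Name`) are not subtracted from the independent count. The three numbers are therefore approximate rather than a clean partition of the columns.

- **Number of Industry Relative Parameters:** 42
- **Number of Growth Parameters:** 44
- **Number of Independent Parameters:** 42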


In [7]:
# Columns whose name contains 'Industry Relative'
count_IR = sum(1 for s in df_auto.columns if 'Industry Relative' in s)

# Columns whose name contains 'Growth' (also matches 'Industry Relative ... Growth' columns)
count_GR = sum(1 for s in df_auto.columns if 'Growth' in s)

# Everything else, minus the 2 target variables
count_ = len(df_auto.columns) - count_IR - count_GR - 2

print("Number of Industry Relative Parameters", count_IR)
print("Number of Growth Parameters", count_GR)
print("Number of Independent Parameters", count_)


Number of Industry Relative Parameters 42
Number of Growth Parameters 44
Number of Independent Parameters 42
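
If non-overlapping groups are needed, each column can be assigned to exactly one category. The sketch below runs on a handful of column names taken from the `df_auto.head()` output above; `categorize_columns` and its precedence order (identifier, target, industry relative, growth, independent) are assumptions for illustration, and its counts will differ from the substring counts printed above.

```python
def categorize_columns(columns, id_columns, target_columns):
    """Assign every column to exactly one group, checked in a fixed order."""
    groups = {
        "identifier": [],
        "target": [],
        "industry_relative": [],
        "growth": [],
        "independent": [],
    }
    for col in columns:
        if col in id_columns:
            groups["identifier"].append(col)
        elif col in target_columns:
            groups["target"].append(col)
        elif "Industry Relative" in col:
            groups["industry_relative"].append(col)
        elif "Growth" in col:
            groups["growth"].append(col)
        else:
            groups["independent"].append(col)
    return groups


sample_columns = [
    "Unnamed: 0.1", "Unnamed: 0", "Stock Name",
    "Price Earnings to Growth ratio (X)",
    "Industry Relative Price Earnings to Growth ratio (%)",
    "Profit Growth (%)",
    "Industry Relative Profit Growth (%)",
    "Basic EPS (Rs.)",
    "Next Year Close (X)", "Returns (%)",
]

groups = categorize_columns(
    sample_columns,
    id_columns={"Unnamed: 0.1", "Unnamed: 0", "Stock Name"},
    target_columns={"Next Year Close (X)", "Returns (%)"},
)
for name, cols in groups.items():
    print(name, len(cols))
```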


**Code Description:**

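- **Selecting Columns for Analysis:**
  - The feature columns (`df_auto.columns[2:-2]`) are combined with the last column, `Returns (%)`. The other target, `Next Year Close (X)`, is left out.

- **Calculating Correlation Matrix:**
  - `DataFrame.corr()` computes the pairwise Pearson correlation between the selected columns.

- **Extracting Correlation with Target:**
  - The correlation coefficients between the selected parameters and the last column (the target variable, `Returns (%)`) are extracted from the correlation matrix.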

**Next Steps:**

- **Correlation Analysis:**
  - The correlation matrix allows for the exploration of relationships between independent parameters and the target variable.
  - Analyzing correlation coefficients helps identify parameters that have a strong linear relationship with the target variable, potentially indicating their predictive power or influence on target outcomes.


In [8]:
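# Feature columns plus the 'Returns (%)' target; 'Next Year Close (X)' is left out
columns_to_include = df_auto.columns[2:-2].tolist() + [df_auto.columns[-1]]

# numeric_only=True skips text columns such as 'Stock Name' (needed on pandas 2.x)
correlation_matrix = df_auto[columns_to_include].corr(numeric_only=True)
# Last column of the matrix, without the target's correlation with itself
correlation_with_last_column_target = correlation_matrix.iloc[:-1, -1]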
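
When only the correlation with a single target is needed, `DataFrame.corrwith` computes it directly without building the full matrix. The sketch below uses synthetic data in which the relationships are known by construction; the feature names are made up.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
signal = rng.normal(size=n)

# Synthetic stand-in for a sector DataFrame (not real NSE data)
toy = pd.DataFrame({
    "Stock Name": [f"S{i}" for i in range(n)],
    "Feature A": signal,                 # moves exactly with the target
    "Feature B": -signal,                # moves exactly against the target
    "Feature C": rng.normal(size=n),     # unrelated noise
    "Returns (%)": 2.0 * signal + 1.0,
})

# Keep numeric columns only, then separate features from the target
numeric = toy.select_dtypes(include="number")
features = numeric.drop(columns=["Returns (%)"])
corr_with_returns = features.corrwith(numeric["Returns (%)"])

print(corr_with_returns.round(3))
```

The result is a Series indexed by feature name, which can be passed to a bar plot without a separate list of column names.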


**Code Description:**

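- **Preparing Plot Data:** The feature names and their correlation with the target are collected in a DataFrame, `plot_data`.
- **Creating Bar Plot:** `px.bar` draws one bar per feature, with the x-axis labels rotated by 45 degrees for readability.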

**Significance:**

- **Target Variable Analysis:** Understanding the correlation of independent parameters with the target variable ('Returns') for the year 2022 is crucial for predictive modeling and decision-making in the financial domain. Identifying parameters that exhibit strong correlations can inform investment strategies and risk management practices.
- **Insight Generation:** Visualizing correlations helps uncover patterns and relationships within the dataset, enabling data-driven insights and informed decision-making processes.
- **Predictive Modeling:** Parameters with significant correlations can serve as predictive features in machine learning models, enhancing their accuracy and predictive power. By incorporating correlated parameters, models can better forecast future returns or outcomes.

**Next Steps:**

- **Interpreting Correlation:** Further analysis of the correlation plot allows for the interpretation of relationships between independent parameters and the target variable. Identifying parameters with high positive or negative correlations can guide feature selection and model development efforts.
- **Refinement of Analysis:** Continual refinement of analysis techniques, such as exploring non-linear relationships and conducting feature importance assessments, contributes to a deeper understanding of the dataset and improves the efficacy of predictive modeling efforts.


In [9]:
plot_data = pd.DataFrame({
    # Names come from the correlation Series so that names and values stay aligned
    'Columns': correlation_with_last_column_target.index,
    'Correlation with Last Column': correlation_with_last_column_target.values
})

fig = px.bar(plot_data, x='Columns', y='Correlation with Last Column',
                 title='Correlation of Columns with Target-> Returns '+year,
                 labels={'Columns': 'Column Names', 'Correlation with Last Column': 'Correlation'},
                 width=800, height=600)

fig.update_xaxes(tickangle=45, tickfont=dict(size=10))

fig.show()
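
Pearson correlation only measures linear association. One simple check for monotonic but non-linear relationships is Spearman correlation, which is Pearson correlation computed on ranks. The sketch below uses synthetic data, and `pearson_and_spearman` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

x = np.linspace(0.1, 5.0, 50)

# Synthetic data: one feature is linear in the target, the other is exponential
toy = pd.DataFrame({
    "Linear feature": x,
    "Exponential feature": np.exp(x),
    "Returns (%)": x,
})


def pearson_and_spearman(df, target):
    """Correlation of every numeric feature with `target`, by both methods."""
    numeric = df.select_dtypes(include="number")
    pearson = numeric.drop(columns=[target]).corrwith(numeric[target])
    # Spearman correlation equals Pearson correlation of the ranked values
    ranked = numeric.rank()
    spearman = ranked.drop(columns=[target]).corrwith(ranked[target])
    return pd.DataFrame({"pearson": pearson, "spearman": spearman})


table = pearson_and_spearman(toy, "Returns (%)")
print(table.round(3))
```

A feature whose Spearman value is noticeably larger in magnitude than its Pearson value is a candidate for a non-linear transformation before modeling.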

- **Target Variable Analysis:** Understanding the correlation of independent parameters with the second target variable ('Next Year Close') for the year 2022 is crucial for predictive modeling and decision-making in the financial domain. Identifying parameters that exhibit strong correlations can inform investment strategies and risk management practices.

In [10]:
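# Feature columns plus the 'Next Year Close (X)' target; 'Returns (%)' is left out
columns_to_include = df_auto.columns[2:-1].tolist()

# numeric_only=True skips text columns such as 'Stock Name' (needed on pandas 2.x)
correlation_matrix = df_auto[columns_to_include].corr(numeric_only=True)
correlation_with_last_column_target = correlation_matrix.iloc[:-1, -1]

plot_data = pd.DataFrame({
    # Names come from the correlation Series so that names and values stay aligned
    'Columns': correlation_with_last_column_target.index,
    'Correlation with Last Column': correlation_with_last_column_target.values
})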

fig = px.bar(plot_data, x='Columns', y='Correlation with Last Column',
                 title='Correlation of Columns with Target-> Next year\'s close '+year,
                 labels={'Columns': 'Column Names', 'Correlation with Last Column': 'Correlation'},
                 width=800, height=600)

fig.update_xaxes(tickangle=45, tickfont=dict(size=10))

fig.show()

# 2021

**Relevance to Dataset:**

- **Sector-Specific Data:** By loading the data from the CSV file corresponding to the "Auto" sector and the specified year, this code snippet focuses on a specific segment of the dataset.
- **Temporal Context:** Loading data for a particular year ('2021') allows for analysis of trends and patterns within the automotive sector during that period.

**Next Steps:**

- **Exploratory Data Analysis (EDA):** Further analysis can be performed on `df_auto` to uncover insights, trends, and anomalies within the automotive sector for the specified year.
- **Feature Engineering:** Additional features may be derived or engineered from the existing dataset to enhance analysis and modeling efforts.
- **Visualization:** Visual representations such as plots, charts, and graphs can be utilized to present key findings and communicate insights effectively.


In [11]:
year = '2021'
directory = '/kaggle/input/nse-fundamental-insights-fundamentals-2019-2022/mmdp dataset/'+year

df_auto = pd.read_csv(directory+'/Auto_ '+year+'.csv')
df_auto.head()

Unnamed: 0.1,Unnamed: 0,Stock Name,Price Earnings to Growth ratio (X),Industry Relative Price Earnings to Growth ratio (%),Special Ratio 1 (X),Profit Growth (%),Industry Relative Profit Growth (%),Basic EPS (Rs.),Diluted EPS (Rs.),Special Ratio 2 (X),...,Industry Relative EV/EBITDA (%),Industry Relative MarketCap/Net Operating Revenue (%),Industry Relative Retention Ratios (%),Industry Relative Price/BV (%),Industry Relative Price/Net Operating Revenue (%),Industry Relative Earnings Yield (%),Industry Relative Close (%),Industry Relative PE ratio (%),Next Year Close (X),Returns (%)
0,0,ASHOKLEY,-0.6526,-113.5911,66.27,-28.733,422.799,-1.0035,-0.56,-0.56,...,100.3467,-25.6198,-100.0,38.6074,-25.6198,-100.0,-91.816,-10.0,116.431,0.0433
1,1,ATULAUTO,-0.0635,-2095.3063,134.87,1340.6593,-24493.365,-20.0,-3.73,-3.73,...,-777.4062,-42.1488,-100.0,-57.1755,-42.1488,900.0,-86.892,-10.0,161.45,-0.0968
2,2,BAJAJ-AUTO,-2.7801,49.8572,958.67,-6.7569,22.9418,-0.7384,167.9,167.9,...,73.8377,66.5942,-100.0,27.4926,66.5942,-2600.0,150.7552,10.0,3520.766,0.0296
3,3,EICHERMOT,-0.0414,-3263.1971,319.08,-1120.2299,20282.6409,-1.9884,49.3,49.24,...,198.8377,254.9369,-100.0,103.3344,254.9369,-1100.0,88.5155,10.0,2440.905,-0.0505
4,4,ESCORTS,0.3463,502.5462,-50218.7672,35.0513,-737.7607,-0.2714,92.15,91.98,...,27.1411,7.873,290.5321,13.1089,7.873,-2600.0,-5.7614,10.0,1690.85,0.3157


In [12]:
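# Feature columns plus the 'Returns (%)' target; 'Next Year Close (X)' is left out
columns_to_include = df_auto.columns[2:-2].tolist() + [df_auto.columns[-1]]

# numeric_only=True skips text columns such as 'Stock Name' (needed on pandas 2.x)
correlation_matrix = df_auto[columns_to_include].corr(numeric_only=True)
correlation_with_last_column_target = correlation_matrix.iloc[:-1, -1]
plot_data = pd.DataFrame({
    # Names come from the correlation Series so that names and values stay aligned
    'Columns': correlation_with_last_column_target.index,
    'Correlation with Last Column': correlation_with_last_column_target.values
})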

fig = px.bar(plot_data, x='Columns', y='Correlation with Last Column',
                 title='Correlation of Columns with Target-> Returns '+year,
                 labels={'Columns': 'Column Names', 'Correlation with Last Column': 'Correlation'},
                 width=800, height=600,
             color_discrete_sequence=['green'])

fig.update_xaxes(tickangle=45, tickfont=dict(size=10))

fig.show()

In [13]:
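# Feature columns plus the 'Next Year Close (X)' target; 'Returns (%)' is left out
columns_to_include = df_auto.columns[2:-1].tolist()

# numeric_only=True skips text columns such as 'Stock Name' (needed on pandas 2.x)
correlation_matrix = df_auto[columns_to_include].corr(numeric_only=True)
correlation_with_last_column_target = correlation_matrix.iloc[:-1, -1]

plot_data = pd.DataFrame({
    # Names come from the correlation Series so that names and values stay aligned
    'Columns': correlation_with_last_column_target.index,
    'Correlation with Last Column': correlation_with_last_column_target.values
})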

fig = px.bar(plot_data, x='Columns', y='Correlation with Last Column',
                 title='Correlation of Columns with Target-> Next year\'s close '+year,
                 labels={'Columns': 'Column Names', 'Correlation with Last Column': 'Correlation'},
                 width=800, height=600,
             color_discrete_sequence=['green'])

fig.update_xaxes(tickangle=45, tickfont=dict(size=10))

fig.show()

# 2020

**Relevance to Dataset:**

- **Sector-Specific Data:** By loading the data from the CSV file corresponding to the "Auto" sector and the specified year, this code snippet focuses on a specific segment of the dataset.
- **Temporal Context:** Loading data for a particular year ('2020') allows for analysis of trends and patterns within the automotive sector during that period.

**Next Steps:**

- **Exploratory Data Analysis (EDA):** Further analysis can be performed on `df_auto` to uncover insights, trends, and anomalies within the automotive sector for the specified year.
- **Feature Engineering:** Additional features may be derived or engineered from the existing dataset to enhance analysis and modeling efforts.
- **Visualization:** Visual representations such as plots, charts, and graphs can be utilized to present key findings and communicate insights effectively.


In [14]:
year = '2020'
directory = '/kaggle/input/nse-fundamental-insights-fundamentals-2019-2022/mmdp dataset/'+year

df_auto = pd.read_csv(directory+'/Auto_ '+year+'.csv')
df_auto.head()

Unnamed: 0.1,Unnamed: 0,Stock Name,Price Earnings to Growth ratio (X),Industry Relative Price Earnings to Growth ratio (%),Special Ratio 1 (X),Profit Growth (%),Industry Relative Profit Growth (%),Basic EPS (Rs.),Diluted EPS (Rs.),Special Ratio 2 (X),...,Industry Relative EV/EBITDA (%),Industry Relative MarketCap/Net Operating Revenue (%),Industry Relative Retention Ratios (%),Industry Relative Price/BV (%),Industry Relative Price/Net Operating Revenue (%),Industry Relative Earnings Yield (%),Industry Relative Close (%),Industry Relative PE ratio (%),Next Year Close (X),Returns (%)
0,0,ASHOKLEY,-0.0714,-862.4059,1375.2042,-50.7909,-94.1761,-0.1851,1.15,1.15,...,18.5058,-43.6893,535.2952,-0.4302,-43.6893,-236.3636,-94.3181,10.0,111.603,1.6365
1,1,ATULAUTO,-2.0799,66.9688,-25352.529,-18.8663,-97.8367,0.4075,24.42,24.42,...,-40.7471,-51.4563,-302.1456,-37.9226,-51.4563,-872.7273,-81.2951,10.0,178.75,0.2827
2,2,BAJAJ-AUTO,1.903,136.1005,-36590.4288,1.3026,-100.1494,-0.2595,180.2,180.2,...,25.9482,90.2913,-176.0067,65.9496,90.2913,-509.0909,152.8873,10.0,3419.474,0.815
3,3,EICHERMOT,-0.0935,-634.5672,-228370.1056,-22.9597,-97.3673,-0.7304,669.52,669.19,...,73.0356,278.6408,-242.6977,120.0369,278.6408,-327.2727,73.5431,10.0,2570.729,0.9884
4,4,ESCORTS,-8.4581,91.8776,-45170.294,-7.4839,-99.1419,-0.4742,55.04,55.04,...,47.4166,35.9223,-317.4062,60.4179,35.9223,-372.7273,-11.4441,10.0,1285.103,0.9479


In [15]:
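# Feature columns plus the 'Returns (%)' target; 'Next Year Close (X)' is left out
columns_to_include = df_auto.columns[2:-2].tolist() + [df_auto.columns[-1]]

# numeric_only=True skips text columns such as 'Stock Name' (needed on pandas 2.x)
correlation_matrix = df_auto[columns_to_include].corr(numeric_only=True)
correlation_with_last_column_target = correlation_matrix.iloc[:-1, -1]
plot_data = pd.DataFrame({
    # Names come from the correlation Series so that names and values stay aligned
    'Columns': correlation_with_last_column_target.index,
    'Correlation with Last Column': correlation_with_last_column_target.values
})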

fig = px.bar(plot_data, x='Columns', y='Correlation with Last Column',
                 title='Correlation of Columns with Target-> Returns '+year,
                 labels={'Columns': 'Column Names', 'Correlation with Last Column': 'Correlation'},
                 width=800, height=600,
             color_discrete_sequence=['red'])

fig.update_xaxes(tickangle=45, tickfont=dict(size=10))

fig.show()

In [16]:
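# Feature columns plus the 'Next Year Close (X)' target; 'Returns (%)' is left out
columns_to_include = df_auto.columns[2:-1].tolist()

# numeric_only=True skips text columns such as 'Stock Name' (needed on pandas 2.x)
correlation_matrix = df_auto[columns_to_include].corr(numeric_only=True)
correlation_with_last_column_target = correlation_matrix.iloc[:-1, -1]

plot_data = pd.DataFrame({
    # Names come from the correlation Series so that names and values stay aligned
    'Columns': correlation_with_last_column_target.index,
    'Correlation with Last Column': correlation_with_last_column_target.values
})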

fig = px.bar(plot_data, x='Columns', y='Correlation with Last Column',
                 title='Correlation of Columns with Target-> Next year\'s close '+year,
                 labels={'Columns': 'Column Names', 'Correlation with Last Column': 'Correlation'},
                 width=800, height=600,
             color_discrete_sequence=['red'])

fig.update_xaxes(tickangle=45, tickfont=dict(size=10))

fig.show()

# 2019

**Relevance to Dataset:**

- **Sector-Specific Data:** By loading the data from the CSV file corresponding to the "Auto" sector and the specified year, this code snippet focuses on a specific segment of the dataset.
- **Temporal Context:** Loading data for a particular year ('2019') allows for analysis of trends and patterns within the automotive sector during that period.

**Next Steps:**

- **Exploratory Data Analysis (EDA):** Further analysis can be performed on `df_auto` to uncover insights, trends, and anomalies within the automotive sector for the specified year.
- **Feature Engineering:** Additional features may be derived or engineered from the existing dataset to enhance analysis and modeling efforts.
- **Visualization:** Visual representations such as plots, charts, and graphs can be utilized to present key findings and communicate insights effectively.


In [17]:
year = '2019'
directory = '/kaggle/input/nse-fundamental-insights-fundamentals-2019-2022/mmdp dataset/'+year

df_auto = pd.read_csv(directory+'/Auto_ '+year+'.csv')
df_auto.head()

Unnamed: 0.1,Unnamed: 0,Stock Name,Price Earnings to Growth ratio (X),Industry Relative Price Earnings to Growth ratio (%),Special Ratio 1 (X),Profit Growth (%),Industry Relative Profit Growth (%),Basic EPS (Rs.),Diluted EPS (Rs.),Special Ratio 2 (X),...,Industry Relative EV/EBITDA (%),Industry Relative MarketCap/Net Operating Revenue (%),Industry Relative Retention Ratios (%),Industry Relative Price/BV (%),Industry Relative Price/Net Operating Revenue (%),Industry Relative Earnings Yield (%),Industry Relative Close (%),Industry Relative PE ratio (%),Next Year Close (X),Returns (%)
0,0,ASHOKLEY,0.7512,129.0189,-7666.3711,11.4219,-874.8922,1.0169,7.08,7.08,...,-101.6914,-61.5932,25.1441,3.6234,-61.5932,-335.2941,-92.9682,10.0,42.33,-0.4684
1,1,ATULAUTO,1.0265,121.2382,-24305.7614,14.9776,-1116.1225,1.0172,25.09,25.09,...,-101.7245,-46.4201,68.7727,-2.4721,-46.4201,-305.8824,-70.4512,10.0,139.35,-0.5836
2,2,BAJAJ-AUTO,1.0496,120.7705,-68788.8313,8.264,-660.6479,1.0264,170.3,170.3,...,-102.6373,31.816,38.2284,22.9258,31.816,-276.4706,127.069,10.0,1883.99,-0.2673
3,3,EICHERMOT,0.2131,202.2847,-312514.926,7.6643,-619.9688,1.0329,807.76,806.86,...,-103.2858,171.2186,84.3757,112.6651,171.2186,-217.6471,65.7509,10.0,1292.882,-0.3112
4,4,ESCORTS,0.5531,139.4167,-48418.363,24.2347,-1744.1448,1.0248,55.82,55.82,...,-102.4783,-26.0313,102.7748,23.6031,-26.0313,-247.0588,-30.6499,10.0,659.734,-0.16


In [18]:
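# Feature columns plus the 'Returns (%)' target; 'Next Year Close (X)' is left out
columns_to_include = df_auto.columns[2:-2].tolist() + [df_auto.columns[-1]]

# numeric_only=True skips text columns such as 'Stock Name' (needed on pandas 2.x)
correlation_matrix = df_auto[columns_to_include].corr(numeric_only=True)
correlation_with_last_column_target = correlation_matrix.iloc[:-1, -1]
plot_data = pd.DataFrame({
    # Names come from the correlation Series so that names and values stay aligned
    'Columns': correlation_with_last_column_target.index,
    'Correlation with Last Column': correlation_with_last_column_target.values
})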

fig = px.bar(plot_data, x='Columns', y='Correlation with Last Column',
                 title='Correlation of Columns with Target-> Returns '+year,
                 labels={'Columns': 'Column Names', 'Correlation with Last Column': 'Correlation'},
                 width=800, height=600,
             color_discrete_sequence=['orange'])

fig.update_xaxes(tickangle=45, tickfont=dict(size=10))

fig.show()

In [19]:
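# Feature columns plus the 'Next Year Close (X)' target; 'Returns (%)' is left out
columns_to_include = df_auto.columns[2:-1].tolist()

# numeric_only=True skips text columns such as 'Stock Name' (needed on pandas 2.x)
correlation_matrix = df_auto[columns_to_include].corr(numeric_only=True)
correlation_with_last_column_target = correlation_matrix.iloc[:-1, -1]

plot_data = pd.DataFrame({
    # Names come from the correlation Series so that names and values stay aligned
    'Columns': correlation_with_last_column_target.index,
    'Correlation with Last Column': correlation_with_last_column_target.values
})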

fig = px.bar(plot_data, x='Columns', y='Correlation with Last Column',
                 title='Correlation of Columns with Target-> Next year\'s close '+year,
                 labels={'Columns': 'Column Names', 'Correlation with Last Column': 'Correlation'},
                 width=800, height=600,
             color_discrete_sequence=['orange'])

fig.update_xaxes(tickangle=45, tickfont=dict(size=10))

fig.show()
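
The per-year cells above repeat the same steps, so the correlations can also be gathered into one table with a column per year, which makes year-to-year changes easier to compare. The sketch below runs on synthetic frames; `correlations_by_year` is a hypothetical helper, and with the real data each frame would come from `pd.read_csv` on the corresponding year's file, with the unused target column dropped first.

```python
import numpy as np
import pandas as pd


def correlations_by_year(frames, target):
    """frames maps year -> DataFrame; returns features as rows and years as columns."""
    result = {}
    for year, df in frames.items():
        numeric = df.select_dtypes(include="number")
        result[year] = numeric.drop(columns=[target]).corrwith(numeric[target])
    return pd.DataFrame(result)


rng = np.random.default_rng(42)
frames = {}
# Synthetic years: 'Feature A' tracks returns in 2019 and opposes them in 2020
for year, sign in [("2019", 1.0), ("2020", -1.0)]:
    driver = rng.normal(size=100)
    frames[year] = pd.DataFrame({
        "Stock Name": [f"S{i}" for i in range(100)],
        "Feature A": driver,
        "Feature B": rng.normal(size=100),
        "Returns (%)": sign * driver,
    })

table = correlations_by_year(frames, "Returns (%)")
print(table.round(3))
```

A feature whose sign or magnitude changes sharply between columns has an unstable relationship with the target.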

**Conclusion:**

This notebook listed the sector files in the NSE Fundamental Dataset, counted the stocks in each sector, and, for the Auto sector, plotted the correlation of each fundamental feature with the two targets (`Returns (%)` and `Next Year Close (X)`) for every year from 2019 to 2022.

Comparing these correlation plots across years shows which fundamental factors move with returns and whether those relationships hold over time. Correlation captures only linear association, so the plots are a starting point for feature selection rather than proof of predictive power.

The same steps can be repeated for the other sectors. From there, the dataset supports deeper work such as exploring non-linear relationships, assessing feature importance, and building predictive models that inform investment decisions.