# NIFTY - 50 | Exploratory Data Analysis

NIFTY is a market index introduced by the National Stock Exchange (NSE). The name is a blend of National Stock Exchange and Fifty, coined by NSE on 21st April 1996. NIFTY 50 is a benchmark index and the flagship index of NSE, which tracks the top 50 equity stocks out of a total of 1600 stocks traded on the exchange.

These stocks span 12 sectors of the Indian economy: information technology, financial services, consumer goods, entertainment and media, metals, pharmaceuticals, telecommunications, cement and its products, automobiles, pesticides and fertilizers, energy, and other services.

NIFTY is one of the two national indices, the other being SENSEX, a product of the Bombay Stock Exchange. NIFTY is owned by India Index Services and Products (IISL), which is a fully-owned subsidiary of the National Stock Exchange Strategic Investment Corporation Limited.

NIFTY 50 follows the trends and patterns of blue-chip companies, i.e. the most liquid and largest Indian securities.

NIFTY contains a host of indices – NIFTY 50, NIFTY IT, NIFTY Bank, and NIFTY Next 50; and is a part of the Futures and Options (F&O) segment of NSE which deals in derivatives.

### **How is NIFTY for Share Market Calculated?**

The NIFTY share index is managed by a team of professionals at the NSE Indices Limited. It formed an Index Advisory Committee that offers its expertise and guidance on large-scale issues pertinent to equity indices.

NIFTY 50 is computed using a float-adjusted, market capitalisation weighted method. In this method, the index level reflects the aggregate market value of the stocks in the index relative to a specific base period. For NIFTY 50, the base period is 3rd November 1995, with a base index value of 1000 and a base capital of Rs. 2.06 trillion.

The formula for calculating price index is listed below –

Index value = (Current market value / Base market capital) * 1000
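As a rough worked example, the calculation can be sketched in Python as the current market value divided by the base market capital, scaled by the base value of 1000. The function name `index_value` and the current market value used below are assumptions made purely for illustration; only the base value and the base capital come from the text above.

```python
# Hypothetical helper illustrating the market-capitalisation weighted formula.
BASE_VALUE = 1000            # base index value (3rd November 1995)
BASE_MARKET_CAP = 2.06e12    # base market capital in rupees (Rs. 2.06 trillion)


def index_value(current_market_value, base_market_cap=BASE_MARKET_CAP, base_value=BASE_VALUE):
    """Return the index level for a given float-adjusted market value."""
    return current_market_value / base_market_cap * base_value


# Assumed current market value (ten times the base capital), for illustration only
example_value = index_value(20.6e12)
print(round(example_value, 2))
```

With a market value equal to the base capital, the function returns the base value of 1000, which is a quick sanity check on the formula.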

The methodology used to calculate the indices also accounts for corporate actions such as rights issues and stock splits.

The NIFTY share market index is a benchmark standard against which all equity markets in India are measured. Therefore, NSE conducts regular index maintenance to ensure that it remains stable and persists as the benchmark in the Indian stock market context.

## Description About The DataSet Chosen

The dataset consists of 7 files. Let's quickly understand what those are:

* **NIFTY 50** - This represents the first 50 companies based on full market capitalisation from the eligible universe.

* **NIFTY SECTORAL INDICES** - This includes NIFTY AUTO, NIFTY BANK, NIFTY FMCG, NIFTY IT, NIFTY METAL, and NIFTY PHARMA. These indices are designed to reflect the behavior and performance of the segments they represent, i.e. automobiles, banking, pharma, etc.

## Objective of The Notebook

The objective of this notebook is to explore NIFTY-50 data along with the sectoral indices and visualise them to obtain important information.

## STEP 1 - Importing Necessary Python Libraries

In [1]:
import numpy as np
import pandas as pd

!pip install chart_studio --upgrade -q
!pip install cufflinks --upgrade -q
import plotly.express as px
import chart_studio.plotly as py
import plotly.graph_objs as go
from plotly.offline import iplot
import cufflinks
cufflinks.go_offline()
cufflinks.set_config_file(world_readable=True, theme='pearl')

from datetime import datetime

## STEP 2 - Data Preparation and Cleaning

### Data Preparation

We will now import the NIFTY 50 data.

In [2]:
nifty_50 = pd.read_csv("NIFTY_50.csv", parse_dates = ["Date"])

Let us now peek into the dataset we uploaded.

In [3]:
nifty_50.head()

| | Date | Open | High | Low | Close | Volume | Turnover | P/E | P/B | Div Yield |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2000-01-03 | 1482.15 | 1592.9 | 1482.15 | 1592.2 | 25358322.0 | 8841500000.0 | 25.91 | 4.63 | 0.95 |
| 1 | 2000-01-04 | 1594.4 | 1641.95 | 1594.4 | 1638.7 | 38787872.0 | 19736900000.0 | 26.67 | 4.76 | 0.92 |
| 2 | 2000-01-05 | 1634.55 | 1635.5 | 1555.05 | 1595.8 | 62153431.0 | 30847900000.0 | 25.97 | 4.64 | 0.95 |
| 3 | 2000-01-06 | 1595.8 | 1639.0 | 1595.8 | 1617.6 | 51272875.0 | 25311800000.0 | 26.32 | 4.7 | 0.94 |
| 4 | 2000-01-07 | 1616.6 | 1628.25 | 1597.2 | 1613.3 | 54315945.0 | 19146300000.0 | 26.25 | 4.69 | 0.94 |


In [4]:
nifty_50.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5116 entries, 0 to 5115
Data columns (total 10 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   Date       5116 non-null   datetime64[ns]
 1   Open       5116 non-null   float64       
 2   High       5116 non-null   float64       
 3   Low        5116 non-null   float64       
 4   Close      5116 non-null   float64       
 5   Volume     5115 non-null   float64       
 6   Turnover   5115 non-null   float64       
 7   P/E        5116 non-null   float64       
 8   P/B        5116 non-null   float64       
 9   Div Yield  5116 non-null   float64       
dtypes: datetime64[ns](1), float64(9)
memory usage: 399.8 KB


Now that our data has been converted into the desired format, let’s take a look at its various columns for further analysis.

* **The Open and Close columns** indicate the opening and closing price of the stocks on a particular day.
* **The High and Low columns** provide the highest and the lowest price for the stock on a particular day, respectively.
* **The Volume column** tells us the total volume of stocks traded on a particular day.
* **The Turnover column** refers to the total value of stocks traded during a specific period of time. The period may be annual, quarterly, monthly or daily; in this dataset each row covers one trading day.
* **P/E**, also called the price-earnings ratio, relates a company's share price to its earnings per share.
* **P/B**, also called the price-to-book ratio, measures the market's valuation of a company relative to its book value.
* **Div Yield** or the dividend yield is the amount of money a company pays shareholders (over the course of a year) for owning a share of its stock divided by its current stock price—displayed as a percentage.  
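To make the three ratio definitions above concrete, here is a minimal sketch for a single hypothetical company. The function name `valuation_ratios` and every input number are assumptions chosen for illustration; the index-level figures in the dataset are published values and are not recomputed here.

```python
def valuation_ratios(price, earnings_per_share, book_value_per_share, annual_dividend_per_share):
    """Compute P/E, P/B and dividend yield (in percent) for one stock."""
    return {
        "P/E": price / earnings_per_share,
        "P/B": price / book_value_per_share,
        "Div Yield": annual_dividend_per_share / price * 100,
    }


# Hypothetical inputs: share price 200, EPS 8, book value per share 50, annual dividend 2
ratios = valuation_ratios(
    price=200.0,
    earnings_per_share=8.0,
    book_value_per_share=50.0,
    annual_dividend_per_share=2.0,
)
print(ratios)
```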

### Data Cleaning

We will now look into the missing values of the dataset (if any).

In [5]:
nifty_50.isnull().sum()

Date         0
Open         0
High         0
Low          0
Close        0
Volume       1
Turnover     1
P/E          0
P/B          0
Div Yield    0
dtype: int64

As we can see, the columns "Volume" and "Turnover" have 1 null value each. We can do one of two things here: either fill the missing values with zero, or carry the previous row's value forward into the empty cell (forward fill). We will go with the second option and fill the missing values with the previous ones.
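The difference between the two options can be seen on a tiny made-up frame. The sketch below uses a hypothetical three-row `toy` DataFrame rather than the real data: filling with zero would record a day with no trading at all, while forward fill reuses the previous day's figure.

```python
import numpy as np
import pandas as pd

# Toy data with one missing Volume value (not taken from the real dataset)
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
    "Volume": [100.0, np.nan, 120.0],
})

filled_with_zero = toy["Volume"].fillna(0)   # option 1: replace NaN with 0
filled_forward = toy["Volume"].ffill()       # option 2: carry the previous value forward

print(filled_with_zero.tolist())
print(filled_forward.tolist())
```

For a daily volume series, a zero would show up as an artificial dip in any plot, which is one reason to prefer forward fill when only a single value is missing.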

In [6]:
# Forward-fill missing values (fillna(method="pad") is deprecated in recent pandas)
nifty_50.ffill(inplace=True)

Now let's check the dataset again.

In [7]:
nifty_50.isnull().sum()

Date         0
Open         0
High         0
Low          0
Close        0
Volume       0
Turnover     0
P/E          0
P/B          0
Div Yield    0
dtype: int64

As we can see, there are no missing values now. We will now import the Sectoral Datasets and clean them.

In [8]:
# Importing
nifty_auto = pd.read_csv("NIFTY_AUTO.csv",parse_dates=["Date"])
nifty_bank = pd.read_csv("NIFTY_BANK.csv",parse_dates=["Date"])
nifty_fmcg = pd.read_csv("NIFTY_FMCG.csv",parse_dates=["Date"])
nifty_IT = pd.read_csv("NIFTY_IT.csv",parse_dates=["Date"])
nifty_metal = pd.read_csv("NIFTY_METAL.csv",parse_dates=["Date"])
nifty_pharma = pd.read_csv("NIFTY_PHARMA.csv",parse_dates=["Date"])

# Cleaning: forward-fill missing values in each sectoral dataset
nifty_auto.ffill(inplace=True)
nifty_bank.ffill(inplace=True)
nifty_fmcg.ffill(inplace=True)
nifty_IT.ffill(inplace=True)
nifty_metal.ffill(inplace=True)
nifty_pharma.ffill(inplace=True)

Now we are ready for data visualisation.

## STEP 3 - Visualization

Let's first check the **NIFTY - 50 Trend** over the years.

In [9]:
fig = go.Figure(data=[go.Candlestick(x=nifty_50['Date'],
                open=nifty_50['Open'],
                high=nifty_50['High'],
                low=nifty_50['Low'],
                close=nifty_50['Close'])])
fig.update_layout(title_text='NIFTY - 50 Trend (2000 - 2020)',plot_bgcolor='rgb(250, 242, 242)',yaxis_title='Value')

fig.show()

Now let's check the P/E and P/B Ratios.

In [10]:
fig = go.Figure()
fig.add_trace(go.Scatter(
         x=nifty_50['Date'],
         y=nifty_50['P/E'],
         name='P/E Ratio',
    line=dict(color='green'),
    opacity=0.8))

fig.add_trace(go.Scatter(
         x=nifty_50['Date'],
         y=nifty_50['P/B'],
         name='P/B Ratio',
    line=dict(color='red'),
    opacity=0.8))
        
    
fig.update_layout(title_text='P/E vs P/B Ratio',plot_bgcolor='rgb(250, 242, 242)',yaxis_title='Value')

fig.show()

Now we will look at the **Dividend Yield** over the years.

In [11]:
fig = go.Figure()
fig.add_trace(go.Scatter(
         x=nifty_50['Date'],
         y=nifty_50['Div Yield'],
         name='Dividend Yield',
    line=dict(color='blue'),
    opacity=0.8))
    
fig.update_layout(title_text='NIFTY 50 Dividend Yield',plot_bgcolor='rgb(250, 242, 242)',yaxis_title='Value')

fig.show()

## STEP 4 - Analysis

On observing the trend plot carefully, we find there have been 2 major falls in NIFTY in the last 20 years.

* **2008 - 2010** - This can be attributed to the Great Recession of 2008
* **2020** - This can be attributed to COVID-19 pandemic

We will now closely look into the 2 major falls.

### The Great Recession of 2008

Let's see the period 2008 - 2010 closely.

In [12]:
nifty_50_2008_to_2010 = nifty_50[(nifty_50['Date'] >= '2008-01-01') & (nifty_50['Date'] <= '2010-12-31')]
fig = go.Figure()
fig.add_trace(go.Scatter(
         x=nifty_50_2008_to_2010['Date'],
         y=nifty_50_2008_to_2010['Low'],
         name='Price',
    line=dict(color='red'),
    opacity=1))
        
    
fig.update_layout(title_text="NIFTY-50 Trend (2008 - 2010)", xaxis_title = 'Year', yaxis_title='Value')

fig.show()

It is evident from the graph that the major fall was between July 2008 and July 2009.
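One way to put a number on a fall like this, beyond reading it off the graph, is the maximum drawdown: the largest percentage drop from a running peak to a later trough. The helper `max_drawdown` and the toy price series below are assumptions for illustration, not part of the original notebook.

```python
import pandas as pd


def max_drawdown(prices: pd.Series) -> float:
    """Return the largest peak-to-trough decline as a negative fraction."""
    running_peak = prices.cummax()            # highest value seen so far
    drawdown = prices / running_peak - 1.0    # fractional drop from that peak
    return drawdown.min()


# Toy price path: rises to 120, falls to 60, then partially recovers
toy_close = pd.Series([100.0, 120.0, 90.0, 60.0, 80.0, 110.0])
worst = max_drawdown(toy_close)
print(f"{worst:.0%}")
```

The same function could be applied to a column such as `nifty_50_2008_to_2010['Low']` to quantify the fall over the period plotted above.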

According to The Indian Express, the major causes of the crash were:

* Mortgage Crisis
* Credit Crisis
* Bank Collapse
* Government bailout

### COVID-19

Let's now analyze the NIFTY trend of the past 2 years.

In [13]:
nifty_50_2019_2020 = nifty_50[(nifty_50['Date'] >= '2019-01-01')]
fig = go.Figure()
fig.add_trace(go.Scatter(
         x=nifty_50_2019_2020['Date'],
         y=nifty_50_2019_2020['High'],
         name='High Price',
    line=dict(color='green'),
    opacity=1))

fig.add_trace(go.Scatter(
         x=nifty_50_2019_2020['Date'],
         y=nifty_50_2019_2020['Low'],
         name='Low Price',
    line=dict(color='red'),
    opacity=1))
        
    
fig.update_layout(title_text="NIFTY-50 Trend 2019 - Present",xaxis_title = 'Year', yaxis_title='Value')

fig.show()

We see in the above graph, there is a sharp fall in March 2020. This may be attributed to the beginning of lockdown.

## Asking and Answering Questions

Now we will answer some questions related to our dataset.

**Q1** How did the following news impact the market?
* Lockdown announced
* Coronavirus declared Pandemic by WHO
* 2019 General Elections
* Union Budget
* Cut in the corporate tax rate announced

**Answer** We will now try to link these events with the market. The following steps were taken to arrive at the result -
* The dates of these announcements were looked up on the internet and noted.
* The time series was plotted with each date marked.

In [14]:
fig = px.line(nifty_50_2019_2020, x='Date', y='Close', title='Time Series')

fig.update_layout(title='NIFTY_50 : Major single day gains -2019 onwards',
    yaxis_title='NIFTY 50 Stock',
    shapes = [dict(x0='2020-03-23', x1='2020-03-23', y0=0, y1=1, xref='x', yref='paper', line_width=2,opacity=0.3,line_color='red',editable=False),
              dict(x0='2020-03-12', x1='2020-03-12', y0=0, y1=1, xref='x', yref='paper',line_width=3,opacity=0.3,line_color='red'),
              dict(x0='2019-09-03', x1='2019-09-03', y0=0, y1=1, xref='x', yref='paper',line_width=3,opacity=0.3,line_color='green'),
              dict(x0='2020-02-01', x1='2020-02-01', y0=0, y1=1, xref='x', yref='paper',line_width=3,opacity=0.3,line_color='red'),
              dict(x0='2019-09-20', x1='2019-09-20', y0=0, y1=1, xref='x', yref='paper',line_width=3,opacity=0.3,line_color='green')],
    annotations=[dict(x='2020-03-23', y=0.5, xref='x', yref='paper',
                    showarrow=False, xanchor='left', text='Lockdown announced'),
                 dict(x='2020-03-12', y=0.3, xref='x', yref='paper',
                    showarrow=False, xanchor='right', text='Coronavirus declared Pandemic by WHO'),
                 dict(x='2019-09-03', y=0.08, xref='x', yref='paper',
                    showarrow=False, xanchor='left', text='2019 General Elections'),
                 dict(x='2020-02-01', y=0.5, xref='x', yref='paper',
                    showarrow=False, xanchor='right', text='Union Budget'),
                 dict(x='2019-09-20', y=0.54, xref='x', yref='paper',
                    showarrow=False, xanchor='left', text='cut in the corporate tax rate announced')]
)
fig.show()

We see above that the market rose around the elections and the corporate tax cut, but fell around the other three events.
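The visual reading can be cross-checked by computing the close-to-close percentage change on each event date. The sketch below is one possible way to do that; the function `event_day_returns`, the event labels and the toy prices are all assumptions for illustration. Dates that are not trading days in the data are simply skipped.

```python
import pandas as pd


def event_day_returns(frame, events):
    """Return the daily % change in Close on each event date found in the data."""
    closes = frame.set_index("Date")["Close"].sort_index()
    daily_return = closes.pct_change() * 100   # close-to-close change in percent
    rows = {}
    for label, day in events.items():
        ts = pd.Timestamp(day)
        if ts in daily_return.index:           # skip non-trading days
            rows[label] = daily_return.loc[ts]
    return pd.Series(rows, name="Return (%)")


# Toy prices, not real NIFTY values
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2020-03-19", "2020-03-20", "2020-03-23"]),
    "Close": [100.0, 105.0, 94.5],
})
result = event_day_returns(toy, {"Event A": "2020-03-23", "Holiday": "2020-03-21"})
print(result.round(2))
```

On the real data this could be called with `nifty_50_2019_2020` and a dictionary of the five announcement dates used in the plot.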

**Q2** How did the different sectors perform during Covid-19?

**Answer** We first make a single dataset out of all different datasets. Then we plot the graphs.

In [15]:
nifty_auto_2019 = nifty_auto[nifty_auto['Date'] > '2019-12-31']
nifty_bank_2019 = nifty_bank[nifty_bank['Date'] > '2019-12-31']
nifty_fmcg_2019 = nifty_fmcg[nifty_fmcg['Date'] > '2019-12-31']
nifty_IT_2019 = nifty_IT[nifty_IT['Date'] > '2019-12-31']
nifty_metal_2019 = nifty_metal[nifty_metal['Date'] > '2019-12-31']
nifty_pharma_2019 = nifty_pharma[nifty_pharma['Date'] > '2019-12-31']

data = {'NIFTY Auto index': nifty_auto_2019['Close'].values, 
        'NIFTY Bank index': nifty_bank_2019['Close'].values,
        'NIFTY FMCG index': nifty_fmcg_2019['Close'].values,
        'NIFTY IT index': nifty_IT_2019['Close'].values,
        'NIFTY Metal index': nifty_metal_2019['Close'].values,
        'NIFTY Pharma index': nifty_pharma_2019['Close'].values,
       }
df = pd.DataFrame(data=data)
df.index=nifty_auto_2019['Date']
df.head()

| Date | NIFTY Auto index | NIFTY Bank index | NIFTY FMCG index | NIFTY IT index | NIFTY Metal index | NIFTY Pharma index |
|---|---|---|---|---|---|---|
| 2020-01-01 | 8210.1 | 32102.9 | 30234.25 | 15722.15 | 2796.05 | 8047.1 |
| 2020-01-02 | 8267.45 | 32443.85 | 30266.2 | 15709.65 | 2869.9 | 8053.95 |
| 2020-01-03 | 8168.15 | 32069.25 | 30109.25 | 15936.6 | 2848.35 | 8111.95 |
| 2020-01-06 | 7978.75 | 31237.15 | 29799.3 | 15879.8 | 2765.75 | 7987.35 |
| 2020-01-07 | 8002.5 | 31399.4 | 29861.8 | 15895.2 | 2785.9 | 8036.5 |
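Building the combined frame from `.values` works only if every sectoral file has exactly the same trading dates in the same order. A slightly more defensive alternative, sketched here under that caveat, is to index each series by `Date` and let pandas align them. The helper `combine_closes` and the toy frames are hypothetical.

```python
import pandas as pd


def combine_closes(frames, start):
    """Align the Close column of several frames on Date, keeping rows after `start`."""
    columns = {
        name: frame.loc[frame["Date"] > start].set_index("Date")["Close"]
        for name, frame in frames.items()
    }
    # Outer join on Date: a date missing in one frame becomes NaN instead of misaligning
    return pd.concat(columns, axis=1).sort_index()


# Toy frames: the second one is missing 2020-01-02
auto = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
    "Close": [10.0, 11.0, 12.0],
})
bank = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-01", "2020-01-03"]),
    "Close": [20.0, 22.0],
})
combined = combine_closes({"NIFTY Auto index": auto, "NIFTY Bank index": bank}, start="2019-12-31")
print(combined)
```

With this approach a gap in one file shows up as a visible `NaN` rather than as a length mismatch error or silently shifted values.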


In [16]:
fig = df.iplot(asFigure=True, subplots=True, subplot_titles=True, legend=False)
fig.show()

We see above that all the sectors saw a decline, but they have slowly started rising again. Amongst all the sectors, we can clearly see that Pharma, IT and FMCG are faring better than the others, and the Pharma index is now above its January 2020 level.
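Because the sectoral indices sit at very different absolute levels, a comparison of recovery is easier when each series is rebased to 100 at the start of the window. This is a minimal sketch of that idea; the helper `rebase_to_100` and the toy values are assumptions, not figures from the dataset.

```python
import pandas as pd


def rebase_to_100(closes):
    """Scale every column so that its first row equals 100."""
    return closes / closes.iloc[0] * 100


# Toy closing levels for two indices at three dates
toy = pd.DataFrame(
    {
        "NIFTY Pharma index": [8000.0, 6400.0, 8800.0],
        "NIFTY Bank index": [32000.0, 19200.0, 22400.0],
    },
    index=pd.to_datetime(["2020-01-01", "2020-03-23", "2020-06-30"]),
)
rebased = rebase_to_100(toy)
print(rebased.round(1))
```

A final value above 100 means the index ended the window above its starting level, and a value below 100 means it had not yet recovered. Applied to the `df` built earlier, `rebase_to_100(df)` would put all six sectors on one comparable scale.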

**Q3** Comparing Closing prices of Sectoral Indices during The Great Recession of 2008.

**Answer** We use data from 2008-2010 to plot the graphs.

In [17]:
# Keep rows from 2008-01-01 through 2010-12-31 (upper bound is strictly before 2011-01-01)
nifty_auto_2008_2010 = nifty_auto[(nifty_auto['Date'] > '2007-12-31') & (nifty_auto['Date'] < '2011-01-01')]
nifty_bank_2008_2010 = nifty_bank[(nifty_bank['Date'] > '2007-12-31') & (nifty_bank['Date'] < '2011-01-01')]
nifty_fmcg_2008_2010 = nifty_fmcg[(nifty_fmcg['Date'] > '2007-12-31') & (nifty_fmcg['Date'] < '2011-01-01')]
nifty_IT_2008_2010 = nifty_IT[(nifty_IT['Date'] > '2007-12-31') & (nifty_IT['Date'] < '2011-01-01')]
nifty_pharma_2008_2010 = nifty_pharma[(nifty_pharma['Date'] > '2007-12-31') & (nifty_pharma['Date'] < '2011-01-01')]

dataframe = {'NIFTY FMCG index': nifty_fmcg_2008_2010['Close'].values,
             'NIFTY IT index': nifty_IT_2008_2010['Close'].values,
             'NIFTY Pharma index': nifty_pharma_2008_2010['Close'].values,
             'NIFTY Bank index': nifty_bank_2008_2010['Close'].values,
            }

In [18]:
dataframe = pd.DataFrame(data=dataframe)
dataframe.index=nifty_fmcg_2008_2010['Date']
dataframe.head()
dataframe.iplot(kind='box')

We can see above that the FMCG and Banking indices closed at higher levels during that time than IT and Pharma (lowest).

**Q4** Compare performance of IT sector and Automobile sector in the last 5 years.

**Answer** Data was taken from the Dataset and plotted.

In [19]:
nifty_IT_5_years = nifty_IT[nifty_IT['Date'] > '2014-12-31']
nifty_auto_5_years = nifty_auto[nifty_auto['Date'] > '2014-12-31']

fig = go.Figure()
fig.add_trace(go.Scatter(
         x=nifty_IT_5_years['Date'],
         y=nifty_IT_5_years['Close'],
         name='IT Sector',
    line=dict(color='maroon'),
    opacity=1))

fig.add_trace(go.Scatter(
         x=nifty_auto_5_years['Date'],
         y=nifty_auto_5_years['Close'],
         name='Automobile Sector',
    line=dict(color='sandybrown'),
    opacity=1))
        
fig.update_layout(title_text="IT Sector vs Automobile Sector", xaxis_title = 'Year', yaxis_title='Value')

fig.show()

We observe here that between late 2016 and early 2018, Automobile Sector was giving good competition to IT Sector, but currently IT Sector is performing better than Automobile Sector.

**Q5** Compare the current turnovers of different sectors.

**Answer** For calculating this, we find the mean of the Turnovers of different Sectors. Then we plot a Pie chart to compare.

In [20]:
nifty_IT_current = nifty_IT[nifty_IT['Date'] > '2019-12-31']['Turnover'].mean()
nifty_bank_current = nifty_bank[nifty_bank['Date'] > '2019-12-31']['Turnover'].mean()
nifty_auto_current = nifty_auto[nifty_auto['Date'] > '2019-12-31']['Turnover'].mean()
nifty_fmcg_current = nifty_fmcg[nifty_fmcg['Date'] > '2019-12-31']['Turnover'].mean()
nifty_metal_current = nifty_metal[nifty_metal['Date'] > '2019-12-31']['Turnover'].mean()
nifty_pharma_current = nifty_pharma[nifty_pharma['Date'] > '2019-12-31']['Turnover'].mean()

In [21]:
labels = ['IT','BANKING','AUTOMOBILE','FMCG', 'METAL', 'PHARMA']
values = [nifty_IT_current, nifty_bank_current, nifty_auto_current, nifty_fmcg_current, nifty_metal_current, nifty_pharma_current]

fig = go.Figure(data=[go.Pie(labels=labels, values=values)])
fig.update_layout(title_text="Current Turnovers of Different Sectors")
fig.show()

We see above that the Banking Sector is leading the market, with the Automobile Sector close behind. The IT Sector and Pharma Sector have nearly the same turnover, just behind the FMCG Sector.
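The pie chart shows shares visually; the same comparison can be expressed as percentages in a small table. The sketch below assumes a hypothetical helper `turnover_shares` and uses made-up mean turnovers, not the values computed from the dataset.

```python
import pandas as pd


def turnover_shares(mean_turnovers):
    """Convert mean turnovers per sector into percentage shares, largest first."""
    series = pd.Series(mean_turnovers, dtype="float64")
    return (series / series.sum() * 100).sort_values(ascending=False)


# Made-up mean turnovers for three sectors
shares = turnover_shares({"IT": 20.0, "BANKING": 50.0, "PHARMA": 30.0})
print(shares.round(1))
```

On the real data, the dictionary could be built from `labels` and `values` with `dict(zip(labels, values))`.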

## Inferences and Conclusion

The following inferences were drawn from the Analysis -

* This year the market slumped due to COVID-19, but the sectors are now slowly recovering.
* Amongst all the sectors, IT and FMCG are faring better than the others. The Pharma index is now above its January 2020 level. This shows the growing demand for medicines and the like.
* Comparing the sectors turnover-wise, the Banking sector is leading with the Automobile sector close behind. IT and Pharma have nearly the same turnover and are approaching FMCG.
* Checking the past trends of the IT and Automobile sectors, we saw they were closely matched between 2017 and 2018, but the Automobile sector has since fallen behind the IT sector, even though it currently has the greater turnover.
* A similar kind of market trend could be observed during 2008 - 2010. The period was known as the Great Recession.

It can be concluded from the above analysis that markets have fallen many times in history, but they have always recovered and bounced back with brighter prospects.

In these COVID times, we see a lot of people panicking. But this too shall pass, and we will all grow stronger than before. So with the words below I conclude my project -

***“You get recessions, you have stock market declines. If you don’t understand that’s going to happen, then you’re not ready, you won’t do well in the markets.”*** – By **Peter Lynch**

## References and Future Work

The following websites were very helpful to me -

* https://indianexpress.com/
* https://plotly.com/python/
* https://towardsdatascience.com/
* https://www.kaggle.com/

**Future Work** - This project can be extended to include more past data to understand the market better. Then it will become easier to make a machine learning model to predict the future market trends.