# Analysis of Unemployment in India During the COVID-19 Pandemic
Unemployment is typically quantified through the unemployment rate, which represents the percentage of people without jobs in relation to the total labor force. The COVID-19 pandemic led to a substantial upsurge in the unemployment rate.

The objective here is to conduct a Python-based analysis of the unemployment rate in India.

## Objective:

This analysis is geared towards comprehensively assessing the far-reaching consequences of the COVID-19 pandemic on India's employment landscape. The dataset available offers critical insights into the fluctuations in unemployment rates across different Indian regions. The dataset encompasses vital metrics, including a region-wise breakdown, a timeline of measurements, measurement frequency (monthly), the estimated unemployment rate (%), the count of individuals estimated to be employed, and the estimated labor participation rate (%).

## Overview of the Dataset:

The provided dataset delves into the unemployment scenario in various Indian regions:

1. regions: Encompassing the diverse regions of the Indian subcontinent.
2. Date: Records the specific dates of unemployment rate measurements.
3. Measurement Frequency: Indicates the regularity of data collection, which is on a monthly basis.
4. Estimated Unemployment Rate (%): Reflects the percentage of unemployed individuals in each Indian region.
5. Estimated Employed Individuals: Represents the count of individuals currently engaged in employment.
6. Estimated Labor Participation Rate (%): Demonstrates the percentage of the working-age population (16-64 years) actively participating in the job market, encompassing both employed individuals and those actively seeking employment.

This dataset serves as a valuable resource for understanding the fluctuations in unemployment across different regions in India during the COVID-19 pandemic. By providing crucial insights, it sheds light on the consequences for unemployment rates, employment figures, and labor force participation in distinct geographical regions across the nation. The primary goal of this analysis is to gain insights into the socioeconomic impacts of the pandemic on India's workforce and labor market.


In [145]:
#Importing required libraries
import requests
import io
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import datetime as dt
import calendar 
import plotly.graph_objects as go

In [146]:
# Importing dataset from GitHub repository

url = "https://raw.githubusercontent.com/Amith-Mohan/datasets/main/Unemployment_in_India.csv" # Raw version of the file on GitHub
download = requests.get(url).content
df = pd.read_csv(io.StringIO(download.decode('utf-8')))

In [147]:
# Checking first 5 entries
df.head()

Unnamed: 0,Region,Date,Frequency,Estimated Unemployment Rate (%),Estimated Employed,Estimated Labour Participation Rate (%),Area
0,Andhra Pradesh,31-05-2019,Monthly,3.65,11999139.0,43.24,Rural
1,Andhra Pradesh,30-06-2019,Monthly,3.05,11755881.0,42.05,Rural
2,Andhra Pradesh,31-07-2019,Monthly,3.75,12086707.0,43.5,Rural
3,Andhra Pradesh,31-08-2019,Monthly,3.32,12285693.0,43.97,Rural
4,Andhra Pradesh,30-09-2019,Monthly,5.17,12256762.0,44.68,Rural


In [148]:
# Checking Last 5 entries
df.tail()

Unnamed: 0,Region,Date,Frequency,Estimated Unemployment Rate (%),Estimated Employed,Estimated Labour Participation Rate (%),Area
763,,,,,,,
764,,,,,,,
765,,,,,,,
766,,,,,,,
767,,,,,,,


In [149]:
# Checking datatypes of the dataset
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 7 columns):
 #   Column                                    Non-Null Count  Dtype  
---  ------                                    --------------  -----  
 0   Region                                    740 non-null    object 
 1    Date                                     740 non-null    object 
 2    Frequency                                740 non-null    object 
 3    Estimated Unemployment Rate (%)          740 non-null    float64
 4    Estimated Employed                       740 non-null    float64
 5    Estimated Labour Participation Rate (%)  740 non-null    float64
 6   Area                                      740 non-null    object 
dtypes: float64(3), object(4)
memory usage: 42.1+ KB


In [150]:
# Null value check
df.isnull().sum()

Region                                      28
 Date                                       28
 Frequency                                  28
 Estimated Unemployment Rate (%)            28
 Estimated Employed                         28
 Estimated Labour Participation Rate (%)    28
Area                                        28
dtype: int64

In [151]:
# Dropping rows with null values
df.dropna(inplace=True)

In [152]:
df.tail()

Unnamed: 0,Region,Date,Frequency,Estimated Unemployment Rate (%),Estimated Employed,Estimated Labour Participation Rate (%),Area
749,West Bengal,29-02-2020,Monthly,7.55,10871168.0,44.09,Urban
750,West Bengal,31-03-2020,Monthly,6.67,10806105.0,43.34,Urban
751,West Bengal,30-04-2020,Monthly,15.63,9299466.0,41.2,Urban
752,West Bengal,31-05-2020,Monthly,15.22,9240903.0,40.67,Urban
753,West Bengal,30-06-2020,Monthly,9.86,9088931.0,37.57,Urban


In [153]:
# Null value check
df.isnull().sum()

Region                                      0
 Date                                       0
 Frequency                                  0
 Estimated Unemployment Rate (%)            0
 Estimated Employed                         0
 Estimated Labour Participation Rate (%)    0
Area                                        0
dtype: int64

Null values have been removed from the dataset

In [154]:
df.columns

Index(['Region', ' Date', ' Frequency', ' Estimated Unemployment Rate (%)',
       ' Estimated Employed', ' Estimated Labour Participation Rate (%)',
       'Area'],
      dtype='object')

In [155]:
# Renaming column names
df.columns = ['region','date','frequency','estimated_unemployment_rate','estimated_employment','estimated_labour_participation_rate','area']
df.head()

Unnamed: 0,region,date,frequency,estimated_unemployment_rate,estimated_employment,estimated_labour_participation_rate,area
0,Andhra Pradesh,31-05-2019,Monthly,3.65,11999139.0,43.24,Rural
1,Andhra Pradesh,30-06-2019,Monthly,3.05,11755881.0,42.05,Rural
2,Andhra Pradesh,31-07-2019,Monthly,3.75,12086707.0,43.5,Rural
3,Andhra Pradesh,31-08-2019,Monthly,3.32,12285693.0,43.97,Rural
4,Andhra Pradesh,30-09-2019,Monthly,5.17,12256762.0,44.68,Rural


In [156]:
# Shape of dataset (rows, columns)
df.shape

(740, 7)

In [157]:
# Univariate analysis
df.describe()

Unnamed: 0,estimated_unemployment_rate,estimated_employment,estimated_labour_participation_rate
count,740.0,740.0,740.0
mean,11.787946,7204460.0,42.630122
std,10.721298,8087988.0,8.111094
min,0.0,49420.0,13.33
25%,4.6575,1190404.0,38.0625
50%,8.35,4744178.0,41.16
75%,15.8875,11275490.0,45.505
max,76.74,45777510.0,72.57


In [158]:
# Checking for duplicates in the data 
df.duplicated().any()

False

In [159]:
# Checking No. of entries for each individual region
df.region.value_counts()

Andhra Pradesh      28
Kerala              28
West Bengal         28
Uttar Pradesh       28
Tripura             28
Telangana           28
Tamil Nadu          28
Rajasthan           28
Punjab              28
Odisha              28
Madhya Pradesh      28
Maharashtra         28
Karnataka           28
Jharkhand           28
Himachal Pradesh    28
Haryana             28
Gujarat             28
Delhi               28
Chhattisgarh        28
Bihar               28
Meghalaya           27
Uttarakhand         27
Assam               26
Puducherry          26
Goa                 24
Jammu & Kashmir     21
Sikkim              17
Chandigarh          12
Name: region, dtype: int64

Chandigarh is having the least entries/data

In [160]:
# Changing datatype of 'date' to datetime format
df['date'] = pd.to_datetime(df['date'],dayfirst = True)
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 740 entries, 0 to 753
Data columns (total 7 columns):
 #   Column                               Non-Null Count  Dtype         
---  ------                               --------------  -----         
 0   region                               740 non-null    object        
 1   date                                 740 non-null    datetime64[ns]
 2   frequency                            740 non-null    object        
 3   estimated_unemployment_rate          740 non-null    float64       
 4   estimated_employment                 740 non-null    float64       
 5   estimated_labour_participation_rate  740 non-null    float64       
 6   area                                 740 non-null    object        
dtypes: datetime64[ns](1), float64(3), object(3)
memory usage: 46.2+ KB


datatype of 'date' changed to datetime format

In [161]:
# Extracting month from date
df['month_int'] = df['date'].dt.month
df.head()

Unnamed: 0,region,date,frequency,estimated_unemployment_rate,estimated_employment,estimated_labour_participation_rate,area,month_int
0,Andhra Pradesh,2019-05-31,Monthly,3.65,11999139.0,43.24,Rural,5
1,Andhra Pradesh,2019-06-30,Monthly,3.05,11755881.0,42.05,Rural,6
2,Andhra Pradesh,2019-07-31,Monthly,3.75,12086707.0,43.5,Rural,7
3,Andhra Pradesh,2019-08-31,Monthly,3.32,12285693.0,43.97,Rural,8
4,Andhra Pradesh,2019-09-30,Monthly,5.17,12256762.0,44.68,Rural,9


In [162]:
# Converting months into words for better analysis
df['month'] = df['month_int'].apply(lambda x: calendar.month_abbr[x])
df.head()

Unnamed: 0,region,date,frequency,estimated_unemployment_rate,estimated_employment,estimated_labour_participation_rate,area,month_int,month
0,Andhra Pradesh,2019-05-31,Monthly,3.65,11999139.0,43.24,Rural,5,May
1,Andhra Pradesh,2019-06-30,Monthly,3.05,11755881.0,42.05,Rural,6,Jun
2,Andhra Pradesh,2019-07-31,Monthly,3.75,12086707.0,43.5,Rural,7,Jul
3,Andhra Pradesh,2019-08-31,Monthly,3.32,12285693.0,43.97,Rural,8,Aug
4,Andhra Pradesh,2019-09-30,Monthly,5.17,12256762.0,44.68,Rural,9,Sep


In [163]:
# Numeric data grouped by months
data = df.groupby(['month'])[['estimated_unemployment_rate','estimated_employment','estimated_labour_participation_rate']].mean()
data=pd.DataFrame(data).reset_index()
data

Unnamed: 0,month,estimated_unemployment_rate,estimated_employment,estimated_labour_participation_rate
0,Apr,23.641569,5283320.0,35.141176
1,Aug,9.637925,7539815.0,43.646792
2,Dec,9.497358,7377388.0,43.667358
3,Feb,9.964717,7603996.0,43.723019
4,Jan,9.950755,7677344.0,44.051321
5,Jul,9.033889,7404425.0,43.706667
6,Jun,10.553462,7372280.0,42.211058
7,Mar,10.700577,7516581.0,43.084038
8,May,16.64619,6666624.0,41.277143
9,Nov,9.868364,7273661.0,44.110545


In [164]:
# Bar plot of unemployment rate vs labour participation rate
month = data.month
unemployment_rate = data['estimated_unemployment_rate']
labour_participation_rate = data['estimated_labour_participation_rate']

fig = go.Figure()

fig.add_trace(go.Bar(x = month,y = unemployment_rate,name = 'unemployment_rate'))
fig.add_trace(go.Bar(x = month,y = labour_participation_rate,name = 'estimated_labour_participation_rate'))

fig.update_layout(title = 'Unemployment Rate v/s Labour Participation',
                     xaxis = {'categoryorder':'array','categoryarray':['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']}      )
fig.show()

In [165]:
# Bar plot of estimated employed citizen in every month
import plotly.express as px
fig = px.bar(data,x='month',y='estimated_employment',color='month',
            category_orders ={'month':['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']},
            title='Estimated employed people from Jan to Dec',template='gridon')
fig.show()

# Region Analysis

In [166]:
region =  df.groupby(['region'])[['estimated_unemployment_rate','estimated_employment','estimated_labour_participation_rate']].mean()
region = pd.DataFrame(region).reset_index()

In [167]:
# Box plot

fig = px.box(data_frame=df,x='region',y='estimated_unemployment_rate',color='region',title='Unemployment rate',template='gridon')
fig.update_layout(xaxis={'categoryorder':'total ascending'})
fig.show()

In [168]:
# average unemployment rate bar plot

fig = px.bar(region,x='region',y='estimated_unemployment_rate',color='region',template='gridon',title='Average unemployment rate (Region wise)')
fig.update_layout(xaxis={'categoryorder':'total ascending'})
fig.show()

Highest average unemployment can be observed in Tripura and Haryana.          
Lowest average unemployment can be observed in Meghalaya and Odisha

In [169]:
# Bar plot of monthly Unemployment Rate

fig = px.bar(df,x='region',y='estimated_unemployment_rate',animation_frame='month',color='region',template='gridon',
            title='Unemployment rate from May 2019 to Apr 2020(Region wise)')

fig.update_layout(xaxis={'categoryorder':'total ascending'})
fig.show()

## Area Analysis

In [170]:
df.area.unique()

array(['Rural', 'Urban'], dtype=object)

In [171]:
# numeric data grouped by region

area = df.groupby(['area'])[['estimated_unemployment_rate','estimated_employment','estimated_labour_participation_rate']].mean()
area = pd.DataFrame(area).reset_index()

In [172]:
#Scatter plot

fig= px.scatter_matrix(df,dimensions=['estimated_unemployment_rate','estimated_employment','estimated_labour_participation_rate'],color='area',template='gridon')
fig.show()

In [173]:
df.columns

Index(['region', 'date', 'frequency', 'estimated_unemployment_rate',
       'estimated_employment', 'estimated_labour_participation_rate', 'area',
       'month_int', 'month'],
      dtype='object')

In [174]:
# Average Unemployment Rate

fig = px.bar(area,x='area',y='estimated_unemployment_rate',color='area',title='Average unemployment rate(Area wise)',template='gridon')
fig.update_layout(xaxis={'categoryorder':'total ascending'})
fig.show()

Unemployment seems to be higher in Urban area as compared to Rural area.

In [175]:
fig = px.bar(df,x='area',y='estimated_unemployment_rate',animation_frame='month',color='region',
            title='Unemployment rate from May 2019 to Apr 2020',template='gridon')

fig.update_layout(xaxis={'categoryorder':'total ascending'})
fig.layout.updatemenus[0].buttons[0].args[1]['frame']['duration'] =2000

fig.show()

In [176]:
unemployment =df.groupby(['area','region'])['estimated_unemployment_rate'].mean().reset_index()
unemployment.head()

Unnamed: 0,area,region,estimated_unemployment_rate
0,Rural,Andhra Pradesh,5.526429
1,Rural,Assam,4.490833
2,Rural,Bihar,16.77
3,Rural,Chhattisgarh,6.628571
4,Rural,Delhi,15.258571


In [177]:
fig = px.sunburst(unemployment,path=['area','region'],values='estimated_unemployment_rate',
                 title ='Unemployment rate in area and region',height=750,template='gridon')
fig.show()

## Unemployment rate before and after Lockdown

In [178]:
# data representation before and after lockdown
before_lockdown = df[(df['month_int']>=1) &(df['month_int'] <4)]
after_lockdown = df[(df['month_int'] >=4) & (df['month_int'] <=6)]

In [179]:
af_lockdown = after_lockdown.groupby('region')['estimated_unemployment_rate'].mean().reset_index()
lockdown = before_lockdown.groupby('region')['estimated_unemployment_rate'].mean().reset_index()
lockdown['unemployment rate before lockdown'] = af_lockdown['estimated_unemployment_rate']
lockdown.columns = ['Region','Unemployment_rate_before_lockdown','Unemployment_rate_after_lockdown']
lockdown

Unnamed: 0,Region,Unemployment_rate_before_lockdown,Unemployment_rate_after_lockdown
0,Andhra Pradesh,6.243333,11.126
1,Assam,6.48,6.563333
2,Bihar,14.276667,27.459
3,Chandigarh,19.366667,12.656667
4,Chhattisgarh,8.683333,12.72
5,Delhi,16.145,19.195
6,Goa,5.074,10.301429
7,Gujarat,6.138333,8.814
8,Haryana,24.165,30.887
9,Himachal Pradesh,20.283333,14.982


Unemployment rate has significantly gone up after the lockdown

In [180]:
# Change in unemployment rate after lockdown
lockdown['rate change in unemployment'] =round(lockdown['Unemployment_rate_after_lockdown']-lockdown['Unemployment_rate_before_lockdown']
                                              /lockdown['Unemployment_rate_before_lockdown'],2)

In [143]:
fig = px.bar(lockdown,x='Region',y='rate change in unemployment',color='rate change in unemployment',
            title='Percentage change in Unemployment rate in each region after lockdown',template='gridon')
fig.update_layout(xaxis={'categoryorder':'total ascending'})
fig.show()

As per the data, Post lockdown:                          
Highest percentage change in unemployment rate can be observed in Jharkhand and Haryana    
Lowest percentage change in unemployment rate can be observed in Assam and Sikkim