# Unemployment Analysis with Python

# Introduction:

The Unemployment Analysis project aims to assess the impact of the COVID-19 pandemic on the job market in various regions of India. By analyzing a dataset sourced from Kaggle, we will examine the unemployment trends and patterns across different states, shedding light on the states that have been most affected.

Using pandas for data preparation and Plotly for interactive charts, we summarize how unemployment rates moved during the pandemic. The visualizations and summary statistics are intended to give policymakers, economists, and researchers a state-level view of the data to support decision-making and policy formulation.

By exploring the dataset, we will identify the regions within India that have experienced the highest unemployment rates and discern any variations in the impact of COVID-19 on employment across the country. This analysis will contribute to a deeper understanding of the socioeconomic consequences of the pandemic and facilitate the development of targeted strategies to address unemployment challenges in specific regions.

In [1]:
import numpy as np # Importing numpy library for numerical computations
import pandas as pd # Import pandas library for data manipulation and analysis
import plotly.express as px # Importing Plotly Express for interactive data visualization.
import matplotlib.pyplot as plt # Importing Matplotlib for creating visualizations using Python.
import calendar # Importing the calendar module to work with dates, months, and calendars in Python.

In [2]:
df = pd.read_csv("Unemployment.csv") # Reading the dataset into a dataframe using the pandas library
df.head() # displays the first 5 rows of the DataFrame

Unnamed: 0,Region,Date,Frequency,Estimated Unemployment Rate (%),Estimated Employed,Estimated Labour Participation Rate (%),Region.1,longitude,latitude
0,Andhra Pradesh,31-01-2020,M,5.48,16635535,41.02,South,15.9129,79.74
1,Andhra Pradesh,29-02-2020,M,5.83,16545652,40.9,South,15.9129,79.74
2,Andhra Pradesh,31-03-2020,M,5.79,15881197,39.18,South,15.9129,79.74
3,Andhra Pradesh,30-04-2020,M,20.51,11336911,33.1,South,15.9129,79.74
4,Andhra Pradesh,31-05-2020,M,17.43,12988845,36.46,South,15.9129,79.74


In [3]:
# renaming columns
df.columns =['States','Date','Frequency','Estimated Unemployment Rate','Estimated Employed','Estimated Labour Participation Rate','Region','longitude','latitude']
df.head() # displays the first 5 rows of the DataFrame

Unnamed: 0,States,Date,Frequency,Estimated Unemployment Rate,Estimated Employed,Estimated Labour Participation Rate,Region,longitude,latitude
0,Andhra Pradesh,31-01-2020,M,5.48,16635535,41.02,South,15.9129,79.74
1,Andhra Pradesh,29-02-2020,M,5.83,16545652,40.9,South,15.9129,79.74
2,Andhra Pradesh,31-03-2020,M,5.79,15881197,39.18,South,15.9129,79.74
3,Andhra Pradesh,30-04-2020,M,20.51,11336911,33.1,South,15.9129,79.74
4,Andhra Pradesh,31-05-2020,M,17.43,12988845,36.46,South,15.9129,79.74


In [4]:
df.shape # returns the dimensions (rows, columns) of the DataFrame.

(267, 9)

In [5]:
df.info() # provides a concise summary of the DataFrame's structure and content.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 267 entries, 0 to 266
Data columns (total 9 columns):
 #   Column                               Non-Null Count  Dtype  
---  ------                               --------------  -----  
 0   States                               267 non-null    object 
 1   Date                                 267 non-null    object 
 2   Frequency                            267 non-null    object 
 3   Estimated Unemployment Rate          267 non-null    float64
 4   Estimated Employed                   267 non-null    int64  
 5   Estimated Labour Participation Rate  267 non-null    float64
 6   Region                               267 non-null    object 
 7   longitude                            267 non-null    float64
 8   latitude                             267 non-null    float64
dtypes: float64(4), int64(1), object(4)
memory usage: 18.9+ KB


In [6]:
df.describe() # generates summary statistics of the DataFrame's numerical columns

Unnamed: 0,Estimated Unemployment Rate,Estimated Employed,Estimated Labour Participation Rate,longitude,latitude
count,267.0,267.0,267.0,267.0,267.0
mean,12.236929,13962110.0,41.681573,22.826048,80.532425
std,10.803283,13366320.0,7.845419,6.270731,5.831738
min,0.5,117542.0,16.77,10.8505,71.1924
25%,4.845,2838930.0,37.265,18.1124,76.0856
50%,9.65,9732417.0,40.39,23.6102,79.0193
75%,16.755,21878690.0,44.055,27.2784,85.2799
max,75.85,59433760.0,69.69,33.7782,92.9376
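
The summary statistics report a maximum unemployment rate of 75.85 but not which observation it belongs to. The following is a minimal sketch of how that row could be located with `idxmax`. The three rows in `sample` are copied from outputs shown in this notebook; the name `sample` is an assumption for illustration, and the same call would be applied to `df` in practice.

```python
import pandas as pd

# Three rows copied from outputs in this notebook; the full dataset has 267 rows.
sample = pd.DataFrame({
    "States": ["Andhra Pradesh", "Puducherry", "Punjab"],
    "Date": ["30-04-2020", "30-04-2020", "31-01-2020"],
    "Estimated Unemployment Rate": [20.51, 75.85, 11.11],
})

# idxmax returns the index label of the largest value; .loc then fetches that row.
peak_row = sample.loc[sample["Estimated Unemployment Rate"].idxmax()]
print(peak_row["States"], peak_row["Date"], peak_row["Estimated Unemployment Rate"])
```

Among these sample rows, the peak is the Puducherry observation for 30-04-2020.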


In [7]:
df.value_counts() # counts how often each distinct row (combination of all column values) occurs

States           Date         Frequency  Estimated Unemployment Rate  Estimated Employed  Estimated Labour Participation Rate  Region  longitude  latitude
Andhra Pradesh    29-02-2020   M         5.83                         16545652            40.90                                South   15.9129    79.7400     1
Punjab            31-01-2020   M         11.11                        9442093             42.82                                North   31.1471    75.3412     1
Puducherry        29-02-2020   M         1.76                         493961              40.80                                South   11.9416    79.8083     1
                  30-04-2020   M         75.85                        117542              39.30                                South   11.9416    79.8083     1
                  30-06-2020   M         4.24                         367135              30.80                                South   11.9416    79.8083     1
                                             

In [8]:
df.isnull().sum() #returns the number of missing values (null values) in each column of the DataFrame

States                                 0
Date                                   0
Frequency                              0
Estimated Unemployment Rate            0
Estimated Employed                     0
Estimated Labour Participation Rate    0
Region                                 0
longitude                              0
latitude                               0
dtype: int64
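
Alongside missing values, duplicate rows are worth checking before aggregating. The sketch below shows one possible way to do that with `DataFrame.duplicated`. The `toy` frame is hypothetical and contains a deliberately repeated row; it is not taken from the dataset.

```python
import pandas as pd

# Hypothetical toy data with one deliberately repeated row (not from the real dataset).
toy = pd.DataFrame({
    "States": ["Punjab", "Punjab", "Puducherry"],
    "Date": ["31-01-2020", "31-01-2020", "29-02-2020"],
    "Estimated Unemployment Rate": [11.11, 11.11, 1.76],
})

# Rows that repeat an earlier row in every column.
n_duplicates = int(toy.duplicated().sum())

# Rows that repeat an earlier (state, date) pair, even if the measured values differ.
duplicate_keys = int(toy.duplicated(subset=["States", "Date"]).sum())

print(n_duplicates, duplicate_keys)
```

Checking on the `States` and `Date` pair is the stricter test for this kind of data, since each state should appear at most once per month.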

In [9]:
# Creating an animated bar plot of the unemployment rate across states over time.
# The y-axis shows the estimated unemployment rate, the x-axis shows states, and color
# distinguishes the states. Each animation frame is one date, and the built-in 'plotly' template is used.
# The x-axis categories are sorted by total value in descending order, and each frame lasts 3000 ms.
fg = px.bar(df, x='States', y='Estimated Unemployment Rate', color='States', title='Unemployment Rate', animation_frame='Date', template='plotly')
fg.update_layout(xaxis={'categoryorder': 'total descending'})
fg.update_xaxes(tickangle=45, tickfont=dict(size=10))
fg.layout.updatemenus[0].buttons[0].args[1]["frame"]["duration"] = 3000
fg.show()

In [10]:
# Creating a bar plot to visualize the unemployment rate across regions, with bars colored based on states.
fg = px.bar(df,x='Region',y='Estimated Unemployment Rate',color='States',title='Unemployment rate across Regions',animation_frame='Date',template='plotly')
fg.update_layout(xaxis={'categoryorder':'total descending'})
fg.layout.updatemenus[0].buttons[0].args[1]["frame"]["duration"] = 3000
fg.show()

In [11]:
# Creating a box plot to visualize the unemployment rate across different states.
fig = px.box(df,x='States',y='Estimated Unemployment Rate',color='States',title='Unemployment rate across States',template='plotly')
fig.update_layout(xaxis={'categoryorder':'total descending'})
fig.update_xaxes(tickangle=45, tickfont=dict(size=10))  # rotate the labels on this figure (fig, not the earlier fg)
fig.show()

In [12]:
# Creating a scatter geospatial plot to visualize the impact of lockdown on employment across states.
# Each state is drawn at its coordinates, with color identifying the state, marker size showing the
# estimated unemployment rate, and one animation frame per month. The map uses the 'asia' scope and the
# built-in 'plotly' template. The animation frame duration is set to 3000 milliseconds, the geographic
# axes are limited to India's latitude and longitude ranges, and the ocean fill is turned off.
# Note: in this dataset the 'longitude' column holds latitudes (about 10 to 34) and the 'latitude'
# column holds longitudes (about 71 to 93), so they are passed to `lat` and `lon` accordingly.
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
df['Month'] = df['Date'].dt.month
df['Month_name'] = df['Month'].apply(lambda x: calendar.month_abbr[x])
fig = px.scatter_geo(df, lat='longitude', lon='latitude', color="States",
                     hover_name="States", size="Estimated Unemployment Rate",
                     animation_frame="Month_name", scope='asia', template='plotly',
                     title='Impact of lockdown on employment across states')

fig.layout.updatemenus[0].buttons[0].args[1]["frame"]["duration"] = 3000
fig.update_geos(lataxis_range=[5,35], lonaxis_range=[65, 100],showocean=False)
fig.show()
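
The date handling in this cell can be illustrated in isolation. The sketch below parses a few day-first date strings taken from the table above and derives month abbreviations. Passing an explicit `format` is an alternative to `dayfirst=True` that is assumed here for illustration; the variable names are hypothetical.

```python
import calendar
import pandas as pd

# Date strings in the dataset's DD-MM-YYYY layout (copied from the rows shown earlier).
dates = pd.Series(["31-01-2020", "30-04-2020", "31-05-2020"])

# An explicit format removes any ambiguity between day-first and month-first parsing.
parsed = pd.to_datetime(dates, format="%d-%m-%Y")

# Map month numbers to abbreviations, as the cell above does with calendar.month_abbr.
month_names = parsed.dt.month.map(lambda m: calendar.month_abbr[m])

# An ambiguous string such as 01-02-2020 is read as 1 February, not 2 January.
ambiguous = pd.to_datetime("01-02-2020", format="%d-%m-%Y")

print(month_names.tolist(), ambiguous.month)
```

Note that `calendar.month_abbr` follows the current locale, so the abbreviations are English only under the default locale.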

In [13]:
# Creating a sunburst plot to visualize the unemployment rate in each region and state.
df1 = df[['States','Region','Estimated Unemployment Rate','Estimated Employed','Estimated Labour Participation Rate']]

une = df1.groupby(['Region','States'])['Estimated Unemployment Rate'].mean().reset_index()
fig = px.sunburst(une, path=['Region','States'], values='Estimated Unemployment Rate',
                  title= 'Unemployment rate in each Region and State',
                  height=750,template='plotly')


fig.show()
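
When reading the sunburst, it may help to remember that `px.sunburst` with a `path` sizes each parent sector by summing the values of its children. A region's sector therefore reflects the sum of its states' mean rates, which grows with the number of states in the region. The sketch below contrasts that sum with a plain regional mean, using hypothetical state labels and made-up numbers.

```python
import pandas as pd

# Hypothetical toy data: region and state labels and rates are invented for illustration.
toy = pd.DataFrame({
    "Region": ["North", "North", "North", "South", "South"],
    "States": ["A", "A", "B", "C", "C"],
    "Estimated Unemployment Rate": [10.0, 20.0, 30.0, 5.0, 15.0],
})

# Mean rate per state, as computed in the cell above.
state_means = (
    toy.groupby(["Region", "States"])["Estimated Unemployment Rate"].mean().reset_index()
)

# What the sunburst shows for a region: the sum of its states' means.
region_sum_of_state_means = state_means.groupby("Region")["Estimated Unemployment Rate"].sum()

# A regional average that does not depend on how many states the region has.
region_mean = toy.groupby("Region")["Estimated Unemployment Rate"].mean()

print(region_sum_of_state_means.to_dict(), region_mean.to_dict())
```

In this toy case the North sector would be sized 45 against a regional mean of 20, so comparisons between regions with different numbers of states should be made with care.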

In [14]:
# Percentage change in the unemployment rate in each state after the lockdown began.
# 'Before' covers January to March and 'after' covers April to July, so no month falls in both groups.
lock = df[(df['Month'] >= 4) & (df['Month'] <= 7)]
bf = df[(df['Month'] >= 1) & (df['Month'] < 4)]
g = lock.groupby('States')['Estimated Unemployment Rate'].mean().reset_index()
g_bf = bf.groupby('States')['Estimated Unemployment Rate'].mean().reset_index()
# Merge on the state name rather than relying on both frames sharing the same row order.
g = g.merge(g_bf, on='States', suffixes=(' after', ' before'))
g.columns = ['States', 'Unemployment Rate after', 'Unemployment Rate before']
# Parenthesize the difference so it is divided by the baseline, then scale to percent.
g['Percentage Change'] = ((g['Unemployment Rate after'] - g['Unemployment Rate before']) / g['Unemployment Rate before'] * 100).round(2)
plot_per = g.sort_values('Percentage Change')
fig = px.bar(plot_per, x='States', y='Percentage Change', color='Percentage Change',
title='Unemployment Change after Lockdown', template='plotly')
fig.show()
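
Percentage change relative to a baseline is (after - before) / before × 100. Operator precedence matters here, because division binds more tightly than subtraction. The sketch below works through the arithmetic on two hypothetical states with made-up rates; the column names `before`, `after`, and `pct_change` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical rates for two states (invented numbers, not from the dataset).
rates = pd.DataFrame({
    "States": ["A", "B"],
    "before": [5.0, 10.0],
    "after": [20.0, 9.0],
})

# Correct form: difference first, then divide by the baseline, then scale to percent.
rates["pct_change"] = ((rates["after"] - rates["before"]) / rates["before"] * 100).round(2)

# Without parentheses, before / before is evaluated first and always equals 1,
# so the expression collapses to after - 1.
unparenthesized = rates["after"] - rates["before"] / rates["before"]

print(rates["pct_change"].tolist(), unparenthesized.tolist())
```

State A rises from 5 to 20, a 300 percent increase, while state B falls from 10 to 9, a 10 percent decrease. The unparenthesized form gives 19 and 8, which are not percentage changes at all.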

In [15]:
# Impact of lockdown on employment across states

plot_per['impact status'] = plot_per['Percentage Change']
fig = px.bar(plot_per, y='States',x='Percentage Change',color='impact status',
            title='Impact of lockdown on employment across states',template='plotly',height=650)


fig.show()
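
The `impact status` column above simply copies the numeric percentage change. If discrete categories are preferred for the color legend, one option is to bin the values with `pd.cut`. The following is a sketch only: the thresholds and labels are hypothetical choices, not part of the original analysis.

```python
import pandas as pd

# Hypothetical percentage changes for four states (invented numbers).
changes = pd.DataFrame({
    "States": ["A", "B", "C", "D"],
    "Percentage Change": [-10.0, 15.0, 80.0, 300.0],
})

# Assumed thresholds; intervals are right-closed: (-inf, 0], (0, 50], (50, 100], (100, inf).
bins = [-float("inf"), 0, 50, 100, float("inf")]
labels = ["improved", "moderate", "high", "severe"]

changes["impact status"] = pd.cut(changes["Percentage Change"], bins=bins, labels=labels)

print(changes["impact status"].astype(str).tolist())
```

Passing a categorical column like this to `color` would give a discrete legend instead of a continuous color scale.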

# Conclusion:

Based on the provided information and the analysis conducted:

1. The bar plot showcasing the unemployment rate across states over time indicates that the months of April and May exhibited the highest unemployment rates across states like Puducherry, Tamil Nadu, Jharkhand, Bihar, and Haryana.

2. The bar plot illustrating the unemployment rate across regions, with bars colored based on states, reveals that during April and May, the northern states, southern states, and eastern states experienced the highest unemployment rates.

3. The box and bar plots further highlight the impact of the lockdown, with Puducherry being the most affected and Meghalaya the least affected. These findings align with the insights gained from the percentage change plot.

4. The sunburst plot provides an overview of the unemployment rates in different regions, showing that the northern region had the highest unemployment rate, followed by the southern, eastern, northeastern, and western regions.

In conclusion, the analysis indicates that the months of April and May witnessed the highest unemployment rates, with certain states and regions being more severely affected by the lockdown measures. Puducherry emerges as one of the most impacted states, while Meghalaya appears to have experienced a relatively lower impact. These findings provide valuable insights into the dynamics of unemployment during the specified period and can aid in formulating targeted strategies to address the challenges faced in specific states and regions.