# **Project Name**    - Unemployment Analysis with Python



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Team Member 1**    - Vipil Khapre


# **Project Summary -**

Unemployment Analysis with Python is a data science project aimed at comprehensively exploring the unemployment rate, a crucial economic metric. The project delves into the challenges posed by the COVID-19 pandemic and its significant impact on unemployment rates, offering valuable insights and recommendations.

Key Objectives:

- Unemployment Rate Focus: The primary objective is to analyze the unemployment rate as an essential economic indicator.
- COVID-19 Impact Assessment: The project investigates the pandemic's role in unemployment rate fluctuations, providing a deeper understanding of the associated challenges.
- Data-Driven Insights: Using Python, the project conducts in-depth data analysis to uncover patterns and correlations between unemployment rates and other economic indicators.
- Actionable Recommendations: The project's findings will culminate in actionable recommendations that can guide decision-makers, policy planners, and organizations in addressing unemployment issues.

Significance:

The project is of significant importance due to the pressing need to address unemployment concerns caused by the COVID-19 pandemic. The insights gained from this analysis can empower policymakers and stakeholders to make informed decisions, implement interventions, and develop strategies for economic recovery.

Unemployment Analysis with Python is an informative and actionable exploration of an economic challenge with far-reaching implications, making it a valuable contribution to data-driven decision-making.

# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


Unemployment, a critical economic indicator, is typically quantified using the unemployment rate, which represents the proportion of individuals without employment within the total labor force. The COVID-19 pandemic has significantly impacted the global labor market, resulting in a substantial surge in the unemployment rate. Analyzing and comprehending the dynamics of this economic metric through data science can offer valuable insights and potential solutions.

Objective:

The primary goal of this project is to perform a comprehensive analysis of unemployment using Python. This analysis aims to shed light on the factors contributing to changes in the unemployment rate, identify patterns, and generate actionable recommendations to address unemployment challenges.

Project Details:

- Unemployment Rate: The primary focus is on understanding the unemployment rate, which serves as an essential economic indicator.
- COVID-19 Impact: Given the profound effects of the COVID-19 pandemic, the analysis pays particular attention to the pandemic's role in unemployment rate fluctuations.
- Data Analysis: Python is the primary tool for data analysis, enabling the exploration of unemployment data, the identification of correlations with other economic indicators, and the visualization of trends.
- Recommendations: The project concludes with actionable recommendations or insights that can guide policy-making, labor market interventions, or economic recovery strategies.

This project holds significant importance, as addressing unemployment challenges and understanding their causes is crucial for policymakers, governments, and organizations in fostering economic recovery and ensuring the well-being of individuals within the labor force.



#### **Define Your Business Objective?**

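Identify how the unemployment rate, employment, and labor participation varied across Indian states, regions, and months in 2020, with particular attention to the COVID-19 period, so that the findings can guide policy-making, labor market interventions, and economic recovery strategies.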

# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import datetime as dt

### Dataset Loading

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
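df = pd.read_csv('/content/drive/MyDrive/cipherByte technology/Unemployment_Rate_upto_11_2020 - Unemployment_Rate_upto_11_2020.csv')
df.columns = df.columns.str.strip()  # remove any stray leading/trailing spaces from the column names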

### Dataset First View

In [None]:
# Dataset First Look
df.head()  # show the first five rows for a quick overview of the dataset

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape  # (rows, columns): the dataset has 267 rows and 9 columns

### Dataset Information

In [None]:
# Dataset Info
df.info()  # column names, non-null counts and data types

#### Duplicate Values

In [None]:
# Count duplicate rows (the dataset has no duplicates, so this is 0)
duplicate_values = df.duplicated().sum()  # number of rows that repeat an earlier row
duplicate_values

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
missing_values = df.isna().sum()  # number of missing values in each column
missing_values

In [None]:
# Visualizing the missing values
# Checking Null Value by plotting Heatmap
sns.heatmap(df.isnull(), cbar=False)

As the heatmap above shows, the dataset contains no null values.

#### Removing Columns

In [None]:
# Drop the Longitude and Latitude columns, which are not used in this analysis
df.drop(['Longitude','Latitude'], axis=1, inplace=True)

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns  # list of the columns available in the dataset

In [None]:
# Dataset Describe
round(df.describe().T, 2)  # summary statistics (count, mean, std, min, quartiles, max) of the numeric columns

### Variables Description

1. State (raw column `Region`): The Indian state to which the record refers.

2. Date: The date of the data entry, indicating when the data was recorded.

3. Frequency: The frequency of data collection, such as monthly, quarterly, or yearly.

4. Estimated Unemployment Rate (%): The estimated percentage of the labor force within the state that is unemployed, calculated by dividing the number of unemployed individuals by the total labor force and multiplying by 100.

5. Estimated Employed: The estimated number of employed individuals within the state.

6. Estimated Labour Participation Rate (%): The estimated percentage of the working-age population within the state that is part of the labor force, either employed or actively seeking employment.

7. Direction (raw column `Region.1`): The broader zone of India to which the state belongs, such as North, South, or Northeast.
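The rate definitions in items 4 and 6 can be checked with a few lines of arithmetic. The sketch below uses hypothetical helper names (not part of the notebook) and made-up numbers. It also inverts the unemployment-rate formula: since employed = labor force × (1 − rate / 100), the labor force implied by the `Estimated Employed` and `Estimated Unemployment Rate (%)` columns is employed ÷ (1 − rate / 100).

```python
def unemployment_rate(unemployed: float, labour_force: float) -> float:
    """Unemployed people as a percentage of the labour force."""
    return 100 * unemployed / labour_force


def labour_participation_rate(labour_force: float, working_age_population: float) -> float:
    """Labour force (employed + job seekers) as a percentage of the working-age population."""
    return 100 * labour_force / working_age_population


def implied_labour_force(employed: float, unemployment_rate_pct: float) -> float:
    """Invert employed = labour_force * (1 - rate / 100) to recover the labour force."""
    return employed / (1 - unemployment_rate_pct / 100)


# Made-up counts, not taken from the dataset
labour_force = 1_000_000
unemployed = 120_000
employed = labour_force - unemployed

rate = unemployment_rate(unemployed, labour_force)
print(rate)                                                # 12.0
print(labour_participation_rate(labour_force, 2_500_000))  # 40.0
print(round(implied_labour_force(employed, rate)))         # 1000000
```

Applied column-wise to the real data, the same inversion gives an approximate labor force per row, which is useful whenever counts rather than rates need to be aggregated.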


### Check Unique Values for each variable.

In [None]:
# Check unique values for each variable
for col in df.columns.tolist():  # iterate over the columns one at a time
  print(f"Number of unique values in {col}: {df[col].nunique()}")  # count the distinct values in this column


## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Rename columns to the shorter, underscore-separated names used in the rest of the analysis
df.rename(columns = {'Region':'State','Region.1':'Direction','Estimated Unemployment Rate (%)':'Estimated_Unemployment_Rate_(%)','Estimated Employed':'Estimated_Employed',
                     'Estimated Labour Participation Rate (%)':'Estimated_Labour_Participation_Rate_(%)'}, inplace=True)

In [None]:
# Flag outliers in the numeric columns with the 1.5 * IQR rule, applied column by column
def outlier_remover(data):
  print('Shape before removing outliers:', data.shape)  # shape before filtering
  q1 = data.quantile(0.25)  # first quartile of each column
  q3 = data.quantile(0.75)  # third quartile of each column
  iqr = q3 - q1  # interquartile range of each column
  lower_bound = q1 - 1.5 * iqr  # lower fence for each column
  upper_bound = q3 + 1.5 * iqr  # upper fence for each column

  # keep only the rows whose values lie within the fences in every column
  mask = ((data >= lower_bound) & (data <= upper_bound)).all(axis=1)
  filtered = data[mask]
  print('Shape after removing outliers:', filtered.shape)  # shape after filtering
  return filtered


# df itself is left unchanged; the filtered copy is only used to count the outlier rows
checking_outlier = outlier_remover(df[['Estimated_Unemployment_Rate_(%)','Estimated_Employed','Estimated_Labour_Participation_Rate_(%)']])
print('Rows outside the IQR fences:', len(df) - len(checking_outlier))

The two printed shapes show how many rows fall outside the 1.5 × IQR fences in at least one of the three numeric columns; if the shapes are equal, the data contain no outliers under this rule.
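To see how the 1.5 × IQR rule behaves, here is a small sketch on made-up unemployment rates for one hypothetical state (not the real data). A single extreme value, similar to the 75% maximum mentioned in the conclusion, falls far above the upper fence, while the ordinary values stay inside.

```python
import pandas as pd

# Made-up monthly unemployment rates (%) for one hypothetical state
rates = pd.Series([5.2, 6.1, 5.8, 7.0, 6.4, 5.5, 75.8, 6.9])

q1, q3 = rates.quantile([0.25, 0.75])  # linear-interpolated quartiles
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = rates[(rates < lower) | (rates > upper)]
print(f"fences: {lower:.3f} to {upper:.3f}")  # fences: 3.925 to 8.725
print(outliers.tolist())                      # [75.8]
```

Whether such a value should be removed is a separate judgment: a spike during a lockdown month may be a genuine observation rather than a data error.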

In [None]:
round(df.groupby(['Direction']).apply(lambda x: x.pivot_table(values = ['Estimated_Unemployment_Rate_(%)','Estimated_Employed','Estimated_Labour_Participation_Rate_(%)'],
                                                                 index = ['State'],
                                                                 aggfunc = 'mean',
                                                                 margins=True,
                                                                 margins_name='Subtotal').reset_index()), 2)

The output shows the mean Estimated Unemployment Rate (%), Estimated Employed, and Estimated Labour Participation Rate (%) for each state, grouped by direction, with a `Subtotal` row for each direction.
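The `Subtotal` rows in this table are plain means over all records in each direction, so every state-month record counts equally regardless of the size of the state's labor force. If a region-level rate weighted by labor force is wanted, one option (an assumption, not something the notebook does) is to back out each row's labor force from employed and rate, pool the counts, and recompute the rate. The sketch below does this on two made-up states:

```python
import pandas as pd

# Made-up rows for two hypothetical states in one direction
demo = pd.DataFrame({
    "Direction": ["North", "North"],
    "State": ["A", "B"],
    "Estimated_Employed": [9_000_000, 900_000],
    "Estimated_Unemployment_Rate_(%)": [10.0, 40.0],
})

# Labour force implied by employed = labour_force * (1 - rate / 100)
demo["Labour_Force"] = demo["Estimated_Employed"] / (1 - demo["Estimated_Unemployment_Rate_(%)"] / 100)
demo["Unemployed"] = demo["Labour_Force"] - demo["Estimated_Employed"]

# Pool the counts per direction, then recompute the rate
pooled = demo.groupby("Direction")[["Unemployed", "Labour_Force"]].sum()
weighted = 100 * pooled["Unemployed"] / pooled["Labour_Force"]

# Unweighted mean of the row-level rates, as in the Subtotal row
simple = demo.groupby("Direction")["Estimated_Unemployment_Rate_(%)"].mean()

print(round(simple["North"], 2))    # 25.0
print(round(weighted["North"], 2))  # 13.91
```

The gap between the two numbers comes from the small state with a high rate pulling the unweighted mean upward.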

In [None]:
# Convert the Date column from object (string) to datetime, reading dates day-first
df['Date']  = pd.to_datetime(df['Date'], dayfirst=True)
df['Day']   = df['Date'].dt.day    # extract the day into a new Day column
df['Month'] = df['Date'].dt.month  # extract the month into a new Month column
df['Year']  = df['Date'].dt.year   # extract the year into a new Year column
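The `dayfirst=True` argument matters for day-month-year strings. Without it, an ambiguous date such as `05-04-2020` would be read month-first as 4 May. A minimal sketch on two made-up date strings:

```python
import pandas as pd

raw = pd.Series(["05-04-2020", "31-05-2020"])  # made-up day-month-year strings

parsed = pd.to_datetime(raw, dayfirst=True)
print(parsed.dt.month.tolist())  # [4, 5]
print(parsed.dt.day.tolist())    # [5, 31]
```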

In [None]:
df.head()  # the output now includes the new Day, Month and Year columns

In [None]:
# Mean Estimated Unemployment Rate (%), Estimated Employed and Estimated Labour Participation Rate (%) for each month, with an overall total
round(df.groupby(['Year']).apply(lambda x: x.pivot_table(values = ['Estimated_Unemployment_Rate_(%)','Estimated_Employed','Estimated_Labour_Participation_Rate_(%)'],
                                                                 index = ['Month'],
                                                                 aggfunc = 'mean',
                                                                 margins=True,
                                                                 margins_name='Total').reset_index()), 2)
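For a single column, a plain `groupby` gives the same monthly means as the pivot table above, and `idxmax` returns the month with the highest mean. This offers a quick way to check the April/May peak reported in the conclusion. The rows below are made up and use the notebook's renamed column:

```python
import pandas as pd

# Made-up rows in the notebook's renamed column layout
df_demo = pd.DataFrame({
    "Month": [3, 3, 4, 4, 5, 5],
    "Estimated_Unemployment_Rate_(%)": [8.0, 9.0, 22.0, 24.0, 21.0, 23.0],
})

# Mean unemployment rate for each month
monthly_mean = df_demo.groupby("Month")["Estimated_Unemployment_Rate_(%)"].mean()
print(monthly_mean)
print("Peak month:", monthly_mean.idxmax())  # Peak month: 4
```

On the real data, the same two lines applied to `df` would list the monthly means and the peak month directly.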

## ***4. Data Visualization, Storytelling & Experimenting with Charts: Understand the Relationships between Variables***

#### Chart - 1

In [None]:
# Correlation heatmap of the numeric columns in the unemployment dataset
sns.heatmap(df.corr(numeric_only=True), annot=True)
plt.title('Correlation heatmap')
plt.show()
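`DataFrame.corr` uses Pearson correlation by default, which is sensitive to extreme values such as lockdown-month spikes. Passing `method="spearman"` gives a rank-based alternative. The sketch below compares the two methods on made-up values where one unemployment rate is extreme:

```python
import pandas as pd

# Made-up values: the relationship is perfectly monotonic, but one rate is extreme
demo = pd.DataFrame({
    "Estimated_Unemployment_Rate_(%)": [5.0, 6.0, 7.0, 8.0, 75.0],
    "Estimated_Labour_Participation_Rate_(%)": [40.0, 41.0, 42.0, 43.0, 44.0],
})

pearson = demo.corr(method="pearson").iloc[0, 1]    # linear association
spearman = demo.corr(method="spearman").iloc[0, 1]  # association of ranks
print(f"Pearson: {pearson:.2f}, Spearman: {spearman:.2f}")
```

If the two heatmaps differ noticeably on the real data, a few extreme rows are likely driving the Pearson values.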

#### Chart - 2

In [None]:
# Distribution of Estimated Employed, coloured by Direction
plt.title('Estimated Employed in India by Direction')
sns.histplot(x="Estimated_Employed",hue="Direction",data=df)
plt.show()

#### Chart - 3

In [None]:
# Distribution of Estimated Unemployment Rate (%), coloured by Direction
plt.title('Unemployment Rate in India by Direction')
sns.histplot(x="Estimated_Unemployment_Rate_(%)",hue="Direction",data=df)
plt.show()

#### Chart - 4

In [None]:
# Histogram of the unemployment rate in India
plt.style.use('seaborn-v0_8-darkgrid')  # Matplotlib 3.6+ name for the former 'seaborn-darkgrid' style
plt.figure(figsize=(10, 6))
sns.histplot(df['Estimated_Unemployment_Rate_(%)'], bins=20, kde=True, color='skyblue', edgecolor='black')
plt.title('Unemployment Rate in India (Histogram)')
plt.xlabel('Unemployment Rate (%) in India')
plt.ylabel('Frequency')
plt.grid(True)
plt.show()
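The conclusion reads this histogram as right-skewed. A number can back up that visual impression: `Series.skew()` returns the sample skewness, where a positive value indicates a longer right tail, and a mean above the median is another quick sign of right skew. On the real data, the call would be `df['Estimated_Unemployment_Rate_(%)'].skew()`. The sketch below uses made-up rates:

```python
import pandas as pd

# Made-up unemployment rates (%): most are low, a few are very high
rates = pd.Series([4.0, 5.0, 5.5, 6.0, 6.5, 7.0, 8.0, 9.0, 20.0, 45.0])

skewness = rates.skew()                # sample skewness; > 0 means a longer right tail
print(skewness > 0)                    # True
print(rates.mean(), rates.median())    # 11.6 6.75
```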

#### Chart - 5

In [None]:
# Box plot of Estimated Employed for each month
sns.set_theme(style="ticks", palette="pastel")
sns.boxplot(x="Month", y='Estimated_Employed', hue="Month", data=df, palette="Set2", legend=False)
plt.title('Estimated Employed by Month')
sns.despine(offset=10, trim=True)
plt.show()

# **Conclusion**

We examined state-by-state unemployment rates in our EDA. The unemployment rate varied widely, from as low as 0.5% in some regions to as high as 75% in others, with an average of about 12% of the labor force unemployed. On average there were about 14 million employed persons per state, although this figure varied substantially. The labor participation rate also varied, averaging approximately 42%. Grouped by direction, the Northeast had the lowest unemployment rate (about 11%) and the highest participation rate (more than 52%), while the South had somewhat lower participation but a comparably low unemployment rate. Comparing months, unemployment rates peaked in April and May 2020, when they could exceed 22%. Because the data cover a single year, this peak cannot be separated from a seasonal effect, but its timing coincides with India's COVID-19 lockdown, which is consistent with the pandemic being the main driver. Overall, the data show how much the unemployment situation differs between states and months, which may help inform approaches to addressing the problem.

First, the monthly box plot of Estimated Employed shows how the number of employed people changed over the months, although with a single year of data it cannot separate seasonal variation from long-term patterns. Second, the histogram of the estimated unemployment rate shows a distribution skewed to the right, suggesting that most locations have comparatively low unemployment rates, with a long tail of high values. Third, the histograms coloured by direction and the direction-wise pivot table reveal notable differences between states and regions, suggesting that some areas face higher unemployment than others, possibly because of differences in industrial structure or other economic conditions. Finally, the correlation heatmap quantifies the link between the unemployment rate and the labor participation rate, helping to assess how changes in labor force participation may relate to unemployment. These analyses offer a strong basis for further research: examining regional disparities to uncover underlying economic drivers, examining the time dimension to identify significant events or policies, and exploring additional socio-economic indicators to obtain a comprehensive understanding of the dynamics of India's labor market. Ultimately, these insights can help guide evidence-based policy choices aimed at tackling unemployment and promoting long-term, steady economic growth.
