
# Unemployment Analysis in India

This notebook analyzes unemployment rates across Indian states. It covers data cleaning, visualization, and correlation analysis to identify trends and patterns in the unemployment data.


### Importing Libraries

In this section, we import the necessary libraries for data manipulation, visualization, and analysis. These include:
- `numpy` and `pandas` for data handling
- `seaborn` and `matplotlib` for data visualization
- `calendar` for month names used in plot labels
- `plotly.express` for interactive visualizations


In [None]:
# Importing necessary libraries
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import calendar
import plotly.express as px



### Loading the Dataset

We load the dataset containing unemployment data. The data is stored in a CSV file named `data.csv`.


In [None]:
# Load the dataset
data = 'data.csv'
dt = pd.read_csv(data)



### Data Cleaning

This section cleans the data by:
- Removing leading and trailing spaces from column names
- Dropping columns whose names are duplicated (the first occurrence is kept)
- Dropping rows that contain null values so the dataset is ready for analysis



In [None]:
# Strip leading and trailing spaces from column names
dt.columns = dt.columns.str.strip()

# Drop any duplicate columns
dt = dt.loc[:, ~dt.columns.duplicated()]

# Print the updated columns to verify
print(dt.columns)

# Report null counts per column, then drop rows with null values
print(dt.isnull().sum())
dt.dropna(inplace=True)
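
To see what each cleaning step removes, a small sketch on a synthetic frame may help. The `raw` frame, its column names, and its values are made up for illustration and are not taken from `data.csv`.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the raw CSV (made-up values, not the real dataset)
raw = pd.DataFrame(
    [["Bihar", 10.5, 10.5], ["Goa", np.nan, np.nan], [None, 7.0, 7.0]],
    columns=[" State", "Rate ", "Rate "],  # padded and duplicated names
)

cleaned = raw.copy()
cleaned.columns = cleaned.columns.str.strip()
# Keep only the first occurrence of each column name
cleaned = cleaned.loc[:, ~cleaned.columns.duplicated()]

# Count nulls per column before dropping, so the loss is visible
nulls_per_column = cleaned.isnull().sum()
rows_before = len(cleaned)
cleaned = cleaned.dropna()

print(nulls_per_column)
print(f"Dropped {rows_before - len(cleaned)} of {rows_before} rows")
```

Comparing the row count before and after `dropna` is a quick way to notice when the cleaning discards more data than expected.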



### Date Conversion and Feature Engineering

In this section, we:
- Convert the 'Date' column to a datetime format
- Create a new column for the month extracted from the 'Date' column
- Rename columns for consistency and readability



In [None]:
# Convert 'Date' column to datetime
dt['Date'] = pd.to_datetime(dt['Date'], dayfirst=True)

# Create a new column for the month
dt['Month'] = dt['Date'].dt.month

# Rename columns for consistency (columns not listed here keep their names)
dt.rename(columns={
    'Unemployment Rate': 'Unemployment_Rate',
    'Estimated Employed': 'Employed',
    'Estimated Labour Participation Rate (%)': 'Labor_Participation_Rate',
    'longitude': 'Longitude',
    'latitude': 'Latitude'
}, inplace=True)
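
The `dayfirst=True` argument matters whenever the day number is 12 or lower, because such strings are also valid month-first dates. The sketch below uses two made-up date strings in DD-MM-YYYY form to show how the extracted month changes with that flag.

```python
import pandas as pd

# Ambiguous day/month strings (made-up values in DD-MM-YYYY form)
dates = pd.Series(["01-02-2020", "05-03-2020"])

day_first = pd.to_datetime(dates, dayfirst=True)
month_first = pd.to_datetime(dates, dayfirst=False)

print(day_first.dt.month.tolist())    # → [2, 3]
print(month_first.dt.month.tolist())  # → [1, 5]
```

If the wrong convention is used, the `Month` column is still populated without any error, so a spot check against the raw file is worthwhile.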



### Verify the Data after Cleaning

We verify the cleaned dataset to ensure all transformations were applied correctly and the data is in the desired format.


In [None]:
# Verify the data after cleaning (info() prints its summary itself and returns None)
dt.info()
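
Because `DataFrame.rename` silently ignores mapping keys that are not present, a header that differs slightly from the mapping leaves the old name in place and later cells fail with a `KeyError`. A minimal check, assuming the column names used in the rest of this notebook, might look like the following. The helper `missing_columns`, the `example` frame, and the header `Estimated Unemployment Rate (%)` are hypothetical and only illustrate a mismatch.

```python
import pandas as pd

# Column names the later cells rely on (taken from the rename step)
EXPECTED_COLUMNS = [
    "State", "Date", "Month",
    "Unemployment_Rate", "Employed", "Labor_Participation_Rate",
]

def missing_columns(frame, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from `frame`."""
    return [name for name in expected if name not in frame.columns]

# Synthetic frame in which one header (hypothetical) did not match the mapping
example = pd.DataFrame({
    "State": ["Bihar"],
    "Date": pd.to_datetime(["2020-05-31"]),
    "Month": [5],
    "Estimated Unemployment Rate (%)": [10.0],  # made-up header and value
    "Employed": [1000],
    "Labor_Participation_Rate": [38.0],
})

print(missing_columns(example))
```

An empty list means every column the plots depend on is available.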


### Data Visualization

This section includes various visualizations to explore the data:

1. **Histogram of the Unemployment Rate**: Displays the distribution of the unemployment rate across the dataset.
2. **Line Plot of Unemployment Rate over Time**: Shows the trend of unemployment rates over time.
3. **Box Plot of Unemployment Rate by State**: Compares the unemployment rates across different states.
4. **Scatter Plot of Employed vs. Labor Participation Rate**: Examines the relationship between the number of employed individuals and the labor participation rate.


1. **Histogram of the Unemployment Rate**

In [None]:
# Histogram of the Unemployment Rate
plt.figure(figsize=(8, 6))
sns.histplot(dt['Unemployment_Rate'], bins=20, kde=True)
plt.title('Histogram of Unemployment Rate')
plt.xlabel('Unemployment Rate')
plt.ylabel('Frequency')
plt.show()

2. **Line Plot of Unemployment Rate over Time**

In [None]:
# Line plot of Unemployment Rate over time (Date)
plt.figure(figsize=(10, 6))
sns.lineplot(x='Date', y='Unemployment_Rate', data=dt)
plt.title('Unemployment Rate over Time')
plt.xlabel('Date')
plt.ylabel('Unemployment Rate')
plt.xticks(rotation=45)
plt.show()
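
When several rows share the same date (one per state), `sns.lineplot` by default draws the mean of those rows and shades a confidence band around it. The sketch below reproduces that mean explicitly on made-up observations, so the plotted line can be read as a cross-state average rather than any single state's series.

```python
import pandas as pd

# Synthetic observations: two states reported on each of two dates
obs = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2020-04-30", "2020-04-30", "2020-05-31", "2020-05-31"]
    ),
    "State": ["A", "B", "A", "B"],
    "Unemployment_Rate": [10.0, 20.0, 6.0, 8.0],
})

# One row per date: the mean across states and how many rows contributed
per_date = obs.groupby("Date")["Unemployment_Rate"].agg(["mean", "count"])
print(per_date)
```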

3. **Box Plot of Unemployment Rate by State**

In [None]:
# Box plot of Unemployment Rate by State
plt.figure(figsize=(10, 6))
sns.boxplot(x='State', y='Unemployment_Rate', data=dt)
plt.title('Unemployment Rate by State')
plt.xlabel('State')
plt.ylabel('Unemployment Rate')
plt.xticks(rotation=90)
plt.show()
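
The numbers behind a box plot can also be tabulated directly. The sketch below, on made-up rates for two placeholder states, computes the quartiles per state and flags points above the upper fence (third quartile plus 1.5 times the interquartile range), which is the default rule for drawing outliers.

```python
import pandas as pd

# Synthetic rates for two placeholder states
rates = pd.DataFrame({
    "State": ["A"] * 5 + ["B"] * 5,
    "Unemployment_Rate": [2.0, 4.0, 6.0, 8.0, 30.0, 5.0, 6.0, 7.0, 8.0, 9.0],
})

# Quartiles per state, one row per state
quartiles = (
    rates.groupby("State")["Unemployment_Rate"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)
quartiles.columns = ["q1", "median", "q3"]

# Upper fence used by the default box plot whiskers
iqr = quartiles["q3"] - quartiles["q1"]
quartiles["upper_fence"] = quartiles["q3"] + 1.5 * iqr

# Rows whose rate lies above their own state's fence
fence_for_row = rates["State"].map(quartiles["upper_fence"])
flagged = rates[rates["Unemployment_Rate"] > fence_for_row]

print(quartiles)
print(flagged)
```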


4. **Scatter Plot of Employed vs. Labor Participation Rate**

In [None]:
# Scatter plot of Employed vs. Labor Participation Rate
plt.figure(figsize=(8, 6))
sns.scatterplot(x='Employed', y='Labor_Participation_Rate', data=dt)
plt.title('Employed vs. Labor Participation Rate')
plt.xlabel('Employed')
plt.ylabel('Labor Participation Rate')
plt.show()


### Monthly Average Unemployment Rate

We calculate and visualize the monthly average unemployment rate to understand seasonal trends and patterns.



In [None]:
# Monthly average of Unemployment Rate
monthly_avg_unemployment = dt.groupby('Month')['Unemployment_Rate'].mean()
# Label each bar with its own month abbreviation. barplot places bars at
# positions 0..n-1, so fixed tick positions 1..12 would shift every label by one.
month_labels = [calendar.month_abbr[m] for m in monthly_avg_unemployment.index]
plt.figure(figsize=(8, 6))
sns.barplot(x=month_labels, y=monthly_avg_unemployment.values)
plt.title('Monthly Average Unemployment Rate')
plt.xlabel('Month')
plt.ylabel('Average Unemployment Rate')
plt.xticks(rotation=45)
plt.show()
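
A seasonal reading of this chart assumes that every calendar month is represented. The sketch below, on made-up data, reindexes the monthly averages to all twelve months so that months with no observations show up as `NaN` instead of being silently absent.

```python
import calendar
import pandas as pd

# Synthetic sample covering only January, February and April
sample = pd.DataFrame({
    "Month": [1, 1, 2, 4],
    "Unemployment_Rate": [6.0, 8.0, 5.0, 9.0],
})

monthly = sample.groupby("Month")["Unemployment_Rate"].mean()

# Reindex to all twelve months so absent months appear as NaN
monthly_full = monthly.reindex(range(1, 13))
monthly_full.index = [calendar.month_abbr[m] for m in monthly_full.index]

missing = monthly_full[monthly_full.isna()].index.tolist()
print(monthly_full)
print("Months without data:", missing)
```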


### Correlation Analysis

This section includes:
- A **Correlation Heatmap** to visualize the correlations between numerical variables.
- A **Pairplot** to show pairwise relationships between numerical variables.



#### Correlation Heatmap

In [None]:
# Exclude non-numeric columns from correlation calculation
numeric_columns = dt.select_dtypes(include=[np.number])
correlation_matrix = numeric_columns.corr()

# Correlation heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt='.2f')
plt.title('Correlation Matrix Heatmap')
plt.show()
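
Each cell of the heatmap is a Pearson correlation coefficient, which is the default method of `DataFrame.corr`. As a sketch of what that number means, the block below computes it by hand for one pair of made-up columns and compares the result with pandas.

```python
import numpy as np
import pandas as pd

# Synthetic pair of columns (made-up values)
pair = pd.DataFrame({
    "Unemployment_Rate": [4.0, 6.0, 8.0, 10.0],
    "Employed": [90.0, 80.0, 85.0, 60.0],
})

# Deviations from the column means
x = pair["Unemployment_Rate"] - pair["Unemployment_Rate"].mean()
y = pair["Employed"] - pair["Employed"].mean()

# Pearson r: sum of cross-deviations scaled by both spreads
r_manual = (x * y).sum() / np.sqrt((x ** 2).sum() * (y ** 2).sum())
r_pandas = pair["Unemployment_Rate"].corr(pair["Employed"])

print(r_manual, r_pandas)
```

A value near -1 or +1 indicates a strong linear relationship. The coefficient says nothing about non-linear patterns, which is one reason to read the heatmap alongside the pairplot.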




#### Pairplot of Numerical Columns

In [None]:
# Pairplot of numerical columns
numerical_columns = ['Unemployment_Rate', 'Employed', 'Labor_Participation_Rate', 'Longitude', 'Latitude']
g = sns.pairplot(dt[numerical_columns])
# Use a figure-level title; plt.title would only label the last subplot
g.figure.suptitle('Pairplot of Numerical Variables', y=1.02)
plt.show()

### Animated Bar Chart for Monthly Unemployment Rates by State

This cell creates an animated bar chart of the monthly unemployment rate by state using Plotly Express.

#### Steps

1. **Load the Dataset** (done in the loading section above)
2. **Clean and Prepare the Data** (done in the cleaning and feature engineering sections above)
3. **Group Data by State and Month**
4. **Create the Animated Bar Chart**
5. **Customize the Animation Settings**
6. **Display the Plot**

In [None]:
# Group by state and month, then average the unemployment rate
grouped_df = dt.groupby(['State', 'Month'])['Unemployment_Rate'].mean().reset_index()

# Create the animated bar chart
fig = px.bar(
    grouped_df,
    x='State',
    y='Unemployment_Rate',
    color='State',
    animation_frame='Month',
    title='Monthly Unemployment Rates by State',
    labels={'Unemployment_Rate': 'Unemployment Rate (%)'},
    range_y=[0, grouped_df['Unemployment_Rate'].max() + 2]
)

# Optional layout tweaks (disabled)
# fig.update_layout(
#     xaxis={'categoryorder': 'total descending'},
#     transition={'duration': 100},
#     yaxis_title='Unemployment Rate (%)',
#     xaxis_title='State'
# )

# Position the month slider and slow down the frame transitions
fig.update_layout(
    sliders=[{
        'currentvalue': {
            'prefix': 'Month: ',
            'font': {'size': 20}
        },
        'pad': {'b': 10},
        'len': 0.9,
        'x': 0.1,
        'xanchor': 'center',
        'y': -0.3,
        'yanchor': 'top'
    }],
    transition={'duration': 1000}
)

# Show the plot
fig.show()
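
An animation with one frame per month reads most cleanly when every state has a value in every month, since a missing state-month pair may leave a bar absent in that frame. The following sketch, on a made-up grouped frame, lists the pairs that are missing. The frame `grouped` is a hypothetical stand-in for `grouped_df`.

```python
import pandas as pd

# Synthetic stand-in for the grouped frame: state B has no row for month 2
grouped = pd.DataFrame({
    "State": ["A", "A", "B"],
    "Month": [1, 2, 1],
    "Unemployment_Rate": [5.0, 6.0, 7.0],
})

# Every state-month combination that could exist
all_pairs = pd.MultiIndex.from_product(
    [grouped["State"].unique(), grouped["Month"].unique()],
    names=["State", "Month"],
)
# Combinations that actually occur in the data
present = pd.MultiIndex.from_frame(grouped[["State", "Month"]])

missing_pairs = all_pairs.difference(present).tolist()
print(missing_pairs)
```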
        

### Bar Plot of Average Unemployment Rate by State

We calculate and visualize the average unemployment rate for each state to identify which states have the highest and lowest unemployment rates.


In [None]:
# Bar plot of average unemployment rate by state
state_unemployment = dt.groupby('State')['Unemployment_Rate'].mean().reset_index()
state_unemployment_sorted = state_unemployment.sort_values(by='Unemployment_Rate', ascending=False)

plt.figure(figsize=(12, 6))
sns.barplot(x='State', y='Unemployment_Rate', data=state_unemployment_sorted)
plt.xticks(rotation=90)
plt.xlabel('State')
plt.ylabel('Average Unemployment Rate')
plt.title('Average Unemployment Rate by State')
plt.tight_layout()
plt.show()


## Conclusion

### Unemployment Rate

- **Mean Unemployment Rate**: 7.83%
- **Median Unemployment Rate**: 6.85%
- **Standard Deviation of Unemployment Rate**: 5.29 percentage points
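
Summary figures of this kind can be computed in a single call. The sketch below uses a small made-up series rather than the notebook's data. Note that pandas' `std` is the sample standard deviation (`ddof=1`), and that a mean above the median, as in the figures reported above (7.83 versus 6.85), is typical of a right-skewed distribution.

```python
import pandas as pd

# Synthetic rates with one large value pulling the mean upward
rates = pd.Series([2.0, 4.0, 6.0, 8.0, 30.0], name="Unemployment_Rate")

summary = rates.agg(["mean", "median", "std"])
print(summary)
```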

#### Key Insights:
- The dataset shows significant variability in unemployment rates across different states and regions.
- The **highest average unemployment rates** were observed in:
  - **Tripura**: 17.43%
  - **Haryana**: 14.79%
  - **Jharkhand**: 13.35%
  - **Bihar**: 12.84%
- The **lowest average unemployment rates** were found in:
  - **Meghalaya**: 1.56%
  - **Odisha**: 2.43%
  - **Assam**: 3.67%
  - **Uttarakhand**: 4.09%
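
Rankings like the two lists above can be read off a per-state average with `nlargest` and `nsmallest`. The sketch below uses placeholder state labels and made-up averages, not the notebook's results.

```python
import pandas as pd

# Synthetic per-state averages (placeholder labels, made-up values)
state_avg = pd.Series(
    {"A": 12.0, "B": 3.5, "C": 8.0, "D": 1.5, "E": 15.0},
    name="Unemployment_Rate",
)

highest = state_avg.nlargest(2)   # two highest averages, largest first
lowest = state_avg.nsmallest(2)   # two lowest averages, smallest first

print(highest)
print(lowest)
```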

### Regional Analysis

- **North**:
  - States like Haryana and Punjab exhibit higher unemployment rates, with Haryana being significantly higher.
  
- **South**:
  - Tamil Nadu and Karnataka show moderate unemployment rates, with Tamil Nadu being on the higher end within this region.
  
- **East**:
  - Bihar and Jharkhand have higher unemployment rates, indicating economic challenges in these areas.
  
- **West**:
  - Maharashtra and Gujarat have moderate unemployment rates, with Gujarat generally performing better.
  
- **Northeast**:
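  - Meghalaya and Assam have some of the lowest unemployment rates, suggesting better employment conditions there. Tripura, also in the Northeast, is the exception, with the highest average rate listed above (17.43%).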

### Labour Participation Rate

- The Labour Participation Rate (LPR) varies considerably across different states and time periods.
- The LPR decreased during some months, which may coincide with economic downturns or policy changes.

### Employment Trends

- **Relationship between Unemployment Rate and Employment**:
  - There is a noticeable inverse relationship: as the unemployment rate increases, the number of employed individuals tends to decrease.

### Regional Mean Unemployment Rates

- **North**:
  - The mean unemployment rate in northern states is higher compared to other regions.
  
- **South**:
  - Southern states show lower mean unemployment rates, indicating better employment opportunities.
  
- **East**:
  - Eastern states have higher mean unemployment rates, similar to the northern region.
  
- **West**:
  - Western states show moderate unemployment rates, with some variability.
  
- **Northeast**:
  - Northeastern states such as Meghalaya and Assam have the lowest unemployment rates, suggesting relatively stable employment conditions, although Tripura is a notable exception.
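
The dataset as described has no region column, so regional means require a state-to-region lookup. The sketch below is one way to build them, using only the eight state averages quoted earlier in this conclusion. The `REGION_OF` mapping is an assumption made for illustration, and the resulting means cover only those eight states, so they are not the full regional figures.

```python
import pandas as pd

# Assumed state-to-region lookup (not part of the dataset)
REGION_OF = {
    "Haryana": "North", "Uttarakhand": "North",
    "Jharkhand": "East", "Bihar": "East", "Odisha": "East",
    "Tripura": "Northeast", "Meghalaya": "Northeast", "Assam": "Northeast",
}

# State averages as quoted in the Key Insights list
state_avg = pd.DataFrame({
    "State": ["Tripura", "Haryana", "Jharkhand", "Bihar",
              "Meghalaya", "Odisha", "Assam", "Uttarakhand"],
    "Unemployment_Rate": [17.43, 14.79, 13.35, 12.84, 1.56, 2.43, 3.67, 4.09],
})

state_avg["Region"] = state_avg["State"].map(REGION_OF)
region_mean = state_avg.groupby("Region")["Unemployment_Rate"].mean()
print(region_mean.round(2))
```

A single high value such as Tripura's moves its region's mean considerably, so reporting the spread within each region alongside the mean gives a fairer picture.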

### Most Impacted States/Union Territories

- The states/UTs most affected by high unemployment rates include:
  - **Puducherry**
  - **Jharkhand**
  - **Bihar**
  - **Haryana**
  - **Tamil Nadu**

### Miscellaneous Observations

- **Yearly Impact**:
  - The effect of specific years on unemployment is evident, with certain years showing significant deviations in unemployment rates.
- **Variability**:
  - There is extreme variability in unemployment rates across all states, with some states showing more skewed distributions.
- **Correlation**:
  - When unemployment rates increase, the number of employed individuals tends to decrease, indicating an inverse relationship.

Overall, the analysis highlights significant disparities in unemployment rates across different states and time periods, influenced by various economic, seasonal, and policy factors.
