**1. Importing Libraries**

This notebook explores a dataset on unemployment in India during the Covid-19 lockdowns. The figures trace how employment rose and fell as the pandemic took hold, and behind each number are people whose working lives were reshaped. The sections below load the data, explore it, and fit a simple linear regression to the unemployment rate.

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

**2. Data Collection**

In [None]:
# Load the dataset from the Kaggle input directory
data = pd.read_csv("/kaggle/input/unemployment-in-india/Unemployment in India.csv")

In [None]:
data

**3. EDA Before Preprocessing**

In [None]:
# Display basic information about the dataset
data.info()

In [None]:
# Display statistical summary
data.describe()

In [None]:
# Drop rows with missing values
data.dropna(inplace=True)

In [None]:
# Confirm that no missing values remain after dropping rows
data.isnull().sum()

In [None]:
data.columns
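
Several column names in this dataset carry a leading space (for example `' Date'` and `' Frequency'`, as used in the preprocessing cells), which is easy to mistype. One option, shown here as a minimal sketch on a small made-up frame rather than on `data` itself, is to strip the whitespace from the names. The toy values are invented for illustration, and the other cells in this notebook keep the original padded names, so they would need updating if this were applied to `data`.

```python
import pandas as pd

# Toy frame with made-up values; the column names mimic the padded
# names seen in this dataset.
toy = pd.DataFrame({
    "Region": ["A", "B"],
    " Date": ["31-05-2019", "30-06-2019"],
    " Estimated Unemployment Rate (%)": [3.5, 4.1],
})

# Strip leading/trailing whitespace from every column name.
toy.columns = toy.columns.str.strip()

print(list(toy.columns))
```
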

In [None]:
# Preview the unemployment-rate column (position 3) as a one-column DataFrame
data.iloc[:, [3]]

In [None]:
# Visualize the distribution of unemployment rate
sns.histplot(data.iloc[:,3], kde=True)
plt.title('Distribution of Unemployment Rate')
plt.xlabel('Unemployment Rate (%)')
plt.ylabel('Frequency')
plt.show()

**4. Preprocessing**

In [None]:
# Drop irrelevant columns
data.drop(['Region', ' Frequency','Area'], axis=1, inplace=True)

In [None]:
# Convert 'Date' column to datetime format
data[' Date'] = pd.to_datetime(data[' Date'])
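
If the raw date strings are day-first and padded with a space (for example `' 31-05-2019'`; this layout is an assumption about the file, not something shown in this notebook), passing an explicit format avoids any ambiguity in parsing. A minimal sketch on made-up strings:

```python
import pandas as pd

# Made-up strings in the assumed layout: leading space, day-month-year.
raw = pd.Series([" 31-05-2019", " 30-06-2019"])

# Strip the padding, then parse with an explicit day-first format.
parsed = pd.to_datetime(raw.str.strip(), format="%d-%m-%Y")

print(parsed.dt.month.tolist())
```

If the actual file uses a different layout, the `format` string would need to change accordingly.
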

In [None]:
data.info()

In [None]:
# Set 'Date' column as index
data.set_index(' Date', inplace=True)

**5. EDA After Preprocessing**

In [None]:
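# Plot the unemployment rate as a time series (a histogram would show the
# distribution, not the trend). Several rows share each date, so lineplot
# draws the mean per date with a confidence band.
sns.lineplot(x=data.index, y=data[' Estimated Unemployment Rate (%)'])
plt.title('Unemployment Rate Over Time')
plt.xlabel('Date')
plt.ylabel('Unemployment Rate (%)')
plt.show()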

**6. Training**

In [None]:
# Define features and target
X = data.drop(' Estimated Unemployment Rate (%)', axis=1)
y = data[' Estimated Unemployment Rate (%)']

In [None]:
# Splitting data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [None]:
# Initialize and train the model
model = LinearRegression()
model.fit(X_train, y_train)

**7. Model Evaluation**

In [None]:
# Predictions
train_preds = model.predict(X_train)
test_preds = model.predict(X_test)

In [None]:
# Evaluation metrics
print("Training MSE:", mean_squared_error(y_train, train_preds))
print("Testing MSE:", mean_squared_error(y_test, test_preds))
print("Training R^2:", r2_score(y_train, train_preds))
print("Testing R^2:", r2_score(y_test, test_preds))
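
The two metrics printed above are the mean squared error (lower is better, in squared units of the target) and R², the share of the target's variance that the model explains (1 is a perfect fit, and it can go negative on held-out data). Below is a minimal sketch of the underlying arithmetic, written in plain Python on made-up numbers so each step is visible. The helper names `mse` and `r2` are hypothetical, and this is not meant to mirror how scikit-learn implements the metrics internally.

```python
def mse(y_true, y_pred):
    """Mean of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up values, not taken from the dataset.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]

print("MSE:", mse(y_true, y_pred))
print("R^2:", r2(y_true, y_pred))
```

A large gap between the training and testing values of either metric would suggest the model does not generalize well to unseen rows.
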