# India Job Market Analysis

In [60]:
import pandas as pd
import dash
import plotly.express as px  # used by every chart below
from dash import dcc, html, Input, Output
from collections import Counter
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Part 1: Preprocessing Data

## 1.1 Downloading Data

In [61]:
!kaggle datasets download -d ashaychoudhary/india-job-market-dataset --unzip -p ./india_job_market


Dataset URL: https://www.kaggle.com/datasets/ashaychoudhary/india-job-market-dataset
License(s): MIT
Downloading india-job-market-dataset.zip to ./india_job_market
  0%|                                                | 0.00/406k [00:00<?, ?B/s]
100%|████████████████████████████████████████| 406k/406k [00:00<00:00, 4.23MB/s]


In [62]:
ls

India Job Market Analysis.ipynb  india_job_market/
India-Job-Market-Analysis/       kaggle.json
README.md


## 1.2 Reviewing the First Few Rows of the Dataset

In [63]:
df = pd.read_csv('./india_job_market/india_job_market_dataset.csv')  
df.head()

Unnamed: 0,Job ID,Job Title,Company Name,Job Location,Job Type,Salary Range,Experience Required,Posted Date,Application Deadline,Job Portal,Number of Applicants,Education Requirement,Skills Required,Remote/Onsite,Company Size
0,JOB1,Software Engineer,Amazon,Ahmedabad,Full-time,5-8 LPA,2-5 years,2025-01-16,2025-01-25,LinkedIn,23,PhD,"C++, SQL, Python",Remote,Small (1-50)
1,JOB2,Marketing Executive,Infosys,Ahmedabad,Internship,5-8 LPA,2-5 years,2024-12-25,2025-01-19,Indeed,462,MBA,"SQL, C++, Python",Remote,Large (500+)
2,JOB3,Financial Analyst,Deloitte,Jaipur,Contract,20+ LPA,5-10 years,2025-01-22,2025-01-29,Naukri.com,430,M.Tech,"Machine Learning, Excel, React",Remote,Large (500+)
3,JOB4,Business Analyst,Amazon,Delhi,Full-time,20+ LPA,2-5 years,2025-01-07,2025-02-06,LinkedIn,387,B.Tech,"Machine Learning, Python, SQL",Hybrid,Small (1-50)
4,JOB5,Software Engineer,Infosys,Delhi,Full-time,12-20 LPA,10+ years,2024-12-26,2025-01-08,Indeed,199,MBA,"UI/UX, C++, Java",Onsite,Small (1-50)


## 1.3 Adding Coordinates for Each City to Map Job Locations

In [78]:
list(df['Job Location'].unique())

['Ahmedabad',
 'Jaipur',
 'Delhi',
 'Pune',
 'Noida',
 'Mumbai',
 'Hyderabad',
 'Kolkata',
 'Bangalore',
 'Chennai']

In [64]:
# City-to-coordinates mapping
city_coordinates = {
    "Ahmedabad": {"lat": 23.0225, "lon": 72.5714},
    "Jaipur": {"lat": 26.9124, "lon": 75.7873},
    "Delhi": {"lat": 28.7041, "lon": 77.1025},
    "Pune": {"lat": 18.5204, "lon": 73.8567},
    "Noida": {"lat": 28.5355, "lon": 77.3910},
    "Mumbai": {"lat": 19.0760, "lon": 72.8777},
    "Hyderabad": {"lat": 17.3850, "lon": 78.4867},
    "Kolkata": {"lat": 22.5726, "lon": 88.3639},
    "Bangalore": {"lat": 12.9716, "lon": 77.5946},
    "Chennai": {"lat": 13.0827, "lon": 80.2707}
}

In [65]:
# Map latitude and longitude
df["Latitude"] = df["Job Location"].map(lambda x: city_coordinates[x]["lat"])
df["Longitude"] = df["Job Location"].map(lambda x: city_coordinates[x]["lon"])
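
The lambda lookup above raises a `KeyError` if a city is ever missing from `city_coordinates`. A minimal sketch of a more tolerant lookup follows, assuming the same dictionary shape. The toy `demo_df` and the city "Surat" are illustrative only and do not come from the dataset.

```python
import pandas as pd

# Two entries copied from city_coordinates above
city_coordinates = {
    "Delhi": {"lat": 28.7041, "lon": 77.1025},
    "Pune": {"lat": 18.5204, "lon": 73.8567},
}

# Toy frame (hypothetical); "Surat" is deliberately absent from the mapping
demo_df = pd.DataFrame({"Job Location": ["Delhi", "Pune", "Surat"]})

# Turn the nested dict into a lookup table indexed by city name
coords = pd.DataFrame.from_dict(city_coordinates, orient="index")

# Mapping through a Series yields NaN for cities without coordinates
demo_df["Latitude"] = demo_df["Job Location"].map(coords["lat"])
demo_df["Longitude"] = demo_df["Job Location"].map(coords["lon"])

# List the cities that still need coordinates instead of failing outright
missing = demo_df.loc[demo_df["Latitude"].isna(), "Job Location"].tolist()
print(missing)
```

With this approach, unmapped cities surface as an explicit list that can be reviewed before plotting.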

# Part 2: Analysis of the Data

## 2.1 Interactive Map of the Indian Job Market

The code below builds an interactive map of job postings across Indian cities for a selected job role. It uses the Dash framework, so the map updates whenever the dropdown selection changes.

**Note: Click any city in the legend to hide its marker, and click it again to bring it back. This lets you focus on specific cities or compare a chosen subset.**

In [80]:
# Initialize Dash app
app = dash.Dash(__name__, external_stylesheets=["https://cdn.jsdelivr.net/npm/bootstrap@5.1.3/dist/css/bootstrap.min.css"])

# App layout
app.layout = html.Div([
    html.H1("India Job Market Analysis", style={'textAlign': 'center'}),
    html.Div([
        html.Label("Select Job Role:"),
        dcc.Dropdown(
            id="role-dropdown",
            options=[{"label": role, "value": role} for role in df["Job Title"].unique()],
            value="Software Engineer",
            clearable=False
        )
    ], style={'width': '50%', 'margin': '0 auto', 'padding': '20px'}),
    dcc.Graph(id="map"),
])


# Callback to update map
@app.callback(
    Output("map", "figure"),
    Input("role-dropdown", "value")
)
def update_map(selected_role):
    # Filter dataset for the selected role
    filtered_df = df[df["Job Title"] == selected_role]
    
    # Group by city to calculate total jobs posted
    city_job_counts = filtered_df.groupby("Job Location").size().reset_index(name="Total Jobs")
    
    # Attach each city's latitude/longitude to its job count
    city_job_counts = city_job_counts.merge(filtered_df[["Job Location", "Latitude", "Longitude"]].drop_duplicates(), on="Job Location")
    
    # Create the map
    fig = px.scatter_mapbox(
        city_job_counts,
        lat="Latitude",
        lon="Longitude",
        size="Total Jobs",  # Use total jobs to determine circle size
        color="Job Location",
        hover_name="Job Location",
        hover_data={"Total Jobs": True},
        size_max=50,
        zoom=4,
        title=f"Total Jobs for {selected_role} Across Indian Cities"
    )
    fig.update_layout(
        mapbox_style="carto-positron",
        mapbox_center={"lat": 20.5937, "lon": 78.9629},
        margin={"r": 0, "t": 40, "l": 0, "b": 0},  # keep a top margin so the title stays visible
    )
    return fig

# Run app
if __name__ == "__main__":
    app.run(debug=True)  # app.run_server is the older name for this call


**Circle size indicates the number of postings in each city for the job title selected in the dropdown.**
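
The per-city aggregation inside the callback can be tried without starting the Dash server. The sketch below is one possible variant, not the notebook's exact code: it groups by city and coordinates in a single step instead of merging afterwards. The helper name `jobs_per_city` and the toy rows are hypothetical.

```python
import pandas as pd

# Toy rows (hypothetical) with the columns the callback relies on
demo_df = pd.DataFrame({
    "Job Title": ["Software Engineer", "Software Engineer",
                  "Software Engineer", "Data Analyst"],
    "Job Location": ["Delhi", "Delhi", "Pune", "Delhi"],
    "Latitude": [28.7041, 28.7041, 18.5204, 28.7041],
    "Longitude": [77.1025, 77.1025, 73.8567, 77.1025],
})

def jobs_per_city(frame, role):
    """Return one row per city with its coordinates and posting count for `role`."""
    filtered = frame[frame["Job Title"] == role]
    return (
        filtered.groupby(["Job Location", "Latitude", "Longitude"])
        .size()
        .reset_index(name="Total Jobs")
    )

counts = jobs_per_city(demo_df, "Software Engineer")
print(counts)
```

Keeping the aggregation in a plain function like this makes it easy to check the numbers behind the circle sizes independently of the map.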

## 2.2 Job Posting Trends Over Time

This code creates an interactive line chart to visualize job posting trends over time, segmented by company. The dataset is grouped by `Posted Date` and `Company Name`, and the number of jobs is counted for each combination. The chart uses Plotly to display these trends, with separate lines for each company.


In [67]:
df['Posted Date'] = pd.to_datetime(df['Posted Date'])


# Group by date and company
trend_df = df.groupby(['Posted Date', 'Company Name']).size().reset_index(name='Job Count')

# Interactive line chart
fig = px.line(
    trend_df,
    x='Posted Date',
    y='Job Count',
    color='Company Name',  # Different lines for each company
    title='Job Posting Trends Over Time by Company',
    labels={'Job Count': 'Number of Jobs', 'Posted Date': 'Date'}
)

fig.update_traces(mode='lines+markers')  # Add markers to the lines
fig.update_layout(
    hovermode='x unified',  # Unified hover for better comparison
    legend_title="Company Name"
)
fig.show()

**How to Use**

Hover over the chart to view the number of jobs posted by each company on a specific date.
Click on a company's name in the legend to hide or display its trend line for focused analysis.
Markers on the lines highlight individual data points for better readability.
This visualization helps compare job posting patterns across companies over time.
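
One caveat with grouping by date and company: a company with no postings on a given date gets no row at all, so its line skips that date rather than dropping to zero. A minimal sketch of filling those gaps with explicit zeros is shown below, using toy data. Whether zero filling is appropriate for the real dataset is an assumption to check.

```python
import pandas as pd

# Toy postings (hypothetical): each company is missing on at least one date
demo_df = pd.DataFrame({
    "Posted Date": pd.to_datetime(
        ["2025-01-01", "2025-01-01", "2025-01-02", "2025-01-03"]
    ),
    "Company Name": ["Amazon", "Infosys", "Amazon", "Infosys"],
})

# Same grouping as the notebook cell, kept as a Series of counts
counts = demo_df.groupby(["Posted Date", "Company Name"]).size()

# Pivot to a date x company grid so absent combinations become 0
grid = counts.unstack(fill_value=0)

# Return to long format, one row per (date, company), ready for px.line
trend = grid.reset_index().melt(
    id_vars="Posted Date", var_name="Company Name", value_name="Job Count"
)
print(trend)
```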

## 2.3 Skill Demand Analysis

This code generates a grouped bar chart to analyze skill demand across companies. The Skills Required column is split into individual skills, and the dataset is grouped by Company Name and Skills to count occurrences. The chart uses Plotly to display skill counts for each company, with bars grouped by skill.

In [68]:
# Explode skills into individual rows and add company information
skills_df = df.assign(Skills=df['Skills Required'].str.split(', ')).explode('Skills')

# Count skills per company
skill_counts_by_company = skills_df.groupby(['Company Name', 'Skills']).size().reset_index(name='Count')

# Grouped bar chart for company-wise skill demand
fig = px.bar(
    skill_counts_by_company,
    x='Skills',
    y='Count',
    color='Company Name',
    title="Company-Wise Skill Demand Analysis",
    barmode='group',
    text='Count'
)
fig.update_traces(textposition='outside')
fig.show()

**How to Use**

Hover over bars to view the number of times a skill is required by each company.
Use the legend to toggle specific companies on or off for clearer comparisons.
Compare grouped bars to identify the most in-demand skills across companies.
This chart highlights company-specific skill preferences, enabling targeted insights for job seekers or analysts.
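
The `str.split(', ')` call in the cell above assumes every skill is separated by exactly a comma and one space. The sketch below shows a whitespace-tolerant alternative using `collections.Counter`, which the notebook already imports. The sample strings and the helper name `split_skills` are hypothetical.

```python
from collections import Counter

# Hypothetical sample values in the style of the "Skills Required" column
skills_required = [
    "C++, SQL, Python",
    "SQL, C++, Python",
    "Machine Learning, Python,SQL ",  # irregular spacing on purpose
]

def split_skills(cell):
    """Split a comma-separated skills string and trim surrounding whitespace."""
    return [skill.strip() for skill in cell.split(",") if skill.strip()]

# Tally every skill across all postings
skill_counts = Counter(
    skill for cell in skills_required for skill in split_skills(cell)
)
print(skill_counts.most_common())
```

Trimming before counting prevents "SQL" and "SQL " from being tallied as two different skills.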

## 2.4 Industry-Wise Job Distribution by Type of Employment

This code creates a treemap to visualize job distribution hierarchically by job type, title, and location. Each rectangle represents a category, with nested rectangles showing subcategories. The chart uses Plotly for interactive exploration.

In [82]:
fig = px.treemap(
    df,
    path=["Job Type", "Job Title", "Job Location"],  # Hierarchical structure
    title="Industry-Wise Job Distribution by Type of Employment",
)

fig.update_layout(
    margin=dict(t=50, l=25, r=25, b=25)  # Reduce margin space for larger treemap area
)
fig.show()


**How to Use**

Hover over rectangles to view details about job type, title, and location.
Click on a rectangle to zoom into specific subcategories for deeper analysis.
Click the path bar above the treemap to zoom back out to a parent category.
This treemap provides an intuitive overview of the job market's structure, making it easy to explore employment patterns visually.
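
Because the cell above passes no `values` argument, Plotly Express should size each rectangle by the number of rows that fall under it. Under that assumption, the numbers behind the treemap can be reproduced with a plain `groupby`, sketched here on toy rows (hypothetical, in the style of the dataset):

```python
import pandas as pd

# Toy rows (hypothetical) using the three columns from the treemap path
demo_df = pd.DataFrame({
    "Job Type": ["Full-time", "Full-time", "Full-time", "Contract"],
    "Job Title": ["Software Engineer", "Software Engineer",
                  "Business Analyst", "Financial Analyst"],
    "Job Location": ["Delhi", "Delhi", "Delhi", "Jaipur"],
})

path = ["Job Type", "Job Title", "Job Location"]

# Innermost rectangles: one count per (type, title, location) combination
leaf_counts = demo_df.groupby(path).size().reset_index(name="Postings")

# Outermost rectangles: totals per job type
type_totals = demo_df.groupby("Job Type").size()

print(leaf_counts)
print(type_totals)
```

Comparing these tables with the hover values is a quick sanity check on the chart.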