# Exploring the 2025 Data Science Job Market

This project uses "The AI, ML, Data Science Salary (2020- 2025)" dataset from [Kaggle](https://www.kaggle.com/datasets/samithsachidanandan/the-global-ai-ml-data-science-salary-for-2025), which contains job listings in the AI/ML/data science fields; this analysis focuses on listings from 2025. Each listing includes attributes such as job title, experience level, company size, salary (in USD and local currency), remote ratio, and geographic location.

**Click [here](https://colab.research.google.com/github/BenGoBlue05/ds_salaries/blob/main/ds_salaries.ipynb) to view this notebook from a browser and interact with its graphs.**

### Goals
The primary goal is to understand trends in the data science job market by answering the following questions:
- What are the most common roles available?
- How does salary vary by experience level for each role?
- What is the typical distribution of experience levels within each role?

### Tasks
To meet these goals, the project centers on three key interactive tasks:
1. **Identify the most frequently listed roles** and their median salaries.
2. **Drill down into a specific role** to view how salary varies by experience level.
3. **Explore the experience-level composition** of each role to assess how seniority impacts availability.

These tasks directly shaped the dashboard's interactive design, which begins with a bar chart of the top job roles. Clicking a role updates two additional views: a bar chart of median salary by experience level, and a pie chart showing how that role's listings are distributed across experience levels.



## Dataset Overview

The dataset, originally sourced from [Kaggle](https://www.kaggle.com/datasets/samithsachidanandan/the-global-ai-ml-data-science-salary-for-2025), contains job listings in the fields of artificial intelligence, machine learning, and data science. Each row corresponds to a single job record and includes the features listed below.

### Key Attributes
- **job_title** – Title of the position (e.g., Data Scientist, ML Engineer)
- **experience_level** – Seniority level: `EN` (Entry), `MI` (Mid), `SE` (Senior), `EX` (Executive)
- **employment_type** – Full-time, part-time, contract, etc.
- **salary_currency** – Currency in which the salary is paid
- **salary** – Raw salary amount in original currency
- **salary_in_usd** – Converted salary in USD (standardized)
- **employee_residence** – Country where the employee is based
- **company_location** – Country where the company is located
- **company_size** – Size bucket: `S`, `M`, or `L`
- **remote_ratio** – Percentage of time the job is remote

This dataset offers a rich basis for analyzing trends in job role availability, salary ranges, and the impact of experience level — all of which directly inform the design of the interactive visualization.
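
As a quick sanity check on the attributes listed above, the sketch below verifies that the expected columns are present and decodes the `experience_level` codes into readable labels. The two inline rows mirror the shape of `salaries.csv`; the helper name `check_schema` and the constants are hypothetical and not part of the original notebook.

```python
import pandas as pd

# Columns described in the "Key Attributes" list.
EXPECTED_COLUMNS = {
    "job_title", "experience_level", "employment_type", "salary_currency",
    "salary", "salary_in_usd", "employee_residence", "company_location",
    "company_size", "remote_ratio",
}

# Code-to-label mapping taken from the attribute descriptions.
EXPERIENCE_LABELS = {"EN": "Entry", "MI": "Mid", "SE": "Senior", "EX": "Executive"}


def check_schema(df: pd.DataFrame) -> set:
    """Return the expected columns that are missing from df (hypothetical helper)."""
    return EXPECTED_COLUMNS - set(df.columns)


# Two sample rows in the same shape as the dataset.
sample = pd.DataFrame({
    "work_year": [2025, 2025],
    "experience_level": ["MI", "SE"],
    "employment_type": ["FT", "FT"],
    "job_title": ["Customer Success Manager", "Engineer"],
    "salary": [57000, 165000],
    "salary_currency": ["EUR", "USD"],
    "salary_in_usd": [60000, 165000],
    "employee_residence": ["NL", "US"],
    "remote_ratio": [50, 0],
    "company_location": ["NL", "US"],
    "company_size": ["L", "M"],
})

missing = check_schema(sample)
sample["experience_label"] = sample["experience_level"].map(EXPERIENCE_LABELS)
```

An empty `missing` set means every attribute the analysis relies on is available before any filtering or aggregation runs.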


```python
import pandas as pd
import altair as alt
import warnings

original_df = pd.read_csv('salaries.csv')
original_df.head()
```


|  | work_year | experience_level | employment_type | job_title | salary | salary_currency | salary_in_usd | employee_residence | remote_ratio | company_location | company_size |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2025 | MI | FT | Customer Success Manager | 57000 | EUR | 60000 | NL | 50 | NL | L |
| 1 | 2025 | SE | FT | Engineer | 165000 | USD | 165000 | US | 0 | US | M |
| 2 | 2025 | SE | FT | Engineer | 109000 | USD | 109000 | US | 0 | US | M |
| 3 | 2025 | SE | FT | Applied Scientist | 294000 | USD | 294000 | US | 0 | US | M |
| 4 | 2025 | SE | FT | Applied Scientist | 137600 | USD | 137600 | US | 0 | US | M |


## Design Summary and Justification

The final interactive visualization is composed of three linked components:

1. **Top Job Roles by Frequency (Main Bar Chart):**
   - **Purpose:** To show the ten most commonly listed AI/ML/DS job roles and the median salary of each.
   - **Justification:** Restricting the view to the most frequent roles gives users an overview of demand in the job market. Bars are sorted by median salary, and the listing count for each role is available in the tooltip.

2. **Median Salary by Experience Level (Bar Chart – Role-Specific):**
   - **Purpose:** Once a role is selected, this chart shows how compensation changes with experience.
   - **Justification:** Comparing salaries by seniority level (EN, MI, SE, EX) gives job seekers actionable insight into salary expectations and growth trajectories within a role.

3. **Experience Level Proportions (Pie Chart – Role-Specific):**
   - **Purpose:** Also tied to the selected role, this chart shows what percentage of listings are targeted at each experience level.
   - **Justification:** This gives an indication of accessibility — e.g., whether a role skews toward entry-level or senior candidates.
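
The proportions behind the pie chart can be sketched in pandas as follows. The listings below are made up purely for illustration, and the `share` column is an assumption of this sketch; the notebook itself passes raw counts to the chart and lets Altair derive the angles.

```python
import pandas as pd

# Made-up listings for illustration only; not taken from the dataset.
listings = pd.DataFrame({
    "job_title": ["Data Scientist"] * 4 + ["Data Analyst"] * 2,
    "experience_level": ["EN", "MI", "SE", "SE", "EN", "MI"],
})

# Count listings per (job title, experience level) pair.
counts = (
    listings.groupby(["job_title", "experience_level"])
    .size()
    .reset_index(name="count")
)

# Divide each count by its job title's total so shares sum to 1.0 per title.
counts["share"] = counts["count"] / counts.groupby("job_title")["count"].transform("sum")
```

Normalizing within each job title keeps roles with very different listing volumes comparable.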

### Interactivity
- **Click-based filtering** connects all three charts. Selecting a job title dynamically updates the salary and experience breakdowns.
- **Altair's selection parameters and `transform_filter`** make this implementation concise and declarative.

### Overall Justification
This design supports both **exploratory** and **confirmatory** analysis:
- Users can discover unexpected patterns, such as surprisingly high entry-level pay in niche roles (exploratory).
- Users can verify assumptions, such as “Senior ML Engineers earn more than Entry-level Data Analysts” (confirmatory).

By combining role frequency, median salary, and experience-level proportions, the dashboard builds a **multi-faceted understanding** of the job landscape and encourages deeper insight generation through **interaction-driven workflows**.


```python
# Filter for full-time, US-based employees in 2025
us_df = original_df[
    (original_df['employee_residence'] == 'US') &
    (original_df['employment_type'] == 'FT') &
    (original_df['work_year'] == 2025)
]

# Calculate median salary and count for each job title
job_stats = us_df.groupby('job_title').agg(
    median_salary=('salary_in_usd', 'median'),
    count=('job_title', 'size')
).reset_index()

# Calculate median salary and count for each job title and experience level
job_stats_level = us_df.groupby(['job_title', 'experience_level']).agg(
    median_salary=('salary_in_usd', 'median'),
    count=('job_title', 'size')
).reset_index()

# Replace experience_level codes with readable labels
job_stats_level['experience_level'] = job_stats_level['experience_level'].map({
    'EN': 'Entry',
    'MI': 'Mid',
    'SE': 'Senior',
    'EX': 'Exec/Director'
})

# Get the top 10 most common job titles
top_10_jobs = job_stats.nlargest(10, 'count')

# Define a click selection on job title (the name avoids spaces so it is a valid identifier)
selection = alt.selection_point(fields=['job_title'], name='job_selection')

# Create the overview chart: median salary for the 10 most common job titles
overview = alt.Chart(top_10_jobs).mark_bar().encode(
    x=alt.X('job_title:N', sort='-y', title='Job Title'),
    y=alt.Y('median_salary:Q', title='Median Salary (USD)'),
    tooltip=['job_title', 'median_salary', 'count'],
    color=alt.condition(selection, alt.value('steelblue'), alt.value('lightgray'))
).properties(
    title='Top 10 Most Common Job Titles in the US by Median Salary',
    width=250,
    height=250
).add_params(selection)  # add_params replaces the deprecated add_selection

# Create the detail chart, filtered to the selected job title
detail = alt.Chart(job_stats_level).transform_filter(
    selection
).mark_bar().encode(
    x=alt.X('experience_level:N', title='Experience Level', sort=['Entry', 'Mid', 'Senior', 'Exec/Director']),
    y=alt.Y('median_salary:Q', title='Median Salary (USD)'),
    color=alt.Color('experience_level:N', legend=None),
    tooltip=['experience_level', 'median_salary', 'count']
).properties(
    title='Median Salaries by Experience Level for Selected Job Title',
    width=250,
    height=250
)

# Create the pie chart, filtered to the selected job title
pie_chart = alt.Chart(job_stats_level).transform_filter(
    selection
).mark_arc().encode(
    theta=alt.Theta('count:Q', title='Proportion'),
    color=alt.Color('experience_level:N', title='Experience Level'),
    tooltip=['experience_level', 'count']
).properties(
    title='Proportion of Experience Levels for Selected Job Title',
    width=250,
    height=250
)

# Combine the charts side by side (overview, detail, and pie chart)
combined_chart = alt.hconcat(
    overview, detail, pie_chart
).resolve_scale(
    color='independent'
)

combined_chart
```

## Final Evaluation: Procedure, Participants, and Results

The goal of my evaluation was to determine whether the interactive dashboard helps users effectively explore trends in job roles, salary, and experience level for data science jobs in 2025.

### Target Participants
Ideally, I would have recruited:
- Data science hiring managers
- Recruiters working with AI/ML roles
- Professionals currently job seeking in data science

However, because these individuals were difficult to reach, I asked my wife, who is a data analyst, to evaluate the dashboard.

### Evaluation Procedure
- I gave her a brief overview of how the dashboard worked.
- She spent several minutes exploring the data and used the interaction features freely.
- I asked her to provide verbal feedback while interacting and share any observations or points of confusion.

### Results and Feedback
- She found the interaction intuitive and appreciated being able to compare experience levels by job role.
- Her primary suggestion was layout-related: originally, the charts were stacked vertically, requiring scrolling to see them all.
- She recommended reducing the size of the charts and aligning them horizontally so that all views were visible at once.
- I implemented this change; with the charts side by side, all three views are visible at once without scrolling.

Although limited to a single participant, this evaluation suggests that the dashboard communicates insights effectively and that thoughtful layout adjustments can enhance the overall user experience.


## Reflection and Future Improvements

Overall, the dashboard accomplished its core goal: enabling exploration of the data science job market through role frequency, salary trends, and experience-level distribution. The interactive design supported users in drawing insights quickly, especially after the layout refinement based on feedback.

### What Worked Well
- The three-panel interactive view provided a clear, cohesive narrative.
- Using Altair allowed for intuitive filtering and linking across views with minimal code.
- Focusing on median salary instead of average helped reduce distortion from outliers.
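
The last point can be illustrated with a small sketch using Python's standard `statistics` module. The salary figures are made up for illustration, with one deliberately extreme value.

```python
from statistics import mean, median

# Made-up salaries (USD) for illustration; the last value is a deliberate outlier.
salaries = [95_000, 105_000, 110_000, 120_000, 900_000]

avg = mean(salaries)    # pulled upward by the single outlier
mid = median(salaries)  # the middle value, insensitive to the outlier's magnitude

print(f"mean={avg:,.0f}  median={mid:,.0f}")
# prints "mean=266,000  median=110,000"
```

Here one extreme listing more than doubles the mean, while the median stays at a value that four of the five listings sit close to.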

### What I Would Improve in Future Iterations
- **Add filters for company size or remote ratio** to allow deeper exploration of job types.
- **Include additional metrics**, such as salary ranges (min/max) or interquartile ranges, to show compensation spread.
- **Improve accessibility** by adding colorblind-friendly palettes and larger text labels.
- **Expand evaluation** by collecting feedback from multiple participants with varying technical backgrounds to better understand broader usability and interpretability.
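
As a rough sketch of the salary-spread idea above, quartiles and the interquartile range per experience level could be computed with pandas as shown below. The data is made up for illustration, and the column names `q1`, `q3`, and `iqr` are assumptions of this sketch rather than part of the notebook.

```python
import pandas as pd

# Made-up salaries (USD) for illustration only.
df = pd.DataFrame({
    "experience_level": ["MI"] * 5 + ["SE"] * 5,
    "salary_in_usd": [80_000, 90_000, 100_000, 110_000, 120_000,
                      130_000, 140_000, 150_000, 160_000, 170_000],
})

# Quartiles per experience level; unstack turns the quantile level into columns.
spread = (
    df.groupby("experience_level")["salary_in_usd"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)
spread.columns = ["q1", "median", "q3"]

# Interquartile range: the width of the middle 50% of salaries.
spread["iqr"] = spread["q3"] - spread["q1"]
```

A table like `spread` could then feed an additional chart or tooltip fields alongside the existing median.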

This project helped reinforce the importance of iteration, user feedback, and layout clarity — even small tweaks like re-aligning the charts had a noticeable impact on usability.
