## **1. Dataset Overview**

#### ***Brief Introduction***
This notebook uses the Global AI Job Market & Salary Trends 2025 dataset, available on Kaggle. The dataset simulates real-world AI and machine learning job market trends for educational and research purposes. It includes over 15,000 job postings from around the globe, spanning roles such as Data Scientist, ML Engineer, and AI Researcher, and covers salaries, company locations, job types, required skills, experience levels, and more. The records are algorithmically generated and anonymized, based on industry research, which makes the dataset a convenient resource for exploratory analysis.

#### ***Objective of Analysis***
The main objective of this exploratory analysis is to gain a deep understanding of the dataset's structure and the current landscape of AI jobs worldwide. We’ll focus on:
- Investigating salary trends across job roles and locations.
- Identifying in-demand skills and qualifications.
- Mapping geographic hotspots and regional differences in expertise and opportunities.
- Exploring other key job attributes such as experience levels, company size, employment type, and remote work prevalence.

This analysis is intended as a foundation for potential machine learning projects, such as salary prediction, upskilling suggestions, talent matching, and career advising. The goal is to familiarize ourselves with the dataset’s potential and spot actionable analytical directions appropriate for a data science workflow.
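As a rough sketch of the first objective listed above, salary trends can be summarized per group with a `groupby`. The helper name `salary_summary` is hypothetical and not part of the original notebook; the default column name `salary_usd` is the one reported by `data.info()` in Section 2.

```python
import pandas as pd


def salary_summary(df: pd.DataFrame, by: str, salary_col: str = "salary_usd") -> pd.DataFrame:
    """Return count, median, and mean salary per group, highest median first.

    `salary_summary` is an illustrative helper (an assumption of this sketch),
    not a function defined elsewhere in the notebook.
    """
    return (
        df.groupby(by)[salary_col]
        .agg(["count", "median", "mean"])
        .sort_values("median", ascending=False)
    )
```

Once the dataset is loaded, `salary_summary(data, "job_title")` or `salary_summary(data, "company_location")` would give a first view of roles and locations. The median is reported next to the mean because salary distributions are often skewed.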

## **2. Import Libraries & Dataset**

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [2]:
data = pd.read_csv('../data/raw/ai_job_dataset1.csv')

In [10]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15000 entries, 0 to 14999
Data columns (total 20 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   job_id                  15000 non-null  object 
 1   job_title               15000 non-null  object 
 2   salary_usd              15000 non-null  int64  
 3   salary_currency         15000 non-null  object 
 4   salary_local            15000 non-null  int64  
 5   experience_level        15000 non-null  object 
 6   employment_type         15000 non-null  object 
 7   company_location        15000 non-null  object 
 8   company_size            15000 non-null  object 
 9   employee_residence      15000 non-null  object 
 10  remote_ratio            15000 non-null  int64  
 11  required_skills         15000 non-null  object 
 12  education_required      15000 non-null  object 
 13  years_experience        15000 non-null  int64  
 14  industry                15000 non-null

---
#### ***Unique Values of Attributes***

In [18]:
data["job_title"].value_counts()

job_title
Machine Learning Engineer      824
Deep Learning Engineer         786
Computer Vision Engineer       780
AI Specialist                  774
Data Engineer                  769
Principal Data Scientist       768
AI Product Manager             764
Data Scientist                 763
Robotics Engineer              762
AI Architect                   758
Autonomous Systems Engineer    755
NLP Engineer                   741
Data Analyst                   734
Machine Learning Researcher    734
Research Scientist             731
AI Software Engineer           730
AI Research Scientist          724
AI Consultant                  713
Head of AI                     701
ML Ops Engineer                689
Name: count, dtype: int64

In [19]:
data["experience_level"].value_counts()

experience_level
EX    3843
MI    3764
SE    3741
EN    3652
Name: count, dtype: int64
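
The two-letter codes above are easier to read, and to sort from junior to senior, once they are mapped to labels and stored as an ordered categorical. The sketch below assumes the codes mean Entry, Mid, Senior, and Executive; this notebook does not document them, so the mapping should be checked against the dataset description. The helper name `label_experience` is hypothetical.

```python
import pandas as pd

# ASSUMPTION: meaning of the codes, listed from most junior to most senior.
# Verify against the dataset documentation before relying on it.
EXPERIENCE_LABELS = {"EN": "Entry", "MI": "Mid", "SE": "Senior", "EX": "Executive"}


def label_experience(codes: pd.Series) -> pd.Series:
    """Map experience codes to readable labels stored as an ordered categorical."""
    ordered_dtype = pd.CategoricalDtype(
        categories=list(EXPERIENCE_LABELS.values()), ordered=True
    )
    # Codes missing from the mapping become NaN instead of raising an error.
    return codes.map(EXPERIENCE_LABELS).astype(ordered_dtype)
```

With this in place, `label_experience(data["experience_level"]).value_counts().sort_index()` would list the counts in seniority order rather than by frequency.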

In [3]:
data["employment_type"].value_counts()

employment_type
CT    3841
PT    3757
FL    3705
FT    3697
Name: count, dtype: int64

In [21]:
data["company_size"].value_counts()

company_size
L    5087
S    4975
M    4938
Name: count, dtype: int64

In [24]:
data["education_required"].value_counts()

education_required
Bachelor     3863
PhD          3761
Associate    3688
Master       3688
Name: count, dtype: int64
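
The per-column `value_counts()` calls above can also be collected into a single table, with each value's share of the rows added so the near-even splits are easier to compare. This is a minimal sketch; the helper name `categorical_summary` is hypothetical and not part of the original notebook.

```python
import pandas as pd


def categorical_summary(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Stack per-column value counts and percentage shares into one long table."""
    parts = []
    for col in columns:
        counts = df[col].value_counts()
        parts.append(
            pd.DataFrame(
                {
                    "column": col,  # scalar, broadcast to every row
                    "value": counts.index.to_numpy(),
                    "count": counts.to_numpy(),
                    "share_pct": (counts / counts.sum() * 100).round(2).to_numpy(),
                }
            )
        )
    return pd.concat(parts, ignore_index=True)
```

For the columns inspected in this section, a call could look like `categorical_summary(data, ["job_title", "experience_level", "employment_type", "company_size", "education_required"])`.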