AI Job Market Analysis

This notebook explores trends in the AI job market using pandas, with a focus on filtering, grouping, and aggregation.


Dataset Loading & Overview

In [None]:
import pandas as pd
df = pd.read_csv("ai_job_market.csv")
print(df)




In [None]:
#number of columns
print(len(df.columns))

#number of rows
print(len(df)) #row count, independent of any single column

In [None]:
#column names
df.columns

In [None]:
df.info()
#numeric columns: 0, 8 (not converted yet)
#categorical: all the others

#better way: let pandas list the columns by dtype
print(df.select_dtypes(include="number").columns)

print(df.select_dtypes(exclude="number").columns)

In [None]:
#No columns appear redundant

In [None]:
print(df.head(15)) #prints first 15 rows

In [None]:
print(df.tail(10))

In [None]:
df["salary_range_usd"].dtype
#possible prediction targets: industry (classification) or salary (regression, once salary_range_usd is numeric)
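
The salary column is still text at this point, so a regression target would need a numeric version of it first. The sketch below shows one way to do that, assuming the values look like "low-high" (for example "92860-109598"). That format, the sample rows, and the new column names are assumptions for illustration, not taken from the dataset.

```python
import pandas as pd

# Hypothetical sample; the "low-high" text format is an assumption.
sample = pd.DataFrame({"salary_range_usd": ["92860-109598", "78523-144875", None]})

# Split each range once on the hyphen into two text columns.
bounds = sample["salary_range_usd"].str.split("-", n=1, expand=True)

# Convert to numbers; anything unparseable (or missing) becomes NaN.
sample["salary_min_usd"] = pd.to_numeric(bounds[0], errors="coerce")
sample["salary_max_usd"] = pd.to_numeric(bounds[1], errors="coerce")

# A single midpoint column is one possible regression target.
sample["salary_mid_usd"] = (sample["salary_min_usd"] + sample["salary_max_usd"]) / 2
print(sample)
```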

Column Exploration & Selection

In [None]:
df["job_title"] #select the job_title column (returns a Series)

In [None]:
df[["job_title", "industry", "company_name"]]  #double brackets: the inner list names several columns, so the result is a DataFrame, not a Series

In [None]:
print(df.iloc[99]) #100th row
print(df.iloc[99]["job_title"])

In [None]:
#df.iloc[0:49]
#df[["salary_range_usd", "job_title"]]
#^ run separately, these do not combine the row and column selections
df.loc[0:49, ["salary_range_usd", "job_title"]] #rows labelled 0 to 49 (end included), two columns
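
One detail worth checking when switching between `iloc` and `loc`: slices by position exclude the end, while slices by label include it. A minimal sketch on a made-up frame (the `toy` data is hypothetical):

```python
import pandas as pd

# Toy frame with the default RangeIndex, so labels and positions coincide.
toy = pd.DataFrame({"job_title": [f"title_{i}" for i in range(10)]})

by_label = toy.loc[0:4, ["job_title"]]  # labels 0 through 4, end included
by_position = toy.iloc[0:4]             # positions 0 through 3, end excluded

print(len(by_label), len(by_position))  # prints: 5 4
```

With the default index, `df.loc[0:49, ...]` therefore returns 50 rows, while `df.iloc[0:49]` returns 49.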

In [None]:
list(df.iloc[99])

In [None]:
df.iloc[199:300] #rows 200-300

In [None]:
df[df["industry"] == "Tech"]

In [None]:
df.iloc[-1] #last row

Filtering & Boolean Indexing

In [None]:
#df["job_title"].str.contains("Engineer")
#^ returns a True/False Series; use it as a mask to keep the matching rows

df[df["job_title"].str.contains("Engineer", na=False)]


In [None]:
df[df["job_title"] == "Researcher"]

#looser alternative: any title that contains "Researcher"
#df[df["job_title"].str.contains("Researcher", na=False)]

In [None]:
df[(df["industry"] == "Finance") & (df["job_title"] == "Data Scientist")] #TODO: revisit this filter

In [None]:
df[(df["industry"] == "Education") | (df["industry"] == "Healthcare")] #industry is Education or Healthcare
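
When the list of accepted values grows, chaining `|` conditions gets long. A sketch of the same kind of filter with `Series.isin`, on made-up rows (the `toy` frame and the `wanted` list are hypothetical):

```python
import pandas as pd

# Hypothetical rows, including a missing industry.
toy = pd.DataFrame({"industry": ["Education", "Finance", "Healthcare", "Tech", None]})

wanted = ["Education", "Healthcare"]
subset = toy[toy["industry"].isin(wanted)]  # missing values never match

print(subset)
```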

In [None]:
'''df[(df["industry"] == "Automotive") & (df["industry"] == "Tech")] #in automotive and tech'''
#not possible, since each row only has one industry
in_techs = df["industry"].str.contains("Tech", na=False)
in_auto = df["industry"].str.contains("Automotive", na=False)

both = df[in_techs & in_auto]
print(both)

In [None]:
#print(df["company_name"].str.startswith("A"))
#^ prints only a True/False Series, not the matching rows

#company names that start with "A"
df[df["company_name"].str.startswith("A", na=False)]

In [None]:
#company names that end with "LLC" (True/False Series, not filtered rows)
df["company_name"].str.endswith("LLC")

In [None]:
df["company_name"].str.contains("AI") #case-sensitive by default; pass case=False to ignore case

In [None]:
df["company_name"].str.contains("NLP")

In [None]:
#all jobs containing NLP
df[df["job_title"].str.contains("NLP", na=False)]

In [None]:
#job titles containing "Vision"
df[df["job_title"].str.contains("Vision", na=False)]

String Operations & Text-Based Analysis

In [None]:
#jobs whose industry value has more than one word
indus = df["industry"].str.split()
count = indus.str.len() #words per value; df["industry"].str.len() would count characters
multi_industry_jobs = df[count > 1]
print(multi_industry_jobs)

In [None]:
#senior jobs
df[df["job_title"].str.startswith("Senior", na=False)]

In [None]:
#company names that contain commas
df[df["company_name"].str.contains(",", na=False)]

In [None]:
job_title_word_split = df["job_title"].str.split()
#job_word_count = df["job_title"].str.len()
#^ this counts characters, not words
job_word_count = job_title_word_split.str.len()
multi_job = df[job_word_count >= 3]
print(multi_job)
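
The difference between counting characters and counting words is easy to see on a few made-up titles. The `titles` Series below is hypothetical:

```python
import pandas as pd

titles = pd.Series(["Data Analyst", "AI Product Manager", None])

char_count = titles.str.len()              # characters per title: 12, 18, NaN
word_count = titles.str.split().str.len()  # words per title: 2, 3, NaN

# Comparisons against NaN evaluate to False, so the mask is safe to index with.
three_or_more = word_count >= 3
print(titles[three_or_more])
```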

In [None]:
#company names that contain "and" (substring match, so names like "Sanders" also match)
df[df["company_name"].str.contains("and", na=False)]

In [None]:
#jobs that contain both "AI" and "Manager"
contain_ai = df["job_title"].str.contains("AI", na=False)
contain_manager = df["job_title"].str.contains("Manager", na=False)

contain_both = df[contain_ai & contain_manager]
print(contain_both)

In [None]:
#everything but Intern
df[~df["job_title"].str.contains("Intern", na=False)]

In [None]:
#Engineer jobs
df[df["job_title"].str.contains("Engineer", na=False, case=False)]
#case=False makes the match case-insensitive (uppercase, lowercase, or mixed)

In [None]:
#researcher jobs
df[df["job_title"].str.contains("Research", na=False, case=False)]

In [None]:
#product roles
df[df["job_title"].str.contains("product", na=False, case=False)]

In [None]:
#length of each job title (in words)
#df["job_title"].str.len()
#^ this counts characters
df["job_title"].str.split().str.len()

In [None]:
#more than one industry
df[df["industry"].str.split().str.len() > 1]

In [None]:
#ai jobs
df[df["job_title"].str.contains("AI", na=False, case=False)]
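
A case-insensitive search for a short token like "AI" also matches the letters "ai" inside longer words. A sketch of the difference, using a word-boundary regular expression on made-up titles (the `titles` Series is hypothetical):

```python
import pandas as pd

titles = pd.Series(["AI Researcher", "ML Trainer", "Retail Analyst", "Head of AI", None])

# Plain substring match: also hits "Trainer" and "Retail".
loose = titles.str.contains("AI", case=False, na=False)

# \b marks a word boundary, so only the standalone token "AI" matches.
strict = titles.str.contains(r"\bAI\b", case=False, na=False, regex=True)

print(titles[loose].tolist())
print(titles[strict].tolist())
```

Whether the stricter pattern changes the result on the real data depends on which titles it contains.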

Grouping & Aggregation

In [None]:

#count how many jobs each industry has
#df.groupby("industry").count()
#^ counts non-missing values per column inside each group
#.size() counts rows per group
df.groupby("industry").size().sort_values(ascending=False).head(1)
#300 jobs
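
The gap between `.count()` and `.size()` only shows up when a group has missing values. A minimal sketch on made-up rows (the `toy` frame is hypothetical):

```python
import pandas as pd

toy = pd.DataFrame({
    "industry": ["Tech", "Tech", "Finance"],
    "company_name": ["Acme", None, "Globex"],  # one missing name in Tech
})

per_column = toy.groupby("industry").count()  # non-missing values per column
per_group = toy.groupby("industry").size()    # rows per group

print(per_column.loc["Tech", "company_name"])  # prints: 1
print(per_group["Tech"])                       # prints: 2
```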

In [None]:
df.groupby("job_title").size().sort_values().head(1)
#ascending sort, so head(1) returns the least common title
#Computer Vision Engineer is the least common

In [None]:
df.groupby("job_title").size().sort_values(ascending=False).head(1)
#Data analyst is the most common

Sorting

In [None]:
#companies sorted in alphabetical order
df.sort_values(by='company_name')

In [None]:
#job titles sorted alphabetically
df.sort_values(by="job_title", ascending=True)

In [None]:
df.groupby("job_title").size().sort_values(ascending=True)

In [None]:
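df.groupby("job_title").size().sort_values(ascending=False).head(5)
#5 most common job titles (sorted by count first; without the sort, head(5) returns the first 5 titles alphabetically)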
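
A shorter route to the most common values is `value_counts`, which sorts by frequency in descending order by default. A sketch on made-up titles (the `toy` frame is hypothetical):

```python
import pandas as pd

toy = pd.DataFrame({
    "job_title": ["Data Analyst"] * 3 + ["ML Engineer"] * 2 + ["AI Researcher"],
})

# value_counts sorts from most to least frequent.
top_counts = toy["job_title"].value_counts().head(2)

# Equivalent groupby form with an explicit descending sort.
via_groupby = toy.groupby("job_title").size().sort_values(ascending=False).head(2)

print(top_counts)
```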

Key Observations

- AI job postings are not spread evenly across industries. A small number of industries account for a large share of the roles, while others appear much less frequently.

- A handful of job titles show up far more often than the rest, suggesting that hiring demand is centered around a few core AI roles rather than being evenly distributed across many titles.

- Some industries offer a wider variety of roles, while others tend to focus on a narrower set of positions, indicating differences in how broadly AI is applied across sectors.

- Job postings are concentrated among certain companies, with a relatively small group of employers responsible for a large portion of the listings.

- Technical roles, such as engineering and research positions, appear more often than product-focused roles in this dataset.

- Job titles vary widely in length and specificity, which likely reflects differences in role seniority, specialization, and expectations.
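
Observations about concentration and role variety can be backed with numbers. The sketch below shows one way to compute each industry's share of postings and the number of distinct titles per industry. The `toy` frame is made up, so its figures say nothing about the real dataset.

```python
import pandas as pd

toy = pd.DataFrame({
    "industry": ["Tech", "Tech", "Tech", "Finance", "Finance"],
    "job_title": ["ML Engineer", "Data Analyst", "ML Engineer",
                  "Data Analyst", "Data Analyst"],
})

# Fraction of all postings that falls in each industry.
share = toy["industry"].value_counts(normalize=True)

# Number of distinct job titles seen in each industry.
variety = toy.groupby("industry")["job_title"].nunique()

print(share)
print(variety)
```

Running the same two lines against `df` would show how uneven the real distribution is.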
