# Notebook 07 – Statistical Analysis of Tech Salaries

**Project:** Tech Job Market Data Analysis  
**Author:** Matheus Prause  
**Date:** 2026  

## Objective
Apply descriptive statistical techniques to analyze salary distributions in the
technology job market, supporting insights with quantitative measures such as
frequencies, central tendency, and dispersion.

In [14]:
import pandas as pd

df = pd.read_csv("../datasets/tech_jobs_salaries.csv")

In [15]:
# Frequency Analysis

df["experience_level"].value_counts()

experience_level
SE    280
MI    213
EN     88
EX     26
Name: count, dtype: int64

In [16]:
# Frequency Analysis

df["remote_ratio"].value_counts()

remote_ratio
100    381
0      127
50      99
Name: count, dtype: int64

## Frequency Analysis

The largest group in the dataset is senior-level professionals (280 of 607
records, about 46%), followed by mid-level (213, about 35%) and entry-level
(88, about 15%) roles. Executive positions are a small minority (26 records,
about 4%), which is plausible given how few such positions exist in the job
market.

Regarding work modality, fully remote positions (`remote_ratio` = 100) are the
most frequent at 381 records (about 63%), clearly outnumbering on-site (127)
and hybrid (99) roles. This suggests strong adoption of remote work within the
technology sector, at least among the positions captured in this dataset.
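To express the frequencies above as shares of the 607 records, pandas' `value_counts(normalize=True)` returns proportions instead of raw counts. The sketch below rebuilds stand-in Series with the same counts as the outputs above so it runs on its own; the mapping of `remote_ratio` codes to labels is an assumption that follows the reading used in the text.

```python
import pandas as pd

# Stand-in Series with the same category counts as the value_counts()
# output above (280 SE, 213 MI, 88 EN, 26 EX), so the example is standalone.
experience = pd.Series(["SE"] * 280 + ["MI"] * 213 + ["EN"] * 88 + ["EX"] * 26)

# normalize=True returns proportions (summing to 1) instead of counts.
experience_share = experience.value_counts(normalize=True)

# Same idea for remote_ratio. The labels are an assumption matching the text:
# 100 = fully remote, 0 = on-site, 50 = hybrid.
remote = pd.Series([100] * 381 + [0] * 127 + [50] * 99)
remote_labels = {100: "fully remote", 0: "on-site", 50: "hybrid"}
remote_share = remote.map(remote_labels).value_counts(normalize=True)

print(experience_share.round(3))
print(remote_share.round(3))
```

On the notebook's own data, the equivalent calls would be `df["experience_level"].value_counts(normalize=True)` and `df["remote_ratio"].value_counts(normalize=True)`.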

In [17]:
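# Measures of Central Tendency
# Note: a notebook cell renders only its last expression, so the mean and
# median below are computed but not displayed; only mode() appears in the output.

df["salary_in_usd"].mean()
df["salary_in_usd"].median()
df["salary_in_usd"].mode()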

0    100000
Name: salary_in_usd, dtype: int64

## Measures of Central Tendency

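The mean salary (about USD 112,300, the count-weighted average of the group
means in the experience-level table below) exceeds the median (USD 101,570, the
50% quantile in the dispersion output), indicating a right-skewed distribution
pulled upward by a smaller number of high-paying roles. The most frequent single
value (mode) is USD 100,000, suggesting that this round figure is a common
reference point in technology compensation.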
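Because a notebook cell only displays its last expression, one way to see all three measures together is to collect them in a single Series. The sketch below uses hypothetical toy salaries (not taken from the dataset) and adds `Series.skew()`, which the original cell does not compute, as a direct numeric check of right skew: a positive value indicates a longer right tail.

```python
import pandas as pd

# Hypothetical toy salaries (not from the dataset) with one high outlier.
salaries = pd.Series([50_000, 60_000, 100_000, 100_000, 120_000, 400_000])

# Gather every measure into one Series so the cell displays all of them.
summary = pd.Series({
    "mean": salaries.mean(),
    "median": salaries.median(),
    # mode() returns a Series because ties can produce several modes;
    # take the first one here.
    "mode": salaries.mode().iloc[0],
    # Sample skewness; positive means the right tail is longer.
    "skew": salaries.skew(),
})
print(summary)
```

Replacing `salaries` with `df["salary_in_usd"]` would produce the same kind of summary for the full dataset.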

In [18]:
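# Measures of Dispersion
# Note: only the last expression (the quartiles) is rendered in the output;
# std() and var() are computed but not displayed.

df["salary_in_usd"].std()
df["salary_in_usd"].var()
df["salary_in_usd"].quantile([0.25, 0.5, 0.75])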

0.25     62726.0
0.50    101570.0
0.75    150000.0
Name: salary_in_usd, dtype: float64

## Measures of Dispersion

Salary dispersion is high: the middle 50% of salaries falls between about
USD 62,700 (first quartile) and USD 150,000 (third quartile), an interquartile
range of roughly USD 87,300. This spread likely reflects differences in
experience level (examined below), job role, and market demand.
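The quartiles above can be turned into an explicit interquartile range (Q3 - Q1). The sketch below bundles the spread measures into one displayed Series and also computes an upper outlier fence using Tukey's 1.5 × IQR rule, a common convention that the notebook itself does not apply. `dispersion_summary` is a hypothetical helper name, and its first call runs on toy data rather than the dataset.

```python
import pandas as pd


def dispersion_summary(s: pd.Series) -> pd.Series:
    """Hypothetical helper: spread statistics for a numeric Series,
    returned together so a notebook cell displays all of them."""
    q1, q3 = s.quantile([0.25, 0.75])  # pandas' default linear interpolation
    iqr = q3 - q1
    return pd.Series({
        "std": s.std(),  # sample standard deviation (ddof=1)
        "var": s.var(),  # sample variance (ddof=1)
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        # Tukey's rule: values above q3 + 1.5 * IQR are flagged as high outliers.
        "upper_fence": q3 + 1.5 * iqr,
    })


# Hypothetical toy data, not taken from the dataset.
toy = dispersion_summary(pd.Series([10, 20, 30, 40, 100]))
print(toy)

# The same arithmetic applied to the quartiles reported in the output above.
q1_reported, q3_reported = 62_726.0, 150_000.0
iqr_reported = q3_reported - q1_reported  # 87,274
upper_fence_reported = q3_reported + 1.5 * iqr_reported  # 280,911
print(iqr_reported, upper_fence_reported)
```

With the reported quartiles, the IQR is USD 87,274 and the upper fence is USD 280,911; the lower fence is negative, so under this rule only high salaries would be flagged. Calling `dispersion_summary(df["salary_in_usd"])` would compute these figures, plus the standard deviation and variance, from the raw column.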

In [19]:
# Salary by Category (Statistical View)

df.groupby("experience_level")["salary_in_usd"].describe()

| experience_level | count | mean | std | min | 25% | 50% | 75% | max |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| EN | 88.0 | 61643.318182 | 44395.541126 | 4000.0 | 27505.0 | 56500.0 | 85425.75 | 250000.0 |
| EX | 26.0 | 199392.038462 | 117071.255697 | 69741.0 | 130006.5 | 171437.5 | 233750.0 | 600000.0 |
| MI | 213.0 | 87996.056338 | 63901.057478 | 2859.0 | 48000.0 | 76940.0 | 112000.0 | 450000.0 |
| SE | 280.0 | 138617.292857 | 57691.978337 | 18907.0 | 100000.0 | 135500.0 | 170000.0 | 412000.0 |

## Salary Distribution by Experience Level

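Entry-level roles have the lowest median (USD 56,500) and high relative
variability (standard deviation about 72% of the mean), indicating inconsistent
compensation at early career stages. Mid-level salaries are similarly dispersed
in relative terms (about 73%) around a median of USD 76,940, while senior and
executive roles show substantially higher medians (USD 135,500 and about
USD 171,400). Senior salaries are the most tightly grouped relative to their
mean (about 42%). The highest maximum belongs to the executive group
(USD 600,000), although the mid-level maximum (USD 450,000) exceeds the senior
one (USD 412,000).

Executive positions, despite being the smallest group (26 records), span the
widest range (USD 69,741 to USD 600,000) and have the largest standard
deviation (about USD 117,000), consistent with a few extreme high-end salaries.
With so few records, these estimates are less stable than those for the other
levels.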
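The relative-variability comparison above can be checked directly from the table. The coefficient of variation (standard deviation divided by the mean), which is not part of the original analysis, puts groups with very different pay levels on a common scale. The same table also yields the overall mean as a count-weighted average of the group means. The values below are copied from the `describe()` output, so tiny rounding differences are possible.

```python
import pandas as pd

# Values copied from the groupby(...).describe() output above.
groups = pd.DataFrame(
    {
        "count": [88, 26, 213, 280],
        "mean": [61643.318182, 199392.038462, 87996.056338, 138617.292857],
        "std": [44395.541126, 117071.255697, 63901.057478, 57691.978337],
    },
    index=pd.Index(["EN", "EX", "MI", "SE"], name="experience_level"),
)

# Coefficient of variation: spread relative to the group's own mean.
groups["cv"] = groups["std"] / groups["mean"]

# Overall mean = sum of (group size * group mean) / total number of records.
overall_mean = (groups["count"] * groups["mean"]).sum() / groups["count"].sum()

print(groups["cv"].round(3))
print(round(overall_mean, 2))
```

On these figures the CV is highest for mid-level (about 0.73) and lowest for senior (about 0.42) roles, and the recovered overall mean is about USD 112,298, which `df["salary_in_usd"].mean()` should reproduce up to rounding.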

## Key Statistical Insights

- The salary distribution is right-skewed (mean about USD 112,300 vs. median
USD 101,570, with a long tail up to USD 600,000), making the median a more
robust indicator of typical pay than the mean.
- Senior-level professionals form the largest group in the dataset (about 46%
of records) and earn substantially more than entry- and mid-level roles
(median USD 135,500 vs. USD 56,500 and USD 76,940).
- Fully remote positions account for about 63% of records, suggesting a strong
preference for remote work in this segment of the market.
- Executive roles show the highest absolute salary variability (standard
deviation about USD 117,000), driven by a small number of extremely
high-paying positions.