---
title: "Salary and Compensation Trends in AI Careers"
author: "Team 3"
date: today
format:
  html:
    toc: true
    toc-depth: 2
    toc-exclude: ["Welcome to Our Research"]
bibliography: references.bib
csl: csl/econometrica.csl
---

# Welcome to Our Research

## Research Rationale

Our team chose **Salary Disparities Across Disciplines: Who Benefits Most from AI's Rise?** as our research topic. Due to the rapid development of AI chatbots and substantial infrastructure investments by companies such as OpenAI, Google, NVIDIA, and DeepSeek, a significant restructuring of the current workforce appears inevitable. Some existing occupations will diminish or potentially disappear, while new roles will emerge and command higher compensation. Understanding these changes, identifying potential trends, and analyzing reliable data regarding AI's impact are of paramount importance.

---

## Why is this topic important?

The rapid advancement of AI is transforming the workforce by reshaping job opportunities and salary structures. While AI-driven technologies enhance productivity and facilitate the emergence of new roles, they also contribute to job displacement, particularly in routine-based occupations. 

Empirical studies have demonstrated that automation has been a key driver of income inequality [@adhikari2024], disproportionately impacting middle-skilled and less-educated workers. Conversely, AI-driven professions are seeing new job opportunities, particularly in technology-related fields, which requires a shift in workforce skills and adaptability.

The adoption of AI technologies brings **wage volatility and job opportunities** across different income groups. Because AI roles often require specialized, high-demand skill sets concentrated in specific occupations, policy measures are needed to narrow differences in industry adoption and workforce access. As AI reshapes job requirements, reskilling becomes imperative, particularly for non-technical workers who are at higher risk of displacement. Addressing the digital skills gap through targeted training programs is essential to ensuring workforce adaptability and mitigating the socioeconomic disruptions caused by AI-driven labor market transformations.

---

## Trend Analysis of AI Development and Salary Fluctuations

The rapid advancement of AI has led to an increasing market demand for AI-related skills, significantly influencing employment trends and salary structures in the United States. Studies suggest that various job sectors are highly susceptible to AI-driven transformation, including those in traditionally high-skilled industries [@colombo2024]. 

Similarly, in China, the demand for AI professionals surged in 2024, particularly in specialized domains such as **healthcare and applied sciences**, as companies sought to attract top talent in these fields. Salaries for AI specialists have risen substantially. Moreover, the growing competition in the AI sector has reshaped existing salary distributions. AI-related professions offer **higher salary premiums and employment benefits** compared to traditional IT engineering roles, potentially exacerbating the wage disparity between AI professionals and non-AI professionals, thereby influencing labor market equilibrium [@stone2024].

AI is fundamentally transforming labor markets across various industries, exerting a significant influence on **wage structures depending on job characteristics**. Repetitive, rule-based tasks, such as **basic data entry or customer service automation**, are highly susceptible to AI-driven automation, which reduces demand for these roles, shrinks employment opportunities, and lowers compensation. For instance, AI has been widely adopted for **candidate screening in human resources management**, prompting some employers to reduce hiring [@sezgin2023].

Conversely, occupations requiring **advanced cognitive skills, creativity, or interpersonal communication**—such as those in **professional services**—are less likely to be fully replaced by AI and may instead benefit from hybrid automation models. 

- **Medical field**: AI-powered diagnostic tools have enhanced precision and efficiency in medical decision-making, but final assessments still rely heavily on human expertise [@ansari2024].
- **Legal sector**: AI-driven automation has accelerated case analysis and legal procedures, but **complex legal judgments continue to rely on human practitioners**. Harvard Law Professor **David Wilkins** has found that while generative AI has the potential to transform legal practices, the primary role remains that of a **supportive tool rather than a replacement**.
- **Finance and Governance**: Industries focused on **financial and governance decision-making** may experience **wage polarization**, with professionals who benefit from AI-driven productivity gains seeing **wage growth**.

---

## What do you expect to find in your analysis?

Our research encompasses various aspects of **salary disparities**. Specifically, we aim to investigate:

- **Income and job distribution differences**, including an analysis of which **geographical regions** see the highest demand and wage disparities for both AI-related and traditional professional roles.
- **Comprehensive analysis of recent datasets**, identifying the most sought-after skills among job listings.
- **The differential impact of AI-driven changes on high-skill versus middle-skill occupations**.


---

## References


---
title: "Data Analysis"
subtitle: "Comprehensive Data Cleaning & Exploratory Analysis of Job Market Trends"
bibliography: references.bib
csl: csl/econometrica.csl
format: 
  html:
    toc: true
    number-sections: true
    df-print: paged
jupyter: python3
---

## Data Cleaning & Preprocessing

In [None]:
import pandas as pd
import missingno as msno
import matplotlib.pyplot as plt
import plotly.express as px

# 1. Read the CSV file
df = pd.read_csv("lightcast_job_postings.csv")

# 2. Columns to drop (if they exist)
columns_to_drop = [
    "ID", "URL", "ACTIVE_URLS", "DUPLICATES", "LAST_UPDATED_TIMESTAMP",
    "NAICS2", "NAICS3", "NAICS4", "NAICS5", "NAICS6",
    "SOC_2", "SOC_3", "SOC_5"
]
# Filter out columns that actually exist in the DataFrame to avoid KeyError
valid_cols_to_drop = [col for col in columns_to_drop if col in df.columns]
df.drop(columns=valid_cols_to_drop, inplace=True)

# 3. Visualize missing values
msno.heatmap(df)
plt.title("Missing Values Heatmap")
plt.show()

# 4. Basic missing value processing
df.dropna(thresh=len(df) * 0.5, axis=1, inplace=True) # If a column has more than 50% missing values, delete it.

# If there is a 'SALARY' column, fill missing values with the median.
# Assign the result back: fillna(inplace=True) on a selected column may not update df.
if "SALARY" in df.columns:
    df["SALARY"] = df["SALARY"].fillna(df["SALARY"].median())
# If there is an 'Industry' column, fill missing values with 'Unknown'.
if "Industry" in df.columns:
    df["Industry"] = df["Industry"].fillna("Unknown")

# 5. Remove duplicates
df.drop_duplicates(subset=["TITLE", "COMPANY", "LOCATION", "POSTED"], keep="first", inplace=True)
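
The 50% rule in step 4 depends on how `thresh` is interpreted: it is the minimum number of non-missing values a column needs in order to be kept. The sketch below illustrates this on a small made-up frame; the column names are hypothetical and are not part of the Lightcast data.

```python
import pandas as pd

# Hypothetical toy frame with 4 rows and different shares of missing values.
toy = pd.DataFrame({
    "mostly_missing": [None, None, None, 1.0],   # 1 of 4 values present
    "half_missing":   [1.0, None, 2.0, None],    # 2 of 4 values present
    "complete":       [1, 2, 3, 4],              # 4 of 4 values present
})

# A column is kept only if it has at least `min_present` non-missing values.
min_present = int(len(toy) * 0.5)   # 2 for this toy frame
kept = toy.dropna(thresh=min_present, axis=1)

print(kept.columns.tolist())
```

Under this rule a column that is exactly half empty survives, so the cut-off removes only columns where more than half of the values are missing.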

### Job Postings by Industry

In [None]:
if "Industry" in df.columns:
    fig = px.bar(df["Industry"].value_counts(), title="Job Postings by Industry")
    fig.show()

**Explanation:**  
This bar chart shows the number of job postings across different industries. It highlights which sectors are most active in recruiting, allowing us to identify high-demand areas such as technology, healthcare, or finance.

### Salary Distribution by Industry

In [None]:
if "SALARY" in df.columns and "Industry" in df.columns:
    fig = px.box(df, x="Industry", y="SALARY", title="Salary Distribution by Industry")
    fig.show()

**Explanation:**  
This box plot illustrates how salary ranges differ across industries. It helps reveal not only the median salary but also the variability and presence of outliers, giving insight into income inequality or competitive compensation within certain sectors.
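
The quantities a box plot encodes can also be tabulated directly. The following is a minimal sketch on made-up numbers, assuming the same `Industry` and `SALARY` column names as the cell above; the industries and salaries are illustrative only.

```python
import pandas as pd

# Hypothetical toy data; real values would come from the cleaned df.
toy = pd.DataFrame({
    "Industry": ["Tech"] * 4 + ["Health"] * 4,
    "SALARY":   [100, 120, 140, 160, 60, 70, 80, 90],
})

# Quartiles per industry: the box spans q1 to q3, the line inside is the median.
quartiles = toy.groupby("Industry")["SALARY"].quantile([0.25, 0.5, 0.75]).unstack()
quartiles.columns = ["q1", "median", "q3"]

# Interquartile range: a simple measure of within-industry salary spread.
quartiles["iqr"] = quartiles["q3"] - quartiles["q1"]

print(quartiles)
```

A table like this makes it easier to rank industries by spread than reading box heights off the chart.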

### Remote vs. On-Site Jobs

In [None]:
if "REMOTE_TYPE_NAME" in df.columns:
    fig = px.pie(df, names="REMOTE_TYPE_NAME", title="Remote vs. On-Site Jobs")
    fig.show()

**Explanation:**  
This pie chart breaks down the proportion of jobs by remote type. It shows how many roles are fully remote, hybrid, or entirely on-site, offering insights into post-pandemic work trends and flexibility offered by employers.

### Job Postings Over Time

In [None]:
if "POSTED" in df.columns:
    df['POSTED'] = pd.to_datetime(df['POSTED'], errors='coerce')
    postings_over_time = df['POSTED'].value_counts().sort_index()
    fig = px.line(x=postings_over_time.index, y=postings_over_time.values, labels={'x': 'Date Posted', 'y': 'Number of Job Postings'}, title="Job Postings Over Time")
    fig.show()

---
title: "Skill Gap Analysis"
format: html
jupyter: python3
execute:
  echo: true
  eval: true
---

## 3.1 Skill Gap Analysis

### 3.1.1 Team Skill Self-Assessment

In [None]:
import pandas as pd

# Team members and their self-rated proficiency (1–5)
skills_data = {
    "Name":             ["Deyang", "Yani", "Jiapei", "Junhao"],
    "Python":           [4,        3,      4,         3],
    "SQL":              [3,        3,      4,         3],
    "Machine Learning": [2,        5,      5,         2],
    "Cloud Computing":  [1,        4,      2,         3],
    "R":                [3,        5,      4,         2],
    "AWS":              [4,        4,      2,         3],
    "Git":              [4,        3,      2,         1],
    "Excel":            [3,        4,      5,         2],
}

df_skills = pd.DataFrame(skills_data).set_index("Name")
df_skills

### 3.1.2 Compare to Market Demand

In [None]:
# Load raw job postings data
raw_df = pd.read_csv("lightcast_job_postings.csv")

# Drop irrelevant columns
columns_to_drop = [
    "ID", "URL", "ACTIVE_URLS", "DUPLICATES", "LAST_UPDATED_TIMESTAMP",
    "NAICS2", "NAICS3", "NAICS4", "NAICS5", "NAICS6",
    "SOC_2", "SOC_3", "SOC_5"
]
valid_cols_to_drop = [col for col in columns_to_drop if col in raw_df.columns]
raw_df.drop(columns=valid_cols_to_drop, inplace=True)

# Drop columns with >50% missing
raw_df.dropna(thresh=len(raw_df) * 0.5, axis=1, inplace=True)

# Fill missing values (assign back instead of calling fillna(inplace=True) on a selected column)
if "SALARY" in raw_df.columns:
    raw_df["SALARY"] = raw_df["SALARY"].fillna(raw_df["SALARY"].median())
if "Industry" in raw_df.columns:
    raw_df["Industry"] = raw_df["Industry"].fillna("Unknown")

# Remove duplicates
raw_df.drop_duplicates(subset=["TITLE", "COMPANY", "LOCATION", "POSTED"], keep="first", inplace=True)

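# Count postings that mention each skill as a whole word (a plain substring
# match would count "R" in every word containing the letter "r")
import re
top_skills = df_skills.columns.tolist()
job_text = raw_df["BODY"].fillna("")
pattern = {s: rf"\b{re.escape(s)}\b" for s in top_skills}
skill_counts = {s: job_text.str.contains(pattern[s], case=False).sum() for s in top_skills}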

# Append demand row
df_skills.loc["Market Demand"] = [skill_counts[s] for s in top_skills]
df_skills
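
Keyword counts for short skill names such as "R" or "Git" are sensitive to how the match is defined. The sketch below compares a substring match with a whole-word match on a few made-up posting snippets; the snippets and the helper `count_postings` are hypothetical and only stand in for the `BODY` column.

```python
import re
import pandas as pd

# Hypothetical posting snippets standing in for the BODY column.
job_text = pd.Series([
    "Experience with R and SQL required",
    "Strong programming background preferred",   # contains the letter "r", not the skill R
    "Digital marketing role using Excel",        # "Digital" contains "git"
    "Version control with Git",
])

def count_postings(skill: str, whole_word: bool) -> int:
    """Count postings that mention `skill`, optionally as a whole word only."""
    pattern = re.escape(skill)
    if whole_word:
        pattern = rf"\b{pattern}\b"
    return int(job_text.str.contains(pattern, case=False, regex=True).sum())

for skill in ["R", "Git"]:
    print(skill,
          count_postings(skill, whole_word=False),
          count_postings(skill, whole_word=True))
```

On these snippets the substring match reports "R" in all four postings, while the whole-word match finds it in one. Word boundaries do not work for names that end in punctuation (for example "C++"), so such skills would need a separate pattern.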

### 3.1.3 Visualize Skill Gaps

In [None]:
import matplotlib.pyplot as plt

# Team skill levels heatmap
plt.figure(figsize=(6,2))
plt.imshow(df_skills.iloc[:-1], aspect="auto")
plt.colorbar(label="Skill Level (1–5)")
plt.yticks(range(len(df_skills.index)-1), df_skills.index[:-1])
plt.xticks(range(len(df_skills.columns)), df_skills.columns, rotation=45, ha="right")
plt.title("Team Skill Levels")
plt.tight_layout()
plt.show()

# Market demand heatmap
plt.figure(figsize=(6,2))
plt.imshow([df_skills.loc["Market Demand"]], aspect="auto")
plt.colorbar(label="Market Demand Count")
plt.yticks([0], ["Market Demand"])
plt.xticks(range(len(df_skills.columns)), df_skills.columns, rotation=45, ha="right")
plt.title("Market Demand by Skill")
plt.tight_layout()
plt.show()


Based on the heatmap comparison of each member’s self-ratings (1–5) against the market-demand counts, we propose the following improvement plan:

Deyang shows the largest gaps in Cloud Computing and R. To close these, start with an AWS Fundamentals micro-course (Coursera or AWS’s own training) and follow up by building a small R Shiny dashboard using the free “R for Data Science” online book and the swirl R package for hands-on exercises. Aim to spend 2–3 hours per week on labs, then peer-review each other’s code in GitHub.

Yani needs to boost Machine Learning and AWS skills. We recommend Andrew Ng’s Machine Learning Specialization on Coursera, paired with the AWS Certified Cloud Practitioner path on AWS Training. After completing each module, apply what you’ve learned by deploying a simple classification model on Amazon SageMaker and sharing the workflow in a team repo so everyone can give feedback.

Jiapei would benefit most from deeper Cloud Computing practice and reinforcing R. Enroll in Google Cloud’s “Data Engineering” Qwiklabs quests and run through interactive R exercises via swirl. Host a 30-minute “teach-back” session after completing each mini-project—this both cements your own understanding and helps teammates pick up new tricks.

Junhao has room to grow in Excel and SQL. Take an “Excel Essentials” short course on LinkedIn Learning, then tackle Mode’s SQL tutorial problems. Organize weekly problem-solving sessions where one member presents a real-world dataset and the rest write SQL queries together. This combination of structured learning and peer collaboration will efficiently close the remaining gaps.
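
One way to make the comparison between self-ratings and market demand explicit is to rescale the posting counts onto the same 1–5 scale and subtract. The sketch below is an assumption about how such a gap score could be computed, not the procedure used for the plan above; the member names and counts are made up.

```python
import pandas as pd

# Hypothetical self-ratings (1-5) and made-up posting counts per skill.
ratings = pd.DataFrame(
    {"Python": [4, 3], "SQL": [3, 3], "Git": [4, 1]},
    index=["Member A", "Member B"],
)
demand_counts = pd.Series({"Python": 900, "SQL": 500, "Git": 100})

# Min-max rescale the counts onto the 1-5 scale used for the self-ratings.
span = demand_counts.max() - demand_counts.min()
demand_score = 1 + 4 * (demand_counts - demand_counts.min()) / span

# Gap = demand score minus self-rating; positive values mark skills to prioritize.
gaps = ratings.rsub(demand_score, axis="columns")

print(demand_score)
print(gaps)
print(gaps.idxmax(axis=1))   # skill with the largest gap for each member
```

Min-max scaling is only one choice; it maps the least-demanded skill to 1 regardless of its absolute count, and it is undefined when all counts are equal.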


---
title: "Project 4"
format:
  html:
    self-contained: true
    theme: flatly
    toc: true
---

In [None]:
import pandas as pd

df = pd.read_csv("lightcast_job_postings.csv")

columns_to_drop = [
    "ID", "URL", "ACTIVE_URLS", "DUPLICATES", "LAST_UPDATED_TIMESTAMP",
    "NAICS2", "NAICS3", "NAICS4", "NAICS5", "NAICS6",
    "SOC_2", "SOC_3", "SOC_5"
]
valid_cols_to_drop = [col for col in columns_to_drop if col in df.columns]
df.drop(columns=valid_cols_to_drop, inplace=True)

df.dropna(thresh=len(df) * 0.5, axis=1, inplace=True) 

# Assign back instead of calling fillna(inplace=True) on a selected column
if "SALARY" in df.columns:
    df["SALARY"] = df["SALARY"].fillna(df["SALARY"].median())
if "Industry" in df.columns:
    df["Industry"] = df["Industry"].fillna("Unknown")

df.drop_duplicates(subset=["TITLE", "COMPANY", "LOCATION", "POSTED"], keep="first", inplace=True)

if "POSTED" in df.columns:
    df['POSTED'] = pd.to_datetime(df['POSTED'], errors='coerce')

In [None]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

In [None]:
cols_to_use = ['NAICS_2022_5_NAME', 'EMPLOYMENT_TYPE_NAME', 'REMOTE_TYPE_NAME', 'EDUCATION_LEVELS_NAME']

# Copy so the cluster labels added later are written to an independent frame
df_kmeans = df[cols_to_use].dropna().copy()

In [None]:
X = pd.get_dummies(df_kmeans)

In [None]:
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

inertias = []
K_range = range(1, 15) 

for k in K_range:
    kmeans = KMeans(n_clusters=k, random_state=42, n_init=10)
    kmeans.fit(X)
    inertias.append(kmeans.inertia_)

plt.figure(figsize=(8,5))
plt.plot(K_range, inertias, marker='o')
plt.xlabel('Number of Clusters (k)')
plt.ylabel('Inertia')
plt.title('Elbow Method for Optimal k (Multi-Feature)')
plt.grid(True)
plt.show()

In [None]:
k_optimal = 4

kmeans_final = KMeans(n_clusters=k_optimal, random_state=42, n_init=10)
df_kmeans['cluster'] = kmeans_final.fit_predict(X)

print(df_kmeans['cluster'].value_counts())

In [None]:
print(df_kmeans.head())

In [None]:
from sklearn.decomposition import PCA
import seaborn as sns

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

pca_df = pd.DataFrame(data=X_pca, columns=['PCA1', 'PCA2'])
pca_df['cluster'] = df_kmeans['cluster'].values

plt.figure(figsize=(8,6))
sns.scatterplot(
    x='PCA1', y='PCA2',
    hue='cluster',
    palette='tab10',
    data=pca_df,
    legend='full',
    alpha=0.8
)
plt.title('KMeans Clustering with Multiple Features (PCA Reduced)')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend(title='Cluster')
plt.grid(True)
plt.show()

In [None]:
features = ['NAICS_2022_5_NAME', 'EMPLOYMENT_TYPE_NAME', 'REMOTE_TYPE_NAME', 'EDUCATION_LEVELS_NAME']

profile_summary = []

for cluster_label in sorted(df_kmeans['cluster'].unique()):
    cluster_data = df_kmeans[df_kmeans['cluster'] == cluster_label]
    
    top_features = {}
    for feature in features:
        if cluster_data[feature].nunique() > 0:
            if feature == 'EDUCATION_LEVELS_NAME':
                clean_edu = cluster_data[feature].astype(str).str.replace(r'[\[\]\n\"]', '', regex=True)  # strip [, ], newline, and double-quote characters
                clean_edu = clean_edu.str.split(',').str[0] 
                top_features[feature] = clean_edu.value_counts().idxmax()
            else:
                top_features[feature] = cluster_data[feature].value_counts().idxmax()
        else:
            top_features[feature] = 'N/A'
    
    profile_summary.append({
        'Cluster': cluster_label,
        'Top NAICS_2022_5_NAME': top_features['NAICS_2022_5_NAME'],
        'Top Employment Type': top_features['EMPLOYMENT_TYPE_NAME'],
        'Top Remote Type': top_features['REMOTE_TYPE_NAME'],
        'Top Education Level': top_features['EDUCATION_LEVELS_NAME']
        })

profile_df = pd.DataFrame(profile_summary)

print(profile_df)

In this project, we used the KMeans clustering model to group job postings based on several key features: industry classification (NAICS codes), employment type (full-time or part-time), remote work type (remote, on-site, or hybrid), and minimum education requirements (such as bachelor’s or master’s degrees).
The purpose of clustering was to identify distinct groups within the job market and provide more targeted career insights and recommendations for job seekers.

We selected four features for clustering: NAICS_2022_5_NAME, EMPLOYMENT_TYPE_NAME, REMOTE_TYPE_NAME, and EDUCATION_LEVELS_NAME.
After applying one-hot encoding to these categorical variables, we used the KMeans algorithm to build the model.
To determine the optimal number of clusters, we used the Elbow Method and selected k=4, which divided the data into four distinct job groups.
We then reduced the dimensionality of the clustered data to two components using PCA and visualized the results.
In the two-dimensional PCA projection, the clusters appeared visually well separated and internally consistent.
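
Visual separation in a PCA plot can be complemented with a numeric check. The sketch below computes the silhouette score for several values of k on randomly generated categorical data; it is an illustration of the technique under assumed inputs, not a result from the job postings. The category values and sample size are made up.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical categorical data, one-hot encoded the same way as X above.
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame({
    "REMOTE_TYPE_NAME": rng.choice(["Remote", "Hybrid", "On-site"], size=n),
    "EMPLOYMENT_TYPE_NAME": rng.choice(["Full-time", "Part-time"], size=n),
    "EDUCATION_LEVELS_NAME": rng.choice(["Bachelor's", "Master's", "No requirement"], size=n),
})
X_toy = pd.get_dummies(toy).astype(float)

# Silhouette ranges from -1 to 1; higher means tighter, better-separated clusters.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, random_state=42, n_init=10).fit_predict(X_toy)
    scores[k] = silhouette_score(X_toy, labels)

best_k = max(scores, key=scores.get)
print(scores)
print("k with the highest silhouette:", best_k)
```

Comparing this score across k gives a second opinion alongside the elbow plot, which can be hard to read when inertia declines smoothly.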

Through analysis of the clustering results, we found the following:
Cluster 0 mainly consists of jobs classified under “Unclassified Industry,” with most being full-time positions, low levels of remote work, and a majority requiring a bachelor’s degree.
Cluster 1 also contains jobs from unclassified industries, but with a higher proportion of remote work opportunities, making it suitable for those who prefer remote jobs.
Clusters 2 and 3 are primarily composed of jobs in “Computer Systems Design and Related Services,” with higher education requirements, often needing a master’s degree or higher.
These clusters are more appropriate for technical job seekers.

Based on the cluster characteristics, we recommend that job seekers who prefer remote work focus on opportunities within Cluster 1.
Those with advanced degrees and technical backgrounds should prioritize jobs in Clusters 2 and 3.
Entry-level candidates, or those whose highest qualification is a bachelor’s degree, may find suitable opportunities in Cluster 0.

In [None]:
import pandas as pd

df = pd.read_csv("lightcast_job_postings.csv")

columns_to_drop = [
    "ID", "URL", "ACTIVE_URLS", "DUPLICATES", "LAST_UPDATED_TIMESTAMP",
    "NAICS2", "NAICS3", "NAICS4", "NAICS5", "NAICS6",
    "SOC_2", "SOC_3", "SOC_5"
]
valid_cols_to_drop = [col for col in columns_to_drop if col in df.columns]
df.drop(columns=valid_cols_to_drop, inplace=True)

# Assign back instead of calling fillna(inplace=True) on a selected column
if "SALARY" in df.columns:
    df["SALARY"] = df["SALARY"].fillna(df["SALARY"].median())
if "Industry" in df.columns:
    df["Industry"] = df["Industry"].fillna("Unknown")
df["MIN_YEARS_EXPERIENCE"] = df["MIN_YEARS_EXPERIENCE"].fillna(df["MIN_YEARS_EXPERIENCE"].median())

df.drop_duplicates(subset=["TITLE", "COMPANY", "LOCATION", "POSTED"], keep="first", inplace=True)

if "POSTED" in df.columns:
    df['POSTED'] = pd.to_datetime(df['POSTED'], errors='coerce')

In [None]:
cols_to_check = [
    'EMPLOYMENT_TYPE_NAME', 
    'REMOTE_TYPE_NAME', 
    'EDUCATION_LEVELS_NAME', 
    'NAICS_2022_5_NAME', 
    'TITLE_NAME', 
    'SALARY',
    'MIN_YEARS_EXPERIENCE'
]

missing_counts = df[cols_to_check].isnull().sum()

print(missing_counts)

In [None]:
cols_to_use = [
    'EMPLOYMENT_TYPE_NAME', 
    'REMOTE_TYPE_NAME', 
    'EDUCATION_LEVELS_NAME',  
    'TITLE_NAME', 
    'NAICS_2022_5_NAME',
    'SALARY'
]
df_regression = df[cols_to_use].dropna()

y = df_regression['SALARY']

X = pd.get_dummies(df_regression.drop('SALARY', axis=1), drop_first=True)

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=688)

In [None]:
from sklearn.linear_model import LinearRegression

lr_model = LinearRegression()
lr_model.fit(X_train, y_train)

In [None]:
y_pred = lr_model.predict(X_test)

In [None]:
from sklearn.metrics import mean_squared_error, r2_score
import numpy as np

rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)

print(f"RMSE: {rmse:.2f}")
print(f"R²: {r2:.4f}")

In [None]:
import pandas as pd

coef_df = pd.DataFrame({
    'Feature': X_train.columns,
    'Coefficient': lr_model.coef_
})

coef_df['Coefficient_Abs'] = coef_df['Coefficient'].abs()
coef_df = coef_df.sort_values(by='Coefficient_Abs', ascending=False)

print(coef_df.head(20))

In this project, we used a linear regression model to predict salary from employment type, remote type, education level, industry, and job title; the job title feature (TITLE_NAME) dominates the largest coefficients. The model results show that roles in healthcare, such as Urgent Care Physicians and Psychiatry Physicians, as well as positions in finance like Portfolio Strategists and IT security roles like Lead Security Architects, carry large positive coefficients. This suggests that jobs requiring high specialization and in-demand skills tend to offer higher salaries.
For job seekers, these findings provide useful guidance for career planning and salary expectations. People who aim for higher-paying jobs might consider focusing on fields like healthcare, finance, or IT security. However, it is also important to understand that these roles often come with higher professional requirements and work pressure. When making career choices, it is important to match personal interests, skills, and strengths with realistic goals. Companies can also use this insight to better design competitive salary strategies and attract top talent.
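
When reading the coefficient table, it helps to remember that with `drop_first=True` each dummy coefficient is measured relative to the dropped baseline category. The sketch below shows this on a tiny made-up data set with a single categorical predictor; the titles and salaries are hypothetical.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: three titles, two postings each.
toy = pd.DataFrame({
    "TITLE_NAME": ["Analyst", "Analyst", "Engineer", "Engineer", "Physician", "Physician"],
    "SALARY":     [60_000, 70_000, 95_000, 105_000, 190_000, 210_000],
})

# drop_first=True removes the first category ("Analyst"), which becomes the baseline.
X_toy = pd.get_dummies(toy[["TITLE_NAME"]], drop_first=True).astype(float)
model = LinearRegression().fit(X_toy, toy["SALARY"])

baseline = model.intercept_                          # mean salary of the baseline title
effects = pd.Series(model.coef_, index=X_toy.columns)  # difference from the baseline

print(round(baseline))
print(effects.round())
```

Here the intercept equals the mean Analyst salary and each coefficient equals that title's mean salary minus the Analyst mean. In the full model, a large coefficient therefore means "pays much more than the baseline combination", not an absolute salary level.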

In [None]:
results_df = pd.DataFrame({
    'Actual Salary': y_test,
    'Predicted Salary': y_pred
})

plt.figure(figsize=(8, 6))
plt.scatter(results_df['Actual Salary'], results_df['Predicted Salary'], alpha=0.6, color='cornflowerblue')
plt.plot(
    [results_df['Actual Salary'].min(), results_df['Actual Salary'].max()],
    [results_df['Actual Salary'].min(), results_df['Actual Salary'].max()],
    color='red', linestyle='--'
)
plt.title('Actual vs Predicted Salaries', fontsize=14)
plt.xlabel('Actual Salary', fontsize=12)
plt.ylabel('Predicted Salary', fontsize=12)
plt.grid(True, linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()

The scatter plot shows the model’s prediction performance. The x-axis represents the actual salary, and the y-axis shows the predicted salary. The red dashed line represents a perfect prediction line. Overall, the model captures the general trend of salary changes fairly well. However, in the higher salary range, especially for jobs over 200,000 dollars per year, the model tends to underestimate salaries and shows larger prediction errors. In the lower salary range, especially for jobs below 100,000 dollars, the model predicts more accurately.

This suggests that while the model provides helpful salary estimates, job seekers should use it carefully, especially when applying for high-paying positions. It is important to also consider personal background, industry trends, and specific job requirements. Those targeting high salaries should treat the model's estimate as conservative, and can strengthen their position by improving professional skills and gaining more experience.
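
The pattern described above, accurate predictions at lower salaries and underestimation at higher ones, can be checked numerically by averaging residuals within salary bands. The sketch below uses made-up numbers; in the notebook, `actual` and `predicted` would be `y_test` and `y_pred`, and the band edges are an assumption chosen to match the thresholds mentioned in the text.

```python
import numpy as np
import pandas as pd

# Hypothetical actual and predicted salaries.
actual = np.array([60_000, 80_000, 95_000, 150_000, 180_000, 220_000, 260_000])
predicted = np.array([62_000, 78_000, 97_000, 140_000, 170_000, 180_000, 200_000])

results = pd.DataFrame({"actual": actual, "predicted": predicted})
results["residual"] = results["predicted"] - results["actual"]   # negative = underestimate

# Bands are right-inclusive: (0, 100k], (100k, 200k], (200k, inf).
bands = pd.cut(
    results["actual"],
    bins=[0, 100_000, 200_000, np.inf],
    labels=["up to 100k", "100k-200k", "above 200k"],
)

# Mean residual and number of postings per band.
by_band = results.groupby(bands, observed=True)["residual"].agg(["mean", "count"])
print(by_band)
```

A mean residual that becomes more negative in the higher bands confirms systematic underestimation there, rather than a few isolated misses.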

In [None]:
import matplotlib.pyplot as plt
import pandas as pd

feature_importance = pd.Series(lr_model.coef_, index=X_train.columns)
feature_importance = feature_importance.abs().sort_values(ascending=False).head(20)

plt.figure(figsize=(10, 6))
plt.barh(feature_importance.index[::-1], feature_importance.values[::-1], color='skyblue')
plt.title("Top 20 Feature Importances (Linear Regression Coefficients)", fontsize=14)
plt.xlabel("Coefficient Magnitude", fontsize=12)
plt.ylabel("Features", fontsize=12)
plt.grid(axis='x', linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()

From the feature importance chart, we can see that the model's explanation of salary differences is highly concentrated around a small group of high-impact job titles. This suggests that selecting roles in key industries like healthcare, finance, and IT can significantly influence salary outcomes.

For job seekers, aiming for these impactful roles can greatly improve earning potential. On the other hand, for roles with lower feature importance, it might be a good idea to adjust salary expectations based on the broader market. People working in these positions can also consider developing new skills or transitioning into fields with stronger market demand to boost their career development and salary prospects.
