# How can you maximize your salary as a Data Analyst?
- Which skills are most strongly correlated with higher pay?
- Which type of job is better to search for: remote or in-office?
- Do you need a specific degree to earn a higher salary?
- When is the best time to search for a job to get a higher salary?

In [7]:
import pandas as pd
#import plotly.express as px

#import matplotlib.pyplot as plt
#import seaborn as sns

import numpy as np

import plotly.graph_objects as go

df = pd.read_csv('/Users/kolesnikevgenia/Documents/Python_Projects/Job_Skills/Raw_Data/df_Final.csv')

In [8]:
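# top3_roles is not defined in this excerpt. Assumption: use the three most
# frequent job titles; replace with the intended list if it differs.
top3_roles = df['job_title_short'].value_counts().head(3).index.tolist()

# Filter for the top 3 roles
df_filtered = df[df['job_title_short'].isin(top3_roles)].copy()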

# Ensure job_posted_date is datetime
df_filtered['job_posted_date'] = pd.to_datetime(df_filtered['job_posted_date'])

# Extract month number (1 to 12)
df_filtered['month_num'] = df_filtered['job_posted_date'].dt.month

# Calculate average salary per month_num
salary_per_month = (
    df_filtered.groupby('month_num')['salary_month_avg_eur']
    .mean()
    .reset_index()
)

# Calculate count of job postings per month_num
count_per_month = (
    df_filtered.groupby('month_num')
    .size()
    .reset_index(name='job_postings_count')
)

# Merge for correlation calculations if needed
merged = pd.merge(salary_per_month, count_per_month, on='month_num')

# Correlation between month number and average salary
corr_salary_month = merged['month_num'].corr(merged['salary_month_avg_eur'])

# Correlation between month number and job postings count
corr_count_month = merged['month_num'].corr(merged['job_postings_count'])

print(f"Correlation between month and average salary: {corr_salary_month:.3f}")
print(f"Correlation between month and job postings count: {corr_count_month:.3f}")

Correlation between month and average salary: -0.220
Correlation between month and job postings count: -0.763


Correlation between month and job postings count = −0.763

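This is a strong negative correlation.
It means that the number of job postings tends to drop as the year goes on: because `month_num` runs from 1 (January) to 12 (December), a negative coefficient means more postings early in the year and fewer toward the end.
This pattern could reflect hiring cycles, budget periods, or seasonality in the job market.

Correlation between month and average salary = −0.220

This is a weak negative correlation, so the average salary shows no clear trend across the year. Both coefficients are computed from at most 12 monthly data points, so they are rough indicators rather than firm evidence.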
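
With so few monthly points, it helps to check whether each coefficient is distinguishable from zero. The following is a rough sketch, assuming all 12 months are present (n = 12) and using the standard t-statistic for a Pearson correlation, t = r·sqrt((n − 2)/(1 − r²)). The helper name `corr_t_stat` is illustrative and not part of the original notebook; 2.228 is the two-sided 5% critical value of Student's t for 10 degrees of freedom.

```python
import math

def corr_t_stat(r, n):
    """t-statistic for testing a Pearson correlation r from n points against zero."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

N_MONTHS = 12   # assumption: every calendar month appears in the data
T_CRIT = 2.228  # two-sided 5% critical value of Student's t, 10 degrees of freedom

t_salary = corr_t_stat(-0.220, N_MONTHS)  # month vs. average salary
t_count = corr_t_stat(-0.763, N_MONTHS)   # month vs. job postings count

print(f"salary: t = {t_salary:.2f}, significant: {abs(t_salary) > T_CRIT}")
print(f"count:  t = {t_count:.2f}, significant: {abs(t_count) > T_CRIT}")
```

Under these assumptions only the postings trend clears the threshold (t is about −3.73, versus about −0.71 for salary). The test also presumes roughly normal data, so it is a sanity check rather than a formal result.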

In [24]:
# 1. Ensure correct datetime and extract month
df['job_posted_date'] = pd.to_datetime(df['job_posted_date'])
df['month_period'] = df['job_posted_date'].dt.to_period('M').astype(str)

# 2. Filter for "Data Analyst"
df_analyst = df[df['job_title_short'] == 'Data Analyst'].copy()

# 3. Job postings count per month
job_postings_count = (
    df_analyst.groupby('month_period')
    .size()
    .reset_index(name='job_postings_count')
    .sort_values('month_period')
    .reset_index(drop=True)
)

# 4. Median salary per month
median_salary = (
    df_analyst.groupby('month_period')['salary_month_avg_eur']
    .median()
    .reset_index(name='median_salary')
    .sort_values('month_period')
    .reset_index(drop=True)
)

# 5. Merge both into one DataFrame
df_merged = pd.merge(
    job_postings_count,
    median_salary,
    on='month_period'
)

# 6. Plot
fig = go.Figure()

# Job postings line
fig.add_trace(go.Scatter(
    x=df_merged['month_period'],
    y=df_merged['job_postings_count'],
    mode='lines+markers',
    name='Job Postings Count',
    line=dict(color='#1f77b4', width=3),
    yaxis='y1'
))

# Median salary line
fig.add_trace(go.Scatter(
    x=df_merged['month_period'],
    y=df_merged['median_salary'],
    mode='lines+markers',
    name='Median Salary (EUR)',
    line=dict(color='#ff7f0e', width=3),
    yaxis='y2'
))

# Layout with dual axes
fig.update_layout(
    title="Data Analyst: Job Postings and Median Salary Over Time",
    xaxis=dict(title='Month'),
    yaxis=dict(title='Job Postings Count', side='left'),
    yaxis2=dict(title='Median Salary (EUR)', overlaying='y', side='right'),
    legend=dict(x=0.1, y=1.1, orientation='h'),
    height=450,
    width=900,
    template='plotly_white',
    margin=dict(t=70, b=40)
)

fig.show()

# Correlation between number of postings and job salaries

In [28]:
# Ensure correct datetime and extract month
df['job_posted_date'] = pd.to_datetime(df['job_posted_date'])
df['month_period'] = df['job_posted_date'].dt.to_period('M').astype(str)

# Filter top roles if needed
df_filtered = df[df['job_title_short'].isin(top3_roles)].copy()

# 1. Job postings count per month
job_postings_count = (
    df_filtered.groupby('month_period')
    .size()
    .reset_index(name='job_postings_count')
    .sort_values('month_period')
    .reset_index(drop=True)
)

# 2. Median salary per month
median_salary = (
    df_filtered.groupby('month_period')['salary_month_avg_eur']
    .median()
    .reset_index(name='median_salary')
    .sort_values('month_period')
    .reset_index(drop=True)
)

# 3. Next month salary: shift median salary up by 1
median_salary['median_salary_next_month'] = median_salary['median_salary'].shift(-1)

# 4. Merge counts with salaries
df_corr = pd.merge(
    job_postings_count,
    median_salary[['month_period', 'median_salary', 'median_salary_next_month']],
    on='month_period'
)

# 5. Drop rows with missing values: the last month (no next-month salary)
#    and any month that has no salary data at all
df_corr = df_corr.dropna()

# 6. Calculate correlations
corr_current = df_corr['job_postings_count'].corr(df_corr['median_salary'])
corr_next = df_corr['job_postings_count'].corr(df_corr['median_salary_next_month'])

# 7. Print results
print(f"Correlation between job postings and current month median salary: {corr_current:.3f}")
print(f"Correlation between job postings and next month median salary: {corr_next:.3f}")

Correlation between job postings and current month median salary: nan
Correlation between job postings and next month median salary: nan
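
A `nan` from `Series.corr` usually means that fewer than two rows have both values present, or that one of the two columns is constant. The following is a minimal sketch for checking this; the helper `describe_corr_inputs` and the demo values are hypothetical and not taken from the original data.

```python
import numpy as np
import pandas as pd

def describe_corr_inputs(frame, x, y):
    """Summarize the rows that a correlation between columns x and y can use."""
    valid = frame[[x, y]].dropna()
    return {
        "rows": len(frame),
        "valid_pairs": len(valid),
        "x_unique": valid[x].nunique(),
        "y_unique": valid[y].nunique(),
    }

# Made-up monthly table: salary is missing in all but one month.
demo = pd.DataFrame({
    "job_postings_count": [120, 95, 80, 60],
    "median_salary": [np.nan, 4200.0, np.nan, np.nan],
})
summary = describe_corr_inputs(demo, "job_postings_count", "median_salary")
print(summary)
```

In the notebook, the same helper could be called on the merged frame before `dropna()`, for example `describe_corr_inputs(df_corr, 'job_postings_count', 'median_salary')`. Fewer than two valid pairs, or a single unique value in either column, would explain the `nan` results.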


Interpretation

Both correlations above print as `nan`, so this run does not measure any relationship between postings and salary. The reading below applies only if a rerun on valid data yields a moderate to strong positive correlation with next month's median salary.

In that case, it would suggest that when there are more job postings in a given month, the median salary tends to be higher in the following month.
This could indicate a lagged effect in which increased hiring demand signals, or even drives up, salary offers shortly after.
It might reflect market dynamics such as salary negotiations catching up after spikes in demand, or employers responding to talent shortages by raising salaries in the near term.

What could cause this pattern?
- Hiring cycles: Companies may ramp up postings first, then adjust salary offers in response.
- Budget timing: Salary budgets or offers might be updated shortly after observing market demand.
- Market signals: A surge in postings could reflect competitive pressures that lead to salary increases soon after.
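
A lagged relationship of this kind can be probed at several lags at once rather than only at one month. The following is a small sketch, assuming a monthly table sorted by date with no missing months. The helper `lagged_corr` and the demo values are illustrative; the demo is constructed so that salary follows postings by exactly one month.

```python
import pandas as pd

def lagged_corr(frame, x, y, lags=(0, 1, 2)):
    """Correlate x in month t with y in month t + lag, for each lag."""
    return {lag: frame[x].corr(frame[y].shift(-lag)) for lag in lags}

# Made-up monthly values: median_salary in month t + 1 equals 2 * postings in month t.
demo = pd.DataFrame({
    "job_postings_count": [1, 3, 2, 5, 4, 6],
    "median_salary": [0, 2, 6, 4, 10, 8],
})
result = lagged_corr(demo, "job_postings_count", "median_salary")
print(result)
```

If one lag stands out clearly against its neighbours, that supports a delayed effect. Similar values at every lag point instead to a shared trend or seasonality, and with roughly a year of monthly data any such comparison stays tentative.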