# IBM Employee Salary and Tenure Analysis

This notebook recreates the salary-versus-tenure analysis from Chapter 1 of "Data Science from Scratch" by Joel Grus, applying it to the IBM HR employee attrition dataset. It examines how monthly income relates to the number of years an employee has spent at the company.

## Import Required Libraries

Load pandas for data manipulation and analysis.

In [None]:
import pandas as pd

## Load Employee Data

Read the IBM HR employee attrition dataset from a CSV file and preview the first rows.

In [None]:
df_employees = pd.read_csv('WA_Fn-UseC_-HR-Employee-Attrition.csv')
df_employees.head()

## Extract Salary and Tenure Data

Extract monthly income and years at company into separate lists for analysis.

In [None]:
monthly_salaries = df_employees['MonthlyIncome'].tolist()
employee_tenures = df_employees['YearsAtCompany'].tolist()
# Preview the first few values rather than printing every row
print(monthly_salaries[:10])
print(employee_tenures[:10])

## Create Salary-Tenure Pairs

Combine salary and tenure data into paired tuples for easier analysis.

In [None]:
salary_tenure_pairs = list(zip(monthly_salaries, employee_tenures))
salary_tenure_pairs

## Group Salaries by Tenure

Create a dictionary mapping each tenure value to all salaries of employees with that tenure.

In [None]:
from collections import defaultdict
salary_by_tenure = defaultdict(list)
for salary_amount, tenure_years in salary_tenure_pairs:
    salary_by_tenure[tenure_years].append(salary_amount)
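
To make the shape of the resulting dictionary concrete, here is a minimal sketch of the same grouping pattern. It assumes a handful of made-up (salary, tenure) pairs; `toy_pairs` and `toy_salary_by_tenure` are hypothetical names and the values are not taken from the IBM dataset.

```python
from collections import defaultdict

# Hypothetical (salary, tenure) pairs for illustration only
toy_pairs = [(5000, 1), (6200, 3), (4800, 1), (9100, 10), (6000, 3)]

# Missing keys start as an empty list, so append works on first sight of a tenure
toy_salary_by_tenure = defaultdict(list)
for salary_amount, tenure_years in toy_pairs:
    toy_salary_by_tenure[tenure_years].append(salary_amount)

print(dict(toy_salary_by_tenure))
# {1: [5000, 4800], 3: [6200, 6000], 10: [9100]}
```

Each key is a tenure in years, and each value is the list of salaries observed for that tenure, in the order they appeared.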

## Calculate Average Salary by Tenure

Compute the average salary for each tenure value, then sort the results by average salary in descending order.

In [None]:
avg_salary_by_tenure = {
    tenure_years: sum(salaries) / len(salaries)
    for tenure_years, salaries in salary_by_tenure.items()
}
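# Sort into a list of (tenure, average salary) pairs under a new name,
# so the dictionary above is not overwritten with a different type
sorted_avg_salary_by_tenure = sorted(avg_salary_by_tenure.items(), key=lambda x: x[1], reverse=True)
print(sorted_avg_salary_by_tenure)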
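
The same per-tenure average can also be computed directly in pandas. The sketch below is one possible equivalent, not the approach this notebook follows. It assumes a small made-up DataFrame (`toy_df`, a hypothetical name) that reuses the `MonthlyIncome` and `YearsAtCompany` column names.

```python
import pandas as pd

# Hypothetical toy rows; column names mirror the dataset, values are made up
toy_df = pd.DataFrame({
    "MonthlyIncome": [5000, 6200, 4800, 9100, 6000],
    "YearsAtCompany": [1, 3, 1, 10, 3],
})

# Group rows by tenure, average the income in each group,
# then sort by that average from highest to lowest
toy_avg = (
    toy_df.groupby("YearsAtCompany")["MonthlyIncome"]
    .mean()
    .sort_values(ascending=False)
)
print(toy_avg)
```

The result is a Series indexed by tenure, which plays the role of the sorted list of (tenure, average) pairs built by hand above.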

## Define Tenure Buckets

Averages for individual tenure values can rest on only a handful of employees, so create a function that maps each tenure to one of a few predefined ranges for aggregate analysis.

In [None]:
def tenure_to_bucket(tenure_years):
    """
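    Convert tenure in years to a tenure bracket.

    Each bracket includes its lower bound and excludes its upper bound,
    so a tenure of exactly 5 falls in "between five and ten years" and
    a tenure of 25 or more falls in "more than twenty-five years".

    Args:
        tenure_years: Number of years employed

    Returns:
        A string naming the tenure bracket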
    """
    if tenure_years < 2:
        return "less than two years"
    elif tenure_years < 5:
        return "between two and five years"
    elif tenure_years < 10:
        return "between five and ten years"
    elif tenure_years < 18:
        return "between ten and eighteen years"
    elif tenure_years < 25:
        return "between eighteen and twenty-five years"
    else:
        return "more than twenty-five years"
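
As an alternative to the `if`/`elif` chain, the same mapping can be sketched as a table lookup with `bisect` from the standard library. This is a different technique from the function above, shown only for comparison. `BUCKET_EDGES`, `BUCKET_LABELS`, and `tenure_to_bucket_lookup` are hypothetical names introduced here.

```python
from bisect import bisect_right

# Upper bounds of each bracket, in ascending order (same cut-offs as the if/elif chain)
BUCKET_EDGES = [2, 5, 10, 18, 25]
BUCKET_LABELS = [
    "less than two years",
    "between two and five years",
    "between five and ten years",
    "between ten and eighteen years",
    "between eighteen and twenty-five years",
    "more than twenty-five years",
]

def tenure_to_bucket_lookup(tenure_years):
    """Return the bracket label by counting how many edges are <= tenure_years."""
    return BUCKET_LABELS[bisect_right(BUCKET_EDGES, tenure_years)]

print(tenure_to_bucket_lookup(1))   # less than two years
print(tenure_to_bucket_lookup(5))   # between five and ten years
print(tenure_to_bucket_lookup(25))  # more than twenty-five years
```

Because `bisect_right` counts edges that are less than or equal to the tenure, a value sitting exactly on an edge moves into the next bracket, which matches the strict `<` comparisons in the original function.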

## Group Salaries by Tenure Bucket

Organize all salaries by their corresponding tenure brackets.

In [None]:
salary_by_bucket = defaultdict(list)
for tenure_years, salary_list in salary_by_tenure.items():
    salary_by_bucket[tenure_to_bucket(tenure_years)].extend(salary_list)

## Calculate Average Salary by Tenure Bucket

Compute the average salary for each tenure bracket and display the results in descending order of average salary.

In [None]:
avg_salary_by_bucket = {
    bucket: sum(salaries) / len(salaries)
    for bucket, salaries in salary_by_bucket.items()
}
print(sorted(avg_salary_by_bucket.items(), key=lambda x: x[1], reverse=True))
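
For comparison, the bucketing and averaging steps can be combined in pandas with `pd.cut`. The sketch below is one possible equivalent under stated assumptions: `toy_df`, `bucket_edges`, and `bucket_labels` are hypothetical names, the rows are made up, and tenure is assumed to be non-negative (a negative value would fall outside the bins and become missing).

```python
import pandas as pd

# Hypothetical toy rows; column names mirror the dataset, values are made up
toy_df = pd.DataFrame({
    "MonthlyIncome": [5000, 6200, 4800, 9100, 6000, 15000],
    "YearsAtCompany": [1, 3, 1, 10, 3, 25],
})

bucket_edges = [0, 2, 5, 10, 18, 25, float("inf")]
bucket_labels = [
    "less than two years",
    "between two and five years",
    "between five and ten years",
    "between ten and eighteen years",
    "between eighteen and twenty-five years",
    "more than twenty-five years",
]

# right=False makes each interval include its lower edge and exclude its upper edge,
# mirroring the strict `<` comparisons in tenure_to_bucket
toy_df["TenureBucket"] = pd.cut(
    toy_df["YearsAtCompany"], bins=bucket_edges, labels=bucket_labels, right=False
)

# observed=True skips brackets that have no rows in this toy data
toy_avg_by_bucket = (
    toy_df.groupby("TenureBucket", observed=True)["MonthlyIncome"]
    .mean()
    .sort_values(ascending=False)
)
print(toy_avg_by_bucket)
```

Brackets with no employees are dropped here, whereas the dictionary-based version never creates them in the first place, so both approaches report only non-empty brackets.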