Q.10. You are given code that generates a synthetic Employee Performance Dataset. It simulates 1,000 employees with features such as age, department, years of experience, performance score, salary and training hours.

Visualize the following KPIs and draw insights from them:

- Visualize the distribution of years of experience and find how many employees have 0–5 years of experience.

- Across the five departments, find:

    - which department pays the highest average monthly salary
    - which department pays the lowest average monthly salary
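Both KPIs reduce to a few pandas calls on the `employee_data` frame built by the generator code. The sketch below is one possible approach, not a reference solution. The helper names `plot_experience_distribution`, `count_experience_range` and `salary_extremes` are hypothetical. It also makes two assumptions: "0–5 years" includes employees with exactly 5 years (pass `high=4` for "fewer than 5"), and "highest/lowest salary" means the mean `MonthlySalary` per department (pass `stat='median'` to compare medians instead).

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def plot_experience_distribution(df):
    """Histogram of ExperienceYears with one bar per whole year."""
    max_years = int(df['ExperienceYears'].max())
    # Bin edges at -0.5, 0.5, 1.5, ... so each integer year gets its own bar
    edges = np.arange(max_years + 2) - 0.5
    _, ax = plt.subplots(figsize=(8, 4))
    ax.hist(df['ExperienceYears'], bins=edges, edgecolor='black')
    ax.set_xlabel('Years of experience')
    ax.set_ylabel('Number of employees')
    ax.set_title('Distribution of experience')
    return ax


def count_experience_range(df, low=0, high=5):
    """Count employees with low <= ExperienceYears <= high (both ends inclusive, an assumption)."""
    return int(df['ExperienceYears'].between(low, high).sum())


def salary_extremes(df, stat='mean'):
    """Return (highest_dept, lowest_dept, per-department statistic sorted ascending)."""
    by_dept = df.groupby('Department')['MonthlySalary'].agg(stat).sort_values()
    return by_dept.index[-1], by_dept.index[0], by_dept


# Tiny hand-made frame, only to demonstrate the calls; use employee_data in the notebook
demo = pd.DataFrame({
    'ExperienceYears': [0, 2, 5, 6, 12],
    'Department': ['HR', 'HR', 'Sales', 'Sales', 'Engineering'],
    'MonthlySalary': [30000, 34000, 50000, 54000, 70000],
})
ax = plot_experience_distribution(demo)
print(count_experience_range(demo))    # 3 (the 0, 2 and 5 year rows)
high, low, by_dept = salary_extremes(demo)
print(high, low)                       # Engineering HR
```

In the notebook, call `plot_experience_distribution(employee_data)` followed by `plt.show()`, then `count_experience_range(employee_data)` and `salary_extremes(employee_data)`. The returned per-department series can also be plotted directly with `by_dept.plot.barh()` to compare all five departments at once.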

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Fix the random seed so that every run produces the same dataset
np.random.seed(42)

# Number of employees
n = 1000

# Simulate dataset (each column is drawn independently of the others)
employee_data = pd.DataFrame({
    'EmployeeID': np.arange(1, n+1),
    'Age': np.random.normal(loc=35, scale=8, size=n).astype(int),
    'Department': np.random.choice(['Sales', 'HR', 'Engineering', 'Finance', 'Marketing'],
                                   size=n, p=[0.25,0.1,0.35,0.15,0.15]),
    # Exponential draws give many short careers and few long ones
    'ExperienceYears': np.random.exponential(scale=5, size=n).astype(int),
    'MonthlySalary': np.random.normal(loc=50000, scale=15000, size=n).astype(int),
    'PerformanceScore': np.random.choice([1,2,3,4,5], size=n, p=[0.05,0.1,0.4,0.3,0.15]),
    # randint excludes its upper bound, so this yields 5-79 hours
    'AnnualTrainingHours': np.random.randint(5, 80, size=n),
    'PromotionLast2Yrs': np.random.choice(['Yes','No'], size=n, p=[0.15,0.85]),
    # Additional KPIs
    'WorkLifeBalanceScore': np.random.choice([1,2,3,4,5], size=n, p=[0.05,0.1,0.25,0.4,0.2]),
    'OvertimeHours': np.random.randint(0, 50, size=n),
    'LeavesTaken': np.random.randint(0, 25, size=n)
})

# Clip to realistic bounds
employee_data['Age'] = employee_data['Age'].clip(22, 60)
employee_data['ExperienceYears'] = employee_data['ExperienceYears'].clip(0, 35)
employee_data['MonthlySalary'] = employee_data['MonthlySalary'].clip(20000, 120000)
```
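Clipping each column on its own keeps every value inside its range, but `Age` and `ExperienceYears` are generated independently. As a result, the data can still contain combinations such as a 25-year-old with 20 years of experience. The sketch below flags such rows. The helper `implausible_experience` and its `min_start_age=18` threshold are illustrative assumptions and are not part of the original dataset.

```python
import pandas as pd


def implausible_experience(df, min_start_age=18):
    """Rows whose experience implies starting work before min_start_age.

    min_start_age is a hypothetical threshold chosen for illustration.
    """
    start_age = df['Age'] - df['ExperienceYears']
    return df[start_age < min_start_age]


# Small example frame; in the notebook pass employee_data instead
demo = pd.DataFrame({
    'EmployeeID': [1, 2, 3],
    'Age': [25, 40, 30],
    'ExperienceYears': [20, 10, 12],
})
flagged = implausible_experience(demo)
print(flagged['EmployeeID'].tolist())   # [1]  (started at 5; employee 3 started exactly at 18)
```

Running `implausible_experience(employee_data)` shows how many simulated employees fall into this category. Whether to drop, re-clip or simply note these rows is a modelling choice the original code leaves open.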

What this dataset includes

Age → Normally distributed around 35 (clipped 22–60)

Department → Sales, HR, Engineering, Finance, Marketing

ExperienceYears → Exponentially distributed (scale 5), truncated to whole years and clipped 0–35 (more juniors, fewer seniors)

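MonthlySalary → Normally distributed around 50,000 (std. dev. 15,000), clipped 20,000–120,000; it is drawn independently of experience and department, so any correlation with either is due to chance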

PerformanceScore → Categorical (1–5) with more mid-range performers

AnnualTrainingHours → Uniform random integers from 5 to 79 hours (np.random.randint excludes the upper bound 80)

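PromotionLast2Yrs → Yes/No, with Yes drawn with probability 0.15

WorkLifeBalanceScore → Categorical (1–5), most often 4

OvertimeHours → Uniform random integers from 0 to 49

LeavesTaken → Uniform random integers from 0 to 24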
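The properties listed above can be checked against the generated frame rather than taken on trust. The sketch below reports the observed minimum and maximum of selected columns and the Pearson correlation between experience and salary. Because the generator draws `MonthlySalary` without reference to `ExperienceYears`, that correlation should come out close to 0 on `employee_data`. The helper names `summarize_ranges` and `experience_salary_corr` are hypothetical.

```python
import pandas as pd


def summarize_ranges(df, columns):
    """Observed min and max of each column, one row per column."""
    return df[columns].agg(['min', 'max']).T


def experience_salary_corr(df):
    """Pearson correlation between ExperienceYears and MonthlySalary."""
    return df['ExperienceYears'].corr(df['MonthlySalary'])


# Small example frame; in the notebook pass employee_data instead
demo = pd.DataFrame({
    'ExperienceYears': [1, 2, 3, 4],
    'MonthlySalary': [40000, 30000, 30000, 40000],
    'AnnualTrainingHours': [5, 79, 40, 12],
})
ranges = summarize_ranges(demo, ['ExperienceYears', 'AnnualTrainingHours'])
print(ranges)
print(experience_salary_corr(demo))
```

On the full dataset, `summarize_ranges(employee_data, ['Age', 'ExperienceYears', 'MonthlySalary', 'AnnualTrainingHours'])` should stay within the clipped and `randint` bounds described above.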