## Department Salaries

For each department, find the number of male and female employees and their total salaries.
Output the department name along with the number of female employees, the total salary of female employees, the number of male employees, and the total salary of male employees.

*employee* 

| Column Name      | Type     | Description                                |
| :--------------- | :------- | :----------------------------------------- |
| `id`             | `int64`  | Unique employee identifier                 |
| `first_name`     | `object` | Employee's first name                      |
| `last_name`      | `object` | Employee's last name                       |
| `age`            | `int64`  | Employee's age                             |
| `sex`            | `object` | Employee's sex: 'Male', 'Female', 'Other'  |
| `employee_title` | `object` | Employee's job title                       |
| `department`     | `object` | Department the employee belongs to         |
| `salary`         | `int64`  | Employee's annual salary                   |
| `target`         | `int64`  | Performance target                         |
| `bonus`          | `int64`  | Bonus amount                               |
| `email`          | `object` | Employee's email address                   |
| `city`           | `object` | Employee's city of residence               |
| `address`        | `object` | Employee's home address                    |
| `manager_id`     | `float64` | ID of the employee's manager (NaN if none) |

The dataset below was created with help from Gemini/ChatGPT.

In [None]:
import pandas as pd
import numpy as np
import random
from faker import Faker

# Setup
fake = Faker()
num_employees = 100

# Seed for reproducibility (Faker keeps its own random state)
random.seed(42)
np.random.seed(42)
Faker.seed(42)

# Sample data
titles = ['Software Engineer', 'Data Analyst', 'HR Manager', 'Marketing Lead', 'Sales Associate']
departments = ['Engineering', 'HR', 'Marketing', 'Sales', 'Finance']
sexes = ['Male', 'Female', 'Other']  # the sample output below contains 'Other'

# Create employee data
employees = []

for i in range(1, num_employees + 1):
    first = fake.first_name()
    last = fake.last_name()
    full_email = f"{first.lower()}.{last.lower()}@example.com"
    age = np.random.randint(22, 65)
    sex = random.choice(sexes)
    title = random.choice(titles)
    dept = random.choice(departments)
    salary = np.random.randint(40000, 150000)
    target = np.random.randint(50000, 200000)
    bonus = np.random.randint(1000, 20000)
    city = fake.city()
    address = fake.address().replace("\n", ", ")
    manager_id = random.choice(range(1, num_employees + 1)) if i != 1 else None  # Employee 1 has no manager; others may draw their own id

    employees.append({
        'id': i,
        'first_name': first,
        'last_name': last,
        'age': age,
        'sex': sex,
        'employee_title': title,
        'department': dept,
        'salary': salary,
        'target': target,
        'bonus': bonus,
        'email': full_email,
        'city': city,
        'address': address,
        'manager_id': manager_id
    })

# Convert to DataFrame
employee = pd.DataFrame(employees)

# Display sample
employee.head()

|     | id  | first_name | last_name | age | sex   | employee_title    | department  | salary | target | bonus | email                       | city          | address                                           | manager_id |
| :-- | :-- | :--------- | :-------- | :-- | :---- | :---------------- | :---------- | :----- | :----- | :---- | :-------------------------- | :------------ | :------------------------------------------------ | :--------- |
| 0   | 1   | Marco      | Smith     | 60  | Other | Software Engineer | Engineering | 55795  | 181932 | 6390  | marco.smith@example.com     | Samanthaport  | 7966 Rosales Crest, New Joseph, TN 27174          | NaN        |
| 1   | 2   | James      | Diaz      | 64  | Other | HR Manager        | HR          | 116820 | 104886 | 7265  | james.diaz@example.com      | Pamelaville   | 9797 Johnson Common Suite 113, Caitlinton, SD ... | 29.0       |
| 2   | 3   | Nicole     | Lawrence  | 40  | Male  | Software Engineer | Finance     | 77194  | 137498 | 15423 | nicole.lawrence@example.com | East Lisaview | 98660 Sampson Ranch, Katiehaven, FL 48243         | 12.0       |
| 3   | 4   | Angela     | Lewis     | 57  | Other | Marketing Lead    | Engineering | 100263 | 66023  | 9322  | angela.lewis@example.com    | Johnbury      | 70714 Sean Inlet Suite 088, Williamside, PR 33430 | 4.0        |
| 4   | 5   | David      | Hernandez | 43  | Male  | Data Analyst      | HR          | 104820 | 50769  | 3433  | david.hernandez@example.com | New Brett     | 462 Dana Cove Apt. 977, Lake Hannahland, ND 53079 | 65.0       |
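
The sample rows above include `sex` values other than 'Male' and 'Female', so it can be worth tabulating the categories per department before aggregating. A minimal sketch using `pd.crosstab`; the `toy` frame and its rows are made up for illustration and are not taken from the generated dataset:

```python
import pandas as pd

# Hypothetical toy rows, only to illustrate the check
toy = pd.DataFrame({
    'department': ['HR', 'HR', 'Sales', 'Sales', 'Sales'],
    'sex': ['Male', 'Other', 'Female', 'Female', 'Male'],
    'salary': [50000, 60000, 70000, 80000, 90000],
})

# Count employees per department and sex category
sex_counts = pd.crosstab(toy['department'], toy['sex'])
print(sex_counts)
```

Any category that appears here but is not 'Male' or 'Female' will be left out of the male and female totals computed in the solution.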


### Solution

In [None]:
# Keep only the columns needed (copy to avoid SettingWithCopyWarning)
employee = employee[['sex', 'salary', 'department']].copy()

# Flag each employee as male or female
employee['is_male'] = np.where(employee['sex'] == 'Male', 1, 0)
employee['is_female'] = np.where(employee['sex'] == 'Female', 1, 0)

# Split salary into separate male and female columns
employee['male_sal'] = np.where(employee['sex'] == 'Male', employee['salary'], 0)
employee['female_sal'] = np.where(employee['sex'] == 'Female', employee['salary'], 0)

# Drop the columns that are no longer needed
employee.drop(columns=['sex', 'salary'], inplace=True)

# Sum the flags and salaries per department
employee = employee.groupby('department')[['is_male', 'is_female', 'male_sal', 'female_sal']].sum().reset_index()

employee.columns = ['Department','Male Employees','Female Employees','Male Employee Salary','Female Employee Salary']

employee

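|     | Department  | Male Employees | Female Employees | Male Employee Salary | Female Employee Salary |
| :-- | :---------- | :------------- | :--------------- | :------------------- | :--------------------- |
| 0   | Engineering | 11             | 5                | 1057834              | 558710                 |
| 1   | Finance     | 7              | 8                | 743400               | 832096                 |
| 2   | HR          | 6              | 5                | 619576               | 451012                 |
| 3   | Marketing   | 6              | 8                | 575874               | 768124                 |
| 4   | Sales       | 5              | 4                | 445443               | 455300                 |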
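
The task statement lists the female columns first, while the result above lists the male columns first. As an alternative sketch, the same figures can be produced in the requested order by grouping on both `department` and `sex` and then spreading `sex` into columns. The `toy` frame below is a hypothetical stand-in for `employee`, and the sketch assumes both 'Male' and 'Female' occur at least once in the data (otherwise the column lookups would fail):

```python
import pandas as pd

# Hypothetical toy rows standing in for the `employee` frame
toy = pd.DataFrame({
    'department': ['HR', 'HR', 'HR', 'Sales', 'Sales', 'Sales'],
    'sex': ['Male', 'Female', 'Other', 'Female', 'Female', 'Male'],
    'salary': [50000, 60000, 55000, 70000, 80000, 90000],
})

# Keep only the two categories the task asks about
mf = toy[toy['sex'].isin(['Male', 'Female'])]

# Count and total salary per (department, sex), then spread sex into columns
stats = (
    mf.groupby(['department', 'sex'])['salary']
      .agg(['count', 'sum'])
      .unstack('sex', fill_value=0)
)

# Assemble the columns in the order the task requests
result = pd.DataFrame({
    'Department': stats.index,
    'Female Employees': stats[('count', 'Female')].to_numpy(),
    'Female Employee Salary': stats[('sum', 'Female')].to_numpy(),
    'Male Employees': stats[('count', 'Male')].to_numpy(),
    'Male Employee Salary': stats[('sum', 'Male')].to_numpy(),
})
print(result)
```

Compared with the flag-column approach, this avoids the helper columns, at the cost of a two-level column index that has to be flattened by hand.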
