# Identifying High-Earning Employees Across Departments Using Pandas

Organizations increasingly rely on their data to make informed decisions. One common analytical task is identifying the top performers or highest earners within each department of a company. In this blog post, we'll determine the highest-earning employees in each department using Python's Pandas library, working through LeetCode's *Department Top Three Salaries* problem [1]. We'll walk through the problem statement, examine the data structure, and implement a step-by-step solution.

## Problem Statement

Imagine you have access to two tables in a company's database:

### Employee Table

- **Columns**: `id`, `name`, `salary`, `departmentId`
- **Description**: Each row represents an employee, including their unique ID, name, salary, and the department they belong to.

### Department Table

- **Columns**: `id`, `name`
- **Description**: Each row represents a department within the company, identified by a unique ID and its name.

The company's executives want to identify the top earners in each department. Specifically, a *high earner* is an employee whose salary is among the top three unique salaries in their department. Employees who share a qualifying salary are all included.
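
To make the definition concrete, the sketch below takes the IT department's salaries from the example that follows. It keeps the three largest *distinct* values and then flags every salary in that set. This is only an illustration of the definition, not part of the step-by-step solution.

```python
import pandas as pd

# Salaries of the IT department (Joe, Max, Janet, Randy, Will) from the example
it_salaries = pd.Series([85000, 90000, 69000, 85000, 70000])

# Drop duplicate salaries, then keep the three largest distinct values
top_three_unique = it_salaries.drop_duplicates().nlargest(3)
print(top_three_unique.tolist())  # [90000, 85000, 70000]

# Every employee whose salary is in that set counts as a high earner
is_high_earner = it_salaries.isin(top_three_unique)
print(is_high_earner.tolist())  # [True, True, False, True, True]
```

Both employees earning 85000 are flagged, and only Janet (69000) falls outside the top three distinct salaries.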

### Example

**Employee Table:**

| id | name  | salary | departmentId |
|----|-------|--------|--------------|
| 1  | Joe   | 85000  | 1            |
| 2  | Henry | 80000  | 2            |
| 3  | Sam   | 60000  | 2            |
| 4  | Max   | 90000  | 1            |
| 5  | Janet | 69000  | 1            |
| 6  | Randy | 85000  | 1            |
| 7  | Will  | 70000  | 1            |

**Department Table:**

| id | name   |
|----|--------|
| 1  | IT     |
| 2  | Sales  |

**Desired Output:**

| Department | Employee | Salary |
|------------|----------|--------|
| IT         | Max      | 90000  |
| IT         | Joe      | 85000  |
| IT         | Randy    | 85000  |
| IT         | Will     | 70000  |
| Sales      | Henry    | 80000  |
| Sales      | Sam      | 60000  |

**Explanation:**

- **In the IT department:**
  - **Max** has the highest salary.
  - **Joe** and **Randy** share the second-highest salary.
  - **Will** has the third-highest salary.

- **In the Sales department:**
  - **Henry** has the highest salary.
  - **Sam** has the second-highest salary.
  - There's no third-highest salary since there are only two employees.

## Solution Overview

To solve this problem, we'll perform the following steps using Pandas:

1. **Data Preparation**: Merge the `Employee` and `Department` tables to associate each employee with their respective department name.
2. **Ranking Salaries**: For each department, rank the salaries in descending order so that tied salaries share a rank and each distinct salary gets the next consecutive rank.
3. **Filtering Top Earners**: Select employees whose salaries rank within the top three unique salaries of their department.
4. **Final Output**: Present the filtered data with relevant columns.
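
For reference, the four steps can also be condensed into a single function. This is a sketch, not the exact code used in the walkthrough. The function name `top_three_salaries`, the merge `suffixes`, and grouping by `departmentId` are our own illustrative choices.

```python
import pandas as pd


def top_three_salaries(employee: pd.DataFrame, department: pd.DataFrame) -> pd.DataFrame:
    """Return employees whose salary is among the top three distinct salaries of their department."""
    # Step 1: attach department names; suffixes tell the two 'id'/'name' columns apart
    df = employee.merge(department, left_on="departmentId", right_on="id",
                        suffixes=("_emp", "_dept"))
    # Step 2: dense rank of salaries within each department, highest salary = 1
    df["rank"] = df.groupby("departmentId")["salary"].rank(method="dense", ascending=False)
    # Step 3: keep only the top three distinct salaries
    df = df[df["rank"] <= 3]
    # Step 4: pick and rename the output columns
    return df[["name_dept", "name_emp", "salary"]].rename(
        columns={"name_dept": "Department", "name_emp": "Employee", "salary": "Salary"}
    )


employee = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7],
    "name": ["Joe", "Henry", "Sam", "Max", "Janet", "Randy", "Will"],
    "salary": [85000, 80000, 60000, 90000, 69000, 85000, 70000],
    "departmentId": [1, 2, 2, 1, 1, 1, 1],
})
department = pd.DataFrame({"id": [1, 2], "name": ["IT", "Sales"]})

result = top_three_salaries(employee, department)
print(result)
```

Grouping by `departmentId` rather than by the department name keeps two departments separate even if they happen to share a name.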

Let's delve into each step with corresponding code snippets.


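We start by building the two example tables as DataFrames:

```python
import pandas as pd

# Example rows: id, name, salary, departmentId
data = [[1, 'Joe', 85000, 1],
        [2, 'Henry', 80000, 2],
        [3, 'Sam', 60000, 2],
        [4, 'Max', 90000, 1],
        [5, 'Janet', 69000, 1],
        [6, 'Randy', 85000, 1],
        [7, 'Will', 70000, 1]]
employee = pd.DataFrame(data, columns=['id', 'name', 'salary', 'departmentId'])
# Nullable integer dtypes, matching the database schema
employee = employee.astype({'id': 'Int64', 'name': 'object',
                            'salary': 'Int64', 'departmentId': 'Int64'})
# display() is built into Jupyter/IPython; use print() in a plain Python script
display(employee)

data = [[1, 'IT'],
        [2, 'Sales']]
department = pd.DataFrame(data, columns=['id', 'name'])
department = department.astype({'id': 'Int64', 'name': 'object'})
display(department)
```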

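|   | id | name  | salary | departmentId |
|---|----|-------|--------|--------------|
| 0 | 1  | Joe   | 85000  | 1            |
| 1 | 2  | Henry | 80000  | 2            |
| 2 | 3  | Sam   | 60000  | 2            |
| 3 | 4  | Max   | 90000  | 1            |
| 4 | 5  | Janet | 69000  | 1            |
| 5 | 6  | Randy | 85000  | 1            |
| 6 | 7  | Will  | 70000  | 1            |

|   | id | name  |
|---|----|-------|
| 0 | 1  | IT    |
| 1 | 2  | Sales |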


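**Step 1: Merging Employee and Department Tables**

To associate each employee with their department name, we merge the two tables on `Employee.departmentId = Department.id`. Both tables have `id` and `name` columns, so pandas adds its default suffixes `_x` (left table) and `_y` (right table) to those overlapping names. We then rename the name columns to `Employee` and `Department`.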

```python
# Inner-join employees to departments; overlapping column names get _x/_y suffixes
df = pd.merge(employee,
              department,
              left_on='departmentId',
              right_on='id').rename(columns={'name_x': 'Employee',
                                             'name_y': 'Department',
                                             'salary': 'Salary'})
display(df)
```

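|   | id_x | Employee | Salary | departmentId | id_y | Department |
|---|------|----------|--------|--------------|------|------------|
| 0 | 1    | Joe      | 85000  | 1            | 1    | IT         |
| 1 | 4    | Max      | 90000  | 1            | 1    | IT         |
| 2 | 5    | Janet    | 69000  | 1            | 1    | IT         |
| 3 | 6    | Randy    | 85000  | 1            | 1    | IT         |
| 4 | 7    | Will     | 70000  | 1            | 1    | IT         |
| 5 | 2    | Henry    | 80000  | 2            | 2    | Sales      |
| 6 | 3    | Sam      | 60000  | 2            | 2    | Sales      |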
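
If the `_x`/`_y` suffixes feel awkward, one alternative is to rename the department columns before merging. The join key then has a single name, and no other columns overlap. This is a sketch of that variant, not the code used above. The intermediate name `dept` is ours.

```python
import pandas as pd

employee = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7],
    "name": ["Joe", "Henry", "Sam", "Max", "Janet", "Randy", "Will"],
    "salary": [85000, 80000, 60000, 90000, 69000, 85000, 70000],
    "departmentId": [1, 2, 2, 1, 1, 1, 1],
})
department = pd.DataFrame({"id": [1, 2], "name": ["IT", "Sales"]})

# Rename Department's columns first: the key becomes 'departmentId' on both sides
# and no other column names overlap, so pandas adds no _x/_y suffixes
dept = department.rename(columns={"id": "departmentId", "name": "Department"})
df = employee.merge(dept, on="departmentId", how="inner")
df = df.rename(columns={"name": "Employee", "salary": "Salary"})
print(df.columns.tolist())
# ['id', 'Employee', 'Salary', 'departmentId', 'Department']
```

`how="inner"` (the default) drops employees whose `departmentId` has no match in the department table. `how="left"` would keep them, with a missing `Department`.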


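**Step 2: Ranking Salaries Within Each Department**

We assign each salary a rank within its department, highest salary first. With `method='dense'`, tied salaries share the same rank, and the next distinct salary gets the next consecutive integer. A rank of 3 therefore always means the third-highest *distinct* salary. `rank()` returns floats, which is why the output shows values such as `2.0`.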

```python
# Dense rank within each department: ties share a rank, no gaps after ties
df['Rank'] = df.groupby('Department')['Salary'].rank(method='dense', ascending=False)
display(df)
```

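|   | id_x | Employee | Salary | departmentId | id_y | Department | Rank |
|---|------|----------|--------|--------------|------|------------|------|
| 0 | 1    | Joe      | 85000  | 1            | 1    | IT         | 2.0  |
| 1 | 4    | Max      | 90000  | 1            | 1    | IT         | 1.0  |
| 2 | 5    | Janet    | 69000  | 1            | 1    | IT         | 4.0  |
| 3 | 6    | Randy    | 85000  | 1            | 1    | IT         | 2.0  |
| 4 | 7    | Will     | 70000  | 1            | 1    | IT         | 3.0  |
| 5 | 2    | Henry    | 80000  | 2            | 2    | Sales      | 1.0  |
| 6 | 3    | Sam      | 60000  | 2            | 2    | Sales      | 2.0  |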
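
To see why `method='dense'` matters, the sketch below ranks the IT department's five salaries three ways using `Series.rank`.

```python
import pandas as pd

# IT salaries: Max, Joe, Randy, Will, Janet
salaries = pd.Series([90000, 85000, 85000, 70000, 69000])

ranks = pd.DataFrame({
    "salary": salaries,
    # dense: ties share a rank, next distinct value gets the next integer
    "dense": salaries.rank(method="dense", ascending=False),
    # min: ties share the lowest rank, leaving a gap afterwards
    "min": salaries.rank(method="min", ascending=False),
    # first: ties broken by order of appearance
    "first": salaries.rank(method="first", ascending=False),
})
print(ranks)
```

With either `min` or `first`, the filter `rank <= 3` would keep Max, Joe, and Randy and drop Will, even though 70000 is the third-highest distinct salary. Only `dense` matches the problem's definition.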


**Step 3: Filtering Top Three Earners**

Next, we filter the employees to retain only those with a rank of 3 or less, i.e., within the top three unique salaries of their department.

```python
# Keep employees earning one of the top three distinct salaries
df = df[df['Rank'] <= 3]
display(df)
```

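|   | id_x | Employee | Salary | departmentId | id_y | Department | Rank |
|---|------|----------|--------|--------------|------|------------|------|
| 0 | 1    | Joe      | 85000  | 1            | 1    | IT         | 2.0  |
| 1 | 4    | Max      | 90000  | 1            | 1    | IT         | 1.0  |
| 3 | 6    | Randy    | 85000  | 1            | 1    | IT         | 2.0  |
| 4 | 7    | Will     | 70000  | 1            | 1    | IT         | 3.0  |
| 5 | 2    | Henry    | 80000  | 2            | 2    | Sales      | 1.0  |
| 6 | 3    | Sam      | 60000  | 2            | 2    | Sales      | 2.0  |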
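
Steps 2 and 3 can also be collapsed into a single filter with no helper rank column. For each department, we compute the smallest of its top three distinct salaries as a cutoff and keep every row at or above it. This is an alternative technique using `groupby(...).transform`, sketched on a reduced copy of the merged data. It is not the approach used in this walkthrough.

```python
import pandas as pd

df = pd.DataFrame({
    "Employee": ["Joe", "Max", "Janet", "Randy", "Will", "Henry", "Sam"],
    "Salary": [85000, 90000, 69000, 85000, 70000, 80000, 60000],
    "Department": ["IT", "IT", "IT", "IT", "IT", "Sales", "Sales"],
})

# Per department: the third-largest distinct salary,
# or the smallest distinct salary if the department has fewer than three
cutoff = df.groupby("Department")["Salary"].transform(
    lambda s: s.drop_duplicates().nlargest(3).min()
)
top = df[df["Salary"] >= cutoff]
print(top)
```

`transform` broadcasts each department's scalar cutoff back to that department's rows, so the comparison lines up row by row.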


**Step 4: Selecting Relevant Columns**

Finally, we'll select and display only the pertinent columns: `Department`, `Employee`, and `Salary`.

```python
# Keep only the output columns
df = df[['Department', 'Employee', 'Salary']]
display(df)
```

|   | Department | Employee | Salary |
|---|------------|----------|--------|
| 0 | IT         | Joe      | 85000  |
| 1 | IT         | Max      | 90000  |
| 3 | IT         | Randy    | 85000  |
| 4 | IT         | Will     | 70000  |
| 5 | Sales      | Henry    | 80000  |
| 6 | Sales      | Sam      | 60000  |
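
The result above keeps the row order produced by the merge. The desired output, by contrast, lists each department's salaries from highest to lowest. A minimal sketch of that presentation step follows. Adding `Employee` as a tie-breaker is our own choice so that Joe and Randy always appear in a fixed order.

```python
import pandas as pd

# The Step 4 result, in the row order shown above
df = pd.DataFrame({
    "Department": ["IT", "IT", "IT", "IT", "Sales", "Sales"],
    "Employee": ["Joe", "Max", "Randy", "Will", "Henry", "Sam"],
    "Salary": [85000, 90000, 85000, 70000, 80000, 60000],
})

# Department A-Z, highest salary first, ties broken alphabetically by name
result = (
    df.sort_values(["Department", "Salary", "Employee"], ascending=[True, False, True])
      .reset_index(drop=True)
)
print(result)
```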


By leveraging Pandas' data manipulation capabilities, we identified every employee who earns one of the top three distinct salaries in their department, using just a merge, a grouped dense rank, and a filter. This approach extends to other analytical tasks, such as identifying top-selling products, highest-rated services, or leading contributors in different teams.
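
As a sketch of that generalization, the grouped dense-rank filter can be parameterized by group column, value column, and N. The function name `top_n_distinct_per_group` and the products table below are hypothetical, made up purely for illustration.

```python
import pandas as pd


def top_n_distinct_per_group(df: pd.DataFrame, group_col: str, value_col: str, n: int = 3) -> pd.DataFrame:
    """Keep rows whose value_col is among the n largest distinct values in their group."""
    # Dense rank within each group, largest value = 1; ties share a rank
    rank = df.groupby(group_col)[value_col].rank(method="dense", ascending=False)
    return df[rank <= n]


# Hypothetical example data: product ratings per category
products = pd.DataFrame({
    "category": ["Books", "Books", "Books", "Games", "Games"],
    "product": ["A", "B", "C", "D", "E"],
    "rating": [4.8, 4.5, 4.8, 3.9, 4.2],
})

best = top_n_distinct_per_group(products, "category", "rating", n=1)
print(best)
```

With `n=1`, both Books products rated 4.8 are kept because they tie for the top distinct rating.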

Understanding how to manipulate and analyze data is crucial for deriving meaningful insights that drive business decisions. Whether you're a data analyst, a business intelligence professional, or simply someone interested in data science, mastering these techniques will enhance your ability to interpret and present data effectively.

Happy analyzing!

References:
[1] https://leetcode.com/problems/department-top-three-salaries/description/