# Finding Median Salaries per Company

In data analysis, the median is a useful summary of how a company pays its employees. In this task, we find the row or rows that hold the median salary for each company, sorting each company's salaries in ascending order and breaking ties between equal salaries by employee ID.

You are provided with an **Employee** table containing information about employees, their associated companies, and their salaries.

### Employee Table

| Column Name | Type    |
|-------------|---------|
| id          | int     |
| company     | str     |
| salary      | int     |

- **id**: Primary key (unique identifier for each employee).
- **company**: The name of the company the employee belongs to.
- **salary**: The salary of the employee.

Each row in the table represents a single employee's information, including their company affiliation and salary.

### Objective
Write a solution to find the rows that contain the median salary of each company. When sorting a company's salaries to calculate the median, break ties by id. A company with an even number of employees has two median rows, as companies A and B show in the example below.

Return the result table in any order.

## Example

### Input

**Employee Table:**

| id | company | salary |
|----|---------|--------|
| 1  | A       | 2341   |
| 2  | A       | 341    |
| 3  | A       | 15     |
| 4  | A       | 15314  |
| 5  | A       | 451    |
| 6  | A       | 513    |
| 7  | B       | 15     |
| 8  | B       | 13     |
| 9  | B       | 1154   |
| 10 | B       | 1345   |
| 11 | B       | 1221   |
| 12 | B       | 234    |
| 13 | C       | 2345   |
| 14 | C       | 2645   |
| 15 | C       | 2645   |
| 16 | C       | 2652   |
| 17 | C       | 65     |

### Output

| id | company | salary |
|----|---------|--------|
| 5  | A       | 451    |
| 6  | A       | 513    |
| 12 | B       | 234    |
| 9  | B       | 1154   |
| 14 | C       | 2645   |

Let's implement this step-by-step.

In [3]:
import pandas as pd

data = [[1, 'A', 2341], 
        [2, 'A', 341], 
        [3, 'A', 15], 
        [4, 'A', 15314], 
        [5, 'A', 451], 
        [6, 'A', 513], 
        [7, 'B', 15], 
        [8, 'B', 13], 
        [9, 'B', 1154], 
        [10, 'B', 1345], 
        [11, 'B', 1221], 
        [12, 'B', 234], 
        [13, 'C', 2345], 
        [14, 'C', 2645], 
        [15, 'C', 2645], 
        [16, 'C', 2652], 
        [17, 'C', 65]]
employee = pd.DataFrame(data, columns=['id', 'company', 'salary']).astype(
    {'id': 'Int64', 'company': 'object', 'salary': 'Int64'})

display(employee)

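|    | id | company | salary |
|----|----|---------|--------|
| 0  | 1  | A       | 2341   |
| 1  | 2  | A       | 341    |
| 2  | 3  | A       | 15     |
| 3  | 4  | A       | 15314  |
| 4  | 5  | A       | 451    |
| 5  | 6  | A       | 513    |
| 6  | 7  | B       | 15     |
| 7  | 8  | B       | 13     |
| 8  | 9  | B       | 1154   |
| 9  | 10 | B       | 1345   |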


**Step 1. Sorting the DataFrame**

- Sorting order: the primary key is company (alphabetical), the secondary key is salary (ascending), and the tertiary key is id (ascending), which breaks ties when salaries are equal.
- Sorting ensures that within each company, salaries run from lowest to highest. The id tie-break makes the order consistent and reproducible.

In [5]:
employee = employee.sort_values(by=['company', 'salary', 'id'])

display(employee)

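|    | id | company | salary |
|----|----|---------|--------|
| 2  | 3  | A       | 15     |
| 1  | 2  | A       | 341    |
| 4  | 5  | A       | 451    |
| 5  | 6  | A       | 513    |
| 0  | 1  | A       | 2341   |
| 3  | 4  | A       | 15314  |
| 7  | 8  | B       | 13     |
| 6  | 7  | B       | 15     |
| 11 | 12 | B       | 234    |
| 8  | 9  | B       | 1154   |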
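The rows displayed above stop before company C, which is the only company where the tie-break matters: ids 14 and 15 share the salary 2645. As a minimal sketch, using only the company C rows from the example input (the names `company_c` and `sorted_ids` are illustrative), the same sort can be checked in isolation:

```python
import pandas as pd

# Company C rows from the example input; ids 14 and 15 share a salary.
company_c = pd.DataFrame(
    [[13, 'C', 2345], [14, 'C', 2645], [15, 'C', 2645],
     [16, 'C', 2652], [17, 'C', 65]],
    columns=['id', 'company', 'salary'],
)

# Same sort keys as in Step 1: id is the last key, so it only decides ties.
sorted_c = company_c.sort_values(by=['company', 'salary', 'id'])
sorted_ids = sorted_c['id'].tolist()
print(sorted_ids)  # [17, 13, 14, 15, 16]
```

Id 14 lands in the third of five positions, which is why it, and not id 15, appears in the expected output.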


**Step 2. Assigning Ranks Within Each Company**

- Grouping: The groupby('company') groups the DataFrame by the company column.
- Ranking: cumcount() assigns a cumulative count (starting from 0) to each row within its group.
- Adjustment: Adding 1 shifts the count to start from 1 instead of 0.
- Assigning a rank to each employee within their company based on the sorted order facilitates identifying median positions.

In [7]:
employee['rank'] = employee.groupby('company').cumcount() + 1

display(employee)

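|    | id | company | salary | rank |
|----|----|---------|--------|------|
| 2  | 3  | A       | 15     | 1    |
| 1  | 2  | A       | 341    | 2    |
| 4  | 5  | A       | 451    | 3    |
| 5  | 6  | A       | 513    | 4    |
| 0  | 1  | A       | 2341   | 5    |
| 3  | 4  | A       | 15314  | 6    |
| 7  | 8  | B       | 13     | 1    |
| 6  | 7  | B       | 15     | 2    |
| 11 | 12 | B       | 234    | 3    |
| 8  | 9  | B       | 1154   | 4    |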
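Because cumcount() numbers rows by position, the ranks are only meaningful once the frame has been sorted. As a hedged cross-check, not part of the original solution, the sketch below compares the positional rank with `rank(method='first')` on the already sorted company C rows. The variable names are illustrative.

```python
import pandas as pd

# Company C rows from the example, already sorted by salary and then id.
df = pd.DataFrame({
    'id': [17, 13, 14, 15, 16],
    'company': ['C'] * 5,
    'salary': [65, 2345, 2645, 2645, 2652],
})

# Positional rank: 1-based row number within each company.
positional = (df.groupby('company').cumcount() + 1).tolist()

# Value-based rank; method='first' breaks ties by order of appearance.
value_based = (
    df.groupby('company')['salary'].rank(method='first').astype(int).tolist()
)

print(positional)   # [1, 2, 3, 4, 5]
print(value_based)  # [1, 2, 3, 4, 5]
```

On sorted input the two agree. On unsorted input only the value-based version would still follow salary order, which is why Step 1 has to come first when using cumcount().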


**Step 3. Counting Employees per Company**

- Grouping: Groups the DataFrame by company.
- Counting: Counts the employees (id values) in each company.
- Transform: transform("count") broadcasts the count back to every row of the original DataFrame, so each row carries its company's total.
- The total number of employees in each company is needed to locate the median position.
- Midpoint calculation: Dividing each company's employee count by 2 gives mid. An even count yields a whole number (6 / 2 = 3.0) and an odd count yields a half (5 / 2 = 2.5). This difference is what later separates the two-row case from the one-row case.

In [9]:
employee["employee_count"] = employee.groupby(["company"])["id"].transform("count")
employee["mid"] = employee["employee_count"]/2

display(employee)

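|    | id | company | salary | rank | employee_count | mid |
|----|----|---------|--------|------|----------------|-----|
| 2  | 3  | A       | 15     | 1    | 6              | 3.0 |
| 1  | 2  | A       | 341    | 2    | 6              | 3.0 |
| 4  | 5  | A       | 451    | 3    | 6              | 3.0 |
| 5  | 6  | A       | 513    | 4    | 6              | 3.0 |
| 0  | 1  | A       | 2341   | 5    | 6              | 3.0 |
| 3  | 4  | A       | 15314  | 6    | 6              | 3.0 |
| 7  | 8  | B       | 13     | 1    | 6              | 3.0 |
| 6  | 7  | B       | 15     | 2    | 6              | 3.0 |
| 11 | 12 | B       | 234    | 3    | 6              | 3.0 |
| 8  | 9  | B       | 1154   | 4    | 6              | 3.0 |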
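To make the role of transform concrete, here is a small sketch on a hypothetical five-row frame (not the example data). It contrasts a plain grouped count, which returns one value per company, with transform("count"), which returns one value per row:

```python
import pandas as pd

# Hypothetical toy frame: three employees in A, two in B.
df = pd.DataFrame({
    'id': [1, 2, 3, 7, 8],
    'company': ['A', 'A', 'A', 'B', 'B'],
})

per_group = df.groupby('company')['id'].count()           # one value per company
per_row = df.groupby('company')['id'].transform('count')  # one value per row

group_counts = per_group.to_dict()
row_counts = per_row.tolist()
print(group_counts)  # {'A': 3, 'B': 2}
print(row_counts)    # [3, 3, 3, 2, 2]
```

The per-row result has the same length and index as the original frame, so it can be assigned directly as a new column.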


**Step 4. Filtering Rows to Identify Median Salaries**

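- For an even number of employees (companies A and B):
<br>There are two median positions: mid and mid + 1.
<br>Example: Company A has 6 employees, so mid = 3.0 and the median ranks are 3 and 4.
- For an odd number of employees (company C):
<br>Company C has 5 employees, so mid = 2.5. The only whole-number rank between mid and mid + 1 (2.5 to 3.5) is 3, so the median rank is 3.
- Both cases are covered by keeping the rows whose rank lies between mid and mid + 1, inclusive.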

In [11]:
employee = employee.loc[(employee['rank'] >= employee['mid']) 
                      & (employee['rank'] <= employee['mid'] + 1)]

display(employee)

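|    | id | company | salary | rank | employee_count | mid |
|----|----|---------|--------|------|----------------|-----|
| 4  | 5  | A       | 451    | 3    | 6              | 3.0 |
| 5  | 6  | A       | 513    | 4    | 6              | 3.0 |
| 11 | 12 | B       | 234    | 3    | 6              | 3.0 |
| 8  | 9  | B       | 1154   | 4    | 6              | 3.0 |
| 13 | 14 | C       | 2645   | 3    | 5              | 2.5 |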
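The filter keeps every rank r with mid <= r <= mid + 1. The following pandas-free sketch (the helper `median_ranks` is a hypothetical name introduced here for illustration) lists which ranks survive that condition for small group sizes, including a single-employee company:

```python
def median_ranks(n):
    """Return the 1-based ranks r that satisfy n/2 <= r <= n/2 + 1."""
    mid = n / 2
    return [r for r in range(1, n + 1) if mid <= r <= mid + 1]

for n in range(1, 7):
    print(n, median_ranks(n))
# 1 [1]
# 2 [1, 2]
# 3 [2]
# 4 [2, 3]
# 5 [3]
# 6 [3, 4]
```

Odd sizes keep exactly one rank and even sizes keep two, matching the usual definition of median positions.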


**Step 5. Selecting Relevant Columns for the Final Output**

In [13]:
employee = employee[['id', 'company', 'salary']]

display(employee)

|    | id | company | salary |
|----|----|---------|--------|
| 4  | 5  | A       | 451    |
| 5  | 6  | A       | 513    |
| 11 | 12 | B       | 234    |
| 8  | 9  | B       | 1154   |
| 13 | 14 | C       | 2645   |
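For reuse, the steps can be gathered into a single function. This is one possible consolidation of the walkthrough rather than a canonical solution. The function name `median_employee_salary` is an assumption, and the sketch works on a copy instead of reassigning employee at each step.

```python
import pandas as pd

def median_employee_salary(employee: pd.DataFrame) -> pd.DataFrame:
    # Step 1: sort by company, then salary, with id breaking ties.
    df = employee.sort_values(by=['company', 'salary', 'id']).copy()
    # Step 2: 1-based position of each row within its company.
    df['rank'] = df.groupby('company').cumcount() + 1
    # Step 3: half of the company's employee count, broadcast to each row.
    df['mid'] = df.groupby('company')['id'].transform('count') / 2
    # Step 4: keep the ranks that lie in [mid, mid + 1].
    keep = (df['rank'] >= df['mid']) & (df['rank'] <= df['mid'] + 1)
    # Step 5: return only the original columns.
    return df.loc[keep, ['id', 'company', 'salary']]

# Example input from the problem statement.
data = [[1, 'A', 2341], [2, 'A', 341], [3, 'A', 15], [4, 'A', 15314],
        [5, 'A', 451], [6, 'A', 513], [7, 'B', 15], [8, 'B', 13],
        [9, 'B', 1154], [10, 'B', 1345], [11, 'B', 1221], [12, 'B', 234],
        [13, 'C', 2345], [14, 'C', 2645], [15, 'C', 2645], [16, 'C', 2652],
        [17, 'C', 65]]
employee = pd.DataFrame(data, columns=['id', 'company', 'salary'])

result = median_employee_salary(employee)
result_ids = result['id'].tolist()
print(result_ids)  # [5, 6, 12, 9, 14]
```

The returned ids match the expected output table, and the input DataFrame is left unmodified.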


References:
[1] https://leetcode.com/problems/median-employee-salary/