## Importing Libraries

In [0]:
from pyspark.sql.functions import *
from pyspark.sql.types import *
from pyspark.sql.window import Window

**3236. CEO Subordinate Hierarchy (Hard)**

**Table: Employees**

| Column Name   | Type    |
|---------------|---------|
| employee_id   | int     |
| employee_name | varchar |
| manager_id    | int     |
| salary        | int     |

employee_id is the unique identifier for this table.
manager_id is the employee_id of the employee's manager. The CEO has a NULL manager_id.

**Write a solution to find subordinates of the CEO (both direct and indirect), along with their level in the hierarchy and their salary difference from the CEO.**

The result should have the following columns:

- subordinate_id: The employee_id of the subordinate
- subordinate_name: The name of the subordinate
- hierarchy_level: The level of the subordinate in the hierarchy (1 for direct reports, 2 for their direct reports, and so on)
- salary_difference: The difference between the subordinate's salary and the CEO's salary

Return the result table ordered by hierarchy_level ascending, and then by subordinate_id ascending.

The query result format is in the following example.

**Example:**

**Input:**

**Employees table:**

| employee_id | employee_name  | manager_id | salary  |
|-------------|----------------|------------|---------|
| 1           | Alice          | NULL       | 150000  |
| 2           | Bob            | 1          | 120000  |
| 3           | Charlie        | 1          | 110000  |
| 4           | David          | 2          | 105000  |
| 5           | Eve            | 2          | 100000  |
| 6           | Frank          | 3          | 95000   |
| 7           | Grace          | 3          | 98000   |
| 8           | Helen          | 5          | 90000   |

**Output:**

| subordinate_id | subordinate_name | hierarchy_level  | salary_difference |
|----------------|------------------|------------------|-------------------|
| 2              | Bob              | 1                | -30000            |
| 3              | Charlie          | 1                | -40000            |
| 4              | David            | 2                | -45000            |
| 5              | Eve              | 2                | -50000            |
| 6              | Frank            | 2                | -55000            |
| 7              | Grace            | 2                | -52000            |
| 8              | Helen            | 3                | -60000            |

**Explanation:**

- Bob and Charlie are direct subordinates of Alice (CEO) and thus have a hierarchy_level of 1.
- David and Eve report to Bob, while Frank and Grace report to Charlie, making them second-level subordinates (hierarchy_level 2).
- Helen reports to Eve, making Helen a third-level subordinate (hierarchy_level 3).
- Salary differences are calculated relative to Alice's salary of 150000.
- The result is ordered by hierarchy_level ascending, and then by subordinate_id ascending.

**Note:** The output is ordered first by hierarchy_level in ascending order, then by subordinate_id in ascending order.
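
The level assignment described in the explanation amounts to a breadth-first walk that starts at the CEO. The following is a minimal plain-Python sketch of that idea, independent of Spark. The function name `ceo_subordinates` and the variable `sample_rows` are hypothetical names introduced for illustration, and the sketch assumes exactly one CEO and no cycles in the reporting chain.

```python
from collections import defaultdict


def ceo_subordinates(rows):
    """rows: iterable of (employee_id, employee_name, manager_id, salary)."""
    reports = defaultdict(list)  # manager_id -> list of direct reports
    ceo = None
    for emp_id, name, manager_id, salary in rows:
        if manager_id is None:
            ceo = (emp_id, salary)  # assumption: exactly one such row
        else:
            reports[manager_id].append((emp_id, name, salary))
    if ceo is None:
        return []

    ceo_id, ceo_salary = ceo
    result = []
    frontier = [ceo_id]  # managers whose reports form the next level
    level = 0
    while frontier:  # assumption: the hierarchy has no cycles
        level += 1
        next_frontier = []
        for manager_id in frontier:
            for emp_id, name, salary in reports[manager_id]:
                result.append((emp_id, name, level, salary - ceo_salary))
                next_frontier.append(emp_id)
        frontier = next_frontier

    # Order by hierarchy_level, then by subordinate_id.
    return sorted(result, key=lambda r: (r[2], r[0]))


# The sample input from the example above.
sample_rows = [
    (1, "Alice", None, 150000),
    (2, "Bob", 1, 120000),
    (3, "Charlie", 1, 110000),
    (4, "David", 2, 105000),
    (5, "Eve", 2, 100000),
    (6, "Frank", 3, 95000),
    (7, "Grace", 3, 98000),
    (8, "Helen", 5, 90000),
]

for row in ceo_subordinates(sample_rows):
    print(row)
```

Each pass of the `while` loop handles one hierarchy level, so the loop runs once per level plus one final pass that finds no further reports.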


In [0]:
employees_data_3236 = [
    (1, "Alice", None, 150000),
    (2, "Bob", 1, 120000),
    (3, "Charlie", 1, 110000),
    (4, "David", 2, 105000),
    (5, "Eve", 2, 100000),
    (6, "Frank", 3, 95000),
    (7, "Grace", 3, 98000),
    (8, "Helen", 5, 90000)
]

employees_columns_3236 = ["employee_id", "employee_name", "manager_id", "salary"]
employees_df_3236 = spark.createDataFrame(employees_data_3236, employees_columns_3236)
employees_df_3236.show()


+-----------+-------------+----------+------+
|employee_id|employee_name|manager_id|salary|
+-----------+-------------+----------+------+
|          1|        Alice|      NULL|150000|
|          2|          Bob|         1|120000|
|          3|      Charlie|         1|110000|
|          4|        David|         2|105000|
|          5|          Eve|         2|100000|
|          6|        Frank|         3| 95000|
|          7|        Grace|         3| 98000|
|          8|        Helen|         5| 90000|
+-----------+-------------+----------+------+



In [0]:
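# Locate the CEO (the row with a NULL manager_id) and keep its id and
# salary as plain Python values for the steps below.
ceo_3236 = employees_df_3236.filter(col("manager_id").isNull()).first()
ceo_id_3236 = ceo_3236["employee_id"]
ceo_salary_3236 = ceo_3236["salary"]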

In [0]:
# Level 1: direct reports of the CEO.
level1 = (
    employees_df_3236
    .filter(col("manager_id") == ceo_id_3236)
    .withColumn("hierarchy_level", lit(1))
    .withColumn("salary_difference", col("salary") - lit(ceo_salary_3236))
)

In [0]:
# Level 2: employees whose manager is a level-1 subordinate.
# Columns are referenced through the aliases so the self-join stays unambiguous.
emp = employees_df_3236.alias("emp")
lvl1 = level1.alias("lvl1")
level2 = (
    emp.join(lvl1, col("emp.manager_id") == col("lvl1.employee_id"))
    .select(
        col("emp.employee_id"),
        col("emp.employee_name"),
        lit(2).alias("hierarchy_level"),
        (col("emp.salary") - lit(ceo_salary_3236)).alias("salary_difference"),
    )
)

In [0]:
# Level 3: employees whose manager is a level-2 subordinate.
lvl2 = level2.alias("lvl2")
level3 = (
    emp.join(lvl2, col("emp.manager_id") == col("lvl2.employee_id"))
    .select(
        col("emp.employee_id"),
        col("emp.employee_name"),
        lit(3).alias("hierarchy_level"),
        (col("emp.salary") - lit(ceo_salary_3236)).alias("salary_difference"),
    )
)

In [0]:
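# Stack the three levels, rename to the required output columns, and sort.
# Only three levels are hard-coded, which is enough for the sample data;
# a deeper hierarchy would need additional joins.
(
    level1
    .select("employee_id", "employee_name", "hierarchy_level", "salary_difference")
    .union(level2)
    .union(level3)
    .withColumnRenamed("employee_id", "subordinate_id")
    .withColumnRenamed("employee_name", "subordinate_name")
    .orderBy("hierarchy_level", "subordinate_id")
    .display()
)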

subordinate_id,subordinate_name,hierarchy_level,salary_difference
2,Bob,1,-30000
3,Charlie,1,-40000
4,David,2,-45000
5,Eve,2,-50000
6,Frank,2,-55000
7,Grace,2,-52000
8,Helen,3,-60000
