## Importing Libraries

In [0]:
from pyspark.sql.functions import *  # note: shadows Python built-ins such as sum and count with their Spark versions
from pyspark.sql.types import *
from pyspark.sql.window import Window

**3482. Analyze Organization Hierarchy (Hard)**

**Table: Employees**

| Column Name    | Type    | 
|----------------|---------|
| employee_id    | int     |
| employee_name  | varchar |
| manager_id     | int     |
| salary         | int     |
| department     | varchar |

employee_id is the unique key for this table.
Each row contains information about an employee, including their ID, name, their manager's ID, salary, and department.
manager_id is null for the top-level manager (CEO).

**Write a solution to analyze the organizational hierarchy and answer the following:**
- **Hierarchy Levels:** For each employee, determine their level in the organization (CEO is level 1, employees reporting directly to the CEO are level 2, and so on).
- **Team Size:** For each employee who is a manager, count the total number of employees under them (direct and indirect reports).
- **Salary Budget:** For each manager, calculate the total salary budget they control (sum of salaries of all employees under them, including indirect reports, plus their own salary).

Return the result table ordered by level in ascending order, then by budget in descending order, and finally by employee_name in ascending order.

The result format is in the following example.

**Example:**

**Input:**

**Employees table:**

| employee_id | employee_name | manager_id | salary | department  |
|-------------|---------------|------------|--------|-------------|
| 1           | Alice         | null       | 12000  | Executive   |
| 2           | Bob           | 1          | 10000  | Sales       |
| 3           | Charlie       | 1          | 10000  | Engineering |
| 4           | David         | 2          | 7500   | Sales       |
| 5           | Eva           | 2          | 7500   | Sales       |
| 6           | Frank         | 3          | 9000   | Engineering |
| 7           | Grace         | 3          | 8500   | Engineering |
| 8           | Hank          | 4          | 6000   | Sales       |
| 9           | Ivy           | 6          | 7000   | Engineering |
| 10          | Judy          | 6          | 7000   | Engineering |

**Output:**

| employee_id | employee_name | level | team_size | budget |
|-------------|---------------|-------|-----------|--------|
| 1           | Alice         | 1     | 9         | 84500  |
| 3           | Charlie       | 2     | 4         | 41500  |
| 2           | Bob           | 2     | 3         | 31000  |
| 6           | Frank         | 3     | 2         | 23000  |
| 4           | David         | 3     | 1         | 13500  |
| 7           | Grace         | 3     | 0         | 8500   |
| 5           | Eva           | 3     | 0         | 7500   |
| 9           | Ivy           | 4     | 0         | 7000   |
| 10          | Judy          | 4     | 0         | 7000   |
| 8           | Hank          | 4     | 0         | 6000   |

**Explanation:**

**Organization Structure:**
- Alice (ID: 1) is the CEO (level 1) with no manager
- Bob (ID: 2) and Charlie (ID: 3) report directly to Alice (level 2)
- David (ID: 4) and Eva (ID: 5) report to Bob, while Frank (ID: 6) and Grace (ID: 7) report to Charlie (level 3)
- Hank (ID: 8) reports to David, and Ivy (ID: 9) and Judy (ID: 10) report to Frank (level 4)

**Level Calculation:**
- The CEO (Alice) is at level 1
- Each subsequent level of management adds 1 to the level

**Team Size Calculation:**
- Alice has 9 employees under her (the entire company except herself)
- Bob has 3 employees (David, Eva, and Hank)
- Charlie has 4 employees (Frank, Grace, Ivy, and Judy)
- David has 1 employee (Hank)
- Frank has 2 employees (Ivy and Judy)
- Eva, Grace, Hank, Ivy, and Judy have no direct reports (team_size = 0)
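
Team size counts every employee reachable downward from a manager, so it is a subtree size. A minimal sketch in plain Python, assuming the example data is hard-coded: it inverts the manager pointers into a hypothetical `reports` adjacency list and walks each subtree depth-first:

```python
from collections import defaultdict

# (employee_id -> manager_id) from the example Employees table
MANAGER = {1: None, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 3, 8: 4, 9: 6, 10: 6}

# Invert the manager pointers into a manager -> direct reports adjacency list
reports = defaultdict(list)
for emp_id, manager in MANAGER.items():
    if manager is not None:
        reports[manager].append(emp_id)


def team_size(emp_id: int) -> int:
    """Count direct and indirect reports with an iterative depth-first walk."""
    stack = list(reports.get(emp_id, []))
    total = 0
    while stack:
        current = stack.pop()
        total += 1
        stack.extend(reports.get(current, []))
    return total


sizes = {emp_id: team_size(emp_id) for emp_id in MANAGER}
print(sizes)  # {1: 9, 2: 3, 3: 4, 4: 1, 5: 0, 6: 2, 7: 0, 8: 0, 9: 0, 10: 0}
```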

**Budget Calculation:**
- Alice's budget: Her salary (12000) + the salaries of all other employees (72500) = 84500
- Charlie's budget: His salary (10000) + Frank's budget (23000) + Grace's salary (8500) = 41500
- Bob's budget: His salary (10000) + David's budget (13500) + Eva's salary (7500) = 31000
- Frank's budget: His salary (9000) + Ivy's salary (7000) + Judy's salary (7000) = 23000
- David's budget: His salary (7500) + Hank's salary (6000) = 13500
- Employees with no direct reports have budgets equal to their own salary
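
The breakdown above is recursive: a manager's budget is their own salary plus the budgets of their direct reports, and a leaf's budget is just its salary. A minimal sketch of that recurrence in plain Python, assuming the example data is hard-coded; `EMPLOYEES`, `reports`, and `budget` are illustrative names:

```python
from collections import defaultdict
from functools import lru_cache

# employee_id -> (manager_id, salary) from the example Employees table
EMPLOYEES = {
    1: (None, 12000), 2: (1, 10000), 3: (1, 10000), 4: (2, 7500), 5: (2, 7500),
    6: (3, 9000), 7: (3, 8500), 8: (4, 6000), 9: (6, 7000), 10: (6, 7000),
}

# manager -> direct reports
reports = defaultdict(list)
for emp_id, (manager, _) in EMPLOYEES.items():
    if manager is not None:
        reports[manager].append(emp_id)


@lru_cache(maxsize=None)
def budget(emp_id: int) -> int:
    """Own salary plus the budgets of all direct reports (which already include their teams)."""
    own_salary = EMPLOYEES[emp_id][1]
    return own_salary + sum(budget(r) for r in reports.get(emp_id, []))


budgets = {emp_id: budget(emp_id) for emp_id in EMPLOYEES}
print(budgets[1], budgets[3], budgets[2])  # 84500 41500 31000
```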

**Note:**
- The result is ordered first by level in ascending order
- Within the same level, employees are ordered by budget in descending order then by name in ascending order
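
The three-part ordering maps onto a single composite sort key. A small sketch, assuming the example output rows are hard-coded in shuffled order: because budget is numeric, negating it turns "descending" into "ascending" inside one tuple key:

```python
# (employee_id, employee_name, level, team_size, budget) rows from the example output, shuffled
rows = [
    (8, "Hank", 4, 0, 6000),
    (2, "Bob", 2, 3, 31000),
    (10, "Judy", 4, 0, 7000),
    (5, "Eva", 3, 0, 7500),
    (1, "Alice", 1, 9, 84500),
    (7, "Grace", 3, 0, 8500),
    (9, "Ivy", 4, 0, 7000),
    (4, "David", 3, 1, 13500),
    (6, "Frank", 3, 2, 23000),
    (3, "Charlie", 2, 4, 41500),
]

# level ascending, budget descending (negated), employee_name ascending
ordered = sorted(rows, key=lambda r: (r[2], -r[4], r[1]))
print([r[0] for r in ordered])  # [1, 3, 2, 6, 4, 7, 5, 9, 10, 8]
```

Ivy and Judy tie on level and budget, so the name tiebreaker puts Ivy first.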

In [0]:
employees_data_3482 = [
    (1, "Alice", None, 12000, "Executive"),
    (2, "Bob", 1, 10000, "Sales"),
    (3, "Charlie", 1, 10000, "Engineering"),
    (4, "David", 2, 7500, "Sales"),
    (5, "Eva", 2, 7500, "Sales"),
    (6, "Frank", 3, 9000, "Engineering"),
    (7, "Grace", 3, 8500, "Engineering"),
    (8, "Hank", 4, 6000, "Sales"),
    (9, "Ivy", 6, 7000, "Engineering"),
    (10, "Judy", 6, 7000, "Engineering")
]

employees_columns_3482 = ["employee_id", "employee_name", "manager_id", "salary", "department"]
employees_df_3482 = spark.createDataFrame(employees_data_3482, employees_columns_3482)
employees_df_3482.show()

+-----------+-------------+----------+------+-----------+
|employee_id|employee_name|manager_id|salary| department|
+-----------+-------------+----------+------+-----------+
|          1|        Alice|      NULL| 12000|  Executive|
|          2|          Bob|         1| 10000|      Sales|
|          3|      Charlie|         1| 10000|Engineering|
|          4|        David|         2|  7500|      Sales|
|          5|          Eva|         2|  7500|      Sales|
|          6|        Frank|         3|  9000|Engineering|
|          7|        Grace|         3|  8500|Engineering|
|          8|         Hank|         4|  6000|      Sales|
|          9|          Ivy|         6|  7000|Engineering|
|         10|         Judy|         6|  7000|Engineering|
+-----------+-------------+----------+------+-----------+



In [0]:
# Seed the hierarchy with the CEO (manager_id is NULL) at level 1
level_df_3482 = employees_df_3482 \
    .filter(col("manager_id").isNull()) \
    .select("employee_id", lit(1).alias("level"))

In [0]:
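# hierarchy_df accumulates every (employee_id, level) pair found so far;
# next_df holds only the most recently discovered level
hierarchy_df_3482 = level_df_3482
next_df_3482 = level_df_3482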

In [0]:
while True:
    # Employees whose manager is in the current frontier sit one level deeper
    expanded = employees_df_3482.alias("e").join(
        next_df_3482.alias("m"),
        col("e.manager_id") == col("m.employee_id"),
        "inner"
    ).select(
        col("e.employee_id"),
        (col("m.level") + 1).alias("level")
    )

    # Stop once the join yields no new (employee_id, level) pairs, i.e. the leaves were reached
    if expanded.subtract(hierarchy_df_3482).count() == 0:
        break

    hierarchy_df_3482 = hierarchy_df_3482.union(expanded).distinct()
    next_df_3482 = expanded

# Attach the computed level to every employee
employees_with_level_3482 = employees_df_3482 \
    .join(hierarchy_df_3482, "employee_id", "left")
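
The DataFrame loop above is a breadth-first expansion: each pass joins the full employee table against the previous level's frontier and stops when a pass adds nothing. The plain-Python sketch below mirrors that shape on driver-side dicts, which may be handy for sanity-checking levels on a small sample. It hard-codes the example's (employee_id, manager_id) pairs and assumes a single tree rooted at the CEO; `levels_by_frontier` is a hypothetical helper, not part of the notebook:

```python
# (employee_id, manager_id) pairs from the example Employees table
EDGES = [(1, None), (2, 1), (3, 1), (4, 2), (5, 2), (6, 3), (7, 3), (8, 4), (9, 6), (10, 6)]


def levels_by_frontier(edges):
    """Level-by-level expansion, analogous to the DataFrame join loop."""
    # Seed: employees with no manager are level 1
    frontier = {emp: 1 for emp, mgr in edges if mgr is None}
    levels = dict(frontier)
    while frontier:
        # "Join" every employee whose manager is in the current frontier
        expanded = {emp: frontier[mgr] + 1 for emp, mgr in edges if mgr in frontier}
        # Keep only employees not seen before; an empty result ends the loop
        frontier = {emp: lvl for emp, lvl in expanded.items() if emp not in levels}
        levels.update(frontier)
    return levels


levels = levels_by_frontier(EDGES)
print(max(levels.values()))  # 4
```

The number of passes equals the depth of the tree, which is also how many join/subtract rounds the Spark loop runs before it breaks.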

In [0]:
# Direct reporting edges (manager_id -> employee_id); the CEO has no manager and is excluded
relation_df_3482 = employees_df_3482 \
    .filter(col("manager_id").isNotNull()) \
    .select(col("manager_id"), col("employee_id"))

In [0]:
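# closure_df accumulates every (manager_id, employee_id) pair where the manager is above the
# employee at any depth; next_df holds only the pairs found in the most recent pass
closure_df_3482 = relation_df_3482
next_df_3482 = relation_df_3482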

In [0]:
while True:
    # Extend every pair found in the last pass by one direct reporting edge:
    # (a manages b) and (b manages c) => a is above c
    expanded = next_df_3482.alias("a").join(
        relation_df_3482.alias("b"),
        col("a.employee_id") == col("b.manager_id"),
        "inner"
    ).select(
        col("a.manager_id"),
        col("b.employee_id")
    ).distinct()

    # Stop once a pass finds no new (manager, subordinate) pairs
    if expanded.subtract(closure_df_3482).count() == 0:
        break

    closure_df_3482 = closure_df_3482.union(expanded).distinct()
    next_df_3482 = expanded

# Team size = number of direct and indirect subordinates per manager
team_df_3482 = closure_df_3482 \
    .groupBy("manager_id") \
    .agg(count("employee_id").alias("team_size"))
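
The loop above computes the transitive closure of the "manages" relation by repeated self-joins. A plain-Python sketch of the same idea, assuming the example's direct edges are hard-coded. Note it is the semi-naive variant: each pass extends only the pairs that were genuinely new in the previous pass, whereas the notebook extends everything the last join returned (identical results on a tree). `transitive_closure` and `DIRECT` are illustrative names:

```python
from collections import Counter

# Direct (manager_id, employee_id) edges from the example; the CEO row has no manager
DIRECT = {(1, 2), (1, 3), (2, 4), (2, 5), (3, 6), (3, 7), (4, 8), (6, 9), (6, 10)}


def transitive_closure(direct):
    """Semi-naive closure: only pairs found in the previous pass are extended."""
    closure = set(direct)
    delta = set(direct)
    while delta:
        # Extend each new path (a -> b) by one direct edge (b -> c), like the self-join
        expanded = {(a, c) for (a, b) in delta for (b2, c) in direct if b == b2}
        delta = expanded - closure  # stop once nothing new appears
        closure |= delta
    return closure


closure = transitive_closure(DIRECT)
# Equivalent of groupBy("manager_id").agg(count("employee_id"))
team_size = Counter(manager for manager, _ in closure)
print(team_size[1], team_size[3], team_size[2])  # 9 4 3
```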

In [0]:
# Total salary of all direct and indirect subordinates of each manager
budget_df_3482 = closure_df_3482 \
    .join(employees_df_3482.select("employee_id", "salary"), "employee_id", "left") \
    .groupBy("manager_id") \
    .agg(sum("salary").alias("sub_salary"))

In [0]:
# Budget = subordinates' salaries + the manager's own salary
budget_df_3482 = budget_df_3482 \
    .join(employees_df_3482.select(col("employee_id").alias("manager_id"), "salary"), "manager_id", "left") \
    .withColumn("budget", col("sub_salary") + col("salary")) \
    .select("manager_id", "budget")
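
The two budget cells sum subordinates' salaries over the closure and then add the manager's own salary; employees who manage nobody have no closure rows and are handled later by `coalesce`. A plain-Python sketch of that two-step aggregation, assuming the example's salaries and closure pairs are hard-coded (`SALARY`, `CLOSURE`, `sub_salary` are illustrative names):

```python
from collections import defaultdict

SALARY = {1: 12000, 2: 10000, 3: 10000, 4: 7500, 5: 7500,
          6: 9000, 7: 8500, 8: 6000, 9: 7000, 10: 7000}
# All (manager, subordinate) pairs, i.e. the transitive closure of the example hierarchy
CLOSURE = {
    (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (1, 7), (1, 8), (1, 9), (1, 10),
    (2, 4), (2, 5), (2, 8),
    (3, 6), (3, 7), (3, 9), (3, 10),
    (4, 8),
    (6, 9), (6, 10),
}

# Step 1: group by manager and sum the subordinates' salaries
sub_salary = defaultdict(int)
for manager, subordinate in CLOSURE:
    sub_salary[manager] += SALARY[subordinate]

# Step 2: add each employee's own salary; employees with no subordinates fall back to
# their own salary, mirroring coalesce("budget", col("salary")) in the final cell
budget = {emp: SALARY[emp] + sub_salary.get(emp, 0) for emp in SALARY}
print(budget[1], budget[6], budget[7])  # 84500 23000 8500
```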

In [0]:
employees_with_level_3482 \
    .join(team_df_3482, employees_with_level_3482.employee_id == team_df_3482.manager_id, "left") \
    .join(budget_df_3482, employees_with_level_3482.employee_id == budget_df_3482.manager_id, "left") \
    .select(
        "employee_id",
        "employee_name",
        "level",
        # Non-managers have no row in team_df / budget_df: default to 0 and to their own salary
        coalesce("team_size", lit(0)).alias("team_size"),
        coalesce("budget", col("salary")).alias("budget")
    ) \
    .orderBy("level", desc("budget"), "employee_name") \
    .display()  # Databricks notebook method; use .show() outside Databricks

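| employee_id | employee_name | level | team_size | budget |
|-------------|---------------|-------|-----------|--------|
| 1           | Alice         | 1     | 9         | 84500  |
| 3           | Charlie       | 2     | 4         | 41500  |
| 2           | Bob           | 2     | 3         | 31000  |
| 6           | Frank         | 3     | 2         | 23000  |
| 4           | David         | 3     | 1         | 13500  |
| 7           | Grace         | 3     | 0         | 8500   |
| 5           | Eva           | 3     | 0         | 7500   |
| 9           | Ivy           | 4     | 0         | 7000   |
| 10          | Judy          | 4     | 0         | 7000   |
| 8           | Hank          | 4     | 0         | 6000   |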
