## Importing Libraries

In [0]:
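# Note: this wildcard import shadows Python built-ins such as sum(); below, sum refers to pyspark.sql.functions.sum.
from pyspark.sql.functions import *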
from pyspark.sql.types import *
from pyspark.sql.window import Window

**2010. The Number of Seniors and Juniors to Join the Company II (Hard)**

**Table: Candidates**

| Column Name | Type |
|-------------|------|
| employee_id | int  |
| experience  | enum |
| salary      | int  |

employee_id is the column with unique values for this table.
experience is an ENUM (category) type with the values ('Senior', 'Junior').
Each row of this table indicates the id of a candidate, their monthly salary, and their experience.
The salary of each candidate is guaranteed to be unique.
 
A company wants to hire new employees. The company's salary budget is $70000. Its hiring criteria are:
- Keep hiring the senior with the smallest salary until you cannot hire any more seniors.
- Use the remaining budget to hire the junior with the smallest salary.
- Keep hiring the junior with the smallest salary until you cannot hire any more juniors.
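Because candidates are taken cheapest first and salaries are positive, "until you cannot hire any more" reduces to stopping at the first candidate of a level who no longer fits the remaining budget. The rule can be sketched in plain Python as follows; the function name `hire` and the `(employee_id, experience, salary)` tuple layout are illustrative assumptions, not part of the problem.

```python
from typing import List, Tuple

# Hypothetical record layout: (employee_id, experience, salary)
Candidate = Tuple[int, str, int]


def hire(candidates: List[Candidate], budget: int) -> List[int]:
    """Greedy hiring: cheapest seniors first, then cheapest juniors with what is left."""
    hired: List[int] = []
    remaining = budget
    for level in ("Senior", "Junior"):
        # Candidates of this level, cheapest first.
        pool = sorted((c for c in candidates if c[1] == level), key=lambda c: c[2])
        for emp_id, _, salary in pool:
            if salary > remaining:
                # Sorted ascending, so no later candidate of this level can fit either.
                break
            hired.append(emp_id)
            remaining -= salary
    return hired


example_1 = [(1, "Junior", 10000), (9, "Junior", 15000), (2, "Senior", 20000),
             (11, "Senior", 16000), (13, "Senior", 50000), (4, "Junior", 40000)]
print(hire(example_1, 70000))  # [11, 2, 1, 9]
```

The leftover budget is carried from the senior loop into the junior loop, which is exactly the "use the remaining budget" step of the criteria.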

**Write a solution to find the ids of seniors and juniors hired under the mentioned criteria.**

Return the result table in any order.

The result format is in the following example.

**Example 1:**

**Input:**

**Candidates table:**

| employee_id | experience | salary |
|-------------|------------|--------|
| 1           | Junior     | 10000  |
| 9           | Junior     | 15000  |
| 2           | Senior     | 20000  |
| 11          | Senior     | 16000  |
| 13          | Senior     | 50000  |
| 4           | Junior     | 40000  |

**Output:**

| employee_id |
|-------------|
| 11          |
| 2           |
| 1           |
| 9           |

**Explanation:** 
- We can hire 2 seniors with IDs (11, 2). Their salaries sum to $36000, leaving $34000 of the $70000 budget, which is not enough to hire the remaining senior candidate (ID 13, $50000).
- We can hire 2 juniors with IDs (1, 9). Their salaries sum to $25000, leaving $9000 of the remaining $34000, which is not enough to hire the remaining junior candidate (ID 4, $40000).
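The explanation amounts to comparing running (prefix) totals of the ascending salaries against the budget: since every salary is positive, the totals strictly increase, so the candidates hired are exactly those whose running total is within the budget. A small arithmetic check of Example 1, using only the standard library:

```python
from itertools import accumulate

budget = 70000
senior_salaries = sorted([20000, 16000, 50000])  # IDs 2, 11, 13
junior_salaries = sorted([10000, 15000, 40000])  # IDs 1, 9, 4

# Running totals of the cheapest-first seniors; keep the largest one within budget.
senior_totals = list(accumulate(senior_salaries))  # [16000, 36000, 86000]
seniors_spent = max((t for t in senior_totals if t <= budget), default=0)  # 36000
remaining = budget - seniors_spent  # 34000

# Same check for juniors against what the seniors left over.
junior_totals = list(accumulate(junior_salaries))  # [10000, 25000, 65000]
juniors_spent = max((t for t in junior_totals if t <= remaining), default=0)  # 25000
print(remaining, remaining - juniors_spent)  # 34000 9000
```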

**Example 2:**

**Input:**

**Candidates table:**

| employee_id | experience | salary |
|-------------|------------|--------|
| 1           | Junior     | 25000  |
| 9           | Junior     | 10000  |
| 2           | Senior     | 85000  |
| 11          | Senior     | 80000  |
| 13          | Senior     | 90000  |
| 4           | Junior     | 30000  |

**Output:**

| employee_id |
|-------------|
| 9           |
| 1           |
| 4           |

**Explanation:** 
- We cannot hire any seniors: the cheapest senior (ID 11) earns $80000, which exceeds the $70000 budget.
- We can hire all three juniors with the remaining (full) budget: their salaries sum to $65000, which is within $70000.
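Example 2 is the edge case where the senior selection is empty, so the amount spent on seniors must count as 0 and the juniors get the full $70000. In Spark SQL, as in standard SQL, `SUM` over zero rows yields NULL rather than 0, so an implementation has to supply that 0 explicitly. The plain-Python sketch below makes the fallback visible; the helper name `spent_within` is hypothetical.

```python
from itertools import accumulate


def spent_within(salaries, limit):
    """Largest running total of the ascending salaries that fits in limit (0 if none fits)."""
    fitting = [t for t in accumulate(sorted(salaries)) if t <= limit]
    # An empty selection must count as 0, not as a missing value.
    return fitting[-1] if fitting else 0


budget = 70000
seniors_spent = spent_within([85000, 80000, 90000], budget)     # 0: cheapest senior is 80000
remaining = budget - seniors_spent                              # 70000
juniors_spent = spent_within([25000, 10000, 30000], remaining)  # 65000
print(seniors_spent, remaining, juniors_spent)  # 0 70000 65000
```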

In [0]:
candidates_data_2010 = [
    (1, "Junior", 10000),
    (9, "Junior", 15000),
    (2, "Senior", 20000),
    (11, "Senior", 16000),
    (13, "Senior", 50000),
    (4, "Junior", 40000),
]

candidates_columns_2010 = ["employee_id", "experience", "salary"]
candidates_df_2010 = spark.createDataFrame(candidates_data_2010, candidates_columns_2010)
candidates_df_2010.show()

+-----------+----------+------+
|employee_id|experience|salary|
+-----------+----------+------+
|          1|    Junior| 10000|
|          9|    Junior| 15000|
|          2|    Senior| 20000|
|         11|    Senior| 16000|
|         13|    Senior| 50000|
|          4|    Junior| 40000|
+-----------+----------+------+



In [0]:
budget = 70000

In [0]:
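# Seniors, cheapest first: keep each one whose running salary total still fits the budget.
# (Window.orderBy already sorts by salary, so a separate orderBy is not needed.)
seniors_df_2010 = (
    candidates_df_2010
    .filter(col("experience") == "Senior")
    .withColumn("cum_salary", sum("salary").over(Window.orderBy("salary")))
    .filter(col("cum_salary") <= budget)
)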
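When a PySpark window has an ordering but no explicit frame, the default frame is a growing range frame (unbounded preceding to current row). Under a range frame, rows that tie on the ordering column share one cumulative total, whereas `Window.orderBy("salary").rowsBetween(Window.unboundedPreceding, Window.currentRow)` adds tied rows one at a time. The problem guarantees unique salaries, so `sum("salary").over(Window.orderBy("salary"))` is safe here. The plain-Python sketch below emulates both frames on a hypothetical list with a tie at 20000 to show where they would differ.

```python
from itertools import groupby


def range_frame_cumsum(values):
    """Running sum where all rows sharing an ORDER BY value get the same total (RANGE frame)."""
    result, running = [], 0
    for _, group in groupby(sorted(values)):
        members = list(group)
        running += sum(members)  # the whole tie group enters the frame at once
        result.extend([running] * len(members))
    return result


def rows_frame_cumsum(values):
    """Running sum taken row by row (ROWS frame): tied rows are added one at a time."""
    result, running = [], 0
    for v in sorted(values):
        running += v
        result.append(running)
    return result


salaries = [16000, 20000, 20000, 50000]  # hypothetical data with a tie at 20000
print(range_frame_cumsum(salaries))  # [16000, 56000, 56000, 106000]
print(rows_frame_cumsum(salaries))   # [16000, 36000, 56000, 106000]
```

With a $40000 budget, the range-frame totals would admit only the first candidate, while the row-frame totals would admit two, with the choice between the tied candidates left to row order.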



In [0]:
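# SUM over zero rows returns None (e.g. no senior fits, as in Example 2), so fall back to 0.
remaining_budget = budget - (seniors_df_2010.agg({"salary": "sum"}).collect()[0][0] or 0)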



In [0]:
# Juniors, cheapest first, limited to whatever budget the seniors left over.
juniors_df_2010 = (
    candidates_df_2010
    .filter(col("experience") == "Junior")
    .withColumn("cum_salary", sum("salary").over(Window.orderBy("salary")))
    .filter(col("cum_salary") <= remaining_budget)
)



In [0]:
# Combine the hired seniors and juniors and return only their ids.
seniors_df_2010.unionByName(juniors_df_2010).select("employee_id").show()



+-----------+
|employee_id|
+-----------+
|         11|
|          2|
|          1|
|          9|
+-----------+

