## Importing Libraries

In [0]:
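# Wildcard import: names such as sum, min and max now refer to Spark column functions, not the Python built-ins
from pyspark.sql.functions import *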
from pyspark.sql.types import *
from pyspark.sql.window import Window

**2004. The Number of Seniors and Juniors to Join the Company (Hard)**

**Table: Candidates**

| Column Name | Type |
|-------------|------|
| employee_id | int  |
| experience  | enum |
| salary      | int  |

employee_id is the column with unique values for this table.
experience is an ENUM (category) type of values ('Senior', 'Junior').
Each row of this table indicates the id of a candidate, their monthly salary, and their experience.
 
A company wants to hire new employees. The budget of the company for the salaries is $70000. The company's criteria for hiring are:
- Hiring the largest number of seniors.
- After hiring the maximum number of seniors, use the remaining budget to hire the largest number of juniors.

**Write a solution to find the number of seniors and juniors hired under the mentioned criteria.**

Return the result table in any order.

The result format is in the following example.

**Example 1:**

**Input:** 

**Candidates table:**

| employee_id | experience | salary |
|-------------|------------|--------|
| 1           | Junior     | 10000  |
| 9           | Junior     | 10000  |
| 2           | Senior     | 20000  |
| 11          | Senior     | 20000  |
| 13          | Senior     | 50000  |
| 4           | Junior     | 40000  |

**Output:**

| experience | accepted_candidates |
|------------|---------------------|
| Senior     | 2                   |
| Junior     | 2                   |

**Explanation:** 
- We can hire 2 seniors with IDs (2, 11). The budget is $70000 and their salaries sum to $40000, which leaves $30000. That is not enough to hire the senior candidate with ID 13, whose salary is $50000.
- We can hire 2 juniors with IDs (1, 9). The remaining budget is $30000 and their salaries sum to $20000, which leaves $10000. That is not enough to hire the junior candidate with ID 4, whose salary is $40000.
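
The arithmetic in this explanation can be reproduced with running totals. The snippet below is a minimal plain-Python sketch, not part of the PySpark solution; the variable names are illustrative, and it assumes that hiring the cheapest candidates first is what maximizes the head count.

```python
from itertools import accumulate

budget = 70000
senior_salaries = sorted([20000, 20000, 50000])  # IDs 2, 11, 13
junior_salaries = sorted([10000, 10000, 40000])  # IDs 1, 9, 4

# Running totals of salaries, cheapest first.
senior_totals = list(accumulate(senior_salaries))
seniors_hired = len([t for t in senior_totals if t <= budget])

# Whatever the hired seniors did not use is left for the juniors.
spent_on_seniors = senior_totals[seniors_hired - 1] if seniors_hired else 0
remaining = budget - spent_on_seniors

junior_totals = list(accumulate(junior_salaries))
juniors_hired = len([t for t in junior_totals if t <= remaining])

print(senior_totals)                             # [20000, 40000, 90000]
print(seniors_hired, juniors_hired, remaining)   # 2 2 30000
```

The third senior pushes the running total to $90000, past the $70000 budget, so only the first two are counted.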

**Example 2:**

**Input:** 

**Candidates table:**

| employee_id | experience | salary |
|-------------|------------|--------|
| 1           | Junior     | 10000  |
| 9           | Junior     | 10000  |
| 2           | Senior     | 80000  |
| 11          | Senior     | 80000  |
| 13          | Senior     | 80000  |
| 4           | Junior     | 40000  |

**Output:**

| experience | accepted_candidates |
|------------|---------------------|
| Senior     | 0                   |
| Junior     | 3                   |

**Explanation:** 
- We cannot hire any seniors with the current budget as we need at least $80000 to hire one senior.
- We can hire all three juniors with the remaining budget.
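
Both examples can be checked against a small reference implementation. The function below is a hedged plain-Python sketch of the greedy rule described in the hiring criteria (seniors first, then juniors, cheapest first within each group); `count_hires` is a hypothetical helper name, not something the notebook defines.

```python
def count_hires(candidates, budget=70000):
    """Count hires per experience level: seniors first, then juniors."""
    counts = {"Senior": 0, "Junior": 0}
    remaining = budget
    for level in ("Senior", "Junior"):
        # Cheapest candidates first, so each hire uses as little budget as possible.
        salaries = sorted(s for _, exp, s in candidates if exp == level)
        for salary in salaries:
            if salary > remaining:
                break  # every later salary is at least as large
            remaining -= salary
            counts[level] += 1
    return counts


example_1 = [
    (1, "Junior", 10000), (9, "Junior", 10000), (2, "Senior", 20000),
    (11, "Senior", 20000), (13, "Senior", 50000), (4, "Junior", 40000),
]
example_2 = [
    (1, "Junior", 10000), (9, "Junior", 10000), (2, "Senior", 80000),
    (11, "Senior", 80000), (13, "Senior", 80000), (4, "Junior", 40000),
]

print(count_hires(example_1))  # {'Senior': 2, 'Junior': 2}
print(count_hires(example_2))  # {'Senior': 0, 'Junior': 3}
```

Note that the counts start at zero for both levels, so a level with no hires still appears in the result, as the expected output of Example 2 requires.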

In [0]:
candidates_data_2004 = [
    (1, "Junior", 10000),
    (9, "Junior", 10000),
    (2, "Senior", 20000),
    (11, "Senior", 20000),
    (13, "Senior", 50000),
    (4, "Junior", 40000),
]

candidates_columns_2004 = ["employee_id", "experience", "salary"]
candidates_df_2004 = spark.createDataFrame(candidates_data_2004, candidates_columns_2004)
candidates_df_2004.show()

+-----------+----------+------+
|employee_id|experience|salary|
+-----------+----------+------+
|          1|    Junior| 10000|
|          9|    Junior| 10000|
|          2|    Senior| 20000|
|         11|    Senior| 20000|
|         13|    Senior| 50000|
|          4|    Junior| 40000|
+-----------+----------+------+



In [0]:
budget = 70000

In [0]:
senior_df_2004 = candidates_df_2004.filter("experience = 'Senior'").orderBy("salary", "employee_id")
junior_df_2004 = candidates_df_2004.filter("experience = 'Junior'").orderBy("salary", "employee_id")

In [0]:
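# No partitionBy: the window is applied to DataFrames that hold a single experience level,
# so Spark computes the running total on one partition (and logs a warning about it)
windowSpec = Window.orderBy("salary", "employee_id")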

In [0]:
# Running total of salaries, cheapest first; keep the seniors whose running total fits the budget
seniors_hired_df_2004 = senior_df_2004\
    .withColumn("cumulative_salary", sum("salary").over(windowSpec))\
    .withColumn("accepted", when(col("cumulative_salary") <= budget, 1).otherwise(0))\
    .filter("accepted = 1")



In [0]:
# sum() over an empty DataFrame returns null (None in Python), hence the `or 0` fallback
spent_on_seniors = seniors_hired_df_2004.agg(sum("salary")).collect()[0][0] or 0
remaining_budget = budget - spent_on_seniors



In [0]:
# Same running-total filter for juniors, measured against what is left after hiring seniors
juniors_hired_df_2004 = junior_df_2004\
    .withColumn("cumulative_salary", sum("salary").over(windowSpec))\
    .withColumn("accepted", when(col("cumulative_salary") <= remaining_budget, 1).otherwise(0))\
    .filter("accepted = 1")




In [0]:
final_hired_df_2004 = seniors_hired_df_2004.unionByName(juniors_hired_df_2004)

In [0]:
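# Left-join from the full list of experience levels so a level with no hires still reports 0
# (Example 2 expects a Senior row with 0); count() skips the nulls produced by the join
experience_levels_df = spark.createDataFrame([("Senior",), ("Junior",)], ["experience"])
experience_levels_df\
    .join(final_hired_df_2004, "experience", "left")\
    .groupBy("experience")\
    .agg(count("employee_id").alias("accepted_candidates"))\
    .orderBy("experience")\
    .show()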



+----------+-------------------+
|experience|accepted_candidates|
+----------+-------------------+
|    Junior|                  2|
|    Senior|                  2|
+----------+-------------------+

