## Importing Libraries

In [0]:
from pyspark.sql.functions import *
from pyspark.sql.types import *
from pyspark.sql.window import Window

**3188. Find Top Scoring Students II (Hard)**

**Table: students**

| Column Name | Type     | 
|-------------|----------|
| student_id  | int      |
| name        | varchar  |
| major       | varchar  |

student_id is the primary key for this table. 
Each row contains the student ID, student name, and their major.

**Table: courses**

| Column Name | Type              |       
|-------------|-------------------|
| course_id   | int               |    
| name        | varchar           |      
| credits     | int               |           
| major       | varchar           |       
| mandatory   | enum              |      

course_id is the primary key for this table. 
mandatory is an enum type of ('Yes', 'No'); the example data in this problem stores the values in lowercase ('yes', 'no').
Each row contains the course ID, course name, credits, major it belongs to, and whether the course is mandatory.

**Table: enrollments**

| Column Name | Type     | 
|-------------|----------|
| student_id  | int      |
| course_id   | int      |
| semester    | varchar  |
| grade       | varchar  |
| GPA         | decimal  | 

(student_id, course_id, semester) is the primary key (combination of columns with unique values) for this table.
Each row contains the student ID, course ID, semester, grade received, and the GPA for that enrollment.

**Write a solution to find the students who meet the following criteria:**
- Have taken all mandatory courses and at least two elective courses offered in their major.
- Achieved a grade of A in all mandatory courses and at least B in elective courses.
- Maintained an average GPA of at least 2.5 across all their courses (including those outside their major).

Return the result table ordered by student_id in ascending order.
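
The three criteria can be read as a single per-student predicate. The following is a minimal plain-Python sketch of that reading, not the notebook's solution: the function name `qualifies` and its argument layout are illustrative assumptions, and "at least B in elective courses" is interpreted here as "at least two electives from the major graded A or B". The two sample calls reuse the rows for Alice and Bob from the example in this problem.

```python
from statistics import mean


def qualifies(major_mandatory, major_electives, enrollments):
    """Check one student against the three criteria.

    major_mandatory / major_electives: sets of course IDs offered in the student's major.
    enrollments: list of (course_id, grade, gpa) tuples for that student.
    """
    if not enrollments:
        return False
    # Courses in which the student earned an A.
    aced = {course_id for course_id, grade, _ in enrollments if grade == "A"}
    # Electives from the student's major with a grade of B or better.
    good_electives = {
        course_id
        for course_id, grade, _ in enrollments
        if course_id in major_electives and grade in ("A", "B")
    }
    return (
        major_mandatory <= aced  # every mandatory course was taken and earned an A
        and len(good_electives) >= 2  # at least two qualifying electives
        and mean(gpa for _, _, gpa in enrollments) >= 2.5  # average over all courses
    )


cs_mandatory, cs_electives = {101, 102}, {105, 107}
alice = [(101, "A", 4.0), (102, "A", 4.0), (105, "A", 4.0), (107, "B", 3.5)]
bob = [(101, "A", 4.0), (102, "B", 3.0)]

print(qualifies(cs_mandatory, cs_electives, alice))  # True
print(qualifies(cs_mandatory, cs_electives, bob))    # False
```

The subset test `major_mandatory <= aced` covers "has taken all mandatory courses" and "earned an A in each" in one step, because a course that was never taken cannot appear in `aced`.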

**Example:**

**Input:**

**students table:**

| student_id | name             | major            |
|------------|------------------|------------------|
| 1          | Alice            | Computer Science |
| 2          | Bob              | Computer Science |
| 3          | Charlie          | Mathematics      |
| 4          | David            | Mathematics      |
 
**courses table:**

| course_id | name              | credits | major            | mandatory|
|-----------|-------------------|---------|------------------|----------|
| 101       | Algorithms        | 3       | Computer Science | yes      |
| 102       | Data Structures   | 3       | Computer Science | yes      |
| 103       | Calculus          | 4       | Mathematics      | yes      |
| 104       | Linear Algebra    | 4       | Mathematics      | yes      |
| 105       | Machine Learning  | 3       | Computer Science | no       |
| 106       | Probability       | 3       | Mathematics      | no       |
| 107       | Operating Systems | 3       | Computer Science | no       |
| 108       | Statistics        | 3       | Mathematics      | no       |
 
**enrollments table:**

| student_id | course_id | semester    | grade | GPA |
|------------|-----------|-------------|-------|-----|
| 1          | 101       | Fall 2023   | A     | 4.0 |
| 1          | 102       | Spring 2023 | A     | 4.0 |
| 1          | 105       | Spring 2023 | A     | 4.0 |
| 1          | 107       | Fall 2023   | B     | 3.5 |
| 2          | 101       | Fall 2023   | A     | 4.0 |
| 2          | 102       | Spring 2023 | B     | 3.0 |
| 3          | 103       | Fall 2023   | A     | 4.0 |
| 3          | 104       | Spring 2023 | A     | 4.0 |
| 3          | 106       | Spring 2023 | A     | 4.0 |
| 3          | 108       | Fall 2023   | B     | 3.5 |
| 4          | 103       | Fall 2023   | B     | 3.0 |
| 4          | 104       | Spring 2023 | B     | 3.0 |

 
**Output:**

| student_id |
|------------|
| 1          |
| 3          |

**Explanation:**
- Alice (student_id 1) is a Computer Science major and has taken both Algorithms and Data Structures, receiving an A in both. She has also taken Machine Learning and Operating Systems as electives, receiving an A and B respectively.
- Bob (student_id 2) is a Computer Science major but did not receive an A in all required courses.
- Charlie (student_id 3) is a Mathematics major and has taken both Calculus and Linear Algebra, receiving an A in both. He has also taken Probability and Statistics as electives, receiving an A and B respectively.
- David (student_id 4) is a Mathematics major but did not receive an A in all required courses.

**Note:** Output table is ordered by student_id in ascending order.
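
The explanation only discusses grades, so it can help to check the GPA criterion separately. The sketch below averages the GPA values from the enrollments example per student, using plain Python rather than Spark; the variable names are illustrative.

```python
from collections import defaultdict

# (student_id, GPA) pairs copied from the enrollments example.
rows = [
    (1, 4.0), (1, 4.0), (1, 4.0), (1, 3.5),
    (2, 4.0), (2, 3.0),
    (3, 4.0), (3, 4.0), (3, 4.0), (3, 3.5),
    (4, 3.0), (4, 3.0),
]

gpas_by_student = defaultdict(list)
for student_id, gpa in rows:
    gpas_by_student[student_id].append(gpa)

avg_gpa = {sid: sum(values) / len(values) for sid, values in gpas_by_student.items()}
print(avg_gpa)  # {1: 3.875, 2: 3.5, 3: 3.875, 4: 3.0}
```

All four students clear the 2.5 threshold, so in this example Bob and David are excluded by the grade and elective criteria alone.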

In [0]:
students_data_3188 = [
    (1, "Alice", "Computer Science"),
    (2, "Bob", "Computer Science"),
    (3, "Charlie", "Mathematics"),
    (4, "David", "Mathematics")
]

students_columns_3188 = ["student_id", "name", "student_major"]
students_df_3188 = spark.createDataFrame(students_data_3188, students_columns_3188)
students_df_3188.show()

courses_data_3188 = [
    (101, "Algorithms", 3, "Computer Science", "yes"),
    (102, "Data Structures", 3, "Computer Science", "yes"),
    (103, "Calculus", 4, "Mathematics", "yes"),
    (104, "Linear Algebra", 4, "Mathematics", "yes"),
    (105, "Machine Learning", 3, "Computer Science", "no"),
    (106, "Probability", 3, "Mathematics", "no"),
    (107, "Operating Systems", 3, "Computer Science", "no"),
    (108, "Statistics", 3, "Mathematics", "no")
]

courses_columns_3188 = ["course_id", "name", "credits", "course_major", "mandatory"]
courses_df_3188 = spark.createDataFrame(courses_data_3188, courses_columns_3188)
courses_df_3188.show()

enrollments_data_3188 = [
    (1, 101, "Fall 2023", "A", 4.0),
    (1, 102, "Spring 2023", "A", 4.0),
    (1, 105, "Spring 2023", "A", 4.0),
    (1, 107, "Fall 2023", "B", 3.5),
    (2, 101, "Fall 2023", "A", 4.0),
    (2, 102, "Spring 2023", "B", 3.0),
    (3, 103, "Fall 2023", "A", 4.0),
    (3, 104, "Spring 2023", "A", 4.0),
    (3, 106, "Spring 2023", "A", 4.0),
    (3, 108, "Fall 2023", "B", 3.5),
    (4, 103, "Fall 2023", "B", 3.0),
    (4, 104, "Spring 2023", "B", 3.0)
]

enrollments_columns_3188 = ["student_id", "course_id", "semester", "grade", "GPA"]
enrollments_df_3188 = spark.createDataFrame(enrollments_data_3188, enrollments_columns_3188)
enrollments_df_3188.show()


+----------+-------+----------------+
|student_id|   name|   student_major|
+----------+-------+----------------+
|         1|  Alice|Computer Science|
|         2|    Bob|Computer Science|
|         3|Charlie|     Mathematics|
|         4|  David|     Mathematics|
+----------+-------+----------------+

+---------+-----------------+-------+----------------+---------+
|course_id|             name|credits|    course_major|mandatory|
+---------+-----------------+-------+----------------+---------+
|      101|       Algorithms|      3|Computer Science|      yes|
|      102|  Data Structures|      3|Computer Science|      yes|
|      103|         Calculus|      4|     Mathematics|      yes|
|      104|   Linear Algebra|      4|     Mathematics|      yes|
|      105| Machine Learning|      3|Computer Science|       no|
|      106|      Probability|      3|     Mathematics|       no|
|      107|Operating Systems|      3|Computer Science|       no|
|      108|       Statistics|      3|     Mat

In [0]:
# Attach course attributes (major, mandatory flag) to every enrollment row.
enroll_courses_3188 = enrollments_df_3188.join(courses_df_3188, "course_id")

In [0]:
# Attach each student's own major so it can be compared with the course's major.
student_courses_3188 = enroll_courses_3188.join(students_df_3188, "student_id")

In [0]:
# Criterion 3: average GPA of at least 2.5 across all enrollments, inside or outside the major.
student_gpa_3188 = (
    student_courses_3188
    .groupBy("student_id")
    .agg(avg("GPA").alias("avg_gpa"))
    .filter(col("avg_gpa") >= 2.5)
)

In [0]:
# Number of mandatory courses that each major requires.
mandatory_courses_3188 = (
    courses_df_3188
    .filter(col("mandatory") == "yes")
    .groupBy("course_major")
    .agg(countDistinct("course_id").alias("total_mandatory"))
)

In [0]:
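# Count mandatory courses from the student's own major that earned an A.
mandatory_check_3188 = (
    student_courses_3188
    .filter(col("course_major") == col("student_major"))
    .filter((col("mandatory") == "yes") & (col("grade") == "A"))
    .groupBy("student_id", "student_major")
    .agg(countDistinct("course_id").alias("mandatory_A_count"))
)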

In [0]:
# Keep students whose A count equals the number of mandatory courses in their major.
mandatory_valid_3188 = (
    mandatory_check_3188
    .join(mandatory_courses_3188, mandatory_check_3188["student_major"] == mandatory_courses_3188["course_major"], "inner")
    .filter(col("mandatory_A_count") == col("total_mandatory"))
    .select("student_id", "student_major")
)
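
Comparing a student's count of A-graded mandatory courses with the major's total is only equivalent to "earned an A in every mandatory course" when the counted courses come from the student's own major. The plain-Python sketch below illustrates the pitfall with hypothetical data: a Computer Science student who has an A in one CS mandatory course and one Mathematics mandatory course.

```python
# Hypothetical catalogue: two mandatory courses per major.
mandatory_by_major = {"CS": {101, 102}, "Math": {103, 104}}

# Hypothetical CS student with an A in course 101 (CS) and course 103 (Math).
student_major = "CS"
a_grades = {101, 103}

required = mandatory_by_major[student_major]
all_mandatory = set().union(*mandatory_by_major.values())

# Counting every mandatory A regardless of major gives a false match: 2 == 2.
naive_match = len(a_grades & all_mandatory) == len(required)

# Restricting to the student's own major first gives the intended answer: 1 != 2.
scoped_match = len(a_grades & required) == len(required)

print(naive_match, scoped_match)  # True False
```

In the Spark version, the same restriction corresponds to requiring `course_major` to equal `student_major` before counting.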

In [0]:
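# Count electives from the student's own major with a grade of B or better; require at least two.
elective_check_3188 = (
    student_courses_3188
    .filter(col("course_major") == col("student_major"))
    .filter((col("mandatory") == "no") & (col("grade").isin(["A", "B"])))
    .groupBy("student_id", "student_major")
    .agg(countDistinct("course_id").alias("elective_B_count"))
    .filter(col("elective_B_count") >= 2)
)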
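
Because the enrollments key is (student_id, course_id, semester), the same course can appear more than once for a student. Counting distinct course IDs, as `countDistinct` does, keeps a retaken elective from being counted twice. A small plain-Python sketch with hypothetical rows for one student:

```python
# Hypothetical enrollments for one student: elective 105 was taken in two semesters.
elective_rows = [
    (105, "Fall 2023", "B"),
    (105, "Spring 2024", "A"),
]

row_count = len(elective_rows)
distinct_count = len({course_id for course_id, _, _ in elective_rows})

print(row_count, distinct_count)  # 2 1
```

With a plain row count this student would appear to meet the two-elective requirement; the distinct count shows that only one elective was actually taken.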

In [0]:
# Intersect the three checks and return the qualifying student IDs in ascending order.
(
    mandatory_valid_3188.alias("mv")
    .join(
        elective_check_3188.alias("ec"),
        (col("mv.student_id") == col("ec.student_id"))
        & (col("mv.student_major") == col("ec.student_major")),
        "inner",
    )
    .join(student_gpa_3188.alias("sg"), col("mv.student_id") == col("sg.student_id"), "inner")
    .select(col("mv.student_id").alias("student_id"))
    .distinct()
    .orderBy("student_id")
    .display()  # Databricks notebook helper; use .show() in other environments
)

student_id
1
3
