## Importing Libraries

In [0]:
from pyspark.sql.functions import *
from pyspark.sql.types import *
from pyspark.sql.window import Window

**3182. Find Top Scoring Students (Medium)**

**Table: students**

| Column Name | Type     | 
|-------------|----------|
| student_id  | int      |
| name        | varchar  |
| major       | varchar  |

student_id is the primary key (combination of columns with unique values) for this table.
Each row of this table contains the student ID, student name, and their major.

**Table: courses**

| Column Name | Type     | 
|-------------|----------|
| course_id   | int      |
| name        | varchar  |
| credits     | int      |
| major       | varchar  |

course_id is the primary key (combination of columns with unique values) for this table.
Each row of this table contains the course ID, course name, the number of credits for the course, and the major it belongs to.

**Table: enrollments**

| Column Name | Type     | 
|-------------|----------|
| student_id  | int      |
| course_id   | int      |
| semester    | varchar  |
| grade       | varchar  |

(student_id, course_id, semester) is the primary key (combination of columns with unique values) for this table.
Each row of this table contains the student ID, course ID, semester, and grade received.

**Write a solution to find the students who have taken all courses offered in their major and have achieved a grade of A in all these courses.**

Return the result table ordered by student_id in ascending order.

The result format is in the following example.

**Example:**

**Input:**

**students table:**

| student_id | name             | major            |
|------------|------------------|------------------|
| 1          | Alice            | Computer Science |
| 2          | Bob              | Computer Science |
| 3          | Charlie          | Mathematics      |
| 4          | David            | Mathematics      |

**courses table:**

| course_id | name            | credits | major            |
|-----------|-----------------|---------|------------------|
| 101       | Algorithms      | 3       | Computer Science |
| 102       | Data Structures | 3       | Computer Science |
| 103       | Calculus        | 4       | Mathematics      |
| 104       | Linear Algebra  | 4       | Mathematics      |

**enrollments table:**

| student_id | course_id | semester | grade |
|------------|-----------|----------|-------|
| 1          | 101       | Fall 2023| A     |
| 1          | 102       | Fall 2023| A     |
| 2          | 101       | Fall 2023| B     |
| 2          | 102       | Fall 2023| A     |
| 3          | 103       | Fall 2023| A     |
| 3          | 104       | Fall 2023| A     |
| 4          | 103       | Fall 2023| A     |
| 4          | 104       | Fall 2023| B     |

**Output:**

| student_id |
|------------|
| 1          |
| 3          |

**Explanation:**
- Alice (student_id 1) is a Computer Science major and has taken both "Algorithms" and "Data Structures", receiving an 'A' in both.
- Bob (student_id 2) is a Computer Science major but did not receive an 'A' in all required courses.
- Charlie (student_id 3) is a Mathematics major and has taken both "Calculus" and "Linear Algebra", receiving an 'A' in both.
- David (student_id 4) is a Mathematics major but did not receive an 'A' in all required courses.

**Note:** Output table is ordered by student_id in ascending order.
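The task is a "for all" query: a student qualifies when the set of courses in their major is a subset of the set of courses they passed with an `A`. Below is a plain-Python reference sketch on the example data, useful as a cross-check for the Spark solution. The helper name `top_scoring_students` is hypothetical. Majors with no courses are excluded, on the assumption that this should match an inner-join-based solution.

```python
from collections import defaultdict

# Example data from the problem statement, as plain Python tuples
students = [(1, "Alice", "Computer Science"), (2, "Bob", "Computer Science"),
            (3, "Charlie", "Mathematics"), (4, "David", "Mathematics")]
courses = [(101, "Algorithms", 3, "Computer Science"), (102, "Data Structures", 3, "Computer Science"),
           (103, "Calculus", 4, "Mathematics"), (104, "Linear Algebra", 4, "Mathematics")]
enrollments = [(1, 101, "Fall 2023", "A"), (1, 102, "Fall 2023", "A"),
               (2, 101, "Fall 2023", "B"), (2, 102, "Fall 2023", "A"),
               (3, 103, "Fall 2023", "A"), (3, 104, "Fall 2023", "A"),
               (4, 103, "Fall 2023", "A"), (4, 104, "Fall 2023", "B")]

def top_scoring_students(students, courses, enrollments):
    # Required course set for each major
    required = defaultdict(set)
    for course_id, _name, _credits, major in courses:
        required[major].add(course_id)
    # Courses each student passed with an A (a set, so retakes collapse)
    a_courses = defaultdict(set)
    for student_id, course_id, _semester, grade in enrollments:
        if grade == "A":
            a_courses[student_id].add(course_id)
    # Qualify when the major has courses and all of them are in the student's A set
    return sorted(
        student_id
        for student_id, _name, major in students
        if required.get(major) and required[major] <= a_courses[student_id]
    )

print(top_scoring_students(students, courses, enrollments))  # [1, 3]
```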

In [0]:
students_data_3182 = [
    (1, "Alice", "Computer Science"),
    (2, "Bob", "Computer Science"),
    (3, "Charlie", "Mathematics"),
    (4, "David", "Mathematics")
]

students_columns_3182 = ["student_id", "name", "major"]
students_df_3182 = spark.createDataFrame(students_data_3182, students_columns_3182)
students_df_3182.show()

courses_data_3182 = [
    (101, "Algorithms", 3, "Computer Science"),
    (102, "Data Structures", 3, "Computer Science"),
    (103, "Calculus", 4, "Mathematics"),
    (104, "Linear Algebra", 4, "Mathematics")
]

courses_columns_3182 = ["course_id", "name", "credits", "major"]
courses_df_3182 = spark.createDataFrame(courses_data_3182, courses_columns_3182)
courses_df_3182.show()

enrollments_data_3182 = [
    (1, 101, "Fall 2023", "A"),
    (1, 102, "Fall 2023", "A"),
    (2, 101, "Fall 2023", "B"),
    (2, 102, "Fall 2023", "A"),
    (3, 103, "Fall 2023", "A"),
    (3, 104, "Fall 2023", "A"),
    (4, 103, "Fall 2023", "A"),
    (4, 104, "Fall 2023", "B")
]

enrollments_columns_3182 = ["student_id", "course_id", "semester", "grade"]
enrollments_df_3182 = spark.createDataFrame(enrollments_data_3182, enrollments_columns_3182)
enrollments_df_3182.show()


+----------+-------+----------------+
|student_id|   name|           major|
+----------+-------+----------------+
|         1|  Alice|Computer Science|
|         2|    Bob|Computer Science|
|         3|Charlie|     Mathematics|
|         4|  David|     Mathematics|
+----------+-------+----------------+

+---------+---------------+-------+----------------+
|course_id|           name|credits|           major|
+---------+---------------+-------+----------------+
|      101|     Algorithms|      3|Computer Science|
|      102|Data Structures|      3|Computer Science|
|      103|       Calculus|      4|     Mathematics|
|      104| Linear Algebra|      4|     Mathematics|
+---------+---------------+-------+----------------+

+----------+---------+---------+-----+
|student_id|course_id| semester|grade|
+----------+---------+---------+-----+
|         1|      101|Fall 2023|    A|
|         1|      102|Fall 2023|    A|
|         2|      101|Fall 2023|    B|
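|         2|      102|Fall 2023|    A|
|         3|      103|Fall 2023|    A|
|         3|      104|Fall 2023|    A|
|         4|      103|Fall 2023|    A|
|         4|      104|Fall 2023|    B|
+----------+---------+---------+-----+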

In [0]:
# Total number of distinct courses offered by each major
courses_per_major_3182 = (
    courses_df_3182
    .groupBy("major")
    .agg(countDistinct("course_id").alias("total_courses"))
)

In [0]:
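# Distinct A-graded courses per student, counting only courses in the student's own major
a_courses_3182 = (
    enrollments_df_3182
    .filter(col("grade") == "A")
    .join(courses_df_3182.select("course_id", col("major").alias("course_major")), on="course_id", how="inner")
    .join(students_df_3182.select("student_id", "major"), on="student_id", how="inner")
    .filter(col("course_major") == col("major"))
    .groupBy("student_id", "major")
    .agg(countDistinct("course_id").alias("a_courses_count"))
)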
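The A-grade count must be restricted to courses in the student's *own* major. Suppose the count were grouped only by each course's major. Then a student who aced every course of some other major would be matched against that major's total and returned. The plain-Python sketch below illustrates this with a hypothetical student 5, who is not in the example data. The helper `qualifies` and its flag are likewise illustrative only.

```python
from collections import defaultdict

# Course -> major, from the example courses table
course_major = {101: "Computer Science", 102: "Computer Science",
                103: "Mathematics", 104: "Mathematics"}
total_per_major = {"Computer Science": 2, "Mathematics": 2}

# Hypothetical student 5: a Computer Science major who only took Mathematics courses
student_major = {5: "Computer Science"}
enrollments = [(5, 103, "Fall 2023", "A"), (5, 104, "Fall 2023", "A")]

def qualifies(restrict_to_own_major):
    # Distinct A-graded courses per (student, course major)
    counts = defaultdict(set)
    for student_id, course_id, _semester, grade in enrollments:
        major = course_major[course_id]
        if grade != "A":
            continue
        if restrict_to_own_major and major != student_major[student_id]:
            continue  # ignore courses outside the student's major
        counts[(student_id, major)].add(course_id)
    # Compare each group's distinct count with the total for that group's major
    return sorted({sid for (sid, major), ids in counts.items()
                   if len(ids) == total_per_major[major]})

print(qualifies(restrict_to_own_major=False))  # [5]: wrongly credited via Mathematics
print(qualifies(restrict_to_own_major=True))   # []: correctly excluded
```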

In [0]:
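# Keep students whose A-graded major courses cover every course in their major
# (display() is Databricks-specific; use .show() elsewhere)
(
    a_courses_3182
    .join(courses_per_major_3182, on="major", how="inner")
    .filter(col("a_courses_count") == col("total_courses"))
    .select("student_id")
    .orderBy("student_id")
    .display()
)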

| student_id |
|------------|
| 1          |
| 3          |
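The counting approach can also be written as relational division with a double `NOT EXISTS`. It keeps a student when no course in their major lacks an `A`-graded enrollment for that student. The sketch below runs the query on the example data with Python's standard-library `sqlite3`, as a PySpark-free cross-check. Whether Spark SQL accepts this nested correlated form has not been verified here. The leading `EXISTS` excludes majors with no courses, mirroring the inner joins above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT, major TEXT);
CREATE TABLE courses (course_id INTEGER PRIMARY KEY, name TEXT, credits INTEGER, major TEXT);
CREATE TABLE enrollments (student_id INTEGER, course_id INTEGER, semester TEXT, grade TEXT,
                          PRIMARY KEY (student_id, course_id, semester));
INSERT INTO students VALUES (1, 'Alice', 'Computer Science'), (2, 'Bob', 'Computer Science'),
                            (3, 'Charlie', 'Mathematics'), (4, 'David', 'Mathematics');
INSERT INTO courses VALUES (101, 'Algorithms', 3, 'Computer Science'),
                           (102, 'Data Structures', 3, 'Computer Science'),
                           (103, 'Calculus', 4, 'Mathematics'),
                           (104, 'Linear Algebra', 4, 'Mathematics');
INSERT INTO enrollments VALUES (1, 101, 'Fall 2023', 'A'), (1, 102, 'Fall 2023', 'A'),
                               (2, 101, 'Fall 2023', 'B'), (2, 102, 'Fall 2023', 'A'),
                               (3, 103, 'Fall 2023', 'A'), (3, 104, 'Fall 2023', 'A'),
                               (4, 103, 'Fall 2023', 'A'), (4, 104, 'Fall 2023', 'B');
""")

# A student qualifies if their major has courses and none of them lacks an A for that student
query = """
SELECT s.student_id
FROM students s
WHERE EXISTS (SELECT 1 FROM courses c WHERE c.major = s.major)
  AND NOT EXISTS (
      SELECT 1 FROM courses c
      WHERE c.major = s.major
        AND NOT EXISTS (
            SELECT 1 FROM enrollments e
            WHERE e.student_id = s.student_id
              AND e.course_id = c.course_id
              AND e.grade = 'A'))
ORDER BY s.student_id
"""
result = [row[0] for row in conn.execute(query)]
print(result)  # [1, 3]
```

Unlike the counting version, this formulation never compares counts, so retakes and duplicate rows cannot inflate the result.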
