## Importing Libraries

In [0]:
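# NOTE: the wildcard import shadows Python's built-in sum() and max() with the
# Spark column functions of the same name; the cells below rely on that.
from pyspark.sql.functions import *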
from pyspark.sql.types import *
from pyspark.sql.window import Window

**2112. The Airport With the Most Traffic (Medium)**

**Table: Flights**

| Column Name       | Type |
|-------------------|------|
| departure_airport | int  |
| arrival_airport   | int  |
| flights_count     | int  |

(departure_airport, arrival_airport) is the primary key (the combination of columns with unique values) for this table.
Each row of this table indicates that there were flights_count flights that departed from departure_airport and arrived at arrival_airport.
 
**Write a solution to report the ID of the airport with the most traffic. The airport with the most traffic is the airport that has the largest total number of flights that either departed from or arrived at the airport. If there is more than one airport with the most traffic, report them all.**

Return the result table in any order.

The result format is in the following example.

**Example 1:**

**Input:** 

**Flights table:**

| departure_airport | arrival_airport | flights_count |
|-------------------|-----------------|---------------|
| 1                 | 2               | 4             |
| 2                 | 1               | 5             |
| 2                 | 4               | 5             |

**Output:**

| airport_id |
|------------|
| 2          |

**Explanation:** 
- Airport 1 was engaged with 9 flights (4 departures, 5 arrivals).
- Airport 2 was engaged with 14 flights (10 departures, 4 arrivals).
- Airport 4 was engaged with 5 flights (5 arrivals).
- The airport with the most traffic is airport 2.
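
The arithmetic in this explanation can be reproduced with a short plain-Python tally. This is only an illustrative sketch, independent of Spark; the names `flights`, `departures`, `arrivals`, and `totals` are introduced here for the example and are not part of the problem statement.

```python
from collections import defaultdict

# Rows of the Example 1 Flights table:
# (departure_airport, arrival_airport, flights_count)
flights = [(1, 2, 4), (2, 1, 5), (2, 4, 5)]

departures = defaultdict(int)  # flights leaving each airport
arrivals = defaultdict(int)    # flights landing at each airport
for dep, arr, count in flights:
    departures[dep] += count
    arrivals[arr] += count

# Total traffic per airport = departures + arrivals.
airports = sorted(set(departures) | set(arrivals))
totals = {a: departures[a] + arrivals[a] for a in airports}

for a in airports:
    print(a, departures[a], arrivals[a], totals[a])
```

Each airport's total counts a route once at its departure end and once at its arrival end, which is why a single row contributes to two airports.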

**Example 2:**

**Input:** 

**Flights table:**

| departure_airport | arrival_airport | flights_count |
|-------------------|-----------------|---------------|
| 1                 | 2               | 4             |
| 2                 | 1               | 5             |
| 3                 | 4               | 5             |
| 4                 | 3               | 4             |
| 5                 | 6               | 7             |

**Output:**

| airport_id |
|------------|
| 1          |
| 2          |
| 3          |
| 4          |

**Explanation:** 
- Airport 1 was engaged with 9 flights (4 departures, 5 arrivals).
- Airport 2 was engaged with 9 flights (5 departures, 4 arrivals).
- Airport 3 was engaged with 9 flights (5 departures, 4 arrivals).
- Airport 4 was engaged with 9 flights (4 departures, 5 arrivals).
- Airport 5 was engaged with 7 flights (7 departures).
- Airport 6 was engaged with 7 flights (7 arrivals).
- The airports with the most traffic are airports 1, 2, 3, and 4.
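
The tie rule can be checked with a small plain-Python helper. This is a minimal sketch under the assumption that the table fits in memory; `busiest_airports` is a hypothetical name used only for this illustration.

```python
from collections import Counter


def busiest_airports(flights):
    """Return every airport ID tied for the largest total traffic."""
    traffic = Counter()
    for dep, arr, count in flights:
        # A route adds its flights to both of its endpoints.
        traffic[dep] += count
        traffic[arr] += count
    if not traffic:
        return []
    top = max(traffic.values())
    # Keep all airports at the maximum, not just the first one found.
    return sorted(a for a, t in traffic.items() if t == top)


# Rows of the Example 2 Flights table.
example_2 = [(1, 2, 4), (2, 1, 5), (3, 4, 5), (4, 3, 4), (5, 6, 7)]
result = busiest_airports(example_2)
print(result)
```

Comparing every total against the maximum, rather than sorting and taking the first row, is what lets all four tied airports appear in the result.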

In [0]:
flights_data_2112 = [
    (1, 2, 4),
    (2, 1, 5),
    (2, 4, 5)
]

flights_columns_2112 = ["departure_airport", "arrival_airport", "flights_count"]

flights_df_2112 = spark.createDataFrame(flights_data_2112, flights_columns_2112)
flights_df_2112.show()

+-----------------+---------------+-------------+
|departure_airport|arrival_airport|flights_count|
+-----------------+---------------+-------------+
|                1|              2|            4|
|                2|              1|            5|
|                2|              4|            5|
+-----------------+---------------+-------------+



In [0]:
# Total flights departing from each airport.
departures_df_2112 = (
    flights_df_2112
    .groupBy("departure_airport")
    .agg(sum("flights_count").alias("departures"))
    .withColumnRenamed("departure_airport", "airport_id")
)

In [0]:
# Total flights arriving at each airport.
arrivals_df_2112 = (
    flights_df_2112
    .groupBy("arrival_airport")
    .agg(sum("flights_count").alias("arrivals"))
    .withColumnRenamed("arrival_airport", "airport_id")
)

In [0]:
# Stack departures and arrivals; allowMissingColumns (Spark 3.1+) fills the
# column absent from each side with null, which coalesce() turns into 0.
traffic_df_2112 = (
    departures_df_2112
    .unionByName(arrivals_df_2112, allowMissingColumns=True)
    .groupBy("airport_id")
    .agg(
        sum(coalesce(col("departures"), lit(0))).alias("total_departures"),
        sum(coalesce(col("arrivals"), lit(0))).alias("total_arrivals"),
    )
    .withColumn("total_traffic", col("total_departures") + col("total_arrivals"))
)


In [0]:
# Despite the _df suffix, this is a plain Python integer: the largest total_traffic.
max_traffic_df_2112 = (
    traffic_df_2112
    .agg(max("total_traffic").alias("max_traffic"))
    .collect()[0]["max_traffic"]
)


In [0]:
# Keep every airport whose traffic equals the maximum, so ties are all reported.
traffic_df_2112 \
    .filter(col("total_traffic") == max_traffic_df_2112) \
    .select("airport_id") \
    .show()

+----------+
|airport_id|
+----------+
|         2|
+----------+

