## Importing Libraries

In [0]:
from pyspark.sql.functions import *
from pyspark.sql.types import *
from pyspark.sql.window import Window

**3124. Find Longest Calls (Medium)**

**Table: Contacts**

| Column Name | Type    |
|-------------|---------|
| id          | int     |
| first_name  | varchar |
| last_name   | varchar |

id is the primary key (column with unique values) of this table.
id is a foreign key (reference column) to the Calls table.
Each row of this table contains id, first_name, and last_name.

**Table: Calls**

| Column Name | Type |
|-------------|------|
| contact_id  | int  |
| type        | enum |
| duration    | int  |

(contact_id, type, duration) is the primary key (combination of columns with unique values) of this table.
type is an ENUM (category) type of ('incoming', 'outgoing').
Each row of this table contains information about a call, comprising contact_id, type, and duration in seconds.

**Write a solution to find the three longest incoming and outgoing calls.**

Return the result table ordered by type, duration, and first_name, all in descending order. The duration must be formatted as HH:MM:SS.
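
The HH:MM:SS requirement comes down to two integer divisions. The snippet below is a minimal plain-Python sketch of that arithmetic, independent of Spark. The helper name `format_hms` is hypothetical, and it assumes the duration is a non-negative whole number of seconds.

```python
def format_hms(total_seconds: int) -> str:
    """Format a non-negative number of seconds as zero-padded HH:MM:SS."""
    # 3600 seconds per hour; the remainder holds the leftover seconds.
    hours, remainder = divmod(total_seconds, 3600)
    # 60 seconds per minute; what is left over is the seconds field.
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


print(format_hms(280))  # 00:04:40
print(format_hms(420))  # 00:07:00
```

The same split into hours, minutes, and seconds is what the PySpark solution in this notebook does column by column.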

The result format is in the following example.

**Example 1:**

**Input:**

**Contacts table:**

| id | first_name | last_name |
|----|------------|-----------|
| 1  | John       | Doe       |
| 2  | Jane       | Smith     |
| 3  | Alice      | Johnson   |
| 4  | Michael    | Brown     |
| 5  | Emily      | Davis     |

**Calls table:**

| contact_id | type     | duration |
|------------|----------|----------|
| 1          | incoming | 120      |
| 1          | outgoing | 180      |
| 2          | incoming | 300      |
| 2          | outgoing | 240      |
| 3          | incoming | 150      |
| 3          | outgoing | 360      |
| 4          | incoming | 420      |
| 4          | outgoing | 200      |
| 5          | incoming | 180      |
| 5          | outgoing | 280      |
        
**Output:**

| first_name| type     | duration_formatted|
|-----------|----------|-------------------|
| Alice     | outgoing | 00:06:00          |
| Emily     | outgoing | 00:04:40          |
| Jane      | outgoing | 00:04:00          |
| Michael   | incoming | 00:07:00          |
| Jane      | incoming | 00:05:00          |
| Emily     | incoming | 00:03:00          |


**Explanation:**
- Alice had an outgoing call lasting 6 minutes.
- Emily had an outgoing call lasting 4 minutes and 40 seconds.
- Jane had an outgoing call lasting 4 minutes.
- Michael had an incoming call lasting 7 minutes.
- Jane had an incoming call lasting 5 minutes.
- Emily had an incoming call lasting 3 minutes.

**Note:** Output table is sorted by type, duration, and first_name in descending order.
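
As a cross-check on the expected output, the selection and ordering can be sketched in plain Python on the example data. This is only an illustration under stated assumptions: the function name `top_three_per_type` is hypothetical, ties within a type are broken by first_name in descending order, and calls without a matching contact are skipped (mirroring an inner join).

```python
from collections import defaultdict

# Example data from the problem statement.
contacts = {1: "John", 2: "Jane", 3: "Alice", 4: "Michael", 5: "Emily"}
calls = [
    (1, "incoming", 120), (1, "outgoing", 180),
    (2, "incoming", 300), (2, "outgoing", 240),
    (3, "incoming", 150), (3, "outgoing", 360),
    (4, "incoming", 420), (4, "outgoing", 200),
    (5, "incoming", 180), (5, "outgoing", 280),
]


def top_three_per_type(calls, contacts):
    """Return (first_name, type, duration) for the three longest calls of each type."""
    by_type = defaultdict(list)
    for contact_id, call_type, duration in calls:
        if contact_id in contacts:  # inner-join behaviour: drop unmatched calls
            by_type[call_type].append((call_type, duration, contacts[contact_id]))

    selected = []
    for rows in by_type.values():
        # Longest first; ties broken by first_name descending.
        rows.sort(key=lambda row: (row[1], row[2]), reverse=True)
        selected.extend(rows[:3])

    # Final ordering: type, duration, first_name, all descending.
    selected.sort(reverse=True)
    return [(name, call_type, duration) for call_type, duration, name in selected]


for row in top_three_per_type(calls, contacts):
    print(row)
```

Because `"outgoing"` sorts after `"incoming"` alphabetically, a descending sort on type lists the outgoing calls first, which matches the expected output table.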

In [0]:
contacts_data_3124 = [
    (1, "John", "Doe"),
    (2, "Jane", "Smith"),
    (3, "Alice", "Johnson"),
    (4, "Michael", "Brown"),
    (5, "Emily", "Davis"),
]

contacts_columns_3124 = ["id", "first_name", "last_name"]
contacts_df_3124 = spark.createDataFrame(contacts_data_3124, contacts_columns_3124)
contacts_df_3124.show()

calls_data_3124 = [
    (1, "incoming", 120),
    (1, "outgoing", 180),
    (2, "incoming", 300),
    (2, "outgoing", 240),
    (3, "incoming", 150),
    (3, "outgoing", 360),
    (4, "incoming", 420),
    (4, "outgoing", 200),
    (5, "incoming", 180),
    (5, "outgoing", 280),
]

calls_columns_3124 = ["contact_id", "type", "duration"]
calls_df_3124 = spark.createDataFrame(calls_data_3124, calls_columns_3124)
calls_df_3124.show()

+---+----------+---------+
| id|first_name|last_name|
+---+----------+---------+
|  1|      John|      Doe|
|  2|      Jane|    Smith|
|  3|     Alice|  Johnson|
|  4|   Michael|    Brown|
|  5|     Emily|    Davis|
+---+----------+---------+

+----------+--------+--------+
|contact_id|    type|duration|
+----------+--------+--------+
|         1|incoming|     120|
|         1|outgoing|     180|
|         2|incoming|     300|
|         2|outgoing|     240|
|         3|incoming|     150|
|         3|outgoing|     360|
|         4|incoming|     420|
|         4|outgoing|     200|
|         5|incoming|     180|
|         5|outgoing|     280|
+----------+--------+--------+



In [0]:
# Inner join: keep only calls whose contact_id matches a contact id.
df_3124 = contacts_df_3124.join(
    calls_df_3124, contacts_df_3124.id == calls_df_3124.contact_id, "inner"
)

In [0]:
# Order calls within each type: longest first, ties broken by first_name descending.
window_spec = Window.partitionBy("type").orderBy(desc("duration"), desc("first_name"))

In [0]:
# row_number() numbers the rows 1, 2, 3, ... within each type partition.
df_ranked_3124 = df_3124.withColumn("rank", row_number().over(window_spec))

In [0]:
# Keep the three longest calls of each type.
df_top3_3124 = df_ranked_3124.filter(col("rank") <= 3)

In [0]:
# Split duration (seconds) into hours, minutes, and seconds, then zero-pad each to two digits.
df_top3_3124 = (
    df_top3_3124
    .withColumn("hours", floor(col("duration") / 3600))
    .withColumn("minutes", floor((col("duration") % 3600) / 60))
    .withColumn("seconds", col("duration") % 60)
    .withColumn(
        "duration_formatted",
        concat_ws(
            ":",
            lpad(col("hours").cast("string"), 2, "0"),
            lpad(col("minutes").cast("string"), 2, "0"),
            lpad(col("seconds").cast("string"), 2, "0"),
        ),
    )
)

In [0]:
# Sort on the numeric duration first, then keep only the output columns.
(
    df_top3_3124
    .orderBy(col("type").desc(), col("duration").desc(), col("first_name").desc())
    .select("first_name", "type", "duration_formatted")
    .display()
)

first_name,type,duration_formatted
Alice,outgoing,00:06:00
Emily,outgoing,00:04:40
Jane,outgoing,00:04:00
Michael,incoming,00:07:00
Jane,incoming,00:05:00
Emily,incoming,00:03:00
