## Importing Libraries

In [0]:
# Explicit imports avoid the wildcard shadowing Python built-ins such as sum, min and max
from pyspark.sql.functions import col, first, last, to_date, to_timestamp
from pyspark.sql.window import Window

**1972. First and Last Call On the Same Day (Hard)**

**Table: Calls**

| Column Name  | Type     |
|--------------|----------|
| caller_id    | int      |
| recipient_id | int      |
| call_time    | datetime |

(caller_id, recipient_id, call_time) is the primary key (combination of columns with unique values) for this table.
Each row contains information about the time of a phone call between caller_id and recipient_id.

**Write a solution to report the IDs of the users whose first and last calls on any day were with the same person. Calls are counted regardless of whether the user was the caller or the recipient.**

Return the result table in any order.

The result format is in the following example.

**Example 1:**

**Input:** 

**Calls table:**

| caller_id | recipient_id | call_time           |
|-----------|--------------|---------------------|
| 8         | 4            | 2021-08-24 17:46:07 |
| 4         | 8            | 2021-08-24 19:57:13 |
| 5         | 1            | 2021-08-11 05:28:44 |
| 8         | 3            | 2021-08-17 04:04:15 |
| 11        | 3            | 2021-08-17 13:07:00 |
| 8         | 11           | 2021-08-17 22:22:22 |

**Output:**

| user_id |
|---------|
| 1       |
| 4       |
| 5       |
| 8       |

**Explanation:** 
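On 2021-08-24, the first and last calls of the day for user 8 were both with user 4, so user 8 is included in the answer.
Similarly, user 4's first and last calls on 2021-08-24 were both with user 8, so user 4 is included in the answer.
On 2021-08-11, users 1 and 5 had a single call with each other. Because it was the only call of the day for both of them, it is both their first and their last call, so both are included in the answer.
On 2021-08-17, user 8 talked first with user 3 and last with user 11, user 3 talked first with user 8 and last with user 11, and user 11 talked first with user 3 and last with user 8. None of them qualify on that day, which is why users 3 and 11 are absent from the output.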
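As a cross-check on the expected output, the definition can be applied directly in plain Python, without Spark. This is a minimal sketch, not part of the original solution: the helper name `same_first_last_users` is hypothetical, timestamps are assumed to use the `YYYY-MM-DD HH:MM:SS` format shown above, and ties on `call_time` are broken by `other_id` only as a side effect of tuple sorting.

```python
from collections import defaultdict
from datetime import datetime

# Example rows from the Calls table above: (caller_id, recipient_id, call_time)
calls = [
    (8, 4, "2021-08-24 17:46:07"),
    (4, 8, "2021-08-24 19:57:13"),
    (5, 1, "2021-08-11 05:28:44"),
    (8, 3, "2021-08-17 04:04:15"),
    (11, 3, "2021-08-17 13:07:00"),
    (8, 11, "2021-08-17 22:22:22"),
]


def same_first_last_users(calls):
    """Hypothetical helper: users whose first and last call on some day were with the same person."""
    # (user_id, date) -> list of (timestamp, other_id); each call is recorded for both participants
    per_user_day = defaultdict(list)
    for caller, recipient, ts in calls:
        t = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        per_user_day[(caller, t.date())].append((t, recipient))
        per_user_day[(recipient, t.date())].append((t, caller))

    result = set()
    for (user, _day), day_calls in per_user_day.items():
        day_calls.sort()  # chronological; equal timestamps fall back to other_id
        if day_calls[0][1] == day_calls[-1][1]:
            result.add(user)
    return result


print(sorted(same_first_last_users(calls)))  # [1, 4, 5, 8]
```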

In [0]:
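# Reuse the active session (predefined as `spark` on Databricks) or create one locally
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

calls_data_1972 = [
    (8, 4, "2021-08-24 17:46:07"),
    (4, 8, "2021-08-24 19:57:13"),
    (5, 1, "2021-08-11 05:28:44"),
    (8, 3, "2021-08-17 04:04:15"),
    (11, 3, "2021-08-17 13:07:00"),
    (8, 11, "2021-08-17 22:22:22"),
]

calls_columns_1972 = ["caller_id", "recipient_id", "call_time"]
calls_df_1972 = spark.createDataFrame(calls_data_1972, calls_columns_1972)
# call_time arrives as a string; cast it to a timestamp so ordering and date
# extraction do not depend on the string format
calls_df_1972 = calls_df_1972.withColumn("call_time", to_timestamp("call_time"))
calls_df_1972.show()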

+---------+------------+-------------------+
|caller_id|recipient_id|          call_time|
+---------+------------+-------------------+
|        8|           4|2021-08-24 17:46:07|
|        4|           8|2021-08-24 19:57:13|
|        5|           1|2021-08-11 05:28:44|
|        8|           3|2021-08-17 04:04:15|
|       11|           3|2021-08-17 13:07:00|
|        8|          11|2021-08-17 22:22:22|
+---------+------------+-------------------+



In [0]:
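# Emit every call twice, once from each participant's point of view, so that users
# who only ever appear as recipient_id (e.g. user 1) are also evaluated.
# union() matches columns by position, so both selects list them in the same order.
calls_normalized_df_1972 = calls_df_1972\
    .select(col("caller_id").alias("user_id"), col("recipient_id").alias("other_id"), "call_time")\
    .union(
        calls_df_1972.select(col("recipient_id").alias("user_id"), col("caller_id").alias("other_id"), "call_time")
    )\
    .withColumn("call_date", to_date("call_time"))  # calendar day of each call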
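The reason for stacking both directions shows up on the example rows: reading only `caller_id` never produces a row for users 1 and 3, who appear only as recipients, and user 1 is part of the expected output. A plain-Python sketch of the same reshaping (illustrative only; the variable names are hypothetical):

```python
# Example rows from the Calls table: (caller_id, recipient_id, call_time)
calls = [
    (8, 4, "2021-08-24 17:46:07"),
    (4, 8, "2021-08-24 19:57:13"),
    (5, 1, "2021-08-11 05:28:44"),
    (8, 3, "2021-08-17 04:04:15"),
    (11, 3, "2021-08-17 13:07:00"),
    (8, 11, "2021-08-17 22:22:22"),
]

# Caller-side view only: (user_id, other_id, call_time)
caller_side = [(c, r, t) for c, r, t in calls]
# Both directions, mirroring the union of the two selects above
normalized = caller_side + [(r, c, t) for c, r, t in calls]

caller_only_users = {u for u, _, _ in caller_side}
all_users = {u for u, _, _ in normalized}

print(len(normalized))                         # 12
print(sorted(all_users - caller_only_users))   # [1, 3]
```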

In [0]:
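# One partition per user and day, ordered by call time. The frame covers the whole
# partition so that last() sees the day's final call, not only the rows up to the current one.
windowSpec = Window.partitionBy("user_id", "call_date").orderBy("call_time")\
    .rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)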
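The explicit `rowsBetween(...)` frame matters. When a window has `orderBy` but no frame, Spark uses a growing frame from the start of the partition up to the current row, so `last("other_id")` would return the current row's value. Without ties on `call_time`, the first row of every partition would then satisfy `first_other == last_other`, and every user would pass the filter. The plain-Python simulation below is a simplification that ignores ties (a range frame would also include rows with the same `call_time`); the function names are hypothetical. It compares the two frames on user 8's calls on 2021-08-17:

```python
# One (user_id, call_date) partition, already ordered by call_time:
# user 8 on 2021-08-17 from the example, as (call_time, other_id).
partition = [("2021-08-17 04:04:15", 3), ("2021-08-17 22:22:22", 11)]


def last_growing_frame(rows):
    """Mimic last() over the default ordered frame (partition start .. current row), ignoring ties."""
    return [rows[: i + 1][-1][1] for i in range(len(rows))]


def last_full_frame(rows):
    """Mimic last() over rowsBetween(unboundedPreceding, unboundedFollowing)."""
    return [rows[-1][1] for _ in rows]


first_other = [partition[0][1] for _ in partition]  # identical under both frames
print(first_other)                    # [3, 3]
print(last_growing_frame(partition))  # [3, 11]
print(last_full_frame(partition))     # [11, 11]

# With the growing frame, row 0 compares 3 with 3 and would wrongly qualify user 8 for this day.
false_hit = any(f == l for f, l in zip(first_other, last_growing_frame(partition)))
true_hit = any(f == l for f, l in zip(first_other, last_full_frame(partition)))
print(false_hit, true_hit)  # True False
```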

In [0]:
# Attach the other party of the user's first and last call of that day to every row
calls_with_ranks_df_1972 = calls_normalized_df_1972\
    .withColumn("first_other", first("other_id").over(windowSpec))\
    .withColumn("last_other", last("other_id").over(windowSpec))
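A window is not the only option. The same answer can be reached by aggregating once per `(user_id, call_date)` and keeping the `other_id` attached to the earliest and the latest `call_time`; in Spark this corresponds to `min_by`/`max_by` (exposed in `pyspark.sql.functions` from Spark 3.3, if your version has them). The sketch below shows that single-pass idea in plain Python on the example data. It is an assumed alternative, not the approach used in this notebook, and on ties in `call_time` it keeps whichever call was seen first.

```python
from datetime import datetime

# Example rows from the Calls table: (caller_id, recipient_id, call_time)
calls = [
    (8, 4, "2021-08-24 17:46:07"),
    (4, 8, "2021-08-24 19:57:13"),
    (5, 1, "2021-08-11 05:28:44"),
    (8, 3, "2021-08-17 04:04:15"),
    (11, 3, "2021-08-17 13:07:00"),
    (8, 11, "2021-08-17 22:22:22"),
]

# (user_id, date) -> [earliest_time, other_at_earliest, latest_time, other_at_latest]
extremes = {}
for caller, recipient, ts in calls:
    t = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    for user, other in ((caller, recipient), (recipient, caller)):
        key = (user, t.date())
        if key not in extremes:
            extremes[key] = [t, other, t, other]
            continue
        e = extremes[key]
        if t < e[0]:
            e[0], e[1] = t, other  # new earliest call, like min_by(other_id, call_time)
        if t > e[2]:
            e[2], e[3] = t, other  # new latest call, like max_by(other_id, call_time)

result = {user for (user, _day), e in extremes.items() if e[1] == e[3]}
print(sorted(result))  # [1, 4, 5, 8]
```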

In [0]:
# Keep users with at least one day whose first and last call were with the same person
calls_with_ranks_df_1972\
    .filter(col("first_other") == col("last_other"))\
    .select("user_id")\
    .distinct()\
    .show()

+-------+
|user_id|
+-------+
|      8|
|      1|
|      5|
|      4|
+-------+

