## Importing Libraries

In [0]:
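# The wildcard import replaces Python's built-in sum, min, and max with Spark's column functions, which the cells below rely on.
from pyspark.sql.functions import *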
from pyspark.sql.types import *
from pyspark.sql.window import Window

**2494. Merge Overlapping Events in the Same Hall (Hard)**

**Table: HallEvents**

| Column Name | Type |
|-------------|------|
| hall_id     | int  |
| start_day   | date |
| end_day     | date |

This table may contain duplicate rows.
Each row of this table indicates the start day and end day of an event and the hall in which the event is held.
 
**Write a solution to merge all the overlapping events that are held in the same hall. Two events overlap if they have at least one day in common.**

Return the result table in any order.

The result format is in the following example.

**Example 1:**

**Input:** 

**HallEvents table:**
| hall_id | start_day  | end_day    |
|---------|------------|------------|
| 1       | 2023-01-13 | 2023-01-14 |
| 1       | 2023-01-14 | 2023-01-17 |
| 1       | 2023-01-18 | 2023-01-25 |
| 2       | 2022-12-09 | 2022-12-23 |
| 2       | 2022-12-13 | 2022-12-17 |
| 3       | 2022-12-01 | 2023-01-30 |

**Output:** 
| hall_id | start_day  | end_day    |
|---------|------------|------------|
| 1       | 2023-01-13 | 2023-01-17 |
| 1       | 2023-01-18 | 2023-01-25 |
| 2       | 2022-12-09 | 2022-12-23 |
| 3       | 2022-12-01 | 2023-01-30 |

**Explanation:** There are three halls.

Hall 1:
- The two events ["2023-01-13", "2023-01-14"] and ["2023-01-14", "2023-01-17"] overlap. We merge them into one event ["2023-01-13", "2023-01-17"].
- The event ["2023-01-18", "2023-01-25"] does not overlap with any other event, so we leave it as it is.

Hall 2:
- The two events ["2022-12-09", "2022-12-23"] and ["2022-12-13", "2022-12-17"] overlap. We merge them into one event ["2022-12-09", "2022-12-23"].

Hall 3:
- The hall has only one event, so we return it. Note that we only consider the events of each hall separately.
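
The merging rule in the explanation can be checked outside Spark. The following is a minimal sketch in plain Python, not the notebook's PySpark solution; the helper name `merge_hall_events` is hypothetical. It sorts each hall's events by start day and extends the current merged event whenever the next event starts on or before its end.

```python
from collections import defaultdict
from datetime import date


def merge_hall_events(rows):
    """Merge overlapping (hall_id, start_day, end_day) rows per hall.

    Hypothetical helper for illustration. Days are inclusive, so two events
    overlap when one starts on or before the day the other ends.
    """
    by_hall = defaultdict(set)  # a set also drops duplicate rows
    for hall_id, start, end in rows:
        by_hall[hall_id].add((date.fromisoformat(start), date.fromisoformat(end)))

    merged = []
    for hall_id in sorted(by_hall):
        current_start, current_end = None, None
        for start, end in sorted(by_hall[hall_id]):
            if current_start is not None and start <= current_end:
                # Shares at least one day with the current merged event: extend it.
                current_end = max(current_end, end)
            else:
                if current_start is not None:
                    merged.append((hall_id, current_start.isoformat(), current_end.isoformat()))
                current_start, current_end = start, end
        merged.append((hall_id, current_start.isoformat(), current_end.isoformat()))
    return merged


sample_rows = [
    (1, "2023-01-13", "2023-01-14"),
    (1, "2023-01-14", "2023-01-17"),
    (1, "2023-01-18", "2023-01-25"),
    (2, "2022-12-09", "2022-12-23"),
    (2, "2022-12-13", "2022-12-17"),
    (3, "2022-12-01", "2023-01-30"),
]
merged_sample = merge_hall_events(sample_rows)
```

On the example input, `merged_sample` holds the four rows of the expected output table, which makes it a convenient cross-check for a Spark result.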

In [0]:
hall_events_data_2494 = [
    (1, "2023-01-13", "2023-01-14"),
    (1, "2023-01-14", "2023-01-17"),
    (1, "2023-01-18", "2023-01-25"),
    (2, "2022-12-09", "2022-12-23"),
    (2, "2022-12-13", "2022-12-17"),
    (3, "2022-12-01", "2023-01-30"),
]

hall_events_columns_2494 = ["hall_id", "start_day", "end_day"]
hall_events_df_2494 = spark.createDataFrame(hall_events_data_2494, hall_events_columns_2494)
hall_events_df_2494.show()


+-------+----------+----------+
|hall_id| start_day|   end_day|
+-------+----------+----------+
|      1|2023-01-13|2023-01-14|
|      1|2023-01-14|2023-01-17|
|      1|2023-01-18|2023-01-25|
|      2|2022-12-09|2022-12-23|
|      2|2022-12-13|2022-12-17|
|      3|2022-12-01|2023-01-30|
+-------+----------+----------+



In [0]:
# Cast the string columns to DateType so later comparisons operate on dates.
hall_events_df_2494 = hall_events_df_2494 \
    .withColumn("start_day", to_date("start_day")) \
    .withColumn("end_day", to_date("end_day"))

In [0]:
hall_events_df_2494 = hall_events_df_2494.dropDuplicates()

In [0]:
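# Order by start_day, then end_day, so rows with the same start are processed in a deterministic order.
windowSpec = Window.partitionBy("hall_id").orderBy("start_day", "end_day")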

In [0]:
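# prev_end is the latest end_day among all earlier events in the hall (a running max).
# lag("end_day") alone would split an event that is nested inside a longer, earlier one.
hall_events_df_2494 = hall_events_df_2494 \
    .withColumn("prev_end", max("end_day").over(windowSpec.rowsBetween(Window.unboundedPreceding, -1)))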

In [0]:
# An event opens a new group when it is the first in its hall or starts after prev_end.
hall_events_df_2494 = hall_events_df_2494 \
    .withColumn(
        "new_group",
        when(col("prev_end").isNull() | (col("start_day") > col("prev_end")), 1).otherwise(0)
    )
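
Whether a row opens a new group depends on what `prev_end` holds. As a hedged illustration in plain Python (hypothetical function names, integer day numbers instead of dates), the sketch below compares two choices on a hall where short events are nested inside a longer one: the previous row's end alone, and the latest end seen so far. Only the second keeps the third event in the same group as the first.

```python
def flags_previous_end(events):
    """Flag 1 when an event starts after the previous row's end (lag-style)."""
    flags, prev_end = [], None
    for start, end in sorted(events):
        flags.append(1 if prev_end is None or start > prev_end else 0)
        prev_end = end
    return flags


def flags_running_max_end(events):
    """Flag 1 when an event starts after the latest end seen so far."""
    flags, latest_end = [], None
    for start, end in sorted(events):
        flags.append(1 if latest_end is None or start > latest_end else 0)
        latest_end = end if latest_end is None else max(latest_end, end)
    return flags


# Days 1-10 contain both 2-3 and 5-6, so all three events belong to one group.
nested_events = [(1, 10), (2, 3), (5, 6)]
lag_flags = flags_previous_end(nested_events)             # [1, 0, 1]: wrongly splits
running_max_flags = flags_running_max_end(nested_events)  # [1, 0, 0]: one group
```

The example input in this notebook has no nested-then-later event, so both variants happen to agree on it; the difference only shows up on data like `nested_events`.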

In [0]:
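# Running sum of the flags: every row of the same merged event gets the same group_id.
hall_events_df_2494 = hall_events_df_2494 \
    .withColumn(
        "group_id",
        sum("new_group").over(windowSpec.rowsBetween(Window.unboundedPreceding, Window.currentRow))
    )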
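
The running sum works because each 1 marks the first row of a merged event, so the cumulative total stays constant within a group and increases by one at every boundary. A small plain-Python sketch with made-up flag values (not taken from the notebook's data):

```python
from itertools import accumulate

# Hypothetical 0/1 flags for five events of one hall, ordered by start day.
new_group_flags = [1, 0, 1, 0, 0]

# Cumulative sum: rows sharing a value belong to the same merged event.
group_ids = list(accumulate(new_group_flags))  # [1, 1, 2, 2, 2]
```

Grouping by the hall and this id, then taking the earliest start and latest end, yields one row per merged event.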

In [0]:
hall_events_df_2494 \
    .groupBy("hall_id", "group_id") \
    .agg(
        min("start_day").alias("start_day"),
        max("end_day").alias("end_day")
    ) \
    .select("hall_id", "start_day", "end_day") \
    .orderBy("hall_id", "start_day") \
    .display()

hall_id,start_day,end_day
1,2023-01-13,2023-01-17
1,2023-01-18,2023-01-25
2,2022-12-09,2022-12-23
3,2022-12-01,2023-01-30
