## Importing Libraries

In [0]:
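# The star import brings in PySpark's sum() and max(), which shadow the Python builtins in later cells.
from pyspark.sql.functions import *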
from pyspark.sql.types import *
from pyspark.sql.window import Window

**3390. Longest Team Pass Streak (Hard)**

**Table: Teams**

| Column Name | Type    |
|-------------|---------|
| player_id   | int     |
| team_name   | varchar | 

player_id is the unique key for this table.
Each row contains the unique identifier of a player and the name of one of the teams participating in the match.

**Table: Passes**

| Column Name | Type    |
|-------------|---------|
| pass_from   | int     |
| time_stamp  | varchar |
| pass_to     | int     |

(pass_from, time_stamp) is the unique key for this table.
pass_from is a foreign key to player_id from Teams table.
Each row represents a pass made during a match. time_stamp is the time in minutes and seconds (00:00-90:00) at which the pass was made,
and pass_to is the player_id of the player receiving the pass.

**Write a solution to find the longest successful pass streak for each team during the match. The rules are as follows:**

- A successful pass streak is defined as consecutive passes where:
  - Both the pass_from and pass_to players belong to the same team
- A streak breaks when:
  - The pass is intercepted (received by a player from the opposing team)

Return the result table ordered by team_name in ascending order.

The result format is in the following example.

**Example:**

**Input:**

**Teams table:**

| player_id | team_name |
|-----------|-----------|
| 1         | Arsenal   |
| 2         | Arsenal   |
| 3         | Arsenal   |
| 4         | Arsenal   |
| 5         | Chelsea   |
| 6         | Chelsea   |
| 7         | Chelsea   |
| 8         | Chelsea   |

**Passes table:**

| pass_from | time_stamp | pass_to |
|-----------|------------|---------|
| 1         | 00:05      | 2       |
| 2         | 00:07      | 3       |
| 3         | 00:08      | 4       |
| 4         | 00:10      | 5       |
| 6         | 00:15      | 7       |
| 7         | 00:17      | 8       |
| 8         | 00:20      | 6       |
| 6         | 00:22      | 5       |
| 1         | 00:25      | 2       |
| 2         | 00:27      | 3       |

**Output:**

| team_name | longest_streak |
|-----------|----------------|
| Arsenal   | 3              |
| Chelsea   | 4              |


**Explanation:**
- Arsenal's streaks:
  - First streak: 3 passes (1→2→3→4) ended when player 4 passed to Chelsea's player 5
  - Second streak: 2 passes (1→2→3)
  - Longest streak = 3
- Chelsea's streaks:
  - First streak: 4 passes (6→7→8→6→5)
  - Longest streak = 4
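
The explanation above can be traced by hand in plain Python. This is a minimal sketch rather than the Spark solution used in this notebook: `longest_streaks` is a hypothetical helper, and it assumes that a team's streak resets only when one of its own passes is intercepted and that zero-padded `MM:SS` strings sort chronologically.

```python
def longest_streaks(teams, passes):
    """Return [(team_name, longest_streak)] sorted by team name."""
    team_of = dict(teams)  # player_id -> team_name
    current, best = {}, {}
    # Assumption: zero-padded "MM:SS" strings sort chronologically as plain text.
    for pass_from, _time_stamp, pass_to in sorted(passes, key=lambda p: p[1]):
        team = team_of[pass_from]
        if team_of[pass_to] == team:
            # Same-team pass: extend this team's running streak.
            current[team] = current.get(team, 0) + 1
            best[team] = max(best.get(team, 0), current[team])
        else:
            # Intercepted pass: this team's running streak ends.
            current[team] = 0
    return sorted(best.items())


teams = [(1, "Arsenal"), (2, "Arsenal"), (3, "Arsenal"), (4, "Arsenal"),
         (5, "Chelsea"), (6, "Chelsea"), (7, "Chelsea"), (8, "Chelsea")]
passes = [(1, "00:05", 2), (2, "00:07", 3), (3, "00:08", 4), (4, "00:10", 5),
          (6, "00:15", 7), (7, "00:17", 8), (8, "00:20", 6), (6, "00:22", 5),
          (1, "00:25", 2), (2, "00:27", 3)]

result = longest_streaks(teams, passes)
print(result)  # [('Arsenal', 3), ('Chelsea', 4)]
```

On the example input this matches the expected output table. A team with no successful pass would not appear in the result of this sketch.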

In [0]:
teams_data_3390 = [
    (1, "Arsenal"),
    (2, "Arsenal"),
    (3, "Arsenal"),
    (4, "Arsenal"),
    (5, "Chelsea"),
    (6, "Chelsea"),
    (7, "Chelsea"),
    (8, "Chelsea")
]

teams_columns_3390 = ["player_id", "team_name"]
teams_df_3390 = spark.createDataFrame(teams_data_3390, teams_columns_3390)
teams_df_3390.show()

passes_data_3390 = [
    (1, "00:05", 2),  
    (2, "00:07", 3),  
    (3, "00:08", 4),  
    (4, "00:10", 5),  
    (6, "00:15", 7),  
    (7, "00:17", 8),  
    (8, "00:20", 6),  
    (6, "00:22", 5),  
    (1, "00:25", 2),  
    (2, "00:27", 3)   
]

passes_columns_3390 = ["pass_from", "time_stamp", "pass_to"]
passes_df_3390 = spark.createDataFrame(passes_data_3390, passes_columns_3390)
passes_df_3390.show()

+---------+---------+
|player_id|team_name|
+---------+---------+
|        1|  Arsenal|
|        2|  Arsenal|
|        3|  Arsenal|
|        4|  Arsenal|
|        5|  Chelsea|
|        6|  Chelsea|
|        7|  Chelsea|
|        8|  Chelsea|
+---------+---------+

+---------+----------+-------+
|pass_from|time_stamp|pass_to|
+---------+----------+-------+
|        1|     00:05|      2|
|        2|     00:07|      3|
|        3|     00:08|      4|
|        4|     00:10|      5|
|        6|     00:15|      7|
|        7|     00:17|      8|
|        8|     00:20|      6|
|        6|     00:22|      5|
|        1|     00:25|      2|
|        2|     00:27|      3|
+---------+----------+-------+



In [0]:
# Attach the passer's team and the receiver's team to every pass (inner joins on player_id).
passes_df_3390 = passes_df_3390 \
    .join(teams_df_3390.withColumnRenamed("player_id", "pass_from"), on="pass_from") \
    .withColumnRenamed("team_name", "from_team") \
    .join(teams_df_3390.withColumnRenamed("player_id", "pass_to"), on="pass_to") \
    .withColumnRenamed("team_name", "to_team")

In [0]:
# Convert the "MM:SS" time_stamp into seconds so passes can be ordered numerically.
passes_df_3390 = passes_df_3390 \
    .withColumn(
        "total_seconds",
        col("time_stamp").substr(1, 2).cast("int") * 60 + col("time_stamp").substr(4, 2).cast("int")
    ) \
    .orderBy("total_seconds")
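
The same arithmetic can be checked by hand in plain Python. This is a small sketch for verification only: `to_seconds` is a hypothetical helper, and it assumes every `time_stamp` follows the zero-padded `MM:SS` layout described in the problem statement.

```python
def to_seconds(time_stamp: str) -> int:
    """Convert a zero-padded "MM:SS" string into total seconds."""
    minutes, seconds = time_stamp.split(":")
    return int(minutes) * 60 + int(seconds)


print(to_seconds("00:05"))  # 5
print(to_seconds("00:27"))  # 27
print(to_seconds("90:00"))  # 5400
```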

In [0]:
# Flag a pass as successful (1) when passer and receiver are on the same team, otherwise 0.
passes_df_3390 = passes_df_3390 \
    .withColumn("successful", when(col("from_team") == col("to_team"), 1).otherwise(0))

In [0]:
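# One global timeline: with no partitionBy, Spark moves every row into a single partition for this window.
window_spec = Window.orderBy("total_seconds")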

In [0]:
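# Running count of failed passes: successful passes between two failures share one streak_group value.
# sum() here is pyspark.sql.functions.sum from the star import, not the Python builtin.
passes_df_3390 = passes_df_3390 \
    .withColumn("streak_group", sum(when(col("successful") == 0, 1).otherwise(0)).over(window_spec))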
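
To see what the running sum assigns to each row, the same idea can be sketched in plain Python with `itertools.accumulate`. The `successful` list below is written out by hand from the example passes in time order, so it is an illustration under that assumption and not output captured from Spark.

```python
from itertools import accumulate

# Success flags of the ten example passes in time order
# (1 = same-team pass, 0 = intercepted; only the 00:10 pass from player 4 to player 5 fails).
successful = [1, 1, 1, 0, 1, 1, 1, 1, 1, 1]

# Running count of failed passes up to and including each row.
streak_group = list(accumulate(1 - flag for flag in successful))
print(streak_group)  # [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
```

Group 1 holds successful passes from both Chelsea and Arsenal, which is why the later aggregation groups by `from_team` as well as `streak_group`.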



In [0]:
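# Count the successful passes per team within each streak_group to get the length of every streak.
streaks_df_3390 = passes_df_3390 \
    .filter(col("successful") == 1) \
    .groupBy("from_team", "streak_group") \
    .count() \
    .withColumnRenamed("count", "streak_length")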



In [0]:
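# Keep each team's longest streak. display() is the Databricks notebook renderer; use show() elsewhere.
streaks_df_3390 \
    .groupBy("from_team") \
    .agg(max("streak_length").alias("longest_streak")) \
    .withColumnRenamed("from_team", "team_name") \
    .orderBy("team_name") \
    .display()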



team_name,longest_streak
Arsenal,3
Chelsea,4
