## Importing Libraries

In [0]:
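# The wildcard import keeps the cells below short, but it rebinds Python
# built-ins such as sum and max to their Spark column-function versions.
from pyspark.sql.functions import *
from pyspark.sql.types import *
from pyspark.sql.window import Window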

**2173. Longest Winning Streak (Hard)**

**Table: Matches**

| Column Name | Type |
|-------------|------|
| player_id   | int  |
| match_day   | date |
| result      | enum |

(player_id, match_day) is the primary key (combination of columns with unique values) for this table.
Each row of this table contains the ID of a player, the day of the match they played, and the result of that match.
The result column is an ENUM (category) type of ('Win', 'Draw', 'Lose').
 
The winning streak of a player is the number of consecutive wins uninterrupted by draws or losses.

**Write a solution to count the longest winning streak for each player.**

Return the result table in any order.

The result format is in the following example.

**Example 1:**

**Input:** 

**Matches table:**

| player_id | match_day  | result |
|-----------|------------|--------|
| 1         | 2022-01-17 | Win    |
| 1         | 2022-01-18 | Win    |
| 1         | 2022-01-25 | Win    |
| 1         | 2022-01-31 | Draw   |
| 1         | 2022-02-08 | Win    |
| 2         | 2022-02-06 | Lose   |
| 2         | 2022-02-08 | Lose   |
| 3         | 2022-03-30 | Win    |

**Output:**

| player_id | longest_streak |
|-----------|----------------|
| 1         | 3              |
| 2         | 0              |
| 3         | 1              |

**Explanation:** 

Player 1:
- From 2022-01-17 to 2022-01-25, player 1 won 3 consecutive matches.
- On 2022-01-31, player 1 had a draw.
- On 2022-02-08, player 1 won a match.
- The longest winning streak was 3 matches.

Player 2:
- From 2022-02-06 to 2022-02-08, player 2 lost 2 consecutive matches.
- The longest winning streak was 0 matches.

Player 3:
- On 2022-03-30, player 3 won a match.
- The longest winning streak was 1 match.
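
The streak definition can be checked against this example with a short plain-Python sketch. It is not part of the Spark solution; the function name `longest_streaks` and the tuple layout are assumptions made for illustration. It uses `if` comparisons rather than `max`, because the wildcard import at the top of this notebook rebinds `max` to the Spark column function.

```python
def longest_streaks(rows):
    """Return {player_id: longest run of consecutive 'Win' results}.

    rows: iterable of (player_id, match_day, result) tuples, with
    match_day as an ISO 'YYYY-MM-DD' string (hypothetical layout).
    """
    best = {}     # longest streak seen so far, per player
    current = {}  # streak currently in progress, per player
    # Sorting by (player_id, match_day) visits each player's matches in date order.
    for player_id, _match_day, result in sorted(rows):
        if result == "Win":
            current[player_id] = current.get(player_id, 0) + 1
        else:
            current[player_id] = 0  # a draw or loss ends the streak
        if current[player_id] > best.get(player_id, 0):
            best[player_id] = current[player_id]
        else:
            best.setdefault(player_id, 0)  # keep players who never won
    return best


example_rows = [
    (1, "2022-01-17", "Win"), (1, "2022-01-18", "Win"),
    (1, "2022-01-25", "Win"), (1, "2022-01-31", "Draw"),
    (1, "2022-02-08", "Win"), (2, "2022-02-06", "Lose"),
    (2, "2022-02-08", "Lose"), (3, "2022-03-30", "Win"),
]
print(longest_streaks(example_rows))  # {1: 3, 2: 0, 3: 1}
```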
 
**Follow up:** If we are interested in calculating the longest streak without losing (i.e., win or draw), how will your solution change?

In [0]:
matches_data_2173 = [
    (1, "2022-01-17", "Win"),
    (1, "2022-01-18", "Win"),
    (1, "2022-01-25", "Win"),
    (1, "2022-01-31", "Draw"),
    (1, "2022-02-08", "Win"),
    (2, "2022-02-06", "Lose"),
    (2, "2022-02-08", "Lose"),
    (3, "2022-03-30", "Win"),
]

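matches_columns_2173 = ["player_id", "match_day", "result"]
# match_day is supplied as ISO strings; cast it to a date to match the table schema.
matches_df_2173 = spark.createDataFrame(matches_data_2173, matches_columns_2173)\
    .withColumn("match_day", to_date("match_day"))
matches_df_2173.show()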

+---------+----------+------+
|player_id| match_day|result|
+---------+----------+------+
|        1|2022-01-17|   Win|
|        1|2022-01-18|   Win|
|        1|2022-01-25|   Win|
|        1|2022-01-31|  Draw|
|        1|2022-02-08|   Win|
|        2|2022-02-06|  Lose|
|        2|2022-02-08|  Lose|
|        3|2022-03-30|   Win|
+---------+----------+------+



In [0]:
# Order each player's matches chronologically so running totals follow match order.
windowSpec = Window.partitionBy("player_id").orderBy("match_day")

In [0]:
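# is_win marks the rows to count; is_break marks draws and losses,
# which end the current winning streak.
df_flagged_2173 = matches_df_2173\
    .withColumn("is_win", when(col("result") == "Win", 1).otherwise(0))\
    .withColumn("is_break", when(col("result") != "Win", 1).otherwise(0))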

In [0]:
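# The running count of breaks stays constant across consecutive wins,
# so it serves as an identifier for each streak.
df_grouped_2173 = df_flagged_2173\
    .withColumn("streak_group", sum("is_break").over(windowSpec))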
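
To see why a running sum of breaks works as a group key, the sketch below reproduces it in plain Python for player 1's results. Here `itertools.accumulate` stands in for `sum("is_break").over(windowSpec)`; the variable names are illustrative only and this is not the Spark execution path.

```python
from itertools import accumulate

# Player 1's results in match_day order, taken from the example input.
results = ["Win", "Win", "Win", "Draw", "Win"]

# 1 where the streak is interrupted, 0 otherwise (mirrors is_break).
is_break = [0 if r == "Win" else 1 for r in results]

# Running total of breaks so far (mirrors the windowed sum).
streak_group = list(accumulate(is_break))

for r, g in zip(results, streak_group):
    print(f"{r:<5} streak_group={g}")
```

The first three wins share group 0. The draw opens group 1, which also holds the final win; because the draw contributes 0 to `is_win`, group 1 still sums to a streak of 1.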

In [0]:
# Sum the wins inside each streak group; the break row that opens a group adds 0.
df_streaks_2173 = df_grouped_2173\
    .groupBy("player_id", "streak_group")\
    .agg(sum("is_win").alias("streak_len"))

In [0]:
# The longest streak per player is the largest streak_len across that player's groups.
df_streaks_2173\
    .groupBy("player_id")\
    .agg(max("streak_len").alias("longest_streak"))\
    .show()

+---------+--------------+
|player_id|longest_streak|
+---------+--------------+
|        1|             3|
|        2|             0|
|        3|             1|
+---------+--------------+
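
For the follow-up question, only the definition of a break changes: a loss ends the streak, while wins and draws both extend it. In the Spark cells above, that would presumably mean flagging `result != "Lose"` as the counted rows and `result == "Lose"` as the break. The plain-Python sketch below illustrates the same idea; `longest_runs` and its `counted` parameter are hypothetical names, not part of the original solution.

```python
from itertools import groupby


def longest_runs(rows, counted=("Win",)):
    """Longest run, per player, of consecutive results found in `counted`."""
    longest = {}
    ordered = sorted(rows)  # by player_id, then match_day (ISO strings)
    for player_id, matches in groupby(ordered, key=lambda row: row[0]):
        best = 0
        # The inner groupby splits a player's matches into alternating
        # runs of counted and not-counted results.
        for is_counted, run in groupby(matches, key=lambda row: row[2] in counted):
            run_len = 0
            for _ in run:
                run_len += 1
            if is_counted and run_len > best:
                best = run_len
        longest[player_id] = best
    return longest


rows = [
    (1, "2022-01-17", "Win"), (1, "2022-01-18", "Win"),
    (1, "2022-01-25", "Win"), (1, "2022-01-31", "Draw"),
    (1, "2022-02-08", "Win"), (2, "2022-02-06", "Lose"),
    (2, "2022-02-08", "Lose"), (3, "2022-03-30", "Win"),
]
print(longest_runs(rows))                           # {1: 3, 2: 0, 3: 1}
print(longest_runs(rows, counted=("Win", "Draw")))  # {1: 5, 2: 0, 3: 1}
```

With draws counted, player 1's draw on 2022-01-31 no longer interrupts the run, so all five matches form one unbeaten streak.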

