## Importing Libraries

In [0]:
from pyspark.sql.functions import *
from pyspark.sql.types import *
from pyspark.sql.window import Window

**3384. Team Dominance by Pass Success (Hard)**

**Table: Teams**

| Column Name | Type    |
|-------------|---------|
| player_id   | int     |
| team_name   | varchar | 

player_id is the unique key for this table.
Each row contains a player's unique identifier and the name of that player's team, which is one of the teams participating in the match.

**Table: Passes**

| Column Name | Type    |
|-------------|---------|
| pass_from   | int     |
| time_stamp  | varchar |
| pass_to     | int     |

(pass_from, time_stamp) is the primary key for this table.
pass_from is a foreign key to player_id from Teams table.

Each row represents a pass made during a match. time_stamp is the match time at which the pass was made, in minutes and seconds (MM:SS, from 00:00 to 90:00),
and pass_to is the player_id of the player receiving the pass.

**Write a solution to calculate the dominance score for each team in both halves of the match. The rules are as follows:**

- A match is divided into two halves: first half (00:00-45:00 minutes) and second half (45:01-90:00 minutes)
- The dominance score is calculated based on successful and intercepted passes:
  - When pass_to is a player from the same team: +1 point
  - When pass_to is a player from the opposing team (interception): -1 point
- A higher dominance score indicates better passing performance

Return the result table ordered by team_name and half_number in ascending order.

The result format is in the following example.

**Example:**

**Input:**

**Teams table:**

| player_id  | team_name |
|------------|-----------|
| 1          | Arsenal   |
| 2          | Arsenal   |
| 3          | Arsenal   |
| 4          | Chelsea   |
| 5          | Chelsea   |
| 6          | Chelsea   |

**Passes table:**

| pass_from | time_stamp | pass_to |
|-----------|------------|---------|
| 1         | 00:15      | 2       |
| 2         | 00:45      | 3       |
| 3         | 01:15      | 1       |
| 4         | 00:30      | 1       |
| 2         | 46:00      | 3       |
| 3         | 46:15      | 4       |
| 1         | 46:45      | 2       |
| 5         | 46:30      | 6       |

**Output:**

| team_name | half_number | dominance |
|-----------|-------------|-----------|
| Arsenal   | 1           | 3         |
| Arsenal   | 2           | 1         |
| Chelsea   | 1           | -1        |
| Chelsea   | 2           | 1         |

**Explanation:**

- **First Half (00:00-45:00):**
  - Arsenal's passes:
    - 1 → 2 (00:15): Successful pass (+1)
    - 2 → 3 (00:45): Successful pass (+1)
    - 3 → 1 (01:15): Successful pass (+1)
  - Chelsea's passes:
    - 4 → 1 (00:30): Intercepted by Arsenal (-1)

- **Second Half (45:01-90:00):**
  - Arsenal's passes:
    - 2 → 3 (46:00): Successful pass (+1)
    - 3 → 4 (46:15): Intercepted by Chelsea (-1)
    - 1 → 2 (46:45): Successful pass (+1)
  - Chelsea's passes:
    - 5 → 6 (46:30): Successful pass (+1)

The results are ordered by team_name and then by half_number.
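
Before turning to Spark, the rules above can be cross-checked with a few lines of plain Python. This is only a sketch for verifying the expected output on the example data: the `dominance_scores` helper and the dictionary layout are illustrative assumptions, not part of the problem statement or the notebook's solution.

```python
from collections import defaultdict

# Example data copied from the problem statement.
teams = {1: "Arsenal", 2: "Arsenal", 3: "Arsenal",
         4: "Chelsea", 5: "Chelsea", 6: "Chelsea"}
passes = [
    (1, "00:15", 2), (2, "00:45", 3), (3, "01:15", 1), (4, "00:30", 1),
    (2, "46:00", 3), (3, "46:15", 4), (1, "46:45", 2), (5, "46:30", 6),
]

def dominance_scores(teams, passes):
    """Return sorted (team_name, half_number, dominance) tuples (hypothetical helper)."""
    totals = defaultdict(int)
    for pass_from, time_stamp, pass_to in passes:
        minutes, seconds = (int(part) for part in time_stamp.split(":"))
        # The first half runs through 45:00 inclusive; anything later is the second half.
        half_number = 1 if (minutes, seconds) <= (45, 0) else 2
        # +1 when the receiver is a teammate, -1 when the pass is intercepted.
        point = 1 if teams[pass_from] == teams[pass_to] else -1
        totals[(teams[pass_from], half_number)] += point
    # Sorting the tuples orders by team_name, then half_number.
    return sorted((team, half, score) for (team, half), score in totals.items())

result = dominance_scores(teams, passes)
for row in result:
    print(row)
```

On the example input this yields the same four rows as the expected output table, which makes it a convenient reference when checking the PySpark cells that follow.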


In [0]:
teams_data_3384 = [
    (1, "Arsenal"),
    (2, "Arsenal"),
    (3, "Arsenal"),
    (4, "Chelsea"),
    (5, "Chelsea"),
    (6, "Chelsea")
]

teams_columns_3384 = ["player_id", "team_name"]
teams_df_3384 = spark.createDataFrame(teams_data_3384, teams_columns_3384)
teams_df_3384.show()

passes_data_3384 = [
    (1, "00:15", 2),  
    (2, "00:45", 3),  
    (3, "01:15", 1),  
    (4, "00:30", 1),  
    (2, "46:00", 3),  
    (3, "46:15", 4),  
    (1, "46:45", 2),  
    (5, "46:30", 6)   
]

passes_columns_3384 = ["pass_from", "time_stamp", "pass_to"]
passes_df_3384 = spark.createDataFrame(passes_data_3384, passes_columns_3384)
passes_df_3384.show()

+---------+---------+
|player_id|team_name|
+---------+---------+
|        1|  Arsenal|
|        2|  Arsenal|
|        3|  Arsenal|
|        4|  Chelsea|
|        5|  Chelsea|
|        6|  Chelsea|
+---------+---------+

+---------+----------+-------+
|pass_from|time_stamp|pass_to|
+---------+----------+-------+
|        1|     00:15|      2|
|        2|     00:45|      3|
|        3|     01:15|      1|
|        4|     00:30|      1|
|        2|     46:00|      3|
|        3|     46:15|      4|
|        1|     46:45|      2|
|        5|     46:30|      6|
+---------+----------+-------+



In [0]:
# Attach the passer's team (from_team) and the receiver's team (to_team)
# by joining Teams twice, once per player column.
passes_df_3384 = (
    passes_df_3384
    .join(teams_df_3384.withColumnRenamed("player_id", "pass_from"), on="pass_from")
    .withColumnRenamed("team_name", "from_team")
    .join(teams_df_3384.withColumnRenamed("player_id", "pass_to"), on="pass_to")
    .withColumnRenamed("team_name", "to_team")
)

In [0]:
# time_stamp is "MM:SS", so the derived column holds seconds since kickoff (not minutes).
ts_parts = split(col("time_stamp"), ":")
passes_df_3384 = (
    passes_df_3384
    .withColumn("seconds", ts_parts.getItem(0).cast("int") * 60 + ts_parts.getItem(1).cast("int"))
    .withColumn("half_number", when(col("seconds") <= 45 * 60, 1).otherwise(2))  # 45:00 is still first half
    .withColumn("dominance_point", when(col("from_team") == col("to_team"), 1).otherwise(-1))  # -1 = interception
)
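
The cell above converts time_stamp into a count of seconds and compares it with 45 * 60, so a pass at exactly 45:00 falls in the first half and one at 45:01 in the second. The boundary behaviour can be checked outside Spark with a small sketch; `to_seconds` and `half_number` here are hypothetical helper names that mirror the column expressions, not functions from PySpark.

```python
def to_seconds(time_stamp: str) -> int:
    """Convert an "MM:SS" string to seconds since kickoff."""
    minutes, seconds = time_stamp.split(":")
    return int(minutes) * 60 + int(seconds)

def half_number(time_stamp: str) -> int:
    """Return 1 for 00:00-45:00 (inclusive) and 2 for anything later."""
    return 1 if to_seconds(time_stamp) <= 45 * 60 else 2

# Boundary cases taken from the problem's definition of the two halves.
for ts in ["00:00", "45:00", "45:01", "90:00"]:
    print(ts, to_seconds(ts), half_number(ts))
```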

In [0]:
(
    passes_df_3384
    .groupBy("from_team", "half_number")
    .sum("dominance_point")
    .withColumnRenamed("from_team", "team_name")
    .withColumnRenamed("sum(dominance_point)", "dominance")
    .orderBy("team_name", "half_number")
    .display()  # display() is available in Databricks notebooks; use .show() in plain PySpark
)

team_name,half_number,dominance
Arsenal,1,3
Arsenal,2,1
Chelsea,1,-1
Chelsea,2,1
