# Find Tournament Winners using Pandas

## Players Table

| Column Name | Type |
|-------------|------|
| player_id   | int  |
| group_id    | int  |

- **player_id** is the primary key of this table.
- Each row indicates the group to which a player belongs.

## Matches Table

| Column Name   | Type |
|---------------|------|
| match_id      | int  |
| first_player  | int  |
| second_player | int  |
| first_score   | int  |
| second_score  | int  |

- **match_id** is the primary key of this table.
- Each row records a match between two players.
- **first_player** and **second_player** contain the `player_id` of each player in the match.
- **first_score** and **second_score** represent the points scored by the first and second players, respectively.
- **Assumption:** In each match, both players belong to the same group.

## Problem Statement

The **winner** in each group is the player who has accumulated the **maximum total points** within that group. In the case of a tie (i.e., multiple players having the same total points), the player with the **lowest `player_id`** wins.

**Task:**  
Write a solution to identify the winner in each group.

**Output:**  
Return the result table in any order.

### Example

**Input:**

**Players Table:**

| player_id | group_id |
|-----------|----------|
| 15        | 1        |
| 25        | 1        |
| 30        | 1        |
| 45        | 1        |
| 10        | 2        |
| 35        | 2        |
| 50        | 2        |
| 20        | 3        |
| 40        | 3        |

**Matches Table:**

| match_id | first_player | second_player | first_score | second_score |
|----------|--------------|---------------|-------------|--------------|
| 1        | 15           | 45            | 3           | 0            |
| 2        | 30           | 25            | 1           | 2            |
| 3        | 30           | 15            | 2           | 0            |
| 4        | 40           | 20            | 5           | 2            |
| 5        | 35           | 50            | 1           | 1            |

**Output:**

| group_id | player_id |
|----------|-----------|
| 1        | 15        |
| 2        | 35        |
| 3        | 40        |
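
The expected output follows from simple arithmetic: in group 1, players 15 and 30 both total 3 points (3 + 0 and 1 + 2), so the lower id, 15, wins; in group 2, players 35 and 50 tie at 1 point, so 35 wins; in group 3, player 40 leads with 5 points. As a cross-check, here is a minimal plain-Python sketch of the same tally. The names `group_of`, `matches_rows`, `totals`, and `winners` are illustrative and not part of the original solution.

```python
from collections import defaultdict

# Sample data copied from the example tables above.
group_of = {15: 1, 25: 1, 30: 1, 45: 1, 10: 2, 35: 2, 50: 2, 20: 3, 40: 3}
matches_rows = [
    # (first_player, second_player, first_score, second_score)
    (15, 45, 3, 0),
    (30, 25, 1, 2),
    (30, 15, 2, 0),
    (40, 20, 5, 2),
    (35, 50, 1, 1),
]

# Total points per player; players without a match stay at 0.
totals = defaultdict(int)
for first, second, first_score, second_score in matches_rows:
    totals[first] += first_score
    totals[second] += second_score

# Per group, keep the smallest (-total, player_id) key:
# highest total first, lowest id on ties.
winners = {}
for player, group in group_of.items():
    key = (-totals[player], player)
    if group not in winners or key < winners[group][0]:
        winners[group] = (key, player)

result = {group: player for group, (_, player) in sorted(winners.items())}
print(result)  # {1: 15, 2: 35, 3: 40}
```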


In [41]:
import pandas as pd

data = [[10, 2], 
        [15, 1], 
        [20, 3], 
        [25, 1], 
        [30, 1], 
        [35, 2], 
        [40, 3], 
        [45, 1], 
        [50, 2]]
players = pd.DataFrame(data, columns=['player_id', 'group_id']).astype(
    {'player_id': 'Int64', 'group_id': 'Int64'})

data = [[1, 15, 45, 3, 0], 
        [2, 30, 25, 1, 2], 
        [3, 30, 15, 2, 0], 
        [4, 40, 20, 5, 2], 
        [5, 35, 50, 1, 1]]
matches = pd.DataFrame(data, columns=['match_id', 'first_player', 'second_player',
                                      'first_score', 'second_score']).astype(
    {'match_id': 'Int64', 'first_player': 'Int64', 'second_player': 'Int64',
     'first_score': 'Int64', 'second_score': 'Int64'})
display(players, matches)

|   | player_id | group_id |
|---|-----------|----------|
| 0 | 10        | 2        |
| 1 | 15        | 1        |
| 2 | 20        | 3        |
| 3 | 25        | 1        |
| 4 | 30        | 1        |
| 5 | 35        | 2        |
| 6 | 40        | 3        |
| 7 | 45        | 1        |
| 8 | 50        | 2        |


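|   | match_id | first_player | second_player | first_score | second_score |
|---|----------|--------------|---------------|-------------|--------------|
| 0 | 1        | 15           | 45            | 3           | 0            |
| 1 | 2        | 30           | 25            | 1           | 2            |
| 2 | 3        | 30           | 15            | 2           | 0            |
| 3 | 4        | 40           | 20            | 5           | 2            |
| 4 | 5        | 35           | 50            | 1           | 1            |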


**Step 1: Extract the first player's data from each match.**

- Selects the first_player and first_score columns from the matches DataFrame.
- Renames these columns to player_id and score respectively for uniformity.
- df_1 contains two columns: player_id and score, representing the first player's performance in each match.

In [42]:
df_1 = matches[["first_player", 
                "first_score"]].rename(columns={"first_player": "player_id", 
                                                "first_score": "score"})

display(df_1)

|   | player_id | score |
|---|-----------|-------|
| 0 | 15        | 3     |
| 1 | 30        | 1     |
| 2 | 30        | 2     |
| 3 | 40        | 5     |
| 4 | 35        | 1     |


**Step 2: Extract the second player's data from each match.**

- Selects the second_player and second_score columns from the matches DataFrame.
- Renames these columns to player_id and score respectively for consistency.
- df_2 mirrors df_1 but for the second player in each match.

In [43]:
df_2 = matches[["second_player", 
                "second_score"]].rename(columns={"second_player": "player_id", 
                                                 "second_score": "score"})
display(df_2)

|   | player_id | score |
|---|-----------|-------|
| 0 | 45        | 0     |
| 1 | 25        | 2     |
| 2 | 15        | 0     |
| 3 | 20        | 2     |
| 4 | 50        | 1     |


**Step 3: Combine the data of both players from all matches into a single DataFrame.**

- Concatenates `df_1` and `df_2` vertically (`axis=0`), stacking the rows.
- `df` now contains one row per player per match with the score that player earned, regardless of position (first or second). The original row labels are kept, so index values 0 to 4 appear twice.

In [44]:
df = pd.concat([df_1, df_2], 
               axis=0)
display(df)

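|   | player_id | score |
|---|-----------|-------|
| 0 | 15        | 3     |
| 1 | 30        | 1     |
| 2 | 30        | 2     |
| 3 | 40        | 5     |
| 4 | 35        | 1     |
| 0 | 45        | 0     |
| 1 | 25        | 2     |
| 2 | 15        | 0     |
| 3 | 20        | 2     |
| 4 | 50        | 1     |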
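
The repeated index labels in the stacked output are harmless here because the next step groups by `player_id`, but they can be surprising in label-based lookups. A minimal sketch of the difference, using small stand-in frames (the two-row data and the names `stacked` and `renumbered` are assumptions for illustration), with `pd.concat`'s `ignore_index=True` option:

```python
import pandas as pd

# Hypothetical two-row frames standing in for df_1 and df_2.
df_1 = pd.DataFrame({"player_id": [15, 30], "score": [3, 1]})
df_2 = pd.DataFrame({"player_id": [45, 25], "score": [0, 2]})

# Default behaviour: each input keeps its own row labels.
stacked = pd.concat([df_1, df_2], axis=0)

# ignore_index=True discards the old labels and renumbers from 0.
renumbered = pd.concat([df_1, df_2], axis=0, ignore_index=True)

print(list(stacked.index))     # [0, 1, 0, 1]
print(list(renumbered.index))  # [0, 1, 2, 3]
```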


**Step 4: Aggregate the total scores for each player across all matches.**

- Groups the DataFrame df by player_id.
- Sums the score for each player.
- Resets the index to convert the grouped data back into a standard DataFrame format.
- `df` now has two columns:
  - `player_id`: unique identifier for each player.
  - `score`: total score accumulated by that player across all matches.

In [45]:
df = df.groupby(["player_id"])[["score"]].sum().reset_index()
display(df)

|   | player_id | score |
|---|-----------|-------|
| 0 | 15        | 3     |
| 1 | 20        | 2     |
| 2 | 25        | 2     |
| 3 | 30        | 3     |
| 4 | 35        | 1     |
| 5 | 40        | 5     |
| 6 | 45        | 0     |
| 7 | 50        | 1     |
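
As a side note, the `sum().reset_index()` pattern can also be written with `groupby`'s `as_index=False` flag, which keeps the grouping key as an ordinary column. The sketch below uses a small made-up frame (the data and the names `via_reset` and `via_flag` are illustrative only) to show that both spellings give the same frame:

```python
import pandas as pd

# Hypothetical stacked scores: players 15 and 30 each appear twice.
df = pd.DataFrame({"player_id": [15, 30, 30, 15], "score": [3, 1, 2, 0]})

# Spelling used in Step 4: aggregate, then move player_id out of the index.
via_reset = df.groupby(["player_id"])[["score"]].sum().reset_index()

# Alternative spelling: ask groupby not to use player_id as the index.
via_flag = df.groupby("player_id", as_index=False)["score"].sum()

# Raises an AssertionError if the two results differ.
pd.testing.assert_frame_equal(via_reset, via_flag)
print(via_flag)
```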


**Step 5: Attach each player's group to the aggregated scores.**

- Performs a left join from the aggregated `df` to the `players` DataFrame on `player_id`.
- This adds `group_id` to each player's total score. Because the join starts from the aggregated scores, a player who appears in no match (player 10 in this example) is absent from the result.

In [46]:
df = df.merge(players, 
              how="left", 
              on="player_id")
display(df)

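|   | player_id | score | group_id |
|---|-----------|-------|----------|
| 0 | 15        | 3     | 1        |
| 1 | 20        | 2     | 3        |
| 2 | 25        | 2     | 1        |
| 3 | 30        | 3     | 1        |
| 4 | 35        | 1     | 2        |
| 5 | 40        | 5     | 3        |
| 6 | 45        | 0     | 1        |
| 7 | 50        | 1     | 2        |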
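
Player 10 does not appear in the merged output because that player has no matches. This does not change the sample answer, but on other inputs it could matter: a group whose players never played would drop out of the result, and a matchless player with a low id could win a group in which every total is 0. One possible variant, offered as a sketch rather than as the original solution, starts the join from `players` and treats a missing total as 0. The three-row data and the name `with_zero` are assumptions for illustration.

```python
import pandas as pd

# Group 2 from the example: player 10 never played a match.
players = pd.DataFrame({"player_id": [10, 35, 50], "group_id": [2, 2, 2]})
totals = pd.DataFrame({"player_id": [35, 50], "score": [1, 1]})

# Start from players so that every player survives the join.
with_zero = players.merge(totals, how="left", on="player_id")

# Players without a match get NaN from the join; count that as 0 points.
with_zero["score"] = with_zero["score"].fillna(0).astype("int64")
print(with_zero)
```

With this variant player 10 is kept with a score of 0, and the sort-and-pick logic of the later steps applies unchanged.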


**Step 6: Order the DataFrame to prioritize players within each group based on their scores.**

- Sorts the DataFrame first by `group_id` in ascending order.
- Within each `group_id`, sorts by `score` in descending order (highest scores first).
- For players with identical scores within the same group, sorts by `player_id` in ascending order, so the lowest id comes first.

In [47]:
df = df.sort_values(by=["group_id", "score", "player_id"], 
                    ascending=[True, False, True])
display(df)

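|   | player_id | score | group_id |
|---|-----------|-------|----------|
| 0 | 15        | 3     | 1        |
| 3 | 30        | 3     | 1        |
| 2 | 25        | 2     | 1        |
| 6 | 45        | 0     | 1        |
| 4 | 35        | 1     | 2        |
| 7 | 50        | 1     | 2        |
| 5 | 40        | 5     | 3        |
| 1 | 20        | 2     | 3        |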


**Step 7: Identify the top player in each group based on the previous sorting.**

- Groups the DataFrame by group_id.
- For each group, selects the first `player_id`, which, due to the prior sorting, belongs to the player with the highest score and, among tied players, the lowest `player_id`.
- Resets the index to finalize the DataFrame structure.


In [48]:
df = df.groupby(["group_id"])[["player_id"]].first().reset_index()
display(df)

|   | group_id | player_id |
|---|----------|-----------|
| 0 | 1        | 15        |
| 1 | 2        | 35        |
| 2 | 3        | 40        |
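
For reuse outside the notebook, the seven steps can be collected into a single function. The sketch below is one possible arrangement under stated assumptions: the function name `tournament_winners` and its signature are hypothetical, and Step 7 uses `drop_duplicates` on the sorted frame instead of `groupby(...).first()`, which picks the same rows here. Like the walkthrough, it ignores players who have no matches.

```python
import pandas as pd


def tournament_winners(players: pd.DataFrame, matches: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical wrapper around Steps 1-7 (name and signature are assumptions)."""
    # Steps 1-3: one row per (player, match) with that player's score.
    first = matches[["first_player", "first_score"]].rename(
        columns={"first_player": "player_id", "first_score": "score"})
    second = matches[["second_player", "second_score"]].rename(
        columns={"second_player": "player_id", "second_score": "score"})
    scores = pd.concat([first, second], ignore_index=True)

    # Steps 4-5: total score per player, then attach group_id.
    totals = scores.groupby("player_id", as_index=False)["score"].sum()
    totals = totals.merge(players, how="left", on="player_id")

    # Step 6: best score first within each group, lowest player_id on ties.
    ranked = totals.sort_values(by=["group_id", "score", "player_id"],
                                ascending=[True, False, True])

    # Step 7 (swapped-in variant): keep only the first row of each group.
    winners = ranked.drop_duplicates(subset="group_id", keep="first")
    return winners[["group_id", "player_id"]].reset_index(drop=True)


# Sample data from the example above.
players = pd.DataFrame(
    {"player_id": [10, 15, 20, 25, 30, 35, 40, 45, 50],
     "group_id": [2, 1, 3, 1, 1, 2, 3, 1, 2]})
matches = pd.DataFrame(
    {"match_id": [1, 2, 3, 4, 5],
     "first_player": [15, 30, 30, 40, 35],
     "second_player": [45, 25, 15, 20, 50],
     "first_score": [3, 1, 2, 5, 1],
     "second_score": [0, 2, 0, 2, 1]})

result = tournament_winners(players, matches)
print(result)
```

On the sample data this returns players 15, 35, and 40 for groups 1, 2, and 3, matching the expected output.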


References: [1] https://leetcode.com/problems/tournament-winners/?lang=pythondata