In [1]:
import pandas as pd
import numpy as np

results = pd.read_csv('results.csv')
results.head()

|   | date | year | home_team | away_team | home_score | away_score | total_goals | win_margin | tournament | city | country | neutral |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2000-01-04 | 2000 | Egypt | Togo | 2 | 1 | 3 | 1 | Friendly | Aswan | Egypt | False |
| 1 | 2000-01-07 | 2000 | Tunisia | Togo | 7 | 0 | 7 | 7 | Friendly | Tunis | Tunisia | False |
| 2 | 2000-01-08 | 2000 | Trinidad and Tobago | Canada | 0 | 0 | 0 | 0 | Friendly | Port of Spain | Trinidad and Tobago | False |
| 3 | 2000-01-09 | 2000 | Burkina Faso | Gabon | 1 | 1 | 2 | 0 | Friendly | Ouagadougou | Burkina Faso | False |
| 4 | 2000-01-09 | 2000 | Guatemala | Armenia | 1 | 1 | 2 | 0 | Friendly | Los Angeles | United States | True |
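Judging from the rows above, `total_goals` and `win_margin` look like derived columns: `total_goals = home_score + away_score` and `win_margin = home_score - away_score` (signed, so an away win would give a negative margin). That reading is an inference from these five rows, not something the dataset documents; a minimal sketch that rebuilds both columns for the rows shown and compares them:

```python
import pandas as pd

# The five rows displayed above, typed in by hand (score-related columns only)
sample = pd.DataFrame({
    'home_team': ['Egypt', 'Tunisia', 'Trinidad and Tobago', 'Burkina Faso', 'Guatemala'],
    'away_team': ['Togo', 'Togo', 'Canada', 'Gabon', 'Armenia'],
    'home_score': [2, 7, 0, 1, 1],
    'away_score': [1, 0, 0, 1, 1],
    'total_goals': [3, 7, 0, 2, 2],
    'win_margin': [1, 7, 0, 0, 0],
})

# Assumed definitions of the two derived columns
recomputed_total = sample['home_score'] + sample['away_score']
recomputed_margin = sample['home_score'] - sample['away_score']

print(recomputed_total.equals(sample['total_goals']))   # True
print(recomputed_margin.equals(sample['win_margin']))   # True
```

Running the same two comparisons against the full `results` DataFrame would confirm or refute the assumption for every row.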


#### 1. Count the Number of Unique Home Teams and Away Teams

In [5]:
number_home_teams = results['home_team'].nunique()
number_away_teams = results['away_team'].nunique()

number_home_teams

245

In [4]:
number_away_teams

242
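Neither count above is the number of distinct teams in the dataset: a team can appear only as a home team, only as an away team, or (most often) as both. A minimal sketch that counts teams across both columns by stacking them into one Series; the tiny frame is made up for illustration:

```python
import pandas as pd

# Hypothetical fixtures: 'C' only ever plays at home, 'D' only ever plays away
matches = pd.DataFrame({
    'home_team': ['A', 'B', 'C', 'A'],
    'away_team': ['B', 'A', 'D', 'D'],
})

n_home = matches['home_team'].nunique()   # A, B, C
n_away = matches['away_team'].nunique()   # B, A, D

# Stack both columns into one Series, then count distinct names
all_teams = pd.concat([matches['home_team'], matches['away_team']])
n_teams = all_teams.nunique()             # A, B, C, D

print(n_home, n_away, n_teams)  # 3 3 4
```

On the real data the equivalent would be `pd.concat([results['home_team'], results['away_team']]).nunique()`.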

#### 2. Tournament Statistics

For each tournament:
- Compute the number of games played
- Sum up the total number of goals
- Obtain the maximum and minimum win margins

In [7]:
tournament_statistics = results.groupby('tournament').agg({'total_goals' : 'sum', 'win_margin' : ['min', 'max']}).reset_index()

tournament_statistics

|   | tournament | total_goals (sum) | win_margin (min) | win_margin (max) |
|---|---|---|---|---|
| 0 | AFC Asian Cup qualification | 1480 | -8 | 20 |
| 1 | African Cup of Nations | 933 | -5 | 5 |
| 2 | African Cup of Nations qualification | 2388 | -10 | 7 |
| 3 | CECAFA Cup | 858 | -9 | 9 |
| 4 | FIFA World Cup | 965 | -6 | 8 |
| 5 | FIFA World Cup qualification | 14174 | -12 | 31 |
| 6 | Friendly | 18853 | -9 | 15 |
| 7 | Gold Cup | 796 | -6 | 7 |
| 8 | UEFA Euro qualification | 3593 | -13 | 11 |
| 9 | UEFA Nations League | 1139 | -6 | 6 |
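The cell above sums the goals and takes the win-margin extremes, but it does not count the games played, which the task also asks for. A sketch using pandas named aggregation (pandas 0.25 or later) that adds a game count and yields flat column names instead of the two-level header; the small frame is made up. Note also that if `win_margin` is home minus away, as the first rows of `results` suggest, the minimum is the largest away-team win rather than the narrowest margin.

```python
import pandas as pd

# Hypothetical matches: two tournaments, signed win margins (home minus away)
matches = pd.DataFrame({
    'tournament': ['Cup', 'Cup', 'Cup', 'Friendly', 'Friendly'],
    'total_goals': [3, 1, 4, 2, 0],
    'win_margin': [1, -1, 4, 0, 0],
})

tournament_statistics = (
    matches.groupby('tournament')
    .agg(
        games_played=('total_goals', 'count'),   # non-missing rows = games per tournament
        total_goals=('total_goals', 'sum'),
        min_win_margin=('win_margin', 'min'),
        max_win_margin=('win_margin', 'max'),
    )
    .reset_index()
)
print(tournament_statistics)
```

Against `results`, any column without missing values (such as `date`) would serve as the column to count.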


#### 3. Analyzing the 2022 FIFA World Cup

Calculate the total number of goals scored by each team in the 2022 FIFA World Cup.

Hint: We'll need to combine the goals each team scored as the home team and as the away team.
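The steps below reach the per-team totals with two group-bys and a join. A different way to act on the hint, offered only as an alternative and not as the method this exercise asks for, is to stack the home and away columns into one long table of (team, goals) rows and group once. A team that never appears on one side then simply contributes nothing from that side, so no missing values arise. A sketch on made-up matches:

```python
import pandas as pd

# Made-up matches; 'H' (think of a host nation) only ever appears as the home team
matches = pd.DataFrame({
    'home_team': ['H', 'A', 'H', 'B'],
    'away_team': ['A', 'B', 'B', 'A'],
    'home_score': [1, 2, 0, 3],
    'away_score': [1, 0, 2, 1],
})

# One row per (team, goals scored in one match), regardless of side
home_rows = matches[['home_team', 'home_score']].rename(columns={'home_team': 'team', 'home_score': 'goals'})
away_rows = matches[['away_team', 'away_score']].rename(columns={'away_team': 'team', 'away_score': 'goals'})
long_format = pd.concat([home_rows, away_rows], ignore_index=True)

# Sum per team, highest total first
team_goals = long_format.groupby('team')['goals'].sum().sort_values(ascending=False)
print(team_goals)
```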

1. **Split** `results` into a DataFrame containing only 2022 FIFA World Cup matches
    - Combine multiple boolean masks using the `&` operator, wrapping each comparison in parentheses:
        - `results['tournament'] == 'FIFA World Cup'`
        - `results['year'] == 2022`
    - Perform a second split that creates separate DataFrames for home teams and away teams

2. **Apply** aggregation functions to each DataFrame to sum up the total number of goals scored by each team
    - Don't forget to flatten the index of the resulting aggregated DataFrames (e.g. with `reset_index()`)

3. **Combine** the two DataFrames by performing a **left** join in which the DataFrame on the left contains the home-team data (important!)
    - Feel free to clean the column names **but don't drop the `home_team` and `away_team` columns!**
    - Create a new column `total_goals` that adds up the goals each team scored as the home team and as the away team
    - Sort the teams from highest to lowest `total_goals`.
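The parentheses in step 1 matter: in Python, `&` binds more tightly than `==`, so writing `results['tournament'] == 'FIFA World Cup' & results['year'] == 2022` without them is parsed around `'FIFA World Cup' & results['year']` and raises an error instead of building the mask. A small sketch of both splits on made-up rows, plus `DataFrame.query` as an equivalent spelling of the first split:

```python
import pandas as pd

# Made-up rows standing in for `results`
matches = pd.DataFrame({
    'tournament': ['FIFA World Cup', 'FIFA World Cup', 'Friendly', 'FIFA World Cup'],
    'year': [2018, 2022, 2022, 2022],
    'home_team': ['A', 'B', 'C', 'D'],
    'away_team': ['E', 'F', 'G', 'H'],
    'home_score': [2, 0, 1, 1],
    'away_score': [1, 2, 1, 2],
})

# First split: each comparison in its own parentheses, then combined with &
wc_2022 = matches[(matches['tournament'] == 'FIFA World Cup') & (matches['year'] == 2022)]

# Same filter written as a query string ('and' is allowed inside query)
wc_2022_q = matches.query("tournament == 'FIFA World Cup' and year == 2022")

# Second split: one frame per side of the fixture
home_side = wc_2022[['home_team', 'home_score']]
away_side = wc_2022[['away_team', 'away_score']]

print(len(wc_2022))  # 2
```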

In [37]:
fifa_world_cup_matches = results[(results['tournament'] == 'FIFA World Cup') & (results['year'] == 2022)]
fifa_world_cup_matches_home_teams = fifa_world_cup_matches[['home_team', 'home_score']].groupby('home_team').agg({'home_score' : 'sum'})
fifa_world_cup_matches_away_teams = fifa_world_cup_matches[['away_team', 'away_score']].groupby('away_team').agg({'away_score' : 'sum'})

In [38]:
fifa_world_cup_matches_home_teams = fifa_world_cup_matches_home_teams.reset_index()
fifa_world_cup_matches_home_teams

|   | home_team | home_score |
|---|---|---|
| 0 | Argentina | 11 |
| 1 | Australia | 1 |
| 2 | Belgium | 1 |
| 3 | Brazil | 7 |
| 4 | Cameroon | 4 |
| 5 | Canada | 1 |
| 6 | Costa Rica | 2 |
| 7 | Croatia | 7 |
| 8 | Denmark | 0 |
| 9 | Ecuador | 1 |


In [39]:
fifa_world_cup_matches_away_teams = fifa_world_cup_matches_away_teams.reset_index()

fifa_world_cup_matches_away_teams

|   | away_team | away_score |
|---|---|---|
| 0 | Argentina | 4 |
| 1 | Australia | 3 |
| 2 | Belgium | 0 |
| 3 | Brazil | 1 |
| 4 | Cameroon | 0 |
| 5 | Canada | 1 |
| 6 | Costa Rica | 1 |
| 7 | Croatia | 1 |
| 8 | Denmark | 1 |
| 9 | Ecuador | 3 |
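"Flattening the index" here means turning the group keys, which `groupby` moves into the index, back into an ordinary column; that is what the `reset_index()` calls in the cells above do. Passing `as_index=False` to `groupby` produces the same shape in one step. A sketch on made-up data:

```python
import pandas as pd

home_side = pd.DataFrame({
    'home_team': ['A', 'B', 'A'],
    'home_score': [2, 0, 3],
})

# Two steps: aggregate (team names become the index), then move them back into a column
via_reset = home_side.groupby('home_team').agg({'home_score': 'sum'}).reset_index()

# One step: keep the group keys as a regular column from the start
via_as_index = home_side.groupby('home_team', as_index=False).agg({'home_score': 'sum'})

print(via_reset.equals(via_as_index))  # True
```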


In [49]:
team_scores = pd.merge(left = fifa_world_cup_matches_home_teams,
                       right = fifa_world_cup_matches_away_teams,
                       left_on = 'home_team',
                       right_on = 'away_team',
                       how = 'left')   # left join: keep every team that played as the home team
team_scores = team_scores.drop(labels = 'away_team', axis = 1)
# A team with no away matches (e.g. host Qatar) gets NaN in away_score; count it as 0 away goals
team_scores['away_score'] = team_scores['away_score'].fillna(0)
team_scores['total_score'] = team_scores['home_score'] + team_scores['away_score']
team_scores = team_scores.rename(columns = {'home_team' : 'team'})
team_scores

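|   | team | home_score | away_score | total_score |
|---|---|---|---|---|
| 0 | Argentina | 11 | 4.0 | 15.0 |
| 1 | Australia | 1 | 3.0 | 4.0 |
| 2 | Belgium | 1 | 0.0 | 1.0 |
| 3 | Brazil | 7 | 1.0 | 8.0 |
| 4 | Cameroon | 4 | 0.0 | 4.0 |
| 5 | Canada | 1 | 1.0 | 2.0 |
| 6 | Costa Rica | 2 | 1.0 | 3.0 |
| 7 | Croatia | 7 | 1.0 | 8.0 |
| 8 | Denmark | 0 | 1.0 | 1.0 |
| 9 | Ecuador | 1 | 3.0 | 4.0 |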
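The last step of the task, sorting from highest to lowest total, is not carried out in the cell above, whose output is still in alphabetical order. A sketch with `sort_values` on a made-up frame shaped like `team_scores`:

```python
import pandas as pd

# Made-up totals in the same layout as team_scores
scores = pd.DataFrame({
    'team': ['A', 'B', 'C'],
    'home_score': [3, 1, 4],
    'away_score': [2.0, 0.0, 5.0],
})
scores['total_score'] = scores['home_score'] + scores['away_score']

# Highest total first; ignore_index renumbers the rows 0..n-1 after sorting
ranked = scores.sort_values('total_score', ascending=False, ignore_index=True)
print(ranked['team'].tolist())  # ['C', 'A', 'B']
```

On the real data the equivalent call would be `team_scores.sort_values('total_score', ascending=False)`.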


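Note: Since the 2022 World Cup took place in Qatar, Qatar is never listed as the away team! That is why the join keeps the home-team DataFrame on the left: a **left** join retains Qatar's row and its home goals, whereas an inner join would drop Qatar entirely. Qatar's `away_score` comes back as NaN, though, so it has to be treated as 0 before computing the total; otherwise Qatar's `total_score` is NaN as well.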
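To see the effect of the join type concretely, here is a sketch on made-up per-team totals in which a host team 'H' has home goals but no away row. An inner join drops it, a left join keeps it with a missing `away_score`, and that NaN propagates into the sum unless it is filled first:

```python
import pandas as pd

# Made-up aggregated totals; 'H' never played as the away team
home_goals = pd.DataFrame({'home_team': ['A', 'B', 'H'], 'home_score': [3, 1, 2]})
away_goals = pd.DataFrame({'away_team': ['A', 'B'], 'away_score': [2, 4]})

inner_joined = pd.merge(home_goals, away_goals, left_on='home_team', right_on='away_team', how='inner')
left_joined = pd.merge(home_goals, away_goals, left_on='home_team', right_on='away_team', how='left')

print(len(inner_joined), len(left_joined))  # 2 3

# Without filling, H's total is NaN; treat "no away games" as 0 away goals
naive_total = left_joined['home_score'] + left_joined['away_score']
left_joined['total_score'] = left_joined['home_score'] + left_joined['away_score'].fillna(0)

print(naive_total.isna().tolist())          # [False, False, True]
print(left_joined['total_score'].tolist())  # [5.0, 5.0, 2.0]
```

The NaN is also why `away_score` shows up as a float column (`4.0`, `3.0`, ...) in the merged output above.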

Feel free to continue exploring the dataset by adding more cells below!