# Project Submission

**Please use this notebook for your submission.**

Make sure to fill out all the required fields and to answer all the questions.

At the end of this project, you will have answered the following questions:

1. What is the average number of goals for the home team?
2. What is the average number of goals for the away team?
3. What is the country with the highest overall home score?
4. What are the top 3 types of tournaments?
5. Which country has the highest overall FIFA World Cup goals?


**Submission Requirements:**

- Make sure that you run all cells with code in your notebook before submitting.
- You can add extra code cells if you want, but clean up your notebook before submitting and leave only the code required to answer the questions.

## Step 1: Getting Started

You will mainly work with a Python library called Pandas. Pandas provides the DataFrame, a tabular data structure with methods for loading, filtering, grouping, and aggregating data. To use Pandas, you first have to import it.

In [1]:
import pandas as pd

## Step 2: Loading and Exploring the Data

You can use Pandas to explore and manipulate the _results.csv_ file.
You first have to load the csv file into a Pandas dataframe so that you can then analyze the data.

When using `pd.read_csv()`, make sure you include the correct path to the csv file, depending on where you saved it when you downloaded it.
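
As a minimal sketch of how `pd.read_csv()` behaves, the example below reads a tiny, made-up CSV from memory instead of from disk. The team names and scores are hypothetical and are not taken from _results.csv_; only the column names mirror the real file.

```python
import io
import pandas as pd

# Hypothetical three-row sample; the values are invented for illustration.
sample_csv = io.StringIO(
    "date,home_team,away_team,home_score,away_score\n"
    "2000-01-01,Team A,Team B,0,0\n"
    "2000-01-08,Team B,Team A,4,2\n"
    "2000-01-15,Team A,Team C,2,1\n"
)

# read_csv accepts a file path or any file-like object.
sample_df = pd.read_csv(sample_csv)

print(sample_df.shape)   # (rows, columns)
print(sample_df.head())  # first rows of the dataframe
```

With a real file, you would pass the path as a string instead of the in-memory object.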

In [2]:
# First: Load the csv file into a Pandas dataframe (df)

df = pd.read_csv('/content/results.csv')

In [3]:
# Explore the df

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 28275 entries, 0 to 28274
Data columns (total 9 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   date        28275 non-null  object
 1   home_team   28275 non-null  object
 2   away_team   28275 non-null  object
 3   home_score  28275 non-null  int64 
 4   away_score  28275 non-null  int64 
 5   tournament  28275 non-null  object
 6   city        28274 non-null  object
 7   country     28274 non-null  object
 8   neutral     28274 non-null  object
dtypes: int64(2), object(7)
memory usage: 1.9+ MB


## Step 3: Data Analysis

### Q1: What is the average number of goals for the home team?

Hint: You can call the `mean()` method on the _'home_score'_ column.
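
To illustrate the hint, here is a small sketch on a made-up dataframe (the values are hypothetical, not from _results.csv_). Selecting a column with square brackets returns a Series, and `mean()` returns its arithmetic average.

```python
import pandas as pd

# Hypothetical scores, invented for illustration only.
toy_scores = pd.DataFrame({
    "home_score": [1, 2, 3],
    "away_score": [0, 1, 2],
})

# Select one column, then average it.
home_mean = toy_scores["home_score"].mean()
away_mean = toy_scores["away_score"].mean()

print(home_mean, away_mean)
```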

In [4]:
# Code here

df['home_score'].mean()

1.8235897435897437

### Q2: What is the average number of goals for the away team?

Hint: You can call the `mean()` method on the _'away_score'_ column.

In [5]:
# Code here

df['away_score'].mean()

1.2258885941644562

### Q3: What is the country with the highest overall home score?

Hint: Group the data by country, then sum the home scores for each country. You can then call the `idxmax()` method on the result to get the index label (the country) with the highest sum.
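
The group-sum-then-`idxmax()` pattern from the hint can be sketched on a made-up dataframe as follows. The country labels and scores are hypothetical.

```python
import pandas as pd

# Hypothetical data: four matches hosted in three countries.
toy_matches = pd.DataFrame({
    "country": ["A", "B", "A", "C"],
    "home_score": [1, 3, 4, 2],
})

# Sum of home scores per country; the result is a Series indexed by country.
totals = toy_matches.groupby("country")["home_score"].sum()

# idxmax() returns the index label of the largest value, not the value itself.
top_country = totals.idxmax()

print(totals)
print(top_country)
```

If you also need the winning total, `totals.max()` returns the value that `idxmax()` points to.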

In [6]:
# Code here

df.groupby('country')['home_score'].sum().idxmax()

'Sweden'

### Q4: What are the top 3 types of tournaments?

Hint: You can use the `value_counts()` method to count the occurrences of each unique value in the _'tournament'_ column. The result is sorted from most to least frequent.
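
A minimal sketch of `value_counts()` on a made-up Series (the labels below are hypothetical). Because the counts come back sorted in descending order, taking the first rows gives the most frequent values.

```python
import pandas as pd

# Hypothetical tournament labels, invented for illustration.
toy_tournaments = pd.Series(
    ["Friendly", "Friendly", "Cup", "Friendly", "Cup", "League"],
    name="tournament",
)

# Count how often each unique value occurs (most frequent first).
counts = toy_tournaments.value_counts()

# Keep only the two most frequent values.
top_two = counts.head(2)

print(top_two)
```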

In [7]:
# Code here

df.value_counts('tournament').nlargest(3)

tournament
Friendly                        11952
FIFA World Cup qualification     4504
UEFA Euro qualification          1502
dtype: int64

### Q5: Which country has the highest overall FIFA World Cup goals?

To answer this question, consider breaking down your solution into 3 steps.

**5.1 Create a new dataframe (fifa_df) that contains only the rows where the _'tournament'_ column equals 'FIFA World Cup'.**

In [8]:
# Code here

# Keep only World Cup matches; .copy() makes fifa_df independent of df
fifa_df = df[df['tournament'] == 'FIFA World Cup'].copy()

**5.2 In your new fifa_df, create a new _'total_score'_ column that sums _'home_score'_ + _'away_score'_ for each row.**

Note: If you get a `SettingWithCopyWarning`, you can ignore it for this exercise.
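
For background, pandas versions that emit this warning typically do so when a column is assigned on a dataframe that was produced by filtering another one. A common way to avoid it, sketched below on made-up data, is to take an explicit `.copy()` of the filtered rows before adding the column. The values are hypothetical.

```python
import pandas as pd

# Hypothetical matches, invented for illustration.
toy_df = pd.DataFrame({
    "tournament": ["Friendly", "FIFA World Cup", "FIFA World Cup"],
    "home_score": [1, 2, 0],
    "away_score": [0, 1, 3],
})

# Filter, then copy so the result is an independent dataframe.
cup_df = toy_df[toy_df["tournament"] == "FIFA World Cup"].copy()

# Adding a column now modifies only cup_df, not toy_df.
cup_df["total_score"] = cup_df["home_score"] + cup_df["away_score"]

print(cup_df)
```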

In [9]:
# Code here
# Insert the new column at position 5, right after 'away_score'
fifa_df.insert(5, "total_score", fifa_df['home_score'] + fifa_df['away_score'])

**5.3 Group your data by country, then get the sum of the _'total_score'_ column. From there, you can use `idxmax()` to find the country with the highest _'total_score'_.**

In [10]:
# Code here
fifa_df.groupby('country')['total_score'].sum().idxmax()

'France'