# Pandas Series and DataFrames: Practice 

## Introduction

In this lab, we'll look at a dataset which contains information on World Cup matches. Let's use the pandas commands learned in the previous lesson to learn more about our data!

## Objectives

You will be able to: 

- Use pandas methods and attributes to access information about a dataset 
- Index pandas dataframes with .loc, .iloc, and column names 
- Use a boolean mask to index pandas series and dataframes

## Load the Data

Load the file `'WorldCupMatches.csv'` as a DataFrame in pandas.

In [316]:
# Import pandas using the standard alias
import pandas as pd

# Load 'WorldCupMatches.csv' as a DataFrame
df = pd.read_csv('WorldCupMatches.csv')

## Common Methods and Attributes

Use the correct method to display the **first 7 rows** of the dataset.

In [317]:
# Display the first 7 rows of df
df.head(7)

Unnamed: 0,Year,Datetime,Stage,Stadium,City,Home Team Name,Home Team Goals,Away Team Goals,Away Team Name,Win conditions,Attendance,Half-time Home Goals,Half-time Away Goals,Referee,Assistant 1,Assistant 2,RoundID,MatchID,Home Team Initials,Away Team Initials
0,1930,13 Jul 1930 - 15:00,Group 1,Pocitos,Montevideo,France,4,1,Mexico,,4444.0,3,0,LOMBARDI Domingo (URU),CRISTOPHE Henry (BEL),REGO Gilberto (BRA),201,1096,FRA,MEX
1,1930,13 Jul 1930 - 15:00,Group 4,Parque Central,Montevideo,USA,3,0,Belgium,,18346.0,2,0,MACIAS Jose (ARG),MATEUCCI Francisco (URU),WARNKEN Alberto (CHI),201,1090,USA,BEL
2,1930,14 Jul 1930 - 12:45,Group 2,Parque Central,Montevideo,Yugoslavia,2,1,Brazil,,24059.0,2,0,TEJADA Anibal (URU),VALLARINO Ricardo (URU),BALWAY Thomas (FRA),201,1093,YUG,BRA
3,1930,14 Jul 1930 - 14:50,Group 3,Pocitos,Montevideo,Romania,3,1,Peru,,2549.0,1,0,WARNKEN Alberto (CHI),LANGENUS Jean (BEL),MATEUCCI Francisco (URU),201,1098,ROU,PER
4,1930,15 Jul 1930 - 16:00,Group 1,Parque Central,Montevideo,Argentina,1,0,France,,23409.0,0,0,REGO Gilberto (BRA),SAUCEDO Ulises (BOL),RADULESCU Constantin (ROU),201,1085,ARG,FRA
5,1930,16 Jul 1930 - 14:45,Group 1,Parque Central,Montevideo,Chile,3,0,Mexico,,9249.0,1,0,CRISTOPHE Henry (BEL),APHESTEGUY Martin (URU),LANGENUS Jean (BEL),201,1095,CHI,MEX
6,1930,17 Jul 1930 - 12:45,Group 2,Parque Central,Montevideo,Yugoslavia,4,0,Bolivia,,18306.0,0,0,MATEUCCI Francisco (URU),LOMBARDI Domingo (URU),WARNKEN Alberto (CHI),201,1092,YUG,BOL


Display the **last 3 rows** of the dataset.

In [318]:
# Display the last 3 rows of df
df.tail(3)

Unnamed: 0,Year,Datetime,Stage,Stadium,City,Home Team Name,Home Team Goals,Away Team Goals,Away Team Name,Win conditions,Attendance,Half-time Home Goals,Half-time Away Goals,Referee,Assistant 1,Assistant 2,RoundID,MatchID,Home Team Initials,Away Team Initials
849,2014,09 Jul 2014 - 17:00,Semi-finals,Arena de Sao Paulo,Sao Paulo,Netherlands,0,0,Argentina,Argentina win on penalties (2 - 4),63267.0,0,0,Cüneyt ÇAKIR (TUR),DURAN Bahattin (TUR),ONGUN Tarik (TUR),255955,300186490,NED,ARG
850,2014,12 Jul 2014 - 17:00,Play-off for third place,Estadio Nacional,Brasilia,Brazil,0,3,Netherlands,,68034.0,0,2,HAIMOUDI Djamel (ALG),ACHIK Redouane (MAR),ETCHIALI Abdelhak (ALG),255957,300186502,BRA,NED
851,2014,13 Jul 2014 - 16:00,Final,Estadio do Maracana,Rio De Janeiro,Germany,1,0,Argentina,Germany win after extra time,74738.0,0,0,Nicola RIZZOLI (ITA),Renato FAVERANI (ITA),Andrea STEFANI (ITA),255959,300186501,GER,ARG


Get a concise summary of the data using `.info()`. 

In [319]:
# Print a concise summary of df
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 852 entries, 0 to 851
Data columns (total 20 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   Year                  852 non-null    int64  
 1   Datetime              852 non-null    object 
 2   Stage                 852 non-null    object 
 3   Stadium               852 non-null    object 
 4   City                  852 non-null    object 
 5   Home Team Name        852 non-null    object 
 6   Home Team Goals       852 non-null    int64  
 7   Away Team Goals       852 non-null    int64  
 8   Away Team Name        852 non-null    object 
 9   Win conditions        852 non-null    object 
 10  Attendance            850 non-null    float64
 11  Half-time Home Goals  852 non-null    int64  
 12  Half-time Away Goals  852 non-null    int64  
 13  Referee               852 non-null    object 
 14  Assistant 1           852 non-null    object 
 15  Assistant 2           8

Obtain a tuple representing the **number of rows and number of columns**.

In [320]:
# Display the number of rows and columns in df
df.shape

(852, 20)

Use the appropriate attribute to get the **column names**.

In [321]:
# Display the column names of df
df.columns

Index(['Year', 'Datetime', 'Stage', 'Stadium', 'City', 'Home Team Name',
       'Home Team Goals', 'Away Team Goals', 'Away Team Name',
       'Win conditions', 'Attendance', 'Half-time Home Goals',
       'Half-time Away Goals', 'Referee', 'Assistant 1', 'Assistant 2',
       'RoundID', 'MatchID', 'Home Team Initials', 'Away Team Initials'],
      dtype='object')

Get the index of the DataFrame:

In [349]:
df.index

RangeIndex(start=0, stop=852, step=1)

## Selecting DataFrame Information

When looking at the DataFrame's `.head()` and `.tail()`, you might have noticed that the games are structured chronologically in the DataFrame.

Use the right selection method to display all the information for the games at positions 3 through 5 (i.e. **select rows 3 through 5 inclusive**). Because positions start at 0, these are the 4th through 6th games in the dataset.

In [322]:
# Display rows 3 through 5
# .iloc slicing is "half open": the stop position (6) is not
# included in the output
df.iloc[3:6]

Unnamed: 0,Year,Datetime,Stage,Stadium,City,Home Team Name,Home Team Goals,Away Team Goals,Away Team Name,Win conditions,Attendance,Half-time Home Goals,Half-time Away Goals,Referee,Assistant 1,Assistant 2,RoundID,MatchID,Home Team Initials,Away Team Initials
3,1930,14 Jul 1930 - 14:50,Group 3,Pocitos,Montevideo,Romania,3,1,Peru,,2549.0,1,0,WARNKEN Alberto (CHI),LANGENUS Jean (BEL),MATEUCCI Francisco (URU),201,1098,ROU,PER
4,1930,15 Jul 1930 - 16:00,Group 1,Parque Central,Montevideo,Argentina,1,0,France,,23409.0,0,0,REGO Gilberto (BRA),SAUCEDO Ulises (BOL),RADULESCU Constantin (ROU),201,1085,ARG,FRA
5,1930,16 Jul 1930 - 14:45,Group 1,Parque Central,Montevideo,Chile,3,0,Mexico,,9249.0,1,0,CRISTOPHE Henry (BEL),APHESTEGUY Martin (URU),LANGENUS Jean (BEL),201,1095,CHI,MEX


Now, display the info from **games labeled 5-9 in the index** (inclusive), but **only the `"Home Team Name"` and the `"Away Team Name"` columns**.

In [323]:
# Display rows 5 through 9 and columns 'Home Team Name' and 'Away Team Name'
# .loc interval is not "half open", it includes the endpoint
df.loc[5:9, ['Home Team Name', 'Away Team Name']]

Unnamed: 0,Home Team Name,Away Team Name
5,Chile,Mexico
6,Yugoslavia,Bolivia
7,USA,Paraguay
8,Uruguay,Peru
9,Chile,France
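
The difference between the two slicing rules is easier to see when labels and positions do not coincide. The following is a minimal sketch on a hypothetical toy DataFrame (the index labels 10 to 50 and the column names `home` and `away` are made up for illustration, not taken from the World Cup data):

```python
import pandas as pd

# Hypothetical toy data: labels 10..50 are chosen so that labels
# and positions cannot be confused with each other.
toy = pd.DataFrame(
    {"home": ["A", "B", "C", "D", "E"], "away": ["V", "W", "X", "Y", "Z"]},
    index=[10, 20, 30, 40, 50],
)

# Position-based: rows at positions 1 and 2; stop position 3 is excluded.
by_position = toy.iloc[1:3]

# Label-based: rows labeled 20 through 40; stop label 40 is included.
by_label = toy.loc[20:40, ["home", "away"]]

print(by_position.index.tolist())  # [20, 30]
print(by_label.index.tolist())     # [20, 30, 40]
```

With the default `RangeIndex` used in this lab, labels and positions happen to be the same numbers, which is why only the endpoint handling differs in the two cells above.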


Next, we'd like the information on all the games played in **Group 3** for the **1950** World Cup.

Hint: You can combine conditions like this:

`df[(condition1) | (condition2)]`  -> Returns rows where either condition is true

`df[(condition1) & (condition2)]`  -> Returns rows where both conditions are true

In [324]:
# Display all info for games played in 1950 for Group 3
# This time we don't need .loc because we are applying a
# boolean mask and our indexing uses rows only (all
# columns are selected)
df[(df["Year"] == 1950) & (df["Stage"] == "Group 3")]

Unnamed: 0,Year,Datetime,Stage,Stadium,City,Home Team Name,Home Team Goals,Away Team Goals,Away Team Name,Win conditions,Attendance,Half-time Home Goals,Half-time Away Goals,Referee,Assistant 1,Assistant 2,RoundID,MatchID,Home Team Initials,Away Team Initials
56,1950,25 Jun 1950 - 15:00,Group 3,Pacaembu,Sao Paulo,Sweden,3,2,Italy,,36502.0,2,1,LUTZ Jean (SUI),BERANEK Alois (AUT),TEJADA Carlos (MEX),208,1219,SWE,ITA
61,1950,29 Jun 1950 - 15:30,Group 3,Durival de Brito,Curitiba,Sweden,2,2,Paraguay,,7903.0,2,1,MITCHELL Robert (SCO),LEMESIC Leo (YUG),GARCIA Prudencio (USA),208,1228,SWE,PAR
65,1950,02 Jul 1950 - 15:00,Group 3,Pacaembu,Sao Paulo,Italy,2,0,Paraguay,,25811.0,1,0,ELLIS Arthur (ENG),GARCIA Prudencio (USA),DE LA SALLE Charles (FRA),208,1218,ITA,PAR


Let's repeat the command above, but this time display **only the attendance column** for the Group 3 games. 

In [325]:
# Print the 'Attendance' column for games played in 1950
# for Group 3
# This time we want to use df.loc instead of just
# df[boolean mask] in order to select certain rows AND
# certain columns
df.loc[(df['Year'] == 1950) & (df['Stage'] == 'Group 3'), 'Attendance']

56    36502.0
61     7903.0
65    25811.0
Name: Attendance, dtype: float64
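
To make the two indexing styles above concrete, here is a small sketch on a hypothetical four-row DataFrame (the rows are illustrative, not a faithful extract of the dataset). It builds the mask once and reuses it for both row-only and row-plus-column selection:

```python
import pandas as pd

# Hypothetical toy data with the same column names as the lab's DataFrame.
toy = pd.DataFrame({
    "Year": [1950, 1950, 1954, 1950],
    "Stage": ["Group 3", "Group 1", "Group 3", "Group 3"],
    "Attendance": [36502.0, 81649.0, 13000.0, 7903.0],
})

# Each comparison needs its own parentheses because & and | bind
# more tightly than == in Python.
mask = (toy["Year"] == 1950) & (toy["Stage"] == "Group 3")

rows_only = toy[mask]                     # DataFrame with all columns
one_column = toy.loc[mask, "Attendance"]  # Series with one column

print(mask.tolist())        # [True, False, False, True]
print(one_column.tolist())  # [36502.0, 7903.0]
```

Storing the mask in a variable is a stylistic choice; it avoids repeating the condition when the same filter is needed more than once.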

Throughout the entire history of the World Cup as recorded in this dataset, **how many home games were played by the Netherlands**?

(Remember that you can use the `len()` built-in function to find the number of rows in a DataFrame.)

In [326]:
# Number of home games played by the Netherlands
# Here we are just using df[boolean mask] again
neth_home = len(df[df['Home Team Name'] == 'Netherlands'])
neth_home

32

**How many games were played by the Netherlands in total**?

In [327]:
# Number of games played by the Netherlands in total
# Conveniently we already saved neth_home as a variable
# so we just need to find the number of times they were
# the away team and sum them
len(df[df['Away Team Name'] == 'Netherlands']) + neth_home

54

Next, let's try and figure out **how many games the USA played in the 2014 World Cup**.

In [328]:

# Number of games the USA played in the 2014 world cup

# Mask will return True or False for each row of df
usa_2014_mask = (
    # USA is home team OR away team
    (
        (df['Home Team Name'] == 'USA') |
        (df['Away Team Name'] == 'USA')
    ) &
    # AND year is 2014
    (df['Year'] == 2014)
)

# Filter df using mask and find its length
len(df[usa_2014_mask])

5
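
As an alternative to `len(df[mask])`, a boolean mask can be summed directly, since `True` counts as 1 and `False` as 0. The sketch below uses a hypothetical toy DataFrame (the matches listed are illustrative only) and shows that both approaches give the same count:

```python
import pandas as pd

# Hypothetical toy data with the same column names as the lab's DataFrame.
toy = pd.DataFrame({
    "Home Team Name": ["USA", "Ghana", "USA", "Brazil"],
    "Away Team Name": ["Ghana", "USA", "Portugal", "Chile"],
    "Year": [2014, 2010, 2014, 2014],
})

# USA is home team OR away team ...
played = (toy["Home Team Name"] == "USA") | (toy["Away Team Name"] == "USA")
# ... AND the year is 2014
mask = played & (toy["Year"] == 2014)

count_via_len = len(toy[mask])   # filter first, then count rows
count_via_sum = int(mask.sum())  # count the True values directly

print(count_via_len, count_via_sum)  # 2 2
```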

The Match ID is a unique match identifier for international games:
- Set the `MatchID` column as the index and inspect the head to verify the change.

In [359]:
df.set_index('MatchID', inplace = True)
df.head()

MatchID,Year,Datetime,Stage,Stadium,City,Home Team Name,Home Team Goals,Away Team Goals,Away Team Name,Attendance,Half-time Home Goals,Half-time Away Goals,Referee,RoundID,Home Team Initials,Away Team Initials,Total Goals,High_Total_Score,Half-time Goals
1096,1930,1930-07-13 15:00:00,Group 1,Pocitos,Montevideo,France,4,1,Mexico,4444.0,3,0,DOMINGO LOMBARDI,201,FRA,MEX,5,1,3
1090,1930,1930-07-13 15:00:00,Group 4,Parque Central,Montevideo,USA,3,0,Belgium,18346.0,2,0,JOSE MACIAS,201,USA,BEL,3,0,2
1093,1930,1930-07-14 12:45:00,Group 2,Parque Central,Montevideo,Yugoslavia,2,1,Brazil,24059.0,2,0,ANIBAL TEJADA,201,YUG,BRA,3,0,2
1098,1930,1930-07-14 14:50:00,Group 3,Pocitos,Montevideo,Romania,3,1,Peru,2549.0,1,0,ALBERTO WARNKEN,201,ROU,PER,4,0,1
1085,1930,1930-07-15 16:00:00,Group 1,Parque Central,Montevideo,Argentina,1,0,France,23409.0,0,0,GILBERTO REGO,201,ARG,FRA,1,0,0


Select data from Home Team Name to Half-time Away Goals for Match ID 1085. Use slicing here:

In [357]:
# your code here 

df.loc[1085, 'Home Team Name': 'Half-time Away Goals']

Home Team Name          Argentina
Home Team Goals                 1
Away Team Goals                 0
Away Team Name             France
Attendance                23409.0
Half-time Home Goals            0
Half-time Away Goals            0
Name: 1085, dtype: object

Reset the index back to the default integer index and move `MatchID` back to a column in the DataFrame.

In [358]:
# your code here

df.reset_index(inplace = True)
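
The round trip above can also be written without `inplace=True`, by assigning the returned DataFrames to new names. This is a sketch on a three-row toy DataFrame whose values are copied from the first rows shown earlier; the variable names `indexed`, `row` and `restored` are hypothetical:

```python
import pandas as pd

toy = pd.DataFrame({
    "MatchID": [1096, 1090, 1085],
    "Home Team Name": ["France", "USA", "Argentina"],
    "Home Team Goals": [4, 3, 1],
    "Away Team Goals": [1, 0, 0],
    "Away Team Name": ["Mexico", "Belgium", "France"],
})

# set_index returns a new DataFrame; toy itself is left unchanged.
indexed = toy.set_index("MatchID")

# Label-based slicing also works on columns and includes both endpoints.
row = indexed.loc[1085, "Home Team Name":"Away Team Goals"]

# reset_index moves MatchID back to an ordinary column.
restored = indexed.reset_index()

print(row.index.tolist())
# ['Home Team Name', 'Home Team Goals', 'Away Team Goals']
print(restored.columns.tolist() == toy.columns.tolist())  # True
```

Working with returned copies makes a cell safe to re-run, whereas an in-place `set_index('MatchID')` fails on the second run because the column no longer exists.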

## Changing Data and Data Types, Dropping and Creating New Columns

For your analysis, you realized that you don't need information on Referee Assistants and Win Conditions. Remove these columns and modify the original dataframe in-place. Print the column list to verify that these changes have taken place.

In [329]:
# Your code here

df.drop(columns = ['Assistant 1', 'Assistant 2', 'Win conditions'], inplace=True)
print(df.columns)

Index(['Year', 'Datetime', 'Stage', 'Stadium', 'City', 'Home Team Name',
       'Home Team Goals', 'Away Team Goals', 'Away Team Name', 'Attendance',
       'Half-time Home Goals', 'Half-time Away Goals', 'Referee', 'RoundID',
       'MatchID', 'Home Team Initials', 'Away Team Initials'],
      dtype='object')
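
One practical caveat: re-running a cell that drops columns in place raises a `KeyError`, because the columns are already gone. A possible way around this, sketched on a hypothetical two-row DataFrame, is to pass `errors="ignore"` to `drop` and assign the result:

```python
import pandas as pd

# Hypothetical toy data; only one of the three unwanted columns exists here.
toy = pd.DataFrame({
    "Year": [1930, 1930],
    "Referee": ["LOMBARDI Domingo (URU)", "MACIAS Jose (ARG)"],
    "Assistant 1": ["CRISTOPHE Henry (BEL)", "MATEUCCI Francisco (URU)"],
})

unwanted = ["Assistant 1", "Assistant 2", "Win conditions"]

# errors="ignore" skips labels that are not present instead of raising.
trimmed = toy.drop(columns=unwanted, errors="ignore")

# Dropping a second time is now harmless.
trimmed_again = trimmed.drop(columns=unwanted, errors="ignore")

print(trimmed.columns.tolist())  # ['Year', 'Referee']
```

The trade-off is that a misspelled column name is silently skipped too, so printing the column list afterwards remains a useful check.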


Check the data type of the `Datetime` column using the appropriate Series attribute:

In [330]:
# your code here
df['Datetime'].dtype

dtype('O')

The dtype 'O' is the object data type; here it indicates a column of strings. Now convert this column to a datetime data type and assign the result back to the original dataframe `df`:
- Note: the datetime strings in this column are not all formatted identically (i.e. the formats are mixed).
- Use the `'mixed'` option for the `format` keyword argument of the pandas conversion function.

In [331]:
# your code here 

df['Datetime'] = pd.to_datetime(df['Datetime'], format= 'mixed')
df['Datetime'].head()

0   1930-07-13 15:00:00
1   1930-07-13 15:00:00
2   1930-07-14 12:45:00
3   1930-07-14 14:50:00
4   1930-07-15 16:00:00
Name: Datetime, dtype: datetime64[ns]
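
The `format='mixed'` option is available in pandas 2.0 and later and infers the layout of each string individually. For comparison, the sketch below shows the stricter alternative of passing an explicit format string. It assumes a hypothetical Series in which every value follows the `13 Jul 1930 - 15:00` layout seen in the head output, which is not true of the full column in this lab:

```python
import pandas as pd

# Hypothetical Series: every string uses exactly the same layout.
raw = pd.Series(["13 Jul 1930 - 15:00", "14 Jul 1930 - 12:45"])

# %d = day, %b = abbreviated month name, %Y = 4-digit year,
# %H:%M = 24-hour time
parsed = pd.to_datetime(raw, format="%d %b %Y - %H:%M")

print(parsed.dt.year.tolist())  # [1930, 1930]
print(parsed.dt.hour.tolist())  # [15, 12]
```

An explicit format raises an error on any string that does not match, which can be a useful way to discover inconsistent entries before falling back to `'mixed'`.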

In World Cup history, **how many matches had 5 goals or more in total**? First, create a new column `"Total Goals"`.

In [332]:
# Your code here

# Create the column needed to count matches with 5 or more goals in total

# New column created by summing the other two
# We don't need a loop because pandas adds the two columns
# element-wise (a vectorized operation), matching rows by index
df['Total Goals'] = df['Home Team Goals'] + df['Away Team Goals']

Filter the dataset based on the `"Total Goals"` column and save it to a new dataframe `high_score_df`. Then get the number of matches satisfying the above condition.

In [333]:
# Your code here

high_score_df = df[df['Total Goals']>=5]
len(high_score_df)

147

In the dataframe `df`, create a new column `"High_Total_Score"` that has value 0 if the total number of goals in a match was less than 5 and 1 if there were 5 or more goals scored in total.


In [334]:
# Your code here

df['High_Total_Score'] = (df['Total Goals'] >= 5).astype('int')

print(df['High_Total_Score'].head())

0    1
1    0
2    0
3    0
4    0
Name: High_Total_Score, dtype: int32
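
The three steps above (sum two columns, compare against a threshold, convert the booleans to integers) can be traced on a small example. This sketch uses a three-row toy DataFrame with the goal counts of the first three matches shown earlier; the names `toy`, `is_high` and `n_high` are hypothetical:

```python
import pandas as pd

toy = pd.DataFrame({
    "Home Team Goals": [4, 3, 2],
    "Away Team Goals": [1, 0, 1],
})

# Element-wise addition: row i of the result is the sum of row i
# of each column.
toy["Total Goals"] = toy["Home Team Goals"] + toy["Away Team Goals"]

# The comparison yields a boolean Series ...
is_high = toy["Total Goals"] >= 5

# ... which becomes 1/0 when cast to an integer type.
toy["High_Total_Score"] = is_high.astype(int)

# Summing the boolean Series counts the matches with 5 or more goals.
n_high = int(is_high.sum())

print(toy["Total Goals"].tolist())       # [5, 3, 3]
print(toy["High_Total_Score"].tolist())  # [1, 0, 0]
print(n_high)                            # 1
```

The exact integer dtype shown after `astype('int')` (for example `int32` or `int64`) can depend on the platform and pandas version.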


Now **create a new column `"Half-time Goals"`** in `df` that includes both home and away values.

In [335]:
# Create a new column 'Half-time Goals' in df
df['Half-time Goals'] = df['Half-time Home Goals'] + df['Half-time Away Goals']
df.columns

Index(['Year', 'Datetime', 'Stage', 'Stadium', 'City', 'Home Team Name',
       'Home Team Goals', 'Away Team Goals', 'Away Team Name', 'Attendance',
       'Half-time Home Goals', 'Half-time Away Goals', 'Referee', 'RoundID',
       'MatchID', 'Home Team Initials', 'Away Team Initials', 'Total Goals',
       'High_Total_Score', 'Half-time Goals'],
      dtype='object')

Select all records of matches in which North Korea (Korea DPR) or South Korea (Korea Republic) was involved. Save the result to a new dataframe `korea_df`.

In [336]:
# Your code here

# Save the filtered rows as korea_df; later cells reuse it
korea_df = df.loc[df['Home Team Name'].str.contains('Korea') | df['Away Team Name'].str.contains('Korea')]

# just printing head here
korea_df.head()


Unnamed: 0,Year,Datetime,Stage,Stadium,City,Home Team Name,Home Team Goals,Away Team Goals,Away Team Name,Attendance,Half-time Home Goals,Half-time Away Goals,Referee,RoundID,MatchID,Home Team Initials,Away Team Initials,Total Goals,High_Total_Score,Half-time Goals
80,1954,1954-06-17 18:00:00,Group 2,Hardturm,Zurich,Hungary,9,0,Korea Republic,13000.0,4,0,VINCENTI Raymond (FRA),211,1294,HUN,KOR,9,1,4
88,1954,1954-06-20 17:00:00,Group 2,Charmilles,Geneva,Turkey,7,0,Korea Republic,4000.0,4,0,MARINO Esteban (URU),211,1304,TUR,KOR,7,1,4
171,1966,1966-07-12 19:30:00,Group 4,Ayresome Park,Middlesbrough,Soviet Union,3,0,Korea DPR,23006.0,2,0,GARDEAZABAL Juan (ESP),238,1710,URS,PRK,3,0,2
179,1966,1966-07-15 19:30:00,Group 4,Ayresome Park,Middlesbrough,Korea DPR,1,1,Chile,13792.0,0,1,KANDIL Aly Hussein (EGY),238,1609,PRK,CHI,2,0,1
187,1966,1966-07-19 19:30:00,Group 4,Ayresome Park,Middlesbrough,Korea DPR,1,0,Italy,17829.0,1,0,SCHWINTE Pierre (FRA),238,1679,PRK,ITA,1,0,1


## Calculating Statistics and Applying Functions

Calculate the average number of goals South Korea (Korea Republic) scored when it was the Home Team. Save this to a variable `mean_home_SK`. 

In [337]:
mean_home_SK = df.loc[df['Home Team Name'] == 'Korea Republic',  'Home Team Goals'].mean()
mean_home_SK

1.2857142857142858

Also estimate the spread of these goal counts by calculating the sample standard deviation. Save the standard deviation to a variable `std_home_SK`.

In [338]:
std_home_SK = df.loc[df['Home Team Name'] == 'Korea Republic',  'Home Team Goals'].std(ddof=1)
std_home_SK

0.8254203058555569
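
The `ddof=1` argument used above is also the pandas default for `.std()`; it divides the sum of squared deviations by `n - 1` (sample standard deviation), while `ddof=0` divides by `n` (population standard deviation). A short sketch with made-up numbers shows the difference:

```python
import math

import pandas as pd

# Made-up goal counts, chosen so the arithmetic is easy to follow:
# mean = 2.5, sum of squared deviations = 5.0
goals = pd.Series([1, 2, 3, 4])

sample_std = goals.std()            # ddof=1 by default: sqrt(5 / 3)
population_std = goals.std(ddof=0)  # sqrt(5 / 4)

print(math.isclose(sample_std, math.sqrt(5 / 3)))      # True
print(math.isclose(population_std, math.sqrt(5 / 4)))  # True
```

Note that NumPy's `np.std` defaults to `ddof=0`, so the two libraries give different results unless `ddof` is set explicitly.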

What is the maximum number of goals that North Korea has scored when it has been away?

In [339]:
max_away_NK = df.loc[df['Away Team Name'] == 'Korea DPR',  'Away Team Goals'].max()
max_away_NK


3

Get the average attendance, home team goals, away team goals, and total goals scored over all home games for South Korea: 

In [340]:
# your code here
df.loc[df['Home Team Name'] == 'Korea Republic',  ['Attendance', 'Home Team Goals', 'Away Team Goals', 'Total Goals']].mean()

Attendance         43969.714286
Home Team Goals        1.285714
Away Team Goals        1.571429
Total Goals            2.857143
dtype: float64

Does South Korea's scoring indicate that it benefits strongly from home advantage?

Your answer here:

Get a list of the unique teams that South Korea has played when it has been on the road:

In [341]:
korea_df.loc[korea_df['Away Team Name'] == 'Korea Republic',  'Home Team Name'].unique()

array(['Hungary', 'Turkey', 'Argentina', 'Belgium', 'Spain', 'Germany',
       'Netherlands', 'Portugal', 'France', 'Switzerland', 'Nigeria',
       'Uruguay', 'Russia'], dtype=object)

Create a Series with the teams that South Korea has played against and the number of matches against each team in the dataset -- when South Korea has been away. Save this in a variable `away_match_teamcount`

In [342]:
away_match_teamcount = korea_df.loc[korea_df['Away Team Name'] == 'Korea Republic',  'Home Team Name'].value_counts()
away_match_teamcount

Home Team Name
Argentina      2
Belgium        2
Spain          2
Germany        2
Hungary        1
Turkey         1
Netherlands    1
Portugal       1
France         1
Switzerland    1
Nigeria        1
Uruguay        1
Russia         1
Name: count, dtype: int64

Get a list of the names of countries in the above Series for which the count is greater than or equal to 2:
- use Boolean masking and filtering
- get the names using a relevant Pandas series attribute

In [343]:
away_match_teamcount[away_match_teamcount >= 2].index

Index(['Argentina', 'Belgium', 'Spain', 'Germany'], dtype='object', name='Home Team Name')
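
The same pattern (count values, filter the counts with a boolean mask, read the labels from the index) is sketched below on a hypothetical list of opponents. Calling `.tolist()` on the index is an optional extra step that turns the `Index` object into a plain Python list:

```python
import pandas as pd

# Hypothetical opponents; the names are for illustration only.
opponents = pd.Series(
    ["Spain", "Belgium", "Spain", "Turkey", "Belgium", "Hungary"]
)

# value_counts returns a Series: index = unique values, values = counts.
counts = opponents.value_counts()

# Boolean mask on the counts, then take the index labels as a list.
repeat_opponents = counts[counts >= 2].index.tolist()

print(sorted(repeat_opponents))  # ['Belgium', 'Spain']
```

The result is sorted before printing because the relative order of values with equal counts should not be relied upon.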

Taking a look at the referee name, you realize that there are some cleaning tasks that you would like to execute:

In [344]:
korea_df['Referee'].head()

80       VINCENTI Raymond (FRA)
88         MARINO Esteban (URU)
171      GARDEAZABAL Juan (ESP)
179    KANDIL Aly Hussein (EGY)
187       SCHWINTE Pierre (FRA)
Name: Referee, dtype: object

We can see that the referee names shown here are structured as LAST NAME First name (COUNTRY CODE). Write a function that takes in an element of the above Series, removes the country code, and reverses the ordering of the name parts so that the first name comes first. The function should return a string with first name and last name separated by a space.

- *Hint 1*: The character "(" separates the country code from the rest of the name.
- *Hint 2*: Think about converting relevant elements of the rest of the name to a list, reordering elements of the list and then rejoining.

In [345]:
# Define a function to clean and reverse the name

def clean_and_reverse_name(name):

  # YOUR CODE HERE

  # Split the name and country code into separate elements of a list
  parts = name.split('(') # notice pattern -- country code begins with `(`
  name_without_country_code = parts[0].strip() # strip white space
  # splits name, reverse first and last name elements and recombines (assuming space as separator)
  reversed_name = ' '.join(name_without_country_code.split()[::-1]) # then rejoin

  # YOUR CODE ENDS HERE
  return reversed_name
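
It can help to try the function on a few strings before applying it to the whole column. The sketch below repeats the same logic in compact form and adds `surname_last`, a hypothetical variant (not part of the lab) that moves only the first token to the end instead of reversing every token:

```python
def clean_and_reverse_name(name):
    """Drop the country code, then reverse the order of all name tokens."""
    name_without_code = name.split("(")[0].strip()
    return " ".join(name_without_code.split()[::-1])


def surname_last(name):
    """Hypothetical variant: move only the first token to the end."""
    tokens = name.split("(")[0].split()
    return " ".join(tokens[1:] + tokens[:1])


print(clean_and_reverse_name("LOMBARDI Domingo (URU)"))    # Domingo LOMBARDI
print(clean_and_reverse_name("KANDIL Aly Hussein (EGY)"))  # Hussein Aly KANDIL
print(surname_last("KANDIL Aly Hussein (EGY)"))            # Aly Hussein KANDIL
```

For two-part names the two functions agree. For names with more than two parts they differ, and neither handles multi-word surnames such as `DE LA SALLE Charles (FRA)` or entries that already list the first name first, so the cleaned column may still need a manual check.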

Use an appropriate Pandas Series method to transform the referee names accordingly. Make this change to the original DataFrame and check it by printing the head:

In [346]:
df['Referee']= df['Referee'].map(clean_and_reverse_name)
df['Referee'].head()

0    Domingo LOMBARDI
1         Jose MACIAS
2       Anibal TEJADA
3     Alberto WARNKEN
4       Gilberto REGO
Name: Referee, dtype: object
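
According to the `.info()` output earlier, `Referee` has no missing values, so a plain `.map()` is sufficient here. If a column did contain missing values, the function would fail when it tried to call `.split()` on them. The sketch below, on a hypothetical Series with one missing entry, shows how `na_action="ignore"` lets `.map()` skip those entries:

```python
import pandas as pd


def clean_and_reverse_name(name):
    """Compact version of the lab's function, repeated for self-containment."""
    return " ".join(name.split("(")[0].split()[::-1])


# Hypothetical Series with a missing value in the middle.
referees = pd.Series(["LOMBARDI Domingo (URU)", None, "MACIAS Jose (ARG)"])

# na_action="ignore" passes missing values through without calling
# the function on them.
cleaned = referees.map(clean_and_reverse_name, na_action="ignore")

# Vectorized string methods under .str also leave missing values alone.
upper = cleaned.str.upper()

print(cleaned[0])  # Domingo LOMBARDI
print(upper[2])    # JOSE MACIAS
```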

Let's standardize the casing for the names:
- use an appropriate vectorized Series method to upper case the entire name and save this to the original dataframe.

In [347]:
# Your code here
df['Referee'] = df['Referee'].str.upper()
df.head()

Unnamed: 0,Year,Datetime,Stage,Stadium,City,Home Team Name,Home Team Goals,Away Team Goals,Away Team Name,Attendance,Half-time Home Goals,Half-time Away Goals,Referee,RoundID,MatchID,Home Team Initials,Away Team Initials,Total Goals,High_Total_Score,Half-time Goals
0,1930,1930-07-13 15:00:00,Group 1,Pocitos,Montevideo,France,4,1,Mexico,4444.0,3,0,DOMINGO LOMBARDI,201,1096,FRA,MEX,5,1,3
1,1930,1930-07-13 15:00:00,Group 4,Parque Central,Montevideo,USA,3,0,Belgium,18346.0,2,0,JOSE MACIAS,201,1090,USA,BEL,3,0,2
2,1930,1930-07-14 12:45:00,Group 2,Parque Central,Montevideo,Yugoslavia,2,1,Brazil,24059.0,2,0,ANIBAL TEJADA,201,1093,YUG,BRA,3,0,2
3,1930,1930-07-14 14:50:00,Group 3,Pocitos,Montevideo,Romania,3,1,Peru,2549.0,1,0,ALBERTO WARNKEN,201,1098,ROU,PER,4,0,1
4,1930,1930-07-15 16:00:00,Group 1,Parque Central,Montevideo,Argentina,1,0,France,23409.0,0,0,GILBERTO REGO,201,1085,ARG,FRA,1,0,0


Save your modified DataFrame to a csv file:
- call it 'modified_fifa.csv'

In [348]:
# your code here

df.to_csv('modified_fifa.csv')
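
By default, `to_csv` also writes the index as an unnamed first column, which shows up as `Unnamed: 0` when the file is read back. If that extra column is not wanted, `index=False` is one option. The sketch below demonstrates the difference on a two-row toy DataFrame, using an in-memory buffer instead of a file so that nothing is written to disk:

```python
import io

import pandas as pd

toy = pd.DataFrame({"Year": [1930, 1930], "MatchID": [1096, 1090]})

# Without a path, to_csv returns the CSV text as a string.
with_index = pd.read_csv(io.StringIO(toy.to_csv()))
without_index = pd.read_csv(io.StringIO(toy.to_csv(index=False)))

print(with_index.columns.tolist())     # ['Unnamed: 0', 'Year', 'MatchID']
print(without_index.columns.tolist())  # ['Year', 'MatchID']
```

Whether to keep the index depends on the data: a default `RangeIndex` carries no information, whereas a meaningful index such as `MatchID` would be worth saving.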