
# Accessing Data within Pandas - Lab

## Introduction

In this lab, we'll look at a data set that contains information about World Cup matches. Let's use the pandas commands learned in the previous lecture to learn more about our data!

## Objectives
You will be able to:
* Understand and explain some key Pandas methods
* Access DataFrame data by using the label
* Perform boolean indexing on both Series and DataFrames
* Use simple selectors for Series
* Set new Series and DataFrame inputs

## Load the data

Load the file `WorldCupMatches.csv` as a DataFrame in pandas.

In [1]:
import pandas as pd

In [2]:
# read_csv already returns a DataFrame, so no extra conversion is needed
df = pd.read_csv('WorldCupMatches.csv')
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4572 entries, 0 to 4571
Data columns (total 20 columns):
Year                    852 non-null float64
Datetime                852 non-null object
Stage                   852 non-null object
Stadium                 852 non-null object
City                    852 non-null object
Home Team Name          852 non-null object
Home Team Goals         852 non-null float64
Away Team Goals         852 non-null float64
Away Team Name          852 non-null object
Win conditions          852 non-null object
Attendance              850 non-null float64
Half-time Home Goals    852 non-null float64
Half-time Away Goals    852 non-null float64
Referee                 852 non-null object
Assistant 1             852 non-null object
Assistant 2             852 non-null object
RoundID                 852 non-null float64
MatchID                 852 non-null float64
Home Team Initials      852 non-null object
Away Team Initials      852 non-null object
dtype

## Common methods and attributes

Use the correct method to look at the first 7 rows of the data set.

In [3]:
df.head(7)

Unnamed: 0,Year,Datetime,Stage,Stadium,City,Home Team Name,Home Team Goals,Away Team Goals,Away Team Name,Win conditions,Attendance,Half-time Home Goals,Half-time Away Goals,Referee,Assistant 1,Assistant 2,RoundID,MatchID,Home Team Initials,Away Team Initials
0,1930.0,13 Jul 1930 - 15:00,Group 1,Pocitos,Montevideo,France,4.0,1.0,Mexico,,4444.0,3.0,0.0,LOMBARDI Domingo (URU),CRISTOPHE Henry (BEL),REGO Gilberto (BRA),201.0,1096.0,FRA,MEX
1,1930.0,13 Jul 1930 - 15:00,Group 4,Parque Central,Montevideo,USA,3.0,0.0,Belgium,,18346.0,2.0,0.0,MACIAS Jose (ARG),MATEUCCI Francisco (URU),WARNKEN Alberto (CHI),201.0,1090.0,USA,BEL
2,1930.0,14 Jul 1930 - 12:45,Group 2,Parque Central,Montevideo,Yugoslavia,2.0,1.0,Brazil,,24059.0,2.0,0.0,TEJADA Anibal (URU),VALLARINO Ricardo (URU),BALWAY Thomas (FRA),201.0,1093.0,YUG,BRA
3,1930.0,14 Jul 1930 - 14:50,Group 3,Pocitos,Montevideo,Romania,3.0,1.0,Peru,,2549.0,1.0,0.0,WARNKEN Alberto (CHI),LANGENUS Jean (BEL),MATEUCCI Francisco (URU),201.0,1098.0,ROU,PER
4,1930.0,15 Jul 1930 - 16:00,Group 1,Parque Central,Montevideo,Argentina,1.0,0.0,France,,23409.0,0.0,0.0,REGO Gilberto (BRA),SAUCEDO Ulises (BOL),RADULESCU Constantin (ROU),201.0,1085.0,ARG,FRA
5,1930.0,16 Jul 1930 - 14:45,Group 1,Parque Central,Montevideo,Chile,3.0,0.0,Mexico,,9249.0,1.0,0.0,CRISTOPHE Henry (BEL),APHESTEGUY Martin (URU),LANGENUS Jean (BEL),201.0,1095.0,CHI,MEX
6,1930.0,17 Jul 1930 - 12:45,Group 2,Parque Central,Montevideo,Yugoslavia,4.0,0.0,Bolivia,,18306.0,0.0,0.0,MATEUCCI Francisco (URU),LOMBARDI Domingo (URU),WARNKEN Alberto (CHI),201.0,1092.0,YUG,BOL


Look at the last 3 rows of the data set.

In [4]:
df.tail(3)

Unnamed: 0,Year,Datetime,Stage,Stadium,City,Home Team Name,Home Team Goals,Away Team Goals,Away Team Name,Win conditions,Attendance,Half-time Home Goals,Half-time Away Goals,Referee,Assistant 1,Assistant 2,RoundID,MatchID,Home Team Initials,Away Team Initials
4569,,,,,,,,,,,,,,,,,,,,
4570,,,,,,,,,,,,,,,,,,,,
4571,,,,,,,,,,,,,,,,,,,,


Get a concise summary of your data using `.info()`

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4572 entries, 0 to 4571
Data columns (total 20 columns):
Year                    852 non-null float64
Datetime                852 non-null object
Stage                   852 non-null object
Stadium                 852 non-null object
City                    852 non-null object
Home Team Name          852 non-null object
Home Team Goals         852 non-null float64
Away Team Goals         852 non-null float64
Away Team Name          852 non-null object
Win conditions          852 non-null object
Attendance              850 non-null float64
Half-time Home Goals    852 non-null float64
Half-time Away Goals    852 non-null float64
Referee                 852 non-null object
Assistant 1             852 non-null object
Assistant 2             852 non-null object
RoundID                 852 non-null float64
MatchID                 852 non-null float64
Home Team Initials      852 non-null object
Away Team Initials      852 non-null object
dtype

Obtain a tuple representing the number of rows and number of columns

In [6]:
df.shape

(4572, 20)

Use the appropriate attribute to get the column names

In [7]:
df.columns

Index(['Year', 'Datetime', 'Stage', 'Stadium', 'City', 'Home Team Name',
       'Home Team Goals', 'Away Team Goals', 'Away Team Name',
       'Win conditions', 'Attendance', 'Half-time Home Goals',
       'Half-time Away Goals', 'Referee', 'Assistant 1', 'Assistant 2',
       'RoundID', 'MatchID', 'Home Team Initials', 'Away Team Initials'],
      dtype='object')

## Selecting dataframe information

When looking at the dataframe's `.head()`, you might have noticed that the games are structured chronologically in the dataframe.

Use the right selection method to print all the information from the 3rd to the 5th game.

In [8]:
df.iloc[2:5]

Unnamed: 0,Year,Datetime,Stage,Stadium,City,Home Team Name,Home Team Goals,Away Team Goals,Away Team Name,Win conditions,Attendance,Half-time Home Goals,Half-time Away Goals,Referee,Assistant 1,Assistant 2,RoundID,MatchID,Home Team Initials,Away Team Initials
2,1930.0,14 Jul 1930 - 12:45,Group 2,Parque Central,Montevideo,Yugoslavia,2.0,1.0,Brazil,,24059.0,2.0,0.0,TEJADA Anibal (URU),VALLARINO Ricardo (URU),BALWAY Thomas (FRA),201.0,1093.0,YUG,BRA
3,1930.0,14 Jul 1930 - 14:50,Group 3,Pocitos,Montevideo,Romania,3.0,1.0,Peru,,2549.0,1.0,0.0,WARNKEN Alberto (CHI),LANGENUS Jean (BEL),MATEUCCI Francisco (URU),201.0,1098.0,ROU,PER
4,1930.0,15 Jul 1930 - 16:00,Group 1,Parque Central,Montevideo,Argentina,1.0,0.0,France,,23409.0,0.0,0.0,REGO Gilberto (BRA),SAUCEDO Ulises (BOL),RADULESCU Constantin (ROU),201.0,1085.0,ARG,FRA


Now print the information for games 5 through 9, showing only the "Home Team Name" and "Away Team Name" columns.

In [9]:
df.loc[4:8,["Home Team Name","Away Team Name"]]

Unnamed: 0,Home Team Name,Away Team Name
4,Argentina,France
5,Chile,Mexico
6,Yugoslavia,Bolivia
7,USA,Paraguay
8,Uruguay,Peru
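The two selections above differ in how a slice ends: `.iloc` slices by position and leaves out the stop value, while `.loc` slices by label and includes it. A minimal sketch on a toy frame (the `toy` DataFrame and its values are made up for illustration and are not taken from `WorldCupMatches.csv`):

```python
import pandas as pd

# Hypothetical toy data with the default 0..9 integer index
toy = pd.DataFrame({"game": [f"game {i}" for i in range(1, 11)]})

by_position = toy.iloc[2:5]  # positions 2, 3, 4 (stop is excluded)
by_label = toy.loc[4:8]      # labels 4, 5, 6, 7, 8 (stop is included)

print(by_position["game"].tolist())
print(by_label["game"].tolist())
```

With a default integer index, positions and labels coincide, so the only difference that shows up is whether the last row is included.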


Next, we'd like the information on all the games played in Group 3 for the 1950 World Cup.

In [10]:
df.loc[(df['Stage']=='Group 3')&(df['Year']==1950.0)]

Unnamed: 0,Year,Datetime,Stage,Stadium,City,Home Team Name,Home Team Goals,Away Team Goals,Away Team Name,Win conditions,Attendance,Half-time Home Goals,Half-time Away Goals,Referee,Assistant 1,Assistant 2,RoundID,MatchID,Home Team Initials,Away Team Initials
56,1950.0,25 Jun 1950 - 15:00,Group 3,Pacaembu,Sao Paulo,Sweden,3.0,2.0,Italy,,36502.0,2.0,1.0,LUTZ Jean (SUI),BERANEK Alois (AUT),TEJADA Carlos (MEX),208.0,1219.0,SWE,ITA
61,1950.0,29 Jun 1950 - 15:30,Group 3,Durival de Brito,Curitiba,Sweden,2.0,2.0,Paraguay,,7903.0,2.0,1.0,MITCHELL Robert (SCO),LEMESIC Leo (YUG),GARCIA Prudencio (USA),208.0,1228.0,SWE,PAR
65,1950.0,02 Jul 1950 - 15:00,Group 3,Pacaembu,Sao Paulo,Italy,2.0,0.0,Paraguay,,25811.0,1.0,0.0,ELLIS Arthur (ENG),GARCIA Prudencio (USA),DE LA SALLE Charles (FRA),208.0,1218.0,ITA,PAR


Let's repeat the command above, but this time print only the attendance column for the Group 3 games.

You can combine conditions like this:

`df[(condition1) | (condition2)]`  -> Returns rows where either condition is true

`df[(condition1) & (condition2)]`  -> Returns rows where both conditions are true

In [11]:
df.loc[(df['Stage']=='Group 3')&(df['Year']==1950.0),['Attendance']]

Unnamed: 0,Attendance
56,36502.0
61,7903.0
65,25811.0
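The `|` and `&` combinations described above can be tried out on a small frame. This is only a sketch: the `toy` data is invented, and the mask names `is_group3` and `is_1950` are illustrative helpers rather than anything the lab defines.

```python
import pandas as pd

# Hypothetical toy data, not rows from WorldCupMatches.csv
toy = pd.DataFrame({
    "Stage": ["Group 3", "Group 3", "Final", "Group 1"],
    "Year": [1950.0, 1954.0, 1950.0, 1954.0],
    "Attendance": [100.0, 200.0, 300.0, 400.0],
})

is_group3 = toy["Stage"] == "Group 3"
is_1950 = toy["Year"] == 1950.0

both = toy.loc[is_group3 & is_1950]    # rows where both conditions hold
either = toy.loc[is_group3 | is_1950]  # rows where at least one holds

# A single column label returns a Series; a list of labels returns a DataFrame
attendance = toy.loc[is_group3 & is_1950, "Attendance"]

print(len(both), len(either))
print(type(attendance).__name__)
```

Storing each condition in a named mask keeps the parentheses manageable when several conditions are combined.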


Throughout the entire history of the World Cup, how many home games were played by the Netherlands?

In [49]:
df.loc[df['Home Team Name']=='Netherlands']

Unnamed: 0,Year,Datetime,Stage,Stadium,City,Home Team Name,Home Team Goals,Away Team Goals,Away Team Name,Win conditions,...,Half-time Away Goals,Referee,Assistant 1,Assistant 2,RoundID,MatchID,Home Team Initials,Away Team Initials,Played in ND,ND Played
244,1974.0,19 Jun 1974 - 19:30,Group 3,Westfalenstadion,Dortmund,Netherlands,0.0,0.0,Sweden,,...,0.0,WINSEMANN Werner (CAN),TSCHENSCHER Kurt (GER),THOMAS Clive (WAL),262.0,2097.0,NED,SWE,ND,Yes
258,1974.0,26 Jun 1974 - 19:30,Group A,Parkstadion,Gelsenkirchen,Netherlands,4.0,0.0,Argentina,,...,0.0,DAVIDSON Bob (SCO),TSCHENSCHER Kurt (GER),KAZAKOV Pavel (URS),263.0,1948.0,NED,ARG,ND,Yes
265,1974.0,03 Jul 1974 - 19:30,Group A,Westfalenstadion,Dortmund,Netherlands,2.0,0.0,Brazil,,...,0.0,TSCHENSCHER Kurt (GER),DAVIDSON Bob (SCO),SUPPIAH George (SIN),263.0,1983.0,NED,BRA,ND,Yes
269,1974.0,07 July 1974 - 16:00,Final,Olympiastadion,Munich,Netherlands,1.0,2.0,Germany FR,,...,2.0,TAYLOR John (ENG),GONZALEZ ARCHUNDIA Alfonso (MEX),BARRETO RUIZ Ramon (URU),605.0,2063.0,NED,FRG,ND,Yes
277,1978.0,03 Jun 1978 - 16:45,Group 4,San Martin,Mendoza,Netherlands,3.0,0.0,IR Iran,,...,0.0,GONZALEZ ARCHUNDIA Alfonso (MEX),WURTZ Robert (FRA),COMESANA Miguel (ARG),278.0,2388.0,NED,IRN,ND,Yes
285,1978.0,07 Jun 1978 - 16:45,Group 4,San Martin,Mendoza,Netherlands,0.0,0.0,Peru,,...,0.0,PROKOP Adolf (GDR),COEREZZA Norberto Angel (ARG),IVANOV Anatoly (URS),278.0,2394.0,NED,PER,ND,Yes
295,1978.0,14 Jun 1978 - 13:45,Group A,Estadio Olímpico Chateau Carreras,Cordoba,Netherlands,5.0,1.0,Austria,,...,0.0,GORDON John (SCO),ITHURRALDE Arturo Andres (ARG),BOUZO Farouk (SYR),279.0,2220.0,NED,AUT,ND,Yes
302,1978.0,21 Jun 1978 - 13:45,Group A,El Monumental - Estadio Monumental Antonio Ves...,Buenos Aires,Netherlands,2.0,1.0,Italy,,...,1.0,MARTINEZ Angel (ESP),PESTARINO Luis (ARG),OROZCO GUERRERO Cesar (PER),279.0,2391.0,NED,ITA,ND,Yes
422,1990.0,12 Jun 1990 - 21:00,Group F,Della Favorita,Palermo,Netherlands,1.0,1.0,Egypt,,...,0.0,SORIANO ALADREN Emilio (ESP),CODESAL MENDEZ Edgardo (MEX),CARDELLINO DE SAN VICENTE Juan (URU),322.0,151.0,NED,EGY,ND,Yes
472,1994.0,20 Jun 1994 - 19:30,Group F,RFK Stadium,Washington Dc,Netherlands,2.0,1.0,Saudi Arabia,,...,1.0,DIAZ VEGA Manuel (ESP),IVANOV Valentin (RUS),MARTON Sandor (HUN),337.0,3058.0,NED,KSA,ND,Yes
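Since the question asks "how many", the count can also be read directly off the boolean mask instead of being counted from the displayed rows. A hedged sketch on invented data (the `toy` frame and the variable names are assumptions for illustration):

```python
import pandas as pd

# Hypothetical toy data, not rows from WorldCupMatches.csv
toy = pd.DataFrame({
    "Home Team Name": ["Netherlands", "Brazil", "Netherlands", "Italy"],
    "Away Team Name": ["Sweden", "Netherlands", "Peru", "Spain"],
})

home_mask = toy["Home Team Name"] == "Netherlands"

n_home = int(home_mask.sum())        # True counts as 1, False as 0
n_home_alt = len(toy.loc[home_mask]) # same count via the filtered frame

# Home or away: combine the two masks with |
any_mask = home_mask | (toy["Away Team Name"] == "Netherlands")
n_total = int(any_mask.sum())

print(n_home, n_home_alt, n_total)
```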


How many games were played by the Netherlands in total?

In [50]:
df.loc[(df['Home Team Name']=='Netherlands')| (df['Away Team Name']=='Netherlands')]

Unnamed: 0,Year,Datetime,Stage,Stadium,City,Home Team Name,Home Team Goals,Away Team Goals,Away Team Name,Win conditions,...,Half-time Away Goals,Referee,Assistant 1,Assistant 2,RoundID,MatchID,Home Team Initials,Away Team Initials,Played in ND,ND Played
20,1934.0,27 May 1934 - 16:30,Preliminary round,San Siro,Milan,Switzerland,3.0,2.0,Netherlands,,...,1.0,EKLIND Ivan (SWE),BERANEK Alois (AUT),BONIVENTO Ferruccio (ITA),204.0,1133.0,SUI,NED,ND,
41,1938.0,05 Jun 1938 - 18:30,First round,Cavee Verte,Le Havre,Czechoslovakia,3.0,0.0,Netherlands,Czechoslovakia win after extra time,...,0.0,LECLERCQ Lucien (FRA),OLIVE D. (FRA),SDEZ Victor (FRA),206.0,1172.0,TCH,NED,ND,
236,1974.0,15 Jun 1974 - 16:00,Group 3,Niedersachsenstadion,Hanover,Uruguay,0.0,2.0,Netherlands,,...,1.0,PALOTAI Karoly (HUN),KAZAKOV Pavel (URS),RAINEA Nicolae (ROU),262.0,2098.0,URU,NED,ND,
244,1974.0,19 Jun 1974 - 19:30,Group 3,Westfalenstadion,Dortmund,Netherlands,0.0,0.0,Sweden,,...,0.0,WINSEMANN Werner (CAN),TSCHENSCHER Kurt (GER),THOMAS Clive (WAL),262.0,2097.0,NED,SWE,ND,Yes
252,1974.0,23 Jun 1974 - 16:00,Group 3,Westfalenstadion,Dortmund,Bulgaria,1.0,4.0,Netherlands,,...,2.0,BOSKOVIC Tony (AUS),BIWERSI Ferdinand (GER),ESCHWEILER Walter (GER),262.0,1990.0,BUL,NED,ND,
258,1974.0,26 Jun 1974 - 19:30,Group A,Parkstadion,Gelsenkirchen,Netherlands,4.0,0.0,Argentina,,...,0.0,DAVIDSON Bob (SCO),TSCHENSCHER Kurt (GER),KAZAKOV Pavel (URS),263.0,1948.0,NED,ARG,ND,Yes
262,1974.0,30 Jun 1974 - 16:00,Group A,Parkstadion,Gelsenkirchen,German DR,0.0,2.0,Netherlands,,...,1.0,SCHEURER Ruedi (SUI),LINEMAYR Erich (AUT),DELGADO Omar (COL),263.0,2067.0,GDR,NED,ND,
265,1974.0,03 Jul 1974 - 19:30,Group A,Westfalenstadion,Dortmund,Netherlands,2.0,0.0,Brazil,,...,0.0,TSCHENSCHER Kurt (GER),DAVIDSON Bob (SCO),SUPPIAH George (SIN),263.0,1983.0,NED,BRA,ND,Yes
269,1974.0,07 July 1974 - 16:00,Final,Olympiastadion,Munich,Netherlands,1.0,2.0,Germany FR,,...,2.0,TAYLOR John (ENG),GONZALEZ ARCHUNDIA Alfonso (MEX),BARRETO RUIZ Ramon (URU),605.0,2063.0,NED,FRG,ND,Yes
277,1978.0,03 Jun 1978 - 16:45,Group 4,San Martin,Mendoza,Netherlands,3.0,0.0,IR Iran,,...,0.0,GONZALEZ ARCHUNDIA Alfonso (MEX),WURTZ Robert (FRA),COMESANA Miguel (ARG),278.0,2388.0,NED,IRN,ND,Yes


Next, let's try to figure out how many games the USA played in the 2014 World Cup.

In [51]:
# & binds more tightly than |, so the two team conditions are grouped explicitly
df.loc[(df['Year'] == 2014.0) & ((df['Away Team Name'] == 'USA') | (df['Home Team Name'] == 'USA'))]
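When a filter mixes `&` and `|`, operator precedence matters: `&` binds more tightly than `|`, so `a & b | c` is read as `(a & b) | c`. A minimal sketch of the difference on invented data (opponents `X`, `Y`, `Z` and all values are placeholders, not real matches):

```python
import pandas as pd

# Hypothetical toy data, not rows from WorldCupMatches.csv
toy = pd.DataFrame({
    "Year": [1930.0, 2014.0, 2014.0, 2014.0],
    "Home Team Name": ["USA", "X", "USA", "Y"],
    "Away Team Name": ["X", "USA", "Y", "Z"],
})

in_2014 = toy["Year"] == 2014.0
home_usa = toy["Home Team Name"] == "USA"
away_usa = toy["Away Team Name"] == "USA"

# Parsed as (in_2014 & away_usa) | home_usa: every USA home game passes
loose = toy.loc[in_2014 & away_usa | home_usa]

# Grouping the | part restricts both alternatives to 2014
grouped = toy.loc[in_2014 & (away_usa | home_usa)]

print(loose["Year"].tolist())
print(grouped["Year"].tolist())
```

In the ungrouped version, the 1930 row slips through because the year condition applies only to the away-team test.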


Now, let's try to find out how many countries participated in the 1986 World Cup.

Hint 1: as a first step, create a new data set that only contains games in that year.

Hint 2: You can use `.unique()` to make sure you don't end up with duplicate country names.

In [55]:
Eightysix = df.loc[df['Year'] == 1986.0]

In [76]:
# Stack the home and away columns, then keep each country once with .unique()
countries = pd.concat([Eightysix['Away Team Name'], Eightysix['Home Team Name']]).unique()
print(set(countries),len(countries))

{'Belgium', 'England', 'Canada', 'Paraguay', 'Mexico', 'Portugal', 'Poland', 'Korea Republic', 'Argentina', 'Hungary', 'France', 'Spain', 'Soviet Union', 'Iraq', 'Germany FR', 'Brazil', 'Uruguay', 'Northern Ireland', 'Italy', 'Morocco', 'Denmark', 'Algeria', 'Scotland', 'Bulgaria'} 24


In World Cup history, how many matches had more than 5 goals in total?

In [77]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4572 entries, 0 to 4571
Data columns (total 22 columns):
Year                    852 non-null float64
Datetime                852 non-null object
Stage                   852 non-null object
Stadium                 852 non-null object
City                    852 non-null object
Home Team Name          852 non-null object
Home Team Goals         852 non-null float64
Away Team Goals         852 non-null float64
Away Team Name          852 non-null object
Win conditions          852 non-null object
Attendance              850 non-null float64
Half-time Home Goals    852 non-null float64
Half-time Away Goals    852 non-null float64
Referee                 852 non-null object
Assistant 1             852 non-null object
Assistant 2             852 non-null object
RoundID                 852 non-null float64
MatchID                 852 non-null float64
Home Team Initials      852 non-null object
Away Team Initials      852 non-null object
Playe

In [85]:
df["goals"]= (df["Home Team Goals"] + df["Away Team Goals"])
df.loc[df['goals']>5]

Unnamed: 0,Year,Datetime,Stage,Stadium,City,Home Team Name,Home Team Goals,Away Team Goals,Away Team Name,Win conditions,...,Referee,Assistant 1,Assistant 2,RoundID,MatchID,Home Team Initials,Away Team Initials,Played in ND,ND Played,goals
10,1930.0,19 Jul 1930 - 15:00,Group 1,Estadio Centenario,Montevideo,Argentina,6.0,3.0,Mexico,,...,SAUCEDO Ulises (BOL),ALONSO Gualberto (URU),RADULESCU Constantin (ROU),201.0,1086.0,ARG,MEX,ND,,9.0
15,1930.0,26 Jul 1930 - 14:45,Semi-finals,Estadio Centenario,Montevideo,Argentina,6.0,1.0,USA,,...,LANGENUS Jean (BEL),VALLEJO Gaspar (MEX),WARNKEN Alberto (CHI),202.0,1088.0,ARG,USA,ND,,7.0
16,1930.0,27 Jul 1930 - 14:45,Semi-finals,Estadio Centenario,Montevideo,Uruguay,6.0,1.0,Yugoslavia,,...,REGO Gilberto (BRA),SAUCEDO Ulises (BOL),BALWAY Thomas (FRA),202.0,1101.0,URU,YUG,ND,,7.0
17,1930.0,30 Jul 1930 - 14:15,Final,Estadio Centenario,Montevideo,Uruguay,4.0,2.0,Argentina,,...,LANGENUS Jean (BEL),SAUCEDO Ulises (BOL),CRISTOPHE Henry (BEL),405.0,1087.0,URU,ARG,ND,,6.0
19,1934.0,27 May 1934 - 16:30,Preliminary round,Giorgio Ascarelli,Naples,Hungary,4.0,2.0,Egypt,,...,BARLASSINA Rinaldo (ITA),DATTILO Generoso (ITA),SASSI Otello (ITA),204.0,1119.0,HUN,EGY,ND,,6.0
22,1934.0,27 May 1934 - 16:30,Preliminary round,Giovanni Berta,Florence,Germany,5.0,2.0,Belgium,,...,MATTEA Francesco (ITA),MELANDRI Ermenegildo (ITA),BAERT Jacques (FRA),204.0,1108.0,GER,BEL,ND,,7.0
24,1934.0,27 May 1934 - 16:30,Preliminary round,Nazionale PNF,Rome,Italy,7.0,1.0,USA,,...,MERCET Rene (SUI),ESCARTIN Pedro (ESP),ZENISEK Bohumil (TCH),204.0,1135.0,ITA,USA,ND,,8.0
36,1938.0,05 Jun 1938 - 17:00,First round,Velodrome Municipale,Reims,Hungary,6.0,0.0,Dutch East Indies,,...,CONRIE Roger (FRA),DE LA SALLE Charles (FRA),WEINGARTNER Karl (AUT),206.0,1173.0,HUN,INH,ND,,6.0
38,1938.0,05 Jun 1938 - 17:00,First round,Stade Municipal,Toulouse,Cuba,3.0,3.0,Romania,,...,SCARPI Giuseppe (ITA),VALPREDE Ferdinand (FRA),MERKCX Jean (FRA),206.0,1156.0,CUB,ROU,ND,,6.0
40,1938.0,05 Jun 1938 - 17:30,First round,Stade de la Meinau,Strasbourg,Brazil,6.0,5.0,Poland,Brazil win after extra time,...,EKLIND Ivan (SWE),POISSANT Louis (FRA),KISSENBERGER Ernest (FRA),206.0,1150.0,BRA,POL,ND,,11.0


## Changing values and creating new columns

With the information you currently have in your `df`, create a new column "Half-time Goals" that holds the total number of goals scored by both teams at half-time.

In [86]:
df["Half-time Goals"]= (df["Half-time Home Goals"] + df["Half-time Away Goals"])
df['Half-time Goals']

0       3.0
1       2.0
2       2.0
3       1.0
4       0.0
5       1.0
6       0.0
7       2.0
8       0.0
9       0.0
10      4.0
11      1.0
12      1.0
13      4.0
14      3.0
15      1.0
16      4.0
17      3.0
18      0.0
19      4.0
20      3.0
21      2.0
22      3.0
23      3.0
24      3.0
25      1.0
26      2.0
27      0.0
28      0.0
29      1.0
       ... 
4542    NaN
4543    NaN
4544    NaN
4545    NaN
4546    NaN
4547    NaN
4548    NaN
4549    NaN
4550    NaN
4551    NaN
4552    NaN
4553    NaN
4554    NaN
4555    NaN
4556    NaN
4557    NaN
4558    NaN
4559    NaN
4560    NaN
4561    NaN
4562    NaN
4563    NaN
4564    NaN
4565    NaN
4566    NaN
4567    NaN
4568    NaN
4569    NaN
4570    NaN
4571    NaN
Name: Half-time Goals, Length: 4572, dtype: float64
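The `NaN` values at the end of this output come from rows where the inputs are missing: adding two Series with `+` gives `NaN` whenever either side is `NaN`. A small sketch on made-up values follows. `Series.add` with `fill_value` is shown only as one possible alternative for the case where a single side is missing, not as something this lab requires.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not rows from WorldCupMatches.csv
toy = pd.DataFrame({
    "Half-time Home Goals": [3.0, 2.0, np.nan, np.nan],
    "Half-time Away Goals": [0.0, np.nan, 1.0, np.nan],
})

# Plain addition: any missing operand makes the result missing
plain = toy["Half-time Home Goals"] + toy["Half-time Away Goals"]

# fill_value=0 substitutes 0 when exactly one side is missing;
# if both sides are missing, the result is still NaN
filled = toy["Half-time Home Goals"].add(toy["Half-time Away Goals"], fill_value=0)

print(plain.tolist())
print(filled.tolist())
```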

Run the code below. You'll notice that for Korea, there are records for both North Korea (Korea DPR) and South Korea (Korea Republic).

In [88]:
df.loc[df["Home Team Name"].str.contains('Korea',na=False), "Home Team Name" ]

179         Korea DPR
187         Korea DPR
374    Korea Republic
386    Korea Republic
434    Korea Republic
444    Korea Republic
480    Korea Republic
524    Korea Republic
593    Korea Republic
609    Korea Republic
635    Korea Republic
642    Korea Republic
655    Korea Republic
710    Korea Republic
753         Korea DPR
802    Korea Republic
818    Korea Republic
Name: Home Team Name, dtype: object
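The `na=False` argument in the cell above matters because the column contains missing values, and a mask with missing entries cannot be used reliably for selection. A minimal sketch, assuming an object-dtype Series of made-up names:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; dtype=object mirrors a text column read from CSV
names = pd.Series(["Korea DPR", "Korea Republic", "Brazil", np.nan], dtype=object)

# Without na=..., the missing entry is not turned into a clean True/False
raw_mask = names.str.contains("Korea")

# na=False treats missing entries as "no match", giving a pure boolean mask
safe_mask = names.str.contains("Korea", na=False)

print(safe_mask.tolist())
print(names[safe_mask].tolist())
```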

Imagine that, for some reason, we simply want Korea listed as one entry, so we want to replace every "Home Team Name" and "Away Team Name" entry that contains "Korea" with simply "Korea". In the same way, we want to change the values "KOR" and "PRK" in the columns "Home Team Initials" and "Away Team Initials" to "NSK" (North & South Korea).

In [94]:
df.loc[df["Home Team Name"].str.contains('Korea',na=False), "Home Team Name"] = "Korea"
df.loc[df["Away Team Name"].str.contains('Korea',na=False), "Away Team Name"] = "Korea"
# isin matches either set of initials; missing values evaluate to False
df.loc[df["Home Team Initials"].isin(['KOR', 'PRK']), "Home Team Initials"] = "NSK"
df.loc[df["Away Team Initials"].isin(['KOR', 'PRK']), "Away Team Initials"] = "NSK"
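When matching several initials at once, note that an expression such as `'KOR' or 'PRK'` is plain Python and evaluates to `'KOR'` before pandas ever sees it, so it cannot express "either value". `Series.isin` is one way to test for several exact values. A sketch on made-up data:

```python
import pandas as pd

# Hypothetical toy data, not rows from WorldCupMatches.csv
toy = pd.DataFrame({"Home Team Initials": ["KOR", "PRK", "BRA", None]})

# Plain Python: `or` returns its first truthy operand
first_operand = "KOR" or "PRK"
print(first_operand)

# isin tests membership in a list of exact values; missing entries give False
mask = toy["Home Team Initials"].isin(["KOR", "PRK"])
toy.loc[mask, "Home Team Initials"] = "NSK"

print(toy["Home Team Initials"].tolist())
```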


Make sure to verify your answer!

In [100]:
df.loc[df["Home Team Initials"].str.contains(('NSK'),na=False), "Home Team Initials"]


179    NSK
187    NSK
374    NSK
386    NSK
434    NSK
444    NSK
480    NSK
524    NSK
593    NSK
609    NSK
635    NSK
642    NSK
655    NSK
710    NSK
753    NSK
802    NSK
818    NSK
Name: Home Team Initials, dtype: object