<h1><center> PPOL 5203 Data Science I: Foundations <br><br> 
<font color='grey'> Writing/Loading and Previewing Data in Pandas<br><br>
Tiago Ventura</font></center></h1> 

---

**In this Notebook we cover**

`Pandas` methods for: 

- Loading data 
- Saving data
- Data Conversion
- Previewing your Pandas DataFrame



## Setup

In this notebook, we will work with the [FIFA World Cup](https://www.kaggle.com/datasets/abecklas/fifa-world-cup/code?select=WorldCupMatches.csv) dataset hosted on Kaggle. The examples use two of its files: `WorldCupMatches.csv` and `WorldCups.csv`.

Download the data. Then either: 

- Save it in a folder you can access from this notebook, or
- Save it in the same folder as the notebook (your working directory)


In [1]:
# import modules
import pandas as pd
import numpy as np

## Data in and Data out in `Pandas`

In our class on file management, we saw how to use Python's connection-management tools (`open()`, `close()`, and the `with` statement) to load locally stored data into our Python environment. That process usually involved reading the file row by row and storing the data in a nested container (a list or a dictionary). 

Today, we will use high-level functions from `pandas` that simplify loading data into our Python environment. We will focus on data input and output with pandas, although many other libraries also provide tools for reading and writing data in various formats.

### `pandas` methods

`pandas` provides reader functions and writer methods for many data formats:

|Format Type |Data Description |Reader |Writer |Note |
|:------:|:------:|:------:|:------:|:------:| 
|text |CSV |`read_csv` |`to_csv` | |
|text |JSON |`read_json` |`to_json` | |
|text |HTML |`read_html` |`to_html` | |
|text |Local clipboard |`read_clipboard` |`to_clipboard` | |
|binary |MS Excel |`read_excel` |`to_excel` | needs an engine such as `openpyxl` for `.xlsx` files |
|binary |OpenDocument |`read_excel` | | needs the `odfpy` module |
|binary |HDF5 Format |`read_hdf` |`to_hdf` | needs the `tables` (PyTables) module |
|binary |Feather Format |`read_feather` |`to_feather` | needs `pyarrow` |
|binary |Parquet Format |`read_parquet` |`to_parquet` | needs `pyarrow` or `fastparquet` |
|binary |Msgpack |`read_msgpack` |`to_msgpack` | removed in pandas 1.0 |
|binary |Stata |`read_stata` |`to_stata` | |
|binary |SAS |`read_sas` | | |
|binary |Python Pickle Format |`read_pickle` |`to_pickle` | |
|SQL |SQL |`read_sql` |`to_sql` | |
|SQL |Google BigQuery |`read_gbq` |`to_gbq` | needs the `pandas-gbq` package |

Read more about all the input/output methods [here](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html).

### Data in with `pandas`

As you can see, each function's purpose is clear from its name: `read_*` functions load data and `to_*` methods write it. For example:

### `pandas.read_csv()`: to open flat files

In [30]:
# read a csv 
d = pd.read_csv("WorldCupMatches.csv")
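If you have not downloaded the file yet, the sketch below runs the same call on a few lines of text. It assumes a small excerpt typed in by hand (the first two matches, with only five of the columns) and wraps it in `io.StringIO`, which `read_csv` treats like an open file.

```python
import io

import pandas as pd

# First two matches of WorldCupMatches.csv (a subset of its columns), typed in as text
csv_text = """Year,Home Team Name,Home Team Goals,Away Team Goals,Away Team Name
1930,France,4,1,Mexico
1930,USA,3,0,Belgium
"""

# read_csv accepts a file path, a URL, or any file-like object such as StringIO;
# by default the first line becomes the column names
matches = pd.read_csv(io.StringIO(csv_text))

print(matches.shape)  # (2, 5)
print(matches.dtypes)
```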

### Data out with `pandas`

Most formats that pandas can read also have a matching `to_*` method for converting a DataFrame and writing it to a local file (a few, such as SAS, are read-only). 

For example:

In [34]:
# export as stata file
d.to_stata("worldcupmatches.dta",  version=118)

/var/folders/w9/2fgtgs657kzfrybbwnxy3t400000gp/T/ipykernel_84851/3669814853.py:2: InvalidColumnName: 
Not all pandas column names were valid Stata variable names.
The following replacements have been made:

    Home Team Name   ->   Home_Team_Name
    Home Team Goals   ->   Home_Team_Goals
    Away Team Goals   ->   Away_Team_Goals
    Away Team Name   ->   Away_Team_Name
    Win conditions   ->   Win_conditions
    Half-time Home Goals   ->   Half_time_Home_Goals
    Half-time Away Goals   ->   Half_time_Away_Goals
    Assistant 1   ->   Assistant_1
    Assistant 2   ->   Assistant_2
    Home Team Initials   ->   Home_Team_Initials
    Away Team Initials   ->   Away_Team_Initials

If this is not what you expect, please make sure you have Stata-compliant
column names in your DataFrame (strings only, max 32 characters, only
alphanumerics and underscores, no Stata reserved words)

  d.to_stata("worldcupmatches.dta",  version=118)


In [36]:
# load it back
d_stata = pd.read_stata("worldcupmatches.dta")
d_stata.head()

,index,Year,Datetime,Stage,Stadium,City,Home_Team_Name,Home_Team_Goals,Away_Team_Goals,Away_Team_Name,Win_conditions,Attendance,Half_time_Home_Goals,Half_time_Away_Goals,Referee,Assistant_1,Assistant_2,RoundID,MatchID,Home_Team_Initials,Away_Team_Initials
0,0,1930,13 Jul 1930 - 15:00,Group 1,Pocitos,Montevideo,France,4,1,Mexico,,4444.0,3,0,LOMBARDI Domingo (URU),CRISTOPHE Henry (BEL),REGO Gilberto (BRA),201,1096,FRA,MEX
1,1,1930,13 Jul 1930 - 15:00,Group 4,Parque Central,Montevideo,USA,3,0,Belgium,,18346.0,2,0,MACIAS Jose (ARG),MATEUCCI Francisco (URU),WARNKEN Alberto (CHI),201,1090,USA,BEL
2,2,1930,14 Jul 1930 - 12:45,Group 2,Parque Central,Montevideo,Yugoslavia,2,1,Brazil,,24059.0,2,0,TEJADA Anibal (URU),VALLARINO Ricardo (URU),BALWAY Thomas (FRA),201,1093,YUG,BRA
3,3,1930,14 Jul 1930 - 14:50,Group 3,Pocitos,Montevideo,Romania,3,1,Peru,,2549.0,1,0,WARNKEN Alberto (CHI),LANGENUS Jean (BEL),MATEUCCI Francisco (URU),201,1098,ROU,PER
4,4,1930,15 Jul 1930 - 16:00,Group 1,Parque Central,Montevideo,Argentina,1,0,France,,23409.0,0,0,REGO Gilberto (BRA),SAUCEDO Ulises (BOL),RADULESCU Constantin (ROU),201,1085,ARG,FRA
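The `index` column in the output above appears because `to_stata()` writes the DataFrame's row index by default (`write_index=True`). A minimal sketch, assuming a small hypothetical DataFrame and a temporary folder, that renames the columns to Stata-safe names before exporting and skips the index:

```python
import re
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical toy data with the same kind of column names as the matches file
matches = pd.DataFrame({
    "Home Team Name": ["France", "USA"],
    "Home Team Goals": [4, 3],
    "Half-time Home Goals": [3, 2],
})

# Replace every character that is not a letter, digit or underscore with "_"
stata_ready = matches.rename(columns=lambda name: re.sub(r"\W", "_", name))

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "matches.dta"
    # write_index=False keeps the row index out of the file, so no "index" column comes back
    stata_ready.to_stata(path, write_index=False, version=118)
    round_trip = pd.read_stata(path)

print(round_trip.columns.tolist())
```

The pattern `\W` reproduces the replacements pandas reported in the warning above, but it does not cover every Stata rule (for example, the 32-character limit or reserved words).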


### JSON Data

JSON (short for JavaScript Object Notation) has become one of the most widely used data formats in data science, mainly because it is one of the most common formats for sending data over HTTP between web servers, browsers, and other applications. We will therefore see a lot of JSON data when querying APIs. 


Let's see an example of: 

- Saving a DataFrame as JSON
- Loading a JSON file into your Python environment

In [56]:
d_wc = pd.read_csv("WorldCups.csv") 

# without a file path, to_json() returns the JSON as a string;
# its default layout, {column -> {index -> value}}, mirrors a nested dictionary
d_wc.to_json()

'{"Year":{"0":1930,"1":1934,"2":1938,"3":1950,"4":1954,"5":1958,"6":1962,"7":1966,"8":1970,"9":1974,"10":1978,"11":1982,"12":1986,"13":1990,"14":1994,"15":1998,"16":2002,"17":2006,"18":2010,"19":2014},"Country":{"0":"Uruguay","1":"Italy","2":"France","3":"Brazil","4":"Switzerland","5":"Sweden","6":"Chile","7":"England","8":"Mexico","9":"Germany","10":"Argentina","11":"Spain","12":"Mexico","13":"Italy","14":"USA","15":"France","16":"Korea\\/Japan","17":"Germany","18":"South Africa","19":"Brazil"},"Winner":{"0":"Uruguay","1":"Italy","2":"Italy","3":"Uruguay","4":"Germany FR","5":"Brazil","6":"Brazil","7":"England","8":"Brazil","9":"Germany FR","10":"Argentina","11":"Italy","12":"Argentina","13":"Germany FR","14":"Brazil","15":"France","16":"Brazil","17":"Italy","18":"Spain","19":"Germany"},"Runners-Up":{"0":"Argentina","1":"Czechoslovakia","2":"Hungary","3":"Brazil","4":"Hungary","5":"Sweden","6":"Czechoslovakia","7":"Germany FR","8":"Italy","9":"Netherlands","10":"Netherlands","11":"Ger

In [62]:
d_wc

,Year,Country,Winner,Runners-Up,Third,Fourth,GoalsScored,QualifiedTeams,MatchesPlayed,Attendance
0,1930,Uruguay,Uruguay,Argentina,USA,Yugoslavia,70,13,18,590.549
1,1934,Italy,Italy,Czechoslovakia,Germany,Austria,70,16,17,363.000
2,1938,France,Italy,Hungary,Brazil,Sweden,84,15,18,375.700
3,1950,Brazil,Uruguay,Brazil,Sweden,Spain,88,13,22,1.045.246
4,1954,Switzerland,Germany FR,Hungary,Austria,Uruguay,140,16,26,768.607
5,1958,Sweden,Brazil,Sweden,France,Germany FR,126,16,35,819.810
6,1962,Chile,Brazil,Czechoslovakia,Chile,Yugoslavia,89,16,32,893.172
7,1966,England,England,Germany FR,Portugal,Soviet Union,89,16,32,1.563.135
8,1970,Mexico,Brazil,Italy,Germany FR,Uruguay,95,16,32,1.603.975
9,1974,Germany,Germany FR,Netherlands,Poland,Brazil,97,16,38,1.865.753
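Notice that `Attendance` values such as `590.549` and `1.045.246` use a dot to separate thousands, so pandas cannot read them as numbers and keeps the column as text (the values appear as quoted strings in the JSON output of this section). A sketch, assuming two rows in the layout of `WorldCups.csv`, of how the `thousands` and `decimal` arguments of `read_csv` can fix this at load time:

```python
import io

import pandas as pd

# Two rows in the layout of WorldCups.csv, where Attendance uses "." to group thousands
csv_text = "Year,Country,Attendance\n1930,Uruguay,590.549\n1950,Brazil,1.045.246\n"

# Default parsing: "1.045.246" is not a valid number, so the whole column stays as text
raw = pd.read_csv(io.StringIO(csv_text))
print(raw["Attendance"].tolist())

# thousands="." strips the grouping dots; decimal="," stops "." being read as a decimal point
fixed = pd.read_csv(io.StringIO(csv_text), thousands=".", decimal=",")
print(fixed["Attendance"].tolist())
```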


In [53]:
# see dictionary here
d_wc.to_dict()

{'Year': {0: 1930,
  1: 1934,
  2: 1938,
  3: 1950,
  4: 1954,
  5: 1958,
  6: 1962,
  7: 1966,
  8: 1970,
  9: 1974,
  10: 1978,
  11: 1982,
  12: 1986,
  13: 1990,
  14: 1994,
  15: 1998,
  16: 2002,
  17: 2006,
  18: 2010,
  19: 2014},
 'Country': {0: 'Uruguay',
  1: 'Italy',
  2: 'France',
  3: 'Brazil',
  4: 'Switzerland',
  5: 'Sweden',
  6: 'Chile',
  7: 'England',
  8: 'Mexico',
  9: 'Germany',
  10: 'Argentina',
  11: 'Spain',
  12: 'Mexico',
  13: 'Italy',
  14: 'USA',
  15: 'France',
  16: 'Korea/Japan',
  17: 'Germany',
  18: 'South Africa',
  19: 'Brazil'},
 'Winner': {0: 'Uruguay',
  1: 'Italy',
  2: 'Italy',
  3: 'Uruguay',
  4: 'Germany FR',
  5: 'Brazil',
  6: 'Brazil',
  7: 'England',
  8: 'Brazil',
  9: 'Germany FR',
  10: 'Argentina',
  11: 'Italy',
  12: 'Argentina',
  13: 'Germany FR',
  14: 'Brazil',
  15: 'France',
  16: 'Brazil',
  17: 'Italy',
  18: 'Spain',
  19: 'Germany'},
 'Runners-Up': {0: 'Argentina',
  1: 'Czechoslovakia',
  2: 'Hungary',
  3: 'Brazil',

In [60]:
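# save to disk as a list of records (one JSON object per row)
d_wc.to_json("worldcup.json", orient="records")

# without a path, the same call returns the JSON string (shown below)
d_wc.to_json(orient="records")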

'[{"Year":1930,"Country":"Uruguay","Winner":"Uruguay","Runners-Up":"Argentina","Third":"USA","Fourth":"Yugoslavia","GoalsScored":70,"QualifiedTeams":13,"MatchesPlayed":18,"Attendance":"590.549"},{"Year":1934,"Country":"Italy","Winner":"Italy","Runners-Up":"Czechoslovakia","Third":"Germany","Fourth":"Austria","GoalsScored":70,"QualifiedTeams":16,"MatchesPlayed":17,"Attendance":"363.000"},{"Year":1938,"Country":"France","Winner":"Italy","Runners-Up":"Hungary","Third":"Brazil","Fourth":"Sweden","GoalsScored":84,"QualifiedTeams":15,"MatchesPlayed":18,"Attendance":"375.700"},{"Year":1950,"Country":"Brazil","Winner":"Uruguay","Runners-Up":"Brazil","Third":"Sweden","Fourth":"Spain","GoalsScored":88,"QualifiedTeams":13,"MatchesPlayed":22,"Attendance":"1.045.246"},{"Year":1954,"Country":"Switzerland","Winner":"Germany FR","Runners-Up":"Hungary","Third":"Austria","Fourth":"Uruguay","GoalsScored":140,"QualifiedTeams":16,"MatchesPlayed":26,"Attendance":"768.607"},{"Year":1958,"Country":"Sweden","W

In [61]:
# load
pd.read_json("worldcup.json")

,Year,Country,Winner,Runners-Up,Third,Fourth,GoalsScored,QualifiedTeams,MatchesPlayed,Attendance
0,1930,Uruguay,Uruguay,Argentina,USA,Yugoslavia,70,13,18,590.549
1,1934,Italy,Italy,Czechoslovakia,Germany,Austria,70,16,17,363.000
2,1938,France,Italy,Hungary,Brazil,Sweden,84,15,18,375.700
3,1950,Brazil,Uruguay,Brazil,Sweden,Spain,88,13,22,1.045.246
4,1954,Switzerland,Germany FR,Hungary,Austria,Uruguay,140,16,26,768.607
5,1958,Sweden,Brazil,Sweden,France,Germany FR,126,16,35,819.810
6,1962,Chile,Brazil,Czechoslovakia,Chile,Yugoslavia,89,16,32,893.172
7,1966,England,England,Germany FR,Portugal,Soviet Union,89,16,32,1.563.135
8,1970,Mexico,Brazil,Italy,Germany FR,Uruguay,95,16,32,1.603.975
9,1974,Germany,Germany FR,Netherlands,Poland,Brazil,97,16,38,1.865.753
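The `orient` argument controls the layout of the JSON. A small sketch, assuming a two-row hypothetical DataFrame, that compares the default layout with `orient="records"` and reads the records back:

```python
import io

import pandas as pd

wc = pd.DataFrame({"Year": [1930, 1934], "Winner": ["Uruguay", "Italy"]})

# Default orient="columns": {column -> {index -> value}}
print(wc.to_json())

# orient="records": a list with one object per row
records_json = wc.to_json(orient="records")
print(records_json)

# Wrap the string in StringIO before handing it to read_json
back = pd.read_json(io.StringIO(records_json), orient="records")
print(back)
```

Since pandas 2.1, passing a literal JSON string to `read_json` is deprecated, which is why the string is wrapped in `io.StringIO`; file paths such as `"worldcup.json"` work as before.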


### Exploring Arguments 

`pandas` loading functions are highly customizable. To see all the options, check the documentation of `pandas.read_csv()`:

In [32]:
# asking for help
help(pd.read_csv)

Help on function read_csv in module pandas.io.parsers.readers:

read_csv(filepath_or_buffer: 'FilePath | ReadCsvBuffer[bytes] | ReadCsvBuffer[str]', *, sep: 'str | None | lib.NoDefault' = <no_default>, delimiter: 'str | None | lib.NoDefault' = None, header: "int | Sequence[int] | None | Literal['infer']" = 'infer', names: 'Sequence[Hashable] | None | lib.NoDefault' = <no_default>, index_col: 'IndexLabel | Literal[False] | None' = None, usecols=None, squeeze: 'bool | None' = None, prefix: 'str | lib.NoDefault' = <no_default>, mangle_dupe_cols: 'bool' = True, dtype: 'DtypeArg | None' = None, engine: 'CSVEngine | None' = None, converters=None, true_values=None, false_values=None, skipinitialspace: 'bool' = False, skiprows=None, skipfooter: 'int' = 0, nrows: 'int | None' = None, na_values=None, keep_default_na: 'bool' = True, na_filter: 'bool' = True, verbose: 'bool' = False, skip_blank_lines: 'bool' = True, parse_dates=None, infer_datetime_format: 'bool' = False, keep_date_col: 'bool' = F

In [64]:
pd.read_csv("WorldCups.csv", 
            sep = ",", # field separator used in the file
            index_col="Year", # use this column as the row index
            usecols = ["Year","Country", "Winner", "Runners-Up"], # only read these columns
            nrows = 3, # only read the first n rows of the data 
            na_values = "nan", # extra strings to treat as missing values
            parse_dates=True, # True means: try to parse the index as dates
            low_memory=True) # parse the file internally in chunks to lower memory use (the default)

,Country,Winner,Runners-Up
Year,,,
1930-01-01,Uruguay,Uruguay,Argentina
1934-01-01,Italy,Italy,Czechoslovakia
1938-01-01,France,Italy,Hungary
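`low_memory` only changes how the parser works internally; you still get one DataFrame at the end. To process a large file piece by piece, `read_csv` also accepts `chunksize`, which returns an iterator of smaller DataFrames. A sketch, assuming a five-row excerpt of `WorldCups.csv` (`Year` and `GoalsScored` only) and a chunk size of 2 chosen for illustration:

```python
import io

import pandas as pd

# Year and GoalsScored for the first five tournaments, in CSV form
csv_text = "Year,GoalsScored\n1930,70\n1934,70\n1938,84\n1950,88\n1954,140\n"

total_goals = 0
chunk_sizes = []

# chunksize=2 makes read_csv return an iterator that yields DataFrames of at most 2 rows
with pd.read_csv(io.StringIO(csv_text), chunksize=2) as reader:
    for chunk in reader:
        chunk_sizes.append(len(chunk))
        total_goals += chunk["GoalsScored"].sum()

print(chunk_sizes)  # [2, 2, 1]
print(total_goals)  # 452
```

Only one chunk is held in memory at a time, so this pattern suits aggregations (sums, counts) over files too large to load at once.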


### Data Type Conversion

Pandas also provides methods to convert a DataFrame into native Python data structures (and NumPy arrays). These are useful when you need your data in a different format, for example, as a dictionary or a list. 



In [67]:
# one column (a Series) to a dictionary: {index -> value}
d_wc["Year"].to_dict()

{0: 1930,
 1: 1934,
 2: 1938,
 3: 1950,
 4: 1954,
 5: 1958,
 6: 1962,
 7: 1966,
 8: 1970,
 9: 1974,
 10: 1978,
 11: 1982,
 12: 1986,
 13: 1990,
 14: 1994,
 15: 1998,
 16: 2002,
 17: 2006,
 18: 2010,
 19: 2014}

In [71]:
# to a numpy array
d_wc.values[0:2]

array([[1930, 'Uruguay', 'Uruguay', 'Argentina', 'USA', 'Yugoslavia', 70,
        13, 18, '590.549'],
       [1934, 'Italy', 'Italy', 'Czechoslovakia', 'Germany', 'Austria',
        70, 16, 17, '363.000']], dtype=object)

In [78]:
# to a nested list (`.tolist()` is a NumPy array method)
d_wc.values[0:2].tolist()

[[1930,
  'Uruguay',
  'Uruguay',
  'Argentina',
  'USA',
  'Yugoslavia',
  70,
  13,
  18,
  '590.549'],
 [1934,
  'Italy',
  'Italy',
  'Czechoslovakia',
  'Germany',
  'Austria',
  70,
  16,
  17,
  '363.000']]
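`to_dict()` also takes an `orient` argument, similar to `to_json()`. A sketch with a hypothetical two-row DataFrame:

```python
import pandas as pd

wc = pd.DataFrame({"Year": [1930, 1934], "Winner": ["Uruguay", "Italy"]})

as_dict = wc.to_dict()                     # default orient="dict": {column -> {index -> value}}
as_records = wc.to_dict(orient="records")  # one dictionary per row
as_lists = wc.to_dict(orient="list")       # {column -> list of values}

print(as_dict)
print(as_records)
print(as_lists)
```

`orient="records"` gives the same row-by-row structure as `to_json(orient="records")`, which is often the most convenient shape when looping over rows in plain Python.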

### Previewing and Describing your data

You just loaded your first dataset in Python. Let's look at some useful tools to preview your data. 

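#### `DataFrame.head()` : print first n rows

In [13]:
d.head(5)

The matches data has 852 rows and 20 columns; `d.shape` returns these dimensions as the tuple `(852, 20)`.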

#### `DataFrame.tail()` : print last n rows

In [14]:
d.tail(10)

,Year,Datetime,Stage,Stadium,City,Home Team Name,Home Team Goals,Away Team Goals,Away Team Name,Win conditions,Attendance,Half-time Home Goals,Half-time Away Goals,Referee,Assistant 1,Assistant 2,RoundID,MatchID,Home Team Initials,Away Team Initials
842,2014,01 Jul 2014 - 13:00,Round of 16,Arena de Sao Paulo,Sao Paulo,Argentina,1,0,Switzerland,Argentina win after extra time,63255.0,0,0,ERIKSSON Jonas (SWE),KLASENIUS Mathias (SWE),WARNMARK Daniel (SWE),255951,300186503,ARG,SUI
843,2014,01 Jul 2014 - 17:00,Round of 16,Arena Fonte Nova,Salvador,Belgium,2,1,USA,Belgium win after extra time,51227.0,0,0,HAIMOUDI Djamel (ALG),ACHIK Redouane (MAR),ETCHIALI Abdelhak (ALG),255951,300186497,BEL,USA
844,2014,04 Jul 2014 - 13:00,Quarter-finals,Estadio do Maracana,Rio De Janeiro,France,0,1,Germany,,74240.0,0,1,PITANA Nestor (ARG),MAIDANA Hernan (ARG),BELATTI Juan Pablo (ARG),255953,300186485,FRA,GER
845,2014,04 Jul 2014 - 17:00,Quarter-finals,Estadio Castelao,Fortaleza,Brazil,2,1,Colombia,,60342.0,1,0,Carlos VELASCO CARBALLO (ESP),ALONSO FERNANDEZ Roberto (ESP),YUSTE Juan (ESP),255953,300186461,BRA,COL
846,2014,05 Jul 2014 - 13:00,Quarter-finals,Estadio Nacional,Brasilia,Argentina,1,0,Belgium,,68551.0,1,0,Nicola RIZZOLI (ITA),Renato FAVERANI (ITA),Andrea STEFANI (ITA),255953,300186504,ARG,BEL
847,2014,05 Jul 2014 - 17:00,Quarter-finals,Arena Fonte Nova,Salvador,Netherlands,0,0,Costa Rica,Netherlands win on penalties (4 - 3),51179.0,0,0,Ravshan IRMATOV (UZB),RASULOV Abduxamidullo (UZB),KOCHKAROV Bakhadyr (KGZ),255953,300186488,NED,CRC
848,2014,08 Jul 2014 - 17:00,Semi-finals,Estadio Mineirao,Belo Horizonte,Brazil,1,7,Germany,,58141.0,0,5,RODRIGUEZ Marco (MEX),TORRENTERA Marvin (MEX),QUINTERO Marcos (MEX),255955,300186474,BRA,GER
849,2014,09 Jul 2014 - 17:00,Semi-finals,Arena de Sao Paulo,Sao Paulo,Netherlands,0,0,Argentina,Argentina win on penalties (2 - 4),63267.0,0,0,C�neyt �AKIR (TUR),DURAN Bahattin (TUR),ONGUN Tarik (TUR),255955,300186490,NED,ARG
850,2014,12 Jul 2014 - 17:00,Play-off for third place,Estadio Nacional,Brasilia,Brazil,0,3,Netherlands,,68034.0,0,2,HAIMOUDI Djamel (ALG),ACHIK Redouane (MAR),ETCHIALI Abdelhak (ALG),255957,300186502,BRA,NED
851,2014,13 Jul 2014 - 16:00,Final,Estadio do Maracana,Rio De Janeiro,Germany,1,0,Argentina,Germany win after extra time,74738.0,0,0,Nicola RIZZOLI (ITA),Renato FAVERANI (ITA),Andrea STEFANI (ITA),255959,300186501,GER,ARG


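To display every column without truncation, set the `display.max_columns` option to `None` (the number of rows shown still depends on the method you call, here `head(5)`).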

In [16]:
pd.set_option('display.max_columns', None) 
d.head(5)

,Year,Datetime,Stage,Stadium,City,Home Team Name,Home Team Goals,Away Team Goals,Away Team Name,Win conditions,Attendance,Half-time Home Goals,Half-time Away Goals,Referee,Assistant 1,Assistant 2,RoundID,MatchID,Home Team Initials,Away Team Initials
0,1930,13 Jul 1930 - 15:00,Group 1,Pocitos,Montevideo,France,4,1,Mexico,,4444.0,3,0,LOMBARDI Domingo (URU),CRISTOPHE Henry (BEL),REGO Gilberto (BRA),201,1096,FRA,MEX
1,1930,13 Jul 1930 - 15:00,Group 4,Parque Central,Montevideo,USA,3,0,Belgium,,18346.0,2,0,MACIAS Jose (ARG),MATEUCCI Francisco (URU),WARNKEN Alberto (CHI),201,1090,USA,BEL
2,1930,14 Jul 1930 - 12:45,Group 2,Parque Central,Montevideo,Yugoslavia,2,1,Brazil,,24059.0,2,0,TEJADA Anibal (URU),VALLARINO Ricardo (URU),BALWAY Thomas (FRA),201,1093,YUG,BRA
3,1930,14 Jul 1930 - 14:50,Group 3,Pocitos,Montevideo,Romania,3,1,Peru,,2549.0,1,0,WARNKEN Alberto (CHI),LANGENUS Jean (BEL),MATEUCCI Francisco (URU),201,1098,ROU,PER
4,1930,15 Jul 1930 - 16:00,Group 1,Parque Central,Montevideo,Argentina,1,0,France,,23409.0,0,0,REGO Gilberto (BRA),SAUCEDO Ulises (BOL),RADULESCU Constantin (ROU),201,1085,ARG,FRA
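`pd.set_option` changes the setting for the rest of the session. If you only want the wider display for one output, `pd.option_context` restores the previous values when the `with` block ends. A sketch with a hypothetical 25-column DataFrame:

```python
import pandas as pd

before = pd.get_option("display.max_columns")

# option_context applies the display settings only inside the with-block
with pd.option_context("display.max_columns", None, "display.max_rows", 10):
    inside = pd.get_option("display.max_columns")
    print(pd.DataFrame({f"col_{i}": range(3) for i in range(25)}))

after = pd.get_option("display.max_columns")

# inside is None; after the block the original value is back
print(inside, before == after)
```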


#### `DataFrame.sample()` : get a random sample of rows

In [17]:
d.sample(5)

,Year,Datetime,Stage,Stadium,City,Home Team Name,Home Team Goals,Away Team Goals,Away Team Name,Win conditions,Attendance,Half-time Home Goals,Half-time Away Goals,Referee,Assistant 1,Assistant 2,RoundID,MatchID,Home Team Initials,Away Team Initials
114,1958,11 Jun 1958 - 19:00,Group 1,Olympia Stadium,Helsingborg,Germany FR,2,2,Czechoslovakia,,25000.0,0,2,ELLIS Arthur (ENG),LEAFE Reginald (ENG),SEIPELT Fritz (AUT),220,1391,FRG,TCH
429,1990,15 Jun 1990 - 21:00,Group D,Giuseppe Meazza,Milan,Germany FR,5,1,"rn"">United Arab Emirates",,71169.0,2,0,SPIRIN Alexey (RUS),TAKADA Shizuo (JPN),PAIRETTO Pierluigi (ITA),322,198,FRG,UAE
788,2014,17 Jun 2014 - 18:00,Group H,Arena Pantanal,Cuiaba,Russia,1,1,Korea Republic,,37603.0,0,0,PITANA Nestor (ARG),MAIDANA Hernan (ARG),BELATTI Juan Pablo (ARG),255931,300186499,RUS,KOR
280,1978,06 Jun 1978 - 16:45,Group 2,Estadio Ol�mpico Chateau Carreras,Cordoba,Germany FR,6,0,Mexico,,35258.0,4,0,BOUZO Farouk (SYR),GARRIDO Antonio (POR),RION Francis (BEL),278,2350,FRG,MEX
598,2002,06 Jun 2002 - 18:00,Group E,Saitama Stadium 2002,Saitama,Cameroon,1,0,Saudi Arabia,,52328.0,0,0,HAUGE Terje (NOR),VAN NYLEN Roland (BEL),WIERZBOWSKI Maciej (POL),43950100,43950019,CMR,KSA
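Because `sample()` draws rows at random, each run returns different rows. A sketch, assuming a hypothetical 20-row DataFrame, of the `random_state` and `frac` arguments:

```python
import pandas as pd

# Hypothetical stand-in for a larger table: 20 numbered matches
df = pd.DataFrame({"match_id": range(20)})

# The same random_state returns the same rows, so the result is reproducible
first = df.sample(5, random_state=42)
second = df.sample(5, random_state=42)
print(first.equals(second))  # True

# frac draws a share of the rows instead of a fixed count (here half of 20)
half = df.sample(frac=0.5, random_state=1)
print(len(half))  # 10
```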


#### `DataFrame.info()` : print a concise summary of a DataFrame (index, column dtypes, non-null counts, memory usage)


In [None]:
d.info()
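A sketch of what `info()` reports, using a small hypothetical DataFrame with missing values. It captures the summary in a string through the `buf` argument and compares the non-null counts with `notna().sum()`:

```python
import io

import pandas as pd

# Hypothetical rows with some missing values
df = pd.DataFrame({
    "Attendance": [4444.0, None, 2549.0],
    "City": ["Montevideo", "Montevideo", None],
})

# info() prints to the screen by default; buf= captures the text instead
buffer = io.StringIO()
df.info(buf=buffer, memory_usage="deep")  # "deep" also measures the strings inside text columns
summary = buffer.getvalue()
print(summary)

# The "Non-Null Count" column of info() matches this per-column count
print(df.notna().sum())
```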

#### `DataFrame.dtypes` : attribute holding the data type of each column


In [19]:
d.dtypes

Year                      int64
Datetime                 object
Stage                    object
Stadium                  object
City                     object
Home Team Name           object
Home Team Goals           int64
Away Team Goals           int64
Away Team Name           object
Win conditions           object
Attendance              float64
Half-time Home Goals      int64
Half-time Away Goals      int64
Referee                  object
Assistant 1              object
Assistant 2              object
RoundID                   int64
MatchID                   int64
Home Team Initials       object
Away Team Initials       object
dtype: object
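Note that `Datetime` is stored as `object` (text). A sketch, assuming the `"13 Jul 1930 - 15:00"` pattern seen in the data and an English locale for the month abbreviation (`%b`), that converts such strings with `pd.to_datetime` and an explicit format:

```python
import pandas as pd

# Two values in the format used by the Datetime column
raw = pd.Series(["13 Jul 1930 - 15:00", "14 Jul 1930 - 12:45"])

# strip() removes stray surrounding spaces; the format string spells out
# day, abbreviated month name, year, hour and minute
parsed = pd.to_datetime(raw.str.strip(), format="%d %b %Y - %H:%M")

print(raw.dtype, "->", parsed.dtype)
print(parsed.dt.year.tolist())
```

On the full column you could also pass `errors="coerce"`, so that any value that does not match the format becomes `NaT` instead of raising an error.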

#### `DataFrame.describe()` : summary statistics (numeric columns by default)


In [29]:
d.describe()

,Year,Home Team Goals,Away Team Goals,Attendance,Half-time Home Goals,Half-time Away Goals,RoundID,MatchID
count,852.0,852.0,852.0,850.0,852.0,852.0,852.0,852.0
mean,1985.089202,1.811033,1.0223,45164.8,0.70892,0.428404,10661770.0,61346870.0
std,22.448825,1.610255,1.087573,23485.249247,0.937414,0.691252,27296130.0,111057200.0
min,1930.0,0.0,0.0,2000.0,0.0,0.0,201.0,25.0
25%,1970.0,1.0,0.0,30000.0,0.0,0.0,262.0,1188.75
50%,1990.0,2.0,1.0,41579.5,0.0,0.0,337.0,2191.0
75%,2002.0,3.0,2.0,61374.5,1.0,1.0,249722.0,43950060.0
max,2014.0,10.0,7.0,173850.0,6.0,5.0,97410600.0,300186500.0
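By default `describe()` only summarizes the numeric columns, which is why text columns such as `Stage` are missing from the output above. A sketch with hypothetical data showing how `exclude` (or `include="all"`) brings in the other columns:

```python
import pandas as pd

# Hypothetical rows mixing a text column and a numeric column
df = pd.DataFrame({
    "Stage": ["Group 1", "Group 4", "Group 1"],
    "Home Team Goals": [4, 3, 1],
})

numeric_summary = df.describe()               # numeric columns only (the default)
text_summary = df.describe(exclude="number")  # count, unique, top, freq for the other columns
full_summary = df.describe(include="all")     # both kinds of columns in one table

print(numeric_summary)
print(text_summary)
print(full_summary)
```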
