# Lecture 4: Data Preprocessing with Pandas

Instructor: Md Shahidullah Kawsar
<br>Data Scientist, IDARE, Houston, TX, USA

#### Objectives:
- Extracting new information from a column
- Creating a column based on a condition or function
- Removing a string from a column
- Checking the unique values of a column
- Performing calculations on DataFrame columns
- Sorting a DataFrame
- Slicing a DataFrame


#### References:
[1] Data Source: https://stats.espncricinfo.com/ci/content/records/223646.html
<br>[2] pandas split: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.split.html
<br>[3] pandas concatenation: https://pandas.pydata.org/docs/reference/api/pandas.concat.html
<br>[4] pandas replace: https://pandas.pydata.org/docs/reference/api/pandas.Series.str.replace.html
<br>[5] pandas column rename: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rename.html
<br>[6] pandas sorting: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sort_values.html
<br>[7] pandas counting unique values: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.value_counts.html
<br>[8] pandas drop: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html
<br>[9] pandas data type conversion: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.astype.html
<br>[10] **Self Study:** difference between .loc and .iloc: https://www.analyticsvidhya.com/blog/2020/02/loc-iloc-pandas/

In [465]:
import numpy as np
import pandas as pd

# display up to 100 rows of the dataframe
pd.options.display.max_rows = 100

#### Reading an Excel file

In [466]:
df = pd.read_excel("test_cricket.xlsx", sheet_name='runs')

display(df.head())

Unnamed: 0,Player,Span,Mat,Inns,NO,Runs,HS,Ave,100,50,0
0,SR Tendulkar (INDIA),1989-2013,200,329,33,15921,248*,53.78,51,68,14
1,RT Ponting (AUS),1995-2012,168,287,29,13378,257,51.85,41,62,17
2,JH Kallis (ICC/SA),1995-2013,166,280,40,13289,224,55.37,45,58,16
3,R Dravid (ICC/INDIA),1996-2012,164,286,32,13288,270,52.31,36,63,8
4,AN Cook (ENG),2006-2018,161,291,16,12472,294,45.35,33,57,9


#### How can you extract the country information from the 'Player' column if there were no "(" and ")" symbols?

In [467]:
file = df.copy()

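# remove the parentheses; regex=False treats "(" and ")" as literal characters
file['Player'] = file['Player'].str.replace("(", "", regex=False)
file['Player'] = file['Player'].str.replace(")", "", regex=False)
# replace non-breaking spaces with regular spaces
file['Player'] = file['Player'].str.replace("\xa0", " ", regex=False)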

display(file.head())

Unnamed: 0,Player,Span,Mat,Inns,NO,Runs,HS,Ave,100,50,0
0,SR Tendulkar INDIA,1989-2013,200,329,33,15921,248*,53.78,51,68,14
1,RT Ponting AUS,1995-2012,168,287,29,13378,257,51.85,41,62,17
2,JH Kallis ICC/SA,1995-2013,166,280,40,13289,224,55.37,45,58,16
3,R Dravid ICC/INDIA,1996-2012,164,286,32,13288,270,52.31,36,63,8
4,AN Cook ENG,2006-2018,161,291,16,12472,294,45.35,33,57,9


In [468]:
file_player = file['Player'].str.split(" ")
file_player = pd.DataFrame(file_player)

file_player['Country'] = file_player['Player'].str[-1]

print(type(file_player))

display(file_player.head())

<class 'pandas.core.frame.DataFrame'>


Unnamed: 0,Player,Country
0,"[SR, Tendulkar, INDIA]",INDIA
1,"[RT, Ponting, AUS]",AUS
2,"[JH, Kallis, ICC/SA]",ICC/SA
3,"[R, Dravid, ICC/INDIA]",ICC/INDIA
4,"[AN, Cook, ENG]",ENG
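The list-of-tokens approach above can also be written as a single split from the right. The sketch below is a hedged alternative rather than the lecture's method: it uses a small hand-typed Series (`players` and `parts` are hypothetical names) copied from the first rows shown above, and `str.rsplit` with `n=1` so that only the last space acts as the separator.

```python
import pandas as pd

# Hand-typed sample mirroring the first rows above (parentheses already removed)
players = pd.Series(["SR Tendulkar INDIA", "RT Ponting AUS", "JH Kallis ICC/SA"])

# Split once, starting from the right: the last token is the country,
# everything before it is the player name
parts = players.str.rsplit(" ", n=1, expand=True)
parts.columns = ["Player", "Country"]

print(parts)
```

Because the split happens only once, a name that itself contains several spaces still ends up in a single 'Player' column.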


In [469]:
# print(file['Player'].values)

#### Code from Lecture 2

In [470]:
# renaming the column names
df = df.rename(columns={'Mat':'Match', 
                        'Inns':'Innings',
                        'NO': 'NotOut',
                        'HS': 'Highest_score',
                        'Ave': 'Average',
                        100: 'Centuries',
                        50: 'Half_centuries',
                        0: 'Ducks'})

# splitting the 'Player' column to get the information about 'Country'
df_player = df['Player'].str.split("(", expand=True)

# concatenating the split columns with the main dataframe
df = pd.concat([df, df_player], axis=1)

# dropping the original 'Player' column
df = df.drop('Player', axis=1)

# renaming the column names
df = df.rename(columns={0: 'Player',
                        1: 'Country'})

# remove the ")" from the 'Country' column
df['Country'] = df['Country'].str.replace(")", "", regex=False)

# rearrange the columns
new_col_sequence = ['Player', 'Span', 'Match', 'Innings', 'NotOut', 'Runs', 'Highest_score',
                    'Average', 'Centuries', 'Half_centuries', 'Ducks', 'Country']
df = df[new_col_sequence]

display(df.head())

Unnamed: 0,Player,Span,Match,Innings,NotOut,Runs,Highest_score,Average,Centuries,Half_centuries,Ducks,Country
0,SR Tendulkar,1989-2013,200,329,33,15921,248*,53.78,51,68,14,INDIA
1,RT Ponting,1995-2012,168,287,29,13378,257,51.85,41,62,17,AUS
2,JH Kallis,1995-2013,166,280,40,13289,224,55.37,45,58,16,ICC/SA
3,R Dravid,1996-2012,164,286,32,13288,270,52.31,36,63,8,ICC/INDIA
4,AN Cook,2006-2018,161,291,16,12472,294,45.35,33,57,9,ENG


In [471]:
# df.columns

#### How to create a column based on a condition or function?

In [472]:
def icc_check(x):
    if "ICC" in x:
        return "Yes"
    else:
        return "No"

In [473]:
# def INDIA_check(x):
#     if "INDIA" in x:
#         return 1
#     else:
#         return 0

In [474]:
df['played_for_ICC'] = df['Country'].apply(icc_check)
# df['played_for_INDIA'] = df['Country'].apply(INDIA_check)

display(df.head(10))

Unnamed: 0,Player,Span,Match,Innings,NotOut,Runs,Highest_score,Average,Centuries,Half_centuries,Ducks,Country,played_for_ICC
0,SR Tendulkar,1989-2013,200,329,33,15921,248*,53.78,51,68,14,INDIA,No
1,RT Ponting,1995-2012,168,287,29,13378,257,51.85,41,62,17,AUS,No
2,JH Kallis,1995-2013,166,280,40,13289,224,55.37,45,58,16,ICC/SA,Yes
3,R Dravid,1996-2012,164,286,32,13288,270,52.31,36,63,8,ICC/INDIA,Yes
4,AN Cook,2006-2018,161,291,16,12472,294,45.35,33,57,9,ENG,No
5,KC Sangakkara,2000-2015,134,233,17,12400,319,57.4,38,52,11,SL,No
6,BC Lara,1990-2006,131,232,6,11953,400*,52.88,34,48,17,ICC/WI,Yes
7,S Chanderpaul,1994-2015,164,280,49,11867,203*,51.37,30,66,15,WI,No
8,DPMD Jayawardene,1997-2014,149,252,15,11814,374,49.84,34,50,15,SL,No
9,AR Border,1978-1994,156,265,44,11174,205,50.56,27,63,11,AUS,No


In [475]:
df['played_for_ICC'].value_counts()

No     90
Yes     7
Name: played_for_ICC, dtype: int64
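The same flag can be built without a Python-level function. The following is a minimal sketch, assuming the column holds plain strings; `country`, `has_icc`, and `played_for_icc` are hypothetical names and the four values are typed in by hand from the rows above.

```python
import numpy as np
import pandas as pd

country = pd.Series(["INDIA", "AUS", "ICC/SA", "ICC/INDIA"])

# Boolean mask: True where the text contains the literal substring "ICC"
has_icc = country.str.contains("ICC", regex=False)

# Map the mask to the same "Yes"/"No" labels that icc_check returns
played_for_icc = pd.Series(np.where(has_icc, "Yes", "No"), index=country.index)

print(played_for_icc.tolist())
```

Keeping the boolean mask itself (instead of "Yes"/"No" strings) is often more convenient, since it can be used directly for filtering.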

#### Removing "ICC/" from the 'Country'

In [476]:
df['Country'] = df['Country'].str.replace("ICC/", "", regex=False)

display(df.head())

Unnamed: 0,Player,Span,Match,Innings,NotOut,Runs,Highest_score,Average,Centuries,Half_centuries,Ducks,Country,played_for_ICC
0,SR Tendulkar,1989-2013,200,329,33,15921,248*,53.78,51,68,14,INDIA,No
1,RT Ponting,1995-2012,168,287,29,13378,257,51.85,41,62,17,AUS,No
2,JH Kallis,1995-2013,166,280,40,13289,224,55.37,45,58,16,SA,Yes
3,R Dravid,1996-2012,164,286,32,13288,270,52.31,36,63,8,INDIA,Yes
4,AN Cook,2006-2018,161,291,16,12472,294,45.35,33,57,9,ENG,No


#### Checking the unique values for each column

In [477]:
df['Country'].value_counts()

ENG      22
AUS      20
INDIA    12
WI       12
SL       10
PAK       8
SA        7
NZ        6
Name: Country, dtype: int64
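Besides `value_counts()`, pandas offers `unique()` and `nunique()` when only the distinct values or their number are needed. A small sketch on a hand-typed Series (`country` is a hypothetical name, and the values are illustrative rather than the full dataset):

```python
import pandas as pd

country = pd.Series(["ENG", "AUS", "ENG", "INDIA", "AUS", "ENG"])

# Distinct values, in order of first appearance
print(country.unique())

# Number of distinct values
print(country.nunique())

# Frequency of each value, most frequent first
counts = country.value_counts()
print(counts)
```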

#### Find number of years played

In [479]:
# df['start_year'] = df['Span'].str[0:4]

# df['end_year'] = df['Span'].str[5:]

# display(df.head(10))

In [480]:
# splitting the 'Span' column based on the "-"
df_span = df['Span'].str.split("-", expand=True)

# concatenating the new dataframe with the main dataframe
df = pd.concat([df, df_span], axis=1)

# renaming the newly created column names
df = df.rename(columns={0: "start_year",
                        1: "end_year"})

# removing the "Span" column
df = df.drop("Span", axis=1)

display(df.head())

Unnamed: 0,Player,Match,Innings,NotOut,Runs,Highest_score,Average,Centuries,Half_centuries,Ducks,Country,played_for_ICC,start_year,end_year
0,SR Tendulkar,200,329,33,15921,248*,53.78,51,68,14,INDIA,No,1989,2013
1,RT Ponting,168,287,29,13378,257,51.85,41,62,17,AUS,No,1995,2012
2,JH Kallis,166,280,40,13289,224,55.37,45,58,16,SA,Yes,1995,2013
3,R Dravid,164,286,32,13288,270,52.31,36,63,8,INDIA,Yes,1996,2012
4,AN Cook,161,291,16,12472,294,45.35,33,57,9,ENG,No,2006,2018


In [481]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 97 entries, 0 to 96
Data columns (total 14 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Player          97 non-null     object 
 1   Match           97 non-null     int64  
 2   Innings         97 non-null     int64  
 3   NotOut          97 non-null     int64  
 4   Runs            97 non-null     int64  
 5   Highest_score   97 non-null     object 
 6   Average         97 non-null     float64
 7   Centuries       97 non-null     int64  
 8   Half_centuries  97 non-null     int64  
 9   Ducks           97 non-null     int64  
 10  Country         97 non-null     object 
 11  played_for_ICC  97 non-null     object 
 12  start_year      97 non-null     object 
 13  end_year        97 non-null     object 
dtypes: float64(1), int64(7), object(6)
memory usage: 10.7+ KB


#### Data type conversion

In [482]:
df['start_year'] = df['start_year'].astype('int') 
df['end_year'] = df['end_year'].astype('int')

print(df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 97 entries, 0 to 96
Data columns (total 14 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Player          97 non-null     object 
 1   Match           97 non-null     int64  
 2   Innings         97 non-null     int64  
 3   NotOut          97 non-null     int64  
 4   Runs            97 non-null     int64  
 5   Highest_score   97 non-null     object 
 6   Average         97 non-null     float64
 7   Centuries       97 non-null     int64  
 8   Half_centuries  97 non-null     int64  
 9   Ducks           97 non-null     int64  
 10  Country         97 non-null     object 
 11  played_for_ICC  97 non-null     object 
 12  start_year      97 non-null     int64  
 13  end_year        97 non-null     int64  
dtypes: float64(1), int64(9), object(4)
memory usage: 10.7+ KB
None
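`.astype('int')` works here because every year is a clean digit string; it raises a `ValueError` as soon as one value cannot be parsed. For messier columns, `pd.to_numeric` with `errors="coerce"` is a common fallback. The sketch below is illustrative only: the `"n/a"` entry is invented to show the behaviour and does not occur in the cricket data, and `years` / `years_num` are hypothetical names.

```python
import pandas as pd

# Two valid years plus one invented bad entry
years = pd.Series(["1989", "2013", "n/a"])

# errors="coerce" turns values that cannot be parsed into NaN instead of raising
years_num = pd.to_numeric(years, errors="coerce")

print(years_num)
print(years_num.isna().sum())  # number of entries that failed to convert
```

Since NaN is a floating-point value, the converted column is stored as floats whenever at least one entry fails to parse.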


In [483]:
df['years_played'] = df['end_year'] - df['start_year']

df = df.drop(['start_year', "end_year"], axis=1)

display(df.head(10))

Unnamed: 0,Player,Match,Innings,NotOut,Runs,Highest_score,Average,Centuries,Half_centuries,Ducks,Country,played_for_ICC,years_played
0,SR Tendulkar,200,329,33,15921,248*,53.78,51,68,14,INDIA,No,24
1,RT Ponting,168,287,29,13378,257,51.85,41,62,17,AUS,No,17
2,JH Kallis,166,280,40,13289,224,55.37,45,58,16,SA,Yes,18
3,R Dravid,164,286,32,13288,270,52.31,36,63,8,INDIA,Yes,16
4,AN Cook,161,291,16,12472,294,45.35,33,57,9,ENG,No,12
5,KC Sangakkara,134,233,17,12400,319,57.4,38,52,11,SL,No,15
6,BC Lara,131,232,6,11953,400*,52.88,34,48,17,WI,Yes,16
7,S Chanderpaul,164,280,49,11867,203*,51.37,30,66,15,WI,No,21
8,DPMD Jayawardene,149,252,15,11814,374,49.84,34,50,15,SL,No,17
9,AR Border,156,265,44,11174,205,50.56,27,63,11,AUS,No,16


#### Checking the average 

In [484]:
# df['avg'] = df['Runs']/(df['Innings'] - df['NotOut'])
# df['avg'] = np.round(df['avg'], 2)

# display(df.head(10))
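The commented-out cell above recomputes the batting average as runs divided by the number of completed innings (innings minus not-outs). A runnable sketch of that calculation on three rows typed in from the table above (`sample` is a hypothetical name):

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({
    "Player": ["SR Tendulkar", "RT Ponting", "JH Kallis"],
    "Innings": [329, 287, 280],
    "NotOut": [33, 29, 40],
    "Runs": [15921, 13378, 13289],
})

# average = runs / (innings - not outs), rounded to two decimals
sample["avg"] = np.round(sample["Runs"] / (sample["Innings"] - sample["NotOut"]), 2)

print(sample)
```

For Tendulkar this gives 15921 / 296 ≈ 53.787, which rounds to 53.79, while the source table lists 53.78. A recomputed value can therefore differ from the listed 'Average' in the last digit, so compare with a small tolerance rather than exact equality.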

#### Top 10 batsmen: Highest batting average

In [485]:
df.sort_values(by='Average', ascending = False).head(10)

Unnamed: 0,Player,Match,Innings,NotOut,Runs,Highest_score,Average,Centuries,Half_centuries,Ducks,Country,played_for_ICC,years_played
53,DG Bradman,52,80,10,6996,334,99.94,29,13,7,AUS,No,20
38,SPD Smith,77,139,17,7540,239,61.8,27,31,5,AUS,No,11
57,KF Barrington,82,131,15,6806,256,58.67,20,35,5,ENG,No,13
46,WR Hammond,85,140,16,7249,336*,58.45,22,24,4,ENG,No,20
29,GS Sobers,93,160,21,8032,365*,57.78,26,30,12,WI,No,20
5,KC Sangakkara,134,233,17,12400,319,57.4,38,52,11,SL,No,15
84,JB Hobbs,61,102,7,5410,211,56.94,15,28,4,ENG,No,22
55,L Hutton,79,138,15,6971,364,56.67,19,33,5,ENG,No,18
2,JH Kallis,166,280,40,13289,224,55.37,45,58,16,SA,Yes,18
47,KS Williamson,85,148,14,7230,251,53.95,24,33,9,NZ,No,11
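For "top N by one column" queries, `DataFrame.nlargest` is a shorter equivalent of sorting in descending order and taking the head. A minimal sketch on four rows typed in from the tables above (`sample`, `top2_sorted`, and `top2_nlargest` are hypothetical names):

```python
import pandas as pd

sample = pd.DataFrame({
    "Player": ["SR Tendulkar", "DG Bradman", "AN Cook", "SPD Smith"],
    "Average": [53.78, 99.94, 45.35, 61.80],
})

# Sort in descending order, then keep the first two rows
top2_sorted = sample.sort_values(by="Average", ascending=False).head(2)

# Same result in one call
top2_nlargest = sample.nlargest(2, "Average")

print(top2_nlargest)
```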


#### Top 10 batsmen: Highest number of centuries

In [486]:
df.sort_values(by=['Centuries', "Half_centuries"], ascending = False).head(10)

Unnamed: 0,Player,Match,Innings,NotOut,Runs,Highest_score,Average,Centuries,Half_centuries,Ducks,Country,played_for_ICC,years_played
0,SR Tendulkar,200,329,33,15921,248*,53.78,51,68,14,INDIA,No,24
2,JH Kallis,166,280,40,13289,224,55.37,45,58,16,SA,Yes,18
1,RT Ponting,168,287,29,13378,257,51.85,41,62,17,AUS,No,17
5,KC Sangakkara,134,233,17,12400,319,57.4,38,52,11,SL,No,15
3,R Dravid,164,286,32,13288,270,52.31,36,63,8,INDIA,Yes,16
8,DPMD Jayawardene,149,252,15,11814,374,49.84,34,50,15,SL,No,17
6,BC Lara,131,232,6,11953,400*,52.88,34,48,17,WI,Yes,16
11,SM Gavaskar,125,214,16,10122,236*,51.12,34,45,12,INDIA,No,16
12,Younis Khan,118,213,19,10099,313,52.05,34,33,19,PAK,No,17
4,AN Cook,161,291,16,12472,294,45.35,33,57,9,ENG,No,12
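When `by` is a list, the second column only matters for rows that tie on the first. The sketch below illustrates this with four rows typed in from the table above, three of which share 34 centuries. `ascending` also accepts one flag per column; the mixed-direction call is added purely for illustration, and `sample`, `both_desc`, and `mixed` are hypothetical names.

```python
import pandas as pd

sample = pd.DataFrame({
    "Player": ["BC Lara", "AN Cook", "DPMD Jayawardene", "SM Gavaskar"],
    "Centuries": [34, 33, 34, 34],
    "Half_centuries": [48, 57, 50, 45],
})

# Both keys descending: ties on 'Centuries' are broken by more half centuries first
both_desc = sample.sort_values(by=["Centuries", "Half_centuries"],
                               ascending=[False, False])

# Centuries descending, but ties broken by fewer half centuries first
mixed = sample.sort_values(by=["Centuries", "Half_centuries"],
                           ascending=[False, True])

print(both_desc)
print(mixed)
```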


#### Top 10 batsmen: Highest number of half centuries

In [487]:
df.sort_values(by="Half_centuries", ascending = False).head(10)

Unnamed: 0,Player,Match,Innings,NotOut,Runs,Highest_score,Average,Centuries,Half_centuries,Ducks,Country,played_for_ICC,years_played
0,SR Tendulkar,200,329,33,15921,248*,53.78,51,68,14,INDIA,No,24
7,S Chanderpaul,164,280,49,11867,203*,51.37,30,66,15,WI,No,21
3,R Dravid,164,286,32,13288,270,52.31,36,63,8,INDIA,Yes,16
9,AR Border,156,265,44,11174,205,50.56,27,63,11,AUS,No,16
1,RT Ponting,168,287,29,13378,257,51.85,41,62,17,AUS,No,17
2,JH Kallis,166,280,40,13289,224,55.37,45,58,16,SA,Yes,18
4,AN Cook,161,291,16,12472,294,45.35,33,57,9,ENG,No,12
18,VVS Laxman,134,225,34,8781,281,45.97,17,56,14,INDIA,No,16
5,KC Sangakkara,134,233,17,12400,319,57.4,38,52,11,SL,No,15
8,DPMD Jayawardene,149,252,15,11814,374,49.84,34,50,15,SL,No,17


#### Top 10 batsmen: Highest number of years played

In [488]:
df.sort_values(by="years_played", ascending = False).head(10)

Unnamed: 0,Player,Match,Innings,NotOut,Runs,Highest_score,Average,Centuries,Half_centuries,Ducks,Country,played_for_ICC,years_played
0,SR Tendulkar,200,329,33,15921,248*,53.78,51,68,14,INDIA,No,24
84,JB Hobbs,61,102,7,5410,211,56.94,15,28,4,ENG,No,22
34,MC Cowdrey,114,188,15,7624,182,44.06,22,38,9,ENG,No,21
7,S Chanderpaul,164,280,49,11867,203*,51.37,30,66,15,WI,No,21
53,DG Bradman,52,80,10,6996,334,99.94,29,13,7,AUS,No,20
73,DCS Compton,78,131,15,5807,278,50.06,17,28,10,ENG,No,20
29,GS Sobers,93,160,21,8032,365*,57.78,26,30,12,WI,No,20
15,GA Gooch,118,215,6,8900,333,42.58,20,46,13,ENG,No,20
46,WR Hammond,85,140,16,7249,336*,58.45,22,24,4,ENG,No,20
41,CH Lloyd,110,175,14,7515,242*,46.67,19,39,4,WI,No,19


#### Top 10 batsmen: Highest number of matches played

In [489]:
df.sort_values(by="Match", ascending = False).head(10)

Unnamed: 0,Player,Match,Innings,NotOut,Runs,Highest_score,Average,Centuries,Half_centuries,Ducks,Country,played_for_ICC,years_played
0,SR Tendulkar,200,329,33,15921,248*,53.78,51,68,14,INDIA,No,24
1,RT Ponting,168,287,29,13378,257,51.85,41,62,17,AUS,No,17
10,SR Waugh,168,260,46,10927,200,51.06,32,50,22,AUS,No,19
2,JH Kallis,166,280,40,13289,224,55.37,45,58,16,SA,Yes,18
3,R Dravid,164,286,32,13288,270,52.31,36,63,8,INDIA,Yes,16
7,S Chanderpaul,164,280,49,11867,203*,51.37,30,66,15,WI,No,21
4,AN Cook,161,291,16,12472,294,45.35,33,57,9,ENG,No,12
9,AR Border,156,265,44,11174,205,50.56,27,63,11,AUS,No,16
8,DPMD Jayawardene,149,252,15,11814,374,49.84,34,50,15,SL,No,17
79,MV Boucher,147,206,24,5515,125,30.3,5,35,17,SA,Yes,15


In [490]:
display(df.head(10))

Unnamed: 0,Player,Match,Innings,NotOut,Runs,Highest_score,Average,Centuries,Half_centuries,Ducks,Country,played_for_ICC,years_played
0,SR Tendulkar,200,329,33,15921,248*,53.78,51,68,14,INDIA,No,24
1,RT Ponting,168,287,29,13378,257,51.85,41,62,17,AUS,No,17
2,JH Kallis,166,280,40,13289,224,55.37,45,58,16,SA,Yes,18
3,R Dravid,164,286,32,13288,270,52.31,36,63,8,INDIA,Yes,16
4,AN Cook,161,291,16,12472,294,45.35,33,57,9,ENG,No,12
5,KC Sangakkara,134,233,17,12400,319,57.4,38,52,11,SL,No,15
6,BC Lara,131,232,6,11953,400*,52.88,34,48,17,WI,Yes,16
7,S Chanderpaul,164,280,49,11867,203*,51.37,30,66,15,WI,No,21
8,DPMD Jayawardene,149,252,15,11814,374,49.84,34,50,15,SL,No,17
9,AR Border,156,265,44,11174,205,50.56,27,63,11,AUS,No,16


#### Removing the * symbol from the 'Highest_score' column

In [491]:
def star_remover(x):
    x = str(x)
    if "*" in x:
        return x.replace("*", "")
    else:
        return x


df['Highest_score'] = df['Highest_score'].apply(star_remover)
df['Highest_score'] = df['Highest_score'].astype('int')

display(df.head(10))
print(df.info())

Unnamed: 0,Player,Match,Innings,NotOut,Runs,Highest_score,Average,Centuries,Half_centuries,Ducks,Country,played_for_ICC,years_played
0,SR Tendulkar,200,329,33,15921,248,53.78,51,68,14,INDIA,No,24
1,RT Ponting,168,287,29,13378,257,51.85,41,62,17,AUS,No,17
2,JH Kallis,166,280,40,13289,224,55.37,45,58,16,SA,Yes,18
3,R Dravid,164,286,32,13288,270,52.31,36,63,8,INDIA,Yes,16
4,AN Cook,161,291,16,12472,294,45.35,33,57,9,ENG,No,12
5,KC Sangakkara,134,233,17,12400,319,57.4,38,52,11,SL,No,15
6,BC Lara,131,232,6,11953,400,52.88,34,48,17,WI,Yes,16
7,S Chanderpaul,164,280,49,11867,203,51.37,30,66,15,WI,No,21
8,DPMD Jayawardene,149,252,15,11814,374,49.84,34,50,15,SL,No,17
9,AR Border,156,265,44,11174,205,50.56,27,63,11,AUS,No,16


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 97 entries, 0 to 96
Data columns (total 13 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Player          97 non-null     object 
 1   Match           97 non-null     int64  
 2   Innings         97 non-null     int64  
 3   NotOut          97 non-null     int64  
 4   Runs            97 non-null     int64  
 5   Highest_score   97 non-null     int64  
 6   Average         97 non-null     float64
 7   Centuries       97 non-null     int64  
 8   Half_centuries  97 non-null     int64  
 9   Ducks           97 non-null     int64  
 10  Country         97 non-null     object 
 11  played_for_ICC  97 non-null     object 
 12  years_played    97 non-null     int64  
dtypes: float64(1), int64(9), object(3)
memory usage: 10.0+ KB
None


In [492]:
df.sort_values(by="Highest_score", ascending = False).head(10)

Unnamed: 0,Player,Match,Innings,NotOut,Runs,Highest_score,Average,Centuries,Half_centuries,Ducks,Country,played_for_ICC,years_played
6,BC Lara,131,232,6,11953,400,52.88,34,48,17,WI,Yes,16
22,ML Hayden,103,184,14,8625,380,50.73,30,29,14,AUS,No,15
8,DPMD Jayawardene,149,252,15,11814,374,49.84,34,50,15,SL,No,17
29,GS Sobers,93,160,21,8032,365,57.78,26,30,12,WI,No,20
55,L Hutton,79,138,15,6971,364,56.67,19,33,5,ENG,No,18
54,ST Jayasuriya,110,188,14,6973,340,40.07,14,31,15,SL,No,16
46,WR Hammond,85,140,16,7249,336,58.45,22,24,4,ENG,No,20
44,DA Warner,86,159,7,7311,335,48.09,24,30,9,AUS,No,10
40,MA Taylor,104,186,13,7525,334,43.49,19,40,5,AUS,No,10
53,DG Bradman,52,80,10,6996,334,99.94,29,13,7,AUS,No,20


#### DataFrame slicing

In [493]:
# column_name = ['Player', 'Country', 'Runs', 'Centuries']

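# method 1: select columns by name (all rows)
df_method_1 = df[['Player', 'Country', 'Runs', 'Centuries']]

# method 2: .loc is label-based and includes the end label, so 0:10 returns 11 rows
df_method_2 = df.loc[0:10, ['Player', 'Country', 'Runs', 'Centuries']]

# method 3: .iloc is position-based and excludes the end position, so 0:10 returns 10 rows
df_method_3 = df.iloc[0:10, [0, 10, 4, 7]]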

display(df_method_1.head())
display(df_method_2)
display(df_method_3)

print(df_method_1.shape)
print(df_method_2.shape)
print(df_method_3.shape)

Unnamed: 0,Player,Country,Runs,Centuries
0,SR Tendulkar,INDIA,15921,51
1,RT Ponting,AUS,13378,41
2,JH Kallis,SA,13289,45
3,R Dravid,INDIA,13288,36
4,AN Cook,ENG,12472,33


Unnamed: 0,Player,Country,Runs,Centuries
0,SR Tendulkar,INDIA,15921,51
1,RT Ponting,AUS,13378,41
2,JH Kallis,SA,13289,45
3,R Dravid,INDIA,13288,36
4,AN Cook,ENG,12472,33
5,KC Sangakkara,SL,12400,38
6,BC Lara,WI,11953,34
7,S Chanderpaul,WI,11867,30
8,DPMD Jayawardene,SL,11814,34
9,AR Border,AUS,11174,27
10,SR Waugh,AUS,10927,32


Unnamed: 0,Player,Country,Runs,Centuries
0,SR Tendulkar,INDIA,15921,51
1,RT Ponting,AUS,13378,41
2,JH Kallis,SA,13289,45
3,R Dravid,INDIA,13288,36
4,AN Cook,ENG,12472,33
5,KC Sangakkara,SL,12400,38
6,BC Lara,WI,11953,34
7,S Chanderpaul,WI,11867,30
8,DPMD Jayawardene,SL,11814,34
9,AR Border,AUS,11174,27


(97, 4)
(11, 4)
(10, 4)
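The difference in row counts above, (11, 4) versus (10, 4), comes from how the two indexers treat the end of a slice. The sketch below reproduces it on a small made-up frame (`toy` and its contents are hypothetical) and also shows `Index.get_indexer` as one way to avoid hard-coding column positions for `.iloc`.

```python
import pandas as pd

# Made-up frame with 12 rows and a default 0..11 integer index
toy = pd.DataFrame({
    "Player": [f"P{i}" for i in range(12)],
    "Runs": range(100, 112),
    "Country": ["X"] * 12,
})

# Label-based: the end label 10 is included, giving 11 rows
by_label = toy.loc[0:10, ["Player", "Runs"]]

# Position-based: the end position 10 is excluded, giving 10 rows
by_position = toy.iloc[0:10, [0, 1]]

# Look up integer positions from column names instead of hard-coding them
positions = toy.columns.get_indexer(["Player", "Country"])
by_lookup = toy.iloc[0:10, positions]

print(by_label.shape, by_position.shape, by_lookup.shape)
```

The two slices only line up this neatly because the index labels equal the row positions; after sorting or filtering, `.loc[0:10]` and `.iloc[0:10]` can select quite different rows.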
