Import libraries

In [1]:
import pandas as pd

Import Data

In [2]:
df = pd.read_csv('athlete_events.csv')

## Selecting Columns
Selecting a single column will return a Series

In [4]:
df['Name'].head()

0                   A Dijiang
1                    A Lamusi
2         Gunnar Nielsen Aaby
3        Edgar Lindenau Aabye
4    Christine Jacoba Aaftink
Name: Name, dtype: object

To select multiple columns, we pass a single list of column names inside the brackets. The result is a DataFrame

In [5]:
df[['Name', 'Sex', 'Age', 'Team']].head()

,Name,Sex,Age,Team
0,A Dijiang,M,24.0,China
1,A Lamusi,M,23.0,China
2,Gunnar Nielsen Aaby,M,24.0,Denmark
3,Edgar Lindenau Aabye,M,34.0,Denmark/Sweden
4,Christine Jacoba Aaftink,F,21.0,Netherlands
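The difference between single and double brackets can be checked on a small frame. This is a minimal sketch: `toy` and its values are made up for illustration and only stand in for the athletes data.

```python
import pandas as pd

# Hypothetical toy frame standing in for athlete_events.csv
toy = pd.DataFrame({
    'Name': ['A', 'B', 'C'],
    'Sex': ['M', 'F', 'M'],
    'Age': [24.0, 23.0, 34.0],
})

single = toy['Name']            # one label -> Series
as_frame = toy[['Name']]        # list with one label -> one-column DataFrame
subset = toy[['Name', 'Age']]   # list of labels -> DataFrame, in the order given

print(type(single).__name__)    # Series
print(type(as_frame).__name__)  # DataFrame
print(subset.columns.tolist())  # ['Name', 'Age']
```

Wrapping even a single name in a list keeps the result two-dimensional, which is handy when later code expects a DataFrame.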


## Basic Operations on Column Values

In [13]:
df['Age'].head()

ID
1    24.0
2    23.0
3    24.0
4    34.0
5    21.0
Name: Age, dtype: float64

### Addition

In [16]:
df['Age'].head() + 100

ID
1    124.0
2    123.0
3    124.0
4    134.0
5    121.0
Name: Age, dtype: float64

In [17]:
df['Age'].add(100).head()

ID
1    124.0
2    123.0
3    124.0
4    134.0
5    121.0
Name: Age, dtype: float64

### Subtraction

In [18]:
df['Age'].head() - 50

ID
1   -26.0
2   -27.0
3   -26.0
4   -16.0
5   -29.0
Name: Age, dtype: float64

In [19]:
df['Age'].sub(50).head()

ID
1   -26.0
2   -27.0
3   -26.0
4   -16.0
5   -29.0
Name: Age, dtype: float64

### Multiplication

In [20]:
df['Age'].head() * 10

ID
1    240.0
2    230.0
3    240.0
4    340.0
5    210.0
Name: Age, dtype: float64

In [22]:
df['Age'].mul(10).head()

ID
1    240.0
2    230.0
3    240.0
4    340.0
5    210.0
Name: Age, dtype: float64

### Division

In [25]:
df['Age'].head() / 10

ID
1    2.4
2    2.3
3    2.4
4    3.4
5    2.1
Name: Age, dtype: float64

In [27]:
df['Age'].div(10).head()

ID
1    2.4
2    2.3
3    2.4
4    3.4
5    2.1
Name: Age, dtype: float64

### Modulo

In [28]:
df['Age'].head() % 10

ID
1    4.0
2    3.0
3    4.0
4    4.0
5    1.0
Name: Age, dtype: float64

In [30]:
df['Age'].mod(10).head()

ID
1    4.0
2    3.0
3    4.0
4    4.0
5    1.0
Name: Age, dtype: float64
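The operator and method forms shown above give the same results. One practical difference is that the methods accept a `fill_value` argument for missing data. The sketch below uses a made-up `ages` Series (not the real dataset) to illustrate both points.

```python
import numpy as np
import pandas as pd

# Hypothetical ages with one missing value
ages = pd.Series([24.0, 23.0, np.nan], name='Age')

# Operator and method forms produce identical Series
same_add = (ages + 100).equals(ages.add(100))
same_mod = (ages % 10).equals(ages.mod(10))
print(same_add, same_mod)

# With the operator, a missing value stays missing
plus_op = ages + 100

# With the method, fill_value replaces NaN before the operation is applied
plus_filled = ages.add(100, fill_value=0)
print(plus_filled.tolist())
```

Whether filling missing ages with 0 makes sense depends on the analysis. The point here is only that the method form exposes the option.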

## Adding & Removing Columns
Selecting a nonexistent column name raises a KeyError

In [4]:
df['BMI']  # Raises KeyError: the column does not exist yet

Assigning a value to a nonexistent column, on the other hand, creates it and adds it to the dataframe in place. Columns created this way are always appended at the far right

In [24]:
df['BMI'] = df['Weight'] / (df['Height'] / 100) ** 2
df.head()

ID,Name,Sex,Age,Height,Weight,Team,NOC,Games,Year,Season,City,Sport,Event,Medal,BMI
1,A Dijiang,M,24.0,180.0,80.0,China,CHN,1992 Summer,1992,Summer,Barcelona,Basketball,Basketball Men's Basketball,,24.691358
2,A Lamusi,M,23.0,170.0,60.0,China,CHN,2012 Summer,2012,Summer,London,Judo,Judo Men's Extra-Lightweight,,20.761246
3,Gunnar Nielsen Aaby,M,24.0,,,Denmark,DEN,1920 Summer,1920,Summer,Antwerpen,Football,Football Men's Football,,
4,Edgar Lindenau Aabye,M,34.0,,,Denmark/Sweden,DEN,1900 Summer,1900,Summer,Paris,Tug-Of-War,Tug-Of-War Men's Tug-Of-War,Gold,
5,Christine Jacoba Aaftink,F,21.0,185.0,82.0,Netherlands,NED,1988 Winter,1988,Winter,Calgary,Speed Skating,Speed Skating Women's 500 metres,,23.959094
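The select-versus-assign behavior can be reproduced on a tiny frame. This is a sketch under assumptions: `toy` and the `WeightLb` column name are hypothetical, and `assign` is shown only as one non-mutating alternative, not something the notebook itself uses.

```python
import pandas as pd

# Hypothetical two-row frame with the columns needed for BMI
toy = pd.DataFrame({'Height': [180.0, 170.0], 'Weight': [80.0, 60.0]})

# Selecting a missing column raises KeyError
try:
    toy['BMI']
    raised = False
except KeyError:
    raised = True

# Assigning creates the column and appends it after the existing ones
toy['BMI'] = toy['Weight'] / (toy['Height'] / 100) ** 2
print(toy.columns.tolist())  # ['Height', 'Weight', 'BMI']

# assign() returns a new frame and leaves toy untouched
with_lb = toy.assign(WeightLb=toy['Weight'] * 2.2)
print('WeightLb' in toy.columns)  # False
```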


### insert
We can use the ***insert*** method to specify the position of our new column. Its first argument is the zero-based position, followed by the column name and the values

In [36]:
df.insert(6, 'Weight in Pounds', df['Weight'] * 2.2)
df.head()

ID,Name,Sex,Age,Height,Weight,Team,Weight in Pounds,NOC,Games,Year,Season,City,Sport,Event,Medal,BMI
1,A Dijiang,M,24.0,180.0,80.0,China,176.0,CHN,1992 Summer,1992,Summer,Barcelona,Basketball,Basketball Men's Basketball,,24.691358
2,A Lamusi,M,23.0,170.0,60.0,China,132.0,CHN,2012 Summer,2012,Summer,London,Judo,Judo Men's Extra-Lightweight,,20.761246
3,Gunnar Nielsen Aaby,M,24.0,,,Denmark,,DEN,1920 Summer,1920,Summer,Antwerpen,Football,Football Men's Football,,
4,Edgar Lindenau Aabye,M,34.0,,,Denmark/Sweden,,DEN,1900 Summer,1900,Summer,Paris,Tug-Of-War,Tug-Of-War Men's Tug-Of-War,Gold,
5,Christine Jacoba Aaftink,F,21.0,185.0,82.0,Netherlands,180.4,NED,1988 Winter,1988,Winter,Calgary,Speed Skating,Speed Skating Women's 500 metres,,23.959094
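A small sketch of how `insert` positions a column, using a hypothetical three-column `toy` frame. Note that `insert` modifies the frame in place, and inserting a name that already exists raises a ValueError by default.

```python
import pandas as pd

# Hypothetical frame; values are illustrative only
toy = pd.DataFrame({
    'Name': ['A', 'B'],
    'Weight': [80.0, 60.0],
    'Team': ['X', 'Y'],
})

# Position 2 places the new column between 'Weight' and 'Team'
toy.insert(2, 'Weight in Pounds', toy['Weight'] * 2.2)
print(toy.columns.tolist())
# ['Name', 'Weight', 'Weight in Pounds', 'Team']

# Running the same insert again fails because the column already exists
try:
    toy.insert(2, 'Weight in Pounds', toy['Weight'] * 2.2)
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```

This is why re-running an `insert` cell in a notebook fails the second time, unlike plain assignment with `df[...] = ...`.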


### drop
If we want to delete a column, we can do so with drop. We also need to specify axis=1, because drop can remove rows as well and rows (axis=0) are the default

In [37]:
df.drop('Weight in Pounds', axis=1).head()

ID,Name,Sex,Age,Height,Weight,Team,NOC,Games,Year,Season,City,Sport,Event,Medal,BMI
1,A Dijiang,M,24.0,180.0,80.0,China,CHN,1992 Summer,1992,Summer,Barcelona,Basketball,Basketball Men's Basketball,,24.691358
2,A Lamusi,M,23.0,170.0,60.0,China,CHN,2012 Summer,2012,Summer,London,Judo,Judo Men's Extra-Lightweight,,20.761246
3,Gunnar Nielsen Aaby,M,24.0,,,Denmark,DEN,1920 Summer,1920,Summer,Antwerpen,Football,Football Men's Football,,
4,Edgar Lindenau Aabye,M,34.0,,,Denmark/Sweden,DEN,1900 Summer,1900,Summer,Paris,Tug-Of-War,Tug-Of-War Men's Tug-Of-War,Gold,
5,Christine Jacoba Aaftink,F,21.0,185.0,82.0,Netherlands,NED,1988 Winter,1988,Winter,Calgary,Speed Skating,Speed Skating Women's 500 metres,,23.959094


We can drop an entire list of columns at once

In [45]:
df.drop(['Games', 'Year', 'Season', 'Medal', 'City', 'Sport'], axis=1).head()

ID,Name,Sex,Age,Height,Weight,Team,NOC,Event,BMI
1,A Dijiang,M,24.0,180.0,80.0,China,CHN,Basketball Men's Basketball,24.691358
2,A Lamusi,M,23.0,170.0,60.0,China,CHN,Judo Men's Extra-Lightweight,20.761246
3,Gunnar Nielsen Aaby,M,24.0,,,Denmark,DEN,Football Men's Football,
4,Edgar Lindenau Aabye,M,34.0,,,Denmark/Sweden,DEN,Tug-Of-War Men's Tug-Of-War,
5,Christine Jacoba Aaftink,F,21.0,185.0,82.0,Netherlands,NED,Speed Skating Women's 500 metres,23.959094
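As an alternative to `axis=1`, `drop` also accepts a `columns=` keyword, which some readers find clearer. The sketch below uses a hypothetical `toy` frame, and `'NotAColumn'` is a made-up name used to show the `errors='ignore'` option.

```python
import pandas as pd

# Hypothetical frame with a few of the dataset's column names
toy = pd.DataFrame({
    'Name': ['A', 'B'],
    'Games': ['1992 Summer', '2012 Summer'],
    'Year': [1992, 2012],
    'Team': ['X', 'Y'],
})

# These two calls are equivalent
by_axis = toy.drop(['Games', 'Year'], axis=1)
by_keyword = toy.drop(columns=['Games', 'Year'])
print(by_axis.equals(by_keyword))   # True
print(by_keyword.columns.tolist())  # ['Name', 'Team']

# A missing label normally raises KeyError; errors='ignore' skips it instead
safe = toy.drop(columns=['NotAColumn'], errors='ignore')
print(safe.shape == toy.shape)      # True
```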


### inplace
drop returns a new dataframe and leaves the original untouched. The change won't stick unless we assign the result back or use the ***inplace*** parameter

In [38]:
df.head(3)

ID,Name,Sex,Age,Height,Weight,Team,Weight in Pounds,NOC,Games,Year,Season,City,Sport,Event,Medal,BMI
1,A Dijiang,M,24.0,180.0,80.0,China,176.0,CHN,1992 Summer,1992,Summer,Barcelona,Basketball,Basketball Men's Basketball,,24.691358
2,A Lamusi,M,23.0,170.0,60.0,China,132.0,CHN,2012 Summer,2012,Summer,London,Judo,Judo Men's Extra-Lightweight,,20.761246
3,Gunnar Nielsen Aaby,M,24.0,,,Denmark,,DEN,1920 Summer,1920,Summer,Antwerpen,Football,Football Men's Football,,


In [39]:
df.drop('Weight in Pounds', axis=1, inplace=True)
df.head(3)

ID,Name,Sex,Age,Height,Weight,Team,NOC,Games,Year,Season,City,Sport,Event,Medal,BMI
1,A Dijiang,M,24.0,180.0,80.0,China,CHN,1992 Summer,1992,Summer,Barcelona,Basketball,Basketball Men's Basketball,,24.691358
2,A Lamusi,M,23.0,170.0,60.0,China,CHN,2012 Summer,2012,Summer,London,Judo,Judo Men's Extra-Lightweight,,20.761246
3,Gunnar Nielsen Aaby,M,24.0,,,Denmark,DEN,1920 Summer,1920,Summer,Antwerpen,Football,Football Men's Football,,
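The two ways of making a drop stick can be compared on a toy frame (hypothetical columns `A` and `B`). With `inplace=True` the call returns `None`, so its result should not be assigned to anything.

```python
import pandas as pd

toy = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

# Without inplace, the original frame keeps column 'B'
result = toy.drop('B', axis=1)
print(toy.columns.tolist())     # ['A', 'B']
print(result.columns.tolist())  # ['A']

# Option 1: reassign the returned frame
reassigned = toy.drop('B', axis=1)

# Option 2: mutate the frame directly; the return value is None
ret = toy.drop('B', axis=1, inplace=True)
print(ret)                      # None
print(toy.columns.tolist())     # ['A']
```

Reassignment tends to be easier to chain with other methods. Which style to use is largely a matter of preference.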


We can delete rows in the same manner by passing an index label instead of a column name. Here we drop the row labeled 1

In [3]:
df.drop(1).head(3)

,ID,Name,Sex,Age,Height,Weight,Team,NOC,Games,Year,Season,City,Sport,Event,Medal
0,1,A Dijiang,M,24.0,180.0,80.0,China,CHN,1992 Summer,1992,Summer,Barcelona,Basketball,Basketball Men's Basketball,
2,3,Gunnar Nielsen Aaby,M,24.0,,,Denmark,DEN,1920 Summer,1920,Summer,Antwerpen,Football,Football Men's Football,
3,4,Edgar Lindenau Aabye,M,34.0,,,Denmark/Sweden,DEN,1900 Summer,1900,Summer,Paris,Tug-Of-War,Tug-Of-War Men's Tug-Of-War,Gold
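Row drops match on index labels, not on positions. The following sketch makes that visible by giving a hypothetical `toy` frame the labels 10, 20 and 30.

```python
import pandas as pd

# Hypothetical frame whose index labels differ from positions 0, 1, 2
toy = pd.DataFrame({'Name': ['A', 'B', 'C']}, index=[10, 20, 30])

# Drop the row labeled 20 (axis=0 is the default)
without_20 = toy.drop(20)
print(without_20.index.tolist())  # [10, 30]

# Several labels can be dropped at once with the index= keyword
only_20 = toy.drop(index=[10, 30])
print(only_20.index.tolist())     # [20]

# 1 is a valid position but not a label here, so this raises KeyError
try:
    toy.drop(1)
    label_missing = False
except KeyError:
    label_missing = True
print(label_missing)              # True
```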


### unique()
We can use the ***unique*** method to get the distinct values of a Series, in order of first appearance

In [6]:
df['Sex'].unique()

array(['M', 'F'], dtype=object)

In [9]:
df['Sport'].unique()

array(['Basketball', 'Judo', 'Football', 'Tug-Of-War', 'Speed Skating',
       'Cross Country Skiing', 'Athletics', 'Ice Hockey', 'Swimming',
       'Badminton', 'Sailing', 'Biathlon', 'Gymnastics',
       'Art Competitions', 'Alpine Skiing', 'Handball', 'Weightlifting',
       'Wrestling', 'Luge', 'Water Polo', 'Hockey', 'Rowing', 'Bobsleigh',
       'Fencing', 'Equestrianism', 'Shooting', 'Boxing', 'Taekwondo',
       'Cycling', 'Diving', 'Canoeing', 'Tennis', 'Modern Pentathlon',
       'Figure Skating', 'Golf', 'Softball', 'Archery', 'Volleyball',
       'Synchronized Swimming', 'Table Tennis', 'Nordic Combined',
       'Baseball', 'Rhythmic Gymnastics', 'Freestyle Skiing',
       'Rugby Sevens', 'Trampolining', 'Beach Volleyball', 'Triathlon',
       'Ski Jumping', 'Curling', 'Snowboarding', 'Rugby',
       'Short Track Speed Skating', 'Skeleton', 'Lacrosse', 'Polo',
       'Cricket', 'Racquets', 'Motorboating', 'Military Ski Patrol',
       'Croquet', 'Jeu De Paume', 'Roque', 'Al

### nunique()
Returns the number of distinct values. Missing values (NaN) are not counted by default

In [11]:
df['Sex'].nunique()

2

In [10]:
df['Sport'].nunique()

66

### value_counts()
Returns the number of records for each distinct value in the Series, sorted from most to least frequent

In [12]:
df['Sex'].value_counts()

M    196594
F     74522
Name: Sex, dtype: int64

In [13]:
df['Sport'].value_counts()

Athletics                38624
Gymnastics               26707
Swimming                 23195
Shooting                 11448
Cycling                  10859
Fencing                  10735
Rowing                   10595
Cross Country Skiing      9133
Alpine Skiing             8829
Wrestling                 7154
Football                  6745
Sailing                   6586
Equestrianism             6344
Canoeing                  6171
Boxing                    6047
Speed Skating             5613
Ice Hockey                5516
Hockey                    5417
Biathlon                  4893
Basketball                4536
Weightlifting             3937
Water Polo                3846
Judo                      3801
Handball                  3665
Art Competitions          3578
Volleyball                3404
Bobsleigh                 3058
Tennis                    2862
Diving                    2842
Ski Jumping               2401
                         ...  
Badminton                 1457
Nordic C

In [19]:
df['Medal'].nunique(dropna=False)

4
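The `dropna=False` argument in the last cell makes `nunique` count missing values as one extra category. The sketch below shows how `unique`, `nunique` and `value_counts` each treat NaN, using a made-up `medals` Series. That the real `Medal` column holds three medal types plus NaN is an assumption, though it would be consistent with the result of 4.

```python
import numpy as np
import pandas as pd

# Hypothetical medal column with missing entries
medals = pd.Series(
    ['Gold', np.nan, 'Silver', 'Gold', np.nan, 'Bronze'], name='Medal'
)

# unique() includes NaN among the distinct values
distinct = medals.unique()
print(len(distinct))                     # 4

# nunique() skips NaN unless dropna=False is passed
n_default = medals.nunique()
n_with_nan = medals.nunique(dropna=False)
print(n_default, n_with_nan)             # 3 4

# value_counts() also skips NaN by default
counts = medals.value_counts()
counts_with_nan = medals.value_counts(dropna=False)
print(counts['Gold'])                    # 2
print(counts.sum(), counts_with_nan.sum())  # 4 6
```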