# Lab 5 Tasks

In this notebook we use the pandas library to analyse a dataset on the London 2012 Olympics. Each row of the dataset represents one country, described by the following features:

- *ISO*: Unique short ISO country code
- *Country*: Full country name
- *Gold*: Number of gold medals won by the country in 2012
- *Silver*: Number of silver medals won by the country in 2012
- *Bronze*: Number of bronze medals won by the country in 2012
- *Population*: 2011 population for the country, from the World Bank

## Task 1

Load the CSV file "olympics2012.csv" into a Pandas DataFrame, where each row is indexed by its ISO country code. Check the number of rows and the column names in the DataFrame.

In [9]:
import pandas as pd

# Use the ISO country code column as the row index
df = pd.read_csv("olympics2012.csv", index_col="ISO")
df

| ISO | Country | Gold | Silver | Bronze | Population |
| --- | --- | --- | --- | --- | --- |
| AFG | Afghanistan | 0 | 0 | 1 | 35320445 |
| ARG | Argentina | 1 | 1 | 2 | 40764561 |
| ARM | Armenia | 0 | 1 | 2 | 3100236 |
| AUS | Australia | 7 | 16 | 12 | 22620600 |
| AZE | Azerbaijan | 2 | 2 | 6 | 9168000 |
| ... | ... | ... | ... | ... | ... |
| UKR | Ukraine | 6 | 5 | 9 | 45706100 |
| USA | United States | 46 | 29 | 29 | 311591917 |
| UZB | Uzbekistan | 1 | 0 | 3 | 29341200 |
| VEN | Venezuela | 1 | 0 | 0 | 29278000 |

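The task also asks for the number of rows and the column names. A minimal sketch of one way to check both, using `shape` and `columns`; the `sample` frame below is a hypothetical stand-in built from three rows of the output above, so the counts it prints are for the sample, not for the full file.

```python
import io
import pandas as pd

# Three rows copied from the output above, standing in for olympics2012.csv.
sample_csv = io.StringIO(
    "ISO,Country,Gold,Silver,Bronze,Population\n"
    "AFG,Afghanistan,0,0,1,35320445\n"
    "ARG,Argentina,1,1,2,40764561\n"
    "ARM,Armenia,0,1,2,3100236\n"
)
sample = pd.read_csv(sample_csv, index_col="ISO")

# shape is (rows, columns); the ISO index is not counted as a column.
n_rows, n_cols = sample.shape
column_names = list(sample.columns)

print(n_rows)
print(column_names)
```

On the real data the same two attributes would be read from `df` directly, for example `df.shape[0]` and `df.columns`.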

Display the first 15 rows of the data.

In [11]:
df.head(15)

| ISO | Country | Gold | Silver | Bronze | Population |
| --- | --- | --- | --- | --- | --- |
| AFG | Afghanistan | 0 | 0 | 1 | 35320445 |
| ARG | Argentina | 1 | 1 | 2 | 40764561 |
| ARM | Armenia | 0 | 1 | 2 | 3100236 |
| AUS | Australia | 7 | 16 | 12 | 22620600 |
| AZE | Azerbaijan | 2 | 2 | 6 | 9168000 |
| BEL | Belgium | 0 | 1 | 2 | 11008000 |
| BGR | Bulgaria | 0 | 1 | 1 | 7476000 |
| BHR | Bahrain | 0 | 0 | 1 | 1323535 |
| BHS | Bahamas | 1 | 0 | 0 | 347176 |
| BLR | Belarus | 3 | 5 | 5 | 9473000 |


Show the top 10 countries with the highest number of gold medals at the 2012 Olympics.

In [19]:
df.sort_values("Gold",ascending=False).head(10)

| ISO | Country | Gold | Silver | Bronze | Population |
| --- | --- | --- | --- | --- | --- |
| USA | United States | 46 | 29 | 29 | 311591917 |
| CHN | China | 38 | 27 | 22 | 1344130000 |
| GBR | Great Britain | 29 | 17 | 19 | 60924000 |
| RUS | Russian Federation | 24 | 25 | 33 | 141930000 |
| KOR | Korea, Rep. | 13 | 8 | 7 | 49779000 |
| FRA | France | 11 | 11 | 12 | 65436552 |
| DEU | Germany | 11 | 19 | 14 | 81726000 |
| ITA | Italy | 8 | 9 | 11 | 60770000 |
| HUN | Hungary | 8 | 4 | 5 | 9971000 |
| KAZ | Kazakhstan | 7 | 1 | 5 | 16558459 |

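As an alternative to sorting and then taking the head, pandas also offers `DataFrame.nlargest`, which selects the top rows by a column in one call. The sketch below uses a hypothetical `sample` frame built from four rows of the output above; it is not the full dataset.

```python
import pandas as pd

# Four rows copied from the output above; a stand-in for the full df.
sample = pd.DataFrame(
    {
        "Country": ["Kazakhstan", "China", "United States", "Great Britain"],
        "Gold": [7, 38, 46, 29],
    },
    index=pd.Index(["KAZ", "CHN", "USA", "GBR"], name="ISO"),
)

# Top 2 countries by gold medals, highest first.
top2 = sample.nlargest(2, "Gold")
print(top2)
```

When several countries share the same count, the `keep` parameter of `nlargest` controls which of the tied rows are retained.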

## Task 2

Create a new column in the DataFrame called "Total" which indicates the total number of medals won by each country. Show the top 10 countries with the highest number of total medals.

In [23]:
df["Total"] = df["Gold"]+df["Silver"]+df["Bronze"]
df

| ISO | Country | Gold | Silver | Bronze | Population | Total |
| --- | --- | --- | --- | --- | --- | --- |
| AFG | Afghanistan | 0 | 0 | 1 | 35320445 | 1 |
| ARG | Argentina | 1 | 1 | 2 | 40764561 | 4 |
| ARM | Armenia | 0 | 1 | 2 | 3100236 | 3 |
| AUS | Australia | 7 | 16 | 12 | 22620600 | 35 |
| AZE | Azerbaijan | 2 | 2 | 6 | 9168000 | 10 |
| ... | ... | ... | ... | ... | ... | ... |
| UKR | Ukraine | 6 | 5 | 9 | 45706100 | 20 |
| USA | United States | 46 | 29 | 29 | 311591917 | 104 |
| UZB | Uzbekistan | 1 | 0 | 3 | 29341200 | 4 |
| VEN | Venezuela | 1 | 0 | 0 | 29278000 | 1 |

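The same total can also be computed with a row-wise sum over the three medal columns, which avoids repeating the column names in an arithmetic expression. This is a sketch on a hypothetical `sample` frame built from three rows shown above.

```python
import pandas as pd

# Three rows copied from the output above; a stand-in for the full df.
sample = pd.DataFrame(
    {
        "Gold": [7, 2, 46],
        "Silver": [16, 2, 29],
        "Bronze": [12, 6, 29],
    },
    index=pd.Index(["AUS", "AZE", "USA"], name="ISO"),
)

medal_columns = ["Gold", "Silver", "Bronze"]

# axis=1 sums across the columns of each row.
sample["Total"] = sample[medal_columns].sum(axis=1)
print(sample)
```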

Display the subset of countries which:
1. Won 20 or more gold medals
2. Won 25 or more total medals
3. Won only bronze medals

In [39]:
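# Criteria 1 and 2 combined (>= because the task says "or more"), plus bronze-only countries
df[((df["Gold"] >= 20) & (df["Total"] >= 25)) | ((df["Gold"] == 0) & (df["Silver"] == 0) & (df["Bronze"] > 0))]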

| ISO | Country | Gold | Silver | Bronze | Population | Total |
| --- | --- | --- | --- | --- | --- | --- |
| AFG | Afghanistan | 0 | 0 | 1 | 35320445 | 1 |
| BHR | Bahrain | 0 | 0 | 1 | 1323535 | 1 |
| CHN | China | 38 | 27 | 22 | 1344130000 | 87 |
| GBR | Great Britain | 29 | 17 | 19 | 60924000 | 65 |
| GRC | Greece | 0 | 0 | 2 | 11304000 | 2 |
| HKG | Hong Kong | 0 | 0 | 1 | 7071600 | 1 |
| KWT | Kuwait | 0 | 0 | 1 | 2818042 | 1 |
| MAR | Morocco | 0 | 0 | 1 | 32272974 | 1 |
| MDA | Moldova | 0 | 0 | 2 | 3559000 | 2 |
| QAT | Qatar | 0 | 0 | 2 | 1870041 | 2 |

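The three criteria can also be read as three separate subsets, each with its own boolean mask. The sketch below shows that reading on a hypothetical `sample` frame built from five rows of the outputs above; the variable names are illustrative only.

```python
import pandas as pd

# Rows copied from the outputs above; this small frame stands in for the full df.
sample = pd.DataFrame(
    {
        "Country": ["Afghanistan", "Argentina", "Australia", "China", "United States"],
        "Gold": [0, 1, 7, 38, 46],
        "Silver": [0, 1, 16, 27, 29],
        "Bronze": [1, 2, 12, 22, 29],
    },
    index=pd.Index(["AFG", "ARG", "AUS", "CHN", "USA"], name="ISO"),
)
sample["Total"] = sample["Gold"] + sample["Silver"] + sample["Bronze"]

# 1. Won 20 or more gold medals
gold_20_plus = sample[sample["Gold"] >= 20]

# 2. Won 25 or more total medals
total_25_plus = sample[sample["Total"] >= 25]

# 3. Won only bronze medals: no gold, no silver, at least one bronze
bronze_only = sample[
    (sample["Gold"] == 0) & (sample["Silver"] == 0) & (sample["Bronze"] > 0)
]

print(gold_20_plus)
print(total_25_plus)
print(bronze_only)
```

Each comparison is wrapped in parentheses because `&` and `|` bind more tightly than `>=` and `==` in Python.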

## Task 3

Create a new column called "WeightedTotal", which computes a weighted total for the number of medals won by each country, with weights: Gold=3, Silver=2, Bronze=1. 

Display the top 10 countries according to this score.

In [65]:
df["WeightedTotal"] = df["Gold"]*3 + df["Silver"]*2 + df["Bronze"]
df[["Country","WeightedTotal"]].sort_values(by="WeightedTotal", ascending=False).head(10)

| ISO | Country | WeightedTotal |
| --- | --- | --- |
| USA | United States | 225 |
| CHN | China | 190 |
| RUS | Russian Federation | 155 |
| GBR | Great Britain | 140 |
| DEU | Germany | 85 |
| FRA | France | 67 |
| JPN | Japan | 66 |
| AUS | Australia | 65 |
| KOR | Korea, Rep. | 62 |
| ITA | Italy | 53 |

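As a check on the arithmetic, the United States score is 46×3 + 29×2 + 29×1 = 225. The sketch below expresses the same weighting as a dot product between the medal columns and a weight vector; the `weights` Series and the `sample` frame (three rows copied from earlier outputs) are assumptions for illustration.

```python
import pandas as pd

# Three rows copied from the outputs above; a stand-in for the full df.
sample = pd.DataFrame(
    {
        "Gold": [7, 38, 46],
        "Silver": [16, 27, 29],
        "Bronze": [12, 22, 29],
    },
    index=pd.Index(["AUS", "CHN", "USA"], name="ISO"),
)

# Weights from the task statement: Gold=3, Silver=2, Bronze=1.
weights = pd.Series({"Gold": 3, "Silver": 2, "Bronze": 1})

# dot() multiplies each medal column by its weight and sums per row.
sample["WeightedTotal"] = sample[["Gold", "Silver", "Bronze"]].dot(weights)
print(sample)
```

Keeping the weights in one place makes it easy to try a different scheme without rewriting the expression.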

## Task 4

Create a new column "TotalPerPop" which is calculated as the total number of medals won by a country per million population.

Display the top 20 countries according to this score.

In [79]:
df["TotalPerPop"] = df["Total"]/(df["Population"]/10**6)
df.sort_values("TotalPerPop",ascending=False).head(20)

| ISO | Country | Gold | Silver | Bronze | Population | Total | WeightedTotal | TotalPerPop |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GRD | Grenada | 1 | 0 | 0 | 104890 | 1 | 3 | 9.533797 |
| JAM | Jamaica | 4 | 4 | 4 | 2709300 | 12 | 24 | 4.429188 |
| TTO | Trinidad and Tobago | 1 | 0 | 3 | 1346350 | 4 | 6 | 2.970996 |
| NZL | New Zealand | 5 | 3 | 5 | 4405200 | 13 | 26 | 2.951058 |
| BHS | Bahamas | 1 | 0 | 0 | 347176 | 1 | 3 | 2.880383 |
| SVN | Slovenia | 1 | 1 | 2 | 2052000 | 4 | 7 | 1.949318 |
| MNG | Mongolia | 0 | 2 | 3 | 2800114 | 5 | 7 | 1.785642 |
| HUN | Hungary | 8 | 4 | 5 | 9971000 | 17 | 37 | 1.704944 |
| DNK | Denmark | 2 | 4 | 3 | 5574000 | 9 | 17 | 1.614639 |
| MNE | Montenegro | 0 | 1 | 0 | 632261 | 1 | 2 | 1.581625 |

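To see where the per-million figures come from, take Grenada: 1 medal divided by 0.10489 million people gives roughly 9.53 medals per million. The sketch below repeats the calculation on a hypothetical `sample` frame with two rows copied from the output above.

```python
import pandas as pd

# Two rows copied from the output above; a stand-in for the full df.
sample = pd.DataFrame(
    {
        "Total": [1, 12],
        "Population": [104890, 2709300],
    },
    index=pd.Index(["GRD", "JAM"], name="ISO"),
)

# Convert population to millions, then divide medals by it.
population_millions = sample["Population"] / 1_000_000
sample["TotalPerPop"] = sample["Total"] / population_millions
print(sample)
```

Small countries dominate this ranking because a single medal is divided by a very small population.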