# Introduction to probability

We covered a bit of probability in the last mission, but here we'll go into more depth and build a strong foundation. First, let's introduce the dataset, which contains information on the flags of countries around the world. Each row is a country. Here are the relevant columns:

* `name` -- name of the country
* `landmass` -- which continent the country is in (1=N.America, 2=S.America, 3=Europe, 4=Africa, 5=Asia, 6=Oceania)
* `area` -- country area, in thousands of square kilometers
* `population` -- country population, in millions (rounded to the nearest million)
* `bars` -- number of vertical bars in the flag
* `stripes` -- number of horizontal stripes in the flag
* `colors` -- number of different colors in the flag
* `red, green, blue, gold, white, black, orange` -- 0 if color absent, 1 if color present in the flag

This data was collected from the Collins Gem Guide to Flags. The guide was written in 1986, so some flag information may be out of date!
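The integer codes in `landmass` are easier to read once they are mapped to continent names. Here is a minimal sketch, assuming the codes run 1 through 6 in the order listed in the column description (so Asia is 5). The `sample` frame holds a few name and landmass pairs taken from the dataset, and `LANDMASS_NAMES` and the `continent` column are illustrative names, not part of the dataset:

```python
import pandas as pd

# Assumed code-to-continent mapping, following the column description
# (codes 1 through 6 in the listed order, so Asia is 5).
LANDMASS_NAMES = {
    1: "N.America",
    2: "S.America",
    3: "Europe",
    4: "Africa",
    5: "Asia",
    6: "Oceania",
}

# A few (name, landmass) pairs taken from the dataset.
sample = pd.DataFrame({
    "name": ["Afghanistan", "Albania", "Algeria", "American-Samoa", "Andorra"],
    "landmass": [5, 3, 4, 6, 3],
})

# Series.map looks up each code in the dict; codes missing from the dict become NaN.
sample["continent"] = sample["landmass"].map(LANDMASS_NAMES)
print(sample)
```

The same `map` call should work on the full `flags` table once it is loaded.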

In [1]:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt

In [2]:
flags = pd.read_csv('data/flags.csv')
flags.head()

Unnamed: 0,name,landmass,zone,area,population,language,religion,bars,stripes,colors,...,saltires,quarters,sunstars,crescent,triangle,icon,animate,text,topleft,botright
0,Afghanistan,5,1,648,16,10,2,0,3,5,...,0,0,1,0,0,1,0,0,black,green
1,Albania,3,1,29,3,6,6,0,0,3,...,0,0,1,0,0,0,1,0,red,red
2,Algeria,4,1,2388,20,8,2,2,0,3,...,0,0,1,1,0,0,0,0,green,white
3,American-Samoa,6,3,0,0,1,1,0,0,5,...,0,0,0,0,1,1,1,0,blue,red
4,Andorra,3,1,0,0,6,0,3,0,3,...,0,0,0,0,0,0,0,0,blue,red


In [6]:
flags.sort_values(by='bars', ascending=False)

Unnamed: 0,name,landmass,zone,area,population,language,religion,bars,stripes,colors,...,saltires,quarters,sunstars,crescent,triangle,icon,animate,text,topleft,botright
161,St-Vincent,1,4,0,0,1,1,5,0,4,...,0,0,0,0,0,1,1,1,blue,green
143,Rwanda,4,2,26,5,10,5,3,0,4,...,0,0,0,0,0,0,0,1,red,green
88,Ivory-Coast,4,4,323,7,3,5,3,0,3,...,0,0,0,0,0,0,0,0,red,green
85,Ireland,3,4,70,3,1,0,3,0,3,...,0,0,0,0,0,0,0,0,green,orange
30,Cameroon,4,1,474,8,3,1,3,0,3,...,0,0,1,0,0,0,0,0,green,gold
142,Romania,3,1,237,22,6,6,3,0,7,...,0,0,2,0,0,1,1,1,blue,red
35,Chad,4,1,1284,4,3,5,3,0,3,...,0,0,0,0,0,0,0,0,blue,red
126,Nigeria,4,1,925,56,10,2,3,0,2,...,0,0,0,0,0,0,0,0,green,green
16,Belgium,3,1,31,10,6,0,3,0,3,...,0,0,0,0,0,0,0,0,black,red
15,Barbados,1,4,0,0,1,1,3,0,3,...,0,0,0,0,0,1,0,0,blue,blue


In [7]:
# Sort by number of bars, largest first, and read the name from the top row.
most_bars_country = flags.sort_values(by='bars', ascending=False).iloc[0]['name']
most_bars_country

'St-Vincent'

In [8]:
# Same approach for population: sort descending, then take the first row's name.
highest_population_country = flags.sort_values(by='population', ascending=False).iloc[0]['name']
highest_population_country

'China'
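Sorting the whole table just to read its top row works, but the same lookup can also be written with `idxmax`, which returns the index label of the first maximum. Here is a minimal sketch on a hand-built frame that reuses a few rows from the output above. `sample` and `name_of_max` are illustrative names, not part of the original notebook:

```python
import pandas as pd

# A few rows copied from the sorted output above (name, population, bars).
sample = pd.DataFrame({
    "name": ["St-Vincent", "Rwanda", "Ireland", "Nigeria"],
    "population": [0, 5, 3, 56],
    "bars": [5, 3, 3, 3],
})

def name_of_max(df, column):
    """Return the name in the row where `column` is largest (first row on ties)."""
    return df.loc[df[column].idxmax(), "name"]

print(name_of_max(sample, "bars"))
print(name_of_max(sample, "population"))
```

Within this small sample the largest population belongs to Nigeria. On the full `flags` table, `name_of_max(flags, 'population')` should agree with the sort-based result as long as the maximum is unique.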