# Widgets

For this exercise, let's take some world population data and make it sparkle with interactivity! Also, notice how this notebook tells a story as it goes.

## Getting Data

I wanted some data about the world's population, and found a CSV online in a wonderful repository called [GitHub Datasets](https://github.com/datasets).

In [1]:
import pandas as pd

real_population = pd.read_csv('https://raw.githubusercontent.com/datasets/population/master/data/population.csv')

## Engineering Data

Next, I needed to remove the entries that aren't actual countries, such as regional and income-group aggregates.

In [2]:
clean_population = real_population[~real_population['Country Code'].str.contains('CSS|ARB|CEB|EAR|EAS|EAP|TEA|EMU|ECS|ECA|TEC|EUU|FCS|HPC|HIC|IBRD|IBD|IBT|IDB|IDX|IDA|LTE|LCN|LAC|TLA|LDC|LMY|LIC|LMC|MEA|MNA|TMN|MIC|NAC|OED|OSS|PSS|PST|PRE|SST|SAS|TSA|SSF|SSA|TSS|UMC|WLD')]
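`str.contains` treats its argument as a regular expression and matches substrings, which works here because the country codes are short. An exact-match filter with `isin` is a possible alternative. The sketch below uses a made-up `toy` frame and a shortened, hypothetical `aggregate_codes` list rather than the real data:

```python
import pandas as pd

# Toy stand-in for the population table (hypothetical values)
toy = pd.DataFrame({
    "Country Code": ["ARB", "AFG", "WLD", "ZAF"],
    "Value": [1, 2, 3, 4],
})

# Shortened, illustrative list of aggregate codes to exclude
aggregate_codes = ["ARB", "WLD"]

# isin compares whole values, so only exact code matches are removed
countries_only = toy[~toy["Country Code"].isin(aggregate_codes)]
print(countries_only["Country Code"].tolist())
```

With a list, each code sits on its own and is easier to review than one long pattern string.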

Since each country has one entry per year, I just wanted the highest value, regardless of year. Yes, I am assuming that population is increasing for most countries.

In [3]:
grouped = clean_population.groupby(['Country Code'])['Value'].max()
grouped

Country Code
ABW       106585
AFE    702976832
AFG     40099462
AFW    478185907
AGO     34503774
         ...    
XKX      2086000
YEM     32981641
ZAF     59392255
ZMB     19473125
ZWE     15993524
Name: Value, Length: 219, dtype: int64
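To see where that assumption matters, here is a small sketch comparing the maximum value with the value from the most recent year. The `toy` frame is made up, and the `Year` column name is an assumption about the dataset's layout:

```python
import pandas as pd

# Toy data: country BBB shrinks between 2000 and 2020 (hypothetical values)
toy = pd.DataFrame({
    "Country Code": ["AAA", "AAA", "BBB", "BBB"],
    "Year": [2000, 2020, 2000, 2020],
    "Value": [100, 150, 90, 80],
})

# Highest value per country, regardless of year (the approach used above)
peak = toy.groupby("Country Code")["Value"].max()

# Value from the latest year per country: sort by year, then take the last row
latest = toy.sort_values("Year").groupby("Country Code")["Value"].last()

print(peak)
print(latest)
```

The two results agree for a growing country and differ for a shrinking one, so the choice depends on which story the chart should tell.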

I then realized that the result above has no continent data, so I pulled in a second data source.

In [4]:
continents = pd.read_csv('https://pkgstore.datahub.io/JohnSnowLabs/country-and-continent-codes-list/country-and-continent-codes-list-csv_csv/data/b7876b7f496677669644f3d1069d3121/country-and-continent-codes-list-csv_csv.csv')
continents

Unnamed: 0,Continent_Name,Continent_Code,Country_Name,Two_Letter_Country_Code,Three_Letter_Country_Code,Country_Number
0,Asia,AS,"Afghanistan, Islamic Republic of",AF,AFG,4.0
1,Europe,EU,"Albania, Republic of",AL,ALB,8.0
2,Antarctica,AN,Antarctica (the territory South of 60 deg S),AQ,ATA,10.0
3,Africa,AF,"Algeria, People's Democratic Republic of",DZ,DZA,12.0
4,Oceania,OC,American Samoa,AS,ASM,16.0
...,...,...,...,...,...,...
257,Africa,AF,"Zambia, Republic of",ZM,ZMB,894.0
258,Oceania,OC,Disputed Territory,XX,,
259,Asia,AS,Iraq-Saudi Arabia Neutral Zone,XE,,
260,Asia,AS,United Nations Neutral Zone,XD,,


Next, I merged the two together.

In [5]:
merged = pd.merge(grouped, continents, how="left", left_on="Country Code", right_on="Three_Letter_Country_Code")

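Then I dropped the rows that have no continent, since they would be useless to me. I only check `Continent_Name` here: pandas reads the literal string `NA` as a missing value by default, so a blanket `dropna()` would also throw away every row whose continent code is `NA` (North America).

In [6]:
merged.dropna(subset=['Continent_Name'], inplace=True)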
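The effect of a left merge followed by `dropna` can be checked on a tiny example. This is only a sketch: the CSV text and both toy frames are invented for illustration and merely mimic the column names used above.

```python
import io

import pandas as pd

# Invented continent lookup; note the continent code "NA" for North America
csv_text = (
    "Continent_Name,Continent_Code,Three_Letter_Country_Code\n"
    "North America,NA,USA\n"
    "Asia,AS,AFG\n"
)
continents_toy = pd.read_csv(io.StringIO(csv_text))

# Invented population table; WLD has no match in the lookup
population_toy = pd.DataFrame({
    "Country Code": ["USA", "AFG", "WLD"],
    "Value": [300, 40, 8000],
})

merged_toy = pd.merge(
    population_toy, continents_toy, how="left",
    left_on="Country Code", right_on="Three_Letter_Country_Code",
)

# Dropping on any missing column also removes USA, whose code "NA" was read as NaN
blanket = merged_toy.dropna()

# Dropping only on the continent name removes just the unmatched WLD row
by_name = merged_toy.dropna(subset=["Continent_Name"])

print(blanket["Country Code"].tolist())
print(by_name["Country Code"].tolist())
```

Another option, if the raw codes are needed later, is to pass `keep_default_na=False` to `pd.read_csv` so that `NA` stays a plain string.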

## Creating a function to dynamically display

The following function takes two arguments, `selection` and `topk`. `selection` is the continent, and `topk` is how many of the most populous countries in that continent to show.

In [9]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

def show_top(selection, topk):
    plt.figure(figsize=(23,5))
    sns.set_style("whitegrid")
    result = merged[merged['Continent_Name'] == selection].sort_values('Value', ascending=False).head(topk)
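    # Pass x and y as keywords; recent seaborn releases reject positional data arguments
    barplot = sns.barplot(x=result['Country_Name'], y=result['Value'])
    # Rotate the country names so long labels do not overlap
    barplot.set_xticklabels(barplot.get_xticklabels(), rotation=45)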
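The selection step inside `show_top` can be tried on its own, without any plotting. Below is a minimal sketch with an invented `toy` frame and a hypothetical helper named `top_countries`, which mirrors the filter, sort, and `head` chain from the function:

```python
import pandas as pd

# Invented data that mimics the columns of the merged table
toy = pd.DataFrame({
    "Continent_Name": ["Asia", "Asia", "Asia", "Europe"],
    "Country_Name": ["A", "B", "C", "D"],
    "Value": [5, 50, 20, 7],
})


def top_countries(frame, selection, topk):
    """Return the topk most populous rows of one continent (hypothetical helper)."""
    in_continent = frame[frame["Continent_Name"] == selection]
    return in_continent.sort_values("Value", ascending=False).head(topk)


top_two = top_countries(toy, "Asia", 2)
print(top_two["Country_Name"].tolist())
```

Keeping the data selection separate from the drawing code like this makes it easier to check the numbers before wiring up any widgets.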


Here we just get all the unique continent names.

In [None]:
items = merged['Continent_Name'].unique()
items

## Exercise 1 

1. Create a dropdown widget that lists the continents.
2. Create a slider widget that sets the number of top countries to show, ranging from 1 through 25.
3. Call `interact` with the `show_top` function above.