# Countries of the World

This notebook demonstrates how Python and its packages can be used for exploratory data analysis of country statistics, and conveys the problems encountered along with approaches to resolve them.

**Motivation**: I was browsing Kaggle data sets to explore for fun when I happened upon the nations of the world data set. I thought it would be interesting to check out the statistics of nations, especially when the political mantra of the time is to "Make America Great Again". Just how does the United States, or any other country for that matter, compare with the rest of the world? The database I found on Kaggle had some interesting numbers, but lacked annotation of units, may have had incorrect information, and stored its data as strings. This motivated me to try to make my own.

How can we make our own database of world facts? The <a href = https://www.cia.gov/library/publications/the-world-factbook/>CIA World Factbook </a>is an ideal source of this information and perhaps the original source of the Kaggle data. Unfortunately, the site offers the data in only a limited machine-readable format, so we will have to do some web scraping.

# Objectives

1. Create our own data set through parsing the CIA World Factbook.
2. Store the data into a database.
3. Visualize the data.

1. [Creating Our Own Data Set](#own)                                              
    1.1 [Features of Interest](#features2)                                                                                                           
    1.2 [Web Pages of Interest](#webpages)                                                                                        
    1.3 [Web Page Structure](#structure)     
    1.4 [Data Cleaning](#data_cleaning)    
    1.5 [Data Cleaning Functions](#data_cleaning_functions)    
    1.6 [Data Scraping Functions](#data_scraping)    
    1.7 [Multithreading](#multithreading)
    
2. [Viewing the Data](#viewing)      
    2.1 [Identifying Problem Fields](#problems)      
    2.2 [Cleaning the Data Set](#cleaning)
3. [Creating a Database](#database)  
    3.1 [Database Functions](#database_functions)     
    3.2 [Webscraping into a Database](#testing)     
    3.3 [Extracting Data from the Database](#extracting)
4. [Visualization](#visualization)
5. [Concluding Remarks and Future Directions](#conclusion)

In [None]:
import matplotlib.pyplot as plt
#import seaborn as sns
import numpy as np
import pandas as pd
#sns.set()
import re
from time import sleep
import requests
from bs4 import BeautifulSoup
from collections import defaultdict
import json
import psycopg2 as ps
from threading import Thread
from queue import Queue


<a id='own'></a>

# 1. Obtaining Our Own Dataset: CIA World Factbook

Specifically it will be useful to find out:

1. What facts are available for each country?
2. What are the web pages of interest? 
3. What is the general structure for the webpages of interest?
    - is there a regular structure to each page?
    - what tags or phrases contain the key quantities of interest?
4. What regular expression or tag can we use to obtain each piece of information?

<a id='features2'></a>
## 1.1 Features of Interest

The CIA World Factbook organizes the various features by fields and subfields for each <a href='https://www.cia.gov/library/publications/the-world-factbook/docs/profileguide.html'>country profile</a>. We will create lists of fields with quantitative data organized by category. Below is a list created manually by inspection.

**All Fields of Interest**

In [None]:
# Create one entire aggregate list of fields
Geography = ['Area', 'Land boundaries', 'Coastline', 'Elevation', 'Land use', 'Irrigated land']
Society = ['Population', 'Age structure', 'Dependency ratios', 'Median age', 'Population growth rate', 'Birth rate',
           'Death rate', 'Net migration rate', 'Urbanization', 'Sex ratio', "Mother's mean age at first birth", 
           'Maternal mortality ratio', 'Infant mortality rate', 'Life expectancy at birth', 'Total fertility rate', 
           'Contraceptive prevalence rate', 'Health expenditures', 'Physicians density', 'Hospital bed density', 
           'Drinking water source', 'Sanitation facility access', 'HIV/AIDS - adult prevalence rate', 
           'HIV/AIDS - people living with HIV/AIDS', 'HIV/AIDS - deaths', 'Obesity - adult prevalence rate', 
           'Children under the age of 5 years underweight', 'Education expenditures', 'Literacy', 
           'School life expectancy (primary to tertiary education)']

Economy = ['GDP (purchasing power parity)', 'GDP (official exchange rate)', 'GDP - real growth rate', 'Gross national saving',
           'GDP - composition, by end use', 'GDP - composition, by sector of origin', 'Industrial production growth rate',
           'Labor force', 'Labor force - by occupation', 'Unemployment rate', 'Population below poverty line',
           'Household income or consumption by percentage share','Distribution of family income - Gini index', 'Budget', 
           'Taxes and other revenues', 'Budget surplus (+) or deficit (-)', 'Public debt', 'Inflation rate (consumer prices)', 
           'Commercial bank prime lending rate','Stock of narrow money', 'Stock of broad money', 'Stock of domestic credit', 
           'Market value of publicly traded shares', 'Current account balance', 'Exports', 'Imports', 
           'Reserves of foreign exchange and gold', 'Debt - external']

Energy = ['Electricity access', 'Electricity - production', 'Electricity - exports', 'Electricity - imports', 
          'Electricity - installed generating capacity', 'Electricity - from fossil fuels', 
          'Electricity - from hydroelectric plants','Electricity - from nuclear fuels', 'Electricity - from other renewable sources', 'Crude oil - production',
          'Crude oil - exports', 'Crude oil - imports', 'Crude oil - proved reserves', 'Refined petroleum products - production',
          'Refined petroleum products - consumption', 'Refined petroleum products - exports', 
          'Refined petroleum products - imports', 'Natural gas - production', 'Natural gas - consumption', 
          'Natural gas - exports', 'Natural gas - imports', 'Carbon dioxide emissions from consumption of energy']

Communications = ['Telephones - fixed lines', 'Telephones - mobile cellular', 'Internet users', 'Broadband - fixed subscriptions']

Transportation = ['National air transport system', 'Airports', 'Airports - with paved runways', 
                  'Airports - with unpaved runways', 'Heliports', 'Pipelines', 'Roadways', 'Waterways']

Military = ['Military expenditures']

Fields = []
for i in [Geography, Society, Economy, Energy, Communications, Transportation, Military]:
    Fields.extend(i)


<a id='webpages'></a>

## 1.2 Web Pages of Interest

For web scraping, it's necessary to create a list of websites to scrape. We can build this list in code by noticing that any country profile page on the CIA website has links to the other countries. Each link has the form: 

`https://www.cia.gov/library/publications/the-world-factbook/geos/xx.html` where `xx` is a placeholder for two letters representing each country.

Below is code to create and store the websites in a dictionary. We will put this into a function later.

In [None]:
import requests
from bs4 import BeautifulSoup
import re
url = 'https://www.cia.gov/library/publications/the-world-factbook/geos/af.html'
r = requests.get(url)
html_contents = r.text
html_soup = BeautifulSoup(html_contents, 'html.parser')

country_letters = {}
for found in html_soup.find_all('option'):
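    # Capture any two-letter code except the "xx" placeholder; a pattern such as
    # [^x][^x] would also drop codes that merely contain an x (e.g. "mx")
    match = re.search(r'(?!xx)([a-z]{2})\.html', str(found))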
    if match:
        name = str(found.string)
        country_letters[name] = match.group(1)

sites = ['https://www.cia.gov/library/publications/the-world-factbook/geos/%s.html' % 
         country_letters[i] for i in country_letters]


In [None]:
country_letters
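
The extraction step can also be tried offline. The sketch below runs a pattern that excludes only the `xx` placeholder against a few hand-written `<option>` strings. The markup of these sample strings is an assumption made for illustration and may not match the live page exactly. It also shows why a character-class pattern such as `[^x][^x]` is risky: it rejects any code that contains an `x`, such as `mx`.

```python
import re

# Hypothetical <option> markup, assumed for illustration only
sample_options = [
    '<option value="geos/af.html">Afghanistan</option>',
    '<option value="geos/mx.html">Mexico</option>',
    '<option value="geos/xx.html">World</option>',
]

# Two lowercase letters before ".html", but not the "xx" placeholder
pattern = re.compile(r'(?!xx)([a-z]{2})\.html')
# A character-class alternative that rejects every code containing an x
strict_pattern = re.compile(r'([^x][^x])\.(html)')

url_template = 'https://www.cia.gov/library/publications/the-world-factbook/geos/%s.html'

codes = {}
for option in sample_options:
    match = pattern.search(option)
    if match:
        # Strip the tags to recover the visible country name
        name = re.sub(r'<[^>]+>', '', option).strip()
        codes[name] = match.group(1)

urls = {name: url_template % code for name, code in codes.items()}
mexico_with_strict = strict_pattern.search(sample_options[1])

print(codes)
print(mexico_with_strict)
```

With these sample strings the placeholder entry is skipped while both country codes are kept, whereas the character-class pattern finds no match for the Mexico entry.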

<a id='structure'></a>

## 1.3 General Webpage Structure

A previous version of the CIA World Factbook organized its data heterogeneously. Now, it's much more uniform. On a coarse level, any HTML page is structured as tags within tags. These can be thought of as a tree, in which the nodes are tags and the edges show which tags are nested within other tags.

Looking into the source HTML, the information for a country is organized by field, with each field enclosed in a pair of `<div>` tags whose `id` attribute has the form `id="field-xxx"`. Within a field, each entry sits in a child `<div>` with `class='category_data ...'`, which holds the subfield name followed by the actual data.

**Patterns of data organization**: A closer look at the HTML shows a range of ways in which fields and subfields are organized, each within `<div id="field-xxx"> ... </div>`:

   1. Multiple `<div>` tags each with a single subfield name and number:
   
       ```       
       <div class='category_data subfield numeric'>
          <span class="subfield-name">total:</span>
          <span class="subfield-number">652,230 sq km</span>
          <span class="subfield-note"></span>
          
        </div>
        <div class='category_data subfield numeric'>
          <span class="subfield-name">land:</span>
          <span class="subfield-number">652,230 sq km</span>
          <span class="subfield-note"></span>
          
        </div>
        <div class='category_data subfield numeric'>
          <span class="subfield-name">water:</span>
          <span class="subfield-number">0 sq km</span>
          <span class="subfield-note"></span>
          
        </div>
       ```
        
   2. Text label:
       ```
       <div class='category_data subfield text'>
          <span class="subfield-name">definition:</span>
        age 15 and over can read and write
        <span class="subfield-date" aria-label="Date of information: 2015 est.">(2015 est.)</span>
      </div>
      ```
      
   3. Historic label with no subfield name and repeats of past data:
   
      ```
      <div class='category_data subfield historic'>
          
          <span class="subfield-number">3.2% of GDP</span>
          <span class="subfield-note"></span>
          ... </div>
      ```
       
   4. Numeric label without subfield name:
   
       ```
       <div id="field-population">
        <div class='category_data subfield numeric'>
          
          <span class="subfield-number">34,940,837</span>
          <span class="subfield-note"></span>
          <span class="subfield-date" aria-label="Date of information: July 2018">(July 2018 est.)</span>
        </div>
       ```
     
   5. Subfield labeled by group label:       
   
   ```
       <div class='category_data subfield grouped_subfield'>
         <span class='subfield-group'>improved:</span>
         <span class="subfield-name">urban:</span>
         <span class="subfield-number">78.2% of population</span>
         ...</div>
         ```
   6. Numeric label but without subfield number:       
   
   ```
       <div class='category_data subfield numeric'>
          <span class="subfield-name">water:</span>
          <span class="subfield-note">NEGL</span>
          ...</div>
         ```      
   
Most of these scenarios share a common feature: the numerical data is stored in 
`<span class="subfield-number"></span>` tags. This allows us to search for these tags and extract the data. We can then work around the tags to find the relevant context, such as the field, subfield, and subfield group. This would generate data one country profile at a time.

Alternatively, we can scan the page for a field and then collect the relevant data and annotations. We will use this latter approach. 


1. Find the relevant field tag by searching for the `<a>` with the relevant string.

2. Find the next `<div id="field-xxx">` tag and go through each child `<div>` tag, iterating through the relevant child  `<span>` tags. Check for the following conditions:
 - If the `<div>` has `text` among its classes, skip it.
 - If it has `historic` among its classes, acquire data from the nearest `<span class="subfield-number">` tag and skip to the next `<div>` tag.
 - If it has `grouped_subfield` among its classes, use a field-specific solution, catching this case before the general handling.
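
The class checks in the steps above can be sketched as a small dispatch function. The function name and the returned labels are hypothetical and exist only for illustration; the class names are the ones seen in the HTML excerpts.

```python
def classify_subfield(classes):
    """Map the class list of a child <div> to a handling rule (illustrative labels)."""
    classes = set(classes)
    if 'text' in classes:
        # Text-only entries carry no numeric data
        return 'skip'
    if 'historic' in classes:
        # Only the most recent number is wanted; later repeats are ignored
        return 'first_number_only'
    if 'grouped_subfield' in classes:
        # Needs code written for the specific field
        return 'field_specific'
    # Default: a subfield name paired with a number
    return 'name_and_number'


print(classify_subfield(['category_data', 'subfield', 'numeric']))
print(classify_subfield(['category_data', 'subfield', 'historic']))
```

BeautifulSoup returns the `class` attribute of a tag as a list of strings, so a list such as `div['class']` can be passed in directly.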
 
 


### Filtering for Quantitative Fields

Below is some exploratory code to find the relevant tags:

In [None]:
url = 'https://www.cia.gov/library/publications/the-world-factbook/geos/us.html'
r = requests.get(url)
html_contents = r.text
html_soup = BeautifulSoup(html_contents, 'html.parser')


found = html_soup.find('span', {'class': 'region_name1 countryName'})
print(found['class'])

Here we print out all the tags as well as the total number:

In [None]:
fields = []
# find the <div id='field....'> tags
for found in html_soup.find_all('span', class_=re.compile("subfield-number")):
    # check the immediate div tag below
    #print(found)
    parent = found.parent.parent
    fields.append(parent['id'])
print(set(fields))
len(set(fields))

### Accessing Numerical Data

Here is some code to print out annotated numerical data for scenarios 1-4:

In [None]:
fields= []
subfields = []
data = []

example_fields = ['Population', 'Area', 'Drinking water source', 'Land use', 'Elevation', 'Health expenditures',
          'School life expectancy (primary to tertiary education)']


for field in example_fields:
    div_field = html_soup.find('a', href=re.compile(r'.*#\d+'), string=field).find_next('div')
    print(field)
    # By default, assumes scenario 1
    for div_category_data in div_field.find_all('div', class_=re.compile('category_data')):
        name = div_category_data.find('span', class_="subfield-name")
        num = div_category_data.find('span', class_="subfield-number")
        
        # Check for scenario 2
        if 'text' in div_category_data['class']:
            continue
        # Check for scenarios 3 and 4    
        if name is None or {'note','historic'} <= set(div_category_data['class']):
            if num:
                print('  ', num.string)
            break
        # Scenario 6: a subfield name without a number, so there is nothing to print
        if num is None:
            continue
        print(name.string)
        print('  ', num.string)

Here is some code to print data with the structure of scenario 5:

In [None]:
div_field = html_soup.find('a', href=re.compile(r'.*#\d+'), string='Drinking water source').find_next('div')
for i in div_field.find_all('span', class_='subfield-number'):
    group = i.find_previous('span', class_='subfield-group').string
    name = i.find_previous('span', class_='subfield-name').string
    print(group, name, i.string)

Looking at the data present on the web page for [Afghanistan](https://www.cia.gov/library/publications/the-world-factbook/geos/af.html#field-anchor-geography-land-use), we can see that most data was printed out for the selected fields.  For `Drinking water source` the group names were skipped. For `Land use`, the   `permanent crops` and `permanent pasture` percentages were skipped. Finally, in `Elevation`, the lowest extreme height source along with data for the highest elevation was skipped. Specific code will be required to handle these fields. 

### Code for Special Cases

Here is code to obtain and organize data for the special cases of `Land use`, `Elevation` and `Age Structure`.

**Land use**:

In [None]:
field = 'Land use'
Fields, Subfields, Data = [], [], []
found = html_soup.find('a', href=re.compile(r'.*#\d+'), string=field).find_next('div')
next_num = found.find_next('span', class_="subfield-number")
while True:
    name = next_num.find_previous('span', class_="subfield-name").string
    num = next_num.string
    Fields.append('Land use')
    Subfields.append(name)
    Data.append(num)
    print(name)
    print(' ', num)
    next_num = next_num.find_next('span', class_="subfield-number")
    if name== 'other:':
        break


**Elevation**:

In [None]:
field = 'Elevation'
Fields = ['Elevation', 'Elevation', 'Elevation']
Subfields = ['mean elevation:', 'lowest point:', 'highest point:']
Data = []
div_field = html_soup.find('a', href=re.compile(r'.*#\d+'), string=field).find_next('div')
mean = div_field.find_next('div')
lowest = mean.find_next('div')
hi = lowest.find_next('div')

# Collect the three <div> tags; the number is extracted from each one below
Data = [mean, lowest, hi]
for i,j in zip(Subfields,Data):
    print(i)
    print(' ', re.search(r'-*[\d,]+',str(j)).group())

**Age Structure:**

In [None]:
field = 'Age structure'
div_field = html_soup.find('a', href=re.compile(r'.*#\d+'), string=field).find_next('div')
div_categories = div_field.find_all('div', class_='numeric')
for i in div_categories:
    name = i.find('span', class_='subfield-name').string
    num = i.find('span', class_='subfield-number').string
    note = i.find('span', class_='subfield-note').string
    match = re.search(r'(\w+)\s([\d,]+)[\s/]*(\w+)\s([\d,]+)', note)
    print('total',name, num)
    print(match.group(1), name, match.group(2))
    print(match.group(3), name, match.group(4))
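
To make the four capture groups of the age structure pattern concrete, here is the same regular expression applied to a single note string. The sample string is taken from the data formats seen on the profile pages; treating it as the exact content of a `subfield-note` tag is an assumption.

```python
import re

# Assumed content of one subfield-note span
note = '(male 2,658,563/female 2,711,017)'

# label, count, optional separator, label, count
age_pattern = re.compile(r'(\w+)\s([\d,]+)[\s/]*(\w+)\s([\d,]+)')
match = age_pattern.search(note)

first_label, first_count = match.group(1), match.group(2)
second_label, second_count = match.group(3), match.group(4)

# Counts still contain commas at this point and need cleaning before conversion
counts = {
    first_label: int(first_count.replace(',', '')),
    second_label: int(second_count.replace(',', '')),
}
print(counts)
```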

<a id='cleaning'></a>

## 1.4 Cleaning the Data

The data stored in the dataframe is in the form of strings. The strings contain numbers, units, support symbols such as commas and percents, as well as annotations in parentheses. We would like to write code which would:

1. Remove annotations
2. Extract the units for the numbers and express them in the MultiIndex, including $, %
3. Convert string numbers to floats

### Formats of Data

Browsing the profile page of the United States, we can classify types of data formats:

1. xxx,xxx,xxx units 
2. xxx,xxx,xxx units (year)
3. xx% (male xx,xxx,xxx/ female xx,xxx,xxx)
4. xx% (male xx,xxx,xxx/ female xx,xxx,xxx) (2017)
5. xx% (male xx,xxx,xxx/ female xx,xxx,xxx) (2017 est.)
6. xx% of population  
7. improved / unimproved % of population subcategories
8. $19.39 trillion
9. 8.714 trillion cu m (1 January 2016 est.)

**Cases 1, 6**: Extract numbers and anything remaining are units

In [None]:
text = "20% of population"
match = re.search(r'([-\d,\.]+)', text)
match.group(1)
num = float(match.group(1).replace(',', ''))
units = text.replace(match.group(1), '').lstrip()
units

**Cases 2, 4, 5**: Getting rid of annotations

In [None]:
text = '15.63% (male 22,678,235/female 28,376,817) (2017 est.)'
#text = '$4.084 trillion (31 December 2017 est.)'
match1 = re.search(r'\(\d.*', text)
match1.group()
#text.replace(match1.group(), '').rstrip()

**Case 3**: Age structure extraction of percent and population numbers in addition to % unit and male female categories

In [None]:
text = '38.9% (male 2,658,563/female 2,711,017)'
match1 = re.findall(r'[a-z%$]+', text)
match2 = re.findall(r'[\d\.,]+', text)
print(match1,[float(i.replace(',','')) for i in match2])

**Case 8**: Converting "illions" to numbers and removing it from string.

In [None]:
text = "$-19.29 million"
dollar = re.search(r'\$', text)
dollar.group()
unit = re.search(r'([a-z]*)(illion)', text)
if unit:
    if unit.group(1)=='m':
        factor = 1e6
    elif unit.group(1)=='b':
        factor = 1e9
    elif unit.group(1)=='tr':
        factor = 1e12
text2 = text.replace(unit.group(),'').rstrip()
num = float(re.search(r'[-\d,\.]+', text).group().replace(',',''))*factor
text2
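
The individual steps above can be chained into one pass. The sketch below is a minimal illustration of that idea, not the set of functions used later in the notebook: `parse_value` and `MAGNITUDES` are hypothetical names, and the age structure format (case 3) is deliberately left out because it needs its own handling.

```python
import re

# Multipliers for magnitude words (hypothetical helper table)
MAGNITUDES = {'m': 1e6, 'b': 1e9, 'tr': 1e12}

def parse_value(text):
    """Return (number, unit) for a Factbook-style data string."""
    # Drop a trailing annotation that starts with a digit, e.g. "(2017 est.)"
    text = re.sub(r'\s*\(\d.*$', '', text)

    # Convert "million", "billion", "trillion" into a numeric factor
    factor = 1.0
    scale = re.search(r'\b(m|b|tr)illion\b', text)
    if scale:
        factor = MAGNITUDES[scale.group(1)]
        text = text.replace(scale.group(), '')

    number = re.search(r'-?[\d,]*\.?\d+', text)
    if number is None:
        # No digits at all, e.g. "NEGL"
        return None, text.strip()

    value = float(number.group().replace(',', '')) * factor
    # Whatever is left once the number is removed is treated as the unit
    unit = ' '.join(text.replace(number.group(), '', 1).split())
    return value, unit


for sample in ['652,230 sq km', '20% of population', '$-19.29 million',
               '8.714 trillion cu m (1 January 2016 est.)', 'NEGL']:
    print(sample, '->', parse_value(sample))
```

Returning the number and the unit together keeps the two in step, which is convenient when both end up as separate levels of a column index.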

<a id='data_cleaning_functions'></a>

## 1.5 Prerequisite Data Cleaning Functions

Below is the previous code organized into functions we will use:

In [None]:
def replace_string(sub_out, sub_in, text):
    """
    Replaces any substring from inputted text 
    
    Parameters:
    ---------------
    sub_out: string
        Regular expression to search for in text
    sub_in : string
        String to be substituted in
    text: string
        String to be altered
        
    Returns:
    ---------------
    text: string
        Text with the matched substring replaced by sub_in
    """
    match1 = re.search(sub_out, text)
    if match1:
        return text.replace(match1.group(), sub_in)
    else:
        return text
    
def remove_annotation(text):
    """
    Checks for and removes annotations such as (2017 est)
    
    Parameters:
    ----------------
    text: string
        String of data to format
         
    Returns:    
    ----------------
    formatted: string
        Text without annotations
    
    """  
    match1 = re.search(r'\(\d.*', text)
    if match1:
        return text.replace(match1.group(), '').rstrip()
    else:
        return text
    
def get_unit(text):
    """
    Detects and returns unit of string, excludes numbers
    
    Parameters:
    ----------------
    text: string
        Text to be parsed
        
    Returns:    
    ----------------
     unit: string
        String representing the unit
    
    """
        
    # Strip magnitude words such as "million" so they are not reported as units
    magnitude = re.search(r'([a-z]*)(illion)', text)
    if magnitude:
        text = text.replace(magnitude.group(), '').rstrip()
        
    if re.search(r'\$', text):
        unit = '$'        
    else:
        # Whatever remains after removing the leading number is the unit
        match = re.search(r'[\d.,]*', text)
        if match:
            unit = text.replace(match.group(), '').lstrip()        
        else:
            unit = ''
    return unit

def remove_unit(text):
    """
    Checks for, returns, and removes unit from string
    
    Parameters:
    ----------------
    text: string
        Text to be modified
        
    Returns:    
    ----------------
    unit: string
        Bit of text that was removed
    text: 
        Text stripped of unit and any unwanted whitespace
    
    """
    dollar = re.search(r'\$', text)
    if dollar:        
        text = text.replace('$', '')
    match1 = re.search(r'[\d.,]*', text)
    if match1:
        text = text.replace(match1.group(), '').rstrip()
    return text

def convert_to_num(text):
    """
    Converts unit free text into numbers
    
    Parameters:
    ----------------
    text: string
        Data string, possibly containing $, commas, or a magnitude word such as million
        
    Returns:    
    ----------------
    num: float
         Data converted to numerical form
    
    """
    factor = 1
    unit = re.search(r'([a-z]*)(illion)', text)
    if unit:
        if unit.group(1)=='m':
            factor = 1e6
        elif unit.group(1)=='b':
            factor = 1e9
        elif unit.group(1)=='tr':
            factor = 1e12
        text = text.replace(unit.group(),'').rstrip()
    found = re.search(r'-?[\d,]+[\.]?\d*', text.replace('$', ''))
    
    if found:        
        try:
            # Reuse the match instead of searching the text a second time
            num = float(found.group().replace(',', '')) * factor
        except ValueError:
            num = text
            print('Had trouble converting %s' % text)
    else:
        num = text
    return num

<a id='data_scraping'></a>

## 1.6 Data Scraping Functions

Below, the code we explored earlier is organized into functions that retrieve the site contents, extract data by different criteria, handle special cases, and ultimately place the results in tables with appropriate fields and subfields. Multithreading is used to speed up performance.
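
As a rough illustration of the multithreading idea, the sketch below feeds URLs to a pool of worker threads through a `Queue`. It is a minimal pattern under stated assumptions rather than the notebook's implementation: `fake_fetch`, `worker`, and `crawl` are hypothetical names, and `fake_fetch` stands in for a real network request so the example runs offline.

```python
from queue import Queue
from threading import Thread

def fake_fetch(url):
    """Stand-in for a network request: return the two-letter code in the URL."""
    return url.rsplit('/', 1)[-1].replace('.html', '')

def worker(tasks, results):
    """Process URLs from the task queue until a None sentinel arrives."""
    while True:
        url = tasks.get()
        if url is None:
            tasks.task_done()
            break
        results.put((url, fake_fetch(url)))
        tasks.task_done()

def crawl(urls, n_threads=4):
    """Fetch all urls with n_threads workers and return {url: result}."""
    tasks, results = Queue(), Queue()
    threads = [Thread(target=worker, args=(tasks, results)) for _ in range(n_threads)]
    for thread in threads:
        thread.start()
    for url in urls:
        tasks.put(url)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    tasks.join()
    for thread in threads:
        thread.join()

    collected = {}
    while not results.empty():
        url, value = results.get()
        collected[url] = value
    return collected


base = 'https://www.cia.gov/library/publications/the-world-factbook/geos/%s.html'
pages = crawl([base % code for code in ['af', 'us', 'fr']])
print(sorted(pages.values()))
```

Threads suit this task because most of the time is spent waiting on the network, not computing.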

#### Note on Making Tables

While the previous code printed out data, we will need to store it and eventually display it conveniently. One approach is to use the well-suited [`pandas`](http://pandas.pydata.org/pandas-docs/stable/) library to [create a MultiIndex DataFrame object](https://pandas.pydata.org/pandas-docs/stable/advanced.html) from arrays.

- Create lists representing the Fields, Subfields, Units and Data.
- Each time there is a match for data, append the current field, subfield, unit, and data values to these lists in the following manner:
    - Scenario 1: Field: current match, Subfield: current match, Unit: current unit, Data: current data
    - Scenario 2: Skip
    - Scenarios 3-4: Field: current match, Subfield: blank, Unit: current unit, Data: current data
    - Scenario 5: Determine the lists case by case, depending on the data layout of the field
- Create the dataframe with columns built via `pandas.MultiIndex.from_arrays`, using the lists as arguments
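
A minimal sketch of the last step, using the Afghanistan values shown in the HTML excerpts earlier. The level names `Field`, `Subfield`, and `Unit` and the variable names are assumptions made for illustration.

```python
import pandas as pd

# Parallel lists, one entry per data point
fields = ['Area', 'Area', 'Area', 'Population']
subfields = ['total:', 'land:', 'water:', '']
units = ['sq km', 'sq km', 'sq km', '']
data = [652230.0, 652230.0, 0.0, 34940837.0]

# Three-level column index built from the lists
columns = pd.MultiIndex.from_arrays(
    [fields, subfields, units], names=['Field', 'Subfield', 'Unit'])

# One row per country
profile = pd.DataFrame([data], columns=columns, index=['Afghanistan'])

# A full (field, subfield, unit) key selects a single value
land_area = profile.loc['Afghanistan', ('Area', 'land:', 'sq km')]
print(profile)
print(land_area)
```

Keeping the unit as its own index level means the values themselves can stay purely numeric.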

In [None]:
def get_soup(url, sleep_length=5):
    """
    Return a BeautifulSoup object of the url
    
    Parameters:
    ----------------
    url: string
        Website address of a country profile in the CIA world factbook
    sleep_length: int
        Length of pause in seconds before trying another request
        
    Returns:    
    ----------------
    html_soup: BeautifulSoup object
         Parsed HTML of the webpage
    raw_text: string
        Raw html text of the webpage
    """ 
    while True:
        try:
            r = requests.get(url)
            break
        except requests.exceptions.RequestException as e:
            print(e)
            sleep(sleep_length)
    
    raw_text = r.text
    html_soup = BeautifulSoup(raw_text, 'html.parser')
    return html_soup, raw_text
    
def sites_list(banned=None):
    """Return a dict of CIA World Factbook country profile urls,
        with country : url as key:value pairs"""   
    
    banned = banned or []
    # Template for a country profile page, as described in section 1.2
    url_base = 'https://www.cia.gov/library/publications/the-world-factbook/geos/%s.html'
    
    # Search for country two-letter abbreviations
    url = 'https://www.cia.gov/library/publications/resources/the-world-factbook/'
    html_soup, text = get_soup(url)
    country_sites = {}
    
    for found in html_soup.find_all('option'):
        # Any two-letter code except the "xx" placeholder
        match = re.search(r'(?!xx)([a-z]{2})\.html', str(found))
        if match:
            name = str(found.string).strip()
            if name not in banned:
                country_sites[name] = url_base % match.group(1)
            
    return country_sites

def site_crawl(country_site, sleep_length=0):
    """
    Test retrieval of site information by printing out country name.
    
    Parameters
    -----------
    country_site : string
        Url of the country profile to retrieve
    sleep_length : int
        Number of seconds to pause before retrying a web page request
    """
    

    html_soup, text = get_soup(country_site, sleep_length)
    print(html_soup.find('span', {'class': 'region_name1 countryName '}).string)

    
def land_use(html_soup):
    """
    Creates a hierarchical DataFrame for land use data from the CIA world factbook
    
    Parameters:
    ----------------
    html_soup: BeautifulSoup object
        Object to navigate and find data from
    
    Returns:    
    ----------------
    Fields : list
        List of fields to be used in MultiIndex
    Subfields : list
        List of subfields to be used in MultiIndex
    Units : list
        List of units for the data to be used in MultiIndex
    Data : list
        List of numerical percents to be used in MultiIndex
    """
    field = 'Land use'
    Fields, Subfields, Units, Data = [], [], [], []
    found = html_soup.find('a', href=re.compile(r'.*#\d+'), string=field).find_next('div')
    next_num = found.find_next('span', class_="subfield-number")
    while True:
        name = next_num.find_previous('span', class_="subfield-name").string
        num = next_num.string
        Fields.append('Land use')
        Subfields.append(name)
        Units.append('%')
        Data.append(convert_to_num(num))
        next_num = next_num.find_next('span', class_="subfield-number")        
        if name== 'other:':
            break

    
    #columns = pd.MultiIndex.from_arrays(arrays=[Fields, Subfields1], names=['Field', 'Subfield'])
    #land_df = pd.DataFrame(np.array([Data1]), columns=columns)
        
    return Fields, Subfields, Units, Data


def elevation(html_soup):
    """
    Creates lists for a hierarchical DataFrame of elevation data from the CIA world factbook
        
    Parameters:
    ----------------
    html_soup: BeautifulSoup object
        Object to navigate and find data from
        
    Returns:    
    ----------------
    Fields: list
         Strings representing the Field level in the MultiIndex
    Subfields: list
         Strings representing the Subfields level in the MultiIndex
    Units: list
        Strings representing the Units level in the MultiIndex
    Data: list
         Numerical data    
    """      
     
    field = 'Elevation'
    Fields = ['Elevation', 'Elevation', 'Elevation']
    Subfields = ['mean elevation:', 'lowest point:', 'highest point:']
    Data = []
    
    name = html_soup.find('span', {'class': ['region_name1', 'countryName']}).string
    
    div_field = html_soup.find('a', href=re.compile(r'.*#\d+'), string=field).find_next('div')
    mean = div_field.find_next('div')
    lowest = mean.find_next('div')
    hi = lowest.find_next('div')
    try:
        Data = [convert_to_num(str(i)) for i in [mean, lowest, hi]]
        # get_unit expects a string, so pass the text of each tag
        Units = [get_unit(i.get_text(strip=True)) for i in [mean, lowest]]  
        Units.append('m')
    except Exception:
        Data = [None, None, None]
        # Keep Units defined so the return statement below cannot fail
        Units = ['m', 'm', 'm']
        print('Problem with elevation data for %s' % name)
    
    return Fields, Subfields, Units, Data

def age_structure(html_soup):
    """
    Creates a hierarchical DataFrame for Age structure data from the CIA world factbook
    
    Parameters:
    ----------------
    html_soup: BeautifulSoup object
        Object to navigate and find data from
        
    Returns:    
    ----------------
    Fields: list
         Strings representing the Field level in the MultiIndex
    Subfields: list
         Strings representing the Subfields level in the MultiIndex
    Units: list
        Strings representing the Units level in the MultiIndex
    Data: list
         Numerical data    
    """  
    
    Fields = []
    Subfields = []
    Units = []
    Data = []
    field = 'Age structure'    
    div_field = html_soup.find('a', href=re.compile(r'.*#\d+'), string=field).find_next('div')
    div_categories = div_field.find_all('div', class_='numeric')
    for i in div_categories:
        name = i.find('span', class_='subfield-name').string
        num = i.find('span', class_='subfield-number').string
        note = i.find('span', class_='subfield-note').string
        match = re.search(r'(\w+)\s([\d,]+)[\s/]*(\w+)\s([\d,]+)', note)
        Fields.extend([field, field, field])
        Subfields.extend(['total' +' '+ name, match.group(1)+' '+name, match.group(3)+' '+name])
        Units.extend(['percent', 'individuals', 'individuals'])
        Data.extend([convert_to_num(i) for i in [num, match.group(2), match.group(4)]])
    return Fields, Subfields, Units, Data

def improved_unimproved(field, html_soup):
    """
    Creates a hierarchical DataFrame for Drinking water source and Sanitation facility access 
    data from the CIA world factbook since they have similar data formats
    
    Parameters:
    ----------------
    field : string
        Name of the field, either 'Drinking water source' or 'Sanitation facility access'
    html_soup: BeautifulSoup object
        Object to navigate and find data from
        
    Returns:    
    ----------------
    Fields: list
         Strings representing the Field level in the MultiIndex
    Subfields: list
         Strings representing the Subfields level in the MultiIndex
    Units: list
        Strings representing the Units level in the MultiIndex
    Data: list
         Numerical data    
    """  
    
    Fields = []
    Subfields = []
    Units = []
    Data = []
    # Look up the requested field (a hard-coded field name here would return
    # drinking water data for every call)
    div_field = html_soup.find('a', href=re.compile(r'.*#\d+'), string=field).find_next('div')
    for i in div_field.find_all('span', class_='subfield-number'):
        group = i.find_previous('span', class_='subfield-group').string
        name = i.find_previous('span', class_='subfield-name').string
        Fields.append(field)
        Subfields.append(group+' '+name)
        Data.append(convert_to_num(i.string))
        Units.append(get_unit(i.string))
    return Fields, Subfields, Units, Data
        
def find_name(html_soup):
    """Returns name of country"""
    return html_soup.find('span', {'class': ['region_name1', 'countryName']}).string

def find_field(field, html_soup):
    """
    Finds field data for conventional cases and returns lists to make a hierarchical DataFrame.
    Excludes Land use and Elevation
    
    Parameters:
    ----------------
    field: string
        Exact string to match field on webpage
    html_soup: BeautifulSoup object
        Object to navigate and find data from
        
    Returns:    
    ----------------
    Fields: list
         Strings representing the Field level in the MultiIndex
    Subfields: list
         Strings representing the Subfields level in the MultiIndex
    Units: list
        Strings representing the Units level in the MultiIndex
    Data: list
         Numerical data
    
    """ 
    
    # check if field is a special case
    if field == 'Land use':
        return land_use(html_soup)
    elif field == 'Elevation':
        return elevation(html_soup)    
    elif field == 'Age structure':
        return age_structure(html_soup)
    elif field in ['Drinking water source', 'Sanitation facility access']:
        return improved_unimproved(field, html_soup)
    
        
    Fields = []
    Subfields = []
    Units = []
    Data = []
    
    div_field = html_soup.find('a', href=re.compile(r'.*#\d+'), string=field).find_next('div')
    
    if div_field is not None:
        
        for div_category_data in div_field.find_all('div', class_=re.compile('category_data')):
            
            name = div_category_data.find('span', class_="subfield-name")
            num = div_category_data.find('span', class_="subfield-number")
                        
            # Check for scenario 2
            if 'text' in div_category_data['class']:
                continue
            
            # Check for scenarios 3 and 4    
            elif name is None or {'note','historic'} <= set(div_category_data['class']):
                if num:                    
                    Fields.append(field)
                    Units.append(get_unit(num))
                    Data.append(convert_to_num(num))                    
                    Subfields.append('')                    
                break        
            
            # Remaining case is scenario 1
            else:
                Fields.append(field)
                Units.append(get_unit(num))
                Data.append(convert_to_num(num))
                Subfields.append(name.string)
                
    return Fields, Subfields, Units, Data


def create_dataframe(Fields, Subfields, Units, Data):
    """
    Creates a hierarchical data frame from the inputted lists
    
    Parameters:
    ----------------
    Fields: list
         Strings representing the Field level in the MultiIndex
    Subfields: list
         Strings representing the Subfields level in the MultiIndex
    Units: list
        Strings representing the Units level in the MultiIndex
    Data: list
         Numerical data
        
    Returns:    
    ----------------
    hier_df: dataframe object
         Pandas MultiIndex dataframe object
    
    """  
    columns = pd.MultiIndex.from_arrays(arrays=[Fields, Subfields, Units], names=['Field', 'Subfield', 'Units'])
    hier_df = pd.DataFrame([Data], columns=columns)
    return hier_df

def country_profile(name, url, fields, sleep_length=5):
    """
    Create a table of data given a CIA world factbook country profile and fields of interest.
    
    Parameters:
    ----------------
    name: string
        Name of country on the options menu           
    url: string
        Website address of a country profile in the CIA world factbook
    fields: list
        List of strings corresponding to the Fields of information referenced in:
        https://www.cia.gov/library/publications/the-world-factbook/docs/profileguide.html    
    sleep_length: int
        Length of pause in seconds before trying another request
    
    Returns:    
    ----------------
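    Fields_: list
         Strings representing the Field level in the MultiIndex
    Subfields_: list
         Strings representing the Subfields level in the MultiIndex
    Units_: list
        Strings representing the Units level in the MultiIndex
    Data_: list
         Country name followed by the numerical data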
    """ 
    
    html_soup, text = get_soup(url, sleep_length)
    Fields_ = ['Name']
    Subfields_ = ['']
    Units_ = ['']
    Data_ = [str(html_soup.find('span', {'class': ['region_name1', 'countryName']}).string)]
        
    for field in fields:
        if not html_soup.find('a', href=re.compile(r'.*#\d+'), string=field):
            continue
        Fields, Subfields, Units, Data = find_field(field, html_soup)        
        Fields_.extend(Fields)
        Subfields_.extend(Subfields)
        Data_.extend(Data)
        Units_.extend(Units)
    
    return Fields_, Subfields_, Units_, Data_
    
def world_table(fields, banned=[], sleep_length=5):
    """
    Creates a table version of selected CIA world factbook data. Requires numpy as np, 
    pandas as pd, BeautifulSoup, request, and re libraries.
    
    Parameters:
    ----------------
    fields: list
        List of exact field names to match
    banned: list
        List of countries to skip
    sleep_length: int
        Length of pause in seconds before trying another request
        
    Returns:    
    ----------------
    frames: list
        List of one-row MultiIndex data frames, one per country, ready for pd.concat
         
    """ 
    # Create a list of country profile urls
    country_sites = sites_list(banned)
    site_countries = {country_sites[country]: country for country in country_sites}
    frames = []
    for name, url in country_sites.items():
        try:
            # country_profile fetches the page itself and returns four lists to unpack
            frames.append(create_dataframe(*country_profile(name, url, fields, sleep_length)))
        except AttributeError as err:
            print('%s was skipped due to %s.' % (site_countries[url], err))
        except IndexError as err:
            print('%s was skipped due to %s.' % (site_countries[url], err))
            
    #return pd.concat(frames)
    return frames

<a id='multithreading'></a>

## 1.7 Multithreading

Here we compare the task completion times for accessing the sites one at a time (serially) versus switching between threads (multithreading). Multithreading requires us to feed tasks to a worker function using a queue data structure.

In [None]:
import time
from threading import Thread
from queue import Queue

# Set up threading
NUM_WORKERS = 4
task_queue = Queue()

def worker():
    while True:
        # Constantly check the queue for addresses
        site = task_queue.get()
        site_crawl(site, 5)

        # Mark task as done
        task_queue.task_done()

sites = sites_list(banned)

In [None]:
start_time = time.time()
for country in sites:
    site_crawl(sites[country])
end_time = time.time()


print("Serial time=", end_time - start_time)

In [None]:
# Create worker threads

start_time = time.time()
# Daemon threads, since each worker loops forever and never returns on its own
threads = [Thread(target=worker, daemon=True) for _ in range(NUM_WORKERS)]

# Add websites to task queue
for country in sites:
    task_queue.put(sites[country])

# Start all workers
for thread in threads:
    thread.start()

# Wait for all the tasks in the queue to be processed
task_queue.join()
end_time = time.time()
 
print("Threading time=", end_time - start_time)
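The comparison can also be tried without touching the network. The sketch below follows the same queue-and-worker pattern but replaces the real request with a simulated one: `fake_crawl` and its `time.sleep` delay are assumptions standing in for `site_crawl`, and the site names are made up.

```python
import time
from queue import Queue
from threading import Thread

def fake_crawl(site, delay=0.05):
    """Hypothetical stand-in for site_crawl: wait as if on a server response."""
    time.sleep(delay)
    return site

def run_serial(sites):
    start = time.perf_counter()
    results = [fake_crawl(s) for s in sites]
    return results, time.perf_counter() - start

def run_threaded(sites, num_workers=4):
    task_queue = Queue()
    results = []

    def worker():
        while True:
            site = task_queue.get()
            try:
                results.append(fake_crawl(site))
            finally:
                # Always mark the task done so join() cannot hang
                task_queue.task_done()

    start = time.perf_counter()
    for s in sites:
        task_queue.put(s)
    for _ in range(num_workers):
        Thread(target=worker, daemon=True).start()
    task_queue.join()
    return results, time.perf_counter() - start

demo_sites = [f"site-{i}" for i in range(8)]
serial_results, serial_time = run_serial(demo_sites)
threaded_results, threaded_time = run_threaded(demo_sites)
print(f"Serial: {serial_time:.2f} s, threaded: {threaded_time:.2f} s")
```

Because the simulated work is pure waiting, the threaded run should finish in roughly the serial time divided by the number of workers. Real requests will vary more, since server response times are not constant.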

### Saving Data While Threading

In [None]:
from threading import Thread
from queue import Queue

def site_save(country_site, sleep_length=0):
    """
    Test retrieval of site information by saving country name to a list.
    
    Parameters
    -----------
    country_site : string
        URL of the country profile to visit
    sleep_length : int
        Number of seconds to pause before retrying a web page request
    """
    global countries_visited
    html_soup, text = get_soup(country_site, sleep_length)
    name = html_soup.find('span', {'class': 'region_name1 countryName '}).string
    countries_visited.append(name)
    print(name)

In [None]:
# Shared list that the worker threads append to
countries_visited = []
NUM_WORKERS = 4
sites = sites_list(banned)

def worker():
    global countries_visited
    while True:
        # Constantly check the queue for addresses
        site = task_queue.get()
        site_save(site, 5)

        # Mark task as done
        task_queue.task_done()

    
# Create threads and Q
task_queue = Queue()
# Add website to task queue
[task_queue.put(sites[country]) for country in sites]

threads = [Thread(target=worker) for _ in range(NUM_WORKERS)]


# Start all workers
[thread.start() for thread in threads]

# Wait for all the tasks in the queue to be processed
task_queue.join()

In [None]:
len(set(countries_visited))

In [None]:
countries_visited

### Putting it all together: Multithreading World Table

In [None]:
def world_table_threaded(fields, banned=[], sleep_length=5):
    """
    Creates a threaded table version of selected CIA world factbook data. Requires numpy as np, 
    pandas as pd, BeautifulSoup, request, threading and re libraries.
    
    Parameters:
    ----------------
    fields: list
        List of exact field names to match
    banned: list
        List of countries to skip
    sleep_length: int
        Length of pause in seconds before trying another request
        
    Returns:    
    ----------------
    frames: list 
         List of pandas MultiIndex dataframes 
         
    """ 
    frames = []
    NUM_WORKERS = 4
    country_sites = sites_list(banned)
    site_countries = {country_sites[country]:country for country in country_sites}
    
    def worker():
        
        while True:
            # Constantly check the queue for addresses
            url = task_queue.get()
            name = site_countries[url]
            try:
                html_soup, text = get_soup(url, sleep_length)
                frames.append(create_dataframe(*country_profile(name, url, fields, sleep_length)))
                print(name, ' done')
            except AttributeError as err:
                print('%s was skipped due to %s.' % (site_countries[url], err))
            except IndexError as err:
                print('%s was skipped due to %s.' % (site_countries[url], err))
                
            # Mark task as done
            task_queue.task_done()

    # Create threads and Q
    task_queue = Queue()
    # Add website to task queue
    [task_queue.put(country_sites[country]) for country in country_sites]

    threads = [Thread(target=worker) for _ in range(NUM_WORKERS)]


    # Start all workers
    [thread.start() for thread in threads]

    # Wait for all the tasks in the queue to be processed
    task_queue.join()
    return frames

<a id='viewing'></a>

# 2. Viewing the Data

<a id='problems'></a>

## 2.1 Identifying Problem Cases

Now that we've created functions to mine the data and the respective fields and subfields from the sites, we can finally attempt to create tables for each field. However, we soon notice "problem cases" that produce entire columns in which every entry but one is NA.

Some common features of these problem cases are:

- Aggregate entities such as the "European Union" have their composite entities as subfields, resulting in messy tables. This includes groups of territories that are lumped into one webpage.
- Entities that are not nation-states, such as Antarctica and the oceans, lack much of the information
- Arbitrary or mistaken variations of subfields

Some problem cases are not of interest since they are not nation-states and can be ignored. We collect them in a list called `banned`, which the scraping functions use to skip those pages.

In [None]:
banned = ['Antarctica', 'French Southern and Antarctic Lands', 'European Union', 'Atlantic Ocean', 'Arctic Ocean', 
          'Indian Ocean', 'Pacific Ocean', 'Ashmore and Cartier Islands', 'United States Pacific Island Wildlife Refuges',
          'Baker Island', 'Howland Island', 'Jarvis Island', 'Johnston Atoll', 'Kingman Reef', 'Midway Islands',
          'Palmyra Atoll']

### Creating Field DataFrames

The function `world_table` returns a list of dataframes corresponding to each country, which can be concatenated into one large table afterwards. We can repeatedly use this on single fields to learn what cases may cause issues. 

As a warning, including all fields will result in a code runtime of hours. For shorter runtimes, we wish to loop through fewer fields.

Looping through all fields is accomplished in the following cells, where the list `Fields` can be substituted by any subset of fields desired (the example below uses only `'Land use'`):

In [None]:
Fields = []
for i in [Geography, Society, Economy, Energy, Communications, Transportation, Military]:
    Fields.extend(i)

In [None]:
field_frames = {}
for field in ['Land use']:
    try:
        field_frames[field] = world_table_threaded([field], banned)
        print('%s done ' % field, '-'*100)
    except:
        print('Exceptions raised with %s.' % field)

### Problems with Concatenation

We can automatically search through each field and try making a concatenated table, noting cases where there are errors.

In [None]:
field_dfs = []
failed_fields = []
for i in field_frames:
    try:
        frame = pd.concat(field_frames[i])
        field_dfs.append(frame)
    except:
        failed_fields.append(i)
        print('Unable to concatenate %s' %i)
        continue
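Even when concatenation succeeds, a variant subfield name from a single country shows up as a column that is almost entirely NA. The sketch below shows one way to flag such columns. The countries, subfield names and values are made up for illustration, and `one_row_frame` is a hypothetical helper that mimics the one-row frames built by `create_dataframe`.

```python
import pandas as pd

def one_row_frame(country, subfields, values, field="Area", unit="sq km"):
    """Build a one-row frame with (Field, Subfield, Units) column levels."""
    columns = pd.MultiIndex.from_arrays(
        [[field] * len(subfields), subfields, [unit] * len(subfields)],
        names=["Field", "Subfield", "Units"],
    )
    return pd.DataFrame([values], columns=columns, index=[country])

# Made-up values; country C uses a subfield name that no other country has
frames = [
    one_row_frame("A", ["total: ", "land: "], [100.0, 90.0]),
    one_row_frame("B", ["total: ", "land: "], [50.0, 45.0]),
    one_row_frame("C", ["total: ", "metropolitan land: "], [70.0, 60.0]),
]
table = pd.concat(frames)

# Count non-NA entries per column; columns filled by one country are suspects
counts = table.notna().sum()
suspects = counts[counts <= 1]
print(suspects)
```

The threshold of one non-NA entry is an arbitrary choice for this sketch. On the real data a small fraction of the number of countries may be a more useful cutoff.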

<a id='database'></a>

# 3. Creating a Database

Parsing even just the names of each of the 240+ countries takes over a minute so it's worthwhile to store the data scraped in a persistent form for faster future reference, like a database. Here, we'll use [PostgreSQL](https://www.postgresql.org/) to store our data. We will modify previously written functions to store the data. Each field will be the name of a table while columns will be named after subfields or even lower level categories if applicable.

First we create a database in PostgreSQL via the SQL command `CREATE DATABASE countries;` and then use the Python module `psycopg2` to interface with the `countries` database and store the data.

We will need to create functions that perform basic operations including:

- creating a table for each field with the correct columns
- inserting an entry for a country by entering its data in the appropriate columns

<a id='database_functions'></a>

## 3.1 Database Functions

In [None]:
import psycopg2 as ps
import re

def sql_create_title(text):
    """
    Generate SQL compatible title from raw text
    
    Parameters:
    -------------
    text : string
        The candidate title name to be converted
    
    Returns:
    -------------
    title : string
        SQL friendly table title
    
    """    
    
    # Raw strings keep the regex escapes from being read as string escapes
    removal = [r'\s\(\+\)', r'\s\(\-\)', r'[(),:]']
    replacement = ['/', r'\s\-\s', r'\s', '-']
    
    # No starting numeric character
    if re.search(r'\d', text[0]):
        text = '_' + text
    # Special character removal
    for i in removal:
        text = re.sub(i, '', text)
    for i in replacement:
        text = re.sub(i,'_', text)
    # White space removal
    text = text.lstrip().rstrip()        
    return text

def sql_list_columns(table):
    """
    Generates SQL text to query table columns
    
    Parameters:
    ---------------
    table: string
        The name that will be used as the name of the table
    Returns:    
    ----------------
    sql: string
         String to use in connection.execute(sql)
    """  
    sql = f"""PRAGMA TABLE_INFO({table})"""
    
    return sql

def sql_alter_table(table, column):
    """
    Generates SQL text needed to add new table columns
    
    Parameters:
    ---------------
    table: string
        The name that will be used as the name of the table
    column: string
       Name of the new column
        
    Returns:    
    ----------------
    sql: string
         String to use in connection.execute(sql)
       
    """  
    sql = f"""ALTER TABLE {table} ADD {column} real;"""
    return sql

def sql_create_table(field, columns):
    """
    Generate the SQL text needed to create a table representing the fields
    
    Parameters:
    ----------------
    field: string
        The field name that will be used as the name of the table
    columns: list of strings
        A list of strings that will be the subfields or subcategories and the names of the columns
        
    Returns:    
    ----------------
    sql: string
         String to use in connection.execute(sql)
    
    """  

    sql = f"""CREATE TABLE {field} (\n country text PRIMARY KEY,            
        """
    for i in columns:
        sql+=f"\n \t {i} real," 
    sql = sql[:-1]+");"
    return sql

def sql_fill_table(table_title, data):
    """
    Generates SQLite compatible string that can be fed into the execute method with
    the data
    
    Parameters:
    ---------------
    table_title: string
        The name of the table
    data : list of tuples of floats
        A list of tuples of floats representing the data to be inserted into the table, with entries corresponding to columns
        
    Returns:    
    ----------------
    sql: string
         String to use in connection.executemany(sql)
    
    """
    
    sql = f"""INSERT INTO {table_title} VALUES ("""
    
    for i in data[0]:
        sql+='?,'
    sql = sql[:-1] +');'        
    return sql   

<a id='testing'></a>

## 3.2 Webscraping into a Database

Here we'll attempt to use the functions to create a table and insert one entry. Below are the prerequisites to load and prepare our session. By loading the previous scripts from an external source, we can restart the kernel and start from here instead of running the whole notebook again.

#### Load Prerequisites: Reset Point

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import psycopg2 as ps
import re
from time import sleep
import requests
from bs4 import BeautifulSoup
from countries_functions_sqlite import *
from threading import Thread, Lock
from queue import Queue
from collections import defaultdict
import sqlite3

Geography = ['Area', 'Land boundaries', 'Coastline', 'Land use', 'Irrigated land']
Society = ['Population', 'Age structure', 'Dependency ratios', 'Median age', 'Population growth rate', 'Birth rate',
           'Death rate', 'Net migration rate', 'Urbanization', 'Sex ratio', "Mother's mean age at first birth", 
           'Maternal mortality ratio', 'Infant mortality rate', 'Life expectancy at birth', 'Total fertility rate', 
           'Contraceptive prevalence rate', 'Health expenditures', 'Physicians density', 'Hospital bed density', 
           'Drinking water source', 'Sanitation facility access', 'HIV/AIDS - adult prevalence rate', 
           'HIV/AIDS - people living with HIV/AIDS', 'HIV/AIDS - deaths', 'Obesity - adult prevalence rate', 
           'Children under the age of 5 years underweight', 'Education expenditures', 'Literacy', 
           'School life expectancy (primary to tertiary education)']

Economy = ['GDP (purchasing power parity)', 'GDP (official exchange rate)', 'GDP - real growth rate', 'Gross national saving',
           'GDP - composition, by end use', 'GDP - composition, by sector of origin', 'Industrial production growth rate',
           'Labor force', 'Labor force - by occupation', 'Unemployment rate', 'Population below poverty line',
           'Household income or consumption by percentage share','Distribution of family income - Gini index', 'Budget', 
           'Taxes and other revenues', 'Budget surplus (+) or deficit (-)', 'Public debt', 'Inflation rate (consumer prices)', 
           'Commercial bank prime lending rate','Stock of narrow money', 'Stock of broad money', 'Stock of domestic credit', 
           'Market value of publicly traded shares', 'Current account balance', 'Exports', 'Imports', 
           'Reserves of foreign exchange and gold', 'Debt - external']

Energy = ['Electricity access', 'Electricity - production', 'Electricity - exports', 'Electricity - imports', 
          'Electricity - installed generating capacity', 'Electricity - from fossil fuels', 
          'Electricity - from hydroelectric plants', 'Electricity - from other renewable sources', 'Crude oil - production',
          'Crude oil - exports', 'Crude oil - imports', 'Crude oil - proved reserves', 'Refined petroleum products - production',
          'Refined petroleum products - consumption', 'Refined petroleum products - exports', 
          'Refined petroleum products - imports', 'Natural gas - production', 'Natural gas - consumption', 
          'Natural gas - exports', 'Natural gas - imports', 'Carbon dioxide emissions from consumption of energy']

Communications = ['Telephones - fixed lines', 'Telephones - mobile cellular', 'Internet users']

Transportation = ['National air transport system', 'Airports', 'Airports - with paved runways', 
                  'Airports - with unpaved runways', 'Heliports', 'Pipelines', 'Roadways', 'Waterways']

Military = ['Military expenditures']

Fields = []
for i in [Geography, Society, Economy, Energy, Communications, Transportation, Military]:
    Fields.extend(i)
banned = ['Antarctica', 'French Southern and Antarctic Lands', 'European Union', 'Atlantic Ocean', 'Arctic Ocean', 
          'Indian Ocean', 'Pacific Ocean', 'Ashmore and Cartier Islands', 'United States Pacific Island Wildlife Refuges',
          'Baker Island', 'Howland Island', 'Jarvis Island', 'Johnston Atoll', 'Kingman Reef', 'Midway Islands',
          'Palmyra Atoll', 'Southern Ocean', 'World']

country_sites = sites_list(banned)
table_titles = {field: sql_create_title(field) for field in Fields}


In [None]:
country_sites = sites_list(banned)
test_sites = {j:country_sites[j] for i, j in enumerate(country_sites.keys())
             if i <10}
html_soup, raw_text = get_soup(country_sites['Syria'])
field = ['GDP (purchasing power parity)', 'GDP (official exchange rate)','Inflation rate (consumer prices)']
div_field = html_soup.find('a', href=re.compile(r'.*#\d+'), string=field).find_next('div')


In [None]:
fields, subfields, units, data = find_field(field[0], html_soup)

In [None]:
for div_category_data in div_field.find_all('div', class_=re.compile('category_data')):
    name = div_category_data.find('span', class_="subfield-name")
    num = div_category_data.find('span', class_="subfield-number")
    print(name)
    print(num)

In [None]:
sql_create_title(field[2])

### Creating SQL-Compatible Titles

Allowed titles for database tables exclude many special characters such as hyphens and parentheses. As a result, we need to construct a dictionary to relate the Field name with table title. To systematically convert field names on the website to SQL-friendly titles we apply the following rules:

- Special character phrases to be removed: " (+)", " (-)", "(", ")", ",", ":"
- Special character phrases to be replaced with an underscore: "/", " - ", " ", "-"
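A minimal, self-contained sketch of these rules applied to a few field names follows. The function name `to_sql_title` is hypothetical and is only meant to illustrate the behaviour described for `sql_create_title()`, including the leading-underscore rule for names that start with a digit.

```python
import re

def to_sql_title(text):
    """Illustrative conversion of a field name into an SQL-friendly title."""
    text = text.strip()
    # Remove the " (+)" and " (-)" phrases, then stray parentheses, commas, colons
    for pattern in (r"\s\(\+\)", r"\s\(-\)", r"[(),:]"):
        text = re.sub(pattern, "", text)
    # Replace separators with underscores; " - " is handled before lone
    # spaces and hyphens so it collapses to a single underscore
    for pattern in ("/", r"\s-\s", r"\s", "-"):
        text = re.sub(pattern, "_", text)
    # Names may not start with a digit
    if text and text[0].isdigit():
        text = "_" + text
    return text

for name in ["Budget surplus (+) or deficit (-)",
             "HIV/AIDS - adult prevalence rate",
             "GDP (purchasing power parity)",
             "0-14 years"]:
    print(f"{name!r} -> {to_sql_title(name)!r}")
```

Note that characters outside these rules, such as the apostrophe in "Mother's mean age at first birth", pass through unchanged and would need separate handling.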

### Unique Column Names
In addition to table names, we need to consider appropriate column names for the tables. The column names will be comprised of information from the `Fields`, `Subfields`, and `Units` lists.

Each country will fall among the following scenarios to consider:

1. Data is only a single number like in "Population" and has no subfield, so the column will be assigned the value of field.

2. Data is only a single number, like in some small nations for the field "Area" but has a subfield such as "total:", so the column name will be assigned the value of subfield.

3. Multiple data like in "Area" which can be labeled uniquely by subfield, or subfield - unit combinations.
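The three scenarios can be sketched as a small decision function. `choose_columns` is a hypothetical helper, and appending the unit only when a subfield repeats is an assumption made for this sketch; the sample inputs are invented.

```python
def choose_columns(fields, subfields, units):
    """Hypothetical helper: pick one unique label per data value."""
    # Scenario 1: a single value without a subfield is labeled by its field
    if len(fields) == 1 and subfields == [""]:
        return list(fields)
    # Scenarios 2 and 3: label by subfield, appending the unit only when
    # the subfield alone would be ambiguous
    labels = []
    for sub, unit in zip(subfields, units):
        if subfields.count(sub) > 1:
            labels.append(f"{sub} {unit}".strip())
        else:
            labels.append(sub)
    return labels

print(choose_columns(["Population"], [""], [""]))
print(choose_columns(["Area"], ["total:"], ["sq km"]))
print(choose_columns(["Area"] * 3, ["total:", "land:", "land:"], ["sq km", "sq km", "%"]))
```

The resulting labels would still need to pass through the title conversion before being used as SQL column names.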

### SQL-Compatible Column Names 

Once unique column names are generated, we must also make sure they are compatible with SQL constraints.

1. Column names must not start with number characters, which is sometimes a problem encountered in subfields referring to ages or lengths. This can be checked with regular expressions.

2. Special characters should be treated in the same way as table titles, with `sql_create_title()`



In [None]:
for i in Fields:
    print(i)
    print(sql_create_title(i))
    print('-'*40)

In [None]:
text = 'Budget surplus (+) or deficit (-)'
removal = [r'\s\(\+\)', r'\s\(\-\)', r'[(]', r'[)]']
replacement = ['/', r'\s\-\s', r'\s']
    
for i in removal:
    text = re.sub(i, '', text)
print(text)
for i in replacement:
    text = re.sub(i,'_', text)        

print(text)

In [None]:
table_titles = {field: sql_create_title(field) for field in Fields}

### Interacting with SQL Database via Python

Below we demonstrate Python's ability to execute SQL. In the following code we create a table, insert a line of data, load the data from the table to a DataFrame object to be displayed, and finally delete the table.


In [None]:
url = 'https://www.cia.gov/library/publications/the-world-factbook/geos/us.html'
sqlite_file = 'test_sqlite.db'
with sqlite3.connect(sqlite_file) as conn:
        cur = conn.cursor()
        # Enter data in to database
        html_soup,text = get_soup(url)
        country= 'United States'
        field = 'Sanitation facility access'
        fields, subfields, units, data = find_field(field, html_soup)
        
        # format data into a list of tuple
        data = [(country, *data)]
        # Decide column names based on scenarios
        # Scenario 1
        if len(fields)==1:
            columns = [sql_create_title(i) for i in fields]
                
        # Assume Scenario 2 for all other cases
        else:
            columns = [sql_create_title(i) for i in subfields]
        table_title = sql_create_title(field)
        sql_create = sql_create_table(table_title, columns)
        cur.execute(sql_create)
        cur.executemany(sql_fill_table(table_title, data), data)        
        
        # Query data and delete table
        df = pd.read_sql_query("SELECT * FROM %s;"% table_titles[field], conn)        
        cur.execute("DROP TABLE %s;" % table_titles[field])
conn.close()        
df

## Saving the World Factbook Data to a Database

Since waiting for the website response is the main bottleneck, threading is an effective way to shorten the total runtime. The script below sets up threading by placing the websites in a queue and having each thread apply a "worker" function that continuously consumes tasks until the queue is empty.

For SQLite, multithreading cannot be used to directly write to the database, since connections can't be shared. However, we can use multithreading to save the data to Python data structures and then serially write the data into the SQLite database afterwards.
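By default, Python's `sqlite3` module raises an error when a connection is used from a thread other than the one that created it. A minimal sketch of the collect-then-write approach, assuming a hypothetical `fetch` function in place of the real scraping and made-up values, looks like this:

```python
import sqlite3
from queue import Queue
from threading import Thread

def fetch(country):
    """Hypothetical stand-in for scraping one country profile."""
    return country, float(len(country))

countries = ["Albania", "Belize", "Chad", "Denmark"]
task_queue = Queue()
rows = []  # list.append is thread-safe in CPython

def worker():
    while True:
        country = task_queue.get()
        try:
            rows.append(fetch(country))
        finally:
            task_queue.task_done()

for c in countries:
    task_queue.put(c)
for _ in range(2):
    Thread(target=worker, daemon=True).start()
task_queue.join()

# All database access happens here, in the main thread
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (country text PRIMARY KEY, value real)")
conn.executemany("INSERT INTO demo VALUES (?, ?)", rows)
conn.commit()
stored = conn.execute("SELECT country, value FROM demo ORDER BY country").fetchall()
conn.close()
print(stored)
```

The threads only ever touch the Python list, so the connection is created and used by a single thread.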

### Writing the Data to a Nested Default Dictionary

The first level has field names as keys and nested `defaultdict` objects as values. Each nested `defaultdict` has `columns` and `data` as keys, with lists of tuples (one tuple per country) as values.
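As a small illustration of the resulting shape, the sketch below fills such a structure by hand. The country name and numbers are invented, and `datadict_demo` is a separate name so that it does not clash with the real `datadict`.

```python
from collections import defaultdict

datadict_demo = defaultdict(lambda: defaultdict(list))

# One tuple of column names and one tuple of values per country and field
datadict_demo["Area"]["columns"].append(("country", "total", "land"))
datadict_demo["Area"]["data"].append(("Exampleland", 100.0, 90.0))
datadict_demo["Population"]["columns"].append(("country", "Population"))
datadict_demo["Population"]["data"].append(("Exampleland", 1234.0))

# The two lists stay aligned, so zipping them pairs names with values
for field, entry in datadict_demo.items():
    for cols, vals in zip(entry["columns"], entry["data"]):
        print(field, dict(zip(cols, vals)))
```

Keeping the column names next to each data tuple matters because different countries may report different subfields for the same field.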

In [2]:
import sqlite3
from threading import Thread, Lock
from queue import Queue
from collections import defaultdict
import pickle


banned = ['Antarctica', 'French Southern and Antarctic Lands', 'European Union', 'Atlantic Ocean', 'Arctic Ocean', 
          'Indian Ocean', 'Pacific Ocean', 'Southern Ocean', 'Ashmore and Cartier Islands',
          'United States Pacific Island Wildlife Refuges',
          'Baker Island', 'Howland Island', 'Jarvis Island', 'Johnston Atoll', 'Kingman Reef', 'Midway Islands',
          'Palmyra Atoll', 'World']

#Fields = ['Area', 'Population', 'Elevation']
# Set up threading via websites in dictionary
country_sites = sites_list(banned)
NUM_WORKERS = 4
lock = Lock()
task_queue = Queue()
table_made = defaultdict(bool)
table_columns = defaultdict(list)
table_titles = {field: sql_create_title(field) for field in Fields}
test_sites = {j:country_sites[j] for i, j in enumerate(country_sites.keys())
             if i <10}

datadict = defaultdict(lambda: defaultdict(list))
# Add website to task queue, to play the role of "producer"

[task_queue.put(country) for country in country_sites]


# Define the "consumer" function to feed queue items to

def worker(country_sites, fields):
    """Worker function to keep performing task from queue"""
    global datadict
    global table_made
    global table_columns
    
    while True:
        # Constantly check the queue for addresses
        country = task_queue.get()
        html_soup, text = get_soup(country_sites[country])
        
        # Create or update table for each present field by checking content of each country profile url
        
        for field in fields:
            try:
                Fields, Subfields, Units, Data = find_field(field, html_soup)
                if not Fields == []:    
                    # Decide column names based on scenarios

                    # Check if Field is the necessary column label
                    if len(Fields)==1 and Subfields==['']:
                        columns = [sql_create_title(i) for i in Fields]

                    # For remaining cases, Subfield is a sufficient column label
                    else:
                        columns = [sql_create_title(i) for i in Subfields]

                    datadict[field]['columns'].append(('country', *columns))
                    datadict[field]['data'].append((country, *Data))
                                
            except:                        
                print(f'Problem with {country} - {field}')
        task_queue.task_done()
        print(country, ' done')
        
# Create and start workers
threads = [Thread(target=worker, args=(country_sites, 
                    Fields)) for _ in range(NUM_WORKERS)]

[thread.start() for thread in threads]

# Wait for all the tasks in the queue to be processed
task_queue.join()

Akrotiri  done
Albania  done
Algeria  done
Afghanistan  done
American Samoa  done
Andorra  done
Anguilla  done
Angola  done
Antigua and Barbuda  done
Argentina  done
Armenia  done
Aruba  done
Australia  done
Azerbaijan  done
BahrainBahamas, The  done
  done
Austria  done
Bangladesh  done
Barbados  done
Belarus  done
Belgium  done
Belize  done
Benin  done
Bermuda  done
Bhutan  done
Bouvet Island  done
Bolivia  done
Bosnia and Herzegovina  done
Botswana  done
British Indian Ocean Territory  done
British Virgin Islands  done
Brazil  done
Brunei  done
Bulgaria  done
Burkina Faso  done
Burundi  done
Burma  done
Cabo Verde  done
Cambodia  done
Cameroon  done
Canada  done
Cayman Islands  done
Central African Republic  done
Chad  done
Christmas Island  done
Chile  done
Clipperton Island  done
Cocos (Keeling) Islands  done
China  done
Colombia  done
Comoros  done
Congo, Democratic Republic of the  done
Congo, Republic of the  done
Cook Islands  done
Coral Sea Islands  done
Costa Rica  done
Cote

### Pickling the Data Structure for Persistence

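Below is code to save the native Python data structure to an external file for persistence. We use `dill` rather than the standard `pickle` module because `datadict` is built with a `lambda` default factory, which `pickle` cannot serialize.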

In [3]:
import dill

# Save file for persistence via dill
filename = 'datadict'
with open(filename, 'wb') as outfile:
    dill.dump(datadict, outfile)
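If `dill` is not available, one possible alternative is to convert the nested structure into ordinary dictionaries first, after which the standard `pickle` module works. This is a sketch of that alternative, not the approach used in this notebook; `to_plain` is a hypothetical helper and the data is invented.

```python
import pickle
from collections import defaultdict

def to_plain(d):
    """Recursively convert nested defaultdicts into ordinary dicts."""
    if isinstance(d, dict):
        return {k: to_plain(v) for k, v in d.items()}
    return d

nested = defaultdict(lambda: defaultdict(list))
nested["Area"]["data"].append(("Exampleland", 100.0))

# The lambda default factory is what the standard pickle module rejects
try:
    pickle.dumps(nested)
    lambda_pickled = True
except Exception:
    lambda_pickled = False

# Round trip through pickle after conversion to plain dicts
restored = pickle.loads(pickle.dumps(to_plain(nested)))
print(lambda_pickled, restored)
```

The trade-off is that the restored object no longer creates missing keys automatically, which is fine for reading the data back but not for continuing to append to it.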


### Writing the Data into an SQLite Database

By now we'll have a nested default dictionary which we will then iterate through to create tables in SQLite. For each field and corresponding table, we'll use the `columns` key's values to construct the table and then the `data` key's values to fill in the table. The database will be saved as a file defined by the value of the variable `sqlite_file` in the same location as this notebook unless another path is specified.
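Because countries can report different subfields, a table may need new columns as later entries arrive. The sketch below shows one way to handle this against an in-memory database. `table_columns` and `insert_row` are hypothetical helpers written for this illustration, not the functions imported from `countries_functions_sqlite`, and the rows are invented.

```python
import sqlite3

def table_columns(conn, table):
    """Return existing column names via PRAGMA table_info."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

def insert_row(conn, table, cols, vals):
    """Add any missing columns, then insert one row by column name."""
    for col in cols:
        if col not in table_columns(conn, table):
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {col} real")
    placeholders = ", ".join("?" for _ in vals)
    # Identifiers are interpolated and assumed already sanitized;
    # values always go through placeholders
    conn.execute(
        f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})", vals
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Area (country text PRIMARY KEY, total real)")
insert_row(conn, "Area", ("country", "total"), ("Exampleland", 100.0))
insert_row(conn, "Area", ("country", "total", "land"), ("Otherland", 50.0, 45.0))
rows = conn.execute("SELECT country, total, land FROM Area ORDER BY country").fetchall()
conn.close()
print(rows)
```

Rows inserted before a column existed simply hold NULL in that column, which pandas reads back as NA.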

In [4]:
import dill
filename = 'datadict'
with open(filename, 'rb') as infile:
    datadict_loaded = dill.load(infile)
 

In [5]:
import os
import sqlite3

sqlite_file = 'countries_sqlite.db'
# Start from a clean database file so reruns do not collide with existing tables
if os.path.exists(sqlite_file):
    os.remove(sqlite_file)
else:
    print("The file does not exist")

with sqlite3.connect(sqlite_file) as conn:
    for field in datadict_loaded:
        table_title = sql_create_title(field)
        print(table_title)
        data = datadict_loaded[field]['data']
        columns = datadict_loaded[field]['columns']
        try:
            # Create the table from the first entry's columns
            conn.execute(sql_create_table(table_title, columns[0]))
            for cols, vals in zip(columns, data):
                # Some entries have fields the table lacks, so add those columns first
                new_columns = set(cols) - set(sql_table_columns(table_title, conn))
                for col in new_columns:
                    conn.execute(sql_add_column(table_title, col, 'real'))
                try:
                    conn.execute(sql_update_table(table_title, cols, vals), vals)
                    print(f'{field} - {vals[0]} done')
                except Exception as err:
                    # Report the failing entry and keep going with the rest
                    print(f'Problem with {field} - {vals[0]}: {err}')
                    print(f'Columns: {cols}')
                    print(f'Values: {vals}')
        except Exception as err:
            print(f'Problem with table {table_title} and columns {columns[0]}: {err}')

Area
Area - Akrotiri done
Area - Algeria done
Area - Albania done
Area - Afghanistan done
Area - American Samoa done
Area - Andorra done
Area - Angola done
Area - Anguilla done
Area - Antigua and Barbuda done
Area - Argentina done
Area - Armenia done
Area - Aruba done
Area - Australia done
Area - Austria done
Area - Azerbaijan done
Area - Bahamas, The done
Area - Bahrain done
Area - Bangladesh done
Area - Barbados done
Area - Belarus done
Area - Belgium done
Area - Belize done
Area - Benin done
Area - Bermuda done
Area - Bhutan done
Area - Bolivia done
Area - Bosnia and Herzegovina done
Area - Bouvet Island done
Area - Botswana done
Area - British Indian Ocean Territory done
Area - Brazil done
Area - British Virgin Islands done
Area - Brunei done
Area - Bulgaria done
Area - Burkina Faso done
Area - Burundi done
Area - Burma done
Area - Cabo Verde done
Area - Cambodia done
Area - Cameroon done
Area - Canada done
Area - Cayman Islands done
Area - Central African Republic done
Area - Chad

Land boundaries - Eswatini done
Land boundaries - Ethiopia done
Land boundaries - Falkland Islands (Islas Malvinas) done
Land boundaries - Faroe Islands done
Land boundaries - Fiji done
Land boundaries - Finland done
Land boundaries - French Polynesia done
Land boundaries - Gabon done
Land boundaries - Gambia, The done
Land boundaries - Gaza Strip done
Land boundaries - Georgia done
Land boundaries - Germany done
Land boundaries - Gibraltar done
Land boundaries - Ghana done
Land boundaries - Greece done
Land boundaries - Greenland done
Land boundaries - Grenada done
Land boundaries - Guam done
Land boundaries - Guatemala done
Land boundaries - Guernsey done
Land boundaries - Guinea-Bissau done
Land boundaries - Guinea done
Land boundaries - Guyana done
Land boundaries - Haiti done
Land boundaries - Heard Island and McDonald Islands done
Land boundaries - Holy See (Vatican City) done
Land boundaries - Honduras done
Land boundaries - Hong Kong done
Land boundaries - Hungary done
Land bou

Coastline - Luxembourg done
Coastline - Macau done
Coastline - Madagascar done
Coastline - Malawi done
Coastline - Malaysia done
Coastline - Maldives done
Coastline - Malta done
Coastline - Mali done
Coastline - Marshall Islands done
Coastline - Mauritania done
Coastline - Mauritius done
Coastline - Mexico done
Coastline - Micronesia, Federated States of done
Coastline - Monaco done
Coastline - Moldova done
Coastline - Mongolia done
Coastline - Montenegro done
Coastline - Montserrat done
Coastline - Morocco done
Coastline - Mozambique done
Coastline - Namibia done
Coastline - Nauru done
Coastline - Navassa Island done
Coastline - Nepal done
Coastline - Netherlands done
Coastline - New Caledonia done
Coastline - New Zealand done
Coastline - Nicaragua done
Coastline - Nigeria done
Coastline - Niger done
Coastline - Norfolk Island done
Coastline - Niue done
Coastline - North Macedonia done
Coastline - Northern Mariana Islands done
Coastline - Oman done
Coastline - Norway done
Coastline - 

Land use - Trinidad and Tobago done
Land use - Tunisia done
Land use - Turkey done
Land use - Turkmenistan done
Land use - Turks and Caicos Islands done
Land use - Tuvalu done
Land use - United Arab Emirates done
Land use - Uganda done
Land use - Ukraine done
Land use - United Kingdom done
Land use - Uruguay done
Land use - Uzbekistan done
Land use - United States done
Land use - Vanuatu done
Land use - Venezuela done
Land use - Vietnam done
Land use - Wake Island done
Land use - Virgin Islands done
Land use - Wallis and Futuna done
Land use - West Bank done
Land use - Western Sahara done
Land use - Zambia done
Land use - Yemen done
Land use - Zimbabwe done
Irrigated_land
Irrigated land - Algeria done
Irrigated land - Albania done
Irrigated land - Afghanistan done
Irrigated land - American Samoa done
Irrigated land - Andorra done
Irrigated land - Angola done
Irrigated land - Anguilla done
Irrigated land - Antigua and Barbuda done
Irrigated land - Argentina done
Irrigated land - Armenia

Population - Kenya done
Population - Korea, South done
Population - Korea, North done
Population - Kosovo done
Population - Kuwait done
Population - Kyrgyzstan done
Population - Laos done
Population - Latvia done
Population - Lebanon done
Population - Lesotho done
Population - Liberia done
Population - Libya done
Population - Liechtenstein done
Population - Lithuania done
Population - Luxembourg done
Population - Macau done
Population - Madagascar done
Population - Malawi done
Population - Malaysia done
Population - Maldives done
Population - Malta done
Population - Mali done
Population - Marshall Islands done
Population - Mauritania done
Population - Mauritius done
Population - Mexico done
Population - Micronesia, Federated States of done
Population - Monaco done
Population - Moldova done
Population - Mongolia done
Population - Montenegro done
Population - Montserrat done
Population - Morocco done
Population - Mozambique done
Population - Namibia done
Population - Nauru done
Populatio

Age structure - Thailand done
Age structure - Togo done
Age structure - Tonga done
Age structure - Trinidad and Tobago done
Age structure - Tunisia done
Age structure - Turkey done
Age structure - Turkmenistan done
Age structure - Turks and Caicos Islands done
Age structure - Tuvalu done
Age structure - United Arab Emirates done
Age structure - Uganda done
Age structure - Ukraine done
Age structure - United Kingdom done
Age structure - Uruguay done
Age structure - Uzbekistan done
Age structure - United States done
Age structure - Vanuatu done
Age structure - Venezuela done
Age structure - Vietnam done
Age structure - Virgin Islands done
Age structure - Wallis and Futuna done
Age structure - West Bank done
Age structure - Western Sahara done
Age structure - Zambia done
Age structure - Yemen done
Age structure - Zimbabwe done
Dependency_ratios
Dependency ratios - Algeria done
Dependency ratios - Albania done
Dependency ratios - Afghanistan done
Dependency ratios - Angola done
Dependency 

Median age - Mauritius done
Median age - Mexico done
Median age - Micronesia, Federated States of done
Median age - Monaco done
Median age - Moldova done
Median age - Mongolia done
Median age - Montenegro done
Median age - Montserrat done
Median age - Morocco done
Median age - Mozambique done
Median age - Namibia done
Median age - Nauru done
Median age - Nepal done
Median age - Netherlands done
Median age - New Caledonia done
Median age - New Zealand done
Median age - Nicaragua done
Median age - Niger done
Median age - Nigeria done
Median age - North Macedonia done
Median age - Northern Mariana Islands done
Median age - Oman done
Median age - Norway done
Median age - Palau done
Median age - Pakistan done
Median age - Panama done
Median age - Papua New Guinea done
Median age - Paraguay done
Median age - Peru done
Median age - Philippines done
Median age - Poland done
Median age - Portugal done
Median age - Puerto Rico done
Median age - Qatar done
Median age - Romania done
Median age - R

Birth rate - Bahrain done
Birth rate - Bangladesh done
Birth rate - Barbados done
Birth rate - Belarus done
Birth rate - Belgium done
Birth rate - Belize done
Birth rate - Benin done
Birth rate - Bermuda done
Birth rate - Bhutan done
Birth rate - Bolivia done
Birth rate - Bosnia and Herzegovina done
Birth rate - Botswana done
Birth rate - Brazil done
Birth rate - British Virgin Islands done
Birth rate - Brunei done
Birth rate - Bulgaria done
Birth rate - Burkina Faso done
Birth rate - Burundi done
Birth rate - Burma done
Birth rate - Cabo Verde done
Birth rate - Cambodia done
Birth rate - Cameroon done
Birth rate - Canada done
Birth rate - Cayman Islands done
Birth rate - Central African Republic done
Birth rate - Chad done
Birth rate - Chile done
Birth rate - China done
Birth rate - Colombia done
Birth rate - Comoros done
Birth rate - Congo, Democratic Republic of the done
Birth rate - Congo, Republic of the done
Birth rate - Cook Islands done
Birth rate - Costa Rica done
Birth rate -

Death rate - Nigeria done
Death rate - North Macedonia done
Death rate - Northern Mariana Islands done
Death rate - Oman done
Death rate - Norway done
Death rate - Palau done
Death rate - Pakistan done
Death rate - Panama done
Death rate - Papua New Guinea done
Death rate - Paraguay done
Death rate - Peru done
Death rate - Philippines done
Death rate - Poland done
Death rate - Portugal done
Death rate - Puerto Rico done
Death rate - Qatar done
Death rate - Romania done
Death rate - Russia done
Death rate - Rwanda done
Death rate - Saint Helena, Ascension, and Tristan da Cunha done
Death rate - Saint Kitts and Nevis done
Death rate - Saint Lucia done
Death rate - Saint Pierre and Miquelon done
Death rate - Saint Vincent and the Grenadines done
Death rate - San Marino done
Death rate - Samoa done
Death rate - Sao Tome and Principe done
Death rate - Saudi Arabia done
Death rate - Serbia done
Death rate - Senegal done
Death rate - Seychelles done
Death rate - Sierra Leone done
Death rate -

Urbanization - Bosnia and Herzegovina done
Urbanization - Botswana done
Urbanization - Brazil done
Urbanization - British Virgin Islands done
Urbanization - Brunei done
Urbanization - Bulgaria done
Urbanization - Burkina Faso done
Urbanization - Burundi done
Urbanization - Burma done
Urbanization - Cabo Verde done
Urbanization - Cambodia done
Urbanization - Cameroon done
Urbanization - Canada done
Urbanization - Cayman Islands done
Urbanization - Central African Republic done
Urbanization - Chad done
Urbanization - Chile done
Urbanization - China done
Urbanization - Colombia done
Urbanization - Comoros done
Urbanization - Congo, Democratic Republic of the done
Urbanization - Congo, Republic of the done
Urbanization - Cook Islands done
Urbanization - Costa Rica done
Urbanization - Cote d'Ivoire done
Urbanization - Croatia done
Urbanization - Cuba done
Urbanization - Curacao done
Urbanization - Cyprus done
Urbanization - Denmark done
Urbanization - Czechia done
Urbanization - Djibouti do

Sex ratio - Montserrat done
Sex ratio - Morocco done
Sex ratio - Mozambique done
Sex ratio - Namibia done
Sex ratio - Nauru done
Sex ratio - Nepal done
Sex ratio - Netherlands done
Sex ratio - New Caledonia done
Sex ratio - New Zealand done
Sex ratio - Nicaragua done
Sex ratio - Niger done
Sex ratio - Nigeria done
Sex ratio - North Macedonia done
Sex ratio - Northern Mariana Islands done
Sex ratio - Oman done
Sex ratio - Norway done
Sex ratio - Palau done
Sex ratio - Pakistan done
Sex ratio - Panama done
Sex ratio - Papua New Guinea done
Sex ratio - Paraguay done
Sex ratio - Peru done
Sex ratio - Philippines done
Sex ratio - Poland done
Sex ratio - Portugal done
Sex ratio - Puerto Rico done
Sex ratio - Qatar done
Sex ratio - Romania done
Sex ratio - Russia done
Sex ratio - Saint Barthelemy done
Sex ratio - Rwanda done
Sex ratio - Saint Helena, Ascension, and Tristan da Cunha done
Sex ratio - Saint Kitts and Nevis done
Sex ratio - Saint Lucia done
Sex ratio - Saint Martin done
Sex ratio

Mother's mean age at first birth - Yemen done
Mother's mean age at first birth - Zimbabwe done
Infant_mortality_rate
Infant mortality rate - Albania done
Infant mortality rate - Afghanistan done
Infant mortality rate - Algeria done
Infant mortality rate - American Samoa done
Infant mortality rate - Andorra done
Infant mortality rate - Angola done
Infant mortality rate - Anguilla done
Infant mortality rate - Antigua and Barbuda done
Infant mortality rate - Argentina done
Infant mortality rate - Armenia done
Infant mortality rate - Aruba done
Infant mortality rate - Australia done
Infant mortality rate - Azerbaijan done
Infant mortality rate - Bahamas, The done
Infant mortality rate - Austria done
Infant mortality rate - Bahrain done
Infant mortality rate - Bangladesh done
Infant mortality rate - Barbados done
Infant mortality rate - Belarus done
Infant mortality rate - Belgium done
Infant mortality rate - Belize done
Infant mortality rate - Benin done
Infant mortality rate - Bermuda don

Life expectancy at birth - Afghanistan done
Life expectancy at birth - Algeria done
Life expectancy at birth - American Samoa done
Life expectancy at birth - Andorra done
Life expectancy at birth - Angola done
Life expectancy at birth - Anguilla done
Life expectancy at birth - Antigua and Barbuda done
Life expectancy at birth - Argentina done
Life expectancy at birth - Armenia done
Life expectancy at birth - Aruba done
Life expectancy at birth - Australia done
Life expectancy at birth - Azerbaijan done
Life expectancy at birth - Bahamas, The done
Life expectancy at birth - Austria done
Life expectancy at birth - Bahrain done
Life expectancy at birth - Bangladesh done
Life expectancy at birth - Barbados done
Life expectancy at birth - Belarus done
Life expectancy at birth - Belgium done
Life expectancy at birth - Belize done
Life expectancy at birth - Benin done
Life expectancy at birth - Bermuda done
Life expectancy at birth - Bhutan done
Life expectancy at birth - Bolivia done
Life ex

Total fertility rate - Afghanistan done
Total fertility rate - American Samoa done
Total fertility rate - Andorra done
Total fertility rate - Angola done
Total fertility rate - Anguilla done
Total fertility rate - Antigua and Barbuda done
Total fertility rate - Argentina done
Total fertility rate - Armenia done
Total fertility rate - Aruba done
Total fertility rate - Australia done
Total fertility rate - Azerbaijan done
Total fertility rate - Bahamas, The done
Total fertility rate - Austria done
Total fertility rate - Bahrain done
Total fertility rate - Bangladesh done
Total fertility rate - Barbados done
Total fertility rate - Belarus done
Total fertility rate - Belgium done
Total fertility rate - Belize done
Total fertility rate - Benin done
Total fertility rate - Bermuda done
Total fertility rate - Bhutan done
Total fertility rate - Bolivia done
Total fertility rate - Bosnia and Herzegovina done
Total fertility rate - Botswana done
Total fertility rate - British Virgin Islands done


Contraceptive prevalence rate - Honduras done
Contraceptive prevalence rate - Hong Kong done
Contraceptive prevalence rate - Hungary done
Contraceptive prevalence rate - India done
Contraceptive prevalence rate - Iran done
Contraceptive prevalence rate - Indonesia done
Contraceptive prevalence rate - Iraq done
Contraceptive prevalence rate - Ireland done
Contraceptive prevalence rate - Italy done
Contraceptive prevalence rate - Jamaica done
Contraceptive prevalence rate - Japan done
Contraceptive prevalence rate - Jordan done
Contraceptive prevalence rate - Kazakhstan done
Contraceptive prevalence rate - Kiribati done
Contraceptive prevalence rate - Kenya done
Contraceptive prevalence rate - Korea, South done
Contraceptive prevalence rate - Korea, North done
Contraceptive prevalence rate - Kyrgyzstan done
Contraceptive prevalence rate - Laos done
Contraceptive prevalence rate - Lebanon done
Contraceptive prevalence rate - Lesotho done
Contraceptive prevalence rate - Liberia done
Contra

Health expenditures - Portugal done
Health expenditures - Qatar done
Health expenditures - Romania done
Health expenditures - Russia done
Health expenditures - Rwanda done
Health expenditures - Saint Kitts and Nevis done
Health expenditures - Saint Lucia done
Health expenditures - Saint Vincent and the Grenadines done
Health expenditures - San Marino done
Health expenditures - Samoa done
Health expenditures - Sao Tome and Principe done
Health expenditures - Saudi Arabia done
Health expenditures - Serbia done
Health expenditures - Senegal done
Health expenditures - Seychelles done
Health expenditures - Sierra Leone done
Health expenditures - Singapore done
Health expenditures - Slovakia done
Health expenditures - Slovenia done
Health expenditures - Solomon Islands done
Health expenditures - South Africa done
Health expenditures - South Sudan done
Health expenditures - Sri Lanka done
Health expenditures - Spain done
Health expenditures - Sudan done
Health expenditures - Suriname done
Hea

Hospital bed density - Guyana done
Hospital bed density - Haiti done
Hospital bed density - Honduras done
Hospital bed density - Hong Kong done
Hospital bed density - Hungary done
Hospital bed density - Iceland done
Hospital bed density - India done
Hospital bed density - Iran done
Hospital bed density - Indonesia done
Hospital bed density - Iraq done
Hospital bed density - Ireland done
Hospital bed density - Israel done
Hospital bed density - Italy done
Hospital bed density - Jamaica done
Hospital bed density - Japan done
Hospital bed density - Jordan done
Hospital bed density - Kazakhstan done
Hospital bed density - Kiribati done
Hospital bed density - Kenya done
Hospital bed density - Korea, South done
Hospital bed density - Korea, North done
Hospital bed density - Kuwait done
Hospital bed density - Kyrgyzstan done
Hospital bed density - Laos done
Hospital bed density - Latvia done
Hospital bed density - Lebanon done
Hospital bed density - Liberia done
Hospital bed density - Libya d

Drinking water source - Niger done
Drinking water source - Nigeria done
Drinking water source - Niue done
Drinking water source - North Macedonia done
Drinking water source - Northern Mariana Islands done
Drinking water source - Norway done
Drinking water source - Oman done
Drinking water source - Palau done
Drinking water source - Pakistan done
Drinking water source - Panama done
Drinking water source - Papua New Guinea done
Drinking water source - Paraguay done
Drinking water source - Peru done
Drinking water source - Philippines done
Drinking water source - Poland done
Drinking water source - Portugal done
Drinking water source - Puerto Rico done
Drinking water source - Qatar done
Drinking water source - Romania done
Drinking water source - Russia done
Drinking water source - Rwanda done
Drinking water source - Saint Kitts and Nevis done
Drinking water source - Saint Lucia done
Drinking water source - Saint Vincent and the Grenadines done
Drinking water source - Samoa done
Drinking 

Sanitation facility access - Serbia done
Sanitation facility access - Senegal done
Sanitation facility access - Seychelles done
Sanitation facility access - Sierra Leone done
Sanitation facility access - Singapore done
Sanitation facility access - Slovakia done
Sanitation facility access - Slovenia done
Sanitation facility access - Solomon Islands done
Sanitation facility access - Somalia done
Sanitation facility access - South Africa done
Sanitation facility access - South Sudan done
Sanitation facility access - Sri Lanka done
Sanitation facility access - Spain done
Sanitation facility access - Sudan done
Sanitation facility access - Suriname done
Sanitation facility access - Sweden done
Sanitation facility access - Switzerland done
Sanitation facility access - Syria done
Sanitation facility access - Tajikistan done
Sanitation facility access - Timor-Leste done
Sanitation facility access - Tanzania done
Sanitation facility access - Togo done
Sanitation facility access - Thailand done


Obesity - adult prevalence rate - Eritrea done
Obesity - adult prevalence rate - Estonia done
Obesity - adult prevalence rate - Eswatini done
Obesity - adult prevalence rate - Ethiopia done
Obesity - adult prevalence rate - Fiji done
Obesity - adult prevalence rate - Finland done
Obesity - adult prevalence rate - France done
Obesity - adult prevalence rate - Gabon done
Obesity - adult prevalence rate - Gambia, The done
Obesity - adult prevalence rate - Georgia done
Obesity - adult prevalence rate - Germany done
Obesity - adult prevalence rate - Ghana done
Obesity - adult prevalence rate - Greece done
Obesity - adult prevalence rate - Grenada done
Obesity - adult prevalence rate - Guatemala done
Obesity - adult prevalence rate - Guinea done
Obesity - adult prevalence rate - Guinea-Bissau done
Obesity - adult prevalence rate - Guyana done
Obesity - adult prevalence rate - Haiti done
Obesity - adult prevalence rate - Honduras done
Obesity - adult prevalence rate - Hungary done
Obesity - a

Children under the age of 5 years underweight - Tonga done
Children under the age of 5 years underweight - Trinidad and Tobago done
Children under the age of 5 years underweight - Tunisia done
Children under the age of 5 years underweight - Turkey done
Children under the age of 5 years underweight - Turkmenistan done
Children under the age of 5 years underweight - Uganda done
Children under the age of 5 years underweight - Uruguay done
Children under the age of 5 years underweight - United States done
Children under the age of 5 years underweight - Vanuatu done
Children under the age of 5 years underweight - Venezuela done
Children under the age of 5 years underweight - Vietnam done
Children under the age of 5 years underweight - West Bank done
Children under the age of 5 years underweight - Zambia done
Children under the age of 5 years underweight - Yemen done
Children under the age of 5 years underweight - Zimbabwe done
Airports
Airports - Akrotiri done
Airports - Albania done
Airpor

Airports - with paved runways - Grenada done
Airports - with paved runways - Guam done
Airports - with paved runways - Guatemala done
Airports - with paved runways - Guernsey done
Airports - with paved runways - Guinea done
Airports - with paved runways - Guinea-Bissau done
Airports - with paved runways - Guyana done
Airports - with paved runways - Haiti done
Airports - with paved runways - Honduras done
Airports - with paved runways - Hong Kong done
Airports - with paved runways - Hungary done
Airports - with paved runways - Iceland done
Airports - with paved runways - India done
Airports - with paved runways - Iran done
Airports - with paved runways - Indonesia done
Airports - with paved runways - Iraq done
Airports - with paved runways - Ireland done
Airports - with paved runways - Isle of Man done
Airports - with paved runways - Israel done
Airports - with paved runways - Jamaica done
Airports - with paved runways - Italy done
Airports - with paved runways - Japan done
Airports - w

Literacy - Lithuania done
Literacy - Macau done
Literacy - Madagascar done
Literacy - Malawi done
Literacy - Malaysia done
Literacy - Maldives done
Literacy - Malta done
Literacy - Mali done
Literacy - Marshall Islands done
Literacy - Mauritania done
Literacy - Mauritius done
Literacy - Mexico done
Literacy - Moldova done
Literacy - Mongolia done
Literacy - Montenegro done
Literacy - Morocco done
Literacy - Mozambique done
Literacy - Namibia done
Literacy - Nepal done
Literacy - New Caledonia done
Literacy - Nicaragua done
Literacy - Niger done
Literacy - Nigeria done
Literacy - North Macedonia done
Literacy - Oman done
Literacy - Palau done
Literacy - Pakistan done
Literacy - Panama done
Literacy - Papua New Guinea done
Literacy - Paraguay done
Literacy - Philippines done
Literacy - Peru done
Literacy - Poland done
Literacy - Portugal done
Literacy - Puerto Rico done
Literacy - Qatar done
Literacy - Romania done
Literacy - Russia done
Literacy - Rwanda done
Literacy - Samoa done
Liter

Education expenditures - Turkmenistan done
Education expenditures - Turks and Caicos Islands done
Education expenditures - Uganda done
Education expenditures - Ukraine done
Education expenditures - United Kingdom done
Education expenditures - Uzbekistan done
Education expenditures - Uruguay done
Education expenditures - United States done
Education expenditures - Vanuatu done
Education expenditures - Venezuela done
Education expenditures - Vietnam done
Education expenditures - West Bank done
Education expenditures - Zimbabwe done
School_life_expectancy_primary_to_tertiary_education
School life expectancy (primary to tertiary education) - Albania done
School life expectancy (primary to tertiary education) - Algeria done
School life expectancy (primary to tertiary education) - Afghanistan done
School life expectancy (primary to tertiary education) - Angola done
School life expectancy (primary to tertiary education) - Antigua and Barbuda done
School life expectancy (primary to tertiary ed

GDP (purchasing power parity) - China done
GDP (purchasing power parity) - Colombia done
GDP (purchasing power parity) - Comoros done
GDP (purchasing power parity) - Congo, Democratic Republic of the done
GDP (purchasing power parity) - Congo, Republic of the done
GDP (purchasing power parity) - Cook Islands done
GDP (purchasing power parity) - Costa Rica done
GDP (purchasing power parity) - Cote d'Ivoire done
GDP (purchasing power parity) - Croatia done
GDP (purchasing power parity) - Cuba done
GDP (purchasing power parity) - Curacao done
GDP (purchasing power parity) - Cyprus done
GDP (purchasing power parity) - Denmark done
GDP (purchasing power parity) - Czechia done
GDP (purchasing power parity) - Djibouti done
GDP (purchasing power parity) - Dominica done
GDP (purchasing power parity) - El Salvador done
GDP (purchasing power parity) - Dominican Republic done
GDP (purchasing power parity) - Ecuador done
GDP (purchasing power parity) - Egypt done
GDP (purchasing power parity) - Equ

GDP (official exchange rate) - El Salvador done
GDP (official exchange rate) - Dominican Republic done
GDP (official exchange rate) - Ecuador done
GDP (official exchange rate) - Egypt done
GDP (official exchange rate) - Equatorial Guinea done
GDP (official exchange rate) - Eritrea done
GDP (official exchange rate) - Estonia done
GDP (official exchange rate) - Eswatini done
GDP (official exchange rate) - Ethiopia done
GDP (official exchange rate) - Falkland Islands (Islas Malvinas) done
GDP (official exchange rate) - Faroe Islands done
GDP (official exchange rate) - Fiji done
GDP (official exchange rate) - Finland done
GDP (official exchange rate) - French Polynesia done
GDP (official exchange rate) - France done
GDP (official exchange rate) - Gabon done
GDP (official exchange rate) - Gambia, The done
GDP (official exchange rate) - Gaza Strip done
GDP (official exchange rate) - Georgia done
GDP (official exchange rate) - Germany done
GDP (official exchange rate) - Ghana done
GDP (offici

GDP - real growth rate - Germany done
GDP - real growth rate - Ghana done
GDP - real growth rate - Greece done
GDP - real growth rate - Greenland done
GDP - real growth rate - Grenada done
GDP - real growth rate - Guam done
GDP - real growth rate - Guatemala done
GDP - real growth rate - Guernsey done
GDP - real growth rate - Guinea done
GDP - real growth rate - Guinea-Bissau done
GDP - real growth rate - Guyana done
GDP - real growth rate - Haiti done
GDP - real growth rate - Honduras done
GDP - real growth rate - Hong Kong done
GDP - real growth rate - Hungary done
GDP - real growth rate - Iceland done
GDP - real growth rate - India done
GDP - real growth rate - Iran done
GDP - real growth rate - Indonesia done
GDP - real growth rate - Iraq done
GDP - real growth rate - Ireland done
GDP - real growth rate - Isle of Man done
GDP - real growth rate - Israel done
GDP - real growth rate - Italy done
GDP - real growth rate - Jamaica done
GDP - real growth rate - Japan done
GDP - real grow

Gross national saving - Iceland done
Gross national saving - India done
Gross national saving - Iran done
Gross national saving - Indonesia done
Gross national saving - Iraq done
Gross national saving - Ireland done
Gross national saving - Israel done
Gross national saving - Italy done
Gross national saving - Jamaica done
Gross national saving - Japan done
Gross national saving - Jordan done
Gross national saving - Kazakhstan done
Gross national saving - Kenya done
Gross national saving - Korea, South done
Gross national saving - Kosovo done
Gross national saving - Kuwait done
Gross national saving - Kyrgyzstan done
Gross national saving - Laos done
Gross national saving - Latvia done
Gross national saving - Lebanon done
Gross national saving - Lesotho done
Gross national saving - Libya done
Gross national saving - Lithuania done
Gross national saving - Luxembourg done
Gross national saving - Madagascar done
Gross national saving - Malawi done
Gross national saving - Malaysia done
Gros

GDP - composition, by end use - Montenegro done
GDP - composition, by end use - Montserrat done
GDP - composition, by end use - Morocco done
GDP - composition, by end use - Mozambique done
GDP - composition, by end use - Namibia done
GDP - composition, by end use - Nauru done
GDP - composition, by end use - Nepal done
GDP - composition, by end use - Netherlands done
GDP - composition, by end use - New Caledonia done
GDP - composition, by end use - New Zealand done
GDP - composition, by end use - Nicaragua done
GDP - composition, by end use - Niger done
GDP - composition, by end use - Nigeria done
GDP - composition, by end use - North Macedonia done
GDP - composition, by end use - Northern Mariana Islands done
GDP - composition, by end use - Oman done
GDP - composition, by end use - Norway done
GDP - composition, by end use - Palau done
GDP - composition, by end use - Pakistan done
GDP - composition, by end use - Panama done
GDP - composition, by end use - Papua New Guinea done
GDP - co

GDP - composition, by sector of origin - Montserrat done
GDP - composition, by sector of origin - Morocco done
GDP - composition, by sector of origin - Mozambique done
GDP - composition, by sector of origin - Namibia done
GDP - composition, by sector of origin - Nauru done
GDP - composition, by sector of origin - Nepal done
GDP - composition, by sector of origin - Netherlands done
GDP - composition, by sector of origin - New Caledonia done
GDP - composition, by sector of origin - New Zealand done
GDP - composition, by sector of origin - Nicaragua done
GDP - composition, by sector of origin - Niger done
GDP - composition, by sector of origin - Nigeria done
GDP - composition, by sector of origin - Niue done
GDP - composition, by sector of origin - North Macedonia done
GDP - composition, by sector of origin - Northern Mariana Islands done
GDP - composition, by sector of origin - Oman done
GDP - composition, by sector of origin - Norway done
GDP - composition, by sector of origin - Palau d

Industrial production growth rate - Lebanon done
Industrial production growth rate - Lesotho done
Industrial production growth rate - Liberia done
Industrial production growth rate - Libya done
Industrial production growth rate - Lithuania done
Industrial production growth rate - Macau done
Industrial production growth rate - Luxembourg done
Industrial production growth rate - Madagascar done
Industrial production growth rate - Malawi done
Industrial production growth rate - Malaysia done
Industrial production growth rate - Maldives done
Industrial production growth rate - Malta done
Industrial production growth rate - Mali done
Industrial production growth rate - Mauritania done
Industrial production growth rate - Mauritius done
Industrial production growth rate - Mexico done
Industrial production growth rate - Monaco done
Industrial production growth rate - Moldova done
Industrial production growth rate - Mongolia done
Industrial production growth rate - Montenegro done
Industrial pr

Labor force - Madagascar done
Labor force - Malawi done
Labor force - Malaysia done
Labor force - Maldives done
Labor force - Malta done
Labor force - Mali done
Labor force - Marshall Islands done
Labor force - Mauritania done
Labor force - Mauritius done
Labor force - Mexico done
Labor force - Micronesia, Federated States of done
Labor force - Monaco done
Labor force - Moldova done
Labor force - Mongolia done
Labor force - Montenegro done
Labor force - Montserrat done
Labor force - Morocco done
Labor force - Mozambique done
Labor force - Namibia done
Labor force - Nepal done
Labor force - Netherlands done
Labor force - New Caledonia done
Labor force - New Zealand done
Labor force - Nicaragua done
Labor force - Niger done
Labor force - Nigeria done
Labor force - Norfolk Island done
Labor force - Niue done
Labor force - North Macedonia done
Labor force - Northern Mariana Islands done
Labor force - Oman done
Labor force - Norway done
Labor force - Palau done
Labor force - Pakistan done
L

Labor force - by occupation - Venezuela done
Labor force - by occupation - Vietnam done
Labor force - by occupation - Virgin Islands done
Labor force - by occupation - Wallis and Futuna done
Labor force - by occupation - Western Sahara done
Labor force - by occupation - West Bank done
Labor force - by occupation - Zambia done
Labor force - by occupation - Zimbabwe done
Unemployment_rate
Unemployment rate - Albania done
Unemployment rate - Algeria done
Unemployment rate - Afghanistan done
Unemployment rate - American Samoa done
Unemployment rate - Andorra done
Unemployment rate - Angola done
Unemployment rate - Anguilla done
Unemployment rate - Antigua and Barbuda done
Unemployment rate - Argentina done
Unemployment rate - Armenia done
Unemployment rate - Aruba done
Unemployment rate - Australia done
Unemployment rate - Azerbaijan done
Unemployment rate - Bahamas, The done
Unemployment rate - Bahrain done
Unemployment rate - Austria done
Unemployment rate - Bangladesh done
Unemployment 

Population below poverty line - Benin done
Population below poverty line - Bhutan done
Population below poverty line - Bermuda done
Population below poverty line - Bolivia done
Population below poverty line - Bosnia and Herzegovina done
Population below poverty line - Botswana done
Population below poverty line - Brazil done
Population below poverty line - Bulgaria done
Population below poverty line - Burkina Faso done
Population below poverty line - Burundi done
Population below poverty line - Burma done
Population below poverty line - Cabo Verde done
Population below poverty line - Cambodia done
Population below poverty line - Cameroon done
Population below poverty line - Canada done
Population below poverty line - Central African Republic done
Population below poverty line - Chad done
Population below poverty line - Chile done
Population below poverty line - China done
Population below poverty line - Colombia done
Population below poverty line - Comoros done
Population below poverty

Household income or consumption by percentage share - Malawi done
Household income or consumption by percentage share - Malaysia done
Household income or consumption by percentage share - Maldives done
Household income or consumption by percentage share - Mali done
Household income or consumption by percentage share - Mauritania done
Household income or consumption by percentage share - Mexico done
Household income or consumption by percentage share - Moldova done
Household income or consumption by percentage share - Mongolia done
Household income or consumption by percentage share - Montenegro done
Household income or consumption by percentage share - Morocco done
Household income or consumption by percentage share - Mozambique done
Household income or consumption by percentage share - Namibia done
Household income or consumption by percentage share - Nepal done
Household income or consumption by percentage share - Netherlands done
Household income or consumption by percentage share -

Distribution of family income - Gini index - Slovakia done
Distribution of family income - Gini index - Slovenia done
Distribution of family income - Gini index - South Africa done
Distribution of family income - Gini index - South Sudan done
Distribution of family income - Gini index - Sri Lanka done
Distribution of family income - Gini index - Spain done
Distribution of family income - Gini index - Sweden done
Distribution of family income - Gini index - Switzerland done
Distribution of family income - Gini index - Taiwan done
Distribution of family income - Gini index - Tajikistan done
Distribution of family income - Gini index - Timor-Leste done
Distribution of family income - Gini index - Tanzania done
Distribution of family income - Gini index - Togo done
Distribution of family income - Gini index - Thailand done
Distribution of family income - Gini index - Tunisia done
Distribution of family income - Gini index - Turkey done
Distribution of family income - Gini index - Turkmenis

Taxes and other revenues - Japan done
Taxes and other revenues - Jersey done
Taxes and other revenues - Jordan done
Taxes and other revenues - Kazakhstan done
Taxes and other revenues - Kiribati done
Taxes and other revenues - Kenya done
Taxes and other revenues - Korea, South done
Taxes and other revenues - Korea, North done
Taxes and other revenues - Kosovo done
Taxes and other revenues - Kuwait done
Taxes and other revenues - Kyrgyzstan done
Taxes and other revenues - Laos done
Taxes and other revenues - Latvia done
Taxes and other revenues - Lebanon done
Taxes and other revenues - Lesotho done
Taxes and other revenues - Liberia done
Taxes and other revenues - Liechtenstein done
Taxes and other revenues - Libya done
Taxes and other revenues - Lithuania done
Taxes and other revenues - Macau done
Taxes and other revenues - Luxembourg done
Taxes and other revenues - Madagascar done
Taxes and other revenues - Malawi done
Taxes and other revenues - Malaysia done
Taxes and other revenues 

Budget surplus (+) or deficit (-) - Korea, South done
Budget surplus (+) or deficit (-) - Korea, North done
Budget surplus (+) or deficit (-) - Kosovo done
Budget surplus (+) or deficit (-) - Kuwait done
Budget surplus (+) or deficit (-) - Kyrgyzstan done
Budget surplus (+) or deficit (-) - Laos done
Budget surplus (+) or deficit (-) - Latvia done
Budget surplus (+) or deficit (-) - Lebanon done
Budget surplus (+) or deficit (-) - Lesotho done
Budget surplus (+) or deficit (-) - Liberia done
Budget surplus (+) or deficit (-) - Liechtenstein done
Budget surplus (+) or deficit (-) - Libya done
Budget surplus (+) or deficit (-) - Lithuania done
Budget surplus (+) or deficit (-) - Macau done
Budget surplus (+) or deficit (-) - Luxembourg done
Budget surplus (+) or deficit (-) - Madagascar done
Budget surplus (+) or deficit (-) - Malawi done
Budget surplus (+) or deficit (-) - Malaysia done
Budget surplus (+) or deficit (-) - Maldives done
Budget surplus (+) or deficit (-) - Malta done
Budg

Public debt - Italy done
Public debt - Jamaica done
Public debt - Japan done
Public debt - Jordan done
Public debt - Kazakhstan done
Public debt - Kiribati done
Public debt - Kenya done
Public debt - Korea, South done
Public debt - Kosovo done
Public debt - Kuwait done
Public debt - Kyrgyzstan done
Public debt - Laos done
Public debt - Latvia done
Public debt - Lebanon done
Public debt - Lesotho done
Public debt - Liberia done
Public debt - Libya done
Public debt - Lithuania done
Public debt - Macau done
Public debt - Luxembourg done
Public debt - Madagascar done
Public debt - Malawi done
Public debt - Malaysia done
Public debt - Maldives done
Public debt - Malta done
Public debt - Mali done
Public debt - Marshall Islands done
Public debt - Mauritania done
Public debt - Mauritius done
Public debt - Mexico done
Public debt - Micronesia, Federated States of done
Public debt - Moldova done
Public debt - Mongolia done
Public debt - Montenegro done
Public debt - Morocco done
Public debt - M

Inflation rate (consumer prices) - Malawi done
Inflation rate (consumer prices) - Malaysia done
Inflation rate (consumer prices) - Maldives done
Inflation rate (consumer prices) - Malta done
Inflation rate (consumer prices) - Mali done
Inflation rate (consumer prices) - Marshall Islands done
Inflation rate (consumer prices) - Mauritania done
Inflation rate (consumer prices) - Mauritius done
Inflation rate (consumer prices) - Mexico done
Inflation rate (consumer prices) - Micronesia, Federated States of done
Inflation rate (consumer prices) - Monaco done
Inflation rate (consumer prices) - Moldova done
Inflation rate (consumer prices) - Mongolia done
Inflation rate (consumer prices) - Montenegro done
Inflation rate (consumer prices) - Montserrat done
Inflation rate (consumer prices) - Morocco done
Inflation rate (consumer prices) - Mozambique done
Inflation rate (consumer prices) - Namibia done
Inflation rate (consumer prices) - Nauru done
Inflation rate (consumer prices) - Nepal done
In

Commercial bank prime lending rate - Mongolia done
Commercial bank prime lending rate - Montenegro done
Commercial bank prime lending rate - Montserrat done
Commercial bank prime lending rate - Morocco done
Commercial bank prime lending rate - Mozambique done
Commercial bank prime lending rate - Namibia done
Commercial bank prime lending rate - Nepal done
Commercial bank prime lending rate - Netherlands done
Commercial bank prime lending rate - New Zealand done
Commercial bank prime lending rate - Nicaragua done
Commercial bank prime lending rate - Niger done
Commercial bank prime lending rate - Nigeria done
Commercial bank prime lending rate - North Macedonia done
Commercial bank prime lending rate - Oman done
Commercial bank prime lending rate - Norway done
Commercial bank prime lending rate - Pakistan done
Commercial bank prime lending rate - Panama done
Commercial bank prime lending rate - Papua New Guinea done
Commercial bank prime lending rate - Paraguay done
Commercial bank prim

Stock of broad money - Anguilla done
Stock of broad money - Antigua and Barbuda done
Stock of broad money - Argentina done
Stock of broad money - Armenia done
Stock of broad money - Aruba done
Stock of broad money - Australia done
Stock of broad money - Azerbaijan done
Stock of broad money - Bahamas, The done
Stock of broad money - Bahrain done
Stock of broad money - Austria done
Stock of broad money - Bangladesh done
Stock of broad money - Barbados done
Stock of broad money - Belarus done
Stock of broad money - Belgium done
Stock of broad money - Belize done
Stock of broad money - Benin done
Stock of broad money - Bhutan done
Stock of broad money - Bermuda done
Stock of broad money - Bolivia done
Stock of broad money - Bosnia and Herzegovina done
Stock of broad money - Botswana done
Stock of broad money - Brazil done
Stock of broad money - Brunei done
Stock of broad money - Bulgaria done
Stock of broad money - Burkina Faso done
Stock of broad money - Burundi done
Stock of broad money 

Stock of domestic credit - Ireland done
Stock of domestic credit - Israel done
Stock of domestic credit - Italy done
Stock of domestic credit - Jamaica done
Stock of domestic credit - Japan done
Stock of domestic credit - Jordan done
Stock of domestic credit - Kazakhstan done
Stock of domestic credit - Kenya done
Stock of domestic credit - Korea, South done
Stock of domestic credit - Kosovo done
Stock of domestic credit - Kuwait done
Stock of domestic credit - Kyrgyzstan done
Stock of domestic credit - Laos done
Stock of domestic credit - Latvia done
Stock of domestic credit - Lebanon done
Stock of domestic credit - Lesotho done
Stock of domestic credit - Liberia done
Stock of domestic credit - Libya done
Stock of domestic credit - Lithuania done
Stock of domestic credit - Macau done
Stock of domestic credit - Luxembourg done
Stock of domestic credit - Madagascar done
Stock of domestic credit - Malawi done
Stock of domestic credit - Malaysia done
Stock of domestic credit - Maldives don

Current account balance - Ukraine done
Current account balance - Uganda done
Current account balance - Uzbekistan done
Current account balance - Uruguay done
Current account balance - United Kingdom done
Current account balance - United States done
Current account balance - Vanuatu done
Current account balance - Venezuela done
Current account balance - Vietnam done
Current account balance - West Bank done
Current account balance - Zambia done
Current account balance - Yemen done
Current account balance - Zimbabwe done
Exports
Exports - Albania done
Exports - Algeria done
Exports - Afghanistan done
Exports - American Samoa done
Exports - Andorra done
Exports - Angola done
Exports - Anguilla done
Exports - Antigua and Barbuda done
Exports - Argentina done
Exports - Armenia done
Exports - Aruba done
Exports - Australia done
Exports - Azerbaijan done
Exports - Bahamas, The done
Exports - Bahrain done
Exports - Austria done
Exports - Bangladesh done
Exports - Barbados done
Exports - Belarus

Imports - Guinea-Bissau done
Imports - Guyana done
Imports - Haiti done
Imports - Honduras done
Imports - Hong Kong done
Imports - Hungary done
Imports - Iceland done
Imports - India done
Imports - Iran done
Imports - Indonesia done
Imports - Iraq done
Imports - Ireland done
Imports - Israel done
Imports - Jamaica done
Imports - Italy done
Imports - Japan done
Imports - Jordan done
Imports - Kazakhstan done
Imports - Kiribati done
Imports - Kenya done
Imports - Korea, South done
Imports - Korea, North done
Imports - Kosovo done
Imports - Kuwait done
Imports - Kyrgyzstan done
Imports - Laos done
Imports - Latvia done
Imports - Lebanon done
Imports - Lesotho done
Imports - Liberia done
Imports - Liechtenstein done
Imports - Libya done
Imports - Lithuania done
Imports - Macau done
Imports - Luxembourg done
Imports - Madagascar done
Imports - Malawi done
Imports - Malaysia done
Imports - Maldives done
Imports - Malta done
Imports - Mali done
Imports - Marshall Islands done
Imports - Maurit

Reserves of foreign exchange and gold - Samoa done
Reserves of foreign exchange and gold - Saudi Arabia done
Reserves of foreign exchange and gold - Serbia done
Reserves of foreign exchange and gold - Senegal done
Reserves of foreign exchange and gold - Seychelles done
Reserves of foreign exchange and gold - Sierra Leone done
Reserves of foreign exchange and gold - Singapore done
Reserves of foreign exchange and gold - Slovakia done
Reserves of foreign exchange and gold - Slovenia done
Reserves of foreign exchange and gold - Solomon Islands done
Reserves of foreign exchange and gold - Somalia done
Reserves of foreign exchange and gold - South Africa done
Reserves of foreign exchange and gold - South Sudan done
Reserves of foreign exchange and gold - Sri Lanka done
Reserves of foreign exchange and gold - Spain done
Reserves of foreign exchange and gold - Sudan done
Reserves of foreign exchange and gold - Suriname done
Reserves of foreign exchange and gold - Sweden done
Reserves of forei

Electricity access - Benin done
Electricity access - Bhutan done
Electricity access - Bermuda done
Electricity access - Bolivia done
Electricity access - Bosnia and Herzegovina done
Electricity access - Botswana done
Electricity access - Brazil done
Electricity access - Brunei done
Electricity access - Bulgaria done
Electricity access - Burkina Faso done
Electricity access - Burundi done
Electricity access - Burma done
Electricity access - Cabo Verde done
Electricity access - Cambodia done
Electricity access - Cameroon done
Electricity access - Canada done
Electricity access - Cayman Islands done
Electricity access - Central African Republic done
Electricity access - Chad done
Electricity access - Chile done
Electricity access - China done
Electricity access - Colombia done
Electricity access - Comoros done
Electricity access - Congo, Democratic Republic of the done
Electricity access - Congo, Republic of the done
Electricity access - Costa Rica done
Electricity access - Cote d'Ivoire 

Electricity - production - Iraq done
Electricity - production - Ireland done
Electricity - production - Israel done
Electricity - production - Jamaica done
Electricity - production - Italy done
Electricity - production - Japan done
Electricity - production - Jordan done
Electricity - production - Kazakhstan done
Electricity - production - Kiribati done
Electricity - production - Kenya done
Electricity - production - Korea, South done
Electricity - production - Korea, North done
Electricity - production - Kosovo done
Electricity - production - Kuwait done
Electricity - production - Kyrgyzstan done
Electricity - production - Laos done
Electricity - production - Latvia done
Electricity - production - Lebanon done
Electricity - production - Lesotho done
Electricity - production - Liberia done
Electricity - production - Liechtenstein done
Electricity - production - Libya done
Electricity - production - Lithuania done
Electricity - production - Macau done
Electricity - production - Luxembour

Electricity - exports - Greece done
Electricity - exports - Greenland done
Electricity - exports - Grenada done
Electricity - exports - Guam done
Electricity - exports - Guatemala done
Electricity - exports - Guinea done
Electricity - exports - Guinea-Bissau done
Electricity - exports - Guyana done
Electricity - exports - Haiti done
Electricity - exports - Honduras done
Electricity - exports - Hong Kong done
Electricity - exports - Hungary done
Electricity - exports - Iceland done
Electricity - exports - India done
Electricity - exports - Iran done
Electricity - exports - Indonesia done
Electricity - exports - Iraq done
Electricity - exports - Ireland done
Electricity - exports - Israel done
Electricity - exports - Jamaica done
Electricity - exports - Italy done
Electricity - exports - Japan done
Electricity - exports - Jordan done
Electricity - exports - Kazakhstan done
Electricity - exports - Kiribati done
Electricity - exports - Kenya done
Electricity - exports - Korea, South done
E

Electricity - imports - Hungary done
Electricity - imports - Iceland done
Electricity - imports - India done
Electricity - imports - Iran done
Electricity - imports - Indonesia done
Electricity - imports - Iraq done
Electricity - imports - Ireland done
Electricity - imports - Israel done
Electricity - imports - Jamaica done
Electricity - imports - Italy done
Electricity - imports - Japan done
Electricity - imports - Jordan done
Electricity - imports - Kazakhstan done
Electricity - imports - Kiribati done
Electricity - imports - Kenya done
Electricity - imports - Korea, South done
Electricity - imports - Korea, North done
Electricity - imports - Kosovo done
Electricity - imports - Kuwait done
Electricity - imports - Kyrgyzstan done
Electricity - imports - Laos done
Electricity - imports - Latvia done
Electricity - imports - Lebanon done
Electricity - imports - Lesotho done
Electricity - imports - Liberia done
Electricity - imports - Liechtenstein done
Electricity - imports - Libya done

Electricity - installed generating capacity - Germany done
Electricity - installed generating capacity - Ghana done
Electricity - installed generating capacity - Gibraltar done
Electricity - installed generating capacity - Greece done
Electricity - installed generating capacity - Greenland done
Electricity - installed generating capacity - Grenada done
Electricity - installed generating capacity - Guam done
Electricity - installed generating capacity - Guatemala done
Electricity - installed generating capacity - Guinea done
Electricity - installed generating capacity - Guinea-Bissau done
Electricity - installed generating capacity - Guyana done
Electricity - installed generating capacity - Haiti done
Electricity - installed generating capacity - Honduras done
Electricity - installed generating capacity - Hong Kong done
Electricity - installed generating capacity - Hungary done
Electricity - installed generating capacity - Iceland done
Electricity - installed generating capacity - India

Electricity - from fossil fuels - Falkland Islands (Islas Malvinas) done
Electricity - from fossil fuels - Faroe Islands done
Electricity - from fossil fuels - Fiji done
Electricity - from fossil fuels - Finland done
Electricity - from fossil fuels - French Polynesia done
Electricity - from fossil fuels - Gabon done
Electricity - from fossil fuels - France done
Electricity - from fossil fuels - Gambia, The done
Electricity - from fossil fuels - Georgia done
Electricity - from fossil fuels - Germany done
Electricity - from fossil fuels - Ghana done
Electricity - from fossil fuels - Gibraltar done
Electricity - from fossil fuels - Greece done
Electricity - from fossil fuels - Greenland done
Electricity - from fossil fuels - Grenada done
Electricity - from fossil fuels - Guam done
Electricity - from fossil fuels - Guatemala done
Electricity - from fossil fuels - Guinea done
Electricity - from fossil fuels - Guinea-Bissau done
Electricity - from fossil fuels - Guyana done
Electricity - fro

Electricity - from hydroelectric plants - Comoros done
Electricity - from hydroelectric plants - Congo, Democratic Republic of the done
Electricity - from hydroelectric plants - Congo, Republic of the done
Electricity - from hydroelectric plants - Cook Islands done
Electricity - from hydroelectric plants - Costa Rica done
Electricity - from hydroelectric plants - Cote d'Ivoire done
Electricity - from hydroelectric plants - Croatia done
Electricity - from hydroelectric plants - Cuba done
Electricity - from hydroelectric plants - Cyprus done
Electricity - from hydroelectric plants - Denmark done
Electricity - from hydroelectric plants - Czechia done
Electricity - from hydroelectric plants - Djibouti done
Electricity - from hydroelectric plants - Dominica done
Electricity - from hydroelectric plants - El Salvador done
Electricity - from hydroelectric plants - Dominican Republic done
Electricity - from hydroelectric plants - Ecuador done
Electricity - from hydroelectric plants - Equatorial

Electricity - from other renewable sources - Burundi done
Electricity - from other renewable sources - Burma done
Electricity - from other renewable sources - Cabo Verde done
Electricity - from other renewable sources - Cambodia done
Electricity - from other renewable sources - Cameroon done
Electricity - from other renewable sources - Canada done
Electricity - from other renewable sources - Cayman Islands done
Electricity - from other renewable sources - Central African Republic done
Electricity - from other renewable sources - Chad done
Electricity - from other renewable sources - Chile done
Electricity - from other renewable sources - China done
Electricity - from other renewable sources - Colombia done
Electricity - from other renewable sources - Comoros done
Electricity - from other renewable sources - Congo, Democratic Republic of the done
Electricity - from other renewable sources - Congo, Republic of the done
Electricity - from other renewable sources - Cook Islands done
Electr

Crude oil - production - Chile done
Crude oil - production - China done
Crude oil - production - Colombia done
Crude oil - production - Comoros done
Crude oil - production - Congo, Democratic Republic of the done
Crude oil - production - Congo, Republic of the done
Crude oil - production - Cook Islands done
Crude oil - production - Costa Rica done
Crude oil - production - Cote d'Ivoire done
Crude oil - production - Croatia done
Crude oil - production - Cuba done
Crude oil - production - Curacao done
Crude oil - production - Cyprus done
Crude oil - production - Denmark done
Crude oil - production - Czechia done
Crude oil - production - Djibouti done
Crude oil - production - Dominica done
Crude oil - production - El Salvador done
Crude oil - production - Dominican Republic done
Crude oil - production - Ecuador done
Crude oil - production - Equatorial Guinea done
Crude oil - production - Egypt done
Crude oil - production - Eritrea done
Crude oil - production - Estonia done
Crude oil - pro

Crude oil - exports - Namibia done
Crude oil - exports - Nauru done
Crude oil - exports - Nepal done
Crude oil - exports - Netherlands done
Crude oil - exports - New Caledonia done
Crude oil - exports - New Zealand done
Crude oil - exports - Nicaragua done
Crude oil - exports - Niger done
Crude oil - exports - Nigeria done
Crude oil - exports - Niue done
Crude oil - exports - North Macedonia done
Crude oil - exports - Oman done
Crude oil - exports - Norway done
Crude oil - exports - Pakistan done
Crude oil - exports - Panama done
Crude oil - exports - Papua New Guinea done
Crude oil - exports - Paraguay done
Crude oil - exports - Philippines done
Crude oil - exports - Peru done
Crude oil - exports - Poland done
Crude oil - exports - Portugal done
Crude oil - exports - Puerto Rico done
Crude oil - exports - Qatar done
Crude oil - exports - Romania done
Crude oil - exports - Russia done
Crude oil - exports - Rwanda done
Crude oil - exports - Saint Helena, Ascension, and Tristan da Cunha 

Crude oil - proved reserves - Cambodia done
Crude oil - proved reserves - Cameroon done
Crude oil - proved reserves - Canada done
Crude oil - proved reserves - Cayman Islands done
Crude oil - proved reserves - Central African Republic done
Crude oil - proved reserves - Chad done
Crude oil - proved reserves - Chile done
Crude oil - proved reserves - China done
Crude oil - proved reserves - Colombia done
Crude oil - proved reserves - Comoros done
Crude oil - proved reserves - Congo, Democratic Republic of the done
Crude oil - proved reserves - Congo, Republic of the done
Crude oil - proved reserves - Cook Islands done
Crude oil - proved reserves - Costa Rica done
Crude oil - proved reserves - Cote d'Ivoire done
Crude oil - proved reserves - Croatia done
Crude oil - proved reserves - Cuba done
Crude oil - proved reserves - Curacao done
Crude oil - proved reserves - Cyprus done
Crude oil - proved reserves - Denmark done
Crude oil - proved reserves - Czechia done
Crude oil - proved reserves

Crude oil - proved reserves - Zimbabwe done
Refined_petroleum_products_production
Refined petroleum products - production - Albania done
Refined petroleum products - production - Algeria done
Refined petroleum products - production - Afghanistan done
Refined petroleum products - production - American Samoa done
Refined petroleum products - production - Andorra done
Refined petroleum products - production - Angola done
Refined petroleum products - production - Antigua and Barbuda done
Refined petroleum products - production - Argentina done
Refined petroleum products - production - Armenia done
Refined petroleum products - production - Aruba done
Refined petroleum products - production - Australia done
Refined petroleum products - production - Azerbaijan done
Refined petroleum products - production - Bahrain done
Refined petroleum products - production - Bahamas, The done
Refined petroleum products - production - Austria done
Refined petroleum products - production - Bangladesh done
Ref

Refined petroleum products - production - Slovenia done
Refined petroleum products - production - Solomon Islands done
Refined petroleum products - production - Somalia done
Refined petroleum products - production - South Africa done
Refined petroleum products - production - South Sudan done
Refined petroleum products - production - Sri Lanka done
Refined petroleum products - production - Spain done
Refined petroleum products - production - Sudan done
Refined petroleum products - production - Suriname done
Refined petroleum products - production - Sweden done
Refined petroleum products - production - Syria done
Refined petroleum products - production - Switzerland done
Refined petroleum products - production - Taiwan done
Refined petroleum products - production - Tajikistan done
Refined petroleum products - production - Timor-Leste done
Refined petroleum products - production - Tanzania done
Refined petroleum products - production - Togo done
Refined petroleum products - production - T

Refined petroleum products - consumption - Papua New Guinea done
Refined petroleum products - consumption - Paraguay done
Refined petroleum products - consumption - Philippines done
Refined petroleum products - consumption - Peru done
Refined petroleum products - consumption - Poland done
Refined petroleum products - consumption - Portugal done
Refined petroleum products - consumption - Puerto Rico done
Refined petroleum products - consumption - Qatar done
Refined petroleum products - consumption - Romania done
Refined petroleum products - consumption - Russia done
Refined petroleum products - consumption - Rwanda done
Refined petroleum products - consumption - Saint Helena, Ascension, and Tristan da Cunha done
Refined petroleum products - consumption - Saint Kitts and Nevis done
Refined petroleum products - consumption - Saint Lucia done
Refined petroleum products - consumption - Saint Pierre and Miquelon done
Refined petroleum products - consumption - Saint Vincent and the Grenadines

Refined petroleum products - exports - Malta done
Refined petroleum products - exports - Mali done
Refined petroleum products - exports - Marshall Islands done
Refined petroleum products - exports - Mauritania done
Refined petroleum products - exports - Mauritius done
Refined petroleum products - exports - Mexico done
Refined petroleum products - exports - Micronesia, Federated States of done
Refined petroleum products - exports - Moldova done
Refined petroleum products - exports - Montenegro done
Refined petroleum products - exports - Mongolia done
Refined petroleum products - exports - Montserrat done
Refined petroleum products - exports - Morocco done
Refined petroleum products - exports - Mozambique done
Refined petroleum products - exports - Namibia done
Refined petroleum products - exports - Nauru done
Refined petroleum products - exports - Nepal done
Refined petroleum products - exports - Netherlands done
Refined petroleum products - exports - New Caledonia done
Refined petroleu

Refined petroleum products - imports - Maldives done
Refined petroleum products - imports - Malta done
Refined petroleum products - imports - Mali done
Refined petroleum products - imports - Marshall Islands done
Refined petroleum products - imports - Mauritius done
Refined petroleum products - imports - Mauritania done
Refined petroleum products - imports - Mexico done

<a id='extracting'></a>

## 3.3 Extracting Data from the Database 

Now that the data is stored in a persistent form, we can extract it for later use with the `pandas` library. Given a field name, we convert it to its SQL table title using the field-to-SQL-title mapping and read that table into a dataframe.

In [9]:
import pandas as pd
import sqlite3

def extract_data(field):
    """
    Extracts countries data from the SQLite database into a dataframe
    
    Parameters:
    ----------------
    field: string
        String exactly matching a field from the world factbook
        
    Returns:    
    ----------------
    df: pandas DataFrame object
        Data frame version of the SQL table    
    """  
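    table = sql_create_title(field)  # map the field name to its SQL table title
    sql = f'select * from {table}'
    conn = sqlite3.connect(sqlite_file)
    try:
        df = pd.read_sql(sql, conn)
    finally:
        # sqlite3's context manager only commits, so close the connection explicitly
        conn.close()
    return df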

In [17]:
field = 'Internet Users'
df = extract_data(field)

# The first column holds the country names; the second ('total' here) is the sort key
column_to_sort_by = df.columns[1]
df = df.sort_values(column_to_sort_by, ascending=False).head(10).reset_index(drop=True)
df

| | country | total | percent_of_population |
|---|---|---|---|
| 0 | China | 730723960.0 | 53.2 |
| 1 | India | 374328160.0 | 29.5 |
| 2 | United States | 246809221.0 | 76.2 |
| 3 | Brazil | 122841218.0 | 59.7 |
| 4 | Japan | 116565962.0 | 92.0 |
| 5 | Russia | 108772470.0 | 76.4 |
| 6 | Mexico | 73334032.0 | 59.5 |
| 7 | Germany | 72365643.0 | 89.6 |
| 8 | Indonesia | 65525226.0 | 25.4 |
| 9 | United Kingdom | 61064454.0 | 94.8 |
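
The query in `extract_data` interpolates the table name directly into the SQL string, because SQLite cannot bind table names with `?` placeholders. A minimal sketch of one way to guard that step is shown below: it checks the requested name against the database's own catalogue (`sqlite_master`) before building the query. The helper `fetch_table` and the in-memory `internet_users` table are hypothetical stand-ins for illustration, not part of the notebook's database code.

```python
import sqlite3

def fetch_table(conn, table):
    """Return (column_names, rows) for `table`, refusing names that are not real tables."""
    # Table names cannot be bound with '?', so check the name against the
    # database's own catalogue before interpolating it into the query.
    known = {row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    if table not in known:
        raise ValueError(f"unknown table: {table!r}")
    cur = conn.execute(f'SELECT * FROM "{table}"')
    columns = [d[0] for d in cur.description]
    return columns, cur.fetchall()

# Hypothetical stand-in data; the notebook itself reads from its own SQLite file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE internet_users (country TEXT, total REAL)")
conn.executemany("INSERT INTO internet_users VALUES (?, ?)",
                 [("China", 730723960.0), ("India", 374328160.0)])

columns, rows = fetch_table(conn, "internet_users")
print(columns, rows)
conn.close()
```

The same check could sit in front of `pd.read_sql`, so that a mistyped field name fails with a clear error instead of a raw SQL exception.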


<a id='visualization'></a>

# 4. Visualization

Now that we can access the data, we can visualize it on a choropleth, or colored map, with the package [plotly](https://plot.ly/python/choropleth-maps/).

In [None]:
# Only the names used by the plotting function below are imported
from plotly.offline import init_notebook_mode, iplot
import plotly.graph_objs as go
init_notebook_mode(connected=True)  # enable inline rendering in the notebook

def plot_global_chloropleth(df, cbar_title='', source_text=''):
    """
    Wrapper function that takes in a dataframe of country names and a quantitative field and then plots
    a Viridis choropleth using many of the default settings from the plotly choropleth example.
    
    Parameters:
    ----------------
    df: pandas DataFrame object
        First column must be country names and the second a quantitative field
    cbar_title: string
        Title to put on the colorbar
    source_text: string
        Annotation to give to the bottom of the map
    """
    data = [go.Choropleth(
        locations = df[df.columns[0]],
        locationmode = 'country names',
        z = df[df.columns[1]],
        colorscale='Viridis',
        autocolorscale = False,
        reversescale = True,
        marker = go.choropleth.Marker(
            line = go.choropleth.marker.Line(
                color = 'rgb(180,180,180)',
                width = 0.5
            )),
        colorbar = go.choropleth.ColorBar(
            title = cbar_title),
    )]

    layout = go.Layout(
        title = go.layout.Title(
            text = df.columns[1]
        ),
        geo = go.layout.Geo(
            showframe = False,
            showcoastlines = False,
            projection = go.layout.geo.Projection(
                type = 'equirectangular'
            )
        ),
        annotations = [go.layout.Annotation(
            x = 0.55,
            y = 0.1,
            xref = 'paper',
            yref = 'paper',
            text = source_text,
            showarrow = False
        )]
    )

    fig = go.Figure(data = data, layout = layout)
    iplot(fig, filename = 'd3-world-map')


Below, if we narrow our dataframe down to two columns, the country names and the numbers of interest, we can feed it into the above function and generate a quick choropleth. This assumes that plotly's example is working, which, as of 6/11/2019, is not the case.

In [37]:
df = df[['country', 'total']]
plot_global_chloropleth(df, 'Internet Users')
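
One thing that may affect the map: with `locationmode = 'country names'`, plotly matches locations by name, and the Factbook lists some countries in inverted form, such as "Korea, South" or "Gambia, The". Whether plotly recognizes those spellings is an assumption I have not verified, so the sketch below shows one possible pre-processing step rather than a required one. The helper `uninvert_name` is hypothetical and not part of the notebook.

```python
def uninvert_name(name):
    """Turn 'Korea, South' into 'South Korea'; leave other names unchanged."""
    # Only names with exactly one comma are treated as inverted, so list-like
    # names such as 'Saint Helena, Ascension, and Tristan da Cunha' pass through.
    if name.count(",") != 1:
        return name
    head, tail = (part.strip() for part in name.split(","))
    return f"{tail} {head}"

names = ["Korea, South", "Gambia, The", "Congo, Republic of the",
         "Saint Helena, Ascension, and Tristan da Cunha", "Japan"]
for n in names:
    print(n, "->", uninvert_name(n))
```

It could be applied to the country column (for example with `df['country'].map(uninvert_name)`) before calling the plotting function.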

An alternative to plotly is to load a large global choropleth with `geopandas`. However, loading the map can take a while, and the resulting maps are not interactive.

<a id='conclusion'></a>

# 5. Concluding Remarks and Future Directions

### Summary

In summary, we scraped the CIA World Factbook, using Python's versatile toolset to crawl through the web page structures and extract the data. Specifically, we used:

- BeautifulSoup to navigate the tags in the tree structure of the World Factbook website
- regular expressions to clean up data names and search for specific data
- multithreading to speed up the web scraping and the writing of data into a Python data structure
- SQLite to write the data into a persistent location without the need to set up a server
- pandas to extract and manipulate the data

### Future Directions

- Visualization: we only scratched the surface
- Analysis: since many fields of data are now available, they can be used to compare countries, look for trends, or group countries by similarity.