# Lesson 7: Advanced Web Scraping and Data Gathering
## Topic 3: Reading data from an API
This notebook shows how to use a free API (no authorization or API key required) to download basic information about countries around the world and load it into a pandas DataFrame.

### Import libraries

In [2]:
import urllib.request, urllib.parse
from urllib.error import HTTPError, URLError
import pandas as pd

### Exercise 20: Define the base URL

In [3]:
serviceurl = 'https://restcountries.eu/rest/v2/name/'

### Exercise 21: Define a function to pull the country data from the API

In [4]:
def get_country_data(country):
    """
    Get data about a country from the "https://restcountries.eu" API.
    Returns the response body as a string, or None if the request fails.
    """
    country_name = str(country)
    url = serviceurl + country_name
    
    try: 
        uh = urllib.request.urlopen(url)
    except HTTPError as e:
        print("Sorry! Could not retrieve anything on {}".format(country_name))
        return None
    except URLError as e:
        print('Failed to reach a server.')
        print('Reason: ', e.reason)
        return None
    else:
        data = uh.read().decode()
        print("Retrieved data on {}. Total {} characters read.".format(country_name,len(data)))
        return data
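
The function above appends the country name to the base URL unchanged, which works for single-word names such as Switzerland. Names that contain spaces or non-ASCII characters generally need to be percent-encoded first. The following is a minimal sketch of that step, assuming a hypothetical helper `build_country_url` (it is not part of the original notebook) built on `urllib.parse.quote`:

```python
import urllib.parse

# Base URL as defined in Exercise 20
serviceurl = 'https://restcountries.eu/rest/v2/name/'

def build_country_url(country, base=serviceurl):
    """Hypothetical helper: percent-encode the country name before appending it."""
    # safe='' also encodes '/', so the name cannot introduce extra path segments
    return base + urllib.parse.quote(str(country), safe='')

print(build_country_url('Switzerland'))
print(build_country_url('United States'))
```

If this approach is adopted, the line `url = serviceurl + country_name` inside `get_country_data` could call the helper instead.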

### Exercise 22: Test the function by passing a correct and an incorrect argument

In [5]:
country_name = 'Switzerland'

In [6]:
data=get_country_data(country_name)

Retrieved data on Switzerland. Total 1090 characters read.


In [6]:
country_name1 = 'Switzerland1'

In [7]:
data1=get_country_data(country_name1)

Sorry! Could not retrieve anything on Switzerland1
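
The message above comes from the `HTTPError` branch of `get_country_data`. The order of the two `except` clauses matters: `HTTPError` is a subclass of `URLError`, so it has to be caught first or the more general handler would catch it. The sketch below illustrates this with exceptions constructed by hand, so no network request is made. The helper name `describe_error` and the example URL are assumptions for illustration only.

```python
from urllib.error import HTTPError, URLError

def describe_error(err):
    """Hypothetical helper: turn a urllib exception into a short message."""
    # HTTPError must be tested first because it is a subclass of URLError
    if isinstance(err, HTTPError):
        return "HTTP error {}: {}".format(err.code, err.reason)
    if isinstance(err, URLError):
        return "Failed to reach a server: {}".format(err.reason)
    return "Unexpected error: {}".format(err)

# Exceptions built by hand for illustration (made-up URL and reasons)
not_found = HTTPError('https://example.invalid/x', 404, 'Not Found', {}, None)
unreachable = URLError('connection refused')

print(issubclass(HTTPError, URLError))
print(describe_error(not_found))
print(describe_error(unreachable))
```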


### Exercise 23: Use the built-in `json` library to read and examine the data properly

In [11]:
import json

In [12]:
# Load from string 'data'
x=json.loads(data)

In [13]:
# Load the only element
y=x[0]

In [14]:
type(y)

dict

In [15]:
y.keys()

dict_keys(['name', 'topLevelDomain', 'alpha2Code', 'alpha3Code', 'callingCodes', 'capital', 'altSpellings', 'region', 'subregion', 'population', 'latlng', 'demonym', 'area', 'gini', 'timezones', 'borders', 'nativeName', 'numericCode', 'currencies', 'languages', 'translations', 'flag', 'regionalBlocs', 'cioc'])

### Exercise 24: Can you print all the data elements one by one?

In [16]:
for k,v in y.items():
    print("{}: {}".format(k,v))

name: Switzerland
topLevelDomain: ['.ch']
alpha2Code: CH
alpha3Code: CHE
callingCodes: ['41']
capital: Bern
altSpellings: ['CH', 'Swiss Confederation', 'Schweiz', 'Suisse', 'Svizzera', 'Svizra']
region: Europe
subregion: Western Europe
population: 8341600
latlng: [47.0, 8.0]
demonym: Swiss
area: 41284.0
gini: 33.7
timezones: ['UTC+01:00']
borders: ['AUT', 'FRA', 'ITA', 'LIE', 'DEU']
nativeName: Schweiz
numericCode: 756
currencies: [{'code': 'CHF', 'name': 'Swiss franc', 'symbol': 'Fr'}]
languages: [{'iso639_1': 'de', 'iso639_2': 'deu', 'name': 'German', 'nativeName': 'Deutsch'}, {'iso639_1': 'fr', 'iso639_2': 'fra', 'name': 'French', 'nativeName': 'français'}, {'iso639_1': 'it', 'iso639_2': 'ita', 'name': 'Italian', 'nativeName': 'Italiano'}]
translations: {'de': 'Schweiz', 'es': 'Suiza', 'fr': 'Suisse', 'ja': 'スイス', 'it': 'Svizzera', 'br': 'Suíça', 'pt': 'Suíça', 'nl': 'Zwitserland', 'hr': 'Švicarska', 'fa': 'سوئیس'}
flag: https://restcountries.eu/data/che.svg
regionalBlocs: [{'acrony
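
Printing each key and value on one line becomes hard to read once the values are nested lists and dictionaries. One alternative is to let `json.dumps` re-serialize the parsed object with indentation. The sketch below uses a small hand-made record containing a subset of the fields shown above, so that it runs without calling the API:

```python
import json

# Small hand-made record holding a subset of the fields printed above
sample = {
    'name': 'Switzerland',
    'capital': 'Bern',
    'latlng': [47.0, 8.0],
    'currencies': [{'code': 'CHF', 'name': 'Swiss franc', 'symbol': 'Fr'}],
    'translations': {'ja': 'スイス', 'fr': 'Suisse'},
}

# indent=2 puts each nested element on its own line;
# ensure_ascii=False keeps non-ASCII text readable instead of \u escapes
pretty = json.dumps(sample, indent=2, ensure_ascii=False)
print(pretty)
```

With the real data, the same call would be `json.dumps(y, indent=2, ensure_ascii=False)`.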

### Exercise 25: The dictionary values are not all of the same type; print all the languages spoken

In [17]:
for lang in y['languages']:
    print(lang['name'])

German
French
Italian
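
The same loop can be written as a list comprehension when the names are needed as a value rather than as printed output, for example to store them in a single table cell. A minimal sketch, using a list hand-copied from the `languages` value printed in Exercise 24 (only two fields per entry are kept):

```python
# Hand-copied from the 'languages' value printed in Exercise 24
languages = [
    {'iso639_1': 'de', 'name': 'German'},
    {'iso639_1': 'fr', 'name': 'French'},
    {'iso639_1': 'it', 'name': 'Italian'},
]

# Collect the 'name' field of every dictionary in the list
language_names = [lang['name'] for lang in languages]

# Join the names into one comma-separated string
joined = ','.join(language_names)

print(language_names)
print(joined)
```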


### Exercise 26: Write a function which can take a list of countries and return a DataFrame containing key info
* Capital
* Region
* Sub-region
* Population
* Latitude/longitude
* Area
* Gini index
* Timezones
* Currencies
* Languages

In [15]:
import pandas as pd
import json
def build_country_database(list_country):
    """
    Takes a list of country names.
    Output a DataFrame with key information about those countries.
    """
    # Define an empty dictionary with keys
    country_dict={'Country':[],'Capital':[],'Region':[],'Sub-region':[],'Population':[],
                  'Lattitude':[],'Longitude':[],'Area':[],'Gini':[],'Timezones':[],
                  'Currencies':[],'Languages':[]}
    
    for c in list_country:
        data = get_country_data(c)
        if data is not None:
            x = json.loads(data)
            y=x[0]
            country_dict['Country'].append(y['name'])
            country_dict['Capital'].append(y['capital'])
            country_dict['Region'].append(y['region'])
            country_dict['Sub-region'].append(y['subregion'])
            country_dict['Population'].append(y['population'])
            country_dict['Lattitude'].append(y['latlng'][0])
            country_dict['Longitude'].append(y['latlng'][1])
            country_dict['Area'].append(y['area'])
            country_dict['Gini'].append(y['gini'])
            # 'timezones' is a list of strings; join it into one comma-separated string
            # (a single-element list joins to just that element)
            country_dict['Timezones'].append(','.join(y['timezones']))
            # 'currencies' and 'languages' are lists of dictionaries; join their names
            country_dict['Currencies'].append(','.join(i['name'] for i in y['currencies']))
            country_dict['Languages'].append(','.join(i['name'] for i in y['languages']))
    
    # Return as a Pandas DataFrame
    return pd.DataFrame(country_dict)
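
The function above indexes every field directly, so a record that lacks one of the keys would raise a `KeyError`. The outputs in this notebook do not show such a case, so treat the following as a sketch under that assumption: a hypothetical helper `country_to_row` flattens one record into a plain dictionary and uses `dict.get` to fall back to `None` for missing fields. The second sample record is made up to exercise the fallback.

```python
def country_to_row(record):
    """Hypothetical helper: flatten one country record into a flat dict."""
    # Pad latlng so that a missing or short list still yields two values
    lat, lng = (list(record.get('latlng') or []) + [None, None])[:2]
    return {
        'Country': record.get('name'),
        'Capital': record.get('capital'),
        'Population': record.get('population'),
        'Latitude': lat,
        'Longitude': lng,
        'Gini': record.get('gini'),  # None when the field is absent
        'Languages': ','.join(l['name'] for l in record.get('languages', [])),
    }

records = [
    # Subset of the Switzerland record shown earlier
    {'name': 'Switzerland', 'capital': 'Bern', 'population': 8341600,
     'latlng': [47.0, 8.0], 'gini': 33.7,
     'languages': [{'name': 'German'}, {'name': 'French'}, {'name': 'Italian'}]},
    # Made-up incomplete record, for illustration only
    {'name': 'Exampleland', 'capital': 'Sample City'},
]

rows = [country_to_row(r) for r in records]
for row in rows:
    print(row)
```

A list of dictionaries like `rows` can be passed straight to `pd.DataFrame(rows)`, which is an alternative to filling a dictionary of lists column by column.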

### Exercise 27: Test the function by building a small database of countries' info. Include an incorrect name too.

In [16]:
df1=build_country_database(['Nigeria','Switzerland','France',
                            'Turmeric','Russia','Kenya','Singapore'])

Retrieved data on Nigeria. Total 1004 characters read.
Retrieved data on Switzerland. Total 1090 characters read.
Retrieved data on France. Total 1047 characters read.
Sorry! Could not retrieve anything on Turmeric
Retrieved data on Russia. Total 1120 characters read.
Retrieved data on Kenya. Total 1052 characters read.
Retrieved data on Singapore. Total 1223 characters read.


In [17]:
df1

| | Area | Capital | Country | Currencies | Gini | Languages | Lattitude | Longitude | Population | Region | Sub-region | Timezones |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 923768.0 | Abuja | Nigeria | Nigerian naira | 48.8 | English | 10.0 | 8.0 | 186988000 | Africa | Western Africa | UTC+01:00 |
| 1 | 41284.0 | Bern | Switzerland | Swiss franc | 33.7 | German,French,Italian | 47.0 | 8.0 | 8341600 | Europe | Western Europe | UTC+01:00 |
| 2 | 640679.0 | Paris | France | Euro | 32.7 | French | 46.0 | 2.0 | 66710000 | Europe | Western Europe | UTC-10:00,UTC-09:30,UTC-09:00,UTC-08:00,UTC-04... |
| 3 | 17124442.0 | Moscow | Russian Federation | Russian ruble | 40.1 | Russian | 60.0 | 100.0 | 146599183 | Europe | Eastern Europe | UTC+03:00,UTC+04:00,UTC+06:00,UTC+07:00,UTC+08... |
| 4 | 580367.0 | Nairobi | Kenya | Kenyan shilling | 47.7 | English,Swahili | 1.0 | 38.0 | 47251000 | Africa | Eastern Africa | UTC+03:00 |
| 5 | 710.0 | Singapore | Singapore | Brunei dollar,Singapore dollar | 48.1 | English,Malay,Tamil,Chinese | 1.366667 | 103.8 | 5535000 | Asia | South-Eastern Asia | UTC+08:00 |
