## Reading Data from an API

#### A web API is an API that is accessed over the web, typically through HTTP.

Think of an API like a fast-food restaurant's customer service desk. Internally, there are many food items, raw materials, cooking resources, and recipe management systems, but all you see are the **fixed menu items** on the board, and you can only interact through those items. In the same way, a web API is like a port that can be accessed over the HTTP protocol and that delivers data and services when used properly.

Web APIs are now a very common way to deliver all kinds of data services. A data wrangling professional therefore needs to understand the basics of extracting data from a web API, because you are very likely to find yourself in a situation where large quantities of data must be read through an API before processing and wrangling.

We will use a free API to read some information about various **countries around the world in JSON format** and process it.


#### Required libraries

- Python's built-in urllib module
- pandas to make a DataFrame
- Python's json module

1. Import the necessary libraries:

In [1]:
import urllib.request, urllib.parse
from urllib.error import HTTPError,URLError
import json
import pandas as pd

In [2]:
from google.colab import drive
drive.mount('/content/drive/')

Mounted at /content/drive/


In [3]:
import os
os.chdir("drive/My Drive/Lab3/")

2. Define the serviceurl variable, which holds the base URL of the API endpoint:

In [4]:
serviceurl = 'https://restcountries.com/v3.1/name/'

3. We want a function that pulls out data when we pass the name of a country as an argument. Before defining it, look at the crux of the operation, which is contained in the following two lines of code:

In [5]:
country_name = 'Switzerland'
# Append the country name as a string to the base URL
url = serviceurl + country_name
# Send a GET request to the API endpoint.
# If all goes well, we get back a response object whose body can be read, decoded, and parsed as JSON.
uh = urllib.request.urlopen(url)

In [6]:
print(uh)

<http.client.HTTPResponse object at 0x7a1f7a6020e0>
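The URL above is built by plain string concatenation, which is fine for a single-word name. As a minimal sketch, assuming a name may contain spaces or other characters that are not safe in a URL path, the name can be percent-encoded first with `urllib.parse.quote`. The helper name `build_country_url` is hypothetical and not part of the original notebook.

```python
import urllib.parse


def build_country_url(base_url, country):
    """Append a percent-encoded country name to the base URL (illustrative helper)."""
    # quote() replaces characters such as spaces with %XX escapes
    return base_url + urllib.parse.quote(str(country))


base = 'https://restcountries.com/v3.1/name/'
print(build_country_url(base, 'Switzerland'))
print(build_country_url(base, 'United States'))
```

A single-word name passes through unchanged, so the helper can be used for every request.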


4. Define the get_country_data function:

In [7]:
def get_country_data(country):
    """
    Function to get data about country from "https://restcountries.com" API
    """
    country_name=str(country)
    url = serviceurl + country_name

    try:
        uh = urllib.request.urlopen(url)
    except HTTPError as e:
        print("Sorry! Could not retrieve anything on {}".format(country_name))
        return None
    except URLError as e:
        print('Failed to reach a server.')
        print('Reason: ', e.reason)
        return None
    else:
        data = uh.read().decode()
        print("Retrieved data on {}. Total {} characters read.".format(country_name,len(data)))
        return data
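One possible variation on the function above closes the response with a `with` block and sets a timeout so that a stalled connection does not hang the notebook. This is a sketch under stated assumptions: the name `fetch_text` and the `opener` and `timeout` parameters are illustrative, and `opener` exists only so that the function can be tried without network access.

```python
import io
import urllib.request
from urllib.error import HTTPError, URLError


def fetch_text(url, opener=urllib.request.urlopen, timeout=10):
    """Return the decoded response body, or None if the request fails.

    `opener` and `timeout` are illustrative parameters that are not in the
    original function.
    """
    try:
        # The with-block closes the response even if read() raises
        with opener(url, timeout=timeout) as response:
            return response.read().decode('utf-8')
    except HTTPError as e:
        # The server answered, but with an error status
        print("HTTP error {} for {}".format(e.code, url))
        return None
    except URLError as e:
        # The server could not be reached at all
        print("Failed to reach a server. Reason:", e.reason)
        return None


# Offline demonstration: a stand-in for urlopen that returns canned bytes
def fake_opener(url, timeout=None):
    return io.BytesIO(b'[{"name": {"common": "Switzerland"}}]')


text = fetch_text('https://example.invalid/', opener=fake_opener)
print(text)
```

Passing `opener=urllib.request.urlopen` (the default) sends a real request.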

5. Call the function with a valid country name:

In [8]:
country_name = 'Thailand'
data = get_country_data(country_name)

Retrieved data on Thailand. Total 3011 characters read.


In [9]:
print(type(data))
print(data[:5])

<class 'str'>
[{"na


6. Pass an erroneous name in country_name1 to check the error handling:

In [10]:
country_name1 = 'Switzerland1'
data = get_country_data(country_name1)

Sorry! Could not retrieve anything on Switzerland1
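The "Sorry!" message is printed by the `HTTPError` branch, which means the server was reached but replied with an error status. The order of the two `except` clauses matters because `HTTPError` is a subclass of `URLError`. The sketch below illustrates this with exceptions built by hand. The helper name `describe_failure` is hypothetical and the 404 status is illustrative, since the output above does not show the actual status code.

```python
import io
from urllib.error import HTTPError, URLError


def describe_failure(exc):
    """Return a short label for an exception raised by urlopen (illustrative helper)."""
    # HTTPError must be tested first because it is a subclass of URLError
    if isinstance(exc, HTTPError):
        return "server replied with status {}".format(exc.code)
    if isinstance(exc, URLError):
        return "server not reached: {}".format(exc.reason)
    return "unexpected error"


# Exceptions built by hand so that the sketch runs without network access
not_found = HTTPError('https://example.invalid/x', 404, 'Not Found', None, io.BytesIO())
unreachable = URLError('name resolution failed')

print(issubclass(HTTPError, URLError))
print(describe_failure(not_found))
print(describe_failure(unreachable))
```

If the `URLError` clause came first, it would also catch every `HTTPError`, and the more specific message would never be printed.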


***

### Using the Built-In JSON Library to Read and Examine Data

The get_country_data function returns the raw response as a string. We will use Python's built-in json module to parse that string and see what we can process further:

In [11]:
country_name = 'Switzerland'
data = get_country_data(country_name)

Retrieved data on Switzerland. Total 3228 characters read.


In [12]:
# Load from string 'data' to json
x=json.loads(data)
print(type(x[0]))

<class 'dict'>


In [13]:
# Load the only element
y=x[0]
type(y)

dict
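To make the structure concrete without calling the API, here is a minimal sketch that parses a short, hand-written sample. The sample is abbreviated and only modelled on the fields shown in this section; it is not a full response.

```python
import json

# Abbreviated, hand-written sample modelled on the response shown in this section
sample = ('[{"name": {"common": "Switzerland"}, "capital": ["Bern"], '
          '"languages": {"fra": "French", "ita": "Italian"}}]')

records = json.loads(sample)   # a JSON array becomes a Python list
first = records[0]             # each JSON object becomes a dict

print(type(records).__name__, type(first).__name__)
print(first['name']['common'])   # nested objects are nested dicts
print(first['capital'][0])       # JSON arrays are indexed like lists
```

The top-level value is a list because the endpoint can match more than one country, which is why the notebook takes `x[0]`.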

We can quickly check the keys of the dictionary (that is, the top-level fields of the JSON record) by calling its keys() method:

In [14]:
y.keys()

dict_keys(['name', 'tld', 'cca2', 'ccn3', 'cca3', 'cioc', 'independent', 'status', 'unMember', 'currencies', 'idd', 'capital', 'altSpellings', 'region', 'subregion', 'languages', 'translations', 'latlng', 'landlocked', 'borders', 'area', 'demonyms', 'flag', 'maps', 'population', 'gini', 'fifa', 'car', 'timezones', 'continents', 'flags', 'coatOfArms', 'startOfWeek', 'capitalInfo', 'postalCode'])

In [15]:
#Printing All the Data Elements
for k,v in y.items():
    print("{}: {}".format(k,v))

name: {'common': 'Switzerland', 'official': 'Swiss Confederation', 'nativeName': {'fra': {'official': 'Confédération suisse', 'common': 'Suisse'}, 'gsw': {'official': 'Schweizerische Eidgenossenschaft', 'common': 'Schweiz'}, 'ita': {'official': 'Confederazione Svizzera', 'common': 'Svizzera'}, 'roh': {'official': 'Confederaziun svizra', 'common': 'Svizra'}}}
tld: ['.ch']
cca2: CH
ccn3: 756
cca3: CHE
cioc: SUI
independent: True
status: officially-assigned
unMember: True
currencies: {'CHF': {'name': 'Swiss franc', 'symbol': 'Fr.'}}
idd: {'root': '+4', 'suffixes': ['1']}
capital: ['Bern']
altSpellings: ['CH', 'Swiss Confederation', 'Schweiz', 'Suisse', 'Svizzera', 'Svizra']
region: Europe
subregion: Western Europe
languages: {'fra': 'French', 'gsw': 'Swiss German', 'ita': 'Italian', 'roh': 'Romansh'}
translations: {'ara': {'official': 'الاتحاد السويسري', 'common': 'سويسرا'}, 'bre': {'official': 'Kengevredad Suis', 'common': 'Suis'}, 'ces': {'official': 'Švýcarská konfederace', 'common': '
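Printing each value on one line is hard to read when the values are deeply nested. One alternative, sketched here on a small hand-written record rather than the full response, is to serialize the dictionary back to indented JSON with `json.dumps`.

```python
import json

# Small hand-written record; the values are copied from the output shown above
record = {
    'name': {
        'common': 'Switzerland',
        'nativeName': {'fra': {'official': 'Confédération suisse', 'common': 'Suisse'}},
    },
    'capital': ['Bern'],
}

# indent=2 puts each nested level on its own lines;
# ensure_ascii=False keeps accented characters readable instead of \u escapes
pretty = json.dumps(record, indent=2, ensure_ascii=False)
print(pretty)
```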

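Write a small loop to extract the languages spoken in Switzerland. The 'languages' field is a dictionary that maps a language code to the language name, so we iterate over its items: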

<img src="./images/api_country1.jpg" align="left" style="width:600px;"/>

In [16]:
for k,lang in y['languages'].items():
    print(lang)

French
Swiss German
Italian
Romansh
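When only the names are needed, the loop can be replaced by the dictionary's `values()` method. A minimal sketch, using the language mapping copied from the Switzerland output shown earlier in this section:

```python
# Language mapping copied from the Switzerland output shown in this section
languages = {'fra': 'French', 'gsw': 'Swiss German', 'ita': 'Italian', 'roh': 'Romansh'}

# values() yields the language names in insertion order
names = list(languages.values())

# join() turns the list into a single string, which suits one DataFrame cell
joined = ', '.join(names)
print(joined)
```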


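Create a function to build a small database of country information. It calls get_country_data for each name, parses the result, and collects selected fields into a DataFrame: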

In [17]:
import pandas as pd
import json
def build_country_database(list_country):
    """
    Takes a list of country names.
    Outputs a DataFrame with key information about those countries.
    """
    # Define a dictionary of empty lists, one list per output column
    country_dict={'Country':[],'Capital':[],'Region':[],'Sub-region':[],'Population':[],
                  'Latitude':[],'Longitude':[],'Area':[],'Gini':[],'Timezones':[],
                  'Currencies':[],'Languages':[]}

    for c in list_country:
        data = get_country_data(c)
        # get_country_data returns None on failure, so unknown names are skipped
        if data is not None:
            x = json.loads(data)
            y=x[0]
            # 'name' is a nested dictionary; keep only the common name
            country_dict['Country'].append(y['name']['common'])
            # 'capital' is a list; join handles one or several entries
            country_dict['Capital'].append(','.join(y['capital']))
            country_dict['Region'].append(y['region'])
            country_dict['Sub-region'].append(y['subregion'])
            country_dict['Population'].append(y['population'])
            country_dict['Latitude'].append(y['latlng'][0])
            country_dict['Longitude'].append(y['latlng'][1])
            country_dict['Area'].append(y['area'])
            # .get() stores None instead of raising KeyError if 'gini' is absent
            country_dict['Gini'].append(y.get('gini'))
            # 'timezones' is a list; join handles one or several entries
            country_dict['Timezones'].append(','.join(y['timezones']))
            # 'currencies' maps a currency code to a dictionary that holds its name
            lst_currencies = [cur['name'] for cur in y['currencies'].values()]
            country_dict['Currencies'].append(','.join(lst_currencies))
            # 'languages' maps a language code directly to the language name
            country_dict['Languages'].append(','.join(y['languages'].values()))

    # Return as a Pandas DataFrame
    return pd.DataFrame(country_dict)
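The field extraction can also be separated from the network call, which makes it easy to try on a hand-written record. The following is a sketch, not the notebook's method: the helper name `summarize_country` is hypothetical, and in the sample record only the name, capital, region, currency, and languages are copied from the Switzerland output shown earlier, while the remaining values are placeholders.

```python
def summarize_country(record):
    """Flatten one parsed country record into a dict of simple values (illustrative helper)."""
    latlng = record.get('latlng', [None, None])
    return {
        'Country': record['name']['common'],
        'Capital': ','.join(record.get('capital', [])),
        'Region': record.get('region'),
        'Latitude': latlng[0],
        'Longitude': latlng[1],
        'Timezones': ','.join(record.get('timezones', [])),
        # Each currency is a dict, so pick out its 'name' entry
        'Currencies': ','.join(c['name'] for c in record.get('currencies', {}).values()),
        # Each language value is already a plain string
        'Languages': ','.join(record.get('languages', {}).values()),
    }


sample = {
    'name': {'common': 'Switzerland'},
    'capital': ['Bern'],
    'region': 'Europe',
    'latlng': [0.0, 0.0],            # placeholder coordinates
    'timezones': ['UTC+00:00'],      # placeholder timezone
    'currencies': {'CHF': {'name': 'Swiss franc', 'symbol': 'Fr.'}},
    'languages': {'fra': 'French', 'gsw': 'Swiss German', 'ita': 'Italian', 'roh': 'Romansh'},
}

row = summarize_country(sample)
print(row['Country'], '|', row['Currencies'], '|', row['Languages'])
```

A list of such rows can be passed directly to `pd.DataFrame` to obtain one row per country.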

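Now test the function with a short list of country names. An erroneous name, such as Turmeric, can be added to the list to test its robustness: get_country_data returns None for it, so that entry is simply skipped.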
<img src="./images/api_country2.jpg" align="left" style="width:600px;"/>

In [None]:
df1=build_country_database(['Thailand','Nigeria'])

In [None]:
df1

In [None]:
#Fix timezones, countries with multiple timezones, currencies and languages
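One way to approach the multi-valued columns is to split the comma-joined strings back into lists and give each value its own row. The sketch below assumes a frame shaped like the output of build_country_database; the two rows are hypothetical placeholders, not real API data.

```python
import pandas as pd

# Hypothetical two-row frame shaped like the output of build_country_database
df = pd.DataFrame({
    'Country': ['A', 'B'],
    'Timezones': ['UTC+01:00', 'UTC+02:00,UTC+03:00'],
})

# Split each comma-joined string into a list of timezones
df['Timezones'] = df['Timezones'].str.split(',')

# explode() emits one row per list element and repeats the other columns
long_df = df.explode('Timezones').reset_index(drop=True)
print(long_df)
```

The same pattern applies to the Currencies and Languages columns.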