# Downloading Trade Data with the Comtrade API

Many institutions, including the World Bank, the Fed, and the ECB, now provide access to their databases through APIs (Application Programming Interfaces). In this exercise we will use the United Nations Comtrade API to download trade flows.

The trade data accessible through the API can also be downloaded manually through drop-down menus on the [Comtrade website](https://comtrade.un.org/data/). If you need to make many downloads, however, the API is far more convenient.


## Working directory
First, let us set up the working directory, since we will download files from UN Comtrade. If you are running this notebook from the session 5 folder of your fork, that folder should already be the current directory.

In [1]:
import os

# The os.getcwd() returns a string, you can assign it to a variable if you need using var = os.getcwd(). 
# Then, var will be assigned to that string.
print(os.getcwd())

/home/moritz/Documents/GitHub/Classes/Session_5


Now create a folder called `Data` inside your working directory to store the data.

In [9]:
os.makedirs("Data", exist_ok = True)

## API Documentation

Every API comes with documentation. To understand how to use the Comtrade API you **need** to look at the [UN Comtrade documentation here](https://comtrade.un.org/data/doc/api/) to get an idea of the parameters required to make a request.

Let's start with a simple request using the `requests` package. We want:
- Commodities
- Annual frequency
- Year 2013
- HS Sector Classification
- UK to World
- Imports and exports 

Check out the url including these parameters!


In [11]:
import requests

url = "http://comtrade.un.org/api/get?max=100000&type=C&freq=A&px=HS&ps=2013&r=826&p=0&rg=all&cc=ALL&fmt=csv&head=M"

data_1 = requests.get(url)

if data_1.status_code == 200:
    print("Your request was successful")
else:
    print(f"Error {data_1.status_code} on your request ")

Your request was successful


Some initial remarks: the request above might raise a `ChunkedEncodingError`, which stops your code. If you receive it before the function `bilateral_requests` is defined, just rerun the block of code that returned the error. If you get it in the `bilateral_requests` call, or in the blocks of code after that, ignore it and read the rest of the code without running it. We have not used exception handling to solve this problem on purpose, since we want to show some of the problems you might have using the UN Comtrade API. At the end of this notebook we mention this (and other problems) and possible ways to solve them, but we leave it as an exercise to include those solutions in your code.

You do not always need to print the status code when you download data. The HTTP code 200 means that the request was successful and the requested object has been returned. You can learn more about HTTP codes [here](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html). HTTP codes are useful when you want to use exception handling to deal with possible problems in retrieving data.
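
As a starting point for that kind of exception handling, here is a minimal retry sketch. The name `call_with_retries` and its parameters are hypothetical (they are not part of `requests` or of the Comtrade API), and the function only repeats a call that raised one of the listed exceptions. Wiring it into the loops later in this notebook is still left as an exercise.

```python
import time


def call_with_retries(func, attempts=3, wait_seconds=1.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call func() and retry when it raises one of the exceptions in retry_on.

    Hypothetical helper: the names and defaults are assumptions made for
    this illustration.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on as error:
            if attempt == attempts:
                # Out of retries: let the caller see the original error.
                raise
            print(f"Attempt {attempt} failed ({error!r}), retrying...")
            sleep(wait_seconds)
```

With `requests` imported, a call could look like `call_with_retries(lambda: requests.get(url), retry_on=(requests.exceptions.ChunkedEncodingError,))`. Passing the exception types explicitly keeps unrelated errors, such as a typo in the code, from being retried silently.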

Now that you have obtained an object as a result of your query (in this case the csv file), you might want to store the file somewhere. We will use the *Data* folder created in the previous step.

In [12]:
# Use one of the variables below depending on your OS.

# For Windows
#data_path = os.getcwd() + "\\Data\\"

# For MacOS or Linux
data_path = os.getcwd() + "/Data/"

# Function to write csv for reporter and partner data
def write(req, path, reporter = "", partner = ""):
    
    print(f"Writing .csv file in {path}")

    # Open the file at the given path in write mode.
    # While the file is open, write the text content of the request to it
    # (after removing semicolons with replace).
    with open(path + reporter + "_" + partner + ".csv", 'w', newline = "") as f:
        # The content of our request is a csv file;
        # write it into the directory that we specified as path.
        f.write(req.text.replace(";", ""))
    print(f"File .csv saved in {path}")

    
# Execute the Function    
write(data_1, data_path, "UK", "World")

Writing .csv file in /home/moritz/Documents/GitHub/Classes/Session_5/Data/
File .csv saved in /home/moritz/Documents/GitHub/Classes/Session_5/Data/


Good job! Now you should have the csv file inside the data folder. To start all over again, let's remove it:

In [5]:
os.remove(data_path + "UK_World.csv")

## Country IDs

The API uses numeric ISO codes for specific countries. To retrieve a list of country codes, the API allows the following call (see the documentation):

In [15]:
# This is the url to the json file with the id-country pairs
url_country_values= "https://comtrade.un.org/Data/cache/reporterAreas.json"

country_values = requests.get(url_country_values).json()["results"]

# Let's print entries 1 to 9 (entry 0 is the aggregate "All")
print(country_values[1:10])

<class 'list'>
[{'id': '4', 'text': 'Afghanistan'}, {'id': '8', 'text': 'Albania'}, {'id': '12', 'text': 'Algeria'}, {'id': '20', 'text': 'Andorra'}, {'id': '24', 'text': 'Angola'}, {'id': '660', 'text': 'Anguilla'}, {'id': '28', 'text': 'Antigua and Barbuda'}, {'id': '32', 'text': 'Argentina'}, {'id': '51', 'text': 'Armenia'}]


Let's get the format into a more convenient shape:

In [19]:
# Object is of type
print(type(country_values))
# First item inside object is of type
print(type(country_values[1]))

<class 'list'>
<class 'dict'>


In [20]:
# Below you can find two different ways to achieve the same result
"""
unpacked_id = []
unpacked_countries = []
for x in range(len(country_values)):
    unpacked_id.append(country_values[x]["id"])
    unpacked_countries.append(country_values[x]["text"])
    
unpacked_values = list(zip(unpacked_id, unpacked_countries))
print(unpacked_values)
"""
unpacked_values = [(entry.get("id"), entry.get("text")) for entry in country_values]

print(type(unpacked_values))
print(type(unpacked_values[1]))
print(unpacked_values[1:25])

<class 'list'>
<class 'tuple'>
[('4', 'Afghanistan'), ('8', 'Albania'), ('12', 'Algeria'), ('20', 'Andorra'), ('24', 'Angola'), ('660', 'Anguilla'), ('28', 'Antigua and Barbuda'), ('32', 'Argentina'), ('51', 'Armenia'), ('533', 'Aruba'), ('36', 'Australia'), ('40', 'Austria'), ('31', 'Azerbaijan'), ('44', 'Bahamas'), ('48', 'Bahrain'), ('50', 'Bangladesh'), ('52', 'Barbados'), ('112', 'Belarus'), ('56', 'Belgium'), ('58', 'Belgium-Luxembourg'), ('84', 'Belize'), ('204', 'Benin'), ('60', 'Bermuda'), ('64', 'Bhutan')]


Since we know the name of a country but not the id, which is what we need for the API request, we can construct a function that takes the name of the country as argument and returns the associated id.

In [21]:
def obtain_id(country_name):
    for x in range(len(unpacked_values)):
        if country_name in unpacked_values[x]:
            print(f"The country {country_name} is in the list with id {unpacked_values[x][0]}")
            i = unpacked_values[x][0]
            return i
    else:
        print(f"The country {country_name} is not on the list, check the exact name used by the UN comtrade for that country")
        
# Let's try it:
print(obtain_id("Belgium"))
        

The country Belgium is in the list with id 56
56
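
An alternative to scanning the list on every call is to build a dictionary once and look names up directly. The sketch below is only an illustration: `build_id_lookup` and `lookup_id` are hypothetical names, and the sample entries are copied from the output printed above rather than fetched from the API.

```python
# Sample entries copied from the country list printed earlier in this notebook.
sample_country_values = [
    {"id": "4", "text": "Afghanistan"},
    {"id": "8", "text": "Albania"},
    {"id": "56", "text": "Belgium"},
    {"id": "58", "text": "Belgium-Luxembourg"},
]


def build_id_lookup(country_values):
    """Map each country name to its id (hypothetical helper)."""
    return {entry["text"]: entry["id"] for entry in country_values}


def lookup_id(country_name, id_by_name):
    """Return the id for an exact country name, or raise a clear error."""
    try:
        return id_by_name[country_name]
    except KeyError:
        raise ValueError(
            f"The country {country_name} is not on the list, "
            "check the exact name used by UN Comtrade"
        ) from None


id_by_name = build_id_lookup(sample_country_values)
print(lookup_id("Belgium", id_by_name))
```

Raising an error for an unknown name, instead of printing a message and returning `None`, makes a misspelled country fail at the lookup rather than later, when the id is concatenated into the URL.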


## API Call

Let's build another helper function. It creates new folders to store our data, with the arguments:
- Frequency
- Sector classification
- Year
- Reporter

Since we work on different operating systems, we will also add an argument that takes the string "Windows" or "MacOS".

In [61]:
def folder(frequency, classification, year, reporter, OS, month = ""):
    if OS == "Windows":
        path = os.getcwd() + "\\Data\\" + frequency + "\\" + classification + "\\" + year + month + "\\" + reporter
        os.makedirs(path, exist_ok = True)
        print(f"The folder at {path} has been created.")
        return path + "\\"
    elif OS == "MacOS":
        path = os.getcwd() + "/Data/" + frequency + "/" + classification + "/" + year + month + "/" + reporter + "/"
        os.makedirs(path, exist_ok = True)
        print(f"The folder at {path} has been created.")
        return path
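
If you would rather not pass the operating system around, `os.path.join` picks the right separator for you. The following is a minimal sketch of the same folder layout; `make_data_folder` and its `base_dir` argument are hypothetical names introduced for this example.

```python
import os


def make_data_folder(base_dir, frequency, classification, year, reporter, month=""):
    """Create Data/<frequency>/<classification>/<year+month>/<reporter> under base_dir.

    Hypothetical variant of the folder function above: os.path.join chooses
    the separator, so no OS argument is needed.
    """
    path = os.path.join(base_dir, "Data", frequency, classification,
                        year + month, reporter)
    os.makedirs(path, exist_ok=True)
    return path
```

Note that this version returns the path without a trailing separator, so a file name should be appended with `os.path.join(path, file_name)` rather than with `+`.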
    

Let's create the function with the actual call to the API. It should take as arguments the parameters required by the API. Use the name of a country instead of the id, which we can recover from the previous function. As arguments use:

* "frequency" to which we will assign the value "A" or "M" to get the data frequency
* "classification" that takes the values "HS", "H4", etc. depending on the classification that we want to use
* "year" for the data reference year
* "reporter" the reporter country, we will recover the id using the previous function
* "partner" same as reporter but for the trading partner

For the other parameters in the URL fix the following values:
* Commodities (type=C)
* Obtain data on imports and exports (rg=1,2)
* All the commodity codes within a classification (cc=ALL)
* The format returned is a csv file (fmt=csv)

This function should return the object of the query (as we did with "data_1 = requests.get(url)").

We include a frequency argument in the function, but we will always use annual data in this exercise. If you want monthly data, adjust the function as follows:
* As the API documentation shows, the period parameter at annual frequency has the format 2017, 2016, etc.
* At monthly frequency it has the format 201701, 201702, etc., where the last two digits are the month.
* To build this parameter, add another argument (called "month") to the function, taking the values 01, 02, 03, etc.
* In the url_year part, concatenate the year and month arguments to get the required value, e.g. 201701.
* The only classification available for the monthly data is "HS".

In [62]:
def query_constructor_C(frequency, classification, year, reporter, partner, month = ""):
    url_frequency = "&freq=" + frequency
    url_classification = "&px=" + classification
    url_year = "&ps=" + year + month
    url_reporter = "&r=" + obtain_id(reporter)
    url_partner = "&p=" + obtain_id(partner)
    url_final = "&rg=1,2&cc=ALL&fmt=csv&head=M"
    url = "http://comtrade.un.org/api/get?max=100000&type=C" + url_frequency + url_classification + url_year + url_reporter + url_partner + url_final
    print(f"The url for {classification} and trade flows between {reporter} and {partner} in {year + month} has been created. Processing request...")
    req = requests.get(url) 
    print(f"The request for {classification}, {reporter}, {partner}, {year + month} has been completed. The HTTP code is: {req.status_code}")
    print(url)
    return req
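
The same URL can also be assembled from a dictionary of parameters instead of string concatenation, which makes each parameter easier to read and change. This is a sketch under assumptions: `build_period` and `build_query_url` are hypothetical names, and the function expects the numeric ids rather than country names. It only builds the URL and does not send a request.

```python
from urllib.parse import urlencode

BASE_URL = "http://comtrade.un.org/api/get"


def build_period(year, month=""):
    """Return the ps parameter: "2017" for annual data, "201701" for monthly."""
    if month and not (len(month) == 2 and month.isdigit()):
        raise ValueError("month must be empty or two digits, e.g. '01'")
    return year + month


def build_query_url(frequency, classification, year, reporter_id, partner_id, month=""):
    """Build the request URL with the fixed parameters used in this notebook."""
    params = {
        "max": 100000,
        "type": "C",
        "freq": frequency,
        "px": classification,
        "ps": build_period(year, month),
        "r": reporter_id,
        "p": partner_id,
        "rg": "1,2",
        "cc": "ALL",
        "fmt": "csv",
        "head": "M",
    }
    # safe="," keeps the comma in rg=1,2 instead of encoding it as %2C.
    return BASE_URL + "?" + urlencode(params, safe=",")


print(build_query_url("A", "H4", "2017", "251", "276"))
```

For France (251) and Germany (276) this produces the same URL that `query_constructor_C` prints for the annual H4 request.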

First, let us check that the function works properly. For now we will wrap it in a temporary function together with the folder function, just to store the file. Then, we will see how to improve things.

In [50]:
def bilateral_requests(frequency, classification, year, reporter, partner, OS, month = ""):
    path = folder(frequency, classification, year, reporter, OS, month = month)
    print(path)
    req = query_constructor_C(frequency, classification, year, reporter, partner, month = month)
    write(req, path, reporter, partner)
    return req

In [82]:
# Check your folder now, this should have created the csv file
bilateral_requests("A", "H4", "2017", "France", "Germany", "MacOS")
# Let us try with monthly data
#bilateral_requests("M", "HS", "2017", "France", "Germany", "MacOS", month = "01")

The folder at /home/moritz/Documents/GitHub/Classes/Session_5/Data/A/H4/2017/France/ has been created.
/home/moritz/Documents/GitHub/Classes/Session_5/Data/A/H4/2017/France/
The country France is in the list with id 251
The country Germany is in the list with id 276
The url for H4 and trade flows between France and Germany in 2017 has been created. Processing request...
The request for H4, France, Germany, 2017 has been completed. The HTTP code is: 200
http://comtrade.un.org/api/get?max=100000&type=C&freq=A&px=H4&ps=2017&r=251&p=276&rg=1,2&cc=ALL&fmt=csv&head=M
Writing .csv file in /home/moritz/Documents/GitHub/Classes/Session_5/Data/A/H4/2017/France/
File .csv saved in /home/moritz/Documents/GitHub/Classes/Session_5/Data/A/H4/2017/France/


<Response [200]>

Everything should be fine for now. Instead of bilateral data, try to get trade values between France and all its trading partners.

In [67]:
bilateral_requests("A", "H4", "2017", "France", "All", "MacOS")

The folder at /home/moritz/Documents/GitHub/Classes/Session_5/Data/A/H4/2017/France/ has been created.
/home/moritz/Documents/GitHub/Classes/Session_5/Data/A/H4/2017/France/
The country France is in the list with id 251
The country All is in the list with id all
The url for H4 and trade flows between France and All in 2017 has been created. Processing request...
The request for H4, France, All, 2017 has been completed. The HTTP code is: 200
http://comtrade.un.org/api/get?max=100000&type=C&freq=A&px=H4&ps=2017&r=251&p=all&rg=1,2&cc=ALL&fmt=csv&head=M
Writing .csv file in /home/moritz/Documents/GitHub/Classes/Session_5/Data/A/H4/2017/France/
File .csv saved in /home/moritz/Documents/GitHub/Classes/Session_5/Data/A/H4/2017/France/


<Response [200]>

If you check the last file, you will notice that the request was successful but we did not get the data, since the number of observations is above the limit. It would be better to receive an error when a request exceeds the limit instead of a success code, but this is how UN Comtrade designed its API. We can take a step back to avoid this problem. As the API documentation explains, UN Comtrade has a separate request to check data availability (it returns a JSON with several pieces of information). Let us define a function to use this data availability request.

In [69]:
def query_availability_C(frequency, classification, year, reporter, partner = "", month = ""):
    url_frequency = "http://comtrade.un.org/api/refs/da/view?type=C&freq=" + frequency
    url_classification = "&px=" + classification
    url_year = "&ps=" + year + month
    url_reporter = "&r=" + obtain_id(reporter)
    if partner == "":
        url_partner = "&p="
    else:
        url_partner = "&p=" + obtain_id(partner)
    url_final = "&rg=1,2&cc=ALL"
    url = url_frequency + url_classification + url_year + url_reporter + url_partner + url_final
    print(f"The url for {classification} and trade flows between {reporter} and {partner} in {year + month} has been created. Processing data availability file...")
    req = requests.get(url).json()
    print(f"The json for {classification}, {reporter}, {partner}, {year + month} is now available. Now it is time to unpack it.")
    unpacked = [(x) for entry in range(len(req)) for x in req[entry].items()]
    print(unpacked)

Now, start from our previous query on annual data between France and Germany.

In [83]:
query_availability_C("A", "H4", "2017", "France", partner = "Germany")

The country France is in the list with id 251
The country Germany is in the list with id 276
The url for H4 and trade flows between France and Germany in 2017 has been created. Processing data availability file...
The json for H4, France, Germany, 2017 is now available. Now it is time to unpack it.
[('type', 'COMMODITIES'), ('freq', 'ANNUAL'), ('px', 'H4'), ('r', '251'), ('rDesc', 'France'), ('ps', '2017'), ('TotalRecords', 684593), ('isOriginal', 0), ('publicationDate', '2018-08-24T00:00:00'), ('isPartnerDetail', 1)]


As you can see, the data availability JSON is useful if you have problems with the number of observations between a reporter country and all its trade partners, but it is useless for solving the same problem should it arise for bilateral flows (it should not, since UN Comtrade increased the maximum size of a request). We did include a partner parameter, but the request ignored it since it is not a parameter of the data availability query. The response gives the number of observations for all trade flows (including re-exports and re-imports) between the reporter country and all trade partners.

Let us also check the availability of the monthly data, just for fun.

In [72]:
query_availability_C("M", "HS", "2017", "France", partner = "All", month = "01")

The country France is in the list with id 251
The country All is in the list with id all
The url for HS and trade flows between France and All in 201701 has been created. Processing data availability file...
The json for HS, France, All, 201701 is now available. Now it is time to unpack it.
[('type', 'COMMODITIES'), ('freq', 'MONTHLY'), ('px', 'HS'), ('r', '251'), ('rDesc', 'France'), ('ps', '201701'), ('TotalRecords', 373901), ('isOriginal', 1), ('publicationDate', '2018-08-22T00:00:00'), ('isPartnerDetail', 1)]


Same problem as before: we only get the aggregate number of observations for France.

Clean the Data folder by deleting the csv files. We could also write a function to do that. You can do that as an exercise; you will need to look at some of the functions in the os library.

In [84]:
# For Windows
#os.remove(os.getcwd() + "\\Data\\A\\H4\\2017\\France\\France_Germany.csv")
#os.remove(os.getcwd() + "\\Data\\A\\H4\\2017\\France\\France_All.csv")
#os.remove(os.getcwd() + "\\Data\\M\\HS\\201701\\France\\France_Germany.csv")

# For MacOS
os.remove(os.getcwd() + "/Data/A/H4/2017/France/France_Germany.csv")
os.remove(os.getcwd() + "/Data/A/H4/2017/France/France_All.csv")
#os.remove(os.getcwd() + "/Data/M/HS/201701/France/France_Germany.csv")
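
One way to approach the clean-up exercise is to walk the Data folder and delete every csv file found. This is a sketch, not the only solution: `remove_csv_files` is a hypothetical name, and the function leaves the (now empty) folders in place.

```python
import os


def remove_csv_files(root):
    """Delete every .csv file below root and return the paths removed."""
    removed = []
    for folder_path, _subfolders, file_names in os.walk(root):
        for name in file_names:
            if name.lower().endswith(".csv"):
                full_path = os.path.join(folder_path, name)
                os.remove(full_path)
                removed.append(full_path)
    return removed
```

Calling `remove_csv_files(os.path.join(os.getcwd(), "Data"))` would clear all downloads at once, so it is worth printing the returned list the first time to confirm that only the intended files were deleted.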

We will look at some of the problems that you might have using the UN Comtrade API without a license below. First, we will create a for loop to make requests for bilateral data.

In [75]:
# We could add an elif to skip the request for the "All" partner, but for some reporters (perhaps small countries) it may work because there are fewer observations
def reporter_requests(frequency, classification, year, reporter, OS, month = ""):
    index = 0
    for x in range(len(unpacked_values)):
        if reporter == unpacked_values[x][1]:
            continue
        else:
            req = bilateral_requests(frequency, classification, year, reporter, unpacked_values[x][1], OS, month = month)
            index += 1
            if req.status_code != 200:
                print(f"The request was not successful for {frequency}, {classification}, {year + month}, {reporter}, {unpacked_values[x][1]}")
                break
            elif index == 4:
                print(f"Since this is only an example, we stop at the index {index} since you might not want to download all the bilateral data for {reporter} in {year + month}")
                break

In [76]:
reporter_requests("A", "H4", "2017", "France", "MacOS")

The folder at /home/moritz/Documents/GitHub/Classes/Session_5/Data/A/H4/2017/France/ has been created.
/home/moritz/Documents/GitHub/Classes/Session_5/Data/A/H4/2017/France/
The country France is in the list with id 251
The country All is in the list with id all
The url for H4 and trade flows between France and All in 2017 has been created. Processing request...
The request for H4, France, All, 2017 has been completed. The HTTP code is: 200
http://comtrade.un.org/api/get?max=100000&type=C&freq=A&px=H4&ps=2017&r=251&p=all&rg=1,2&cc=ALL&fmt=csv&head=M
Writing .csv file in /home/moritz/Documents/GitHub/Classes/Session_5/Data/A/H4/2017/France/
File .csv saved in /home/moritz/Documents/GitHub/Classes/Session_5/Data/A/H4/2017/France/
The folder at /home/moritz/Documents/GitHub/Classes/Session_5/Data/A/H4/2017/France/ has been created.
/home/moritz/Documents/GitHub/Classes/Session_5/Data/A/H4/2017/France/
The country France is in the list with id 251
The country Afghanistan is in the list wit

Now, let us mention a couple of problems that you might find using the UN Comtrade API:
1. If you get an error at some point, the loop will break. There are a few possible explanations for this:
 * We know that the request for the parameters above is well defined (the data are available). However, as we have seen before, the result might exceed the number of observations available to free users. This still returns an HTTP code equal to 200, so it does not break the loop. More on how to deal with a large number of observations below.
 * Most likely, the problem is that free users can send only 1 request per second. Since the files for the bilateral data are quite small, the loop might cycle through requests very quickly. In that case, you should get an HTTP code different from 200, which breaks the loop. To avoid this, import the time library and include a time.sleep(1) inside the loop.
 * You exceeded the number of requests (100) that free users can make to the API in a 60-minute window. To keep track of this, add an index = 0 at the beginning of the function and increase it by 1 with each iteration of the for loop. Then add an if statement that, when the index is close to 100, calls time.sleep() for enough minutes to reset the counter. After the time.sleep(), reset the index and resume the iterations of the loop. Alternatively, instead of the counter, you can use exception handling to tell the code to sleep once the API returns the error (using an HTTP code) associated with the request limit. When you hit the limit you receive a 409 HTTP code, which stops the code, and the first row of the last file downloaded tells you why you received the error and, if the error was caused by the hourly request limit, the time at which you can resume sending requests. You could read the first row of that file in Python, extract the resume time with regular expressions, and use it to restart the loop that creates the requests. We will not provide the code to do that here since we already cover a lot of topics in this session.
2. The loop stops after you have already completed a certain number of iterations. This might happen for several reasons, such as a loss of internet connection, which leads to an error when you try to make the request, or because you interrupt the code by hand to close the notebook. In that case, you do not want to download again the files that you have already stored in your folder, since that would burn through your available requests per hour. Below we address this problem using a function that tells you whether a file is already stored in your folder and, in that case, does not submit a request. We do not provide the code to handle the exception raised by a loss of internet connection during a request. Try to write it without our help!

Below is the function to check the existing files in your folder.

In [77]:
def check_existence(frequency, classification, year, reporter, partner, OS, month = ""):
    if OS == "Windows":
        try:
            if reporter + "_" + partner + ".csv" in os.listdir(os.getcwd() + "\\Data\\" + frequency + "\\" + classification + "\\" + year + month + "\\" + reporter):
                print (f"File {reporter}_{partner}.csv already exists, skip to next iteration.")
                return True
        except FileNotFoundError:
                print (f"The folder does not exist, implying that the file {reporter}_{partner}.csv does not exist, continue with this iteration.")
    elif OS == "MacOS":
        try:
            if reporter + "_" + partner + ".csv" in os.listdir(os.getcwd() + "/Data/" + frequency + "/" + classification + "/" + year + month + "/" + reporter):
                print (f"File {reporter}_{partner}.csv already exists, skip to next iteration.")
                return True
        except FileNotFoundError:
                print (f"The folder does not exist, implying that the file {reporter}_{partner}.csv does not exist, continue with this iteration.")
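
The same check can be written more compactly with `os.path.join` and `os.path.isfile`, which returns `False` when either the folder or the file is missing, so no `try`/`except` is needed. This is a sketch with hypothetical names (`csv_already_downloaded`, `base_dir`), following the folder layout used above.

```python
import os


def csv_already_downloaded(base_dir, frequency, classification, year,
                           reporter, partner, month=""):
    """Return True if <reporter>_<partner>.csv is already stored under base_dir."""
    file_path = os.path.join(base_dir, "Data", frequency, classification,
                             year + month, reporter, f"{reporter}_{partner}.csv")
    # isfile is False for a missing folder as well as for a missing file.
    return os.path.isfile(file_path)
```

Unlike `check_existence`, this version always returns a boolean, which reads a little more clearly in an `if` statement than a function that returns either `True` or `None`.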

Slightly modify the reporter_requests function defined above to include this check:

In [78]:
def reporter_requests_v2(frequency, classification, year, reporter, OS, month = ""):
    index = 0
    for x in range(len(unpacked_values)):
        if reporter == unpacked_values[x][1]:
            continue
        else:
            existence = check_existence(frequency, classification, year, reporter, unpacked_values[x][1], OS, month = month)
            if existence:
                continue
            else:
                req = bilateral_requests(frequency, classification, year, reporter, unpacked_values[x][1], OS, month = month)
                index += 1
                if req.status_code != 200:
                    print(f"The request was not successful for {frequency}, {classification}, {year + month}, {reporter}, {unpacked_values[x][1]}")
                    break
                elif index == 4:
                    print(f"Since this is only an example, we stop at the index {index} since you might not want to download all the bilateral data for {reporter} in {year + month}")
                    break

In [79]:
reporter_requests_v2("A", "H4", "2016", "France", "MacOS")

The folder does not exist, implying that the file France_All.csv does not exist, continue with this iteration.
The folder at /home/moritz/Documents/GitHub/Classes/Session_5/Data/A/H4/2016/France/ has been created.
/home/moritz/Documents/GitHub/Classes/Session_5/Data/A/H4/2016/France/
The country France is in the list with id 251
The country All is in the list with id all
The url for H4 and trade flows between France and All in 2016 has been created. Processing request...
The request for H4, France, All, 2016 has been completed. The HTTP code is: 200
http://comtrade.un.org/api/get?max=100000&type=C&freq=A&px=H4&ps=2016&r=251&p=all&rg=1,2&cc=ALL&fmt=csv&head=M
Writing .csv file in /home/moritz/Documents/GitHub/Classes/Session_5/Data/A/H4/2016/France/
File .csv saved in /home/moritz/Documents/GitHub/Classes/Session_5/Data/A/H4/2016/France/
The folder at /home/moritz/Documents/GitHub/Classes/Session_5/Data/A/H4/2016/France/ has been created.
/home/moritz/Documents/GitHub/Classes/Session_5/

To see another problem with the API, let us make a request for data that are not in the database (the H4 classification is from 2012, so there are no data using this classification in 1992).

In [80]:
reporter_requests_v2("A", "H4", "1992", "France", "MacOS")

The folder does not exist, implying that the file France_All.csv does not exist, continue with this iteration.
The folder at /home/moritz/Documents/GitHub/Classes/Session_5/Data/A/H4/1992/France/ has been created.
/home/moritz/Documents/GitHub/Classes/Session_5/Data/A/H4/1992/France/
The country France is in the list with id 251
The country All is in the list with id all
The url for H4 and trade flows between France and All in 1992 has been created. Processing request...
The request for H4, France, All, 1992 has been completed. The HTTP code is: 200
http://comtrade.un.org/api/get?max=100000&type=C&freq=A&px=H4&ps=1992&r=251&p=all&rg=1,2&cc=ALL&fmt=csv&head=M
Writing .csv file in /home/moritz/Documents/GitHub/Classes/Session_5/Data/A/H4/1992/France/
File .csv saved in /home/moritz/Documents/GitHub/Classes/Session_5/Data/A/H4/1992/France/
The folder at /home/moritz/Documents/GitHub/Classes/Session_5/Data/A/H4/1992/France/ has been created.
/home/moritz/Documents/GitHub/Classes/Session_5/

If you open one of the downloaded .csv files, you will notice that the request was successful but the first entry of the file tells you that the data are not available for that query.

Check what we get from the data availability request if we use this classification and year.

In [81]:
query_availability_C("A", "H4", "1992", "France")

The country France is in the list with id 251
The url for H4 and trade flows between France and  in 1992 has been created. Processing data availability file...
The json for H4, France, , 1992 is now available. Now it is time to unpack it.
[]


It is empty, meaning that the data are not available for this combination of parameters. Let us discuss the problems of the current code and possible solutions. You can improve the code to include those solutions as an exercise.

## Exceeding number of observations

We have seen that the query for the data availability does not really help to address this problem. We have also noticed that the request is successful but we get a file whose first entry says "Result too large: you do not have permissions to access such a large resultset.".
1. Remember that our query asks for imports and exports at the same time. You can rewrite the functions to include trade_flows as an argument (obviously, also adjust the functions that construct folders, etc.). In this way, you reduce the number of observations per file, but increase the number of requests you have to make.
2. Include in our main function another function that opens each downloaded file and reads the first entry. If we get the string "Result too large..." we can save the parameters of this query to a list, so that we know which queries had the observations problem. Then, we have to break those queries down into requests with fewer observations. For example, instead of downloading all the commodity codes at once, we split them up (look at the "cc=" parameter in the API documentation). Otherwise, split the trade flows as suggested in 1.
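
The first-entry check described in point 2 could be sketched as follows. The function names are hypothetical, and the only marker used is the "Result too large" message quoted above; the exact wording of other error messages is not assumed here.

```python
TOO_LARGE_MARKER = "Result too large"


def first_line(text):
    """Return the first line of a downloaded file's text, or "" if it is empty."""
    lines = text.splitlines()
    return lines[0] if lines else ""


def result_too_large(text, marker=TOO_LARGE_MARKER):
    """Return True if the first line carries the too-large message."""
    return marker.lower() in first_line(text).lower()


def flag_too_large(downloads):
    """Given (params, text) pairs, return the params of the flagged queries."""
    return [params for params, text in downloads if result_too_large(text)]
```

Here `text` could be `req.text` from a request or the content read back from a saved file. The list returned by `flag_too_large` is the set of queries that need to be split into smaller requests.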

## More than 1 request per second

The rate limit should return a 409 error (according to the documentation). This is the same code returned once you exceed 100 requests per hour (again, according to the documentation).
1. To avoid sending more than 1 request per second, you just need to include a time.sleep(1) between iterations of the requests.
2. Even if you hit this limit, it is possible that UN Comtrade will return a .csv file containing the error as its first entry. As for the observations problem, you can open the file and check the first entry to see if there is a problem. Then, tell the loop to submit the query again if that was the problem listed in the file.
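
The pause from point 1 can be kept out of the loop body by wrapping the iterable in a small generator. This is a sketch with a hypothetical name, `throttled`; it sleeps only for the part of the interval that has not already elapsed, so slow requests do not wait an extra second.

```python
import time


def throttled(items, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
    """Yield items so that consecutive items start at least min_interval apart.

    clock and sleep are parameters only so that the behaviour can be checked
    without really waiting; the defaults use the time module.
    """
    last_start = None
    for item in items:
        if last_start is not None:
            elapsed = clock() - last_start
            if elapsed < min_interval:
                sleep(min_interval - elapsed)
        last_start = clock()
        yield item
```

In the request loop this would read `for x in throttled(range(len(unpacked_values))):`, leaving the rest of the loop unchanged.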

## Usage limit, more than 100 requests per hour

As mentioned above, this should return a 409 status code, so you can deal with it using exception handling and time.sleep(). As for the previous problems, you might get a file whose first entry reports the error, including the time at which you can start submitting requests again. You can extract that time and tell Python to restart the requests then. Otherwise, include an index to keep track of how many requests you have made, stop before the limit, and tell the code to sleep.
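
The counter approach can be sketched as a small class. `RequestBudget` is a hypothetical name, and the window it tracks is local to your script: it only approximates the window that the server counts, so a margin below 100 is sensible.

```python
import time


class RequestBudget:
    """Count requests and sleep once the budget for the current window is used."""

    def __init__(self, max_requests=100, window_seconds=3600,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self.sleep = sleep
        self.window_start = None
        self.used = 0

    def wait_for_slot(self):
        """Call once before each request; blocks while the budget is exhausted."""
        now = self.clock()
        if self.window_start is None or now - self.window_start >= self.window_seconds:
            # First call, or the previous window has expired: start a new one.
            self.window_start = now
            self.used = 0
        if self.used >= self.max_requests:
            # Budget spent: wait for the rest of the window, then start over.
            self.sleep(self.window_seconds - (now - self.window_start))
            self.window_start = self.clock()
            self.used = 0
        self.used += 1
```

A loop would create one `budget = RequestBudget(max_requests=95)` before iterating and call `budget.wait_for_slot()` right before each request.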

## Data not available

For this problem you can again use the first entry of the downloaded file, which will tell you that there is a problem with the availability of the data. Otherwise, include the data availability query at the start of the function. If the JSON from that query is an empty list, you know that the data are not available, so you can avoid making that request to the API. The latter approach is better since it should not count against your number of requests per hour (the data availability request is probably not included in the usage limit).
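
A sketch of that availability check, working on the parsed JSON rather than on a live request, could look like this. The helper names are hypothetical, and the sample entry is copied from the availability output shown earlier for France, H4, 2017.

```python
# Copied from the data availability output printed earlier in this notebook.
sample_availability = [{
    "type": "COMMODITIES", "freq": "ANNUAL", "px": "H4", "r": "251",
    "rDesc": "France", "ps": "2017", "TotalRecords": 684593,
    "isOriginal": 0, "publicationDate": "2018-08-24T00:00:00",
    "isPartnerDetail": 1,
}]


def data_available(availability):
    """Return True if the availability JSON lists at least one entry."""
    return len(availability) > 0


def total_records(availability):
    """Sum TotalRecords over all entries (0 for an empty list)."""
    return sum(entry.get("TotalRecords", 0) for entry in availability)


print(data_available(sample_availability), total_records(sample_availability))
print(data_available([]))
```

For this to be usable inside the request loop, `query_availability_C` would need to return the parsed JSON instead of only printing it.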

## Folder management

There are things that you can do to improve the folder management of the downloaded data while keeping track of the requests made. You should add a function that deletes the files reporting errors (without data) after you open them to get information about the error and the parameters used for that request. In this way you can keep track of the parameters for which you do not have data due to errors. You should also store this information somewhere since it might be useful at some point.
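
One possible way to store that information is a small csv log that gets one row per failed query. The names below (`LOG_FIELDS`, `record_failure`, `discard_error_file`) and the choice of columns are assumptions made for this sketch.

```python
import csv
import os

# Hypothetical column layout for the log of failed queries.
LOG_FIELDS = ["frequency", "classification", "period", "reporter", "partner", "reason"]


def record_failure(log_path, reason, **params):
    """Append one failed query to the log, writing the header on first use.

    params may only contain keys listed in LOG_FIELDS.
    """
    is_new_log = not os.path.isfile(log_path)
    with open(log_path, "a", newline="") as log_file:
        writer = csv.DictWriter(log_file, fieldnames=LOG_FIELDS)
        if is_new_log:
            writer.writeheader()
        writer.writerow({**params, "reason": reason})


def discard_error_file(file_path, log_path, reason, **params):
    """Log why the file holds no data, then delete it."""
    record_failure(log_path, reason, **params)
    os.remove(file_path)
```

Logging before deleting means that a crash between the two steps leaves an extra file on disk rather than a lost record of the failure.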

If you want to update your data files over time, you can create a function that looks at the creation time of your .csv files and deletes them if they are older than your desired threshold. Then, you can download them again. Whether this is useful or not depends on how often UN Comtrade revises old data. For example, they might never change the content of files for trade flows between countries in 1973. In that case, deleting the old file and downloading it again is useless since there are no changes to the data. However, they might update trade flow data for the last 2-3 years, so you might want to update the files associated with this time window.
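
A sketch of such a clean-up function is shown below. It is an approximation: `remove_stale_csv` is a hypothetical name, and it uses the last modification time from `os.path.getmtime` as a stand-in for the creation time, since a downloaded file is normally written once.

```python
import os
import time


def remove_stale_csv(root, max_age_days, now=None):
    """Delete .csv files below root that are older than max_age_days.

    Age is measured from the last modification time. Returns the paths removed.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds in a day
    removed = []
    for folder_path, _subfolders, file_names in os.walk(root):
        for name in file_names:
            full_path = os.path.join(folder_path, name)
            if name.lower().endswith(".csv") and os.path.getmtime(full_path) < cutoff:
                os.remove(full_path)
                removed.append(full_path)
    return removed
```

Pointing `root` at the folder of a single recent year, rather than at the whole Data folder, restricts the refresh to the time window that is likely to be revised.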