# Data extraction

In the following cells we download the data from the [FAO website](https://www.fao.org/faostat/en/#home) and the [World Bank website](https://www.worldbank.org/en/home), taking it from their respective statistics and data sections, and decompress it so that it can be processed later in the integration and subsequent parts.

In [7]:
import requests
import glob
import os
from zipfile import ZipFile

The data comes as compressed files totalling over 1000 MB, so we pass the <code>stream</code> and <code>verify</code> arguments to <code>requests.get</code>: <code>stream=True</code> defers fetching the response body until it is accessed, and <code>verify=False</code> skips TLS certificate verification. The downloaded files are saved in the current working directory (which can be obtained with <code>os.getcwd()</code>).

In [None]:
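url = 'https://fenixservices.fao.org/faostat/static/bulkdownloads/FAOSTAT.zip'
# verify=False skips TLS certificate verification; use it only if the server's certificate cannot be validated
r = requests.get(url, allow_redirects=True, stream=True, verify=False)
r.raise_for_status()  # stop here on an HTTP error instead of saving an error page
with open('FAOSTAT.zip', 'wb') as f:
    f.write(r.content)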

In [None]:
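url2 = 'https://databank.worldbank.org/data/download/WDI_csv.zip'
r2 = requests.get(url2, allow_redirects=True, stream=True, verify=False)
r2.raise_for_status()  # stop here on an HTTP error instead of saving an error page
with open('WDI_csv.zip', 'wb') as f:
    f.write(r2.content)  # body of the World Bank response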
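
Because `r.content` reads the whole response into memory, writing the body in chunks may be gentler for archives of this size. The following is a minimal sketch rather than the notebook's own method: `save_stream` is a hypothetical helper that accepts any response object exposing `iter_content`, as `requests.Response` does, and the 1 MiB chunk size is an arbitrary choice.

```python
def save_stream(response, dest, chunk_size=1024 * 1024):
    """Write a streamed response to dest chunk by chunk; return bytes written.

    Hypothetical helper: `response` is assumed to expose iter_content(),
    as requests.Response does when fetched with stream=True.
    """
    written = 0
    with open(dest, "wb") as f:
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:  # skip empty chunks
                f.write(chunk)
                written += len(chunk)
    return written
```

With the requests above, it could be called as `save_stream(requests.get(url, stream=True), 'FAOSTAT.zip')`.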

Next, we extract all the files contained in <em>FAOSTAT.zip</em> into the current working directory.

In [None]:
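zip_name = "FAOSTAT.zip"
with ZipFile(zip_name, 'r') as zf:  # named zf to avoid shadowing the built-in zip()
    zf.printdir()    # list the archive contents
    zf.extractall()  # extract into the current working directory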
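
With downloads this large, it can be worth confirming that an archive is intact before extracting it. The sketch below uses the standard library's `zipfile.is_zipfile` and `ZipFile.testzip`; the helper name `archive_is_intact` is hypothetical and not part of the notebook.

```python
import zipfile

def archive_is_intact(path):
    """Return True if path is a readable ZIP whose members pass their CRC checks."""
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path, "r") as zf:
        # testzip() returns the name of the first corrupt member, or None
        return zf.testzip() is None
```

A call such as `archive_is_intact("FAOSTAT.zip")` could guard the extraction step, so that a truncated download is caught before any files are written.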

Third, we decompress all the archives that were extracted from *FAOSTAT.zip*, together with *WDI_csv.zip*, into a new folder called *Data* inside the current working directory, using a simple `for` loop.

In [None]:
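zip_list = glob.glob(os.path.join(os.getcwd(), '*.zip'))
for zip_path in zip_list:
    # FAOSTAT.zip was already unpacked above; only its inner archives and WDI_csv.zip go into Data
    if os.path.basename(zip_path) == 'FAOSTAT.zip':
        continue
    with ZipFile(zip_path, 'r') as zf:
        zf.printdir()
        zf.extractall("Data")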
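
If the same step is needed for other directories, the loop can be wrapped in a function. This is only a sketch under assumptions: `extract_archives` and its `skip` parameter are hypothetical names introduced here, and archives are matched by a plain `*.zip` pattern.

```python
import glob
import os
from zipfile import ZipFile

def extract_archives(src_dir, dest_dir, skip=()):
    """Extract every .zip in src_dir into dest_dir; return the archive names handled."""
    handled = []
    for zip_path in sorted(glob.glob(os.path.join(src_dir, "*.zip"))):
        name = os.path.basename(zip_path)
        if name in skip:  # e.g. an outer archive that was already unpacked
            continue
        with ZipFile(zip_path, "r") as zf:
            zf.extractall(dest_dir)
        handled.append(name)
    return handled
```

For this notebook the call would look like `extract_archives(os.getcwd(), "Data", skip=("FAOSTAT.zip",))`.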

Finally, we delete *FAOSTAT.zip* and *WDI_csv.zip*.

In [None]:
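# os.remove() does not expand wildcards, so match the archives with glob first.
# This removes every .zip in the working directory, including those unpacked from FAOSTAT.zip.
for zip_path in glob.glob(os.path.join(os.getcwd(), '*.zip')):
    os.remove(zip_path)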