# Documentation for downloading height datasets

General dataset: Actueel Hoogtebestand Nederland (AHN2)

This script downloads the content of all datasets listed on this page: https://www.pdok.nl/downloads?articleid=1948857


Each dataset is described by an XML file that contains many links to the files we are looking for.

## Import libraries

In [1]:
import requests
from bs4 import BeautifulSoup
import xml.etree.ElementTree as ET
import subprocess

## Set variables

In [2]:
download_links = [] #Links to the XML files that we need to go through
datasets_links = {} #Maps each dataset (XML link) to the files associated with it

## Define the functions

In [3]:
def extract_dataset_from_link(dataset_xml_link):
    """
    Build a dictionary that maps each file name to its download link.
    Input: the link to an XML file describing one dataset.
    Output: file_and_link, a dictionary {file name: zip download link}.
    """
    file_and_link = {}

    xml_page = requests.get(dataset_xml_link)
    root = ET.fromstring(xml_page.text)

    #Fill data into a dictionary (name : link)
    for child in root:
        #Reset per entry so a link from a previous entry is never reused
        file_id = ''
        zip_link = ''
        for elt in child:
            if elt.tag.endswith('id') and elt.text:
                #Drop the last 8 characters of the id text (presumably a '.tif.zip' suffix)
                file_id = elt.text[:-8]
            if elt.tag.endswith('link') and elt.attrib.get('type') == "application/x-compressed":
                zip_link = elt.attrib.get('href')
        #Map id and link once the whole entry has been read
        if file_id != '' and zip_link != '':
            file_and_link[file_id] = zip_link

    return file_and_link
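
To illustrate what this function extracts, the following sketch applies the same tag and type checks to a small hand-written Atom-style snippet instead of a live feed. The sample XML, the `example.org` URLs, and the helper name `parse_feed_text` are made up for illustration, and the sketch assumes that ids end in `.tif.zip`; the real PDOK feeds may contain additional elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; the real feeds may differ in detail.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>sample</title>
  <entry>
    <id>i01cz1.tif.zip</id>
    <link type="application/x-compressed" href="http://example.org/i01cz1.tif.zip"/>
  </entry>
  <entry>
    <id>i01cz2.tif.zip</id>
    <link type="text/html" href="http://example.org/about.html"/>
  </entry>
</feed>"""


def parse_feed_text(xml_text):
    """Return {file id: zip link} for every entry that has both."""
    file_and_link = {}
    root = ET.fromstring(xml_text)
    for entry in root:
        file_id = ''
        zip_link = ''
        for elt in entry:
            # Tags carry a namespace prefix, so only the end of the tag is compared.
            if elt.tag.endswith('id') and elt.text:
                file_id = elt.text[:-8]  # drop the 8-character '.tif.zip' suffix
            if elt.tag.endswith('link') and elt.attrib.get('type') == 'application/x-compressed':
                zip_link = elt.attrib.get('href')
        if file_id and zip_link:
            file_and_link[file_id] = zip_link
    return file_and_link


print(parse_feed_text(SAMPLE_FEED))
# {'i01cz1': 'http://example.org/i01cz1.tif.zip'}
```

The second entry is skipped because it has no link of type `application/x-compressed`.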

In [4]:
def call_script_sh(filename, url):
    """
    Call a shell script on the server.
    The shell script downloads the dataset into the /data/height directory on the server.
    The downloaded file is a .zip archive.
    """
    process = subprocess.Popen(['./dl.sh', filename, url], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout, stderr = process.communicate()
    #print(stdout)
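
Because `call_script_sh` discards the script's output, a failed download can go unnoticed. A minimal sketch of a variant that reports the exit status is shown below. The name `run_download_command` and its `command` parameter are hypothetical, and the sketch assumes that `dl.sh` exits with a non-zero status on failure, which the actual script may or may not do.

```python
import subprocess
import sys


def run_download_command(command):
    """Run a command such as ['./dl.sh', filename, url].

    Returns (ok, stderr_text), where ok is True when the exit status is 0.
    """
    result = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return result.returncode == 0, result.stderr.decode(errors='replace')


# Demonstration with the Python interpreter standing in for dl.sh.
ok, err = run_download_command([sys.executable, '-c', 'import sys; sys.exit(3)'])
print(ok)
```

With this return value, the caller can collect the names of failed files and retry only those.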

## Analysis of the webpage

In [5]:
#Get the content of the web page that lists all the datasets
datasets_page = requests.get("https://www.pdok.nl/downloads?articleid=1948857")

In [6]:
#Create a BeautifulSoup object
soup_html = BeautifulSoup(datasets_page.text, "html.parser")

#Look for the HTML elements (download buttons) that contain the XML links
download_link_containers = soup_html.find_all('a', class_='btn btn-action btn-download btn-download-width')

#Extract the XML link from the href attribute of each element
for link in download_link_containers:
    download_links.append(link.get("href"))

In [7]:
download_links

['http://geodata.nationaalgeoregister.nl/ahn2/atom/ahn2_05m_int.xml',
 'http://geodata.nationaalgeoregister.nl/ahn2/atom/ahn2_05m_non.xml',
 'http://geodata.nationaalgeoregister.nl/ahn2/atom/ahn2_05m_ruw.xml',
 'http://geodata.nationaalgeoregister.nl/ahn2/atom/ahn2_5m.xml',
 'http://geodata.nationaalgeoregister.nl/ahn2/atom/ahn2_gefilterd.xml',
 'http://geodata.nationaalgeoregister.nl/ahn2/atom/ahn2_uitgefilterd.xml']

## Map each dataset to the files it contains

In [8]:
#Map each XML link to all the files it contains
for xml_link in download_links:
    extract = extract_dataset_from_link(xml_link)
    datasets_links[xml_link] = extract

## Structure of the data we get

In [9]:
datasets_links

{'http://geodata.nationaalgeoregister.nl/ahn2/atom/ahn2_05m_int.xml': {'i01cz1': 'http://geodata.nationaalgeoregister.nl/ahn2/extract/ahn2_05m_int/i01cz1.tif.zip',
  'i01cz2': 'http://geodata.nationaalgeoregister.nl/ahn2/extract/ahn2_05m_int/i01cz2.tif.zip',
  'i01dz1': 'http://geodata.nationaalgeoregister.nl/ahn2/extract/ahn2_05m_int/i01dz1.tif.zip',
  'i01dz2': 'http://geodata.nationaalgeoregister.nl/ahn2/extract/ahn2_05m_int/i01dz2.tif.zip',
  'i01gn1': 'http://geodata.nationaalgeoregister.nl/ahn2/extract/ahn2_05m_int/i01gn1.tif.zip',
  'i01gn2': 'http://geodata.nationaalgeoregister.nl/ahn2/extract/ahn2_05m_int/i01gn2.tif.zip',
  'i01gz1': 'http://geodata.nationaalgeoregister.nl/ahn2/extract/ahn2_05m_int/i01gz1.tif.zip',
  'i01gz2': 'http://geodata.nationaalgeoregister.nl/ahn2/extract/ahn2_05m_int/i01gz2.tif.zip',
  'i01hn1': 'http://geodata.nationaalgeoregister.nl/ahn2/extract/ahn2_05m_int/i01hn1.tif.zip',
  'i01hn2': 'http://geodata.nationaalgeoregister.nl/ahn2/extract/ahn2_05m_in

## Execute: launch the download on the server
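
The cell for this step is not shown. A minimal sketch, assuming the download is a loop over `datasets_links` that passes each file name and URL to `call_script_sh`, could look like this. The function `download_all` and its `download` parameter are hypothetical; the downloader is passed in as an argument so the loop can be tried without touching the server.

```python
def download_all(datasets_links, download):
    """Call download(filename, url) for every file of every dataset.

    Returns the number of files handed to the downloader.
    """
    count = 0
    for dataset_link, files in datasets_links.items():
        for filename, url in files.items():
            download(filename, url)
            count += 1
    return count


# Dry run with a stand-in downloader that only records its arguments.
# The dataset link and file URL below are made up for illustration.
calls = []
sample = {'http://example.org/feed.xml': {'tile1': 'http://example.org/tile1.tif.zip'}}
total = download_all(sample, lambda filename, url: calls.append((filename, url)))
```

In the notebook, the real call would presumably be `download_all(datasets_links, call_script_sh)`.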

## If you have an error during the download

### Restart the download of files in a specific dataset
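
No code is shown for this step. One possible approach, sketched under the assumption that a dataset is identified by its XML link (the keys of `datasets_links`), re-runs the download for that dataset only and can skip files that were already downloaded. The names `redownload_dataset`, `download`, and `already_done` are hypothetical.

```python
def redownload_dataset(datasets_links, dataset_link, download, already_done=()):
    """Re-run download(filename, url) for one dataset, skipping finished files.

    Returns the list of file names that were downloaded again.
    """
    files = datasets_links[dataset_link]  # raises KeyError for an unknown dataset
    retried = []
    for filename, url in files.items():
        if filename in already_done:
            continue
        download(filename, url)
        retried.append(filename)
    return retried


# Dry run on made-up data: 'tile1' is treated as already downloaded.
sample = {
    'http://example.org/feed.xml': {
        'tile1': 'http://example.org/tile1.tif.zip',
        'tile2': 'http://example.org/tile2.tif.zip',
    }
}
retried = redownload_dataset(
    sample, 'http://example.org/feed.xml', lambda filename, url: None, already_done={'tile1'}
)
print(retried)
# ['tile2']
```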

### Download one dataset from its link
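
No code is shown for this step either. A minimal sketch, assuming it combines `extract_dataset_from_link` and `call_script_sh` for a single XML link, is given below. The function `download_dataset_from_link` and its `extract` and `download` parameters are hypothetical; both helpers are passed in so the sketch can run without network access.

```python
def download_dataset_from_link(dataset_xml_link, extract, download):
    """Extract the files of one dataset from its XML link and download each of them.

    Returns the {file name: url} dictionary that was processed.
    """
    files = extract(dataset_xml_link)
    for filename, url in files.items():
        download(filename, url)
    return files


# Dry run with stand-ins for the extraction and download helpers (made-up data).
downloaded = []
fake_extract = lambda link: {'tile1': link.replace('feed.xml', 'tile1.tif.zip')}
files = download_dataset_from_link(
    'http://example.org/feed.xml',
    fake_extract,
    lambda filename, url: downloaded.append(filename),
)
```

In the notebook, the real call would presumably pass `extract_dataset_from_link` and `call_script_sh` as the two helpers.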

# References

General datasets page: https://www.pdok.nl/datasets


Dataset we are using: Actueel Hoogtebestand Nederland (AHN2)


Link to this dataset: https://www.pdok.nl/downloads?articleid=1948857