<a href="https://colab.research.google.com/github/OnroerendErfgoed/scriptorium/blob/main/notebooks/get_collection.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Accessing a collection of resources

This notebook shows how to access a collection of resources from the Agentschap Onroerend Erfgoed webservices. It assumes you have already mastered the techniques described in the [`get_resource` notebook](https://github.com/OnroerendErfgoed/scriptorium/blob/main/notebooks/get_resource.ipynb).

Again we'll be using the Inventaris Onroerend Erfgoed as our example, but the techniques in this notebook could also be applied to the Onroerend Erfgoed Beeldbank or Archeologieportaal.

The basic idea for accessing a collection is very similar to accessing a resource: we GET a certain URL. This time, however, the URL does not represent a single resource, but a collection of resources. This has a few consequences:

* When accessing a single resource, we receive an *object* that consists of attributes and their values. When accessing a collection, we receive a *list* of objects.
* When accessing a single resource, we get all the data on a single object. When accessing a collection, we get a summary of each object in the collection. To get all the data, we need to GET the full resource.
* When accessing a collection of resources, it's possible that the collection is too big to send to the client in one go. In this case we divide the collection into smaller pieces, often called *pages*.
* When accessing a collection, we often want to *filter* the objects that are returned to us. E.g. we do not want the collection of all Erfgoedobjecten, but a smaller part such as the collection of *all Erfgoedobjecten in Antwerpen*.

# A simple example
Let's start with a simple example. We'll request the entire collection of Erfgoedobjecten and output the first page of results we receive.

We'll also output how many results there actually are. This information is sent back by the server in the `Content-Range` header. This header will contain a value like `items 0-9/124`. This means we are viewing resources 0 through 9 (the first ten) out of a total of 124 resources.


In [None]:
# Import requests library to contact the REST service
import requests
# Import IPython library to produce Markdown
import IPython

# Create a requests Session
session = requests.Session()

# Set the default Accept header to `application/json`
session.headers.update({'Accept': 'application/json'})

# Make a request and store the response
response = session.get(
    'https://inventaris.onroerenderfgoed.be/erfgoedobjecten'
)

# Remove the first 6 characters from the header (items )
content_range = response.headers['Content-Range'][6:]
# The part after the slash is the total number of results
number_of_results = content_range.split('/')[1]
# Split the part before the slash on `-`, before is the first result in this page
start_result = content_range.split('/')[0].split('-')[0].strip()
# Split the part before the slash on `-`, after is the last result in this page
end_result = content_range.split('/')[0].split('-')[1].strip()

# Turn the response's JSON data into a Python list of dictionaries
data = response.json()

output = f"**Showing results {start_result} to {end_result} of {number_of_results}**\n\n"
for erfgoedobject in data:
  output += f"* {erfgoedobject['naam']} ({erfgoedobject['uri']})\n"
  output += f"\t* *Locatie:* {erfgoedobject['locatie_samenvatting']}\n"
  output += f"\t* *Korte beschrijving:* {erfgoedobject['korte_beschrijving']}\n"


IPython.display.Markdown(output)
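
The slicing and splitting of the `Content-Range` header in the cell above can also be wrapped in a small helper. What follows is only a sketch: the function name `parse_content_range` is our own invention, and it assumes the header always looks like the `items 0-9/124` example given earlier. Any other shape raises an error instead of being guessed at.

```python
import re


def parse_content_range(header_value):
    """Parse a value like 'items 0-9/124' into (start, end, total) integers.

    Hypothetical helper: it assumes the 'items <start>-<end>/<total>' shape
    described in this notebook and rejects anything else.
    """
    match = re.fullmatch(
        r"\s*items\s+(\d+)\s*-\s*(\d+)\s*/\s*(\d+)\s*", header_value
    )
    if match is None:
        raise ValueError(f"Unexpected Content-Range value: {header_value!r}")
    start, end, total = (int(group) for group in match.groups())
    return start, end, total


start_result, end_result, number_of_results = parse_content_range("items 0-9/124")
print(f"Showing results {start_result} to {end_result} of {number_of_results}")
```

Returning integers rather than strings makes it easier to do arithmetic later on, for instance to work out how many pages a collection has.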

# Filtering the collection

In our first example, we requested the entire collection, more than 90,000 resources. As can be imagined, this is generally not what we want, and accessing the entire collection would be very time consuming. Quite often, we're only interested in a subset of the collection. We might only care about Erfgoedobjecten in a certain gemeente (municipality) or provincie (province), or Erfgoedobjecten of a certain type.

To filter a collection, we pass it parameters. These are added to the URL after a question mark `?` and look like `<parameter_name>=<parameter_value>`. When there is more than one parameter, they are separated by an ampersand `&`. So, if we pass two parameters, we get `?parameter_one=value_one&parameter_two=value_two`.

To know which parameters the server accepts, we consult the server's API documentation. Generally it can be accessed at `/api_doc`, relative to the server root, e.g. https://inventaris.onroerenderfgoed.be/api_doc or https://beeldbank.onroerenderfgoed.be/api_doc

For our second example, we'll limit the results to Erfgoedobjecten that are built heritage (`discipline=2`), that have some form of bescherming (`rechtsgevolgen=beschermd`) and are situated in Ghent (`gemeente=44021`).
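
To make the `?` and `&` syntax concrete, here is a minimal sketch that builds the URL for these three filters by hand with `urlencode` from Python's standard library. It is only an illustration of what the query string looks like. When we pass a `params` dictionary to Requests, it builds an equivalent query string for us, so we normally never write this code ourselves.

```python
from urllib.parse import urlencode

base_url = "https://inventaris.onroerenderfgoed.be/erfgoedobjecten"

# The three filters described above
params = {
    "gemeente": 44021,
    "discipline": 2,
    "rechtsgevolgen": "beschermd",
}

# urlencode joins the name=value pairs with `&`; we add the `?` ourselves
url = f"{base_url}?{urlencode(params)}"
print(url)
```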

In [None]:
# Import requests library to contact the REST service
import requests
# Import IPython library to produce Markdown
import IPython

# Create a requests Session
session = requests.Session()

# Set the default Accept header to `application/json`
session.headers.update({'Accept': 'application/json'})

# Make a request and store the response
response = session.get(
    'https://inventaris.onroerenderfgoed.be/erfgoedobjecten',
    params = {
        'gemeente': 44021,
        'discipline': 2,
        'rechtsgevolgen': 'beschermd'
    }
)

# Remove the first 6 characters from the header (items )
content_range = response.headers['Content-Range'][6:]
# The part after the slash is the total number of results
number_of_results = content_range.split('/')[1]
# Split the part before the slash on `-`, before is the first result in this page
start_result = content_range.split('/')[0].split('-')[0].strip()
# Split the part before the slash on `-`, after is the last result in this page
end_result = content_range.split('/')[0].split('-')[1].strip()

# Turn the response's JSON data into a Python list of dictionaries
data = response.json()

# Generate output
output = f"**Showing results {start_result} to {end_result} of {number_of_results}**\n\n"
for erfgoedobject in data:
  output += f"* {erfgoedobject['naam']} ({erfgoedobject['uri']})\n"
  output += f"\t* *Locatie:* {erfgoedobject['locatie_samenvatting']}\n"
  output += f"\t* *Korte beschrijving:* {erfgoedobject['korte_beschrijving']}\n"


IPython.display.Markdown(output)

We have reduced our more than 90,000 results to a more manageable 1,300 results. Still a lot, but closer to a number we can actually do something with. We're still only seeing the first ten results though. Suppose we want to see all 1,300 results, how do we do that?

To access the next page of results, we can send an extra parameter to the server indicating which page of results we want to see. Our first page contains results 0-9, the second page 10-19, the third 20-29 and so forth. By adding `pagina=2` to the parameters we can fetch the second page; any other page number works the same way.
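
The relation between a page number and the results it contains is simple arithmetic. The sketch below spells it out, assuming ten results per page as in the ranges listed above. The helper name `page_bounds` and its `page_size` argument are ours and not part of the service, and the last page of a collection may of course contain fewer results than a full page.

```python
def page_bounds(page, page_size=10):
    """Return the (first, last) zero-based result numbers shown on a 1-based page.

    Hypothetical helper; page_size=10 is an assumption based on the
    ranges 0-9, 10-19, 20-29 mentioned in this notebook.
    """
    if page < 1:
        raise ValueError("page numbers start at 1")
    first = (page - 1) * page_size
    last = first + page_size - 1
    return first, last


for page in (1, 2, 3, 4):
    first, last = page_bounds(page)
    print(f"pagina={page}: results {first}-{last}")
```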

In [None]:
# Import requests library to contact the REST service
import requests
# Import IPython library to produce Markdown
import IPython

# Create a requests Session
session = requests.Session()

# Set the default Accept header to `application/json`
session.headers.update({'Accept': 'application/json'})

# Make a request and store the response
response = session.get(
    'https://inventaris.onroerenderfgoed.be/erfgoedobjecten',
    params = {
        'gemeente': 44021,
        'discipline': 2,
        'rechtsgevolgen': 'beschermd',
        'pagina': 4
    }
)

# Remove the first 6 characters from the header (items )
content_range = response.headers['Content-Range'][6:]
# The part after the slash is the total number of results
number_of_results = content_range.split('/')[1]
# Split the part before the slash on `-`, before is the first result in this page
start_result = content_range.split('/')[0].split('-')[0].strip()
# Split the part before the slash on `-`, after is the last result in this page
end_result = content_range.split('/')[0].split('-')[1].strip()

# Turn the response's JSON data into a Python list of dictionaries
data = response.json()

# Generate output
output = f"**Showing results {start_result} to {end_result} of {number_of_results}**\n\n"
for erfgoedobject in data:
  output += f"* {erfgoedobject['naam']} ({erfgoedobject['uri']})\n"
  output += f"\t* *Locatie:* {erfgoedobject['locatie_samenvatting']}\n"
  output += f"\t* *Korte beschrijving:* {erfgoedobject['korte_beschrijving']}\n"


IPython.display.Markdown(output)

This is still somewhat impractical: we are now seeing a different page of results, but still not all results. It also requires us to know that we need to use the `pagina` parameter. Another service might call it `page` or `slice`. Still other services do not use a page parameter at all, but parameters like `limit` (how many results to return) and `offset` (which result to start returning from). It's hardly fair to expect the user to know what they're supposed to do. Of course they can read the documentation, but most people prefer a more obvious solution. We have adopted the HTTP `Link` header for this purpose.

The `Link` header is included in the server's response and gives a list of URLs that point to other parts of the collection we're browsing. Every URL comes with a tag that tells us which page it points to: the first, the last, the next or the previous.
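
To give an idea of what such a header contains, here is a simplified sketch that turns a `Link` header value into a dictionary of tag to URL. The header value and the `example.org` URLs are made up for illustration, and the regular expression only handles the plain `<url>; rel="tag"` form. In practice we do not need this code, because Requests exposes the same information as `response.links`.

```python
import re


def parse_link_header(header_value):
    """Map each rel tag (first, prev, next, last) to its URL.

    Simplified, hypothetical parser: it only understands entries of the
    form <url>; rel="tag" and ignores any other link parameters.
    """
    links = {}
    pattern = r'<([^>]+)>\s*;\s*rel="?([^",;\s]+)"?'
    for url, rel in re.findall(pattern, header_value):
        links[rel] = url
    return links


# Made-up header value, not an actual response from the Inventaris
example_header = (
    '<https://example.org/items?pagina=1>; rel="first", '
    '<https://example.org/items?pagina=3>; rel="prev", '
    '<https://example.org/items?pagina=5>; rel="next", '
    '<https://example.org/items?pagina=130>; rel="last"'
)

links = parse_link_header(example_header)
print(links["next"])
```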

In [None]:
# Import requests library to contact the REST service
import requests

# Create a requests Session
session = requests.Session()

# Set the default Accept header to `application/json`
session.headers.update({'Accept': 'application/json'})

# Make a request and store the response
response = session.get(
    'https://inventaris.onroerenderfgoed.be/erfgoedobjecten',
    params = {
        'gemeente': 44021,
        'discipline': 2,
        'rechtsgevolgen': 'beschermd',
        'pagina': 4
    }
)

print(response.headers['Content-Range'])
print(response.headers['Link'])

Using these links we can quickly fetch an entire dataset. The Requests library makes things easy on us. It knows about this header and allows us to use it with minimal effort. We'll try that out. For didactic reasons, we'll fetch a smaller set, all built heritage in Knokke-Heist that has some form of "bescherming".

In [None]:
# Import requests library to contact the REST service
import requests
# Import IPython library to produce Markdown
import IPython

# Create a requests Session
session = requests.Session()

# Set the default Accept header to `application/json`
session.headers.update({'Accept': 'application/json'})

# Make a request and store the response
response = session.get(
    'https://inventaris.onroerenderfgoed.be/erfgoedobjecten',
    params = {
        'gemeente': 31043,
        'discipline': 2,
        'rechtsgevolgen': 'beschermd'
    }
)

# Remove the first 6 characters from the header (items )
content_range = response.headers['Content-Range'][6:]
# The part after the slash is the total number of results
number_of_results = content_range.split('/')[1]

# Turn the response's JSON data into a Python list of dictionaries
data = response.json()

# Fetch all the other pages of results
while 'next' in response.links:
  # GET the url pointing to the next page
  response = session.get(response.links['next']['url'])
  # add the data from the next page to the data we're collecting
  data.extend(response.json())

# Generate output
output = f"**Showing {number_of_results} results**\n\n"
for erfgoedobject in data:
  output += f"* {erfgoedobject['naam']} ({erfgoedobject['uri']})\n"
  output += f"\t* *Locatie:* {erfgoedobject['locatie_samenvatting']}\n"
  output += f"\t* *Korte beschrijving:* {erfgoedobject['korte_beschrijving']}\n"


IPython.display.Markdown(output)
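
The loop that follows the `next` links can be reused for any collection by moving it into a generator function. This is a minimal sketch and not part of the webservices themselves: the name `iter_collection` and the `max_pages` safety limit are our own additions, and `session` is assumed to behave like the `requests.Session` used throughout this notebook.

```python
def iter_collection(session, url, params=None, max_pages=None):
    """Yield every object in a paged collection by following `next` links.

    Hypothetical helper. `session` should behave like a requests.Session;
    `max_pages` is an optional limit of our own, not a server feature.
    """
    response = session.get(url, params=params)
    pages_fetched = 1
    while True:
        # Raise an exception on 4xx/5xx responses instead of parsing an error body
        response.raise_for_status()
        # Hand the objects of this page to the caller one by one
        yield from response.json()
        if 'next' not in response.links:
            break
        if max_pages is not None and pages_fetched >= max_pages:
            break
        # As in the cell above, the next URL is requested without extra params
        response = session.get(response.links['next']['url'])
        pages_fetched += 1
```

With the session and parameters from the cell above, `data = list(iter_collection(session, 'https://inventaris.onroerenderfgoed.be/erfgoedobjecten', params={'gemeente': 31043, 'discipline': 2, 'rechtsgevolgen': 'beschermd'}))` should collect the same objects. Because it is a generator, we can also stop early, for instance after the first twenty objects, without fetching the remaining pages.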