<a href="https://colab.research.google.com/github/WetSuiteLeiden/data-collection/blob/master/api_tweede_kamer_part1_first_api_and_parties.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Purpose of this notebook

Mainly to explore the two APIs at the [Tweede Kamer open data portal](https://opendata.tweedekamer.nl/),
and to see what you might use them for.

<!-- -->

A note before we get into technical details:

Neither API is a very _fast_ interface,
nor will either be efficient when your goal is to sift through the entire collection in detailed ways it was not designed for.
As with any API, each is great at the things its designers kept in mind, and less so at others.
For some purposes, instead of trying to interrogate these APIs, you might prefer to download everything and figure it out later (or, if we did that already, find the resulting dataset).

Those two interfaces are:
- an [Atom-style API called SyncFeed](https://opendata.tweedekamer.nl/documentatie/syncfeed-api), which
  - is a little easier to interact with without a whole library
  - speaks XML/Atom
- an [OData API](https://opendata.tweedekamer.nl/documentatie/odata-api), which
  - takes more up-front consideration, but can be more thorough and flexible - and there _is_ a library for it
  - speaks JSON

Both APIs broadly adhere to the same [relational data model](https://opendata.tweedekamer.nl/documentatie/informatiemodel),
that you should be thinking of when fishing data out of them.

Relational means that one piece of data says it relates to other data by just pointing to it.
Part of the below tries to explore how you might use such relations.
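
As a minimal illustration of "pointing", here is a sketch with made-up records (not real API output). The `follow` helper and the ids are hypothetical; the only thing borrowed from the data shown later in this notebook is the idea that a record carries a `refs` mapping from a name to the id of another record.

```python
# Made-up records, only to illustrate "pointing" by id; this is not real API output.
records = {
    "fractie-1": {"type": "fractie", "afkorting": "XYZ", "refs": {}},
    "zetel-1": {"type": "fractieZetel", "refs": {"fractie": "fractie-1"}},
}


def follow(record, ref_name, lookup):
    """Return the record that `record` points to under `ref_name`, or None."""
    target_id = record["refs"].get(ref_name)
    return lookup.get(target_id)


zetel = records["zetel-1"]
fractie = follow(zetel, "fractie", records)
print(fractie["afkorting"])  # prints XYZ
```

Using a relation then comes down to one dictionary lookup per reference, provided you have already fetched the thing being pointed at.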

...because we probably don't want to just fetch everything,
we want to show how to figure out how to get and use the parts you need for a specific purpose.

<!-- -->

This notebook and its companion will end up not using most of the data model's parts,
yet we will give at least _some_ introduction to the less trivial bits.

In [1]:
import os, random, collections
import json, pprint

import wetsuite.datacollect.tweedekamer_nl  # contains some basic code dealing with the syncfeed API
import wetsuite.helpers.localdata
import wetsuite.helpers.etree

#from wetsuite.helpers  import etree
#from wetsuite.helpers  import strings
#from wetsuite.helpers  import notebook

In [2]:
# for reference, these are the major entity types in that model
wetsuite.datacollect.tweedekamer_nl.resource_types  # This is a list from our library.  We could also fetch these, but they would be the same.

('Activiteit',
 'ActiviteitActor',
 'Agendapunt',
 'Besluit',
 'Commissie',
 'CommissieContactinformatie',
 'CommissieZetel',
 'CommissieZetelVastPersoon',
 'CommissieZetelVastVacature',
 'CommissieZetelVervangerPersoon',
 'CommissieZetelVervangerVacature',
 'Document',
 'DocumentActor',
 'DocumentVersie',
 'Fractie',
 'FractieAanvullendGegeven',
 'FractieZetel',
 'FractieZetelPersoon',
 'FractieZetelVacature',
 'Kamerstukdossier',
 'Persoon',
 'PersoonContactinformatie',
 'PersoonGeschenk',
 'PersoonLoopbaan',
 'PersoonNevenfunctie',
 'PersoonNevenfunctieInkomsten',
 'PersoonOnderwijs',
 'PersoonReis',
 'Reservering',
 'Stemming',
 'Vergadering',
 'Verslag',
 'Zaak',
 'ZaakActor',
 'Zaal')

## Atom/SyncFeed API

Let's take a look at the Atom/SyncFeed API first.

### "Fetch one resource type"

SyncFeed starts by interacting with a URL like:

        https://gegevensmagazijn.tweedekamer.nl/SyncFeed/2.0/Feed?category=Persoon

If you manage to load that in a browser, you will see an Atom XML structure mentioning the first bunch/page of them, and links to successive pages.

To fetch **all** resources of the mentioned soort/category, and to then perhaps make a single list of it, 
you need something that follows and fetches those 'next page' style links until there are no more. 
Code like our `wetsuite.datacollect.tweedekamer_nl.fetch_all` does that.
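
The "follow the next-page links" idea can be sketched roughly as follows. This is not the code inside `fetch_all`; `fetch_pages` and `fetch_bytes` are hypothetical names, and the sketch assumes that every page except the last carries a feed-level `<link rel="next">`. The two pages in the demo are made up and held in memory, so nothing is fetched over the network.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # the standard Atom XML namespace


def fetch_pages(first_url, fetch_bytes):
    """Yield one parsed feed per page, following feed-level rel="next" links.

    fetch_bytes is any callable that takes a URL and returns the page as bytes.
    """
    url, seen = first_url, set()
    while url is not None and url not in seen:  # 'seen' guards against link loops
        seen.add(url)
        feed = ET.fromstring(fetch_bytes(url))
        yield feed
        url = None
        for link in feed.findall(ATOM + "link"):
            if link.get("rel") == "next":
                url = link.get("href")
                break


# Made-up two-page feed, kept in memory for the demonstration
PAGES = {
    "page-1": b'<feed xmlns="http://www.w3.org/2005/Atom">'
              b'<link rel="next" href="page-2"/><entry><title>a</title></entry></feed>',
    "page-2": b'<feed xmlns="http://www.w3.org/2005/Atom">'
              b'<entry><title>b</title></entry></feed>',
}

pages = list(fetch_pages("page-1", PAGES.__getitem__))
titles = [entry.findtext(ATOM + "title")
          for page in pages
          for entry in page.findall(ATOM + "entry")]
print(titles)  # prints ['a', 'b']
```

For real use, `fetch_bytes` could be something like `lambda url: urllib.request.urlopen(url).read()`, though in practice you would probably just call `fetch_all`.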

That said, due to the relational nature of the data (items point at related objects, and you often want to fetch a related constellation of them),
it quickly becomes interesting to interrogate the data with **more foresight than "just give me everything of a single thing"** - we'll get to that soon.

Uses vary.
- Below, we'll end up extracting who is a member of what party,
  which is a somewhat manual piecing together of 'Persoon', 'Fractie', 'FractieZetel', and 'FractieZetelPersoon'.

- If you want a record of what gets done day to day, you might care about 'Vergadering', 'Verslag', and 'Stemming'.

- If you are more interested in documentation, then 'Document' and 'Kamerstukdossier'.

- The 'Zaak' type lies somewhere in between.

- Kamerstukdossiers are interesting for their contents, relating/organizing Zaken, Documents, and more.
  Their relations are a little more complex than they seem at first, which is why it's not our first example.

For a list of types with some explanation of their relations, see [the documentation](https://opendata.tweedekamer.nl/documentatie/).

#### Small example: Zaal

This doesn't do much by itself. 

You would probably only do this when you want to know what Activiteiten happened where.

This just serves as an example of basic interaction that doesn't have spammy output.

In [3]:
zaal_etrees = wetsuite.datacollect.tweedekamer_nl.fetch_all( 'Zaal' ) 
# The helper functions consider that this comes from separate fetches,
#   and that in some cases you may not want to fetch all.
# (consider that e.g. all Stemming, Zaak, Document total to hundreds of megabytes)

# So fetch_all() specifically _does_ return all, as a list of etree objects, one for each page of results. 
# Chances are we want to see that as one big list...
zaal_tree = wetsuite.datacollect.tweedekamer_nl.merge_etrees( zaal_etrees )

In [4]:
# ...whether you want to write it out into a file...
with open('Zaal.xml', 'wb') as zaal_file:
    zaal_file.write( wetsuite.helpers.etree.tostring( zaal_tree ) )

In [5]:
# ...or consume it in code. For which you probably want to use the following helpers to see that XML as python dicts...
entry_dicts = wetsuite.datacollect.tweedekamer_nl.entry_dicts( zaal_tree )

pprint.pprint( entry_dicts[:2] ) # show a few

# and for reference, the XML that came from:
#print( wetsuite.helpers.etree.debug_pretty( zaal_tree ) )

[{'category': 'zaal',
  'content': {'bijgewerkt': '2019-08-12T12:56:35.4070000',
              'id': '6e7dfdae-583a-4191-8818-a89a538c469f',
              'naam': 'Z7 - Statenpassage - Petitie',
              'refs': {},
              'sysCode': '154',
              'tagname': 'zaal',
              'verwijderd': 'false'},
  'id': 'https://gegevensmagazijn.tweedekamer.nl/SyncFeed/2.0/Entiteiten/6e7dfdae-583a-4191-8818-a89a538c469f',
  'title': '6e7dfdae-583a-4191-8818-a89a538c469f',
  'updated': '2019-08-15T14:45:52Z'},
 {'category': 'zaal',
  'content': {'bijgewerkt': '2019-08-12T12:56:34.8600000',
              'id': 'f207b9d5-434e-4cdc-aa1b-7e5a55bc1791',
              'naam': 'Eerste Kamer',
              'refs': {},
              'sysCode': '101',
              'tagname': 'zaal',
              'verwijderd': 'false'},
  'id': 'https://gegevensmagazijn.tweedekamer.nl/SyncFeed/2.0/Entiteiten/f207b9d5-434e-4cdc-aa1b-7e5a55bc1791',
  'title': 'f207b9d5-434e-4cdc-aa1b-7e5a55bc1791',
  'u

In [6]:
# ...where the dicts are more directly useful than the etree. An example of picking things from such dicts:
for detail_dict in wetsuite.datacollect.tweedekamer_nl.entry_dicts( zaal_tree ):
    print(f"{detail_dict['content']['naam']:40s}    {detail_dict['updated']}")

Z7 - Statenpassage - Petitie                2019-08-15T14:45:52Z
Eerste Kamer                                2019-08-15T14:45:52Z
Z6 - Plein 2 - Petitie                      2019-08-15T14:45:52Z
Regentenkamer                               2019-08-15T14:45:51Z
Schrijfkamer                                2019-08-15T14:45:51Z
Statenlokaal                                2019-08-15T14:45:51Z
Van Mierlozaal                              2019-08-15T14:45:52Z
Z5 - Schriftelijke Inbreng                  2019-08-15T14:45:52Z
van Someren-Downerzaal                      2019-08-15T14:45:51Z
Rooksalon                                   2019-08-15T14:45:51Z
Fortuynzaal                                 2019-08-15T14:45:52Z
Koffiekamer                                 2019-08-15T14:45:52Z
Extern                                      2019-08-15T14:45:53Z
Z5 - (Nog) geen zaal beschikbaar            2019-08-15T14:45:53Z
Evenementruimte 1                           2019-08-15T14:45:53Z
Koffiekamer              

### What about documents?

The above does not mention the documents that are part of dossiers.

So what do documents mention?

In [7]:
# break_actually stops after the first page, because the full set is 700+ MByte and would easily take ten minutes to fetch
# Note: you don't really need to run this, this is mainly here for a point we are about to make below
firstpage = wetsuite.datacollect.tweedekamer_nl.fetch_all( 'Document', break_actually=True )[0]

#print( wetsuite.helpers.etree.debug_color( firstpage.find('entry') ) ) # first entry, for reference.
doc_dicts = wetsuite.datacollect.tweedekamer_nl.entry_dicts( firstpage )
pprint.pprint( doc_dicts[-2:] )

[{'category': 'document',
  'content': {'bijgewerkt': '2019-07-03T15:32:12.6700000',
              'id': 'd1bd5ec5-72fb-4702-8115-b6d24b552cdb',
              'refs': {},
              'tagname': 'document',
              'verwijderd': 'true'},
  'id': 'https://gegevensmagazijn.tweedekamer.nl/SyncFeed/2.0/Entiteiten/d1bd5ec5-72fb-4702-8115-b6d24b552cdb',
  'title': 'd1bd5ec5-72fb-4702-8115-b6d24b552cdb',
  'updated': '2019-07-03T13:33:10Z'},
 {'category': 'document',
  'content': {'bijgewerkt': '2019-07-03T15:33:02.7970000',
              'id': '3f75d7c1-379e-4241-9f82-539d244887ff',
              'refs': {},
              'tagname': 'document',
              'verwijderd': 'true'},
  'id': 'https://gegevensmagazijn.tweedekamer.nl/SyncFeed/2.0/Entiteiten/3f75d7c1-379e-4241-9f82-539d244887ff',
  'title': '3f75d7c1-379e-4241-9f82-539d244887ff',
  'updated': '2019-07-03T13:33:10Z'}]


...that turns out to contain nothing very useful _about_ the actual document (that we could fetch based on this).
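
Note that both entries shown above are marked `'verwijderd': 'true'` (deleted), which may be part of why they carry no further fields. A small sketch of separating such entries from the rest follows. `split_deleted` is a hypothetical helper, and the example entries are made up, shaped like the output above.

```python
def split_deleted(entry_dicts):
    """Split entry dicts into (live, deleted) lists based on content['verwijderd']."""
    live, deleted = [], []
    for entry in entry_dicts:
        if entry["content"].get("verwijderd") == "true":
            deleted.append(entry)
        else:
            live.append(entry)
    return live, deleted


# Made-up, minimal entries in the same shape as the output above
example_entries = [
    {"category": "document",
     "content": {"id": "doc-a", "refs": {}, "verwijderd": "true"}},
    {"category": "document",
     "content": {"id": "doc-b", "refs": {}, "verwijderd": "false",
                 "example_field": "example value"}},
]

live, deleted = split_deleted(example_entries)
print(len(live), len(deleted))  # prints 1 1
```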

Presumably there is more useful metadata when something else points at them, e.g. when a kamerstukdossier refers to its documents.

...kamerstukdossiers however turn out to be a little more complex (see part 2).
First, we do something that requires less investigation:

### What about parties?

Our first real quest is "make a list of who is a member of what party".

This information is spread over four different resource types: `Persoon`, `FractieZetelPersoon`, `FractieZetel`, and `Fractie`.

You can see FractieZetelPersoon as something like a [many-to-many junction table](https://en.wikipedia.org/wiki/Many-to-many_(data_model)):
- a reference to a FractieZetel (which itself happens to reference just a Fractie)
- a reference to a Persoon

Persoon and Fractie have no references of their own, so they just carry the details of each item.

The code below
- does the relational joins required to find party memberships,
- then simplifies the results somewhat by adding those memberships to the persoon details

In [3]:
# Fetch all resources of each mentioned soort
# The combination should take no longer than a minute or two.
print("Fetching Persoon")
data_Persoon            = wetsuite.datacollect.tweedekamer_nl.merge_etrees( wetsuite.datacollect.tweedekamer_nl.fetch_all( 'Persoon' ) )
print("Fetching Fractie")
data_Fractie            = wetsuite.datacollect.tweedekamer_nl.merge_etrees( wetsuite.datacollect.tweedekamer_nl.fetch_all( 'Fractie' ) )
print("Fetching FractieZetel")
data_FractieZetel       = wetsuite.datacollect.tweedekamer_nl.merge_etrees( wetsuite.datacollect.tweedekamer_nl.fetch_all( 'FractieZetel' ) )
print("Fetching FractieZetelPersoon")
data_FractieZetelPesoon = wetsuite.datacollect.tweedekamer_nl.merge_etrees( wetsuite.datacollect.tweedekamer_nl.fetch_all( 'FractieZetelPersoon' ) )

Fetching Persoon
Fetching Fractie
Fetching FractieZetel
Fetching FractieZetelPersoon


In [10]:
# For some insight/reference, one FractieZetelPersoon entry in its XML form
print( wetsuite.helpers.etree.debug_pretty(data_FractieZetelPesoon.find('entry')) )

<entry>
  <title>808fcd50-a0dc-4f60-8b9d-c404a2eb5b2e</title>
  <id>https://gegevensmagazijn.tweedekamer.nl/SyncFeed/2.0/Entiteiten/808fcd50-a0dc-4f60-8b9d-c404a2eb5b2e</id>
  <author>
    <name>Tweede Kamer der Staten-Generaal</name>
  </author>
  <updated>2023-08-29T13:23:13Z</updated>
  <category term="fractieZetelPersoon"/>
  <link rel="next" href="https://gegevensmagazijn.tweedekamer.nl/SyncFeed/2.0/Feed?category=FractieZetelPersoon&amp;skiptoken=16687327"/>
  <content type="application/xml">
    <fractieZetelPersoon id="808fcd50-a0dc-4f60-8b9d-c404a2eb5b2e" bijgewerkt="2023-08-29T11:10:24Z" verwijderd="false">
      <fractieZetel ref="ca826e72-cf57-4cca-b090-d5c444ec6c2d"/>
      <persoon ref="ec273841-069f-408b-b434-8524904ae314"/>
      <functie>Lid</functie>
      <van>2002-05-23</van>
      <totEnMet>2010-10-11</totEnMet>
    </fractieZetelPersoon>
  </content>
</entry>



In [12]:
# and the way we flatten that into python dicts:
display( wetsuite.datacollect.tweedekamer_nl.entry_dicts( data_FractieZetelPesoon ) [0] )

{'title': '808fcd50-a0dc-4f60-8b9d-c404a2eb5b2e',
 'updated': '2023-08-29T13:23:13Z',
 'category': 'fractieZetelPersoon',
 'content': {'refs': {'fractieZetel': 'ca826e72-cf57-4cca-b090-d5c444ec6c2d',
   'persoon': 'ec273841-069f-408b-b434-8524904ae314'},
  'tagname': 'fractieZetelPersoon',
  'id': '808fcd50-a0dc-4f60-8b9d-c404a2eb5b2e',
  'bijgewerkt': '2023-08-29T11:10:24Z',
  'verwijderd': 'false',
  'functie': 'Lid',
  'van': '2002-05-23',
  'totEnMet': '2010-10-11'},
 'id': 'https://gegevensmagazijn.tweedekamer.nl/SyncFeed/2.0/Entiteiten/808fcd50-a0dc-4f60-8b9d-c404a2eb5b2e'}
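
A rough idea of how such flattening could work, using only the standard library. This is an illustrative sketch, not the implementation of `entry_dicts`; `flatten_content` is a hypothetical name, and any XML namespaces in the real feed are ignored here. The XML is the content element from the entry printed above.

```python
import xml.etree.ElementTree as ET

xml_text = """
<fractieZetelPersoon id="808fcd50-a0dc-4f60-8b9d-c404a2eb5b2e" bijgewerkt="2023-08-29T11:10:24Z" verwijderd="false">
  <fractieZetel ref="ca826e72-cf57-4cca-b090-d5c444ec6c2d"/>
  <persoon ref="ec273841-069f-408b-b434-8524904ae314"/>
  <functie>Lid</functie>
  <van>2002-05-23</van>
  <totEnMet>2010-10-11</totEnMet>
</fractieZetelPersoon>
"""


def flatten_content(element):
    """Turn one content element into a flat dict, collecting references under 'refs'."""
    flat = {"tagname": element.tag, "refs": {}}
    flat.update(element.attrib)              # id, bijgewerkt, verwijderd
    for child in element:
        if "ref" in child.attrib:            # a pointer to another resource
            flat["refs"][child.tag] = child.get("ref")
        else:                                # a plain value
            flat[child.tag] = child.text
    return flat


content = flatten_content(ET.fromstring(xml_text.strip()))
print(content["refs"])
print(content["functie"], content["van"], content["totEnMet"])
```

Keeping the references in their own `refs` dict makes it easy to tell "this points at something else" apart from "this is a value".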

In [4]:
# Since we just fetched completely separate things,
# we need to _ourselves_ do what amounts to a manual JOIN of relational data

# It happens we can make the next bit of code a little more bite-sized by reshaping some data to assist it,
# in particular by making it easier to fetch varied individual items by id (and by type).
# Why we do this should become clearer in the next code block (...if you care).
id_thing = {}   #               guid -> detailsdict
by_type  = {}   #     soort/category -> list of all such detailsdicts

for etree in (data_Persoon, data_Fractie, data_FractieZetel, data_FractieZetelPesoon):
    for entry_dict in wetsuite.datacollect.tweedekamer_nl.entry_dicts( etree ):
        id_thing[ entry_dict['content']['id'] ] = entry_dict

        category = entry_dict['category']
        if category not in by_type:
            by_type[category] = []
        by_type[ category ].append( entry_dict )
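
As a side note, the "create the list if it is not there yet" step can also be left to `collections.defaultdict`. A minimal sketch of the same two indexes, using made-up entries in place of the fetched data:

```python
import collections

# Made-up entries with just the two fields the indexing needs
entries = [
    {"category": "fractie", "content": {"id": "f1"}},
    {"category": "persoon", "content": {"id": "p1"}},
    {"category": "persoon", "content": {"id": "p2"}},
]

id_thing = {}                               # guid -> details dict
by_type = collections.defaultdict(list)     # category -> list of details dicts

for entry in entries:
    id_thing[entry["content"]["id"]] = entry
    by_type[entry["category"]].append(entry)  # a missing key starts out as []

print({category: len(items) for category, items in by_type.items()})
# prints {'fractie': 1, 'persoon': 2}
```

One thing to keep in mind is that merely looking up a missing category in a `defaultdict` creates an empty list for it, instead of raising `KeyError`.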

In [5]:
persoon_combined = {}  # which is _mostly_ still the persoon dicts, except we are adding a key that is the membership

# That join-y code could look like:

for fzp_dict in by_type['fractieZetelPersoon']: # go through that thing that points at fractieZetel and Persoon

    # fetch the details of the FractieZetel and Persoon it's referring to
    if 'fractieZetel' not in fzp_dict['content']['refs']: # TODO: confirm why these cases exist; the ones printed in the output are marked verwijderd='true' (deleted)
        print("  FractieZetelPersoon item without FractieZetel - huh? %r"%fzp_dict)
        continue
    fractiezetel_id = fzp_dict['content']['refs']['fractieZetel']
    persoon_id      = fzp_dict['content']['refs']['persoon']

    fractiezetel_dict = id_thing[ fractiezetel_id ]
    persoon_dict      = id_thing[ persoon_id ]

    # similarly, fetch the Fractie from the FractieZetel
    fractie_id = fractiezetel_dict['content']['refs']['fractie']
    frac_dict = id_thing[ fractie_id ]


    # now we have all the information we want - to start adding things to the Persoon details, as mentioned
    if persoon_id not in persoon_combined:
        persoon_combined[ persoon_id ] = persoon_dict
        persoon_combined[ persoon_id ]['fractie_membership'] = []
        
    persoon_combined[ persoon_id ]['fractie_membership'].append( 
        {
            'fractie_id':frac_dict['content']['id'], 
            'fractie_afkorting':frac_dict['content']['afkorting'], 
            # maybe more or all of frac_dict?
            'functie':fzp_dict['content']['functie'], 
            'van':fzp_dict['content']['van'], 
            'totEnMet':fzp_dict['content']['totEnMet'],
        } 
    )

  FractieZetelPersoon item without FractieZetel - huh? {'title': 'd73d7f69-1235-4746-aa94-84b593909bfc', 'updated': '2023-08-29T14:00:09Z', 'category': 'fractieZetelPersoon', 'content': {'refs': {}, 'tagname': 'fractieZetelPersoon', 'id': 'd73d7f69-1235-4746-aa94-84b593909bfc', 'bijgewerkt': '2023-08-29T11:10:32Z', 'verwijderd': 'true'}, 'id': 'https://gegevensmagazijn.tweedekamer.nl/SyncFeed/2.0/Entiteiten/d73d7f69-1235-4746-aa94-84b593909bfc'}
  FractieZetelPersoon item without FractieZetel - huh? {'title': '8cca26af-365a-46fc-b72f-c42b2a17a992', 'updated': '2023-12-06T13:48:37Z', 'category': 'fractieZetelPersoon', 'content': {'refs': {}, 'tagname': 'fractieZetelPersoon', 'id': '8cca26af-365a-46fc-b72f-c42b2a17a992', 'bijgewerkt': '2023-12-05T14:58:31Z', 'verwijderd': 'true'}, 'id': 'https://gegevensmagazijn.tweedekamer.nl/SyncFeed/2.0/Entiteiten/8cca26af-365a-46fc-b72f-c42b2a17a992'}
  FractieZetelPersoon item without FractieZetel - huh? {'title': '6d685347-a5c8-498b-94df-b7bbb1bc12

In [6]:
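fracs = {} # guid -> details, keyed like id_thing and persoon_combined, for consistency and easier lookup
for fracs_dict in by_type['fractie']:
    fracs[ fracs_dict['content']['id'] ] = fracs_dict  # the content id is the guid that 'fractie_id' in the memberships refers to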

As an indication of what we have just constructed:

In [28]:
random.choice( list(fracs.values()) )

{'title': 'd3b4d880-ef37-4ce6-99ec-4940266ac466',
 'updated': '2023-12-06T13:48:14Z',
 'category': 'fractie',
 'content': {'refs': {},
  'tagname': 'fractie',
  'id': 'd3b4d880-ef37-4ce6-99ec-4940266ac466',
  'bijgewerkt': '2023-12-05T14:36:07Z',
  'verwijderd': 'false',
  'contentType': 'image/jpeg',
  'contentLength': '11374',
  'nummer': '2764',
  'afkorting': 'PvdD',
  'naamNl': 'Partij voor de Dieren',
  'naamEn': 'Party for the Animals',
  'aantalZetels': '3',
  'aantalStemmen': '235148',
  'datumActief': '2006-11-30',
  'datumInactief': None},
 'id': 'https://gegevensmagazijn.tweedekamer.nl/SyncFeed/2.0/Entiteiten/d3b4d880-ef37-4ce6-99ec-4940266ac466'}

In [31]:
random.choice( list(persoon_combined.values()) )

{'title': 'c7822b58-103f-4612-87ef-648be97192c6',
 'updated': '2024-03-11T16:44:09Z',
 'category': 'persoon',
 'content': {'refs': {},
  'tagname': 'persoon',
  'id': 'c7822b58-103f-4612-87ef-648be97192c6',
  'bijgewerkt': '2024-03-11T16:43:16Z',
  'verwijderd': 'false',
  'contentType': 'image/jpeg',
  'contentLength': '666768',
  'nummer': '4967',
  'titels': None,
  'initialen': 'E.M.',
  'tussenvoegsel': None,
  'achternaam': 'Westerveld',
  'voornamen': 'Elisabeth Marij',
  'roepnaam': 'Lisa',
  'geslacht': 'vrouw',
  'functie': 'Tweede Kamerlid',
  'geboortedatum': '1981-11-16',
  'geboorteplaats': 'Aalten',
  'geboorteland': 'Nederland',
  'overlijdensdatum': None,
  'overlijdensplaats': None,
  'woonplaats': 'Nijmegen',
  'land': 'NL',
  'fractielabel': None},
 'id': 'https://gegevensmagazijn.tweedekamer.nl/SyncFeed/2.0/Entiteiten/c7822b58-103f-4612-87ef-648be97192c6',
 'fractie_membership': [{'fractie_id': '8fd1a907-0355-4d27-8dc1-fd5a531b471e',
   'fractie_afkorting': 'GL',
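
To give an idea of how the combined structure might be used, here is a small sketch that lists the fracties a person currently belongs to. `current_fracties` is a hypothetical helper, the example person is made up, and it assumes that an ongoing membership has `totEnMet` set to `None`.

```python
def current_fracties(persoon):
    """Return the abbreviations of memberships that have no end date."""
    return [membership["fractie_afkorting"]
            for membership in persoon.get("fractie_membership", [])
            if membership["totEnMet"] is None]


# Made-up person, shaped like the values in persoon_combined
persoon_combined_example = {
    "person-1": {
        "content": {"roepnaam": "Ann", "achternaam": "Example"},
        "fractie_membership": [
            {"fractie_afkorting": "AAA", "functie": "Lid",
             "van": "2010-01-01", "totEnMet": "2012-01-01"},
            {"fractie_afkorting": "BBB", "functie": "Lid",
             "van": "2012-01-02", "totEnMet": None},
        ],
    },
}

for person_id, persoon in persoon_combined_example.items():
    name = f"{persoon['content']['roepnaam']} {persoon['content']['achternaam']}"
    print(name, current_fracties(persoon))  # prints Ann Example ['BBB']
```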
 

In [None]:
print("WRITING as JSON (tweedekamer-fracties-struc.json, tweedekamer-fractie-membership-struc.json)")

with open('tweedekamer-fracties-struc.json','wb') as jsonfile:
    jsonfile.write( json.dumps({ 
        'description_short':''' Description of political parties/fracties. ''',
        'description':      ''' Description of political parties/fracties.

    Items look something like: 
        {'title': 'ae48391e-ce4d-47e0-86e3-ee310282f66f',
        'updated': '2023-12-06T13:48:37Z',
        'category': 'fractie',
        'nummer': '50311',
        'afkorting': 'Volt',
        'naamNl': 'Volt',
        'naamEn': 'Volt',
        'aantalZetels': '2',
        'aantalStemmen': '178802',
        'datumActief': '2021-03-31',
        'datumInactief': None,
        'id': 'ae48391e-ce4d-47e0-86e3-ee310282f66f'}
        ''',
        'data':fracs} ).encode('ascii')
    )

with open('tweedekamer-fractie-membership-struc.json','wb') as jsonfile:
    jsonfile.write( json.dumps({ 
        'description_short':'''Description of people, including party memberships over time.''',
        'description':      '''Description of people, including party memberships over time (each with fractie_afkorting, functie, van, totEnMet) ''',
        'data':persoon_combined} ).encode('ascii')
    )
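
Reading such a file back is the mirror image of writing it. A minimal round-trip sketch, using a made-up payload and a temporary file instead of the real data:

```python
import json
import os
import tempfile

# Made-up payload with the same top-level layout as the files written above
payload = {
    "description_short": "Example store",
    "data": {"some-id": {"afkorting": "XYZ", "datumInactief": None}},
}

path = os.path.join(tempfile.mkdtemp(), "example-struc.json")

with open(path, "w", encoding="ascii") as jsonfile:
    json.dump(payload, jsonfile)   # non-ASCII characters are escaped by default

with open(path, "r", encoding="ascii") as jsonfile:
    loaded = json.load(jsonfile)

print(loaded["data"]["some-id"])   # prints {'afkorting': 'XYZ', 'datumInactief': None}
```

Python's `None` is stored as JSON `null` and comes back as `None`, so fields like `datumInactief` survive the round trip.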

In [8]:

print("WRITING as dataset store (tweedekamer-fracties-struc.db, tweedekamer-fractie-membership-struc.db)")

with wetsuite.helpers.localdata.MsgpackKV('tweedekamer-fracties-struc.db') as fracties_db:
    fracties_db._put_meta('description_short', '''Description of political parties/fracties.''')
    fracties_db._put_meta('description',       '''Description of political parties/fracties.

    Items look something like: 
        {'title': 'ae48391e-ce4d-47e0-86e3-ee310282f66f',
        'updated': '2023-12-06T13:48:37Z',
        'category': 'fractie',
        'nummer': '50311',
        'afkorting': 'Volt',
        'naamNl': 'Volt',
        'naamEn': 'Volt',
        'aantalZetels': '2',
        'aantalStemmen': '178802',
        'datumActief': '2021-03-31',
        'datumInactief': None,
        'id': 'ae48391e-ce4d-47e0-86e3-ee310282f66f'}
        ''')
    for k, v in fracs.items():
        fracties_db.put( k, v) 

with wetsuite.helpers.localdata.MsgpackKV('tweedekamer-fractie-membership-struc.db') as membership_db:
    membership_db._put_meta('description_short', '''Description of people, including party memberships over time.''')
    membership_db._put_meta('description',       '''Description of people, including party memberships over time (each with fractie_afkorting, functie, van, totEnMet)''')

    for k, v in persoon_combined.items():
        membership_db.put( k, v) 

WRITING as dataset store (tweedekamer-fracties-struc.db, tweedekamer-fractie-membership-struc.db)
