<a href="https://colab.research.google.com/github/knobs-dials/wetsuite-datacollect/blob/main/tweede_kamer_apis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Purpose of this notebook

Mainly to explore the two APIs at the [Tweede Kamer open data portal](https://opendata.tweedekamer.nl/),
and to see what you might use them for.

<!-- -->

A note before we get into technical details:

Neither API is a very _fast_ interface,
nor will either be efficient when your goal is sifting through the entire collection in detailed ways it was not designed for.
(They also sometimes seem to refuse connections.)

For some purposes, you might prefer to download everything and figure it out later (or, if we did that already, find the resulting dataset).

The two interfaces are:
- an [Atom-style API called SyncFeed](https://opendata.tweedekamer.nl/documentatie/syncfeed-api), which
  - is a little easier to interact with without a dedicated library
  - speaks XML (Atom)
- an [OData API](https://opendata.tweedekamer.nl/documentatie/odata-api), which
  - takes a little more up-front preparation, but can be more thorough and flexible
  - speaks JSON

Both APIs should follow the same [relational data model](https://opendata.tweedekamer.nl/documentatie/informatiemodel), which is worth keeping in mind throughout.

Relational means that records refer to other records, so much of what follows explores
what kinds of interesting things are in there in the first place.

...because we probably don't want to just fetch everything;
we want to show how to figure out which parts you need for a specific purpose, and how to fetch and use them.

<!-- -->

This notebook and its companion will end up ignoring most of those parts,
yet we will give at least _some_ introduction to the less trivial bits.

In [1]:
import random
import collections
import json
import pprint
import os 

import wetsuite.datacollect.tweedekamer_nl  # contains some basic code dealing with the syncfeed API
import wetsuite.helpers.etree

#from wetsuite.helpers  import etree
#from wetsuite.helpers  import strings
#from wetsuite.helpers  import notebook

In [2]:
# for reference, these are the major entity types in that model
wetsuite.datacollect.tweedekamer_nl.resource_types  # This is a list from our library.  We could also fetch these, but they would be the same.

('Activiteit',
 'ActiviteitActor',
 'Agendapunt',
 'Besluit',
 'Commissie',
 'CommissieContactinformatie',
 'CommissieZetel',
 'CommissieZetelVastPersoon',
 'CommissieZetelVastVacature',
 'CommissieZetelVervangerPersoon',
 'CommissieZetelVervangerVacature',
 'Document',
 'DocumentActor',
 'DocumentVersie',
 'Fractie',
 'FractieAanvullendGegeven',
 'FractieZetel',
 'FractieZetelPersoon',
 'FractieZetelVacature',
 'Kamerstukdossier',
 'Persoon',
 'PersoonContactinformatie',
 'PersoonGeschenk',
 'PersoonLoopbaan',
 'PersoonNevenfunctie',
 'PersoonNevenfunctieInkomsten',
 'PersoonOnderwijs',
 'PersoonReis',
 'Reservering',
 'Stemming',
 'Vergadering',
 'Verslag',
 'Zaak',
 'ZaakActor',
 'Zaal')

## Atom/SyncFeed API

Let's take a look at the Atom/SyncFeed API first.

### "Fetch one resource type"

Interaction with SyncFeed starts with a URL like:

        https://gegevensmagazijn.tweedekamer.nl/SyncFeed/2.0/Feed?category=Persoon

If you manage to load that in a browser, you will see an Atom XML structure mentioning the first bunch/page of them, and links to successive pages.

To fetch all resources of the mentioned soort/category, and to then perhaps make a single list of it, 
you need something that follows and fetches those 'next page' style links. Code like our `wetsuite.datacollect.tweedekamer_nl.fetch_all` does that.
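As a rough illustration of what following those links involves, here is a minimal standard-library sketch. It is not `fetch_all` itself, and `fetch_all_pages` and `http_get` are hypothetical names. It assumes the feed uses the standard Atom namespace and places its 'next page' link as a `<link rel="next">` directly under the `<feed>` root; check that against the real feed before relying on it.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: SyncFeed uses the standard Atom namespace for <feed>, <entry> and <link>
ATOM_NS = "http://www.w3.org/2005/Atom"


def http_get(url):
    """Fetch a URL and return the raw response bytes."""
    with urllib.request.urlopen(url, timeout=60) as response:
        return response.read()


def fetch_all_pages(first_url, get_bytes=http_get, max_pages=None):
    """Fetch a feed page, then keep following its rel="next" link until there is none.

    Returns a list of parsed root elements, one per page.
    """
    pages = []
    url = first_url
    while url is not None:
        root = ET.fromstring(get_bytes(url))
        pages.append(root)
        if max_pages is not None and len(pages) >= max_pages:
            break
        url = None
        # only <link> elements directly under <feed>, not those inside an <entry>
        for link in root.findall(f"{{{ATOM_NS}}}link"):
            if link.get("rel") == "next":
                url = link.get("href")
                break
    return pages
```

Usage would be something like `fetch_all_pages("https://gegevensmagazijn.tweedekamer.nl/SyncFeed/2.0/Feed?category=Zaal")`. Taking `get_bytes` as a parameter keeps the paging logic separate from the HTTP part, so you could plug in caching or retries (given the occasional refused connections) without touching the loop.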

That said, due to the relational nature of the data (pointing at related objects, and the fact that you often want a related constellation of them),
it quickly becomes interesting to interrogate the data with **more foresight than "just give me everything"**; we'll get to that later.

Uses vary:
- Below, we'll end up extracting who is a member of what party,
  which is a somewhat manual piecing together of 'Persoon', 'Fractie', 'FractieZetel', and 'FractieZetelPersoon'.

- If you wanted a record of what gets done day to day, you might care about 'Vergadering', 'Verslag', and 'Stemming'.

- If you are more interested in documentation, then 'Document' and 'Kamerstukdossier'.

- The 'Zaak' type lies somewhere in between.

- Kamerstukdossiers are interesting for their contents, relating/organizing Zaken, Documents, and more.
  Their relations are a little more complex than they seem at first, which is why they are not our first example.

For a list of types with some explanation of their relations, see [the documentation](https://opendata.tweedekamer.nl/documentatie/).

#### Smaller example: Zaal

This doesn't do much by itself.

You would probably only look at it when you want to know which Activiteiten happened where.

It just serves as an example of basic interaction that doesn't produce overwhelming output.

In [3]:
zaal_etrees = wetsuite.datacollect.tweedekamer_nl.fetch_all( 'Zaal' ) 
# The helper functions assume the results come from separate page fetches,
#   and that in some cases you may not want to fetch everything
#   (consider that e.g. all Stemming, Zaak, and Document entries together amount to hundreds of megabytes).

# fetch_all() specifically _does_ return everything, as a list of etree objects, one for each page of results.
# Chances are we want to see that as one big list...
zaal_tree = wetsuite.datacollect.tweedekamer_nl.merge_etrees( zaal_etrees )

In [4]:
# ...whether you want to write it out to a file...
with open('Zaal.xml', 'wb') as zaal_file:
    zaal_file.write( wetsuite.helpers.etree.tostring( zaal_tree ) )

In [5]:
# ...or consume it in code. For which you probably want to use the following helpers to see that XML as python dicts...
entry_dicts = wetsuite.datacollect.tweedekamer_nl.entry_dicts( zaal_tree )

pprint.pprint( entry_dicts[:2] ) # show a few

# and for reference, the XML that came from:
#print( wetsuite.helpers.etree.debug_pretty( zaal_tree ) )

[{'category': 'zaal',
  'content': {'bijgewerkt': '2019-08-12T12:56:35.4070000',
              'id': '6e7dfdae-583a-4191-8818-a89a538c469f',
              'naam': 'Z7 - Statenpassage - Petitie',
              'refs': {},
              'sysCode': '154',
              'tagname': 'zaal',
              'verwijderd': 'false'},
  'id': 'https://gegevensmagazijn.tweedekamer.nl/SyncFeed/2.0/Entiteiten/6e7dfdae-583a-4191-8818-a89a538c469f',
  'title': '6e7dfdae-583a-4191-8818-a89a538c469f',
  'updated': '2019-08-15T14:45:52Z'},
 {'category': 'zaal',
  'content': {'bijgewerkt': '2019-08-12T12:56:34.8600000',
              'id': 'f207b9d5-434e-4cdc-aa1b-7e5a55bc1791',
              'naam': 'Eerste Kamer',
              'refs': {},
              'sysCode': '101',
              'tagname': 'zaal',
              'verwijderd': 'false'},
  'id': 'https://gegevensmagazijn.tweedekamer.nl/SyncFeed/2.0/Entiteiten/f207b9d5-434e-4cdc-aa1b-7e5a55bc1791',
  'title': 'f207b9d5-434e-4cdc-aa1b-7e5a55bc1791',
  'u

In [6]:
# ...where the dicts are more directly useful than the etree. An example of picking things from such dicts:
for detail_dict in wetsuite.datacollect.tweedekamer_nl.entry_dicts( zaal_tree ):
    print(f"{detail_dict['content']['naam']:40s}    {detail_dict['updated']}")

Z7 - Statenpassage - Petitie                2019-08-15T14:45:52Z
Eerste Kamer                                2019-08-15T14:45:52Z
Z6 - Plein 2 - Petitie                      2019-08-15T14:45:52Z
Regentenkamer                               2019-08-15T14:45:51Z
Schrijfkamer                                2019-08-15T14:45:51Z
Statenlokaal                                2019-08-15T14:45:51Z
Van Mierlozaal                              2019-08-15T14:45:52Z
Z5 - Schriftelijke Inbreng                  2019-08-15T14:45:52Z
van Someren-Downerzaal                      2019-08-15T14:45:51Z
Rooksalon                                   2019-08-15T14:45:51Z
Fortuynzaal                                 2019-08-15T14:45:52Z
Koffiekamer                                 2019-08-15T14:45:52Z
Extern                                      2019-08-15T14:45:53Z
Z5 - (Nog) geen zaal beschikbaar            2019-08-15T14:45:53Z
Evenementruimte 1                           2019-08-15T14:45:53Z
Koffiekamer              

### What about documents?

The above does not mention documents that are part of dossiers.

So what do documents mention?

In [7]:
# break_actually stops after the first page, because the full set is 700+ MByte and would easily take ten minutes 
# (you don't really need to run this, this is mainly here for a point we are about to make below)
firstpage = wetsuite.datacollect.tweedekamer_nl.fetch_all( 'Document', break_actually=True )[0]

#print( wetsuite.helpers.etree.debug_pretty( firstpage.find('entry') ) ) # first entry, for reference.
doc_dicts = wetsuite.datacollect.tweedekamer_nl.entry_dicts( firstpage )
pprint.pprint( doc_dicts[-2:] )

[{'category': 'document',
  'content': {'bijgewerkt': '2019-07-03T15:32:12.6700000',
              'id': 'd1bd5ec5-72fb-4702-8115-b6d24b552cdb',
              'refs': {},
              'tagname': 'document',
              'verwijderd': 'true'},
  'id': 'https://gegevensmagazijn.tweedekamer.nl/SyncFeed/2.0/Entiteiten/d1bd5ec5-72fb-4702-8115-b6d24b552cdb',
  'title': 'd1bd5ec5-72fb-4702-8115-b6d24b552cdb',
  'updated': '2019-07-03T13:33:10Z'},
 {'category': 'document',
  'content': {'bijgewerkt': '2019-07-03T15:33:02.7970000',
              'id': '3f75d7c1-379e-4241-9f82-539d244887ff',
              'refs': {},
              'tagname': 'document',
              'verwijderd': 'true'},
  'id': 'https://gegevensmagazijn.tweedekamer.nl/SyncFeed/2.0/Entiteiten/3f75d7c1-379e-4241-9f82-539d244887ff',
  'title': '3f75d7c1-379e-4241-9f82-539d244887ff',
  'updated': '2019-07-03T13:33:10Z'}]


...which turns out to contain nothing very useful _about_ the actual document (nothing we could use to fetch it).
Note that both entries shown are marked `'verwijderd': 'true'`, i.e. deleted.

Presumably there is more useful metadata when something else points at them, e.g. when a kamerstukdossier refers to its documents.
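If you do fetch the types that might point at documents, one way to answer "what points at this?" is a reverse index over the refs. This is a minimal sketch assuming the flattened dict shape shown above (`entry['content']['refs']` mapping a ref name to a target id); which types actually refer to documents is not established here, and `build_reverse_refs` is a hypothetical helper.

```python
import collections


def build_reverse_refs(entry_dicts):
    """Map each referenced id to the entries that point at it.

    Works on flattened dicts where each entry['content']['refs'] maps a
    ref name (e.g. 'persoon') to a target id.
    Returns: target id -> list of (source category, source id, ref name).
    """
    pointed_at_by = collections.defaultdict(list)
    for entry in entry_dicts:
        content = entry['content']
        for ref_name, target in content.get('refs', {}).items():
            # assumption: a value may be a single id, or a list of ids if a type refers to several
            targets = target if isinstance(target, list) else [target]
            for target_id in targets:
                pointed_at_by[target_id].append((entry['category'], content['id'], ref_name))
    return pointed_at_by
```

You would build it once over all fetched entry dicts, then look up a document's guid to see which entries refer to it.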

...kamerstukdossiers however turn out to be a little more complex (see part 2).
First, we do something that requires less investigation:

### What about parties?

Our first quest is "make a list of who is member of what party".

This information is spread across four different resource types: `Persoon`, `FractieZetelPersoon`, `FractieZetel`, `Fractie`.

You can see FractieZetelPersoon as something like a [many-to-many junction table](https://en.wikipedia.org/wiki/Many-to-many_(data_model)):
- a reference to a FractieZetel, which itself just references a Fractie
- a reference to a Persoon

Persoon and Fractie have no references of their own; they just hold the details of each item.

The below code 
- does the relational joins required to find party memberships, 
- then simplifies the results somewhat by adding those memberships to the Persoon details
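For readers who think in SQL, the joins described above amount to something like the following. This is only an illustration on hypothetical toy rows in an in-memory SQLite database: the column names are our own choice, the person and party names are placeholders, and the dates are borrowed from the FractieZetelPersoon example further down. The notebook itself does the equivalent join in plain Python.

```python
import sqlite3

# Hypothetical toy rows shaped after the four types; ids and names are placeholders
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE persoon             (id TEXT PRIMARY KEY, achternaam TEXT);
CREATE TABLE fractie             (id TEXT PRIMARY KEY, afkorting TEXT);
CREATE TABLE fractieZetel        (id TEXT PRIMARY KEY, fractie_ref TEXT);
CREATE TABLE fractieZetelPersoon (id TEXT PRIMARY KEY, fractieZetel_ref TEXT, persoon_ref TEXT,
                                  functie TEXT, van TEXT, totEnMet TEXT);
INSERT INTO persoon             VALUES ('p1', 'Example');
INSERT INTO fractie             VALUES ('f1', 'XYZ');
INSERT INTO fractieZetel        VALUES ('z1', 'f1');
INSERT INTO fractieZetelPersoon VALUES ('zp1', 'z1', 'p1', 'Lid', '2002-05-23', '2010-10-11');
""")

# The junction table links Persoon to FractieZetel, which in turn links to Fractie
rows = con.execute("""
SELECT p.achternaam, f.afkorting, zp.functie, zp.van, zp.totEnMet
FROM fractieZetelPersoon AS zp
JOIN fractieZetel AS z ON z.id = zp.fractieZetel_ref
JOIN fractie      AS f ON f.id = z.fractie_ref
JOIN persoon      AS p ON p.id = zp.persoon_ref
""").fetchall()

for row in rows:
    print(row)
```

Each FractieZetelPersoon row yields one membership period; a person who switched parties would simply appear in several rows.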

In [8]:
# Fetch all resources of each of the mentioned soorten
# The combination should take no longer than a minute or two.
print("Fetching Persoon")
data_Persoon            = wetsuite.datacollect.tweedekamer_nl.merge_etrees( wetsuite.datacollect.tweedekamer_nl.fetch_all( 'Persoon' ) )
print("Fetching Fractie")
data_Fractie            = wetsuite.datacollect.tweedekamer_nl.merge_etrees( wetsuite.datacollect.tweedekamer_nl.fetch_all( 'Fractie' ) )
print("Fetching FractieZetel")
data_FractieZetel       = wetsuite.datacollect.tweedekamer_nl.merge_etrees( wetsuite.datacollect.tweedekamer_nl.fetch_all( 'FractieZetel' ) )
print("Fetching FractieZetelPersoon")
data_FractieZetelPesoon = wetsuite.datacollect.tweedekamer_nl.merge_etrees( wetsuite.datacollect.tweedekamer_nl.fetch_all( 'FractieZetelPersoon' ) )

Fetching Persoon
Fetching Fractie
Fetching FractieZetel
Fetching FractieZetelPersoon


In [10]:
# For some insight/reference, the XML form of one FractieZetelPersoon entry
print( wetsuite.helpers.etree.debug_pretty(data_FractieZetelPesoon.find('entry')) )


<entry>
  <title>808fcd50-a0dc-4f60-8b9d-c404a2eb5b2e</title>
  <id>https://gegevensmagazijn.tweedekamer.nl/SyncFeed/2.0/Entiteiten/808fcd50-a0dc-4f60-8b9d-c404a2eb5b2e</id>
  <author>
    <name>Tweede Kamer der Staten-Generaal</name>
  </author>
  <updated>2023-08-29T13:23:13Z</updated>
  <category term="fractieZetelPersoon"/>
  <link rel="next" href="https://gegevensmagazijn.tweedekamer.nl/SyncFeed/2.0/Feed?category=FractieZetelPersoon&amp;skiptoken=16687327"/>
  <content type="application/xml">
    <fractieZetelPersoon id="808fcd50-a0dc-4f60-8b9d-c404a2eb5b2e" bijgewerkt="2023-08-29T11:10:24Z" verwijderd="false">
      <fractieZetel ref="ca826e72-cf57-4cca-b090-d5c444ec6c2d"/>
      <persoon ref="ec273841-069f-408b-b434-8524904ae314"/>
      <functie>Lid</functie>
      <van>2002-05-23</van>
      <totEnMet>2010-10-11</totEnMet>
    </fractieZetelPersoon>
  </content>
</entry>



In [12]:
# and the way we flatten that into python dicts:
display( wetsuite.datacollect.tweedekamer_nl.entry_dicts( data_FractieZetelPesoon ) [0] )

{'title': '808fcd50-a0dc-4f60-8b9d-c404a2eb5b2e',
 'updated': '2023-08-29T13:23:13Z',
 'category': 'fractieZetelPersoon',
 'content': {'refs': {'fractieZetel': 'ca826e72-cf57-4cca-b090-d5c444ec6c2d',
   'persoon': 'ec273841-069f-408b-b434-8524904ae314'},
  'tagname': 'fractieZetelPersoon',
  'id': '808fcd50-a0dc-4f60-8b9d-c404a2eb5b2e',
  'bijgewerkt': '2023-08-29T11:10:24Z',
  'verwijderd': 'false',
  'functie': 'Lid',
  'van': '2002-05-23',
  'totEnMet': '2010-10-11'},
 'id': 'https://gegevensmagazijn.tweedekamer.nl/SyncFeed/2.0/Entiteiten/808fcd50-a0dc-4f60-8b9d-c404a2eb5b2e'}
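Comparing the XML with the dict suggests a fairly simple flattening rule for the `content` part. Below is a hypothetical re-implementation for illustration only, not the library's code: it assumes namespaces have already been stripped (as in the debug output above), and it keeps only the last ref if the same ref tag occurs more than once.

```python
import xml.etree.ElementTree as ET


def content_to_dict(elem):
    """Flatten one (namespace-stripped) content element into a dict.

    Hypothetical helper mimicking the shape shown above:
    - children with a ref="..." attribute are collected under 'refs'
    - the element's own attributes (id, bijgewerkt, verwijderd) are copied as-is
    - any other child becomes tag -> text (None for empty elements)
    """
    result = {'refs': {}, 'tagname': elem.tag}
    result.update(elem.attrib)
    for child in elem:
        if 'ref' in child.attrib:
            result['refs'][child.tag] = child.get('ref')
        else:
            result[child.tag] = child.text
    return result


example = ET.fromstring(
    '<fractieZetelPersoon id="808fcd50-a0dc-4f60-8b9d-c404a2eb5b2e" '
    'bijgewerkt="2023-08-29T11:10:24Z" verwijderd="false">'
    '<fractieZetel ref="ca826e72-cf57-4cca-b090-d5c444ec6c2d"/>'
    '<persoon ref="ec273841-069f-408b-b434-8524904ae314"/>'
    '<functie>Lid</functie><van>2002-05-23</van><totEnMet>2010-10-11</totEnMet>'
    '</fractieZetelPersoon>'
)
print(content_to_dict(example))
```

Separating the refs from the plain values is what makes the manual joins below straightforward: everything under `refs` is an id to look up elsewhere.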

In [15]:
# Since we just fetched completely separate things,
# we need to _ourselves_ do what amounts to a manual JOIN of relational data

# It happens we can make the next bit of code a little more bite-sized by reshaping some data to assist it,
# in particular by making it easier to fetch varied individual items by id (and by type).
# Why we do this should become clearer in the next code block (...if you care).
id_thing = {}   #               guid -> detailsdict
by_type  = {}   #     soort/category -> list of all such detailsdicts

for etree in (data_Persoon, data_Fractie, data_FractieZetel, data_FractieZetelPesoon):
    for entry_dict in wetsuite.datacollect.tweedekamer_nl.entry_dicts( etree ):
        id_thing[ entry_dict['content']['id'] ] = entry_dict

        category = entry_dict['category']
        if category not in by_type:
            by_type[category] = []
        by_type[ category ].append( entry_dict )

{'persoon': [{'title': '77dc181f-00d6-4d5e-b188-3fd0c02f4006',
   'updated': '2023-08-29T11:10:11Z',
   'category': 'persoon',
   'content': {'refs': {},
    'tagname': 'persoon',
    'id': '77dc181f-00d6-4d5e-b188-3fd0c02f4006',
    'bijgewerkt': '2023-08-29T11:09:44Z',
    'verwijderd': 'false',
    'nummer': '8',
    'titels': 'Jhr.mr.',
    'initialen': 'AF',
    'tussenvoegsel': 'de',
    'achternaam': 'Savornin Lohman',
    'voornamen': 'Alexander Frederik',
    'roepnaam': None,
    'geslacht': 'man',
    'functie': 'Oud Kamerlid',
    'geboortedatum': '1837-05-29',
    'geboorteplaats': 'Groningen',
    'geboorteland': None,
    'overlijdensdatum': '1924-06-11',
    'overlijdensplaats': "'s-Gravenhage",
    'woonplaats': None,
    'land': None,
    'fractielabel': None},
   'id': 'https://gegevensmagazijn.tweedekamer.nl/SyncFeed/2.0/Entiteiten/77dc181f-00d6-4d5e-b188-3fd0c02f4006'},
  {'title': '675b63a1-c1f4-4384-8a52-6ae322a4fabb',
   'updated': '2023-08-29T11:10:11Z',
   'ca

In [18]:
persoon_combined = {}  # which is mostly still the persoon dicts, except we added a key that is the membership

# That join-y code could look like:

for fzp_dict in by_type['fractieZetelPersoon']: # go through that thing that points at fractieZetel and Persoon

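    # fetch the details of the FractieZetel and Persoon it's referring to
    # (in the output below, the entries lacking a fractieZetel ref are marked verwijderd=true, i.e. deleted)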
    if 'fractieZetel' not in fzp_dict['content']['refs']: 
        print("  FractieZetelPersoon item without FractieZetel - huh? %r"%fzp_dict)
        continue
    fractiezetel_id = fzp_dict['content']['refs']['fractieZetel']
    persoon_id      = fzp_dict['content']['refs']['persoon']

    fractiezetel_dict = id_thing[ fractiezetel_id ]
    persoon_dict      = id_thing[ persoon_id ]

    # similarly, fetch the Fractie from the FractieZetel
    fractie_id = fractiezetel_dict['content']['refs']['fractie']
    frac_dict = id_thing[ fractie_id ]

    # now we have all the information we want - to start adding things to the Persoon details, as mentioned
    if persoon_id not in persoon_combined:
        persoon_combined[ persoon_id ] = persoon_dict
        persoon_combined[ persoon_id ]['fractie_membership'] = []
        
    persoon_combined[ persoon_id ]['fractie_membership'].append( 
        {
            'fractie_id':frac_dict['content']['id'], 
            'fractie_afkorting':frac_dict['content']['afkorting'], 
            # maybe more or all of frac_dict?
            'functie':fzp_dict['content']['functie'], 
            'van':fzp_dict['content']['van'], 
            'totEnMet':fzp_dict['content']['totEnMet'],
        } 
    )

  FractieZetelPersoon item without FractieZetel - huh? {'title': 'd73d7f69-1235-4746-aa94-84b593909bfc', 'updated': '2023-08-29T14:00:09Z', 'category': 'fractieZetelPersoon', 'content': {'refs': {}, 'tagname': 'fractieZetelPersoon', 'id': 'd73d7f69-1235-4746-aa94-84b593909bfc', 'bijgewerkt': '2023-08-29T11:10:32Z', 'verwijderd': 'true'}, 'id': 'https://gegevensmagazijn.tweedekamer.nl/SyncFeed/2.0/Entiteiten/d73d7f69-1235-4746-aa94-84b593909bfc'}
  FractieZetelPersoon item without FractieZetel - huh? {'title': '8cca26af-365a-46fc-b72f-c42b2a17a992', 'updated': '2023-12-06T13:48:37Z', 'category': 'fractieZetelPersoon', 'content': {'refs': {}, 'tagname': 'fractieZetelPersoon', 'id': '8cca26af-365a-46fc-b72f-c42b2a17a992', 'bijgewerkt': '2023-12-05T14:58:31Z', 'verwijderd': 'true'}, 'id': 'https://gegevensmagazijn.tweedekamer.nl/SyncFeed/2.0/Entiteiten/8cca26af-365a-46fc-b72f-c42b2a17a992'}
  FractieZetelPersoon item without FractieZetel - huh? {'title': '6d685347-a5c8-498b-94df-b7bbb1bc12
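The fully shown entries above are marked `'verwijderd': 'true'`, i.e. deleted, and carry no refs. A minimal sketch of setting such entries aside before the join, rather than catching them inside it; `split_deleted` is a hypothetical helper, and the idea that ref-less entries are deleted ones is based only on the few entries shown.

```python
def split_deleted(entry_dicts):
    """Split flattened SyncFeed entries into (live, deleted) lists using the 'verwijderd' flag."""
    live, deleted = [], []
    for entry in entry_dicts:
        if entry['content'].get('verwijderd') == 'true':
            deleted.append(entry)
        else:
            live.append(entry)
    return live, deleted
```

Usage could look like `live_fzp, deleted_fzp = split_deleted(by_type['fractieZetelPersoon'])`, then looping over `live_fzp` in the join; any ref-less entries that remain after that would be genuinely surprising.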

In [19]:
fracs = {} # guid -> details, keyed like id_thing above (the top-level 'id' is the full entity URL, not the guid)
for fracs_dict in by_type['fractie']:
    fracs[ fracs_dict['content']['id'] ] = fracs_dict

As an indication of what we have just constructed:

In [28]:
random.choice( list(fracs.values()) )

{'title': 'd3b4d880-ef37-4ce6-99ec-4940266ac466',
 'updated': '2023-12-06T13:48:14Z',
 'category': 'fractie',
 'content': {'refs': {},
  'tagname': 'fractie',
  'id': 'd3b4d880-ef37-4ce6-99ec-4940266ac466',
  'bijgewerkt': '2023-12-05T14:36:07Z',
  'verwijderd': 'false',
  'contentType': 'image/jpeg',
  'contentLength': '11374',
  'nummer': '2764',
  'afkorting': 'PvdD',
  'naamNl': 'Partij voor de Dieren',
  'naamEn': 'Party for the Animals',
  'aantalZetels': '3',
  'aantalStemmen': '235148',
  'datumActief': '2006-11-30',
  'datumInactief': None},
 'id': 'https://gegevensmagazijn.tweedekamer.nl/SyncFeed/2.0/Entiteiten/d3b4d880-ef37-4ce6-99ec-4940266ac466'}

In [31]:
random.choice( list(persoon_combined.values()) )

{'title': 'c7822b58-103f-4612-87ef-648be97192c6',
 'updated': '2024-03-11T16:44:09Z',
 'category': 'persoon',
 'content': {'refs': {},
  'tagname': 'persoon',
  'id': 'c7822b58-103f-4612-87ef-648be97192c6',
  'bijgewerkt': '2024-03-11T16:43:16Z',
  'verwijderd': 'false',
  'contentType': 'image/jpeg',
  'contentLength': '666768',
  'nummer': '4967',
  'titels': None,
  'initialen': 'E.M.',
  'tussenvoegsel': None,
  'achternaam': 'Westerveld',
  'voornamen': 'Elisabeth Marij',
  'roepnaam': 'Lisa',
  'geslacht': 'vrouw',
  'functie': 'Tweede Kamerlid',
  'geboortedatum': '1981-11-16',
  'geboorteplaats': 'Aalten',
  'geboorteland': 'Nederland',
  'overlijdensdatum': None,
  'overlijdensplaats': None,
  'woonplaats': 'Nijmegen',
  'land': 'NL',
  'fractielabel': None},
 'id': 'https://gegevensmagazijn.tweedekamer.nl/SyncFeed/2.0/Entiteiten/c7822b58-103f-4612-87ef-648be97192c6',
 'fractie_membership': [{'fractie_id': '8fd1a907-0355-4d27-8dc1-fd5a531b471e',
   'fractie_afkorting': 'GL',
 

In [None]:
print("WRITING as JSON (tweedekamer-fracties-struc.json, tweedekamer-fractie-membership-struc.json)")

with open('tweedekamer-fracties-struc.json','wb') as jsonfile:
    jsonfile.write( json.dumps({ 
        'description_short':''' Description of political parties/fracties. ''',
        'description':      ''' Description of political parties/fracties.

    Items look something like: 
        {'title': 'ae48391e-ce4d-47e0-86e3-ee310282f66f',
        'updated': '2023-12-06T13:48:37Z',
        'category': 'fractie',
        'nummer': '50311',
        'afkorting': 'Volt',
        'naamNl': 'Volt',
        'naamEn': 'Volt',
        'aantalZetels': '2',
        'aantalStemmen': '178802',
        'datumActief': '2021-03-31',
        'datumInactief': None,
        'id': 'ae48391e-ce4d-47e0-86e3-ee310282f66f'}
        ''',
        'data':fracs} ).encode('ascii')
    )

with open('tweedekamer-fractie-membership-struc.json','wb') as jsonfile:
    jsonfile.write( json.dumps({ 
        'description_short':''' Description of people, including party memberships over time. ''',
        'description':      ''' Description of people, including party memberships over time (each with fractie_afkorting, functie, van, totEnMet) ''',
        'data':persoon_combined} ).encode('ascii')
    )

In [37]:
import wetsuite.helpers.localdata

print("WRITING as dataset store (tweedekamer-fracties-struc.db, tweedekamer-fractie-membership-struc.db)")

with wetsuite.helpers.localdata.MsgpackKV('tweedekamer-fracties-struc.db') as fracties_db:
    fracties_db._put_meta('description_short', ''' Description of political parties/fracties. ''')
    fracties_db._put_meta('description',       ''' Description of political parties/fracties.

    Items look something like: 
        {'title': 'ae48391e-ce4d-47e0-86e3-ee310282f66f',
        'updated': '2023-12-06T13:48:37Z',
        'category': 'fractie',
        'nummer': '50311',
        'afkorting': 'Volt',
        'naamNl': 'Volt',
        'naamEn': 'Volt',
        'aantalZetels': '2',
        'aantalStemmen': '178802',
        'datumActief': '2021-03-31',
        'datumInactief': None,
        'id': 'ae48391e-ce4d-47e0-86e3-ee310282f66f'}
        ''')
    for k, v in fracs.items():
        fracties_db.put( k, v) 

with wetsuite.helpers.localdata.MsgpackKV('tweedekamer-fractie-membership-struc.db') as membership_db:
    membership_db._put_meta('description_short', ''' Description of people, including party memberships over time. ''')
    membership_db._put_meta('description',       ''' Description of people, including party memberships over time (each with fractie_afkorting, functie, van, totEnMet) ''')

    for k, v in persoon_combined.items():
        membership_db.put( k, v) 

WRITING as dataset store (tweedekamer-fracties-struc.db, tweedekamer-fractie-membership-struc.db)


## Looking at kamerstukdossiers


30800 ['VIII', 'XVI', 'XIV']
31314 ['(R1843)']
31444 ['IXB', 'X', None, 'XIV', 'VII', 'XII']
31200 ['XV', 'XIV', 'A', None, 'B', 'I', 'D', 'IXB', 'IV', 'VII', 'XVIII', 'XVII', 'VIII', 'III', 'XIII', 'XI', 'XVI', 'VI', 'XII', 'V', 'X']
31725 ['(R1867)']
31449 ['(R1857)']
31792 ['F', 'IIA', 'C', 'B', 'XII', 'A', 'XIII', 'VII', 'IXA', 'D', 'XVIII', 'IXB', 'XI', 'X', 'IV', None, 'V', 'XVI', 'XVII', 'XV', 'XIV', 'VI', 'VIII', 'III', 'G', 'IIB']
31754 ['(R1869)']
31740 ['(R1868)']
31422 ['(R1853)']
31429 ['(R1855)']
31900 ['(R1879)']
31846 ['(R1875)']
31882 ['(R1878)']
32167 ['(R1895)']
31965 ['XIII', 'III', 'E', 'VI', 'XVII', 'IXA', 'VIII', 'D', 'IXB', 'XVIII', 'V', 'B', 'C', 'VII', 'X', 'A', 'G', 'IIA', 'IV', 'XVI', 'IIB', 'XIV', 'XV', 'F', 'XII', None, 'XI']
31700 ['IXA', 'IIA', 'I', 'G', 'F', 'IXB', 'D', 'XIII', 'XII', 'XV', 'V', 'A', 'C', 'B', 'IV', None, 'VII', 'XVII', 'VI', 'III', 'XVIII', 'VIII', 'XVI', 'XI', 'XIV', 'E', 'IIB', 'X']
32148 ['(R1893)']
32123 ['E', 'G', 'IXA', 'IIA', 'F
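A listing like the one above (each dossier nummer with the toevoegingen seen under it) could be produced along these lines. This is a hypothetical sketch, not necessarily the code that generated that output; it assumes you already have `(nummer, toevoeging)` pairs, e.g. `((d.nummer, d.toevoeging) for d in all_dossiers)` from the tkapi dossier objects used further down. The example pairs are taken from the listing above.

```python
def toevoegingen_per_nummer(pairs):
    """Group (nummer, toevoeging) pairs into nummer -> distinct toevoegingen, in first-seen order.

    A toevoeging may be None (a dossier without one).
    """
    grouped = {}
    for nummer, toevoeging in pairs:
        seen = grouped.setdefault(nummer, [])
        if toevoeging not in seen:
            seen.append(toevoeging)
    return grouped


example_pairs = [(31444, 'IXB'), (31444, 'X'), (31444, None), (31444, 'X'), (31314, '(R1843)')]
for nummer, toevoegingen in toevoegingen_per_nummer(example_pairs).items():
    print(nummer, toevoegingen)
```

This makes the point of the listing visible: a nummer alone does not identify a dossier, since e.g. begroting-style nummers carry many toevoegingen.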

## OData interface

The SyncFeed API is perfectly functional, though it leaves interpreting the relations to you,
so let's see whether the OData API is any more help.

There is a helpful library out there, [tkapi](https://github.com/openkamer/tkapi), which means much less work for us.
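Before reaching for tkapi, it can help to see what a raw OData request looks like. A minimal sketch using the standard OData `$filter`, `$top`, and `$expand` query options; the base URL, the entity set name `Persoon`, and the `Verwijderd eq false` filter are assumptions to check against the [OData documentation](https://opendata.tweedekamer.nl/documentatie/odata-api), and `odata_url` is a hypothetical helper.

```python
from urllib.parse import quote, urlencode

# Assumption: base URL of the Tweede Kamer OData service (check the documentation)
ODATA_BASE = "https://gegevensmagazijn.tweedekamer.nl/OData/v4/2.0/"


def odata_url(entity_set, filter_expr=None, top=None, expand=None):
    """Build a query URL from standard OData system query options."""
    params = {}
    if filter_expr is not None:
        params["$filter"] = filter_expr
    if top is not None:
        params["$top"] = str(top)
    if expand is not None:
        params["$expand"] = expand
    # quote (not quote_plus) so spaces become %20; keep '$' and ',' readable
    query = urlencode(params, quote_via=quote, safe="$,")
    return ODATA_BASE + entity_set + ("?" + query if query else "")


print(odata_url("Persoon", filter_expr="Verwijderd eq false", top=5))
```

In OData v4, the JSON response to such a URL holds the results in its `value` list, and a partial result set points at the next page via `@odata.nextLink`; tkapi takes care of this, and of building the filters, for you.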

In [None]:
# if you haven't already:
# !pip3 install tkapi

In [6]:
import tkapi, tkapi.document   
from tkapi.document import DocumentSoort
api = tkapi.TKApi()

In [69]:
# Get an idea of what this API even does:
#   the resource types are exposed as get_...() methods, to wit:
list(name   for name in dir(api)   if name.startswith('get_'))

['get_activiteiten',
 'get_agendapunten',
 'get_all_items',
 'get_antwoorden',
 'get_besluiten',
 'get_commissies',
 'get_documenten',
 'get_dossiers',
 'get_fractie_zetels',
 'get_fracties',
 'get_geschenken',
 'get_item',
 'get_items',
 'get_kamervragen',
 'get_personen',
 'get_reizen',
 'get_related',
 'get_stemmingen',
 'get_vergaderingen',
 'get_verslagen',
 'get_verslagen_van_algemeen_overleg',
 'get_zaken']

In [4]:
all_dossiers = api.get_dossiers( ) # a few thousand of them, fetching takes a handful of seconds
len(all_dossiers)

#If you only wanted a specific dossier, use something like:
#from tkapi.dossier import Dossier
#dossier_filter = Dossier.create_filter()
#dossier_filter.filter_nummer('35302')
#dossiers = api.get_dossiers( dossier_filter )

6934

In [76]:
# tkapi knows about properties, and relationships to other objects,
# and lets you fetch them via attributes and functions it puts on each object  (for the programmers: it's much like an ORM).
# The following is not a clean list, but helps illustrate that point; see in particular documenten and zaken (its relations mentioned in the diagram from earlier)
list( name  for name in dir(all_dossiers[0])  if not name.startswith('_'))

['afgesloten',
 'begin_date_key',
 'create_filter',
 'documenten',
 'end_date_key',
 'expand_params',
 'filter_param',
 'get_date_from_datetime_or_none',
 'get_date_or_none',
 'get_datetime_or_none',
 'get_param_expand',
 'get_params_default',
 'get_property_enum_or_none',
 'get_property_or_empty_string',
 'get_property_or_none',
 'get_resource_url_or_none',
 'get_year_or_none',
 'gewijzigd_op',
 'id',
 'nummer',
 'orderby_param',
 'organisatie',
 'print_json',
 'related_item',
 'related_items',
 'related_items_deep',
 'titel',
 'toevoeging',
 'type',
 'url',
 'zaken']

In [7]:
# Let's get a basic summary of dossiers.
#    Apparently, dossier nummers are not unique without the toevoeging.

# take a random sample of 50, sorted by dossier number and toevoeging (combining the two needs a little syntax-fu)
#all_dossiers = api.get_dossiers() # we just did this above
some_dossiers = random.sample( all_dossiers, 50)
sorted_dossiers = sorted(  some_dossiers,   key=lambda dossier:str(dossier.nummer)+(dossier.toevoeging or '')  )

for dossier in sorted_dossiers:
    print('\n\n')

    # you could e.g. figure out other zaken that refer to the same documents
    #zaaknrs = set()
    #for zaak in dossier.zaken:
    #    zaaknrs.add( zaak.nummer ) # zaak.onderwerp)

    nummer_and_toevoeging = ('%s-%s'%(dossier.nummer, dossier.toevoeging or '')).rstrip('-')
    print( f"== Dossier {nummer_and_toevoeging} == {dossier.titel} ==" )
    #print( '  ',dossier.url.replace(')','%29') ) # the replace is to make the notebook's url include the final bracket

    # It seems that many documenten have a related zaak, but it's not one-to-one;  TODO:
    for zaak in dossier.zaken:
        print(f'   ZAAK      {zaak.nummer}  {str(zaak.soort).split(".",1)[1]:18s} {zaak.onderwerp}')
        #print('         ',zaak.url)

    for document in sorted(dossier.documenten, key=lambda doc:doc.volgnummer):
        # str() guards against None values, which a format spec like :100s would reject
        print( f'   DOC #{str(document.volgnummer):3s}  {str(document.nummer):10s} - {str(document.datum):12s} - {str(document.onderwerp):100s} - {str(document.bestand_url):30s}')
        
        #for zaak in document.zaken:
        #    print(f'     DOCZAAK      {zaak.nummer}  {str(zaak.soort).split(".",1)[1]:18s} {zaak.onderwerp}')

# short examples include 
#  34865-(R2097)
#  33805-VIII
#  35169






== Dossier 26488 == Programma doorontwikkeling F-35 ==
   ZAAK      2009Z02151  BRIEF_REGERING     Geluidsgegevens kandidaat-toestellen Vervanging F-16
   ZAAK      2016Z23302  BRIEF_REGERING     Informatie over de financiële gegevens Block Buy
   ZAAK      2014Z07871  BRIEF_REGERING     Reactie op berichten in de media over het naar Nederland halen van een F-35 testtoestel
   ZAAK      2008Z09934  BRIEF_REGERING     Toezending rapporten actualisering kandidatenvergelijking project vervanging F-16
   ZAAK      2014Z08304  BRIEF_REGERING     Informatievoorziening project Vervanging F-16
   ZAAK      2011Z11048  BRIEF_REGERING     Lijst van vragen en antwoorden over de rapportage van het project Vervanging F-16 over het jaar 2010
   ZAAK      2010Z08586  BRIEF_REGERING     M6.5-softwaremodificatie voor de F-16
   ZAAK      2009Z04569  BRIEF_REGERING     Regeerakkoord en herijking businesscase
   ZAAK      2008Z03307  BRIEF_REGERING     Rafale en Eurofighter
   ZAAK      2020Z01791  BR

In [8]:
# Out of interest, and for a short example, look for larger dossiers.  Many of them will be more thematic.
# (the implied fetches should take 15 mins)
for dossier in sorted( all_dossiers, key=lambda d:d.nummer ): # sort to show the same-nummer-different-toevoeging cases together
    if len(dossier.documenten) > 100:
        toe_s = f'-{dossier.toevoeging}' if dossier.toevoeging is not None else '   '
        print( "%5s%-5s  is a large dossier with %4d documents, titled:  %s" %(
                dossier.nummer, toe_s, len(dossier.documenten), dossier.titel))

17050       is a large dossier with  253 documents, titled:  Misbruik en oneigenlijk gebruik op het gebied van belastingen, sociale zekerheid en subsidies
19637       is a large dossier with 2001 documents, titled:  Vreemdelingenbeleid
20454       is a large dossier with  112 documents, titled:  Voortgangsrapportage uitvoering wetten oorlogsgetroffenen
21501-08    is a large dossier with  655 documents, titled:  Milieuraad
21501-20    is a large dossier with 1649 documents, titled:  Europese Raad
21501-02    is a large dossier with 2024 documents, titled:  Raad Algemene Zaken en Raad Buitenlandse Zaken
21501-32    is a large dossier with 1341 documents, titled:  Landbouw- en Visserijraad
21501-33    is a large dossier with  887 documents, titled:  Raad voor Vervoer, Telecommunicatie en Energie
21501-34    is a large dossier with  313 documents, titled:  Raad voor Onderwijs, Jeugd, Cultuur en Sport
21501-31    is a large dossier with  590 documents, titled:  Raad voor de Werkgelegenheid

In [9]:
# Out of a different interest, let's get a count of the document types
#   (the implied fetching will take on the order of fifteen minutes)
soorten = collections.defaultdict(list)

for dossier in sorted( all_dossiers, key=lambda d:d.nummer ):
    for document in sorted( dossier.documenten, key=lambda doc:doc.volgnummer ):
        try:
            soorten[ str(document.soort).split(".",1)[1] ].append( document ) # takes the enum's name (rather than its value) by splitting off the 'DocumentSoort.' prefix
        except Exception as e:
            print("SKIP invalid soort: %s"%(e))

In [10]:
# make a list of (soort, documents) pairs, sorted by document count, descending
soorten_by_count = sorted( list( soorten.items() ), key=lambda pair: len(pair[1]), reverse=True )

for soort, doclist in soorten_by_count:
    print( f'{len(doclist):7d}  {soort}')

  72448  BRIEF_REGERING
  48694  MOTIE
   8154  AMENDEMENT
   5175  MOTIE_GEWIJZIGDNADER
   5037  VERSLAG_VAN_EEN_ALGEMEEN_OVERLEG
   4570  AMENDEMENT_GEWIJZIGD_NADER_VERVANGEND
   4118  VERSLAG_VAN_EEN_SCHRIFTELIJK_OVERLEG
   3912  VOORSTEL_VAN_WET
   3911  MEMORIE_VAN_TOELICHTING
   3177  VERSLAG_INITIATIEFWETSVOORSTEL_NADER
   2888  LIJST_VAN_VRAGEN_EN_ANTWOORDEN
   2474  NOTA_NAV_HET_NADERTWEEDE_NADERENZ_VERSLAG
   2370  NOTA_VAN_WIJZIGING
   2278  KONINKLIJKE_BOODSCHAP
   2130  ADVIES_AFDELING_ADVISERING_RAAD_VAN_STATE_EN_NADER_RAPPORT
   1163  VERSLAG_HOUDENDE_EEN_LIJST_VAN_VRAGEN_EN_ANTWOORDEN
    902  BRIEF_COMMISSIE
    873  BRIEF_ALGEMENE_REKENKAMER
    756  VERSLAG_VAN_EEN_COMMISSIEDEBAT
    612  VERSLAG_VAN_EEN_WETGEVINGSOVERLEG
    522  GELEIDENDE_BRIEF
    437  BRIEF_LID__FRACTIE
    416  MEMORIE_VAN_TOELICHTING_INITIATIEFVOORSTEL
    391  VOORSTEL_VAN_WET_INITIATIEFVOORSTEL
    379  VERSLAG_COMMISSIE_VERZOEKSCHRIFTEN_EN_DE_BURGERINITIATIEVEN
    362  JAARVERSLAG
    346 
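The two cells above build a dict of lists and then sort it by list length. If only the counts are needed, and not the documents themselves, `collections.Counter` covers both steps. Here is a minimal sketch on made-up raw soort values. The stand-in enum `Soort` and the data are hypothetical, and, as in the cells above, it assumes that an unknown soort shows up as a `ValueError`:

```python
import collections
from enum import Enum


class Soort(Enum):
    """Stand-in enum; not tkapi's DocumentSoort."""
    BRIEF_REGERING = "Brief regering"
    MOTIE = "Motie"
    AMENDEMENT = "Amendement"


# hypothetical raw values, including one the enum does not know
raw_soorten = ["Motie", "Brief regering", "Motie", "Onbekend",
               "Amendement", "Brief regering", "Brief regering"]

counts = collections.Counter()
skipped = 0
for raw in raw_soorten:
    try:
        counts[Soort(raw).name] += 1
    except ValueError:  # unknown value: skip it, as the cells above do
        skipped += 1

for soort, n in counts.most_common():  # sorted by count, descending
    print(f"{n:7d}  {soort}")
print("skipped:", skipped)
```

The trade-off is that you lose the per-soort document lists, which the defaultdict version keeps around for later inspection.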

## Example

Let's say that our interest is more specific:
finding what the Raad van State has to say about proposed laws (wetsvoorstellen).

...and, in the process, also learning what kinds of documents there are in each dossier.

There is also the [advice on the raad van state site](https://www.raadvanstate.nl/adviezen/)
(for a more data-like form, see also our [extras_datacollect_raadvanstate](extras_datacollect_raadvanstate.ipynb)),
but there it is not placed in the context of the law it refers to.
This interface should at least give us the law's name.
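The selection in the next cell boils down to this: keep a dossier if any of its documents has one of a few RvS-advice soorten. Below is a minimal sketch of that test as a function. It assumes each document exposes its soort as an enum member and that reading an unknown soort raises `ValueError`. All names here (`RVS_ADVIES`, `has_rvs_advice`, the stand-in enum `Soort` and the stand-in `Doc` class) are hypothetical:

```python
from enum import Enum


class Soort(Enum):
    """Stand-in for tkapi's DocumentSoort; values are illustrative only."""
    MOTIE = "Motie"
    ADVIES_RVS = "Advies Afdeling advisering Raad van State"
    ADVIES_RVS_EN_REACTIE = "Advies Afdeling advisering Raad van State en Reactie van de initiatiefnemers"


class Doc:
    """Stand-in document that resolves its soort lazily, so reading it may raise."""
    def __init__(self, raw_soort):
        self._raw = raw_soort

    @property
    def soort(self):
        return Soort(self._raw)  # raises ValueError for values the enum does not know


RVS_ADVIES = {Soort.ADVIES_RVS, Soort.ADVIES_RVS_EN_REACTIE}


def has_rvs_advice(documents):
    """True if at least one document has an RvS-advice soort; unknown soorten are ignored."""
    for doc in documents:
        try:
            if doc.soort in RVS_ADVIES:
                return True  # one is enough, no need to look further
        except ValueError:
            continue
    return False


print(has_rvs_advice([Doc("Motie"), Doc("Onbekend"), Doc("Advies Afdeling advisering Raad van State")]))
print(has_rvs_advice([Doc("Motie"), Doc("Onbekend")]))
```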

In [11]:
from tkapi.document import DocumentSoort  # the enum of document types (may already be imported earlier)

# We start by selecting dossiers that already _have_ RvS advice.
#  - this is a decent filter for wetsvoorstellen
#  - and it filters out wetsvoorstellen that don't need this advice (e.g. begroting)
# ...but, as we are about to find out,
#  - there are other things that RvS advises on, like finances (see e.g. 36200)
#  - there are law changes that RvS does not advise on (e.g. TODO)

sorted_dossiers = sorted(all_dossiers,  key=lambda d:d.nummer,  reverse=True )

count = 0
for i, dossier in enumerate( sorted_dossiers ):
    nummer_and_toevoeging = ('%s-%s'%(dossier.nummer, dossier.toevoeging or '')).rstrip('-')

    #if (dossier.nummer%100) == 0: # ignore a few specific special cases for now,   just because they're large to print
    #    continue

    ## In our stated interest: first see whether it has RvS advice
    sorted_docs      = sorted(dossier.documenten,  key=lambda d:d.volgnummer )
    has_raadvanstate = False
    for document in sorted_docs:
        try:
            # these come from an enum; try  list( tkapi.document.DocumentSoort )  to see all values
            if document.soort in (DocumentSoort.ADVIES_AFDELING_ADVISERING_RAAD_VAN_STATE, 
                                  #DocumentSoort.ADVIES_AFDELING_ADVISERING_RAAD_VAN_STATE_EN_NADER_RAPPORT, # seems to be begrotingstuff?  (TODO: check)
                                  DocumentSoort.ADVIES_AFDELING_ADVISERING_RAAD_VAN_STATE_EN_REACTIE_VAN_DE_INITIATIEFNEMERS,
                                ):
                has_raadvanstate = True
                break  # one such document is enough
        except ValueError: # there are some invalid / non-covered soort values in the data
            pass # ignore
        # we could filter on more, but we may not need to

    if not has_raadvanstate:
        continue
    # if execution gets here, it's probably interesting to us.
    
    count += 1
    #if len(sorted_docs)>500:
    #    print( "\n\n== Dossier %s == %s =="%( dossier.nummer, dossier.titel) )
    #    print(' LARGE: %d documents'%len(sorted_docs))
    #    print(' %s ({{kamerdossier|%d}}'%(dossier.titel, dossier.nummer))
    #    continue

    print( "\n== %r == Dossier %s == %d docs == %s =="%( dossier.id, nummer_and_toevoeging, len(dossier.documenten), dossier.titel) )
    show_all_docs = False  # set to True to list every document in the dossier, not only the RvS advice
    for document in sorted_docs:
        try:
            docsoort = document.soort
        except ValueError: # some soort values in the data are not known to tkapi (internal inconsistency)
            continue

        if False: # when showing all documents, skip some common types to make the summaries a little easier to read
            if docsoort in (DocumentSoort.MOTIE, DocumentSoort.AMENDEMENT, DocumentSoort.BRIEF_REGERING, DocumentSoort.VERSLAG_VAN_EEN_ALGEMEEN_OVERLEG,
                            DocumentSoort.MEMORIE_VAN_TOELICHTING_INITIATIEFVOORSTEL,
                            ):
                continue

        if show_all_docs or docsoort in (DocumentSoort.ADVIES_AFDELING_ADVISERING_RAAD_VAN_STATE, 
                                         DocumentSoort.ADVIES_AFDELING_ADVISERING_RAAD_VAN_STATE_EN_REACTIE_VAN_DE_INITIATIEFNEMERS):

            print( '#%s'%(document.volgnummer, ), docsoort.name)
            #print( 'soort', docsoort.name, '(%s)'%docsoort.value )
            print( '  onderwerp    ', document.onderwerp )    # for wetsvoorstel-dossiers, often the same as soort plus some detail (e.g. who a letter is from)
            print( '  citeertitel  ', document.titel_citeer ) # for wetsvoorstel-dossiers, this often names the law. Or a related one, see e.g. 36195
            print( '  titel        ', document.titel )        # for wetsvoorstel-dossiers, this often names the law, sometimes plus a reason
            #print( 'versies', document.versies )
            print( '  url          ', document.bestand_url )
            if False: # it may be interesting to know whether the document is part of multiple dossiers and/or multiple zaken
                print( '  zaken        ', document.zaken )
                #nums = list( document.dossier_nummers )
                #nums.remove( dossier.nummer )   # remove by value; list.pop would expect an index
                #if len(nums)>0:
                #  print( "  also in dossiers: %s"%nums )

            print()

    #if i > 1000: # show only a bunch, not all
    #    print("break %d"%i)
    #    break
print( 'Interesting cases: %d'%count )


== '6abacd05-acdf-4938-ab33-36f88d8ea469' == Dossier 36468 == 6 docs == Voorstel van wet van het lid Dijk houdende verandering in de Grondwet, strekkende tot opneming van bepalingen inzake het correctief referendum ==
#5 ADVIES_AFDELING_ADVISERING_RAAD_VAN_STATE_EN_REACTIE_VAN_DE_INITIATIEFNEMERS
  onderwerp     Advies Afdeling advisering Raad van State en Reactie van de initiatiefnemer
  citeertitel   
  titel         Voorstel van wet van het lid Dijk houdende verandering in de Grondwet, strekkende tot opneming van bepalingen inzake het correctief referendum
  url           https://gegevensmagazijn.tweedekamer.nl/OData/v4/2.0/Document(87f1dfa6-1f5a-4d6a-96da-72e52ad0ae45)/TK.DA.GGM.OData.Resource()


== '87a7ff74-bf81-4e71-a69e-2439ce536c2c' == Dossier 36346 == 7 docs == Voorstel van wet van het lid Van Houwelingen betreffende het houden van een raadplegend referendum over het Nederlandse lidmaatschap van de Europese Unie (Wet raadplegend referendum Nederlands EU-lidmaatschap) ==
#4 