# Purpose of this notebook

This notebook explores how to get data out of EUR-Lex.

It is currently aimed specifically at court judgments, and mainly at their text.


There are [a few different ways to access different parts of EUR-Lex data](https://eur-lex.europa.eu/content/welcome/data-reuse.html),
including a RESTful API, a SOAP API (requires registration), and a SPARQL endpoint.

The SPARQL endpoint is probably the most flexible of these,
particularly when you want specific selections of documents, specific relations between them, and the like.
At the same time, SPARQL has a bit of a learning curve unless you are already well versed in RDF.

SPARQL results point to a resource that is mostly just the document text as HTML, e.g. http://publications.europa.eu/resource/cellar/1e3100ce-8a71-433a-8135-15f5cc0e927c.0002.02/DOC_1

The public-facing web page describing a document (looked up by CELEX), e.g. https://eur-lex.europa.eu/legal-content/EN/ALL/?uri=CELEX%3A61996CJ0080,
gives more detail:
- links to the underlying document
- ...for all translated languages
- the text
- more metadata, such as classification and related documents
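
As a minimal sketch of how such a page URL can be put together from a CELEX identifier, using only the standard library. The helper name `celex_page_url` is hypothetical, and the `language` parameter assumes that other language codes fit the same slot as `EN` in the example URL above; only the `EN` form is taken from that example.

```python
from urllib.parse import quote


def celex_page_url(celex, language="EN"):
    """Return the EUR-Lex overview page URL for a CELEX identifier.

    Hypothetical helper; mirrors the URL pattern of the example above.
    """
    # safe="" makes quote() percent-encode the colon as %3A
    uri_value = quote("CELEX:" + celex, safe="")
    return f"https://eur-lex.europa.eu/legal-content/{language}/ALL/?uri={uri_value}"


print(celex_page_url("61996CJ0080"))
# https://eur-lex.europa.eu/legal-content/EN/ALL/?uri=CELEX%3A61996CJ0080
```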

...so for first experiments, and before learning SPARQL, we could read off details from there.
If we do, we still need a source of CELEX identifiers to know what to fetch. The SPARQL endpoint is still quite useful for that.
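
A rough sketch of what asking the SPARQL endpoint for CELEX identifiers could look like, with only the standard library rather than the wetsuite helpers. Several things here are assumptions and should be checked against the data-reuse page linked above: the endpoint address, the CDM property names `cdm:work_has_resource-type` and `cdm:resource_legal_id_celex`, and the `format` request parameter. All function names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumption: address of the public SPARQL endpoint
SPARQL_ENDPOINT = "http://publications.europa.eu/webapi/rdf/sparql"
# Resource type URI for judgments
JUDG = "http://publications.europa.eu/resource/authority/resource-type/JUDG"


def celex_query(resource_type_uri, limit=10):
    """Build a SPARQL query selecting CELEX identifiers of works of one resource type."""
    # Assumption: these two CDM property names
    return f"""
PREFIX cdm: <http://publications.europa.eu/ontology/cdm#>
SELECT DISTINCT ?celex WHERE {{
  ?work cdm:work_has_resource-type <{resource_type_uri}> .
  ?work cdm:resource_legal_id_celex ?celex .
}}
LIMIT {int(limit)}
""".strip()


def sparql_request_url(query):
    """Encode the query as a GET request URL asking for JSON results."""
    params = urllib.parse.urlencode(
        {"query": query, "format": "application/sparql-results+json"}
    )
    return f"{SPARQL_ENDPOINT}?{params}"


def fetch_celex_ids(resource_type_uri, limit=10, timeout=30):
    """Run the query (needs network access) and return the CELEX strings."""
    url = sparql_request_url(celex_query(resource_type_uri, limit))
    with urllib.request.urlopen(url, timeout=timeout) as response:
        data = json.load(response)
    # Standard SPARQL JSON results layout: results -> bindings -> variable -> value
    return [row["celex"]["value"] for row in data["results"]["bindings"]]
```

Calling `fetch_celex_ids(JUDG, limit=5)` would then return up to five identifiers, if the assumptions above hold. Keeping the query builder separate from the fetch makes it easy to inspect the query text before sending anything.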

In [1]:
import pprint, time, random

import wetsuite.datacollect.eurlex
import wetsuite.helpers.notebook
import wetsuite.helpers.localdata
import wetsuite.helpers.etree
import wetsuite.helpers.net
import wetsuite.datasets

# Mention what kinds of documents there are

And how many of each there are.

TODO: The list ought to be based on something live instead, but this is a decent introduction for now.
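
One way the live version might look is a grouped count over the SPARQL endpoint. This is only a sketch: the property name `cdm:work_has_resource-type` is an assumption, the helper `counts_from_bindings` is hypothetical, and the query is not executed here (a full count over all works may well be slow or time out).

```python
# Assumption: works carry their type via cdm:work_has_resource-type
COUNT_BY_TYPE_QUERY = """
PREFIX cdm: <http://publications.europa.eu/ontology/cdm#>
SELECT ?type (COUNT(DISTINCT ?work) AS ?count) WHERE {
  ?work cdm:work_has_resource-type ?type .
}
GROUP BY ?type
ORDER BY DESC(?count)
""".strip()


def counts_from_bindings(bindings):
    """Turn SPARQL JSON result bindings into a {type_code: count} dict.

    The type code is the last path segment of the resource-type URI,
    e.g. 'JUDG' for .../resource-type/JUDG
    """
    counts = {}
    for row in bindings:
        code = row["type"]["value"].rsplit("/", 1)[-1]
        counts[code] = int(row["count"]["value"])
    return counts
```

The `bindings` argument is the `results.bindings` list of a standard SPARQL JSON response.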

In [2]:
doctypes = [
				{"value":"http://publications.europa.eu/resource/authority/resource-type/ABSTRACT_JUR",
				"label":"Abstract"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/ACKNOWLEDGE_RECP",
				"label":"Acknowledgement receipt"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/ACT",
				"label":"Act"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/ACT_ADOPT_INTERNATION",
				"label":"Acts adopted by bodies created by international agreements"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/ACT_DRAFT",
				"label":"Draft act"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/ACT_LEGIS",
				"label":"Legislative acts"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/ACT_LEGIS_NO",
				"label":"Non-legislative acts"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/ACT_OTHER",
				"label":"Other acts"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/ACT_PREP",
				"label":"Preparatory act - (pl. Preparatory acts)"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/ADD",
				"label":"Addendum - (pl. Addenda)"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/ADOPT_TEXT",
				"label":"Texts adopted"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/AGREE",
				"label":"Agreement"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/AGREE_ADDIT",
				"label":"Additional agreement"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/AGREE_AMEND",
				"label":"Amendment to an agreement"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/AGREE_EUMS",
				"label":"Agreement between Member States - (pl. Agreements between Member States)"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/AGREE_EUMS_DRAFT",
				"label":"Draft agreement between Member States"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/AGREE_INTERINSTIT",
				"label":"Interinstitutional agreement - (pl. Interinstitutional agreements)"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/AGREE_INTERINSTIT_DRAFT",
				"label":"Draft interinstitutional agreement"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/AGREE_INTERNAL",
				"label":"Internal agreement"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/AGREE_INTERNATION",
				"label":"International agreement - (pl. International agreements)"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/AGREE_INTERNATION_DRAFT",
				"label":"Draft international agreement"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/AGREE_PROT",
				"label":"Protocol to the agreement"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/AGREE_UBEREINKOM",
				"label":"Agreement between Member States - (pl. Agreements between Member States)"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/AGREE_UBEREINKOM_DRAFT",
				"label":"Draft agreement"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/AGREE_VEREINB",
				"label":"Agreement"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/AID_STATE",
				"label":"State aid"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/AMEND_PROP",
				"label":"Amended proposal"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/AMEND_PROP_DEC",
				"label":"Amended proposal for a decision"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/AMEND_PROP_DIR",
				"label":"Amended proposal for a directive"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/AMEND_PROP_REG",
				"label":"Amended proposal for a regulation"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/ANNEX",
				"label":"Annex - (pl. Annexes)"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/ANNEX_SUM",
				"label":"Annex summary"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/ANNOUNC",
				"label":"Announcements"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/ARCHIVE_FILE",
				"label":"Archive file"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/ARRANG",
				"label":"Arrangement"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/ASSENT",
				"label":"Assent"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/ASSENT_REQ",
				"label":"Request for assent"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/BALANCE",
				"label":"Balance"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/BUDGET",
				"label":"Budget - (pl. Budgets)"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/BUDGET_DRAFT",
				"label":"Draft budget"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/BUDGET_DRAFT_PRELIM",
				"label":"Preliminary draft budget"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/BUDGET_DRAFT_PRELIM_SUPPL",
				"label":"Preliminary draft supplementary budget"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/BUDGET_DRAFT_SUPPL_AMEND",
				"label":"Draft supplementary and amending budget"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/BUDGET_SUPPL_AMEND",
				"label":"Supplementary and amending budget"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/CALL_EXPR_INTEREST",
				"label":"Call for expression of interest"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/CALL_PROP",
				"label":"Call for proposals"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/CASE",
				"label":"Case"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/CASE_LAW",
				"label":"Reports of cases"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/CENSURE",
				"label":"Censure"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/CENSURE_RJ",
				"label":"Rejected motion of censure"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/CENSURE_WDW",
				"label":"Withdrawn motion of censure"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/CITIZ_SUM",
				"label":"Citizens’ summary"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/COMMUNIC",
				"label":"Communication"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/COMMUNIC_DRAFT",
				"label":"Draft communication"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/COMMUNIC_POSIT",
				"label":"Communication concerning the position of the Council"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/COMPOS",
				"label":"Common position"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/COMPOS_ACCEPT",
				"label":"Acceptance of common position"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/COMPOS_AMEND",
				"label":"Amendment to common position"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/COMPOS_DRAFT",
				"label":"Draft common position"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/COMPOS_RJ",
				"label":"Rejection of the common position"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/COMPOS_RJ_CONFIRM",
				"label":"Confirmation of the rejection of the common position"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/CONCL",
				"label":"Conclusions"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/CONS_TEXT",
				"label":"Consolidated text"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/CONVENTION",
				"label":"Convention"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/CORRIGENDUM",
				"label":"Corrigendum - (pl. Corrigenda)"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/DATPRO",
				"label":"Provisional data"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/DEC",
				"label":"Decision - (pl. Decisions)"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/DECLAR",
				"label":"Declaration"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/DECLAR_DRAFT",
				"label":"Draft declaration"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/DEC_DEL",
				"label":"Delegated decision"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/DEC_DEL_DRAFT",
				"label":"Draft delegated decision"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/DEC_DRAFT",
				"label":"Draft decision"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/DEC_ENTSCHEID",
				"label":"Decision"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/DEC_FRAMW",
				"label":"Framework decision"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/DEC_FRAMW_DRAFT",
				"label":"Draft framework decision"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/DEC_IMPL",
				"label":"Implementing decision"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/DEC_IMPL_DRAFT",
				"label":"Draft implementing decision"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/DEC_NC",
				"label":"Decision by national courts in the field of European Union law"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/DEC_REVIEW",
				"label":"Decision to review"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/DIR",
				"label":"Directive - (pl. Directives)"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/DIR_DEL",
				"label":"Delegated directive"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/DIR_DEL_DRAFT",
				"label":"Draft delegated directive"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/DIR_DRAFT",
				"label":"Draft directive"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/DIR_IMPL",
				"label":"Implementing directive"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/DIR_IMPL_DRAFT",
				"label":"Draft implementing directive"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/EXCH_LET",
				"label":"Exchange of letters"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/EXCH_RATE",
				"label":"Exchange rate"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/EXCH_RATE_MRO",
				"label":"Interest rate — euro exchange rates"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/FORM_OJ_TOC",
				"label":"Request form for the table of contents of the Official Journal"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/FORM_PPF",
				"label":"Form for print production file"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/FORM_REQ_PRE_PRESS",
				"label":"Request form for pre-press of publication"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/FORM_REQ_PUB",
				"label":"Request form for publication"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/FORM_REQ_PUB_ACCEPT",
				"label":"Request form for the acceptance of publication"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/FORM_REQ_PUB_REFUS",
				"label":"Request form for the refusal of publication"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/FORM_REQ_READY_PRESS",
				"label":"Request form for the ready-for-press of publication"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/GARNISHEE_ORDER",
				"label":"Attachment order"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/GLOS_TERM",
				"label":"Glossary term"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/GUIDELINE",
				"label":"Guideline - (pl. Guidelines)"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/GUIDELINE_GEN",
				"label":"General guidelines"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/GUIDELINE_LIGNES",
				"label":"Guideline"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/IMPACT_ASSESS",
				"label":"Impact assessment"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/IMPACT_ASSESS_INCEP",
				"label":"Inception impact assessment"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/IMPACT_ASSESS_SUM",
				"label":"Summary of impact assessment"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/INFO",
				"label":"Information"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/INFO_COMMUNIC",
				"label":"Information"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/INFO_GEN",
				"label":"General information - (pl. General informations)"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/INFO_JUDICIAL",
				"label":"Judicial information"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/INFO_JUR",
				"label":"Information"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/INFO_SPECIAL",
				"label":"Special information"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/INFO_SUPPL",
				"label":"Supplementary information"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/INI",
				"label":"Initiative"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/INTRO_TEXT",
				"label":"Introductory text"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/ITEM_A_LIST",
				"label":"List of ‘A’ items"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/ITEM_A_LIST_ADP",
				"label":"List of adopted ‘A’ items"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/ITEM_A_LIST_PROVIS",
				"label":"Provisional list of ‘A’ items"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/ITEM_A_NOTE",
				"label":"‘A’ item note"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/ITEM_IA_NOTE",
				"label":"‘I/A’ item note"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/ITEM_I_NOTE",
				"label":"‘I’ item note"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/JOINT_ACTION",
				"label":"Joint action"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/JOINT_ACTION_DRAFT",
				"label":"Draft joint action"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/JOINT_COMMUNIC",
				"label":"Joint communication"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/JOINT_DEC",
				"label":"Joint decision"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/JOINT_IMPACT_ASSESS",
				"label":"Joint impact assessment"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/JOINT_IMPACT_ASSESS_SUM",
				"label":"Joint impact assessment summary"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/JOINT_PAPER_GREEN",
				"label":"Joint Green Paper"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/JOINT_PAPER_WHITE",
				"label":"Joint White Paper"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/JOINT_PROP_DEC",
				"label":"Joint proposal for a decision"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/JOINT_PROP_DIR",
				"label":"Joint proposal for a directive"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/JOINT_PROP_REG",
				"label":"Joint proposal for a regulation"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/JOINT_REPORT",
				"label":"Joint report"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/JOINT_SWD",
				"label":"Joint staff working document"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/JOINT_TEXT",
				"label":"Joint text"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/JOINT_TEXT_APR",
				"label":"Approved joint text"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/JOINT_TEXT_RJ",
				"label":"Rejected joint text"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/JUDG",
				"label":"Judgment"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/JUDG_EXTRACT",
				"label":"Judgment (extracts)"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/LEGIS_SUM",
				"label":"Summaries of EU legislation"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/LET",
				"label":"Letter"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/LET_AMEND",
				"label":"Letter of amendment"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/LIST",
				"label":"List"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/MEAS_NATION_IMPL",
				"label":"National implementing measures"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/",
				"label":"National transposition measures"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/MEMORANDUM",
				"label":"Memorandum"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/MEMORANDUM_UNDERST",
				"label":"Memorandum of understanding"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/MINUTES",
				"label":"Minutes"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/MINUTES_DRAFT",
				"label":"Draft minutes"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/NOTE",
				"label":"Note"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/NOTE_COVER",
				"label":"Cover note"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/NOTE_INFO",
				"label":"Information note"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/NOTICE",
				"label":"Notice"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/NOTICE_AWARD",
				"label":"Award notice"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/NOTICE_CONTRACT",
				"label":"Contract notice"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/NOTICE_INFO",
				"label":"Notice - (pl. Notices)"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/NOTICE_MODEL_PP",
				"label":"Model notice for public procurement"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/NOTICE_READER",
				"label":"Notice to readers"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/NOTIF",
				"label":"Notification"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/OJ",
				"label":"Official Journal"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/OJ_SPECIAL",
				"label":"Special edition of the Official Journal"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/OPIN",
				"label":"Opinion - (pl. Opinions)"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/OPIN_ADDIT",
				"label":"Additional opinion"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/OPIN_AG",
				"label":"Opinion of the Advocate General"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/OPIN_AMEND_EP",
				"label":"Opinion on the European Parliament’s amendments"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/OPIN_CASE",
				"label":"Advocate general’s opinion"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/OPIN_DISPUT_LB",
				"label":"Opinion disputing a legal base"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/OPIN_EXPLOR",
				"label":"Exploratory opinion"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/OPIN_IMPACT_ASSESS",
				"label":"Opinion on impact assessment"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/OPIN_JUR",
				"label":"Opinion of the Court"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/OPIN_NO_PROP_AMEND",
				"label":"Opinion not proposing amendment"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/OPIN_NO_REPORT",
				"label":"Opinion without report"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/OPIN_PROP_AMEND",
				"label":"Opinion proposing amendment"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/OPIN_PROP_RJ",
				"label":"Opinion proposing rejection"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/OPIN_QUAL_APR",
				"label":"Opinion with qualified approval"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/ORDER",
				"label":"Order"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/ORDER_EXTRACT",
				"label":"Order (extracts)"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/OWNINI_OPIN",
				"label":"Own-initiative opinion"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/OWNINI_OPIN_ADDIT",
				"label":"Additional own-initiative opinion"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/OWNINI_REPORT",
				"label":"Own-initiative report"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/OWNINI_RES",
				"label":"Own-initiative resolution"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/PAPER_GREEN",
				"label":"Green Paper"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/PAPER_REFLEC",
				"label":"Reflection paper"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/PAPER_WHITE",
				"label":"White Paper"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/PETITION",
				"label":"Petition"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/PLANNING_DOC",
				"label":"Planning document"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/POSIT",
				"label":"Position"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/POSIT_ACCEPT",
				"label":"Acceptance of position"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/POSIT_AMEND",
				"label":"Amendment to position"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/POSIT_RJ",
				"label":"Rejection of position"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/POSIT_RJ_CONF",
				"label":"Confirmation of the rejection of the position"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/PRESS_REL",
				"label":"Press release"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/PROCEED_MINUTES",
				"label":"Minutes of proceeding"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/PROC_ADMIN",
				"label":"Administrative procedures"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/PROC_INTERNAL",
				"label":"Rules of procedure"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/PROC_JURIS",
				"label":"Jurisdictional procedure - (pl. Jurisdictional procedures)"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/PROC_RULES",
				"label":"Rules of procedure"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/PROGRAM",
				"label":"Programme"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/PROP_ACT",
				"label":"Proposal for an act"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/PROP_AMEND",
				"label":"Amendments to the proposal"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/PROP_COMPOS",
				"label":"Proposal for a common position"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/PROP_DEC",
				"label":"Proposal for a decision"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/PROP_DECLAR",
				"label":"Proposal for a declaration"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/PROP_DEC_FRAMW",
				"label":"Proposal for a framework decision"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/PROP_DEC_IMPL",
				"label":"Proposal for an implementing decision"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/PROP_DEC_NO_ADDRESSEE",
				"label":"Proposal for a decision without addressee"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/PROP_DIR",
				"label":"Proposal for a directive"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/PROP_DIR_IMPL",
				"label":"Proposal for an implementing directive"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/PROP_DRAFT",
				"label":"Draft proposal"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/PROP_JOINT_ACTION",
				"label":"Proposal for a joint action"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/PROP_OPIN",
				"label":"Proposal for an opinion"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/PROP_RECO",
				"label":"Proposal for a recommendation"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/PROP_REG",
				"label":"Proposal for a regulation"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/PROP_REG_IMPL",
				"label":"Proposal for an implementing regulation"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/PROP_RES",
				"label":"Proposal for a resolution"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/PROT",
				"label":"Protocol"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/PROT_DRAFT",
				"label":"Draft protocol"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/PUB_GEN",
				"label":"General publications"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/QUEST_ORAL",
				"label":"Oral question - (pl. Oral questions)"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/QUEST_TIME",
				"label":"Question at question time - (pl. Questions at question time)"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/QUEST_WRITTEN",
				"label":"Written question - (pl. Written questions)"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/RECO",
				"label":"Recommendation - (pl. Recommendations)"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/RECO_DEC",
				"label":"Recommendation for a decision"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/RECO_DIR",
				"label":"Recommendation for a directive"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/RECO_DRAFT",
				"label":"Draft recommendation"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/RECO_OPIN",
				"label":"Recommendation for an opinion"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/RECO_RECO",
				"label":"Recommendation for a recommendation"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/RECO_REG",
				"label":"Recommendation for a regulation"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/RECO_RES",
				"label":"Recommendation for a resolution"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/RECRUIT",
				"label":"Recruitment"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/REFERRAL_LET",
				"label":"Referral letter"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/REG",
				"label":"Regulation - (pl. Regulations)"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/REG_DEL",
				"label":"Delegated regulation"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/REG_DEL_DRAFT",
				"label":"Draft delegated regulation"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/REG_DRAFT",
				"label":"Draft regulation"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/REG_FINANC",
				"label":"Financial regulation"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/REG_IMPL",
				"label":"Implementing regulation"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/REG_IMPL_DRAFT",
				"label":"Draft implementing regulation"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/REPLY_OPIN_NP",
				"label":"Reply to national Parliaments’ opinion"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/REPORT",
				"label":"Report"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/REPORT_AAR",
				"label":"Annual activity report"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/REPORT_ANNUAL",
				"label":"Annual report"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/REPORT_ANNUAL_BUDGET",
				"label":"Annual report on the EU general budget"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/REPORT_ANNUAL_DAS",
				"label":"Annual report with the statement of assurance"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/REPORT_ANNUAL_EDF",
				"label":"Annual report on the activities funded by the European Development Fund"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/REPORT_ANNUAL_SPECIF",
				"label":"Specific annual report"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/REPORT_PERIOD",
				"label":"Periodic report"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/REPORT_SEMEST",
				"label":"Six-monthly report"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/REPORT_SPECIAL",
				"label":"Special report"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/REPORT_SPECIF",
				"label":"Specific report"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/REPORT_VALID",
				"label":"Validation report"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/RES",
				"label":"Resolution - (pl. Resolutions)"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/RES_DRAFT",
				"label":"Draft resolution"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/RES_LEGIS",
				"label":"Legislative resolution"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/ROADMAP",
				"label":"Roadmap"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/ROADMAP_EFC",
				"label":"Evaluation and Fitness Check Roadmap"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/RULING",
				"label":"Ruling"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/SIGN_OJ_ATC",
				"label":"Signature file for the authentic Official Journal"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/STAT",
				"label":"Statement"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/STATUTE",
				"label":"Statute"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/STAT_REASON",
				"label":"Statement of reasons"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/STAT_REASON_DRAFT",
				"label":"Draft statement of reasons"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/STRATEGY_COMMON",
				"label":"Common strategy"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/STU",
				"label":"Study"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/STU_ANNEX",
				"label":"Annex to a study"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/STU_EVL",
				"label":"Evaluation study"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/STU_EVL_ANNEX",
				"label":"Annex to an evaluation study"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/STU_EVL_SUM_EXE",
				"label":"Executive summary of an evaluation study"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/STU_SUM_EXE",
				"label":"Executive summary of a study"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/SUM",
				"label":"Summary"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/SUM_JUR",
				"label":"Summary"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/SWD",
				"label":"Staff working document"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/THIRDPARTY_PROCEED",
				"label":"Third-party proceedings"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/TOC_OJ",
				"label":"Contents"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/TOC_OJ_SPECIAL",
				"label":"Contents of the Special edition of the Official Journal"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/TRANSFER_APPROPR",
				"label":"Transfer of appropriations"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/TREATY",
				"label":"Treaty - (pl. Treaties)"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/TREATY_DRAFT",
				"label":"Draft treaty"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/VIEW",
				"label":"View"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/VIEWPOINT",
				"label":"Viewpoint"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/VIEW_AG",
				"label":"View of the Advocate General"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/WARN",
				"label":"Warning - (pl. Warnings)"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/WEBPAGE",
				"label":"Webpage"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/WEBSITE",
				"label":"Website"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/WORK_BUNDLE",
				"label":"Work bundle"},
				{"value":"http://publications.europa.eu/resource/authority/resource-type/WORK_DOC",
				"label":"Working document"}
			]


In [3]:
import wetsuite.helpers.format   # provides kmgtp(), used below; not among the imports at the top

for cd in doctypes:
    value = cd.get('value')
    typ = value.rsplit('/',1)[1]
    label = cd.get('label')

    ret = wetsuite.datacollect.eurlex.fetch_by_resource_type( typ )

    print( "%6s   %-30s   %s"%(
       wetsuite.helpers.format.kmgtp( len(ret.get('results').get('bindings')) ),
       typ,
       label,
    ) )
    time.sleep(1)

    # for reference: ~23K for JUDG

  1.3K   ABSTRACT_JUR                     Abstract
     2   ACKNOWLEDGE_RECP                 Acknowledgement receipt
     9   ACT                              Act
     0   ACT_ADOPT_INTERNATION            Acts adopted by bodies created by international agreements
    43   ACT_DRAFT                        Draft act
  3.8K   ACT_LEGIS                        Legislative acts
     0   ACT_LEGIS_NO                     Non-legislative acts
   364   ACT_OTHER                        Other acts
    2K   ACT_PREP                         Preparatory act - (pl. Preparatory acts)
    21   ADD                              Addendum - (pl. Addenda)
    2K   ADOPT_TEXT                       Texts adopted
    17   AGREE                            Agreement
     0   AGREE_ADDIT                      Additional agreement
    62   AGREE_AMEND                      Amendment to an agreement
     5   AGREE_EUMS                       Agreement between Member States - (pl. Agreements between Member States)
     

# JUDGments fetch

## Fetch judgment identifiers

As a first step, we can ask the API for all CELEX identifiers of a document type, via `wetsuite.datacollect.eurlex.fetch_by_resource_type()`.

In [2]:
# first we figure out the CELEX identifiers that exist for this type  (also the workid they point to, though we don't use that right now)
judg_celexes = wetsuite.helpers.localdata.LocalKV('eurlex_judg_celex_workid.db', key_type=str,value_type=str)   # stores CELEX -> work id       (mostly just for the CELEX)

# later we will also fetch the documents for them, into:
judg_docs_nl = wetsuite.helpers.localdata.LocalKV('eurlex_judg_nl.db', key_type=str,value_type=bytes)           # stores url -> html document
#judg_docs_en = wetsuite.helpers.localdata.LocalKV('eurlex_judg_en.db', key_type=str,value_type=bytes)          # not currently used, avoid a bunch of fetching by commenting out

In [3]:
# Fetch the CELEX identifiers for all JUDGments, update our knowledge of them.
# Note that as of this writing there are 20K+ results  (which we will soon learn refers to roughly 4GB worth of HTML content)
judg_dict = wetsuite.datacollect.eurlex.fetch_by_resource_type('JUDG') 
for work in judg_dict['results']['bindings']:
    try:
        celex  = work['celex']['value']
        workid = work['work']['value']
        judg_celexes.put(celex, workid)
    except KeyError as ke:
        print( 'missing %s: %s'%(str(ke), work) )

# describe how many CELEXes we have now
judg_celexes.summary(True)

{'size_bytes': 3014656,
 'size_readable': '2.9MiB',
 'num_items': 23998,
 'avgsize_bytes': 126,
 'avgsize_readable': '126B'}
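
For reference, a listing like the one fetched above can also be written out as a SPARQL query. The following is only a sketch of what such a query might look like, not what `fetch_by_resource_type()` necessarily does: the `cdm:` prefix and the two property names are assumptions about the Publications Office CDM ontology, and `resource_type_query` is a hypothetical helper name.

```python
# Sketch only: the prefix and property names below are assumptions about the
# Publications Office CDM ontology, not taken from wetsuite.
RESOURCE_TYPE_BASE = 'http://publications.europa.eu/resource/authority/resource-type/'

def resource_type_query(typ, limit=None):
    """Build a SPARQL query listing work URIs and CELEX numbers for one resource type."""
    # refuse anything that is not a plain name like JUDG or REG_IMPL,
    # since the value is pasted into the query text
    if not typ.replace('_', '').isalnum():
        raise ValueError('unexpected resource type: %r' % typ)
    query = '''PREFIX cdm: <http://publications.europa.eu/ontology/cdm#>
SELECT DISTINCT ?work ?celex WHERE {
  ?work cdm:work_has_resource-type <%s%s> .
  ?work cdm:resource_legal_id_celex ?celex .
}''' % (RESOURCE_TYPE_BASE, typ)
    if limit is not None:
        query += '\nLIMIT %d' % int(limit)
    return query

print(resource_type_query('JUDG', limit=10))
```

The `?work` and `?celex` variable names match the keys read from each binding in the cell above, which is the only part of this sketch that comes from the notebook itself.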

## Fetch the corresponding content - judgment documents

In [4]:
# Fetch the web pages for all those CELEXes, for one or more languages
pbar = wetsuite.helpers.notebook.progress_bar( len(judg_celexes), description='fetching pages...') # create progress bar reference so that during the progress we can update description...
count_cached, count_fetched = 0, 0                                                                 # ...with these counts

for celex in judg_celexes:
    # the /ALL/ page gives more metadata than e.g. AUTO or TXT, though we might be interested in fetching specific languages
    for lang, store, url in (
        ('nl', judg_docs_nl, 'https://eur-lex.europa.eu/legal-content/NL/ALL/?uri=CELEX:%s'%celex),
        #('en', judg_docs_en, 'https://eur-lex.europa.eu/legal-content/EN/ALL/?uri=CELEX:%s'%celex),
    ):
        try:
            _, was_cached = wetsuite.helpers.localdata.cached_fetch( store, url )
            if was_cached:
                count_cached += 1
            else:
                count_fetched += 1
                time.sleep( 2 ) # some backoff to be nicer to the servers
        except Exception as e:
            # it seems the server will report overloads as 404, so running it another time should work.
            print( f'{str(e)} for {url}' )
            time.sleep( 10 ) # more backoff to be nicer to the servers
    
    pbar.value += 1
    pbar.description = f'{count_fetched} fetched, {count_cached} cached'

# describe how many documents we have now
display( judg_docs_nl.summary(True) )

fetching pages...:   0%|          | 0/23998 [00:00<?, ?it/s]

Response not OK, status=404 for url='https://eur-lex.europa.eu/legal-content/NL/ALL/?uri=CELEX:62006TJ0060' for https://eur-lex.europa.eu/legal-content/NL/ALL/?uri=CELEX:62006TJ0060


{'size_bytes': 4291088384,
 'size_readable': '4GiB',
 'num_items': 24127,
 'avgsize_bytes': 177854,
 'avgsize_readable': '174KiB'}
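
The loop above leaves failed fetches for a later re-run. If we would rather retry on the spot, a minimal sketch could look like the following. `fetch_with_retries` is a hypothetical helper, not part of wetsuite, and the delays are arbitrary choices. The fetch and sleep functions are passed in so that the sketch can be tried without touching the network.

```python
import time

def fetch_with_retries(fetch, url, attempts=4, first_delay=10.0, sleep=time.sleep):
    """Call fetch(url); on any exception wait and retry, doubling the delay each time."""
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except Exception as e:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the last error
            print('attempt %d failed for %s: %s' % (attempt, url, e))
            sleep(delay)
            delay *= 2

# Demonstration with a stand-in fetch function that fails twice before succeeding.
calls = []
def flaky_fetch(url):
    calls.append(url)
    if len(calls) < 3:
        raise OSError('Response not OK, status=404')
    return b'<html>...</html>'

waits = []  # record the requested delays instead of actually sleeping
result = fetch_with_retries(flaky_fetch, 'https://example.org/doc', sleep=waits.append)
print(result, waits)
```

In the real loop, `fetch` would be something like `lambda u: wetsuite.helpers.localdata.cached_fetch( store, u )`. Whether a 404 really means overload or a page that does not exist is something this sketch cannot tell apart, so a low `attempts` value seems wise.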

# REGulation fetch

Same idea as above, for REGulations.

In [25]:
# for comments, see above
reg_celexes = wetsuite.helpers.localdata.LocalKV('eurlex_reg_celex_workid.db', key_type=str,value_type=str)   
reg_docs_nl = wetsuite.helpers.localdata.LocalKV('eurlex_reg_nl.db', key_type=str,value_type=bytes)           
#reg_docs_en = wetsuite.helpers.localdata.LocalKV('eurlex_reg_en.db', key_type=str,value_type=bytes)           


# Identifiers
reg_dict = wetsuite.datacollect.eurlex.fetch_by_resource_type('REG') 
for work in reg_dict['results']['bindings']:
    try:
        celex  = work['celex']['value']
        workid = work['work']['value']
        reg_celexes.put(celex, workid)
    except KeyError as ke:
        pass
        #print( 'missing %s: %s'%(str(ke), work) )
display( reg_celexes.summary(True) )


# Documents
pbar = wetsuite.helpers.notebook.progress_bar( len(reg_celexes), description='fetching pages...')
count_cached, count_fetched = 0, 0
for celex in reg_celexes:
    # the /ALL/ page gives more metadata than e.g. AUTO or TXT, though we might be interested in fetching specific languages
    for lang, store, url in (
        ('nl', reg_docs_nl, 'https://eur-lex.europa.eu/legal-content/NL/ALL/?uri=CELEX:%s'%celex),
        #('en', reg_docs_en, 'https://eur-lex.europa.eu/legal-content/EN/ALL/?uri=CELEX:%s'%celex),
    ):
        try:
            _, was_cached = wetsuite.helpers.localdata.cached_fetch( store, url )
            if was_cached:
                count_cached += 1
            else:
                count_fetched += 1
                # it seems the server will report overloads as 404, so running it another time should work, and backoff is nicer to the servers
                time.sleep( 2 )
        except Exception as e:
            print( e, url )
            time.sleep( 10 ) # more backoff to be nicer to the servers

    pbar.value += 1
    pbar.description = f'{count_fetched} fetched, {count_cached} cached'

display( reg_docs_nl.summary(True) )

{'size_bytes': 16121856,
 'size_readable': '15MiB',
 'num_items': 130303,
 'avgsize_bytes': 124,
 'avgsize_readable': '124B'}

fetching pages...:   0%|          | 0/130303 [00:00<?, ?it/s]

{'size_bytes': 22525677568,
 'size_readable': '21GiB',
 'num_items': 130303,
 'avgsize_bytes': 172872,
 'avgsize_readable': '169KiB'}

# DIRectives fetch

In [26]:
dir_celexes = wetsuite.helpers.localdata.LocalKV('eurlex_dir_celex_workid.db', key_type=str,value_type=str)   
dir_docs_nl = wetsuite.helpers.localdata.LocalKV('eurlex_dir_nl.db', key_type=str,value_type=bytes)           
#dir_docs_en = wetsuite.helpers.localdata.LocalKV('eurlex_dir_en.db', key_type=str,value_type=bytes)           


# Identifiers
dir_dict = wetsuite.datacollect.eurlex.fetch_by_resource_type('DIR') 
for work in dir_dict['results']['bindings']:
    try:
        celex  = work['celex']['value']
        workid = work['work']['value']
        dir_celexes.put(celex, workid)
    except KeyError as ke:
        pass
        #print( 'missing %s: %s'%(str(ke), work) )
display( dir_celexes.summary(True) )


# Documents
pbar = wetsuite.helpers.notebook.progress_bar( len(dir_celexes), description='fetching pages...')
count_cached, count_fetched = 0, 0
for celex in dir_celexes:
    # the /ALL/ page gives more metadata than e.g. AUTO or TXT, though we might be interested in fetching specific languages
    for lang, store, url in (
        ('nl', dir_docs_nl, 'https://eur-lex.europa.eu/legal-content/NL/ALL/?uri=CELEX:%s'%celex),
        #('en', dir_docs_en, 'https://eur-lex.europa.eu/legal-content/EN/ALL/?uri=CELEX:%s'%celex),
    ):
        #print(url)
        try:
            _, was_cached = wetsuite.helpers.localdata.cached_fetch( store, url )
            if was_cached:
                count_cached += 1
            else:
                count_fetched += 1
                # it seems the server will report overloads as 404, so running it another time should work, and backoff is nicer to the servers
                time.sleep( 2 )
        except Exception as e:
            print( e, url )
            time.sleep( 10 ) # more backoff to be nicer to the servers
    
    pbar.value += 1
    pbar.description = f'{count_fetched} fetched, {count_cached} cached'

display( dir_docs_nl.summary(True) )

{'size_bytes': 532480,
 'size_readable': '520KiB',
 'num_items': 4142,
 'avgsize_bytes': 129,
 'avgsize_readable': '129B'}

fetching pages...:   0%|          | 0/4142 [00:00<?, ?it/s]

{'size_bytes': 1265901568,
 'size_readable': '1.2GiB',
 'num_items': 4142,
 'avgsize_bytes': 305626,
 'avgsize_readable': '298KiB'}
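
All three fetch sections above build their page URLs with the same string formatting. A small sketch of pulling that into one place, where `eurlex_page_url` is a hypothetical helper name and the percent-quoting is an addition of ours (for plain alphanumeric CELEX numbers the result is identical to the URLs used above, which matters because the URL is the store key):

```python
import urllib.parse

def eurlex_page_url(celex, lang='NL', view='ALL'):
    """Build a EUR-Lex page URL for a CELEX number, in the form used in the fetch loops above."""
    # quote() only changes unusual characters; parentheses are left alone
    return 'https://eur-lex.europa.eu/legal-content/%s/%s/?uri=CELEX:%s' % (
        lang.upper(), view, urllib.parse.quote(celex, safe='()'))

print(eurlex_page_url('62006TJ0060'))
print(eurlex_page_url('61996CJ0080', lang='en'))
```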

# Parse documents we fetched

## An aside on metadata

The viewable documents put things into templates in quite a structured way, to the point we can fairly easily use that for metadata-like extraction.

It should be noted that a number of metadata details are also explicitly present as RDFa embedded in these documents (which originates from ELI as a larger project),
which is a more regularized form of certain metadata, and also semantic data.

The `a` in `RDFa` stands for 'in Attributes', i.e. RDF encoded in HTML attributes.

The following is **not** the right way to get RDFa out, just an indication of what is present:
<!--
In JavaScript
it = document.evaluate('//meta[@about]', document);
node = it.iterateNext(); 
while (node) { 
  console.log( node );
  node = it.iterateNext(); 
}
-->

In [27]:
# applied to a single example case
import bs4

htmlbytes = wetsuite.helpers.net.download( 'https://eur-lex.europa.eu/eli/dir/1965/1/oj' )

soup = bs4.BeautifulSoup( htmlbytes, features='lxml' )
for meta in soup.select('meta'):
    if meta.get('about'):
        print(meta)
        # or a quick and dirty way to make that a little more readable:
        #print( meta.get('about'), meta.get('typeof') or '', meta.get('property') or '', meta.get('datatype') or '', meta.get('resource') or '' )

<meta about="http://data.europa.eu/eli/dir/1965/1/oj" typeof="eli:LegalResource"/>
<meta about="http://data.europa.eu/eli/dir/1965/1/oj" property="eli:uri_schema" resource="http://data.europa.eu/eli/%7Btypedoc%7D/%7Byear%7D/%7Bnatural_number%7D/oj"/>
<meta about="http://data.europa.eu/eli/dir/1965/1/oj" content="31965L0001" lang="" property="eli:id_local"/>
<meta about="http://data.europa.eu/eli/dir/1965/1/oj" property="eli:type_document" resource="http://publications.europa.eu/resource/authority/resource-type/DIR"/>
<meta about="http://data.europa.eu/eli/dir/1965/1/oj" property="eli:passed_by" resource="http://publications.europa.eu/resource/authority/corporate-body/CONSIL"/>
<meta about="http://data.europa.eu/eli/dir/1965/1/oj" content="DG03/F/02" property="eli:responsibility_of"/>
<meta about="http://data.europa.eu/eli/dir/1965/1/oj" property="eli:is_about" resource="http://eurovoc.europa.eu/1638"/>
<meta about="http://data.europa.eu/eli/dir/1965/1/oj" property="eli:is_about" resour
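
To make output like the above a little easier to work with, the `meta` elements can be grouped per subject and property. This is still not a real RDFa parser, just a sketch using only the standard library. `MetaAboutCollector` is a hypothetical name, and filing `typeof` under a pseudo-property of the same name is our own choice. The sample input is copied from the output above.

```python
from collections import defaultdict
from html.parser import HTMLParser

class MetaAboutCollector(HTMLParser):
    """Collect <meta about=...> elements as subject -> property -> [values]."""
    def __init__(self):
        super().__init__()
        self.found = defaultdict(lambda: defaultdict(list))

    def handle_starttag(self, tag, attrs):
        # also called for self-closing tags like <meta ... />
        if tag != 'meta':
            return
        attrs = dict(attrs)
        subject, prop = attrs.get('about'), attrs.get('property')
        if subject is None:
            return
        if prop is None:
            prop, value = 'typeof', attrs.get('typeof')
        else:
            # the value is either a URI (resource) or a literal (content)
            value = attrs.get('resource') or attrs.get('content')
        if value is not None:
            self.found[subject][prop].append(value)

sample = '''
<meta about="http://data.europa.eu/eli/dir/1965/1/oj" typeof="eli:LegalResource"/>
<meta about="http://data.europa.eu/eli/dir/1965/1/oj" content="31965L0001" lang="" property="eli:id_local"/>
<meta about="http://data.europa.eu/eli/dir/1965/1/oj" property="eli:is_about" resource="http://eurovoc.europa.eu/1638"/>
'''

collector = MetaAboutCollector()
collector.feed(sample)
for subject, props in collector.found.items():
    for prop, values in props.items():
        print(subject, prop, values)
```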

## JUDGment test parse

In [22]:
# During debug, we are not storing anything yet,
#   just randomly picking a handful of HTML documents to see whether our extraction is happy

for random_url, random_doc in judg_docs_nl.random_sample( 3 ): 
    try:
        print( random_url )
        parsed = wetsuite.datacollect.eurlex.extract_html(random_doc)  
        pprint.pprint( parsed ) 
    except Exception as e:
        print( 'ERROR for %r: %s'%( random_url, e ) )
        raise

https://eur-lex.europa.eu/legal-content/NL/ALL/?uri=CELEX:62015CJ0070
{'celex': '62015CJ0070',
 'classifications': {'Case law directory code': [['4.06.02.01',
                                                  'Binnenlands beleid van de '
                                                  'Europese Unie',
                                                  '/',
                                                  'Ruimte van vrijheid, '
                                                  'veiligheid en recht',
                                                  '/',
                                                  'Justitiële samenwerking in '
                                                  'burgerlijke zaken',
                                                  '/',
                                                  'Rechterlijke bevoegdheid , '
                                                  'erkenning en '
                                                  'tenuitvoerlegging van '
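
The `celex` value in this output is itself structured: a one-character sector, a four-digit year, a one- or two-letter document type, and a number. A rough sketch of splitting it up follows. `split_celex` is a hypothetical helper, and the pattern only aims at the simple forms seen in this notebook (such as `62015CJ0070` or `32008R0114`), so anything after the document type is kept as-is in `rest`.

```python
import re

# sector, year, document type, then whatever follows
_CELEX_RE = re.compile(
    r'^(?P<sector>[0-9A-Z])(?P<year>[0-9]{4})(?P<doctype>[A-Z]{1,2})(?P<rest>.+)$')

def split_celex(celex):
    """Split a CELEX number into its parts, or return None if it does not fit the pattern."""
    m = _CELEX_RE.match(celex.strip())
    if m is None:
        return None
    parts = m.groupdict()
    parts['year'] = int(parts['year'])
    return parts

for example in ('62015CJ0070', '32008R0114', '12008E263'):
    print(example, split_celex(example))
```

This could be useful for, say, grouping judgments by year without parsing each document.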
           

## JUDGment real parse

Did that look decent and not error out?

Then we can probably run it on the whole set, and store the results.

In [23]:
# 25K items, might take half an hour
judg_nl_parsed_store = wetsuite.helpers.localdata.MsgpackKV('eurlex-judg-nl-struc.db', key_type=str)  
if False: # is an effective "please redo everything", e.g. after you've changed wetsuite.datacollect.eurlex.extract_html
    judg_nl_parsed_store.truncate()  
    judg_nl_parsed_store._put_meta('description_short','The Dutch translation of metadata and text for European JUDGments in EUR-Lex. (preliminary version)')
    judg_nl_parsed_store._put_meta('description','''

    The Dutch translation of metadata and text for European JUDGments in EUR-Lex. (preliminary version)

    For example, https://eur-lex.europa.eu/legal-content/NL/ALL/?uri=CELEX:62011TJ0385 becomes (with a lot of abbreviating omissions marked with ... ):

    {
    'celex': '62011TJ0385',
    'classifications': {'Case law directory code': [['3.02.03.02',
                                                    'EEG/EG - Procedures voor het Hof * Procedures voor het Hof',
                                                    '/',
                                                    'Beroep tot nietigverklaring',
                                                    '/',
                                                    'Beroep tot nietigverklaring - Beroepen van natuurlijke of rechtspersonen * Beroepen van natuurlijke of rechtspersonen',
                                                    '/',
                                                    'Handelingen die hen individueel raken',
                                                    '3.02.03.03',
                                                    'EEG/EG - Procedures voor het Hof * Procedures voor het Hof',
                                                    '/',
                                                    'Beroep tot nietigverklaring',
                                                    '/',
                                                    'Beroep tot nietigverklaring - Beroepen van natuurlijke of rechtspersonen * Beroepen van natuurlijke of rechtspersonen',
                                                    '/',
                                                    'Handelingen die hen rechtstreeks raken'],
                                                    ...
                                                    ],
                        'Subject matter': ['Dumping',
                                            'Externe betrekkingen',
                                            'Handelspolitiek']
                        },
    'contents': [('BG',  'HTML', 'https://eur-lex.europa.eu/legal-content/BG/TXT/HTML/?uri=CELEX:62011TJ0385'),
                ('ES',  'HTML', 'https://eur-lex.europa.eu/legal-content/ES/TXT/HTML/?uri=CELEX:62011TJ0385'),
                ...
                ],
    'dates': {'Date lodged': '2011-07-21', 'Date of document': '2014-01-16'},
    'doctrine': {'Notes relating to the decision': [
                                '1. Idot, Laurence: Mesures de défense commerciale et procédures de contournement, Europe 2014 Mars Comm. nº 3 p.34 (FR)' ]},
    'ecli': 'ECLI:EU:T:2014:7',
    'linked': {'Case affecting': [('CELEX:32011R0443', 'Confirms 32011R0443'),
                                ('CELEX:32011R0444', 'Confirms 32011R0444')],
                'Instruments cited in case law': [('CELEX:12008E263',  '12008E263-L4: N 65 69 70 73'),
                                                ('CELEX:12008E296',  '12008E296: N 157'),
                                                ('CELEX:31991Q0530', '31991Q0530-A114: N 65'),
                                                ('CELEX:31991Q0530', '31991Q0530-A48P2: N 91'),
                                                ...
                                                ],
                'Link':   'Select all documents mentioning this document',
                'Treaty': 'Verdrag tot oprichting van de Europese Economische Gemeenschap'},
    'misc': {'Authentic language': 'Engels',
            'Author': 'Gerecht',
            'Country or organisation from which the request originates': 'Derde landen',
            'Form': 'Arrest'},
    'proc': {'Applicant': ['Particulier'],
            'Defendant': ['Instellingen, Raad'],
            'Judge-Rapporteur': ['van der Woude'],
            'Type of procedure': ['Beroep tot nietigverklaring - ongegrond']},
    'text': [('',
            ['ARREST VAN HET GERECHT (Vierde kamer)',
                '16\xa0januari 2014\xa0(',
                '*1',
                ')',
                'Dumping — Subsidies — Invoer van biodiesel van oorsprong uit Verenigde Staten — Ontwijking — Artikel\xa013 verordening (EG) '
                ...
                ])],
    'titles': {'englishTitle': 'Judgment of the General Court (Fourth Chamber) of '
                                '16 January 2014. # BP Products North America Inc. '
                                'v Council of the European Union. # Dumping - '
                                'Subsidies - Imports of biodiesel originating in '
                                'the United States - Circumvention - Article 13 of '
                                'Regulation (EC) No 1225/2009 - Article 23 of '
                                'Regulation (EC) No 597/2009 - Slightly modified '
                                'like product - Legal certainty - Misuse of powers '
                                '- Manifest errors of assessment - Obligation to '
                                'state reasons - Equal treatment - Principle of '
                                'sound administration. # Case T-385/11.',
                'originalTitle': 'Arrest van het Gerecht (Vierde kamer) van 16 '
                                'januari 2014.  BP Products North America Inc. '
                                'tegen Raad van de Europese Unie.  Dumping - '
                                'Subsidies - Invoer van biodiesel van oorsprong '
                                'uit Verenigde Staten - Ontwijking - Artikel 13 '
                                'verordening (EG) nr. 1225/2009 - Artikel 23 van '
                                'verordening (EG) nr. 597/2009 - Enigszins '
                                'gewijzigd soortgelijk product - Rechtszekerheid '
                                '- Misbruik van bevoegdheid - Kennelijke '
                                'beoordelingsfouten - Motiveringsplicht - Gelijke '
                                'behandeling - Beginsel van behoorlijk bestuur.  '
                                'Zaak T-385/11.',
                'title': 'Arrest van het Gerecht (Vierde kamer) van 16 januari '
                        '2014.  BP Products North America Inc. tegen Raad van de '
                        'Europese Unie.  Dumping - Subsidies - Invoer van '
                        'biodiesel van oorsprong uit Verenigde Staten - '
                        'Ontwijking - Artikel 13 verordening (EG) nr. 1225/2009 - '
                        'Artikel 23 van verordening (EG) nr. 597/2009 - Enigszins '
                        'gewijzigd soortgelijk product - Rechtszekerheid - '
                        'Misbruik van bevoegdheid - Kennelijke beoordelingsfouten '
                        '- Motiveringsplicht - Gelijke behandeling - Beginsel van '
                        'behoorlijk bestuur.  Zaak T-385/11.'}
    }

    '''+wetsuite.datasets.generated_today_text())

for url, docbytes in wetsuite.helpers.notebook.ProgressBar( judg_docs_nl.items() ):
    if url not in judg_nl_parsed_store:
        try:
            parsed = wetsuite.datacollect.eurlex.extract_html( docbytes )  # that function is where most of the scraping code sits
            judg_nl_parsed_store.put( url, parsed )
        except Exception as e: 
            print( 'ERROR for %r: %s'%( url, e ) )
            raise # during debug we want to fail on everything so we know about all issues

  0%|          | 0/24127 [00:00<?, ?it/s]
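
Once parsed, the `text` field is a list of (section title, list of fragments) pairs, as in the example in the description above. If we only want plain text, a minimal sketch of flattening it could look like this. `flatten_text` is a hypothetical helper, and replacing non-breaking spaces and dropping empty fragments are our own choices.

```python
def flatten_text(text_sections, joiner='\n'):
    """Join a parsed 'text' field, a list of (section_title, [fragments]) pairs, into one string."""
    lines = []
    for title, fragments in text_sections:   # works for tuples and lists alike
        if title:
            lines.append(title)
        for fragment in fragments:
            cleaned = fragment.replace('\xa0', ' ').strip()
            if cleaned:
                lines.append(cleaned)
    return joiner.join(lines)

example = [('', ['ARREST VAN HET GERECHT (Vierde kamer)',
                 '16\xa0januari 2014\xa0(',
                 '*1',
                 ')'])]
print(flatten_text(example))
```

As the example shows, footnote markers end up as their own fragments, so a real cleanup would probably want to do more than this.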

## REGulation parse

Basically the same as above (so look for code comments above), but for regulations instead.

TODO: actually finish

In [28]:
# 130K items, might take two hours
reg_nl_parsed_store = wetsuite.helpers.localdata.MsgpackKV('eurlex-reg-nl-struc.db', key_type=str)
if False: # is an effective "please redo everything", e.g. after you've changed wetsuite.datacollect.eurlex.extract_html
    reg_nl_parsed_store.truncate()  
    reg_nl_parsed_store._put_meta('description_short','The Dutch translation of metadata and text for European REGulations in EUR-Lex. (preliminary version)')
    reg_nl_parsed_store._put_meta('description','''
                                  
    For example, https://eur-lex.europa.eu/legal-content/NL/ALL/?uri=CELEX:32008R0114 becomes (with abbreviating omissions marked with ... ):

    {'celex': '32008R0114',
      'titles': {'title': 'Verordening (EG) nr.\xa0114/2008 van de Commissie van 6 februari 2008 tot wijziging van Verordening (EG) nr.\xa0883/2006 houdende uitvoeringsbepalingen van Verordening (EG) nr.\xa01290/2005 van de Raad met betrekking tot het bijhouden van de rekeningen van de betaalorganen, de declaraties van uitgaven en ontvangsten en de voorwaarden voor de vergoeding van uitgaven in het kader van het ELGF en het ELFPO',
      'englishTitle': 'Commission Regulation (EC) No\xa0114/2008 of 6 February 2008 amending Regulation (EC) No\xa0883/2006 laying down detailed rules for the application of Council Regulation (EC) No\xa01290/2005 as regards the keeping of accounts by the paying agencies, declarations of expenditure and revenue and the conditions for reimbursing expenditure under the EAGF and the EAFRD',
      'originalTitle': 'Verordening (EG) nr.\xa0114/2008 van de Commissie van 6 februari 2008 tot wijziging van Verordening (EG) nr.\xa0883/2006 houdende uitvoeringsbepalingen van Verordening (EG) nr.\xa01290/2005 van de Raad met betrekking tot het bijhouden van de rekeningen van de betaalorganen, de declaraties van uitgaven en ontvangsten en de voorwaarden voor de vergoeding van uitgaven in het kader van het ELGF en het ELFPO'},
      'dates': {'Date of document': '2008-02-06',
      'Date of effect': '2008-02-08',
      'Date of end of validity': '2014-09-03'},
      'misc': {'Author': 'Europese Commissie', 'Form': 'Verordening'},
      'proc': {},
      'linked': {'Treaty': 'Verdrag tot oprichting van de Europese Gemeenschap',
      'Legal basis': [['CELEX:32005R1290', '32005R1290 - A42P7']],
      'Link': '',
      'Modifies': 'Relation\nAct\nComment\nSubdivision concerned\nFrom\nTo\n\n\n\n\nModifies \n\n32006R0883\n\n  wijziging\n artikel 16.2\n08/02/2008',
      'Modified by': 'Relation\nAct\nComment\nSubdivision concerned\nFrom\nTo\n\n\n\n\nRepealed by\n\n32014R0907',
      'Instruments cited': [['CELEX:32002R1605', '32002R1605']]},
      'doctrine': {},
      'classifications': {'EUROVOC descriptor': ['financiering van de EU',
        'EU-fonds',
        'beheer',
        'inkomsten',
        'landbouwuitgaven',
        'afwijking van het EU-recht',
        'duurzame ontwikkeling'],
      'Subject matter': ['Coördinatie der structurele instrumenten',
        'Economische, sociale en territoriale samenhang'],
      'Directory code': ['03.20.10.00 \nLandbouw\n / \nStructuurfondsen voor de landbouw\n / \nAlgemeen',
        '14.50.00.00 \nRegionaal beleid en coördinatie van structurele middelen\n / \nCoördinatie van structurele middelen']},
      'contents': [ ['BG',  'HTML',  'https://eur-lex.europa.eu/legal-content/BG/TXT/HTML/?uri=CELEX:32008R0114'],
                    ['ES',  'HTML',  'https://eur-lex.europa.eu/legal-content/ES/TXT/HTML/?uri=CELEX:32008R0114'],
                    ['CS',  'HTML',  'https://eur-lex.europa.eu/legal-content/CS/TXT/HTML/?uri=CELEX:32008R0114'],
                    ...
                  ],
      'text': [['',
        ['7.2.2008\xa0\xa0\xa0',
        'NL',
        'Publicatieblad van de Europese Unie',
        'L 33/6',
        'VERORDENING',
        ' (EG) ',
        'N',
        'r. 114/2008 ',
        'VAN DE COMMISSIE',
        'van 6 februari 2008',
        'tot wijziging van Verordening (EG) nr. 883/2006 houdende uitvoeringsbepalingen van Verordening (EG) nr. 1290/2005 van de Raad met betrekking tot het bijhouden van de rekeningen van de betaalorganen, de declaraties van uitgaven en ontvangsten en de voorwaarden voor de vergoeding van uitgaven in het kader van het ELGF en het ELFPO',
        'DE COMMISSIE VAN DE EUROPESE GEMEENSCHAPPEN,',
        'Gelet op het Verdrag tot oprichting van de Europese Gemeenschap,',
        ...
        ]]]} 
    
    '''+wetsuite.datasets.generated_today_text())

for url, docbytes in wetsuite.helpers.notebook.ProgressBar( reg_docs_nl.items() ):
    if url not in reg_nl_parsed_store:
        try:
            parsed = wetsuite.datacollect.eurlex.extract_html( docbytes )  # that function is where most of the scraping code sits
            reg_nl_parsed_store.put( url, parsed )
        except Exception as e: 
            print( 'ERROR for %r: %s'%( url, e ) )
            raise # during debug we want to fail on everything so we know about all issues


  0%|          | 0/130303 [00:00<?, ?it/s]
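
The `Modifies` and `Modified by` values in the example record above are still flattened tables: the header cells first, then the cells of each row separated by runs of newlines. Below is a minimal sketch of how one might split them into rows. It assumes every row contains exactly one CELEX-like act number that is directly preceded by its relation label. `split_relation_table`, `CELEX_RE` and `HEADER` are hypothetical names and not part of wetsuite. Empty cells vanish in the flattened form, so the remaining cells are kept as an unlabeled `details` list rather than guessed into columns.

```python
import re

# Assumption: act numbers look like the ones in the examples (e.g. 32006R0883, 62005CJ0143).
# Treaty-style numbers such as 11997E080 would NOT match this pattern.
CELEX_RE = re.compile(r'^[0-9]{5}[A-Z]{1,2}[0-9]{4}')

# Header cells as they appear at the start of the flattened strings.
HEADER = ['Relation', 'Act', 'Comment', 'Subdivision concerned', 'From', 'To']

def split_relation_table(flat):
    """Split a flattened relation table into a list of row dicts."""
    # Keep only non-empty cells, without surrounding whitespace.
    tokens = [line.strip() for line in flat.split('\n') if line.strip()]
    if tokens[:len(HEADER)] == HEADER:
        tokens = tokens[len(HEADER):]

    # Each act number marks one row; the cell just before it is the relation label.
    act_positions = [i for i, tok in enumerate(tokens) if CELEX_RE.match(tok)]
    rows = []
    for n, pos in enumerate(act_positions):
        # Details run until just before the next row's relation label.
        if n + 1 < len(act_positions):
            end = act_positions[n + 1] - 1
        else:
            end = len(tokens)
        rows.append({
            'relation': tokens[pos - 1] if pos > 0 else '',
            'act':      tokens[pos],
            'details':  tokens[pos + 1:end],
        })
    return rows

# The 'Modifies' value from the regulation example above.
example = ('Relation\nAct\nComment\nSubdivision concerned\nFrom\nTo\n\n\n\n\n'
           'Modifies \n\n32006R0883\n\n  wijziging\n artikel 16.2\n08/02/2008')
rows = split_relation_table(example)
print(rows)
```

If this holds up on more documents, it would probably belong inside the scraping code itself rather than as a post-processing step.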

## DIRective test parse

Basically the same as the regulation parse above (so look for the code comments there), but for directives instead.

TODO: actually finish

In [29]:
# ~4K items, might take a while
dir_nl_parsed_store = wetsuite.helpers.localdata.MsgpackKV('eurlex-dir-nl-struc.db', key_type=str)   
if False: # set to True for an effective "please redo everything", e.g. after you've changed wetsuite.datacollect.eurlex.extract_html
    dir_nl_parsed_store.truncate() 
    dir_nl_parsed_store._put_meta('description_short','The Dutch translation of metadata and text for European DIRectives in EUR-Lex. (preliminary version)')
    dir_nl_parsed_store._put_meta('description','''

    The Dutch translation of metadata and text for European DIRectives in EUR-Lex. (preliminary version)

    For example, https://eur-lex.europa.eu/legal-content/NL/ALL/?uri=CELEX:32002L0084 becomes (with abbreviating omissions marked with ... ):

    {'celex': '32002L0084',
      'titles': {'title': 'Richtlĳn 2002/84/EG van het Europees Parlement en de Raad van 5 november 2002 houdende wĳziging van de richtlĳnen op het gebied van maritieme veiligheid en voorkoming van verontreiniging door schepen (Voor de EER relevante tekst)',
      'englishTitle': 'Directive 2002/84/EC of the European Parliament and of the Council of 5 November 2002 amending the Directives on maritime safety and the prevention of pollution from ships (Text with EEA relevance)',
      'originalTitle': 'Richtlĳn 2002/84/EG van het Europees Parlement en de Raad van 5 november 2002 houdende wĳziging van de richtlĳnen op het gebied van maritieme veiligheid en voorkoming van verontreiniging door schepen (Voor de EER relevante tekst)'},
      'dates': {'Date of document': '2002-11-05',
      'Date of effect': '2002-11-29',
      'Date of transposition': '2003-11-23',
      'Date of end of validity': 'No end date'},
      'misc': {'Author': 'Europees Parlement, Raad van de Europese Unie',
      'Form': 'Richtlijn',
      'Addressee': 'De vijftien lidstaten: België, Denemarken, Duitsland, Ierland, Griekenland, Spanje, Frankrijk, Italië, Luxemburg, Nederland, Oostenrijk, Portugal, Finland, Zweden, Verenigd Koninkrijk',
      'Additional information': 'uitbreiding naar de EER door 22003D0178, relevant voor de EER, richtlijn houdende wijziging, COD 2000/0237',
      'Authentic language': 'Spaans, Deens, Duits, Grieks, Engels, Frans, Italiaans, Nederlands, Portugees, Fins, Zweeds, IJslands, Noors'},
      'proc': {'Procedure number': ['2000/0237/COD'],
      'Link': ['European Parliament - Legislative observatory\n\u200b']},
      'linked': {'Treaty': 'Verdrag tot oprichting van de Europese Gemeenschap',
      'Legal basis': [['CELEX:11997E080', '11997E080 - P2'],
        ['CELEX:11997E251', '11997E251']],
      'Proposal': [['CELEX:52000PC0489(02)', '52000PC0489(02)']],
      'Link': '',
      'Modifies': 'Relation\nAct\nComment\nSubdivision concerned\nFrom\nTo\n\n\n\n\nModifies \n\n31993L0075\n\n  vervanging\n artikel 12\n29/11/2002\n\n\n\nModifies \n\n31993L0075\n\n  vervanging\n artikel 2.E)\n29/11/2002\n\n\n\nModifies \n\n31993L0075\n\n  aanvulling\n artikel 11\n29/11/2002\n\n\n\nModifies \n\n31993L0075\n\n  vervanging\n artikel 2.H)\n29/11/2002\n\n\n\nModifies \n\n31993L0075\n\n  vervanging\n artikel 2.I)\n29/11/2002\n\n\n\nModifies \n\n31993L0075\n\n  vervanging\n artikel 2.F)\n29/11/2002\n\n\n\nModifies \n\n31993L0075\n\n  vervanging\n artikel 2.G)\n29/11/2002\n\n\n\nModifies \n\n31994L0057\n\n  wijziging\n artikel 2.D)\n29/11/2002\n\n\n\nModifies \n\n31994L0057\n\n  aanvulling\n artikel 8.2\n29/11/2002\n\n\n\nModifies \n\n31994L0057\n\n  vervanging\n artikel 7.1\n29/11/2002\n\n\n\nModifies \n\n31995L0021\n\n  aanvulling\n artikel 19\n29/11/2002\n\n\n\nModifies \n\n31995L0021\n\n  vervanging\n artikel 19.C)\n29/11/2002\n\n\n\nModifies \n\n31995L0021\n\n  wijziging\n artikel 2.1\n29/11/2002\n\n\n\nModifies \n\n31995L0021\n\n  vervanging\n artikel 18.1\n29/11/2002\n\n\n\nModifies \n\n31995L0021\n\n  wijziging\n artikel 2.2\n29/11/2002\n\n\n\nModifies \n\n31996L0098\n\n  vervanging\n artikel 17\n29/11/2002\n\n\n\nModifies \n\n31996L0098\n\n  vervanging\n artikel 18\n29/11/2002\n\n\n\nModifies \n\n31996L0098\n\n  wijziging\n artikel 2.N)\n29/11/2002\n\n\n\nModifies \n\n31996L0098\n\n  wijziging\n artikel 2.C)\n29/11/2002\n\n\n\nModifies \n\n31996L0098\n\n  wijziging\n artikel 2.D)\n29/11/2002\n\n\n\nImplicit repeal \n\n31997L0034\n\n \n\n29/11/2002\n\n\n\nModifies \n\n31997L0070\n\n  vervanging\n artikel 9\n29/11/2002\n\n\n\nModifies \n\n31997L0070\n\n  aanvulling\n artikel 8\n29/11/2002\n\n\n\nModifies \n\n31998L0018\n\n  wijziging\n artikel 6.2\n29/11/2002\n\n\n\nModifies \n\n31998L0018\n\n  wijziging\n artikel 6.3\n29/11/2002\n\n\n\nModifies \n\n31998L0018\n\n  wijziging\n artikel 6.1\n29/11/2002\n\n\n\nModifies \n\n31998L0018\n\n  
vervanging\n artikel 2.F)\n29/11/2002\n\n\n\nModifies \n\n31998L0018\n\n  vervanging\n artikel 8\n29/11/2002\n\n\n\nModifies \n\n31998L0018\n\n  vervanging\n artikel 9\n29/11/2002\n\n\n\nModifies \n\n31998L0018\n\n  vervanging\n artikel 2.C)\n29/11/2002\n\n\n\nModifies \n\n31998L0018\n\n  vervanging\n artikel 2.A)\n29/11/2002\n\n\n\nModifies \n\n31998L0018\n\n  vervanging\n artikel 2.B)\n29/11/2002\n\n\n\nModifies \n\n31998L0018\n\n  vervanging\n artikel 2.D)\n29/11/2002\n\n\n\nModifies \n\n31998L0041\n\n  aanvulling\n artikel 12\n29/11/2002\n\n\n\nModifies \n\n31998L0041\n\n  wijziging\n artikel 2.3\n29/11/2002\n\n\n\nModifies \n\n31998L0041\n\n  vervanging\n artikel 13\n29/11/2002\n\n\n\nModifies \n\n31999L0035\n\n  aanvulling\n artikel 17\n29/11/2002\n\n\n\nModifies \n\n31999L0035\n\n  vervanging\n artikel 2.O)\n29/11/2002\n\n\n\nModifies \n\n31999L0035\n\n  vervanging\n artikel 16\n29/11/2002\n\n\n\nModifies \n\n31999L0035\n\n  wijziging\n bijlage 1\n29/11/2002\n\n\n\nModifies \n\n31999L0035\n\n  vervanging\n artikel 2.D)\n29/11/2002\n\n\n\nModifies \n\n31999L0035\n\n  vervanging\n artikel 2.E)\n29/11/2002\n\n\n\nModifies \n\n31999L0035\n\n  vervanging\n artikel 2.B)\n29/11/2002\n\n\n\nModifies \n\n32000L0059\n\n  wijziging\n artikel 2.B)\n29/11/2002\n\n\n\nModifies \n\n32000L0059\n\n  vervanging\n artikel 14.1\n29/11/2002\n\n\n\nModifies \n\n32000L0059\n\n  aanvulling\n artikel 15\n29/11/2002\n\n\n\nModifies \n\n32001L0025\n\n  vervanging\n artikel 1.22\n29/11/2002\n\n\n\nModifies \n\n32001L0025\n\n  vervanging\n artikel 1.23\n29/11/2002\n\n\n\nModifies \n\n32001L0025\n\n  vervanging\n artikel 23.1\n29/11/2002\n\n\n\nModifies \n\n32001L0025\n\n  vervanging\n artikel 1.24\n29/11/2002\n\n\n\nModifies \n\n32001L0025\n\n  toevoeging\n artikel 22.4\n29/11/2002\n\n\n\nModifies \n\n32001L0025\n\n  vervanging\n artikel 1.18\n29/11/2002\n\n\n\nModifies \n\n32001L0025\n\n  vervanging\n artikel 1.16\n29/11/2002\n\n\n\nModifies \n\n32001L0025\n\n  vervanging\n artikel 
1.21\n29/11/2002\n\n\n\nModifies \n\n32001L0025\n\n  vervanging\n artikel 1.17\n29/11/2002\n\n\n\nModifies \n\n32001L0096\n\n  vervanging\n artikel 14.1\n29/11/2002\n\n\n\nModifies \n\n32001L0096\n\n  wijziging\n artikel 3.2)\n29/11/2002\n\n\n\nModifies \n\n32001L0096\n\n  toevoeging\n artikel 15.3\n29/11/2002',
      'Modified by': 'Relation\nAct\nComment\nSubdivision concerned\nFrom\nTo\n\n\n\n\nImplicitly repealed by\n\n32002L0059\n\n gedeeltelijke intrekking\nartikel 2\n05/02/2004\n\n\n\nModified by\n\n32008L0106\n\n gedeeltelijke intrekking\n\n\n\n\n\nImplicitly repealed by\n\n32009L0015\n\n gedeeltelijke intrekking\nartikel 3\n17/06/2009\n\n\n\nImplicitly repealed by\n\n32009L0016\n\n gedeeltelijke intrekking\nartikel 4\n01/01/2011\n\n\n\nModified by\n\n32009L0045\n\n gedeeltelijke intrekking\n\n\n\n\n\nModified by\n\n32014L0090\n\n gedeeltelijke intrekking\n\n\n\n\n\nImplicitly repealed by\n\n32017L2110\n\n gedeeltelijke intrekking\nartikel 9\n20/12/2017\n\n\n\nImplicitly repealed by\n\n32019L0883\n\n gedeeltelijke intrekking\nartikel 10\n27/06/2019',
      'Affected by case': [['CELEX:62005CJ0143',
        'Proceedings concerning failure by Member States 62005CJ0143']],
      'Instruments cited': [['CELEX:31987D0373', '31987D0373'],
        ['CELEX:31995R3051', '31995R3051'],
        ['CELEX:31999D0468', '31999D0468'],
        ['CELEX:32002R2099', '32002R2099']]},
      'doctrine': {},
      'classifications': {'EUROVOC descriptor': ['Internationale Maritieme Organisatie',
        'voorkoming van verontreiniging',
        'verontreiniging door schepen',
        'internationale norm',
        'veiligheid op zee',
        'comité (EU)'],
      'Subject matter': ['Milieu', 'Vervoer'],
      'Directory code': ['07.30.30.00 \nVervoerbeleid\n / \nZeevervoer\n / \nZeeveiligheid',
        '15.10.20.20 \nMilieuzaken, consumentenbelangen en bescherming van de gezondheid\n / \nMilieuzaken\n / \nVerontreiniging en hinder\n / \nWaterbeheer en -bescherming']},
      'contents': [['BG', 'HTML', 'https://eur-lex.europa.eu/legal-content/BG/TXT/HTML/?uri=CELEX:32002L0084'],
                  ['ES', 'HTML', 'https://eur-lex.europa.eu/legal-content/ES/TXT/HTML/?uri=CELEX:32002L0084'],
                  ['CS', 'HTML', 'https://eur-lex.europa.eu/legal-content/CS/TXT/HTML/?uri=CELEX:32002L0084'],
                  ...
                  ],
      'text': [['',
        ['Richtlijn 2002/84/EG van het Europees Parlement en de Raad',
        'van 5 november 2002',
        'houdende wijziging van de richtlijnen op het gebied van maritieme veiligheid en voorkoming van verontreiniging door schepen',
        '(Voor de EER relevante tekst)',
        'HET EUROPEES PARLEMENT EN DE RAAD VAN DE EUROPESE UNIE,',
        'Gelet op het Verdrag tot oprichting van de Europese Gemeenschap, inzonderheid op artikel 80, lid 2,',
        'Gezien het voorstel van de Commissie(1),',
        'Gezien het advies van het Economisch en Sociaal Comité(2),',
        "Gezien het advies van het Comité van de Regio's(3),",
        'Volgens de procedure van artikel 251 van het Verdrag(4),',
        'Overwegende hetgeen volgt:',
    ]]]}
    '''+wetsuite.datasets.generated_today_text())

for url, docbytes in wetsuite.helpers.notebook.ProgressBar( dir_docs_nl.items() ):
    if url not in dir_nl_parsed_store:
        try:
            parsed = wetsuite.datacollect.eurlex.extract_html( docbytes )  # that function is where most of the scraping code sits
            dir_nl_parsed_store.put( url, parsed )
        except Exception as e: 
            print( 'ERROR for %r: %s'%( url, e ) )
            raise # during debug we want to fail on everything so we know about all issues                         

  0%|          | 0/4142 [00:00<?, ?it/s]
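
To get from a parsed record to something usable as plain text, here is a minimal sketch. It assumes the `text` value is a list of `[heading, fragments]` pairs and the `contents` value a list of `[language, format, url]` triples, which is what the example records above suggest but is not checked against the scraping code. `flatten_text` and `content_urls` are hypothetical helper names, not part of wetsuite.

```python
def flatten_text(parsed, sep='\n'):
    """Join all non-empty text fragments of a parsed record into one string."""
    parts = []
    for heading, fragments in parsed.get('text', []):
        if heading:
            parts.append(heading)
        # strip() also removes the non-breaking spaces (\xa0) seen in the examples
        parts.extend(frag.strip() for frag in fragments if frag.strip())
    return sep.join(parts)

def content_urls(parsed, fmt='HTML'):
    """Map language code to URL, for the entries of the requested format."""
    return {lang: url
            for lang, entry_fmt, url in parsed.get('contents', [])
            if entry_fmt == fmt}

# A heavily abbreviated version of the directive example above.
doc = {
    'celex': '32002L0084',
    'contents': [
        ['BG', 'HTML', 'https://eur-lex.europa.eu/legal-content/BG/TXT/HTML/?uri=CELEX:32002L0084'],
    ],
    'text': [['', ['Richtlijn 2002/84/EG van het Europees Parlement en de Raad',
                   'van 5 november 2002',
                   'Overwegende hetgeen volgt:']]],
}

print(flatten_text(doc))
print(content_urls(doc))
```

Note that fragments are sometimes split mid-phrase (the regulation example has `'VERORDENING'`, `' (EG) '`, `'N'`, `'r. 114/2008 '` as separate items), so joining on newlines will not always give clean sentences.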