The GDPR in XML for all 24 EU languages
consolidated-with-corrected-preamble
consolidated
corr-2018-formatted
corr-2018-original
formatted
original
.gitignore
README.md
formex-05.55-20141201.xd
gdpr-articles.xsl
gdpr-recitals.xsl
generate.sh
style.css


GDPR in XML

This repository contains the GDPR, i.e. Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation), in the 24 languages of the EU and in several forms:

original/: The original, untouched XML versions of the GDPR [source]

formatted/: The XML files from original/, run through xmllint --format to reindent them.

corr-2018-original/: The untouched XML versions of the corrections (corrigendum) published in 2018. [source]

corr-2018-formatted/: The XML files from corr-2018-original/, run through xmllint --format.

consolidated/: The XML versions of the consolidated GDPR (i.e., with the corrections integrated), without the preamble and its 173 recitals. [source]

consolidated-with-corrected-preamble/: The XML files from consolidated/, with the preambles from formatted/ added and the preamble corrections merged in.

gdpr-articles.xsl and gdpr-recitals.xsl are (ugly) XSL transformations that can be used with e.g. xsltproc to generate HTML versions of the files in consolidated-with-corrected-preamble/. The bash script generate.sh uses them to create https://gdpr.dataskydd.net/: it writes site/{bg,cs,da,..}/{art,rec}/index.html and site/index.html, and copies style.css into site/.
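If you would rather drive xsltproc from Python than from bash, the output layout described above could be sketched roughly as follows. This is not what generate.sh does internally; it is a minimal illustration. The function names plan_jobs and run_jobs are hypothetical, and the input naming `<lang>.xml` is an assumption, since the actual file names in consolidated-with-corrected-preamble/ may differ. It does not create site/index.html or copy style.css.

```python
import subprocess
from pathlib import Path

# Maps the output subdirectory to the stylesheet that produces it.
STYLESHEETS = {"art": "gdpr-articles.xsl", "rec": "gdpr-recitals.xsl"}


def plan_jobs(langs, src_dir="consolidated-with-corrected-preamble", site_dir="site"):
    """Return (stylesheet, input, output) triples, one per language and page type.

    Assumption: one input file per language, named "<lang>.xml".
    """
    jobs = []
    for lang in langs:
        source = Path(src_dir) / f"{lang}.xml"
        for kind, stylesheet in STYLESHEETS.items():
            output = Path(site_dir) / lang / kind / "index.html"
            jobs.append((stylesheet, source, output))
    return jobs


def run_jobs(jobs):
    """Run xsltproc once per job, creating output directories as needed."""
    for stylesheet, source, output in jobs:
        output.parent.mkdir(parents=True, exist_ok=True)
        # Equivalent to: xsltproc -o <output> <stylesheet> <input>
        subprocess.run(
            ["xsltproc", "-o", str(output), stylesheet, str(source)],
            check=True,
        )
```

Calling `run_jobs(plan_jobs(["bg", "cs", "da"]))` from the repository root would then need xsltproc on the PATH. Keeping the planning step separate from the execution step makes it easy to inspect the paths before anything is written.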

Why?

I wanted to make nice HTML versions of the GDPR. EUR-Lex has terrible HTML versions generated from XML, but doesn't provide the actual XML.

publications.europa.eu provides the XML files, but makes it slightly annoying to get direct links, and the links look like they'd make Tim Berners-Lee shed a tear. The links eventually resolve to zip files with hopeless names like 3e485e15-11bd-11e6-ba9a-01aa75ed71a1.zip. Within the zip files, the XML files are identically named for each language: DOC_1_1.xml and DOC_2_1.xml. (The corrections and consolidated XML files as provided by the EU Publications Office suffer from the same issues.)
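Because every archive uses the same member names, extracting several languages into one directory would overwrite files. One way around that, sketched here with Python's standard zipfile module, is to prefix each extracted file with a language code. The function name extract_with_prefix is hypothetical, and the caller has to know which zip belongs to which language, since the archive names do not say.

```python
import zipfile
from pathlib import Path


def extract_with_prefix(zip_path, lang, out_dir):
    """Extract every .xml member of zip_path into out_dir as "<lang>_<name>"."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            if not member.lower().endswith(".xml"):
                continue
            # Use only the base name so nested paths in the archive are flattened.
            target = out_dir / f"{lang}_{Path(member).name}"
            target.write_bytes(archive.read(member))
            written.append(target)
    return written
```

For example, `extract_with_prefix("3e485e15-11bd-11e6-ba9a-01aa75ed71a1.zip", "en", "unpacked")` would write unpacked/en_DOC_1_1.xml and unpacked/en_DOC_2_1.xml, assuming that archive is the English one.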

The files in original/ are the ones from publications.europa.eu. They are identical to (i.e., have the same hash as) the corresponding files in the zip archive of the Official Journal of the European Union, L 119, 4 May 2016. E.g., JOx_FMX_EN_2016.ZIP (found on the EU Open Data Portal) contains JOLFMX_2016_119_1_EN.zip which in turn contains L_2016119EN.01000101.xml and L_2016119EN.01000101.doc.xml, which are identical to this repository's original/L_2016119EN.01000101.xml and original/L_2016119EN.01000101.doc.xml.
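If you want to repeat that comparison yourself, something along these lines should work. It is a sketch under assumptions: SHA-256 is chosen here for illustration (the hash actually used for the claim above is not stated), the helper names are hypothetical, and the member names must be given exactly as they are stored in the archives.

```python
import hashlib
import io
import zipfile
from pathlib import Path


def read_nested_member(outer_zip, inner_name, member):
    """Return the bytes of `member`, stored in zip `inner_name` inside `outer_zip`."""
    with zipfile.ZipFile(outer_zip) as outer:
        inner_bytes = outer.read(inner_name)
    # Open the inner archive from memory instead of unpacking it to disk.
    with zipfile.ZipFile(io.BytesIO(inner_bytes)) as inner:
        return inner.read(member)


def sha256_hex(data):
    """Return the SHA-256 digest of a bytes object as a hex string."""
    return hashlib.sha256(data).hexdigest()


def matches_local(outer_zip, inner_name, member, local_path):
    """True if the archived member and the local file have the same SHA-256 hash."""
    archived = read_nested_member(outer_zip, inner_name, member)
    return sha256_hex(archived) == sha256_hex(Path(local_path).read_bytes())
```

With the file names mentioned above, the call would look like `matches_local("JOx_FMX_EN_2016.ZIP", "JOLFMX_2016_119_1_EN.zip", "L_2016119EN.01000101.xml", "original/L_2016119EN.01000101.xml")`, provided the members sit at the top level of their archives.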

So, the purpose of this repo is just to make it slightly less cumbersome if you want to work with multiple language versions of the GDPR. Feel free to use the XSL files for any purpose.

The XML files are © European Union, 2016, here legally reused.