This repository has been archived by the owner on Jun 10, 2023. It is now read-only.

Create e-reporting (IPR) with RIVM data #7

Open
thijsbrentjens opened this issue Jun 19, 2014 · 41 comments

Comments

@thijsbrentjens
Member

No description provided.

@thijsbrentjens
Member Author

Create (harmonised) data for:
Zones & Agglomeration (INSPIRE theme: AM, Dataflow B in IPR)
Stations (INSPIRE theme: EF, IPR Dataflow D)
Aggregated results / statistics (IPR Dataflow F)

See the website: http://www.eionet.europa.eu/aqportal/datamodel
for technical details and XML schema

@justb4
Contributor

justb4 commented Aug 21, 2014

Ok, getting a better grip on this after seeing the AM and EF XSDs:
https://github.com/Geonovum/sospilot/tree/master/data/inspire/xsd and the AQ XSDs:
https://github.com/Geonovum/sospilot/tree/master/data/eionet/xsd

There is also an RIVM example: https://github.com/Geonovum/sospilot/blob/master/data/eionet/xsd/REP_D-NL_RIVM_20140805_B-002.xml
It is probably not schema-valid, as it does not contain a geometry field (which I think is required), but it is a start.

The User Guide is also useful:
http://www.eionet.europa.eu/aqportal/guidelines/UserGuide2_AQD_XML_v3.0_publication.pdf

@justb4
Contributor

justb4 commented Aug 25, 2014

The RIVM report for "(B) Information on zones and agglomerations (Article 6)" https://github.com/Geonovum/sospilot/blob/master/data/eionet/xsd/REP_D-NL_RIVM_20140805_B-002.xml is quite complete. With one small change it validates against the AQD AM schema: AirQualityReporting.xsd http://dd.eionet.europa.eu/schemas/id2011850eu-1.0/AirQualityReporting.xsd. The change only deals with a GML issue where deprecatedTypes.xsd is required in the schemaLocation:

 http://www.opengis.net/gml/3.2 http://schemas.opengis.net/gml/3.2.1/deprecatedTypes.xsd

Only the geometry (am:geometry element) refers to a Shapefile in EPSG:28992:

         <am:geometry
                xlink:href="http://cdr.eionet.europa.eu/nl/eu/aqd/b/envu9_csq/airquality_zone_agglomeration_v2013.shp"/>

while Germany uses real GML geometries for the am:geometry field like:

  <am:geometry>
    <gml:Polygon gml:id="ZON.PG.DEZAXX0001O" srsName="urn:ogc:def:crs:EPSG::4326">
      <gml:exterior>
        <gml:LinearRing>
          <gml:posList srsDimension="2">52.703983225045874 13.989109895305489 52.705805777614614 13.98454702572468 .....

The question is what we should do with this. Replace the Shapefile xlink with an inline GML geometry?
The referred Shapefile is valid and can be read into PostGIS and shown via WMS, see
Heron viewer: http://sensors.geonovum.nl/heronviewer (Zones and Agglomeration layer).
Is Dataflow B thus already complete?

@justb4
Contributor

justb4 commented Aug 25, 2014

REP_D-NL_RIVM_20140805_D-002.xml (http://cdr.eionet.europa.eu/nl/eu/aqd/d/envu9_j7q/, Dataflow D) also validates OK; it refers to B for zones.

@justb4
Contributor

justb4 commented Aug 28, 2014

In summary, the question is: what specifically is there for us to do in this issue? Several Dataflows like B and D were uploaded to the Eionet AQ Portal on Aug 5, 2014, as was E (Measurements, 223MB). There is feedback, like for E: "NL has reported both hourly and daily values. This results in 'double' observation objects. It is not intended to report aggregated data so they should report in this case only hourly data." I don't know how these uploaded dataflow files were made (handmade/generated/ETL tools?) or who is now working on the feedback.

How to proceed? I will ask Hans/RIVM.

justb4 added a commit that referenced this issue Sep 3, 2014
@justb4
Contributor

justb4 commented Sep 3, 2014

Ok, I learned more from existing dataflow reports, EEA workshops, the IRCELINE approach and more:

  • all data for generating AQ Reporting should come from services (WFS, REST, SOS, ...)
  • some data preparation in advance is always required, preferably putting all data in PostGIS, using VIEWs etc
  • most of the reporting GML is "Boilerplate" and has repetitive (GML) patterns: reporting headers up to INSPIRE identifiers
  • there is lots of commonality in input data: organizational contact, id-prefixes, reporting authorities
  • this common data results in "Boilerplate" GML elements in dataflow reports
  • there is thus an opportunity for large scale reuse of both common input data and common elements
  • the actual reporting XML has a zillion options, but the Guidelines in the UserGuide: http://www.eionet.europa.eu/aqportal/guidelines/UserGuide2_AQD_XML_v3.0_publication.pdf provide mandatory (GML) encoding rules
  • a generic ETL tool will be hard to steer towards these specific encoding rules, e.g. using a FeatureCollection where the first Feature contains the Reporting Header with xlinks to the actual Features within the same dataflow doc
  • the xlinks do not follow the standard rules, i.e. each xlink contains a concatenated INSPIRE id (instead of an internal "hash#" URI)
  • there are many reports available on Eionet, so together with the UserGuide it is quite clear what they should look like

All in all, I did a PoC using a proven technology, although one that has probably never been used in the domain of (INSPIRE) GML Application Schemas: Python Templating. Python web frameworks like Django have a long and proven history of using Templating for generating HTML, XML or any other document: input structures are applied to a Template to render a Document. There are numerous examples of Templating Languages: Django, Mako, Jinja2, Genshi, Mustache, see https://wiki.python.org/moin/Templating. Some Templating Languages, like Mustache, are not even bound to Python.

Stetl http://stetl.org is a Python ETL framework, based on Input, Filter (transform) and Output modules. Up to now XML transformation was mostly done using an XSLTFilter, also for harmonizing INSPIRE data from local sources. However, XSLT has the disadvantage of being verbose, passive/recursive/matching-driven and complex, with hardly any variable/control/function structuring possibilities other than proprietary XSLT-processor bindings or built-in functions.

So I started trying an approach using Python Templating: https://wiki.python.org/moin/Templating, first within Stetl with some simple examples using the Python built-in String Templating: https://github.com/justb4/stetl/tree/master/examples/basics/9_string_templating. After that I picked, from the zillion Python Templating Languages, one that is widely used and has a very active development community: Jinja2 http://jinja.pocoo.org. Jinja2 is modeled on Django's template language, is the default templating engine of the Flask web framework, and is used in the Open Data world within CKAN: http://docs.ckan.org/en/latest/theming/templates.html. I had never used Python Templating, only Java JSP, which is a bit similar; learning Jinja2 took just a couple of hours of browsing examples on the Web. The development of a Jinja2TemplatingFilter in Stetl was therefore quite trivial, about 30 lines of code; all TemplatingFilters are in: https://github.com/justb4/stetl/blob/master/stetl/filters/templatingfilter.py. Jinja2 allows standard Template control structures like loops, e.g. for looping over Features, but also has a concept of "globals": variables to be applied globally. This proved to be very convenient for common "Boilerplate" data like organisations, telephone numbers etc. A worked out example is at:
https://github.com/justb4/stetl/tree/master/examples/basics/10_jinja2_templating
This example illustrates a simple (XML) and an advanced (GML) transformation. The advanced one shows macros, globals and Filters.
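
To make the "input data plus globals" idea concrete without any dependency, here is a minimal sketch using only Python's built-in `string.Template`. It is not the Stetl filter itself; the template text, the `NL.RIVM.AQ` namespace and the second station id are assumptions made up for illustration.

```python
from string import Template

# Hypothetical "globals": boilerplate values shared by every record.
GLOBALS = {
    "authority_name": "RIVM",
    "namespace": "NL.RIVM.AQ",  # assumed id prefix, not taken from a real report
}

# Hypothetical input records, standing in for Features read from a WFS.
STATIONS = [
    {"local_id": "STA-NL00235"},
    {"local_id": "STA-NL00236"},
]

# Placeholder element, not an AQD element.
STATION_TEMPLATE = Template(
    '<station id="$namespace/$local_id" authority="$authority_name"/>'
)


def render_stations(stations, globs):
    """Render one line per station, merging record values with the globals."""
    return "\n".join(
        STATION_TEMPLATE.substitute(globs, **station) for station in stations
    )


print(render_stations(STATIONS, GLOBALS))
```

Jinja2 adds loops, macros and filters on top of this basic substitution, which is what makes it practical for full GML documents.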

Applying the Jinja2TemplatingFilter to RIVM AQ Reporting proved to be almost trivial. As a PoC I mapped the WFS RIVM Stations FeatureType to a Dataflow D report. The example can be found at https://github.com/Geonovum/sospilot/tree/master/src/aq-report. This Dataflow D report was created in just a few hours and contains no custom code, just some Templates with about 50 lines of Jinja2 code! This could be enhanced further with macros etc., but it shows a very promising approach for AQ Reporting and IMO even INSPIRE harmonization. Jinja2 is so common that almost all IDEs like IDEA and Eclipse will support its syntax. Plus it is blazingly fast. So far: Enter The Jinja2!

I see many advantages in this approach:

  • very few lines of code to maintain (dataflowD PoC only 50 lines)
  • Jinja2 is a very standard Templating language with many examples on the web
  • very suitable in batch/back-end automated environments
  • Python/Jinja2 programmers are widely available
  • reuse over reports: by adding macros for common structures like INSPIRE id's
  • wide Jinja2 IDE support (syntax highlighting, debugging etc)

@thijsbrentjens
Member Author

This seems very promising and nice work. We could try this for the other dataflows with GML as well, so the whole process could be automated (instead of using the manual reporting tooling of EIONET as is done now).

@thijsbrentjens
Member Author

A short remark about these xlinks to the features in Dataflow D: is there an encoding rule for the values in xlink:href? Shouldn't the values match with gml:id values (in that GML document)? In that case, either the INSPIRE namespace should be in the gml:id or the xlink values should be changed.
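
The mismatch described here can be checked mechanically. The sketch below, using only the standard library, lists every `xlink:href` that does not resolve to a `gml:id` in the same document. The sample document is a heavily simplified assumption (placeholder element names, made-up namespace prefix in the second href); only the GML 3.2 and XLink namespace URIs are real.

```python
import xml.etree.ElementTree as ET

GML_ID = "{http://www.opengis.net/gml/3.2}id"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def unresolved_local_xlinks(xml_text):
    """Return xlink:href values that do not resolve to a gml:id in the document.

    Only "#fragment" style hrefs are treated as local references; any other
    href (for example a concatenated INSPIRE id) is reported as unresolved.
    """
    root = ET.fromstring(xml_text)
    ids = {el.get(GML_ID) for el in root.iter() if el.get(GML_ID)}
    unresolved = []
    for el in root.iter():
        href = el.get(XLINK_HREF)
        if href is None:
            continue
        if not (href.startswith("#") and href[1:] in ids):
            unresolved.append(href)
    return unresolved


# Hypothetical, simplified document: element names are placeholders.
SAMPLE = """
<doc xmlns:gml="http://www.opengis.net/gml/3.2"
     xmlns:xlink="http://www.w3.org/1999/xlink">
  <header>
    <content xlink:href="#STA-NL00235"/>
    <content xlink:href="NL.RIVM.AQ/STA-NL00236"/>
  </header>
  <station gml:id="STA-NL00235"/>
  <station gml:id="STA-NL00236"/>
</doc>
"""

print(unresolved_local_xlinks(SAMPLE))
```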

@justb4
Contributor

justb4 commented Sep 4, 2014

The xlinks are encoded according to the UserGuide, see the link above or the aqportal. Yes, normally I would expect a #gmlid and a gml:id in the target element. Well, why use xlinks at all? This is a FeatureCollection... Also strange is that the ReportingHeader is a feature and not in the general section of the FC...

@justb4
Contributor

justb4 commented Sep 4, 2014

Thanks! I think this can work. The main problem is getting the source data from RIVM. The best option is to build web services for this source data. Most can probably be done with data in PostGIS and WFS + SOS. Both can deliver data as (Geo)JSON like in my example. By using VIEWs we can join tables and make selections (or via a table join service ;-))...

@thijsbrentjens
Member Author

About the xlinks: this could be a flaw in the UserGuide then. It doesn't make sense. And indeed, at first sight these xlinks are not very useful here.

About getting the source data: using web services that provide the source data (also useful as a "simple" version of the data) would be a nice approach. Not sure about the table join service here, but who knows :). If the database and some extra views in it are sufficient, then that would be good as well.

@justb4
Contributor

justb4 commented Sep 4, 2014

[Image: xlink-in-aqipr]

See UserGuide page 34 ff. I think we also see the flaw in the INSPIRE id: a nested structure instead of a hierarchical id... The document would not even validate IMO.

@justb4
Contributor

justb4 commented Sep 4, 2014

But rendering an INSPIRE id struct could be our first Jinja2 macro :-).
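
As a rough idea of what such a macro would produce, here is a plain-Python sketch of the same rendering. The element names (`base:Identifier` with `localId`, `namespace` and an optional `versionId`) are written from memory of the INSPIRE base schema and should be checked against the XSDs; the `NL.RIVM.AQ` namespace value is a made-up example.

```python
from xml.sax.saxutils import escape


def inspire_id(local_id, namespace, version_id=None, indent=""):
    """Render a base:Identifier structure as an XML string.

    Element names are assumed from the INSPIRE base schema
    (localId, namespace, optional versionId); verify against the XSDs.
    """
    lines = [
        "<base:Identifier>",
        f"  <base:localId>{escape(local_id)}</base:localId>",
        f"  <base:namespace>{escape(namespace)}</base:namespace>",
    ]
    if version_id is not None:
        lines.append(f"  <base:versionId>{escape(str(version_id))}</base:versionId>")
    lines.append("</base:Identifier>")
    return "\n".join(indent + line for line in lines)


# Hypothetical values: the namespace is an assumption.
print(inspire_id("STA-NL00235", "NL.RIVM.AQ", version_id=0))
```

A Jinja2 macro would take the same three arguments and emit the same structure, so every template could call it instead of repeating the nested elements.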

@thijsbrentjens
Member Author

For Dataflow B we need some more input of RIVM: what data sources to use to create the dataflow?

justb4 added a commit that referenced this issue Sep 5, 2014
justb4 added a commit that referenced this issue Sep 5, 2014
@thijsbrentjens
Member Author

RIVM offers a WFS with aqd_zones (http://acceptatie.inspire.rivm.nl/geoserver/wfs?request=GetFeature&typeName=inspire:aqd_zone&outputformat=JSON). This is in the acceptance environment, so we need to check whether we can use this data, since it is not in RIVM's production WFS.

@thijsbrentjens
Member Author

Alternatively, we could use the shapefile as offered in the AQPortal: http://cdr.eionet.europa.eu/nl/eu/aqd/b/envu9_csq

@thijsbrentjens
Member Author

For dataflow B, the pollutants need to be mapped to the vocabulary of AQ. The definitions can be found at: http://dd.eionet.europa.eu/vocabulary/aq/pollutant/view
Available as Linked Data, e.g. in RDF-XML or JSON-LD:

http://dd.eionet.europa.eu/vocabulary/aq/pollutant/rdf

and:

http://dd.eionet.europa.eu/vocabulary/aq/pollutant/json

@justb4
Contributor

justb4 commented Sep 8, 2014

At sensors.geonovum.nl we already have a WFS (and WMS) based on the above Shapefiles, see http://sensors.geonovum.nl/gs/wfs?request=GetFeature&typeName=sensors:zones&outputformat=JSON . We can get quite far, but we are still lacking zone data attributes, like pollutants for the Zones. Also, the corresponding properties differ between the two WFSs and each is lacking data. For example, for Zone Heerlen/Kerkrade the RIVM WFS has these properties (e.g. population and area are null, zone_type should be 'agg' or 'nonagg' etc):

{"properties": {
    "inspireid": "http://data.rivm.nl/inspire/so/ef/aqd-zone/NL0320/0",
    "zone_code": "NL0320",
    "versionid": 0,
    "predecessor": null,
    "beginlifespanversion": "2013-05-06T11:03:35.926Z",
    "endlifespanversion": null,
    "zone_name": "Heerlen/Kerkrade",
    "zone_type": "airQualityManagementZone",
    "application_start_date": "2001-06-20T22:00:00Z",
    "application_end_date": null,
    "documentation_of_predecessors": null,
    "resident_population": null,
    "resident_population_ref_year": null,
    "area_of_zone_value": null,
    "area_of_zone_uom": "sqm",
    "designated_pollutant": null,
    "protection_target": "Health",
    "timeextensionexemption": "NO2-annual",
    "environmental_domain": "air",
    "plan": null,
    "legalbasis": "Directive 2008/50/EC of the European Parliament and of the Council of 21 May 2008 on ambient air quality and cleaner air for Europe",
    "relatedzone": null,
    "authority_name": "Ministerie van I&M",
    "webaddress": "http://www.rijksoverheid.nl/ministeries/ienm",
    "responsible_person_name": "Inge van der Veen",
    "address": "Plesmanweg 1-6 2597JG Den Haag",
    "telephone_number": "+31704560000",
    "email": null
}}

While the Geonovum "Sensors" WFS has (missing e.g. point of contact):

{"properties": {
    "objectid": 9,
    "geometry_l": 81872.10739,
    "geometry_a": 1.74366211597E8,
    "zone_code": "NL0320",
    "zone_name": "Heerlen/Kerkrade",
    "zone_name_": "Agglomeratie Heerlen/Kerkrade",
    "start_year": 2011,
    "end_year": null,
    "zone_type": "agg",
    "zone_popul": 231870,
    "zone_pop_1": 2013,
    "zone_area_": 174366,
    "zone_area1": 174,
    "zone_prede": null
}}

Both WFSs are, for example, missing the pollutants (while those may be tied to the Stations, which are tied to zones). An overall database schema within RIVM would help tremendously.
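
Until such a schema exists, the two property sets could be combined per zone. The sketch below fills null values of the RIVM record from the Geonovum record and lists what is still missing. The field correspondence in `FIELD_MAP` is my assumption based on the (truncated) Shapefile column names, not something RIVM has confirmed.

```python
def merge_zone_properties(primary, secondary, field_map):
    """Fill null values in `primary` from `secondary`, using `field_map`.

    `field_map` maps a field name in `primary` to the field name in
    `secondary` that is assumed to carry the same information.
    """
    merged = dict(primary)
    for target, source in field_map.items():
        if merged.get(target) is None and secondary.get(source) is not None:
            merged[target] = secondary[source]
    return merged


# Subsets of the two property sets shown above for zone NL0320.
rivm = {"zone_code": "NL0320", "resident_population": None,
        "resident_population_ref_year": None, "designated_pollutant": None}
sensors = {"zone_code": "NL0320", "zone_popul": 231870, "zone_pop_1": 2013}

# Assumed correspondence between the two naming schemes.
FIELD_MAP = {
    "resident_population": "zone_popul",
    "resident_population_ref_year": "zone_pop_1",
}

merged = merge_zone_properties(rivm, sensors, FIELD_MAP)
still_missing = sorted(key for key, value in merged.items() if value is None)
print(merged["resident_population"], still_missing)
```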

The AQ Portal does provide CSVs generated from the XMLs, but using those would only be a temporary solution. For example, for Dataflow B there are 3 CSVs:
Dataflow B: General AQ zones information as CSV
http://cdr.eionet.europa.eu/Converters/run_conversion?file=nl/eu/aqd/d/envu9_j7q/REP_D-NL_RIVM_20140805_D-002.xml&conv=469&source=remote

Dataflow B: Pollutant and protection targets as CSV
http://cdr.eionet.europa.eu/Converters/run_conversion?file=nl/eu/aqd/d/envu9_j7q/REP_D-NL_RIVM_20140805_D-002.xml&conv=470&source=remote

Dataflow B: Competent Authorities for AQ zones
http://cdr.eionet.europa.eu/Converters/run_conversion?file=nl/eu/aqd/d/envu9_j7q/REP_D-NL_RIVM_20140805_D-002.xml&conv=471&source=remote

@thijsbrentjens
Member Author

Okay, I'll have a look at these new zones. I think we need information from RIVM on their data sources for the missing properties. But for a first version, I'll give it a try with what we have.

@justb4
Contributor

justb4 commented Sep 8, 2014

Yes, a good approach. I think a challenge is to convert GeoJSON to GML
Geometries. For this I plan to add a Custom Jinja2 Filter in Stetl:
http://jinja.pocoo.org/docs/dev/api/#custom-filters

Probably using Python OGR http://www.gdal.org/classOGRGeometry.html to
read GeoJSON and export to GML Geometry. Like examples in
http://pcjericks.github.io/py-gdalogr-cookbook/geometry.html

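    from osgeo import ogr

    geojson = """{"type":"Point","coordinates":[108420.33,753808.59]}"""
    geom = ogr.CreateGeometryFromJson(geojson)
    # options is a list of OGR GML export options, e.g. FORMAT=GML3
    gml_str = geom.ExportToGML(options=["FORMAT=GML3"])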

The filter expression for each zone object becomes something like

  <am:geometry>
      {{ zone.geometry | geojson2gml(version='2.1.2') }}
  </am:geometry>

I will only do the Stetl-part within Stetl, you can go ahead with flow
B. Hope to have something today. You now can use latest Stetl via sudo
pip install stetl (v1.0.6).


@thijsbrentjens
Member Author

Okay, I'll leave the geom as it is for now. I'm using ogr2ogr's -sql option to join the information from the CSV to the GeoJSON file, and that seems to work. Would that be easy / interesting to use with Stetl?

Example command:

ogr2ogr -sql "select * from OGRGeoJSON a left join 'zonesattr.csv'.zonesattr b on a.zone_code = b.zone_code" -f "GeoJSON" zones-joined.json zones.json

Edit: ogr2ogr shortens the attribute names, as happens with Shapefile column names. Maybe this is not the best approach for joining the CSV file, but we could change this once it is clear how RIVM could / would deliver the data.

@thijsbrentjens
Member Author

Note that for joining there are different codes used sometimes: e.g. in our WFS we have NL0201 for Midden, in the CSV file Zones_NL-001_upd.csv it is NL0200.
I can fix them now, but this is something we need to discuss with RIVM.
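
A join in plain Python would keep the full attribute names and make such code mismatches visible instead of silently producing empty attributes. This is only a sketch with made-up sample data (the CSV columns and pollutant values are assumptions; only the NL0200/NL0201 mismatch comes from our data):

```python
import csv
import io


def left_join(features, csv_text, key="zone_code"):
    """Left-join CSV rows onto GeoJSON-like features by `key`.

    Returns the joined features and the list of keys without a CSV match.
    """
    rows = {row[key]: row for row in csv.DictReader(io.StringIO(csv_text))}
    joined, unmatched = [], []
    for feature in features:
        props = dict(feature["properties"])
        row = rows.get(props[key])
        if row is None:
            unmatched.append(props[key])
        else:
            props.update(row)
        joined.append({**feature, "properties": props})
    return joined, unmatched


# Hypothetical attribute CSV and features.
CSV_TEXT = "zone_code,zone_pollutant\nNL0200,NO2-H;PM10-H\nNL0320,NO2-H\n"
FEATURES = [
    {"type": "Feature", "properties": {"zone_code": "NL0201"}, "geometry": None},
    {"type": "Feature", "properties": {"zone_code": "NL0320"}, "geometry": None},
]

joined, unmatched = left_join(FEATURES, CSV_TEXT)
print(unmatched)
```

The `unmatched` list is exactly the set of zone codes to raise with RIVM.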

@thijsbrentjens
Member Author

Note that this also means that the joined data might not contain correct values.

justb4 added a commit that referenced this issue Sep 8, 2014
justb4 added a commit that referenced this issue Sep 8, 2014
justb4 added a commit that referenced this issue Sep 8, 2014
…utants from codelist values - PM2.5 added
@justb4
Contributor

justb4 commented Sep 8, 2014

Good to see your first version of the Zones to Dataflow B ETL!

Pollutant codes: I checked in a sort of hack but we may get those
globals from a service. This is the relevant and standard Jinja2 code:

         {% set zone_pollutants = feature.properties.zone_pollutant.split(';') %}
         <aqd:pollutants>
         {% for zone_pollutant in zone_pollutants %}
             <aqd:Pollutant>
                 <aqd:pollutantCode xlink:href="{{ globs.pollutant_defs[zone_pollutant].pollutant_code }}"/>
                 <aqd:protectionTarget xlink:href="{{ globs.pollutant_defs[zone_pollutant].protection_target }}"/>
             </aqd:Pollutant>
         {% endfor %}
         </aqd:pollutants>

with globs defined as:

     "pollutant_defs": {
         "BaP-H": {
             "pollutant_code": "http://dd.eionet.europa.eu/vocabulary/aq/pollutant/29",
             "protection_target": "http://dd.eionet.europa.eu/vocabulary/aq/protectiontarget/H"
         },
         "Benzene-H": {
             "pollutant_code": "http://dd.eionet.europa.eu/vocabulary/aq/pollutant/20",
             "protection_target": "http://dd.eionet.europa.eu/vocabulary/aq/protectiontarget/H"
         },

But possibly there is a better solution....I checked in a bit too much:
I had several other .xls's from the RIVM FTP server.
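
For comparison, the same split-and-lookup can be written in plain Python, which also makes it easy to report codes that have no definition yet. This is only an illustrative sketch, not what is checked in; the definitions are the two entries shown above and the output is deliberately unindented.

```python
from xml.sax.saxutils import quoteattr

# The two definitions quoted above; a complete globals file would hold more.
POLLUTANT_DEFS = {
    "BaP-H": {
        "pollutant_code": "http://dd.eionet.europa.eu/vocabulary/aq/pollutant/29",
        "protection_target": "http://dd.eionet.europa.eu/vocabulary/aq/protectiontarget/H",
    },
    "Benzene-H": {
        "pollutant_code": "http://dd.eionet.europa.eu/vocabulary/aq/pollutant/20",
        "protection_target": "http://dd.eionet.europa.eu/vocabulary/aq/protectiontarget/H",
    },
}


def render_pollutants(zone_pollutant, defs):
    """Return (xml, unknown_codes) for a ';'-separated zone_pollutant value."""
    elements, unknown = [], []
    for code in zone_pollutant.split(";"):
        definition = defs.get(code)
        if definition is None:
            unknown.append(code)  # no definition: report instead of rendering
            continue
        elements.append(
            "<aqd:Pollutant>"
            f"<aqd:pollutantCode xlink:href={quoteattr(definition['pollutant_code'])}/>"
            f"<aqd:protectionTarget xlink:href={quoteattr(definition['protection_target'])}/>"
            "</aqd:Pollutant>"
        )
    fragment = "<aqd:pollutants>" + "".join(elements) + "</aqd:pollutants>"
    return fragment, unknown


fragment, unknown = render_pollutants("BaP-H;Benzene-H;PM2.5-H", POLLUTANT_DEFS)
print(unknown)
```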

On 08-09-14 17:13, Thijs Brentjens wrote:

Nice work Just. I tested it and it seems to work fine. One major thing
left is dealing with matching the pollutant codes to URIs as defined in
the vocabulary.



justb4 added a commit that referenced this issue Sep 9, 2014
…ts and axis ordering, example in Dataflow-D ETL
@justb4
Contributor

justb4 commented Sep 9, 2014

Refined the GML macros, with the usual GML-mess: GML3 vs GML2 encoding and Axis Ordering. Next is to migrate most/all GML macros to Jinja2 Filters in Stetl that use Python OGR. Also updated validate.sh to include dataflow-B. The dataflow-D output now validates against the AQD/INSPIRE/GML schemas.

@justb4
Contributor

justb4 commented Sep 9, 2014

Tip: if you commit to GitHub and provide the issue number in the commit message, that message will appear here, for example:

  git commit -m "issue #7 - refinement macros-gml.jinja2: cater for GML2/GML3-constructs and axis ordering, example in Dataflow-D ETL"

justb4 added a commit that referenced this issue Sep 9, 2014
@thijsbrentjens
Member Author

Thanks for the tip, I just forgot that with the previous commit.

Regarding the pollutant codes : I'd think direct support for looking up the codes in the SKOS vocabularies would be elegant (I found some python libs for that), but I can generate the "codelist" to use that in the globs for now.

@justb4
Contributor

justb4 commented Sep 10, 2014

SKOS-based data: where (at which URL) is this service? I was under the impression that the component codes like 'NO2-H' were RIVM-specific. But eventually we should be able to generate reports from live services: WFS, SOS, REST, SPARQL, whatever.

Right now, data can be applied to a Jinja2 template either as standard input data (a JSON file) via a Jinja2 Context and/or as "globals" (also a JSON file) via the Jinja2 Environment. I think "globals" should be kept to a minimum. There are two limitations right now (in Stetl):

  • only one input (GeoJSON) File and Globals (JSON) File possible to configure
  • only file-based input and Globals

Within Jinja2 (and in Stetl) the input file is passed as a "Jinja2 Context", which in Python is a dict (hashmap). For example, "features" as used in our templates is in fact a key in that dict. The same goes for the globals ("globs", or whatever the top-level key is named). Useful Stetl extensions could thus be the following:

  • allow multiple input and globals files to be configured
  • allow also http-based input data and globals, basically the "Resource" approach

Another possibility, a bit-more involved is to develop "smart" Jinja2 Filters, that will actually invoke an external web service like a SPARQL end-point....
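
To illustrate the "multiple globals files" extension: since the globals end up as one Python dict, several JSON sources could simply be merged before rendering. The sketch below is an assumption about how that might work, not the Stetl implementation; it takes JSON text so the source (file or URL) does not matter.

```python
import json


def merge_globals(*json_texts):
    """Merge several JSON objects into one globals dict (later ones win).

    Nested objects are merged recursively, so two sources can both
    contribute entries under the same top-level key.
    """
    def merge(base, extra):
        for key, value in extra.items():
            if isinstance(value, dict) and isinstance(base.get(key), dict):
                merge(base[key], value)
            else:
                base[key] = value
        return base

    merged = {}
    for text in json_texts:
        merge(merged, json.loads(text))
    return merged


# Hypothetical contents of two globals files.
first = '{"pollutant_defs": {"BaP-H": {"pollutant_code": "x"}}, "org": "RIVM"}'
second = '{"pollutant_defs": {"NO2-H": {"pollutant_code": "y"}}, "org": "Geonovum"}'
globs = merge_globals(first, second)
print(sorted(globs["pollutant_defs"]), globs["org"])
```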

justb4 added a commit that referenced this issue Sep 10, 2014
… Geometry i.s.o. via macro for Dataflow-D
@justb4
Contributor

justb4 commented Sep 10, 2014

Another update: the Jinja2 Filter to generate GML from GeoJSON geometry has been improved in latest GitHub Stetl and is used in Dataflow-D jinja2 template, as follows:

    <ef:geometry>
            {# Generate a Point (or any other) GML geometry from a GeoJSON geometry using the geojson2gml
              Jinja2 custom Filter.
             By specifying a target_crs we can even reproject from the source CRS.
             The gml_format=GML2|GML3 determines the general GML form: e.g. pos/posList or coordinates. gml_longsrs=YES|NO
             determines the srsName format like EPSG:4326 or urn:ogc:def:crs:EPSG::4326 (long).
             gml_longsrs=YES will also do XY swapping (lat/lon) for lat/lon based projections.
            Generate gml id first (gml:id is GML3-specific and optional) #}
           {% set gml_id = 'STA_G-%s' % feature.properties.local_id %}
          {{ feature.geometry | geojson2gml(source_crs=crs, target_crs=4258, gml_id=gml_id, gml_format='GML3', gml_longsrs='YES') }}
    </ef:geometry>

The output then becomes like:

      <ef:geometry>
          <gml:Point srsName="urn:ogc:def:crs:EPSG::4258" 
             gml:id="STA_G-STA-NL00235">
              <gml:pos>51.43500137 4.36028624</gml:pos>
         </gml:Point>
      </ef:geometry>
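
The axis swapping is the part that is easy to get wrong, so here is a stripped-down sketch of just that step for a Point, in plain Python. It is not the Stetl `geojson2gml` filter: it does no reprojection, handles only Points, and the function name and parameters are made up for illustration.

```python
def point_to_gml3(geometry, epsg, gml_id, lat_lon_order=True):
    """Render a GeoJSON Point as a GML3 gml:Point string (no reprojection).

    GeoJSON stores coordinates as (x, y), i.e. (lon, lat) for geographic
    CRSs. With the long URN srsName the axis order of EPSG:4258 is lat/lon,
    so the two values are swapped when `lat_lon_order` is true.
    """
    if geometry["type"] != "Point":
        raise ValueError("only Point geometries are handled in this sketch")
    x, y = geometry["coordinates"][:2]
    first, second = (y, x) if lat_lon_order else (x, y)
    return (
        f'<gml:Point srsName="urn:ogc:def:crs:EPSG::{epsg}" gml:id="{gml_id}">'
        f"<gml:pos>{first} {second}</gml:pos>"
        "</gml:Point>"
    )


# Coordinates taken from the output above, in GeoJSON (lon, lat) order.
station = {"type": "Point", "coordinates": [4.36028624, 51.43500137]}
print(point_to_gml3(station, 4258, "STA_G-STA-NL00235"))
```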

justb4 added a commit that referenced this issue Sep 10, 2014
… Geometry i.s.o. via macro for Dataflow-D - fixed output
@justb4
Contributor

justb4 commented Sep 10, 2014

Ok, in the latest Stetl GH version the Jinja2 filter can be configured with:

  • input file: a file or URL (either should return JSON or GeoJSON). Multiple files are possible; these are not concatenated but streamed sequentially into the Chain, as is the Stetl default (e.g. a directory with GML files)
  • globals file: multiple files and/or URLs can be configured (either should return JSON or GeoJSON) in the globals_file_path

It seems to make more sense to use the globals for "reference data" or data to be expanded/joined, while the input data is the core data. But experience will tell....

See example (Example 3, bottom) at https://github.com/justb4/stetl/blob/master/examples/basics/10_jinja2_templating/etl.cfg

One possible problem is that some services don't return JSON but XML...

thijsbrentjens added a commit that referenced this issue Sep 11, 2014
…t, to map the codes of RIVM to the harmonised codes
@thijsbrentjens
Member Author

Yesterday I created an XSLT to extract the notations and URIs to use in the Jinja globs. It transforms the RDF from http://dd.eionet.europa.eu/vocabularies?expand=true&expanded=&folderId=1, e.g. for pollutants: http://dd.eionet.europa.eu/vocabulary/aq/pollutant/view and the RDF http://dd.eionet.europa.eu/vocabulary/aq/pollutant/rdf
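
The same extraction could also be sketched in Python with the standard library. This is not the XSLT that was committed; it assumes the RDF is plain SKOS in RDF/XML with one `skos:Concept` per pollutant carrying a `skos:notation`, and the sample below is a hand-written, simplified stand-in for the real vocabulary file.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"


def notation_to_uri(rdf_xml):
    """Map each skos:notation to the rdf:about URI of its skos:Concept."""
    root = ET.fromstring(rdf_xml)
    mapping = {}
    for concept in root.iter(f"{{{SKOS}}}Concept"):
        uri = concept.get(f"{{{RDF}}}about")
        notation = concept.findtext(f"{{{SKOS}}}notation")
        if uri and notation:
            mapping[notation] = uri
    return mapping


# Simplified sample in the assumed structure, not a copy of the real RDF.
SAMPLE_RDF = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://dd.eionet.europa.eu/vocabulary/aq/pollutant/20">
    <skos:notation>C6H6</skos:notation>
    <skos:prefLabel>Benzene (air)</skos:prefLabel>
  </skos:Concept>
</rdf:RDF>"""

print(notation_to_uri(SAMPLE_RDF))
```

The resulting dict can be dumped as JSON and used directly as a globals file.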

@justb4
Contributor

justb4 commented Sep 11, 2014

Nice work, and more elegant with the parts-split and the lookup of the pollutant def and protection target def via the Jinja2 template. The geometry should now also be fillable via the new Filter (latest Stetl GH version). Exciting, since I have not tried it for MultiPolygon yet; it should become something like

       {% set gml_id = 'ZON_G-%s' % feature.properties.local_id %}
       {{ feature.geometry | geojson2gml(source_crs=crs, target_crs=4258, gml_id=gml_id, gml_format='GML3', gml_longsrs='YES') }}

Then it is in INSPIRE ETRS89 right away...

@thijsbrentjens
Copy link
Member Author

An exact match for the code "Benzene" (as used in RIVM values) seems to be missing in the vocabulary. We need to discuss with RIVM what to do here.

thijsbrentjens added a commit that referenced this issue Sep 11, 2014
@justb4
Contributor

justb4 commented Sep 11, 2014

Great! Dataflow-B now comes with MultiSurfaces. You can run ./validate.sh for schema validation. Apart from Benzene there is a validation issue with an empty am:beginLifespanVersion. Looking at the existing examples I placed under https://github.com/Geonovum/sospilot/tree/master/data/eionet/aq-report, I see that the date of report generation is used, e.g. for the 5 Aug 2014 Dataflow-B report:

        <am:beginLifespanVersion>2014-08-05T10:04:00+01:00</am:beginLifespanVersion>

Maybe there is a Jinja2 'current_date' construct, or we could add one ourselves, e.g. via a macro.
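
Whatever the mechanism on the template side, producing the value in Python is short. A minimal sketch, assuming a fixed UTC offset is acceptable (the function name and the default of +1 hour are my choices, mirroring the example above):

```python
from datetime import datetime, timedelta, timezone


def lifespan_timestamp(now=None, utc_offset_hours=1):
    """Return an xs:dateTime string such as 2014-08-05T10:04:00+01:00.

    `now` must be a timezone-aware datetime; when omitted, the current
    time is used. The offset is fixed and ignores daylight saving time.
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    moment = now.astimezone(tz) if now is not None else datetime.now(tz)
    return moment.replace(microsecond=0).isoformat()


print(lifespan_timestamp())
```

Such a function could be passed to the templates as a global or registered as a custom filter.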

@justb4
Contributor

justb4 commented Sep 11, 2014

Isn't Benzene http://dd.eionet.europa.eu/vocabulary/aq/pollutant/20 (Benzene (air))? The pollutant code is admittedly C6H6, but that is benzene (a hexagon of 6 carbon atoms, each with 1 H atom). So my chemistry studies turn out to be of some use after all :-).

justb4 added a commit that referenced this issue Sep 11, 2014
@thijsbrentjens
Member Author

Correct, Benzene is C6H6. The thing is: how to map this automatically using the codes RIVM provides? I'd say let's create an exception for now and try to find out why RIVM uses their codes.

@justb4
Contributor

justb4 commented Sep 15, 2014


Yes, that is why I assumed that the mapping from codes like
"BaP-H;Benzene-H;CO-H;NO2-H;O3-H;O3-V;PM10-H;PM2.5-H;SO2-H" was
RIVM-internal/specific.
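
The "exception for now" could look roughly like this: split each RIVM code into a pollutant part and a protection target part, and translate known deviating names to the vocabulary notation before the lookup. Only the Benzene/C6H6 pair comes from this discussion; the function and table names are made up for this sketch.

```python
# RIVM pollutant names that differ from the vocabulary notation.
# Only the Benzene/C6H6 pair comes from the discussion above.
NOTATION_EXCEPTIONS = {"Benzene": "C6H6"}


def split_zone_pollutants(value):
    """Split "BaP-H;Benzene-H;..." into (notation, protection_target) pairs.

    The split is on the last '-' so that names such as "PM2.5" stay intact.
    """
    pairs = []
    for item in value.split(";"):
        pollutant, _, target = item.rpartition("-")
        pairs.append((NOTATION_EXCEPTIONS.get(pollutant, pollutant), target))
    return pairs


pairs = split_zone_pollutants(
    "BaP-H;Benzene-H;CO-H;NO2-H;O3-H;O3-V;PM10-H;PM2.5-H;SO2-H"
)
print(pairs)
```

The first element of each pair can then be looked up by notation in the pollutant vocabulary, the second in the protection target vocabulary.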

justb4 added a commit that referenced this issue Oct 7, 2014
@justb4
Contributor

justb4 commented Oct 7, 2014

Added the latest RIVM AQ XML files from RSpoor and converted them to CSV.
See https://github.com/Geonovum/sospilot/tree/master/src/aq-report/input/rspoor

Made a start with the Dataflow-C AQD_AssessmentRegime ETL. It is doable. The 2 main unclear points:

1. the mapping of a Pollutant to an Eionet Codelist URI, e.g. "BaP" should become http://dd.eionet.europa.eu/vocabulary/aq/pollutant/5029 ("BaP in PM10"), but multiple URIs match
2. how to obtain the data for the elements within aqd:environmentalObjective, e.g.

                <aqd:environmentalObjective>
                    <aqd:EnvironmentalObjective>
                        <aqd:objectiveType xlink:href="http://dd.eionet.europa.eu/vocabulary/aq/objectivetype/TV"/>
                        <aqd:reportingMetric
                                xlink:href="http://dd.eionet.europa.eu/vocabulary/aq/reportingmetric/aMean"/>
                        <aqd:protectionTarget
                                xlink:href="http://dd.eionet.europa.eu/vocabulary/aq/protectiontarget/H"/>
                    </aqd:EnvironmentalObjective>
                </aqd:environmentalObjective>

justb4 added a commit that referenced this issue Oct 7, 2014