SEP data workflow: air quality data #17
Comments
Italian air quality data

Data transformation
The data transformation phase was applied only to the dataset related to the PM10 pollutant.

Data loading
Possible integration in the French air pollution datasets
Here is the direct link to the data for France, 2019 and PM10. We now need to find a way to automate the "Download CSV" button. Regarding the Nominatim API for geocoding, the problem is that it returns only the postal code, not the LAU (commune code), and the two are not the same (see example). Also, the file on the FTP server does not seem to be UTF-8 encoded.
Several themes regarding air quality are mixed together on the ISPRA website. Dataset structure also differs from file to file. Mappings will be provided shortly. NB: If we just wanted to test the pipeline process, we could switch to data sources that are easier to manage, such as the same EEA portal used to fetch the French data. Different data sources like ISPRA, however, also give us a better use case for testing the data integration process.
About ISPRA data file harmonization. There are three sections:
@FranckCo To automate the download, I saw that an API is called to get the resource, which has this format:
Regarding the geocoding service, is the LAU also needed in addition to the Municipality field? I could add it; I just need the table of French LAUs, which are the ones found on the Eurostat website. I checked the files on the server, converted the non-UTF-8 ones and reorganized them; the pollutant files are now here. The Italian ones are in CSV format; if necessary I will also upload them in Excel format.
Dear Franck, another option is to use a European source both for territorial metadata (NUTS3/LAU) and for air pollution dataset retrieval. We tested Francesca's URL above with "Italy" instead of "France" and it works.
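The encoding conversion mentioned above could look roughly like the following. This is only a sketch, not the procedure actually used on the server: the list of candidate encodings and the function names `decode_bytes` and `convert_to_utf8` are assumptions made for illustration.

```python
from pathlib import Path

# Candidate source encodings, tried in order. This list is an assumption;
# adjust it to whatever the data providers actually use.
CANDIDATE_ENCODINGS = ("utf-8", "cp1252", "latin-1")


def decode_bytes(raw: bytes) -> tuple[str, str]:
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for encoding in CANDIDATE_ENCODINGS:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # latin-1 accepts every byte value, so this is only reached
    # if the candidate list is changed.
    raise ValueError("no candidate encoding matched")


def convert_to_utf8(source: Path, target: Path) -> str:
    """Rewrite `source` as UTF-8 at `target`; return the detected encoding."""
    text, encoding = decode_bytes(source.read_bytes())
    target.write_bytes(text.encode("utf-8"))
    return encoding
```

Trying UTF-8 first means files that are already valid are left unchanged in content; the fallback order only matters for the others and should be checked against a few known commune names.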
Update regarding the territorial representation mismatch in air quality data:
These codes must be converted into NUTS3 + LAU via a custom ETL for each file.
Both files must have two additional columns where the actual NUTS3 and LAU codes are stored.
Added the LAU and NUTS3 fields to the French air pollution datasets using a Python script (which was sent to Franck).
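For readers without access to that script, the enrichment step can be sketched as below. This is not the script sent to Franck: the column name `municipality`, the lookup structure and the function `add_territorial_codes` are hypothetical, and the codes in the example are placeholders, not real LAU or NUTS3 values.

```python
import csv
import io


def add_territorial_codes(rows, lookup, key="municipality"):
    """Yield each row with two extra columns, LAU and NUTS3.

    `lookup` maps a normalized municipality name to its codes.
    Rows with no match get empty strings so they can be reviewed later.
    """
    for row in rows:
        codes = lookup.get(row[key].strip().casefold(), {})
        yield {**row, "LAU": codes.get("LAU", ""), "NUTS3": codes.get("NUTS3", "")}


# Placeholder data: names, codes and values are invented for illustration.
SAMPLE_CSV = "municipality,value\nExampleville,1.0\nNowhere,2.0\n"
LOOKUP = {"exampleville": {"LAU": "LAU_X", "NUTS3": "NUTS3_X"}}

enriched = list(add_territorial_codes(csv.DictReader(io.StringIO(SAMPLE_CSV)), LOOKUP))
```

Matching on a normalized name is fragile for homonymous communes; if the source file carries any official code, joining on that code would be more reliable.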
SEP process steps

Step 1: Data acquisition
Source files are loaded from the FTP repository into a local MySQL database. The common data model (table or file structure) is as specified here: #17 (comment). Input files are uploaded from the FTP source as is.

CREATE ALGORITHM=UNDEFINED DEFINER=

This query has 3 nested subqueries, one for French data and two for Italian data, each for a single measure type. The result set has the form of a data cube. Code lists and measures have been normalized and linked to their corresponding metadata.

CREATE ALGORITHM=UNDEFINED DEFINER=

This query integrates files from different countries as one single harmonized view. MySQL is used as the data repository for monolith, the tool for mappings. The SPARQL result set can be formatted as CSV, JSON and RDF and sent to the subsequent stages of the pipeline.
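Since the view definitions above are truncated, the general shape of such a harmonized view (one sub-query per source, combined into a single cube-like result) can be illustrated as follows. This is a sketch under assumptions, not the project's actual query: SQLite stands in for MySQL so the example runs standalone, and all table names, column names, the `OTHER` measure label and the inserted values are hypothetical placeholders.

```python
import sqlite3

# Hypothetical schema: one French table carrying the pollutant as a column,
# and two Italian tables holding one measure type each.
SCHEMA = """
CREATE TABLE fr_air (lau TEXT, nuts3 TEXT, year INTEGER, pollutant TEXT, value REAL);
CREATE TABLE it_pm10 (lau TEXT, nuts3 TEXT, year INTEGER, value REAL);
CREATE TABLE it_other (lau TEXT, nuts3 TEXT, year INTEGER, value REAL);

CREATE VIEW air_quality AS
    SELECT 'FR' AS country, lau, nuts3, year, pollutant, value FROM fr_air
    UNION ALL
    SELECT 'IT', lau, nuts3, year, 'PM10', value FROM it_pm10
    UNION ALL
    SELECT 'IT', lau, nuts3, year, 'OTHER', value FROM it_other;
"""


def build_demo_database():
    """Create an in-memory database with placeholder rows (not real measurements)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO fr_air VALUES ('LAU_A', 'NUTS3_A', 2019, 'PM10', 1.0)")
    conn.execute("INSERT INTO it_pm10 VALUES ('LAU_B', 'NUTS3_B', 2019, 2.0)")
    conn.execute("INSERT INTO it_other VALUES ('LAU_B', 'NUTS3_B', 2019, 3.0)")
    return conn


def rows_by_country(conn):
    """Count harmonized observations per country through the single view."""
    query = "SELECT country, COUNT(*) FROM air_quality GROUP BY country ORDER BY country"
    return conn.execute(query).fetchall()
```

Each branch of the `UNION ALL` projects its source onto the same columns, which is what lets downstream stages treat the result as one dataset regardless of country.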
Design and implement data workflow.