Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SEP data workflow: Italian Air Pollution datasets #18

Closed
pafrance opened this issue Nov 24, 2021 · 3 comments
Closed

SEP data workflow: Italian Air Pollution datasets #18

pafrance opened this issue Nov 24, 2021 · 3 comments
Labels
sep pilot Required for the Supporting Environment Policies pilot

Comments

@pafrance
Copy link
Collaborator

pafrance commented Nov 24, 2021

Data extraction
Step1: data source website
Step2: Select DATA panel. Data are organized in a set of tables
Step3: Scroll to the requested table, named “Tabella 1 – PM10. Stazioni di monitoraggio: dati e parametri statistici per la valutazione della qualità dell'aria (2019)”
Step4: Download link available on the left bottom at the end of the table . Downloaded data are in xls format
The downloaded file is not compliant with the required Data Structure.

Data transformation
The downloaded file has the following Data Structure:
“Regione”,”Provincia”,”Comune”,”Nome della stazione Tipo di zona”,”Tipo di stazione”,”Giorni di superamento di 50 µg/m3”,”Valore medio annuo³ [µg/m³]”,”Rendimento [%]”,”Rispetta copertura minima”,”sufficiente distribuzione temporale nell'anno”,”numero_dati_validi”,”TIPO DI DATI 4”,”Codice zona”,”Nome zona”

  1. Data need to be filtered in order to be compliant to the requested Data Structure,
  2. NUTS3 variable has been added through a transformation from municipality_id Variable, using data from ISTAT LAU archive
  3. Provided metadata for NUTS3 transformation need to be downloaded and merged.
    Metadata are referenced in a time series and Variable regarding year 2019 has been used in the script.
  4. Metadata regarding pollutant type, data reference time and aggregation type have been added in the datafile.

Data Load
The transformed file has been uploaded into INTERSTAT GraphDB repository sep-test
GraphDB allows direct link to the resources by a permalink, but the raw data needs a little reworking to be accessed directly.

Further data files available
Same procedure can be used to import other data from Data Source Website
AMBIENT AIR QUALITY: NITROGEN DIOXIDE NO2
AMBIENT AIR QUALITY: TROPOSPHERIC OZONE O3
AMBIENT AIR QUALITY: PARTICULATE PM2.5
These files have not been uploaded to GraphDB repository yet

Transformation script in R language
processing_ETL_AIR.R.txt

@FranckCo FranckCo added the sep pilot Required for the Supporting Environment Policies pilot label Nov 25, 2021
@FranckCo FranckCo changed the title SEP data workflow: ITALIAN Air Pollution datasets SEP data workflow: Italian Air Pollution datasets Nov 25, 2021
@francescadag
Copy link
Contributor

francescadag commented Dec 13, 2021

The Italian Air Pollution dataset concerning PM 10 is available on the FTP area of the project.
The AMBIENT AIR QUALITY: NITROGEN DIOXIDE NO2 is available on the FTP area of the project.
The AMBIENT AIR QUALITY: TROPOSPHERIC OZONE O3 is available on the FTP area of the project.
The AMBIENT AIR QUALITY: PARTICULATE PM2.5 is available on the FTP area of the project.
All files are available in tabular format.

@pafrance
Copy link
Collaborator Author

French Data are available from European Environmental Agency.
We downloaded a sample of data (PM10) now available on the ftp area of the project.
Please, we need a confirmation wether we got the right datasets or not.

The source data model is consistent with respect to both the Italian and French data except for the data conversion of geo spatial coordinates to administrative units.
But the target data model used for context broker ingestion is not compatible, as it is used for smart data models.
We would like to explore this issue separately (Issue 21)
I would like to leave this issue for data preparation and integration, because dataset can be integrated as they are, but if we consider smart model, there's a need to extract and convert data to the target model, adding a step in the process.

@francescadag
Copy link
Contributor

The French dataset about the PM10 mentioned in the previous post and which was uploaded to the FTP server in its initial version contains the geographic coordinates; it has been enriched with the Municipality value through a script in java using the specific service/API.
The file obtained is available on the FTP area of the project.
After receiving the confirmation that the extrapolated French dataset is correct, in the same way we will extrapolate also the datasets of the other pollutants and I will also add to them the field that refers to the Municipality.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sep pilot Required for the Supporting Environment Policies pilot
Projects
None yet
Development

No branches or pull requests

3 participants