Small Python utility used to generate historical Air Quality Index datasets by scraping https://aqicn.org

AlFontal/aqi-stations-scraper

AQI Stations Scraper

I made this small Python utility to keep an up-to-date record of historical Air Quality Index (AQI) data for the more than 180 Japanese air monitoring stations, which I needed for my PhD research. I currently scrape the data from aqicn.org, which collects data from over 12,000 air monitoring stations.

Since I couldn't find an API for the historical data (at the time of writing, the existing API only returns current AQI values for a given location), and I had been wanting to test the web scraping capabilities of the Selenium package for a while, I developed a (quite hacky) way of automatically fetching the individual CSV files with the complete historical data. These files can be found in the data/japan-aqi directory.
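If you want to work with the downloaded files, a minimal stdlib-only loader might look like the sketch below. It assumes each CSV has a `date` column followed by one column per pollutant (for example `pm25`, `pm10`), with spaces after the commas and blank cells for missing readings; check an actual file in data/japan-aqi before relying on it. `parse_aqi_csv` and `parse_date` are hypothetical helpers, not part of this repository.

```python
import csv
import io
from datetime import date


def parse_date(value):
    """Parse 'YYYY/M/D' or 'YYYY-MM-DD' into a date (both formats are assumptions)."""
    year, month, day = (int(part) for part in value.replace("/", "-").split("-"))
    return date(year, month, day)


def parse_aqi_csv(text):
    """Parse an aqicn.org-style historical CSV into a list of row dicts.

    Assumed layout: a 'date' column plus one numeric column per pollutant.
    Blank cells become None; rows are returned in chronological order.
    """
    # skipinitialspace drops the space that follows each comma,
    # so " pm25" in the header becomes "pm25"
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    rows = []
    for raw in reader:
        row = {}
        for key, value in raw.items():
            if key is None:
                continue  # surplus cells without a header are ignored
            key = key.strip()
            value = (value or "").strip()
            if key == "date":
                row["date"] = parse_date(value)
            else:
                row[key] = float(value) if value else None
        rows.append(row)
    # Sort by date, since the raw files are not assumed to be ordered
    rows.sort(key=lambda r: r["date"])
    return rows
```

To use it, pass in the text of any file from data/japan-aqi, e.g. read with `pathlib.Path(...).read_text()`. Keeping missing readings as `None` rather than `0.0` avoids silently biasing averages.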

I also wanted to test the CI/CD capabilities of GitHub Actions (see the .github/workflows/actions.yml file for the workflow definition), so I set up a scheduled trigger that runs the workflow every Sunday at 2:00 AM UTC and updates the datasets with new data.
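Scheduled GitHub Actions triggers are written as cron expressions, and "every Sunday at 2:00 AM UTC" would correspond to `0 2 * * 0` (an assumption here; actions.yml is the authority). The sketch below computes the next such run after a given moment, which is handy for estimating when fresh data should land. `next_weekly_run` is a hypothetical helper, not part of this repository.

```python
from datetime import datetime, timedelta, timezone


def next_weekly_run(now, weekday=6, hour=2):
    """Return the next UTC datetime on `weekday` at `hour`:00, strictly after `now`.

    `now` must be timezone-aware. Python's weekday() counts Monday=0 ... Sunday=6,
    whereas cron counts Sunday=0, so cron's "0 2 * * 0" maps to weekday=6, hour=2.
    """
    now = now.astimezone(timezone.utc)
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    # Move forward to the target weekday (0 to 6 days ahead)
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    # If that moment is not in the future, the next run is a week later
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate
```

For example, from Wednesday 2024-01-03 12:00 UTC the next run is Sunday 2024-01-07 02:00 UTC. Keep in mind that GitHub may start scheduled workflows late during periods of high load, so the actual update time can drift.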

Running locally

To run it locally, first clone the repository and create a .env file in the root directory with the following variables, which are used when making requests to the site:

USER_FULL_NAME = 'Your name'
USER_EMAIL = 'Your email'
USER_ORGANIZATION = 'Your org'
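Before starting a long scraping run, it can help to check that the .env file is well formed. The sketch below is a minimal stdlib-only reader for lines of the form `KEY = 'value'`; the script itself may load these variables differently (for instance with a dotenv library), so treat `parse_env_text` as a hypothetical helper rather than the repository's own code.

```python
REQUIRED_KEYS = ("USER_FULL_NAME", "USER_EMAIL", "USER_ORGANIZATION")


def parse_env_text(text):
    """Parse KEY = 'value' lines into a dict, ignoring blanks and '#' comments.

    Raises ValueError if any of the variables listed above is missing or empty.
    """
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Remove one pair of matching surrounding quotes, if present
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    missing = [k for k in REQUIRED_KEYS if not values.get(k)]
    if missing:
        raise ValueError(f"missing .env variables: {', '.join(missing)}")
    return values
```

Called as `parse_env_text(pathlib.Path(".env").read_text())`, it fails fast with the names of any missing variables instead of letting the scraper fail partway through.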

To reproduce the environment, you will need Poetry to install the dependencies. You can install Poetry either with the official installer (recommended):

curl -sSL https://install.python-poetry.org | python3 -

Or if you want to use pipx:

pipx install poetry

Check the official Poetry documentation for up-to-date installation instructions.

Once Poetry is installed on your system, run:

poetry install

This installs the dependencies defined in pyproject.toml. You can then run the scraping script, which should download the files available at that time:

poetry run python japan_aqi.py
