This repository provides a modular Data Management System (DMS) for air quality sensors. It ingests data from multiple sensor manufacturers via their APIs, standardizes and aggregates the data, applies calibration models for correction, performs quality checks, and stores the processed data in a database through an automated pipeline. It also includes Apache Superset for data visualization, allowing users to create dashboards by connecting to the sample database.
Make sure you have the following installed:
- Git
- Docker
  - For Windows: install Docker Desktop or Docker Engine inside WSL2
  - For Linux: install Docker Engine
- Clone the repository

  ```bash
  git clone https://github.com/CSTEPBLR/AQDMS.git
  ```

- Configure settings

  An `.env.example` file is included for reference. Copy it to `.env` and update values as needed.

  ```bash
  cp .env.example .env
  ```

- Append the following to `.env` to get the appropriate permissions to make changes to Airflow DAGs:

  ```bash
  echo "AIRFLOW_UID=$(id -u)" >> .env
  ```

- Start Docker to get Airflow, Postgres, pgAdmin, and Superset up and running:

  ```bash
  docker compose up -d
  ```

- Access the services using the following links and enter your credentials as set in `.env`:
Airflow UI: http://localhost:8080
pgAdmin (Postgres UI): http://127.0.0.1:5050/login
Superset UI: http://localhost:8088
```
├── README.md
├── docker-compose.yml
├── Dockerfile
├── requirements.txt
├── .env.example
├── dags
│   ├── sample_dag.py
│   └── api_dag.py
├── data
│   ├── sample_raw_data
│   ├── sample_calibration_model
│   ├── init-db.sql
│   └── load_sample_metadata.sql
└── src
    ├── common
    │   ├── api_client
    │   ├── config
    │   └── db
    ├── ingestion
    │   └── manufacturer
    ├── processing
    │   ├── staging
    │   │   └── manufacturer
    │   ├── standardize
    │   ├── aggregate
    │   └── calibrate
    │       └── calibrate_aggregated_data
    ├── quality_check
    └── visualization
```

1. ingestion → fetch and store raw manufacturer data (through API / sample JSON)
2. processing → process raw data in 4 steps: stage → standardize → aggregate → calibrate
3. quality_check → apply checks on calibrated data
4. visualization → visualize quality-checked data (a minimal DAG sketch of this flow is shown below)
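As a rough illustration of how these four stages could be wired together in Airflow, here is a minimal sketch; the task names and stub callables are hypothetical, not the actual DAGs shipped in dags/:

```python
# Hypothetical sketch of the four-stage pipeline as an Airflow DAG;
# task names and stub callables are illustrative, not the repo's actual DAGs.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def ingest():
    """Placeholder for the real ingestion entry point in src/ingestion."""


def process():
    """Placeholder for stage -> standardize -> aggregate -> calibrate."""


def run_quality_checks():
    """Placeholder for the checks in src/quality_check."""


def visualize():
    """Placeholder for the visualization step in src/visualization."""


# Airflow 2.4+ style; older versions use schedule_interval instead of schedule.
with DAG("pipeline_sketch", start_date=datetime(2024, 1, 1), schedule=None) as dag:
    ingestion = PythonOperator(task_id="ingestion", python_callable=ingest)
    processing = PythonOperator(task_id="processing", python_callable=process)
    quality_check = PythonOperator(task_id="quality_check", python_callable=run_quality_checks)
    visualization = PythonOperator(task_id="visualization", python_callable=visualize)

    ingestion >> processing >> quality_check >> visualization
```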
Shared configuration helpers can be found in the src/common/config folder.
DB-related configurations can be found in the src/common/db folder.
API keys, DB credentials, and Docker service credentials are externalized via .env.
Manufacturer-specific logic is available here:
- ingestion/manufacturer/
- processing/staging/manufacturer/

If new manufacturers need to be added, update the required configs and API clients.
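As a hedged starting point, a new client under ingestion/manufacturer/ might look like the sketch below; the class name, endpoint path, and parameters are assumptions for illustration, not the repo's actual interface in src/common/api_client:

```python
# Hypothetical sketch of a new manufacturer client; the actual base class,
# module paths, endpoint shape, and config keys in src/common may differ.
import requests


class NewManufacturerClient:
    """Fetches raw sensor readings from a (hypothetical) manufacturer API."""

    def __init__(self, base_url: str, api_key: str):
        self.base_url = base_url
        self.api_key = api_key

    def fetch_raw_data(self, device_id: str, start: str, end: str) -> list[dict]:
        # Endpoint path and query parameters are illustrative only.
        response = requests.get(
            f"{self.base_url}/devices/{device_id}/readings",
            params={"from": start, "to": end},
            headers={"Authorization": f"Bearer {self.api_key}"},
            timeout=30,
        )
        response.raise_for_status()
        return response.json()
```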
This repo supports two options:
- Running the pipeline with sample data (for when a manufacturer's API credentials are not available).
- Running the full pipeline with valid API credentials.
Two DAGs are available for these scenarios. A DAG called sample_dag ingests preloaded sample raw sensor data from two manufacturers, AQMS and Sensit Ramp. This is for demonstration and dashboarding purposes only; production data is intentionally excluded to keep the repo lightweight.
sample_dag uses the USE_SAMPLE_DATA=true flag, which skips external API calls and loads pre-generated sample raw data from local JSON files to traverse all steps of the pipeline. Enable the DAG at http://localhost:8080 to get started.
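The sketch below illustrates, under stated assumptions, how such a flag can gate ingestion; the function, file path, and fallback behavior are hypothetical, not the repo's actual implementation:

```python
# Hypothetical illustration of gating ingestion on USE_SAMPLE_DATA;
# the repo's actual flag handling, paths, and fallback may differ.
import json
import os


def ingest_raw_data(manufacturer: str) -> list[dict]:
    if os.getenv("USE_SAMPLE_DATA", "false").lower() == "true":
        # Load pre-generated sample readings instead of calling any external API.
        sample_path = f"data/sample_raw_data/{manufacturer}.json"  # illustrative path
        with open(sample_path) as f:
            return json.load(f)
    # With real credentials, this branch would call the manufacturer's API client.
    raise NotImplementedError("API ingestion requires valid credentials")
```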
A second DAG, api_dag, can ingest data directly from the manufacturer APIs and complete all stages of the pipeline.
For api_dag, Airflow configurations need to be added; please refer to the README.md in the dags folder.
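One common Airflow pattern for this kind of configuration is shown below; whether db_config is actually an Airflow Variable (rather than, say, a Connection) is an assumption here, so defer to the dags/README.md:

```python
# Hypothetical example: reading db_config inside api_dag from an Airflow
# Variable. Whether db_config is really a Variable (vs. a Connection or other
# mechanism) is an assumption; see dags/README.md for the actual setup.
from airflow.models import Variable

db_config = Variable.get("db_config", deserialize_json=True)
db_host = db_config["host"]  # illustrative key
```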
Database schema and required metadata tables are predefined in data/init-db.sql.
Sample metadata required to run the data pipeline is inserted through data/load_sample_metadata.sql.
For more information, refer to the README in the data folder.
- If using the pgAdmin UI, register the database using credentials from .env.
- If using the Superset dashboard:
  - generate a Superset secret_key in the .env file by running the following in a terminal:

    ```bash
    openssl rand -base64 42
    ```

  - connect to the Postgres database using the SQLAlchemy URI option:

    ```
    postgresql+psycopg2://postgres_user:postgres_pwd@postgres_host:5432/postgres_db
    ```
Users are advised to develop their own machine learning (ML) calibration models for each sensor used in low-cost sensor devices from individual manufacturers. The ML calibration models provided are for reference purposes only and must not be used as standard or production calibration models in any application. Open-source code users are responsible for generating and placing the calibration ML models in the correct path.
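For orientation only, a user-supplied model might be loaded and applied as in this sketch; the file name, serialization format (joblib), and feature columns are assumptions, not the repo's contract:

```python
# Hypothetical sketch of loading and applying a user-supplied calibration
# model; the file name, serialization format (joblib), and feature columns
# are assumptions for illustration only.
import joblib
import pandas as pd

model = joblib.load("data/sample_calibration_model/pm25_model.pkl")  # assumed file name

raw = pd.DataFrame(
    {"pm25_raw": [12.4], "temperature": [29.1], "humidity": [61.0]}  # illustrative features
)
calibrated = model.predict(raw)
print(calibrated)
```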
- The Airflow UI port may be automatically forwarded to a random local port. Make sure to check the alternate URL (it depends on the forwarded localhost port, for example: http://localhost:49677/).
- Airflow will show an import error for api_dag until db_config is added. Add db_config to load api_dag.