Please fork and star our GitHub repository before using or downloading our dataset.
1. Forking our repository creates your own copy of it, which you can modify and use as you wish.
2. Starring our repository is a way to show your support and appreciation for our work.
https://github.com/ICCC-Platform/Air-Pollution-Image-Dataset-From-India-and-Nepal
Introduction: This dataset contains images of Air Pollution for different cities in India and Nepal. The dataset is divided into two folders: Combined_Dataset and Country_wise_Dataset.
Total number of images: 12,240
Image size: 224 × 224 pixels
Air Quality Index (AQI) classes and their definitions used in the dataset.
There are a total of six classes of Air Pollution, which we represent in our dataset as follows:
- Good (0-50): Air quality is considered satisfactory and air pollution poses little or no risk.
- Moderate (51-100): Air quality is acceptable; however, for some pollutants, there may be a moderate health concern for a very small number of people who are unusually sensitive to air pollution.
- Unhealthy for Sensitive Groups (101-150): Members of sensitive groups may experience health effects, but the general public is unlikely to be affected.
- Unhealthy (151-200): Some members of the general public may experience health effects; members of sensitive groups may experience more serious health effects.
- Very Unhealthy (201-300): Health alert: The risk of health effects is increased for everyone.
- Hazardous (301-500): Health warning of emergency conditions: Everyone is more likely to be affected.
Reference:
https://airtw.epa.gov.tw/ENG/Information/Standard/AirQualityIndicator.aspx
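The class ranges listed above can be expressed as a small lookup. This is a minimal sketch, not code shipped with the dataset: the names `AQI_CLASSES` and `aqi_to_class` are hypothetical, and the label strings are assumed to match the class names above, which may differ from the exact spelling used in the `AQI_Class` column or the folder names.

```python
# Upper bound of each AQI band and its class name, taken from the list above.
# The label spelling is an assumption; check it against the AQI_Class column.
AQI_CLASSES = [
    (50, "Good"),
    (100, "Moderate"),
    (150, "Unhealthy for Sensitive Groups"),
    (200, "Unhealthy"),
    (300, "Very Unhealthy"),
    (500, "Hazardous"),
]


def aqi_to_class(aqi: float) -> str:
    """Return the class name for an AQI value in the range 0-500."""
    if aqi < 0 or aqi > 500:
        raise ValueError(f"AQI out of range: {aqi}")
    for upper, name in AQI_CLASSES:
        # A fractional value between two bands (e.g. 50.5) falls into the higher band.
        if aqi <= upper:
            return name
    raise AssertionError("unreachable: 500 is the last upper bound")
```

For example, `aqi_to_class(158)` falls in the 151-200 band and returns `"Unhealthy"`.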
Cities of India
- ITO, Delhi
- Dimapur, Nagaland
- Spice Garden, Bengaluru
- Knowledge Park III, Greater Noida
- New Ind Town, Faridabad
- Borivali East, Mumbai
- Oragadam, Tamil Nadu
City of Nepal
- Biratnagar
Combined_Dataset:
The combined dataset folder contains two subfolders.
- All_img: This subfolder contains all the collected images from all AQI classes.
- IND_and_NEP: This subfolder contains six different subfolders representing six different classes of AQI.
The CSV file in this folder contains the metadata for every image. Its columns are:
Location, Filename, Year, Month, Day, Hour, AQI, PM2.5, PM10, O3, CO, SO2, NO2, and AQI_Class
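A minimal sketch of reading such a CSV file with Python's standard `csv` module follows. It assumes the header row uses exactly the column names listed above (in particular `AQI_Class`); the helper names `load_records` and `class_counts` are hypothetical, and the file name in the usage note is a placeholder because the actual name is not stated here.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, List


def load_records(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse CSV text (an open file or any iterable of lines) into one dict per row.

    All values are kept as strings; convert AQI, PM2.5, etc. as needed.
    """
    return list(csv.DictReader(lines))


def class_counts(records: List[Dict[str, str]]) -> Counter:
    """Count how many rows belong to each AQI class.

    Assumes the class column is named "AQI_Class", as in the column list above.
    """
    return Counter(row["AQI_Class"] for row in records)
```

Typical use would be `with open(csv_path, newline="") as f: rows = load_records(f)`, where `csv_path` points at the CSV file inside the Combined_Dataset folder.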
Country_wise_Dataset:
This folder contains two subfolders representing the countries from which the dataset was collected.
- India: This subfolder contains one subfolder for each city where data were collected. Each city subfolder contains folders holding the images collected for each AQI class, as well as a CSV file with the details of each image, using the same columns as above:
Location, Filename, Year, Month, Day, Hour, AQI, PM2.5, PM10, O3, CO, SO2, NO2, and AQI_Class
- Nepal: We also collected images from Nepal. This subfolder contains one subfolder for the city where data were collected. That city subfolder contains folders holding the images collected for each AQI class, as well as a CSV file with the details of each image, using the same columns as above:
Location, Filename, Year, Month, Day, Hour, AQI, PM2.5, PM10, O3, CO, SO2, NO2, and AQI_Class
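The country/city layout described above can be walked with `pathlib` to locate each city's CSV file. This is a sketch under the assumption that the folders are nested exactly as `Country_wise_Dataset/<country>/<city>/` and that the CSV sits directly inside the city folder; `find_city_csvs` is a hypothetical helper name.

```python
from pathlib import Path
from typing import Dict, List


def find_city_csvs(root: str) -> Dict[str, List[Path]]:
    """Map "Country/City" to the CSV files found directly inside that city folder."""
    result: Dict[str, List[Path]] = {}
    for country_dir in sorted(Path(root).iterdir()):
        if not country_dir.is_dir():
            continue  # skip stray files at the country level
        for city_dir in sorted(country_dir.iterdir()):
            if city_dir.is_dir():
                # Note: the "*.csv" pattern is case-sensitive on most Linux systems.
                result[f"{country_dir.name}/{city_dir.name}"] = sorted(city_dir.glob("*.csv"))
    return result
```

Calling `find_city_csvs("Country_wise_Dataset")` from the extracted dataset root would return one entry per city; a city whose list is empty has no CSV at the expected level.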
////////////////////////////////////////////////////////////////////////////////
Dataset Collection Process:
1. Visit the site: The first step in collecting the air pollution data was to visit each site in person and capture images and videos of the area.
2. Note current parameters: While visiting the site, various parameters related to air pollution were noted. These included measurements of PM2.5, PM10, NO2, SO2, CO, etc. These parameters were noted by referring to publicly available data sources such as the Central Pollution Control Board (CPCB) website. For India we used https://app.cpcbccr.com/AQI_India/ and for Nepal we used: https://www.tomorrow.io/weather/NP/4/Biratnagar/079711/hourly/
3. Preprocess images: Once the images and videos were captured, they were preprocessed to remove any images that were blurry, overexposed, or had other quality issues. Only the images that met the desired quality criteria were selected for further analysis.
4. Extract frames from videos: In addition to the images, videos were also captured at the site. These videos were processed to extract frames that were suitable for further analysis. Frames that were too blurry or otherwise of low quality were discarded.
5. Log data: Finally, all the data collected during the site visit, including the images, videos, and air pollution parameters, were logged in a structured format.
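The steps above do not say how blurry images and frames were identified. As an illustration only, one common heuristic is the variance of the Laplacian: a sharp image has strong edges, so its Laplacian response varies widely, while a blurry image gives a low variance. The sketch below implements that heuristic in pure Python on a grayscale image given as a list of pixel rows. The function names and the threshold are hypothetical, and this is not necessarily the method the authors used.

```python
from statistics import pvariance
from typing import List, Sequence

GrayImage = Sequence[Sequence[float]]  # rows of grayscale pixel intensities


def laplacian_variance(gray: GrayImage) -> float:
    """Variance of the 4-neighbour Laplacian over the interior pixels."""
    height, width = len(gray), len(gray[0])
    if height < 3 or width < 3:
        raise ValueError("image must be at least 3x3 pixels")
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1] - 4 * gray[y][x]
        for y in range(1, height - 1)
        for x in range(1, width - 1)
    ]
    return pvariance(responses)


def keep_sharp(frames: Sequence[GrayImage], threshold: float) -> List[GrayImage]:
    """Keep frames whose sharpness score reaches the threshold.

    The threshold is a hypothetical tuning parameter; a suitable value
    depends on the camera and scene and has to be chosen by inspection.
    """
    return [frame for frame in frames if laplacian_variance(frame) >= threshold]
```

Pure Python is slow on full-size images; an image-processing library would normally compute the same score much faster.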
//////////////////////////////////////////////////////////////////////////////
Instructions on how to use the AQI image dataset:
- Download the dataset from Kaggle and extract the zip file to a folder of your choice. The dataset is available at:
https://doi.org/10.34740/KAGGLE/DS/3152196
https://www.kaggle.com/datasets/adarshrouniyar/air-pollution-image-dataset-from-india-and-nepal
- The dataset is divided into two folders: Combined_Dataset and Country_wise_Dataset. Each folder contains subfolders and CSV files.
- To access the images in the Combined_Dataset folder, go to the folder for the AQI class you are interested in (the class folders are inside the IND_and_NEP subfolder). For example, if you are interested in the 'Unhealthy' class, go to the 'Unhealthy' folder. Inside this folder, you will find images from different cities.
- To access the data in the Country_wise_Dataset folder, go to the folder of the country you are interested in, either India or Nepal. Inside each country folder, you will find subfolders representing different cities. Each city folder contains a CSV file that lists the AQI values and other parameters for the city.
- You can use this dataset to train machine learning models to predict AQI for different cities. You can also use it for research on air pollution in different cities. For an example of how to use this dataset, see:
https://www.kaggle.com/code/momo88/vgg16-translearning-for-image-based-aqi-estimation
- If you use this dataset for any purpose, please cite it as the source of the data in any resulting publications or presentations.
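As a sketch of the folder-based access described in the instructions above, the following pairs every image with the name of the class folder it sits in, which is the usual starting point for training a classifier. It assumes one subfolder per AQI class directly under the given root (for example `Combined_Dataset/IND_and_NEP`); the helper name `list_labeled_images` and the set of accepted file extensions are assumptions, not something the dataset documents.

```python
from pathlib import Path
from typing import List, Tuple

# Assumed image extensions; adjust to match the files actually in the dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_labeled_images(class_root: str) -> List[Tuple[Path, str]]:
    """Return (image path, class folder name) pairs for every class subfolder."""
    pairs: List[Tuple[Path, str]] = []
    for class_dir in sorted(Path(class_root).iterdir()):
        if not class_dir.is_dir():
            continue  # skip files such as the CSV next to the class folders
        for image_path in sorted(class_dir.iterdir()):
            if image_path.is_file() and image_path.suffix.lower() in IMAGE_SUFFIXES:
                pairs.append((image_path, class_dir.name))
    return pairs
```

The label here is simply the folder name, so it will match the `AQI_Class` column only if the folders are named the same way as the CSV values.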
Citation Request: You can cite our dataset as follows:
APA:
Utomo, S.; Rouniyar, A.; Hsu, H.-C.; Hsiung, P.-A. Federated Adversarial Training Strategies for Achieving Privacy and Security in Sustainable Smart City Applications. Future Internet 2023, 15, 371. https://doi.org/10.3390/fi15110371
Sapdo Utomo, Adarsh Rouniyar, Guo Hao Jiang, Chun Hao Chang, Kai Chun Tang, Hsiu-Chun Hsu, and Pao-Ann Hsiung. 2023. Eff-AQI: An Efficient CNN-Based Model for Air Pollution Estimation: A Study Case in India. In Proceedings of the 2023 ACM Conference on Information Technology for Social Good (GoodIT '23). Association for Computing Machinery, New York, NY, USA, 165–172. https://doi.org/10.1145/3582515.3609531
Adarsh Rouniyar, Sapdo Utomo, John A, & Pao-Ann Hsiung. (2023). Air Pollution Image Dataset from India and Nepal [Data set]. Kaggle. https://doi.org/10.34740/KAGGLE/DS/3152196
Bibtex:
@article{utomo2023federated, title={Federated Adversarial Training Strategies for Achieving Privacy and Security in Sustainable Smart City Applications}, author={Utomo, Sapdo and Rouniyar, Adarsh and Hsu, Hsiu-Chun and Hsiung, Pao-Ann}, journal={Future Internet}, volume={15}, number={11}, pages={371}, year={2023}, publisher={MDPI} }
@inproceedings{utomo2023effaqi, author = {Utomo, Sapdo and Rouniyar, Adarsh and Jiang, Guo Hao and Chang, Chun Hao and Tang, Kai Chun and Hsu, Hsiu-Chun and Hsiung, Pao-Ann}, title = {Eff-AQI: An Efficient CNN-Based Model for Air Pollution Estimation: A Study Case in India}, year = {2023}, isbn = {9798400701160}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3582515.3609531}, doi = {10.1145/3582515.3609531}, booktitle = {Proceedings of the 2023 ACM Conference on Information Technology for Social Good}, pages = {165–172}, numpages = {8}, keywords = {efficient model, image-based AQI estimation, novel dataset, air pollution estimation, air pollution in India}, location = {Lisbon, Portugal}, series = {GoodIT '23} }
@misc{rouniyar2023air, title={Air Pollution Image Dataset from India and Nepal}, url={https://www.kaggle.com/ds/3152196}, DOI={10.34740/KAGGLE/DS/3152196}, publisher={Kaggle}, author={Adarsh Rouniyar and Sapdo Utomo and John A and Pao-Ann Hsiung}, year={2023} }
///////////////////////////////////////////////////////////////////////////
Collected Image Data Distribution for Each AQI Class
///////////////////////////////////////////////////////////////////////////
IMPORTANT: Please read our License file before using our dataset.
///////////////////////////////////////////////////////////////////////////
Contributors
- Adarsh Rouniyar
- Sapdo Utomo
- Dr. John A.
- Dr. Pao-Ann Hsiung
If you have any queries, please contact us.
- Adarsh Rouniyar
Email: adarsh@csie.io
- Dr. John A.
Email: johnmtech@gmail.com
- Dr. Pao-Ann Hsiung
Email: pahsiung@gmail.com, pahsiung@ccu.edu.tw