Today, 3.6 billion people are living in cities. By 2030, that number is expected to reach 5 billion. That means that by 2030, 60 percent of the world’s population will live in cities.
Given this forecast and the growing number of megacities, air quality has become one of the most important indicators of quality of life.
For this reason, we have collected data from three different cities: Madrid, Berlin and Hong Kong. For each of them we gathered:
- historical pollution data for the following pollutants: CO (mg/m3), NO_2 (µg/m3), NOx (µg/m3), O_3 (µg/m3), PM10 (µg/m3), PM25 (µg/m3), SO_2 (µg/m3). The finest granularity available is daily.
- yearly population
- public transport length (km)
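CO is reported in mg/m3 while every other pollutant is in µg/m3, so values are not directly comparable across columns. A minimal sketch of bringing everything onto µg/m3, assuming a daily DataFrame with one column per pollutant; the toy values and the `to_micrograms` helper are hypothetical, not code from the notebooks:

```python
import pandas as pd

# Hypothetical daily readings (placeholder values); columns mirror the pollutant list above.
daily = pd.DataFrame(
    {
        "date": pd.to_datetime(["2019-01-01", "2019-01-02"]),
        "CO": [0.4, 0.6],      # mg/m3
        "NO_2": [35.0, 42.0],  # µg/m3
        "PM10": [20.0, 28.0],  # µg/m3
    }
)

MG_TO_UG = 1000  # 1 mg/m3 = 1000 µg/m3


def to_micrograms(df: pd.DataFrame, mg_columns=("CO",)) -> pd.DataFrame:
    """Return a copy in which the columns given in mg/m3 are converted to µg/m3."""
    out = df.copy()  # leave the original frame untouched
    for col in mg_columns:
        out[col] = out[col] * MG_TO_UG
    return out


daily_ug = to_micrograms(daily)
print(daily_ug["CO"].tolist())
```

Returning a copy keeps the original units available for comparison against standards that are themselves expressed in mg/m3.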
FOLDERS:
Hong Kong Data:
- HK_pollution_data: all CSV files with the historical pollution data for Hong Kong
- HK pollution index.ipynb: notebook where the pollution CSVs were merged into a single DataFrame, cleaned and formatted
- Hong Kong Population.ipynb: notebook where Hong Kong's population was web-scraped
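Several population notebooks rely on web scraping with BeautifulSoup. The snippet below is a minimal sketch of that pattern for a year/population table; the markup, the `population` table id, the placeholder numbers and the `parse_population_table` helper are all hypothetical, since the real pages and their structure are not reproduced here. In practice the HTML string would come from something like `requests.get(url).text`.

```python
from typing import Dict

from bs4 import BeautifulSoup

# Hypothetical markup with placeholder numbers, standing in for a downloaded page.
HTML = """
<table id="population">
  <tr><th>Year</th><th>Population</th></tr>
  <tr><td>2017</td><td>1,000,000</td></tr>
  <tr><td>2018</td><td>1,010,000</td></tr>
</table>
"""


def parse_population_table(html: str, table_id: str = "population") -> Dict[int, int]:
    """Map year -> population from a two-column HTML table (hypothetical layout)."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:
        raise ValueError(f"no table with id={table_id!r}")
    populations = {}
    for row in table.find_all("tr")[1:]:  # skip the header row
        year_cell, pop_cell = row.find_all("td")
        year = int(year_cell.get_text(strip=True))
        # Thousands separators must be stripped before int() conversion
        populations[year] = int(pop_cell.get_text(strip=True).replace(",", ""))
    return populations


print(parse_population_table(HTML))  # {2017: 1000000, 2018: 1010000}
```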
pollution_madrid: all CSV files with the historical pollution data for Madrid
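Both `HK_pollution_data` and `pollution_madrid` hold many CSV files that end up in one DataFrame. A minimal sketch of that merge step, assuming every file in a folder shares the same columns; the `load_folder` helper and the `source_file` column are illustrative, not the notebooks' actual code:

```python
import glob
import os

import pandas as pd


def load_folder(folder: str, pattern: str = "*.csv") -> pd.DataFrame:
    """Read every CSV in `folder` (sorted by name) and stack them into one DataFrame."""
    paths = sorted(glob.glob(os.path.join(folder, pattern)))
    if not paths:
        raise FileNotFoundError(f"no files matching {pattern} in {folder}")
    frames = []
    for path in paths:
        frame = pd.read_csv(path)
        frame["source_file"] = os.path.basename(path)  # keep provenance for debugging
        frames.append(frame)
    # ignore_index=True gives a clean 0..n-1 index instead of repeated per-file indices
    return pd.concat(frames, ignore_index=True)
```

Usage: `madrid = load_folder("pollution_madrid")`. Cleaning (parsing dates, dropping invalid readings) would follow on the stacked frame, so it only has to be written once.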
pickles:
- berlin_pollution: exported pkl with cleaned Berlin pollution data
- berlin_population: exported pkl with cleaned Berlin population data
- combined_dataframe: pkl with a single DataFrame concatenating merged_pollution and merged_population
- hk_pollution: exported pkl with cleaned Hong Kong pollution data
- hk_population: exported pkl with cleaned Hong Kong population data
- madrid_pollution: exported pkl with cleaned Madrid pollution data
- madrid_population: exported pkl with cleaned Madrid population data
- merged_pollution: pkl with a single DataFrame concatenating berlin_pollution, madrid_pollution and hk_pollution
- merged_population: pkl with a single DataFrame concatenating berlin_population, madrid_population and hk_population
- transport_km: pkl with a single DataFrame of cleaned public transport lengths for Madrid, Berlin and Hong Kong
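The pickles let each notebook hand its cleaned DataFrame to the next one. A minimal sketch of the round trip behind `merged_pollution`, assuming each city pickle is a DataFrame with matching pollutant columns; the toy values, the `city` label column and the exact `.pkl` file names are assumptions for illustration:

```python
import os
import tempfile

import pandas as pd

# Toy stand-ins for the cleaned per-city DataFrames (placeholder values).
cities = {
    "berlin": pd.DataFrame({"date": pd.to_datetime(["2019-01-01"]), "NO_2": [30.0]}),
    "madrid": pd.DataFrame({"date": pd.to_datetime(["2019-01-01"]), "NO_2": [45.0]}),
    "hk": pd.DataFrame({"date": pd.to_datetime(["2019-01-01"]), "NO_2": [60.0]}),
}

with tempfile.TemporaryDirectory() as pickles:
    # Each cleaning notebook would export its own pickle...
    for name, df in cities.items():
        df.to_pickle(os.path.join(pickles, f"{name}_pollution.pkl"))

    # ...and the merging notebook reads them back and stacks them with a city label.
    frames = []
    for name in cities:
        df = pd.read_pickle(os.path.join(pickles, f"{name}_pollution.pkl"))
        frames.append(df.assign(city=name))
    merged_pollution = pd.concat(frames, ignore_index=True)
    merged_pollution.to_pickle(os.path.join(pickles, "merged_pollution.pkl"))

print(merged_pollution[["city", "NO_2"]])
```

Pickles preserve dtypes (such as parsed dates) between notebooks, which CSV does not; the trade-off is that they should only be loaded from sources you trust.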
FILES:
- .gitignore
- .DS_Store
- Berlin-Data: notebook that automatically downloads, opens and cleans Berlin's pollution data from the government website
- DataFrame_Pollution_population: notebook that imports merged_pollution, merged_population and transport_km and concatenates them into a single DataFrame
- Live_poll_traff_data: program that, given a city, returns live traffic and air-quality data for it
- Merged pollution: notebook that imports berlin_pollution, madrid_pollution and hk_pollution and concatenates them into a single DataFrame
- Merged population: notebook that imports berlin_population, madrid_population and hk_population and concatenates them into a single DataFrame
- Plotting_Pollution: notebook that imports merged_pollution and plots how each pollutant evolves over time in each of the three cities
- Plotting_all: notebook that imports combined_dataframe and plots graphs to explore correlations
- Plotting_population: notebook that imports combined_dataframe and plots population over time
- README.md: you are reading me! ;)
- pollution_mad: notebook where the pollution CSVs were merged into a single DataFrame, cleaned and formatted
- population_berlin: notebook where Berlin's population was obtained through an API, cleaned and formatted into a single DataFrame
- population_madrid: notebook where Madrid's population was web-scraped, cleaned and formatted into a single DataFrame
- public_transport_cities: notebook where public transport information for the three cities was web-scraped, cleaned and formatted into a single DataFrame
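`DataFrame_Pollution_population` brings together data at different granularities: daily pollution, yearly population and one transport length per city. One way to align them, sketched here as an alternative rather than the notebook's own method, is to average pollution per city and year and then merge on those keys. The toy frames and the column names `city`, `year`, `population` and `transport_km` are assumptions:

```python
import pandas as pd

# Placeholder inputs shaped like merged_pollution, merged_population and transport_km.
pollution = pd.DataFrame(
    {
        "city": ["madrid", "madrid", "madrid", "berlin"],
        "date": pd.to_datetime(["2018-01-01", "2018-07-01", "2019-01-01", "2018-01-01"]),
        "NO_2": [40.0, 20.0, 35.0, 25.0],
    }
)
population = pd.DataFrame(
    {"city": ["madrid", "madrid", "berlin"], "year": [2018, 2019, 2018], "population": [100, 110, 200]}
)
transport = pd.DataFrame({"city": ["madrid", "berlin"], "transport_km": [300.0, 400.0]})

# Daily -> yearly: average each pollutant per city and calendar year.
yearly = (
    pollution.assign(year=pollution["date"].dt.year)
    .groupby(["city", "year"], as_index=False)["NO_2"]
    .mean()
)

# Yearly pollution and population share (city, year); transport is per city only.
combined = yearly.merge(population, on=["city", "year"], how="left").merge(
    transport, on="city", how="left"
)
print(combined)
```

A left merge keeps every city-year that has pollution data, so a missing population year shows up as NaN instead of silently dropping the row.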
The analyses show:
- There has been an increase in population in the three selected cities
- The concentration of the pollutants analyzed is higher in Hong Kong than in Madrid and Berlin
- For each pollutant in each city, there is seasonality related to the pollutant's emissions
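A seasonal pattern like this can be checked by averaging each pollutant by calendar month across years. A minimal sketch on a synthetic series (generated, not measured) that peaks in winter; `monthly_profile` is a hypothetical helper:

```python
import numpy as np
import pandas as pd


def monthly_profile(df: pd.DataFrame, value_col: str) -> pd.Series:
    """Mean of `value_col` for each calendar month (1-12), pooled across years."""
    return df.groupby(df["date"].dt.month)[value_col].mean()


# Synthetic two-year daily series with a cosine peak on 1 January, purely for illustration.
dates = pd.date_range("2018-01-01", "2019-12-31", freq="D")
day = dates.dayofyear.to_numpy()
winter_peak = 30 + 10 * np.cos(2 * np.pi * (day - 1) / 365)
synthetic = pd.DataFrame({"date": dates, "NO_2": winter_peak})

profile = monthly_profile(synthetic, "NO_2")
print(profile.round(1))
```

On real data, the same profile computed per city and pollutant shows whether the peak month is consistent from year to year.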
Considerations: Please do not take these results as fully representative, given the complexity of the topic and the limited amount of data we could collect and analyze.
Presentation to be found here: https://docs.google.com/presentation/d/1uDoccWTmvSmVFLjEgQGNZ63esCxVR9VeySCDRvy5U4w/edit?usp=sharing
LIBRARIES:
- pandas
- NumPy
- requests
- json
- BeautifulSoup
- re
- matplotlib
SOURCES:
Hong Kong:
- Pollution: https://cd.epic.epd.gov.hk/EPICDI/air/station/
- Population: https://www.ceicdata.com/en/indicator/hong-kong/population
- Public transport: https://www.gov.hk/en/about/abouthk/factsheets/docs/transport.pdf
Berlin:
- Pollution: https://luftdaten.berlin.de/station/overview/active
- Population: https://www.destatis.de/DE/Service/OpenData/api-webservice.html
- Public transport: https://en.wikipedia.org/wiki/Berlin
Madrid:
-
European standard for air quality: https://ec.europa.eu/environment/air/quality/standards.htm
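The EU standards give a yardstick for the daily data. For PM10, the limit value is a daily mean of 50 µg/m3, which may not be exceeded on more than 35 days per calendar year. A minimal sketch of that check; the helper name and the sample series are illustrative, and other pollutants use different averaging periods listed on the page above:

```python
import pandas as pd

PM10_DAILY_LIMIT = 50.0        # µg/m3, 24-hour mean (EU limit value)
PM10_ALLOWED_EXCEEDANCES = 35  # days per calendar year


def pm10_exceedances(daily: pd.Series) -> pd.DataFrame:
    """Count days above the PM10 daily limit per year.

    `daily` is a Series of daily means indexed by a DatetimeIndex.
    """
    above = daily > PM10_DAILY_LIMIT
    counts = above.groupby(daily.index.year).sum().astype(int)
    return pd.DataFrame(
        {"days_above": counts, "compliant": counts <= PM10_ALLOWED_EXCEEDANCES}
    )


# Illustrative series: 40 days at 60 µg/m3 in 2018, 10 such days in 2019.
idx = pd.date_range("2018-01-01", "2019-12-31", freq="D")
values = pd.Series(20.0, index=idx)
values.iloc[:40] = 60.0
values.loc["2019-03-01":"2019-03-10"] = 60.0
print(pm10_exceedances(values))
```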
Live data:
- Pollution: https://waqi.info
- Traffic: https://www.tomtom.com/de_de/
- Coordinates: https://nominatim.openstreetmap.org
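`Live_poll_traff_data` combines geocoding with live air-quality data. A minimal sketch of those two steps using only the standard library: the Nominatim `search` endpoint turns a city name into coordinates, and the WAQI `feed/geo:` endpoint returns the nearest station's AQI. The endpoint paths reflect the public APIs as we understand them, the token is a placeholder you must obtain from waqi.info, the User-Agent string is made up, and the TomTom traffic step is left out. Parsing is kept separate from the network calls so it can be tested offline:

```python
import json
import urllib.parse
import urllib.request

NOMINATIM_URL = "https://nominatim.openstreetmap.org/search"
WAQI_GEO_URL = "https://api.waqi.info/feed/geo:{lat};{lon}/"
USER_AGENT = "city-pollution-demo/0.1"  # placeholder; Nominatim asks clients to identify themselves


def _get_json(url: str):
    """GET `url` and decode the JSON body."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def parse_coordinates(results: list) -> tuple:
    """Take the first Nominatim hit; lat/lon come back as strings."""
    if not results:
        raise ValueError("city not found")
    return float(results[0]["lat"]), float(results[0]["lon"])


def parse_aqi(payload: dict) -> int:
    """Extract the AQI from a WAQI feed response."""
    if payload.get("status") != "ok":
        raise RuntimeError(f"WAQI error: {payload.get('data')}")
    aqi = payload["data"]["aqi"]
    if not isinstance(aqi, (int, float)):  # guard against non-numeric readings
        raise ValueError(f"no numeric AQI available: {aqi!r}")
    return int(aqi)


def live_aqi(city: str, token: str) -> int:
    """Geocode `city`, then fetch the AQI of the nearest monitoring station."""
    query = urllib.parse.urlencode({"q": city, "format": "json", "limit": 1})
    lat, lon = parse_coordinates(_get_json(f"{NOMINATIM_URL}?{query}"))
    url = WAQI_GEO_URL.format(lat=lat, lon=lon) + "?" + urllib.parse.urlencode({"token": token})
    return parse_aqi(_get_json(url))
```

Usage (requires network access and a real token): `live_aqi("Madrid", token="YOUR_WAQI_TOKEN")`.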
REQUIREMENTS:
- appnope==0.1.0
- asn1crypto==1.2.0
- attrs==19.2.0
- backcall==0.1.0
- beautifulsoup4==4.8.1
- bleach==3.1.0
- certifi==2019.9.11
- cffi==1.13.0
- chardet==3.0.4
- colorama==0.4.1
- cryptography==2.8
- cycler==0.10.0
- decorator==4.4.0
- defusedxml==0.6.0
- entrypoints==0.3
- idna==2.8
- ipykernel==5.1.2
- ipython==7.8.0
- ipython-genutils==0.2.0
- jedi==0.15.1
- Jinja2==2.10.3
- joblib==0.13.2
- jsonschema==3.0.2
- jupyter-client==5.3.3
- jupyter-core==4.5.0
- kiwisolver==1.1.0
- langdetect==1.0.7
- MarkupSafe==1.1.1
- matplotlib==3.1.1
- mistune==0.8.4
- mkl-fft==1.0.14
- mkl-random==1.1.0
- mkl-service==2.3.0
- nbconvert==5.6.0
- nbformat==4.4.0
- notebook==6.0.1
- numpy==1.17.2
- pandas==0.25.1
- pandocfilters==1.4.2
- parso==0.5.1
- pexpect==4.7.0
- pickleshare==0.7.5
- prometheus-client==0.7.1
- prompt-toolkit==2.0.10
- ptyprocess==0.6.0
- pycountry==19.8.18
- pycparser==2.19
- Pygments==2.4.2
- PyMySQL==0.9.3
- pyOpenSSL==19.0.0
- pyparsing==2.4.2
- pyrsistent==0.15.4
- PySocks==1.7.1
- python-dateutil==2.8.0
- pytz==2019.3
- pyzmq==18.1.0
- requests==2.22.0
- scikit-learn==0.21.3
- scipy==1.3.1
- Send2Trash==1.5.0
- six==1.12.0
- soupsieve==1.9.3
- SQLAlchemy==1.3.10
- terminado==0.8.2
- testpath==0.4.2
- tornado==6.0.3
- traitlets==4.3.3
- urllib3==1.24.2
- wcwidth==0.1.7
- webencodings==0.5.1
- xlrd==1.2.0