
IRONHACK Project Nr.1 work related on the public transportation and pollution


andres2203/pollution_correlation


Pollution Correlation with public transportation and growing population

IRONHACK project work related to public transportation

1. Description

Today, 3.6 billion people are living in cities. By 2030, that number is expected to reach 5 billion. That means that by 2030, 60 percent of the world’s population will live in cities.

Given this forecast and the rise of megacities, air quality has become one of the most important indicators of quality of life.

For this reason, we collected data from three cities: Madrid, Berlin and Hong Kong. For each of them we gathered:

  • historical pollution data for the following pollutants: CO (mg/m³), NO_2 (µg/m³), NOx (µg/m³), O_3 (µg/m³), PM10 (µg/m³), PM2.5 (µg/m³), SO_2 (µg/m³). The finest granularity is daily.
  • yearly population
  • public transport length (km)
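
Since the pollution readings are daily while population is yearly, comparing the two requires bringing them to a common time scale. The following is a minimal sketch of one way to do that with pandas, assuming hypothetical column names (`date`, `city`, `year`, `population`) and synthetic values; it is not taken from the project notebooks.

```python
import numpy as np
import pandas as pd

# Hypothetical daily NO_2 readings for one city (synthetic values, not project data).
days = pd.date_range("2017-01-01", "2018-12-31", freq="D")
daily = pd.DataFrame({
    "date": days,
    "city": "Madrid",
    "NO_2": np.linspace(30.0, 50.0, len(days)),
})

# Hypothetical yearly population table (placeholder numbers).
population = pd.DataFrame({
    "city": ["Madrid", "Madrid"],
    "year": [2017, 2018],
    "population": [1000, 1100],
})

# Aggregate the daily readings to one mean value per city and year.
yearly = (
    daily.assign(year=daily["date"].dt.year)
         .groupby(["city", "year"], as_index=False)["NO_2"]
         .mean()
)

# Join on the shared keys so each row holds pollution and population together.
aligned = yearly.merge(population, on=["city", "year"], how="inner")
print(aligned)
```

A yearly mean is only one possible aggregate; a median or a count of days above a threshold would fit the same pattern.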

2. Folders and Files

  • FOLDERS:

    • Hong Kong Data:

      • HK_pollution_data: all .csv files with the historical pollution data of Hong Kong
      • HK pollution index.ipynb: notebook where the pollution .csv files are merged into a single df, cleaned and formatted
      • Hong Kong Population.ipynb: notebook where Hong Kong's population is web-scraped
    • pollution_madrid: all .csv files with the historical pollution data of Madrid

    • pickles:

      • berlin_pollution: exported pkl with cleaned berlin pollution
      • berlin_population: exported pkl with cleaned berlin population
      • combined_dataframe: pkl with a single df that concatenates merged_pollution and merged_population
      • hk_pollution: exported pkl with cleaned hong kong pollution
      • hk_population: exported pkl with cleaned hong kong population
      • madrid_pollution: exported pkl with cleaned madrid pollution
      • madrid_population: exported pkl with cleaned madrid population
      • merged_pollution: pkl with a single df that concatenates berlin_pollution, madrid_pollution and hk_pollution
      • merged_population: pkl with a single df that concatenates berlin_population, madrid_population and hk_population
      • transport_km: pkl with a single df of cleaned public transport lengths for Madrid, Berlin and Hong Kong
  • FILES:

    • .gitignore
    • DS_Store
    • Berlin-Data: notebook that automatically downloads, opens and cleans data about Berlin's pollution from a government website
    • DataFrame_Pollution_population: notebook that imports merged_pollution, merged_population and transport_km and concatenates them into a single df
    • Live_poll_traff_data: program that, given a city, returns live data about its traffic and air quality
    • Merged pollution: notebook that imports berlin_pollution, madrid_pollution and hk_pollution and concatenates them into a single df
    • Merged population: notebook that imports berlin_population, madrid_population and hk_population and concatenates them into a single df
    • Plotting_Pollution: notebook that imports merged_pollution and plots the evolution of the pollutants over time for each of the three cities
    • Plotting_all: notebook that imports combined_dataframe and plots graphs to look for correlations
    • Plotting_population: notebook that imports combined_dataframe and plots the evolution of population over time
    • README.md: you are reading me! ;)
    • pollution_mad: notebook where the pollution .csv files are merged into a single df, cleaned and formatted
    • population_berlin: notebook where Berlin's population is obtained through an API, cleaned and formatted into a single df
    • population_madrid: notebook where Madrid's population is web-scraped, cleaned and formatted into a single df
    • public_transport_cities: notebook where public transport information for the three cities is web-scraped, cleaned and formatted into a single df
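
The merge notebooks listed above load the per-city pickles and concatenate them into one df. A minimal sketch of that pattern is shown here; the `.pkl` file extension, the helper name `merge_pickles` and the column names are assumptions for illustration, and the data written to the temporary folder is synthetic so the example runs without the real files.

```python
import tempfile
from pathlib import Path

import pandas as pd

# File names follow the pickles folder listing; the .pkl extension and
# the column names are assumptions made for this sketch.
CITY_FILES = ["berlin_pollution.pkl", "madrid_pollution.pkl", "hk_pollution.pkl"]


def merge_pickles(folder, filenames):
    """Load each pickled DataFrame and stack them into one frame."""
    frames = [pd.read_pickle(Path(folder) / name) for name in filenames]
    return pd.concat(frames, ignore_index=True)


with tempfile.TemporaryDirectory() as tmp:
    # Write tiny synthetic stand-ins so the sketch runs without the real data.
    for name in CITY_FILES:
        city = name.split("_")[0]
        pd.DataFrame({
            "city": [city, city],
            "date": pd.to_datetime(["2019-01-01", "2019-01-02"]),
            "NO_2": [40.0, 42.0],
        }).to_pickle(Path(tmp) / name)

    merged_pollution = merge_pickles(tmp, CITY_FILES)

print(merged_pollution.shape)
```

Keeping a `city` column in every per-city frame is what lets the stacked df still be grouped or filtered by city afterwards.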

3. Results

The analyses show:

  • There has been an increase in population in the three selected cities
  • The concentration of the pollutants analyzed is higher in Hong Kong than in Madrid and Berlin
  • For each pollutant in each city, there is seasonality related to the pollutant's emission
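
One simple way to check for seasonality in a daily pollutant series is to average the readings by calendar month and compare the months. The sketch below illustrates the idea on a synthetic series with a built-in yearly cycle; it is an assumption about how such a check could look, not the method used in the plotting notebooks.

```python
import numpy as np
import pandas as pd

# Synthetic daily series with a built-in yearly cycle (not project data).
days = pd.date_range("2016-01-01", "2018-12-31", freq="D")
day_of_year = days.dayofyear.to_numpy()
cycle = 10.0 * np.cos(2 * np.pi * day_of_year / 365.25)
series = pd.Series(40.0 + cycle, index=days, name="NO_2")

# Average all readings that fall in the same calendar month across years.
monthly_profile = series.groupby(series.index.month).mean()

# Peak-to-trough spread as a rough indicator of how strong the seasonality is.
seasonal_range = monthly_profile.max() - monthly_profile.min()

print(monthly_profile.round(1))
print(round(seasonal_range, 1))
```

A flat monthly profile would suggest little seasonality, while a large spread between winter and summer months points to a yearly cycle.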

Considerations: Please do not take these results as fully representative, given the complexity of the topic and the limited amount of data we could work with and analyze together.

The presentation can be found here: https://docs.google.com/presentation/d/1uDoccWTmvSmVFLjEgQGNZ63esCxVR9VeySCDRvy5U4w/edit?usp=sharing

4. Python libraries used:

  • Pandas
  • Numpy
  • requests
  • json
  • BeautifulSoup
  • re
  • matplotlib

5. Sources for the information collected:

6. Environment dependencies:

  • appnope==0.1.0
  • asn1crypto==1.2.0
  • attrs==19.2.0
  • backcall==0.1.0
  • beautifulsoup4==4.8.1
  • bleach==3.1.0
  • certifi==2019.9.11
  • cffi==1.13.0
  • chardet==3.0.4
  • colorama==0.4.1
  • cryptography==2.8
  • cycler==0.10.0
  • decorator==4.4.0
  • defusedxml==0.6.0
  • entrypoints==0.3
  • idna==2.8
  • ipykernel==5.1.2
  • ipython==7.8.0
  • ipython-genutils==0.2.0
  • jedi==0.15.1
  • Jinja2==2.10.3
  • joblib==0.13.2
  • jsonschema==3.0.2
  • jupyter-client==5.3.3
  • jupyter-core==4.5.0
  • kiwisolver==1.1.0
  • langdetect==1.0.7
  • MarkupSafe==1.1.1
  • matplotlib==3.1.1
  • mistune==0.8.4
  • mkl-fft==1.0.14
  • mkl-random==1.1.0
  • mkl-service==2.3.0
  • nbconvert==5.6.0
  • nbformat==4.4.0
  • notebook==6.0.1
  • numpy==1.17.2
  • pandas==0.25.1
  • pandocfilters==1.4.2
  • parso==0.5.1
  • pexpect==4.7.0
  • pickleshare==0.7.5
  • prometheus-client==0.7.1
  • prompt-toolkit==2.0.10
  • ptyprocess==0.6.0
  • pycountry==19.8.18
  • pycparser==2.19
  • Pygments==2.4.2
  • PyMySQL==0.9.3
  • pyOpenSSL==19.0.0
  • pyparsing==2.4.2
  • pyrsistent==0.15.4
  • PySocks==1.7.1
  • python-dateutil==2.8.0
  • pytz==2019.3
  • pyzmq==18.1.0
  • requests==2.22.0
  • scikit-learn==0.21.3
  • scipy==1.3.1
  • Send2Trash==1.5.0
  • six==1.12.0
  • soupsieve==1.9.3
  • SQLAlchemy==1.3.10
  • terminado==0.8.2
  • testpath==0.4.2
  • tornado==6.0.3
  • traitlets==4.3.3
  • urllib3==1.24.2
  • wcwidth==0.1.7
  • webencodings==0.5.1
  • xlrd==1.2.0
