
Data Science project that aims to examine the links between respiratory health and pollution sources.


drbulu/healthyAir_DSc_proj


title: "Healthy Air: Project README"
output:
  html_document:
    toc: true
    toc_depth: 4

Healthy Air: Project README

Introduction

About

This is a data science project to investigate how respiratory health changes over time and how it relates to the types and quantities of pollutants produced. Also of interest is the impact of other factors that may define or influence this relationship.

Background summary:

Respiratory ailments such as asthma are an important long-term public health concern (some background here). Asthma has been linked to a number of factors, including particulates (a, b), extreme weather events and even economic status.

Hypothesis:

The basic hypotheses that guide this project are that:

  • Respiratory health in a given region varies over time and is influenced by the production of certain pollutants.

  • The quantities of these pollutants are in turn linked to particular economic activities.

  • The relationship between respiratory health and pollution is affected by other underlying elements such as meteorological and demographic factors.

Project Structure

Overview

These hypotheses will be evaluated through the analysis of potentially useful open data, from which important insights and relationships can be extracted and utilised. The following general framework groups the activities that will be used to investigate the hypotheses, communicate the results and leverage insights gained from the analysis:

  1. Raw data acquisition and preparation

  2. Exploratory data analysis

  3. Statistical analysis and modelling

  4. Predictive modelling and machine learning

  5. The development of reports and other data products

The following sections contain links to project documents pertaining to each part of the framework:

1: Data preparation

  1. Asthma Data:

    This dataset captures the prevalence of asthma over time in the US, stratified by region (state) and by a number of other potentially interesting groups. This is the quantity (response variable) that we are interested in predicting in the context of other factors.

    • Data preparation strategy overview.
    • Data preparation implementation overview. Updated based on the preliminary data analysis below.
  2. Traffic Data:

    This data measures rural and urban traffic volumes (in millions of vehicle miles) and is also stratified by region.

  3. Pollution Data:

    This dataset represents trends in the emission of seven pollutants by different activities across different US states over time.
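
Since all three datasets are described as being organised by state over time, one plausible way to bring them together is to join them on a shared (state, year) key. The following is a minimal sketch under that assumption, not the project's actual preparation code: the column names (`state`, `year`, `asthma_prevalence`, `rural_vmt`, `urban_vmt`, `pollutant`, `emissions`), the placeholder state codes and all values are hypothetical, as is the choice to sum emissions over activities.

```python
# Hypothetical, made-up records; the real column names and values may differ.
asthma = [
    {"state": "AA", "year": 2001, "asthma_prevalence": 8.0},
    {"state": "BB", "year": 2001, "asthma_prevalence": 9.5},
]
traffic = [
    {"state": "AA", "year": 2001, "rural_vmt": 100.0, "urban_vmt": 250.0},
]
pollution = [
    {"state": "AA", "year": 2001, "pollutant": "P1", "activity": "industry", "emissions": 12.0},
    {"state": "AA", "year": 2001, "pollutant": "P1", "activity": "transport", "emissions": 3.0},
    {"state": "AA", "year": 2001, "pollutant": "P2", "activity": "industry", "emissions": 5.0},
    {"state": "BB", "year": 2001, "pollutant": "P1", "activity": "industry", "emissions": 7.0},
]


def combine(asthma_rows, traffic_rows, pollution_rows):
    """Join the three tables on (state, year), keeping only keys present in all."""
    traffic_by_key = {(r["state"], r["year"]): r for r in traffic_rows}

    # Assumption: emissions are summed over activities, per pollutant.
    emissions_by_key = {}
    for row in pollution_rows:
        key = (row["state"], row["year"])
        totals = emissions_by_key.setdefault(key, {})
        totals[row["pollutant"]] = totals.get(row["pollutant"], 0.0) + row["emissions"]

    combined = []
    for row in asthma_rows:
        key = (row["state"], row["year"])
        if key not in traffic_by_key or key not in emissions_by_key:
            continue  # skip keys that are missing from either predictor table
        merged = dict(row)
        for name, value in traffic_by_key[key].items():
            if name not in ("state", "year"):
                merged[name] = value
        for pollutant, total in emissions_by_key[key].items():
            merged["emissions_" + pollutant] = total
        combined.append(merged)
    return combined


combined = combine(asthma, traffic, pollution)
```

Here the hypothetical state `BB` is dropped because it has no traffic record. Whether to drop, keep or impute such rows is a decision the actual data preparation would need to make.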

2: Exploratory data analysis

A. Preliminary

Exploratory analysis using graphs and other visualisation tools is exciting and insightful. However, sometimes we need to perform the comparatively boring task of checking the success and completeness of our data preparation.

Therefore, before we get to the exciting task of constructing exploratory visualisations, we need to check how complete the data preparation is thus far. This will enable us to read in our data correctly prior to subsequent analysis, and will help to highlight any quirks to beware of or any further processing that might be required prior to analysis.

Conceivably, the results of this stage of the analysis could be fed back into the data preparation step in order to implement further refinements as required.
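
As an illustration of what such a completeness check might involve, the sketch below counts missing values per column and lists the (state, year) combinations that are absent from a prepared table. It is only an assumed approach: the function names, column names, the set of markers treated as missing and the sample rows are all hypothetical.

```python
MISSING_MARKERS = ("", "NA")  # assumed markers for missing values


def missing_counts(rows, columns):
    """Count, per column, the rows where the value is absent or a missing marker."""
    counts = {column: 0 for column in columns}
    for row in rows:
        for column in columns:
            value = row.get(column)
            if value is None or (isinstance(value, str) and value.strip() in MISSING_MARKERS):
                counts[column] += 1
    return counts


def missing_keys(rows, states, years):
    """List the expected (state, year) pairs that have no row in the table."""
    present = {(row.get("state"), row.get("year")) for row in rows}
    return sorted((s, y) for s in states for y in years if (s, y) not in present)


# Hypothetical, made-up prepared rows.
prepared = [
    {"state": "AA", "year": 2001, "asthma_prevalence": 8.0},
    {"state": "AA", "year": 2002, "asthma_prevalence": "NA"},
    {"state": "BB", "year": 2001, "asthma_prevalence": None},
]

counts = missing_counts(prepared, ["state", "year", "asthma_prevalence"])
gaps = missing_keys(prepared, ["AA", "BB"], [2001, 2002])
```

Findings of this kind (columns with many missing values, or states and years with no rows) are the sort of result that could be fed back into the data preparation step.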

  1. Asthma Data:
  2. Traffic Data:
    • Analysis in progress...
  3. Pollution Data:
    • Analysis in progress...

B. In depth

3: Statistical analysis and data modelling

4: Predictive modelling and machine learning

5: Data products
