Healthy Air: Project README

This is a data science project to investigate how respiratory health evolves and its relationship with the type and production of various pollutants. Also of interest is the impact of other factors that may define or influence this relationship.
Respiratory ailments such as asthma constitute an important long-term public health concern (some background here). Asthma has been linked to a number of factors, including particulates (a, b), extreme weather events and even economic status.
The basic hypotheses that guide this project are that:
- Respiratory health in a given region varies over time and is influenced by the production of certain pollutants.
- The quantities of these pollutants are in turn linked to particular economic activities.
- The relationship between respiratory health and pollution is affected by other underlying elements such as meteorological and demographic factors.
These hypotheses will be evaluated through judicious analysis of potentially useful open data, from which important insights and relationships can be extracted and utilised. The following general framework represents the different groups of activities that will be used to investigate the hypotheses, communicate the results and leverage insights gained from the analysis:
- Raw data acquisition and preparation
- Exploratory data analysis
- Statistical analysis and modelling
- Prediction modelling and machine learning
- The development of reports and other data products
The following sections contain links to project documents pertaining to each part of the framework:
- Asthma Data:
  This dataset captures the prevalence of asthma in the US over time, stratified by region (state) and by a number of potentially interesting groups. This is the quantity (response variable) that we are interested in predicting in the context of other factors.
  - Data preparation strategy overview.
  - Data preparation implementation overview. Updated based on preliminary data analysis below.
- Traffic Data:
  This dataset measures rural and urban traffic volumes (in millions of vehicle miles) and is also stratified by region.
  - Data preparation strategy and implementation overview.
- Pollution Data:
  This dataset represents the trends in the emission of seven pollutants by different activities across different states in the US over time.
  - Data preparation strategy and implementation overview.
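
Since all three datasets are described as stratified by region, one plausible way to bring them together for later analysis is to join them on a shared region and time key. The following is a minimal sketch in Python using only the standard library, not the project's actual implementation. The function name `join_by_state_year`, the column names (`state`, `year`, `asthma_prevalence`, `traffic_volume`, `emissions`) and all values are hypothetical placeholders chosen for illustration.

```python
def join_by_state_year(*tables):
    """Inner-join lists of row dicts on a (state, year) key.

    The 'state' and 'year' column names are assumptions made for this
    sketch; the real prepared files may use different names.
    """
    # Index each table by its (state, year) key.
    keyed = [
        {(row["state"], int(row["year"])): row for row in table}
        for table in tables
    ]
    if not keyed:
        return []

    # Keep only keys that are present in every table.
    shared = set(keyed[0])
    for index in keyed[1:]:
        shared &= set(index)

    merged = []
    for state, year in sorted(shared):
        combined = {"state": state, "year": year}
        for index in keyed:
            for column, value in index[(state, year)].items():
                if column not in ("state", "year"):
                    combined[column] = value
        merged.append(combined)
    return merged


# Toy rows with made-up placeholder values, not real measurements.
asthma = [
    {"state": "State A", "year": 2010, "asthma_prevalence": 1.0},
    {"state": "State B", "year": 2010, "asthma_prevalence": 2.0},
]
traffic = [
    {"state": "State A", "year": 2010, "traffic_volume": 10.0},
]
pollution = [
    {"state": "State A", "year": 2010, "emissions": 100.0},
    {"state": "State B", "year": 2010, "emissions": 200.0},
]

merged = join_by_state_year(asthma, traffic, pollution)
```

An inner join is used here so that only region and year combinations present in every dataset survive. Whether that is appropriate depends on how much coverage the real datasets share.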
Exploratory analysis using graphs and other visualisation tools is quite exciting and insightful. However, sometimes we need to perform the comparatively boring task of checking the success and completeness of our data preparation.
Therefore, before we get to the exciting task of constructing exploratory visualisations, we need to check how complete the data preparation is thus far. This will enable us to read in our data correctly prior to subsequent analysis, and will help to highlight any quirks to beware of or any further processing that might be required prior to analysis.
Conceivably, the results of this stage of the analysis could be fed back into the data preparation step in order to implement further refinements as required.
- Asthma Data:
- Preliminary data analysis 01: First look at processed asthma data sets.
- Preliminary data analysis 02: Analysis of the impact of data processing improvements.
- Graphical data exploration: Initial examination of broad trends in aggregate data.
- Traffic Data:
- Analysis in progress...
- Pollution Data:
- Analysis in progress...
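
The completeness check described above could take a form along the following lines. This is a minimal sketch in Python using only the standard library, not the project's actual code. The function name `completeness_report`, the set of missing-value tokens, the key columns (`state`, `year`) and the example rows are all assumptions made for illustration.

```python
from collections import Counter

# Tokens treated as "missing"; an assumption, adjust to the real files.
MISSING_TOKENS = {"", "NA", "NaN"}


def completeness_report(rows, key_cols=("state", "year")):
    """Summarise missing values and duplicated keys in a list of row dicts."""
    # Collect every column name that appears in any row.
    columns = []
    for row in rows:
        for column in row:
            if column not in columns:
                columns.append(column)

    missing = Counter()
    seen_keys = Counter()
    for row in rows:
        for column in columns:
            value = row.get(column)
            if value is None or str(value).strip() in MISSING_TOKENS:
                missing[column] += 1
        seen_keys[tuple(row.get(c) for c in key_cols)] += 1

    return {
        "n_rows": len(rows),
        "missing": dict(missing),
        "duplicate_keys": [key for key, n in seen_keys.items() if n > 1],
    }


# Toy rows with made-up placeholder values, not real measurements.
rows = [
    {"state": "State A", "year": "2010", "prevalence": "1.0"},
    {"state": "State A", "year": "2011", "prevalence": "NA"},
    {"state": "State A", "year": "2011", "prevalence": "2.0"},
]

report = completeness_report(rows)
print(report)
```

A report like this highlights columns with gaps and keys that appear more than once, which are the kinds of quirks that might be fed back into the data preparation step.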