leadalert

http://leadalert.io/

Lead Alert public data repository

Created and Maintained by:

Background

Lead has been known for over 50 years to be harmful to humans, even in small doses. Exposure to even low levels of lead can result in damage to the central and peripheral nervous system, learning disabilities, impaired function of blood cells, stunted growth, cardiovascular effects, and many other problems. Despite regulations restricting lead in building materials, children are still experiencing lead poisoning at alarming rates. While the Flint water crisis has brought renewed attention to this issue, investigations have shown that many communities across the country have rates of childhood lead poisoning exceeding those of Flint. The persistence of this major public health problem and the creation of new relevant datasets create an opportunity to apply new thinking and techniques to solve it.

The creation of new datasets gives us a chance to redefine how infrastructure improvements are prioritized. Prior studies have largely focused on a single city: Flint, Michigan. In this paper, we attempt to use machine learning to predict the areas of California where communities are at highest risk of lead exposure.

This project was undertaken as part of W210: Synthetic Capstone within the Master of Information and Data Science program at the UC Berkeley School of Information. The project was conceived as a product delivered to water system managers throughout California via a website, LeadAlert.io. The website contains additional results and descriptions of the work, as well as visualizations of the data used. The public GitHub repository, https://github.com/RobMulla/leadalert, contains the collection of aggregated datasets, source code, modeling artifacts, and associated reference material.

Repo Structure

  • /data: Contains the data sources, in TSV format, used during the analysis.
  • /notebooks: Example code for our models, including XGBoost and SMOTE oversampling.
  • /plots: Graph visualizations from the exploratory data analysis.
  • /presentation: Our final presentation.
  • /techreport: PDF file of our final technical report.
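
For readers new to the oversampling step mentioned under /notebooks, the following is a minimal sketch of the idea behind SMOTE applied to tab-separated data. It is not the code from the notebooks: it uses only the Python standard library, the column names and values are invented for illustration, and the in-memory string stands in for a TSV file under /data. The actual notebooks may rely on dedicated libraries instead.

```python
import csv
import io
import random


def read_tsv(handle):
    """Parse tab-separated text into a list of dict rows."""
    return list(csv.DictReader(handle, delimiter="\t"))


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (the core SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Hypothetical stand-in for a file in /data; columns and values are made up.
SAMPLE_TSV = (
    "pct_old_housing\tmedian_income\thigh_risk\n"
    "0.10\t90\t0\n"
    "0.15\t85\t0\n"
    "0.20\t80\t0\n"
    "0.25\t75\t0\n"
    "0.30\t70\t0\n"
    "0.35\t72\t0\n"
    "0.80\t35\t1\n"
    "0.85\t30\t1\n"
    "0.90\t28\t1\n"
)

rows = read_tsv(io.StringIO(SAMPLE_TSV))
features = [(float(r["pct_old_housing"]), float(r["median_income"])) for r in rows]
labels = [int(r["high_risk"]) for r in rows]

minority = [x for x, y in zip(features, labels) if y == 1]
majority = [x for x, y in zip(features, labels) if y == 0]

# Generate enough synthetic minority points to balance the two classes.
n_new = len(majority) - len(minority)
synthetic = smote_like(minority, n_new)
balanced_minority = minority + synthetic
```

Because each synthetic point lies on a line segment between two real minority samples, the oversampled class stays within the range of the observed high-risk examples rather than duplicating them exactly. In practice, oversampling of this kind is applied to the training split only, so that evaluation data contains no synthetic rows.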