Google Summer of Code 2018

Danny Wilson edited this page Jan 31, 2018 · 19 revisions

About SUMSarizer

SUMSarizer makes it easier to measure cookstove adoption. About 40% of the global population relies on burning traditional fires to cook their food. The inhaled smoke from these fires will kill about 3-4 million people in 2018 -- more than AIDS, malaria, and tuberculosis combined. That makes traditional cooking one of today's greatest public health crises. Improved cookstoves, which burn traditional fuels like wood and charcoal but reduce smoke emissions, have been promoted by many governments and organizations as a tool to help alleviate this crisis.

The governments and organizations that promote these cookstoves want to make sure that improved cookstoves are widely adopted. To measure adoption, researchers deploy stove use monitoring systems (SUMS), which are sensors that record long-term data about the temperature of the cookstoves. The data from these sensors is a time series of temperatures and timestamps that can span weeks or years.

Stakeholders want to reduce the millions or billions of temperature readings into individual cooking events. Spikes in temperature relate to cooking, but often in complex ways. Many papers have tried to give explicit algorithmic definitions of cooking, but there are always caveats, so these explicit algorithms almost always fall short of accurately identifying cooking events. Take, for example, the chart below, which shows cooking (red) and not cooking (black). Defining an explicit algorithm that reliably predicts cooking events across a wide variety of stove designs and cooking techniques is very difficult. For example, in some cultural circumstances, a monotonic reduction in temperature over a 30-minute period signifies that the stove is no longer being used, but the same data in another culture may actually represent a cooking mode (e.g. very low-power simmering).

Cooking labels
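To make the difficulty concrete, here is a minimal Python sketch of one explicit rule of the kind described above: flag a stove as "no longer in use" when the temperature never rises over the trailing 30 minutes. The function name `is_monotonic_cooldown` and the `(timestamp, temperature)` tuple layout are assumptions made for illustration; this is not a rule that SUMSarizer itself applies.

```python
from datetime import datetime, timedelta


def is_monotonic_cooldown(readings, window=timedelta(minutes=30)):
    """Return True if temperature never rises across the trailing window.

    `readings` is a list of (timestamp, temperature) tuples sorted by time
    (a hypothetical layout chosen for this sketch).
    """
    if not readings:
        return False
    # Keep only the samples that fall inside the trailing window.
    cutoff = readings[-1][0] - window
    recent = [temp for ts, temp in readings if ts >= cutoff]
    if len(recent) < 2:
        return False
    # Monotonic reduction: every sample is <= the one before it.
    return all(later <= earlier for earlier, later in zip(recent, recent[1:]))


start = datetime(2018, 1, 1, 12, 0)
cooling = [
    (start + timedelta(minutes=10 * i), temp)
    for i, temp in enumerate([80.0, 70.0, 60.0, 55.0])
]
print(is_monotonic_cooldown(cooling))
```

The rule returns the same answer for a stove that has been switched off and for one that is simmering at very low power, which is exactly the ambiguity that a fixed rule cannot resolve without context-specific labels.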

SUMSarizer is a tool that simplifies turning raw SUMS temperature data into cooking events. SUMSarizer comprises a GUI and a Super Learner-powered analysis that let coding-naive users apply supervised ensemble machine learning to the cookstove event detection problem. Users "brush" a small subset of the time series data to label what they believe to be cooking, then SUMSarizer uses those labels as a training set to label the remainder of the dataset. This system is useful for generating powerful context-specific models for identifying cooking events across a variety of cultural and sensor idiosyncrasies.

Contacting SUMSarizer

You can contact SUMSarizer's team through our forum here. For more general questions about DIAL's Google Summer of Code program, ask in this general forum. The best way to contact the SUMSarizer team is via email. The SUMSarizer mentors are located in the Pacific Standard Time zone.

Getting Started as an Applicant

If you are interested in working on SUMSarizer, please contact us. To get started developing SUMSarizer, clone our develop repository. SUMSarizer uses the Flask framework, but if you're more interested in improving the machine learning, we use R for that.

Writing your GSoC application

Please use the GSoC website to submit your application to the DIAL Open Source Center, and remember to reference SUMSarizer in the title of your application!

2018 Project Ideas

Here are some ideas for the 2018 GSoC.

Import arbitrary time series data into SUMSarizer

Skills: Flask, Python, Postgres

Difficulty: easy

Contact: Danny Wilson

Description: Currently SUMSarizer is only set up to import data from one particular, very common model of SUMS (the Maxim iButton). We would like to improve the data importing interface to ingest data from other common SUMS models and arbitrary time series data. This involves building a library of parsers for files (usually CSVs), selected by SUMS make and model, then importing the parsed data into the SUMSarizer database.
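One possible shape for such a parser library, sketched in Python under stated assumptions: the registry `PARSERS`, the decorator `register_parser`, the model key `generic_csv`, and the `timestamp`/`temperature` column names are all hypothetical and are not taken from the SUMSarizer code base or from any real sensor's file format.

```python
import csv
import io
from datetime import datetime

# Hypothetical registry mapping a SUMS make/model key to a parser function.
PARSERS = {}


def register_parser(model):
    """Decorator that registers a parser function under a model key."""
    def decorator(func):
        PARSERS[model] = func
        return func
    return decorator


@register_parser("generic_csv")
def parse_generic_csv(text):
    """Parse CSV text with `timestamp` and `temperature` columns (assumed layout)."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        (datetime.fromisoformat(row["timestamp"]), float(row["temperature"]))
        for row in reader
    ]


def parse_sums_file(model, text):
    """Dispatch to the parser registered for `model`, or fail with a clear error."""
    try:
        parser = PARSERS[model]
    except KeyError:
        raise ValueError(f"No parser registered for SUMS model {model!r}") from None
    return parser(text)


sample = "timestamp,temperature\n2018-01-31T08:00:00,24.5\n2018-01-31T08:10:00,61.0\n"
for timestamp, temperature in parse_sums_file("generic_csv", sample):
    print(timestamp.isoformat(), temperature)
```

With a registry like this, supporting a new sensor would mean adding one decorated function, while the import step only ever sees a uniform list of `(timestamp, temperature)` pairs.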

Output summary files and figures of cooking events in addition to point-wise predictions

Skills: Flask, Python, Postgres, D3, R

Difficulty: moderate

Contact: Danny Wilson

Description: SUMSarizer outputs the probability that any particular temperature sample represents cooking. However, many users of SUMSarizer want summary tables that show the cookstove, start time, and duration of all runs of contiguous positive classifications ("events"). If you work on this project, you will create summary tables and plots that make SUMSarizer's outputs more useful to non-technical stakeholders.
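As a rough illustration of the summary described above, the following Python sketch collapses point-wise probabilities into one row per event. The function name `summarize_events`, the 0.5 threshold, and the choice to measure duration from the first to the last positive sample are assumptions for this example, not SUMSarizer's existing behavior.

```python
from datetime import datetime, timedelta
from itertools import groupby


def summarize_events(stove_id, samples, threshold=0.5):
    """Collapse runs of contiguous positive classifications into events.

    `samples` is a list of (timestamp, probability) tuples sorted by time.
    A sample counts as cooking when its probability is >= `threshold`.
    """
    labeled = [(ts, prob >= threshold) for ts, prob in samples]
    events = []
    # groupby yields one group per run of consecutive equal labels.
    for is_cooking, run in groupby(labeled, key=lambda item: item[1]):
        if not is_cooking:
            continue
        timestamps = [ts for ts, _ in run]
        events.append({
            "stove": stove_id,
            "start": timestamps[0],
            # Duration spans first to last positive sample in the run.
            "duration": timestamps[-1] - timestamps[0],
        })
    return events


start = datetime(2018, 1, 31, 8, 0)
probabilities = [0.1, 0.8, 0.9, 0.2, 0.7]
samples = [
    (start + timedelta(minutes=10 * i), prob)
    for i, prob in enumerate(probabilities)
]
for event in summarize_events("stove-1", samples):
    print(event["stove"], event["start"].isoformat(), event["duration"])
```

Under this definition a single-sample event has zero duration; a real implementation would need to decide whether to add one sampling interval to each event or to drop very short runs.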

Implement Super Learner 3 in SUMSarizer

Skills: R, Python, Postgres

Difficulty: difficult

Contact: Jeremy Coyle

Description: SUMSarizer uses an older implementation of Super Learner that does not accommodate multi-stage learning very well. For example, Stage 1 learning establishes the probability that any individual temperature observation is cooking without regard to the status of its neighbors, then Stage 2 aggregates islands of observations into the probability of a multi-sample event using information from Stage 1. Super Learner 3 implements this kind of multi-stage learning explicitly.
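The hand-off between the two stages can be pictured with a small Python sketch. It assumes Stage 1 has already produced a probability per sample, and it builds per-island features that a Stage 2 learner could consume. The function `island_features`, the 0.5 threshold, and the chosen features are hypothetical; this is not the Super Learner 3 API, and no second-stage model is fitted here.

```python
from itertools import groupby
from statistics import mean


def island_features(stage1_probs, threshold=0.5):
    """Group consecutive above-threshold samples into islands and describe each.

    `stage1_probs` is a list of per-sample cooking probabilities from Stage 1.
    Returns one feature dict per island, as candidate inputs for Stage 2.
    """
    features = []
    index = 0  # position of the current run's first sample in the series
    for above, run in groupby(stage1_probs, key=lambda p: p >= threshold):
        probs = list(run)
        if above:
            features.append({
                "start_index": index,
                "n_samples": len(probs),
                "mean_prob": mean(probs),
                "max_prob": max(probs),
            })
        index += len(probs)
    return features


stage1 = [0.1, 0.75, 0.5, 0.2, 0.9]
for island in island_features(stage1):
    print(island)
```

A Stage 2 model would then be trained on rows like these, with labels at the event level, so that information about neighboring samples enters the prediction.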
