<div align="right">Python 2.7 Jupyter Notebook</div>

# Rapid Measurement of the Sustainable Development Goals with Satellite Imagery and Machine-Learning
<br>




## About this notebook

> This notebook describes a methodology for combining satellite imagery and household surveys to estimate selected socio-economic indicators used to monitor progress towards the Sustainable Development Goals (SDGs).

The first aim is to compute an indicator related to SDG 1.1:

“By 2030, eradicate extreme poverty for all people everywhere, currently measured as people living on less than $1.25 a day”. 

## Benefits

The benefit of this project is to be able to estimate a socio-economic indicator at locations where household surveys have not been conducted or are outdated.


## Methodology

The project is inspired by, and seeks to adapt, the methodology outlined in the article "Combining satellite imagery and machine learning to predict poverty", published in 2016 by Neal Jean, Marshall Burke, Michael Xie, W. Matthew Davis, David B. Lobell, and Stefano Ermon.

See their work at: http://sustain.stanford.edu/predicting-poverty/ 

and some of their code at: https://github.com/nealjean/predicting-poverty


## Expected Outcomes
The outcome of the project is intended to be a documented methodology and open source code to compute selected SDG indicators. 

It is expected that the methodology will continue to be refined and extended to other indicators in an open and collaborative manner by academia, the private sector, governments, and the open source community.



## Introduction

We know little about poor countries and even less about the poor people who live in them. As a result, the econometric models used for testing hypotheses are often far more sophisticated than the quality of the underlying data justifies.

We need a more detailed characterization of households in order to target socio-economic policies for achieving the Sustainable Development Goals by 2030, as agreed upon by the international community.

The status and progress of households are mostly assessed at coarse scales, often nationally or across large administrative regions. This is insufficient to target and monitor policy interventions in a way that maximizes their effectiveness.

####  1. Background


- A common methodology for measuring the socio-economic situation on the ground is to conduct household surveys. However, household surveys are designed to be representative at large geographic scales. They are also expensive to conduct.

- As a result, measures of poverty and inequality are limited by poor data availability, and by limitations of the survey itself (poor recollection, errors in estimating non-monetary income, reticence at reporting income, among others). 

- The time it takes to collect information also creates problems: samples are collected over days, weeks, or sometimes months, yet they cannot track seasonal fluctuations in monetary poverty that result, for example, from agricultural cycles, seasonal construction work, or the inability of poor households to smooth income over a given year.

- At the same time, there are a large number of datasets that track different aspects of importance for development policy. 

- However, matching various data sources requires a lot of careful work on the underlying survey designs, variable definitions, and other aspects.

####  2. Research Questions


Phase 1: Can we estimate the DHS Wealth Index using a satellite image for a location where a survey has not been conducted?

Phase 2: What other socio-economic indicators can be estimated using a similar methodology?

Phase 3: Can the above be done with alternative data sources such as mobile data, social network data, or imagery from unmanned aerial vehicles?

Phase 4: Can we compute indicators frequently (e.g. weekly, monthly, etc.)?

####  3. Hypotheses



Hypothesis 1: For a given location, the DHS Wealth Index can be associated with features appearing on a digital satellite image of that location.

Hypothesis 2: The model describing the association can be applied to other locations.

Note: Consider conditions and characteristics that differ between locations, e.g. latitude, longitude, altitude, development status, and architectural style. These could make a model suitable for similar countries but not applicable to others.
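To make "features appearing on a digital satellite image" concrete, the sketch below turns an image tile into a fixed-length feature vector by building a normalized histogram for each colour channel. It is a deliberately simple stand-in and not the feature extraction used in the cited article. The function name `color_histogram`, the `bins` parameter, and the tiny hand-written `tile` are all illustrative assumptions. Note that a histogram captures colours but discards pixel positions.

```python
def color_histogram(image, bins=4):
    """Return a normalized per-channel histogram as one flat feature vector.

    `image` is a list of rows, each row a list of (r, g, b) tuples with
    values in 0..255. The result has 3 * bins entries, and the entries
    belonging to each channel sum to 1.0.
    """
    counts = [[0] * bins for _ in range(3)]
    n_pixels = 0
    for row in image:
        for pixel in row:
            n_pixels += 1
            for channel, value in enumerate(pixel):
                # Map 0..255 onto bin indices 0..bins-1.
                counts[channel][min(value * bins // 256, bins - 1)] += 1
    return [c / float(n_pixels) for channel in counts for c in channel]


# Hypothetical 2x2 tile used only to show the shape of the output.
tile = [
    [(0, 0, 0), (255, 255, 255)],
    [(10, 200, 90), (250, 130, 60)],
]
features = color_histogram(tile)
```

Each location then contributes one such vector, which can be paired with the survey value for that location when fitting a model.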

####  4. Testable predictions


To test the predictions made with this methodology:

- We will identify a country or territory where both a good household survey and good satellite imagery are available.

- We will split the dataset into a 70% training set and a 30% test set.

- For a given variable, e.g. the DHS Wealth Index, we will use machine learning to find the model that best relates the survey data to the satellite image (e.g. based on a vector describing the colors and positions of the pixels in an image).

- The above modelling will be carried out for various test countries.
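The split-and-fit steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's model: the data are synthetic, each survey cluster is reduced to a single hypothetical image feature (`brightness`), and ordinary least squares stands in for whichever model turns out to work best. All function and variable names are illustrative.

```python
import random


def train_test_split(records, train_fraction=0.7, seed=0):
    """Shuffle a copy of `records` and split it into training and test lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def fit_line(pairs):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = float(len(pairs))
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(pairs, intercept, slope):
    """Coefficient of determination of the fitted line on `pairs`."""
    n = float(len(pairs))
    mean_y = sum(y for _, y in pairs) / n
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in pairs)
    ss_tot = sum((y - mean_y) ** 2 for _, y in pairs)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in for (image feature, survey wealth index) pairs.
# Assumption: real data would come from the survey and imagery sources.
rng = random.Random(42)
clusters = []
for _ in range(100):
    brightness = rng.uniform(0.0, 1.0)
    wealth = 2.0 * brightness - 1.0 + rng.gauss(0.0, 0.1)
    clusters.append((brightness, wealth))

train, test = train_test_split(clusters)   # 70 / 30 split
intercept, slope = fit_line(train)         # fit on the training set only
test_r2 = r_squared(test, intercept, slope)  # evaluate on held-out data
```

Evaluating on the held-out 30% gives an estimate of how the model behaves on locations it has not seen. Repeating the evaluation with test data from a different country is one way to probe Hypothesis 2.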


####  5. Gather data to test predictions
In the cells below, please enter your approach to obtaining the necessary data.

We suggest these datasets:

- Household survey data:

The Demographic and Health Survey programme:
http://dhsprogram.com/What-We-Do/Survey-Types/DHS.cfm

- Satellite imagery:

National Oceanic and Atmospheric Administration:
http://www.noaa.gov/
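Linking the two sources requires deciding which part of an image belongs to each survey location. The sketch below assumes that every survey cluster comes with a latitude and longitude, which is an assumption to verify for the survey you choose, and computes a square bounding box around that point. The function name, the constant, and the example coordinates are illustrative. The conversion uses a flat-Earth approximation that is reasonable for small boxes away from the poles.

```python
import math

# Approximate length of one degree of latitude in kilometres (assumption:
# a spherical Earth is accurate enough for tiles a few kilometres wide).
KM_PER_DEGREE_LAT = 111.32


def bounding_box(lat, lon, side_km):
    """Return (south, west, north, east) in degrees for a square of
    side `side_km` centred on (lat, lon)."""
    half = side_km / 2.0
    dlat = half / KM_PER_DEGREE_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    dlon = half / (KM_PER_DEGREE_LAT * math.cos(math.radians(lat)))
    return (lat - dlat, lon - dlon, lat + dlat, lon + dlon)


# Arbitrary example coordinates, used only for illustration.
south, west, north, east = bounding_box(-1.95, 30.06, 10.0)
```

The resulting box can be used to request or crop imagery for that cluster, whatever the imagery provider's interface looks like.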


Expected challenges regarding the suitability of the satellite imagery include:
- image resolution,
- cloud obstructions,
- lack of images in the areas of interest, etc.
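As a first pass at the cloud problem, the sketch below flags tiles in which too many pixels are very bright. This is a crude heuristic under stated assumptions, not a real cloud-detection method: the tile is assumed to be a grayscale grid with values in 0..255, and the `threshold` and `max_cloud_fraction` values are arbitrary. Bright surfaces such as sand, snow, or metal roofs would also be flagged.

```python
def cloud_fraction(gray_tile, threshold=200):
    """Fraction of pixels at or above `threshold` in a grayscale tile.

    An empty tile returns 1.0 so that it is treated as unusable.
    """
    total = 0
    bright = 0
    for row in gray_tile:
        for value in row:
            total += 1
            if value >= threshold:
                bright += 1
    return bright / float(total) if total else 1.0


def usable_tiles(tiles, max_cloud_fraction=0.2, threshold=200):
    """Return the sorted names of tiles whose estimated cloud fraction
    does not exceed `max_cloud_fraction`."""
    return [name for name, tile in sorted(tiles.items())
            if cloud_fraction(tile, threshold) <= max_cloud_fraction]


# Hypothetical 3x3 tiles used only for illustration.
tiles = {
    "clear": [[30, 40, 50], [60, 70, 80], [90, 100, 110]],
    "cloudy": [[250, 240, 230], [220, 210, 60], [50, 40, 30]],
}
kept = usable_tiles(tiles)
```

Locations whose tiles are all rejected fall under the "lack of images" challenge and need imagery from another date or source.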





####  6. Execute the analysis

This is the core section, where the modelling takes place.

Your code goes here.

- This is where the magic happens. In this phase, deploy your code to solve the problem. Kindly comment it extensively.



import sys




####  7. Develop general theories

Assume that your analysis was successful and that you were able to develop a general theory about your solution.


What changes would you propose in order to benefit from the insights gained?


General Theory (theoretical):

- Enter here your theory



####  8. Submit your notebook

Please make sure that you:
- Perform a final "Save and Checkpoint";
- Download a copy of the notebook in ".ipynb" format to your local machine using "File", "Download as", and "IPython Notebook (.ipynb)"; and
- Create a pull request on GitHub to let us know of your submission.

> **Note**:

> Thank you!