Detecting Mines in the Democratic Republic of Congo via Satellite Imagery

Team Repository for DataKind's Mining Detection in the DRC

Shared code for the Mining Detection project. Contact caitlin@datakind.org with questions for the project team or about the project materials.

Structure

This repo contains one folder per user and subproject. See README.md files in each directory for details and points of contact.

Workflow

  1. Install packages.

Follow the instructions in features_api, storage_api, and visualization_api. These libraries provide the core tools for downloading satellite imagery, serializing it to disk, and visualizing it.

  2. Download training data.

The following command downloads satellite imagery and rasterized "masks" indicating where mines are located. Results are stored in /tmp/ipis.

$ python -m bin.download_ipis --base_dir="/tmp/ipis"
  3. Process data.

This script ingests a folder of satellite images and masks stored in the Dataset API format, computes features for the model, removes cloud-covered datapoints, and saves the resulting feature vectors as a .npz (zipped NumPy) archive; a sketch of how to inspect the output follows the command below.

$ python ./model/export_data.py --data_export_path /path/to/processed/data --data_input_path /path/to/downloaded/training/data
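
As a quick sanity check, the exported archive can be inspected with NumPy. This is a minimal sketch; the key names 'features' and 'labels' are assumptions for illustration, so check export_data.py for the actual names.

```python
import numpy as np

# Inspect the exported archive. The key names 'features' and 'labels' are
# assumptions for illustration; see export_data.py for the actual keys.
with np.load("/path/to/processed/data.npz") as archive:
    print(list(archive.keys()))      # names of the stored arrays
    features = archive["features"]   # assumed shape: [n_pixels, n_features]
    labels = archive["labels"]       # assumed shape: [n_pixels]
    print(features.shape, labels.shape)
```
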
  4. Train model.

Using the .npz files generated in the previous step, export_model.py trains a random forest model and serializes it to disk. It takes two data arguments: one pointing to the actual training data and a second pointing to a test dataset (also in .npz format) used to report model performance. The model is trained to predict whether or not a mine lies under each pixel. A rough sketch of this step follows the command below.

$ python ./model/export_model.py --train_data_path /path/to/processed/data.npz --test_data_path /path/to/processed/test/data.npz --export_model_path /path/to/dir
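
For orientation, the sketch below illustrates the kind of per-pixel random forest training and serialization this step performs, using scikit-learn and joblib. The archive keys, hyperparameters, and output filename are assumptions; export_model.py is the authoritative implementation.

```python
import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# Assumed archive keys; check export_data.py / export_model.py for the real ones.
train = np.load("/path/to/processed/data.npz")
test = np.load("/path/to/processed/test/data.npz")

# Per-pixel classifier: one row of features per pixel,
# label 1 if a mine lies under that pixel, 0 otherwise.
model = RandomForestClassifier(n_estimators=100, n_jobs=-1)
model.fit(train["features"], train["labels"])

# Report held-out performance, then serialize the trained model to disk.
print(classification_report(test["labels"], model.predict(test["features"])))
joblib.dump(model, "/path/to/dir/model.joblib")
```
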
  5. Make predictions and store them to disk.

The following command uses a stored model to make predictions.

$ python ./model/inference.py --data_path /path/to/data --model_path /path/to/model

The --data_path option points to a dataset in the Dataset API format. This command makes predictions on each image and saves the results back to the same dataset, adding two new source ids: 'landsat_inference_timeagg' and 'landsat_inference'. The 'landsat_inference_timeagg' source aggregates predictions for the same pixel location by taking the majority prediction across time. The 'landsat_inference' source stores predictions for the same location at different times as an extra dimension in the image array. As a result, the image in the 'landsat_inference_timeagg' source has shape [n_x, n_y], while the image in the 'landsat_inference' source has shape [n_time, n_x, n_y].
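
To make the relationship between the two sources concrete, here is one way such a per-pixel majority vote over the time axis could be computed with NumPy. This is an illustrative sketch of the aggregation idea, not the code used in inference.py.

```python
import numpy as np

def majority_vote_over_time(predictions):
    """Collapse a [n_time, n_x, n_y] stack of binary predictions to [n_x, n_y]
    by taking the majority prediction at each pixel location across time."""
    n_time = predictions.shape[0]
    votes = predictions.sum(axis=0)                # "mine" votes per pixel
    return (votes * 2 >= n_time).astype(np.uint8)  # ties counted as "mine" here

# Example: 5 time steps over a small tile.
stack = np.random.randint(0, 2, size=(5, 4, 4))
print(majority_vote_over_time(stack).shape)        # (4, 4)
```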

  6. Visualize predictions.

Visualizations are presented in a Jupyter notebook. Open the notebook with the following command:

$ jupyter notebook "visualization_api/notebooks/Visualization API Demo.ipynb"

Follow the TODO statements in the notebook to visualize the predictions made in the previous step. By default, predictions generated via model/inference.py are stored in the 'landsat8_inference_timeagg' source id of the dataset.
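
For a quick look outside the notebook, a time-aggregated prediction mask (an [n_x, n_y] array retrieved through the Dataset API, per the previous step) can also be plotted directly with matplotlib. The prediction_mask variable below is a placeholder; substitute the array you load from the dataset.

```python
import matplotlib.pyplot as plt
import numpy as np

# Placeholder: replace with the [n_x, n_y] prediction array loaded through
# the Dataset API (e.g. from the time-aggregated inference source).
prediction_mask = np.random.randint(0, 2, size=(256, 256))

plt.imshow(prediction_mask, cmap="gray")
plt.title("Per-pixel mine predictions")
plt.colorbar(label="predicted mine (1) / no mine (0)")
plt.show()
```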