Team Repository for DataKind's Mining Detection in the DRC

Shared code for Mining Detection. Direct questions about the project or its materials to the project team.


This repo contains one folder per user and subproject. See files in each directory for details and points of contact.


  1. Install packages.

Follow instructions in features_api, storage_api, and visualization_api. These libraries provide the core tools for downloading satellite imagery, serializing it to disk, and visualizing it.

  2. Download training data.

The following command downloads satellite imagery and rasterized "masks" indicating where mines are located. Results are stored in /tmp/ipis.

$ python -m bin.download_ipis --base_dir="/tmp/ipis"
  3. Process data.

This script ingests a folder of satellite images and masks stored in the Dataset API format, extracts features for the model, removes clouded data points, and saves the resulting vectors in a .npz (zipped NumPy) archive.

$ python ./model/ --data_export_path /path/to/processed/data --data_input_path /path/to/downloaded/training/data
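
As a rough illustration of the filtering and saving described in this step, the sketch below drops clouded rows and writes the remainder to a .npz archive. It is not the project's code: the function name `filter_and_save`, the array keys `X` and `y`, and the boolean `cloud_mask` layout are all assumptions made for the example.

```python
import os
import tempfile

import numpy as np


def filter_and_save(features, labels, cloud_mask, path):
    """Drop rows flagged as clouded and save the rest to a .npz archive.

    Hypothetical helper: the key names "X" and "y" are illustrative only.
    """
    keep = ~cloud_mask  # True where the data point is cloud-free
    np.savez_compressed(path, X=features[keep], y=labels[keep])
    return int(keep.sum())


# Toy data: 4 data points with 3 features each; the second one is clouded.
features = np.arange(12, dtype=np.float32).reshape(4, 3)
labels = np.array([0, 1, 0, 1])
cloud_mask = np.array([False, True, False, False])

out_path = os.path.join(tempfile.mkdtemp(), "processed.npz")
n_kept = filter_and_save(features, labels, cloud_mask, out_path)

# Read the archive back to confirm its contents.
with np.load(out_path) as data:
    X_loaded, y_loaded = data["X"], data["y"]
```
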
  4. Train model.

Using the .npz files generated in the previous step, this script trains a random forest model and serializes it to disk. It takes two data arguments: one for the training data, and a second that points to a test dataset (.npz format) used to report model performance. The model is trained to predict whether or not a mine lies under a given pixel.

$ python ./model/ --train_data_path /path/to/processed/data.npz --test_data_path /path/to/processed/test/data.npz --export_model_path /path/to/dir
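
For readers unfamiliar with this kind of training step, here is a minimal sketch of fitting a random forest on per-pixel feature vectors, scoring it on a held-out set, and serializing it. It assumes scikit-learn's `RandomForestClassifier` and `pickle`, neither of which is confirmed by this README, and it uses synthetic data in place of the .npz files.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the processed data: 4 features per pixel,
# label 1 ("mine") when the first feature is positive.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 4))
y_train = (X_train[:, 0] > 0).astype(int)
X_test = rng.normal(size=(50, 4))
y_test = (X_test[:, 0] > 0).astype(int)

# Fit the forest and report accuracy on the held-out test set.
model = RandomForestClassifier(n_estimators=20, random_state=0)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)

# Serialize the trained model to disk, then load it back.
model_path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(model_path, "wb") as f:
    pickle.dump(model, f)
with open(model_path, "rb") as f:
    restored = pickle.load(f)
```
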
  5. Make predictions. Store to disk.

The following command uses a stored model to make predictions.

$ python ./model/ --data_path /path/to/data --model_path /path/to/model

The --data_path option points to a dataset in the Dataset API format. The command makes predictions on each image and saves the results to the same dataset, adding two new source ids: 'landsat_inference_timeagg' and 'landsat_inference'. The 'landsat_inference_timeagg' source aggregates the predictions for each pixel location by taking the majority prediction across time. The 'landsat_inference' source keeps the predictions for the same location at different times, stored as an extra dimension in the image array. As a result, the image in 'landsat_inference_timeagg' has shape [n_x, n_y], while the image in 'landsat_inference' has shape [n_time, n_x, n_y].
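
The relationship between the two array shapes can be sketched with a per-pixel majority vote over the time axis. This is an illustration under stated assumptions, not the project's implementation: predictions are assumed to be binary (0 or 1), ties are resolved to 0, and the function name `majority_over_time` is hypothetical.

```python
import numpy as np


def majority_over_time(preds):
    """Collapse [n_time, n_x, n_y] binary predictions to [n_x, n_y].

    A pixel is 1 only if strictly more than half of its time steps
    predicted 1 (ties go to 0, which is an assumption of this sketch).
    """
    n_time = preds.shape[0]
    votes = preds.sum(axis=0)  # number of "mine" votes per pixel
    return (2 * votes > n_time).astype(preds.dtype)


# Three time steps of predictions over a 2 x 2 image.
preds = np.array(
    [
        [[1, 0], [1, 0]],
        [[1, 0], [0, 1]],
        [[0, 0], [1, 1]],
    ],
    dtype=np.uint8,
)
timeagg = majority_over_time(preds)
```
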

  6. Visualize predictions.

Visualizations are presented in a Jupyter notebook. Open the notebook with the following command:

$ jupyter notebook "visualization_api/notebooks/Visualization API Demo.ipynb"

Follow the TODO statements in the notebook to visualize the predictions made in the previous step. By default, predictions generated via model/ are stored in the 'landsat8_inference_timeagg' source id of the dataset.


Detecting Mines in the Democratic Republic of Congo via Satellite Imagery


