Team Repository for DataKind's Mining Detection in the DRC
Shared code for Mining Detection. Follow up with firstname.lastname@example.org with questions for the project team / about the project materials.
This repo contains one folder per user and subproject. See README.md files in each directory for details and points of contact.
- Install packages.
Follow instructions in
These libraries provide the core tools for downloading satellite imagery,
serializing it to disk, and visualizing it.
- Download training data.
The following command downloads satellite imagery and rasterized "masks"
indicating where mines are located. Results are stored in
$ python -m bin.download_ipis --base_dir="/tmp/ipis"
- Process data.
This script ingests a folder of satellite images and masks which are stored in dataset api then features for the model, removes clouded datapoints, and saves the resultant vectors in a .npz (gzipped numpy) array.
$ python ./model/export_data.py --data_export_path /path/to/processed/data --data_input_path /path/to/downloaded/training/data
- Train model.
Using the .npz files generated in the previous step, export_model.py trains a random forest model and serializes it to disk. It takes data arguments, one for the actual training data, and a second which points to a test dataset (.npz format) to report model performance. The model is trained to predict if a mine lies under a pixel or not.
$ python ./model/export_model.py --train_data_path /path/to/processed/data.npz --test_data_path /path/to/processed/test/data.npz --export_model_path /path/to/dir
- Make predictions. Store to disk.
The following command uses a stored model to make predictions.
python ./model/inference.py --data_path /path/to/data --model_path /path/to/model
The --data_path option is a dataset in the Dataset API format. This command makes predictions on each image and saves the results to the same dataset. This adds two new source ids to the dataset 'landsat_inference_timeagg' and 'landsat_inference'. 'landsat_inference_timeagg' aggregates the predictions for the same pixel location in time by taking the majority prediction across time. The 'landsat_inference' source has predictions of the same location at different times stored as an extra dimension in the image array. This means that the image in the landsat_inference_timeagg source has shape [n_x, n_y] while the image in the landsat_inference has shape [n_time, n_x, n_y].
- Visualize predictions.
Visualizations are presented in Jupyter notebook. Open the notebook with the following command,
$ jupyter notebook "visualization_api/notebooks/Visualization API Demo.ipynb"
TODO statements in the notebook to apply visualize predictions
made in the previous step. By default, predictions generated via
model/inference.py are stored in the 'landsat8_inference_timeagg' source id
of the dataset.