# Download of input data
This Jupyter notebook provides an overview of how to download the inputs required by the following manuscript:

[Automatic detection, mapping, and characterization of boulders on planetary surfaces from high-resolution satellite images](https://link-to-accepted-paper).

It explains how to download the:
+ raw inputs (raster images and labeled boulders) required to train the Mask R-CNN neural network.
+ pre-processed raw inputs (in a format that can be directly imported into detectron2).
+ model setup and weights of the best-performing model, which are required for the prediction of boulders in new images.

## Getting prepared

Let's assume that you work on a Linux or UNIX machine. If this is not the case, I would advise you to install [Git for Windows](https://gitforwindows.org/) on your Windows computer.

Let's save all of the inputs in a temporary directory in your home folder `~/tmp/BOULDERING`. 

In [2]:
from pathlib import Path

In [3]:
home_p = Path.home()
work_dir = home_p / "tmp" / "BOULDERING"
work_dir.mkdir(parents=True, exist_ok=True)

## Install gdown
Let's use the `gdown` Python library to download the inputs from my Google Drive. Let's install it directly from within this Jupyter notebook.

In [4]:
!pip install gdown
import gdown

Collecting gdown
  Downloading gdown-4.7.1-py3-none-any.whl (15 kB)
Collecting filelock (from gdown)
  Downloading filelock-3.12.2-py3-none-any.whl (10 kB)
Collecting PySocks!=1.5.7,>=1.5.6 (from requests[socks]->gdown)
  Downloading PySocks-1.7.1-py3-none-any.whl (16 kB)
Installing collected packages: PySocks, filelock, gdown
Successfully installed PySocks-1.7.1 filelock-3.12.2 gdown-4.7.1
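The archives downloaded in the following cells are large, so it can be handy to skip a download when the file is already on disk. The helper below is a minimal sketch of that idea. `download_if_missing` and its `downloader` argument are hypothetical names introduced here for illustration; by default it falls back to `gdown.download`, called the same way as in the rest of this notebook.

```python
from pathlib import Path


def download_if_missing(url, dest, downloader=None):
    """Download `url` to `dest` unless a non-empty file is already there.

    `downloader` is any callable with the signature
    downloader(url, output, quiet=...). It defaults to gdown.download.
    """
    dest = Path(dest)
    # Skip the download if a non-empty file is already present.
    if dest.exists() and dest.stat().st_size > 0:
        return dest
    if downloader is None:
        import gdown  # imported lazily, only when a download is needed
        downloader = gdown.download
    dest.parent.mkdir(parents=True, exist_ok=True)
    downloader(url, dest.as_posix(), quiet=True)
    return dest
```

For example, `download_if_missing(url_raw_inputs, work_dir / "raw_data_BOULDERING.zip")` would only fetch the raw inputs the first time it is run. Note that this check only looks at the file size, so an interrupted download that left a partial file behind would need to be deleted by hand.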


## Download raw inputs (Size: 8.8 GB)
The raw inputs contain all of the rasters (planetary images) and the labeled boulders. Several shapefiles are provided for each image:
+ a boulder-mapping file, which contains the manually digitized outlines of the boulders.
+ a ROM file (ROM stands for Region of Mapping), which depicts the image patches on which the boulder mapping has been conducted.
+ a global-tiles file, which shows all of the image patches within the raster.

**Structure**

```
.
└── raw_data/
    ├── earth/
    │   └── image_name/
    │       ├── shp/
    │       │   ├── <image_name>-ROM.shp
    │       │   ├── <image_name>-boulder-mapping.shp
    │       │   └── <image_name>-global-tiles.shp
    │       └── raster/
    │           └── <image_name>.tif
    ├── mars/
    │   └── image_name/
    │       ├── shp/
    │       │   ├── <image_name>-ROM.shp
    │       │   ├── <image_name>-boulder-mapping.shp
    │       │   └── <image_name>-global-tiles.shp
    │       └── raster/
    │           └── <image_name>.tif
    └── moon/
        └── image_name/
            ├── shp/
            │   ├── <image_name>-ROM.shp
            │   ├── <image_name>-boulder-mapping.shp
            │   └── <image_name>-global-tiles.shp
            └── raster/
                └── <image_name>.tif
```

There are multiple locations/images per planetary body. 
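Once the archive has been downloaded and extracted, the structure above can be inspected with `pathlib`. The sketch below lists the images available for each planetary body and reports which of the expected files are missing for a given image. `list_images` and `missing_files` are hypothetical helper names, and the code assumes that the name of each image folder is also the prefix of its files, as the tree suggests.

```python
from pathlib import Path


def list_images(raw_data_dir):
    """Return {planetary_body: [image_name, ...]} for a raw_data folder."""
    raw_data_dir = Path(raw_data_dir)
    images = {}
    for body_dir in sorted(p for p in raw_data_dir.iterdir() if p.is_dir()):
        images[body_dir.name] = sorted(
            p.name for p in body_dir.iterdir() if p.is_dir()
        )
    return images


def missing_files(image_dir):
    """Return the expected files (relative paths) missing from an image folder."""
    image_dir = Path(image_dir)
    name = image_dir.name  # assumption: folder name == file prefix
    expected = [
        Path("shp") / f"{name}-ROM.shp",
        Path("shp") / f"{name}-boulder-mapping.shp",
        Path("shp") / f"{name}-global-tiles.shp",
        Path("raster") / f"{name}.tif",
    ]
    return [rel.as_posix() for rel in expected if not (image_dir / rel).exists()]
```

Assuming the archive extracts to `~/tmp/BOULDERING/raw_data`, `list_images(work_dir / "raw_data")` gives a quick overview of what has been downloaded. The same check is useful if you add your own labeled images and want to confirm that they follow the expected layout.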

In [11]:
url_raw_inputs = "https://drive.google.com/uc?id=10EJPATaMdS82jKOFR7rZ6o5fT6mSIhdu"
gdown.download(url_raw_inputs, (work_dir / "raw_data_BOULDERING.zip").as_posix(), quiet=True)

# only works on Linux or UNIX machines (Windows users can unzip the folder manually)
!unzip ~/tmp/BOULDERING/raw_data_BOULDERING.zip -d ~/tmp/BOULDERING/

'/home/nilscp/tmp/BOULDERING/raw_data_BOULDERING.zip'
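The `unzip` command above is only available on Linux or UNIX machines. As a cross-platform alternative, the archive can be extracted with the `zipfile` module from the Python standard library. This is a minimal sketch and `unzip_archive` is a hypothetical helper name.

```python
import zipfile
from pathlib import Path


def unzip_archive(zip_path, dest_dir):
    """Extract a zip archive into dest_dir and return the names of its members."""
    zip_path = Path(zip_path)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # extractall overwrites files that already exist in dest_dir
        archive.extractall(dest_dir)
        return archive.namelist()
```

For the raw inputs, this would be called as `unzip_archive(work_dir / "raw_data_BOULDERING.zip", work_dir)`. Unlike `unzip`, it does not ask for confirmation before overwriting existing files.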

## Pre-processed raw inputs
The steps to pre-process the raw inputs are described in the PREPROCESSING_BOULDERING Jupyter notebook. If you don't plan to add more training data, you can directly use the pre-processed data. If you want to add more labeled boulder data, you need to download the raw inputs and adopt the same directory structure as shown in the raw inputs section above.

Size: 252.8 MB

```
.
└── preprocessed_inputs/
    ├── json/ (image patches + labeled boulder datasets in a format that can be imported into Detectron2)
    ├── pkl/ (additional information that can be loaded in Python. Pickle format.)
    ├── preprocessing/ (contains the training, validation and test image patches and the corresponding instance and semantic segmentation masks)
    │   ├── train/
    │   │   ├── images/
    │   │   └── labels/
    │   ├── validation/
    │   │   ├── images/
    │   │   └── labels/
    │   └── test/
    │       ├── images/
    │       └── labels/
    └── shp/ (does not contain anything, can be deleted)
```

In [10]:
url_pre_processed_inputs = "https://drive.google.com/uc?id=131sJ2PFiUvBfYhuxAbZXWyK_dXyjNcRh"
gdown.download(url_pre_processed_inputs, (work_dir / "Apr2023-Mars-Moon-Earth-mask-5px.zip").as_posix(), quiet=True)

# only works on Linux or UNIX machines (Windows users can unzip the folder manually)
!unzip ~/tmp/BOULDERING/Apr2023-Mars-Moon-Earth-mask-5px.zip -d ~/tmp/BOULDERING/
# let's rename the folder so that its name makes more sense
!mv ~/tmp/BOULDERING/Apr2023-Mars-Moon-Earth-mask-5px ~/tmp/BOULDERING/preprocessed_inputs

Archive:  /home/nilscp/tmp/BOULDERING/Apr2023-Mars-Moon-Earth-mask-5px.zip
replace /home/nilscp/tmp/BOULDERING/Apr2023-Mars-Moon-Earth-mask-5px/json/Apr2023-Mars-Moon-Earth-mask-5px.json? [y]es, [n]o, [A]ll, [N]one, [r]ename: ^C
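After extracting and renaming the folder, a quick sanity check is to count the files in each of the training, validation and test folders. The sketch below does that with `pathlib`. `count_split_files` is a hypothetical helper name, and the code counts every file regardless of its extension, since the file formats are not specified here.

```python
from pathlib import Path

SPLITS = ("train", "validation", "test")
SUBFOLDERS = ("images", "labels")


def count_split_files(preprocessed_dir):
    """Return {split: {"images": n, "labels": n}} for a preprocessed_inputs folder."""
    base = Path(preprocessed_dir) / "preprocessing"
    counts = {}
    for split in SPLITS:
        counts[split] = {}
        for sub in SUBFOLDERS:
            folder = base / split / sub
            # A missing folder is reported as 0 files instead of raising an error.
            if folder.is_dir():
                counts[split][sub] = sum(1 for p in folder.iterdir() if p.is_file())
            else:
                counts[split][sub] = 0
    return counts
```

Calling `count_split_files(work_dir / "preprocessed_inputs")` returns a nested dictionary. A split with zero files most likely means that the extraction or the renaming step did not complete.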


## Model setup, weights and augmentation files

In [12]:
url_model_setup = "https://drive.google.com/uc?id=1O-EH_VmzpI3s4V1ouSFXg8jhbNjZpqlj"
url_model_setup_base = "https://drive.google.com/uc?id=19aYv6aPvpbCD8EXvcfHB1abkibLTxTbB"
url_aug_setup = "https://drive.google.com/uc?id=1SHSQgbN9hUyu-mCRKvUO9J8HgryOgh1X"
url_model_weights = "https://drive.google.com/uc?id=1hTufdIEHo06M0ZzDPZ1MxQRKzxE0VVkO"

#url_model_weights = "https://drive.google.com/uc?id=1ln9FXZNEniuJ2y1KLkH8sn9LlVAUTH3M"
gdown.download(url_model_setup, (work_dir / "model_setup.yaml").as_posix(), quiet=True)
gdown.download(url_model_setup_base, (work_dir / "base_setup.yaml").as_posix(), quiet=True)
gdown.download(url_aug_setup, (work_dir / "augmentation_setup.json").as_posix(), quiet=True)
gdown.download(url_model_weights, (work_dir / "model_weights.pth").as_posix(), quiet=True)

'/home/nilscp/tmp/BOULDERING/model_weights.pth'

In [13]:
!mkdir -p ~/tmp/BOULDERING/best_model
!mv ~/tmp/BOULDERING/model_setup.yaml ~/tmp/BOULDERING/best_model/ # model setup (overrides parameters in the base setup)
!mv ~/tmp/BOULDERING/base_setup.yaml ~/tmp/BOULDERING/best_model/ # base setup (required, loaded first)
!mv ~/tmp/BOULDERING/augmentation_setup.json ~/tmp/BOULDERING/best_model/ # includes the different augmentations used during training
!mv ~/tmp/BOULDERING/model_weights.pth ~/tmp/BOULDERING/best_model/ # model weights (after training is done)
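The `mkdir` and `mv` commands above only work on Linux or UNIX machines. The same step can be written in pure Python, which also makes it easy to check that the four files ended up in the `best_model` folder. This is a minimal sketch; `collect_best_model` is a hypothetical helper name, and the file names are the ones used in the download cell above.

```python
import shutil
from pathlib import Path

MODEL_FILES = (
    "model_setup.yaml",         # model setup
    "base_setup.yaml",          # base setup
    "augmentation_setup.json",  # augmentations used during training
    "model_weights.pth",        # model weights
)


def collect_best_model(work_dir, folder_name="best_model"):
    """Move the model files into work_dir/folder_name.

    Returns the folder and the list of files that are still missing from it.
    """
    work_dir = Path(work_dir)
    best_model_dir = work_dir / folder_name
    best_model_dir.mkdir(parents=True, exist_ok=True)
    for name in MODEL_FILES:
        src = work_dir / name
        if src.exists():
            shutil.move(str(src), str(best_model_dir / name))
    missing = [n for n in MODEL_FILES if not (best_model_dir / n).exists()]
    return best_model_dir, missing
```

With `best_model_dir, missing = collect_best_model(work_dir)`, an empty `missing` list indicates that all four files are in place.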