Preprocessed versions of the raw datasets have to be generated before any neural network training:

python deeposlandia/datagen.py -D mapillary -s 224 -a -p ./any-data-path -t 18000 -v 2000 -T 5000

The previous command generates a set of 224 * 224 images from the Mapillary dataset. The raw dataset must be in ./any-data-path/input. If the -a argument is specified, the preprocessed dataset is stored in ./any-data-path/preprocessed/224_aggregated; otherwise it is stored in ./any-data-path/preprocessed/224_full. Aggregation applies to the dataset labels, which can be grouped in the Mapillary case (and only in the Mapillary case) to reduce their number from 65 to 11.
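The folder layout described above can be sketched in a few lines of Python. This is only an illustration of the naming rule stated in the text: the helper name `preprocessed_dir` and its parameters are hypothetical and are not taken from the deeposlandia code base.

```python
from pathlib import Path


def preprocessed_dir(datapath, image_size, aggregate):
    """Return the folder where a preprocessed dataset is expected to live.

    Hypothetical helper: it only mirrors the rule described in the text,
    i.e. <datapath>/preprocessed/<size>_aggregated when labels are
    aggregated (-a), and <datapath>/preprocessed/<size>_full otherwise.
    """
    suffix = "aggregated" if aggregate else "full"
    return Path(datapath) / "preprocessed" / f"{image_size}_{suffix}"


if __name__ == "__main__":
    # With -a and -s 224
    print(preprocessed_dir("./any-data-path", 224, aggregate=True))
    # Without -a
    print(preprocessed_dir("./any-data-path", 224, aggregate=False))
```

Note that `pathlib` normalizes the leading `./`, so the printed paths are relative to the current directory without that prefix.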

Additionally, the preprocessed dataset may contain fewer images than the raw dataset: the -t, -v and -T arguments set the number of training, validation and testing images, respectively. The values used in the example above correspond to the size of the raw dataset.

For the AerialImage dataset, only a limited set of image sizes is supported. As smaller tiles are generated by cutting the large original image, the requested size is expected to be a divisor of 5000.
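The divisor constraint can be checked before launching the preprocessing. The sketch below assumes that the raw aerial images are square with a side of 5000 pixels, which is what the divisor rule suggests; the function name `tiles_per_image` is hypothetical and does not come from the project.

```python
RAW_SIZE = 5000  # assumed side of a raw aerial image, in pixels


def tiles_per_image(tile_size, raw_size=RAW_SIZE):
    """Return how many square tiles one raw image yields.

    Raises ValueError when tile_size does not divide raw_size evenly,
    since the image could not be cut into whole tiles in that case.
    """
    if tile_size <= 0 or raw_size % tile_size != 0:
        raise ValueError(
            f"tile size {tile_size} is not a divisor of {raw_size}"
        )
    per_side = raw_size // tile_size
    return per_side * per_side


if __name__ == "__main__":
    print(tiles_per_image(250))  # 20 tiles per side, 400 tiles in total
    try:
        tiles_per_image(224)  # 224 does not divide 5000
    except ValueError as error:
        print(error)
```

Under this assumption, a size such as 224, which works for Mapillary, would be rejected here, whereas 250, 500 or 1000 would be accepted.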

In the shape dataset case, this preprocessing step generates a set of images from scratch.

As an easter-egg feature, this command also prints label popularity, i.e. the proportion of images in the preprocessed dataset in which each label appears.
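To make the definition of label popularity concrete, here is a minimal sketch of how such a proportion could be computed. It assumes that each image is described by the collection of label ids it contains; the function `label_popularity` and this input format are hypothetical, and the actual command may compute and display the statistic differently.

```python
from collections import Counter


def label_popularity(image_labels):
    """Return, for each label, the share of images in which it appears.

    image_labels: one iterable of label ids per image (assumed format).
    A label is counted at most once per image.
    """
    nb_images = len(image_labels)
    if nb_images == 0:
        return {}
    counts = Counter()
    for labels in image_labels:
        counts.update(set(labels))  # deduplicate labels within an image
    return {label: counts[label] / nb_images for label in sorted(counts)}


if __name__ == "__main__":
    # Four toy images: label 1 is in all of them, label 2 in only one
    toy_dataset = [{0, 1}, {1}, {1, 2}, {0, 1}]
    print(label_popularity(toy_dataset))  # {0: 0.5, 1: 1.0, 2: 0.25}
```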