…reation Control the input, preprocessed and output repositories separately; use a temp directory to do the tests
Split the legacy function `utils.prepare_folders` into three smaller functions: `utils.prepare_input_folder`, `utils.prepare_preprocessed_folder` and `utils.prepare_output_folder`. This makes the code more flexible, as we may need only one or two of them: *e.g.* when creating a dataset, there is no need to consider the `output` subdirectory. The tests pass with this commit.
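To illustrate the split, here is a minimal sketch of three independent folder helpers exercised in a temporary directory. The function names come from the commit message, but the signatures, the `input`/`preprocessed`/`output` layout, the `"shapes"` dataset name and the plain-string return values are assumptions made for illustration; the real functions may take other arguments and return richer structures.

```python
import os
import tempfile


def prepare_input_folder(datapath, dataset):
    # Hypothetical signature: create and return the raw-input directory.
    folder = os.path.join(datapath, dataset, "input")
    os.makedirs(folder, exist_ok=True)
    return folder


def prepare_preprocessed_folder(datapath, dataset, image_size):
    # Hypothetical signature: one preprocessed directory per image size.
    folder = os.path.join(datapath, dataset, "preprocessed", str(image_size))
    os.makedirs(folder, exist_ok=True)
    return folder


def prepare_output_folder(datapath, dataset, model):
    # Hypothetical signature: where model backups and results would go.
    folder = os.path.join(datapath, dataset, "output", model)
    os.makedirs(folder, exist_ok=True)
    return folder


# Dataset creation only needs the input and preprocessed folders,
# so the output folder is never created here.
with tempfile.TemporaryDirectory() as tmp:
    prepare_input_folder(tmp, "shapes")
    prepare_preprocessed_folder(tmp, "shapes", 64)
    created = sorted(os.listdir(os.path.join(tmp, "shapes")))
```

Working in a temporary directory, as the first commit does for the tests, keeps such checks from touching real data folders.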
…n functions Modify the `deeposlandia/kerastrain.py` and `deeposlandia/datagen.py` scripts to account for the split of the `utils.prepare_folders` function and for the new way of returning folders in the dataset creation and model training programs.
This commit creates a module that predicts labels on images passed as arguments. The image argument can be a list, and it supports wildcard (glob) patterns: several images can be passed with a path like `datapath/img_000*.png`. For now, the resulting labels are only printed to the console.
deeposlandia/inference.py
Outdated
parser = add_instance_arguments(parser)
args = parser.parse_args()

image_paths = [item for sublist in [glob.glob(f) for f in args.image_paths]
You can turn these list comprehensions into generator expressions, such as:
(item for sublist in (glob.glob(f) for f in args.image_paths) for item in sublist)
Moreover, it wasn't clear to me on first read that you want to flatten the list of image files. May I propose something like:
images_paths = (glob.glob(f) for f in args.image_paths)
# flatten the result, e.g. [[image1, image2], [image10, image11]]
image_paths = itertools.chain(*images_paths)
using the great `itertools` package.
I agree, `itertools` is really convenient. However, I'm not sure I want an iterator there. The iterator is consumed in the first loop, when the `x_test` variable is built, so its items are no longer available in the last loop, when the results are printed: how would we get the image filenames if such a structure is chosen?
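The concern raised here can be reproduced with a short, self-contained sketch. The file names and patterns below are hypothetical; the point is only that a lazy `itertools.chain` is exhausted after one pass, whereas materializing it into a list allows the paths to be reused when printing results.

```python
import glob
import itertools
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical image files, created empty for the demonstration.
    for name in ("img_0001.png", "img_0002.png", "other.png"):
        open(os.path.join(tmp, name), "w").close()
    patterns = [os.path.join(tmp, "img_000*.png"), os.path.join(tmp, "other.png")]

    # Lazy version: the chained iterator can be traversed only once.
    lazy_paths = itertools.chain.from_iterable(glob.glob(p) for p in patterns)
    first_pass = [os.path.basename(p) for p in lazy_paths]
    second_pass = [os.path.basename(p) for p in lazy_paths]  # already exhausted

    # Materialized version: a list can be traversed as often as needed.
    image_paths = sorted(
        itertools.chain.from_iterable(glob.glob(p) for p in patterns)
    )
    names_for_inputs = [os.path.basename(p) for p in image_paths]
    names_for_report = [os.path.basename(p) for p in image_paths]
```

Keeping the flattening step in `itertools` while wrapping it in `sorted(...)` or `list(...)` is one way to get both the readability the reviewer asks for and the reusable container the author needs.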
deeposlandia/inference.py
Outdated
image_size,
aggregate_value)

print( prepro_folder['training_config'] )
Remove the print, or replace it with a logging message.
That `print` statement was useless; it has been removed.
Before this commit, the checkpoint path was recovered as the maximum of the checkpoint names in alphanumeric order. However, this relied on `os.listdir`, which also returns subdirectories, so checkpoint recovery could fail when the maximum item was a directory. The new implementation fixes this by considering only files.
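A minimal sketch of the failure mode and of a files-only selection follows. The `latest_checkpoint` helper and the checkpoint file names are hypothetical, not the project's actual code; only `os.listdir` and the "keep files only" idea come from the commit message.

```python
import os
import tempfile


def latest_checkpoint(folder):
    """Return the alphanumerically greatest regular file in `folder`, or None."""
    files = [
        name for name in os.listdir(folder)
        if os.path.isfile(os.path.join(folder, name))
    ]
    return max(files) if files else None


with tempfile.TemporaryDirectory() as tmp:
    # Two hypothetical checkpoint files plus a subdirectory.
    for name in ("checkpoint-epoch-001.h5", "checkpoint-epoch-010.h5"):
        open(os.path.join(tmp, name), "w").close()
    os.mkdir(os.path.join(tmp, "logs"))

    naive = max(os.listdir(tmp))       # picks the "logs" directory
    selected = latest_checkpoint(tmp)  # ignores directories
```

Note that alphanumeric ordering only matches training order when the step or epoch numbers are zero-padded, as in the hypothetical names above.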
Remove a useless print statement, and make the `image_paths` variable construction clearer.
This PR introduces a new module that will make inference easier in future developments (for instance, in a web app).
Until now, inference was done only after training a model. It was possible to run inference alone by passing 0 to `nb_epochs` or, if a backup exists, a number smaller than the checkpoint training step. This was quite impractical, as the program arguments are training-focused. Here we can predict labels on a given image by simply entering the following command (as an example):
Additionally, the `prepare_folders` function has been split into three more precise functions:
- `prepare_input_folder`, which is called during dataset generation (cf. `deeposlandia/datagen.py`).
- `prepare_prepro_folder`, which is called during dataset generation and during the training process: such directories are filled during the former, and the images they contain are scanned during the latter.
- `prepare_output_folder`, which is called when results are produced, either during the training process (model backup creation) or during inference (model backup recovery).