Add method to run inference from a file #107

tanaysoni · 2019-10-04T10:27:33Z

The current Inferencer only supports running inference from dicts.

This PR adds a new method inference_from_file() that takes a file as input and uses multiprocessing to speed up inference preprocessing.

Additionally, the Processor's _file_to_dicts() method is made public.

tholor · 2019-10-04T14:13:50Z

farm/infer.py

+    @classmethod
+    def _multiproc_dict_to_samples(cls, dicts, processor):
+        dicts_list = [dicts]
+        dataset, tensor_names = processor.dataset_from_dicts(dicts_list, from_inference=True)


This means we call dict_to_samples twice: once inside "dataset_from_dicts" and once directly. If the goal is to improve speed, we should change this so that it is only called once.

farm/infer.py

Timoeller

I know you asked me about this already and I guess I did not really see the full picture.

Why don't we do

inference_from_file(file):
    create dicts
    call inference_from_dicts(dicts)

and put all the multiprocessing in inference_from_dicts?

Was there a specific reason? It looks much smarter to me now that I see the code...
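A minimal sketch of the structure suggested above, assuming a JSON-lines input file. The helpers `file_to_dicts` and `preprocess` and the toy "model" are hypothetical stand-ins, not FARM's actual code. It uses `multiprocessing.pool.ThreadPool`, which shares the `Pool` interface, so the example runs without a `__main__` guard; real process parallelism would swap in `multiprocessing.Pool`.

```python
import json
from functools import partial
from multiprocessing.pool import ThreadPool  # same interface as multiprocessing.Pool


def file_to_dicts(path):
    """Hypothetical reader: one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def preprocess(d, lowercase):
    """Stand-in for the per-dict preprocessing step (here: whitespace tokenization)."""
    text = d["text"].lower() if lowercase else d["text"]
    return text.split()


def inference_from_dicts(dicts, num_workers=2, lowercase=True):
    """All parallelism lives here, so every entry point benefits from it."""
    with ThreadPool(num_workers) as pool:
        # imap preserves input order, so samples line up with dicts.
        samples = list(pool.imap(partial(preprocess, lowercase=lowercase), dicts))
    # Stand-in for the model: "predict" the number of tokens.
    return [{"text": d["text"], "pred": len(s)} for d, s in zip(dicts, samples)]


def inference_from_file(path, num_workers=2):
    """Thin wrapper: read the file into dicts, then delegate."""
    return inference_from_dicts(file_to_dicts(path), num_workers=num_workers)
```

With this layout, inference_from_file contains no parallel code of its own, so any speed-up made in inference_from_dicts applies to both entry points.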

Timoeller · 2019-10-11T12:46:36Z

farm/infer.py

+
+            results = p.imap(
+                partial(self._multiproc_dict_to_samples, processor=self.processor),
+                dicts,


Why don't we need the grouper here anymore?
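For context, a sketch of what a grouper typically does in this kind of pipeline: it batches the input dicts into chunks so that each worker call handles several dicts at once. That this matches the grouper referred to here is an assumption; `grouper`, `chunk_to_samples`, and `dicts_to_samples` below are illustrative names, and `ThreadPool` stands in for `multiprocessing.Pool` so the example runs anywhere.

```python
from itertools import islice
from multiprocessing.pool import ThreadPool  # same interface as multiprocessing.Pool


def grouper(iterable, n):
    """Yield successive lists of up to n items from iterable."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, n))
        if not chunk:
            return
        yield chunk


def chunk_to_samples(chunk):
    """Stand-in for preprocessing a whole chunk of dicts in one worker call."""
    return [d["text"].split() for d in chunk]


def dicts_to_samples(dicts, chunk_size=2, num_workers=2):
    """Map chunks (not single dicts) over the pool, then flatten in order."""
    with ThreadPool(num_workers) as pool:
        results = pool.imap(chunk_to_samples, grouper(dicts, chunk_size))
        return [sample for chunk in results for sample in chunk]
```

Without a grouper, each dict becomes its own task, which is simpler but pays the task-dispatch overhead once per dict instead of once per chunk.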

Timoeller · 2019-10-11T12:48:50Z

farm/infer.py

+        return preds_all
+
+    @classmethod
+    def _multiproc_dict_to_samples(cls, dicts, processor):


Let's change the name, since dict_to_samples has its own meaning inside FARM. Here we convert the input data to samples and build the datasets at the same time.

Another way would be to add an inference flag and not delete the samples after the torch dataset is created. That way we would not need to preprocess the exact same data twice. Let's do this only if it is easy to integrate; otherwise let's move forward with this PR.
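A toy sketch of the flag idea, assuming the samples can simply be returned alongside the dataset. `ToyProcessor`, `build_dataset`, and `keep_samples` are hypothetical names for illustration and do not reflect FARM's Processor API; a plain list stands in for the torch dataset.

```python
class ToyProcessor:
    """Illustrative processor that counts how often preprocessing runs."""

    def __init__(self):
        self.calls = 0  # number of per-dict preprocessing calls

    def _dict_to_samples(self, d):
        self.calls += 1
        return d["text"].split()

    def build_dataset(self, dicts, keep_samples=False):
        """Preprocess each dict once; optionally hand the samples back.

        The "dataset" is a list of token counts, standing in for tensors.
        """
        samples = [self._dict_to_samples(d) for d in dicts]
        dataset = [len(s) for s in samples]
        if keep_samples:
            # Inference needs the samples later (e.g. to format predictions),
            # so return them instead of discarding and recomputing them.
            return dataset, samples
        return dataset, None


processor = ToyProcessor()
dataset, samples = processor.build_dataset(
    [{"text": "a b"}, {"text": "c"}], keep_samples=True
)
```

Here `processor.calls` ends up equal to the number of input dicts, which is the property the comment above asks for: each dict is preprocessed once, not twice.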

tanaysoni requested a review from tholor October 4, 2019 10:27

tholor requested changes Oct 4, 2019

tholor added the enhancement New feature or request label Oct 4, 2019

tholor assigned tanaysoni Oct 4, 2019

tanaysoni added 4 commits October 10, 2019 15:21

Add method to run inference from a file

23176d3

Process preds chunkwise to avoid memory issues for large datasets

79eb0b1

Rename params

c98c983

Adjust progress bar range

bd0ebbe

tanaysoni force-pushed the inference_from_file branch from 81d41a7 to bd0ebbe October 10, 2019 13:25

Rename param

17bb8b6

Timoeller suggested changes Oct 11, 2019

tanaysoni added 6 commits October 11, 2019 15:54

Refactor inference methods

f8a6866

Update method name for Inference

d903ae1

Revert change

57ddc8c

Code formatting

279cbf1

Fix list concatenation for preds_all

951752a

Update method name in test

3fed46c

tanaysoni merged commit d524ee9 into master Oct 11, 2019

tanaysoni deleted the inference_from_file branch October 11, 2019 16:24

isamokhin mentioned this pull request Oct 16, 2019

Inference now does not utilise GPU, is much slower and more verbose #117

Closed
