New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add method to run inference from a file #107
Conversation
farm/infer.py
Outdated
@classmethod | ||
def _multiproc_dict_to_samples(cls, dicts, processor): | ||
dicts_list = [dicts] | ||
dataset, tensor_names = processor.dataset_from_dicts(dicts_list, from_inference=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means we call dict_to_samples twice. Once for "dataset_from_dicts" and once directly. If we want to improve speed, we probably want to change this to only be called once
81d41a7
to
bd0ebbe
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know you asked me about this already and I guess I did not really see the full picture.
Why dont we do
inference_from_file(file):
create dict
call inference_from_dicts(dict)
and put all multiprocessing in inference_from_dict
Was there a specific reason, cause it looks much smarter to me now that I see the code...
farm/infer.py
Outdated
|
||
results = p.imap( | ||
partial(self._multiproc_dict_to_samples, processor=self.processor), | ||
dicts, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why dont we need the grouper here anymore?
farm/infer.py
Outdated
return preds_all | ||
|
||
@classmethod | ||
def _multiproc_dict_to_samples(cls, dicts, processor): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets change the name, since dict_to_samples has an own connotation inside farm. here we do conversion to datasets and converting input data to samples at the same time.
Another way would be to add an inference flag and not delete the samples after the torch dataset is created. That way we do not need to preprocess twice the exact same data. Lets do this only if it is easy to integrate. Otherwise lets move forward with this PR
The current
Inferencer
only supports running inference from dicts.This PR adds a new method
inference_from_file()
that takes a file as an input and uses multiprocessing to do speed-up inference preprocessing.Additionally, the
_file_to_dicts()
in theProcessor
is made public.