Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add method to run inference from a file #107

Merged
merged 11 commits into from Oct 11, 2019
Merged

Conversation

tanaysoni
Copy link
Contributor

The current Inferencer only supports running inference from dicts.

This PR adds a new method inference_from_file() that takes a file as an input and uses multiprocessing to do speed-up inference preprocessing.

Additionally, the _file_to_dicts() in the Processor is made public.

@tanaysoni tanaysoni requested a review from tholor October 4, 2019 10:27
farm/infer.py Show resolved Hide resolved
farm/infer.py Outdated
@classmethod
def _multiproc_dict_to_samples(cls, dicts, processor):
dicts_list = [dicts]
dataset, tensor_names = processor.dataset_from_dicts(dicts_list, from_inference=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means we call dict_to_samples twice. Once for "dataset_from_dicts" and once directly. If we want to improve speed, we probably want to change this to only be called once

farm/infer.py Outdated Show resolved Hide resolved
@tholor tholor added the enhancement New feature or request label Oct 4, 2019
Copy link
Contributor

@Timoeller Timoeller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you asked me about this already and I guess I did not really see the full picture.

Why dont we do
inference_from_file(file):
create dict
call inference_from_dicts(dict)

and put all multiprocessing in inference_from_dict

Was there a specific reason, cause it looks much smarter to me now that I see the code...

farm/infer.py Outdated

results = p.imap(
partial(self._multiproc_dict_to_samples, processor=self.processor),
dicts,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why dont we need the grouper here anymore?

farm/infer.py Outdated
return preds_all

@classmethod
def _multiproc_dict_to_samples(cls, dicts, processor):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets change the name, since dict_to_samples has an own connotation inside farm. here we do conversion to datasets and converting input data to samples at the same time.

Another way would be to add an inference flag and not delete the samples after the torch dataset is created. That way we do not need to preprocess twice the exact same data. Lets do this only if it is easy to integrate. Otherwise lets move forward with this PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants