
Inference now does not utilise GPU, is much slower and more verbose #117

Closed
isamokhin opened this issue Oct 16, 2019 · 11 comments

@isamokhin

After the recent commits, inference_from_dicts took the place of run_inference and uses multiprocessing. However, at least for my task, it made matters worse.

I have a test dataset of 200 text documents. They are preprocessed and split into fragments, from which lists of dicts {"text": "some text"} are formed for subsequent text classification with FARM. There are 5576 fragments in total. Previously, with the gpu flag enabled, these fragments were classified in 53 seconds, utilising up to 1900 MB of GPU memory. The Processor transformed samples into inputs for the BERT model silently. I have installed FARM from the older branch to make sure that it still works this way.
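
Roughly, my pipeline looks like this (a simplified sketch; documents and split_into_fragments stand in for my own data and preprocessing, they are not part of FARM):

from farm.infer import Inferencer

def split_into_fragments(document_text):
    # placeholder for my actual preprocessing, which splits a document into smaller text fragments
    return [part for part in document_text.split("\n\n") if part.strip()]

model = Inferencer.load("saved_models/my-classification-model", gpu=True)

for document_text in documents:  # documents: my list of 200 raw texts
    dicts = [{"text": fragment} for fragment in split_into_fragments(document_text)]
    result = model.run_inference(dicts=dicts)  # the old API; now replaced by inference_from_dicts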

With the new changes to inference, each list of dicts is treated the same way as a dataset during training - i.e., FARM prints one sample from each, with tokens, offsets, token_ids etc., and with full ASCII art. Even with logging turned off, a tqdm progress bar is still printed for each list of dicts. Inference for the whole dataset now takes 370 seconds. All this time, no more than 1390 MB of GPU memory is utilised (this is the typical footprint of the bare BERT model loaded onto the GPU).

The results of inference did not change.

System:

  • OS: Ubuntu 18.04
  • GPU/CPU: NVIDIA GTX 1080 Ti, CUDA 10.1
  • FARM version: newest version on the morning of 16 Oct 2019.
isamokhin added the bug (Something isn't working) label Oct 16, 2019
@Timoeller
Contributor

Timoeller commented Oct 16, 2019

This is clearly suboptimal (though we find our ASCII art very pleasing : )

@tanaysoni can you please have a look into this?
In the meantime, do you know which exact code changes could have caused these problems, @isamokhin?

@isamokhin
Author

That's the point, I cannot understand what exactly caused this :( But it must be #107. All the changes I can see are that multiprocessing is now used, and that _run_inference is now a smaller function called inside inference_from_dicts, though with the same content.

@tanaysoni
Contributor

Hi @isamokhin. I am trying to reproduce the performance difference. It'd be great if you could provide the batch_size, multiprocessing_chunk_size, and max_seq_len.

Additionally, did you implement a custom Processor or use the default TextClassificationProcessor?

@isamokhin
Author

isamokhin commented Oct 18, 2019

Hi! I'm using the default TextClassificationProcessor, with max_seq_len=128, batch_size=32 and the default multiprocessing_chunk_size (I didn't set this parameter at all).

@isamokhin
Author

I cannot share my own data, but the difference in performance can be seen on the FARM toy example. I trained a German model from the doc_classification.py example and ran inference in a separate file:

import time
import random
from farm.infer import Inferencer

save_dir = "saved_models/bert-german-doc-tutorial"

basic_texts = [
    {"text": "Schartau sagte dem Tagesspiegel, dass Fischer ein Idiot sei"},
    {"text": "Martin Müller spielt Handball in Berlin"},
]

texts = random.choices(basic_texts, k=25) # generate more data

model = Inferencer.load(save_dir, gpu=True)

print('Starting inference:')

start = time.time()
for i in range(40):
    # the loop shows that the Processor in the new version prints a sample for each iteration, unlike before
    result = model.inference_from_dicts(dicts=texts)  # replace with run_inference for the earlier version
end = time.time()

print(f'It took {end-start} seconds')

I tried this code with FARM installed from the newest master branch: the job was done in 23.3 seconds, and in each iteration a long sample of the data was printed. FARM from the older branch (I used qa_redesign), before the Inferencer changes, does everything silently in 5.8 seconds.

@tanaysoni
Contributor

Thank you for the code snippet, @isamokhin.

In this case, inference_from_dicts() is called inside a loop. This leads to the latest multiprocessing code doing more work per loop iteration, as it spawns new processes and manages a multiprocessing Pool for each call to inference_from_dicts().

I tried the same code with a single larger list of dicts, i.e., texts = random.choices(basic_texts, k=25*40) without the loop, and then the performance of the new version is similar to the older release.
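
For reference, the batched variant I tried is just the snippet above with the loop removed (sketch):

texts = random.choices(basic_texts, k=25 * 40)  # one large list instead of 40 small ones

start = time.time()
result = model.inference_from_dicts(dicts=texts)  # a single call, so the Pool is spawned only once
end = time.time()
print(f'It took {end - start} seconds')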

As I see it, it is a tradeoff between latency and throughput. Multiprocessing here is slower to start but yields higher throughput. I am curious to know if you have a use case that necessitates using a loop?

A potential solution could be to add a use_multiprocessing param in inference_from_dicts() to disable the use of multiprocessing. I see that being useful for REST APIs for inference, where latency can be more important. It'd be cool to have this in FARM. Would you like to work on it? Happy to support!
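
A rough sketch of how that could look from the caller's side (use_multiprocessing is only the proposed name here, not an existing parameter):

# latency-sensitive path, e.g. a REST API answering one request at a time
result = model.inference_from_dicts(dicts=texts, use_multiprocessing=False)

# throughput-oriented path, e.g. batch processing a large corpus
result = model.inference_from_dicts(dicts=texts)  # multiprocessing stays enabled by default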

@isamokhin
Author

Thanks, @tanaysoni!

I used a loop in my code snippet because in my task I have separate text documents, each of which is preprocessed into smaller fragments, which are then classified (by sentiment). So, when I pass fragments into inference_from_dicts, I do it separately for each text document.

I guess I can rework my code to analyze a batch of documents and find a way of discerning which small fragments come from which documents, as sketched below. So I will probably be fine. But there are situations where you have to classify a stream of documents, one at a time, and it would be nice to have an option of low-latency inference for such cases.
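
Something like this is what I have in mind for the rework (a sketch; split_into_fragments again stands in for my own preprocessing):

# Collect fragments from all documents into one list and remember which
# document each fragment came from, so predictions can be regrouped later.
all_dicts = []
doc_index_per_fragment = []
for doc_idx, document_text in enumerate(documents):
    for fragment in split_into_fragments(document_text):
        all_dicts.append({"text": fragment})
        doc_index_per_fragment.append(doc_idx)

# One call over all fragments, so the multiprocessing Pool is spawned only once.
results = model.inference_from_dicts(dicts=all_dicts)

# Since the inputs keep their order, doc_index_per_fragment tells me which
# document each returned prediction belongs to.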

@isamokhin
Author

And there is the problem of verbosity - even when I test my snippet with 10 texts, it prints 15 "samples", each with tokens, token_ids, ASCII art etc. With 20 texts, it's already 30 samples.

tanaysoni added a commit that referenced this issue Oct 18, 2019
Incorrect params lead to increase in verbosity of logs (#117).
tanaysoni added a commit that referenced this issue Oct 18, 2019
The call to dataset_from_dicts() from Inferencer had incorrect params leading to Samples from each multiprocessing chunk being logged, making the logs verbose.
@tanaysoni
Contributor

Hi @isamokhin. The verbosity issue is now resolved in the Inferencer.

We plan to add a flag that allows disabling multiprocessing during inference. That should help achieve performance similar to before.

@isamokhin
Author

Hi @tanaysoni! You are right, inference is much less verbose now. I will also appreciate it when you add the ability to run inference without multiprocessing, making it faster for a stream of documents. I guess I will close the issue now. Thanks a lot for your help and for the great work you all are doing with FARM!

@tanaysoni
Contributor

Hi @isamokhin, thank you for the feedback! An option to disable the use of multiprocessing in the Inferencer is now merged to master.
