Inference now does not utilise GPU, is much slower and more verbose #117

isamokhin · 2019-10-16T11:00:51Z

After the recent commits, inference_from_dicts took place of run_inference and uses multiprocessing. However, at least for my task it made matters worse.

I have a test dataset of 200 text documents. They are preprocessed and split into fragments, from which lists of dicts {"text": "some text"} are formed for subsequent text classification with FARM. There are 5576 fragments in total. Previously, with gpu flag enabled, these fragments were classified in 53 seconds, utilising up to 1900 MB memory on GPU. Processor transformed samples into inputs for BERT model silently. I have installed FARM from the older branch to make sure that it still works this way.

With new changes to inference, each list of dicts is treated the same way as a dataset during training - i.e., FARM prints one sample from each with tokens, offsets, token_ids etc., and with full ASCII art. Even with logging turned off, tqdm progress bar is still printed for each list of dicts. Inference for all dataset now takes 370 seconds. All this time, no more than 1390 MB of GPU memory is utilised (this is the typical size of bare BERT model, loaded into GPU).

The results of inference did not change.

System:

OS: Ubuntu 18.04
GPU/CPU: NVIDIA GTX 1080 Ti, CUDA 10.1
FARM version: newest version on the morning of 16 Oct 2019.

The text was updated successfully, but these errors were encountered:

Timoeller · 2019-10-16T12:58:38Z

This is clearly sub optimal (though we find our ASCII art very pleasing : )

@tanaysoni can you please have a look into this?
As for now, do you know what exact code changes could cause these problems @isamokhin?

isamokhin · 2019-10-16T13:28:41Z

That's the point, I cannot understand what exactly caused this :( But it must be #107. All changes I see is that now multiprocessing is used, and _run_inference is now a smaller function used in inference_from_dicts, though with the same content.

tanaysoni · 2019-10-17T17:28:05Z

Hi @isamokhin. I am trying to reproduce the performance difference. It'd be great if you can provide the batch_size, multiprocessing_chunk_size, and max_seq_len.

Additionally, did you implement a custom Processor or used the default TextClassificationProcessor?

isamokhin · 2019-10-18T07:42:24Z

Hi! I'm using default TextClassificationProcessor, with max_seq_len=128, batch_size=32 and default multiprocessing_chunk_size (I didn't set this parameter at all).

isamokhin · 2019-10-18T08:21:37Z

I cannot share my own data, but the difference in performance can be seen on the FARM toy example. I trained a German model from doc_classification.py example and used inference in a separate file:

import time
import random
from farm.infer import Inferencer

save_dir = "saved_models/bert-german-doc-tutorial"

basic_texts = [
    {"text": "Schartau sagte dem Tagesspiegel, dass Fischer ein Idiot sei"},
    {"text": "Martin Müller spielt Handball in Berlin"},
]

texts = random.choices(basic_texts, k=25) # generate more data

model = Inferencer.load(save_dir, gpu=True)

print('Starting inference:')

start = time.time()
for i in range(40): 
    # I use this to show that Processor in new version prints a sample for each iteration, unlike earlier
    result = model.inference_from_dicts(dicts=texts) # replace with run_inference for earlier version
end = time.time()

print(f'It took {end-start} seconds')

I tried this code with FARM installed from the newest master branch and the job was done in 23.3 seconds and in each iteration a long sample of the data was printed. FARM from older branch (I used qa_redesign), before Inferencer changes, does everything silently in 5.8 seconds.

tanaysoni · 2019-10-18T10:29:06Z

Thank you the code snippet, @isamokhin.

In this case, inference_from_dicts() is called inside a loop. This leads to the the latest multiprocessing code doing more work per loop iteration as it spawns new processes and manages multiprocessing Pool for each call to inference_from_dicts().

I tried the same code with a larger single list of dicts, i.e., texts = random.choices(basic_texts, k=25*40) without the loop and then the performance of the new version is similar to the older release.

As I see it, it is a tradeoff between latency and throughput. Mutliprocessing here is slower to start but yields higher throughput. I am curious to know if you've a use case that necessitates using a loop?

A potential solution can be to add a use_multiprocessing param in inference_from_dicts() to disable the use of multiprocessing. I see that being useful for REST APIs for inference where latency could be more important. It'd be cool to have this in FARM — would you like working on it? Happy to support!

isamokhin · 2019-10-18T10:50:20Z

Thanks, @tanaysoni!

I used a loop in my code snippet, because in my task I have separate text documents, each of which is preprocessed into smaller fragments, which are then classified (by sentiment). So, when I put fragments into inference_from_dicts, I do it separately for each text document.

I guess I can rework my code to analyze a batch of documents and to find a way of discerning which small fragments are from which documents. So I will be fine, probably. But there are situations where you have to classify a stream of documents, one at a time, and it would be nice to have an option of high latency inference for such cases.

isamokhin · 2019-10-18T11:07:44Z

And there is a problem of verbosity - even when I test my snippet with 10 texts, it prints 15 "samples", each with tokens, token_ids, ASCII art etc. With 20 texts, it's already 30 samples.

Incorrect params lead to increase in verbosity of logs(#117).

The call to dataset_from_dicts() from Inferencer had incorrect params leading to Samples from each multiprocessing chunk being logged, making the logs verbose.

tanaysoni · 2019-10-25T10:17:26Z

Hi @isamokhin. The verbosity issue is now resolved in the Inferencer.

We plan to add a flag that allows disabling multiprocessing during Inference. That should help achieve the performance similar as before.

isamokhin · 2019-10-25T12:44:02Z

Hi @tanaysoni! You are right, inference is much less verbose now. I will also appreciate when you add the ability to use inference without multiprocessing, making it faster in a stream of documents. I guess I will close the issue now. Thanks a lot for your help and for the great work you all are doing with FARM!

tanaysoni · 2019-10-25T14:45:37Z

Hi @isamokhin, thank you for the feedback! An option to disable the use of multiprocessing in the Inferencer is now merged to the master.

isamokhin added the bug Something isn't working label Oct 16, 2019

Timoeller assigned tanaysoni Oct 16, 2019

tanaysoni added a commit that referenced this issue Oct 18, 2019

Fix params for call to dataset_from_dicts in Inferencer

967b6f3

Incorrect params lead to increase in verbosity of logs(#117).

tanaysoni added a commit that referenced this issue Oct 18, 2019

Fix logging verbosity in Inferencer (#117)

4a639d4

The call to dataset_from_dicts() from Inferencer had incorrect params leading to Samples from each multiprocessing chunk being logged, making the logs verbose.

tanaysoni added a commit that referenced this issue Oct 25, 2019

Add a use_multiprocessing param to disable multiprocessing(#117)

5e1fbf9

isamokhin closed this as completed Oct 25, 2019

tanaysoni added a commit that referenced this issue Oct 25, 2019

Add an option to disable multiprocessing in Inferencer(#117)

7f2203e

bipinkc19 mentioned this issue Aug 6, 2020

Speed issue in inference in newer versions > 0.4.3 for Q/A #486

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inference now does not utilise GPU, is much slower and more verbose #117

Inference now does not utilise GPU, is much slower and more verbose #117

isamokhin commented Oct 16, 2019

Timoeller commented Oct 16, 2019 •

edited

isamokhin commented Oct 16, 2019

tanaysoni commented Oct 17, 2019

isamokhin commented Oct 18, 2019 •

edited

isamokhin commented Oct 18, 2019

tanaysoni commented Oct 18, 2019

isamokhin commented Oct 18, 2019

isamokhin commented Oct 18, 2019

tanaysoni commented Oct 25, 2019

isamokhin commented Oct 25, 2019

tanaysoni commented Oct 25, 2019

Inference now does not utilise GPU, is much slower and more verbose #117

Inference now does not utilise GPU, is much slower and more verbose #117

Comments

isamokhin commented Oct 16, 2019

Timoeller commented Oct 16, 2019 • edited

isamokhin commented Oct 16, 2019

tanaysoni commented Oct 17, 2019

isamokhin commented Oct 18, 2019 • edited

isamokhin commented Oct 18, 2019

tanaysoni commented Oct 18, 2019

isamokhin commented Oct 18, 2019

isamokhin commented Oct 18, 2019

tanaysoni commented Oct 25, 2019

isamokhin commented Oct 25, 2019

tanaysoni commented Oct 25, 2019

Timoeller commented Oct 16, 2019 •

edited

isamokhin commented Oct 18, 2019 •

edited