Inference now does not utilise GPU, is much slower and more verbose #117
After the recent commits, `inference_from_dicts` took the place of `run_inference` and uses multiprocessing. However, at least for my task it made matters worse.

I have a test dataset of 200 text documents. They are preprocessed and split into fragments, from which lists of dicts `{"text": "some text"}` are formed for subsequent text classification with FARM. There are 5576 fragments in total. Previously, with the gpu flag enabled, these fragments were classified in 53 seconds, utilising up to 1900 MB of memory on the GPU, and the Processor transformed samples into inputs for the BERT model silently. I have installed FARM from the older branch to make sure that it still works this way.

With the new changes to inference, each list of dicts is treated the same way as a dataset during training - i.e., FARM prints one sample from each, with tokens, offsets, token_ids etc., and with full ASCII art. Even with logging turned off, a `tqdm` progress bar is still printed for each list of dicts. Inference for the whole dataset now takes 370 seconds. All this time, no more than 1390 MB of GPU memory is utilised (this is the typical size of the bare BERT model loaded into GPU). The results of inference did not change.

System:

Comments
This is clearly suboptimal (though we find our ASCII art very pleasing :) ). @tanaysoni, can you please have a look into this?
That's the thing: I cannot understand what exactly caused this :( But it must be #107. All I can see that changed is that multiprocessing is now used.
Hi @isamokhin. I am trying to reproduce the performance difference. It'd be great if you could provide the code snippet you're using. Additionally, did you implement a custom Processor?
Hi! I'm using the default Processor.
I cannot share my own data, but the difference in performance can be seen on the FARM toy example. I trained a German model from the toy example and ran inference over a list of texts in a loop.
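A minimal sketch of the kind of loop being described, assuming a model saved at a hypothetical path `saved_models/german-model` and the `inference_from_dicts` API from the master branch at the time:

```python
from farm.infer import Inferencer

# Hypothetical path to the trained German model from the toy example.
model = Inferencer.load("saved_models/german-model", batch_size=32, gpu=True)

texts = ["Das ist ein Beispiel."] * 200  # stand-in for the real texts
results = []
for text in texts:
    # One call per text: on master, each call spins up the multiprocessing
    # machinery again, which is where the slowdown shows up.
    results.append(model.inference_from_dicts(dicts=[{"text": text}]))
```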
With FARM installed from the newest master branch, the job was done in 23.3 seconds, and in each iteration a long sample of the data was printed. With FARM installed from an older branch, the same loop ran much faster and printed nothing.
Thank you for the code snippet, @isamokhin. In this case, I tried the same code with a larger single list of dicts. As I see it, it is a tradeoff between latency and throughput. Multiprocessing here is slower to start but yields higher throughput. I am curious to know if you have a use case that necessitates using a loop? A potential solution could be to add a flag for disabling multiprocessing in the Inferencer.
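In code, the batched variant described here is a single call over one large list of dicts; a hedged sketch, reusing the same hypothetical model path and texts as in the loop above:

```python
from farm.infer import Inferencer

# Same hypothetical model and texts as in the loop sketch above.
model = Inferencer.load("saved_models/german-model", batch_size=32, gpu=True)
texts = ["Das ist ein Beispiel."] * 200

# One call over a single large list of dicts: the preprocessing pipeline and
# the multiprocessing pool start once, so throughput is higher even though
# the first prediction arrives later (higher latency, higher throughput).
result = model.inference_from_dicts(dicts=[{"text": t} for t in texts])
```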
Thanks, @tanaysoni! I used a loop in my code snippet because in my task I have separate text documents, each of which is preprocessed into smaller fragments, which are then classified (by sentiment). So when I put fragments into `inference_from_dicts`, I do it one document at a time. I guess I can rework my code to analyze a batch of documents and find a way of discerning which small fragments are from which documents, so I will probably be fine. But there are situations where you have to classify a stream of documents, one at a time, and it would be nice to have a low-latency inference option for such cases.
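One way to do that rework, as a rough sketch: flatten all fragments into a single list of dicts while recording each fragment's document index, run one `inference_from_dicts` call, then regroup. The model path, input variable, and the exact shape of the returned predictions are assumptions here, not the library's confirmed API:

```python
from collections import defaultdict

from farm.infer import Inferencer

# Hypothetical model path and input: one list of text fragments per document.
model = Inferencer.load("saved_models/german-model", batch_size=32, gpu=True)
fragments_per_document = [
    ["first fragment of doc 0", "second fragment of doc 0"],
    ["only fragment of doc 1"],
]

# Flatten all fragments into one list, remembering each fragment's document.
doc_ids, dicts = [], []
for doc_id, fragments in enumerate(fragments_per_document):
    for fragment in fragments:
        doc_ids.append(doc_id)
        dicts.append({"text": fragment})

# One inference call over all fragments for higher throughput.
result = model.inference_from_dicts(dicts=dicts)

# Regroup predictions by originating document. The shape of `result` varies
# between FARM versions; it is assumed here to be one prediction per input
# dict, in input order; adjust the unpacking for your version.
by_document = defaultdict(list)
for doc_id, prediction in zip(doc_ids, result):
    by_document[doc_id].append(prediction)
```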
And there is the problem of verbosity: even when I test my snippet with 10 texts, it prints 15 "samples", each with tokens, token_ids, ASCII art etc. With 20 texts, it's already 30 samples.
Incorrect params lead to increased log verbosity (#117).
The call to `dataset_from_dicts()` from the `Inferencer` had incorrect params, which led to Samples from each multiprocessing chunk being logged, making the logs verbose.
Hi @isamokhin. The verbosity issue is now resolved in the Inferencer. We plan to add a flag that allows disabling multiprocessing during inference. That should help achieve performance similar to before.
Hi @tanaysoni! You are right, inference is much less verbose now. I will also appreciate it when you add the ability to run inference without multiprocessing, making it faster on a stream of documents. I guess I will close the issue now. Thanks a lot for your help and for the great work you all are doing with FARM!
Hi @isamokhin, thank you for the feedback! An option to disable the use of multiprocessing in the Inferencer is now merged to master.
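For later readers, usage would presumably look something like the sketch below; the exact parameter name and semantics (`num_processes=0` disabling the pool) are assumptions to check against the current `Inferencer.load` signature:

```python
from farm.infer import Inferencer

# Assumed flag: num_processes=0 disables the multiprocessing pool entirely,
# trading batch throughput for lower per-call latency on document streams.
model = Inferencer.load(
    "saved_models/german-model",  # hypothetical path
    batch_size=32,
    gpu=True,
    num_processes=0,
)

result = model.inference_from_dicts(dicts=[{"text": "Ein einzelnes Dokument."}])
```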