
ODQA inference speed very very slow #853

Closed
shubhank008 opened this issue May 27, 2019 · 12 comments
@shubhank008

Running the default configuration and model on an EC2 p2.xlarge instance (~60 GB RAM and an NVIDIA K80 GPU), inference for simple questions takes 40 seconds to 5 minutes.

Sometimes there is no result even after 10 minutes.

[screenshot: MobaXterm_2019-05-27_16-36-13]

@my-master
Contributor

Hi, @shubhank008!
Inference can be sped up several times over if you switch from CPU to GPU.
No result is expected behavior and means that the model didn't find an appropriate answer.
Inference time varies because Wiki articles differ in size, so when they are chunked and passed to the reader, the number of chunks varies.

@shubhank008
Author

shubhank008 commented May 27, 2019

Hi, @shubhank008!
Inference can be sped up several times over if you switch from CPU to GPU.
No result is expected behavior and means that the model didn't find an appropriate answer.
Inference time varies because Wiki articles differ in size, so when they are chunked and passed to the reader, the number of chunks varies.

I believe I am already using the GPU, since p2.xlarge is an AWS GPU instance with a K80 GPU.
Also, the "no result" case was actually just very slow inference; it did return a result after about 10-15 minutes, though the result seems wrong (not exactly wrong, but outdated). India's prime minister has been Narendra Modi since ~2014, yet it returned Manmohan Singh (who was PM before 2014).

Update
Actually, does DeepPavlov use the CPU by default? My instance has a GPU, but I think the default DeepPavlov setup is using the CPU only.

@yoptar
Contributor

yoptar commented May 27, 2019

When you install config requirements, tensorflow rather than tensorflow-gpu is installed by default, as we cannot reliably check whether CUDA is installed.
You can install tensorflow-gpu==1.10.0 in the same environment to make the model work with your GPU.
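For example, a quick sanity check (plain TF 1.x API) that the GPU build is actually being picked up once tensorflow-gpu is installed:

```python
import tensorflow as tf

# Should print True and a device string like '/device:GPU:0' when the GPU
# build of TensorFlow can see a CUDA device (e.g. the K80 on a p2.xlarge).
print(tf.test.is_gpu_available())
print(tf.test.gpu_device_name())
```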

@my-master
Contributor

my-master commented May 27, 2019

Actually, does DeepPavlov use the CPU by default?

Yes, I recommend uninstalling tensorflow and installing tensorflow-gpu==1.10.0 as @yoptar suggested. According to your TensorFlow logs, you are using the CPU version.

Outdated answers can appear because we are using the English Wikipedia dump from 2018-02-11.

@shubhank008
Author

@yoptar Wow, damn me. I thought the P2 instance came with TF-GPU installed by default, since it ships a deep learning AMI with all the tools, but it did not. I have now installed TF-GPU and confirmed TF is using the GPU for operations.
But here is the weird thing: even now, when running ODQA, something is very wrong and it feels as if the Python code hangs or gets stuck for minutes (I thought it was just inference, but right now just the initial steps took 3 minutes).

[screenshot: MobaXterm_2019-05-27_17-32-21]

As you can see above, just the data check and download step took a whole 3 minutes, and now it is stuck at that last line, even though resources should not be a problem on such a powerful instance.

@my-master Yes, I understand that, but that's the weird part: the result is outdated by 5-6 years. That answer would only have been correct in a ~2013 dump; in every dump after that, the answer should be Narendra Modi.

@my-master
Contributor

my-master commented May 27, 2019

@shubhank008 The data and model files for ODQA are huge, so it doesn't really matter how powerful the instance is: it will still take several minutes to load the 14 GB database, the 17 GB ranker matrix, the embedding vectors and the neural network model from disk into RAM. The good part is that this time is spent only on the first inference. Once everything is loaded, inference takes much less time.

As for the model's answer, it could simply be wrong if the names of two prime ministers occur in the same article in different contexts. The algorithm could just miss the part that says one of them is the former prime minister.

Update
Actually, the embeddings and the neural model are loaded into video memory, but that also takes time during the initial inference.

@shubhank008
Author

shubhank008 commented May 27, 2019

@yoptar So I made it work; leaving what was wrong and the solutions here for anyone who stumbles upon this.

@my-master Yes, thank you. It took me a while to realize that those steps were related to RAM and not the GPU; I was confused that my CPU/GPU usage was at 0% yet the steps were taking so much time.
Also, using interact mode without -d (download) is way faster. I will try the Python code mode without download as well (since I have everything downloaded already) to see whether it is just the code mode that is slower or getting stuck.

Furthermore, I am not sure if or how it could be a problem, but I have a vague suspicion that keeping two terminals open (which I had, one to monitor resource usage with top -c/nvidia-smi) was also somehow causing the hangs/delays. Closing the other terminal window suddenly processed many lines of output at once.
I realized it once I ran free -m and it showed 32 GB of RAM usage (while GPU usage was still 0 MB).

  1. First of all, the latest DeepPavlov requires tensorflow-gpu==1.13.1 (or higher). I tried with 1.10.0, but there are errors related to CudnnGRU, which I believe was added in 1.12 or 1.13.
  2. AWS/EC2 does not ship with tensorflow-gpu installed, so it needs to be installed. Thankfully the instances ship with NVIDIA drivers and a few CUDA versions, so just install the version of TF-GPU you need and load/use the CUDA version compatible with it.
    I ran into many errors due to the CUDA version being wrong and had to use .bashrc exports to switch the in-use CUDA version between 8, 9 and finally 10 (for tf-gpu 1.13.1). You will probably also need to downgrade or upgrade your GPU drivers.
  3. Most of the errors/problems were simply due to mismatches between the drivers and the installed package versions (mainly TF, CUDA, cuDNN); see the quick check after this list.
  4. Follow the AWS GPU optimization article to disable auto-boost and clock the GPU to 100% performance.
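A quick way to confirm the TF/CUDA side of this (plain TF 1.x calls, nothing DeepPavlov-specific):

```python
import tensorflow as tf

# TF 1.x build checks: the GPU build should report being built with CUDA,
# and the TF version has to match the CUDA toolkit you point it at
# (in my case 1.13.1 with CUDA 10.0).
print(tf.__version__)
print(tf.test.is_built_with_cuda())
```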

I found success with TF-GPU 1.13.1, NVIDIA driver 418 and CUDA 10.0 on Ubuntu 16.04 with 32 GB of RAM. Yet to test with 16 GB or less.

Also, it's probably better to download all the models and data separately using DeepPavlov's "download" command, so you don't have to use -d or "download=True" at inference time. I have more or less fixed my hanging/delay issue by doing so.

My initial load time for loading all the data and connecting to the wiki database (loading data into RAM) is roughly 2 minutes, and after that answers come back in 2-10 seconds. Super awesome.
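For anyone who prefers the Python API over interact mode, a minimal sketch of that flow (the config name may differ for your DeepPavlov version; I use download=False because the data is already downloaded):

```python
from deeppavlov import build_model, configs

# Build the ODQA pipeline once; download=False assumes the data and models
# were already fetched (e.g. with `python -m deeppavlov download <config>`),
# so the multi-minute cost here is just loading them from disk into RAM/VRAM.
odqa = build_model(configs.odqa.en_odqa_infer_wiki, download=False)

# Reuse the loaded pipeline for every question; each call now takes seconds.
print(odqa(["Who is the prime minister of India?"]))
```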

I will try the whole setup on my personal PC and laptop later this week to see how it affects performance, since a PC/laptop is physical dedicated hardware compared to the virtual resources of cloud instances (I remember reading a report that HDD/SSD and RAM performance on EC2 or other clouds is a bit slower than the same hardware in a dedicated PC).

@shubhank008
Author

@yoptar @my-master Why does it sometimes fail to give any answer at all, even though the question is quite focused and similar to ones that work? Sometimes it gives answers and sometimes it does not.

Also, maybe it's time to start updating the TF calls, since there are lots of warnings about deprecated methods (maybe the reason why docs/articles suggest 1.10.0 even though you actually need 1.13.1, as even 1.12 did not work).

PS: I was using the en_odqa_pop_infer_enwiki20180211 model.
Out of curiosity, I actually thought the difference between this and the default model was that this one gives more detail/an excerpt with the answer (like DrQA does).
Is there a way to fetch the relevant excerpt or summary for an answer?
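Guessing from the docs, would something like the standalone TF-IDF ranker be the way to pull the supporting article text? (The config name and output format here are my assumption, not confirmed.)

```python
from deeppavlov import build_model, configs

# Assumption: the document-retrieval config returns the top-ranked Wikipedia
# articles for a query, which could serve as the supporting excerpt/context.
ranker = build_model(configs.doc_retrieval.en_ranker_tfidf_wiki, download=False)
print(ranker(["Who is the prime minister of India?"]))
```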

[screenshot: MobaXterm_2019-05-27_18-32-14]

@shubhank008
Author

It's me again, heh.

Finally everything is working, although the answers are not accurate enough for real-world use, sadly. I was quite excited when I saw DrQA and this project and wanted to build a prototype talking bot for my dad's NGO school, to showcase the technology and have kids interact with it and ask it general questions, but so far the accuracy and relevance seem quite poor to me.

Was this trained on the whole Wiki dump or only part of it? And how long was it trained for? Asking to know whether anything can be done to make it better and more accurate (maybe train it more?).

[screenshot: MobaXterm_2019-05-27_18-42-54]

@ZhuoranLyu

It's me again, heh.

Finally everything is working, although the answers are not accurate enough for real-world use, sadly. I was quite excited when I saw DrQA and this project and wanted to build a prototype talking bot for my dad's NGO school, to showcase the technology and have kids interact with it and ask it general questions, but so far the accuracy and relevance seem quite poor to me.

Was this trained on the whole Wiki dump or only part of it? And how long was it trained for? Asking to know whether anything can be done to make it better and more accurate (maybe train it more?).

[screenshot: MobaXterm_2019-05-27_18-42-54]

@shubhank008 Exactly the same question! Apparently the answers are not satisfying; I cannot even reproduce the answers the demo shows. Any ideas?

@yurakuratov
Contributor

@IgnatovFedor Could you help here: which configuration file are we using on the demo page?

@IgnatovFedor
Collaborator

@yurakuratov, we are using en_odqa_pop_infer_enwiki20180211 and ru_odqa_infer_wiki_rubert_noans with a 1080 Ti.
