ODQA inference speed very very slow #853

Running the default configuration and model on an EC2 p2.xlarge instance (~60 GB RAM and an Nvidia K80 GPU), inference for simple questions takes 40 seconds to 5 minutes. Sometimes there is no result even after 10 minutes.

Comments
Hi, @shubhank008! Are you using a GPU?
I believe I am already using a GPU, since p2.xlarge is an AWS GPU instance with a K80 GPU.
When you install the config requirements, by default the CPU version of TensorFlow is installed, not tensorflow-gpu.
Yes, I recommend uninstalling tensorflow and installing tensorflow-gpu instead.

Outdated answers can appear because we are using the English Wikipedia dump from 2018-02-11.
@yoptar Wow, damn me. I thought the P2 instance came with TF GPU installed by default, since it ships with a deep learning AMI with all the tools, but it did not. I have now installed TF GPU and confirmed TF is using the GPU for operations. As you can see above, just the data check and download step took a whole 3 minutes, and now it's stuck at that last line, even though resources should not be a problem on such a powerful instance.

@my-master Yes, I understand that, but that's the weird part: the result is outdated by 5-6 years. That answer should have been in the ~2013 dump, and in all dumps after it the answer should be Narendra Modi.
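For anyone checking the same thing, here is a minimal sketch (assuming the TF 1.x builds discussed in this thread) of confirming that TensorFlow actually sees the GPU:

```python
# Minimal sketch, assuming TF 1.x as used in this thread.
# After swapping packages (pip uninstall tensorflow, pip install tensorflow-gpu),
# this confirms TensorFlow can actually see the K80.
import tensorflow as tf
from tensorflow.python.client import device_lib

# A working tensorflow-gpu install lists a device with device_type == 'GPU'
# alongside the CPU.
print(device_lib.list_local_devices())

# TF 1.x helper: True if a CUDA-enabled GPU is available to TensorFlow.
print(tf.test.is_gpu_available())
```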
@shubhank008 The data and model files for ODQA are very large, so it doesn't really matter how powerful the instance is: it will still take several minutes to load the 14 GB database, the 17 GB ranker matrix, the embedding vectors, and the neural network model from disk into RAM. The good part is that this time is spent only on the first inference. Once everything is loaded, inference will take much less time.

As for the model's answer, it could simply be wrong if the names of two prime ministers occur in the same article in different contexts; the algorithm could just miss the part that says one of them is the former one.
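A sketch of what that implies for usage with the standard DeepPavlov Python API (the config name here is the one mentioned later in this thread; treat it as an assumption and adjust to whichever config you installed):

```python
from deeppavlov import build_model, configs

# Building the model is the expensive step: the ~14 GB database, ~17 GB
# ranker matrix, embedding vectors, and reader weights are loaded into RAM here.
odqa = build_model(configs.odqa.en_odqa_pop_infer_enwiki20180211)

# Subsequent calls reuse the in-memory model, so answers take seconds,
# not minutes.
for question in ["Who is the prime minister of India?"]:
    print(question, "->", odqa([question]))
```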
@yoptar So I made it work, and I'm leaving what was wrong and the solutions for anyone who stumbles upon this.

@my-master Yes, thank you. It took me a while to realize that those steps were related to RAM and not the GPU; I was under the impression that my CPU/GPU usage was 0% yet they were taking so much time. Furthermore, I am not sure if or how it could be a problem, but I have a vague suspicion that keeping 2 terminals open (which I had, one to monitor resource usage with top -c/nvidia-smi) was also somehow causing the hang/delay: closing the other terminal window suddenly processed many lines of output at once.
I have found success using TF-GPU 1.13.1, Nvidia driver 418, and CUDA 10.0 on Ubuntu 16.04 with 32 GB of RAM; I have yet to test on 16 GB or less. Also, it's probably better to download all the models and data separately using DeepPavlov's "download" command, so you don't have to use -d or "download = true" during inference. Doing so mostly fixed my getting-stuck and delay issues. My initial load time (loading all the data and connecting to the wiki database, i.e. loading data into RAM) is approximately 2 minutes, and after that answers come back in 2-10 seconds. Super awesome. I will try the whole setup on my personal PC and laptop later this week to see how it affects performance, since a PC/laptop is physical dedicated hardware compared to the virtual resources of cloud instances (I remember reading a report that HDD/SSD and RAM performance on EC2 or other clouds is a bit slower than the same hardware in a dedicated PC).
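A sketch of that separation (same assumed config name as above; the exact CLI argument form may differ between DeepPavlov versions):

```python
# One-time download, up front, from a shell:
#   python -m deeppavlov download en_odqa_pop_infer_enwiki20180211
# Afterwards, build without the download flag so startup only reads
# the already-downloaded files from disk into RAM.
from deeppavlov import build_model, configs

odqa = build_model(configs.odqa.en_odqa_pop_infer_enwiki20180211, download=False)
print(odqa(["Who is the prime minister of India?"]))
```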
@yoptar @my-master Why does it sometimes fail to give any answer at all, even when the question is quite focused and similar to ones that work? Sometimes it gives answers and sometimes it does not. Also, maybe it's time to start updating the TF calls, since there are lots of warnings about deprecated methods (maybe that's why docs/articles suggest 1.0.0 even though you actually need 1.13.1, as even 1.12 did not work). PS: I was using the en_odqa_pop_infer_enwiki20180211 model.
It's me again, heh. Finally everything is working, although the answers are not accurate enough for real-world use, sadly. I was quite excited when I saw DrQA and this project and wanted to create a prototype talking bot for my dad's NGO school to showcase the technology and have kids interact with it and ask it general questions, but so far the accuracy and relevancy seem quite bad to me. Was this trained on the whole Wiki dump or only part of it? And how long was it trained for? I'm asking to know whether anything can be done to make it better and more accurate (maybe train it more?).
@shubhank008 Exactly the same question! Apparently the answers are not satisfying; I cannot even reproduce the answer the demo shows. Any ideas?
@IgnatovFedor Could you help here: which configuration file are we using on the demo page?
@yurakuratov, we are using