
Multicore support? #22

Open
hthuwal opened this issue Oct 3, 2018 · 15 comments

Comments

@hthuwal commented Oct 3, 2018

I am trying to run it on a 1.5 GB text file. The model uses only a single core and hence it's taking too long.

I couldn't find a flag to specify the number of threads to use. Is there a way to run the model on multiple cores?

@guilherme-salome commented Oct 3, 2018

I couldn't either. An easy workaround is to split the text file and launch two processes, one for each half of the file.
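For reference, a minimal sketch of that workaround. It assumes the assembly jar accepts input and output file paths as positional arguments (check your build's README for the exact invocation); the file names are hypothetical.

```bash
# Split the corpus into two halves without breaking lines, then run one process per half.
split -n l/2 -d corpus.txt part_        # produces part_00 and part_01

java -Xmx10g -jar openie-assembly.jar part_00 part_00.out &
java -Xmx10g -jar openie-assembly.jar part_01 part_01.out &
wait                                    # block until both halves finish

cat part_00.out part_01.out > extractions.txt
```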

@hthuwal (Author) commented Oct 3, 2018 via email

@guilherme-salome

If you find a more efficient solution or update the code to allow for multiple cores, please post it here!

@guilherme-salome

@hthuwal I've been using this project to go over a lot of text. I was running it on a single powerful machine and it was very slow. I then went to DigitalOcean, got one of the high-tier droplets, and started running OpenIE 5 in parallel (with https://www.gnu.org/software/parallel/) on small batches of sentences (1000 at a time, mostly for debugging, but this could be increased). Processing time was about 3 minutes per 1000 sentences (roughly 1 minute of that is OpenIE 5 start-up).
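A sketch of that setup with GNU parallel, again assuming the assembly jar takes input and output paths as arguments (hypothetical file names; add or drop JVM memory flags as discussed below):

```bash
# Split the corpus into 1000-line chunks and process several chunks at a time.
# -j 4 caps the number of simultaneous OpenIE processes; tune it to your RAM
# (each process needed roughly 13 GB in practice).
split -l 1000 -d corpus.txt chunk_

ls chunk_* | parallel -j 4 'java -jar openie-assembly.jar {} {}.out'

cat chunk_*.out > extractions.txt
```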

At first I tried the -Xmx10g -XX:+UseConcMarkSweepGC options and it was not working at all: no lines were being parsed. These options seem to work on Red Hat and macOS but did not work for me on Ubuntu 18.04. I removed them and it started working.
However, I noticed that memory usage was higher than 10 GB, about 13 GB per OpenIE 5 process.
I was also using top to monitor CPU usage, and each process was using about 150% CPU on average.
With 64 GB of RAM I was able to run 4 processes simultaneously (a 5th would crash because of low memory).

The droplet I was using has 64 GB of RAM and 32 vCPUs; its type is "CPU Optimized Droplet".
There is another type called "Standard Droplets", whose highest tier has 192 GB of RAM and 32 vCPUs.
Since the bottleneck on the CPU Optimized droplet was RAM, it may be possible to run more processes on the Standard droplet, even though its CPUs are less powerful.

@guilherme-salome commented Oct 5, 2018

Update: I tested their Standard Droplet with 192 GB of memory and 32 vCPUs and was able to run 8 processes at the same time. That consumed about 92% of the memory. Average CPU use was 1200%. So the bottleneck is definitely memory.

Update: looking at the top output, there still seems to be some memory free with 8 processes, so maybe 9 or 10 could run in parallel. 12 processes definitely does not work, and neither does 11.

Anyway, maybe this can help you speed things up. By the way, DigitalOcean (referral link) is giving $100 of credit for use during October; that buys about 60 hours of the most expensive droplet.

@hthuwal (Author) commented Oct 6, 2018

Thanks @Salompas. Yes, memory is the bottleneck, because each process requires ~10 GB of memory just to run. I have access to a machine with ~80 GB of RAM and 32 cores; I was able to run 3 processes simultaneously, and any further increase in the number of processes chokes the machine.

Thanks for reminding me about the parallel command. I had completely forgotten about it and instead wrote a script that splits the data and spawns processes in multiple tmux windows.
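A minimal sketch of that kind of script, assuming the same hypothetical jar invocation as above; the tmux commands themselves are standard.

```bash
#!/usr/bin/env bash
# Split the data into one chunk per process and run each chunk in its own tmux window.
NPROC=3
split -n l/$NPROC -d data.txt chunk_

tmux new-session -d -s openie
for f in chunk_*; do
  tmux new-window -t openie -n "$f" \
    "java -Xmx10g -jar openie-assembly.jar $f $f.out"
done
```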

@bhadramani

OpenIE 4.2+ had multicore support in a multithreaded environment (with approximately constant RAM usage).
Performance recommendations:

  1. Use one core per thread (N threads on N cores); you may observe up to an Nx improvement, up to 8 cores.
  2. Use taskset to pin processes to specific cores (see the sketch below).
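A hedged example of point 2, pinning each OpenIE process to its own disjoint group of cores with taskset (the core ranges and jar invocation are illustrative, not from this project):

```bash
# Pin two OpenIE processes to separate groups of 8 cores each.
taskset -c 0-7  java -Xmx10g -jar openie-assembly.jar part_00 part_00.out &
taskset -c 8-15 java -Xmx10g -jar openie-assembly.jar part_01 part_01.out &
wait
```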

Swarna may be able to confirm: is OpenIE 5.x thread safe?

@bhadramani

One more performance-related suggestion: reading the files is costly, so smaller chunks should help, and choosing the chunk size well is another smart thing to do.
Similarly, writing the output should be done smartly (for very large data, consider RabbitMQ or a similar system, which maintains the queue and saves asynchronously).

@ambujpd commented Feb 17, 2020

@vaibhavad @swarnaHub @harrysethi @schmmd @bhadramani
Could you please suggest an approach to multicore support?
Alternatively, is it possible to load the model in a separate process so that it can be shared (since model size is one of the major bottlenecks)?

I tried naively using concurrent Futures in Scala and divided the sentences among them (in OpenIECli.scala). (I found the OpenNLP Chunker to be non-thread-safe, so I wrapped that call in blocking{}.) But this is not giving me any improvement: with 8 concurrent Futures (and 80 sentences), the run time is slightly slower than serial. The extractions are getting serialized at some point, even though they run in different threads.

PS: I also see some nThreads set to 1 in some targets:

edu/stanford/nlp/models/pos-tagger/wsj-0-18-left3words-nodistsim.tagger.props:                nthreads = 1
edu/stanford/nlp/models/pos-tagger/english-bidirectional/english-bidirectional-distsim.tagger.props:                nthreads = 1
edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger.props:                nthreads = 1

@ambujpd commented Mar 2, 2020

The multithreaded implementation is working now, giving a 4x improvement with 6 threads (tried on a 20-core machine; increasing threads further showed no additional improvement). The reason it wasn't showing any improvement earlier was that I was giving it too little heap memory (10 GB). Increasing the heap from 10 GB to 12 GB already gave a substantial improvement in runtime (around 10x in extractions).

@vaibhavad (Collaborator)

@ambujpd
Glad to know that the multithreaded implementation is working. Can you share the changes you made in a pull request? We can test them and merge them into the codebase.

@ambujpd commented Mar 5, 2020

@vaibhavad

With a higher number of threads (8+), I sporadically see one or two sentences (out of 80) throwing a NullPointerException from the OpenNLP Chunker, even though I've put that call within blocking{}. I'm currently looking into it.

@moinnadeem

@ambujpd Hey! Are you able to share your multithreaded implementation? It would be super useful for me personally and would cut down my development time by quite a bit. Happy to spend time on the code to help if necessary.

@ambujpd commented Sep 25, 2020

@moinnadeem Unfortunately I don't have the code with me (I remember I was able to use a thread-safe NLP chunker, along with Scala concurrency, and had gotten rid of the sporadic NullPointerException issue). But in the end I found it was not worth the effort, as the scalability was quite limited. A much better alternative is multiprocessing (at the cost of extra memory), which is what I eventually ended up using.

@vaibhavad (Collaborator)

Hi @ambujpd @moinnadeem @bhadramani @hthuwal @Salompas,

We have just released a neural OpenIE system, OpenIE6, which performs better and is at least 10x faster than OpenIE-5 (if you run it on a GPU). You can check it out here: https://github.com/dair-iitd/openie6
