
Auto-calculate the batch size? #7

Closed
ajratner opened this issue Aug 4, 2015 · 3 comments

@ajratner
Contributor

ajratner commented Aug 4, 2015

@raphaelhoffmann Especially since CoreNLP's load time is so (relatively) long, it seems like a better default than batch_size=1000 would be to divide the number of lines by the number of cores (or cores * nodes for distributed runs). For example, I just got a 2x speedup on an EC2 node really easily this way (I had forgotten to set the batch size the first time around...). Thoughts?
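The default suggested above could be sketched roughly like this — a minimal, hypothetical helper (the function name and parameters are illustrative, not from the actual codebase) that divides the input lines evenly across cores, or cores * nodes for a distributed run:

```python
import math
import os

def auto_batch_size(num_lines, num_nodes=1, cores_per_node=None):
    """Pick a batch size so each worker core gets roughly one batch,
    instead of using a fixed default like batch_size=1000.

    cores_per_node defaults to the local CPU count; for a distributed
    run, pass num_nodes so batches are split across cores * nodes.
    """
    if cores_per_node is None:
        cores_per_node = os.cpu_count() or 1
    workers = max(1, cores_per_node * num_nodes)
    # Ceiling division so every line lands in some batch.
    return max(1, math.ceil(num_lines / workers))
```

For example, 10,000 lines on two 4-core nodes would give batches of 1,250 lines each, so each core pays the CoreNLP startup cost exactly once.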

@raphaelhoffmann
Contributor

That's a great idea! Please check this in.

@ajratner
Contributor Author

ajratner commented Aug 4, 2015

Sure, will do tomorrow


@ajratner ajratner self-assigned this Aug 5, 2015
@ajratner
Contributor Author

ajratner commented Aug 5, 2015

Will push my new wrapper functions (in the fabfile) once I'm done processing everything...
