Implement batch size and n_thread options #185

Open
kbenoit opened this issue Feb 27, 2020 · 1 comment

kbenoit commented Feb 27, 2020

This is fine:

> library("spacyr")
> sp <- spacy_parse(data_char_sentences)
spacy python option is already set, spacyr will use:
	condaenv = "spacy_condaenv"
successfully initialized (spaCy Version: 2.1.8, language model: en)
(python options: type = "condaenv", value = "spacy_condaenv")

but this crashes R:

library("spacyr")

data(data_corpus_sotu, package = "quanteda.corpora")
# parsing the full SOTU corpus in a single call
sp <- spacy_parse(quanteda::texts(data_corpus_sotu))
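Until batching is supported natively, one possible workaround, assuming the crash comes from handing spaCy the whole corpus in a single call, is to parse in smaller chunks and bind the results (a minimal sketch; the chunk size of 20 is arbitrary):

library("spacyr")
spacy_initialize()

data(data_corpus_sotu, package = "quanteda.corpora")
txts <- quanteda::texts(data_corpus_sotu)

# split the corpus into chunks of 20 documents each
chunks <- split(txts, ceiling(seq_along(txts) / 20))

# parse each chunk separately, then row-bind the token data.frames
sp <- do.call(rbind, lapply(chunks, spacy_parse))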
@amatsuo amatsuo self-assigned this Feb 27, 2020
@kbenoit kbenoit changed the title spacy_parse() not handling SOTU corpus Implement batch size and n_thread options Sep 1, 2022

kbenoit commented Sep 1, 2022

Parsing the entire 241-document SOTU corpus worked (in 2022!) on my M1 Max Mac (with 64 GB of RAM) after a few minutes. The resulting fully parsed object has >2.2m tokens.

But it would be nice to implement the multiprocessing arguments, including batch_size, documented at https://spacy.io/usage/processing-pipelines#multiprocessing.
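For illustration, here is a hypothetical sketch of what that interface could look like on the R side; the batch_size and n_process arguments below do not exist in spacy_parse() and are assumptions that mirror the parameters of spaCy's nlp.pipe():

library("spacyr")
spacy_initialize()

# HYPOTHETICAL: batch_size and n_process are not (yet) spacy_parse() arguments;
# the idea is that they would be passed through to spaCy's
# nlp.pipe(texts, batch_size = ..., n_process = ...)
sp <- spacy_parse(
  quanteda::texts(data_corpus_sotu),
  batch_size = 1000,  # documents per batch sent to the pipeline
  n_process = 2       # number of worker processes
)

One naming note: the spaCy docs linked above call the worker-count argument n_process (batch_size controls how many documents go to each worker), so that may be a better name than n_thread for the spacyr argument.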
