Implement batch size and n_thread options #185

Open
kbenoit opened this issue Feb 27, 2020 · 1 comment

kbenoit commented Feb 27, 2020

This is fine:

> library("spacyr")
> sp <- spacy_parse(data_char_sentences)
spacy python option is already set, spacyr will use:
	condaenv = "spacy_condaenv"
successfully initialized (spaCy Version: 2.1.8, language model: en)
(python options: type = "condaenv", value = "spacy_condaenv")

but this crashes R:

library("spacyr")

data(data_corpus_sotu, package = "quanteda.corpora")
# parsing the full SOTU corpus in a single call
sp <- spacy_parse(quanteda::texts(data_corpus_sotu))
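Until batching is supported natively, one possible workaround, assuming the crash comes from handing spaCy the whole corpus in a single call, is to parse in smaller chunks and bind the results (a minimal sketch; the chunk size of 20 is arbitrary):

library("spacyr")
spacy_initialize()

data(data_corpus_sotu, package = "quanteda.corpora")
txts <- quanteda::texts(data_corpus_sotu)

# split the corpus into chunks of 20 documents each
chunks <- split(txts, ceiling(seq_along(txts) / 20))

# parse each chunk separately, then row-bind the token data.frames
sp <- do.call(rbind, lapply(chunks, spacy_parse))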
@amatsuo amatsuo self-assigned this Feb 27, 2020
@kbenoit kbenoit changed the title spacy_parse() not handling SOTU corpus Implement batch size and n_thread options Sep 1, 2022

kbenoit commented Sep 1, 2022

Parsing the entire 241-document SOTU corpus worked (in 2022!) on my M1 Max Mac (with 64 GB of RAM) after a few minutes. The resulting fully parsed object has >2.2m tokens.

But it would be nice to implement the multiprocessing arguments, including batch_size, documented at https://spacy.io/usage/processing-pipelines#multiprocessing.
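For illustration, here is a hypothetical sketch of what that interface could look like on the R side; the batch_size and n_process arguments below do not exist in spacy_parse() and are assumptions that mirror the parameters of spaCy's nlp.pipe():

library("spacyr")
spacy_initialize()

# HYPOTHETICAL: batch_size and n_process are not (yet) spacy_parse() arguments;
# the idea is that they would be passed through to spaCy's
# nlp.pipe(texts, batch_size = ..., n_process = ...)
sp <- spacy_parse(
  quanteda::texts(data_corpus_sotu),
  batch_size = 1000,  # documents per batch sent to the pipeline
  n_process = 2       # number of worker processes
)

One naming note: the spaCy docs linked above call the worker-count argument n_process (batch_size controls how many documents go to each worker), so that may be a better name than n_thread for the spacyr argument.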
