disable unnecessary spacy pipeline components #121
ygorg merged 2 commits into boudinfl:master from sp1thas:master
Thanks, though the parser is still there. According to spacy/pipeline, the parser is needed for sentence tokenisation. But also computes dependencies (which
nlp = spacy.load('fr')
nlp = spacy.load('fr', disable=['ner', 'textcat'])
nlp = spacy.load('fr', disable=['ner', 'textcat', 'parser'])
nlp.add_pipe(nlp.create_pipe('sentencizer'))
The preprocessed text is http://abu.cnam.fr/cgi-bin/donner?nddp1 (only the first 999999 characters, to match spacy's limitation). The time reported is the mean of 5 runs (in seconds).
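A minimal sketch of how the comparison described in this comment could be reproduced. The helper names `mean_runtime` and `load_pipelines` are hypothetical and not part of pke or spaCy. The loading code uses the spaCy v2 calls quoted in the comment and assumes a French model is installed under the `'fr'` shortcut. In spaCy v3 the sentencizer would be added with `nlp.add_pipe("sentencizer")` instead.

```python
import time
from statistics import mean


def mean_runtime(fn, text, runs=5):
    """Return the mean wall-clock time (seconds) of fn(text) over `runs` calls."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(text)
        timings.append(time.perf_counter() - start)
    return mean(timings)


def load_pipelines(model="fr"):
    """Build the three pipelines from the comment (spaCy v2 API, hypothetical helper)."""
    import spacy  # imported lazily so mean_runtime is usable without spaCy

    default = spacy.load(model)
    no_ner = spacy.load(model, disable=["ner", "textcat"])
    no_parser = spacy.load(model, disable=["ner", "textcat", "parser"])
    # With the parser disabled, the rule-based sentencizer sets sentence boundaries.
    no_parser.add_pipe(no_parser.create_pipe("sentencizer"))
    return {
        "default": default,
        "no ner/textcat": no_ner,
        "no parser + sentencizer": no_parser,
    }


# Example usage (assumes `text` holds the document to preprocess):
# for name, nlp in load_pipelines().items():
#     print(name, mean_runtime(nlp, text[:999999]))
```

Timing the call `nlp(text)` directly keeps model loading out of the measurement, so only the preprocessing cost of each configuration is compared.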
Well this is a huge improvement. When I tried to remove Therefore, your recommendation is totally right.
Just ran pytest and it works, so I'm merging. Thanks!
Optimize pke.readers.RawTextReader by removing unnecessary components (ner, textcat) from the spacy pipeline.
Related issue: #118