
tokenize error #8

Closed

longma307 opened this issue Jan 26, 2016 · 6 comments
@longma307

I have been following your instructions to test lda2vec, but I got an error when I tried to run this line:
tokens, vocab = preprocess.tokenize(texts, max_length, tag=False, parse=False, entity=False)

runfile('/Users/lm/Dropbox/Athena/Feature_Reduction/WordVectors/lda2vec_test.py', wdir='/Users/lm/Dropbox/Athena/Feature_Reduction/WordVectors')
Traceback (most recent call last):

  File "<stdin>", line 1, in <module>
    runfile('/Users/lm/Dropbox/Athena/Feature_Reduction/WordVectors/lda2vec_test.py', wdir='/Users/lm/Dropbox/Athena/Feature_Reduction/WordVectors')

  File "/Users/lm/Documents/anaconda/lib/python2.7/site-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 699, in runfile
    execfile(filename, namespace)

  File "/Users/lm/Documents/anaconda/lib/python2.7/site-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 81, in execfile
    builtins.execfile(filename, *where)

  File "/Users/lm/Dropbox/Athena/Feature_Reduction/WordVectors/lda2vec_test.py", line 29, in <module>
    tokens, vocab = preprocess.tokenize(texts, max_length, tag=False, parse=False, entity=False)

  File "build/bdist.macosx-10.5-x86_64/egg/lda2vec/preprocess.py", line 65, in tokenize
    nlp = English(data_dir=data_dir)

  File "/Users/lm/Documents/anaconda/lib/python2.7/site-packages/spacy/language.py", line 210, in __init__
    vocab = self.default_vocab(package)

  File "/Users/lm/Documents/anaconda/lib/python2.7/site-packages/spacy/language.py", line 144, in default_vocab
    return Vocab.from_package(package, get_lex_attr=get_lex_attr)

  File "spacy/vocab.pyx", line 65, in spacy.vocab.Vocab.from_package (spacy/vocab.cpp:3592)
    with package.open(('vocab', 'strings.json')) as file_:

  File "/Users/lm/Documents/anaconda/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()

  File "/Users/lm/Documents/anaconda/lib/python2.7/site-packages/sputnik/package_stub.py", line 68, in open
    raise default(self.file_path(*path_parts))

IOError: /Users/lm/Documents/anaconda/lib/python2.7/site-packages/spacy/en/data/vocab/strings.json

I have updated the related modules (numpy, spacy, ...) to the newest versions, but I still get this error.
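
For reference, here is a minimal sketch of the setup that triggers the error -- the texts and max_length values below are placeholders, everything else matches the call above:

from lda2vec import preprocess

# Placeholder inputs; in the real script `texts` comes from my corpus.
texts = [u"first example document", u"second example document"]
max_length = 100  # maximum tokens per document (assumed value)

# This is the call that fails: preprocess.tokenize loads spaCy's
# English model internally, which is where the IOError is raised.
tokens, vocab = preprocess.tokenize(texts, max_length, tag=False,
                                    parse=False, entity=False)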

@jcquarto

I am experiencing exactly the same problem.

@cemoody
Owner

cemoody commented Feb 3, 2016

This seems like a spaCy error -- have y'all tried downloading the vocab files that accompany spaCy?

python -m spacy.en.download
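
Something like this (an untested sketch) should tell you whether the data is actually in place after downloading:

# Untested sketch: after running the download, check that spaCy's
# English model loads at all -- this is the same constructor that
# lda2vec's preprocess.tokenize calls internally.
from spacy.en import English

nlp = English()  # raises IOError if vocab/strings.json is still missing
doc = nlp(u"quick sanity check")
print([t.orth_ for t in doc])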

@longma307
Author

@cemoody Yes, I did, and I also upgraded the required modules (numpy, spacy, ...) to the newest versions, but the error still exists.

@cemoody
Owner

cemoody commented Feb 9, 2016

So it looks like it's an issue with SpaCy:

explosion/spaCy#183

explosion/spaCy#155

...I can't reproduce this, so it's tough for me to debug. All I can really do is what Honnibal is suggesting -- try adding the --force flag: python -m spacy.en.download --force all?
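
After the forced re-download, a quick way to confirm the missing file is now present (the path is taken from the traceback above; adjust it for your install):

import os.path

# Path from the IOError in the original traceback; yours may differ.
path = ("/Users/lm/Documents/anaconda/lib/python2.7/"
        "site-packages/spacy/en/data/vocab/strings.json")
print("strings.json present: %s" % os.path.exists(path))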

@longma307
Author

@cemoody That finally works for me, thanks for the help.

@cemoody
Owner

cemoody commented Feb 10, 2016

@longma307 Glad it helped! :)
