
tokenize error #8

Closed

longma307 opened this issue Jan 26, 2016 · 6 comments
@longma307

I have been following your instructions to test lda2vec, but I got an error when I tried to run this line:
tokens, vocab = preprocess.tokenize(texts, max_length, tag=False, parse=False, entity=False)

runfile('/Users/lm/Dropbox/Athena/Feature_Reduction/WordVectors/lda2vec_test.py', wdir='/Users/lm/Dropbox/Athena/Feature_Reduction/WordVectors')
Traceback (most recent call last):

  File "<stdin>", line 1, in <module>
    runfile('/Users/lm/Dropbox/Athena/Feature_Reduction/WordVectors/lda2vec_test.py', wdir='/Users/lm/Dropbox/Athena/Feature_Reduction/WordVectors')

  File "/Users/lm/Documents/anaconda/lib/python2.7/site-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 699, in runfile
    execfile(filename, namespace)

  File "/Users/lm/Documents/anaconda/lib/python2.7/site-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 81, in execfile
    builtins.execfile(filename, *where)

  File "/Users/lm/Dropbox/Athena/Feature_Reduction/WordVectors/lda2vec_test.py", line 29, in <module>
    tokens, vocab = preprocess.tokenize(texts, max_length, tag=False, parse=False, entity=False)

  File "build/bdist.macosx-10.5-x86_64/egg/lda2vec/preprocess.py", line 65, in tokenize
    nlp = English(data_dir=data_dir)

  File "/Users/lm/Documents/anaconda/lib/python2.7/site-packages/spacy/language.py", line 210, in __init__
    vocab = self.default_vocab(package)

  File "/Users/lm/Documents/anaconda/lib/python2.7/site-packages/spacy/language.py", line 144, in default_vocab
    return Vocab.from_package(package, get_lex_attr=get_lex_attr)

  File "spacy/vocab.pyx", line 65, in spacy.vocab.Vocab.from_package (spacy/vocab.cpp:3592)
    with package.open(('vocab', 'strings.json')) as file_:

  File "/Users/lm/Documents/anaconda/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()

  File "/Users/lm/Documents/anaconda/lib/python2.7/site-packages/sputnik/package_stub.py", line 68, in open
    raise default(self.file_path(*path_parts))

IOError: /Users/lm/Documents/anaconda/lib/python2.7/site-packages/spacy/en/data/vocab/strings.json

I have updated the related modules (numpy, spacy, ...) to the newest versions, but I still get this error.
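
For reference, here is a minimal sketch of the setup that triggers the error -- the texts and max_length values below are placeholders, everything else matches the call above:

from lda2vec import preprocess

# Placeholder inputs; in the real script `texts` comes from my corpus.
texts = [u"first example document", u"second example document"]
max_length = 100  # maximum tokens per document (assumed value)

# This is the call that fails: preprocess.tokenize loads spaCy's
# English model internally, which is where the IOError is raised.
tokens, vocab = preprocess.tokenize(texts, max_length, tag=False,
                                    parse=False, entity=False)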

@jcquarto

I am experiencing exactly the same problem.

@cemoody
Owner

cemoody commented Feb 3, 2016

This seems like a spaCy error -- have y'all tried downloading the vocab files that accompany spaCy?

python -m spacy.en.download
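
Something like this (an untested sketch) should tell you whether the data is actually in place after downloading:

# Untested sketch: after running the download, check that spaCy's
# English model loads at all -- this is the same constructor that
# lda2vec's preprocess.tokenize calls internally.
from spacy.en import English

nlp = English()  # raises IOError if vocab/strings.json is still missing
doc = nlp(u"quick sanity check")
print([t.orth_ for t in doc])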

@longma307
Author

@cemoody Yes, I did, and I also upgraded the required modules (numpy, spacy, ...) to the newest versions, but the error still exists.

@cemoody
Owner

cemoody commented Feb 9, 2016

So it looks like it's an issue with SpaCy:

explosion/spaCy#183

explosion/spaCy#155

...I can't reproduce this, so it's tough for me to debug. All I can really do is what Honnibal is suggesting -- try adding the --force flag: python -m spacy.en.download --force all?
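
After the forced re-download, a quick way to confirm the missing file is now present (the path is taken from the traceback above; adjust it for your install):

import os.path

# Path from the IOError in the original traceback; yours may differ.
path = ("/Users/lm/Documents/anaconda/lib/python2.7/"
        "site-packages/spacy/en/data/vocab/strings.json")
print("strings.json present: %s" % os.path.exists(path))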

@longma307
Author

@cemoody That finally works for me, thanks for the help.

@cemoody
Owner

cemoody commented Feb 10, 2016

@longma307 Glad it helped! :)
