
Error when running token.py #4

Open
AaronWhite95 opened this issue May 15, 2019 · 1 comment
@AaronWhite95

When I run the bag-of-words creation step in token.py, I get the error below. What is causing it? None of the fixes on Stack Overflow have worked.
    Traceback (most recent call last):
      File "feature_extract.py", line 51, in <module>
        tokens = token.get_tokens()
      File "/home/xfbai/Entity-Relation-SVM-master/new_token.py", line 65, in get_tokens
        X_train_counts = vectorizer.fit_transform(cut_docs)
      File "/home/xfbai/anaconda3/lib/python3.6/site-packages/sklearn/feature_extraction/text.py", line 1031, in fit_transform
        self.fixed_vocabulary_)
      File "/home/xfbai/anaconda3/lib/python3.6/site-packages/sklearn/feature_extraction/text.py", line 962, in _count_vocab
        raise ValueError("empty vocabulary; perhaps the documents only"
    ValueError: empty vocabulary; perhaps the documents only contain stop words

Thanks!

@Da-Capo
Owner

Da-Capo commented Jun 4, 2019

This code is ancient history at this point [facepalm], so a version mismatch can't be ruled out. I looked into it: this error means an empty vocabulary (bag of words) was produced. Try printing the contents of cut_docs to check whether word segmentation went wrong, or check whether the arguments to CountVectorizer() are the problem; if neither helps, keep tracing up the call chain.
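For reference, here is a minimal sketch of those two checks, assuming cut_docs is a list of whitespace-joined, pre-segmented Chinese strings (the repo's actual preprocessing may differ; the cut_docs below is a made-up stand-in). With Chinese text, a frequent culprit is CountVectorizer's default token_pattern, which silently discards single-character tokens:

    from sklearn.feature_extraction.text import CountVectorizer

    # Hypothetical stand-in for the cut_docs produced by new_token.py;
    # each document is a whitespace-joined string of segmented words.
    cut_docs = ["今天 天气 很 好", "我 爱 自然语言处理"]

    # Check 1: confirm the input is non-empty before vectorizing.
    print(len(cut_docs), cut_docs[:3])
    assert any(doc.strip() for doc in cut_docs), "cut_docs is empty or blank"

    # Check 2: the default token_pattern, r"(?u)\b\w\w+\b", drops
    # one-character tokens, which Chinese segmentation produces constantly;
    # relax it so single characters survive.
    vectorizer = CountVectorizer(token_pattern=r"(?u)\b\w+\b")
    X_train_counts = vectorizer.fit_transform(cut_docs)
    print(vectorizer.get_feature_names()[:20])  # sklearn < 1.0 API, matching the traceback

If cut_docs is instead a list of token lists, passing an analyzer callable such as analyzer=lambda doc: doc (or joining each list into a single string first) is another common fix.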
