Confusing "TypeError: '<' not supported between … 'str' and 'int'" when doc-tag not present for most_similar()
#1737
Comments
The tag We should in fact be showing a better error here, by verifying (Additionally, just as we're showing a WARNING when a plain-string is offered as
most_similar()
Yes, that error message is terrible for such a (relatively) common and easy-to-make mistake.
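For anyone landing here from a search, this is a minimal sketch of how a message like this can arise in Python 3. The `naive_lookup` function and the tag names are hypothetical stand-ins, not gensim's actual code. The assumption is that a string tag which is not found falls through to an integer range check:

```python
# Hypothetical stand-in for a doc-tag lookup; NOT gensim's implementation.
def naive_lookup(tag, known_tags, count):
    # A known string tag maps to its stored row index.
    if tag in known_tags:
        return known_tags[tag]
    # Anything else is assumed to be a plain-int index and range-checked.
    # If `tag` is an unknown string, Python 3 refuses to compare str with int.
    if tag < count:
        return tag
    raise KeyError(tag)


known_tags = {"SENT_0": 0, "SENT_1": 1}

try:
    naive_lookup("SENT_99", known_tags, count=2)
except TypeError as err:
    message = str(err)
    print(message)
```

The user's real mistake (a tag that was never trained) is hidden behind a comparison error that says nothing about tags.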
In fact, that whole int-vs-string design is clearly confusing; users report issues there all the time. At the same time, the "expected contract" behaviour is complex to explain, which is a possible code smell. What can we do to make the API saner? Drop the ints? Drop strings? What was the original rationale for this design? CC @manneshiva
@piskvorky one possible solution: write a clear docstring that describes the types and gives usage examples.
Sure, better documentation always helps. But in this case, I wonder if we can do better with the API itself.
The motivation for allowing plain-int tags, and handling them specially (in fact more-simply), is to allow sophisticated users the option of avoiding the string-to-int lookup dict overhead. That overhead could be significant for giant training sets. (The much more common issue, and the primary problem here, is failing to understand
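To make the trade-off concrete, here is a rough sketch of the two tag styles. The `TagIndex` class is a hypothetical illustration, not gensim's data structure. It only shows why string tags need a lookup dict while plain-int tags can be used directly as row indexes:

```python
# Hypothetical illustration of the string-vs-int tag trade-off; not gensim code.
class TagIndex:
    def __init__(self):
        self.str_to_row = {}   # grows by one entry per distinct string tag
        self.max_int_tag = -1  # plain-int tags only require tracking the maximum

    def note(self, tag):
        """Register a tag seen while scanning the training corpus."""
        if isinstance(tag, int):
            self.max_int_tag = max(self.max_int_tag, tag)
        else:
            self.str_to_row.setdefault(tag, len(self.str_to_row))

    def row(self, tag):
        """Return the vector row for a tag."""
        if isinstance(tag, int):
            return tag  # no dict lookup: the tag is the row index
        # String tags are placed after the rows reserved for int tags.
        return self.max_int_tag + 1 + self.str_to_row[tag]


index = TagIndex()
for tag in [0, 1, 2, "doc_a", "doc_b"]:
    index.note(tag)

rows = [index.row(t) for t in [0, 2, "doc_a", "doc_b"]]
print(rows)
```

With only int tags, `str_to_row` stays empty, which is the memory saving described above for giant training sets.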
Sure, I remember that But can we streamline the API so that users stop failing to understand that? Disable parameter type overloading, provide better checks, better error messages? "Too many users misunderstanding" = API needs some re-thinking.
My suggestions are as in #1737 (comment)
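As one possible shape for the "better checks, better error messages" idea discussed in this thread, here is a small sketch. The `resolve_tag` helper, its parameters, and the message wording are all assumptions made for illustration. Nothing here is the fix that was actually adopted:

```python
# Hypothetical validation helper; names and messages are illustrative only.
def resolve_tag(tag, known_tags, count):
    """Map a doc-tag to a row index, failing with an explicit message."""
    # Reject anything that is neither an int nor a str (bool is excluded on purpose).
    if isinstance(tag, bool) or not isinstance(tag, (int, str)):
        raise TypeError(
            f"doc-tag must be an int or str, got {type(tag).__name__}"
        )
    if isinstance(tag, str):
        if tag not in known_tags:
            raise KeyError(f"doc-tag {tag!r} was not seen during training")
        return known_tags[tag]
    # Plain-int tags are used directly, but only inside the trained range.
    if not 0 <= tag < count:
        raise KeyError(f"int doc-tag {tag} is out of range (0..{count - 1})")
    return tag


known_tags = {"SENT_0": 0, "SENT_1": 1}

try:
    resolve_tag("SENT_99", known_tags, count=2)
except KeyError as err:
    unknown_tag_error = err
    print(err)
```

The check happens before any str-vs-int comparison, so an unknown string tag is reported as a missing tag rather than as a type mismatch.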
I am facing an issue while using the model.docvecs.most_similar() function.
gensim: 2.3.0
Python 3.6.0
Error message: '<' not supported between instances of 'str' and 'int'
My code follows:
The class LabeledLineSentence is as follows:
Error StackTrace:
#1586