Inconsistent behavior on small strings #37
Detection on small strings is generally pretty flaky, because the algorithm is designed to work on larger samples. That said, I'm surprised to see the output differ so much between major Python versions. Thanks for bringing this to our attention.
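Part of the reason very short inputs are hard is that the same bytes can be valid in more than one encoding. The minimal sketch below uses only the Python standard library (not chardet) and an arbitrarily chosen character, é, which is an assumption for illustration and not the string from the original report.

```python
# Arbitrary example input (not from the report): one accented character as UTF-8.
data = "é".encode("utf-8")  # two bytes: 0xC3 0xA9

# Both decodes succeed, so validity alone cannot tell the encodings apart.
as_utf8 = data.decode("utf-8")           # 'é'
as_cp1252 = data.decode("windows-1252")  # 'Ã©' (0xC3 -> 'Ã', 0xA9 -> '©')

print(repr(as_utf8), repr(as_cp1252))
```

With only two bytes, a detector has to fall back on statistics about which reading is more plausible, and there is very little data to base that on.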
Hey @mitya57, thanks for reporting this. I just want you to know that @dan-blanchard and I are the maintainers, and we both have a lot of other responsibilities and very little time to devote to chardet. If you wait for us to fix this, it will probably be a long wait. If you have time to spare and care to investigate this, you would really speed up how quickly it gets fixed. Cheers
Hmm, it looks like Python 2.7 has some problems interpreting Unicode given inside
@mitya57 That makes a lot more sense. I didn't think about the fact that you were calling it from the command line with Anyway, the difference in results between one- and two-character strings is to be expected. The more data you give it, the more accurate it will be, so it is wrong for the one-character string and correct for the two-character string.
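A rough way to see why more data helps: almost any byte string is valid windows-1252, while random bytes quickly stop being valid UTF-8 as they get longer, so a longer input that still decodes as UTF-8 is much stronger evidence. The toy sketch below (standard library only; `count_valid` is a hypothetical helper, and this is not how chardet actually scores candidates) enumerates every byte string of length 1 and 2 and counts how many decode cleanly under each of Python's codecs.

```python
from itertools import product


def count_valid(length: int, encoding: str) -> int:
    """Count byte strings of `length` that decode without error under `encoding`."""
    valid = 0
    for combo in product(range(256), repeat=length):
        try:
            bytes(combo).decode(encoding)  # strict mode: raises on invalid input
        except UnicodeDecodeError:
            continue
        valid += 1
    return valid


for n in (1, 2):
    total = 256 ** n
    for enc in ("utf-8", "windows-1252"):
        share = count_valid(n, enc) / total
        print(f"length {n}, {enc}: {share:.1%} of all byte strings decode")
```

For length 1, half of all byte values are valid UTF-8 versus 251 of 256 for windows-1252 (Python's codec rejects five undefined bytes). At length 2 the UTF-8 share drops to roughly 28% while windows-1252 stays above 96%, and the gap keeps widening with longer inputs.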
OK, that's understandable. Actually, I filed this issue because beautifulsoup4's tests currently fail with an error. I will now file a pull request against beautifulsoup4 to use double
UPD: Deleted python2.7 example because it was not working properly. See a comment below for a better test case.
This is all on Debian GNU/Linux unstable with the current master:
The second line should be utf-8 as well, not windows-1252.