feeding tiny buffers can cause incorrect detection #6

welwood08 opened this Issue · 0 comments

2 participants


I know it's only a very minor thing, but I was looking at the implementation and it struck me.

If, for example, I feed the single character '\xEF', the code path that checks for BOMs finds nothing. If I then feed the two characters '\xBB\xBF' (completing the UTF-8 BOM), the BOM-checking code path is skipped, because it only runs on the first feed. If the detector is then closed, the UTF-8 BOM is detected as windows-1252 with 95% confidence...
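The behaviour described above can be reproduced with a deliberately simplified sketch. `FirstFeedOnlyDetector` is a hypothetical class written for illustration, not the library's actual code; it assumes input arrives as byte strings (one character per byte) and only models the "check for a BOM on the first feed" step.

```javascript
// Hypothetical, simplified model of a detector that looks for the
// UTF-8 BOM only in the very first chunk it receives.
const UTF8_BOM = "\xEF\xBB\xBF";

class FirstFeedOnlyDetector {
  constructor() {
    this.gotData = false; // becomes true after the first feed()
    this.result = null;   // stays null when no BOM was recognised
  }

  feed(chunk) {
    if (!this.gotData) {
      this.gotData = true;
      // The BOM check runs once, against the first chunk only.
      if (chunk.startsWith(UTF8_BOM)) {
        this.result = "UTF-8";
      }
    }
    // Later chunks never reach the BOM check.
  }
}

// BOM delivered in one piece: recognised.
const whole = new FirstFeedOnlyDetector();
whole.feed("\xEF\xBB\xBFhello");

// BOM split across two feeds: missed.
const split = new FirstFeedOnlyDetector();
split.feed("\xEF");
split.feed("\xBB\xBF");
```

In this sketch `whole.result` ends up as `"UTF-8"` while `split.result` stays `null`; in the real detector the missed BOM bytes then fall through to the statistical probers, which is where the windows-1252 guess comes from.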

@aadsm referenced this issue from a commit
@aadsm: Allows streaming of UTF w/ BOM strings [gh-6]
UTF strings with a BOM were only being detected if the entire BOM was part of the first feed().
If a string were streamed and the BOM arrived across different feed() calls, the detector would incorrectly detect a different encoding.
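One way to make BOM detection tolerate streaming is to buffer the first few bytes across feed() calls and defer the check until enough bytes have arrived or the detector is closed. The sketch below illustrates that idea only; it is not the code from the referenced commit. `BufferedBomDetector`, `BOMS`, and `MAX_BOM_LENGTH` are hypothetical names, and input is assumed to be byte strings (one character per byte).

```javascript
// Known BOMs. Longer BOMs come before shorter ones that share a prefix
// (the UTF-32LE BOM starts with the UTF-16LE BOM).
const BOMS = [
  ["\xEF\xBB\xBF", "UTF-8"],
  ["\xFF\xFE\x00\x00", "UTF-32LE"],
  ["\x00\x00\xFE\xFF", "UTF-32BE"],
  ["\xFF\xFE", "UTF-16LE"],
  ["\xFE\xFF", "UTF-16BE"],
];
const MAX_BOM_LENGTH = 4;

class BufferedBomDetector {
  constructor() {
    this.head = "";          // first bytes of the stream, up to MAX_BOM_LENGTH
    this.bomChecked = false; // true once the BOM decision has been made
    this.result = null;      // encoding name, or null if no BOM was found
  }

  feed(chunk) {
    if (this.bomChecked) return;
    // Keep only as many bytes as are still needed to fill the head buffer.
    this.head += chunk.slice(0, MAX_BOM_LENGTH - this.head.length);
    if (this.head.length >= MAX_BOM_LENGTH) this.checkBom();
  }

  close() {
    // A short stream may end before the head buffer fills up.
    if (!this.bomChecked) this.checkBom();
    return this.result;
  }

  checkBom() {
    this.bomChecked = true;
    for (const [bom, encoding] of BOMS) {
      if (this.head.startsWith(bom)) {
        this.result = encoding;
        return;
      }
    }
  }
}

// The scenario from this issue: the UTF-8 BOM split across two feeds.
const detector = new BufferedBomDetector();
detector.feed("\xEF");
detector.feed("\xBB\xBF");
const encoding = detector.close();
```

Waiting for four bytes (or for close()) before deciding avoids reporting UTF-16LE when the stream is actually UTF-32LE, at the cost of delaying the BOM verdict slightly on very short inputs.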
@aadsm closed this