Commit
nsSBCharSetProber: multiply confidence by ratio of positive seqs per chars.

If all sequences in a text are positive sequences, the ratio of positive sequences cannot break a tie between two very close charsets. A ratio of positive sequences per letter, on the other hand, can break a tie between two encodings. If adding a letter does not increase the number of positive sequences, the confidence decreases (reflecting the fact that the character was likely not a letter). If the number of positive sequences does increase, so does the confidence.

For instance, this fixes wrong detections of ISO-8859-1 and ISO-8859-15. When letters only available in ISO-8859-15 appear in a text, we expect the confidence to tilt towards the close yet slightly different ISO-8859-15.
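
The tie-breaking effect described above can be illustrated with a toy model. This is a minimal sketch, not the code from this commit: the function name `confidence`, its parameters, the `typical_positive_ratio` default, and the 0.01 / 0.99 bounds are all assumptions chosen for illustration.

```python
def confidence(positive_seqs, total_seqs, total_chars, typical_positive_ratio=1.0):
    """Toy confidence score (hypothetical, for illustration only).

    base   : share of observed sequences that are positive
    scaled : base multiplied by positive sequences per character
    """
    if total_seqs == 0 or total_chars == 0:
        return 0.01  # assumed floor when there is nothing to measure
    base = positive_seqs / total_seqs / typical_positive_ratio
    scaled = base * positive_seqs / total_chars
    return min(scaled, 0.99)  # assumed cap so the score never reaches 1.0


# Two candidate charsets read the same 11 characters.
# In both, every observed sequence is positive, so the plain
# positive/total ratio is 1.0 for each and cannot separate them.
# Candidate A gains a positive sequence from the extra character;
# candidate B does not.
a = confidence(positive_seqs=10, total_seqs=10, total_chars=11)
b = confidence(positive_seqs=9, total_seqs=9, total_chars=11)

print(a > b)
```

In this sketch the candidate whose extra character produced a positive sequence ends up with the higher score, which is the behaviour the commit message describes for close charsets.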