Add UTF-7 to replacement encoding list? / Encoding sniffing #68

@jungshik

Maybe this was discussed before, but I couldn't find a bug for it. What do you think of treating UTF-7 the same way as ISO-2022-{KR,CN}, HZ-GB, etc., that is, mapping it to the replacement encoding?

When decoding, the whole input is replaced by a single U+FFFD. When encoding, UTF-8 is used instead.
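To make the proposed behavior concrete, here is a minimal Python sketch of replacement-style handling. The label set is the one I believe the Encoding Standard currently maps to "replacement"; `PROPOSED_LABELS`, `decode`, and `output_encoding` are hypothetical names for illustration only, not an API of Blink or of any library.

```python
# Labels mapped to the "replacement" encoding (as I understand the
# Encoding Standard's current label table).
REPLACEMENT_LABELS = {
    "replacement",
    "csiso2022kr",
    "hz-gb-2312",
    "iso-2022-cn",
    "iso-2022-cn-ext",
    "iso-2022-kr",
}

# Hypothetical: the label this issue proposes to add. Not part of the spec.
PROPOSED_LABELS = {"utf-7"}


def is_replacement(label: str) -> bool:
    """True if the label should get replacement-style handling in this sketch."""
    name = label.strip().lower()
    return name in REPLACEMENT_LABELS or name in PROPOSED_LABELS


def decode(data: bytes, label: str) -> str:
    """Decode data; replacement-mapped labels yield one U+FFFD for any
    non-empty input and nothing for empty input."""
    if is_replacement(label):
        return "\ufffd" if data else ""
    # Other labels fall through to Python's codecs (an assumption of this
    # sketch; a browser would use its own decoders).
    return data.decode(label.strip().lower(), errors="replace")


def output_encoding(label: str) -> str:
    """Encoding to use when encoding (e.g. form submission): UTF-8 for
    replacement-mapped labels, otherwise the label itself."""
    return "utf-8" if is_replacement(label) else label.strip().lower()
```

With this sketch, `decode(b"+ADw-script+AD4-", "utf-7")` returns a single U+FFFD rather than `<script>`, which is the point of the mapping: the bytes are never interpreted as UTF-7.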

Background: Blink began to use Compact Encoding Detector (google/compact_enc_det) when no encoding label is found (HTTP header or meta). When 7-bit encoding detection is on, it detects ISO-2022-{KR,CN}, HZ-GB, and UTF-7 in addition to ISO-2022-JP. 7-bit encoding detection is on so that ISO-2022-JP can be detected, but we want to suppress the other 7-bit encodings. I think the best way to suppress (stop supporting) them is to turn the whole input into U+FFFD.
