Byte Order Mark skipping #51

Open
wants to merge 2 commits into
base: master

andreww (Owner) commented Apr 3, 2018

Skip any Byte Order Marks and keep trying. We may be able to read some files.

andreww and others added 2 commits April 3, 2018 17:06
Try reading two files with Byte Order Marks.
test_sax_fsm_1_utf8_bom.in is UTF-8 encoded
(although the XML declaration does not say so).
We should be able to read it as long as the BOM
does not trip us up. test_sax_fsm_1_utf16_bom.in
is UTF-16 encoded, with a BOM and with the encoding
declared in the XML. We should not be able to read it
(we should get a non-well-formed error).
For a UTF-8 encoded XML file that starts with a Byte Order
Mark and otherwise contains only ASCII characters, we should
be able to read the file. If the first character is not
recognisable, assume it is a BOM, skip it, and carry on.
We will then either read the file successfully or end up
with something that is not well-formed (e.g. because the
file uses a different encoding).
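
The behaviour described in the commit messages can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the code in this pull request. The function names `skip_utf8_bom` and `looks_like_ascii_xml` are hypothetical, and the sketch is narrower than the change described above: it recognises only the UTF-8 BOM explicitly, rather than skipping any unrecognisable first character.

```python
import codecs


def skip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, if one is present.

    Hypothetical helper for illustration; not part of this pull request.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def looks_like_ascii_xml(data: bytes) -> bool:
    """Rough stand-in for "can an ASCII-only parser read this?".

    Assumption: after skipping a UTF-8 BOM, a readable document is
    pure ASCII and starts with '<' (ignoring leading whitespace).
    """
    body = skip_utf8_bom(data)
    try:
        text = body.decode("ascii")
    except UnicodeDecodeError:
        # For example, UTF-16 input: its BOM bytes are not valid ASCII.
        return False
    return text.lstrip().startswith("<")
```

With this sketch, a UTF-8 file with a BOM and ASCII-only content passes the check, while a UTF-16 file with a BOM is rejected, which mirrors the two test cases named in the first commit message.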