
Commit fc41c28

MacDue authored and awesomekling committed
LibWeb: Fix utf16-be check in HTMLEncodingDetection
The utf-16be check mistakenly skipped index 3 (it read indices 0, 1, 2, 4, 5 and 6), so it was comparing the wrong bytes. As a result, UTF-16BE files could fail to decode.
1 parent 5e973fc commit fc41c28

3 files changed: +24 -2 lines changed

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+Viewport <#document> at (0,0) content-size 800x600 children: not-inline
+BlockContainer <html> at (0,0) content-size 800x41.46875 [BFC] children: not-inline
+BlockContainer <body> at (8,16) content-size 784x17.46875 children: not-inline
+BlockContainer <(anonymous)> at (8,16) content-size 784x0 children: inline
+TextNode <#text>
+BlockContainer <p> at (8,16) content-size 784x17.46875 children: inline
+line 0 width: 29.21875, height: 17.46875, bottom: 17.46875, baseline: 13.53125
+frag 0 from TextNode start: 1, length: 15, rect: [8,16 29.21875x17.46875]
+"好啦朋友們"
+TextNode <#text>
+BlockContainer <(anonymous)> at (8,49.46875) content-size 784x0 children: inline
+TextNode <#text>
+
+ViewportPaintable (Viewport<#document>) [0,0 800x600]
+PaintableWithLines (BlockContainer<HTML>) [0,0 800x41.46875] overflow: [0,0 800x49.46875]
+PaintableWithLines (BlockContainer<BODY>) [8,16 784x17.46875] overflow: [8,16 784x33.46875]
+PaintableWithLines (BlockContainer(anonymous)) [8,16 784x0]
+PaintableWithLines (BlockContainer<P>) [8,16 784x17.46875]
+TextPaintable (TextNode<#text>)
+PaintableWithLines (BlockContainer(anonymous)) [8,49.46875 784x0]
470 Bytes
Binary file not shown.

Userland/Libraries/LibWeb/HTML/Parser/HTMLEncodingDetection.cpp

Lines changed: 4 additions & 2 deletions
@@ -251,10 +251,12 @@ Optional<ByteString> run_prescan_byte_stream_algorithm(DOM::Document& document,
     // https://html.spec.whatwg.org/multipage/parsing.html#prescan-a-byte-stream-to-determine-its-encoding
 
     // Detects '<?x'
-    if (!prescan_should_abort(input, 6)) {
+    if (!prescan_should_abort(input, 5)) {
+        // A sequence of bytes starting with: 0x3C, 0x0, 0x3F, 0x0, 0x78, 0x0
         if (input[0] == 0x3C && input[1] == 0x00 && input[2] == 0x3F && input[3] == 0x00 && input[4] == 0x78 && input[5] == 0x00)
             return "utf-16le";
-        if (input[0] == 0x00 && input[1] == 0x3C && input[2] == 0x00 && input[4] == 0x3F && input[5] == 0x00 && input[6] == 0x78)
+        // A sequence of bytes starting with: 0x0, 0x3C, 0x0, 0x3F, 0x0, 0x78
+        if (input[0] == 0x00 && input[1] == 0x3C && input[2] == 0x00 && input[3] == 0x3F && input[4] == 0x00 && input[5] == 0x78)
             return "utf-16be";
     }
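
For readers who want to try the corrected byte comparison outside of LibWeb, here is a minimal standalone sketch in standard C++. It is not the LibWeb implementation: the function name `sniff_utf16_xml_declaration` and its pointer-plus-size signature are assumptions made for illustration, and a plain length check stands in for `prescan_should_abort`. Comparing against whole six-byte patterns, rather than chaining per-index comparisons, makes a skipped index like the one fixed in this commit harder to introduce.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical helper (not part of LibWeb): returns the UTF-16 byte order
// implied by the first six bytes if they spell "<?x", otherwise nothing.
std::optional<std::string_view> sniff_utf16_xml_declaration(const std::uint8_t* data, std::size_t size)
{
    // Indices 0..5 are read below, so at least six bytes are required.
    if (data == nullptr || size < 6)
        return std::nullopt;

    // "<?x" with the low byte first (little endian).
    static constexpr std::uint8_t little_endian[6] = { 0x3C, 0x00, 0x3F, 0x00, 0x78, 0x00 };
    // "<?x" with the high byte first (big endian).
    static constexpr std::uint8_t big_endian[6] = { 0x00, 0x3C, 0x00, 0x3F, 0x00, 0x78 };

    if (std::equal(data, data + 6, little_endian))
        return std::string_view { "utf-16le" };
    if (std::equal(data, data + 6, big_endian))
        return std::string_view { "utf-16be" };
    return std::nullopt;
}
```

Passing a buffer that begins with 0x00, 0x3C, 0x00, 0x3F, 0x00, 0x78 yields "utf-16be"; an ASCII `<?xml ` prefix or a buffer shorter than six bytes yields an empty optional.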
