Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #6113 from jmdavis/issue15949
Fix issue 15949: Make readText check BOMs. merged-on-behalf-of: Vladimir Panteleev <github@thecybershadow.net>
- Loading branch information
Showing
2 changed files
with
235 additions
and
27 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
readText now checks BOMs | ||
|
||
$(REF readText, std, file) now checks for a | ||
$(HTTP https://en.wikipedia.org/wiki/Byte_order_mark, BOM). If a BOM is present | ||
and it is for UTF-8, UTF-16, or UTF-32, $(REF readText, std, file) verifies | ||
that it matches the requested string type and the endianness of the machine, | ||
and if there is a mismatch, a $(REF UTFException, std, utf) is thrown without | ||
bothering to validate the string. | ||
|
||
If there is no BOM, or if the BOM is not for UTF-8, UTF-16, or UTF-32, then the | ||
behavior is what it's always been, and UTF validation continues as per normal, | ||
so if the text isn't valid for the requested string type, a | ||
$(REF UTFException, std, utf) will be thrown. | ||
|
||
In addition, before the buffer is cast to the requested string type, the | ||
alignment is checked (e.g. 5 bytes don't fit cleanly in an array of $(D wchar) | ||
or $(D dchar)), and a $(REF UTFException, std, utf) is now throw if the | ||
number of bytes does not align with the requested string type. Previously, the | ||
alignment was not checked before casting, so if there was an alignment mismatch, | ||
the cast would throw an Error, killing the program. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters