When the UTF8Encoding class encounters an ill-formed UTF-8 byte sequence during a bytes-to-chars transcoding operation, it replaces that sequence with '�' (U+FFFD REPLACEMENT CHARACTER) in the output string. .NET Core 3.0 differs from previous versions of .NET Core and from the .NET Framework in that it follows the Unicode best practice for performing this replacement.
Version introduced
3.0
Change description
When transcoding bytes to chars, the UTF8Encoding class now performs character substitution based on Unicode best practices. The substitution mechanism is described in The Unicode Standard, Version 12.0, Sec. 3.9 (PDF), under the heading U+FFFD Substitution of Maximal Subparts.
This behavior applies only when the input byte sequence contains ill-formed UTF-8 data. Additionally, if the UTF8Encoding instance was constructed with throwOnInvalidBytes: true (see the constructor documentation), it continues to throw on invalid input rather than perform U+FFFD replacement.
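The substitution rule itself is language-neutral, so it can be sketched outside .NET. The following Python code is not the UTF8Encoding implementation; it is a minimal illustrative decoder, assuming the well-formed byte ranges listed in the Unicode Standard's table of well-formed UTF-8 byte sequences, that emits one U+FFFD for each maximal subpart of an ill-formed sequence. The function names `trail_ranges` and `decode_utf8_maximal_subparts` are hypothetical and chosen only for this example.

```python
REPLACEMENT = "\ufffd"


def trail_ranges(lead):
    """Return the allowed (low, high) range for each trailing byte that may
    follow `lead`, or None if `lead` cannot start a multi-byte sequence."""
    if 0xC2 <= lead <= 0xDF:
        return [(0x80, 0xBF)]
    if lead == 0xE0:
        return [(0xA0, 0xBF), (0x80, 0xBF)]  # excludes overlong forms
    if lead == 0xED:
        return [(0x80, 0x9F), (0x80, 0xBF)]  # excludes surrogates
    if 0xE1 <= lead <= 0xEF:
        return [(0x80, 0xBF), (0x80, 0xBF)]
    if lead == 0xF0:
        return [(0x90, 0xBF), (0x80, 0xBF), (0x80, 0xBF)]
    if 0xF1 <= lead <= 0xF3:
        return [(0x80, 0xBF), (0x80, 0xBF), (0x80, 0xBF)]
    if lead == 0xF4:
        return [(0x80, 0x8F), (0x80, 0xBF), (0x80, 0xBF)]  # caps at U+10FFFF
    return None


def decode_utf8_maximal_subparts(data):
    """Decode `data` (bytes), replacing each maximal ill-formed subpart
    with a single U+FFFD."""
    out = []
    i = 0
    n = len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:
            out.append(chr(lead))  # ASCII passes through unchanged
            i += 1
            continue
        ranges = trail_ranges(lead)
        if ranges is None:
            # Stray continuation byte or invalid lead byte: replaced alone.
            out.append(REPLACEMENT)
            i += 1
            continue
        # Consume trailing bytes only while each one is in its allowed range.
        j = i + 1
        for low, high in ranges:
            if j < n and low <= data[j] <= high:
                j += 1
            else:
                break
        if j - i == len(ranges) + 1:
            out.append(data[i:j].decode("utf-8"))  # complete, well-formed
        else:
            out.append(REPLACEMENT)  # one U+FFFD for the whole valid prefix
        i = j
    return "".join(out)


result = decode_utf8_maximal_subparts(bytes([0xED, 0xA0, 0x90]))
print([f"{ord(c):04X}" for c in result])  # ['FFFD', 'FFFD', 'FFFD']
```

For the bytes ED A0 90, the lead byte ED may only be followed by 80..9F, so ED is replaced on its own; A0 and 90 are then each a stray continuation byte and are replaced individually, giving three U+FFFD characters.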
Old behavior
Input: the 3-byte sequence [ ED A0 90 ] (ill-formed)
Output: the 2-char sequence [ FFFD FFFD ]
New behavior
Input: the 3-byte sequence [ ED A0 90 ] (ill-formed)
Output: the 3-char sequence [ FFFD FFFD FFFD ]
(This 3-char output is the preferred output per Table 3-9 of the previously linked Unicode Standard PDF.)
Reason for change
This is part of a larger effort to improve UTF-8 handling throughout .NET, including in the new System.Text.Unicode.Utf8 and System.Text.Rune types. The UTF8Encoding type was given improved error-handling mechanics so that it produces output consistent with the newly introduced types.
Recommended action
No action is required on the part of the developer.
.NET Core 3.0 follows Unicode best practices when replacing ill-formed UTF-8 byte sequences
See ".NET Core 3.0 follows Unicode best practices when replacing ill-formed UTF-8 byte sequences" for the updated documentation for this change.
Category
Core
Affected APIs