Description
The StreamReader class should return a replacement character (U+FFFD) if it encounters an incomplete UTF-8 sequence. This does not work if the incomplete UTF-8 sequence is at the end of the input.
Test program:
using System;
using System.IO;
using System.Text;

namespace Test
{
    class Program
    {
        static void Main(string[] args)
        {
            var bytes = new byte[] { 0xd9, 0x41, 0x42, 0x43, 0xd9 };
            MemoryStream stream = new MemoryStream(bytes);
            StreamReader reader = new StreamReader(stream, Encoding.UTF8);
            for (int i = 0; i < bytes.Length; i++) {
                Console.WriteLine(reader.Read());
            }
        }
    }
}
Output of this test program:
65533
65
66
67
-1
The incomplete UTF-8 sequence at the beginning of the input is handled correctly (65533 is the replacement character U+FFFD), but the incomplete UTF-8 sequence at the end of the input is silently dropped: the final Read() call returns -1 (end of stream) instead of 65533.
Expected output:
65533
65
66
67
65533
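The difference between the actual and the expected output resembles the difference between decoding with and without a final flush. The following is only a sketch of that idea using the public `Decoder` API, not a statement about how StreamReader is implemented internally; the class and method names (`DecoderFlushSketch`, `CountChars`) are made up for illustration.

```csharp
using System;
using System.Text;

static class DecoderFlushSketch
{
    // Hypothetical helper: decodes the whole array in one call and returns
    // the number of chars produced. With flush: false the decoder may keep a
    // trailing partial sequence buffered; with flush: true it has to resolve it.
    public static int CountChars(byte[] bytes, bool flush)
    {
        Decoder decoder = Encoding.UTF8.GetDecoder();
        char[] chars = new char[Encoding.UTF8.GetMaxCharCount(bytes.Length)];
        return decoder.GetChars(bytes, 0, bytes.Length, chars, 0, flush);
    }

    static void Main()
    {
        // Same input as the test program above.
        byte[] bytes = { 0xd9, 0x41, 0x42, 0x43, 0xd9 };
        Console.WriteLine(CountChars(bytes, flush: false));
        Console.WriteLine(CountChars(bytes, flush: true));
    }
}
```

If the flushing call yields one more char than the non-flushing call, the extra char is the replacement character for the trailing 0xd9, which is the value the expected output above has in its last line.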
For comparison, consider this Java program - it produces the expected output:
import java.io.*;

public class Test
{
    public static void main(String args[]) throws IOException
    {
        byte[] bytes = new byte[] { (byte)0xd9, 0x41, 0x42, 0x43, (byte)0xd9 };
        InputStream input = new ByteArrayInputStream(bytes);
        InputStreamReader reader = new InputStreamReader(input, "UTF-8");
        for (int i = 0; i < bytes.length; i++) {
            System.out.println(reader.read());
        }
    }
}
Configuration
# dotnet --info
.NET SDK (reflecting any global.json):
 Version:   5.0.104
 Commit:    ca6b6acadb

Runtime Environment:
 OS Name:     fedora
 OS Version:  33
 OS Platform: Linux
 RID:         fedora.33-x64
 Base Path:   /usr/lib64/dotnet/sdk/5.0.104/

Host (useful for support):
  Version: 5.0.4
  Commit:  f27d337295

.NET SDKs installed:
  5.0.104 [/usr/lib64/dotnet/sdk]

.NET runtimes installed:
  Microsoft.AspNetCore.App 5.0.4 [/usr/lib64/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 5.0.4 [/usr/lib64/dotnet/shared/Microsoft.NETCore.App]

To install additional .NET runtimes or SDKs:
  https://aka.ms/dotnet-download
Regression?
Compared to earlier .NET versions: No
Compared to Java: Yes
Other information
This bug is relevant for programs that want to do input validation, e.g. check that a byte array contains valid UTF-8.
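For the byte-array case specifically, a validation check does not have to go through StreamReader. A minimal sketch, assuming the whole input is available in memory: a `UTF8Encoding` constructed with `throwOnInvalidBytes: true` throws a `DecoderFallbackException` on malformed input instead of substituting U+FFFD. The names `Utf8Validation` and `IsValidUtf8` are hypothetical and chosen only for this example.

```csharp
using System;
using System.Text;

static class Utf8Validation
{
    // Strict encoding: no BOM on output, throw on invalid bytes when decoding.
    private static readonly UTF8Encoding Strict =
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

    // Hypothetical helper: returns true if the array decodes as UTF-8
    // without hitting the exception fallback.
    public static bool IsValidUtf8(byte[] bytes)
    {
        try
        {
            // GetCharCount decodes the full array without allocating a string.
            Strict.GetCharCount(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }

    static void Main()
    {
        Console.WriteLine(IsValidUtf8(new byte[] { 0x41, 0x42, 0x43 }));
        // Same input as the test program in this report.
        Console.WriteLine(IsValidUtf8(new byte[] { 0xd9, 0x41, 0x42, 0x43, 0xd9 }));
    }
}
```

This only covers input that is fully buffered; code that reads from a stream incrementally through StreamReader is still affected by the behavior described in this report.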