StreamReader ignores incomplete UTF-8 sequence at the end of the input #50994

@mkauf

Description

The StreamReader class should return a replacement character (U+FFFD) if it encounters an incomplete UTF-8 sequence. This does not work if the incomplete UTF-8 sequence is at the end of the input.

Test program:

using System;
using System.IO;
using System.Text;

namespace Test
{
    class Program
    {
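        static void Main(string[] args)
        {
            // 0xd9 is the lead byte of a two-byte UTF-8 sequence;
            // neither occurrence is followed by a continuation byte.
            var bytes = new byte[] { 0xd9, 0x41, 0x42, 0x43, 0xd9 };

            MemoryStream stream = new MemoryStream(bytes);
            StreamReader reader = new StreamReader(stream, Encoding.UTF8);

            // Print each value returned by Read(): a UTF-16 code unit, or -1 at end of stream.
            for (int i = 0; i < bytes.Length; i++)
            {
                Console.WriteLine(reader.Read());
            }
        }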
    }
}

Output of this test program:

65533
65
66
67
-1

The incomplete UTF-8 sequence at the beginning of the input is handled properly (65533 is the replacement character U+FFFD), but the incomplete UTF-8 sequence at the end of the input is silently dropped: the final Read() call returns -1 (end of stream) instead of 65533.

Expected output:

65533
65
66
67
65533
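
For a comparison within .NET itself, here is a minimal sketch that decodes the same bytes in a single call with Encoding.UTF8.GetString instead of going through StreamReader. The class and method names (GetStringDemo, DecodeToCodeUnits) are hypothetical and only serve this illustration. As far as I can tell, the one-shot API treats the end of the array as the end of the input and should therefore emit U+FFFD for the trailing 0xd9 as well:

```csharp
using System;
using System.Text;

static class GetStringDemo
{
    // Hypothetical helper: decodes the whole array at once and returns
    // the resulting UTF-16 code units as integers.
    public static int[] DecodeToCodeUnits(byte[] bytes)
    {
        // Encoding.UTF8 uses a replacement fallback, so invalid or
        // incomplete sequences are substituted rather than rejected.
        string text = Encoding.UTF8.GetString(bytes);

        var result = new int[text.Length];
        for (int i = 0; i < text.Length; i++)
        {
            result[i] = text[i];
        }
        return result;
    }

    static void Main()
    {
        var bytes = new byte[] { 0xd9, 0x41, 0x42, 0x43, 0xd9 };

        foreach (int codeUnit in DecodeToCodeUnits(bytes))
        {
            Console.WriteLine(codeUnit);
        }
    }
}
```

If this prints the five values listed under "Expected output", the difference is specific to how StreamReader finishes decoding at the end of the stream, not to the UTF-8 decoder's replacement behavior in general.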

For comparison, consider this Java program, which produces the expected output:

import java.io.*;

public class Test
{
	public static void main(String args[]) throws IOException
	{
		byte[] bytes = new byte[] { (byte)0xd9, 0x41, 0x42, 0x43, (byte)0xd9 };
		
		InputStream input = new ByteArrayInputStream(bytes);
		InputStreamReader reader = new InputStreamReader(input, "UTF-8");

		for (int i = 0; i < bytes.length; i++) {
			System.out.println(reader.read());
		}
	}
}

Configuration

# dotnet --info
.NET SDK (reflecting any global.json):
 Version:   5.0.104
 Commit:    ca6b6acadb

Runtime Environment:
 OS Name:     fedora
 OS Version:  33
 OS Platform: Linux
 RID:         fedora.33-x64
 Base Path:   /usr/lib64/dotnet/sdk/5.0.104/

Host (useful for support):
  Version: 5.0.4
  Commit:  f27d337295

.NET SDKs installed:
  5.0.104 [/usr/lib64/dotnet/sdk]

.NET runtimes installed:
  Microsoft.AspNetCore.App 5.0.4 [/usr/lib64/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 5.0.4 [/usr/lib64/dotnet/shared/Microsoft.NETCore.App]

To install additional .NET runtimes or SDKs:
  https://aka.ms/dotnet-download

Regression?

Compared to earlier .NET versions: No
Compared to Java: Yes

Other information

This bug is relevant for programs that want to do input validation, e.g. check that a byte array contains valid UTF-8.
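
Until StreamReader reports the trailing sequence, one possible way to do this kind of validation is to bypass StreamReader and decode the byte array with a strict UTF8Encoding, which throws instead of substituting U+FFFD. This is only a sketch under the assumption that the whole input fits in memory; Utf8Check and IsValidUtf8 are hypothetical names, not part of any library:

```csharp
using System;
using System.Text;

static class Utf8Check
{
    // throwOnInvalidBytes: true makes the decoder raise
    // DecoderFallbackException instead of emitting U+FFFD.
    private static readonly UTF8Encoding Strict =
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

    // Hypothetical helper: returns true only if the entire array,
    // including its last bytes, is well-formed UTF-8.
    public static bool IsValidUtf8(byte[] bytes)
    {
        try
        {
            Strict.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }

    static void Main()
    {
        // Well-formed ASCII input.
        Console.WriteLine(IsValidUtf8(new byte[] { 0x41, 0x42, 0x43 }));

        // The bytes from the test program above: incomplete sequences at both ends.
        Console.WriteLine(IsValidUtf8(new byte[] { 0xd9, 0x41, 0x42, 0x43, 0xd9 }));
    }
}
```

The sketch decodes the input only to discard the result, which is wasteful for large inputs; it is meant to show the strict-decoding idea, not an optimized validator.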
