Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EncoderNLS.Convert doesn't always out the correct value for 'completed' #12183

Closed
GrabYourPitchforks opened this issue Mar 5, 2019 · 4 comments
Assignees
Milestone

Comments

@GrabYourPitchforks
Copy link
Member

Repro code:

var encoding = Encoding.GetEncoding("us-ascii", new EncoderReplacementFallback("hello"), DecoderFallback.ExceptionFallback);
var encoder = encoding.GetEncoder();

encoder.Convert(new[] { '\ud800' }, 0, 1, new byte[2], 0, 2, flush: false, out int charsUsed, out int bytesUsed, out bool completed);

Console.WriteLine($"charsUsed: {charsUsed}");
Console.WriteLine($"bytesUsed: {bytesUsed}");
Console.WriteLine($"completed: {completed}");

Expected output:

charsUsed: 1
bytesUsed: 0
completed: False

Actual output:

charsUsed: 1
bytesUsed: 0
completed: True

The culprit seems to be the line below.

https://github.com/dotnet/coreclr/blob/77fcaf6b738941a0c5dc3c00b70b49c7d9f63b69/src/System.Private.CoreLib/shared/System/Text/EncoderNLS.cs#L201

The values flush and this.HasState can never both be true at the same time because specifying flush = true mandates that the EncoderNLS instance not store any remaining high surrogate character for future invocations. Since this means (flush && this.HasState) == false always, by DeMorgan's theorem we have (!flush || !this.HasState) == true always, which means this clause isn't actually contributing to the final value for completed.

I believe the appropriate fix is to ignore the flush parameter entirely and to look only at three conditions: (a) all chars were consumed, (b) there's no leftover state on the EncoderNLS instance, and (c) there's no leftover state in the fallback buffer.

@GrabYourPitchforks GrabYourPitchforks self-assigned this Mar 5, 2019
@GrabYourPitchforks
Copy link
Member Author

/cc @tarekgh @layomia as FYI.

@tarekgh
Copy link
Member

tarekgh commented Mar 5, 2019

@GrabYourPitchforks Thanks for reporting it. I am wondering if you'll include the fix of this in your changes? Just asking to know if we need to work on it especially you marked it as 3.0 issue.

@GrabYourPitchforks
Copy link
Member Author

Yes, I have this fixed in my branch. Just opened this for tracking purposes.

GrabYourPitchforks referenced this issue in dotnet/coreclr Mar 11, 2019
This refactoring is limited to ASCIIEncoding at the moment, but it can easily be applied to UTF-8 / UTF-16 / UTF-32.

High-level changes:
- Fallback logic has been split from the fast-path, improving performance of GetBytes and similar routines.
- All of the plumbing of when to invoke the fallback logic and how to manage leftover data has been moved into the base class.
- Almost all of the logic except for the fast-path is now written in terms of verifiable code (Span and ReadOnlySpan).
- Minor bug fixes in EncoderNLS.Convert (see https://github.com/dotnet/coreclr/issues/23020).
Dotnet-GitSync-Bot referenced this issue in Dotnet-GitSync-Bot/corefx Mar 11, 2019
This refactoring is limited to ASCIIEncoding at the moment, but it can easily be applied to UTF-8 / UTF-16 / UTF-32.

High-level changes:
- Fallback logic has been split from the fast-path, improving performance of GetBytes and similar routines.
- All of the plumbing of when to invoke the fallback logic and how to manage leftover data has been moved into the base class.
- Almost all of the logic except for the fast-path is now written in terms of verifiable code (Span and ReadOnlySpan).
- Minor bug fixes in EncoderNLS.Convert (see https://github.com/dotnet/coreclr/issues/23020).

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
Dotnet-GitSync-Bot referenced this issue in Dotnet-GitSync-Bot/mono Mar 11, 2019
This refactoring is limited to ASCIIEncoding at the moment, but it can easily be applied to UTF-8 / UTF-16 / UTF-32.

High-level changes:
- Fallback logic has been split from the fast-path, improving performance of GetBytes and similar routines.
- All of the plumbing of when to invoke the fallback logic and how to manage leftover data has been moved into the base class.
- Almost all of the logic except for the fast-path is now written in terms of verifiable code (Span and ReadOnlySpan).
- Minor bug fixes in EncoderNLS.Convert (see https://github.com/dotnet/coreclr/issues/23020).

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
@GrabYourPitchforks
Copy link
Member Author

This was fixed in dotnet/coreclr@43a5159.

GrabYourPitchforks referenced this issue in dotnet/corefx Mar 11, 2019
This refactoring is limited to ASCIIEncoding at the moment, but it can easily be applied to UTF-8 / UTF-16 / UTF-32.

High-level changes:
- Fallback logic has been split from the fast-path, improving performance of GetBytes and similar routines.
- All of the plumbing of when to invoke the fallback logic and how to manage leftover data has been moved into the base class.
- Almost all of the logic except for the fast-path is now written in terms of verifiable code (Span and ReadOnlySpan).
- Minor bug fixes in EncoderNLS.Convert (see https://github.com/dotnet/coreclr/issues/23020).

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
marek-safar referenced this issue in mono/mono Mar 11, 2019
This refactoring is limited to ASCIIEncoding at the moment, but it can easily be applied to UTF-8 / UTF-16 / UTF-32.

High-level changes:
- Fallback logic has been split from the fast-path, improving performance of GetBytes and similar routines.
- All of the plumbing of when to invoke the fallback logic and how to manage leftover data has been moved into the base class.
- Almost all of the logic except for the fast-path is now written in terms of verifiable code (Span and ReadOnlySpan).
- Minor bug fixes in EncoderNLS.Convert (see https://github.com/dotnet/coreclr/issues/23020).

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
Dotnet-GitSync-Bot referenced this issue in Dotnet-GitSync-Bot/corert Mar 11, 2019
This refactoring is limited to ASCIIEncoding at the moment, but it can easily be applied to UTF-8 / UTF-16 / UTF-32.

High-level changes:
- Fallback logic has been split from the fast-path, improving performance of GetBytes and similar routines.
- All of the plumbing of when to invoke the fallback logic and how to manage leftover data has been moved into the base class.
- Almost all of the logic except for the fast-path is now written in terms of verifiable code (Span and ReadOnlySpan).
- Minor bug fixes in EncoderNLS.Convert (see https://github.com/dotnet/coreclr/issues/23020).

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
jkotas referenced this issue in dotnet/corert Mar 11, 2019
This refactoring is limited to ASCIIEncoding at the moment, but it can easily be applied to UTF-8 / UTF-16 / UTF-32.

High-level changes:
- Fallback logic has been split from the fast-path, improving performance of GetBytes and similar routines.
- All of the plumbing of when to invoke the fallback logic and how to manage leftover data has been moved into the base class.
- Almost all of the logic except for the fast-path is now written in terms of verifiable code (Span and ReadOnlySpan).
- Minor bug fixes in EncoderNLS.Convert (see https://github.com/dotnet/coreclr/issues/23020).

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
Dotnet-GitSync-Bot referenced this issue in Dotnet-GitSync-Bot/corefx Oct 21, 2019
This is a complementary fix to https://github.com/dotnet/coreclr/issues/23020

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
Dotnet-GitSync-Bot referenced this issue in Dotnet-GitSync-Bot/corert Oct 21, 2019
This is a complementary fix to https://github.com/dotnet/coreclr/issues/23020

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
Dotnet-GitSync-Bot referenced this issue in Dotnet-GitSync-Bot/mono Oct 21, 2019
This is a complementary fix to https://github.com/dotnet/coreclr/issues/23020

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
safern referenced this issue in dotnet/corefx Oct 21, 2019
This is a complementary fix to https://github.com/dotnet/coreclr/issues/23020

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
jkotas referenced this issue in dotnet/corert Oct 22, 2019
This is a complementary fix to https://github.com/dotnet/coreclr/issues/23020

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
marek-safar referenced this issue in mono/mono Oct 22, 2019
This is a complementary fix to https://github.com/dotnet/coreclr/issues/23020

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the 3.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 14, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants