UTF8 Encoding isn't consistent with .Net Framework #1679

Closed
replaysMike opened this issue Jan 11, 2020 · 5 comments

@replaysMike replaysMike commented Jan 11, 2020

I found a subtle difference that surfaced in a bunch of hashing code I wrote a while back for .Net Framework. I wrote a multi-platform test showing that UTF8Encoding behaves slightly differently in .Net Standard, and I don't have a good way to solve it yet.

Consider the following - I encoded a string in hex to guarantee the bytes are the same for the test:
(hopefully github doesn't mangle the expected string, it looks correct after previewing)

[Test]
public void ShouldEncodeUTF8()
{
    var netFrameworkExpected = "\u0004[\u0004�\u0001�\v,�\u001cn]�$«�� )�:�YH̗I5�V���Nl7α��i�g_�ZQW%\u001d�Dy\u001eЕ\u0013w�v+\u0012*��\u000f*��\u0019r��}���8��w��&�\r���\f����?���&�t�M��[�`kzhz9\u0015�\u0012I�\u001ey_`�\u0011\tF��A�Af~��q��%P�����\u0003�x�(g���e\u001fM�32\u0014��";
    var hex = "BC045B0488019F0B2CE61C6E5DFC24C2ABE09BDA2029CC3AE9AD5948CC9749359756B1A2D94E6C37CEB189D269AA675FF75A5157251D8544791ED09513779B762B122A89E10F2A98E91972D7CA7DF9F98038DFDB779FED269A0DE3F8FA0C828993B23F85B5A826B474E84DFECD5B87606B7A687A3915C31249CE1E795F609A11094686DF41E99041667E9DD271A0E22550FDD0C3CEF0039678F328679B8590651F4DBE3332148DBA";
    var bytes = hex.HexToBytes();
    var utf8Encoded = Encoding.UTF8.GetString(bytes);
    Assert.AreEqual(netFrameworkExpected, utf8Encoded);
}

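// Extension methods must be declared in a static, non-generic class.
// The class name HexExtensions is illustrative; the original post did not show it.
public static class HexExtensions
{
    // Converts a hex string such as "BC045B" into bytes, two hex characters per byte.
    public static byte[] HexToBytes(this string hexString)
    {
        return Enumerable.Range(0, hexString.Length)
            .Where(x => x % 2 == 0)
            .Select(x => Convert.ToByte(hexString.Substring(x, 2), 16))
            .ToArray();
    }
}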

This test passes on .Net Framework 4.8 but fails on .Net Standard 2.0.

Contributor

@svick svick commented Jan 11, 2020

The difference is that on .Net Core 3.0+, some invalid byte sequences produce two �, while they produced only one � on older frameworks. This was an intentional breaking change to follow Unicode best practices, see dotnet/docs#13547 for more details.

(Also, this has nothing to do with .Net Standard. If you write a .Net Standard library and then run it on .Net Framework or .Net Core 2.x, you should see the old behavior.)
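
To see the difference described above on a given runtime, a minimal sketch along the following lines may help. It decodes a byte array with Encoding.UTF8 and counts how many U+FFFD replacement characters come out. The class and method names (ReplacementCharDemo, CountReplacements) are hypothetical and not part of the original thread, and the sample bytes are taken from the "E09B" portion of the hex string in the test above, on the assumption that this is one of the sequences the runtimes treat differently.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper for illustration only; not from the original thread.
public static class ReplacementCharDemo
{
    // Decodes the bytes with the default (replacing) UTF-8 decoder and
    // returns how many U+FFFD replacement characters the result contains.
    public static int CountReplacements(byte[] bytes)
    {
        string decoded = Encoding.UTF8.GetString(bytes);
        return decoded.Count(c => c == '\uFFFD');
    }

    // Sample input: 'A', then 0xE0 0x9B (0xE0 must be followed by a byte
    // in 0xA0..0xBF, so this pair is not valid UTF-8), then 'B'.
    public static readonly byte[] InvalidSample = { 0x41, 0xE0, 0x9B, 0x42 };
}
```

Calling ReplacementCharDemo.CountReplacements(ReplacementCharDemo.InvalidSample) on .Net Framework and again on .Net Core 3.0+ and comparing the two counts should show whether a given sequence is affected. The sketch deliberately states no expected numbers, since they depend on the runtime.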

Author

@replaysMike replaysMike commented Jan 11, 2020

Thanks. You would think they would provide an overload to get the old behaviour, as there is no easy way to fix this. I can’t just switch to the new encoding: the mechanism is used for existing hashed password authentication, which makes it difficult to upgrade to .Net Core.

It’s good that they finally fixed this, but breaking backwards compatibility is a problem. The only way for me to fix this is to roll my own UTF8 encoding.

Contributor

@svick svick commented Jan 11, 2020

Supporting invalid UTF-8 sequences for passwords sounds like a bad idea to me and something nobody would intentionally use. Can't you change your policy to disallow such passwords (and suggest password reset if anyone actually tries to log in with such a password)?

Author

@replaysMike replaysMike commented Jan 12, 2020

We weren’t supporting this in passwords, but rather in the auto-generated salts, which are stored as binary data. We can't validate the existing password hashes correctly if we can't decode the existing UTF8 salts. I discovered that the system was creating Unicode strings and validating using UTF8, which is a bug in our system and is now fixed, but there's not much I can do about validating existing salts without the same UTF8 handling as in .Net Framework.
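
Only salts whose bytes are not valid UTF-8 can decode differently between runtimes, so one possible first step is to find out which stored salts are affected. The sketch below uses a strict UTF8Encoding (constructed with throwOnInvalidBytes set to true), which throws instead of substituting U+FFFD. The class and method names (SaltAudit, IsValidUtf8) are hypothetical, and how salts are loaded from storage is left out because the thread does not describe it.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not from the original thread.
public static class SaltAudit
{
    // encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true.
    // With throwOnInvalidBytes enabled, decoding invalid input throws
    // rather than inserting replacement characters.
    private static readonly UTF8Encoding StrictUtf8 = new UTF8Encoding(false, true);

    // Returns true when the salt is well-formed UTF-8. Such salts are not
    // subject to the replacement-character difference discussed above.
    public static bool IsValidUtf8(byte[] salt)
    {
        try
        {
            StrictUtf8.GetString(salt);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }
}
```

Salts for which IsValidUtf8 returns false are the ones that need the legacy decoding behaviour, or some other migration path, to keep validating.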

Author

@replaysMike replaysMike commented Jan 13, 2020

I'll just leave this here in case anyone else runs into this and it's a blocker: I created a NuGet package that provides UTF8 encoding as .Net Framework does it: Text.UTF8.Legacy
