Decoder.Convert hanging in 3.1-preview.2 #31414

Closed
adityapatwardhan opened this issue Nov 7, 2019 · 2 comments

@adityapatwardhan

This is a simplified repro from our code base. The following code is expected to finish execution and output the string. It worked fine through 3.1-preview1. After upgrading to 3.1-preview2, it started to hang.
The hang is in the step where we call decoder.Convert. After a few iterations, it reaches a state where all the bytes have been read but the completed variable is never set to true, so it keeps looping.

Source code:
I have attached the Utf8.txt file that the code tries to read.

using System;
using System.Text;
using System.IO;

namespace test
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Hello World!");

            using (Stream stream = File.Open(@"d:\temp\Utf8.txt", FileMode.Open))
            {
                Console.WriteLine(StreamToString(stream, Encoding.UTF8));
            }
        }

        private static string StreamToString(Stream stream, Encoding encoding)
        {
            StringBuilder result = new StringBuilder(capacity: 10000);
            Decoder decoder = encoding.GetDecoder();

            int useBufferSize = 64;
            if (useBufferSize < encoding.GetMaxCharCount(10))
            {
                useBufferSize = encoding.GetMaxCharCount(10);
            }

            char[] chars = new char[useBufferSize];
            byte[] bytes = new byte[useBufferSize * 4];
            int bytesRead = 0;
            do
            {
                // Read at most the number of bytes that will fit in the input buffer. The
                // return value is the actual number of bytes read, or zero if no bytes remain.
                bytesRead = stream.Read(bytes, 0, useBufferSize * 4);

                bool completed = false;
                int byteIndex = 0;
                int bytesUsed;
                int charsUsed;

                while (!completed)
                {
                    // If this is the last input data, flush the decoder's internal buffer and state.
                    bool flush = (bytesRead == 0);
                    decoder.Convert(bytes, byteIndex, bytesRead - byteIndex,
                                    chars, 0, useBufferSize, flush,
                                    out bytesUsed, out charsUsed, out completed);

                    // The conversion produced the number of characters indicated by charsUsed. Write that number
                    // of characters to our result buffer
                    result.Append(chars, 0, charsUsed);

                    // Increment byteIndex to the next block of bytes in the input buffer, if any, to convert.
                    byteIndex += bytesUsed;
                }
            } while (bytesRead != 0);

            return result.ToString();
        }
    }
}

Output:

With 3.1-preview.1

PS D:\code\test> dotnet run
Hello World!
<h1>Unicode Demo</h1>

<p>Taken from <a
href="http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-demo.txt">http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-demo.txt</a></p>
<pre>

  ║                                          ║
  ║    ASCII safety test: 1lI|, 0OD, 8B     ║
  ║                      ?─────────?         ║
  ║    the euro symbol: │ 14.95 ? │         ║
  ║                      ?─────────?         ║
  ╚══════════════════════════════════════════╝

</pre>

With 3.1-preview.2

PS D:\code\test> dotnet --version
3.1.100-preview2-014569
PS D:\code\test> dotnet run
Hello World!
@GrabYourPitchforks GrabYourPitchforks self-assigned this Nov 7, 2019
@GrabYourPitchforks
Member

Followed up with Aditya offline. (Thanks btw for the excellent repro!) This behavior is a consequence of dotnet/coreclr#27229 and is expected: now that completed correctly returns false to indicate that the operation is not complete, the innermost loop never terminates.

For context, per https://docs.microsoft.com/en-us/dotnet/api/system.text.decoder.convert (see Remarks), the completed parameter is intended to signal two things:

  1. all input data has been converted and stored in the destination buffer, and
  2. there's no leftover state in the Decoder instance that requires further processing.
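To make the two conditions concrete, here is a minimal sketch that feeds the three UTF-8 bytes of the euro sign (U+20AC) to a decoder in two calls. The class and method names (CompletedDemo, DecodeSplitEuro) are made up for illustration and are not from the repro. Going by the description above, on a runtime that includes dotnet/coreclr#27229 the first call should report completed = false even though both input bytes were consumed, because the decoder is still holding a partial sequence. The value is printed rather than assumed, since older runtimes may differ.

```csharp
using System;
using System.Text;

// Hypothetical demo class, not part of the original repro.
static class CompletedDemo
{
    // Decodes the UTF-8 bytes of U+20AC in two Convert calls and reports
    // what the first (partial, non-flushing) call returned.
    public static string DecodeSplitEuro(out int firstBytesUsed,
                                         out int firstCharsUsed,
                                         out bool firstCompleted)
    {
        byte[] euro = { 0xE2, 0x82, 0xAC };
        char[] chars = new char[4];
        Decoder decoder = Encoding.UTF8.GetDecoder();

        // First call: only two of the three bytes, flush = false.
        // The decoder keeps the partial sequence as internal state.
        decoder.Convert(euro, 0, 2, chars, 0, chars.Length, false,
                        out firstBytesUsed, out firstCharsUsed, out firstCompleted);

        // Second call: the final byte, flush = true because no more input follows.
        decoder.Convert(euro, 2, 1, chars, 0, chars.Length, true,
                        out int bytesUsed, out int charsUsed, out bool completed);

        return new string(chars, 0, charsUsed);
    }

    static void Main()
    {
        string text = DecodeSplitEuro(out int used, out int produced, out bool completed);
        Console.WriteLine($"first call: bytesUsed={used}, charsUsed={produced}, completed={completed}");
        Console.WriteLine($"decoded length: {text.Length}");
    }
}
```

The point of the sketch is that "all input bytes consumed" and "completed" are separate signals when flush is false.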

My recommendation would be to change the innermost loop so that it only checks the completed parameter when flush = true. If flush = false and you know you're going to call the Decoder.Convert method again later, only check that all input bytes have been consumed. It's acceptable for the Decoder instance to have remaining internal state in this case.
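One way the recommendation could be applied to the repro's StreamToString is sketched below. This is an assumption about what the fix might look like, not the change that was actually made in the reporter's code base. The class name StreamDecoding and the fixed buffer sizes are illustrative. The inner loop ends on completed only when flush is true, and otherwise ends as soon as every byte read in this pass has been consumed.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical container class; only StreamToString mirrors the repro.
static class StreamDecoding
{
    public static string StreamToString(Stream stream, Encoding encoding)
    {
        StringBuilder result = new StringBuilder();
        Decoder decoder = encoding.GetDecoder();

        char[] chars = new char[64];
        byte[] bytes = new byte[chars.Length * 4];
        int bytesRead;
        do
        {
            // Zero bytes read means end of stream, so this pass is the final flush.
            bytesRead = stream.Read(bytes, 0, bytes.Length);
            bool flush = (bytesRead == 0);

            int byteIndex = 0;
            bool done = false;
            while (!done)
            {
                decoder.Convert(bytes, byteIndex, bytesRead - byteIndex,
                                chars, 0, chars.Length, flush,
                                out int bytesUsed, out int charsUsed, out bool completed);

                result.Append(chars, 0, charsUsed);
                byteIndex += bytesUsed;

                // When flushing, trust completed. Otherwise leftover decoder
                // state is acceptable, so only check that the input is consumed.
                done = flush ? completed : (byteIndex == bytesRead);
            }
        } while (bytesRead != 0);

        return result.ToString();
    }

    static void Main()
    {
        // "ab" plus many 3-byte characters forces a sequence to straddle a buffer boundary.
        string input = "ab" + new string('\u20AC', 500);
        using (Stream stream = new MemoryStream(Encoding.UTF8.GetBytes(input)))
        {
            Console.WriteLine(StreamToString(stream, Encoding.UTF8) == input);
        }
    }
}
```

The sample input in Main is chosen so that a multi-byte sequence is split across two reads, which is the situation that left state in the decoder in the original loop.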

@adityapatwardhan
Author

After changing as per the suggestion, the code works fine. Thanks!

@msftgits msftgits transferred this issue from dotnet/corefx Feb 1, 2020
@msftgits msftgits added this to the 5.0 milestone Feb 1, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 11, 2020