Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix issue 21592 - two stack traces if high surrogate is printed #7801

Merged
merged 1 commit into from
Mar 3, 2021

Conversation

aG0aep6G
Copy link
Contributor

No description provided.

@dlang-bot
Copy link
Contributor

Thanks for your pull request and interest in making D better, @aG0aep6G! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please verify that your PR follows this checklist:

  • My PR is fully covered with tests (you can see the coverage diff by visiting the details link of the codecov check)
  • My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
  • I have provided a detailed rationale explaining my changes
  • New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.


If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Auto-close Bugzilla Severity Description
21592 normal two stack traces if high surrogate is printed

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + phobos#7801"

@aG0aep6G
Copy link
Contributor Author

An alternative fix would be to reset highSurrogate already when throwing the exception originally. But that would alter the behavior of LockingTextWriter more than is strictly necessary. The following is currently possible but wouldn't work anymore with the alternative fix:

import std.stdio;
import std.utf : UTFException;
void main()
{
    auto w = stdout.lockingTextWriter();
    w.put("\U00010400"w[0]);
    try w.put('\n');
    catch (UTFException e) w.put("\U00010400"w[1]);
}

@aG0aep6G aG0aep6G marked this pull request as draft February 18, 2021 07:24
@aG0aep6G
Copy link
Contributor Author

This doesn't work like I thought it would. I'll have to take another look.

@aG0aep6G
Copy link
Contributor Author

This doesn't work like I thought it would. I'll have to take another look.

Fixed. I'm not happy with throwing on incomplete UTF-16 while silently ignoring incomplete UTF-8 sequences. But sorting that out is beyond the scope of this PR.

Good to go from my side.

@aG0aep6G aG0aep6G marked this pull request as ready for review February 18, 2021 14:09
@PetarKirov
Copy link
Member

PetarKirov commented Feb 18, 2021

Perhaps I'm missing something obvious, but it seems that this shorter implementation doesn't have this bug (or at least the unittest does not trigger it):

void write(S...)(S args)
{
    import std.format : formattedWrite;
    import std.stdio : stdout;
    import std.typecons : tuple;
    auto w = lockingTextWriter();
    formattedWrite(w, "%(%s%|%)", tuple(args));
}

Full example:
https://run.dlang.io/gist/run-dlang/a8d99f54cbbe57748faa10e41d847723?compiler=dmd

import std.stdio : File;

@safe unittest // https://issues.dlang.org/show_bug.cgi?id=21592
{
    import std.exception : collectException;
    import std.utf : UTFException;
    static import std.file;
    auto deleteme = "deleteme.txt";
    scope(exit) std.file.remove(deleteme);
    auto f = File(deleteme, "w");
    auto e = collectException!UTFException(f.myWriteln(wchar(0xD801)));
    assert(e.next is null);
}

void myWriteln(S...)(ref File f, S args)
{
    myWrite(f, args, '\n');
}

void myWrite(S...)(File f, S args)
{
    import std.format : formattedWrite;
    import std.stdio : stdout;
    import std.typecons : tuple;
    auto w = f.lockingTextWriter();
    formattedWrite(w, "%(%s%|%)", tuple(args));
}

https://run.dlang.io/is/99SIV9

If the myWrite implementation above is correct, what do you think about replacing the current code with it?
If the current implementation of write(S...)(S s) handled the common cases as an optimization, I see no reason why such optimizations can't be done instead in formattedWrite. CC @berni44 who recently did many improvements to std.format.

Edit: One disadvantage is that the stack trace is longer:

std.utf.UTFException@/dlang/dmd/linux/bin64/../../src/phobos/std/utf.d(1715): surrogate UTF-16 high value past end of string
----------------
/dlang/dmd/linux/bin64/../../src/phobos/std/utf.d:1739 pure dchar std.utf.decodeImpl!(true, 0, const(wchar)[]).decodeImpl(ref const(wchar)[], ref ulong) [0x561986cf61eb]
/dlang/dmd/linux/bin64/../../src/phobos/std/utf.d:1146 pure @trusted dchar std.utf.decode!(0, const(wchar)[]).decode(ref const(wchar)[], ref ulong) [0x561986cf6129]
/dlang/dmd/linux/bin64/../../src/phobos/std/range/primitives.d:2464 pure @property @safe dchar std.range.primitives.front!(wchar).front(scope const(wchar)[]) [0x561986cf6040]
/dlang/dmd/linux/bin64/../../src/phobos/std/range/primitives.d:425 @safe void std.range.primitives.put!(void delegate(scope const(char)[]) @safe, const(wchar)[]).put(ref void delegate(scope const(char)[]) @safe, const(wchar)[]) [0x561986cfaf18]
/dlang/dmd/linux/bin64/../../src/phobos/std/format.d:6462 @safe void std.format.writeAligned!(void delegate(scope const(char)[]) @safe, const(wchar)[], char).writeAligned(ref void delegate(scope const(char)[]) @safe, const(wchar)[], scope ref const(std.format.FormatSpec!(char).FormatSpec)) [0x561986cfacd6]
/dlang/dmd/linux/bin64/../../src/phobos/std/format.d:3062 @safe void std.format.formatValueImpl!(void delegate(scope const(char)[]) @safe, const(wchar), char).formatValueImpl(ref void delegate(scope const(char)[]) @safe, const(wchar), scope ref const(std.format.FormatSpec!(char).FormatSpec)) [0x561986cfabcb]
/dlang/dmd/linux/bin64/../../src/phobos/std/format.d:1875 @safe void std.format.formatValue!(void delegate(scope const(char)[]) @safe, const(wchar), char).formatValue(ref void delegate(scope const(char)[]) @safe, ref const(wchar), scope ref const(std.format.FormatSpec!(char).FormatSpec)) [0x561986cfab7b]
/dlang/dmd/linux/bin64/../../src/phobos/std/format.d:576 @safe uint std.format.formattedWrite!(void delegate(scope const(char)[]) @safe, char, const(wchar)).formattedWrite(ref void delegate(scope const(char)[]) @safe, scope const(char[]), const(wchar)) [0x561986cfa7a5]
/dlang/dmd/linux/bin64/../../src/phobos/std/typecons.d:1275 const @safe void std.typecons.Tuple!(wchar, char).Tuple.toString!(void delegate(scope const(char)[]) @safe, char).toString(scope void delegate(scope const(char)[]) @safe, scope ref const(std.format.FormatSpec!(char).FormatSpec)) [0x561986cfa2b4]
/dlang/dmd/linux/bin64/../../src/phobos/std/format.d:4041 @safe void std.format.formatObject!(std.stdio.File.LockingTextWriter, std.typecons.Tuple!(wchar, char).Tuple, char).formatObject(ref std.stdio.File.LockingTextWriter, ref std.typecons.Tuple!(wchar, char).Tuple, scope ref const(std.format.FormatSpec!(char).FormatSpec)) [0x561986cfa219]
/dlang/dmd/linux/bin64/../../src/phobos/std/format.d:4430 @safe void std.format.formatValueImpl!(std.stdio.File.LockingTextWriter, std.typecons.Tuple!(wchar, char).Tuple, char).formatValueImpl(ref std.stdio.File.LockingTextWriter, ref std.typecons.Tuple!(wchar, char).Tuple, scope ref const(std.format.FormatSpec!(char).FormatSpec)) [0x561986cfa1d1]
/dlang/dmd/linux/bin64/../../src/phobos/std/format.d:1875 @safe void std.format.formatValue!(std.stdio.File.LockingTextWriter, std.typecons.Tuple!(wchar, char).Tuple, char).formatValue(ref std.stdio.File.LockingTextWriter, ref std.typecons.Tuple!(wchar, char).Tuple, scope ref const(std.format.FormatSpec!(char).FormatSpec)) [0x561986cf5b5c]
/dlang/dmd/linux/bin64/../../src/phobos/std/format.d:576 @safe uint std.format.formattedWrite!(std.stdio.File.LockingTextWriter, char, std.typecons.Tuple!(wchar, char).Tuple).formattedWrite(ref std.stdio.File.LockingTextWriter, scope const(char[]), std.typecons.Tuple!(wchar, char).Tuple) [0x561986ceeea7]
onlineapp.d:23 @safe void onlineapp.myWrite!(wchar, char).myWrite(std.stdio.File, wchar, char) [0x561986cee9ea]
onlineapp.d:14 void onlineapp.myWriteln!(wchar).myWriteln(wchar) [0x561986cee98a]
onlineapp.d:8 _Dmain [0x561986cee536]

https://run.dlang.io/is/QHvKWh

vs.

std.utf.UTFException@std/stdio.d(2910): unpaired surrogate UTF-16 value
----------------
??:? @safe void std.stdio.File.LockingTextWriter.highSurrogateShouldBeEmpty() [0x55926864405b]
/dlang/dmd/linux/bin64/../../src/phobos/std/stdio.d:2992 @safe void std.stdio.File.LockingTextWriter.put!(char).put(char) [0x55926863d43b]
/dlang/dmd/linux/bin64/../../src/phobos/std/range/primitives.d:277 @safe void std.range.primitives.doPut!(std.stdio.File.LockingTextWriter, char).doPut(ref std.stdio.File.LockingTextWriter, ref char) [0x55926863d41e]
/dlang/dmd/linux/bin64/../../src/phobos/std/range/primitives.d:380 @safe void std.range.primitives.put!(std.stdio.File.LockingTextWriter, char).put(ref std.stdio.File.LockingTextWriter, char) [0x55926863d3f3]
onlineapp.d:50 @safe void onlineapp.write!(wchar, char).write(ref std.stdio.File, wchar, char) [0x55926863cbdb]
onlineapp.d:14 void onlineapp.myWriteln!(wchar).myWriteln(wchar) [0x55926863cb2e]
onlineapp.d:8 _Dmain [0x55926863c706]

https://run.dlang.io/is/OBTWQE

But this can be handled by catching and re-throwing the exception, similar to what is done in this PR.

@berni44
Copy link
Contributor

berni44 commented Feb 18, 2021

If the current implementation of write(S...)(S s) handled the common cases as an optimization, I see no reason why such optimizations can't be done instead in formattedWrite. CC @berni44 who recently did many improvements to std.format.

I didn't touch string handling yet. I will have to dive deeper into unicode before doing so. So I fear I cannot say anything qualified here. The only thing I know: There are some known bugs in the implementation of string handling in formattedWrite, but when I remember right, they do not touch this subject here.

@aG0aep6G
Copy link
Contributor Author

If the myWrite implementation above is correct, what do you think about replacing the current code with it?
If the current implementation of write(S...)(S s) handled the common cases as an optimization, I see no reason why such optimizations can't be done instead in formattedWrite. CC @berni44 who recently did many improvements to std.format.

Edit: One disadvantage is that the stack trace is longer:

std.utf.UTFException@/dlang/dmd/linux/bin64/../../src/phobos/std/utf.d(1715): surrogate UTF-16 high value past end of string

Note that the exception message is different. Your myWrite throws more eagerly. The existing write can handle a sequence that is split over two strings:

void main()
{
    import std.stdio: stdout;
    stdout.write("\U00010400"w[0], "\U00010400"w[1], "\n"); /* prints "𐐀" */
    stdout.myWrite("\U00010400"w[0], "\U00010400"w[1], "\n"); /* throws */
}

@PetarKirov
Copy link
Member

The existing write can handle a sequence that is split over two strings

IMO, this is not worth supporting, unless it was already guaranteed by the documentation and unit tests.

@aG0aep6G
Copy link
Contributor Author

IMO, this is not worth supporting, unless it was already guaranteed by the documentation and unit tests.

That's alright, but I just want to fix a bug. I'd rather not rewrite the whole function, changing its behavior in subtle ways.

@PetarKirov
Copy link
Member

Okay, fair enough.

I restarted the druntime job on BuildKite as it was failing due to running out of memory. I also restarted FreeBSD_32 job on the auto-tester.

@RazvanN7
Copy link
Collaborator

RazvanN7 commented Mar 2, 2021

This seems to consistently fail druntime on buildkite. The error message:


_D3std6format__T15formatValueImplTSQBh5array__T8AppenderTAyaZQoThTaZQCaFNaNfKQBrhMKxSQDfQDe__T10FormatSpecTaZQpZv: error: undefined reference to '_D3std6format__T19needToSwapEndianessTaZQyFNaNbNiNfMKxSQCbQCa__T10FormatSpecTaZQpZb'
--
  | generated/linux/release/64/benchmark.o:benchmark/runbench.d:function _D3std6format__T15formatValueImplTSQBh5array__T8AppenderTAyaZQoThTaZQCaFNaNfKQBrhMKxSQDfQDe__T10FormatSpecTaZQpZv: error: undefined reference to '_D3std6format__T14formatIntegralTSQBg5array__T8AppenderTAyaZQoTmTaZQBzFNaNfKQBrxmMKxSQDfQDe__T10FormatSpecTaZQpkmZv'
  | generated/linux/release/64/benchmark.o:benchmark/runbench.d:function _D3std6format__T15formatValueImplTSQBh5array__T8AppenderTAyaZQoTkTaZQCaFNaNfKQBrkMKxSQDfQDe__T10FormatSpecTaZQpZv: error: undefined reference to '_D3std6format__T19needToSwapEndianessTaZQyFNaNbNiNfMKxSQCbQCa__T10FormatSpecTaZQpZb'
  | generated/linux/release/64/benchmark.o:benchmark/runbench.d:function _D3std6format__T15formatValueImplTSQBh5array__T8AppenderTAyaZQoTkTaZQCaFNaNfKQBrkMKxSQDfQDe__T10FormatSpecTaZQpZv: error: undefined reference to '_D3std6format__T14formatIntegralTSQBg5array__T8AppenderTAyaZQoTmTaZQBzFNaNfKQBrxmMKxSQDfQDe__T10FormatSpecTaZQpkmZv'
  | generated/linux/release/64/benchmark.o:benchmark/runbench.d:function _D3std6format__T15formatValueImplTSQBh5array__T8AppenderTyAaZQoThTaZQCaFNaNfKQBrhMKxSQDfQDe__T10FormatSpecTaZQpZv: error: undefined reference to '_D3std6format__T19needToSwapEndianessTaZQyFNaNbNiNfMKxSQCbQCa__T10FormatSpecTaZQpZb'
  | generated/linux/release/64/benchmark.o:benchmark/runbench.d:function _D3std6format__T15formatValueImplTSQBh5array__T8AppenderTyAaZQoTkTaZQCaFNaNfKQBrkMKxSQDfQDe__T10FormatSpecTaZQpZv: error: undefined reference to '_D3std6format__T19needToSwapEndianessTaZQyFNaNbNiNfMKxSQCbQCa__T10FormatSpecTaZQpZb'
  | generated/linux/release/64/benchmark.o:benchmark/runbench.d:function _D3std6format__T15formatValueImplTSQBh5array__T8AppenderTAyaZQoTmTaZQCaFNaNfKQBrmMKxSQDfQDe__T10FormatSpecTaZQpZv: error: undefined reference to '_D3std6format__T14formatIntegralTSQBg5array__T8AppenderTAyaZQoTmTaZQBzFNaNfKQBrxmMKxSQDfQDe__T10FormatSpecTaZQpkmZv'
  | generated/linux/release/64/benchmark.o:benchmark/runbench.d:function _D3std6format__T15formatValueImplTSQBh5array__T8AppenderTAyaZQoTxhTaZQCbFNaNfKQBsxhMKxSQDhQDg__T10FormatSpecTaZQpZv: error: undefined reference to '_D3std6format__T14formatIntegralTSQBg5array__T8AppenderTAyaZQoTmTaZQBzFNaNfKQBrxmMKxSQDfQDe__T10FormatSpecTaZQpkmZv'
  | collect2: error: ld returned 1 exit status
  | Error: linker exited with status 1
  | generated/linux/release/64/unittest/test_runner core.sys.solaris.libelf
  | posix.mak:462: recipe for target 'generated/linux/release/64/benchmark' failed
  | make: *** [generated/linux/release/64/benchmark] Error 1
  | make: *** Waiting for unfinished jobs....

@aG0aep6G
Copy link
Contributor Author

aG0aep6G commented Mar 2, 2021

This seems to consistently fail druntime on buildkite. The error message:


_D3std6format__T15formatValueImplTSQBh5array__T8AppenderTAyaZQoThTaZQCaFNaNfKQBrhMKxSQDfQDe__T10FormatSpecTaZQpZv: error: undefined reference to '_D3std6format__T19needToSwapEndianessTaZQyFNaNbNiNfMKxSQCbQCa__T10FormatSpecTaZQpZb'

I don't see how the changes here could cause undefined references.

@berni44
Copy link
Contributor

berni44 commented Mar 3, 2021

Maybe this is due to the split of std.format. I don't understand the details yet, but rebasing this PR to master might help.

Anyway, it would be nice to know why this fails. Might be something like "private members of std/format/package.d are treated differently to private members of std/format.d" or something...

@RazvanN7
Copy link
Collaborator

RazvanN7 commented Mar 3, 2021

Yeah, rebasing might fix it.

@berni44
Copy link
Contributor

berni44 commented Mar 3, 2021

Strange things are happening here: I was able to reproduce the error locally by running make -f posix.mak buildkite-test in druntime. Unfortunately, now this error appears even, when I checkout master for dmd, druntime and phobos... Probably make -f posix.mak clean doesn't clean up enough...

PS: Even with a clean newly checked out master of all three parts I get this error...

PPS: Rewinding all three to 13th of Februar, 17:06 brought up similar errors. So I conclude, that this is neither related to this PR nor to the split of std.format, but rather a bug in druntimes posix.mak. It's essentially this call, that produces tons of linker errors:

../dmd/generated/linux/release/64/dmd -conf= -m64 -I/home/D/TMP/druntime/import -I../phobos -L-L../phobos/generated/linux/release/64 -fPIC -defaultlib=libphobos2.so -L-rpath=../phobos/generated/linux/release/64 -de benchmark/runbench.d -ofgenerated/linux/release/64/benchmark

But I might be wrong and using the tests in druntime wrongly. I'm not much familiar with druntime... - I will stop further investigations here.

@aG0aep6G
Copy link
Contributor Author

aG0aep6G commented Mar 3, 2021

rebased

@dlang-bot dlang-bot merged commit 3debd58 into dlang:master Mar 3, 2021
@aG0aep6G aG0aep6G deleted the 21592 branch March 3, 2021 19:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants