
fix issue 15293 #3802

Merged: 2 commits merged into dlang:stable on Nov 13, 2015

Conversation

@aG0aep6G (Contributor)

ReadlnAppender tried to claim the capacity of the passed buffer, calling
assumeSafeAppend on the result so that on the next call it again has a
capacity that can be claimed.

The obvious problem with that: readln would stomp over memory that it has
not been given.

There was also a subtler problem with it (which caused issue 15293):
When readln wasn't called with the previous line, but with the original
buffer (byLine does that), then the passed buffer had no capacity, so
ReadlnAppender would not assumeSafeAppend when slicing the new line from
it. But without a new assumeSafeAppend, the last one would still be in
effect, possibly on a sub slice of the new line.

https://issues.dlang.org/show_bug.cgi?id=15293
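The stale-assumeSafeAppend hazard described above can be reproduced in a few lines. This is a standalone sketch, not the Phobos code; it only illustrates why an assumeSafeAppend left "in effect" on a sub-slice is dangerous:

```d
// Standalone illustration: after assumeSafeAppend on a shorter slice,
// appending through that slice writes in place, over memory that is
// still referenced by the longer slice.
void main()
{
    char[] line = "hello world".dup;
    char[] head = line[0 .. 5];   // "hello"
    head.assumeSafeAppend();      // claims everything past index 5
    head ~= '!';                  // in-place append: overwrites line[5]
    assert(line == "hello!world");
}
```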

@dlang-bot (Contributor)

Fix for Bugzilla issue 15293: [REG2.069.0] std.stdio.readln(buffer) messes up buffer's capacity

@schveiguy (Member)

ping @rainers
I don't think this is the right solution, because I don't agree with the unit tests, but I can't get the error to happen on my macOS system, so I'm having trouble diagnosing the issue.

@aG0aep6G (Author)

I can't get the error to happen on my macOS system, so I'm having trouble diagnosing the issue.

ReadlnAppender is only used in version (DIGITAL_MARS_STDIO) and version (MICROSOFT_STDIO). OS X uses the version (HAS_GETDELIM) variant.

if (buf.length >= pos + n) // buf is already large enough
    return;

if (buf.capacity >= pos + n)
Member:

This is going to kill performance. A call to capacity means lookup of the metadata in the GC. reserve(1) is called for every character added.

Member:

I think you can fix this by doing this instead:

auto curCap = buf.capacity;
if (curCap >= pos + n)
{
    buf.length = curCap;
    newBuf = true;
}

aG0aep6G (Author):

Setting buf.length to the capacity makes sense to me.

But why set newBuf = true? newBuf is supposed to be true only if the buffer has been allocated by ReadlnAppender. But a buffer from outside may have capacity, too.

@schveiguy (Member)

ReadlnAppender is only used in [Windows]

Ah, ok. I forgot that.

char[] nbuf = new char[ncap];
memcpy(nbuf.ptr, buf.ptr, pos);
cap = nbuf.capacity;
buf = nbuf.ptr[0 .. buf.length]; // remember initial length
Member:

This isn't right. The original code ignored buf.length and used cap. You are using buf.length for what cap did. So this essentially allocates no new capacity.

I think you want buf = nbuf

@schveiguy (Member)

I don't think this is the right solution

I'm having second thoughts about this opinion. I didn't realize that the other versions (the non-ReadlnAppender ones) don't care about the returned buffer's appending status.

I'm thinking that this update (after issues noted above) will be acceptable, because readln isn't meant for looping (and when it is, you can use the same mechanism as byLine, which should be high performance).

immutable curCap = buf.capacity;
if (curCap >= pos + n)
{
buf.length = curCap;
aG0aep6G (Author):

@schveiguy: If you think newBuf = true; should be here, please elaborate.

Member:

Sure.

I look at it this way:

  1. buf is not big enough to hold the data, so we must expand
  2. Since buf.capacity is non-zero, we know (or we can safely assume) that the data after buf is unused.
  3. When we expand buf to its capacity, it's likely we will not use all the data, so the extra data we claim here for expanded capacity will be garbage if we don't assumeSafeAppend.

So if you set newBuf = true, then assumeSafeAppend will be called at the end, and the data we claimed but didn't use is still available for appending. Effectively it's the same thing as your original implementation, but without the capacity check in the inner loop.
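The three steps above could be condensed into a sketch like this. It is hypothetical (the names pos, newBuf, reserve, and data mirror the discussion, not the merged Phobos code), but it shows claiming spare capacity in place on expansion and handing back the unused tail via assumeSafeAppend at the end:

```d
// Hypothetical condensation of the scheme discussed; not the Phobos code.
struct ReadlnAppenderSketch
{
    char[] buf;
    size_t pos;
    bool newBuf;   // set whenever we claimed capacity we may not fill

    void reserve(size_t n)
    {
        import core.stdc.string : memcpy;
        if (buf.length >= pos + n)        // already big enough, no GC query
            return;
        immutable curCap = buf.capacity;  // spare room after buf? (step 2)
        if (curCap >= pos + n)
        {
            buf.length = curCap;          // claim the whole block in place
            newBuf = true;                // release the tail later (step 3)
        }
        else
        {
            auto nbuf = new char[pos + n];
            if (pos)
                memcpy(nbuf.ptr, buf.ptr, pos);
            buf = nbuf;
            newBuf = true;
        }
    }

    char[] data()
    {
        auto line = buf[0 .. pos];
        if (newBuf)
            line.assumeSafeAppend();      // unused capacity stays appendable
        return line;
    }
}
```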

aG0aep6G (Author):

Ok, I think I got it. In this scenario, the slice that's returned from data is guaranteed to be longer than the originally passed buffer. So it's safe to say that any capacity we didn't use is free for all. Amending.

@aG0aep6G (Author)

Amended fixes for the noted issues.

@aG0aep6G (Author)

Amended the newBuf thing. Calling it safeAppend now, though. Thanks @schveiguy

@schveiguy (Member)

LGTM. I think this is actually going to end up making byLine faster too, because assumeSafeAppend/capacity are only called on expansion (they need to dig into the GC metadata, which can be expensive).

@rainers please review, this changes the memory usage semantics of readln, but I think for the better.

 assert(pos == 0);   // assume this is the only put call
-if (b.length > cap)
+if (b.length > max(buf.length, buf.capacity))
Member:

I think you should take advantage of the case b.length <= buf.length to avoid getting the capacity at all. This probably boils down to just calling putbuf unconditionally.

aG0aep6G (Author):

How about extracting a reserveWithoutAllocating from reserve that returns false instead of allocating a new buffer. And then change putonly to this:

    void putonly(char[] b)
    {
        assert(pos == 0);   // assume this is the only put call
        if (reserveWithoutAllocating(b.length))
            memcpy(buf.ptr + pos, b.ptr, b.length);
        else
            buf = b.dup;
        pos = b.length;
    }

No .capacity call when buf.length is sufficient, and at most one .capacity call otherwise. Tight allocation with putonly (not sure if this is important, but it seems to be the point of putonly), spacious allocation with putchar.
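A sketch of what such a reserveWithoutAllocating could look like (hypothetical, following the proposal above; the real function lives inside ReadlnAppender in std.stdio and its exact signature may differ):

```d
// Hypothetical sketch of reserveWithoutAllocating per the proposal:
// grow buf in place when possible; return false instead of allocating.
bool reserveWithoutAllocating(ref char[] buf, ref bool safeAppend,
                              size_t pos, size_t n)
{
    if (buf.length >= pos + n)       // cheap check first: no GC metadata lookup
        return true;
    immutable curCap = buf.capacity; // at most one .capacity call
    if (curCap >= pos + n)
    {
        buf.length = curCap;         // claim the block's spare room in place
        safeAppend = true;           // release the unused tail at the end
        return true;
    }
    return false;                    // caller allocates (e.g. buf = b.dup)
}
```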

aG0aep6G (Author):

Done that.

@rainers (Member)

rainers commented Nov 12, 2015

I agree the test case is valid. With a slightly more elaborate GC that reclaims memory "freed" by assumeSafeAppend, memory still referenced by other slices might get reused.

It took 8 months for #2794 to be merged, and that was a more obvious memory corruption. It wasn't me who insisted on performance over correctness, but performance was a concern to others, so you might want to check the benchmark given there to see the actual change in performance from this PR.

@aG0aep6G (Author)

you might want to check the benchmark given there to see the actual change in performance by this PR.

no optimization flags:
readln1: 129 ms -> 158 ms
readln2: 148 ms -> 124 ms
byLine: 178 ms -> 161 ms

-release -O -inline:
readln1: 94 ms -> 125 ms
readln2: 110 ms -> 90 ms
byLine: 87 ms -> 77 ms

Base is 2.069.0. Tested in wine, not proper Windows.

@rainers (Member)

rainers commented Nov 13, 2015

Here are some numbers from my Windows system

GIT head                                              this PR
win32:
readln1: 117 ms                                       readln1: 139 ms
readln2: 133 ms                                       readln2: 94 ms
byLine: 104 ms                                        byLine: 86 ms
2192 bytes used, 1048576 bytes pool mem               941040 bytes used, 1048576 bytes pool mem
GC summary:    1 MB,    1 GC    0 ms                  GC summary:    1 MB,   10 GC    4 ms

win64:
readln1: 333 ms                                       readln1: 397 ms
readln2: 341 ms                                       readln2: 335 ms
byLine: 320 ms                                        byLine: 329 ms
5776 bytes used, 1048576 bytes pool mem               970112 bytes used, 1048576 bytes pool mem
GC summary:    1 MB,    1 GC    0 ms                  GC summary:    1 MB,   27 GC    4 ms

win32mscoff:
readln1: 277 ms                                       readln1: 355 ms
readln2: 290 ms                                       readln2: 287 ms
byLine: 263 ms                                        byLine: 278 ms
5328 bytes used, 1048576 bytes pool mem               178128 bytes used, 1048576 bytes pool mem
GC summary:    1 MB,    1 GC    0 ms                  GC summary:    1 MB,   28 GC    6 ms

GC activity happens in the readln1 test case only. We can estimate the amount of actually allocated memory by multiplying the number of GC runs (minus 1 for the final run by the runtime) by the pool size (1 MB); for example, the win64 run's 27 GCs suggest roughly 26 MB allocated. My test file has a size of 14 MB.

@schveiguy (Member)

  1. I think the readln1 test is not optimal, and we are discouraging that method. Since byLine does something better, and we have documented that, I wouldn't put any stock in the lower performance there.
  2. The readln2 test can be made more optimal:
void testReadln2()
{
    auto f = File(filename);
    size_t n = 0;

    StopWatch sw;
    sw.start();
    char[200] storage;
    char[] buf = storage[];
    while (!f.eof())
    {
        char[] ln = buf;
        f.readln(ln);
        n += ln.length;
        if (ln.length > buf.length) buf = ln;  // keep the bigger buffer
    }
    auto dur = sw.peek();
    writefln("readln2: %d ms", dur.to!("msecs", int));
}

I agree with the changes made, still LGTM.

@schveiguy (Member)

but that was a concern to others

Note that the most obvious problem was byLine performance, which is now a moot point.

@rainers (Member)

rainers commented Nov 13, 2015

readln2 test can be made more optimal

Sure, but you have to know pretty much the internals of readln to do it. The test cases rather try the "obvious" approaches that were considered necessary to be fast.
As we have even added an example to the documentation showing how to get the performance of byLine, I don't object to merging this PR. The DigitalMars runtime version even benefits in the more common cases, while the Microsoft runtime version has never been optimized for performance anyway.

@schveiguy (Member)

Sure, but you have to know pretty much the internals of readln to do it.

Not really. If readln gives you a bigger buffer, then why wouldn't you prefer it over the stack-allocated smaller one?

Seems like we agree it's worth merging, so I'll do so.

@schveiguy (Member)

Auto-merge toggled on

schveiguy added a commit that referenced this pull request Nov 13, 2015
@schveiguy schveiguy merged commit fc77dbb into dlang:stable Nov 13, 2015
@aG0aep6G aG0aep6G deleted the 15293 branch January 17, 2016 22:28