Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Issue 19532 - chunkBy errors involving non-forward input ranges #6841

Merged
merged 1 commit into from Jan 30, 2019

Conversation

jondegenhardt
Copy link
Contributor

@jondegenhardt jondegenhardt commented Jan 26, 2019

This PR address problems in std.algorithm.chunkBy when operating on non-forward input ranges. chunkBy has separate implementations for forward and non-forward input ranges.

The non-forward version works by instantiating a new struct to iterate over each "chunk" in the input range. One problem with the existing implementation is that the range being chunked is getting copied when these structs are created. This causes them to get out of sync if the same chunk is retrieved multiple time. This PR addresses this by passing a reference to the underlying range to these structs so these structs stay in sync with each other and with the parent range. These changes are in the ChunkByChunkImpl struct.

A second problem is that the existing implementation uses empty on the underlying range to define empty on the outer chunked range. This is a logical problem because the outer range becomes "empty" when the last chunk becomes empty, without a popFront on the outer range. This causes common iteration patterns to issue a popFront on the outer range after it becomes empty. Compounding the problem, the chunkBy popFront and front routines do not assert when called in an empty state. This makes diagnosis more difficult. Asserts are triggered when the caller or callee detects it. std.algorithm.map detects this case, which is why it shows up in issue reports. (map.popFront is called, it asserts because the underlying chunkBy range is "empty".) Iteration patterns that call chunkBy.popFront will have the call forwarded to the underlying input range, which is where the error is triggered. This is why range producing algorithms like roundRobin, heapify (binary heap), and multiwayMerge trigger asserts when the resulting range is passed to chunkBy.

This second problem is fixed by changing the "empty" definition to reflect that the outer range is not empty until popFront is run on the last chunk. Assert checks were also added to the popFront and front routines. The effect is to divorce the notion of emptying an individual chunk from the act of popFront'ing the chunk off the outer range. Using popFront on a chunk to pop it off the outer range still has the effect of emptying the chunk.

Issues 16169 and 17966 are also fixed by this PR, they are variations on the theme.

The forward input range implementation was not examined as part of this PR. Some of the new unit test cases also include forward ranges were it makes sense.

private ElementType!Range prev;

this(Range range, ElementType!Range _prev)
this(ref Range range, ElementType!Range _prev)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should probably be return ref cc @WalterBright

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it simply become: this(return ref Range range, ElementType!Range _prev)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked my local unit test build, the iteration.d file is built with -dip1000. My impression was that this flag would catch needed changes. I'll wait additional clarification before making changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WalterBright Can you clarify? Should this be return ref as per @thewilsonator? The PR as written compiles with -dip1000. I searched phobos code and looked through the DIP 1000 document, but didn't see an example using return ref in a constructor function. I'd like to identify the correct way to write this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually, the compiler is able to infer parameter attributes like scope and return for templates, lambdas, etc. so in many cases adding the attributes manually is not necessary / has no effect. However adding attributes is needed for cases where the compiler can't infer things on its own.
Which are those cases is highly implementation specific, and it depends on a case by case basis. As rule of thumb, try to go first without scope/return and if the compiler complains, add them as necessary.
Most importantly, have test cases involving static arrays in @safe unittests, as the escape analysis enabled by -dip1000 will emit errors only in @safe functions, and the point of the escape analysis is to disallow escaping a reference to a stack-allocated object.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ZombineDev Thanks for the explanation. Current chunkBy unit tests are not @safe, they are @system instead. There is a separate issue listed for this, issue 18161. According to that report the problem is that the implementation for the forward range case uses RefCounted.

My personal preference would be to leave the issue of making chunkBy @safe as a separate PR, and have this PR focus on the bug being fixed. That is, leave the implementation as written, without introducing scope / return. However, if reviewers prefer, I could try to make the non-forward input range implementation @safe and introduce unit tests for this purpose.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a closer look at this topic and experimented with creating @safe unit tests. Best I can tell, DIP 1000 is saying that the technique I used is not considered @safe under -dip1000. First time I've really looked at DIP 1000, but my interpretation is that it is saying that the GC can no longer be used to manage lifetime in @safe code, and instead some other technique, such as reference counting, must be used. Implementing this appears be an issue, perhaps temporary, as RefCounted is @system at the moment.

It'd be fantastic if I've missed something basic and there's an easy way satisfy@safe under -dip1000. If so, great. But otherwise I recommend going forward with the PR as written. The bug being addressed is quite significant, enough that it makes chunkBy dangerous to use. It'd be better to fix the bug even if it's @system under -dip1000. The forward input range implementation of chunkBy is @system due to use of RefCounted, so presumably both pieces of the chunkBy implementation could be made @safe together.

Copy link
Contributor

@thewilsonator thewilsonator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise looks good

@dlang-bot
Copy link
Contributor

Thanks for your pull request and interest in making D better, @jondegenhardt! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please verify that your PR follows this checklist:

  • My PR is fully covered with tests (you can see the annotated coverage diff directly on GitHub with CodeCov's browser extension
  • My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
  • I have provided a detailed rationale explaining my changes
  • New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.


If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Auto-close Bugzilla Severity Description
19532 normal chunkBy assert error involving non-forward input ranges.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub fetch digger
dub run digger -- build "master + phobos#6841"

@dlang-bot dlang-bot merged commit 98d74d5 into dlang:master Jan 30, 2019
@jondegenhardt jondegenhardt deleted the fix-19532-chunkBy-input-range branch January 30, 2019 09:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants