Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trace batch tidy #176

Merged
merged 7 commits into from May 12, 2019

Conversation

Projects
None yet
2 participants
@frankmcsherry
Copy link
Member

commented May 11, 2019

This PR is trying to add a bit more discipline to how traces are built from batches. Previously, a trace would contain all non-empty batches, but empty batches didn't quite make it to the trace until there were some records to transmit and .. it was all a bit hack-y with some unfortunate consequences.

The PR also re-organizes operators/arrange.rs into several files, in part so that each of the moving parts more obviously should have independently specifiable behaviors.

The TraceWriter accepts a sequence of batches, whose lower and upper frontiers should line up. It also has a seal method which helps submit an empty batch up to a supplied frontier. These batches go to the shared trace (if it exists), and to each of the private queues associated with imports.

The TraceAgent implements the TraceReader trait and supports the importing of traces into other dataflows. This does a bit of a dance with the existing trace: reading its current batches out, pushing them into a new private queue, and then dropping that queue into the list of the TraceWriter. The import_core method is its core functionality.

The Arrangement is the arranged analogue of a Collection, and has lots of methods that allow you to use it in weird ways.


The PR should be mostly functionality preserving, with a few new features: a trace now has a read_upper method that allows you to read out its current upper frontier (previously, there was some dead time until non-empty batches arrived).

There is the risk that the PR has some performance regressions, as we are now moving around more empty batches. We will have to check for that. Nothing requires the internals of the trace to actually move these many batches around, but we do now need to respect the contract that having shown the trace some empty batches it properly reflects that information.

frankmcsherry added some commits May 10, 2019

@frankmcsherry

This comment has been minimized.

Copy link
Member Author

commented May 11, 2019

One additional perk here, which might help width debugging #174, is more clarity on what needs to be true for cursor_through, and that all batches that get handed out have made their way through the trace so that any distinguish_since failures should be more debuggable. Maybe.

frankmcsherry added some commits May 11, 2019

@frankmcsherry

This comment has been minimized.

Copy link
Member Author

commented May 12, 2019

@comnik I'm planning on landing this soon, to expose a more current upper for each trace, but there might be some performance fall out. It should be correctable, if and once we see it, so do holler if you see any regressions. Unfortunately, the repo doesn't really have its own performance tests... =/

@frankmcsherry frankmcsherry merged commit 56a7eed into master May 12, 2019

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details
@comnik

This comment has been minimized.

Copy link
Contributor

commented May 12, 2019

Great! I'll have to read a bit more to understand what has changed and see whether there are performance regressions on my end.

@frankmcsherry

This comment has been minimized.

Copy link
Member Author

commented May 12, 2019

The main changes should mainly be that traces are force-fed more empty batches, where previously traces didn't really receive empty batches and everyone else had to work around that, looking at timely frontiers to pick up that information. Now both traces and import folks see empty batches, which allows traces to report an accurate "what is the most current upper I've seen" where it couldn't before.

The additional empty traces could gunk up the works for e.g. merging, but in principle we can drop them immediately once received and fix that problem, but we need to remove some consistency checks (e.g. that one batch.upper equals the next batch.lower, which needn't be true if there were empty batches in between that we dropped).

@comnik

This comment has been minimized.

Copy link
Contributor

commented May 12, 2019

Ah I understand now. Before reading the patch I thought this would already do away with the distinguish frontier. I don't have a clear cut way to check for performance regressions either, but establishing one won't hurt, sooner or later, so I'll think on that.

Do you think Arrange could be publicly exported through arrange/mod as well? That would make this less of a breaking change.

@frankmcsherry

This comment has been minimized.

Copy link
Member Author

commented May 12, 2019

Ah crud! Fix incoming.

@comnik

This comment has been minimized.

Copy link
Contributor

commented May 12, 2019

ShutdownButton is not where it used to be either, although I'm not sure whether that is intended to be a public thing just yet?

@frankmcsherry

This comment has been minimized.

Copy link
Member Author

commented May 12, 2019

I just re-pub used that back into arrange too. It was in arrange::agent. Gah, testing these things is ... :)

@comnik

This comment has been minimized.

Copy link
Contributor

commented May 12, 2019

No more breakage now :) My Travis setup should catch a bunch of them, as long as we're building off of Differential master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.