Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use timely's logging infrastructure to log Tracker state #321

Open
wants to merge 4 commits into
base: master
from

Conversation

@saradecova
Copy link

saradecova commented Mar 5, 2020

Add TrackerEvent which records additions or removals
of capabilities, as well as propagation events when changes
in implications are propagated along the internal connections
and edges of the graph.

Add DebugEvent which records the state of pointstamps,
implications, and worklist of Tracker.

Enabling loggers for these events is done the same way as for
logging::TimelyEvent's

At the moment, we use these loggers for comparative testing between
Isabelle implementation of progress tracking and our Rust implementation.

This graph plots the runtime of the computation with and without the changes, and suggests
that the increase is not significant despite the changes being on the critical path.
The example used is timely/examples/barrier.rs with 10,000,000 samples, run
with four workers. The graph shows complementary cdf for:

  1. current master,
  2. change with all loggers disabled (almost identical to (1)),
  3. original TimelyEvent logger, and
  4. original TimelyEvent logger + TrackerEvent logger.

We got similar results when run on other examples.

Add TrackerEvent which records additons or removals
of capabilities, as well as propagation events when chanages
in implications are propagated along the internal connections
and edges of the graph.

Add DebugEvent which records the state of pointstamps,
implications, and worklist of Tracker.

Enabling loggers for these events is done the same way as for
logging::TimelyEvent's.
@saradecova saradecova force-pushed the saradecova:extend-logging branch from e3908d5 to 6db0fbf Mar 5, 2020
@@ -317,6 +317,13 @@ impl<T: PartialOrder+Ord+Clone> MutableAntichain<T> {
self.frontier().less_equal(time)
}

/// Clones the vector of updates.
/// Only used for debugging purposes.

This comment has been minimized.

Copy link
@frankmcsherry

frankmcsherry Mar 5, 2020

Member

Can you say a bit more about what this is used for in the comments? If this lands, I'll need to support it, and it would help to understand why it is here and under which circumstances it could go away.

This comment has been minimized.

Copy link
@saradecova

saradecova Mar 9, 2020

Author

Removed the function as it's no longer needed.

/// Clones the vector of updates.
/// Only used for debugging purposes.
#[inline]
pub fn updates(&self) -> Vec<(T, i64)> {

This comment has been minimized.

Copy link
@frankmcsherry

frankmcsherry Mar 5, 2020

Member

Could this return a &[(T, i64)] instead, to avoid a mandatory allocation?

This comment has been minimized.

Copy link
@saradecova

saradecova Mar 9, 2020

Author

Removed the function completely as it's no longer needed.

@@ -145,8 +147,9 @@ pub struct Builder<T: Timestamp> {
impl<T: Timestamp> Builder<T> {

/// Create a new empty topology builder.
pub fn new() -> Self {
pub fn new(path: Vec<usize>) -> Self {

This comment has been minimized.

Copy link
@frankmcsherry

frankmcsherry Mar 5, 2020

Member

I think this is a big abstraction change, in that reachability.rs used to be agnostic to the hierarchical nature of names in timely dataflow, and just did reachability tracking in a scope with no additional information. This seems to bake that in now, which .. will have to ponder whether that is a good call or not.

Another option might be to produce another implementation of reachability.rs, putting things behind a trait.

Not trying to be difficult, but attempting to minimize the complexity in an already-too-complicated bit of logic.

This comment has been minimized.

Copy link
@saradecova

saradecova Mar 9, 2020

Author

That's a good point. I removed the path argument from from Tracker and Builder constructor.

However, it still must be supplied when the logger is registered. The logger now looks like this:

pub tracker_logger: Option<(Vec<usize>, crate::logging::Logger<TrackerEvent>)>,

The logging events such as "a capability is added/removed from a location" do not make sense if they cannot be tracked down to a particular location.

Let me know what you think please and whether you would still prefer going your way.

@frankmcsherry

This comment has been minimized.

Copy link
Member

frankmcsherry commented Mar 5, 2020

I have some general questions about the PR!

I think I understand the goal, which is to get out information about the steps that reachability.rs takes as it runs. What I'm less clear on is whether this is the best way to do that. I think that what's in reachability.rs is essentially deterministic once its inputs are specified. So, if we captured the timestamp changes that are supplied as inputs, and the moments at which propagation happens, I think this would be sufficient information to re-execute the code and see what happens, but without asking reachability.rs to record this for you.

Would it serve your purposes just as well to instrument the moments and nature of pointstamp updates and propagation?

@saradecova

This comment has been minimized.

Copy link
Author

saradecova commented Mar 9, 2020

Thank you for the comments! I changed the logging so there is only one additional logger (as apposed to two) logging the minimum information to re-execute the code as you suggested.

@saradecova saradecova force-pushed the saradecova:extend-logging branch from 9e77029 to 9d66121 Mar 9, 2020
@utaal

This comment has been minimized.

Copy link
Member

utaal commented Mar 13, 2020

Hi folks, a couple of more notes from a conversation with @saradecova .

We're using Debug to convert timestamps of arbitrary types to Strings for logging:

https://github.com/TimelyDataflow/timely-dataflow/pull/321/files#diff-d233d1f563e31c39b0ad93a7182fdcb3R493

This seems reasonable (we wouldn't know how to encode the type otherwise), however this makes it harder (impossible) for the consumer to know what type to expect for a certain scope. As an example, to be able to replay the behaviour of the Tracker using a trace generated with these events, we need to parse the timestamps (to re-establish the partial order, mainly).

We're wondering if it makes sense to add (or adjust an existing) TimelyEvent to output type information for scope timestamps. This way, a consumer can determine how to parse the stringified timestamps (possibly by providing FromStr implementations).

One option would be to use TypeId (docs), which unfortunately hides its internals, and makes it hard (impossible) to parse at the receiver.

I'm still considering options, but @frankmcsherry let us know if you have opinions.

@saradecova

This comment has been minimized.

Copy link
Author

saradecova commented Mar 13, 2020

To follow on @utaal and our conversation, we can encode the type in string using std::any::type_name.

Moreover, by adding internal_summaries to OperatesEvent it would be also beneficial to have the timestamp type for users of TimelyEvent log-stream.

In pracrise, this could be an event informing us that "A new subscope was created at address addr from the root with associated timestamp_type:

/// The creation of a `Subgraph`.
pub struct SubgraphEvent {
    pub addr: Vec<usize>,
    pub timestamp_type: String,
}
@utaal

This comment has been minimized.

Copy link
Member

utaal commented Mar 13, 2020

Also, it may be the right time to address these todo(s) in ProgressEvent:

messages: Vec::new(),
internal: Vec::new(),

messages: Vec::new(),
internal: Vec::new(),

Loggs an event whenever a new instance of a Subgraph is created.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants
You can’t perform that action at this time.