Doc issues, or maybe I'm dumb #248

dhalperi · 2019-03-13T05:11:19Z

I am finally picking up DD/TD because I want to play with Monoids. New to both Rust and DD.

Going through the mdbook, but at HEAD instead of version 0.7, and am able to explain many differences between my output and the docs.

Things I can't explain, so far:

1. Step 1, Write a Program - we put the code in src/main.rs but the suggested command is cargo run --example hello. I get (at both HEAD and v0.7) that there is no such example. Am I missing something or is this just a bug?
   (The same issue happens for all the other code cells I've seen.)
2. Increase the Scale - Don't we need to change the code? The page seems to just have two run commands (that fail for me because of --example hello), with no code changes. I did understand that we probably add --release to get it to go faster.

Is there some magic linkage to https://github.com/TimelyDataflow/timely-dataflow/blob/master/examples/hello.rs that I've messed up in my setup? How does it know which version of things to run?

I'm guessing these are just omissions/bugs, but I am open to being completely wrong given that I am new to a lot here.

dhalperi · 2019-03-13T05:13:23Z

For posterity:

➜  lca git:(master) ✗ cat Cargo.toml 
[package]
name = "lca"
version = "0.1.0"
authors = ["Daniel Halperin <daniel@halper.in>"]
edition = "2018"

[dependencies]
timely = { git = "https://github.com/TimelyDataflow/timely-dataflow" }
differential-dataflow = { git = "https://github.com/TimelyDataflow/differential-dataflow" }
#timely = "0.7"
#differential-dataflow = "0.7"

dhalperi · 2019-03-13T05:14:18Z

➜  lca git:(master) ✗ ls *
Cargo.lock Cargo.toml

src:
main.rs

target:
debug   release

dhalperi · 2019-03-13T05:32:24Z

Consolidate vs Distinct?

The Concat docs mention that concatenating manages with itself reversed will have duplicate entries for (0, 0). This is, I think, one element with multiplicity 2.

This exact same example is used on the Consolidate docs, which says "does nothing to the collection except ensure that each element occurs with only one count". Is this "one count" as in "we need to know what the multiplicity of the element is authoritatively" or as in "multiplicity [count] of one"? I initially assumed the latter, which is basically just Distinct in SQL terms at least.

Then, this text confused me mildly because I thought Distinct would have had a simpler explanation: "We might see two copies of the same element, (0, 0)... This is because for reasons of efficiency, operators like map and concat do not work too hard to "consolidate" the changes that they produce. This is rarely a problem, but it can nonetheless be helpful to consolidate a collection before inspecting it, to ensure that you see the most concise version."

Finally, there's a separate Distinct operator, which is explained in exactly the semantics I would have expected - set vs bag semantics.

So how do Distinct and Consolidate differ?

Maybe Consolidate is actually the former - something about tuples having a single count - but I can't figure out how to inspect the count. So how would it be reflected other than as the same tuple printed twice? [The count doesn't seem to be in the (data, time, diff) triple mentioned in the very first example]

dhalperi · 2019-03-13T06:01:25Z

In the first Arrangement Example, is query backwards?

Specifically, if knows is a pair of the relation and query is (query_id, source) [here, I'm assuming source is in the relation and query_id is an int, nonce, etc., not the same type], IIUC we can't join query and knows without flipping query.
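To make the concern concrete, here is a minimal plain-Rust sketch of a key-based join. It does not use the differential-dataflow API. The tuple shapes, knows as (source, target) and query as (query_id, source), are my assumptions from the paragraph above, and join_on_source is a hypothetical helper. It assumes the join matches on the first component of each pair, which is why query has to be re-keyed as (source, query_id) first.

```rust
use std::collections::HashMap;

// Hypothetical shapes, assumed for illustration only:
//   knows: (source, target)
//   query: (query_id, source)
fn join_on_source(knows: &[(u32, u32)], query: &[(u64, u32)]) -> Vec<(u32, u32, u64)> {
    // Flip each query from (query_id, source) to a source-keyed index,
    // so that both sides are keyed by `source`.
    let mut by_source: HashMap<u32, Vec<u64>> = HashMap::new();
    for &(query_id, source) in query {
        by_source.entry(source).or_default().push(query_id);
    }

    // Match every knows edge against the queries registered for its source.
    let mut out = Vec::new();
    for &(source, target) in knows {
        if let Some(ids) = by_source.get(&source) {
            for &query_id in ids {
                out.push((source, target, query_id));
            }
        }
    }
    out
}
```

Without the flip, the key on the query side would be query_id, which has nothing to match against in knows.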

frankmcsherry · 2019-03-13T09:54:50Z

These are super helpful. I'm on the road at the moment (4am locally) but I'll start working through these! Some are doc issues, some are "no, that should just work!".

comnik · 2019-03-13T09:55:59Z

Hi! Chiming in here, as Frank is supposed to be on a beach somewhere right now.

1. This indeed looks like an error in the docs. It is probably easiest to add your own example to the examples/ folder in the Timely repo and run cargo run --example <your_example>. Cargo has first-class support for these examples. You can of course also create an examples/ folder in your own repository and run them the same way.
2. This is also not on you, there is indeed no scale parameter in the current hello example. You could try the bfs example like so: cargo run --example bfs 100 1000, where the first argument is the number of nodes and the second is the number of edges in a randomly generated graph.

In any case, --release does make a significant difference to both performance and, unfortunately, compile times.

To increase the "scale" as in "number of workers", no code change is necessary! This is controlled by the -w parameter seen in action in chapter 2.2.

comnik · 2019-03-13T10:09:43Z

Consolidate vs Distinct – a fan favourite! First of all, the Distinct operator does what you would think. It ensures that there is at most one of every datapoint in the collection.

Consolidate is more interesting, and somewhat unique to Differential. Assume a collection of edges. At t_0, we introduce an edge (a, b) which is represented as ((a, b), t_0, +1). Then at t_1 we retract that edge again, ((a, b), t_1, -1). Now, logically the edges collection is empty, but physically, there are two tuples lying around. This can be problematic in iterative dataflows, but also leads to potentially confusing outputs for clients. It's like asking "Is b reachable from a?" and Differential responding "Yes... but no!". When you'd prefer a simple "no".

To a first approximation, Consolidate ensures that the physical representation corresponds to the logical representation.

By a single count, we refer to the number of diffs we have to look at to get the full picture of the collection at a point in time, not the magnitude of the diffs themselves.

Another example would be adding the same tuple multiple times. This might lead to a physical representation of ((a, b), t_0, 1), ((a, b), t_0, 1) when you'd like to see a concise ((a, b), t_0, 2). Consolidate does this for you.
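As a rough illustration of the distinction, here is a plain-Rust sketch over (data, time, diff) triples. It does not use the differential-dataflow API: Update, consolidate, accumulate_as_of, and distinct are names made up for this sketch, and times are simplified to plain u64 values. One assumption to flag: the sketch merges updates only when both data and time match, so a +1 at t_0 and a -1 at t_1 remain two updates and only cancel when the collection is accumulated as of a time.

```rust
use std::collections::BTreeMap;

// Illustrative update triple: (data, time, diff). Not the library's type.
type Update<D> = (D, u64, i64);

// Merge updates that share the same (data, time) and drop zero diffs.
fn consolidate<D: Ord>(mut updates: Vec<Update<D>>) -> Vec<Update<D>> {
    // Sort so that updates with equal (data, time) become adjacent.
    updates.sort_by(|a, b| (&a.0, a.1).cmp(&(&b.0, b.1)));
    let mut out: Vec<Update<D>> = Vec::new();
    for (data, time, diff) in updates {
        if let Some(last) = out.last_mut() {
            if last.0 == data && last.1 == time {
                last.2 += diff;
                continue;
            }
        }
        out.push((data, time, diff));
    }
    // Remove entries whose diffs cancelled out.
    out.retain(|u| u.2 != 0);
    out
}

// The logical collection at `as_of`: sum all diffs with time <= as_of.
fn accumulate_as_of<D: Ord + Clone>(updates: &[Update<D>], as_of: u64) -> BTreeMap<D, i64> {
    let mut counts = BTreeMap::new();
    for (data, time, diff) in updates {
        if *time <= as_of {
            *counts.entry(data.clone()).or_insert(0) += *diff;
        }
    }
    counts.retain(|_, count| *count != 0);
    counts
}

// Set semantics: every element with a positive count gets a count of exactly 1.
fn distinct<D: Ord + Clone>(counts: &BTreeMap<D, i64>) -> BTreeMap<D, i64> {
    counts
        .iter()
        .filter(|(_, count)| **count > 0)
        .map(|(data, _)| (data.clone(), 1))
        .collect()
}
```

In this sketch, consolidating two copies of ((a, b), t_0, 1) gives a single ((a, b), t_0, 2), so the multiplicity stays 2 but is reported once. Applying distinct to the accumulated counts changes the multiplicity itself to 1.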

frankmcsherry · 2019-03-13T12:11:50Z

I believe TimelyDataflow/differential-dataflow#158 should address these questions. Sorry for the confusion!

(NB: it is technically a different repo; which was the source of my initial confusion. you were right about all reported issues).

Also, thanks very much to @comnik for stepping in with explanations!

frankmcsherry · 2019-03-13T12:13:28Z

Also, if it turns out I was overly optimistic and the fixes don't address core ambiguities please do feel free to say as much. :) While they might be clearer now, the goal for sure is to make them clearer for future folks as well!

dhalperi · 2019-03-13T16:50:11Z

(NB: it is technically a different repo; which was the source of my initial confusion. you were right about all reported issues).

Sorry -- I had open 3 different mdbooks and 2 different GitHub repos and created the issue in the wrong one :)

frankmcsherry mentioned this issue Mar 13, 2019

Mdbook updates TimelyDataflow/differential-dataflow#158

Merged

dhalperi closed this as completed Mar 13, 2019
