Transform does not have a stable unique name #51
Comments
Thanks for letting us know about this. We're going to work on improving the documentation for it, but in the meantime here is some more information about the error. By default, a missing stable unique name is a warning, not an error, but in your example the use of
The single parameter version of apply is safe if you don't have the same PTransform being applied multiple times in the same context.
Ah, thank you for the information. I didn't realize it was possible to downgrade it to a warning again. I disagree with the description of apply as safe. It's safe only as long as I have analyzed the entire call stack to ensure that no one else ever calls the same built-in transform, which is not a reasonable thing to have to do. It means that your code's validity depends on the private implementation details of everyone else's code.
Each of those functions works fine in isolation, but together they break. The only way I see to fix it is to add a mandatory name parameter to them and then thread it through all of your function calls. Since the uniqueness requirement is easily handled by the system, would it be possible to make the pipeline creation ordering deterministic in order to satisfy the stability requirement as well? It seems like making this the user's responsibility is going to create a large amount of boilerplate and tricky context-dependent errors that are very difficult to unit test for.
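The failure mode described here can be sketched with a toy model. This is not the Dataflow SDK: the class `NameCollisionSketch`, its `apply` methods, and the helpers `keepEvens` and `keepSmall` are hypothetical, and the model simply assumes that an unnamed apply infers its name from the transform kind and that a duplicate name in one scope is rejected.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only, not the Dataflow SDK. Every name here is hypothetical.
public class NameCollisionSketch {
    // Names already used in the single scope of this toy "pipeline".
    private final Set<String> usedNames = new HashSet<>();

    // Mimics an unnamed apply: the name is inferred from the transform kind.
    public void apply(String transformKind) {
        apply(transformKind, transformKind);
    }

    // Mimics a named apply: the caller supplies the name explicitly.
    // Throws if that name is already taken in this scope.
    public void apply(String name, String transformKind) {
        if (!usedNames.add(name)) {
            throw new IllegalStateException(
                "Transform does not have a stable unique name: " + name);
        }
    }

    // Two helpers that each work alone but infer the same name.
    public static void keepEvens(NameCollisionSketch p) { p.apply("Filter"); }
    public static void keepSmall(NameCollisionSketch p) { p.apply("Filter"); }

    public static void main(String[] args) {
        NameCollisionSketch p = new NameCollisionSketch();
        keepEvens(p);
        try {
            keepSmall(p); // second unnamed "Filter" in the same scope
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model, neither helper can know that the other also applies an unnamed "Filter", which is the dependence on other code's implementation details raised above. Passing distinct explicit names to the two-argument `apply` avoids the clash.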
Regarding safety of apply: The names are all scoped to the containing
If you The unique names are currently assigned deterministically. The reason this isn't enough is that it doesn't satisfy stability. Consider the following code:
If you change that to:
The names have changed, which would prevent updating. But the pipeline hasn't actually changed; you just reordered some code.
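The difference between deterministic and stable names can be illustrated with a small sketch. It is a toy model rather than SDK code: `ReorderSketch` and `autoNames` are hypothetical, the step labels "evens" and "small" stand in for two filters, and the numbering scheme ("Filter", "Filter2") is an assumption chosen for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only, not the Dataflow SDK. Every name here is hypothetical.
public class ReorderSketch {
    // Assigns names in code order: the first step gets "Filter",
    // the second "Filter2", and so on. Same order in, same names out.
    public static Map<String, String> autoNames(List<String> stepsInCodeOrder) {
        Map<String, String> nameByStep = new LinkedHashMap<>();
        int count = 0;
        for (String step : stepsInCodeOrder) {
            count++;
            nameByStep.put(step, count == 1 ? "Filter" : "Filter" + count);
        }
        return nameByStep;
    }

    public static void main(String[] args) {
        // Original code order, then the same two steps reordered.
        System.out.println(autoNames(Arrays.asList("evens", "small")));
        System.out.println(autoNames(Arrays.asList("small", "evens")));
    }
}
```

Running the same order twice yields the same mapping, so the scheme is deterministic. Swapping the two steps swaps which one is called "Filter" and which "Filter2", so a given step does not keep its name across a harmless reordering.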
I see what you're saying about scoping now. It's not intuitive that it's necessary to wrap everything into its own PTransform instead of the less bulky function call. Apply still seems rather dangerous to me, since very simple and reasonable things are now runtime errors.
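The scoping idea can be modeled roughly as follows. This is a hedged sketch, not the SDK's implementation: `ScopedNamesSketch` and its `apply` method are hypothetical, and joining the enclosing scope and the local name with "/" is an assumption made for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only, not the Dataflow SDK. Every name here is hypothetical.
public class ScopedNamesSketch {
    // Full names (scope plus local name) registered so far.
    private final Set<String> fullNames = new HashSet<>();

    // Registers localName inside scope and returns the full name.
    // Throws only if the same full name was already registered.
    public String apply(String scope, String localName) {
        String full = scope.isEmpty() ? localName : scope + "/" + localName;
        if (!fullNames.add(full)) {
            throw new IllegalStateException("Duplicate transform name: " + full);
        }
        return full;
    }

    public static void main(String[] args) {
        ScopedNamesSketch p = new ScopedNamesSketch();
        // The same local name in two different enclosing scopes does not clash.
        System.out.println(p.apply("KeepEvens", "Filter"));
        System.out.println(p.apply("KeepSmall", "Filter"));
    }
}
```

Under this model, wrapping each helper in its own named enclosing transform gives its inner steps a distinct prefix, so two helpers can both apply an unnamed "Filter" without colliding. Two identical local names in the same scope would still clash.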
In the case of reordering the code and wanting to somehow restart it without change, it seems like you could perform an analysis of the graph and produce a unique identifier based on the transform's place in the topology rather than the name of the function. I'm not sure you want the name of a transform to be tied to its functionality either. The name is the human-readable identifier, which should be entirely distinct from any sort of functional identifier. That said, I'm not sure I entirely understand how reloading a pipeline is intended to work, so I may be confused about what the name is for.
Hi Louis, The documentation around updating pipelines has been much improved since we last gave you an update. I hope that is able to provide some insight into that functionality. I can see why the need to provide names can be frustrating, which is why we let you downgrade to the warning. We've found that human-readable names are useful in several places (including in the Dataflow Monitoring UI) and are less error-prone than other automatic techniques. In your sample pipeline, you might decide to reorder the filters ( I don't think that this is blocking you any more, so I'll close this issue. Please reopen with questions or if there is more work to be done here. Thanks!
After updating to dataflow 0.4.20150727 we've started getting this exception at runtime. It seems to happen nearly every time we apply any transform without explicitly giving it a name. Is this the intent?
This entirely reasonable code now results in an unintuitive runtime crash:
Adding some boilerplate fixes it
but this should be unnecessary since I gave it the same name it had already inferred for that transform.
Three questions:
I assume it's not because it says "stable". What's the danger though? If a randomly generated name isn't ok, what should we do for programmatically generated transforms? Is an incremented counter ok?