[FLINK-1486] add print method for prefixing a user defined string #372
mxm wants to merge 1 commit into apache:master
mxm commented Feb 6, 2015
- extend API to include a print(String sinkIdentifier) method
- change PrintingOutputFormat to include the sink identifier
- if appropriate, print sink identifier and task id
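A minimal sketch of the "if appropriate" rule, written as a standalone class rather than the actual PrintingOutputFormat code. The class name, the method name, and the `sinkIdentifier:taskId> ` layout are assumptions for illustration only; the exact output format discussed in this thread is not reproduced here. The sketch prints the sink identifier only when one was given and the task id only when the parallelism is greater than 1.

```java
// Hypothetical illustration; NOT the actual Flink PrintingOutputFormat code.
final class PrintPrefixSketch {

    /**
     * Builds the prefix that precedes one printed record.
     * Assumed layout: "<sinkIdentifier>:<taskId>> ".
     *
     * @param sinkIdentifier user-defined string, may be null or empty
     * @param subtaskIndex   zero-based index of the parallel task
     * @param parallelism    number of parallel tasks of the sink
     */
    static String buildPrefix(String sinkIdentifier, int subtaskIndex, int parallelism) {
        StringBuilder prefix = new StringBuilder();

        // Print the sink identifier only if the user supplied one.
        if (sinkIdentifier != null && !sinkIdentifier.isEmpty()) {
            prefix.append(sinkIdentifier);
        }

        // Print the task id only if there is more than one parallel task.
        if (parallelism > 1) {
            if (prefix.length() > 0) {
                prefix.append(':');
            }
            prefix.append(subtaskIndex + 1); // one-based id (an assumption)
        }

        // Separate a non-empty prefix from the record itself.
        if (prefix.length() > 0) {
            prefix.append("> ");
        }
        return prefix.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildPrefix("wordcount", 0, 1) + "(hello,1)");
        System.out.println(buildPrefix("wordcount", 2, 4) + "(hello,1)");
        System.out.println(buildPrefix("", 2, 4) + "(hello,1)");
        System.out.println(buildPrefix("", 0, 1) + "(hello,1)");
    }
}
```

With this rule, a plain print() on a sink with parallelism 1 produces no prefix at all, so existing output would stay unchanged.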
|
Looks good. |
|
I think it would be nice to have some kind of hierarchical structure of the output such as:
Looks good otherwise. |
|
Good idea. So we would print |
|
Sounds good to me! |
f440c0e to 3d1c95c
|
To make it a bit more explicit what is the sink identifier and what is the task identifier (especially when just one of the two is printed), I prefixed the sink identifier with "sinkId" and the task identifier with "taskId". |
|
I think this is very valuable. I've tried it out and it looks good to me. Personally, I would prefer the shorter versions proposed by Fabian. If we don't differentiate between parallelism 1 and > 1 we wouldn't have to worry about cases where just one is printed. But I'm fine with your current solution as well. |
|
+1 for conciseness. |
c575134 to 71f318a
|
I've updated the pull request. I decided to implement the concise method:
If there are no objections, I will merge this tomorrow. |
|
LGTM |
… string - extend API to include a print(String sinkIdentifier) method - change PrintingOutputformat to include the sink identifier - if appropriate, print sink identifier and task id - update documentation
|
I've added documentation for the new print method. Will merge later on. |
|
This may come a bit late (given that this is merged now), but I did not think of it before: When we change the printing to happen on the client console (which we should, IMHO), we will probably realize it via
Does this change still make sense then? |
|
I think you are right. If there's only one sink active, there is no need for a sink identifier. |
|
Do we want to break backwards compatibility or include a new method for printing on the client? After all, printing on the workers is a useful tool to debug the dataflow of a program. |
|
Can you think of a case where printing on the client is worse than printing on worker? |
|
From what I have seen, most people expect print() to actually go to the client. |
|
@StephanEwen Don't think printing on the client can be worse if the output still contains information about the producers (e.g. by a task id). IMO, a sink identifier could still make sense when you make multiple calls to print and want to distinguish easily between the outputs. |
|
Ah, okay. You mean we have two methods:
|
|
Just saying that a prefix helps to identify output, even if everything is printed on the client. Additionally, including the task id in the output can be useful for debugging. |
|
Okay, let's just try and not make this too confusing for users. Do we need all three versions?
|
|
Yes, it should be simple for the user. It makes sense to have one print method which just prints the output on the client. In addition, we could have another advanced print method which prints a prefix and optionally the task id.
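A rough sketch of that two-method split, assuming records arrive at the client tagged with the id of the task that produced them. Everything here is hypothetical (the `TwoPrintMethodsSketch` and `Tagged` names, the overload signatures, and the `prefix:taskId> ` layout) and is not taken from the Flink API; it only illustrates a simple print next to an advanced one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the proposal; not the Flink API.
final class TwoPrintMethodsSketch {

    /** A record together with the id of the parallel task that produced it. */
    static final class Tagged {
        final int taskId;
        final String value;

        Tagged(int taskId, String value) {
            this.taskId = taskId;
            this.value = value;
        }
    }

    /** Simple variant: the output lines are just the records. */
    static List<String> print(List<Tagged> records) {
        List<String> lines = new ArrayList<String>();
        for (Tagged r : records) {
            lines.add(r.value);
        }
        return lines;
    }

    /** Advanced variant: user-defined prefix and, optionally, the task id. */
    static List<String> print(List<Tagged> records, String prefix, boolean withTaskId) {
        List<String> lines = new ArrayList<String>();
        for (Tagged r : records) {
            StringBuilder line = new StringBuilder(prefix);
            if (withTaskId) {
                line.append(':').append(r.taskId); // assumed layout
            }
            line.append("> ").append(r.value);
            lines.add(line.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        List<Tagged> records = new ArrayList<Tagged>();
        records.add(new Tagged(1, "(hello,1)"));
        records.add(new Tagged(2, "(world,1)"));

        // A real implementation would write to the client console directly;
        // returning the lines keeps the sketch easy to inspect.
        for (String line : print(records)) {
            System.out.println(line);
        }
        for (String line : print(records, "counts", true)) {
            System.out.println(line);
        }
    }
}
```

Keeping the simple variant free of any prefix matches the expectation that print() just shows the results, while the advanced overload keeps the debugging information available for those who ask for it.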
|
… string - extend API to include a print(String sinkIdentifier) method - change PrintingOutputformat to include the sink identifier - if appropriate, print sink identifier and task id - update documentation This closes apache#372