[DPL Analysis] Simple workflow suffix solution#5558
Conversation
|
Thanks! This means at present every task would need to add the suffix by hand? |
|
Yes :/ |
There was a problem hiding this comment.
Why not simply passing the ctx to the adaptAnalysisTask?
One possible solution we could have is to make the adaptAnalysisTask() function be a member of (an object returned by) the ConfigContext. Something like:
ctx.taskMaker().makeAnalysisTask<ETask>();or alternatively have an helper method for the workflow, rather than for the simple task:
return adaptAnalysisWorkflow(ctx, TaskSpec<ETask>{"output-obj-test"}, TaskSpec<ATask>{"eta-and-phi-histograms"}});What do you think?
There was a problem hiding this comment.
Another solution would be that defineDataProcessing does not return a std::vector<DataProcessorSpec>, but a ConfigureDataProcessorAction which then can be used to do the lazy configuration of the DataProcessor. I think we can even guarantee backward compatibility by having a constructor for ConfigureDataProcessorAction(DataProcessorSpec).
There was a problem hiding this comment.
Hi,
true, we can just pass the ctx and append the suffix inside the adaptAnalysisTask().
ConfigureDataProcessorAction - hm, still we need to pass the ctx somewhere, so it'll still mean a change in analysis workflows. I don't see how it could be much different from just passing the ctx to adaptAnalysisTask().
adaptAnalysisWorkflow looks also good to me since ctx would be used just once. Though it is a bigger change on the user side ;)
There was a problem hiding this comment.
Regarding ConfigureDataProcessorAction: my idea would be that the context is passed outside the user code.
Basically we would go from std::vector<DataProcessorSpec> runDataProcessing(ConfigContext...) to std::vector<ConfigureDataProcessorAction> runDataProcessing(ConfigContext...), where ConfigureDataProcessorAction would be something like:
enum struct DataProcessorAction : char {
CopyFullFromProtoToTarget, // The default action, copy the whole DataProcessorSpec to the one labelled as target
AppendInputsFromProtoToTarget, // Take the inputs in proto and replace them in the one labelled.
AppendOutputsFromProtoToTarget
};
struct ConfigureDataProcessorAction {
std::string target;
DataProcessorSpec *proto;
DataProcessorAction action;
};
There was a problem hiding this comment.
It seems to me it'd be then the same solution as your primary one, with appending the suffix after the tasks creation in overrideSuffix().
We have direct access to output object only inside adaptAnalysisTask(). So, later it is not possible to change the task hashes inside objects and copying the spec doesn't help.
Maybe I am wrong and changing something in DataProcessorSpec will be enough to match the hashes properly in sinks...? Though I haven't suceeded with it previously. In that case, wouldn't it be simplier to stay with overrideSuffix()?
|
Ciao @ktf , what do you think about the current solution? |
|
So we discussed this a bit with @jgrosseo. In the end I think the easiest is to pass the context to the adaptAnalysisTask, since there are places were tasks are added programmatically and adapting to adaptAnalysisWorkflow would require extra manipulations. The more complex solution I was trying to propose is to use the adaptAnalysisTask to capture outputs and manipulate them afterwards, but after having a look at the current adaptAnalysisTask that is indeed quite of a development so I would just go for the modified |
|
Ok, I adjusted the code. |
|
Looks very good to me. I guess now we have to adjust all existing tasks, right? @saganatt do you feel like grepping for them..? ;-) |
|
ok, I will do ;-) |
| adaptAnalysisTask<BTask>("etaphi-histogram"), | ||
| adaptAnalysisTask<CTask>("pt-histogram"), | ||
| adaptAnalysisTask<DTask>("output-wrapper"), | ||
| adaptAnalysisTask<ATask>("eta-and-phi-histograms", ctx), |
|
Thanks a lot for this tedious work! |
|
@ktf the errors seem only in the tests and unrelated. Is this due to the not running tests earlier? Can we merge? |
Hi @jgrosseo @ktf ,
turns out the workflow suffix needs to be appended early enough. The task names and hashes based on them are propagated to objects already inside the
adaptAnalysisTask(). Any following task name changes prevent proper matching of output objects to tasks in sink.I tried to do some workaround on the Framework side but we don't have access to the objects after tasks creation and before the sink. We could e.g. record task names in object headers, then in the sink function append the suffix and re-generate the hashes but this actually defies the purpose of having task hashes...