New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a test to run a v2 flow file with the new DAG engine #1840
Add a test to run a v2 flow file with the new DAG engine #1840
Conversation
This is a step towards building an integration test for the new DAG engine. Refactored the DagBuilder API to make it easier to use. Instead of using a DagBuilder class to link nodes, use the name of the nodes directly. This facility can also be used in the future to build the tools to run flows locally for testing purposes.
public DagBuilder(final String name, final DagProcessor dagProcessor) { | ||
requireNonNull(name, "The name of the DagBuilder can't be null"); | ||
this.name = name; | ||
requireNonNull(name, "The dagProcessor of the DagBuilder can't be null"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be requireNonNull(dagProcessor,
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. Thanks
Should check non null of the dagProcessor parameter.
@@ -98,16 +141,16 @@ private void checkCircularDependencies() { | |||
class CircularDependencyChecker { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The dependency check has already been done when loading the flow files: azkaban.project.DirectoryYamlFlowLoader#buildFlowEdges
Is there any particular reason why this needs to be implemented again in building the DAG?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The DAG component is designed with minimal assumptions including the validations done to its inputs prior to passing these inputs to the DAG engine.
This allows this component to be used in more scenarios. e.g. the test included in this PR doesn't have a need to convert the v2 flow into Flow object which is where the azkaban.project.DirectoryYamlFlowLoader#buildFlowEdges is called.
In the future, we can consider getting rid of one of the validations in the AZ loading scenario. For the time being, the cost of double validation seems small.
this.dagBuilder = new DagBuilder(flowName, new SimpleDagProcessor()); | ||
} | ||
|
||
Dag create() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to me that logic of creating Dag is similar to what has been done in DirectoryYamlFlowLoader. Would it be easier to just reuse the existing part? Do you plan to eventually integrate them together or keep them as separate paths?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't looked at the DirectoryYamlFlowLoader code in detail.
Eventually, yes, we will need to handle running both v1 and v2 flows. Will look into reducing logic duplication then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This is a step towards building an integration test for the new DAG engine.
The jobs runs are skipped in this test.
Refactored the DagBuilder API to make it easier to use.
Instead of using a DagBuilder class to link nodes, use the name of the
nodes directly.
This facility can also be used in the future to build the tools to run
flows locally for testing purposes.