A common need is validating that the OpenLineage Spark integration produces valid OpenLineage events for a given Spark job.
However, it's not very easy right now: you have to clone the project and set up the whole development environment, and if you use custom dependencies or a customized Spark, you have to edit a lot of Java test files to make the test setup resemble your actual environment.
The other way is setting the integration up and running it against an actual production job, which has a different set of problems: it also requires a real OpenLineage backend just to confirm that a set of events matches expectations.
We could simplify the experience by creating a test CLI application that reuses the common integration test framework and gives a binary OK/FAILED response given a Spark job, files with the expected JSON events, and a Docker image with Spark - allowing any customizations: custom dependencies or other deviations from the Apache-hosted Spark libs.
The idea is that we'd run a given Spark SQL statement within a pre-prepared job using the given Spark image, similar to what we do in the integration tests now. Expected events could be provided as separate files or as a single newline-delimited JSON file.
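For illustration, the newline-delimited variant could look roughly like this - one OpenLineage event per line, reduced here to a few core fields (which fields get matched, and how generated values like run IDs are handled, would be up to the implementation):

```json
{"eventType":"START","job":{"namespace":"default","name":"test_job"},"inputs":[],"outputs":[]}
{"eventType":"COMPLETE","job":{"namespace":"default","name":"test_job"},"inputs":[{"namespace":"file","name":"/data/input"}],"outputs":[{"namespace":"file","name":"/data/output"}]}
```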
The resulting events could be matched either by MockServer, or we could use the File transport with a bind-mounted directory.
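For the File transport variant, the harness could reuse the documented transport settings (`spark.openlineage.transport.type=file` writes events to the path given in `spark.openlineage.transport.location`) and diff the resulting file against the expected events. A rough sketch, with a hypothetical image name and mount path:

```sh
# Bind-mount a host directory and have the integration write events into it.
# custom-spark:3.5.0 and /tmp/ol-events are placeholders the CLI would manage.
docker run --rm \
  -v /tmp/ol-events:/events \
  custom-spark:3.5.0 \
  spark-submit \
    --conf spark.extraListeners=io.openlineage.spark.agent.OpenLineageSparkListener \
    --conf spark.openlineage.transport.type=file \
    --conf spark.openlineage.transport.location=/events/events.json \
    job.py
# (the openlineage-spark jar also has to be on the classpath; omitted here)
```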
A first iteration could just accept a SQL job, without any other job customizations needed.
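For example, a statement that reads one table and writes another, so the integration emits both input and output datasets (the table names here are purely illustrative):

```sql
-- The CLI would execute this inside the pre-prepared job.
CREATE TABLE output_table AS
SELECT id, name FROM input_table WHERE id > 100;
```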
The CLI interface could look like this (a sketch only - the tool name and all flags below are hypothetical):
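```sh
# Everything here is a placeholder - tool name, flags, and values alike.
ol-spark-test run \
  --spark-image custom-spark:3.5.0 \
  --sql "CREATE TABLE output_table AS SELECT id, name FROM input_table" \
  --expected-events events.ndjson
# Exits 0 and prints OK when the emitted events match, FAILED otherwise.
```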