[Spark Integration] environment-properties not showing up #2203
Comments
Any ideas on how I can support the investigation of this issue?
Am I correct that this feature was working and suddenly broke? If so, could you help us by pointing to which OL release broke it?
I ran into a similar issue recently: most of the START events were missing the environment-properties facet. After some investigation, I could drill down to:
To test this possible cause, I removed the skipping logic from Environment
Note: We were using OL 0.18 before, but that version also stopped working, so I guess this could be a side effect of a recent Databricks/Spark change.
I fixed this last week in PR #2537.
Thanks @pawel-big-lebowski. I've rebuilt and tested with the new jar, but this environment-property issue is still happening. Looking at the logs from one of my jobs, I could count:
I've then changed the following line and tested the custom jar again with the same job: Line 216 in 3a0d14c
Changed to `EventType eventType = START;`
Results:
So, my guess is that there is some race condition involved. Note: on the good-news side, #2537 looks to really resolve #2499 without adding the jars back.
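To make the debugging experiment above easier to follow, here is a hedged sketch of the kind of event-type guard being discussed. This is not OpenLineage's actual code; the `EventType` enum and `buildsEnvironmentFacet` method are hypothetical stand-ins for the real check around line 216, illustrating why a facet that is only attached on START disappears whenever a run's START event is skipped or lost.

```java
// Hypothetical reconstruction of an event-type guard; not actual OpenLineage code.
public class FacetGuardSketch {
    enum EventType { START, RUNNING, COMPLETE }

    // In this sketch the facet is built only for START events, so any run whose
    // START event is dropped never carries environment properties at all.
    static boolean buildsEnvironmentFacet(EventType eventType) {
        return eventType == EventType.START;
    }

    public static void main(String[] args) {
        for (EventType t : EventType.values()) {
            System.out.println(t + " builds facet: " + buildsEnvironmentFacet(t));
        }
    }
}
```

Hard-coding the guard to always return `true` (analogous to forcing `eventType = START` above) is a quick way to test whether the missing facet is caused by this condition rather than by the facet builder itself.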
Our model allows sending this facet on only a single event type. The issue you encounter may be related to these lines: Lines 34 to 35 in d11d0f0
which enforce building this particular facet only for certain events. Please keep in mind that the OpenLineage model is cumulative: if, for a single run, there exists any event carrying the facet, consumers can still associate it with that run.
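The cumulative model mentioned above can be sketched as a consumer-side merge. This is illustrative only, not an official OpenLineage API: the plain `Map`s below are hypothetical stand-ins for real run-event payloads, showing that a facet present on any one event of a run remains available after merging all of that run's events.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative consumer-side facet merge; the map-based "events" are
// hypothetical stand-ins for real OpenLineage run events.
public class CumulativeMergeSketch {
    // Merge run facets across all events of a run: later events may carry
    // facets that earlier events (e.g. START) were missing, and vice versa.
    static Map<String, Object> mergeRunFacets(List<Map<String, Object>> events) {
        Map<String, Object> merged = new LinkedHashMap<>();
        for (Map<String, Object> event : events) {
            merged.putAll(event);
        }
        return merged;
    }

    public static void main(String[] args) {
        // START event carries the facet; COMPLETE event does not.
        Map<String, Object> start =
                Map.of("environment-properties",
                       Map.of("spark.databricks.notebook.path", "/Users/example"));
        Map<String, Object> complete = Map.of();

        Map<String, Object> merged = mergeRunFacets(List.of(start, complete));
        System.out.println(merged.containsKey("environment-properties")); // prints true
    }
}
```

This is why a facet appearing on only one event per run is not, by itself, data loss, as long as some event of the run actually carries it.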
Yeah, I know. I only wanted to check if the RUNNING events were being generated by Lines 34 to 35 in d11d0f0
I thought about changing this facet, or having a different version of it, to accept other event types. I guess https://github.com/microsoft/Purview-ADB-Lineage-Solution-Accelerator needs to be updated in order to accept that as well.
@gerson23 Should this issue be closed?
@kacpermuda Yes, we can close this. Thanks!
Hello!
We are currently trying to work with the spark integration for OpenLineage in our Databricks instance.
We've recently started using the "environment-properties" attribute, which carries information (for our context) about the notebook path (if it is a notebook run) or the job run ID (if it is a Databricks job run). The problem is that these attributes are not always present, if present at all.
Problem:
The "environment-properties" attribute is not present for all runIds, and sometimes is not present at all (maybe it is being filtered out by some condition).
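For context, here is a rough sketch of what we expect the payload to look like when the attribute is present. The exact keys and nesting below (notebook path, job run ID) are assumptions drawn from our own payloads, not an official schema:

```json
{
  "run": {
    "facets": {
      "environment-properties": {
        "environment-properties": {
          "spark.databricks.notebook.path": "/Users/someone/example-notebook",
          "spark.databricks.job.runId": "12345"
        }
      }
    }
  }
}
```

For the affected runs, the inner "environment-properties" object is simply absent from the run facets.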
Context:
Databricks platform
"openLineageVersion": "1.4.1"
"sparkVersion": "3.4.0"
"scalaVersion": "2.12.15"
Spark Cluster Config:

```
spark.extraListeners io.openlineage.spark.agent.OpenLineageSparkListener
spark.openlineage.version v1
spark.openlineage.debugFacet enabled
spark.openlineage.transport.type console
```
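As a side note for reproducing this outside the console transport: the same listener can emit events to an HTTP backend instead, which makes it easier to diff payloads across runs. A minimal sketch, assuming a Marquez-style endpoint at `http://localhost:5000` (the URL is a placeholder, not part of our setup):

```
spark.extraListeners io.openlineage.spark.agent.OpenLineageSparkListener
spark.openlineage.transport.type http
spark.openlineage.transport.url http://localhost:5000
```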
Example Notebook
Simple Read and Write operation
**Resulting JSON payload / without environment-properties attribute**
metadata.json
**Resulting JSON payload from another example / with environment-properties attribute for some runIds**
metadata.json