Optionally capture memory-only actions like collect(), count() etc. #361
This is as designed: only the lineage of persistent actions is captured. The reason is that data stored in memory doesn't have any stable identifier, so its lineage might only be useful temporarily, not in the long term. On the other hand, having thought about it for some time, I acknowledge that there is a valid use case for it.
@wajda, thanks for clarifying. I hope to see the feature soon :)
+100 to the feature request for capturing collect(), count() etc.
The lineage data model requires the existence of a target data source that is represented by a URI.
Yeah, that works. I guess multiple steps need changes to support "read-only" actions.
Fixed. Enable the plugin in the YAML configuration:

    spline:
      plugins:
        za.co.absa.spline.harvester.plugin.embedded.NonPersistentActionsCapturePlugin:
          enabled: true

or in a properties format:

    spline.plugins.za.co.absa.spline.harvester.plugin.embedded.NonPersistentActionsCapturePlugin.enabled=true
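The YAML and properties forms of this setting are the same key: the nested YAML levels are joined with dots, and the fully qualified plugin class name stays a single key segment even though it contains dots. The sketch below illustrates that mapping. The helper functions `to_properties` and `to_spark_conf` are hypothetical and not part of Spline. The `spark.` prefix for passing Spline settings through the Spark configuration is an assumption to verify against the Spline agent README.

```python
# Hypothetical helpers (not part of Spline) that flatten a nested config
# mapping into dotted property keys, mirroring the YAML -> properties mapping.
PLUGIN = "za.co.absa.spline.harvester.plugin.embedded.NonPersistentActionsCapturePlugin"


def to_properties(config, prefix=""):
    """Flatten a nested dict into {"a.b.c": "value"} pairs.

    Keys that already contain dots (such as a fully qualified class name)
    are kept intact and joined to their parent key with a dot.
    """
    flat = {}
    for key, value in config.items():
        full_key = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(to_properties(value, full_key))
        elif isinstance(value, bool):
            # Render booleans lower-case, as in the properties example.
            flat[full_key] = str(value).lower()
        else:
            flat[full_key] = str(value)
    return flat


def to_spark_conf(config):
    """Prefix every key with "spark." (assumed SparkConf convention for Spline keys)."""
    return {f"spark.{k}": v for k, v in to_properties(config).items()}


# Python equivalent of the YAML snippet above.
yaml_equivalent = {"spline": {"plugins": {PLUGIN: {"enabled": True}}}}

if __name__ == "__main__":
    for key, value in to_spark_conf(yaml_equivalent).items():
        print(f'--conf "{key}={value}"')
```

The printed option can be passed to `spark-submit`, or the same key can be set when building the Spark session. Because the setting is read when the session is configured, it is safest to set it before the Spark context starts.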
* issue #361 pass `funcName` down to the harvester components
* issue #361 Deduplicate "Synthetic" flag in components' extras
* issue #361 Promote type aliases ReadNodeInfo and WriteNodeInfo to case classes. Add more attributes to their signature.
* issue #361 Make plugins configurable
* issue #361 Optionally capture memory only actions like `collect()`, `count()` etc
* (no issue) Rename RDD example job for clarity
* issue #361 Scala 2.11 syntax fix
* issue #361 README
* issue #361 add test
* issue #565 minor: remove unnecessary "toString" call
* issue #565 Restart Spark context after enabling non-persistent actions plugin
Hi,
I have code that collects Spark DataFrame rows into a variable and then iterates over those rows to insert them into a Cosmos DB graph. When I run this notebook, I expect to see some lineage in the driver logs (I tried setting the dispatcher to both console and logging), but I do not see anything.
If I replace the collect action with a write to a file in Azure Data Lake, I can see the lineage details in the driver logs.
Any idea what could be the issue?