New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-42889][CONNECT][PYTHON] Implement cache, persist, unpersist, and storageLevel #40510
Conversation
Merged to master and branch-3.4. |
…nd storageLevel ### What changes were proposed in this pull request? Implements `DataFrame.cache`, `persist`, `unpersist`, and `storageLevel`. ### Why are the changes needed? Missing APIs. ### Does this PR introduce _any_ user-facing change? `DataFrame.cache`, `persist`, `unpersist`, and `storageLevel` will be available. ### How was this patch tested? Added/enabled the related tests. Closes #40510 from ueshin/issues/SPARK-42889/cache. Authored-by: Takuya UESHIN <ueshin@databricks.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org> (cherry picked from commit b4f0224) Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
@@ -54,6 +54,20 @@ message UserContext { | |||
repeated google.protobuf.Any extensions = 999; | |||
} | |||
|
|||
// StorageLevel for persisting Datasets/Tables. | |||
message StorageLevel { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
friendly ping @ueshin @HyukjinKwon
When I try to import import "spark/connect/base.proto";
in catalog.proto
to reuse StorageLevel
for message CacheTable
, I found the following compile messages:
spark/connect/base.proto:23:1: File recursively imports itself: spark/connect/base.proto -> spark/connect/commands.proto -> spark/connect/relations.proto -> spark/connect/catalog.proto -> spark/connect/base.proto
spark/connect/catalog.proto:22:1: Import "spark/connect/base.proto" was not found or had errors.
spark/connect/catalog.proto:144:12: "spark.connect.DataType" seems to be defined in "spark/connect/types.proto", which is not imported by "spark/connect/catalog.proto". To use it here, please add the necessary import.
spark/connect/catalog.proto:161:12: "spark.connect.DataType" seems to be defined in "spark/connect/types.proto", which is not imported by "spark/connect/catalog.proto". To use it here, please add the necessary import.
spark/connect/relations.proto:25:1: Import "spark/connect/catalog.proto" was not found or had errors.
spark/connect/relations.proto:84:5: "Catalog" is not defined.
spark/connect/commands.proto:22:1: Import "spark/connect/relations.proto" was not found or had errors.
spark/connect/commands.proto:63:3: "Relation" is not defined.
spark/connect/commands.proto:81:3: "Relation" is not defined.
spark/connect/commands.proto:142:3: "Relation" is not defined.
spark/connect/base.proto:23:1: Import "spark/connect/commands.proto" was not found or had errors.
spark/connect/base.proto:25:1: Import "spark/connect/relations.proto" was not found or had errors.
....
So should we move StorageLevel
to a more suitable place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah should probably move ..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
…le to avoid potential `file recursively imports` ### What changes were proposed in this pull request? #40510 introduce `message StorageLevel` to `base.proto`, but if we try to import `base.proto` in `catalog.proto` to reuse `StorageLevel` in `message CacheTable` and run `build/sbt "connect-common/compile" to compile, there will be following message in compile log: ``` spark/connect/base.proto:23:1: File recursively imports itself: spark/connect/base.proto -> spark/connect/commands.proto -> spark/connect/relations.proto -> spark/connect/catalog.proto -> spark/connect/base.proto spark/connect/catalog.proto:22:1: Import "spark/connect/base.proto" was not found or had errors. spark/connect/catalog.proto:144:12: "spark.connect.DataType" seems to be defined in "spark/connect/types.proto", which is not imported by "spark/connect/catalog.proto". To use it here, please add the necessary import. spark/connect/catalog.proto:161:12: "spark.connect.DataType" seems to be defined in "spark/connect/types.proto", which is not imported by "spark/connect/catalog.proto". To use it here, please add the necessary import. spark/connect/relations.proto:25:1: Import "spark/connect/catalog.proto" was not found or had errors. spark/connect/relations.proto:84:5: "Catalog" is not defined. spark/connect/commands.proto:22:1: Import "spark/connect/relations.proto" was not found or had errors. spark/connect/commands.proto:63:3: "Relation" is not defined. spark/connect/commands.proto:81:3: "Relation" is not defined. spark/connect/commands.proto:142:3: "Relation" is not defined. spark/connect/base.proto:23:1: Import "spark/connect/commands.proto" was not found or had errors. spark/connect/base.proto:25:1: Import "spark/connect/relations.proto" was not found or had errors. .... ``` So this pr move `message StorageLevel` to a separate file to avoid this potential file recursively imports. ### Why are the changes needed? To avoid potential file recursively imports. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - Pass GitHub Actions - Manual check: - Add `import "spark/connect/common.proto";` to `catalog.proto` - run `build/sbt "connect-common/compile"` No compilation logs related to `File recursively imports itself` . Closes #40518 from LuciferYang/SPARK-42889-FOLLOWUP. Lead-authored-by: yangjie01 <yangjie01@baidu.com> Co-authored-by: YangJie <yangjie01@baidu.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
…le to avoid potential `file recursively imports` #40510 introduce `message StorageLevel` to `base.proto`, but if we try to import `base.proto` in `catalog.proto` to reuse `StorageLevel` in `message CacheTable` and run `build/sbt "connect-common/compile" to compile, there will be following message in compile log: ``` spark/connect/base.proto:23:1: File recursively imports itself: spark/connect/base.proto -> spark/connect/commands.proto -> spark/connect/relations.proto -> spark/connect/catalog.proto -> spark/connect/base.proto spark/connect/catalog.proto:22:1: Import "spark/connect/base.proto" was not found or had errors. spark/connect/catalog.proto:144:12: "spark.connect.DataType" seems to be defined in "spark/connect/types.proto", which is not imported by "spark/connect/catalog.proto". To use it here, please add the necessary import. spark/connect/catalog.proto:161:12: "spark.connect.DataType" seems to be defined in "spark/connect/types.proto", which is not imported by "spark/connect/catalog.proto". To use it here, please add the necessary import. spark/connect/relations.proto:25:1: Import "spark/connect/catalog.proto" was not found or had errors. spark/connect/relations.proto:84:5: "Catalog" is not defined. spark/connect/commands.proto:22:1: Import "spark/connect/relations.proto" was not found or had errors. spark/connect/commands.proto:63:3: "Relation" is not defined. spark/connect/commands.proto:81:3: "Relation" is not defined. spark/connect/commands.proto:142:3: "Relation" is not defined. spark/connect/base.proto:23:1: Import "spark/connect/commands.proto" was not found or had errors. spark/connect/base.proto:25:1: Import "spark/connect/relations.proto" was not found or had errors. .... ``` So this pr move `message StorageLevel` to a separate file to avoid this potential file recursively imports. To avoid potential file recursively imports. No - Pass GitHub Actions - Manual check: - Add `import "spark/connect/common.proto";` to `catalog.proto` - run `build/sbt "connect-common/compile"` No compilation logs related to `File recursively imports itself` . Closes #40518 from LuciferYang/SPARK-42889-FOLLOWUP. Lead-authored-by: yangjie01 <yangjie01@baidu.com> Co-authored-by: YangJie <yangjie01@baidu.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org> (cherry picked from commit 88cc239) Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
…nd storageLevel ### What changes were proposed in this pull request? Implements `DataFrame.cache`, `persist`, `unpersist`, and `storageLevel`. ### Why are the changes needed? Missing APIs. ### Does this PR introduce _any_ user-facing change? `DataFrame.cache`, `persist`, `unpersist`, and `storageLevel` will be available. ### How was this patch tested? Added/enabled the related tests. Closes apache#40510 from ueshin/issues/SPARK-42889/cache. Authored-by: Takuya UESHIN <ueshin@databricks.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org> (cherry picked from commit b4f0224) Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
…le to avoid potential `file recursively imports` apache#40510 introduce `message StorageLevel` to `base.proto`, but if we try to import `base.proto` in `catalog.proto` to reuse `StorageLevel` in `message CacheTable` and run `build/sbt "connect-common/compile" to compile, there will be following message in compile log: ``` spark/connect/base.proto:23:1: File recursively imports itself: spark/connect/base.proto -> spark/connect/commands.proto -> spark/connect/relations.proto -> spark/connect/catalog.proto -> spark/connect/base.proto spark/connect/catalog.proto:22:1: Import "spark/connect/base.proto" was not found or had errors. spark/connect/catalog.proto:144:12: "spark.connect.DataType" seems to be defined in "spark/connect/types.proto", which is not imported by "spark/connect/catalog.proto". To use it here, please add the necessary import. spark/connect/catalog.proto:161:12: "spark.connect.DataType" seems to be defined in "spark/connect/types.proto", which is not imported by "spark/connect/catalog.proto". To use it here, please add the necessary import. spark/connect/relations.proto:25:1: Import "spark/connect/catalog.proto" was not found or had errors. spark/connect/relations.proto:84:5: "Catalog" is not defined. spark/connect/commands.proto:22:1: Import "spark/connect/relations.proto" was not found or had errors. spark/connect/commands.proto:63:3: "Relation" is not defined. spark/connect/commands.proto:81:3: "Relation" is not defined. spark/connect/commands.proto:142:3: "Relation" is not defined. spark/connect/base.proto:23:1: Import "spark/connect/commands.proto" was not found or had errors. spark/connect/base.proto:25:1: Import "spark/connect/relations.proto" was not found or had errors. .... ``` So this pr move `message StorageLevel` to a separate file to avoid this potential file recursively imports. To avoid potential file recursively imports. No - Pass GitHub Actions - Manual check: - Add `import "spark/connect/common.proto";` to `catalog.proto` - run `build/sbt "connect-common/compile"` No compilation logs related to `File recursively imports itself` . Closes apache#40518 from LuciferYang/SPARK-42889-FOLLOWUP. Lead-authored-by: yangjie01 <yangjie01@baidu.com> Co-authored-by: YangJie <yangjie01@baidu.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org> (cherry picked from commit 88cc239) Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
What changes were proposed in this pull request?
Implements
DataFrame.cache
,persist
,unpersist
, andstorageLevel
.Why are the changes needed?
Missing APIs.
Does this PR introduce any user-facing change?
DataFrame.cache
,persist
,unpersist
, andstorageLevel
will be available.How was this patch tested?
Added/enabled the related tests.