Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-42889][CONNECT][PYTHON] Implement cache, persist, unpersist, and storageLevel #40510

Closed
wants to merge 1 commit into from

Conversation

ueshin
Copy link
Member

@ueshin ueshin commented Mar 21, 2023

What changes were proposed in this pull request?

Implements DataFrame.cache, persist, unpersist, and storageLevel.

Why are the changes needed?

Missing APIs.

Does this PR introduce any user-facing change?

DataFrame.cache, persist, unpersist, and storageLevel will be available.

How was this patch tested?

Added/enabled the related tests.

@ueshin ueshin marked this pull request as ready for review March 22, 2023 00:43
@HyukjinKwon
Copy link
Member

Merged to master and branch-3.4.

HyukjinKwon pushed a commit that referenced this pull request Mar 22, 2023
…nd storageLevel

### What changes were proposed in this pull request?

Implements `DataFrame.cache`, `persist`, `unpersist`, and `storageLevel`.

### Why are the changes needed?

Missing APIs.

### Does this PR introduce _any_ user-facing change?

`DataFrame.cache`, `persist`, `unpersist`, and `storageLevel` will be available.

### How was this patch tested?

Added/enabled the related tests.

Closes #40510 from ueshin/issues/SPARK-42889/cache.

Authored-by: Takuya UESHIN <ueshin@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit b4f0224)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
@@ -54,6 +54,20 @@ message UserContext {
repeated google.protobuf.Any extensions = 999;
}

// StorageLevel for persisting Datasets/Tables.
message StorageLevel {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

friendly ping @ueshin @HyukjinKwon

When I try to import import "spark/connect/base.proto"; in catalog.proto to reuse StorageLevel for message CacheTable, I found the following compile messages:

spark/connect/base.proto:23:1: File recursively imports itself: spark/connect/base.proto -> spark/connect/commands.proto -> spark/connect/relations.proto -> spark/connect/catalog.proto -> spark/connect/base.proto
spark/connect/catalog.proto:22:1: Import "spark/connect/base.proto" was not found or had errors.
spark/connect/catalog.proto:144:12: "spark.connect.DataType" seems to be defined in "spark/connect/types.proto", which is not imported by "spark/connect/catalog.proto".  To use it here, please add the necessary import.
spark/connect/catalog.proto:161:12: "spark.connect.DataType" seems to be defined in "spark/connect/types.proto", which is not imported by "spark/connect/catalog.proto".  To use it here, please add the necessary import.
spark/connect/relations.proto:25:1: Import "spark/connect/catalog.proto" was not found or had errors.
spark/connect/relations.proto:84:5: "Catalog" is not defined.
spark/connect/commands.proto:22:1: Import "spark/connect/relations.proto" was not found or had errors.
spark/connect/commands.proto:63:3: "Relation" is not defined.
spark/connect/commands.proto:81:3: "Relation" is not defined.
spark/connect/commands.proto:142:3: "Relation" is not defined.
spark/connect/base.proto:23:1: Import "spark/connect/commands.proto" was not found or had errors.
spark/connect/base.proto:25:1: Import "spark/connect/relations.proto" was not found or had errors.
....

So should we move StorageLevel to a more suitable place?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah should probably move ..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HyukjinKwon pushed a commit that referenced this pull request Mar 23, 2023
…le to avoid potential `file recursively imports`

### What changes were proposed in this pull request?
#40510 introduce `message StorageLevel` to `base.proto`, but if we try to import `base.proto` in `catalog.proto` to reuse `StorageLevel` in `message CacheTable` and run `build/sbt "connect-common/compile" to compile, there will be following message in compile log:

```
spark/connect/base.proto:23:1: File recursively imports itself: spark/connect/base.proto -> spark/connect/commands.proto -> spark/connect/relations.proto -> spark/connect/catalog.proto -> spark/connect/base.proto
spark/connect/catalog.proto:22:1: Import "spark/connect/base.proto" was not found or had errors.
spark/connect/catalog.proto:144:12: "spark.connect.DataType" seems to be defined in "spark/connect/types.proto", which is not imported by "spark/connect/catalog.proto".  To use it here, please add the necessary import.
spark/connect/catalog.proto:161:12: "spark.connect.DataType" seems to be defined in "spark/connect/types.proto", which is not imported by "spark/connect/catalog.proto".  To use it here, please add the necessary import.
spark/connect/relations.proto:25:1: Import "spark/connect/catalog.proto" was not found or had errors.
spark/connect/relations.proto:84:5: "Catalog" is not defined.
spark/connect/commands.proto:22:1: Import "spark/connect/relations.proto" was not found or had errors.
spark/connect/commands.proto:63:3: "Relation" is not defined.
spark/connect/commands.proto:81:3: "Relation" is not defined.
spark/connect/commands.proto:142:3: "Relation" is not defined.
spark/connect/base.proto:23:1: Import "spark/connect/commands.proto" was not found or had errors.
spark/connect/base.proto:25:1: Import "spark/connect/relations.proto" was not found or had errors.
....
```

So this pr move `message StorageLevel` to a separate file to avoid this potential file recursively imports.

### Why are the changes needed?
To avoid potential file recursively imports.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

- Pass GitHub Actions
- Manual check:
   - Add `import "spark/connect/common.proto";` to `catalog.proto`
   - run `build/sbt "connect-common/compile"`

No compilation logs related to `File recursively imports itself` .

Closes #40518 from LuciferYang/SPARK-42889-FOLLOWUP.

Lead-authored-by: yangjie01 <yangjie01@baidu.com>
Co-authored-by: YangJie <yangjie01@baidu.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
HyukjinKwon pushed a commit that referenced this pull request Mar 23, 2023
…le to avoid potential `file recursively imports`

#40510 introduce `message StorageLevel` to `base.proto`, but if we try to import `base.proto` in `catalog.proto` to reuse `StorageLevel` in `message CacheTable` and run `build/sbt "connect-common/compile" to compile, there will be following message in compile log:

```
spark/connect/base.proto:23:1: File recursively imports itself: spark/connect/base.proto -> spark/connect/commands.proto -> spark/connect/relations.proto -> spark/connect/catalog.proto -> spark/connect/base.proto
spark/connect/catalog.proto:22:1: Import "spark/connect/base.proto" was not found or had errors.
spark/connect/catalog.proto:144:12: "spark.connect.DataType" seems to be defined in "spark/connect/types.proto", which is not imported by "spark/connect/catalog.proto".  To use it here, please add the necessary import.
spark/connect/catalog.proto:161:12: "spark.connect.DataType" seems to be defined in "spark/connect/types.proto", which is not imported by "spark/connect/catalog.proto".  To use it here, please add the necessary import.
spark/connect/relations.proto:25:1: Import "spark/connect/catalog.proto" was not found or had errors.
spark/connect/relations.proto:84:5: "Catalog" is not defined.
spark/connect/commands.proto:22:1: Import "spark/connect/relations.proto" was not found or had errors.
spark/connect/commands.proto:63:3: "Relation" is not defined.
spark/connect/commands.proto:81:3: "Relation" is not defined.
spark/connect/commands.proto:142:3: "Relation" is not defined.
spark/connect/base.proto:23:1: Import "spark/connect/commands.proto" was not found or had errors.
spark/connect/base.proto:25:1: Import "spark/connect/relations.proto" was not found or had errors.
....
```

So this pr move `message StorageLevel` to a separate file to avoid this potential file recursively imports.

To avoid potential file recursively imports.

No

- Pass GitHub Actions
- Manual check:
   - Add `import "spark/connect/common.proto";` to `catalog.proto`
   - run `build/sbt "connect-common/compile"`

No compilation logs related to `File recursively imports itself` .

Closes #40518 from LuciferYang/SPARK-42889-FOLLOWUP.

Lead-authored-by: yangjie01 <yangjie01@baidu.com>
Co-authored-by: YangJie <yangjie01@baidu.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 88cc239)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
snmvaughan pushed a commit to snmvaughan/spark that referenced this pull request Jun 20, 2023
…nd storageLevel

### What changes were proposed in this pull request?

Implements `DataFrame.cache`, `persist`, `unpersist`, and `storageLevel`.

### Why are the changes needed?

Missing APIs.

### Does this PR introduce _any_ user-facing change?

`DataFrame.cache`, `persist`, `unpersist`, and `storageLevel` will be available.

### How was this patch tested?

Added/enabled the related tests.

Closes apache#40510 from ueshin/issues/SPARK-42889/cache.

Authored-by: Takuya UESHIN <ueshin@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit b4f0224)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
snmvaughan pushed a commit to snmvaughan/spark that referenced this pull request Jun 20, 2023
…le to avoid potential `file recursively imports`

apache#40510 introduce `message StorageLevel` to `base.proto`, but if we try to import `base.proto` in `catalog.proto` to reuse `StorageLevel` in `message CacheTable` and run `build/sbt "connect-common/compile" to compile, there will be following message in compile log:

```
spark/connect/base.proto:23:1: File recursively imports itself: spark/connect/base.proto -> spark/connect/commands.proto -> spark/connect/relations.proto -> spark/connect/catalog.proto -> spark/connect/base.proto
spark/connect/catalog.proto:22:1: Import "spark/connect/base.proto" was not found or had errors.
spark/connect/catalog.proto:144:12: "spark.connect.DataType" seems to be defined in "spark/connect/types.proto", which is not imported by "spark/connect/catalog.proto".  To use it here, please add the necessary import.
spark/connect/catalog.proto:161:12: "spark.connect.DataType" seems to be defined in "spark/connect/types.proto", which is not imported by "spark/connect/catalog.proto".  To use it here, please add the necessary import.
spark/connect/relations.proto:25:1: Import "spark/connect/catalog.proto" was not found or had errors.
spark/connect/relations.proto:84:5: "Catalog" is not defined.
spark/connect/commands.proto:22:1: Import "spark/connect/relations.proto" was not found or had errors.
spark/connect/commands.proto:63:3: "Relation" is not defined.
spark/connect/commands.proto:81:3: "Relation" is not defined.
spark/connect/commands.proto:142:3: "Relation" is not defined.
spark/connect/base.proto:23:1: Import "spark/connect/commands.proto" was not found or had errors.
spark/connect/base.proto:25:1: Import "spark/connect/relations.proto" was not found or had errors.
....
```

So this pr move `message StorageLevel` to a separate file to avoid this potential file recursively imports.

To avoid potential file recursively imports.

No

- Pass GitHub Actions
- Manual check:
   - Add `import "spark/connect/common.proto";` to `catalog.proto`
   - run `build/sbt "connect-common/compile"`

No compilation logs related to `File recursively imports itself` .

Closes apache#40518 from LuciferYang/SPARK-42889-FOLLOWUP.

Lead-authored-by: yangjie01 <yangjie01@baidu.com>
Co-authored-by: YangJie <yangjie01@baidu.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 88cc239)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants