[SPARK-46926][PS][CONNECT] Support convert_dtypes and infer_objects in fallback mode #44959

Closed
zhengruifeng wants to merge 2 commits into apache:master from zhengruifeng:ps_fallback_infer_obj_v2

@zhengruifeng
Contributor

What changes were proposed in this pull request?

Support convert_dtypes and infer_objects in fallback mode

Why are the changes needed?

for parity

Does this PR introduce any user-facing change?

Yes. convert_dtypes and infer_objects are now supported in fallback mode, which is disabled by default.
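
As a rough illustration of what opt-in fallback means, the sketch below delegates allowlisted methods to a local object only when fallback is switched on. It is a plain-Python toy, not the Spark implementation: `FALLBACK_METHODS`, `LocalFrame`, `DistributedFrame`, and `fallback_enabled` are hypothetical names chosen for this example, and only the four method names are taken from the diff in this PR.

```python
# Hypothetical illustration only; none of these class or variable names
# come from the Spark code base.
FALLBACK_METHODS = {"asfreq", "asof", "convert_dtypes", "infer_objects"}


class LocalFrame:
    """Stand-in for a local object that implements the method natively."""

    def infer_objects(self):
        return "local infer_objects result"


class DistributedFrame:
    """Stand-in for a distributed frame that lacks some methods."""

    def __init__(self, local, fallback_enabled=False):
        self._local = local
        self._fallback_enabled = fallback_enabled  # off unless requested

    def __getattr__(self, name):
        # __getattr__ runs only when normal attribute lookup fails,
        # i.e. for methods this class does not define itself.
        if name in FALLBACK_METHODS and self._fallback_enabled:
            return getattr(self._local, name)
        raise AttributeError(f"{name!r} is not implemented")


frame = DistributedFrame(LocalFrame(), fallback_enabled=True)
result = frame.infer_objects()  # delegated to LocalFrame
```

With `fallback_enabled` left at its default, the same call raises `AttributeError`, which mirrors the "disabled by default" behaviour described above.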

How was this patch tested?

Added unit tests.

Was this patch authored or co-authored using generative AI tooling?

no

init

init

"asfreq",
"asof",
"convert_dtypes",
"infer_objects",
Member

I think we should add them all to the fallback list for now, even without tests for some of them, instead of adding each API here with its own test cases. We can add the tests in the future, and ideally we should implement all of the APIs.

Contributor Author

Sounds good, let me add them in another PR.

Contributor Author

I made a new PR: #44965

@zhengruifeng
Contributor Author

Spark uses partial data to infer the schema, and warns:

                    warnings.warn(
                        "The amount of data for return type inference might not be large enough. "
                        "Consider increasing an option `compute.shortcut_limit`."
                    )

In this case (compute.shortcut_limit < dataset size), the inferred schema for the two new methods may be incorrect. Do we need additional handling (e.g. failing the invocation), or is it enough to warn users in this way?
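
To make the concern concrete, here is a minimal plain-Python sketch of inferring a column type from a prefix of the data. It is not the Spark code path: `infer_column_type` and its simple type rules are hypothetical, and `shortcut_limit` only mimics the role of the `compute.shortcut_limit` option.

```python
import warnings


def infer_column_type(values, shortcut_limit):
    """Infer a column type from at most `shortcut_limit` leading values.

    Hypothetical helper for illustration; not part of Spark.
    """
    sample = values[:shortcut_limit]
    if len(values) > shortcut_limit:
        # Only part of the data was inspected, so the result may be wrong.
        warnings.warn(
            "The amount of data for return type inference might not be large enough."
        )
    kinds = {type(v) for v in sample}
    if kinds <= {int}:
        return "int"
    if kinds <= {int, float}:
        return "float"
    return "object"


column = [1, 2, 3, "four"]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    partial = infer_column_type(column, shortcut_limit=3)  # "int", with a warning
    full = infer_column_type(column, shortcut_limit=10)    # "object", no warning
```

The prefix-based call reports `int` even though the last value is a string, which is the kind of mismatch the warning is meant to flag.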

@HyukjinKwon

@HyukjinKwon
Member

Let's leave it as a warning for now.

@zhengruifeng zhengruifeng marked this pull request as draft January 31, 2024 01:42
@zhengruifeng zhengruifeng deleted the ps_fallback_infer_obj_v2 branch January 31, 2024 05:08