Skip to content

[SPARK-45746][Python] Return specific error messages if UDTF 'analyze' or 'eval' method accepts or returns wrong values#43611

Closed
dtenedor wants to merge 16 commits intoapache:masterfrom
dtenedor:fix-more-udtf-errors
Closed

[SPARK-45746][Python] Return specific error messages if UDTF 'analyze' or 'eval' method accepts or returns wrong values#43611
dtenedor wants to merge 16 commits intoapache:masterfrom
dtenedor:fix-more-udtf-errors

Conversation

@dtenedor
Copy link
Contributor

@dtenedor dtenedor commented Oct 31, 2023

What changes were proposed in this pull request?

This PR adds checks to return specific error messages if any Python UDTF analyze or eval method accepts or returns wrong values.

Error messages improved include:

  • If the __init__ method takes more arguments than self and analyze_result.
  • If the UDTF call passes more or fewer arguments than analyze or eval expects (not using *args or **kwargs).
  • If the analyze method returns an object besides a StructType in the AnalyzeResult schema field.
  • If there are extra optional AnalyzeResult fields relating to input table arguments (e.g. with_single_partition) but the analyze method received no input table argument.
  • If the analyze method tries to return a list of strings for the partition_by optional field of the AnalyzeResult instead of a list of PartitioningColumn objects.
  • If the AnalyzeResult is missing the schema argument entirely.
  • If we use keyword arguments in the TVF call but the analyze or eval method does not accept arguments with those keyword(s) (or **kwargs).

Why are the changes needed?

This helps users understand how to easily fix their user-defined table functions if they are malformed.

Does this PR introduce any user-facing change?

Yes, see above.

How was this patch tested?

This PR adds test coverage.

Was this patch authored or co-authored using generative AI tooling?

No.

commit

commit

commit

commit

commit

commit

commit

commit

commit

commit

commit

commit

commit
@dtenedor
Copy link
Contributor Author

cc @ueshin

@dtenedor dtenedor changed the title [SPARK-45746][Python] Return specific error messages if UDTF 'analyze' method accepts or returns wrong values [SPARK-45746][Python] Return specific error messages if UDTF 'analyze' or 'eval' method accepts or returns wrong values Nov 6, 2023
@dtenedor
Copy link
Contributor Author

dtenedor commented Nov 9, 2023

cc @ueshin the failing tests look unrelated

respond to code review comments

reformat python
Copy link
Contributor Author

@dtenedor dtenedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ueshin for your review! Responded to your comments, please take another look.

@dtenedor dtenedor requested a review from ueshin November 17, 2023 00:01
@ueshin
Copy link
Member

ueshin commented Nov 18, 2023

cc @allisonwang-db

Copy link
Contributor Author

@dtenedor dtenedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ueshin for your review! Please take another look.

@dtenedor dtenedor requested a review from ueshin November 20, 2023 18:12
Copy link
Member

@ueshin ueshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, pending tests.

@dtenedor
Copy link
Contributor Author

Hi @allisonwang-db, responded to your code review comments, please take another look.

Copy link
Member

@ueshin ueshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise, LGTM.

handler = read_udtf(infile)
args, kwargs = read_arguments(infile)

error_prefix = f"Failed to evaluate the user-defined table function '{handler.__name__}'"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This handler.__name__ should also be used the registered name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion, done!

@ueshin
Copy link
Member

ueshin commented Nov 29, 2023

The failed tests seem not related to this PR.

@ueshin
Copy link
Member

ueshin commented Nov 29, 2023

Thanks! merging to master.

@ueshin ueshin closed this in f5e4e84 Nov 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants