ValueError When Applying Multiple Data Validation Decorators to the Same Function #950
Comments
Thanks for opening! I think this is a limitation, e.g. two validators that have the same name, plus another complexity. I think we can build a fix, but just to check: if you put them both in the same `@check_output_custom` call (e.g. as follows), does it work? My guess is not, but it's worth a try:

    @check_output_custom(
        CompositePrimaryKeyValidatorPySparkDataFrame(columns=["OrderID", "ItemNumber"], importance="fail"),
        CategoricalValuesValidatorPySparkDataFrame(column="CategoryID", allowed_values=[1, 2, 3], importance="fail"),
    )
Another thought would be to add another custom validator that takes in multiple validators... 🤔 Otherwise I think a potential avenue to scope would be to include some
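To make the "validator that takes in multiple validators" idea concrete, here is a minimal sketch in plain Python. It does not use Hamilton's actual validator base class: `Result`, `ColumnAllowedValues`, and `CompositeValidator` are hypothetical stand-ins, and rows are plain dicts rather than a PySpark DataFrame.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List


@dataclass
class Result:
    """Hypothetical stand-in for a validation result (not Hamilton's class)."""
    passes: bool
    message: str


class ColumnAllowedValues:
    """Hypothetical check that every value in one column is in an allowed set."""

    def __init__(self, column: str, allowed_values: Iterable[Any]) -> None:
        self.column = column
        self.allowed = set(allowed_values)

    def validate(self, rows: List[Dict[str, Any]]) -> Result:
        bad = [r[self.column] for r in rows if r[self.column] not in self.allowed]
        return Result(not bad, f"{self.column}: {len(bad)} disallowed value(s)")


class CompositeValidator:
    """Wraps several validators so they present as a single validator."""

    def __init__(self, *validators: Any) -> None:
        self.validators = validators

    def validate(self, rows: List[Dict[str, Any]]) -> Result:
        # Run every wrapped check; the composite passes only if all of them pass.
        results = [v.validate(rows) for v in self.validators]
        return Result(
            all(r.passes for r in results),
            "; ".join(r.message for r in results),
        )


rows = [
    {"ReportingId": 156, "CategoryID": 2},
    {"ReportingId": 156, "CategoryID": 9},
]
combined = CompositeValidator(
    ColumnAllowedValues("ReportingId", [156]),
    ColumnAllowedValues("CategoryID", [1, 2, 3]),
)
result = combined.validate(rows)
```

Because the wrapper exposes a single validator, only one name would be registered for the decorated function, which sidesteps the collision at the cost of merging the individual messages into one.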
@elijahbenizzy |
Fixes #950. This fixes the case where someone wants to do

```python
@check_output_custom(
    CategoricalValuesValidatorPySparkDataFrame(column="ReportingId", allowed_values=[156], importance="fail"),
    CategoricalValuesValidatorPySparkDataFrame(column="CategoryID", allowed_values=[1, 2, 3], importance="fail"),
)
```

The validator name is the same, but it's used differently, so this appends a numeric suffix to make each name unique.
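The suffixing idea can be sketched as follows. This is an illustration only, not Hamilton's actual implementation: the helper `unique_node_names` and the exact suffix format (`_1`, `_2`, ...) are assumptions, and the base name pattern is taken from the error message in this issue.

```python
from collections import Counter
from typing import List


def unique_node_names(function_name: str, validator_names: List[str]) -> List[str]:
    """Hypothetical helper: derive one distinct name per validator.

    The first use of a validator name keeps the plain name; each repeat
    gets a numeric suffix so no two derived names are identical.
    """
    seen: Counter = Counter()
    names = []
    for validator_name in validator_names:
        base = f"{function_name}_{validator_name}"
        count = seen[base]
        seen[base] += 1
        names.append(base if count == 0 else f"{base}_{count}")
    return names


names = unique_node_names(
    "process_order_data",
    ["CategoricalValuesValidator", "CategoricalValuesValidator"],
)
```

With two validators of the same class, the second derived name differs from the first, so a registry keyed by name no longer sees a duplicate definition.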
@rohithrockzz could you try installing
@skrawcz
@rohithrockzz great, thanks for verifying. I will publish a non-RC version in the morning.
@rohithrockzz this has been released under
Description:
I encountered an issue while trying to apply multiple data validation decorators to a single function in the Hamilton DAG framework. Specifically, I am trying to validate different columns of a DataFrame using multiple instances of the @check_output_custom decorator. However, I receive a ValueError indicating that the function cannot be defined more than once.
Steps to Reproduce:
Example code snippet:
1st issue code snippet
This raises the error:
ValueError: Cannot define function process_order_data_raw more than once. Already defined by function <function process_order_data
2nd issue code snippet
This raises the error:
ValueError: Cannot define function process_order_data_CategoricalValuesValidator more than once. Already defined by function <function process_order_data
Expected Behavior
Applying multiple @check_output_custom decorators to a single function should allow for different validation checks on various columns of the DataFrame without raising a ValueError.
Actual Behavior
A ValueError is raised, indicating that the function cannot be defined more than once by the same validator.
Library & System Information
python version = 3.9.5
hamilton library version = 1.65.0
Additional Context:
This issue prevents the application of multiple validators to a single function, which is necessary for comprehensive data validation in our use case. It would be helpful if the framework could support multiple validators on the same function without raising errors.
Thank you for your attention to this issue.