[SPARK-23233][PYTHON] Reset the cache in asNondeterministic to set deterministic properly

## What changes were proposed in this pull request?

Reproducer:

```python
from pyspark.sql.functions import udf
f = udf(lambda x: x)
spark.range(1).select(f("id"))  # cache JVM UDF instance.
f = f.asNondeterministic()
spark.range(1).select(f("id"))._jdf.logicalPlan().projectList().head().deterministic()
```

It should return `False` but the current master returns `True`. This seems to be because we cache the JVM UDF instance and then reuse it, so disabling `deterministic` has no effect once the UDF has already been called.

## How was this patch tested?

Manually tested. I am not sure whether I should add a test that does a lot of JVM accesses through internal APIs. Let me know if anyone thinks so and I will add one.

Author: hyukjinkwon <email@example.com>

Closes #20409 from HyukjinKwon/SPARK-23233.

(cherry picked from commit 3227d14)
Signed-off-by: gatorsmile <firstname.lastname@example.org>
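The caching behaviour described above can be modelled without Spark. The following is a minimal sketch, not the PySpark source: `FakeJvmUdf`, `LazyUdf`, `as_nondeterministic`, and `flag_seen_after_use` are hypothetical names, and the "JVM object" is just a plain Python object that snapshots the flag when it is first built. It only illustrates why a lazily cached instance keeps the stale `deterministic` value unless the cache is reset.

```python
class FakeJvmUdf:
    """Hypothetical stand-in for the JVM-side UDF; it snapshots the flag at creation."""

    def __init__(self, deterministic):
        self.deterministic = deterministic


class LazyUdf:
    """Toy wrapper that builds its 'JVM' object lazily and caches it."""

    def __init__(self, func, reset_cache_on_change):
        self.func = func
        self.deterministic = True
        self._cached = None  # filled in on first use
        self._reset_cache_on_change = reset_cache_on_change

    @property
    def jvm_udf(self):
        # Build the object once, then reuse it on every later access.
        if self._cached is None:
            self._cached = FakeJvmUdf(self.deterministic)
        return self._cached

    def as_nondeterministic(self):
        if self._reset_cache_on_change:
            # Drop the cached object so the next access rebuilds it
            # with the updated flag.
            self._cached = None
        self.deterministic = False
        return self


def flag_seen_after_use(reset_cache_on_change):
    """Use the UDF once, mark it nondeterministic, and report what the 'JVM' side sees."""
    f = LazyUdf(lambda x: x, reset_cache_on_change)
    f.jvm_udf  # first use caches the instance
    f = f.as_nondeterministic()
    return f.jvm_udf.deterministic


print(flag_seen_after_use(reset_cache_on_change=False))  # stale cached flag
print(flag_seen_after_use(reset_cache_on_change=True))   # flag after resetting the cache
```

Without the reset, the second access returns the instance created before the flag changed, which matches the symptom in the reproducer. Resetting the cache inside the method that flips the flag keeps the lazy construction and only pays for one rebuild.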
Showing with 16 additions and 0 deletions.