[SPARK-3855][SQL] Preserve the result attribute of python UDFs through transformations #2717

Closed
wants to merge 3 commits into from

Conversation

marmbrus
Contributor

@marmbrus marmbrus commented Oct 8, 2014

In the current implementation it was possible for the reference to change after analysis.
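
A toy model may help picture the problem described above. This is only a sketch in plain Python, not Spark's code: the classes `Attribute`, `FragileNode`, and `StableNode` are hypothetical stand-ins, and the assumption that the reference changed because the plan node was copied during a transformation is ours, not something the description states.

```python
import itertools
from dataclasses import dataclass, field

# Hypothetical global counter standing in for unique expression ids.
_next_id = itertools.count()


@dataclass(frozen=True)
class Attribute:
    """A named output column identified by a unique id (illustrative only)."""
    name: str
    expr_id: int = field(default_factory=lambda: next(_next_id))


class FragileNode:
    """Creates its result attribute internally, so every copy gets a new id."""

    def __init__(self, udf_name):
        self.udf_name = udf_name
        self.result = Attribute("pythonUDF")

    def copy(self):
        # Rebuilding the node allocates a fresh attribute with a fresh id.
        return FragileNode(self.udf_name)


class StableNode:
    """Takes the result attribute as an argument, so copies preserve it."""

    def __init__(self, udf_name, result=None):
        self.udf_name = udf_name
        self.result = result if result is not None else Attribute("pythonUDF")

    def copy(self):
        # The existing attribute is passed along unchanged.
        return StableNode(self.udf_name, self.result)


fragile = FragileNode("strlen")
parent_ref = fragile.result                  # what a parent operator resolved against
print(fragile.copy().result == parent_ref)   # False: the reference no longer matches

stable = StableNode("strlen")
parent_ref = stable.result
print(stable.copy().result == parent_ref)    # True: the reference survives the copy
```

The design point is that anything other operators refer to by identity should be part of the node's constructor arguments rather than regenerated inside it.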

@SparkQA

SparkQA commented Oct 8, 2014

QA tests have started for PR 2717 at commit 9533286.

  • This patch merges cleanly.

@davies
Contributor

davies commented Oct 8, 2014

It's reproducible by this query:

SELECT strlen(a) FROM test WHERE strlen(a) > 1
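
For readers without a Spark build at hand, the intended result of that query can be sketched with Python's standard-library `sqlite3` module. This is an analogue only: SQLite is not Spark SQL, the table contents are made-up sample data, and it does not reproduce the bug itself. It shows the same UDF being used in both the projection and the filter.

```python
import sqlite3

# In-memory database with made-up sample rows (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (a TEXT)")
conn.executemany("INSERT INTO test VALUES (?)", [("a",), ("test",), ("xy",)])

# Register a Python function as a one-argument SQL function named strlen.
conn.create_function("strlen", 1, len)

# The query from the comment above, unchanged; results are sorted in Python
# because the query itself does not specify an order.
rows = sorted(conn.execute("SELECT strlen(a) FROM test WHERE strlen(a) > 1").fetchall())
print(rows)  # [(2,), (4,)]
```

The row `"a"` is filtered out because its length is 1, and the remaining rows report the lengths of `"xy"` and `"test"`.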

@SparkQA

SparkQA commented Oct 8, 2014

QA tests have finished for PR 2717 at commit 9533286.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class EvaluatePython(

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21485/
Test PASSed.

@davies
Contributor

davies commented Oct 8, 2014

@marmbrus Could you add a test in python/pyspark/tests.py (SQLTests)?

@SparkQA

SparkQA commented Oct 9, 2014

QA tests have started for PR 2717 at commit 6343bcb.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 9, 2014

QA tests have finished for PR 2717 at commit 6343bcb.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class EvaluatePython(

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21549/
Test FAILed.

@SparkQA

SparkQA commented Oct 10, 2014

QA tests have started for PR 2717 at commit 6343bcb.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 10, 2014

QA tests have finished for PR 2717 at commit 6343bcb.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class EvaluatePython(

@SparkQA

SparkQA commented Oct 10, 2014

QA tests have started for PR 2717 at commit 6343bcb.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 10, 2014

QA tests have finished for PR 2717 at commit 6343bcb.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class EvaluatePython(

@@ -679,6 +679,12 @@ def test_udf(self):
        [row] = self.sqlCtx.sql("SELECT twoArgs('test', 1)").collect()
        self.assertEqual(row[0], 5)

    def test_udf2(self):
        self.sqlCtx.registerFunction("strlen", lambda string: len(string))
Contributor

The data type should be IntegerType.

@SparkQA

SparkQA commented Oct 10, 2014

QA tests have started for PR 2717 at commit da14879.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 10, 2014

QA tests have finished for PR 2717 at commit da14879.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class EvaluatePython(

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21599/
Test FAILed.

@davies
Contributor

davies commented Oct 10, 2014

@marmbrus LGTM. I just wonder why you do not use IntegerType as the returnType in the tests? It looks weird that we compare a StringType with > 1.

@marmbrus
Contributor Author

@davies, I was lazy and didn't want to look up the syntax for that :P

You are right, it's a little weird, but Hive supports it.

@pwendell
Contributor

Jenkins, retest this please.

@SparkQA

SparkQA commented Oct 10, 2014

QA tests have started for PR 2717 at commit da14879.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 11, 2014

QA tests have finished for PR 2717 at commit da14879.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class EvaluatePython(

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21609/
Test FAILed.

@pwendell
Contributor

Jenkins - retest this please.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21701/
Test FAILed.

@SparkQA

SparkQA commented Oct 17, 2014

QA tests have started for PR 2717 at commit da14879.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 17, 2014

QA tests have finished for PR 2717 at commit da14879.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class EvaluatePython(

@pwendell
Contributor

I'd like to pull this in. Is that alright, @marmbrus?

@marmbrus
Contributor Author

Yes, please do.

@asfgit asfgit closed this in adcb7d3 Oct 17, 2014
@marmbrus marmbrus deleted the pythonUdfResults branch November 19, 2014 02:47
5 participants