davies
Contributor

@davies davies commented Apr 14, 2015

Support accessing columns by index in Python:

>>> df[df[0] > 3].collect()
[Row(age=5, name=u'Bob')]

Access items in an ArrayType or MapType column:

>>> df.select(df.l.getItem(0), df.d.getItem("key")).show()
>>> df.select(df.l[0], df.d["key"]).show()

Access a field in a StructType column:

>>> df.select(df.r.getField("b")).show()
>>> df.select(df.r.a).show()
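
The bracket and attribute forms above can be read as sugar over `getItem` and `getField`. Here is a minimal, Spark-free sketch of that idea using Python's `__getitem__` and `__getattr__` hooks. The `Column` class below is a hypothetical toy that only records an expression string; it is not PySpark's actual implementation.

```python
class Column:
    """Toy stand-in for a DataFrame column (hypothetical, not PySpark's class)."""

    def __init__(self, expr):
        self.expr = expr

    def getItem(self, key):
        # One method serves both array positions and map keys.
        return Column(f"{self.expr}[{key!r}]")

    def getField(self, name):
        return Column(f"{self.expr}.{name}")

    def __getitem__(self, key):
        # col[0] and col["key"] delegate to getItem.
        return self.getItem(key)

    def __getattr__(self, name):
        # Only invoked when normal attribute lookup fails, so `expr` and the
        # methods above are unaffected. Private/dunder names are rejected so
        # that tools such as copy and pickle keep working.
        if name.startswith("_"):
            raise AttributeError(name)
        return self.getField(name)

    def __iter__(self):
        # Defining __getitem__ would otherwise make the object look iterable.
        raise TypeError("Column is not iterable")


l, d, r = Column("l"), Column("d"), Column("r")
print(l[0].expr, d["key"].expr, r.a.expr)
```

In this sketch `l[0]` and `l.getItem(0)` build the same expression, as do `r.a` and `r.getField("a")`.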

@davies
Contributor Author

davies commented Apr 14, 2015

cc @rxin

@SparkQA

SparkQA commented Apr 14, 2015

Test build #30268 has finished for PR 5513 at commit 11f1df3.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@SparkQA

SparkQA commented Apr 14, 2015

Test build #668 has finished for PR 5513 at commit 11f1df3.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch adds the following new dependencies:
    • commons-math3-3.4.1.jar
    • snappy-java-1.1.1.7.jar
  • This patch removes the following dependencies:
    • commons-math3-3.1.1.jar
    • snappy-java-1.1.1.6.jar

@SparkQA

SparkQA commented Apr 14, 2015

Test build #673 has started for PR 5513 at commit 11f1df3.

Contributor

Can we create two versions: one for string and one for int?

Contributor Author

The key of MapType could be any type (for example, DateType), so it should be Any.

Contributor

The GetItem operation is designed for both arrays and maps. To stay consistent with GetItem, I think we should keep the name and only change the parameter type to Any.
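
To illustrate why a single untyped key parameter is enough here, below is a rough plain-Python sketch of one lookup function serving both lists (by position) and dicts (by a key of any type, such as a date). The name `get_item` and the choice to return `None` for a missing key or out-of-range index are assumptions of this sketch, not a statement about Spark's behavior.

```python
import datetime


def get_item(container, key):
    """Look up `key` in a list (by position) or a dict (by key).

    Returns None when the key or position is absent (an assumption of
    this sketch).
    """
    if isinstance(container, dict):
        # Map keys can be any hashable value, not just strings or ints.
        return container.get(key)
    if isinstance(container, list):
        if isinstance(key, int) and 0 <= key < len(container):
            return container[key]
        return None
    raise TypeError(f"unsupported container type: {type(container).__name__}")


day = datetime.date(2015, 4, 14)
print(get_item([10, 20], 0))
print(get_item({"key": 1}, "key"))
print(get_item({day: "opened"}, day))
```

Splitting this into a string version and an int version would leave keys such as dates without an overload, which is the concern raised above.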

Contributor

That makes sense. @davies, can you add a unit test in Scala, in ColumnExpressionSuite or DataFrameSuite?

Contributor Author

done

@SparkQA

SparkQA commented Apr 14, 2015

Test build #671 has finished for PR 5513 at commit 11f1df3.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch adds the following new dependencies:
    • commons-math3-3.4.1.jar
    • snappy-java-1.1.1.7.jar
  • This patch removes the following dependencies:
    • commons-math3-3.1.1.jar
    • snappy-java-1.1.1.6.jar

@SparkQA

SparkQA commented Apr 15, 2015

Test build #30325 has finished for PR 5513 at commit 6c32e79.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@SparkQA

SparkQA commented Apr 15, 2015

Test build #30360 has finished for PR 5513 at commit 6b62540.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@SparkQA

SparkQA commented Apr 15, 2015

Test build #679 has started for PR 5513 at commit 6b62540.

@yhuai
Contributor

yhuai commented Apr 15, 2015

LGTM

cc @marmbrus

@SparkQA

SparkQA commented Apr 16, 2015

Test build #30379 timed out for PR 5513 at commit d125ac4 after a configured wait of 120m.

@davies
Contributor Author

davies commented Apr 16, 2015

@JoshRosen Could you change the timeout to 180 minutes?

@SparkQA

SparkQA commented Apr 16, 2015

Test build #682 timed out for PR 5513 at commit d125ac4 after a configured wait of 120m.

@SparkQA

SparkQA commented Apr 16, 2015

Test build #683 has started for PR 5513 at commit d125ac4.

@davies
Contributor Author

davies commented Apr 16, 2015

Jenkins, test this please.

@SparkQA

SparkQA commented Apr 16, 2015

Test build #30402 timed out for PR 5513 at commit d125ac4 after a configured wait of 120m.

@JoshRosen
Contributor

@davies The Jenkins plugin's timeout is already configured to 180 minutes; I think you'll need to update the TESTS_TIMEOUT variable in dev/run-tests-jenkins in order to bump up the timeout that's triggering here.

@JoshRosen
Contributor

By the way, the timeout in the bash script should be less than the timeout that we've configured in Jenkins. If it's not, then Jenkins' timer will fire first and prevent our script from posting useful timeout messages to GitHub (in that case, we'd only get the build abort message from AMPLab Jenkins).
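
The ordering constraint described here (the inner script timeout must fire before the outer CI timeout so the script can still report) can be sketched in Python as follows. This is an illustration only: `run_with_timeout` and its parameters are hypothetical names, and the real logic lives in the bash script, not in code like this.

```python
import subprocess
import sys


def run_with_timeout(cmd, inner_timeout_s, outer_timeout_s):
    """Run `cmd`, giving up after `inner_timeout_s` seconds.

    `outer_timeout_s` stands in for the CI server's own limit. The inner
    limit must be smaller, otherwise the outer timer would abort the job
    before this function could report anything.
    """
    if inner_timeout_s >= outer_timeout_s:
        raise ValueError("inner timeout must be less than the outer timeout")
    try:
        # subprocess.run kills the child and raises TimeoutExpired on timeout.
        subprocess.run(cmd, timeout=inner_timeout_s, check=False)
    except subprocess.TimeoutExpired:
        return f"timed out after a configured wait of {inner_timeout_s}s"
    return "finished"


# A child that sleeps longer than the inner limit triggers the timeout path.
slow = [sys.executable, "-c", "import time; time.sleep(5)"]
print(run_with_timeout(slow, inner_timeout_s=0.5, outer_timeout_s=1.0))
```

With the limits reversed, the function refuses to run at all, which mirrors the point that only the inner timer can produce a useful message.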

@davies
Contributor Author

davies commented Apr 16, 2015

@JoshRosen Changed to 150m.

@SparkQA

SparkQA commented Apr 16, 2015

Test build #30421 has finished for PR 5513 at commit 7ada9eb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@davies
Contributor Author

davies commented Apr 16, 2015

@marmbrus I think this PR is ready to go.

@SparkQA

SparkQA commented Apr 16, 2015

Test build #30424 has finished for PR 5513 at commit e04d5a0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@marmbrus
Contributor

Thanks! Merged to master.

@asfgit asfgit closed this in 6183b5e Apr 17, 2015