[SPARK-10417][SQL] Iterating through Column results in infinite loop #8574

Closed
wants to merge 2 commits into from

0x0FFF (Contributor) commented Sep 2, 2015

The pyspark.sql.column.Column object has a __getitem__ method, which makes it iterable in Python. It defines __getitem__ so that, when the column holds a list or dict, you can access a specific element of it through the DataFrame API. The ability to iterate over it is just a side effect, and it can confuse people who are getting familiar with Spark DataFrames (you can iterate this way over a Pandas DataFrame, for instance).

Issue reproduction:

df = sqlContext.jsonRDD(sc.parallelize(['{"name": "El Magnifico"}']))
for i in df["name"]:
    print(i)  # never terminates
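The mechanism can be sketched without Spark. In the minimal example below, `LazyColumn` and `SafeColumn` are hypothetical stand-ins, not the PySpark classes. Python falls back to calling `__getitem__` with 0, 1, 2, ... when a class defines no `__iter__`, and stops only on `IndexError`. A `__getitem__` that always returns a new expression therefore never ends the loop. Defining `__iter__` to raise `TypeError` is one way to get the `TypeError` that the test in this PR expects; whether the patch does exactly this is an assumption.

```python
import itertools


class LazyColumn(object):
    """Hypothetical stand-in for a column expression (not the PySpark class)."""

    def __init__(self, name):
        self.name = name

    def __getitem__(self, key):
        # Always builds a new expression and never raises IndexError,
        # so the legacy sequence-iteration protocol never stops.
        return LazyColumn("%s[%r]" % (self.name, key))


class SafeColumn(LazyColumn):
    """Same indexing behavior, but iteration is rejected explicitly."""

    def __iter__(self):
        raise TypeError("Column is not iterable")


# islice caps the otherwise endless iteration at three items.
first_three = [c.name for c in itertools.islice(LazyColumn("name"), 3)]
print(first_three)

try:
    for item in SafeColumn("name"):
        pass
except TypeError as exc:
    print("rejected: %s" % exc)
```

Indexing such as `SafeColumn("name")[0]` still works in this sketch, because only `__iter__` is overridden.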

srowen (Member) commented Sep 2, 2015

Don't know much about Python myself but that sounds convincing. CC @davies

        break
    self.assertEqual(0, 1)
except TypeError:
    self.assertEqual(1, 1)

Review comment (Contributor):

You can use assertRaises to test the exception case.
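As a rough illustration of that suggestion, the sketch below uses `assertRaises` as a context manager instead of a try/except with dummy assertions. `NotIterable`, `IterationTest`, and `run_suite` are hypothetical names for this example and are not taken from the patch.

```python
import unittest


class NotIterable(object):
    """Hypothetical stand-in for an object that supports indexing only."""

    def __getitem__(self, key):
        return key

    def __iter__(self):
        raise TypeError("object is not iterable")


class IterationTest(unittest.TestCase):
    def test_iteration_raises(self):
        # The test fails automatically if no TypeError is raised,
        # so no assertEqual(0, 1) / assertEqual(1, 1) pair is needed.
        with self.assertRaises(TypeError):
            for _ in NotIterable():
                break


def run_suite():
    """Run the single test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(IterationTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_suite()` runs the test; the callable form `self.assertRaises(TypeError, some_function)` is an equivalent alternative when the failing code is wrapped in a function.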

SparkQA commented Sep 2, 2015

Test build #1712 has finished for PR 8574 at commit ea2e9d4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class JavaTrainValidationSplitExample
    • class KMeans @Since("1.5.0") (
    • class DCT(JavaTransformer, HasInputCol, HasOutputCol):
    • class SQLTransformer(JavaTransformer):
    • class StopWordsRemover(JavaTransformer, HasInputCol, HasOutputCol):
    • case class LimitNode(limit: Int, child: LocalNode) extends UnaryLocalNode
    • case class UnionNode(children: Seq[LocalNode]) extends LocalNode

0x0FFF (Contributor Author) commented Sep 2, 2015

@cloud-fan, I addressed your comments in the last commit.

0x0FFF (Contributor Author) commented Sep 2, 2015

Looks like it's not being retested after the last commit as Jenkins failed to update the status and the dashboard shows that it's still running. Am I right?

0x0FFF (Contributor Author) commented Sep 2, 2015

Jenkins, retest this please

davies (Contributor) commented Sep 2, 2015

LGTM

SparkQA commented Sep 2, 2015

Test build #1714 has finished for PR 8574 at commit f041635.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

davies (Contributor) commented Sep 2, 2015

Merged into master, thanks!
