[SPARK-10417][SQL] Iterating through Column results in infinite loop #8574

0x0FFF · 2015-09-02T14:36:52Z

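The pyspark.sql.column.Column object has a __getitem__ method, which makes it iterable in Python. The method exists so that, when a column holds a list or dict, you can access a specific element of it through the DF API. Being iterable is just a side effect, and it can confuse people who are getting familiar with Spark DataFrames (you can iterate over a column this way in a pandas DataFrame, for instance). Because Column.__getitem__ never raises IndexError (it returns a new Column expression for any key), Python's fallback iteration protocol, which calls __getitem__(0), __getitem__(1), and so on until IndexError, never terminates, so the loop runs forever.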

Issue reproduction:

df = sqlContext.jsonRDD(sc.parallelize(['{"name": "El Magnifico"}']))
for i in df["name"]:
    print(i)
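To see why the reproduction above never finishes, without needing Spark, here is a minimal sketch using two hypothetical classes. GetItemOnly mimics Column by defining __getitem__ but no __iter__, and NotIterable adds an __iter__ that raises TypeError, which is roughly the kind of guard this PR adds. itertools.islice is used only to cap the otherwise endless iteration. The class names and the error message are illustrative assumptions, not PySpark code.

```python
import itertools


class GetItemOnly(object):
    """Hypothetical class mimicking Column: __getitem__ but no __iter__."""

    def __getitem__(self, key):
        # Never raises IndexError, so the legacy sequence protocol never stops.
        return "expr[%d]" % key


class NotIterable(GetItemOnly):
    """Same class with an explicit __iter__ that rejects iteration."""

    def __iter__(self):
        raise TypeError("object is not iterable")


# iter() falls back to calling __getitem__(0), __getitem__(1), ...
# islice stops after five items; a plain for loop would run forever.
first_five = list(itertools.islice(GetItemOnly(), 5))
print(first_five)

# With __iter__ defined, iter() calls it first and the TypeError surfaces immediately.
try:
    iter(NotIterable())
    rejected = False
except TypeError:
    rejected = True
print("NotIterable rejected iteration:", rejected)
```

Defining __iter__ takes precedence over the __getitem__ fallback, so element access through __getitem__ keeps working while iteration fails fast.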

srowen · 2015-09-02T14:55:54Z

Don't know much about Python myself but that sounds convincing. CC @davies

cloud-fan · 2015-09-02T15:01:24Z

python/pyspark/sql/tests.py

+                break
+            self.assertEqual(0, 1)
+        except TypeError:
+            self.assertEqual(1, 1)


you can use assertRaises to test the exception case.
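A minimal sketch of the test shape this review suggests. It assumes a hypothetical FakeColumn stand-in instead of a real DataFrame column so it runs without a SparkContext. The test merged into python/pyspark/sql/tests.py may differ in names and details.

```python
import unittest


class FakeColumn(object):
    """Hypothetical stand-in for pyspark.sql.Column so the test runs without Spark."""

    def __init__(self, name):
        self.name = name

    def __getitem__(self, key):
        # Mirrors the problem: any key builds a new expression and IndexError never comes.
        return FakeColumn("%s[%r]" % (self.name, key))

    def __iter__(self):
        # Guard in the spirit of the fix: iteration fails fast instead of looping.
        raise TypeError("Column is not iterable")


class ColumnIterationTest(unittest.TestCase):
    def test_column_iterator(self):
        def iterate():
            for _ in FakeColumn("name"):
                # break keeps the test from hanging if the guard is ever removed.
                break

        # assertRaises replaces the try/except plus assertEqual(0, 1) pattern.
        self.assertRaises(TypeError, iterate)

    def test_context_manager_form(self):
        # Equivalent form using assertRaises as a context manager (Python 2.7+).
        with self.assertRaises(TypeError):
            iter(FakeColumn("name"))
```

Without the guard, the loop in iterate() would read one element and break, so no exception is raised and assertRaises reports a failure instead of the test hanging.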

SparkQA · 2015-09-02T15:19:01Z

Test build #1712 has finished for PR 8574 at commit ea2e9d4.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- public class JavaTrainValidationSplitExample
- class KMeans @Since("1.5.0") (
- class DCT(JavaTransformer, HasInputCol, HasOutputCol):
- class SQLTransformer(JavaTransformer):
- class StopWordsRemover(JavaTransformer, HasInputCol, HasOutputCol):
- case class LimitNode(limit: Int, child: LocalNode) extends UnaryLocalNode
- case class UnionNode(children: Seq[LocalNode]) extends LocalNode


0x0FFF · 2015-09-02T15:22:23Z

@cloud-fan, I addressed your comments in the last commit

0x0FFF · 2015-09-02T15:56:59Z

Looks like it's not being retested after the last commit as Jenkins failed to update the status and the dashboard shows that it's still running. Am I right?

0x0FFF · 2015-09-02T16:47:10Z

Jenkins, retest this please

davies · 2015-09-02T17:12:58Z

LGTM

SparkQA · 2015-09-02T17:36:28Z

Test build #1714 has finished for PR 8574 at commit f041635.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

davies · 2015-09-02T20:37:00Z

Merged into master, thanks!

[SPARK-10417][SQL] Iterating through Column results in infinite loop

ea2e9d4

cloud-fan reviewed Sep 2, 2015

[SPARK-10417][SQL] Change error message and use assertRaises for unit test

f041635

asfgit closed this in 6cd98c1 Sep 2, 2015

jhaberstroh-sharethis mentioned this pull request Jun 21, 2023

[SPARK-44137] Change handling of iterable objects for on field in joins #41686

Closed
