[SPARK-7150] SparkContext.range() and SQLContext.range() #6230
Changes from all commits
d3a0c1b
bd998ba
13dbe84
867c417
617da76
f45e3b2
cbf5200
4590208
789eda5
d3ce5fe
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -122,6 +122,26 @@ def udf(self): | |
"""Returns a :class:`UDFRegistration` for UDF registration.""" | ||
return UDFRegistration(self) | ||
|
||
def range(self, start, end, step=1, numPartitions=None): | ||
""" | ||
Create a :class:`DataFrame` with single LongType column named `id`, | ||
containing elements in a range from `start` to `end` (exclusive) with | ||
step value `step`. | ||
|
||
:param start: the start value | ||
:param end: the end value (exclusive) | ||
:param step: the incremental step (default: 1) | ||
:param numPartitions: the number of partitions of the DataFrame | ||
:return: A new DataFrame | ||
|
||
>>> sqlContext.range(1, 7, 2).collect() | ||
[Row(id=1), Row(id=3), Row(id=5)] | ||
""" | ||
if numPartitions is None: | ||
numPartitions = self._sc.defaultParallelism | ||
jdf = self._ssql_ctx.range(int(start), int(end), int(step), int(numPartitions)) | ||
This will make the parameters unpredictable, and lead to exceptions.
If the start or end is invalid, you will get an exception anyway. By converting them in Python, we will get an exception in the Python way (failed to convert into …).
you are right.
||
return DataFrame(jdf, self) | ||
|
||
@ignore_unicode_prefix | ||
def registerFunction(self, name, f, returnType=StringType()): | ||
"""Registers a lambda function as a UDF so it can be used in SQL statements. | ||
|
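For readers following the thread on the `int(...)` calls in `range`, here is a minimal sketch of how those conversions behave in plain Python, before anything reaches the JVM. The helper `coerce_range_args` and its default of 8 partitions are hypothetical stand-ins for illustration only; they are not part of the PR.

```python
def coerce_range_args(start, end, step=1, num_partitions=8):
    """Hypothetical helper that mirrors the int(...) calls in the diff.

    The default of 8 partitions is an assumption standing in for
    self._sc.defaultParallelism, which needs a live SparkContext.
    """
    return int(start), int(end), int(step), int(num_partitions)


# Values that already are integers, or convert cleanly, pass through.
print(coerce_range_args(1, 7, 2))
print(coerce_range_args(1.0, "7", 2))

# A non-numeric string fails in Python with ValueError.
try:
    coerce_range_args("abc", 7)
except ValueError as exc:
    print("ValueError:", exc)

# None fails in Python with TypeError.
try:
    coerce_range_args(None, 7)
except TypeError as exc:
    print("TypeError:", exc)
```

Note that `int()` also truncates floats silently (`int(1.9)` is `1`), so a fractional `start` or `end` would be accepted rather than rejected under this approach.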
can we add a test for large ints (i.e. > 32 bits)?
might make sense to have that in tests.py
done
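A rough sketch of what a large-int check along these lines could look like. It is not the test that was added to tests.py; the helper `expected_ids` is hypothetical and models the documented `range` semantics (end exclusive, fixed step) in plain Python so the expected values can be computed without a cluster.

```python
def expected_ids(start, end, step=1):
    """Hypothetical reference model: ids from start to end (exclusive)."""
    return list(range(start, end, step))


# Sanity check against the doctest in the diff: range(1, 7, 2) -> 1, 3, 5.
assert expected_ids(1, 7, 2) == [1, 3, 5]

# Start one past the largest signed 32-bit integer.
big = 2 ** 31
ids = expected_ids(big, big + 6, 2)

assert ids == [2147483648, 2147483650, 2147483652]
assert all(i > 2 ** 31 - 1 for i in ids)  # none fit in a signed 32-bit int
assert max(ids) < 2 ** 63                 # all fit in a signed 64-bit long

# With a live SQLContext (assumed, not created here), the comparison
# might read:
#   [row.id for row in sqlContext.range(big, big + 6, 2).collect()] == ids
```

The last comment assumes the rows expose the column as `row.id`, which matches the `Row(id=...)` output shown in the docstring.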