Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-12150] [SQL] [Minor] Add range API without specifying the slice number #10149

Closed
wants to merge 3 commits into from

Conversation

gatorsmile
Copy link
Member

For usability, add another sqlContext.range() method. Users can specify start, end, and step without specifying the slice number. The slice number is based on the sparkContext's defaultParallelism. It just makes consistent with the RDD range API.

@SparkQA
Copy link

SparkQA commented Dec 4, 2015

Test build #47206 has finished for PR 10149 at commit 72860c4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * public abstract static class PrefixComputer\n

@gatorsmile
Copy link
Member Author

@marmbrus @cloud-fan This PR changes the external API. Not sure if this will be merged or we should revisit it after the release of 1.6? Thank you!

@marmbrus
Copy link
Contributor

This seems reasonable, but if we are going to do work on sqlContext.range it would be nice if we could take care of a bigger issue. Right now this code is much slower than it needs to be.

screen shot 2015-12-08 at 4 20 59 pm

I think that is is because we are boxing and creating a new Row for each element. Instead, we should consider creating an actual logical/physical operator for range. the implementation of doExecute() can mirror the code in sparkContext.range, but would allocate one UnsafeRow per partition and just call setLong for each element (reusing the same buffer).

Does that make sense?

Benchmark code here: https://github.com/databricks/spark-sql-perf/blob/master/src/main/scala/com/databricks/spark/sql/perf/DatasetPerformance.scala

@gatorsmile
Copy link
Member Author

That sounds an interesting work! Let me think about it.

Do you want me close this PR?

@gatorsmile
Copy link
Member Author

I will make a try this week. Thanks for your insights! : )

@gatorsmile gatorsmile closed this Dec 16, 2015
@marmbrus
Copy link
Contributor

Cool, let me know if you run into any problems.

@gatorsmile
Copy link
Member Author

Thanks! Will do my best. : )

@gatorsmile gatorsmile deleted the range branch January 9, 2016 04:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants