DataFrame.sample #49

Closed
thunterdb opened this issue Mar 29, 2019 · 2 comments · Fixed by #327
Labels
enhancement New feature or request

Comments

@thunterdb
Contributor

This function will require more thought than most others because Spark and pandas use the same function name (sample) for slightly different semantics:

Documentation:

Before designing what the expectations should be, here are some constraints:

  • the existing Spark code must still behave similarly
  • the pandas code may have to pass arguments by name to be compatible. This is usually the standard practice anyway
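
To illustrate why calling by name matters here, the sketch below uses two stand-in stubs that only mirror the parameter order of the pandas and PySpark `sample` methods as I understand it from their documentation. The stub names and the `bound_arguments` helper are hypothetical and exist only for this example; nothing here calls the real libraries.

```python
import inspect


# Stand-in stubs: they copy only the assumed parameter order of the two
# libraries and do no sampling at all.
def pandas_style_sample(n=None, frac=None, replace=False, weights=None,
                        random_state=None, axis=None):
    pass


def spark_style_sample(withReplacement=None, fraction=None, seed=None):
    pass


def bound_arguments(func, *args, **kwargs):
    """Show how a call would bind to func's parameters without calling it."""
    return dict(inspect.signature(func).bind(*args, **kwargs).arguments)


# The same positional call means different things under each signature.
positional_pandas = bound_arguments(pandas_style_sample, True, 0.5)
positional_spark = bound_arguments(spark_style_sample, True, 0.5)

# Keyword arguments remove the ambiguity on the pandas side.
keyword_pandas = bound_arguments(pandas_style_sample, frac=0.5, replace=True)
```

With positional arguments, `True` lands in `n` for the pandas-style signature but in `withReplacement` for the Spark-style one, which is the kind of clash that keyword calls avoid.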

Some questions that the design doc should explore:

  • when a number of items is requested, should it return a pandas or a Spark DataFrame? I expect a Spark DataFrame
  • should the number of elements returned be exact? I would expect so, since that is the whole point of specifying the number of elements
  • should the elements always be the same? This is very hard to do with the current implementation of sample() in Spark, so this would have to be changed a bit
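
The questions about exactness and repeatability can be made concrete with a small in-memory sketch. It contrasts per-row (Bernoulli) sampling by fraction, where the result size varies, with exact-size sampling. Both helper functions are hypothetical illustrations over a plain Python list; they are not Spark's or pandas' implementation and do not model distributed execution.

```python
import random


def bernoulli_sample(rows, fraction, seed=None):
    """Keep each row independently with probability `fraction`.

    The number of rows returned is only approximately fraction * len(rows).
    """
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < fraction]


def exact_sample(rows, n, seed=None):
    """Return exactly n distinct rows, chosen without replacement."""
    rng = random.Random(seed)
    return rng.sample(rows, n)


rows = list(range(1000))
approx = bernoulli_sample(rows, fraction=0.1, seed=42)
exact = exact_sample(rows, n=100, seed=42)

# Same seed, same input order, single process: the result repeats.
approx_again = bernoulli_sample(rows, fraction=0.1, seed=42)
```

Here `len(exact)` is always 100, while `len(approx)` is only close to 100 on average. The repeatability shown by `approx_again` relies on a fixed input order in a single process, which is an assumption of this sketch.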
@AbdealiLoKo
Contributor

Somewhat related PR: #48

@rxin
Contributor

rxin commented May 14, 2019

I think we should start with a simple implementation that supports frac first. We can worry about how to do exact or approximate n later. Basically, support the following signature:

def sample(n, frac, replace):

and throw an exception if n is specified.
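
A minimal sketch of that proposal, written as a pure argument-translation step so it runs without Spark. The helper name `spark_sample_kwargs`, the `random_state` parameter, and the choice of exception types are assumptions for illustration; the target keyword names follow the PySpark `sample` signature as I understand it, and this is not necessarily what the eventual fix does.

```python
def spark_sample_kwargs(n=None, frac=None, replace=False, random_state=None):
    """Translate pandas-style sample arguments into Spark-style keywords.

    Only `frac` is supported; passing `n` raises, as suggested above.
    """
    if n is not None:
        # Exact or approximate n is deliberately left for later.
        raise NotImplementedError(
            "sample() does not support n yet; use frac instead."
        )
    if frac is None:
        raise ValueError("frac must be specified.")
    return {
        "withReplacement": replace,
        "fraction": float(frac),
        "seed": random_state,
    }


kwargs = spark_sample_kwargs(frac=0.25, replace=True, random_state=7)
```

The resulting dictionary could then be unpacked into a Spark DataFrame call such as `sdf.sample(**kwargs)`, where `sdf` is a hypothetical existing Spark DataFrame.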


3 participants