This function is going to require more thought than most others because Spark and pandas use the same function name (sample) with slightly different semantics:
Before starting on designing what the expectations should be, here are some constraints:
Existing Spark code must still behave similarly.
The pandas code may have to pass arguments by name to stay compatible. This is usually standard practice anyway.
Some questions which the design doc should explore:
When asking for a number of items to return, should it return a pandas or a Spark DataFrame? I expect a Spark DataFrame.
Should the number of elements returned be exact? I would expect so, since that is the whole point of specifying the number of elements.
Should the elements always be the same? This is very hard to do with the current implementation of sample() in Spark, so this would have to be changed a bit.
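To make the "exact count" question concrete, here is a minimal pure-Python sketch of the two sampling styles under discussion. It does not use Spark or pandas; `fraction_sample` and `exact_sample` are hypothetical names, standing in for fraction-based sampling (each row kept independently, so the result size is only approximate) and count-based sampling (exactly n rows).

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def fraction_sample(rows: Sequence[T], fraction: float, seed: Optional[int] = None) -> List[T]:
    """Keep each row independently with probability `fraction` (size is approximate)."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < fraction]


def exact_sample(rows: Sequence[T], n: int, seed: Optional[int] = None) -> List[T]:
    """Return exactly `n` distinct rows, drawn without replacement."""
    rng = random.Random(seed)
    return rng.sample(list(rows), n)


rows = list(range(1000))
approx = fraction_sample(rows, fraction=0.1, seed=42)
exact = exact_sample(rows, n=100, seed=42)
print(len(approx))  # close to 100, but not guaranteed to equal it
print(len(exact))   # 100
```

With a fixed seed, both functions return the same rows on every call for the same in-memory list. In a distributed setting a seed alone may not be enough if row order or partitioning changes between runs, which is likely part of why "always the same elements" is hard to guarantee.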
I think we should start with a simple implementation that supports frac first. We can worry about how to do exact or approximate n later. Basically, support the following:
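A hedged sketch of what frac-only support could look like, written in plain Python over a list of rows rather than a real Spark DataFrame. The function name `sample_frac_only` is hypothetical; the keyword names mirror the pandas `sample` arguments, and making them keyword-only is an assumption based on the "pass arguments by name" constraint above. Rejecting `replace=True` is a simplification of this sketch, not a proposed behavior.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_frac_only(
    rows: Sequence[T],
    *,  # keyword-only, so callers must name n / frac explicitly
    n: Optional[int] = None,
    frac: Optional[float] = None,
    replace: bool = False,
    random_state: Optional[int] = None,
) -> List[T]:
    """Fraction-based sampling only; `n` is rejected until its semantics are decided."""
    if n is not None:
        raise NotImplementedError("sampling by n is not supported yet; use frac")
    if frac is None:
        raise ValueError("frac must be specified")
    if not 0.0 <= frac <= 1.0:
        raise ValueError("frac must be between 0 and 1")
    if replace:
        # Assumption: sampling with replacement is left out of this sketch.
        raise NotImplementedError("replace=True is not covered by this sketch")
    rng = random.Random(random_state)
    # Each row is kept independently, so the result size is approximate.
    return [row for row in rows if rng.random() < frac]


subset = sample_frac_only(list(range(100)), frac=0.5, random_state=0)
```

Calling `sample_frac_only(rows, n=10)` raises `NotImplementedError`, which keeps the door open for defining exact or approximate n later without changing the frac path.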