Commit
[SPARK-38271] PoissonSampler may output more rows than MaxRows
### What changes were proposed in this pull request?
When `replacement=true`, `Sample.maxRows` returns `None`.

### Why are the changes needed?
The underlying implementation of `SampleExec` cannot guarantee that its number of output rows is <= `Sample.maxRows`:

```
scala> val df = spark.range(0, 1000)
df: org.apache.spark.sql.Dataset[Long] = [id: bigint]

scala> df.count
res0: Long = 1000

scala> df.sample(true, 0.999999, 10).count
res1: Long = 1004
```

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing test suites.

Closes #35593 from zhengruifeng/fix_sample_maxRows.

Authored-by: Ruifeng Zheng <ruifengz@foxmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
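
For readers unfamiliar with why sampling with replacement can exceed the input size, here is a minimal standalone sketch. It is not Spark's `PoissonSampler` or the actual `Sample.maxRows` code. The object `PoissonSampleSketch` and the helpers `poisson`, `sampleWithReplacement`, and `maxRows` are hypothetical names introduced only for illustration, and the sketch assumes each input row is emitted a Poisson-distributed number of times with mean equal to the sampling fraction.

```scala
import scala.util.Random

// Illustrative model only; none of these names come from Spark.
object PoissonSampleSketch {
  // Draw from Poisson(lambda) using Knuth's multiplication method
  // (adequate for the small lambda values used here).
  def poisson(lambda: Double, rng: Random): Int = {
    val limit = math.exp(-lambda)
    var k = 0
    var p = rng.nextDouble()
    while (p > limit) {
      k += 1
      p *= rng.nextDouble()
    }
    k
  }

  // Assumed model of sampling with replacement: every input row is
  // emitted Poisson(fraction) times, so the total is not capped by the input size.
  def sampleWithReplacement(rows: Seq[Long], fraction: Double, seed: Long): Seq[Long] = {
    val rng = new Random(seed)
    rows.flatMap(r => Seq.fill(poisson(fraction, rng))(r))
  }

  // Hypothetical stand-in for the bound described in this commit:
  // no upper bound is reported when sampling with replacement.
  def maxRows(childMaxRows: Option[Long], withReplacement: Boolean): Option[Long] =
    if (withReplacement) None else childMaxRows

  def main(args: Array[String]): Unit = {
    val rows = 0L until 1000L
    // Try several seeds and report how often the sample is larger than the input.
    val counts = (1L to 20L).map(seed => sampleWithReplacement(rows, 0.999999, seed).size)
    println(s"input rows: ${rows.size}, sampled counts: ${counts.mkString(", ")}")
    println(s"runs exceeding the input size: ${counts.count(_ > rows.size)} of ${counts.size}")
  }
}
```

With a fraction close to 1, the expected sample size is close to the input size, so a sizeable share of runs should land above it. That is the situation in which reporting the child's row count as an upper bound would be unsafe.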