Closed
9 changes: 8 additions & 1 deletion python/pyspark/rdd.py
@@ -1039,7 +1039,8 @@ def sample(
         >>> 6 <= rdd.sample(False, 0.1, 81).count() <= 14
         True
         """
-        assert fraction >= 0.0, "Negative fraction value: %s" % fraction
+        if not fraction >= 0:
+            raise ValueError("Fraction must be nonnegative.")
Member

Changing assert to ValueError is definitely right. I believe every assert in the main code (especially those used to validate parameters) should be fixed; asserts in test code are okay.

[1] https://mail.python.org/pipermail/python-list/2013-November/810940.html

Contributor Author

AssertionError is replaced with ValueError here, but I think the change may be too trivial to add to the migration doc.
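The reviewer's point can be illustrated outside Spark: Python drops `assert` statements when the interpreter runs with `-O`, so an assert-based parameter check silently stops validating. The sketch below is a minimal illustration, assuming only the standard library; `check_with_assert`, `check_with_value_error`, `SNIPPET`, and `run` are hypothetical names and are not part of PySpark. Each variant runs in a child interpreter so that `-O` can be toggled.

```python
import subprocess
import sys

# Hypothetical validators mirroring the two styles in the diff (not PySpark code).
SNIPPET = '''
def check_with_assert(fraction):
    assert fraction >= 0.0, "Negative fraction value: %s" % fraction

def check_with_value_error(fraction):
    if not fraction >= 0:
        raise ValueError("Fraction must be nonnegative.")

for check in (check_with_assert, check_with_value_error):
    try:
        check(-0.5)
        print(check.__name__, "accepted -0.5")
    except (AssertionError, ValueError) as exc:
        print(check.__name__, "raised", type(exc).__name__)
'''


def run(optimize: bool) -> str:
    """Run SNIPPET in a fresh interpreter; -O strips assert statements."""
    cmd = [sys.executable, "-E"]  # -E ignores PYTHON* env vars such as PYTHONOPTIMIZE
    if optimize:
        cmd.append("-O")
    cmd += ["-c", SNIPPET]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    print("default:\n" + run(optimize=False))
    print("with -O:\n" + run(optimize=True))
```

Without `-O` both checks reject `-0.5`; with `-O` only the `ValueError` check still does, which is why an explicit `raise` is the safer choice for validating user-supplied parameters.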

         return self.mapPartitionsWithIndex(RDDSampler(withReplacement, fraction, seed).func, True)

     def randomSplit(
@@ -1077,7 +1078,11 @@ def randomSplit(
         >>> 250 < rdd2.count() < 350
         True
         """
+        if not all(w >= 0 for w in weights):
+            raise ValueError("Weights must be nonnegative")
         s = float(sum(weights))
+        if not s > 0:
+            raise ValueError("Sum of weights must be positive")
         cweights = [0.0]
         for w in weights:
             cweights.append(cweights[-1] + w / s)
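The two new checks in `randomSplit` guard the normalization that follows: the weights are divided by their sum and accumulated into boundaries in [0, 1], which only makes sense if no weight is negative and the sum is positive. The sketch below reproduces that validation and boundary computation in plain Python and uses it for a local weighted split. `cumulative_weights` and `random_split` are hypothetical helpers, and drawing one uniform number per item is a simplification for illustration, not Spark's per-partition sampler.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def cumulative_weights(weights: Sequence[float]) -> List[float]:
    """Validate weights and return normalized cumulative boundaries, as in the diff."""
    if not all(w >= 0 for w in weights):
        raise ValueError("Weights must be nonnegative")
    s = float(sum(weights))
    if not s > 0:
        raise ValueError("Sum of weights must be positive")
    cweights = [0.0]
    for w in weights:
        cweights.append(cweights[-1] + w / s)
    return cweights


def random_split(items: Sequence[T], weights: Sequence[float], seed: int = 0) -> List[List[T]]:
    """Hypothetical local analogue of randomSplit: one uniform draw per item."""
    bounds = cumulative_weights(weights)
    # Fallback split for round-off (last bound slightly below 1.0): last positive weight.
    last_positive = max(i for i, w in enumerate(weights) if w > 0)
    rng = random.Random(seed)
    splits: List[List[T]] = [[] for _ in weights]
    for item in items:
        u = rng.random()  # uniform in [0.0, 1.0)
        # Place the item in the split whose [lb, ub) range contains u.
        for i, (lb, ub) in enumerate(zip(bounds, bounds[1:])):
            if lb <= u < ub:
                splits[i].append(item)
                break
        else:
            splits[last_positive].append(item)
    return splits
```

For example, `cumulative_weights([1, 1, 2])` yields the boundaries `[0.0, 0.25, 0.5, 1.0]`, and a zero weight produces an empty range, so that split never receives items.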
@@ -4565,6 +4570,8 @@ def coalesce(self: "RDD[T]", numPartitions: int, shuffle: bool = False) -> "RDD[
         >>> sc.parallelize([1, 2, 3, 4, 5], 3).coalesce(1).glom().collect()
         [[1, 2, 3, 4, 5]]
         """
+        if not numPartitions > 0:
+            raise ValueError("Number of partitions must be positive.")
         if shuffle:
             # Decrease the batch size in order to distribute evenly the elements across output
             # partitions. Otherwise, repartition will possibly produce highly skewed partitions.
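One plausible reason for writing the new checks as `not fraction >= 0`, `not s > 0`, and `not numPartitions > 0`, rather than `fraction < 0` or `s <= 0`, is NaN: every ordered comparison involving NaN is False, so only the negated form rejects it. The diff itself does not state this motivation. The helper names in this sketch are hypothetical.

```python
def reject_with_negated_check(x: float) -> bool:
    """Style used in the diff: `not x >= 0` is True for NaN, so NaN is rejected."""
    return not x >= 0


def reject_with_direct_check(x: float) -> bool:
    """Alternative style: `x < 0` is False for NaN, so NaN would be accepted."""
    return x < 0


def reject_nonpositive(n: float) -> bool:
    """Form used for numPartitions: rejects 0, negatives, and NaN."""
    return not n > 0


nan = float("nan")
for value in (-1.0, 0.0, 0.5, nan):
    # Columns: value, negated-check verdict, direct-check verdict
    print(value, reject_with_negated_check(value), reject_with_direct_check(value))
```

The two styles agree on `-1.0`, `0.0`, and `0.5`; they differ only on `nan`, where the negated check prints `True` and the direct check prints `False`.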