[SPARK-16831][PYTHON] Fixed bug in CrossValidator.avgMetrics
## What changes were proposed in this pull request?

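avgMetrics was summed, not averaged, across folds. Each fold's metric is now divided by nFolds before it is accumulated, and a doctest checks cvModel.avgMetrics[0].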

Author: =^_^= <maxmoroz@gmail.com>

Closes #14456 from pkch/pkch-patch-1.
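
To make the effect of the bug concrete, here is a minimal sketch in plain Python. The per-fold values are hypothetical and chosen only for illustration; they are not taken from the pull request. With three folds, the summed value is three times the mean, so for a metric bounded by 1 (such as the area under the ROC curve) the reported number can exceed the metric's range.

```python
# Hypothetical per-fold metric values for one parameter setting (illustration only).
fold_metrics = [0.5, 0.75, 0.25]
n_folds = len(fold_metrics)

summed = sum(fold_metrics)    # what accumulation without division reports
averaged = summed / n_folds   # the mean across folds

print(summed, averaged)
```

Here the sum is 1.5 while the mean is 0.5.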
pkch authored and srowen committed Aug 3, 2016
1 parent ae22628 commit 639df04
Showing 1 changed file with 3 additions and 1 deletion.
4 changes: 3 additions & 1 deletion python/pyspark/ml/tuning.py
@@ -166,6 +166,8 @@ class CrossValidator(Estimator, ValidatorParams):
     >>> evaluator = BinaryClassificationEvaluator()
     >>> cv = CrossValidator(estimator=lr, estimatorParamMaps=grid, evaluator=evaluator)
     >>> cvModel = cv.fit(dataset)
+    >>> cvModel.avgMetrics[0]
+    0.5
     >>> evaluator.evaluate(cvModel.transform(dataset))
     0.8333...
@@ -234,7 +236,7 @@ def _fit(self, dataset):
                 model = est.fit(train, epm[j])
                 # TODO: duplicate evaluator to take extra params from input
                 metric = eva.evaluate(model.transform(validation, epm[j]))
-                metrics[j] += metric
+                metrics[j] += metric/nFolds

         if eva.isLargerBetter():
             bestIndex = np.argmax(metrics)
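
The accumulation in this hunk can be sketched without Spark. The snippet below is an illustration under stated assumptions, not the library's code: `accumulate`, `argmax`, and the `per_fold` values are hypothetical stand-ins for the fold loop, `np.argmax`, and the evaluator's output. It suggests that dividing each contribution by the fold count rescales every entry by the same positive constant, so the selected index stays the same and only the reported values change.

```python
import math

def accumulate(per_fold_metrics, divide):
    """Accumulate metrics per parameter map.

    per_fold_metrics[i][j] is the (hypothetical) metric for fold i, param map j.
    divide=False mimics the old `+= metric`; divide=True mimics `+= metric/nFolds`.
    """
    n_folds = len(per_fold_metrics)
    num_models = len(per_fold_metrics[0])
    metrics = [0.0] * num_models
    for i in range(n_folds):
        for j in range(num_models):
            metric = per_fold_metrics[i][j]
            metrics[j] += (metric / n_folds) if divide else metric
    return metrics

def argmax(values):
    # Plain-Python stand-in for np.argmax on a 1-D sequence.
    return max(range(len(values)), key=values.__getitem__)

# Three folds, two parameter maps (made-up numbers).
per_fold = [[0.5, 0.75], [0.75, 0.75], [0.25, 0.75]]

old_metrics = accumulate(per_fold, divide=False)  # sums across folds
new_metrics = accumulate(per_fold, divide=True)   # averages across folds

# Each new entry is the old entry divided by the fold count (up to rounding).
assert all(math.isclose(o / len(per_fold), n)
           for o, n in zip(old_metrics, new_metrics))

# The chosen index does not depend on the scaling.
assert argmax(old_metrics) == argmax(new_metrics)
```

Because the ordering of the entries is preserved, the practical impact of the fix is on anyone reading `avgMetrics` directly, for example to compare scores against the metric's natural range.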