Add fix and test for sklearn GridSearchCV with LearningWithNoisyLabels #153

JohnsonKuan · 2022-04-01T00:38:04Z

This PR fixes an issue caused by calling a method (e.g. self._process_label_issues_kwargs()) in the constructor of LearningWithNoisyLabels().

This PR also adds a CI test to make sure sklearn GridSearchCV runs properly when find_label_issues_kwargs dicts are passed as hyperparameters.

The issue is similar to the one discussed here.

Code below to reproduce the issue:

from cleanlab.classification import LearningWithNoisyLabels
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression
import numpy as np


def make_linear_dataset(n_classes=3, n_samples=300):
    X, y = make_classification(
        n_samples=n_samples,
        n_features=2,
        n_redundant=0,
        n_informative=2,
        random_state=1,
        n_clusters_per_class=1,
        n_classes=n_classes,
    )
    rng = np.random.RandomState(2)
    X += 2 * rng.uniform(size=X.shape)
    return (X, y)

param_grid = {
    "find_label_issues_kwargs": [
        {"filter_by": "prune_by_noise_rate"},
        {"filter_by": "prune_by_class"},
        {"filter_by": "both"},
    ],
    "converge_latent_estimates": [True, False],
}

ds = make_linear_dataset(n_classes=3, n_samples=600)
X, y = ds
X = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0
)

clf = LogisticRegression(random_state=0, solver="lbfgs", multi_class="auto")

cv = GridSearchCV(
    estimator=LearningWithNoisyLabels(clf),
    param_grid=param_grid,
)

cv.fit(X=X_train, y=y_train)
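
For readers unfamiliar with why processing arguments in the constructor can break GridSearchCV, here is a minimal sketch that needs neither cleanlab nor scikit-learn. It assumes the failure comes from the check scikit-learn's clone() performs: after rebuilding an estimator from its parameters, each stored parameter must be the very same object that was passed to the constructor. The names BrokenEstimator, FixedEstimator, simplified_clone and DEFAULT_KWARGS are hypothetical, the parameter-free get_params() is a simplification, and simplified_clone is only a stand-in for that check, not scikit-learn's actual implementation. The merging of defaults shown here is likewise an illustration and may differ from what _process_label_issues_kwargs really does.

```python
import copy

# Illustrative default only; not taken from cleanlab's source.
DEFAULT_KWARGS = {"filter_by": "prune_by_noise_rate"}


class BrokenEstimator:
    """Hypothetical estimator that processes a dict argument in __init__."""

    def __init__(self, issue_kwargs=None):
        # The stored attribute is a new dict, not the object passed in.
        self.issue_kwargs = {**DEFAULT_KWARGS, **(issue_kwargs or {})}

    def get_params(self):
        return {"issue_kwargs": self.issue_kwargs}


class FixedEstimator:
    """Hypothetical estimator that defers the processing to fit()."""

    def __init__(self, issue_kwargs=None):
        # Store the argument untouched so cloning can verify it.
        self.issue_kwargs = issue_kwargs

    def get_params(self):
        return {"issue_kwargs": self.issue_kwargs}

    def fit(self, X, y):
        # Processing happens here; the result goes into a separate attribute.
        self.issue_kwargs_ = {**DEFAULT_KWARGS, **(self.issue_kwargs or {})}
        return self


def simplified_clone(estimator):
    """Simplified stand-in for the identity check done when cloning."""
    params = {k: copy.deepcopy(v) for k, v in estimator.get_params().items()}
    new_estimator = type(estimator)(**params)
    for name, stored in new_estimator.get_params().items():
        if stored is not params[name]:
            raise RuntimeError(
                f"constructor does not set or modifies parameter {name!r}"
            )
    return new_estimator


if __name__ == "__main__":
    simplified_clone(FixedEstimator({"filter_by": "both"}))  # passes the check
    try:
        simplified_clone(BrokenEstimator({"filter_by": "both"}))
    except RuntimeError as err:
        print("clone failed:", err)
```

Under these assumptions, moving the processing into fit() keeps the constructor a plain assignment, which is the pattern the commit "Move _process_label_issues_kwargs to fit method" describes.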

codecov · 2022-04-01T00:40:02Z

Codecov Report

Merging #153 (2c4e037) into master (0f86e7b) will not change coverage.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #153   +/-   ##
=======================================
  Coverage   87.64%   87.64%           
=======================================
  Files          12       12           
  Lines        1028     1028           
  Branches      191      191           
=======================================
  Hits          901      901           
  Misses        104      104           
  Partials       23       23

Impacted Files	Coverage Δ
cleanlab/classification.py	`96.00% <100.00%> (ø)`

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

jwmueller

Good catch!

JohnsonKuan added 4 commits March 31, 2022 16:35

Move _process_label_issues_kwargs to fit method

7193c26

Add test for sklearn GridSearchCV with kwargs

74dc092

Add comment to test for sklearn GridSearchCV with kwargs

d1adc39

Change cv=3 for GridSearchCV in test

2c4e037

JohnsonKuan requested review from jwmueller, anishathalye and cgnorthcutt April 1, 2022 00:38

JohnsonKuan changed the title ~~Add fix and test for sklearn GridSearchCV~~ Add fix and test for sklearn GridSearchCV with LearningWithNoisyLabels Apr 1, 2022

jwmueller approved these changes Apr 1, 2022

jwmueller merged commit 6d8102c into cleanlab:master Apr 1, 2022
