Add fix and test for sklearn GridSearchCV with LearningWithNoisyLabels #153

Merged (4 commits) on Apr 1, 2022

@JohnsonKuan (Contributor) commented Apr 1, 2022

This PR fixes an issue caused by calling a method (e.g. self._process_label_issues_kwargs()) in the constructor of LearningWithNoisyLabels.

This PR also adds a CI test to make sure sklearn's GridSearchCV runs properly when find_label_issues_kwargs dicts are passed as hyperparameters.

The issue is similar to the one discussed here.
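
For context, here is a minimal sketch of why doing work in a constructor can break GridSearchCV. It rests on an assumption about the cause that the PR text does not spell out: sklearn.base.clone, which GridSearchCV calls for each candidate, rebuilds the estimator from get_params() and raises a RuntimeError when the constructor did not store a parameter exactly as passed. The classes EagerEstimator and LazyEstimator and the helper _with_defaults are hypothetical stand-ins, not cleanlab code.

```python
from sklearn.base import BaseEstimator, clone


def _with_defaults(issue_kwargs):
    """Hypothetical helper: merge user kwargs over a default setting."""
    return {"filter_by": "prune_by_noise_rate", **(issue_kwargs or {})}


class EagerEstimator(BaseEstimator):
    """Hypothetical estimator that processes a parameter inside __init__."""

    def __init__(self, issue_kwargs=None):
        # A new dict is stored, so get_params() no longer returns the
        # object that was passed in; clone() treats that as an error.
        self.issue_kwargs = _with_defaults(issue_kwargs)


class LazyEstimator(BaseEstimator):
    """Hypothetical estimator that stores the parameter untouched."""

    def __init__(self, issue_kwargs=None):
        self.issue_kwargs = issue_kwargs

    def fit(self, X, y):
        # Processing is deferred to fit(); the result goes into a separate
        # attribute with a trailing underscore.
        self.issue_kwargs_ = _with_defaults(self.issue_kwargs)
        return self


def can_clone(estimator):
    """Return True if sklearn.base.clone accepts the estimator."""
    try:
        clone(estimator)
        return True
    except RuntimeError:
        return False
```

Under these assumptions, can_clone(EagerEstimator({"filter_by": "both"})) is False while can_clone(LazyEstimator({"filter_by": "both"})) is True, which is the difference GridSearchCV is sensitive to.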

The code below reproduces the issue:

from cleanlab.classification import LearningWithNoisyLabels
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression
import numpy as np


def make_linear_dataset(n_classes=3, n_samples=300):
    X, y = make_classification(
        n_samples=n_samples,
        n_features=2,
        n_redundant=0,
        n_informative=2,
        random_state=1,
        n_clusters_per_class=1,
        n_classes=n_classes,
    )
    rng = np.random.RandomState(2)
    X += 2 * rng.uniform(size=X.shape)
    return (X, y)

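# Hyperparameter grid: each find_label_issues_kwargs candidate is a dict,
# so GridSearchCV has to pass whole dicts through to the estimator.
param_grid = {
    "find_label_issues_kwargs": [
        {"filter_by": "prune_by_noise_rate"},
        {"filter_by": "prune_by_class"},
        {"filter_by": "both"},
    ],
    "converge_latent_estimates": [True, False],
}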

# Build a 3-class dataset, standardize the features, and hold out 40% for testing.
X, y = make_linear_dataset(n_classes=3, n_samples=600)
X = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0
)

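# Base classifier wrapped by LearningWithNoisyLabels.
clf = LogisticRegression(random_state=0, solver="lbfgs", multi_class="auto")

# Grid search over the wrapper's own hyperparameters.
cv = GridSearchCV(
    estimator=LearningWithNoisyLabels(clf),
    param_grid=param_grid,
)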

cv.fit(X=X_train, y=y_train)
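
A CI test for this scenario could take roughly the following shape. This is only a sketch: to stay independent of any particular cleanlab version it uses a hypothetical stand-in wrapper, PruningWrapper, with a hypothetical dict-valued parameter prune_kwargs, rather than LearningWithNoisyLabels and find_label_issues_kwargs. The point is the pattern of searching over dict-valued hyperparameters and asserting that the search completes.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV


class PruningWrapper(ClassifierMixin, BaseEstimator):
    """Hypothetical wrapper with a dict-valued hyperparameter."""

    def __init__(self, clf=None, prune_kwargs=None):
        # Parameters are stored exactly as given; nothing is processed here.
        self.clf = clf
        self.prune_kwargs = prune_kwargs

    def fit(self, X, y):
        opts = {"prune_fraction": 0.0, **(self.prune_kwargs or {})}
        first_pass = clone(self.clf).fit(X, y)
        # Probability the first-pass model assigns to each given label.
        proba = first_pass.predict_proba(X)
        given = proba[np.arange(len(y)), np.searchsorted(first_pass.classes_, y)]
        # Drop the least confident examples, then refit on the rest.
        n_drop = int(opts["prune_fraction"] * len(y))
        keep = np.argsort(given)[n_drop:]
        self.clf_ = clone(self.clf).fit(X[keep], y[keep])
        self.classes_ = self.clf_.classes_
        return self

    def predict(self, X):
        return self.clf_.predict(X)


def run_grid_search():
    """Run GridSearchCV over dict-valued candidates and return the results."""
    X, y = make_classification(
        n_samples=300, n_features=2, n_redundant=0, n_informative=2,
        n_clusters_per_class=1, n_classes=3, random_state=1,
    )
    grid = {"prune_kwargs": [{"prune_fraction": 0.0}, {"prune_fraction": 0.1}]}
    search = GridSearchCV(PruningWrapper(LogisticRegression()), grid, cv=3)
    search.fit(X, y)
    return search, grid
```

A test built on this would call run_grid_search() and check that the selected prune_kwargs is one of the supplied dicts and that every candidate was evaluated.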

codecov bot commented Apr 1, 2022

Codecov Report

Merging #153 (2c4e037) into master (0f86e7b) will not change coverage.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #153   +/-   ##
=======================================
  Coverage   87.64%   87.64%           
=======================================
  Files          12       12           
  Lines        1028     1028           
  Branches      191      191           
=======================================
  Hits          901      901           
  Misses        104      104           
  Partials       23       23           
| Impacted Files | Coverage Δ |
|---|---|
| cleanlab/classification.py | 96.00% <100.00%> (ø) |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@JohnsonKuan changed the title from "Add fix and test for sklearn GridSearchCV" to "Add fix and test for sklearn GridSearchCV with LearningWithNoisyLabels" on Apr 1, 2022
@jwmueller (Member) left a comment

Good catch!

@jwmueller merged commit 6d8102c into cleanlab:master on Apr 1, 2022