Class Imbalance Data Check for Severe imbalance #1905

Merged
merged 6 commits into from Mar 2, 2021

@bchen1116 bchen1116 commented Mar 1, 2021

part 1 of #1864

Added a check to support our severe class imbalance scenario for the new data splitter.

I still need to address how to detect multiclass class imbalances, but I'll leave that for a future PR, since I think we need to discuss the best way to identify them.

@bchen1116 bchen1116 self-assigned this Mar 1, 2021

codecov bot commented Mar 1, 2021

Codecov Report

Merging #1905 (32c4fca) into main (0367dee) will not change coverage.
The diff coverage is n/a.

@bchen1116 bchen1116 marked this pull request as ready for review March 1, 2021 21:19

@chukarsten chukarsten left a comment

LGTM

@freddyaboulton freddyaboulton left a comment

@bchen1116 This looks good to me!

@bchen1116 bchen1116 merged commit ae9a6be into main Mar 2, 2021
if len(below_threshold) and len(sample_counts):
    sample_count_values = sample_counts.index.tolist()
    severe_imbalance = [v for v in sample_count_values if v in below_threshold]
    warning_msg = "The following labels have severe class imbalance because they fall under {:.0f}% of the target and have less than {} samples: {}"

Nit-pick: "The following labels in the target have severe class imbalance because" etc.
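
For readers without the full diff, here is a minimal, self-contained sketch of the kind of check the snippet above appears to perform: a label is flagged only when it is both below a proportion threshold of the target and below a minimum sample count. The function name `find_severe_imbalance`, the parameter names `threshold` and `min_samples`, and their default values are assumptions made for illustration; they are not taken from this PR.

```python
import pandas as pd


def find_severe_imbalance(y, threshold=0.1, min_samples=100):
    """Return labels that are rare by proportion AND small by raw count.

    Hypothetical helper: the name, parameters, and defaults are
    illustrative assumptions, not the API merged in this PR.
    """
    y = pd.Series(y)
    counts = y.value_counts()            # raw count per label
    proportions = counts / counts.sum()  # fraction of the target per label

    # Labels whose share of the target is below the proportion threshold.
    below_threshold = proportions[proportions < threshold]
    # Labels with fewer rows than the minimum sample count.
    sample_counts = counts[counts < min_samples]

    # Mirrors the guard in the snippet: only report when both sets are non-empty.
    if len(below_threshold) and len(sample_counts):
        return [
            label
            for label in sample_counts.index.tolist()
            if label in below_threshold.index
        ]
    return []


if __name__ == "__main__":
    target = ["a"] * 95 + ["b"] * 5
    print(find_severe_imbalance(target, threshold=0.1, min_samples=10))
```

Requiring both conditions means a label that is proportionally rare but still has many rows (for example, 500 rows out of 10,000) is not reported as severely imbalanced under this sketch.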

@dsherry dsherry mentioned this pull request Mar 11, 2021
@freddyaboulton freddyaboulton deleted the bc_1864_imbalance branch May 13, 2022 15:35
4 participants