Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Distribution data check #21

Open
wants to merge 8 commits into
base: main
Choose a base branch
from
Open

Distribution data check #21

wants to merge 8 commits into from

Conversation

NabilFayak
Copy link
Collaborator

  • Created distribution_data_check to screen for positive and negative skews as well as bimodal distributions Distribution data check #21
  • Added simple_normalizer transformer to transform data using the Yeo-Johnson method

Resolves #20

Copy link

@bchen1116 bchen1116 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left some comments, seems like docstrings are generally not correct. Also, we should be using this data check on the overall data, not just the target values.

checkmates/data_checks/checks/distribution_data_check.py Outdated Show resolved Hide resolved
checkmates/pipelines/transformers.py Outdated Show resolved Hide resolved
Copy link

@bchen1116 bchen1116 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be best to add some starter tests here to make sure that the data check and transformer work as expected for simple cases. Right now, there's missing logic that doesn't piece things together.

Copy link

@ParthivNaresh ParthivNaresh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments, are we planning on adding tests as part of this MR or is that later?

checkmates/data_checks/checks/distribution_data_check.py Outdated Show resolved Hide resolved
pyproject.toml Outdated Show resolved Hide resolved
checkmates/data_checks/checks/distribution_data_check.py Outdated Show resolved Hide resolved
Copy link

@bchen1116 bchen1116 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of comments but almost there!

checkmates/pipelines/transformers.py Show resolved Hide resolved
checkmates/data_checks/checks/distribution_data_check.py Outdated Show resolved Hide resolved
checkmates/pipelines/transformers.py Show resolved Hide resolved
checkmates/pipelines/utils.py Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

New data check necessary for skewed distributions
3 participants