Add Jarque-Bera #2891

Merged
merged 9 commits into from
Oct 12, 2021

bchen1116
Contributor

fix #2886

@bchen1116 bchen1116 self-assigned this Oct 7, 2021

codecov bot commented Oct 7, 2021

Codecov Report

Merging #2891 (1e526db) into main (93cfaf4) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #2891     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        302     302             
  Lines      28388   28392      +4     
=======================================
+ Hits       28292   28296      +4     
  Misses        96      96             
Impacted Files Coverage Δ
...alml/data_checks/target_distribution_data_check.py 100.0% <100.0%> (ø)
...hecks_tests/test_target_distribution_data_check.py 100.0% <100.0%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

):
    X, y = X_y_regression

    random_state = 2
Contributor Author

Setting the random state to avoid flaky tests since these distributions are randomly generated.
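
As a rough illustration of why a fixed seed removes the flakiness (a standard-library sketch, not the test code from this PR; `make_lognormal_sample` is a hypothetical helper): the same seed always reproduces the same sample, while a different seed gives a different draw.

```python
import random


def make_lognormal_sample(n, seed, mu=0.0, sigma=1.0):
    """Hypothetical helper: draw n lognormal values from a seeded generator."""
    # A dedicated Random instance keeps the global RNG state untouched.
    rng = random.Random(seed)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


first = make_lognormal_sample(100, seed=2)
again = make_lognormal_sample(100, seed=2)
other = make_lognormal_sample(100, seed=3)

print(first == again)  # same seed, so the samples are identical
print(first == other)  # different seed, so the samples differ
```

Pinning the seed makes the expected values deterministic, but it does not make the underlying check any more robust to other draws, which is the concern raised in the review.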

Contributor

Can you explain this a bit more--I'm assuming random_state = 0 will not give us a distribution that would raise a warning? If that's true, I'm curious if there's a good way to make sure that this test is not relying on something flaky... or at the very least, we should comment on this so we don't come back 6 months from now wondering what this value is 😂

Contributor Author

Yep! Something both @ParthivNaresh and I noticed is that as the sample gets smaller, the distribution can end up looking more normal than lognormal, especially after dropping values more than 3 standard deviations from the mean. This can be seen here:
[image]

We are setting the random_state here so that the tests don't flake for our expected values. I can add a further comment in the file.
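
For readers unfamiliar with the test, here is a minimal sketch of the Jarque-Bera statistic combined with the 3-standard-deviation trimming described above. It uses only the standard library; the helper names (`jarque_bera`, `drop_outliers`), the lognormal parameters, and the sample sizes are illustrative assumptions, not the implementation from this PR. `scipy.stats.jarque_bera` computes the same statistic.

```python
import math
import random


def jarque_bera(values):
    """Return (statistic, p_value) for the Jarque-Bera normality test."""
    n = len(values)
    mean = sum(values) / n
    # Biased (population) central moments, as in the usual JB definition.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    statistic = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    # Under normality JB is asymptotically chi-squared with 2 degrees of
    # freedom, whose survival function is exactly exp(-x / 2).
    p_value = math.exp(-statistic / 2)
    return statistic, p_value


def drop_outliers(values, n_std=3):
    """Keep only values within n_std standard deviations of the mean."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [v for v in values if abs(v - mean) <= n_std * std]


rng = random.Random(2)  # fixed seed so the printed numbers are reproducible
for n in (30, 10_000):
    sample = drop_outliers([rng.lognormvariate(0, 1) for _ in range(n)])
    statistic, p_value = jarque_bera(sample)
    print(f"n={n}: kept={len(sample)}, JB={statistic:.2f}, p={p_value:.4f}")
```

Because the statistic scales with `n`, a small trimmed sample gives the test far less evidence against normality than a large one, which is consistent with the behaviour described in this comment.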

Contributor

@angela97lin angela97lin left a comment

Really cool that this didn't take too much to implement!! Left a comment about the flakiness but otherwise, LGTM 🥳

Contributor

@chukarsten chukarsten left a comment

This looks good to me. I am also a little concerned that there might be some flakiness with the random seed, but I think we'll find out over time and it is documented enough that I think we'd easily find out why it's failing. I think the docstring could probably use a little info to let someone reading it know what's happening with the switching of tests at a glance, but nice work!

@bchen1116 bchen1116 merged commit 3bca25f into main Oct 12, 2021
@chukarsten chukarsten mentioned this pull request Oct 14, 2021
@freddyaboulton freddyaboulton deleted the bc_2886_jb branch May 13, 2022 15:03

Successfully merging this pull request may close these issues.

Introduce Jarque-Bera for detecting Lognormal distributions
3 participants