Key result in chapter 10 sensitive to jittering #9

Open
alexklibisz opened this issue Mar 20, 2018 · 2 comments

@alexklibisz

This issue pertains to Chapter 10 and its source code in variability.py, which estimates distributions for the mean and standard deviation of male and female heights, then uses the distributions to compute distributions for the coefficient of variation for males and females. A key result seems to be that the coefficient of variation for females is higher than that of males. However, if you remove the jittering that gets applied to the original heights, this result seems to be reversed.
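
For reference, the coefficient of variation (CV) is the standard deviation divided by the mean. The snippet below is a minimal sketch of that definition only, using made-up heights; it computes a single point estimate and is not the posterior computation that variability.py performs. The function name `coefficient_of_variation` and the sample values are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Return the sample standard deviation divided by the sample mean."""
    return statistics.stdev(values) / statistics.mean(values)


# Hypothetical heights in cm, for illustration only (not the book's data).
heights = [160.0, 165.0, 170.0, 175.0, 180.0]
cv = coefficient_of_variation(heights)
print(f"CV = {cv:.4f}")
```

Because the CV is a ratio, it is dimensionless, which is what makes it usable for comparing the variability of two groups with different means.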

variability.py line 462 applies "jittering" to the list of heights.

I also modified line 266 to print a label next to each posterior mean.

If you run the script with jittering, you see that the coefficient of variation for females is greater than that of males, which matches the book's result.

$ python variability.py
...
female CV posterior mean 0.04379422911488041
male CV posterior mean 0.04151490569938492
...
female bigger 1.0000000000000628
male bigger 0

The resulting plot also matches the one in the book:

image

Now if you comment out line 462 (the jittering) and re-run the script, you see that the posterior mean of the coefficient of variation is noticeably higher for males.

$ python variability.py
...
male CV posterior mean 0.042135070189436574
female CV posterior mean 0.039877437544664336
...
female bigger 0
male bigger 1.0000000000000615

The resulting plot reflects this result.
image

My instinct is to trust the second result, as it uses the data in its raw form. Still, it would be nice to understand how this simple jittering can cause such a drastic difference in the coefficient of variation.
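
As a baseline for that question: adding independent zero-mean noise with variance s² to data with variance σ² gives variance σ² + s², so on its own jittering should raise the CV of both groups. The sketch below checks this on synthetic data. It assumes jittering means adding uniform noise to each value; the `jitter` helper, the noise width, and the group parameters are all hypothetical and are not taken from variability.py.

```python
import random
import statistics


def cv(values):
    """Population standard deviation divided by the mean."""
    return statistics.pstdev(values) / statistics.mean(values)


def jitter(values, width, rng):
    """Add uniform noise in [-width, width] to each value (assumed form of jittering)."""
    return [v + rng.uniform(-width, width) for v in values]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic groups with equal true CV (0.04) but different means; illustration only.
group_low = [rng.gauss(160.0, 6.4) for _ in range(20000)]
group_high = [rng.gauss(180.0, 7.2) for _ in range(20000)]

jittered_low = jitter(group_low, 3.0, rng)
jittered_high = jitter(group_high, 3.0, rng)

for name, raw, noisy in [("low", group_low, jittered_low),
                         ("high", group_high, jittered_high)]:
    print(f"{name}: CV raw = {cv(raw):.5f}, CV jittered = {cv(noisy):.5f}")
```

Note that in the output above the male CV posterior mean is lower with jittering (0.0415) than without it (0.0421), so plain variance inflation of this kind would not account for the reversal by itself.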

I'll post back if I can think of any solution or explanation to this problem.

@AllenDowney
Owner

Interesting. I will investigate as soon as I can, but it might be a little while.

Both distributions have some strange outliers, which have a disproportionate effect on the estimated CV. I might investigate whether something is going on there.

Thanks for raising the issue.

@manujchandra

manujchandra commented Apr 27, 2019

Hi,

Talking about jittering, why do we jitter in the first place? What is the use of jittering?
