Paired t-test plot 95% CI shifted #46

Closed
AnaFVicente opened this issue Jun 19, 2019 · 11 comments
Labels: aesthetics (issues to do with how the estimation plots look), statistics (issues to do with the statistics underlying estimation plots)

Comments

@AnaFVicente

I used the data you created at https://acclab.github.io/DABEST-python-docs/tutorial.html
I made a paired t-test plot. If I choose the median difference, the histogram corresponding to the 95% CI (grey histogram) is not aligned with the interval (black line).
image

That's the code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import dabest
from scipy.stats import norm  # Used in generation of populations.

np.random.seed(9999)  # Fix the seed so the results are replicable.
Ns = 20  # The number of samples taken from each population.

# Create samples.
c1 = norm.rvs(loc=3, scale=0.4, size=Ns)
c2 = norm.rvs(loc=3.5, scale=0.75, size=Ns)
c3 = norm.rvs(loc=3.25, scale=0.4, size=Ns)

t1 = norm.rvs(loc=3.5, scale=0.5, size=Ns)
t2 = norm.rvs(loc=2.5, scale=0.6, size=Ns)
t3 = norm.rvs(loc=3, scale=0.75, size=Ns)
t4 = norm.rvs(loc=3.5, scale=0.75, size=Ns)
t5 = norm.rvs(loc=3.25, scale=0.4, size=Ns)
t6 = norm.rvs(loc=3.25, scale=0.4, size=Ns)

# Add a gender column for coloring the data.
# Integer division: np.repeat needs an integer repeat count.
females = np.repeat('Female', Ns // 2).tolist()
males = np.repeat('Male', Ns // 2).tolist()
gender = females + males

# Add an id column for paired data plotting.
id_col = pd.Series(range(1, Ns + 1))

# Combine samples and gender into a DataFrame.
df = pd.DataFrame({'Control 1': c1, 'Test 1': t1,
                   'Control 2': c2, 'Test 2': t2,
                   'Control 3': c3, 'Test 3': t3,
                   'Test 4': t4, 'Test 5': t5,
                   'Test 6': t6,
                   'Gender': gender, 'ID': id_col
                   })

# Paired comparison: "Test 6" is the baseline, "Test 5" is compared against it.
two_groups_paired = dabest.load(df, idx=("Test 6", "Test 5"),
                                paired=True, id_col="ID")

plt.figure()
two_groups_paired.median_diff.plot()
plt.show()

@AnaFVicente
Author

Thanks in advance,

Ana

@josesho josesho self-assigned this Jun 20, 2019
@josesho josesho added the bug label Jun 20, 2019
@josesho
Member

josesho commented Jun 20, 2019

Hi @AnaFVicente , thanks for flagging this up!

I think this is a case of the linearity of means (the mean of the paired differences always equals the difference of the means), vs. the inherent weirdness of medians, which have no such property?

Using t5 and t6 from above:

>>> np.mean(t5) - np.mean(t6)
-0.023344683345558614
>>> np.mean(t5 - t6)
-0.023344683345558614 # Same result as above.

but

>>> np.median(t5) - np.median(t6)
-0.22693218492666745
>>> np.median(t5 - t6)
0.0528625540482075 # Not the same result...
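A small hand-made example may make the contrast easier to see. The arrays `a` and `b` below are hypothetical values chosen for illustration, not the t5/t6 data above. They are picked so that the difference of medians and the median of the paired differences even have opposite signs, while the two mean-based quantities agree:

```python
import numpy as np

# Hypothetical paired observations, chosen by hand for illustration only.
a = np.array([1.0, 2.0, 10.0])
b = np.array([3.0, 0.0, 4.0])

# Means: the order of "subtract" and "average" does not matter.
mean_of_diffs = np.mean(a - b)
diff_of_means = np.mean(a) - np.mean(b)

# Medians: the order matters, and here the two results differ in sign.
median_of_diffs = np.median(a - b)            # median of [-2, 2, 6]
diff_of_medians = np.median(a) - np.median(b)  # 2 - 3

print(mean_of_diffs, diff_of_means)
print(median_of_diffs, diff_of_medians)
```

A paired analysis is about `median_of_diffs`, whereas the group medians drawn on the raw-data axis correspond to `diff_of_medians`, so the two need not line up.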

Right now, I think the best option is for us to remove the mean lines for Gardner-Altman paired median plots....

Again, thanks for bringing this to our attention.

@AnaFVicente
Author

Thanks for your response.
If I understood correctly, the median of the differences (black line) is calculated with a different method than the distribution of the differences (grey histogram), and that is why the two are shifted relative to each other. Is there a way to calculate both using the same method, either np.median(t5) - np.median(t6) or np.median(t5 - t6), so that I get nice plots?
Ana

@josesho
Member

josesho commented Jun 24, 2019

The problem isn't that they are calculated in a different way; it seems to go much deeper than that. The paired median difference (t5 - t6) is positive, even though the median of t5 is lower than the median of t6....

After thinking about it for a while, there might not be a good way to depict paired median difference with the Gardner-Altman estimation plot. You might have to use the Cumming estimation plot to do so.

Simply use

 two_groups_paired.median_diff.plot(float_contrast=False)

to plot the sampling error histogram below the paired slopegraph.

@josesho josesho added the aesthetics and statistics labels and removed the bug label Jun 24, 2019
@AnaFVicente
Author

Thanks for your reply. However, if I plot the Cumming estimation plot I still have the same problem: the histogram corresponding to the 95% CI distribution (grey histogram) is not aligned with the interval (black line).

@AnaFVicente
Author

image

@AnaFVicente
Author

Would you suggest just removing the grey histogram that corresponds to the 95% CI distribution?

@josesho
Member

josesho commented Jun 25, 2019

The error curve actually is aligned with the 95% CI; bootstrap distributions derived from medians are often non-normal. If you find the error curve distracting, you could remove it in a vector graphics program, but I'd advise including it as it highlights:

  • the non-normality of the median difference
  • the graded nature of the confidence interval.
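Both points can be checked with a rough NumPy-only sketch. It rests on several assumptions: the data are hypothetical (not the t5/t6 arrays from this issue), the names `before`, `after` and `n_boot` are made up for illustration, and the interval is a plain percentile interval chosen for brevity, which is not necessarily the method dabest uses internally.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical paired measurements (assumption: not the data from this issue).
before = rng.normal(loc=3.25, scale=0.4, size=20)
after = rng.normal(loc=3.25, scale=0.4, size=20)
diffs = after - before  # one difference per pair

# Bootstrap: resample the paired differences with replacement
# and take the median of each resample.
n_boot = 5000
idx = rng.integers(0, len(diffs), size=(n_boot, len(diffs)))
boot_medians = np.median(diffs[idx], axis=1)

# Plain percentile interval (an assumption; other interval types exist).
ci_low, ci_high = np.percentile(boot_medians, [2.5, 97.5])

# Fraction of the bootstrap distribution that falls inside the interval.
inside = np.mean((boot_medians >= ci_low) & (boot_medians <= ci_high))

print("observed median difference:", np.median(diffs))
print("95% percentile interval:", ci_low, ci_high)
print("fraction of bootstrap medians inside:", inside)
# A resampled median can only take a limited set of values,
# so the bootstrap distribution tends to look lumpy rather than bell-shaped.
print("distinct bootstrap medians:", np.unique(boot_medians).size)
```

By construction roughly 95% of the resampled medians lie inside the interval, so the interval and the curve describe the same distribution even when that distribution is lumpy or skewed.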

Hope this helps!

@josesho
Member

josesho commented Jun 25, 2019

Also, we are looking into how to properly compute and display paired median differences, taking into account all we have discussed above. Thanks for flagging this up to us!

@AnaFVicente
Author

Thanks a lot for your help!

@josesho josesho closed this as completed Jun 25, 2019
@AnaFVicente
Author

Just one last question. I don't understand how the curve can be correctly aligned if it represents the distribution of the median differences while the black line represents the 95% CI. If I understand correctly, most of the curve should lie within the black line, with only 5% of the data outside it. Thanks in advance.
