Paired t-test plot 95% CI shifted #46

AnaFVicente · 2019-06-19T16:24:30Z

I used the data you created in the tutorial at https://acclab.github.io/DABEST-python-docs/tutorial.html
I made a paired t-test plot. If I choose the median difference, the histogram corresponding to the 95% CI (grey histogram) is not aligned with the interval (black line).

Here's the code:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import norm  # Used in generation of populations.

import dabest

np.random.seed(9999)  # Fix the seed so the results are replicable.
Ns = 20  # The number of samples taken from each population

# Create samples
c1 = norm.rvs(loc=3, scale=0.4, size=Ns)
c2 = norm.rvs(loc=3.5, scale=0.75, size=Ns)
c3 = norm.rvs(loc=3.25, scale=0.4, size=Ns)

t1 = norm.rvs(loc=3.5, scale=0.5, size=Ns)
t2 = norm.rvs(loc=2.5, scale=0.6, size=Ns)
t3 = norm.rvs(loc=3, scale=0.75, size=Ns)
t4 = norm.rvs(loc=3.5, scale=0.75, size=Ns)
t5 = norm.rvs(loc=3.25, scale=0.4, size=Ns)
t6 = norm.rvs(loc=3.25, scale=0.4, size=Ns)

# Add a `gender` column for coloring the data.
# np.repeat needs an integer count, so use floor division (Ns // 2), not Ns / 2.
females = np.repeat('Female', Ns // 2).tolist()
males = np.repeat('Male', Ns // 2).tolist()
gender = females + males

# Add an `id` column for paired data plotting.
id_col = pd.Series(range(1, Ns + 1))

# Combine samples and gender into a DataFrame.
df = pd.DataFrame({'Control 1': c1, 'Test 1': t1,
                   'Control 2': c2, 'Test 2': t2,
                   'Control 3': c3, 'Test 3': t3,
                   'Test 4': t4, 'Test 5': t5,
                   'Test 6': t6,
                   'Gender': gender, 'ID': id_col})

# "Test 6" is the control group; "Test 5" is the test group.
two_groups_paired = dabest.load(df, idx=("Test 6", "Test 5"),
                                paired=True, id_col="ID")

# dabest creates its own figure, so no plt.figure() call is needed.
two_groups_paired.median_diff.plot()
plt.show()
```


AnaFVicente · 2019-06-19T16:26:26Z

Thanks in advance,

Ana

josesho · 2019-06-20T06:50:07Z

Hi @AnaFVicente , thanks for flagging this up!

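I think this comes down to the linearity of the mean (the mean of the paired differences always equals the difference of the means) vs. the inherent weirdness of medians, for which no such identity holds?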

Using t5 and t6 from above:

```python
>>> np.mean(t5) - np.mean(t6)
-0.023344683345558614
>>> np.mean(t5 - t6)
-0.023344683345558614  # Same result as above.
```

but

```python
>>> np.median(t5) - np.median(t6)
-0.22693218492666745
>>> np.median(t5 - t6)
0.0528625540482075  # Not the same result...
```
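A tiny hand-made example shows the same pattern. It uses hypothetical toy numbers, not the tutorial data. The two mean summaries agree. The two median summaries disagree, and they can even have opposite signs, as t5 and t6 do above. The helper name `compare_mean_and_median` is made up for this sketch.

```python
import numpy as np


def compare_mean_and_median(a, b):
    """Contrast 'difference of summaries' with 'summary of paired differences'."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return {
        "mean(a) - mean(b)": np.mean(a) - np.mean(b),
        "mean(a - b)": np.mean(a - b),
        "median(a) - median(b)": np.median(a) - np.median(b),
        "median(a - b)": np.median(a - b),
    }


# Hypothetical toy pairs, chosen so the two median summaries disagree in sign.
a = [1.0, 2.0, 10.0]
b = [0.0, 11.0, 9.0]
for name, value in compare_mean_and_median(a, b).items():
    print(f"{name:>22}: {value:+.3f}")
# mean(a) - mean(b) and mean(a - b) are both -2.333;
# median(a) - median(b) is -7.000, while median(a - b) is +1.000.
```

The paired differences here are [1, -9, 1]. Their median is 1, but the medians of a and b are 2 and 9.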

Right now, I think the best option is for us to remove the mean lines for Gardner-Altman paired median plots....

Again, thanks for bringing this to our attention.

AnaFVicente · 2019-06-20T08:34:35Z

Thanks for your response.
If I understood correctly, the median of the differences (black line) is calculated with a different method than the distribution of the differences (grey histogram), and that's why there's a shift between the two. Is there a way to calculate both with the same method, either np.median(t5) - np.median(t6) or np.median(t5 - t6), so that I get nice plots?
Ana

josesho · 2019-06-24T07:42:24Z

The problem isn't that they are calculated in different ways. It seems to be much deeper than that. The paired median difference of t5 and t6 is positive, even though the median of t5 is lower than the median of t6....

After thinking about it for a while, there might not be a good way to depict paired median difference with the Gardner-Altman estimation plot. You might have to use the Cumming estimation plot to do so.

Simply use

```python
two_groups_paired.median_diff.plot(float_contrast=False)
```

to plot the sampling error histogram below the paired slopegraph.

AnaFVicente · 2019-06-24T09:55:02Z

Thanks for your reply. However, if I plot the Cumming estimation plot I still have the same problem: the histogram corresponding to the 95% CI distribution (grey histogram) is not aligned with the interval (black line).

AnaFVicente · 2019-06-24T09:55:25Z

AnaFVicente · 2019-06-24T10:02:09Z

Would you suggest just removing the grey histogram corresponding to the 95% CI distribution?

josesho · 2019-06-25T05:27:36Z

The error curve actually is aligned with the 95% CI; bootstraps derived from medians often have non-normal distributions. If you find the error curve distracting, you could remove it in a vector graphics program, but I'd advise including it as it highlights:

- the non-normality of the median difference
- the graded nature of the confidence interval.
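Here is a minimal sketch of a percentile bootstrap for the paired median difference. It makes the non-normality point concrete. The sketch assumes pairs are kept together by resampling the per-pair differences by index. The function `paired_median_diff_bootstrap` and its parameters are hypothetical and not part of dabest. The data are simulated stand-ins, not the tutorial's t5/t6. dabest itself reports bias-corrected and accelerated (BCa) intervals, so its numbers will differ somewhat from this simpler percentile version.

```python
import numpy as np


def paired_median_diff_bootstrap(control, test, n_boot=5000, seed=12345):
    """Hypothetical helper: percentile bootstrap of median(test - control).

    Returns (point_estimate, bootstrap_medians, (ci_low, ci_high)).
    """
    control = np.asarray(control, dtype=float)
    test = np.asarray(test, dtype=float)
    if control.shape != test.shape:
        raise ValueError("paired samples must have the same length")
    diffs = test - control  # one difference per pair keeps the pairing intact
    rng = np.random.default_rng(seed)
    # Each row is one resample of pair indices, drawn with replacement.
    idx = rng.integers(0, diffs.size, size=(n_boot, diffs.size))
    boot = np.median(diffs[idx], axis=1)
    ci_low, ci_high = np.percentile(boot, [2.5, 97.5])
    return np.median(diffs), boot, (ci_low, ci_high)


# Simulated stand-ins for "Test 6" (control) and "Test 5" (test), N = 20.
rng = np.random.default_rng(9999)
t6 = rng.normal(loc=3.25, scale=0.4, size=20)
t5 = rng.normal(loc=3.25, scale=0.4, size=20)

point, boot, (lo, hi) = paired_median_diff_bootstrap(t6, t5)
print(f"paired median difference: {point:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
# With N = 20, each bootstrap median is the average of two of the 20 differences,
# so the bootstrap distribution takes only a limited set of values (lumpy, often skewed).
print("distinct bootstrap values:", np.unique(boot).size)
```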

Hope this helps!

josesho · 2019-06-25T05:28:17Z

Also, we are looking into how to properly compute and display paired median differences, taking into account all we have discussed above. Thanks for flagging this up to us!

AnaFVicente · 2019-06-25T07:27:02Z

Thanks a lot for your help!

AnaFVicente · 2019-06-25T15:56:27Z

Just one last question. I don't understand how the curve can be correctly aligned if it represents the distribution of median differences while the black line represents the 95% CI. Most of the curve should be inside the black line; only 5% of the data should fall outside, if I understand correctly. Thanks in advance.
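Here is one hedged way to check the intuition. It assumes the grey curve is a kernel density estimate drawn over the bootstrap samples, as a violin plot would draw it. By construction, about 95% of the raw bootstrap values fall inside a percentile 95% CI. A smoothed Gaussian KDE, however, always spreads some of its area beyond the interval ends. With a lumpy, skewed bootstrap distribution, this can make the curve look shifted relative to the interval. The helper `ci_coverage` and the simulated differences are hypothetical.

```python
import numpy as np
from scipy.stats import gaussian_kde


def ci_coverage(boot, level=95.0):
    """Hypothetical helper: share of raw bootstrap values vs. share of a
    Gaussian-KDE curve's area lying inside the percentile CI."""
    boot = np.asarray(boot, dtype=float)
    tail = (100.0 - level) / 2.0
    lo, hi = np.percentile(boot, [tail, 100.0 - tail])
    raw_share = np.mean((boot >= lo) & (boot <= hi))
    kde_share = gaussian_kde(boot).integrate_box_1d(lo, hi)
    return (lo, hi), raw_share, kde_share


# Simulated paired differences (hypothetical), then bootstrap medians of them.
rng = np.random.default_rng(0)
diffs = rng.normal(loc=0.0, scale=0.5, size=20)
idx = rng.integers(0, diffs.size, size=(5000, diffs.size))
boot = np.median(diffs[idx], axis=1)

(lo, hi), raw_share, kde_share = ci_coverage(boot)
print(f"95% CI [{lo:.3f}, {hi:.3f}]")
print(f"raw bootstrap values inside CI: {raw_share:.3f}")  # about 0.95 or slightly more
print(f"smoothed curve area inside CI:  {kde_share:.3f}")  # KDE tails always leave some area outside
```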

josesho self-assigned this Jun 20, 2019

josesho added the bug label Jun 20, 2019

AnaFVicente closed this as completed Jun 24, 2019

AnaFVicente reopened this Jun 24, 2019

josesho added the aesthetics (issues to do with how the estimation plots look) and statistics (issues to do with the statistics underlying estimation plots) labels and removed the bug label Jun 24, 2019

josesho closed this as completed Jun 25, 2019

AnaFVicente mentioned this issue in Paired t-test alignment #48 (Closed), Jul 4, 2019
