Presence of NaN in unrelated columns breaks DABEST #44

Closed
DizietAsahi opened this issue Jun 17, 2019 · 1 comment

@DizietAsahi
Contributor

While working on a large dataframe with several columns, only some of which are meant to be analyzed with dabest, I realized that columns unrelated to the comparison I'm trying to do (i.e. columns that are not passed as the x/y parameters) interfere with the results.

Demonstration:

dabest.__version__
'0.2.4'

create example dataframe

import numpy as np
import pandas as pd
import dabest

df = pd.DataFrame(
    {'groups': np.random.choice(['Group 1', 'Group 2', 'Group 3'], size=(100,)),
     'value': np.random.random(size=(100,))})
df['unrelated'] = np.nan  # all-NaN column, never passed as x or y
df.head()
    groups     value  unrelated
0  Group 1  0.592223        NaN
1  Group 1  0.432398        NaN
2  Group 3  0.714241        NaN
3  Group 1  0.889762        NaN
4  Group 1  0.388109        NaN

compare Group 1 vs Group 2:

test = dabest.load(data=df, x='groups', y='value', idx=['Group 1', 'Group 2'])
test.mean_diff

This generates a bunch of warnings:

.../numpy/core/fromnumeric.py:3118: RuntimeWarning: Mean of empty slice.
out=out, **kwargs)
.../numpy/core/_methods.py:85: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
.../dabest/_stats_tools/confint_2group_diff.py:157: RuntimeWarning: invalid value encountered in less
prop_less_than_es = sum(B < effsize) / len(B)
.../dabest/_classes.py:545: UserWarning: The lower limit of the BCa interval cannot be computed. It is set to the effect size itself. All bootstrap values were likely all the same.
stacklevel=0)
.../dabest/_classes.py:550: UserWarning: The upper limit of the BCa interval cannot be computed. It is set to the effect size itself. All bootstrap values were likely all the same.
stacklevel=0)
.../scipy/stats/stats.py:5001: RuntimeWarning: divide by zero encountered in double_scalars
z = (bigu - meanrank) / sd
.../numpy/core/fromnumeric.py:3367: RuntimeWarning: Degrees of freedom <= 0 for slice
**kwargs)
.../numpy/core/_methods.py:110: RuntimeWarning: invalid value encountered in true_divide
arrmean, rcount, out=arrmean, casting='unsafe', subok=False)
.../numpy/core/_methods.py:132: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)

and then the result is incorrect:

(...)
The unpaired mean difference between Group 1 and Group 2 is nan [95%CI nan, nan].
The two-sided p-value of the Mann-Whitney test is 0.0.
(...)

Running the same analysis while keeping only the relevant columns generates the correct result:

test = dabest.load(data=df[['groups','value']], x='groups', y='value', idx=['Group 1', 'Group 2'])
test.mean_diff

(...)
The unpaired mean difference between Group 1 and Group 2 is -0.0708 [95%CI -0.202, 0.0631].
The two-sided p-value of the Mann-Whitney test is 0.268.
(...)
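
The workaround above can be wrapped in a small helper so the column selection is not repeated at every call. This is only a sketch: `keep_relevant_columns` is a hypothetical name, not part of dabest, and the `dabest.load` call is left as a comment so the snippet runs with pandas and NumPy alone.

```python
import numpy as np
import pandas as pd


def keep_relevant_columns(data, x, y):
    """Return a copy of `data` restricted to the x and y columns.

    Hypothetical helper (not a dabest function): it only drops the
    columns that are not part of the comparison.
    """
    return data[[x, y]].copy()


# Rebuild a frame shaped like the one in the report.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "groups": rng.choice(["Group 1", "Group 2", "Group 3"], size=100),
    "value": rng.random(size=100),
})
df["unrelated"] = np.nan  # all-NaN column, unrelated to the comparison

trimmed = keep_relevant_columns(df, x="groups", y="value")
print(trimmed.columns.tolist())
print(len(trimmed))

# The trimmed frame can then be passed on as in the report:
# dabest.load(data=trimmed, x="groups", y="value", idx=["Group 1", "Group 2"])
```

Returning a copy keeps the original dataframe untouched, so the other columns remain available for later analyses.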

Alternatively, if the unrelated column(s) do not contain NaNs, everything works as expected:

df.unrelated = 0

test = dabest.load(data=df, x='groups', y='value', idx=['Group 1', 'Group 2'])
test.mean_diff

(...)
The unpaired mean difference between Group 1 and Group 2 is -0.0708 [95%CI -0.202, 0.0631].
The two-sided p-value of the Mann-Whitney test is 0.268.
(...)
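
One possible explanation, which is an assumption on my part and not something confirmed in this thread, is that rows containing a NaN in any column get dropped before the statistics are computed. That would leave both groups empty and would be consistent with the "Mean of empty slice" warnings. The pandas-only sketch below shows the difference between dropping on all columns and dropping only on the columns used in the comparison; it does not call dabest.

```python
import numpy as np
import pandas as pd

# Small frame with the same structure as the example in the report.
df = pd.DataFrame({
    "groups": ["Group 1", "Group 2"] * 3,
    "value": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
})
df["unrelated"] = np.nan  # every row has a NaN in this column

# Dropping rows with a NaN in ANY column removes every row.
dropped_any = df.dropna()

# Dropping only on the columns of interest keeps every row.
dropped_subset = df.dropna(subset=["groups", "value"])

print(len(dropped_any), len(dropped_subset))

# The mean of the resulting empty column is NaN, matching the
# "nan [95%CI nan, nan]" symptom if this is indeed the mechanism.
empty_mean = dropped_any["value"].mean()
print(empty_mean)
```

If this is the cause, restricting the NaN check to the x and y columns would make the unrelated column harmless, which matches the behaviour seen once the NaNs are replaced with 0.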

@josesho josesho self-assigned this Jun 18, 2019
@josesho josesho added the bug label Jun 18, 2019
@josesho josesho added this to the v0.2.5 milestone Jun 18, 2019
@josesho
Member

josesho commented Jun 18, 2019

Thanks for the excellent diagnosis of the problem, @DizietAsahi! This was very recently brought to my attention by a colleague as well. Expect a bugfix shortly. Thanks!

This was referenced Sep 3, 2019
@josesho josesho closed this as completed Sep 6, 2019