This repository has been archived by the owner on Nov 19, 2020. It is now read-only.

"CDF computation generated NaN values" thrown for large dataset in Mann Whitney Wilcoxon Test #1327

Open
1 of 3 tasks
zhouchgh opened this issue Apr 20, 2018 · 0 comments

Comments

@zhouchgh
Contributor

What would you like to submit? (put an 'x' inside the bracket that applies)

  • question
  • bug report
  • feature request

Issue description
The MannWhitneyWilcoxonTest class throws a "CDF computation generated NaN values" exception when each sample contains 50,000 or more values.

Here's the code snippet I'm trying to execute:

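var trandomA = new TRandom();
var trandomB = new TRandom(); // second generator for sample B
// ToArray() because the test's constructor takes double[] samples
var A = trandomA.ExponentialSamples(1.0).Take(50000).ToArray();
var B = trandomB.ExponentialSamples(1.0).Take(50000).ToArray();
var mannWhitneyWilcoxonTest = new MannWhitneyWilcoxonTest(A, B);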

Constructing the test throws the "CDF computation generated NaN values" exception.

After looking into the source of MannWhitneyWilcoxonTest and MannWhitneyDistribution, I found several places that perform a calculation like this:
NumberOfSamples1 * (NumberOfSamples1 + 1)

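Since NumberOfSamples1 is of type int, I suspect the issue is caused by integer overflow, though I have not confirmed this.
My hunch: for a sample size of 50,000 the product no longer fits in an int, since 50000 * 50001 = 2,500,050,000 > 2,147,483,647 (int.MaxValue).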
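
To make the overflow hypothesis concrete, here is a minimal standalone sketch that does not use Accord.NET. The class and method names (OverflowDemo, ProductAsInt, ProductAsLong) are hypothetical and exist only for this illustration. The Math.Sqrt call is an assumption about how a wrapped, negative product could turn into NaN further downstream; whether the library actually follows that path has not been verified here.

```csharp
using System;

public static class OverflowDemo
{
    // Product evaluated entirely in 32-bit int arithmetic.
    // C# is unchecked by default for non-constant expressions, so this wraps silently.
    public static int ProductAsInt(int n)
    {
        return unchecked(n * (n + 1));
    }

    // Widening one operand to long first makes the whole product 64-bit.
    public static long ProductAsLong(int n)
    {
        return (long)n * (n + 1);
    }

    public static void Main()
    {
        int n = 50000;

        int wrapped = ProductAsInt(n);
        long exact = ProductAsLong(n);

        Console.WriteLine(wrapped); // -1794917296 (2,500,050,000 wrapped modulo 2^32)
        Console.WriteLine(exact);   // 2500050000

        // Assumption for illustration only: a negative intermediate value
        // passed to a square root yields NaN.
        Console.WriteLine(double.IsNaN(Math.Sqrt(wrapped))); // True
    }
}
```

If this hypothesis holds, casting one operand to long (or double) before multiplying, as in ProductAsLong, would be one possible way to avoid the wraparound.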
