Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exception thrown when firstQuartile and thirdQuartile are equal in @vx/stats/computeStats #427

Closed
mjsarfatti opened this issue Feb 25, 2019 · 8 comments · Fixed by #841
Closed

Comments

@mjsarfatti
Copy link

In a way, it's a very uncommon case. On the other hand it can happen more often than not if your dashboard lets users filter data freely.

My data array turned out as follows: [10000, 2400, 10000, 10000]

If I feed it into computeStats, the function will compute:
firstQuartile: 10000
thirdQuartile: 10000
IQR: 0 (firstQuartile - thirdQuartile)

Now, IQR is used as a multiplier and as a divisor in several places, turning up 0's and NaN's that eventually just crash the program.

Should computeStats "exit gracefully", for example returning an empty { boxPlot, binData } object and a readable console error stating that the stats provided do not allow to compute a boxPlot?

I guess I can easily wrap the function call in a try/catch block for now, but I wouldn't rely on the fact that every developer out there will test their app with such a strange array. But their apps will crush in production once fed with real world data...

@mjsarfatti mjsarfatti changed the title Exception thrown when firstQuartile and thirdQuartile are equal Exception thrown when firstQuartile and thirdQuartile are equal in @vx/stats/computeStats Feb 25, 2019
@hshoff
Copy link
Member

hshoff commented Feb 26, 2019

Yes computeStats should handle error states such as above gracefully. Might take me a bit to get around to fixing this, but will tag it with a help wanted label in the mean time. Happy to review PRs that come in.

@mjsarfatti
Copy link
Author

mjsarfatti commented Feb 26, 2019 via email

@hshoff
Copy link
Member

hshoff commented Feb 27, 2019

We would want it to end up something like this:
screen shot 2019-02-26 at 8 32 34 pm

@mjsarfatti
Copy link
Author

I also think that the median, firstQuartile and thirdQuartile are not computed correctly in computeStats.

For example, the median of the array [1, 2, 3, 4] would be the average (mean) of the numbers 2 and 3, hence 2.5. While the median of the array [1, 2, 3, 4, 5] is 3.

computeStats instead would, in the first case, select 3 as the median. A similar thing goes on with firstQuartile and thirdQuartile.

Unless there is a specific reason for this, I'd correct it as well.

@williaster
Copy link
Collaborator

@conglei ^any thoughts on this computation?

@conglei
Copy link

conglei commented Mar 1, 2019

I also think that the median, firstQuartile and thirdQuartile are not computed correctly in computeStats.

For example, the median of the array [1, 2, 3, 4] would be the average (mean) of the numbers 2 and 3, hence 2.5. While the median of the array [1, 2, 3, 4, 5] is 3.

computeStats instead would, in the first case, select 3 as the median. A similar thing goes on with firstQuartile and thirdQuartile.

Unless there is a specific reason for this, I'd correct it as well.

Thanks @mjsarfatti for pointing it out. Yes, it is a bug. And in terms of the corner case you mentioned, I agree that we should handle that case more carefully.

@conglei
Copy link

conglei commented Mar 1, 2019

We would want it to end up something like this:
screen shot 2019-02-26 at 8 32 34 pm

Yes, the result should be like this, and I remembered it should be the case. Will double check. @mjsarfatti if you haven't start working on the fix PR, I'm happy to fix this issue next week if it is not too late for you.

@mjsarfatti
Copy link
Author

I haven't really really started on it, so if you are tackling this anyway I'll leave it to you.
I did implement a fix on a locally downloaded version of computeStats that I heavily customized, but in case it can save you some work:

export default function computeStats(numericalArray) {
  const points = [...numericalArray].sort((a, b) => a - b);
  const sampleSize = points.length;

  const getMedian = dataSet => {
    if (dataSet.length % 2 === 1) {
      return dataSet[(dataSet.length - 1) / 2];
    }
    return (dataSet[dataSet.length / 2 - 1] + dataSet[dataSet.length / 2]) / 2;
  };

  const median = getMedian(points);

  const lowerHalfLength = Math.floor(sampleSize / 2);
  const lowerHalf = points.slice(0, lowerHalfLength);
  const firstQuartile = getMedian(lowerHalf);

  const upperHalfLength = Math.ceil(sampleSize / 2);
  const upperHalf = points.slice(upperHalfLength);
  const thirdQuartile = getMedian(upperHalf);

  const IQR = thirdQuartile - firstQuartile;

  [...]

LethalPants added a commit to LethalPants/visx that referenced this issue Oct 5, 2020
williaster pushed a commit that referenced this issue Oct 6, 2020
…uartile are equal (#841)

* fix: [computeStats] change method to calculate first and third quartile to fix #427

* test: [computeStats] add test for edge case mentioned in #427

* refactor: move `calcMedian` out of `computeStats`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants