Skip to content
This repository has been archived by the owner on Nov 7, 2018. It is now read-only.

Stats queries return integers as numbers, floats as strings #278

Open
yozlet opened this issue Jan 26, 2016 · 10 comments
Open

Stats queries return integers as numbers, floats as strings #278

yozlet opened this issue Jan 26, 2016 · 10 comments

Comments

@yozlet
Copy link
Contributor

yozlet commented Jan 26, 2016

Example:

{
  "aggregations": {
    "school.tuition_revenue_per_fte": {
      "count": 7362,
      "min": 0,
      "max": 207213,
      "avg": "0.9954935343656614E4",
      "sum": 73288234,
      "sum_of_squares": 1249896256472,
      "variance": "0.7067598825743325E8",
      "std_deviation": "0.8406901228005076E4",
      "std_deviation_bounds": {
        "upper": "0.26768737799666764E5",
        "lower": "-0.6858867112353537E4"
      }
    }
  }
}

... whereas what we get when querying ES directly is:

{
  "aggregations": {
    "school.tuition_revenue_per_fte": {
      "count": 7362,
      "min": 0,
      "max": 207213,
      "avg": 9954.935343656614,
      "sum": 7.3288234E7,
      "sum_of_squares": 1.249896256472E12,
      "variance": 7.067598825743325E7,
      "std_deviation": 8406.901228005076,
      "std_deviation_bounds": {
        "upper": 26768.737799666764,
        "lower": -6858.867112353537
      }
    }
  }
}

I initially thought this was something to do with Ruby mis-parsing scientific notation, but now I suspect it's something in our type-detection output code.

@ultrasaurus
Copy link
Contributor

@ultrasaurus
Copy link
Contributor

@siruguri would you have time to take a look at what's going on here?

@siruguri
Copy link
Contributor

I have to re-install elastic search because I got a new laptop, and remind
myself how to load up some dummy data... I can take a whirl at that this
afternoon, and hopefully at least point out what the deal is... not sure if
I can get around to fixing/testing it in the next few days...

On Wed, Feb 17, 2016 at 2:19 PM, Sarah Allen notifications@github.com
wrote:

@siruguri https://github.com/siruguri would you have time to take a
look at what's going on here?


Reply to this email directly or view it on GitHub
#278 (comment)
.

@siruguri
Copy link
Contributor

in controllers.rb, line 75, the .to_json calls .to_s on each item in
the hash data. Floats are BigDecimals; so to_s makes them strings which I
think is the behavior of BigDecimal, whereas the integers are Fixnums
(maybe?).

So the solution would be either to intercept the to_s if possible in
JSON, or post-process the output.

I am not too sure when I can get to this, maybe next week? I know I also
have a failing test request on another issue.

On Wed, Feb 17, 2016 at 2:27 PM, Sameer Siruguri siruguri@gmail.com wrote:

I have to re-install elastic search because I got a new laptop, and remind
myself how to load up some dummy data... I can take a whirl at that this
afternoon, and hopefully at least point out what the deal is... not sure if
I can get around to fixing/testing it in the next few days...

On Wed, Feb 17, 2016 at 2:19 PM, Sarah Allen notifications@github.com
wrote:

@siruguri https://github.com/siruguri would you have time to take a
look at what's going on here?


Reply to this email directly or view it on GitHub
#278 (comment)
.

@siruguri
Copy link
Contributor

So the writer of BigDecimal actually wanted the default implementation to be that floating points are rendered as strings by the to_json method, to reduce the possibility of (small) errors during de/re-serialization of data. As far as I could figure, it seems like monkey-patching BigDecimal is the only solution, with the caveat that it will produce this corner-case error.

Should I do that? Should we instead let the numbers be strings so that the client system can decide how to re-cast to their version of the floating-point type?

@yozlet
Copy link
Contributor Author

yozlet commented Mar 4, 2016

Sorry to be so slow, @siruguri! Ugh, this is a nasty one, and I'm really torn. Providing floats as strings feels like giving a worse user/developer experience in return for a level of accuracy that - in all the existing use cases, anyway - really isn't needed. On the other hand, Open-data-maker is meant to be domain-agnostic, and I can't speak for the needs of future data sets and their users. (On the other other hand, I don't know what ES uses for its floats internally; the accuracy may already have been compromised before the stats were emitted.)

My inclination is to monkey-patch in favour of a better user experience. What does everyone else think?

@siruguri
Copy link
Contributor

siruguri commented Mar 8, 2016

I'm open to either solution though I think I'd rather un-string the value
in the open-data-maker code than monkey-patch BigDecimal. I'll work on this
today.

On Fri, Mar 4, 2016 at 12:18 PM, Yoz Grahame notifications@github.com
wrote:

Sorry to be so slow, @siruguri https://github.com/siruguri! Ugh, this
is a nasty one, and I'm really torn. Providing floats as strings feels like
giving a worse user/developer experience in return for a level of accuracy
that - in all the existing use cases, anyway - really isn't needed. On the
other hand, Open-data-maker is meant to be domain-agnostic, and I can't
speak for the needs of future data sets and their users. (On the other
other hand, I don't know what ES uses for its floats internally; the
accuracy may already have been compromised before the stats were emitted.)

My inclination is to monkey-patch in favour of a better user experience.
What does everyone else think?


Reply to this email directly or view it on GitHub
#278 (comment)
.

@siruguri
Copy link
Contributor

siruguri commented Mar 8, 2016

In fact, now that I think about it,

Providing floats as strings feels like giving a worse user/developer experience in return for a level of accuracy that - in all the existing use cases, anyway - really isn't needed.

sounds equiv to, "let's cast the BigDecimal object via .to_f" Should I do that instead? Thoughts? Then .to_json will continue to produce non-stringified numbers.

@yozlet
Copy link
Contributor Author

yozlet commented Mar 8, 2016

@siruguri Yes, that sounds like the right option to me. Thank you!

siruguri added a commit to siruguri/open-data-maker that referenced this issue Mar 9, 2016
* We prefer having floats in our JSON strings, rather than strings when
  the source data type is BigDecimal.
@siruguri
Copy link
Contributor

siruguri commented Mar 9, 2016

#303

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants