When encoding floating point numbers, always include a `.` #181

Closed
sol opened this Issue Jan 31, 2014 · 5 comments

Contributor

sol commented Jan 31, 2014

With Aeson 0.7.0.0 the encoding behavior for floating point numbers changed. Basically 2.0 :: Double and 2 :: Int are encoded the same way (I haven't looked at the code, but I expect this to be due to the switch to scientific /cc @basvandijk).

You could argue that this behavior is perfectly fine; after all, JSON does not really distinguish between floats and integers. Still, it can lead to unexpected behavior when working with other systems (in our case it introduced a serious bug that we cannot easily fix in our code).

Not sure if it's easy to change the current behavior, but I'd be willing to work on a patch, if it will be accepted upstream (e.g. by tagging the Number constructor).

What do you guys think?

Steps to reproduce:

ghci> :set -XOverloadedStrings
ghci> import Data.Aeson
ghci> encode $ object ["foo" .= (2 :: Double)]

With 0.6.2.1 this resulted in:

{"foo":2.0}

With 0.7.0.0 the behavior changed, now it's:

{"foo":2}
Collaborator

basvandijk commented Mar 17, 2014

With Aeson 0.7.0.0 the encoding behavior for floating point numbers changed. Basically 2.0 :: Double and 2 :: Int are encoded the same way (I haven't looked at the code, but I expect this to be due to the switch to scientific /cc @basvandijk).

Indeed, a Scientific number stores the coefficient and the base-10 exponent of the number:

data Scientific = Scientific
    { coefficient    ::                !Integer -- ^ The coefficient of a scientific number.
    , base10Exponent :: {-# UNPACK #-} !Int     -- ^ The base-10 exponent of a scientific number.
    }

If the exponent is non-negative, the number is an integer. If it's negative, it is treated as a floating point number.
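The loss of the distinction can be seen directly with the scientific package (a sketch, assuming Data.Scientific is in scope; the values shown in the comments follow from the normalized representation):

```haskell
import Data.Scientific (Scientific, fromFloatDigits, coefficient, base10Exponent)

main :: IO ()
main = do
  let d = fromFloatDigits (2.0 :: Double)  -- built from a Double
      i = fromInteger 2 :: Scientific      -- built from an Integer
  -- Both normalize to coefficient 2 with exponent 0, so the ".0" is gone
  -- before the encoder ever sees the value:
  print (coefficient d, base10Exponent d)
  print (d == i)
```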

You could argue that this behavior is perfectly fine, JSON does not really distinguish between floats and integers after all. Anyway, this can lead to unexpected behavior when working with other systems (in our case this introduced a serious bug that we can not easily fix in our code).

I'm sorry to hear that! Although this sounds like a bug in that software, since a floating point number should be parseable from an integer literal, as in Haskell:

ghci> read "1" :: Float
1.0

Not sure if it's easy to change the current behavior, but I'd be willing to work on a patch, if it will be accepted upstream (e.g. by tagging the Number constructor).

There is some precedent for doing this, since Haskell always prints floating point numbers with a decimal point (even when it's redundant).

If we implement this, it would probably make sense to add a Bool flag to the Scientific constructor which, when True, causes the number to always be rendered with a decimal point.

The parser for scientific numbers would then also need to change so that it sets this flag when a decimal point is detected.

The ToJSON instances for Double and Float would likewise set this flag.
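A rough sketch of what that tagged representation and its renderer could look like (the field name alwaysDecimal is invented for illustration and is not part of any actual scientific API; the fractional case is deliberately simplified):

```haskell
-- Hypothetical sketch only; 'alwaysDecimal' is a made-up field name.
data Scientific = Scientific
  { coefficient    :: !Integer  -- coefficient of the number
  , base10Exponent :: !Int      -- base-10 exponent
  , alwaysDecimal  :: !Bool     -- when True, always render with a decimal point
  }

-- A renderer consulting the flag; only the integral case matters here:
render :: Scientific -> String
render (Scientific c e forceDot)
  | e >= 0 && forceDot = show (c * 10 ^ e) ++ ".0"
  | e >= 0             = show (c * 10 ^ e)
  | otherwise          = show c ++ "e" ++ show e  -- simplified fractional case
```

The ToJSON instances for Double and Float would construct values with the flag set, while integral instances would leave it unset, preserving the round-trip distinction between 2 and 2.0.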

What do you guys think?

I'm a bit concerned about the space implications of adding this Bool field to Scientific, but I can see that this is a simple way of working around buggy software.

@bos what do you think?

Owner

bos commented Apr 11, 2014

The current behaviour and the old behaviour are both correct, as JavaScript does not have separate integer and floating point types. I'm sorry that this has resulted in a bug for you, @sol, but I don't think we should change this. Whatever code does the wrong thing when presented with "2" instead of "2.0" is at fault.

bos closed this Apr 11, 2014

Contributor

sol commented Apr 12, 2014

Well, this basically means that you cannot use aeson to interact with Elasticsearch. Of course, we can pretend that they should just fix things, but that may not happen, which basically means we need another JSON library that gives us more thorough control over encoding behavior. (/ccing @bitemyapp, who is currently working on an Elasticsearch interface that uses aeson)

@bos This is crippling for the Haskell Elasticsearch library I'm working on.

From a narrow JSON/Number-is-always-IEEE-754 perspective, 2 being a valid encoding of a Double makes sense, but surely you can imagine that there are plenty of services out there that use JSON as a protocol format yet still make the distinction?

Aeson is by this point a vital piece of infrastructure for most Haskell users. Sometimes accommodations must be made for annoying deviations from what is strictly "standard" (strict conformance is almost never a primary consideration in JSON-based protocols).

Elasticsearch is a document store and search engine. It has to make the distinction between integers and floating point numbers even if the API uses JSON.

The point of a library like Aeson is to be able to communicate with other machines and services; strict adherence to the standard in this matter makes Aeson less, rather than more, capable in that regard.

The standard says nothing that obligates Aeson to follow this behavior. The digits preceding a . are called the integer component, and the digits following it are called the fraction component. The standard does not declare "2.0" to be invalid, nor does it suggest that 2.0 should be simplified to 2. I am comfortable suggesting that this flexibility was left in precisely so that distinctions between fractional and non-fractional numbers could be expressed.
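For reference, the number grammar from RFC 4627 (the JSON specification current at the time) makes the fraction component explicitly optional and never mandates normalization:

```
number        = [ minus ] int [ frac ] [ exp ]
frac          = decimal-point 1*DIGIT
decimal-point = %x2E   ; .
```

Nothing here forces "2.0" to be emitted as "2"; both are valid encodings of the same value, and an encoder is free to choose either.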

Aeson uses arbitrary precision representations despite the fact that this has no precedent in JavaScript Number implementations and is not required by the JSON standard.

Aeson's behavior on this issue is neither in the spirit nor the letter of the standard.

Fixing this issue doesn't compromise Aeson's adherence to the standard nor would it break any "expectations" from non-Aeson clients. It would improve Aeson's usefulness and applicability.

We cannot move the mountain to Mohammed on this. Not least of all because as Haskell users we don't have the numbers to lobby for changes in data stores like Elasticsearch.

Expecting Elasticsearch to coerce the types and potentially lose information is less reasonable and more surprising than the fix proposed here.

Please consider reopening the issue.

Just to clarify, I do NOT want or propose the library do anything that could destroy information. If anything, I am asking for behavior that reduces the possibility of this.

Summary: some services that use JSON parse 2 as an int and 2.0 as a double. This is used to express intent, for type safety, and to prevent data loss. Most JSON clients handle this.

If the present behavior must be preserved, is there a way to accommodate both?
