Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

Enhance representation of Numbers #22

Closed
hvr opened this Issue Sep 11, 2011 · 6 comments

Comments

Projects
None yet
4 participants
Collaborator

hvr commented Sep 11, 2011

This issue actually covers two items:

  1. optimize memory layout for small integers
  2. add support for arbitrary precision fractional values

  1. Currently in aeson's IR small integers requires 3 heap objects for being represented:
data Data.Aeson.Types.Value
  = ...
  | Data.Aeson.Types.Number Data.Attoparsec.Number.Number

data Data.Attoparsec.Number.Number
  = Data.Attoparsec.Number.I !Integer
  | Data.Attoparsec.Number.D {-# UNPACK #-} !Double

data Integer
  = integer-gmp:GHC.Integer.Type.S# GHC.Prim.Int#
  | integer-gmp:GHC.Integer.Type.J# GHC.Prim.Int# GHC.Prim.ByteArray#

When representing a list of 1000 small integer values, this requires 3000 heap objects, each of which requires 2 words, leading to a space requirement of 6000 words (= ~48 KiB on 64bit). This could be reduced by 1/3 down to 4000 words by adding an additional constructor to Number:

data Data.Attoparsec.Number.Number
  = Data.Attoparsec.Number.I !Int
  | Data.Attoparsec.Number.IB !Integer
  | Data.Attoparsec.Number.D {-# UNPACK #-} !Double

and adapting attoparsec and aeson to use the I constructor if the parsed integer is representable with a plain Int.


  1. arbitrary precision decimals

Currently a decode followed by a encode via aeson is not loss-free, as it has to reduce the parsed decimal to the numbers representable by the Double type. Here's an example that shows the problem already occuring with small values as they occur in monetary applications:

{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson.Parser
import Data.Aeson
import qualified Data.Attoparsec as AP
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString as B

toStrictBS = B.concat . BL.toChunks

main = do
  let x = AP.parseOnly json "[1.62,1.64,1.82]"
  print x
  print $ toStrictBS (encode x)

when executed this outputs:

Right (Array fromList [Number 1.62,Number 1.6400000000000001,Number 1.8199999999999998] :: Data.Vector.Vector)
"[1.62,1.6400000000000001,1.8199999999999998]"

A similar approach as in 1. could be used to use the Double type when the parsed value can be properly represented by it, and if not switch to an arbitrary-precision representation, for instance something optimized for base-10 such as

data Decimal = Decimal { decSignificant, decExponent :: !Integer }

toRational (Decimal s e) | e >= 0     = toRational (s * 10^e)
                         | otherwise  = s % (10 ^ (-e))
Owner

bos commented Sep 19, 2011

For future reference, please open two issues if you have two things to discuss :-)

Collaborator

basvandijk commented Nov 3, 2011

Regarding:

data Data.Attoparsec.Number.Number
  = Data.Attoparsec.Number.I !Int
  | Data.Attoparsec.Number.IB !Integer
  | Data.Attoparsec.Number.D {-# UNPACK #-} !Double

It's a little unfortunate that both this type and Integer make a choice between small and large integers.

Ideally we would have a type that represents large integers only:

data LargeInteger = J# Int# ByteArray#

(I guess this type can be implemented easily by just copying GHC.Integer (from integer-gmp) and cutting out the small integer S# constructors.)

then we can have:

data Data.Attoparsec.Number.Number
  = Data.Attoparsec.Number.I  {-# UNPACK #-} !Int
  | Data.Attoparsec.Number.IB {-# UNPACK #-} !LargeInteger
  | Data.Attoparsec.Number.D  {-# UNPACK #-} !Double
Collaborator

hvr commented Nov 3, 2011

Yes, that'd be even better :-)

I wonder if GHC HQ could be persuaded into adding such a LargeInteger (maybe BigInt would be a better naming) in the GHC-module namespace, as it might be generally useful to have a not-so-smart single-constructor Integer for some applications...

iustin commented Dec 4, 2012

Hi all,

This issue seems to have stalled. I'm interested as well in a more efficient representation, is there any way I could help?

Owner

bos commented Dec 6, 2012

If someone can come up with a patch and a benchmark that demonstrates its value, I'll apply it. It doesn't look difficult.

Owner

bos commented Nov 22, 2013

I think that the newly(ish) introduced Scientific type addresses this.

@bos bos closed this Nov 22, 2013

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment