A general serializer for Data.Map #79

Open
nponeccop opened this Issue · 11 comments

4 participants

@nponeccop

I don't know whether the idea is good, but look at this rough proof-of-concept code:

class ToPropertyName a where
    toPropertyName :: a -> String

instance (ToPropertyName a, ToJSON b) => ToJSON (M.Map a b) where
    toJSON = toJSON . M.mapKeys toPropertyName  

In many languages (think of JS, PHP, Perl) map keys can only be strings, so people serialize other values to strings to get the lookup performance of the native associative containers in their languages. To interoperate with those people, a Haskeller needs to parse their compound map keys (property names in ECMAScript parlance) into a more type-safe form.

Of course one can put compound values in any place, but property name is a special case because of performance, so it may deserve a special treatment in Aeson.

Also, in Haskell people use enumerations and newtypes isomorphic to strings in places where less expressive languages have to use plain strings. So it would be good to be able to use an enumeration (I mean data Foo = Bar | Baz | Quux) for map keys, either by detecting enumerations during TH derivation or by letting people write simple instances like toPropertyName = show instead of having to implement a full ToJSON instance for Data.Map Foo Something.
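To make the enumeration case concrete, here is a minimal sketch that uses only containers. It redeclares the proposed (hypothetical) ToPropertyName class locally and converts just the keys; stringKeyed and example are names invented for this illustration, and handing the resulting Map String v to a JSON encoder is left out.

```haskell
import qualified Data.Map as M

-- Hypothetical class from the proposal above; not part of aeson.
class ToPropertyName a where
    toPropertyName :: a -> String

data Foo = Bar | Baz | Quux
    deriving (Show, Eq, Ord)

-- The "simple instance" the proposal has in mind.
instance ToPropertyName Foo where
    toPropertyName = show

-- Convert only the keys; a Map with String keys is what a JSON
-- object encoder would then consume.
stringKeyed :: ToPropertyName k => M.Map k v -> M.Map String v
stringKeyed = M.mapKeys toPropertyName

example :: M.Map String Int
example = stringKeyed (M.fromList [(Bar, 1), (Quux, 3)])
```

Here example holds the keys "Bar" and "Quux", so the enumeration never has to be spelled out as strings by hand.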

What do you think?

@basvandijk
Collaborator

I actually use the module below at work. Note the TODO ;-)

So I'm very much +1 on this.

Bryan, should I go ahead and make a pull request out of this?

{-# LANGUAGE TypeSynonymInstances #-}
{-# LANGUAGE UndecidableInstances #-}
{-# LANGUAGE FlexibleInstances #-}

module Data.Aeson.Name where

import qualified Data.ByteString      as B
import qualified Data.ByteString.Lazy as BL
import qualified Data.Text            as T
import qualified Data.Text.Encoding   as TE
import qualified Data.Text.Lazy       as TL

import Data.Aeson (Value(Object), ToJSON, toJSON, FromJSON, parseJSON)

import Data.Hashable (Hashable)

import qualified Data.HashMap.Strict as H

-- TODO: Propose these classes for aeson:

class ToName   a where toName   :: a      -> T.Text
class FromName a where fromName :: T.Text -> a

instance ToName   T.Text         where toName   = id
instance FromName T.Text         where fromName = id

instance ToName   TL.Text        where toName   = TL.toStrict
instance FromName TL.Text        where fromName = TL.fromStrict

instance ToName   String         where toName   = T.pack
instance FromName String         where fromName = T.unpack

instance ToName   B.ByteString   where toName   = TE.decodeUtf8
instance FromName B.ByteString   where fromName = TE.encodeUtf8

instance ToName   BL.ByteString  where toName   = TE.decodeUtf8 . B.concat . BL.toChunks
instance FromName BL.ByteString  where fromName = BL.fromChunks . (:[]) . TE.encodeUtf8

--------------------------------------------------------------------------------

instance (ToName k, ToJSON a) => ToJSON (H.HashMap k a) where
    toJSON = Object . mapKeyVal toName toJSON

instance (Eq k, Hashable k, FromName k, FromJSON a) =>
    FromJSON (H.HashMap k a) where
    parseJSON = fmap (mapKey fromName) . parseJSON

--------------------------------------------------------------------------------
-- Copied from aeson:

-- | Transform the keys and values of a 'H.HashMap'.
mapKeyVal :: (Eq k2, Hashable k2) => (k1 -> k2) -> (v1 -> v2)
          -> H.HashMap k1 v1 -> H.HashMap k2 v2
mapKeyVal fk kv = H.foldrWithKey (\k v -> H.insert (fk k) (kv v)) H.empty
{-# INLINE mapKeyVal #-}

-- | Transform the keys of a 'H.HashMap'.
mapKey :: (Eq k2, Hashable k2) => (k1 -> k2) -> H.HashMap k1 v -> H.HashMap k2 v
mapKey fk = mapKeyVal fk id
{-# INLINE mapKey #-}
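As a usage sketch for the two classes above, the following shows a newtype key going to Text and back. It redeclares ToName and FromName locally so that it stands alone, and it uses Data.Map from containers rather than HashMap purely to avoid extra dependencies; Email and roundTrip are hypothetical names for this illustration.

```haskell
import qualified Data.Map  as M
import qualified Data.Text as T

-- Local copies of the classes from the module above.
class ToName   a where toName   :: a      -> T.Text
class FromName a where fromName :: T.Text -> a

-- A newtype isomorphic to Text, the kind of key the issue is about.
newtype Email = Email T.Text
    deriving (Eq, Ord, Show)

instance ToName   Email where toName (Email t) = t
instance FromName Email where fromName = Email

-- Render every key to Text and read it back again.
roundTrip :: (ToName k, FromName k, Ord k) => M.Map k v -> M.Map k v
roundTrip = M.mapKeys fromName . M.mapKeys toName
```

Note that fromName is total here: for a key type where reading a name can fail (an enumeration, say), this signature leaves no way to report the failure.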

@nponeccop

Key mappings can be implemented more efficiently - for example, we can use map (first toPropertyName) . M.toList which is O(n) instead of M.toList . mapKeys toPropertyName which is O(n*log n).
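The two pipelines can be compared side by side. This is a sketch using containers only, with a deliberately non-injective, hypothetical toPropertyName on an invented Tag type: mapKeys merges keys that render to the same string, while the list version keeps both entries and stays in the order of the original keys.

```haskell
import Data.Bifunctor (first)
import qualified Data.Map as M

newtype Tag = Tag Int
    deriving (Eq, Ord)

-- Hypothetical rendering, deliberately non-injective:
-- Tag 1 and Tag 11 both become "1".
toPropertyName :: Tag -> String
toPropertyName (Tag n) = show (n `mod` 10)

-- O(n*log n): rebuilds the map, merging keys that collide.
viaMapKeys :: M.Map Tag v -> [(String, v)]
viaMapKeys = M.toList . M.mapKeys toPropertyName

-- O(n): one pass over the association list, collisions are kept.
viaList :: M.Map Tag v -> [(String, v)]
viaList = map (first toPropertyName) . M.toList

input :: M.Map Tag Char
input = M.fromList [(Tag 1, 'a'), (Tag 2, 'b'), (Tag 11, 'c')]

-- viaMapKeys input == [("1",'c'),("2",'b')]
-- viaList    input == [("1",'a'),("2",'b'),("1",'c')]
```

So the O(n) variant is only a drop-in replacement when the key rendering is injective.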

And we may need a more general interface. For example,

class ToJSONObject a where
    toJSONObject :: (ToPropertyName b, ToJSON c) => a -> [(b, c)]

or we may consider changing (.=) to use the ToPropertyName type class.

@basvandijk
Collaborator

> Key mappings can be implemented more efficiently - for example, we can use map f . M.toList which is O(n) instead of mapKeys which is O(n*log n).

But map f . M.toList produces a list which still needs to be converted to a HashMap which is O(n*log n).

@nponeccop

It doesn't need to be converted - we can serialize the list of key-value tuples right away by passing the result of map to object:

toJSON = object . map (\(k, v) -> T.pack (toPropertyName k) .= v) . M.toList

@basvandijk
Collaborator

But note that object converts the list to a HashMap which is O(n*log n):

object :: [Pair] -> Value
object = Object . H.fromList

> we may consider changing (.=) to use the ToPropertyName type class.

I fear that this can cause ambiguity when using the OverloadedStrings extension.
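A small sketch of that ambiguity, using only the text package. The class is the hypothetical ToPropertyName from this thread (here returning Text), and pair is an invented stand-in for a class-polymorphic (.=); neither is aeson API.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Text as T

-- Hypothetical class from this thread; not part of aeson.
class ToPropertyName a where
    toPropertyName :: a -> T.Text

instance ToPropertyName T.Text where
    toPropertyName = id

-- Stand-in for a (.=) whose key type is class-polymorphic.
pair :: ToPropertyName k => k -> Int -> (T.Text, Int)
pair k v = (toPropertyName k, v)

-- With OverloadedStrings the literal has type `IsString k => k`, and
-- nothing selects a concrete k, so GHC rejects the next line as ambiguous:
--
--   bad = pair "age" 30
--
-- An explicit annotation is needed at every call site instead.
ok :: (T.Text, Int)
ok = pair ("age" :: T.Text) 30
```

With the monomorphic Text key the literal is resolved immediately, which is why the annotation burden only appears once the key becomes class-polymorphic.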

@nponeccop

Bummer, I didn't know that.

@nponeccop

Is there a way to avoid constructing hashtables during serialization?

@basvandijk
Collaborator

No, serialization and deserialization both go through the Value datatype where an Object is defined as a HashMap.

Of course you can imagine a serializer for objects that doesn't construct a HashMap. However, you still want to ensure that your keys are unique so only accepting a HashMap is a nice way of guaranteeing that. It's also nicely consistent with deserialization which I think is even more important.

@bos
Owner

I think that the original request is a reasonable thing to want, but I'm not sure I want to make aeson even bigger to support it. The API is already a bit unwieldy.

By the way, I have indeed thought about adding a direct encoding function to bypass the HashMap construction. I think it would make encoding quite a bit faster. I'm not concerned about duplicate keys. The current serializer will choose a winner at random if there's a duplicate key, which isn't a very sensible behaviour to try to defend :-)
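The trade-off between the last two comments can be illustrated with a toy encoder. This is only a sketch under loud assumptions: encodePairs is invented for this example, show stands in for real JSON string escaping, and Data.Map stands in for HashMap (so duplicates resolve to the last value and keys come out sorted, which need not match HashMap behaviour).

```haskell
import Data.List (intercalate)
import qualified Data.Map as M

-- Toy object encoder; 'show' is a stand-in for JSON escaping.
encodePairs :: [(String, Int)] -> String
encodePairs kvs =
    "{" ++ intercalate "," [show k ++ ":" ++ show v | (k, v) <- kvs] ++ "}"

-- Direct encoding: no intermediate map, duplicate keys pass through.
direct :: [(String, Int)] -> String
direct = encodePairs

-- Encoding via a map: O(n*log n) construction, but keys are unique.
viaMap :: [(String, Int)] -> String
viaMap = encodePairs . M.toList . M.fromList

pairs :: [(String, Int)]
pairs = [("b", 1), ("a", 2), ("b", 3)]

-- direct pairs == "{\"b\":1,\"a\":2,\"b\":3}"
-- viaMap pairs == "{\"a\":2,\"b\":3}"
```

The direct path skips the map construction entirely, at the price of emitting whatever duplicates the caller supplies.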

@bos
Owner

See this blog post for my current thinking around direct encoding.

@ibotty

Regarding your open questions: I like the homogeneous arrays, as that's what I usually use, but the same idea for objects sounds unreasonable to me. I constantly have to deal with heterogeneous objects, and I guess most objects are, so for me the type would get in the way.
