Add saner ByteString instances #187
I'm in favor of adding and documenting such an instance. Adding this newtype would be a fine alternative as well. Also, credit where credit is due: I believe it was snoyberg who wrote it.

I don't think we should reintroduce the ByteString instances with base64 implementations. It would really confuse people to first have a text-based ByteString instance, then no instance, then a base64-based instance. I agree that it would be nice to have a Base64 newtype. However, I don't think aeson is the right library to provide this newtype, because I can imagine you would also want to use this type for other instances, like postgresql-simple's To/FromField. It's better to export this newtype from its own package, together with instances for aeson and other libraries.

Hm, fair enough, but I imagine the aeson maintainers should make the package, as they'll be able to update it with any aeson changes accordingly.

I would take a patch to base64-bytestring that added a

How about putting the newtype in base64-bytestring and adding a dependency on base64-bytestring to aeson, which can then provide the instances? I think it's either that or creating a package solely for the instances (while still adding the newtype to base64-bytestring). What do you think?
I now think introducing a type whose only purpose is to change the encoding in some medium is the wrong way to go. Ideally types tell us how values should be used on the Haskell side, not how they are encoded in some particular medium. If we represented binary data using a Base64 newtype, we would need to wrap and unwrap it at every place we construct and deconstruct binary data. This is annoying and obscures the fact that we are just working with binary data.

I think it's better if the encoding of a Haskell data type in some particular medium is described in the particular encoding function, which in this case is toJSON/parseJSON. This also has the benefit that we can use different encodings for different mediums (for example, you can store binary data directly in a PostgreSQL database without first encoding it to base64). We could introduce some simple helper functions (maybe in base64-bytestring) which would simplify the encoding/decoding in text-based mediums, like:

```haskell
encodeToText :: ByteString -> Text
encodeToText = TE.decodeUtf8 . Base64.encode

decodeFromText :: (Monad m) => Text -> m ByteString
decodeFromText = either fail return . Base64.decode . TE.encodeUtf8
```

These can be used as in:

```haskell
data SomeRecord = SomeRecord
  { field1 :: Int
  , field2 :: ByteString
  , field3 :: Bool
  }

instance ToJSON SomeRecord where
  toJSON r = object [ "field1" .= field1 r
                    , "field2" .= Base64.encodeToText (field2 r)
                    , "field3" .= field3 r
                    ]

instance FromJSON SomeRecord where
  parseJSON = withObject "SomeRecord" $ \obj ->
    SomeRecord <$> obj .: "field1"
               <*> ((obj .: "field2") >>= Base64.decodeFromText)
               <*> obj .: "field3"
```

One disadvantage of this approach is that we can't currently use these functions when automatically deriving ToJSON/FromJSON instances using GHC Generics or Template Haskell. I think the best way to solve that is to introduce codec overrides, as in something like:

```haskell
data Options = Options
  { ...
  , fieldCodecs :: [(String, Codec)]
  ...
  }

data Codec = forall a. (Typeable a) => Codec
  { codecEncode :: a -> Value
  , codecDecode :: Value -> Parser a
  }

base64Codec :: Codec
base64Codec = Codec
  { codecEncode = toJSON . Base64.encodeToText
  , codecDecode = withText "base64" Base64.decodeFromText
  }

deriveJSON
  defaultOptions{fieldCodecs = [("field2", base64Codec)]}
  ''SomeRecord
```

This could be implemented rather efficiently for the Template Haskell encoder, but probably not so efficiently for the GHC Generics encoders, since they need to do a lookup in the fieldCodecs association list for each field dynamically. The implementation of this does require the use of cast, but hopefully that's not too inefficient.
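The cast-based dispatch mentioned above can be sketched in a few lines without any aeson dependency. This is a hypothetical reduction of the proposed Codec: `String` stands in for aeson's `Value`, `Maybe` stands in for `Parser`, and the names `encodeWith` and `intCodec` are made up for illustration.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

import Data.Typeable (Typeable, cast)

-- Hypothetical reduction of the proposed Codec: the encoder targets
-- String rather than aeson's Value so the sketch needs only base.
data Codec = forall a. Typeable a => Codec (a -> String)

-- A generic encoder would try a field's codec like this: `cast`
-- succeeds only when the field's runtime type matches the codec's type,
-- otherwise the codec is skipped and the default encoding would apply.
encodeWith :: Typeable b => Codec -> b -> Maybe String
encodeWith (Codec enc) x = enc <$> cast x

intCodec :: Codec
intCodec = Codec (show :: Int -> String)
```

The per-field `cast` is exactly the dynamic check that makes the Generics path slower than the Template Haskell path, where the codec lookup could happen once at compile time.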
FWIW, I have a

FWIW we also have. OTOH @basvandijk's could be interesting, but I'd rather do it only for

+1 on having this instance exported from aeson; a dependency on base64-bytestring is acceptable.

I don't have an opinion on this. It looks like there are 4 votes for adding this newtype and one against. Another option is to add this instance to a blessed orphans package (#432). OTOH the dependency is very light. Sounds like the first step should be to add this newtype to base64-bytestring no matter how we choose to proceed.
I ran into this issue yesterday. Comments and observations.

@basvandijk writes "Ideally types tell us how values should be used on the Haskell side, not how they are encoded in some particular medium." I would like to see a larger justification for this statement, because it runs against much that I hold to be true. In any case, I do not think it applies in this instance. I am wrapping a newtype around my ByteString precisely because I want the encoding/decoding of my data to be fully encapsulated in the aeson process. Otherwise, I have to add an extra step on either the client or the server side everywhere I use that type. This would be particularly onerous because the uses are asymmetrical: on the server side, compiling with GHC, this bytestring gets read from the file system; on the client side, compiling with GHCJS, it gets converted to a blob.

I have used additional encoding/decoding functions as you suggest (in fact, I have those exact ones) and they are error prone, because they do not encode the semantics of the encoding. They are tricky to maintain and difficult for others to understand. This exists in our production system and it causes me dread every time I get near it.

I will give the newtype solution a try today to see if I run into anything ugly.

AFAIK the "canonical" solution to this for the last few years is

@Fuuzetsu IMHO preferring a single encoding is the wrong choice. Someone might as well have data in base16 (hex) too. It's a bit like. The wrapping business can be solved independently of

If you want a different encoding then you can use a newtype; this issue is about having a default
While thinking about this again, I would like to slightly adapt my previous opinion on the matter. Usually when you have a string of binary data somewhere in your record it represents a particular thing, i.e. an image, blob, hash, ASCII text, etc. In those cases wouldn't it be better to wrap that string of bytes in a specific newtype telling your users what the bytes mean? So I guess I'm advocating using a dedicated newtype for each type of bytestring. Instead of:

```haskell
data MyRecord = MyRecord
  { ...
  , blob :: ByteString
  , hash :: ByteString
  , ...
  }
```

you should have:

```haskell
data MyRecord = MyRecord
  { ...
  , blob :: Blob
  , hash :: Hash
  , ...
  }
```

where

```haskell
newtype Blob = Blob ByteString
instance ToJSON Blob where ...
instance FromJSON Blob where ...

newtype Hash = Hash ByteString
instance ToJSON Hash where ...
instance FromJSON Hash where ...
```

In the latter it wouldn't hurt using:

```haskell
newtype Blob = Blob ByteString64 deriving (ToJSON, FromJSON)
newtype Hash = Hash ByteString64 deriving (ToJSON, FromJSON)
```

to get the JSON instances for free. What do you folks think about this? Would this be too much hassle in practice?
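The deriving trick above can be demonstrated without pulling in aeson. In this sketch a hypothetical `ToText` class plays the role of `ToJSON`, and `ByteString64` is modelled as a plain `String` assumed to already hold base64 text; only the deriving mechanics are real:

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

-- Stand-in for ToJSON, so the sketch compiles with just base.
class ToText a where
  toText :: a -> String

-- Stand-in for a base64-wrapped ByteString; the payload is assumed
-- to already be base64 text.
newtype ByteString64 = ByteString64 String

instance ToText ByteString64 where
  toText (ByteString64 s) = s

-- Domain newtypes reuse ByteString64's instance for free via
-- GeneralizedNewtypeDeriving, while keeping distinct types for
-- blobs and hashes at use sites.
newtype Blob = Blob ByteString64 deriving ToText
newtype Hash = Hash ByteString64 deriving ToText
```

The same shape works for the real `ToJSON`/`FromJSON` pair: the encoding lives once in the wrapper's instances, and each domain newtype documents what the bytes mean.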
it's no different than

Yes, you do get an extra newtype wrapper. BTW I'm not opposing providing a

@bgamari well spotted!

...or good stalking ;-)

@basvandijk Maybe I'm missing something, but the type says:

@djleonskennedy, can you rephrase the question? I'm not sure I follow.
ByteString instances were removed because they allowed people to try and insert arbitrary binary data that JSON can't handle.
Could we not re-add ByteString instances that leveraged something like base64 encoding (which JSON can take I believe) to work around this limitation?
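For readers unfamiliar with the encoding being discussed: base64 maps every 3 input bytes to 4 characters of a 64-character alphabet, padding with `=`, so arbitrary bytes survive a text-only medium like a JSON string. A dependency-free sketch over `[Word8]` (real code would of course use `Data.ByteString.Base64.encode` from base64-bytestring):

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- The standard base64 alphabet: A-Z, a-z, 0-9, '+', '/'.
base64Alphabet :: String
base64Alphabet = ['A'..'Z'] ++ ['a'..'z'] ++ ['0'..'9'] ++ "+/"

-- Illustrative list-based encoder: pack up to 3 bytes into a 24-bit
-- number, emit it as 6-bit slices, and pad the final group with '='.
encodeBase64 :: [Word8] -> String
encodeBase64 bytes = case bytes of
  []           -> ""
  (a:b:c:rest) -> map sextet [n `shiftR` 18, n `shiftR` 12, n `shiftR` 6, n]
                    ++ encodeBase64 rest
    where n = word a 16 .|. word b 8 .|. word c 0
  [a, b]       -> map sextet [n `shiftR` 18, n `shiftR` 12, n `shiftR` 6] ++ "="
    where n = word a 16 .|. word b 8
  [a]          -> map sextet [n `shiftR` 18, n `shiftR` 12] ++ "=="
    where n = word a 16
  where
    word w s = fromIntegral w `shiftL` s :: Int
    sextet k = base64Alphabet !! (k .&. 63)

-- Convenience wrapper for ASCII strings (for demonstration only).
encodeString :: String -> String
encodeString = encodeBase64 . map (fromIntegral . ord)
```

Since the output uses only 65 printable ASCII characters, the result is always a valid JSON string, which is why a base64-based instance sidesteps the arbitrary-binary-data problem that got the original instances removed.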