Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add saner ByteString instances #187

Open
Fuuzetsu opened this issue Mar 5, 2014 · 22 comments
Open

Add saner ByteString instances #187

Fuuzetsu opened this issue Mar 5, 2014 · 22 comments

Comments

@Fuuzetsu
Copy link
Member

Fuuzetsu commented Mar 5, 2014

ByteString instances were removed because they allowed people to try and insert arbitrary binary data that JSON can't handle.

Could we not re-add ByteString instances that leveraged something like base64 encoding (which JSON can take I believe) to work around this limitation?

<mgsloan> Fuuzetsu: Best thing to do is to avoid putting bytestrings in json,
          but when it's unavoidable:
          https://gist.github.com/mgsloan/bfe4bf2250c1d656a2b1  [05:03]
@mgsloan
Copy link

mgsloan commented Mar 5, 2014

I'm in favor of adding and documenting such an instance. Adding this newtype would be a fine alternative as well.

Also, credit where credit is due: I believe it was snoyberg who wrote ByteString64, for our uses at FP Complete.

@basvandijk
Copy link
Member

I don't think we should reintroduce the ByteString instances with base64 implementations. It would really confuse people to first have a text-based ByteString instance, then no instance, then a base64-based instance.

I agree that it would be nice to have a Base64 newtype. However, I don't think aeson is the right library to provide this newtype because I can imagine you also want to use this type for other instances like postgresql-simple's To/FromField.

It's better to export this newtype from its own package together with instances for aeson and other libraries.

@Fuuzetsu
Copy link
Member Author

Hm, fair enough, but I imagine aeson maintainer should make the package as they'll be able to update it with any aeason changes accordingly.

@bos
Copy link
Collaborator

bos commented Mar 18, 2014

I would take a patch to base64-bytestring that added a Base64 newtype, but obviously that (along with a little supporting machinery) is all the patch could introduce; it could not add a dependency on aeson.

@Fuuzetsu
Copy link
Member Author

How about putting the newtype in base64-bytestring and adding dependency on base64-bytestring in aeson which can then provide instances?

I think it's either that or creating a package solely for the instances (while still adding the newtype to base64-bytestring). What do you think?

@basvandijk
Copy link
Member

I now think introducing a type, whose only purpose it is to change the encoding in some medium, is the wrong way to go. Ideally types tell us how values should be used on the Haskell side, not how they are encoded in some particular medium. If we would represent binary data using a Base64 newtype we need to wrap and unwrap it at every place we are constructing and deconstructing binary data. This is annoying and obscures the fact that we are just working with binary data. I think it's better if the encoding of some Haskell data type in some particular medium is described in the particular encoding function which in this case is toJSON/parseJSON. This also has the benefit that we can use different encodings for different mediums (for example you can store binary data directly into a PostgreSQL database without first encoding it to base64).

We could introduce some simple helper functions (maybe in base64-bytestring) which would simplify the encoding/decoding in text-based mediums like:

encodeToText :: ByteString -> Text
encodeToText = TE.decodeUtf8 . Base64.encode

decodeFromText :: (Monad m) => Text -> m ByteString
decodeFromText = either fail return . Base64.decode . TE.encodeUtf8

These can be used as in:

data SomeRecord = SomeRecord
  { field1 :: Int
  , field2 :: ByteString
  , field3 :: Bool
  }
instance ToJSON SomeRecord where
  toJSON r = object [ "field1" .: field1 r
                    , "field2" .: Base64.encodeToText (field2 r)
                    , "field3" .: field3 r
                    ]

instance FromJSON SomeRecord where
  parseJSON = withObject "SomeRecord" $ \obj ->
                SomeRecord <$> obj .: "field1"
                           <*> ((obj .: "field2") >>= Base64.decodeFromText)
                           <*> obj .: "field3" 

One disadvantage of this approach is that we can't currently use these functions when automatically deriving ToJSON/FromJSON instances using GHC Generics or Template Haskell. I think the best way to solve that is to introduce codec overwrites as in something like:

data Options = Options
  { ...
  , fieldCodecs :: [(String, Codec)]
    ...
  }

data Codec = forall a. (Typeable a) => Codec 
  { codecEncode :: a -> Value
  , codecDecode :: Value -> Parser a
  }

base64Codec :: Codec
base64Codec = Codec
  { codecEncode = toJSON . Base64.encodeToText
  , codecDecode = withText "base64" Base64.decodeFromText
  }


deriveJSON 
  defaultOptions{fieldCodecs = [("field2", base64Codec)]}
  ''SomeRecord

This could be implemented rather efficiently for the Template Haskell encoder but probably not so efficiently for the GHC Generics encoders since they need to do a lookup in the fieldCodecs association for each field dynamically.

The implementation of this does require the use of cast but hopefully that's not too inefficient.

@snoyberg
Copy link
Contributor

FWIW, I have a ByteString64 data type in our codebase with exactly this purpose. I'd be +1 on having that functionality exported from either base64-bytestring or aeson.

cutsea110 added a commit to cutsea110/owl-auth that referenced this issue May 13, 2014
@phadej
Copy link
Collaborator

phadej commented Jun 5, 2016

FWIW we also have ByteString64.

OTOH @basvandijk's could be interesting, but I'd rather do it only for template-haskell derivations first.

@winterland1989
Copy link
Contributor

+1 on having this instance exported from aeson, a dependency on base64-bytestring is acceptable.

@bergmark
Copy link
Collaborator

I don't have an opinon on this. It looks like there are 4 votes for adding this newtype and one against. Another option is to add this instance to a blessed orphans package (#432). OTOH the dependency is very light. Sounds like the first step should be to add this newtype to base64-bytestring no matter how we choose to proceed.

@cliffordbeshers
Copy link
Contributor

I ran into this issue yesterday. Comments and observations.

@basvandijk writes "Ideally types tell us how values should be used on the Haskell side, not how they are encoded in some particular medium." I would like to see a larger justification for this statement, because it runs against much that I hold to be true.

In any case, I do not think it applies in this instance. I am wrapping a newtype around my Bytestring precisely because I want the encoding/decoding of my data to be fully encapsulated in the aeson process. Otherwise, I have to add an extra step on either the client or the server side everywhere I use that type. This would be particularly onerous because the uses are asymmetrical. On the server side, compiling with ghc, this bytestring gets read from the file system. On the client side, compiling with ghcjs, it gets converted to a blob. I have used additional encoding/decoding functions as you suggest, in fact, I have those exact ones, and they are error prone, becuse they do not encode he semantics of the encoding. They are tricky to maintain and difficult for othesr to understand. It exists in our production system and it causes me dread every time I get near it.

I will give the newtype solution a try today to see if I run into anything ugly.

@Fuuzetsu
Copy link
Member Author

AFAIK the "canonical" solution to this for last few years is ByteString64 newtype though I would still like it to live inside aeson as ByteString instance as it's often a hassle when you find out late into existing codebase that you'd like JSON instances and you end up rewrapping throughout whole codebase: nothing major but inconvenient.

@phadej
Copy link
Collaborator

phadej commented Jul 27, 2017

@Fuuzetsu IMHO preferring single encoding is wrong a choice. Someone might as well have data in base16 (hex) too. It's a bit like Sum & Product for Monoid.

Wrapping business can be solved independently of aeson. (I can imagine using generics-sop Codes + Coercible to do oneliner conversion).

@Fuuzetsu
Copy link
Member Author

if you want different encoding then you can use newtype, this issue is about having a default

@basvandijk
Copy link
Member

While thinking about this again I would like to slightly adapt my previous opinion on the matter:

Usually when you have a string of binary data somewhere in your record it represents a particular thing, i.e.: an image, blob, hash, ASCII text, etc.. In those cases wouldn't it be better to wrap that string of bytes with a specific newtype telling your users what the bytes mean?

So I guess I'm advocating using a dedicated newtype for each type of bytestring instead of using a ByteString or ByteString64 directly. For example, instead of having:

data MyRecord = MyRecord
  { ...
  , blob :: ByteString
  , hash :: ByteString
  , ...
  }

You should have:

data MyRecord = MyRecord
  { ...
  , blob :: Blob
  , hash :: Hash
  , ...
  }

where

newtype Blob = Blob ByteString

instance ToJSON Blob where ...
instance FromJSON Blob where ...

newtype Hash = Hash ByteString

instance ToJSON Hash where ...
instance FromJSON Hash where ...

In the latter it wouldn't hurt using:

newtype Blob = Blob ByteString64 deriving (ToJSON, FromJSON)
newtype Hash = Hash ByteString64 deriving (ToJSON, FromJSON)

to get the JSON instances for free.

What do you folks think about this? Would this be to much hassle in practise?

@Fuuzetsu
Copy link
Member Author

it's no different than blob :: ByteString64, hash :: ByteString64 except now you need an instance for every one of your wrappers (Blob, Hash, …) that you have to write rather than having 1 instance. So it seems strictly worse from the "amount of code" perspective. Of course knowing we can't swap Blob and Hash around by accident has its own merits but unsure if we can use that as an argument here…

@basvandijk
Copy link
Member

Yes you do get an extra newtype wrapper W (newtype W = W ByteString64 deriving (ToJSON, FromJSON)) for every different type of bytestring which indeed causes more code.

BTW I'm not opposing providing a ByteString64 newtype. Exporting it somewhere from base64-bytestring would be nice.

@bgamari
Copy link

bgamari commented Feb 21, 2018

For the record, @phadej has uploaded his base64-bytestring-type package to Hackage.

@phadej
Copy link
Collaborator

phadej commented Feb 21, 2018

@bgamari well spotted!

@hvr
Copy link
Member

hvr commented Feb 21, 2018

...or good stalking ;-)

@djleonskennedy
Copy link

@basvandijk Might i miss something but type says:
encode :: ToJSON a => a -> ByteString -- ByteString
and then:
No instance for (ToJSON ByteString)
How to deal with it without workarounds with generics, implement own instance?
Regards

@bgamari
Copy link

bgamari commented Dec 29, 2019

@djleonskennedy, can you rephrase the question? I'm not sure I follow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests