Add saner ByteString instances #187

Fuuzetsu · 2014-03-05T05:14:28Z

ByteString instances were removed because they allowed people to try and insert arbitrary binary data that JSON can't handle.

Could we not re-add ByteString instances that leveraged something like base64 encoding (which JSON can take I believe) to work around this limitation?

<mgsloan> Fuuzetsu: Best thing to do is to avoid putting bytestrings in json,
          but when it's unavoidable:
          https://gist.github.com/mgsloan/bfe4bf2250c1d656a2b1  [05:03]

The text was updated successfully, but these errors were encountered:

mgsloan · 2014-03-05T05:28:32Z

I'm in favor of adding and documenting such an instance. Adding this newtype would be a fine alternative as well.

Also, credit where credit is due: I believe it was snoyberg who wrote ByteString64, for our uses at FP Complete.

basvandijk · 2014-03-17T17:40:17Z

I don't think we should reintroduce the ByteString instances with base64 implementations. It would really confuse people to first have a text-based ByteString instance, then no instance, then a base64-based instance.

I agree that it would be nice to have a Base64 newtype. However, I don't think aeson is the right library to provide this newtype because I can imagine you also want to use this type for other instances like postgresql-simple's To/FromField.

It's better to export this newtype from its own package together with instances for aeson and other libraries.

Fuuzetsu · 2014-03-18T00:46:25Z

Hm, fair enough, but I imagine aeson maintainer should make the package as they'll be able to update it with any aeason changes accordingly.

bos · 2014-03-18T16:31:25Z

I would take a patch to base64-bytestring that added a Base64 newtype, but obviously that (along with a little supporting machinery) is all the patch could introduce; it could not add a dependency on aeson.

Fuuzetsu · 2014-03-18T18:27:22Z

How about putting the newtype in base64-bytestring and adding dependency on base64-bytestring in aeson which can then provide instances?

I think it's either that or creating a package solely for the instances (while still adding the newtype to base64-bytestring). What do you think?

basvandijk · 2014-03-18T21:29:04Z

I now think introducing a type, whose only purpose it is to change the encoding in some medium, is the wrong way to go. Ideally types tell us how values should be used on the Haskell side, not how they are encoded in some particular medium. If we would represent binary data using a Base64 newtype we need to wrap and unwrap it at every place we are constructing and deconstructing binary data. This is annoying and obscures the fact that we are just working with binary data. I think it's better if the encoding of some Haskell data type in some particular medium is described in the particular encoding function which in this case is toJSON/parseJSON. This also has the benefit that we can use different encodings for different mediums (for example you can store binary data directly into a PostgreSQL database without first encoding it to base64).

We could introduce some simple helper functions (maybe in base64-bytestring) which would simplify the encoding/decoding in text-based mediums like:

encodeToText :: ByteString -> Text
encodeToText = TE.decodeUtf8 . Base64.encode

decodeFromText :: (Monad m) => Text -> m ByteString
decodeFromText = either fail return . Base64.decode . TE.encodeUtf8

These can be used as in:

data SomeRecord = SomeRecord
  { field1 :: Int
  , field2 :: ByteString
  , field3 :: Bool
  }

instance ToJSON SomeRecord where
  toJSON r = object [ "field1" .: field1 r
                    , "field2" .: Base64.encodeToText (field2 r)
                    , "field3" .: field3 r
                    ]

instance FromJSON SomeRecord where
  parseJSON = withObject "SomeRecord" $ \obj ->
                SomeRecord <$> obj .: "field1"
                           <*> ((obj .: "field2") >>= Base64.decodeFromText)
                           <*> obj .: "field3"

One disadvantage of this approach is that we can't currently use these functions when automatically deriving ToJSON/FromJSON instances using GHC Generics or Template Haskell. I think the best way to solve that is to introduce codec overwrites as in something like:

data Options = Options
  { ...
  , fieldCodecs :: [(String, Codec)]
    ...
  }

data Codec = forall a. (Typeable a) => Codec 
  { codecEncode :: a -> Value
  , codecDecode :: Value -> Parser a
  }

base64Codec :: Codec
base64Codec = Codec
  { codecEncode = toJSON . Base64.encodeToText
  , codecDecode = withText "base64" Base64.decodeFromText
  }


deriveJSON 
  defaultOptions{fieldCodecs = [("field2", base64Codec)]}
  ''SomeRecord

This could be implemented rather efficiently for the Template Haskell encoder but probably not so efficiently for the GHC Generics encoders since they need to do a lookup in the fieldCodecs association for each field dynamically.

The implementation of this does require the use of cast but hopefully that's not too inefficient.

snoyberg · 2014-04-20T05:32:40Z

FWIW, I have a ByteString64 data type in our codebase with exactly this purpose. I'd be +1 on having that functionality exported from either base64-bytestring or aeson.

ref. haskell/aeson#187

phadej · 2016-06-05T13:11:22Z

FWIW we also have ByteString64.

OTOH @basvandijk's could be interesting, but I'd rather do it only for template-haskell derivations first.

winterland1989 · 2016-06-07T03:18:08Z

+1 on having this instance exported from aeson, a dependency on base64-bytestring is acceptable.

bergmark · 2017-03-19T00:58:35Z

I don't have an opinon on this. It looks like there are 4 votes for adding this newtype and one against. Another option is to add this instance to a blessed orphans package (#432). OTOH the dependency is very light. Sounds like the first step should be to add this newtype to base64-bytestring no matter how we choose to proceed.

cliffordbeshers · 2017-07-27T17:32:05Z

I ran into this issue yesterday. Comments and observations.

@basvandijk writes "Ideally types tell us how values should be used on the Haskell side, not how they are encoded in some particular medium." I would like to see a larger justification for this statement, because it runs against much that I hold to be true.

In any case, I do not think it applies in this instance. I am wrapping a newtype around my Bytestring precisely because I want the encoding/decoding of my data to be fully encapsulated in the aeson process. Otherwise, I have to add an extra step on either the client or the server side everywhere I use that type. This would be particularly onerous because the uses are asymmetrical. On the server side, compiling with ghc, this bytestring gets read from the file system. On the client side, compiling with ghcjs, it gets converted to a blob. I have used additional encoding/decoding functions as you suggest, in fact, I have those exact ones, and they are error prone, becuse they do not encode he semantics of the encoding. They are tricky to maintain and difficult for othesr to understand. It exists in our production system and it causes me dread every time I get near it.

I will give the newtype solution a try today to see if I run into anything ugly.

Fuuzetsu · 2017-07-27T17:41:27Z

AFAIK the "canonical" solution to this for last few years is ByteString64 newtype though I would still like it to live inside aeson as ByteString instance as it's often a hassle when you find out late into existing codebase that you'd like JSON instances and you end up rewrapping throughout whole codebase: nothing major but inconvenient.

phadej · 2017-07-27T17:49:43Z

@Fuuzetsu IMHO preferring single encoding is wrong a choice. Someone might as well have data in base16 (hex) too. It's a bit like Sum & Product for Monoid.

Wrapping business can be solved independently of aeson. (I can imagine using generics-sop Codes + Coercible to do oneliner conversion).

Fuuzetsu · 2017-07-28T02:31:26Z

if you want different encoding then you can use newtype, this issue is about having a default

basvandijk · 2017-07-28T08:12:04Z

While thinking about this again I would like to slightly adapt my previous opinion on the matter:

Usually when you have a string of binary data somewhere in your record it represents a particular thing, i.e.: an image, blob, hash, ASCII text, etc.. In those cases wouldn't it be better to wrap that string of bytes with a specific newtype telling your users what the bytes mean?

So I guess I'm advocating using a dedicated newtype for each type of bytestring instead of using a ByteString or ByteString64 directly. For example, instead of having:

data MyRecord = MyRecord
  { ...
  , blob :: ByteString
  , hash :: ByteString
  , ...
  }

You should have:

data MyRecord = MyRecord
  { ...
  , blob :: Blob
  , hash :: Hash
  , ...
  }

where

newtype Blob = Blob ByteString

instance ToJSON Blob where ...
instance FromJSON Blob where ...

newtype Hash = Hash ByteString

instance ToJSON Hash where ...
instance FromJSON Hash where ...

In the latter it wouldn't hurt using:

newtype Blob = Blob ByteString64 deriving (ToJSON, FromJSON)
newtype Hash = Hash ByteString64 deriving (ToJSON, FromJSON)

to get the JSON instances for free.

What do you folks think about this? Would this be to much hassle in practise?

Fuuzetsu · 2017-07-28T09:01:59Z

it's no different than blob :: ByteString64, hash :: ByteString64 except now you need an instance for every one of your wrappers (Blob, Hash, …) that you have to write rather than having 1 instance. So it seems strictly worse from the "amount of code" perspective. Of course knowing we can't swap Blob and Hash around by accident has its own merits but unsure if we can use that as an argument here…

basvandijk · 2017-07-28T10:16:08Z

Yes you do get an extra newtype wrapper W (newtype W = W ByteString64 deriving (ToJSON, FromJSON)) for every different type of bytestring which indeed causes more code.

BTW I'm not opposing providing a ByteString64 newtype. Exporting it somewhere from base64-bytestring would be nice.

bgamari · 2018-02-21T20:31:47Z

For the record, @phadej has uploaded his base64-bytestring-type package to Hackage.

phadej · 2018-02-21T20:38:43Z

@bgamari well spotted!

hvr · 2018-02-21T22:35:52Z

...or good stalking ;-)

djleonskennedy · 2019-12-29T23:01:09Z

@basvandijk Might i miss something but type says:
encode :: ToJSON a => a -> ByteString -- ByteString
and then:
No instance for (ToJSON ByteString)
How to deal with it without workarounds with generics, implement own instance?
Regards

bgamari · 2019-12-29T23:32:51Z

@djleonskennedy, can you rephrase the question? I'm not sure I follow.

cutsea110 added a commit to cutsea110/owl-auth that referenced this issue May 13, 2014

TODO: I expect future aeson library

a517724

ref. haskell/aeson#187

bergmark added the help wanted label May 9, 2016

bergmark added needs decision blocked by upstream/should be done externally labels Mar 19, 2017

bergmark added help wanted and removed help wanted labels Oct 18, 2017

steve-chavez mentioned this issue Dec 8, 2017

Change RPC/UPDATE query to a parametrized query, avoid payload json encoding PostgREST/postgrest#1005

Merged

domenkozar mentioned this issue Oct 9, 2019

To/FromHttpApiData ByteString instance fizruk/http-api-data#1

Closed

hasufell mentioned this issue May 17, 2023

Support AbstractFilePath commercialhaskell/path#189

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add saner ByteString instances #187

Add saner ByteString instances #187

Fuuzetsu commented Mar 5, 2014

mgsloan commented Mar 5, 2014

basvandijk commented Mar 17, 2014

Fuuzetsu commented Mar 18, 2014

bos commented Mar 18, 2014

Fuuzetsu commented Mar 18, 2014

basvandijk commented Mar 18, 2014

snoyberg commented Apr 20, 2014

phadej commented Jun 5, 2016

winterland1989 commented Jun 7, 2016

bergmark commented Mar 19, 2017

cliffordbeshers commented Jul 27, 2017

Fuuzetsu commented Jul 27, 2017

phadej commented Jul 27, 2017 •

edited

Fuuzetsu commented Jul 28, 2017

basvandijk commented Jul 28, 2017

Fuuzetsu commented Jul 28, 2017

basvandijk commented Jul 28, 2017

bgamari commented Feb 21, 2018

phadej commented Feb 21, 2018

hvr commented Feb 21, 2018

djleonskennedy commented Dec 29, 2019

bgamari commented Dec 29, 2019

Navigation Menu

Add saner ByteString instances #187

Add saner ByteString instances #187

Comments

Fuuzetsu commented Mar 5, 2014

mgsloan commented Mar 5, 2014

basvandijk commented Mar 17, 2014

Fuuzetsu commented Mar 18, 2014

bos commented Mar 18, 2014

Fuuzetsu commented Mar 18, 2014

basvandijk commented Mar 18, 2014

snoyberg commented Apr 20, 2014

phadej commented Jun 5, 2016

winterland1989 commented Jun 7, 2016

bergmark commented Mar 19, 2017

cliffordbeshers commented Jul 27, 2017

Fuuzetsu commented Jul 27, 2017

phadej commented Jul 27, 2017 • edited

Fuuzetsu commented Jul 28, 2017

basvandijk commented Jul 28, 2017

Fuuzetsu commented Jul 28, 2017

basvandijk commented Jul 28, 2017

bgamari commented Feb 21, 2018

phadej commented Feb 21, 2018

hvr commented Feb 21, 2018

djleonskennedy commented Dec 29, 2019

bgamari commented Dec 29, 2019

phadej commented Jul 27, 2017 •

edited