Support for Blob types? #1179

Closed
ProofOfKeags opened this issue May 25, 2021 · 10 comments · Fixed by #1323

Comments

@ProofOfKeags

Perhaps I'm misunderstanding or misusing this language, but I would like to be able to consume a value directly into a Haskell ByteString. My first idea was to build a decoder that consumed a Text value limited to an even number of characters, but since that wasn't already in the library (and it seems like it would be common), I'm guessing there was a reason not to do this. What is the appropriate way to use non-UTF-8 values in a Dhall config?
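To make the decoder idea concrete, here is a minimal sketch of the check being described: accept text only if it is an even-length run of hex digits, then turn it into raw bytes. It is written in Python purely for illustration; the name `decode_hex_blob` is hypothetical and is not part of Dhall or of its Haskell bindings.

```python
import binascii


def decode_hex_blob(text: str) -> bytes:
    """Decode a base-16 string into raw bytes (hypothetical helper)."""
    # Two hex digits encode one byte, so an odd length cannot be byte-aligned.
    if len(text) % 2 != 0:
        raise ValueError("hex blob must contain an even number of characters")
    # unhexlify rejects anything that is not a hex digit; its error type,
    # binascii.Error, is a subclass of ValueError.
    return binascii.unhexlify(text)
```

For example, `decode_hex_blob("DEADBEEF")` yields the four bytes `de ad be ef`, while a three-character input is rejected.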

@mmhat
Collaborator

mmhat commented May 26, 2021

AFAIK there is none at the moment.
You might be interested in the discussion of #1092.
#1121 is somehow related as well.

@Gabriella439
Contributor

Gabriella439 commented May 30, 2021

Let's call this type of value Bytes. I think the main thing that would help move this forward is deciding on the syntax for Bytes literals.

@ProofOfKeags
Author

My first suggestion would be to use hexadecimal encodings, since they are byte-aligned and can be written in the UTF-8 text that Dhall configs are required to be.

\xDEADBEEF

or something like that. Is it doable in the syntax tree or are there going to be conflicts?

@Gabriella439
Contributor

Here are two possible alternatives that I might suggest for how to encode Bytes literals:

  • Use a syntax similar to the hexadecimal notation that we permit for Natural numbers, but with quotes, like this:

    0x"FEED"
  • Don't provide syntax for Bytes literals at all, and instead provide a bytes builtin of the following type

    bytes : Natural → Bytes

    … which could be used like this:

    bytes 0xFEED

cc: @blbarker, since #1215 reminded me of this

@mmhat
Collaborator

mmhat commented Aug 26, 2021

@Gabriel439 I think I prefer the first option for the following reason: With the 0x" prefix it is immediately clear that we deal with a blob whereas in the second alternative this is not known until we feed the natural to bytes. Consider the following code:

let blob : Natural = ./data.dhall
in
bytes blob

During import resolution the implementation will choose a numeric type to hold the content of ./data.dhall just to discover later that this was never meant to be used in this context and it could have chosen a byte array to store the data instead. I am not sure if that really makes a difference though.
On the other hand, it might be more comprehensible for the developer as well.

Another option that comes to my mind is base16:FEED (or b16:FEED), similar to the sha256:FEED hash sums we already use in our integrity checks. That way we might be able to extend the blob type with other encodings if that is requested at some point and maintain consistency within the language.
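A rough sketch of how such prefix-tagged literals could be dispatched to different decoders, written in Python only for illustration. The `base16:`/`base32:`/`base64:` prefixes are the ones floated in this thread and are not Dhall syntax; `DECODERS` and `decode_tagged` are hypothetical names.

```python
import base64

# Map each (hypothetical) prefix to a decoder from the standard library.
DECODERS = {
    "base16": lambda s: base64.b16decode(s, casefold=True),
    "base32": lambda s: base64.b32decode(s),
    "base64": lambda s: base64.b64decode(s, validate=True),
}


def decode_tagged(literal: str) -> bytes:
    """Split 'prefix:payload' and decode the payload with the matching codec."""
    prefix, sep, payload = literal.partition(":")
    if not sep or prefix not in DECODERS:
        raise ValueError(f"unknown or missing encoding prefix in {literal!r}")
    return DECODERS[prefix](payload)
```

Adding another encoding under this scheme means adding one entry to the table, which is the extensibility argument made above.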

@Gabriella439
Contributor

Gabriella439 commented Aug 26, 2021

@mmhat: I think the main reason I'd prefer quotations instead of a base16: prefix is that if we add other types of numeric literal notations then we can also turn them into valid Bytes literals in the same way by adding quotes

For example, suppose that we standardize support for bitwise numeric literals as suggested in #1215 (e.g. 0b10111000), then we could also make that valid notation for Bytes literals by adding quotes (e.g. 0b"10111000")
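As an illustration of what a quoted bitwise literal might mean, the sketch below (Python, illustration only) reads the digits between the quotes as a bit string and packs them into bytes. The `0b"…"` form is only a supposition in this thread, and both the name `bits_to_bytes` and the rule that the length must be a multiple of 8 are assumptions of this sketch.

```python
def bits_to_bytes(bits: str) -> bytes:
    """Pack a string of '0'/'1' characters into big-endian bytes (hypothetical)."""
    if not set(bits) <= {"0", "1"}:
        raise ValueError("only the digits 0 and 1 are allowed")
    # Assumption: a Bytes literal must describe whole bytes.
    if len(bits) % 8 != 0:
        raise ValueError("bit string length must be a multiple of 8")
    if not bits:
        return b""
    # Fixing the output width from the digit count keeps leading zero bytes.
    return int(bits, 2).to_bytes(len(bits) // 8, "big")
```

Under these assumptions the digits `10111000` correspond to the single byte `b8`.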

@Gabriella439
Contributor

Also, I just realized a reason why a bytes : Natural → Bytes built-in wouldn't work: we'd have no way to represent Bytes with leading zeros, because 0x00FF and 0xFF represent the same Natural number
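The leading-zero problem is easy to demonstrate. The sketch below uses Python only for illustration; `natural_to_bytes` is a hypothetical stand-in for a `bytes : Natural → Bytes` built-in, assumed here to produce the shortest big-endian encoding.

```python
def natural_to_bytes(n: int) -> bytes:
    """Shortest big-endian encoding of a natural number (hypothetical built-in)."""
    width = max(1, (n.bit_length() + 7) // 8)
    return n.to_bytes(width, "big")


# As numbers, the two spellings are indistinguishable...
same_number = int("00FF", 16) == int("FF", 16)

# ...so any function of the number returns the same bytes for both,
from_natural = natural_to_bytes(0x00FF)

# whereas a literal read digit by digit keeps the leading zero byte.
from_literal = bytes.fromhex("00FF")
```

Here `from_natural` is the single byte `ff` while `from_literal` is the two bytes `00 ff`, so the two-byte value cannot be written through a `Natural`.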

@mmhat
Collaborator

mmhat commented Aug 26, 2021

@Gabriel439 Interestingly when I wrote about extending the blob type I was thinking more about base32: or base64: encodings rather than binary or octal representations and I have no idea how we would add those using the prefix system for numeric literals.
Apart from that, I really like that the quotations reflect the close relationship to Text values.

@ProofOfKeags
Author

I think the quoted version seems good. From my non-PL-designer POV, I don't have a strong preference for any particular implementation as long as it isn't overly burdensome. And as we all know, it can be quite easy to bikeshed over trivial stuff. If there's no obvious reason not to go with quotations I think we should just do that and call it a day.

@IamfromSpace

I also have a few interesting use cases to throw out there for a Bytes type, where files can be imported as such.

One case is for generating PowerPoint files (I'm using Dhall basically anywhere and everywhere I can these days!), which are just a zip of a bunch of XML, for the most part. Using dhall to-directory-tree is awesome here, as it does all the things I would otherwise have to do manually. The only thing left at this point is to copy resources like images and fonts into the directory before zipping. It would be especially cool if I could just import the images in Dhall and plop them right into the directory tree output so they were scooped up automatically. That would also mean some references could be made explicit, rather than implicit.

Another case would be for something like CloudFormation for AWS Lambda. A zip file of the code is essentially the deployable unit. In some configurations, it's helpful to have a SHA256 of the zip in the CF itself. If the zip could be imported as bytes and then sha256 was a function from and to Bytes that could be rendered out to Text that would again eliminate otherwise custom build steps in some cases.
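For reference, the Bytes-to-Text step described here amounts to the following, sketched in Python for illustration only. `sha256_hex` is a hypothetical name; no such Dhall built-in is implied.

```python
import hashlib


def sha256_hex(blob: bytes) -> str:
    """Return the SHA-256 digest of raw bytes as lowercase hex text."""
    return hashlib.sha256(blob).hexdigest()


def sha256_of_file(path: str) -> str:
    """Read a file (for example a deployment zip) as bytes and hash it."""
    with open(path, "rb") as handle:
        return sha256_hex(handle.read())
```

The result is always 64 hex characters, which is the form that would be embedded in a template alongside the zip.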

Hope these use-cases are helpful!

mmhat added a commit to mmhat/dhall-lang that referenced this issue Feb 10, 2023
 * Currently only base-16 representation
 * No functions to do something meaningful with Bytes values

Fixes dhall-lang#1179
mmhat added a commit that referenced this issue Feb 26, 2023
Currently only base-16 encoded literals are supported.
Functions to do something meaningful with Bytes values are not in the scope of this proposal.

Fixes #1179

---------

Co-authored-by: Gabriella Gonzalez <GenuineGabriella@gmail.com>