
Discussion: Support for importing from JSON #121

Open
Gabriella439 opened this issue Mar 20, 2018 · 51 comments

Comments

@Gabriella439 (Contributor)

One pretty heavily requested feature is importing JSON values directly into Dhall. The most commonly requested reasons for doing this are:

  • Interop with existing JSON infrastructure (i.e. reusing shared JSON configuration files or API endpoints between tools)
  • Taking advantage of Dhall's declarative import system to orchestrate tying together multiple heterogeneous inputs

I'm open to the idea although I probably won't implement it until the import semantics are standardized (any day now :) ). In the meantime, though, I can still gather feedback on whether Dhall should support this feature and to flesh out what it might look like if it were proposed.

This would most likely be similar in spirit to Dhall's existing support for importing raw text using as Text. In other words, you would be able to write ./foo.json as JSON to import a JSON file into Dhall. However, there are still some open issues that need to be resolved.

For those who are in favor of this feature, the easiest way to drive this discussion is to discuss how Dhall should behave when importing the following JSON expressions which cover most of the corner cases that I'm aware of:

[ 1, null ]
[ 1, true ]
[ { "x": 1 }, { "x":1, "y":true } ]

For each of these imports, should Dhall:

  • reject the import?
  • accept the import with a type annotation (i.e. a sum type or Optional type)?
    • if so, what type annotation(s) would allow the import to succeed?
  • both of the above (i.e. be strict without a type annotation and more lenient with a type annotation)?
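For concreteness, here is one hypothetical set of annotations under which the first and third imports could succeed (nothing here is standardized; the null-to-None and missing-field-to-None mappings are assumptions, and the file names are invented):

```dhall
-- Hypothetical: each null becomes None, making the list homogeneous
./example1.json as JSON : List (Optional Natural)

-- Hypothetical: the record missing "y" gets y = None
./example3.json as JSON : List { x : Natural, y : Optional Bool }
```

Under the strictest reading, all three imports would simply be rejected, with or without an annotation.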
@aleator commented Mar 27, 2018

My humble opinion about this is that Dhall should always require a type annotation, regardless of how 'guessable' the imported type is. The rationale is that even though your list stores only ints now, it may end up with booleans later, and it feels inappropriate to implicitly guess when importing files. A better approach would be a separate dhall-guess-the-type-of-this-json tool for this purpose.

Also, I think the only implicit conversion that is sensible is to equate null and missing values with Optional.
Thus [ 1, null ] could be importable using List (Optional Integer) and [ { "x": 1 }, { "x":1, "y":true } ] as List { x : Integer, y : Optional Bool }, while [ 1, true ] would never be importable (at least until Dhall grows enough dependent typing for the type to contain an interpretation function).

@Profpatsch (Member) commented Mar 27, 2018

I don’t think the idea is to guess types. From #326:

./foo.json as JSON : List { name : Text, age : Natural }

@aleator commented Mar 27, 2018

Well, since the options outlined above are "reject" and "accept with type annotation", I thought that there would be a form where a type annotation wasn't necessary, i.e. the type would arise from the imported data. Sorry about my confusion.

@Profpatsch (Member) commented Mar 27, 2018

I think the last section was about how JSON should be typed in Dhall by default (yet still with deterministic rules for what gets which type).

For example the aforementioned

./foo.json as JSON : List { name : Text, age : Natural }

would accept [ { "name": "me", "age": 23 } ] but not
[ { "name": "me", "age": 23, "occupation": "programmer" }, { "name": "Mel" } ], while just

./foo.json as JSON

could accept the latter and would type it as List { name : Text, age : Optional Natural, occupation : Optional Text }.

@Gabriel439 I personally think dynamically adding optionals would only make sense if Dhall can infer the needed fields from usage, which it can’t. Since everywhere else types are not optional (and can’t be inferred) I think this would break consistency (and maybe bring up the expectation that it infers from usage).

@Gabriella439 (Contributor, Author)

Yeah, my personal preference is for a mandatory type signature, too. I just didn't want to bias the discussion at the very beginning. My reasoning is that it would be very weird for this:

[ 1 ]

... to have an inferred type of List Integer, whereas this:

[ 1, null ]

... has an inferred type of List (Optional Integer). Adding an element to a list shouldn't change its type and (like other people mentioned) wouldn't be consistent with other design decisions in Dhall.

However, there is still the question of whether or not Dhall should allow importing this JSON:

[ 1, true ]

... using a type annotation with a sum type like this:

./foo.json as JSON : List < Left : Integer | Right : Bool >

The main downside of that proposal that I'm aware of is that you have to specify what happens if you start nesting sum types or if you have sum types with multiple constructors that wrap the same type. My inclination is to still reject that, but I just wanted to mention it because dhall-to-json does support this in the opposite direction:

$ dhall-to-json
let Either = constructors < Left : Integer | Right : Bool >

in  [ Either.Left 1, Either.Right True ]
<Ctrl-D>
[ 1, true ]
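To make the ambiguity concrete: with two constructors wrapping the same type, the reverse mapping has no unique answer (hypothetical annotation, invented constructor names):

```dhall
-- Given the JSON [ 1 ], should 1 decode to A 1 or B 1?
-- There is no principled way to choose, which is an argument
-- for rejecting such annotations.
./foo.json as JSON : List < A : Natural | B : Natural >
```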

@aleator commented Mar 27, 2018

Well, I was referring to "typed by default" as guessing. I think there are two use cases for dhall-from-json:

  1. To get some static piece of data easily into Dhall. This is, in my mind, best served by an external conversion program, which you run just once.

  2. You want to use different bits of JSON as input to a Dhall program, or the data you are importing changes sometimes. In this case you probably want direct support in Dhall. However, no single JSON snippet is going to be able to tell you the exact shape of the (future) data, so the defaulting mechanism is probably not such a hot idea.

As a final thought, how about adding import plugins to Dhall (using a scheme similar to pandoc's)? You would supply dhall with a program/script that can output Dhall expressions and then import other bits of data through that script. For example, you could do something like:

echo './myCSV using CsvConverter { name : Text, age : Natural }' | dhall --plugin=CsvConverter

This would allow testing different JSON import schemes or interacting with other more task specific data sources. Successful data providing plugins could be merged to dhall after they've seen some real world use. (This could also handle things like dhall-lang/dhall-haskell#292)

@Gabriella439 (Contributor, Author)

Yeah, I like the plugin idea, although I would prefer to do it through the Haskell API instead of the command line

@Profpatsch (Member) commented Mar 28, 2018

Untagged unions should be different from sum types in my opinion.

I wouldn’t have expected dhall-to-json to throw the tags away to be honest.

@aleator commented Mar 28, 2018

Command line vs. Haskell API depends on who you wish to write plugins. I would guess that today most dhall is consumed by Haskell programs and the plugin is easiest and safest to add there.

However, if you use dhall from command line a lot then you'll need to build your own binary. Not a problem for Haskell users but probably a bit of a hurdle for the rest.

@Gabriella439 (Contributor, Author)

Keep in mind that the long term goal of Dhall is language bindings other than Haskell. So ideally there would be a language binding in that user's preferred language that they could use to customize the import resolution behavior.

The main reason I want to avoid a plugin API is that then I have to standardize the semantics and interface for plugins and every Dhall implementation would need to support that standard plugin semantics.

Note that in the long run I don't want users to have to use any binaries at all. The integration with their preferred language should be through a library rather than a subprocess call to some dhall or dhall-to-json executable.

In other words: I agree with the goal that users shouldn't have to build their own binaries, but I believe that the correct solution to that goal is to finish standardizing import semantics in order to create more language bindings rather than make the binaries of one implementation hyper-customizable.

@aleator commented Apr 12, 2018

I ran into another case where having some kind of extended importing would be useful.

I'm using Dhall to describe some course exercises. Some exercises need bibliography links, and all I have is a large BibTeX file. In this case I converted the bibliography, partly and by hand, into Dhall so I could import the required entries.

It would've been nice if I could've imported the .bib file directly. Doing the bib-to-dhall conversion means that the .bib file is no longer the primary data source, and that I need to write a converter from Dhall back to bib to make use of the entries that I converted into Dhall.

Perhaps extending the syntax so that import foo using <dhall-expression> is valid would be a start for doing something like this?

@marcb commented Sep 30, 2018

I've been pondering this for a little while, and I feel that as JSON imports should be typed, but that a tool should exist that takes a corpus of example/expected payloads and produces a type, to ease the use of as JSON.

The tool could also possibly take two corpora, reflecting both valid and error JSON responses, to allow a union type covering both circumstances...

@madjar commented Nov 30, 2018

I've spent the afternoon making a toy json-to-dhall tool (https://gist.github.com/madjar/252c517644c0e13ef28a2a7ca71f5fa4). It's very prototypey code, and supports just the most basic types, as well as optionals and dynamic maps (mapKey/mapValue).

The question is: if I want to transform this into something that's actually useful, where should it live:

  • Some external project?
  • As part of the dhall-json package?
  • In some other form?

@Gabriella439 (Contributor, Author)

@madjar: We want to add this to the language standard and once it's there then it will live in all implementations of the standard using the as JSON keyword (i.e. in the dhall-haskell project, for example). The first step is to review your code and see if that matches how people expect the as JSON feature to behave. I will try to review your code more closely tonight.

The key thing to emphasize is that the standardization process and agreeing upon the desired behavior is the bottleneck here because once it is standardized then I expect it will be pretty straightforward to update the implementation to match.

@madjar commented Nov 30, 2018

@Gabriel439 If you review it closely, then I'll have to apologize for the quality. It was kind of rushed this afternoon. The approach I've taken is the one described in dhall-lang/dhall-json#23 (comment), under "Convert and type check together", which is to recursively traverse both the JSON Value and the Dhall Expr, accepting only values that exactly match the given type.

Having this tool made converting a JSON file and defining its Dhall type quite nice, allowing me to incrementally add the missing parts to the type definition while getting quick feedback.

But I understand that you see this not as tooling, but as part of the language, thus requiring more standardization than "whatever the tool does". I'll familiarize myself with the processes of the project, then.

Thanks!

@Nadrieril (Member)

My opinion on this is that this would be extremely cool.
Regarding type annotations, here is what I had in mind: importing without a type annotation should be OK, because when we write [ "a", "b" ] in Dhall we don't need a type annotation, so requiring one to import a similar bit of JSON seems unnatural. However, without a type annotation the type checker would be very strict and disallow any kind of mixing of types. Essentially, it would parse the JSON as it would a similar Dhall expression.
This has a nice side effect: doing echo "./data.json as JSON" | dhall type would give a type for the JSON payload.
Now when type annotations are added, the data can be more flexible, for example transforming nulls into Nones etc. as mentioned above. But I feel this can be left for a later stage, since it would be rather more complex.
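A sketch of the behavior described above (hypothetical; neither the strict defaulting nor the lenient annotated form is standardized, and the file names are invented):

```dhall
-- Strict, annotation-free: [ "a", "b" ] imports as List Text,
-- exactly as the corresponding Dhall literal would type-check
let names = ./names.json as JSON

-- [ 1, null ] would be rejected without an annotation, but a later,
-- more lenient stage could accept it with one:
let xs = ./mixed.json as JSON : List (Optional Natural)

in  { names = names, xs = xs }
```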

@ari-becker (Collaborator) commented Apr 8, 2019

I agree that a as JSON mechanism should take some kind of type definition as a parameter, instead of trying to magically generate a type from the parsed JSON. I think that such a notion is more "Dhall-ish", which is to say, that input should be type-checked instead of blindly trusted.

With that said, I don't think that the type inputs for as JSON should be statically defined, e.g. let Strings = List Text in ./data.json as (JSON Strings). I think that this inherently limits the value of an as JSON language feature due to the dynamic nature of much JSON output.

Consider, for instance, the dhall-terraform-output script which takes Terraform's JSON output and assembles both a type and a record from that output. Because the record keys are variable, it's not possible to define a Dhall type for arbitrary Terraform JSON output ahead of time (or rather, it is, but it would be fragile). However, this doesn't mean that Terraform's JSON output doesn't follow a predictable pattern, and ideally, upon parsing Terraform's JSON output, it would be best to verify that the JSON output fits that pattern, and possibly even get a type that fits the predicted pattern.

What's the best way to do that? I don't know. Maybe, instead of as (JSON Type), we have some kind of as (JSON (? -> Type)), the idea being that it takes a function that produces a type instead of a static type? I have a feeling that a solution would veer on dependent typing, which the language standard doesn't support (yet?), and that's a Pandora's box of its own.

Probably it would be best to start with as (JSON Type), be strict about accepted input, and do so with the explicit caveat that it's still a partial solution and doesn't expect to be universally useful for any kind of JSON input.

@alexanderkjeldaas

I think requiring type annotations will make it mostly impossible to import large JSON structures like CloudFormation data.

@Gabriella439 (Contributor, Author)

@alexanderkjeldaas: Wouldn't they also fail to import without type annotations? Usually those kinds of JSON files mix records of various types

@antislava

@alexanderkjeldaas : Please, try the (new) json-to-dhall tool (in https://github.com/dhall-lang/dhall-haskell) and share your experience/issues. The tool requires a type annotation (schema) but does support union types and should be able to handle situations when different types are "mixed".

@feliksik

Json-to-dhall is great!

I'm not sure how 'done' it is considered to be? Does this unlock this issue, i.e. creating the syntax for the core language and command-line dhall utility to import X as JSON and import X as YAML?

This would be great!

@feliksik

The decoder idea sounds powerful, and seems a special case of this:

let myJson = ./my.json as Text
let decodeJson : Text -> MyType = ./json.dhalldecoder as decoder
let parsed : MyType = decodeJson myJson

Is that correct?

If this is the case, we can also solve the text manipulation issue via plugins/decoders:
#631

I'm sure there are some concerns here :-)

It is a powerful idea, but the decoding thing may possibly benefit from some more maturation.

How bad would it be to have syntactic sugar like import as X? Sure, it's arbitrary which formats to import, but it seems relatively simple to implement and support.

But how do you deal with the security challenges?

@joneshf (Collaborator) commented Jul 17, 2019

I think the issue in question is #613. In particular, this comment.

@jneira (Collaborator) commented Jul 17, 2019

@feliksik: I would actually be fine baking in language support for JSON specifically instead of waiting for a more general solution. Dhall's future as a language is already intimately tied to its ability to displace JSON/YAML, so I think it's fine to special-case support for JSON

I think we could support custom encoders (including for JSON and YAML) and direct importing only from JSON and YAML, due to their special status.
For example, once the dhall executable supports dhall --to-json (see dhall-lang/dhall-haskell#1096) it would be a little bit weird to have:

/path/to/dhall < import json as { path = "/path/to/dhall", args = "--from-json" }

@Gabriella439 (Contributor, Author)

The biggest issue is that an import could run an arbitrary executable. However, we could do something similar to the referential sanity check (i.e. only local imports can run executables, since they are trusted anyway). After all, we already trust local imports to send environment variables in custom headers (i.e. toMap { Authorization = "token ${env:GITHUB_TOKEN}" }). So I'm not too concerned about that, but I can see people complaining about it if they weren't already familiar with Dhall's threat model.

The second biggest issue is that relying on external executables complicates Dhall's distribution model (compared to native as JSON support). It's no worse than what we have today (since users currently have to separately install {json,yaml}-to-dhall) but it would be a much smoother user experience if JSON support was built into the language. In my experience, ease of distribution has a large impact on adoption rates.

@gregwebs commented Sep 1, 2019

Take a look at how cue is doing this: https://cuelang.org/
They can work with existing YAML to check it or import it.

Meanwhile, I found that although dhall-kubernetes is great, I am spending a massive amount of time converting just a single existing valid configuration from YAML to Dhall. Solely due to this waste of time, I will have to try using cue instead.

TypeScript took off I think in large part because of the ease of transition:

  • One can create separate type definition files and apply them to existing JavaScript libraries that do not come with definitions. You can type all the interfaces you use without changing the code.
  • JavaScript files can be immediately imported into TypeScript (used as TypeScript files), and you can slowly add better type annotations.

Ideally dhall would support not just importing but also applying type definitions to existing files. Either feature though would help work with the existing world.

If Dhall is going to be designed to interoperate with the outside world, then it does need to work with different formats. Plugins need to be able to produce a common data structure that preserves file location information so that users can get good error messages. It might be possible for now to tell users to convert their YAML, etc. to JSON, with the caveat that they won't get good information about the file location of their error.

@Gabriella439 (Contributor, Author)

@gregwebs: We're pretty close. We've had yaml-to-dhall and json-to-dhall for a while now and I think we've worked out most of the design for them. I think the main remaining step is upstreaming them into the language

@singpolyma (Collaborator)

@gregwebs Would neither of the yaml-to-dhall tools work for your use case if you're converting an existing file?

@gregwebs commented Sep 1, 2019

Yes, in theory yaml-to-dhall will work for me; I didn't realize it was in dhall-json. That could make the process of using existing files go from hours to minutes!

In practice, however, it doesn't actually work for dhall-kubernetes. This is because most of the K8s fields are actually optional. dhall-kubernetes is designed so that one writes definitions with the help of defaults, so you will write:

defaults.Deployment //  { metadata = defaults.ObjectMeta }

However, yaml-to-dhall doesn't know about these defaults and complains about missing fields.

I think this is a separate issue, reported here. But there is some relation, since dhall-kubernetes works fine when writing in Dhall but is unable to import YAML. It seems like yaml-to-dhall needs a --defaults flag.

@Nadrieril (Member)

I think we all agree that ./foo.json as JSON : { x : Text, y : List Natural } is unambiguous in the absence of unions and Optionals. Should we standardize just that and decide later if we want to allow more features? Or do we think the design space is unclear enough that we don't want to commit to even that?

@Gabriella439 (Contributor, Author)

@Nadrieril: The ambiguity was not an issue for me. I think this is worth standardizing now

@sjakobi (Collaborator) commented Mar 22, 2020

./foo.json as JSON : { x: Text, y: List Natural }

The use of as here is a bit inconsistent with as Text and as Location. In those cases, the result is actually of type Text or Location. With JSON we're just specifying the source format, but the result has a different type.

How about using e.g. from instead?

./foo.json from JSON : { x: Text, y: List Natural }

Or maybe fromFormat?!

@sjakobi (Collaborator) commented Mar 22, 2020

Another idea:

./foo.json from JSON as { x: Text, y: List Natural }

@Gabriella439 (Contributor, Author)

@sjakobi: Yeah, I like the from JSON as … syntax

@philandstuff (Collaborator)

I prefer

./foo.json from JSON : { x: Text, y: List Natural }

because using a colon to indicate “this thing has that type” is a well-established part of the language (type assertions, empty lists, empty merge, etc.), whereas the existing meaning of as is something else (as stated upthread).

@SiriusStarr (Collaborator)

I would love the ability to import from JSON and agree that the use of : to indicate type is probably the best option.

Would one have to fully specify the type of the JSON to import it, or could one specify only the desired structure (with anything else being polymorphic)? E.g. if I have the JSON

{
  "firstName": "John",
  "lastName": "Smith",
  "age": 27
}

but only care about the name fields, would this import allow you to do

./person.json from JSON : { firstName: Text, lastName: Text }

or would you have to do

./person.json from JSON : { firstName: Text, lastName: Text, age: Natural }

@Profpatsch (Member) commented Mar 26, 2020

Would love to see ./foo.json as JSON : { … } soon (don’t introduce another keyword please, just require a type annotation like with empty lists).

@Gabriella439 (Contributor, Author)

I created a separate issue to track the idea of customizable parsers: #989

... since that's one way we might address this (by implementing JSON support within the language)

@Gabriella439 (Contributor, Author) commented Aug 20, 2020

One way we can make progress on this issue is to split it into two steps:

  • Standardize support for importing a JSON value as a plain Prelude.JSON.Type
  • Standardize a keyword that can convert Prelude.JSON.Type to a more strongly-typed representation
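A sketch of how the two steps could fit together (the as JSON behavior and the fromJSON keyword are hypothetical and invented here; Prelude.JSON.Type itself is real):

```dhall
-- Step 1 (hypothetical): import the file as the weakly-typed
-- Prelude JSON representation, with no annotation required
let raw : (https://prelude.dhall-lang.org/JSON/Type) = ./config.json as JSON

-- Step 2 (hypothetical keyword): refine it into a strongly-typed
-- value, failing at import time if the shape doesn't match
in  fromJSON raw : { name : Text, port : Natural }
```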

@mujx commented Apr 19, 2021

@Gabriel439 If I understood correctly, the issue is blocked by the lack of an implementation / spec?

Seems like an easy way to move forward is to have something like the following (which was already suggested)

./file.json as JSON : FileConfig

which doesn't introduce any new keyword, nor does it need support for Prelude.JSON.Type, since the import would be converted immediately with dhallFromJSON.

@Gabriella439 (Contributor, Author)

@mujx: Yeah, this requires a change to the standard since it cannot be implemented entirely within dhall-json
