[RFC] XML DSLs #60

Open
ncannasse opened this issue Jun 5, 2019 · 20 comments

@ncannasse
Member

commented Jun 5, 2019

Followup regarding my previous RFC #57

First, thank you for all the comments. After reading the whole discussion and taking time to think about it, I would like to update my proposal.

I agree it is hard to pursue the goal of "Universal" block strings and still support XML syntax correctly. Some also mentioned that one of the interesting parts of the feature was IDE support for syntax highlighting and autocompletion, and making the feature too abstract would prevent this.

I still think that some kind of XML DSL syntactic support is - if not absolutely needed - at least interesting enough to have in Haxe. HTML is here to stay for a long time, and the XML document model that comes with it can be used for many other applications outside of web client code.

I also think we should not support a single particular DSL such as JSX, whose spec can evolve and change in unexpected ways, or be entirely superseded by an alternative syntax - whatever the trending framework happens to be in the JS world.

So here's a revised syntax proposal that tries to ensure most XML-based DSLs will be supported, while making only the strict minimum of assumptions about the DSL syntax.

An XML DSL node would be in the form:

<nodename CODE?>

Where CODE (optional) is explained below.

We would then try to match it with the corresponding closing XML DSL node in the form:

</nodename CODE?>

And we would allow self-closing nodes in the form:

<nodename CODE?/>

The CODE section can be anything except the > character.

But this makes some inputs invalid, for instance the following, because of the comparison inside the CODE section:

var x = <node value="${if( a < b ) 0 else 1 }"/>

So I propose that CODE additionally checks for opening/closing curly braces {} and ignores their whole content. So for instance the following would be perfectly valid:

var x = <yaml {
   some yaml code with balanced {}
} />;

We could additionally treat \{ as an escape sequence for the following case:

var x = <node value="\{"/>

I think this version of the syntax gives enough flexibility with minimal assumptions. Curly braces are used to ensure that reentrancy is fully supported, so any Haxe code within the DSL will be handled correctly.
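To make the scanning rule concrete, here is a minimal sketch in Python (not the Haxe lexer, and not an official implementation). It reads the node name, then consumes CODE up to the first > found outside curly braces, and treats \{ as an escape. The function name scan_open_node, the accepted node-name characters, the handling of \} as a second escape, and the way a trailing / marks a self-closing node are all assumptions made for illustration.

```python
import re

# Assumed node-name character set; the proposal does not specify one.
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.:-]*")


def scan_open_node(src, pos=0):
    """Scan `<nodename CODE?>` or `<nodename CODE?/>` starting at src[pos].

    Returns (name, code, self_closing, end), where end is the index just
    past the closing `>`. Raises ValueError on malformed input.
    """
    if src[pos:pos + 1] != "<":
        raise ValueError("expected '<'")
    m = NAME.match(src, pos + 1)
    if m is None:
        raise ValueError("expected node name")
    i = m.end()
    depth = 0  # current {} nesting depth inside CODE
    while i < len(src):
        c = src[i]
        if c == "\\" and src[i + 1:i + 2] in ("{", "}"):
            i += 2  # escaped brace: does not affect nesting
            continue
        if c == "{":
            depth += 1
        elif c == "}":
            if depth == 0:
                raise ValueError("unbalanced '}' in CODE")
            depth -= 1
        elif c == ">" and depth == 0:
            code = src[m.end():i]
            self_closing = code.endswith("/")
            if self_closing:
                code = code[:-1]
            return m.group(0), code.strip(), self_closing, i + 1
        i += 1
    raise ValueError("unterminated node")


# A `>` inside ${...} no longer ends the node, because braces are skipped.
print(scan_open_node('<node value="${if( a > b ) 0 else 1 }"/>'))
```

Under these assumptions an unescaped, unbalanced { inside a quoted value leaves the node unterminated, which is the case the \{ escape is meant to cover.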

@RealyUniqueName

Member

commented Jun 5, 2019

So, if I need some CODE in an opening tag, then I have to duplicate it in a closing tag?
Something like this?

var x = <div style="color:red"> <h1>Hello, world!</h1> </div style="color:red">;
@kLabz

commented Jun 5, 2019

So I propose that CODE additionally check for opening/closing curly braces {} and ignore their whole content. So for instance the following would be perfectly valid:
We could additionally treat \{ as escape sequence for the following case:

Would doing the same for double quotes too be a problem?

@Aurel300

commented Jun 5, 2019

@RealyUniqueName

So, if I need some CODE in an opening tag, then I have to duplicate it in a closing tag?

I think the idea is that the optional CODE bit can be whatever in both tags, i.e. it can be different, or entirely omitted in the closing tag. Correct me if I'm wrong @ncannasse .

@ncannasse

Member Author

commented Jun 5, 2019

@Aurel300 correct
I have fixed the yaml example typo.

@skial referenced this issue on Jun 6, 2019 in Haxe Roundup 482 #624 (closed, 1 of 1 task complete)

@markknol

Member

commented Jul 18, 2019

I have been using coconut quite a lot lately for my project. I have to say I don't use XML syntax at all, just function render() '<div>${content}</div>', and that actually works surprisingly well and has nice highlighting. I would almost say it's good enough.

So if I had a vote, I would propose an alternative (it's not something I invented; it comes from @back2dos or @kevinresol), which is my backtick proposal, but with n=1 (n being the number of delimiting backticks). That would allow things like this:

  • var x = `any dsl here`
  • var x = ``any dsl here`` valid too
  • var x = ````any dsl here```` valid too
  • var x = `any dsl here`` invalid, unbalanced backticks
  • var x = `welcome at "${name}"'s website` supports string interpolation, no quote escaping needed
  • var xml = ```<xml>test</xml>``` could be processed as xml, or by coconut, or by heaps or ..
  • var xml = `<img class="active"/>` supports self-closing tags
  • static var shader = @:hxsl `hxsl { my shader code }`
  • ` > <<< > >> >>'">>>` allows unbalanced braces/quotes etc
  • ```markdown # hello world! ``` could be processed as markdown.
  • `` <nested>`expr`</nested> `` where the nested expr can be processed separately too. That's where an unlimited number of delimiters comes in nicely.

This is fairly clean in my opinion, has room for many applications, and is less weird than the proposed syntax or the existing inline markup. Otherwise I'd like to keep the current inline markup and accept that it doesn't allow self-closing tags, which I think is a fair compromise too.
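The delimiter rule in the list above can be sketched as a small scanner. This is a Python illustration under stated assumptions, not part of any Haxe implementation: the literal opens with a run of n backticks and closes at the next run of exactly n backticks, while runs of any other length count as content. The function name scan_backtick_literal is hypothetical, and empty literals are not handled.

```python
def scan_backtick_literal(src, pos=0):
    """Return (content, end) for a literal opened by n backticks at src[pos].

    end is the index just past the closing delimiter. Raises ValueError
    when no matching run of exactly n backticks is found.
    """
    n = 0
    while pos + n < len(src) and src[pos + n] == "`":
        n += 1
    if n == 0:
        raise ValueError("expected a backtick")
    start = pos + n
    i = start
    while i < len(src):
        if src[i] != "`":
            i += 1
            continue
        j = i
        while j < len(src) and src[j] == "`":
            j += 1
        if j - i == n:
            # A run of exactly n backticks closes the literal.
            return src[start:i], j
        # A shorter or longer run is treated as plain content.
        i = j
    raise ValueError("unbalanced backticks")


print(scan_backtick_literal("`` <nested>`expr`</nested> ``"))
```

With this rule the single backticks around expr stay inside the outer literal, so a DSL processor could scan them again as a separate, nested literal.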

@ncannasse

Member Author

commented Jul 18, 2019

@markknol this proposal does not deal well with reentrancy: increasing the number of backticks at every level, every time you need extra depth, seems like very bad design.

@back2dos

Member

commented Jul 18, 2019

As noted multiple times on Slack, I think we're trying to cover two very loosely related problem domains via a single language feature. It was a pretty terrible idea on my part and I apologize for not having seen that up front.

There are two things that we're after (everyone to a varying degree):

  1. an XML-ish syntax for the purposes of declaring UI (or in fact directly embedding an XML-based DSL or even XML/HTML itself)
  2. block strings of sorts, that allow embedding arbitrary code into Haxe

XML-based UI markup DSL

The proper solution here is to have a well-defined grammar ... whatever the specifics. I think UI development is an important enough use case to warrant first class support. That comes with proper syntax highlighting, which presupposes proper syntax. It also requires proper auto-completion, which requires specific insertion points, which again presupposes proper syntax.

If what it takes to have this is to use domkit's markup as is, then that's still 100 times better than some wacky "universal" solution. Haxe-style comment support aside, it's practically a superset of JSX anyway (ignoring the fact that the code inside is Haxe and not JS).

I absolutely agree that we shouldn't tie ourselves to some spec controlled by an isolated group (as it would be in the case of JSX). We should come up with something properly designed. If it can lean on universally understood syntax, that's great. And if trivial changes allow our spec to cover a little more ground, they're worth considering. Example: allowing - and : and @ in attribute names would let library authors support XML namespaces, data/aria HTML attributes and Vue/Angular directives. Whether such concessions are worth the trouble needs to be decided on a case-by-case basis.

However, so far Nicolas has squarely rejected the notion of any well-defined syntax. At the same time, with the recent improvements in string interpolation and with a plugin for proper highlighting, single-quoted strings give a much better developer experience. I would thus propose to either:

  • agree that a well-defined syntax is a good thing
  • remove inline markup from Haxe 4, so that when we finally agree that doing better than Notepad is a worthwhile endeavor, we can add this back with a well-defined syntax without breaking tons of code

Block strings

I'm not sure I have a very strong opinion on this one, because I don't see overwhelmingly many use cases here. I see even less that would require injecting Haxe code and then reentering into an absolutely foreign language again.

For this reason, I'd go with what Mark mentioned, although I'm leaning towards not supporting interpolation out of the box. It's trivial to shove the string into MacroStringTools.formatString if so desired.

The main advantage is that then you can embed ANY string into Haxe code (e.g. PHP code, which has $ident all over the place) without having to modify it / do any kind of escaping. If something in the string collides with the delimiter, you just add more backticks to the delimiter. You do not have to change the original string.
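As a rough illustration of "just add more backticks", the sketch below (Python, with the hypothetical helper name wrap_verbatim) picks a delimiter one backtick longer than the longest backtick run in the text, so the text itself is never modified. It assumes a literal closes on a run of backticks of the delimiter's length. Text that starts or ends with a backtick would merge with the delimiter, so this sketch rejects it rather than guess at a padding rule.

```python
import re


def wrap_verbatim(text):
    """Wrap text in the shortest backtick delimiter that cannot collide with it."""
    if text.startswith("`") or text.endswith("`"):
        # The delimiter would merge with the content; not handled in this sketch.
        raise ValueError("text touching the delimiter is not handled")
    # Longest run of consecutive backticks inside the text (0 if there is none).
    longest = max((len(run) for run in re.findall(r"`+", text)), default=0)
    fence = "`" * (longest + 1)
    return fence + text + fence


# PHP-style $ident needs no escaping because nothing is interpolated.
print(wrap_verbatim('<?php echo "$ident"; ?>'))
print(wrap_verbatim("a `` b ` c"))
```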

Being able to just dump verbatim (non-escaped whatsoever) target language source code into Haxe is a nice use case (at least in my esteem). Same goes for text assets, scripts or whatever.

I'm eager to see a use case that requires all this reentrancy dance. And no, UI markup doesn't count, because as I've stated that deserves its own syntax ;)

@kevinresol

commented Jul 19, 2019

  • agree that a well-defined syntax is a good thing
  • remove inline markup from Haxe 4, so that when we finally agree that doing better than Notepad is a worthwhile endeavor, we can add this back with a well-defined syntax without breaking tons of code

Shit I have a lot of code using inline markup already. So I prefer the first option.

@kLabz

commented Jul 19, 2019

I have a lot too (and keep finding bugs, latest being HaxeFoundation/haxe#8565 which should still apply to this RFC) but I'd prefer one of those two solutions to this RFC.

@szczepanpp

commented Jul 19, 2019

this proposal does not deal well with reentrency

Shouldn't the DSL-processing macro code handle interpolation and deal with reentrancy? This feature is very context/domain specific. In JSX, for example, it makes a difference whether you use interpolation where an element is expected or where it's just a string value in an element attribute. With current string interpolation everything is plain and simple: whatever is produced by the interpolated variable or piece of code always ends up getting Std.string-ed, and reentrancy is only allowed in nested strings.

The implication is that DSL could define its own interpolation / reentrancy escaping mechanism.

@Aurel300

commented Jul 19, 2019

@szczepanpp as was said in one of the previous inline markup discussions, re-entrancy is a parser problem, not an application problem – it cannot be solved with macros interpreting the code, since markup lexing must happen a long time before an AST even exists.

A specific case where syntax that has the same opening and ending tags is problematic:

var x = `here is some markup, and interpolation: ${someFunction(`more markup!`)}`;

The intention is clearly to parse this as:

var x = "here is some markup, and interpolation: " + someFunction("more markup!");

But to the parser it is:

var x = "here is some markup, and interpolation: ${someFunction(" more markup! ")}";

This triggers an unexpected token/identifier error at more, right after the first string. Without default semantics for the DSL syntax, the parser cannot make assumptions about syntax like ${...}: is it part of the DSL? Should it be treated as Haxe code? Hence, to nest this properly, you would need an increasing number of backticks for each layer, as @ncannasse said:

var x = ``here is some markup, and interpolation: ${someFunction(`more markup!`)}``;

I did propose parser-level macros before, but I think that would require huge changes to the compiler.
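The mis-parse described in this comment can be reproduced with a deliberately naive sketch in Python. It pairs up backticks in order of appearance, which is all a lexer can do when the opening and closing delimiters are identical and it knows nothing about ${...}. The function name split_on_backticks is hypothetical.

```python
def split_on_backticks(src):
    """Pair backticks in order, as a lexer without nesting rules would.

    Returns (literals, outside): the text found between each opening and
    closing backtick, and the text left outside any literal.
    """
    parts = src.split("`")
    literals = parts[1::2]  # odd-indexed parts sit between a pair of backticks
    outside = parts[0::2]   # even-indexed parts sit outside every pair
    return literals, outside


src = "`here is some markup, and interpolation: ${someFunction(`more markup!`)}`"
literals, outside = split_on_backticks(src)
print(literals)
print(outside)
```

The inner text more markup! ends up outside both literals, which corresponds to the unexpected identifier the parser would report.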

@ncannasse

Member Author

commented Jul 19, 2019

There are two different things here:

  • block strings: I'm not sure it's useful to have non-reentrant block strings, because you can already have them either in Strings (with interpolation) or in separate files (using macros to embed them in Haxe code if necessary)

  • xml dsls: as I said several times already, xml dsls are a good addition that can cover many cases (jsx, but not only). We "only" need to decide on a minimal subset of syntax. One (the original implementation) does not allow self-closing nodes. The current one (at the top of this thread) allows them, but introduces some semantics for curly braces {}, in the node attributes part only.

So let's please focus on the topic. We will not drop the feature from Haxe 4, and I doubt we will find something that everyone agrees with 100%, so please only comment with either alternative solutions in the specified domain (which is XML-based reentrant DSLs) or with issues that this proposal might have overlooked.

EDIT: Please consider that I will ignore any thumbs down that does not come with an actual, clearly expressed point of view. Language design is not a Facebook context.

@kevinresol

commented Jul 19, 2019

With semantics (compiler is aware of attributes and Haxe code inside attributes):

var x = <div onclick={() -> trace("{")}/>

vs "minimal assumption":

var x = <div onclick={() -> trace("\{")}/>

or even:

var x = <div onclick={() -> trace("/>")}/>

I prefer the first one.

@kLabz

commented Jul 19, 2019

We "only" need to decide on a minimal subset of syntax

That's what @back2dos is proposing with a parser that can support a wide range of xml-ish possibilities, but with a proper AST.

@ncannasse

Member Author

commented Jul 19, 2019

@kevinresol it's not about "preferring". Of course I would also "prefer" not to have to escape the unbalanced curly braces in strings within reentrancy syntax.

But you should consider:
a) how rarely this is actually going to happen
b) vs how much a "precise" syntax will constrain the potential usages

Designing a language is not about making things perfect for a given usage, but more about making sure that each additional feature gives a large number of possibilities to the developer. That's how I came up with macros in the first place (among other examples).

@kLabz yes, my suggestion is just to keep the syntax to a very strict minimum. This still allows for some syntax highlighting. Completion can be provided by macros already, as we do for some strings already.

@kevinresol

commented Jul 19, 2019

It is all about balance and compromises.

You always mentioned the design of the macro system was a "minimal" one.
But remember you chose to design it with Haxe semantics. With today's standard the macro system should have been designed to simply dump everything after the macro keyword as raw bytes to the macro API. (perhaps plus some end delimiters and some escaping mechanisms)

The example was to demonstrate that I prefer @back2dos's minimum to yours.

@kLabz

commented Jul 19, 2019

b) vs how much a "precise" syntax will constrain the potential usages

Do you have anything in mind? I genuinely cannot think of one.

@RealyUniqueName

Member

commented Jul 19, 2019

I still think backticks are our best option here. And I still don't understand how our parser works, so I'm not sure if the following is possible, but...

What if we allow an increasing number of backticks on re-entrance?
Most of the time (actually, almost all the time, I believe) it will look like this:

var x = `<whatever anything="{" otherthing="/>" />`;

In some rare cases one would need a reentrancy and then it will look like this:

var x = `my-dsl ``sub-dsl`` awesome-job`;

I mean that if the parser spots a different number of consecutive backticks, that should be considered an "opening tag" for a nested markup literal.
In this case if you need to add more nested literals, you don't have to edit all the delimiters.

In an extremely rare case (honestly, I don't remember myself in need of the third level of strings in string interpolation for example):

var x =  `my-dsl ``<sub>```sub-sub dsl```</sub>`` awesome-job`;

or maybe even (if possible to implement)

var x =  `my-dsl ``<sub>`sub-sub dsl`</sub>`` awesome-job`;

Yes, that doesn't look pretty, but I don't believe that will happen more than once a year :)

And as mentioned in previous discussions, it's up to the DSL developer to handle any Haxe code injections in their DSL.
E.g. for this sample

var jsx = `<div onclick={() -> trace("{")}/>`;

the macro developer should manually detect curlies and pass their content to Context.parseString()

This proposal requires two escape sequences - \` and \\ (if immediately followed by backtick). But backtick is quite a rare character.
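A minimal sketch of this nesting rule, written in Python as an illustration only: a run of backticks closes the innermost open literal when it has the same length as that literal's delimiter, and otherwise opens a nested one. The function name nest_backticks and the nested-list result are assumptions, and the two escapes are implemented as described above (\` for a literal backtick, \\ only when immediately followed by a backtick).

```python
def nest_backticks(src):
    """Parse src into nested lists: each literal is a list of strings and sub-literals."""
    root = []
    stack = [(0, root)]  # (delimiter length, children); 0 marks the top level
    text = []            # characters collected since the last delimiter
    i = 0

    def flush():
        # Move the pending text into the innermost open literal.
        if text:
            stack[-1][1].append("".join(text))
            text.clear()

    while i < len(src):
        c = src[i]
        if c == "\\" and src[i + 1:i + 2] == "`":
            text.append("`")   # \` is a literal backtick
            i += 2
            continue
        if c == "\\" and src[i + 1:i + 3] == "\\`":
            text.append("\\")  # \\ before a backtick is a literal backslash
            i += 2
            continue
        if c != "`":
            text.append(c)
            i += 1
            continue
        j = i
        while j < len(src) and src[j] == "`":
            j += 1
        n = j - i
        flush()
        if stack[-1][0] == n:
            stack.pop()        # same length as the innermost delimiter: close it
        else:
            child = []
            stack[-1][1].append(child)
            stack.append((n, child))  # different length: open a nested literal
        i = j
    flush()
    if len(stack) != 1:
        raise ValueError("unterminated literal")
    return root


print(nest_backticks("`my-dsl ``<sub>`sub-sub dsl`</sub>`` awesome-job`"))
```

One consequence of the rule as sketched is that a literal cannot directly contain another literal with the same delimiter length, since that run would be read as the closing delimiter.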

@PlopUser

commented Aug 10, 2019

Tree-like structures should rely on a proper syntax fully integrated into the common Haxe specification. Tree-like structures are not specific to XML; a processor should be used to target XML or another tree-like structure. Why choose an XML DSL syntax when you could target more types of tree-like structures? With a proper syntax, we are not stuck with XML's limited syntax but open to enhancements of the Haxe syntax itself.

Integrating XML syntax directly mixed with the common Haxe language puts the user in a Frankenstein paradigm... not pure XML, not pure Haxe syntax.

The Ceylon language has a good example of a clear syntax for tree structures: see below an extract of the spec.
Ceylon is a highly typed language, with variadic parameters (a value parameter that accepts multiple arguments), named argument lists, type union and intersection, inference...

Constructor calls use a syntax free of the new keyword.

1.3.6. Named arguments and tree-like structures

Ceylon's named argument lists provide an elegant means of initializing objects and collections. The goal of this facility is to replace the use of XML for expressing hierarchical structures such as documents, user interfaces, configuration and serialized data.

// with a bit of Haxe transcription and a small modification to the named argument call: "({" -> "{"

var page:Html = Html {    // constructor call with named argument list and inference 
    doctype = html5;
    Head { title = "Haxe home page"; };    // constructor call with named argument list
    Body {              
        H2 ( 'Welcome to Haxe version : ${config.version}! ' ),   // regular constructor call
        P ( "Now get your code on :)" )
    };
}

print( page.toXML() );
print( page.toObject() );
print( page.myCustomFormat() );

Ceylon structured data
Ceylon variadic parameters

@djaonourside

commented Aug 16, 2019

@PlopUser you can already use a similar approach in Haxe, although it's a little bit redundant.

new Html({ 
	doctype: "html5",
	children: 
	[
		new Head({
			title: "Haxe home page"
		}),
		new Body({
			children:
			[
				new H2('Welcome to Haxe version : ${config.version}!'),
				new P("Now get your code on")
			]
		})
	]
});  