[RFC] XML DSLs #60

Open
ncannasse opened this issue Jun 5, 2019 · 28 comments

Comments

@ncannasse
Member

ncannasse commented Jun 5, 2019

Followup regarding my previous RFC #57

First, thank you for all the comments. After reading the whole discussion and taking time to think about it, I would like to update my proposal.

I agree it is hard to pursue the goal of "universal" block strings and still support XML syntax correctly. Some also mentioned that one of the interesting parts of the feature was some kind of IDE support for syntax highlighting and autocompletion, so making the feature too abstract would prevent this.

I still think that some kind of XML DSL syntactic support is - if not absolutely needed - at least interesting enough to have in Haxe. HTML is here to stay for a long time and the XML document model that comes with it can be used for many different other applications as well outside of web client code.

I also think we should not support a single particular DSL such as JSX, whose spec can evolve and change in unexpected ways, or be entirely deprecated by another alternative syntax, whatever the trending framework happens to be in the JS world.

So here's a revised syntax proposal that tries to ensure that most XML-based DSLs will be supported, while making only the strictly minimal assumptions about the DSL syntax.

An XML DSL node would be in the form:

<nodename CODE?>

Where CODE (optional) is explained below.

We would then try to match it with the corresponding closing XML DSL node in the form:

</nodename CODE?>

And we would allow self-closing nodes in the form:

<nodename CODE?/>

The CODE section can be anything, except the > character.

But this creates some invalid syntaxes, for instance the following, because of the comparison inside CODE section:

var x = <node value="${if( a < b ) 0 else 1 }"/>

So I propose that CODE additionally check for opening/closing curly braces {} and ignore their whole content. So for instance the following would be perfectly valid:

var x = <yaml {
   some yaml code with balanced {}
} />;

We could additionally treat \{ as escape sequence for the following case:

var x = <node value="\{"/>

I think this version of the syntax gives enough flexibility with minimal assumptions. Curly braces are used to ensure that reentrancy is fully supported, so any Haxe code within the DSL will be handled correctly.
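The scanning rule described in this post can be modelled outside the compiler. The following is a minimal Python sketch, not the Haxe lexer: `scan_tag_end` is a hypothetical helper, and treating `\}` as an escape alongside `\{` is an assumption (the proposal only mentions `\{`). It looks for the `>` that ends a tag while ignoring anything inside balanced curly braces.

```python
def scan_tag_end(src: str, start: int) -> tuple[int, bool]:
    """Return (index of the '>' that ends the tag, is_self_closing).

    `start` is the index just after '<nodename'. Hypothetical helper that
    models the rule proposed above; it is not the Haxe compiler's lexer.
    """
    depth = 0  # current {} nesting level inside CODE
    i = start
    while i < len(src):
        ch = src[i]
        if ch == "\\" and src[i + 1 : i + 2] in ("{", "}"):
            i += 2  # escaped brace (assumption: '\}' is handled like '\{')
            continue
        if ch == "{":
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
        elif ch == ">" and depth == 0:
            # Self-closing when the character right before '>' is '/'.
            return i, i > start and src[i - 1] == "/"
        i += 1
    raise ValueError("unterminated tag")


tag = '<node value="${if( a > b ) 0 else 1 }"/>'
print(scan_tag_end(tag, len("<node")))
```

With this rule a `>` inside `${...}` no longer ends the tag, and `value="\{"` does not leave the brace counter unbalanced. String literals are deliberately not tracked, which is the "minimal assumptions" trade-off discussed in the rest of the thread.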

@RealyUniqueName
Member

So, if I need some CODE in an opening tag, then I have to duplicate it in a closing tag?
Something like this?

var x = <div style="color:red"> <h1>Hello, world!</h1> </div style="color:red">;

@kLabz

kLabz commented Jun 5, 2019

So I propose that CODE additionally check for opening/closing curly braces {} and ignore their whole content. So for instance the following would be perfectly valid:
We could additionally treat \{ as escape sequence for the following case:

Would doing the same for double quotes too be a problem?

@Aurel300
Member

Aurel300 commented Jun 5, 2019

@RealyUniqueName

So, if I need some CODE in an opening tag, then I have to duplicate it in a closing tag?

I think the idea is that the optional CODE bit can be whatever in both tags, i.e. it can be different, or entirely omitted in the closing tag. Correct me if I'm wrong @ncannasse .

@ncannasse
Member Author

@Aurel300 correct
I have fixed the yaml example typo.

@markknol
Member

markknol commented Jul 18, 2019

I have been using coconut quite a lot lately for my project. I have to say I don't use XML syntax at all, but just function render() '<div>${content}</div>', and that actually works surprisingly well and has nice highlighting. I would almost say it's good enough.

So if I had a vote then I would propose an alternative (it's not something I invented but comes from @back2dos or @kevinresol), which is my backtick proposal, but with n=1 (n being the number of delimiting backticks). so that would allow things like this:

  • var x = `any dsl here`
  • var x = ``any dsl here`` valid too
  • var x = ````any dsl here```` valid too
  • var x = `any dsl here`` invalid, unbalanced backticks
  • var x = `welcome at "${name}"'s website` supports string interpolation, no quote escaping needed
  • var xml = ```<xml>test</xml>``` could be processed as xml, or by coconut, or by heaps or ..
  • var xml = `<img class="active"/>` support selfclosing tags
  • static var shader = @:hxsl `hxsl { my shader code }`
  • ` > <<< > >> >>'">>>` allow unbalanced braces/quotes etc
  • ```markdown # hello world! ``` could be processed as markdown.
  • `` <nested>`expr`</nested> `` where the nested expr can be processed separately too. That's where unlimited amount of delimiters comes in nicely.

This is fairly clean in my opinion, has room for many applications, and is less weird than the proposed syntax or the existing inline markup. Otherwise I'd like to keep the current inline markup and accept that it doesn't allow self-closing tags, which I think is a fair compromise too.

@ncannasse
Member Author

@markknol this proposal does not deal well with reentrancy: increasing the number of backticks at every level every time you need extra depth seems like very bad design.

@back2dos
Member

As noted multiple times on Slack, I think we're trying to cover two very loosely related problem domains via a single language feature. It was a pretty terrible idea from my side and I apologize for not having seen that up front.

There are two things that we're after (everyone to a varying degree):

  1. an XML-ish syntax for the purposes of declaring UI (or in fact directly embedding an XML-based DSL or even XML/HTML itself)
  2. block strings of sorts, that allow embedding arbitrary code into Haxe

XML-based UI markup DSL

The proper solution here is to have a well-defined grammar ... whatever the specifics. I think UI development is an important enough use case to warrant first class support. That comes with proper syntax highlighting, which presupposes proper syntax. It also requires proper auto-completion, which requires specific insertion points, which again presupposes proper syntax.

If what it takes to have this is to use domkit's markup as is, then that's still 100 times better than some wacky "universal" solution. Haxe-style comment support aside, it's practically a superset of JSX anyway (ignoring the fact that the code inside is Haxe and not JS).

I absolutely agree that we shouldn't tie ourselves to some spec controlled by an isolated group (as it would be in the case of JSX). We should come up with something properly designed. If it can lean on universally understood syntax, that's great. And if trivial changes allow our spec to cover a little more ground, they're worth considering. Example: allowing - and : and @ in attribute names would allow library authors to support XML namespaces, data/aria HTML attributes and Vue/Angular directives. Whether such concessions are worth the trouble needs to be decided on a case-by-case basis.

However, so far Nicolas has squarely rejected the notion of any well-defined syntax. At the same time, with the recent improvements in string interpolation and with a plugin for proper highlighting, single-quoted strings give a much better developer experience. I would thus propose to either:

  • agree that a well-defined syntax is a good thing
  • remove inline markup from Haxe 4, so that when we finally agree that doing better than Notepad is a worthwhile endeavor, we can add this back with a well-defined syntax without breaking tons of code

Block strings

I'm not sure I have a very strong opinion on this one, because I don't see overwhelmingly many use cases here. I see even less that would require injecting Haxe code and then reentering into an absolutely foreign language again.

For this reason, I'd go with what Mark mentioned, although I'm leaning towards not supporting interpolation out of the box. It's trivial to shove the string into MacroStringTools.formatString if so desired.

The main advantage is that then you can embed ANY string into Haxe code (e.g. PHP code, which has $ident all over the place) without having to modify it / do any kind of escaping. If something in the string collides with the delimiter, you just add more backticks to the delimiter. You do not have to change the original string.
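The "just add more backticks" rule can be made concrete with a small sketch. This is Python rather than Haxe, and `fence_for` and `wrap` are hypothetical names used only for illustration: the delimiter needs one more backtick than the longest run of backticks inside the embedded text. The sketch assumes the text does not itself begin or end with a backtick, since that would merge with the delimiter.

```python
import re


def fence_for(text: str) -> str:
    """Return the shortest backtick delimiter that cannot occur inside `text`."""
    # Longest consecutive run of backticks in the text (0 if there is none).
    longest = max((len(run) for run in re.findall(r"`+", text)), default=0)
    return "`" * (longest + 1)


def wrap(text: str) -> str:
    """Surround `text` with a delimiter, leaving the text itself unmodified."""
    fence = fence_for(text)
    return fence + text + fence


print(wrap("<?php echo $ident; ?>"))   # one backtick is enough
print(wrap("run `ls` in a shell"))     # needs two backticks
```

The embedded string is never escaped or rewritten; only the delimiter grows.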

Being able to just dump verbatim (non-escaped whatsoever) target language source code into Haxe is a nice use case (at least in my esteem). Same goes for text assets, scripts or whatever.

I'm eager to see a use case that requires all this reentrancy dance. And no, UI markup doesn't count, because as I've stated that deserves its own syntax ;)

@kevinresol

  • agree that a well-defined syntax is a good thing
  • remove inline markup from Haxe 4, so that when we finally agree that doing better than Notepad is a worthwhile endeavor, we can add this back with a well-defined syntax without breaking tons of code

Shit I have a lot of code using inline markup already. So I prefer the first option.

@kLabz

kLabz commented Jul 19, 2019

I have a lot too (and keep finding bugs, latest being HaxeFoundation/haxe#8565 which should still apply to this RFC) but I'd prefer one of those two solutions to this RFC.

@piotrpawelczyk

this proposal does not deal well with reentrancy

Shouldn't DSL processing macro code handle interpolation and deal with reentrancy? This feature is very context/domain specific, e.g. in JSX it makes a difference whether you use interpolation where element is expected or when it's just a string value in an element attribute, while with current string interpolation everything is plain and simple - whatever gets produced by the interpolated variable or piece of code will always end up getting Std.string-ed and reentrancy is only allowed in nested strings.

The implication is that DSL could define its own interpolation / reentrancy escaping mechanism.

@Aurel300
Member

@szczepanpp as was said in one of the previous inline markup discussions, re-entrancy is a parser problem, not an application problem – it cannot be solved with macros interpreting the code, since markup lexing must happen a long time before an AST even exists.

A specific case where syntax that has the same opening and ending tags is problematic:

var x = `here is some markup, and interpolation: ${someFunction(`more markup!`)}`;

The intention is clearly to parse this as:

var x = "here is some markup, and interpolation: " + someFunction("more markup!");

But to the parser it is:

var x = "here is some markup, and interpolation: ${someFunction(" more markup! ")}";

This triggers an "unexpected token/identifier more" error after the first string. Without default semantics for the DSL syntax, the parser cannot make assumptions about syntax like ${...}: is it part of the DSL? Should it be treated as Haxe code? Hence to nest this properly, you would need an increasing number of backticks for each layer, as @ncannasse said:

var x = ``here is some markup, and interpolation: ${someFunction(`more markup!`)}``;

I did propose parser-level macros before, but I think that would require huge changes to the compiler.
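The ambiguity can be reproduced with a small scanner. This is a Python sketch under stated assumptions, not how the Haxe parser works: `read_literal` is a hypothetical function that closes a literal at the next run of backticks with the same length as the opening run, and it knows nothing about `${...}`.

```python
def read_literal(src: str, start: int) -> tuple[str, int]:
    """Read a backtick literal opening at src[start].

    Returns (body, index just past the closing delimiter). The literal ends
    at the first backtick run whose length equals the opening run's length.
    """
    n = 0
    while start + n < len(src) and src[start + n] == "`":
        n += 1
    if n == 0:
        raise ValueError("no opening backtick at start")
    i = start + n
    while i < len(src):
        if src[i] != "`":
            i += 1
            continue
        j = i
        while j < len(src) and src[j] == "`":
            j += 1
        if j - i == n:  # same length as the opening run: literal ends here
            return src[start + n : i], j
        i = j  # a run of a different length is part of the body
    raise ValueError("unterminated literal")


nested = "`here is some markup, and interpolation: ${someFunction(`more markup!`)}`"
body, end = read_literal(nested, 0)
print(repr(body))          # stops right before "more markup!"
print(repr(nested[end:]))  # the rest is left over for the parser
```

With single backticks on both levels the outer literal ends at the inner opening backtick. Wrapping the same text in double backticks makes the scanner return the whole body, which is the "one more backtick per layer" cost mentioned above.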

@ncannasse
Member Author

ncannasse commented Jul 19, 2019

There are two different things here:

  • block strings: I'm not sure it's useful to have non-reentrant block strings, because you can already have them either in Strings (with interpolation) or in separate files (using macros to embed them in Haxe code if necessary)

  • xml dsls: as I said several times already, xml dsls are a good addition that can cover many cases (jsx, but not only). We "only" need to decide on a minimal subset of syntax. One (the original implementation) does not allow self-closing nodes. The current one (at the top of this thread) allows them, but introduces some semantics for curly braces {}, in the node attributes part only.

So let's please focus on the topic. We will not drop the feature from Haxe 4, and I doubt we will find something that everyone agrees on 100%, so please only comment with either alternate solutions in the specified domain (which is XML-based reentrant DSLs) or with issues that this proposal might have overlooked.

EDIT : Please consider that I will ignore any thumb down that does not come with an actual clearly expressed point of view. Language design is not a Facebook context.

@kevinresol

kevinresol commented Jul 19, 2019

With semantics (compiler is aware of attributes and Haxe code inside attributes):

var x = <div onclick={() -> trace("{")}/>

vs "minimal assumption":

var x = <div onclick={() -> trace("\{")}/>

or even:

var x = <div onclick={() -> trace("/>")}/>

I prefer the first one.

@kLabz

kLabz commented Jul 19, 2019

We "only" need to decide for a minimal subset of syntax

That's what @back2dos is proposing with a parser that can support a wide range of xml-ish possibilities, but with a proper AST.

@ncannasse
Member Author

@kevinresol it's not about "preferring". Of course I would also "prefer" not to have to escape unbalanced curly braces in strings within the reentrancy syntax.

But you should consider:
a) how rare this is actually going to happen
b) vs how much a "precise" syntax will constrain the potential usages

Designing a language is not about making things perfect for a given usage, but more about making sure that each additional feature gives a large number of possibilities to the developer. That's how I came up with macros in the first place (among other examples).

@kLabz yes, but my suggestion is to keep the syntax to a very strict minimum. This still allows for some syntax highlighting. Completion can be provided by macros, as we already do for some strings.

@kevinresol

kevinresol commented Jul 19, 2019

It is all about balance and compromises.

You always mentioned the design of the macro system was a "minimal" one.
But remember you chose to design it with Haxe semantics. With today's standard the macro system should have been designed to simply dump everything after the macro keyword as raw bytes to the macro API. (perhaps plus some end delimiters and some escaping mechanisms)

The example was to demonstrate that I prefer @back2dos's minimum to yours.

@kLabz

kLabz commented Jul 19, 2019

b) vs how much a "precise" syntax will constrain the potential usages

Do you have anything in mind? I genuinely cannot think of one.

@RealyUniqueName
Member

RealyUniqueName commented Jul 19, 2019

I still think backticks are our best option here. I still don't understand how our parser works, so I'm not sure if the following is possible, but...

What if we allow increasing amount of backticks on re-entrance?
Most of the time (actually, almost all the time, I believe) it will look like this:

var x = `<whatever anything="{" otherthing="/>" />`;

In some rare cases one would need a reentrancy and then it will look like this:

var x = `my-dsl ``sub-dsl`` awesome-job`;

I mean that if the parser spots a different number of consecutive backticks, that should be considered an "opening tag" for a nested markup literal.
In this case if you need to add more nested literals, you don't have to edit all the delimiters.

In an extremely rare case (honestly, I don't remember ever needing a third level of strings in string interpolation, for example):

var x =  `my-dsl ``<sub>```sub-sub dsl```</sub>`` awesome-job`;

or maybe even (if possible to implement)

var x =  `my-dsl ``<sub>`sub-sub dsl`</sub>`` awesome-job`;

Yes, that doesn't look pretty, but I don't believe that will happen more than once a year :)

And as mentioned in previous discussions it's up to DSL developer to handle any Haxe code injections in their DSL.
E.g. for this sample

var jsx = `<div onclick={() -> trace("{")}/>`;

the macro developer should manually detect the curly braces and pass their content to Context.parseString()

This proposal requires two escape sequences - \` and \\ (if immediately followed by backtick). But backtick is quite a rare character.

@PlopUser

Tree-like structures should rely on a proper syntax fully integrated into the common Haxe specification. Tree-like structures are not specific to XML; a processor should be used to target XML or another tree-like structure. Why choose an XML DSL syntax when you could target more types of tree-like structures? With a proper syntax, we are not stuck with XML's limited syntax but open to enhancements of the Haxe syntax itself.

Integrating XML syntax directly mixed with the common Haxe language puts the user in a Frankenstein paradigm: neither pure XML nor pure Haxe syntax.

The Ceylon language has a good example of a clear tree-structure syntax: see below an extract of the spec.
Ceylon is a strongly typed language, with variadic parameters (a value parameter that accepts multiple arguments), named argument lists, type union and intersection, inference...

Constructor calls use a syntax free of the new keyword.

1.3.6. Named arguments and tree-like structures

Ceylon's named argument lists provide an elegant means of initializing objects and collections. The goal of this facility is to replace the use of XML for expressing hierarchical structures such as documents, user interfaces, configuration and serialized data.

// with a bit of Haxe transcription and a small modification to the named-argument call: "({" -> "{"

var page:Html = Html {    // constructor call with named argument list and inference 
    doctype = html5;
    Head { title = "Haxe home page"; };    // constructor call with named argument list
    Body {              
        H2 ( 'Welcome to Haxe version : ${config.version}! ' ),   // regular constructor call
        P ( "Now get your code on :)" )
    };
}

print( page.toXML() );
print( page.toObject() );
print( page.myCustomFormat() );

ceylon structured data
ceylon variadic parameters

@djaonourside

@PlopUser you can already use a similar approach in Haxe, although it's a little bit redundant.

new Html({ 
	doctype: "html5",
	children: 
	[
		new Head({
			title: "Haxe home page"
		}),
		new Body({
			children:
			[
				new H2('Welcome to Haxe version : ${config.version}!'),
				new P("Now get your code on")
			]
		})
	]
});  

@fullofcaffeine

fullofcaffeine commented Aug 21, 2019

@djaonourside I don't like the over-usage of new there, it's too verbose and pollutes the code. Might be better to just use wrapper functions in a class that is added to the current context through using or model the data structure using recursive Enums.

EDIT: Here's a good example: https://github.com/ciscoheat/mithril-hx#implement-the-mithril-interface.

@longde123

<nodename CODE?> is a strictly minimal assumption about the DSL syntax.

@longde123

longde123 commented Sep 26, 2019

<nodename CODE?> raises two more questions:
1. CODE between the tags, as in <node> CODE? </node>, is unavailable,
   e.g. <node > sdfafs"${x }  </node>
2. Loops are unavailable, e.g. <node  {for x in  datas}>    <node value="${x }">  </node>

Possible choice for 1: don't support it.
Possible choice for 2: express the loop in the DSL itself, e.g. <node for =" x | $ { datas}"> <node value="${x }"> </node>

@farteryhr
Copy link

farteryhr commented Feb 10, 2020

some random idea (or keyboard rolling)..

``this {rules:"all", content-type:"nonsense"}``
    `outer {onclick:function(){}, field:expr, xml-attr:"fieldname", style:{ css-attr: SomeHolyCss(a,b,c) }}`
        text
        `inner`
        `/
        `self-closing {maybe:"some other way?"}/`
        `$variable
        `${[
            back_to_haxe,
            `nest` test $interp `/
        ]}
        ``${interpolation.from.the.rules.all.level}
    `/
``/

"longer surrounding goes outside" is inspired from lua's long bracket.

used to imagine some type of extended json that look like

`"tag"{"nattr":1,"sattr":"wow"}[1,2,3,`"nested"{}[]] //whatever leading symbol as the tag

for being somehow isomorphic to xml.

honestly not in favor of < and > that puts parser in danger. instead make good use of the only `.
but you're still able to embed xml inside it.

`xml`
<div style=`${{some-holy-css: SomeHolyCss(a,b,c)}}>`${someText.someText()}</div>
`/

you may regard `/ `$ `word `(start of DSL, seems it must be followed by a space) (of the largest count of backticks, the outermost level) as token?

alternative (forces braces, no space required, so self-closing (no gain on character count)):

`tag{}content`nest{}`/`/`tag{}`nest{}`/content`/

possibility of `// as inline comment?

@grepsuzette

In response to RealyUniqueName message from Jul 19, 2019:

I still think backticks is our best option here.
In some rare cases one would need a reentrancy and then it will look like this:
var x = `my-dsl ``sub-dsl`` awesome-job`;

I love the backticks solution but as Nicolas said, it may be well-suited for libs but not for Haxe itself.

Is there another way to enable reentrancy?
Actually I think there is:

var x = `my-dsl ${`sub-dsl a b c`} awesome-job`;
var y = `my-dsl ${`sub-dsl a ${`sub-sub-dsl x y z`} c`} awesome-job`;
  • Using this approach you can get as deep as you want.
  • It becomes quite lispy too (a recursive tree like (dsl arg1 arg2 ...)).

The downside? If you have ${ that you don't want to be compiled,
they need to be escaped:

var dirty_example_needing_escape = ```kask
    // "kask" is a competitor of "haxe", predating its syntax while adding nothing.
    // Here is an example:
    var s = 'Current time is \${Date.now().toString()}';
```;

In that example, if the escape is omitted, the resulting string will have the current date embedded. In the rare cases where this happens and you were not actually writing Haxe code, inspecting the '${' within the block string is a lot more straightforward than having to count the backquotes and check their balance, because the question becomes "is this supposed to be compiled?" rather than "where the hell does that end and begin?"
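How a lexer could follow this nesting without counting backticks can be sketched with two mutually recursive functions. This is Python and only an illustration under stated assumptions: `skip_literal` and `skip_code` are hypothetical names, only single-backtick delimiters are handled, `\$` is taken as the escape, and quoted strings inside the embedded code are not tracked.

```python
def skip_literal(src: str, i: int) -> int:
    """src[i] is an opening backtick; return the index just past its closing backtick."""
    i += 1
    while i < len(src):
        if src.startswith("\\$", i):
            i += 2  # escaped '$': the following '{' does not re-enter code
        elif src.startswith("${", i):
            i = skip_code(src, i + 2)  # re-enter code until the matching '}'
        elif src[i] == "`":
            return i + 1
        else:
            i += 1
    raise ValueError("unterminated literal")


def skip_code(src: str, i: int) -> int:
    """i is just past '${'; return the index just past the matching '}'."""
    depth = 0  # nesting of plain {} inside the embedded code
    while i < len(src):
        ch = src[i]
        if ch == "`":
            i = skip_literal(src, i)  # a nested literal starts here
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            if depth == 0:
                return i + 1
            depth -= 1
        i += 1
    raise ValueError("unterminated interpolation")


y = "`my-dsl ${`sub-dsl a ${`sub-sub-dsl x y z`} c`} awesome-job`"
print(skip_literal(y, 0) == len(y))
```

A backtick closes the current literal unless the scanner is inside `${...}`, where it opens a new one, so the delimiter stays the same at every depth.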

XML or Markdown can be used with it too, as in @markknol's example:

function render() `<Box><input type="text" value="don't click"></Box>`;
var markdown = ```markdown
# Hello universe

This is my **blog** I started in the stellar year *2020*.
```;

I don't think this prevents having the XML DSL syntax later, but as far as block quotes go, it seems both solid and simple to me.

@grepsuzette

Hmm, after reading more carefully, this might already have been answered by @Aurel300. Can you confirm?

@iongion

iongion commented Mar 7, 2020

Could this be a bit broader than just XML? I am always amazed by how beautifully LINQ is supported in C#:

class LINQQueryExpressions
{
    static void Main()
    {
        
        // Specify the data source.
        int[] scores = new int[] { 97, 92, 81, 60 };

        // Define the query expression.
        IEnumerable<int> scoreQuery =
            from score in scores
            where score > 80
            select score;

        // Execute the query.
        foreach (int i in scoreQuery)
        {
            Console.Write(i + " ");
        }            
    }
}

So if DSL support in Haxe is being given some thought, it would be nice to at least be aware of this too. Who knows, maybe an even more generic and awesome language feature would emerge.

We've recently migrated a large code base to TypeScript (and React), and while it has its advantages, it is limited to the web. We would change to Haxe in a blink if the React / JSX port were more idiomatic and React-ish. The current JSX support with macros or strings has too many gotchas for one's brain, although the effort is clearly impressive and amazing.

@tokiop

tokiop commented Jul 11, 2020

I am missing background on the current inline XML markup's limitations, domkit, JSX usage/needs, macros and the compiler's parsing constraints, so I might completely miss the point; please take this as a basic Haxe user's proof of interest!

The XQuery language allows mixing XML nodes and XQuery expressions between {}, which makes it straightforward to build arbitrary XML and use the language features for templating/logic:

let $htmlGaleriesList :=
<div id="myGaleries">
  <h2 curly-in-attribute="{"{"}">My Galeries</h2>
  {
    for $gallery in $galeries
      let $uuid := makeUuid($gallery["id"])
      return
      <div id="{$uuid}">
        <h3>{$gallery["name"]}</h3>
        <img class="hero" src="{$gallery["heroImage"]["src"]}"/>
        <ul>
        {
          for $img in $gallery["images"]
          return <li><img src="{$img["src"]}"/></li>
        }
        </ul>
      </div>
  }
  <button>Load more</button>
</div>
  • does this example illustrate a moderate "multiple levels of reentrancy" case?
  • is this the point of a reentrant general XML DSL?
  • would something similar be possible in Haxe with Nicolas' proposal?

Regarding var x = <yaml { some yaml code with balanced {} } />;, I understand it would be required for parsing self closing tags while keeping the compiler parsing the absolute minimum of xml's syntax. But it would allow and maybe encourage abusing invalid xml as a block string.

IMO xml's quality is more that it is a standard than its syntax. I fear that allowing xml-looking syntax to produce invalid xml, and building code and tools on this "feature", would make better xml support difficult in the future. That seems to be the case with domkit's src={t} unquoted attribute markup, for example.

If no spec-compatible subset of xml support is possible because of technical or engineering-hour limitations, no xml support seems better than an invalid/hackish one. The current inline-xml-markup without out-of-the-box compiler support is already confusing.

If real xml support is possible, it might be a robust and powerful tool, compatible with any DSL that is indeed xml, allowing <yaml>anything</yaml> and build upon/fix current inline-xml-markup situation.
