Questions about syntax #52

Closed
fachammer opened this issue Sep 18, 2019 · 3 comments

Comments

@fachammer
Contributor

commented Sep 18, 2019

As I'm currently developing a tree-sitter grammar for Lux I've looked at the syntax file you committed and I have some questions:

  1. Is it valid to have consecutive commas inside a number or can there only be one comma between two numbers? So are the following valid naturals?

    • 1,,0 (= 10)
    • 123,,,,,456 (= 123456)
  2. Are there plans to support exponential notation for fracs (and potentially revs)? e.g.

    • +1.0e4 (= 10000)
    • +123.4e-5 (= 0.01234)
    • .1234e-1 (= 0.01234; for revs there would need to be some restrictions on the exponent so that the resulting number is still between 0 and 1)
      I'm not saying that it's particularly necessary, but it might be convenient sometimes.
  3. Can the sign for fractions be optional? E.g. are

  4. Regarding juxtaposition of expressions: My guess is that there are some rules about which expressions can be placed next to each other without whitespace as separator and which cannot. For example: There is no ambiguity in separating two texts next to each other

     "text-a""text-b" 
    

    I'm guessing this would just parse as two separate texts even though there is no whitespace between them. The same goes for all enclosed literals (forms, tuples, records).
    However, there is some ambiguity for example with tags:

     #tag#followed-by-tag
    

    Are these two separate tags or would this be a syntax error?
    And what about identifiers:

     a.bc.d
    

    My guess is that this would be parsed as 2 identifiers (a.bc and .d). The parser couldn't know that I actually meant that this should be the identifiers a.b and c.d because I didn't write a whitespace. However, the parser could state that the expression as a whole (not separated by whitespace) is not a valid identifier since it has more than one dot inside. In this case putting a whitespace at the right position could fix the ambiguity.
    So basically, my question is whether the parser gives a syntax error when there is a potentially erroneous situation like the one above with the identifiers and forces you to fix the ambiguity or if it just starts a new identifier when it sees a dot followed by a character even if there is no whitespace to separate it from the previous identifier.

@fachammer

Contributor Author

commented Sep 27, 2019

@eduardoejp Do you have any input on this?

Also some additional questions came up for me:

  • What does the dot inside an identifier represent?
  • What does it mean when an identifier has two dots in front?
  • How would an expression like identifier-1..identifier-2 be parsed?
@eduardoejp

Member

commented Sep 28, 2019

Hello, @fachammer.

Sorry for the delay. I've been super busy this week.

Is it valid to have consecutive commas inside a number or can there only be one comma between two numbers?

You can have as many consecutive commas as you want.

The commas are there exclusively for human convenience, so the parser just strips out any commas that it finds.
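As a rough illustration of the comma rule, here is a minimal Python sketch. The function name `parse_nat` and the pattern `NAT` are hypothetical, and the assumption that a natural must start with a digit is mine, not something stated in this thread.

```python
import re

# Assumption: a natural starts with a digit, followed by any mix of digits and commas.
NAT = re.compile(r"[0-9][0-9,]*")

def parse_nat(literal: str) -> int:
    """Parse a natural-number literal, discarding every comma it contains."""
    if NAT.fullmatch(literal) is None:
        raise ValueError(f"not a natural literal: {literal!r}")
    # Commas carry no meaning, so they are stripped before conversion.
    return int(literal.replace(",", ""))

print(parse_nat("1,,0"))         # 10
print(parse_nat("123,,,,,456"))  # 123456
```

Stripping the commas after matching keeps the grammar rule itself simple: the token pattern only has to allow commas, not count them.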

Are there plans to support exponential notation for fracs (and potentially revs)? e.g.

Fracs: You can use exponential notation. I forgot to add that to the document.

Revs: You cannot use exponential notation. There are no plans to do so either.

Fortunately, there is less need for exponentials with revs than with fracs, because of the range of numbers they cover.

Can the sign for fractions be optional?

Nope.

The v0.5 compiler treats the sign as optional, but I have since made it mandatory for v0.6 onward.

The code you point to hasn't been worked on in a while and isn't currently being used.

It was written prior to making the sign mandatory.

Your parser should always assume either a positive (+) or a negative (-) sign, no matter what.
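The answers about fracs and revs can be sketched as two patterns. This is only an approximation in Python: the exact exponent syntax (an `e` followed by an optionally signed integer) is inferred from the examples in the question, the rev shape (a leading dot followed by digits) is likewise assumed, and commas are left out for brevity. `FRAC`, `REV` and `classify` are hypothetical names.

```python
import re

# Assumed shapes, based only on the examples in this thread:
#   frac: mandatory sign, digits, ".", digits, optional exponent
#   rev:  ".", digits, and never an exponent
FRAC = re.compile(r"[+-][0-9]+\.[0-9]+(?:[eE][+-]?[0-9]+)?")
REV = re.compile(r"\.[0-9]+")

def classify(literal: str) -> str:
    """Return "frac" or "rev", or raise if the literal is neither."""
    if FRAC.fullmatch(literal):
        return "frac"
    if REV.fullmatch(literal):
        return "rev"
    raise ValueError(f"not a frac or rev literal: {literal!r}")

print(classify("+1.0e4"))     # frac
print(classify("+123.4e-5"))  # frac
print(classify(".1234"))      # rev
```

Under these assumptions, `1.0` is rejected because the sign is missing, and `.1234e-1` is rejected because revs take no exponent.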

Are these two separate tags or would this be a syntax error?

Two separate tags.

My guess is that this would be parsed as 2 identifiers (a.bc and .d)

Your intuition is correct.


In general, using spaces is a good idea to keep code legible, but Lux doesn't assign any meaning to whitespace.

As far as Lux is concerned, whitespace is allowed purely for programmer convenience.

The only situation in which whitespace is necessary is to separate tokens that, if written right next to one another, could be mistaken for a single token.

For example:

  • 123 456 vs 123456.
  • foo bar vs foobar.

However, in any situation where 2 tokens can be written right next to one another without ambiguity, no whitespace is necessary.
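A toy tokenizer makes the whitespace rule concrete. This is a Python sketch under simplifying assumptions: the character sets for names and tags are guesses (lowercase letters, digits and hyphens), only four token kinds are covered, and `TOKEN` and `tokenize` are hypothetical names.

```python
import re

# Simplified token shapes; the allowed characters are assumptions.
TOKEN = re.compile(r'''
    (?P<text>"[^"]*")             # text literal enclosed in double quotes
  | (?P<tag>\#[a-z][a-z0-9-]*)    # tag, starting with "#"
  | (?P<nat>[0-9][0-9,]*)         # natural, commas allowed
  | (?P<name>[a-z][a-z0-9-]*)     # plain name
  | (?P<space>\s+)                # whitespace, skipped below
''', re.VERBOSE)

def tokenize(source):
    """Split source into (kind, lexeme) pairs, dropping whitespace."""
    tokens, pos = [], 0
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        if match.lastgroup != "space":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

print(tokenize('"text-a""text-b"'))      # two text tokens
print(tokenize("#tag#followed-by-tag"))  # two tag tokens
print(tokenize("123 456"))               # two nat tokens
print(tokenize("123456"))                # one nat token
```

Whitespace only matters in the last two cases, where removing it would merge two tokens of the same kind into one.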


What does the dot inside an identifier represent?

Both identifiers and tags are made up of 2 elements: a text that identifies a module, and another text that identifies the identifier/tag within that module.

This is due to the fact that identifiers are used primarily to give names to definitions, which live inside of modules, and tags are also defined in the context of modules.

The dot (.) separates the name of the module from the local name of a definition or a tag.

This is done in this way: module-name.local-name.

What does it mean when an identifier has two dots in front?

The 2 dots (..) are a shorthand for "this module".

For example:

Let's say that I'm in module foo and I define function bar.

Later on, I want to use function bar in the definition of something else inside my module.
But let's say that bar is a common name in my library, and there are other bar functions in other modules I import.

So long as I didn't locally import any of the other bar functions into my foo module, there is no ambiguity if I just refer to bar (the compiler just assumes you're referring to foo.bar and nothing else).
But, if I want to avoid any potential confusion with a reader of my code, I might want to write foo.bar instead, so there is no potential ambiguity for the human reader (emphasis on human, since there would be no ambiguity for the compiler anyway).
Alternatively, I could just write ..bar, and the compiler would just translate that into foo.bar, since I'm already in the foo module.


There is also a single-dot syntax for identifiers/tags that looks like this: .bar.

The single-dot is just a shorthand for referring to the lux module. So .bar is the same as lux.bar.

This is simply because the lux module is used throughout the language, and having that shorthand can save a few keystrokes.
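The three spellings described above (module-name.local-name, ..local-name and .local-name) can be summarized in a small Python sketch. The function `resolve` is hypothetical, and it deliberately ignores imports: an unqualified name is simply assumed to belong to the current module, which is only the no-conflicting-import case from the example.

```python
from typing import Tuple

def resolve(name: str, current_module: str) -> Tuple[str, str]:
    """Expand an identifier or tag name into a (module, local-name) pair."""
    if name.startswith(".."):
        # ".." is shorthand for "this module".
        return (current_module, name[2:])
    if name.startswith("."):
        # A single leading dot is shorthand for the lux module.
        return ("lux", name[1:])
    if "." in name:
        # Fully qualified: module-name.local-name.
        module, local = name.split(".", 1)
        return (module, local)
    # Unqualified: assumed here to live in the current module (imports ignored).
    return (current_module, name)

print(resolve("..bar", "foo"))    # ('foo', 'bar')
print(resolve(".bar", "foo"))     # ('lux', 'bar')
print(resolve("foo.bar", "baz"))  # ('foo', 'bar')
```

Checking for the two-dot prefix before the one-dot prefix matters, since every name that starts with `..` also starts with `.`.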


How would an expression like identifier-1..identifier-2 be parsed?

It would fail.

The parser would read identifier-1. Then it would read the dot (.) and assume that identifier-1 was the module part of an identifier.

Then it would read the second dot (.), which is an invalid character for a segment of an identifier, and you'd be shown a syntax error.
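The reading order described here can be approximated with a small hand-written scanner. This Python sketch is not the actual Lux parser: `SEGMENT`, `read_identifier` and `read_all` are hypothetical names, and the set of characters allowed in a segment is assumed.

```python
import re

# Assumed segment shape: a lowercase letter, then letters, digits or hyphens.
SEGMENT = re.compile(r"[a-z][a-z0-9-]*")

def read_identifier(source, pos=0):
    """Read one identifier starting at pos; return (identifier, next_pos)."""
    start = pos
    # Optional shorthand prefix: ".." (this module) or "." (the lux module).
    if source.startswith("..", pos):
        pos += 2
    elif source.startswith(".", pos):
        pos += 1
    match = SEGMENT.match(source, pos)
    if match is None:
        raise SyntaxError(f"expected a name at offset {pos}")
    pos = match.end()
    # Without a prefix, a following dot means the first segment was the
    # module part, so a local name must come right after the dot.
    if match.start() == start and source.startswith(".", pos):
        match = SEGMENT.match(source, pos + 1)
        if match is None:
            raise SyntaxError(f"invalid character after '.' at offset {pos + 1}")
        pos = match.end()
    return source[start:pos], pos

def read_all(source):
    """Read identifiers back to back until the input is exhausted."""
    identifiers, pos = [], 0
    while pos < len(source):
        identifier, pos = read_identifier(source, pos)
        identifiers.append(identifier)
    return identifiers

print(read_all("a.bc.d"))  # ['a.bc', '.d']
```

With the same rules, `read_all("identifier-1..identifier-2")` raises a `SyntaxError`, because the character after the first dot is another dot rather than the start of a local name.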

@fachammer

Contributor Author

commented Sep 29, 2019

Alright, thanks a lot for the detailed answer!

@fachammer fachammer closed this Sep 29, 2019