Issues with current syntax of values in singleton expressions #713

Closed
hanjoosten opened this issue Oct 21, 2017 · 9 comments

Comments

@hanjoosten
Member

hanjoosten commented Oct 21, 2017

The problem

When looking at the current syntax, I see some minor adaptions that have to be made.

When using the value "Mario's pizza" as an atom, this gives problems when we want to use this atom in an expression (as a singleton). Currently that should be denoted as 'Mario's pizza'. The parser obviously complains, because the apostrophe inside the value closes the single-quoted atom prematurely.

This is due to the fact that in an expression we have chosen to omit the double quotes that normally contain a string value, as in the population statement. If double quotes were obligatory even in singleton expressions, we would have to write '"Mario's pizza"'. This looks rather ugly.

Proposed solution (that is implemented as of Ampersand v3.9.0)

Atoms (i.e. objects, strings, dates, integers, etc.) are denoted using the same syntax both in the POPULATION statement and in an expression. Thus, string-like expressions (e.g. for objects and strings) are embedded in double quotes. Other REPRESENT TYPES use their own representations.

An expression like r ; "Mario's pizza" could be a valid expression, and so could s \/ 42.
This way of denoting atoms in expressions seems pretty natural.

Note that this implies that adl-files that use single-quoted atoms will no longer compile. The error that is generated is:

Lexer error at line xxx , file yyy
Unexpected character '''
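
To illustrate why the single-quoted form breaks while the double-quoted form does not, here is a minimal Python sketch of reading one atom value. It is not Ampersand's actual lexer; the function name `lex_atom` is hypothetical, the escape rule (a backslash escapes the next character) is an assumption, and the error text is only modelled on the message above.

```python
def lex_atom(source: str) -> str:
    """Read one double-quoted atom from the start of `source` and return its value.

    Hypothetical helper: inside the quotes, a backslash escapes the next character.
    """
    if not source or source[0] != '"':
        first = source[0] if source else "<end of input>"
        raise ValueError(f"Unexpected character '{first}'")
    chars = []
    i = 1
    while i < len(source):
        c = source[i]
        if c == "\\":
            # Escape: take the following character literally.
            if i + 1 >= len(source):
                raise ValueError("Unterminated escape")
            chars.append(source[i + 1])
            i += 2
        elif c == '"':
            # An unescaped double quote closes the atom.
            return "".join(chars)
        else:
            chars.append(c)
            i += 1
    raise ValueError("Unterminated string")
```

With this sketch, the input "Mario's pizza" (in double quotes) yields the value Mario's pizza, because the apostrophe is an ordinary character between double quotes. The input 'Mario's pizza' is rejected at its first character.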

Edge cases

There is an edge case in which the solution above does not work, because the atom's name could interfere with the syntax of expressions. An example is the minus sign, as in r; -33, where it could be interpreted as a relational operator (the complement of 33) or as part of an atom's name (referring to the negative number '-33').

To ensure that every value can be used in singleton expressions, values MAY be wrapped in curly brackets. For the above edge case, we would write r;-33 or r;-{33} for the complement. To use a negative number, we would need to write r;{-33}.

Should other edge cases exist (we cannot think of any currently), the curly brackets will save the day in those cases as well.
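
The disambiguation rule can be sketched as follows. This is a hypothetical Python illustration limited to integer atoms, not the Ampersand parser; the function name `parse_operand` and the returned tags are assumptions made for the example.

```python
import re


def parse_operand(text: str):
    """Classify the operand that follows `r;` when it is an integer atom.

    Returns ("atom", n) or ("complement", n). Hypothetical sketch only.
    """
    s = text.strip()
    braced = re.fullmatch(r"\{\s*(-?\d+)\s*\}", s)
    if braced:
        # Curly brackets delimit the atom, so a minus sign inside belongs to the value.
        return ("atom", int(braced.group(1)))
    if s.startswith("-"):
        # A minus sign outside curly brackets is read as the complement operator.
        kind, value = parse_operand(s[1:])
        if kind != "atom":
            raise ValueError("nested complements are not handled in this sketch")
        return ("complement", value)
    if re.fullmatch(r"\d+", s):
        return ("atom", int(s))
    raise ValueError(f"not a valid constant: {s}")
```

Under these assumptions, -33 and -{33} are both read as the complement of the atom 33, while {-33} is read as the atom -33.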

@hanjoosten hanjoosten changed the title Minor issues with current syntax of values in singleton expressions Issues with current syntax of values in singleton expressions Oct 21, 2017
@hanjoosten hanjoosten added bug Indicates an unexpected problem or unintended behavior Discussion labels Oct 21, 2017
@hanjoosten hanjoosten self-assigned this Oct 21, 2017
@stefjoosten
Contributor

stefjoosten commented Nov 1, 2017

The edge cases are not really edge cases. The problem remains that not all strings are allowed as atoms. Only now we have a different set of strings that isn't allowed. For example the atom:

-3{aapje}

Before this change, we could write '-3{aapje}' and now we cannot write this atom anymore.

Maybe we have to think a little harder, because swapping one set of unacceptable atoms for another set is an arbitrary decision.

@stefjoosten
Contributor

@hanjoosten I suggest we do not promote this change to the master until we have a better story to tell.
@sjcjoosten I'd like your opinion too.
I'll give my take on this in the following comment.

@stefjoosten
Contributor

stefjoosten commented Nov 1, 2017

Atoms

Currently we roughly have the following notations for atoms:

  • For numbers: the number formats that are SQL-compatible, e.g. 3, 3E-2, 3.24, etc.
  • For strings: SQL-compatible strings, which is anything between double quotes with double quotes and backslashes escaped with a backslash.
  • For dates: any ISO-compliant date notation as used in SQL, e.g. 12-04-1987.
  • For other atoms: anything between single quotes.

What would be wrong when we change the fourth alternative to:

  • For other atoms: anything between single quotes, with single quotes and backslashes escaped with a backslash.
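
As an illustration of this escaping rule, here is a minimal Python sketch. The names `quote_atom` and `unquote_atom` are hypothetical and do not come from the Ampersand code base; the sketch only assumes that a backslash escapes the character that follows it.

```python
def quote_atom(value: str) -> str:
    """Wrap `value` in single quotes, escaping backslashes and single quotes."""
    # Backslashes must be escaped first, so the escapes added for quotes stay intact.
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def unquote_atom(literal: str) -> str:
    """Inverse of quote_atom: recover the atom value from a single-quoted literal."""
    if len(literal) < 2 or literal[0] != "'" or literal[-1] != "'":
        raise ValueError("atom must be enclosed in single quotes")
    body = literal[1:-1]
    out = []
    i = 0
    while i < len(body):
        c = body[i]
        if c == "\\":
            if i + 1 >= len(body):
                raise ValueError("dangling backslash")
            out.append(body[i + 1])  # take the escaped character literally
            i += 2
        elif c == "'":
            raise ValueError("unescaped single quote inside atom")
        else:
            out.append(c)
            i += 1
    return "".join(out)
```

With such a rule, Mario's pizza would be written as 'Mario\'s pizza', and an atom like -3{aapje} needs no escaping at all.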

Consequences

Doing away with the single quotes gives problems for which we have to devise solutions. One solution is to reserve curly brackets, as Han suggests, at the cost of not being able to use curly brackets in an atom's text.
Replacing the single quotes by double quotes in cases where the atom is alphanumeric solves the problems. In all other cases, single quotes must not be used; optionally, they can be replaced by curly brackets, which in some edge cases is necessary to avoid ambiguity.

Maintaining single quotes, but escaping them as in strings, is simpler to explain and gives better compatibility with old code.

@hanjoosten
Member Author

@stefjoosten draws an incorrect conclusion in the above comment.
To use the atom "-3{aapje}" in an expression, one can simply write:

r;"-3{aapje}"[SomeConceptWithTypeAlphanumeric];s

It seems natural to include the double quotes in this syntax, because by doing so we know that anything in between these quotes is part of the string. Double quotes within the atom's value must be escaped.

@sjcjoosten
Contributor

Here are a couple of thoughts:

  • given a built-in type and a representation, we need to be able to decide whether the representation is valid, and how to convert it to the internal Haskell value. For instance, "aap\nnoot\nmies" could be a valid string, and would be converted to the Haskell value through 'read'. Given that 19850112 is an integer, it would be valid as well. However, 19850112 is not a valid string, and "aap\nnoot\nmies" is not a valid integer.
  • for parsing reasons, we must be able to distinguish between declared relations, operators and constants. One way to do this is to have everything that is not an operator be a relation or constant, where the difference between relations and constants is determined by whether it is declared or not.
  • for educational reasons, it would be good to have syntactic restrictions that allow us to distinguish the above without needing to refer to a list of operators or declarations. That way, errors could be "not a valid operator: $", "relation not declared: aap" and "not a valid constant: -3{aapje}". This would also simplify parsing.
  • we do not need to distinguish between constants per se; we could let the type system do that: if I write the expression born;19850112, then the type system could figure out that I intended 19850112 to be a date (it is an ISO-compliant date, as per Stef's suggestion).
  • we would sometimes need to disambiguate constants, in the usual way, so: 19850112[Date] or 19850112[Integer].
  • I also propose to have concepts with no valid representation, which could hold internal identifiers (for sessions, and for concepts that are fully determined by their properties).
  • We could still have things between single quotes as Stef proposes, but I suggest letting them stand for things that have no valid representation. That means that 'aap' = 'noot' might be true! In fact, I would suggest that you will even have to write V;'aap';V = V in order to insist that 'aap' is non-empty. The automatic rules (i.e. axioms) would be 'aap' |- I and 'aap';V;'aap' = 'aap'. If you don't want one of the latter two to hold, you should declare a relation for it.
  • Perhaps I should not have made that last point for the sake of this discussion. It might be best to only focus on representable constants (atoms you can identify with code).
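
The first and fifth points above (validating a representation against a built-in type, and disambiguating with a `[Type]` suffix) can be sketched as follows. This is a hypothetical Python illustration, not how Ampersand does it: the names `is_valid` and `resolve`, the three type names, and the restriction of dates to the compact YYYYMMDD form are all assumptions. Unlike the type system described above, the sketch does not use the surrounding expression to pick a type; it simply reports ambiguity.

```python
import re
from datetime import datetime

KNOWN_TYPES = ("Integer", "Date", "String")  # assumed subset of built-in types


def is_valid(representation: str, type_name: str) -> bool:
    """Decide whether `representation` is a valid constant of `type_name`."""
    if type_name == "Integer":
        return re.fullmatch(r"-?\d+", representation) is not None
    if type_name == "Date":
        # Assumption: only the compact ISO 8601 form YYYYMMDD is recognised here.
        if re.fullmatch(r"\d{8}", representation) is None:
            return False
        try:
            datetime.strptime(representation, "%Y%m%d")
            return True
        except ValueError:
            return False
    if type_name == "String":
        # Simplification: checks only the surrounding double quotes, not the escapes.
        return (len(representation) >= 2
                and representation[0] == '"'
                and representation[-1] == '"')
    raise ValueError(f"unknown type: {type_name}")


def resolve(constant: str):
    """Return (representation, type) for a constant, honouring a [Type] suffix."""
    annotated = re.fullmatch(r"(.+)\[(\w+)\]", constant)
    if annotated:
        representation, type_name = annotated.groups()
        if not is_valid(representation, type_name):
            raise ValueError(f"not a valid {type_name}: {representation}")
        return representation, type_name
    candidates = [t for t in KNOWN_TYPES if is_valid(constant, t)]
    if len(candidates) != 1:
        raise ValueError(f"cannot determine type of {constant}: {candidates}")
    return constant, candidates[0]
```

In this sketch 19850112 is valid both as an Integer and as a Date, so `resolve` rejects it as ambiguous unless it is written as 19850112[Date] or 19850112[Integer]. A constant such as -3{aapje} matches none of the assumed types and is rejected outright.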

@hanjoosten
Member Author

hanjoosten commented Nov 1, 2017

Except for the last two points made by @sjcjoosten, all of this is nearly finished in the branch I am currently working on. I expect it to be finished sometime next week.

@sjcjoosten
Contributor

sjcjoosten commented Nov 1, 2017

That's great to hear! Then I can reassure Stef that you don't really need the last two points. Creating a declaration for aap and noot, and adding the rules I mentioned would already do the trick.

@stefjoosten
Contributor

I'm quite reassured. I have no reason for sticking to single quoted atoms as an expression. This is how I understand the result of this discussion:

  • The parser produces only atoms with a unique technical type: e.g. "this is a string", 28745, 12-02-2001. Let us call these "representable atoms".
  • Han has done away with the single-quote notation 'aap', which is an expression that stands for the (anonymous) relation with one pair ("aap","aap"). That is an important step to solving the Quote issue. Besides, it clears the way for a possible future extension towards a user-defined non-representable atom, as @sjcjoosten suggests.

I'll try to explain it in the user documentation. Please keep an eye on that, because this type of detail can confuse novices.

@RieksJ
Contributor

RieksJ commented Dec 4, 2017

@hanjoosten: can you verify that the first entry in this issue (which I have edited), is correct?
