Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Corner Cases #9

Closed
marler8997 opened this issue Aug 27, 2014 · 21 comments
Closed

Corner Cases #9

marler8997 opened this issue Aug 27, 2014 · 21 comments

Comments

@marler8997
Copy link

After writing my SDL parser I took some of the corner cases I was testing and tested your parser with them. You can take the text here and uncomment the lines case by case to see what the parser does.

//
// Case 1: If brace appears on it's own line, the parser silently ignores
//         everthing inside the curly-braces and does not throw an error
//
//parent
//{
//  child1
//  child2
//}

//
// Case 2: Multiline-comment makes braces after newlines acceptable
//
//tag/*
//  Yay, now I can put my brace on it's own line!
//*/ {
//  child
//}

//
// Case 3: Parser does not accept an attribute as the first value of an anonymous tag
//
//attribute="value"

//
// Case 4: This is not suppose to be valid SDL (I asked Daniel about this one)
//
//tag"hello"


//
// Case 5: Both of these are valid SDL but the parser doesn't like them
//        (I also had to ask Daniel about this one)
//        They both create anonymous tags.
//        NOTE: this would only be valid for tag names, not attributes
//
//namespace: "value"
//namespace:"value"

//
// Case 6: semicolon ends tag even after open brace
//
// tag {;} // Accepted by parser
// tag {}  // Not accepted by parser

//
// Case 7: semicolon after open-brace is ignored
//
//tag {;
//child
//}


//
// Case 8: Parser silently ignores all of these and does not throw an error.
//         Currently this is invalid SDL, but it may become valid in SDLv2 as
//         a way of escaping keywords as tag-names (and possibly attribute names).
//
//:my-tag
//:another-tag :still-doesnt-work
//:yet-another;:and-another
//:null :0
//
@s-ludwig
Copy link
Contributor

Case 6 looks awkward, did you check this with Daniel? Either stray semicolons should always be ignored or never...

Case 8: You wrote "currently this is valid SDL, but may become valid in SDLv2" do you mean that it should be invalid for SDL1?

@marler8997
Copy link
Author

Case 6: I didn't check with Daniel on this one yet. I'm going through his tests to see if he covers this case. This may or may not be valid tag { ... } may not be valid if there is no newline between the braces.

Case 8: yes, I meant "currently this is invalid SDL". I fixed my typo.

@s-ludwig which of these issues should we require get fixed if we were to use this parser for DUB? If it were up to me I would probably want at least cases 1, 2, and 3 fixed.

Here's the consequences for using the parser without fixing the other cases:

Case 4

Causes the parser to accept invalid SDL. Not a huge deal since no one will likely omit the space between the tag name and the value:

dependency"vibe-d"

However, if the parser were to be fixed later then it would break anyone who accidently didn't put a space here.

Case 5 and 8

Not huge deals because DUB isn't using the namespaces (correct?). Hopefully no one is using colons at all as control characters in their SDL.

Case 6 and 7

I'm not sure how Daniel feels about including the entire curly-brace section on one line so I'm really not sure what the right way to fix this is. Either way, this one is not a huge deal since no one's dub package file should have anything like this.

@marler8997
Copy link
Author

I just tested Daniel's java parser and his has some of the same issues. What he has been telling me is valid and invalid SDL isn't always the same as what his parser accepts and rejects. This is disappointing but it's not too surprising since SDL doesn't have a grammar. His parser has the same problems in the following cases:

Case 2

Case 3

Case 4

Case 5

Case 6

Case 7

I'm not sure what to do here. I think before we move forward on SDL we really need to get a grammar for the language.

@s-ludwig
Copy link
Contributor

I don't see most of those as a blocker, although it is true that any accepts-invalid bug may lead to breakage later on. IMO, case 1 is the only one that is critical. Case 3, just to make sure that I don't misunderstand, is not supposed to be valid SDL, right?

The grammar should be a priority, but it shouldn't block DUB from using it. There will be examples, so that it is very unlikely for people to start hitting most corner cases that have been mentioned so far (the brace-after-newline case seems to be the only one that is somewhat likely). And before the next version is out and before there is a considerable number of packages using SDL, it will take some months, so there is plenty of time to work on the SDL spec in the meantime.

@Abscissa
Copy link
Collaborator

case 1 likely wouldn't be an issue for dub unless dub's schema actually uses anonymous tags with children (unlikely, I assume), or silently ignores any unsupported tags (probably not the best thing anyway).

@marler8997
Copy link
Author

Actually I believe case 3 is suppose to be valid SDL, but you are right that this isn't an issue for DUB since DUB doesn't use anonymous tags. However I would definitely push that we fix Case 1 before using it in DUB. A curly brace on it's own line actually isn't an anonymous tag, it's a syntax error. And even if it was an anonymous tag, this parser doesn't treat it as one, it treats it as a comment (completely ignores it). This is even more of a problem because the SDL spec does not say that curly-braces can't be on their own line (I had to ask Daniel that one). So when a user does make the mistake of doing this they will get obfuscated errors about tags missing their child tags.

subPackage
{
    name "my-name"
}
// Results in DUB error: subPackage missing required 'name' tag
// Should say something like: curly-braces can't be on their own line, move
// the open-brace for the subPackage tag to the same line

@s-ludwig
Copy link
Contributor

Regarding case 3, quoting the language guide:

Note: Anonymous tags must have at least one value
Anonymous tags must have one or more values. They cannot contain only attributes. This design decision was taken to avoid confusion between a tag with a single value and an anonymous tag containing only one attribute.

# Not allowed: An anonymous tag with a single attribute (and no values)...
size=5

# ...because it could easily be confused with a tag having a single value
size 5

@marler8997
Copy link
Author

Ah thanks I missed that. Case 3 is good then.

@marler8997
Copy link
Author

So, I never got an answer from you @s-ludwig . Should we wait for Case 1 to be resolved before we consider using this parser?

@s-ludwig
Copy link
Contributor

As long as it gets fixed before the first pre-release there shouldn't be an issue. If the parser is constructed anything like what I'd imagine, it should also be a trivial fix.

@marler8997
Copy link
Author

Before the first pre-release of DUB?

@s-ludwig
Copy link
Contributor

Exactly, the first (alpha/beta) release of 0.9.23 or 1.0.0, depending on what we decide to be the next version.

@marler8997
Copy link
Author

Oh ok, @Abscissa and @s-ludwig . Is fixing case 1 something that can be done before the next release of DUB?

@Abscissa
Copy link
Collaborator

Abscissa commented Sep 2, 2014

Thanks for the heads-up.

Cases 1 and 8 are legitimate bugs in SDLang-D because they differ from the Java implementation's behavior. So those two should be fixed. I've created separate issues for them: #12 and #13 respectively.

To ensure maximum compatibility, SDLang-D should (whenever reasonably possible) follow the original Java lib's actual behavior. If the Java lib's behavior changes (for example, to fix unintended incorrect behavior), then and only then should SDLang-D follow suit.

If/When Daniel publishes a formal "official" grammar SDL, then I may reevaluate this policy.

Aside from cases 1 and 8 (which, as mentioned above, I've now created separate issues for), the rest of these cases behave identically in both SDLang-D and Daniel's original Java lib. Therefore I'm closing this ticket.

However, if an updated version of the Java lib is released which changes the behavior for any of these eight cases (or anything else), then please file a new ticket for each case affected.

@Abscissa Abscissa closed this as completed Sep 2, 2014
@marler8997
Copy link
Author

@Abscissa and @s-ludwig . I've contacted Daniel and wanted to confirm that we are OK with the following:

  • Any amount of whitespace between tag namespaces and attribute values are OK.
html     :     img       html     src     :    "my-image.jpg" // VALID SDL
  • Only whitespace (or a comment) may come between the open-brace of a tag and the newline. Also, close braces must be on their own line
tag {
    first-child // VALID
}

tag { first-child //...INVALID
}

tag {
    first-child } // INVALID

tag { first-child } // INVALID

If you guys are OK with these rules in SDL, I'm going to try to get a grammar approved from Daniel. If not, then maybe we can make an "unofficial" SDL that uses a different grammar.

@Abscissa
Copy link
Collaborator

Abscissa commented Sep 3, 2014

html : img html src : "my-image.jpg" // VALID SDL

I think you meant:

html : img html src = "my-image.jpg" // VALID SDL

(Ie, second colon should have been an equals sign.)

Yea, I'm fine all of that, including the curly brace stuff. Heck, that's how everything already works anyway.

Besides, permitting the extra options for the placement of curly braces will just make SDL seem more freeform than it really is, so people would be that much more confused when "opening brace on its own line" doesn't work.

And it's better than Python-style "enforcement" of formatting, because nothing is silently accepted with the wrong behavior. Either it works as expected or it's a loud error.

So I'm good with all that.

@marler8997
Copy link
Author

Ok.

@Abscissa and @s-ludwig : I've created a potential grammar here at http://marler.info/sdl.xhtml. Let me know if you have a chance to look at it and if you have any corrections or can think of any cases that make this grammar incorrect. After I confirm the grammar seems to be correct, I'll try to get Daniel to verify this is what he has in mind.

Note: I didn't include any information on what an SDL literal is, just assume it's a normal SDL literal.

@Abscissa
Copy link
Collaborator

Abscissa commented Sep 3, 2014

I've created a potential grammar here at http://marler.info/sdl.xhtml.

Cool. Things I've noticed so far:

  1. That doesn't appear to take into account semicolon separators. Remember, semicolon can be used as an alternative to newline: tag1; tag2; tag3. Or is NEWLINE implicitly considered to include semicolons?
  2. I need to double-check how both parsers handle consecutive semicolons. Not sure whether that is/should be permitted just like consecutive newlines: tag1; ; tag 2
  3. I wonder if maybe the tag grammar should be split into separate alternate forms, so that we can encode the "anon tags must have at least one value" rule in the grammar itself.

@marler8997
Copy link
Author

That doesn't appear to take into account semicolon separators. Remember, semicolon can be used as an alternative to newline: tag1; tag2; tag3. Or is NEWLINE implicitly considered to include semicolons?

Ah yes thanks for catching that. I've updated the Grammar to handle semi-colons. I also fixed an ambiguity in the tag production rule by requiring a tag be ended in one of 4 ways (NEWLINE, EOF, CURLY_BRACE_SECTION, SEMI_COLON).

I need to double-check how both parsers handle consecutive semicolons. Not sure whether that is/should be permitted just like consecutive newlines: tag1; ; tag 2

The java parser completely dies when you do this. (ArrayIndexOutOfBoundsException). Since the grammar and the Java parser already disallow this I think it's good to keep this as invalid SDL.

I wonder if maybe the tag grammar should be split into separate alternate forms, so that we can encode the "anon tags must have at least one value" rule in the grammar itself.

Actually the grammar already requires that each tag have at least a name or at least one value. Notice the first branch in the tag production rule requires you either have a name or at least one literal.

Thanks for your help. I'll keep thinking about this let me know if you find anything else.

@Abscissa
Copy link
Collaborator

@marler8997: Do you remember whether you spoke to Daniel abut Case 8? Do you know whether it was his intention to allow it at some point?

@marler8997
Copy link
Author

I just looked through the cases and my emails to try to familiarize myself again. It looks like when I asked him about it he just said "Not valid".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants