As a user, I'd like pytrnsys to parse the attached ddck file correctly and to replace :unitAssigned correctly.
Background
TRNSYS' deck grammar is context-sensitive. The LABELS "directive" is a prime example of that. The following is correct "TRNSYS syntax":
UNIT 1 TYPE 2
INPUTS 1
0 0
LABELS 6
HELLO
EQUATIONS 1
x = 7
resulting in the six labels HELLO, EQUATIONS, 1, x, = and 7.
Compare this to
UNIT 1 TYPE 2
INPUTS 1
0 0
LABELS 1
HELLO
EQUATIONS 1
x = 7
which would result in the single label HELLO and the equation x = 7. However, it's not possible to write an unambiguous context-free grammar that parses both of the above versions correctly, because the correct parse depends on context: the number of labels expected (1 or 6, here).
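To make the dependence on the count concrete, here is a minimal Python sketch of a count-driven reader. It is not pytrnsys code: the names `read_labels` and `split_after_labels` are made up for illustration, and splitting on whitespace is a simplifying assumption (real decks may need a proper tokenizer, e.g. for quoted labels).

```python
def read_labels(tokens, start):
    """Read 'LABELS n' at tokens[start], then exactly n label tokens.

    Returns the labels and the index of the first token after them.
    """
    if tokens[start] != "LABELS":
        raise ValueError("expected LABELS at position %d" % start)
    count = int(tokens[start + 1])  # the context: how many labels follow
    first = start + 2
    end = first + count
    if end > len(tokens):
        raise ValueError("deck ends before %d labels were read" % count)
    return tokens[first:end], end


def split_after_labels(deck):
    """Return (labels, remaining tokens) for a deck with one LABELS block."""
    tokens = deck.split()  # assumption: whitespace-separated tokens
    labels, end = read_labels(tokens, tokens.index("LABELS"))
    return labels, tokens[end:]


HEADER = "UNIT 1 TYPE 2\nINPUTS 1\n0 0\n"
TAIL = "HELLO\nEQUATIONS 1\nx = 7\n"

labels_six, rest_six = split_after_labels(HEADER + "LABELS 6\n" + TAIL)
labels_one, rest_one = split_after_labels(HEADER + "LABELS 1\n" + TAIL)

print(labels_six, rest_six)  # six labels, nothing left over
print(labels_one, rest_one)  # one label, the EQUATIONS block is left over
```

The same trailing text ends up either entirely inside the labels or as a separate EQUATIONS block, and the only thing that decides between the two is the integer after LABELS.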
Most grammar compilers out there are for context-free grammars - the library we're using for parsing ddcks, lark, definitely is - and most programming languages have a context-free grammar. More precisely, lark can accept ambiguous grammars: it will then try to generate all possible parses of the input and finally select the most "probable" one as the parse it returns to the user. I don't know how it deems one parse more "probable" than another. We should think of this as an implementation detail.
That's what makes this bug very insidious. Just by changing the version of the lark package we depend upon, the user can get different results. Also rearranging stuff in the ddck file can lead to different results - moving the UNIT with LABELS to the very end of the file might save the day for the example attached. So things are very brittle - and slow, on top of that. Being slow is also a result of lark having to generate all possible parses of the input.
Possible remedies
1. Write your own context-sensitive parser (started here but very early days)
2. Extend the lark grammar to not allow "keywords" like EQUATIONS for LABELS, etc.

Option 1 would have the advantage of accuracy and speed, and the disadvantage of requiring quite some work.
Option 2 would be less work, but would still potentially not be very fast and may not ultimately be the most flexible way.
Only option 1 would be able to ensure the right number of equations, labels, inputs etc. at the parse stage. This is not possible with option 2.
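As a rough illustration of that last point, the following Python sketch mimics the effect of option 2 by hand. It is not lark code and not the actual pytrnsys grammar: `KEYWORDS` is a hypothetical, partial keyword set and `read_labels_until_keyword` is a made-up helper that reads labels until it hits a keyword, ignoring the declared count.

```python
# Hypothetical, partial keyword set; a real grammar would list many more.
KEYWORDS = {"UNIT", "TYPE", "INPUTS", "LABELS", "EQUATIONS"}


def read_labels_until_keyword(tokens, start):
    """Read 'LABELS n', then consume tokens as labels until a keyword appears.

    The declared count n is returned but NOT used to drive the parse.
    """
    if tokens[start] != "LABELS":
        raise ValueError("expected LABELS at position %d" % start)
    declared = int(tokens[start + 1])
    pos = start + 2
    labels = []
    while pos < len(tokens) and tokens[pos] not in KEYWORDS:
        labels.append(tokens[pos])
        pos += 1
    return declared, labels, pos


deck = "UNIT 1 TYPE 2\nINPUTS 1\n0 0\nLABELS 6\nHELLO\nEQUATIONS 1\nx = 7\n"
tokens = deck.split()  # assumption: whitespace-separated tokens
declared, labels, pos = read_labels_until_keyword(tokens, tokens.index("LABELS"))

# The reader stops at EQUATIONS, so only one of the six declared labels is read.
print(declared, labels, tokens[pos])
if len(labels) != declared:
    print("count mismatch: only detectable in a separate check after parsing")
```

With keywords excluded, the `LABELS 6` deck from the background section is no longer read as six labels, and the mismatch between the declared and the actual count has to be caught in a separate validation step.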