Branch: master
Commits on Apr 16, 2019
  1. Bare function calls are statements...

    Jeffrey Goff committed Apr 16, 2019
    Terms like 'go' and 'nl' are function calls, and much like Perl 5/6,
    they don't need parentheses. So let's go through and replace all of
    the function calls with an actual <statement> rule.
    
    The idea is that in a few revisions, most of the function bodies
    will be <statement>+. We'll compress those to <function-declaration>
    which has a little glitch in it at the moment.
  2. Refactored so that I can keep import statements separate.

    Jeffrey Goff committed Apr 16, 2019
    Now that I know that 'import' and 'module' can only occur at the
    start of the module, I've refactored things so that 'import foo.'
    is in the main TOP block, so it won't participate in the body
    of the program.
    
    This is also why I renamed 'import-statement' to 'import-declaration'.
    I'm more used to Perls 5 and 6 where 'use' statements can occur most
    anywhere in programs. This distinction is important in Perl 6 where
    your import statements can affect the syntax after they've been
    included, but in picat apparently that's really not the case.
    
    I've also named the "child" rule 'program-block' because I might
    get around to creating an official <program> block at some point.
    In BNF "program" (or by convention the last rule) is the root of
    the grammar. In Perl 6, TOP takes that role on.
    
    It's also easier to do this in order so I don't have to flip to the
    bottom of the file to see what the root declaration is.
    
    Next time we'll get back to the (sort of abandoned)
    <function-declaration> thing, and actually make it work.
  3. Another woopsie found and fixed...

    Jeffrey Goff committed Apr 16, 2019
    Tiny little change - I forgot that I'm testing this grammar as the
    *entire* file, so I restored the ^ at the top of the blocks.
    
    In the next step I think I'm going to do a little more
    infrastructure work and build out a proper <program> declaration
    block. This will also move the bulk of what we're working on
    here down one level.
    
    It'll be a big change, but what's going to happen is that TOP will
    no longer contain the entire file worth of matching, instead it'll
    look roughly like this:
    
    rule TOP
      {
      ^ <program> $
      }
    
    rule program
      {
      <comment>* <import-declaration>?
    
        [
        ... first file's worth...
        ||
        --- second file's worth...
        ]
      }
    
    Yes, this does look much like the official grammar, and that's
    intentional. Naming is hard, and if I have an established guide
    I'll follow that. And what's a better guide than the official
    reference?
    
    I've not followed it until this point simply because I can, and
    I can show that you can do a lot on your own without a guide.
  4. Fix up imports

    Jeffrey Goff committed Apr 16, 2019
    This comes in two pieces. The first and most obvious change is that
    'import foo' is no longer an official statement. If it were, that would
    mean that "import foo" could occur anywhere in the file, and I now know
    that's not legal. So out it goes, and I have to restore a bit of the
    earlier grammar.
    
    I've also made a slightly more subtle change, and reverted what were
    <statement>s above the <import-declaration> to their original <comment>
    name. This may be a sign that <comment>s really won't be part of
    the <statement> rule in the long term, but I'll reserve judgement
    on that.
    
    Also, the official grammar says that a 'module' declaration exists
    above the 'import' declaration, but I'm not testing that just yet.
    
    I could do that easily, and add another test... I think I might just
    do that as the next step.
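
    A minimal sketch of the shape described above (optional 'module',
    then optional 'import', then the body). The <module-declaration>,
    <function-definition> and <statement> rules here are placeholders
    made up for illustration, not what Grammar.pm6 actually contains,
    and 'util' is just a sample module name:

    ```perl6
    grammar ProgramSketch {
        rule TOP     { ^ <program> $ }
        rule program {
            <module-declaration>? <import-declaration>?
            <function-definition>+
        }
        # Both declarations are only reachable from <program>, so they
        # can only appear at the very top of the file.
        rule  module-declaration  { 'module' <module-name> '.' }
        rule  import-declaration  { 'import' <module-name> '.' }
        # Hypothetical body: comma-separated bare calls ended by '.'.
        rule  function-definition { <function-name> '=>' [ <statement> ',' ]* <statement> '.' }
        token module-name   { \w+ }
        token function-name { \w+ }
        token statement     { \w+ }
    }

    # An import at the top of the file should be accepted...
    say so ProgramSketch.parse("import util.\nmain => go, nl.\n");
    # ...while an import inside a function body should be rejected.
    say so ProgramSketch.parse("main => import util.\n");
    ```

    Because 'import' is no longer a <statement>, it can't sneak into a
    function body any more.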
  5. Cheating with functions...

    Jeffrey Goff committed Apr 16, 2019
    This is actually not what I'd do while designing this grammar. I'd
    actually jump ahead to what I know is coming up, but this is meant
    to be both a git log and a tutorial of sorts.
    
    I'm also seeing a potential issue, because what I want to do is
    be able to replace 'go' with "zero or more statements" eventually,
    but that means that 'function_name => import x.' will be legal,
    because 'import x' is a legitimate statement right now.
    
    That doesn't feel right, so I've just consulted the official grammar.
    Yes, there is one, in the user manual. I haven't done that before,
    but tonight felt like the right time.
    
    It says that there's an overarching 'program' construct that has an
    optional module declaration (that's what the [] around it in the
    grammar means: it's a BNF grammar, and that's how it marks optional
    parts), then an optional import statement.
    
    So next thing will be to fix that mistake, and then we can get on
    with the proper buildup.
Commits on Apr 15, 2019
  1. Breakin' up isn't hard to do

    Jeffrey Goff committed Apr 15, 2019
    Break up all of the statement blocks, now that I've made the point
    that there will be some things in there that won't be statements.
    I'm sure that the language has more than just "foreach" loops.
    
    So, at least the 'end' terms won't be statements on their own. We'll
    keep an eye out for those, and maybe even do those on the next pass.
    
    Nothing else really changed, although I should point out again that
    if 'nl .' is illegal in the actual language, then we have changed
    the actual spec a little bit. Again, I think it'll be simple to remove
    the space if I need to, probably by changing from a rule that makes
    the whitespace optional to a token that makes the whitespace illegal.
  2. Break apart statements at separators

    Jeffrey Goff committed Apr 15, 2019
    These would be the ',' and '.'. All I've really done here so far is
    (with a few exceptions) separated out the statements. If you've got
    a good eye, you might see that I seemed to miss a few things. Those
    are intentional, actually. (Unless I missed one, in which case...
    wait until the next commit.)
    
    Take a quick look at:
    
      'foreach(I in 1..N)
         Root = sqrt(I)' ','
         'writeln([I, cond(Root == 1.0*round(Root), open, closed)])
      end' ','
    
    You'll see that I broke out the ',' but I didn't break out the
    'foreach(I in 1..N)' bit above it. This is mostly to remind me that
    I really don't think "foreach(...)" is a statement in itself, but
    in fact it can better be thought of as a "header" to a block.
    
    In Perl 6, we have constructs like:
    
    while $condition {
      #`{ Code here.. }
    }
    
    When I was talking about statements earlier, you might remember that
    I said I considered things a statement if they could be run stand-alone.
    Well, 'foreach(I in 1..N)' has a problem. You can't really run that
    without knowing what code you're foreach'ing over.
    
    So the full 'foreach..end' thing is kind of a statement in that the
    entire thing is what gets compiled, but it's got statements *inside*.
    How we usually do this when doing compiler stuff is we'd treat it
    like:
    
    rule foreach-statement
      {
      'foreach' '(...)' <statement>+ 'end' '.'
      }
    
    If the language were more block-structured, then you could even do
    something fancy like:
    
    rule foreach-statement
      {
      'foreach' '(...)' <block>
      }
    
    rule block
      {
      '{' <statement>* '}'
      }
    
    with the advantage that you can reuse the '{'..'}' bit for other
    constructs like:
    
    rule with-statement
      {
      'with' <variable> <block>
      }
    
    In passing, I'll note that I've done enough work with this that I can
    read what's up there at sight and be able to tell what it should be able
    to parse.
    
    I just walk left-to-right and mentally translate the terms, so I'd
    read that as "with $foo { ... }" without really having to reference
    other lines. Variables are a compact way of "understanding" "$foo",
    "@bar", "$bar[32][$ding]" in other languages.
    
    I also have a list of corner cases there, like $bar[32][$ding] that I
    mentioned, so I can run over the list to see if there's a combination
    of variables and the 'with' construct that doesn't work together.
    
    A better explanation of that might be:
    
    rule modulus
      {
      <variable> '%' <constant>
      }
    
    '$x % 32' is valid, so is '@foo[3] % -7'... but '$x % "FOO"' wouldn't
    make sense, because while 32 and -7 are constants, so are things
    like strings. That wouldn't work together, so I might want to try
    something like this:
    
    rule modulus
      {
      <variable> '%' [ <number> | <variable> ]
      }
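
    As a rough, runnable sketch of that last version: the <variable>
    and <number> tokens below are simplified stand-ins invented for
    this example (Perl-style sigils, integer literals only), just
    enough to try the corner cases mentioned above.

    ```perl6
    grammar ModulusSketch {
        rule  TOP      { ^ <modulus> $ }
        rule  modulus  { <variable> '%' [ <number> | <variable> ] }
        # Hypothetical variable: a sigil, a name, then any subscripts.
        token variable { <[$@]> \w+ [ '[' [ <number> | <variable> ] ']' ]* }
        # Hypothetical number: an optionally negative integer.
        token number   { '-'? \d+ }
    }

    # The last expression has a string on the right and should be rejected.
    for '$x % 32', '@foo[3] % -7', '$bar[32][$ding] % $x', '$x % "FOO"' -> $expr {
        say $expr, ' => ', ModulusSketch.parse($expr) ?? 'ok' !! 'rejected';
    }
    ```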
    
    But it's past my cutoff time, and I'm going to turn into a pumpkin
    if I don't commit and get to bed.
  3. Create the basis of a "statement".

    Jeffrey Goff committed Apr 15, 2019
    It might be about time to talk about naming things. Language design
    has a long and storied history, and you'll see that some parts have
    somewhat confusing names.
    
    Like "statement" here. I've combined <import-statement> and <comment>
    here into one "statement" with the idea of being able to add things
    like "assignment-statement" to it later on. But one thing that you
    might be thinking is "But...but... a statement *states* something,
    a comment really doesn't have anything to say."
    
    This is true, but comments behave like <import-statement>
    syntactically. They can be on their own line like an assignment,
    and they can kind of "hang out" after the import-statement.
    
    The comma is a statement separator, and if you look at lines like:
    
    'initial_state([2,4,1,7,5,3,8,6])' '.' <statement>+
    
    so is a period. So it can take the place of a statement, like this:
    
       write(L), nl,
    
    which also has two statements in a row. It looks like a duck
    (it can take the place of a statement), it quacks like a duck
    so I'll call it a duck. True, you can't stick another statement
    after it and expect it to compile and execute, but that's just
    kind of a quirk of the comment tag - besides, we still allow
    
      write(L), /* foo */ nl,
    
    to compile, although not many people use this because it's hard to
    read.
  4. Clean up some of the terms, separate out '.' for later use

    Jeffrey Goff committed Apr 15, 2019
    Basically getting my head back into this. I'll note in passing that the
    first time I ran this test it *might* have "infinilooped", shorthand
    for "got stuck." I removed the lib/.precomp directory just in case
    and it ran just fine afterwards.
    
    I think the next thing to do won't advance the grammar per se, but it
    will add a little more structure, and it'll create a term that's
    going to see a lot more use.
  5. Adding a "regression" as a note more than anything...

    Jeffrey Goff committed Apr 15, 2019
    I've got a whole corpus of test files to look through, I'll check
    that to see if any of the function names has spacing between the
    last character of the name and the opening parenthesis. I basically
    forgot about that possibility, but part of this git log is
    recording both the mistakes and the mainline of changes.
Commits on Apr 14, 2019
  1. Adding a function name

    Jeffrey Goff committed Apr 14, 2019
    There are a bunch of things I could have done here, but of course
    I chose this. There's not much except for the real meat of things,
    the function bodies and a few random function calls. I'll want
    eventually to create something like a function rule, and it'll
    likely look something like this:
    
    rule function-definition
      {
      <function-name> <function-arguments>? <function-body> '.'
      }
    
    and the first term of this happens to be <function-name>. It's
    simple, but it'll work, and is a simple way to start in on
    our first "real" rule.
    
    Now, this *does* actually change things slightly, and in recognition
    of this I'm going to add a regression test as a reminder that
    we've potentially broken our grammar. It shouldn't cause much of
    a problem, but admittedly these things do tend to build up, and
    it's a good idea to keep an eye on them, or at least acknowledge
    the fact that there may be a problem here.
  2. Moving a little faster now...

    Jeffrey Goff committed Apr 14, 2019
    Let's tackle a little bigger bite now. I'm going to take a guess
    that 'import modulename.' is picat's way to import a module. Seems
    a bit risky to me, but I'm guessing that's what's happening here.
    
    So, let's encapsulate this bit. First, we can call this three terms
    in a row. 'import' is what we'd call a "keyword." 'modulename' is...
    well, a module name, and '.' is the end of.. well, what I guess we
    can call a statement, for lack of a better word.
    
    I'm going to guess it's the equivalent of 'use Module::Name;' in Perl,
    and call the whole thing "import-statement". So, according to the
    rules we came up with in our last commit, 'import' and 'modulename'
    can probably be separated by any amount of whitespace we want.
    
    So, these things should be part of a rule and not a token. We could
    just jump in and write:
    
    rule import-statement
      {
      'import' \w+ '.'
      }
    
    and capture 'import module_name .' ad nauseam, but that wouldn't be
    as fun as what's coming up. 'modulename' here should really be
    given its own name here, so let's create a ... not a rule, but a
    token for that.
    
    This is also handy because we could (though it's highly unlikely)
    find out that module names can be arbitrary strings, and we don't
    want to have to come back and change our code.
    
    So, we write:
    
    token module-name
      {
      \w+
      }
    
    and use that in place of our \w+ above. That also gives us a name
    for what "arguments" 'import' takes, and reads clearer.
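
    Put together, the two pieces might look like the sketch below. The
    grammar name and the TOP wrapper are only there to make it runnable
    on its own, and 'util' is a placeholder module name.

    ```perl6
    grammar ImportSketch {
        rule  TOP              { ^ <import-statement> $ }
        # A rule: whitespace is allowed between the three terms.
        rule  import-statement { 'import' <module-name> '.' }
        # A token: the module name itself is one unbroken unit.
        token module-name      { \w+ }
    }

    # Print the captured module name, or 'no match' if parsing fails.
    for 'import util.', 'import   util .', 'importutil.' -> $line {
        my $m = ImportSketch.parse($line);
        say $line, ' => ', $m ?? ~$m<import-statement><module-name> !! 'no match';
    }
    ```

    The last line is there to show that the keyword and the name still
    have to be separated by something. 'importutil.' shouldn't parse.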
  3. Aand the first real fix.

    Jeffrey Goff committed Apr 14, 2019
    There are probably other ways to fix this problem, but I've
    chosen this simply because it's expedient, and while it could
    bite me later I don't think it'll be a problem.
    
    It also gives me a chance to explain the crucial (if not only)
    difference between tokens and rules. Suppose for a moment we had
    'main' '=>' 'go' as part of our rule. We don't yet, but bear with
    me.
    
    Inside the rule, this matches 'main => go', 'main     => go' or
    'main=>go', with varying amounts of whitespace between the terms.
    Rules effectively insert \s* between each term for us and make things
    more convenient, so we don't have to add that manually and account
    for each kind of whitespace.
    
    Tokens don't. They're meant for the "lower level" things like
    '12.375', '$name' and even '"foo bar"'. Each of these forms a single
    unit; in fact you could call them <number>, <identifier> and <string>
    respectively.
    
    <string> is a special case because it can have whitespace inside it,
    but the entire "foo bar" thing is a single "object". So, now, let's
    take a closer look at what's going on here.
    
    The rule (in part) <comment> <comment> <comment> worked before the
    change, because it was *really* <comment> \s* <comment> \s* <comment>.
    But, when we changed this from <comment> <comment> to <comment>+,
    Perl 6 assumes we know what we're doing and stops inserting whitespace.
    
    So it now "expands" to <comment> <comment> <comment> again, without
    the \s* in between. There are other ways to turn the whitespace
    behavior back on, but we'll save those for later in the grammar.
    For now, it's easy enough to insert the whitespace at the end of
    the original token.
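
    A small sketch of the three stages described above, using a
    cut-down <comment> token and a made-up sample string rather than
    the real grammar file:

    ```perl6
    # Hypothetical sample input: three block comments separated by spaces.
    my $text = '/* one */ /* two */ /* three */';

    grammar Separate {
        # The rule allows whitespace between the three terms.
        rule  TOP     { ^ <comment> <comment> <comment> $ }
        token comment { '/*' .*? '*/' }
    }

    grammar Collapsed {
        # Nothing matches the gaps between repetitions of <comment>.
        rule  TOP     { ^ <comment>+ $ }
        token comment { '/*' .*? '*/' }
    }

    grammar Fixed {
        rule  TOP     { ^ <comment>+ $ }
        # The fix: let the token swallow its own trailing whitespace.
        token comment { '/*' .*? '*/' \s* }
    }

    say 'separate:  ', so Separate.parse($text);
    # Per the behavior described above, this one is expected to fail.
    say 'collapsed: ', so Collapsed.parse($text);
    say 'fixed:     ', so Fixed.parse($text);
    ```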
  4. Whoa, another problem.

    Jeffrey Goff committed Apr 14, 2019
    Why doesn't <comment> <comment> <comment> equal <comment>+? It...
    should, you'd think.. That's what (buckle down here...) finite
    automata theory teaches us, at least. But finite automata like
    traditional regexes aren't what Perl 6 uses. Instead it uses
    something much closer to Parsing Expression Grammars, which are a
    composable way to build your grammars.
    
    There are a few ways to fix this problem, and we'll talk about what
    I chose in the next commit.
  5. Add the other style of comment.

    Jeffrey Goff committed Apr 14, 2019
    Now, you'll notice a few things here. I've replaced every comment,
    flying or otherwise, with the <comment> block. And where the original
    file had '% foo % bar', because I'm going to assume that it's C++-style
    and is valid until the end of the line, I'll turn that into a single
    <comment> tag.
    
    This is quite a bit of work, and I'd definitely suggest learning either
    rectangular edit mode in vim, or doing a careful search-and-replace on
    those terms. For the moment I'm going to just do the replacement and
    not collapse the <comment>s into <comment>+.
  6. Add our first token

    Jeffrey Goff committed Apr 14, 2019
    Let's get the first comment style taken care of. The '/*'.. '*/'
    comments usually are unbalanced (they don't nest), so we don't need
    to use styles like C< '/*' ~ '*/' .+? > which will account for
    balanced delimiters.
    
    We'll want to merge '%' style comments in later, but let's ignore
    that for now, and proceed term-by-term.
Commits on Apr 13, 2019
  1. Quick fix at last...

    Jeffrey Goff committed Apr 13, 2019
    Okay, back to what I was explaining.
    
    I've broken up the file *very* crudely at what I think "logical boundaries"
    are going to be in the picat language.
    
    I see that ',' is used in place of semicolons, and '.' looks like it
    terminates a block. '%...' looks like it's a comment, as is '/*' .. '*/'.
    The familiar '=>' is used to separate name from body, and %-style comments
    can fly from the end.
    
    All good things to know but I'm most concerned right now where text
    starts and stops. So I'll go through the simple boundaries, not
    forgetting the comments, and make sure things work.
    
    You'll note as well that I've added this around the text blocks:
    
    ^ [ ... || ... ] $
    
    That way I can just add another '||' and the text in case I need to
    work with a different file. The ^ .. $ makes sure the parser goes
    through the entire text file.
  2. Woops, first little mistake...

    Jeffrey Goff committed Apr 13, 2019
    (Make that *second*...)
    
    I'm going to horribly abuse git as I go along on this project.
    Specifically I'm *deliberately* committing a broken repository to show
    both that even I make mistakes :) and what kind of issues you'll encounter
    when doing this sort of work.
    
    In this case I've just broken up both of these files into their logical
    units as I see them, and forgotten a "'" somewhere.
    
    On my system, the tests have gone from 'ok' to:
    
    > 6 t/01-basic.t
    ===SORRY!=== Error while compiling /home/jgoff/Documents/Repos/drforr/perl6-Language-Picat/lib/Language/Picat/Grammar.pm6 (Language::Picat::Grammar)
    Missing quantifier on the left argument of %
    at /home/jgoff/Documents/Repos/drforr/perl6-Language-Picat/lib/Language/Picat/Grammar.pm6 (Language::Picat::Grammar):304
    
    ('6' is a command-line alias I wrote that's just:
    
    alias 6="perl6 -Ilib"
    )
    
    Missing quantifier on the left ar... <shakes head> what?...
    
    --cut here--
    % goal_state(Goal) => Goal = [1,2,3,4,5,6,7,8,9,10,11,12].'
    --cut here--
    
    Oh, that's where I missed the last '...
  3. A beginning is a very delicate time...

    Jeffrey Goff committed Apr 13, 2019
    The main players in this are corpus-passed/*, t/01-basic.t and
    lib/Language/Picat/Grammar.pm6. The grammar is currently set up to
    blindly recognize ("parse" is too strong of a word) the two sample
    files.
    
    It just looks for the entire string, makes sure we've reached the end
    of the file in both cases, and then reports a successful match. The
    '$' term may seem extraneous, but it's very important indeed. It
    probably should be factored out, but that's too much trouble at this
    point.
    
    I've also added 'use Grammar::Tracer' at the start, along with a
    workaround of 'no precompilation' to take care of a tiny issue on my
    Linux box that actually bugged me for about a week.
    
    In any case, we can be sure that the grammar in question "parses" the
    entire set of files.