ThetaDev edited this page Aug 19, 2022 · 13 revisions

Reading

To expand macros we first need to read: match delimiters and build a token tree so that we know where a macro invocation ends.
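
As a rough illustration of matching delimiters into a token tree, here is a minimal sketch. It assumes the input has already been split into flat string tokens, so it sidesteps the regex problem discussed next. The `readTree` name and the `{ delim, inner }` node shape are made up for this example and are not taken from any implementation.

```javascript
// Hypothetical helper: nest a flat list of string tokens by delimiter.
const CLOSER_FOR = new Map([["(", ")"], ["[", "]"], ["{", "}"]]);
const CLOSERS = new Set(CLOSER_FOR.values());

function readTree(tokens) {
  const root = { delim: null, inner: [] };
  const stack = [root]; // the innermost open group is last
  for (const tok of tokens) {
    const top = stack[stack.length - 1];
    if (CLOSER_FOR.has(tok)) {
      // Opening delimiter: start a new group and descend into it.
      const node = { delim: tok + CLOSER_FOR.get(tok), inner: [] };
      top.inner.push(node);
      stack.push(node);
    } else if (CLOSERS.has(tok)) {
      // Closing delimiter: it must match the innermost open group.
      if (top.delim === null || top.delim[1] !== tok) {
        throw new SyntaxError("unmatched " + tok);
      }
      stack.pop();
    } else {
      top.inner.push(tok);
    }
  }
  if (stack.length > 1) {
    throw new SyntaxError("unclosed " + stack[stack.length - 1].delim[0]);
  }
  return root.inner;
}

const tree = readTree(["m", "(", "a", ",", "b", ")", "{", "x", "}"]);
console.log(JSON.stringify(tree, null, 2));
```

With a tree like this, the invocation `m (a, b) { x }` is three top-level nodes (the name, the `()` group, and the `{}` group), so finding where it ends does not require parsing what is inside the delimiters.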

But JS complicates this because delimiters can appear inside a regex literal, and deciding whether / is the start of a regex or the division operator depends on parsing context.

The read algorithm follows; see also the DLS 14 paper, which goes into more detail.

Give lookbehind to the reader

So to handle the problem of / we can use "almost one" lookbehind to disambiguate. Algorithm:

skip over comments

if tok is /
    if tok-1 is ()
        if tok-2 in "if" "while" "for" "with"
            tok is start of regex literal
        else
            tok is divide
    else if tok-1 is {}
        if isBlock(tok-1)
            // named or anonymous function
            if tok-2 is () and (tok-3 is "function" or tok-4 is "function")
                if function expression   // how to determine is described below         
                    tok is divide
                else                            
                    tok is start of regex literal
            else
                tok is start of regex literal
        else
            tok is divide
    else if tok-1 in punctuator // e.g. ";", "==", ">", "/", "+", etc.
        tok is start of regex literal
    else if tok-1 in keywords and not "this"
        // though some keywords will eventually result in a parse error (eg. debugger, break)
        tok is start of regex literal
    else
        tok is divide


assignOps =  ["=", "+=", "-=", "*=", "/=", "%=", 
                "<<=", ">>=", ">>>=", "&=", "|=", "^=", ","];

binaryOps = ["+", "-", "*", "/", "%","<<", ">>", ">>>", 
                "&", "|", "^","&&", "||", "?", ":", 
                "instanceof", "in",
                "===", "==", ">=", "<=", "<", ">", "!=", "!=="];

unaryOps = ["++", "--", "~", "!", "delete", "void", "typeof", "throw", "new"];

function isBlock(tok)
    if tok-1 is ( or [
        // ... ({...} ...)
        return false
    else if tok-1 is ":" and parent token is {}
        // ... {a:{...} ...}
        return isBlock(the parent {})
    else if tok-1 is one of assignOps unaryOps binaryOps
        // ... + {...}
        // ... typeof {...}
        return false
    else if tok-1 is one of "return" "yield"
        // handle ASI
        if lineNumber(tok) isnt lineNumber(tok-1)
            // return
            // {...}
            return true
        else                                
            // return {...}
            return false
    else if tok-1 is "case"
        // case {...}
        return false
    else                                    
        return true
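
The `isBlock` pseudocode above can be transcribed fairly directly. The sketch below assumes each `{}` token carries a small context record with hypothetical fields `prev` (the text of the token before it, with `"("` or `"["` standing for "first token inside that delimiter"), `line` and `prevLine` (line numbers), and `parent` (the record of the enclosing `{}`, or `null`). That record shape is invented for illustration.

```javascript
const ASSIGN_OPS = new Set(["=", "+=", "-=", "*=", "/=", "%=",
  "<<=", ">>=", ">>>=", "&=", "|=", "^=", ","]);
const BINARY_OPS = new Set(["+", "-", "*", "/", "%", "<<", ">>", ">>>",
  "&", "|", "^", "&&", "||", "?", ":", "instanceof", "in",
  "===", "==", ">=", "<=", "<", ">", "!=", "!=="]);
const UNARY_OPS = new Set(["++", "--", "~", "!", "delete", "void",
  "typeof", "throw", "new"]);

// brace: { prev, line, prevLine, parent } (hypothetical shape, see above)
function isBlock(brace) {
  const prev = brace.prev;
  if (prev === "(" || prev === "[") return false;          // ({...} ...)
  if (prev === ":" && brace.parent) return isBlock(brace.parent); // {a:{...}}
  if (ASSIGN_OPS.has(prev) || UNARY_OPS.has(prev) || BINARY_OPS.has(prev)) {
    return false;                                          // ... + {...}
  }
  if (prev === "return" || prev === "yield") {
    // ASI: a newline after `return` ends the statement, so `{` opens a block.
    return brace.line !== brace.prevLine;
  }
  if (prev === "case") return false;                       // case {...}
  return true;
}

// x = {a: {...}} : the inner brace defers to the outer one, which follows `=`.
const outer = { prev: "=", line: 1, prevLine: 1, parent: null };
const inner = { prev: ":", line: 1, prevLine: 1, parent: outer };
console.log(isBlock(inner)); // false
```

The `":"` check has to come before the operator check, as in the pseudocode, because `":"` is also listed among the binary operators.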

Depending on context, function name() {} is either a function declaration or a function expression. If it's a function expression then a following / will be interpreted as a divide but if it's a function declaration a following / will be interpreted as a regex. For example,

// a declaration so / is regex
f(); function foo() {} /42/i
vs
// an expression so / is divide
x = function foo() {} /42/i

Looking one token behind the function keyword (ignoring newlines), the following imply that it is a function declaration:

; } ) ] ident literal (including regex literal so need to be careful about /)
debugger break continue else

And these imply it is a function expression.

( [ , (assignment operators) (binary operators) (unary operators)
in typeof instanceof new return case delete
throw void

And these will result in a parse error:

do break default finally for function if switch this
try var while with
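
The three lists above amount to a lookup on the token before `function`. A minimal sketch follows, assuming tokens are plain strings and that anything not listed is an identifier or literal. The name `classifyFunction` is made up here. `break` appears in both the declaration list and the parse-error list, so this sketch arbitrarily lets the declaration list win. Treating the start of input as a declaration is also an assumption.

```javascript
const ASSIGN = ["=", "+=", "-=", "*=", "/=", "%=", "<<=", ">>=", ">>>=",
  "&=", "|=", "^="];
const BINARY = ["+", "-", "*", "/", "%", "<<", ">>", ">>>", "&", "|", "^",
  "&&", "||", "?", ":", "===", "==", ">=", "<=", "<", ">", "!=", "!=="];
const UNARY = ["++", "--", "~", "!"];

const IMPLIES_DECLARATION = new Set([";", "}", ")", "]",
  "debugger", "break", "continue", "else"]);
const IMPLIES_EXPRESSION = new Set(["(", "[", ",", ...ASSIGN, ...BINARY, ...UNARY,
  "in", "typeof", "instanceof", "new", "return", "case", "delete",
  "throw", "void"]);
const PARSE_ERROR = new Set(["do", "break", "default", "finally", "for",
  "function", "if", "switch", "this", "try", "var", "while", "with"]);

// prev: text of the token just before `function` (undefined at start of input)
function classifyFunction(prev) {
  if (prev === undefined) return "declaration"; // assumption, see lead-in
  if (IMPLIES_DECLARATION.has(prev)) return "declaration";
  if (IMPLIES_EXPRESSION.has(prev)) return "expression";
  if (PARSE_ERROR.has(prev)) return "error";
  return "declaration"; // identifiers and literals
}

console.log(classifyFunction(";")); // f(); function foo() {} /42/i
console.log(classifyFunction("=")); // x = function foo() {} /42/i
```

For the two examples earlier, the first call reports a declaration (so the following `/` starts a regex) and the second reports an expression (so the following `/` is a divide).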

What should we do with FutureReservedWords? Treat them as identifiers?

Some examples:

// `=` comes first so regex
x = /foo/

// `x` so divide
x = x / foo /

// `(` so regex
x = (/foo/)

x = 10 {/foo/}

do { /foo/ }

// `)` so actually have to look back all the way to `if` to see regex
if (true) /foo/

// `=` before the `()` so divide
x = (a) / foo

// needs to be divide since call
bar (true) /foo/
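
Several of the examples above can be checked against a partial transcription of the lookbehind algorithm. The sketch below is only an illustration: it assumes tokens are strings, that matched delimiters are `{ delim, inner }` nodes (a made-up shape), and that `/` at the very start of a group counts as a regex. It deliberately does not handle a preceding `{}`, which needs `isBlock` and the function-expression check. The punctuator list is abbreviated.

```javascript
// Hypothetical token shape: strings for atoms, { delim, inner } for groups.
const PAREN_HEADS = new Set(["if", "while", "for", "with"]);
// Abbreviated punctuator list, for illustration only.
const PUNCTUATORS = new Set([
  ";", ",", "=", "==", "===", "!=", "!==", "<", ">", "<=", ">=",
  "+", "-", "*", "/", "%", "!", "~", "?", ":", "&&", "||", "&", "|", "^",
]);
// ES5 keywords.
const KEYWORDS = new Set([
  "break", "case", "catch", "continue", "debugger", "default", "delete",
  "do", "else", "finally", "for", "function", "if", "in", "instanceof",
  "new", "return", "switch", "this", "throw", "try", "typeof", "var",
  "void", "while", "with",
]);

// `before` holds the tokens preceding the `/` at the same nesting level.
function slashStartsRegex(before) {
  const prev = before[before.length - 1];
  const prev2 = before[before.length - 2];
  if (prev === undefined) return true; // assumption: nothing to divide yet
  if (typeof prev === "object") {
    if (prev.delim === "()") return PAREN_HEADS.has(prev2);
    if (prev.delim === "{}") {
      throw new Error("a preceding {} needs isBlock; not sketched here");
    }
    return false; // []: falls through to the final else, so divide
  }
  if (PUNCTUATORS.has(prev)) return true;
  if (KEYWORDS.has(prev) && prev !== "this") return true;
  return false;
}

const paren = (...inner) => ({ delim: "()", inner });
console.log(slashStartsRegex(["x", "="]));              // x = /foo/
console.log(slashStartsRegex(["x", "=", "x"]));         // x = x / foo /
console.log(slashStartsRegex(["if", paren("true")]));   // if (true) /foo/
console.log(slashStartsRegex(["x", "=", paren("a")]));  // x = (a) / foo
console.log(slashStartsRegex(["bar", paren("true")]));  // bar (true) /foo/
```

The five calls report regex, divide, regex, divide, and divide respectively, matching the comments on the corresponding examples.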

This means that inside of a macro call we have to follow this context sensitivity for regex literals. So the following reasonable macro isn't allowed:

macro rcond {
    rcond (s:expr) { instance e:expr... } => // ...
}

rcond ("foo") { 
  instance /foo}bar/
}

The "instance" makes the first / be interpreted as divide. So we could just leave this as is and call it a limitation of macros. They need to respect the same structure as JS. This might actually be ok. The above could be done as:

rcond ("foo") {
    instance "foo}bar"
}

(note that if we used case instead of instance then the following / would be interpreted as the start of a regex since case is a keyword, but in general this is a non-obvious rule for macro writers to be aware of)

Not too bad of a change I think. We're already forcing delimiter matching anyway. e.g. the following is bad because of the extra unmatched paren:

macro m {
    case m (e1: expr ( e2:expr) => // ...
}

If we want to allow macros to shadow statements like if, we have another complication:

macro if { 
    case if(c:expr) => ...
}
if (true) / foo
// should be divide?!

So I think we are going to treat the reserved keywords (if, while, etc.) as really reserved. Macros can't override their meaning.

Should we disallow FutureReservedWords too (class, enum, etc.)?

Example Code

Some example macro code. In various stages of wrong and impossible.

Syntax-rules flavored macros:

macro swap {
    case swap (x:var, y:var) => {
        tmp = x;
        x = y;
        y = tmp;
    }
}
var a = 2, b = 4;
swap(a, b)


macro unless {
    case unless (condition:expr) { body:expr } => {
        if(!condition) { body } else {}
    }
}

Recursive and refers to previously defined macro:

macro rotate {
    case rotate (a:var) => ;
    case rotate (a:var, b:var, c:var ...) => {
        swap(a, b);
        rotate(b, c ...);
    }
}
var a = 2, b = 4, c = 6, d = 8;
rotate(a, b, c, d)

Syntax-case flavored macros:

macro swap {
    case swap (x:var, y:var) => {
        #'tmp = x;
        #'x = y;
        #'y = tmp;
    }
}

macro thunk {
    case thunk (e: expr) =>
        #'function() { return e; }
}

thunk(2+2)

macro let {
    case let (x:var = v:expr) { body:expr } => {
        #'(function(x) { body })(v)
    }
}

macro or {
    case or () => #'false;
    case or (e:expr) => #'e;
    case or (e1:expr, e2:expr, e3:expr...) => {
        #'let (t = e1) { t ? t : or(e2, e3) }
    }
}

macro cond {
    case cond { default: def:expr } => {
        #'def
    }
    case cond { case condition:expr => val:expr, ... default => def:expr } => {
        #'condition ? val : cond { ... default => def }
    }
}

var type = cond {
    case (x === null) => "null",
    case Array.isArray(x) => "array",
    case (typeof x === "object") => "object",
    default => typeof x
};


macro {
  case let(x_1:var = v_1:expr, ... , x_n:var = v_n:expr) 
    { body: expr } => {
    // ....
  }
}   

macro {
  case cond (e:expr) {
    case pat_1:expr => val_1:expr,
    ...
    case pat_n:expr => val_n:expr,
    default => val:expr
  }
}

Misc:

  • Optional cases (like default above)?
  • primitive forms to fall back on?
  • have the interesting stuff be just another macro?

macro primitive design

forms and names

Possible primitive design that can then build other more expressive macro forms.

macro name {
  function(stx) {
    return stx;
  }
}

The macro name {...} wraps a transformer function. macro is the signal to the expander that what follows is a macro definition. At any macro call matching the macro name, the expander passes the read tree to the transformer and replaces it with the transformer result.
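
To make the expander's role concrete, here is a small sketch of that replace step. It rests on several assumptions that the notes do not settle: the token tree is an array of strings and `{ delim, inner }` nodes, a macro call consists of the name plus the single node that follows it, a transformer returns an array of tokens, and the result is not expanded again (so an identity transformer like the one above cannot loop). The `expand` name and the `macros` map are hypothetical.

```javascript
// macros: Map from macro name to transformer function (stx) => tokens
function expand(tree, macros) {
  const out = [];
  for (let i = 0; i < tree.length; i++) {
    const tok = tree[i];
    if (typeof tok === "string" && macros.has(tok)) {
      // Assumption: the call is the name plus one following node.
      const stx = tree.slice(i, i + 2);
      out.push(...macros.get(tok)(stx));
      i += 1; // skip the node the macro consumed
    } else if (typeof tok === "object") {
      // Not a macro call: recurse into the delimited group.
      out.push({ delim: tok.delim, inner: expand(tok.inner, macros) });
    } else {
      out.push(tok);
    }
  }
  return out;
}

const macros = new Map();
// thunk (e)  =>  function () { return e }
macros.set("thunk", ([name, args]) => [
  "function",
  { delim: "()", inner: [] },
  { delim: "{}", inner: ["return", ...args.inner] },
]);

const input = ["var", "t", "=", "thunk", { delim: "()", inner: ["2", "+", "2"] }];
console.log(JSON.stringify(expand(input, macros)));
```

How much of the following tree a macro may consume is exactly the kind of question the pattern-based forms earlier on this page are meant to answer, so the fixed "one node" rule here is a placeholder.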

So how do we name macros? Some possibilities...

In the macro form:

macro name {
  function(stx) {
    return stx;
  }
}

(probably the best option)

In function name:

macro {
  function name(stx) {
    return stx;
  }
}

(really don't like this)

As assign:

name = macro {
  function name(stx) {
    return stx;
  }
}

(not too bad but confuses compile time vs runtime semantics?)

As assign with var:

var name = macro {
  function name(stx) {
    return stx;
  }
}

(even worse, what happens if we forget the var? No global object but this is what a programmer is thinking of)

How are these scoped? Block scoping, ignores shadowing via hoisting. So:

function foo(m) {
  m();
  macro m { ... }
  var m
}

Or should we say that macro definitions are hoisted to the top of their scope? And scope follows normal JS semantics (so no blocks?). But then what about new macro forms? eg:

function foo() {
  // def is a "function" form
  macro def {...}

  bar();

  def name(arg1, arg2) {
    // so bar shouldn't be hoisted out of `def name(...) {...}`
    macro bar {...}
  }
}

So we could say that macros are hoisted out of known blocks (for, if, etc.) but not out of macro bodies or normal function definitions. But then what if we wanted a block-like macro form? e.g.:

macro until { ... } // very much like while

function foo() {

  foo(); // from def inside while
  bar(); // not from def inside until

  while (x !== 4) {
    macro foo {...}
  }

  until (x === 4) {
    macro bar {...}
  }

}

Not consistent, so I think macro definitions must always be block scoped.
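
One way to picture always-block-scoped macro definitions is a chain of scopes, one per block, where lookup walks outward. This is only a sketch: the `MacroScope` class and its methods are hypothetical, and it says nothing about whether a definition is visible before its own position within a block.

```javascript
class MacroScope {
  constructor(parent = null) {
    this.parent = parent;
    this.macros = new Map();
  }
  define(name, transformer) {
    this.macros.set(name, transformer);
  }
  lookup(name) {
    // Walk from the innermost block outward.
    for (let scope = this; scope !== null; scope = scope.parent) {
      if (scope.macros.has(name)) return scope.macros.get(name);
    }
    return undefined;
  }
}

// Mirrors the while/until example: each block body gets its own scope.
const fnBody = new MacroScope();
const whileBody = new MacroScope(fnBody);
const untilBody = new MacroScope(fnBody);
whileBody.define("foo", (stx) => stx);
untilBody.define("bar", (stx) => stx);

console.log(typeof whileBody.lookup("foo")); // "function"
console.log(fnBody.lookup("foo"));           // undefined
console.log(fnBody.lookup("bar"));           // undefined
```

Under this model `while` and `until` behave the same way: neither `foo` nor `bar` leaks into the enclosing function body, which avoids the inconsistency described above.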

primitive functions

To build out syntax objects we need a few primitive functions, like those in Scheme.

syntax :: (Value) -> SyntaxObject
// aka #'

syntax-e :: (SyntaxObject) -> Value
// aka syntax->datum (unwrap all levels)
// and syntax-e (unwrap single level)

make-syntax :: (Value, SyntaxObject) -> SyntaxObject
// aka datum->syntax

These names aren't quite right. What is JavaScripty?

syntax
// just overload syntax to include make-syntax power?
// usually will be using the sugar anyway

unwrapSyntax
unwrapSyntaxAll
// single function with flat?
// ugly
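
To see how the candidate names might fit together, here is a minimal sketch. It assumes a syntax object is just a value paired with an opaque `context`, and it takes the "overload `syntax`" option: a second argument, when given, supplies the context the way make-syntax would. The `SyntaxObject` class and the `context` field are inventions for this example.

```javascript
class SyntaxObject {
  constructor(value, context) {
    this.value = value;
    this.context = context; // opaque lexical information (assumed)
  }
}

// syntax(value)          plays the role of #'
// syntax(value, other)   plays the role of make-syntax / datum->syntax
function syntax(value, contextFrom) {
  const context = contextFrom instanceof SyntaxObject ? contextFrom.context : null;
  return new SyntaxObject(value, context);
}

// Unwrap a single level (syntax-e).
function unwrapSyntax(stx) {
  return stx instanceof SyntaxObject ? stx.value : stx;
}

// Unwrap every level (syntax->datum).
function unwrapSyntaxAll(stx) {
  if (stx instanceof SyntaxObject) return unwrapSyntaxAll(stx.value);
  if (Array.isArray(stx)) return stx.map(unwrapSyntaxAll);
  return stx;
}

const nested = syntax(["a", syntax("b"), syntax(["c", syntax("d")])]);
console.log(unwrapSyntax(nested)[1] instanceof SyntaxObject); // true
console.log(JSON.stringify(unwrapSyntaxAll(nested)));         // ["a","b",["c","d"]]
```

Keeping two unwrap functions rather than one function with a flag makes each call site say which depth it wants, at the cost of the extra name.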

var t = thunk(2+2) // function(a) { return 2 + 2;}
// where
macro thunk {
  function(stx) {
    return syntax(
      function(a) { 
        return unwrapSyntax(stx.body); 
      }, stx)
  }
}

// using #{...} as sugar for syntax(...)
macro thunk {
  function(stx) {
    return #{
      function(a) { 
        return unwrapSyntax(stx.body); 
      }
    }
  }
}

macro swap {
  function(stx) {
    var x = unwrapSyntax(stx.body[0])
    var y = unwrapSyntax(stx.body[1])
    
    return #{
      var tmp = x;
      x = y;
      y = tmp;
    }
  }
}

Papers

Useful papers on macros:

  • Macros that work
  • Macros that work together
  • A Theory of Typed Hygienic Macros
  • Macro-By-Example: Deriving Syntactic Transformations from their Specifications
  • Syntactic Abstraction in Scheme
  • SuperC: Parsing All of C by Taming the Preprocessor
  • Composable and Compilable Macros
  • Refining Syntactic Sugar: Tools for Supporting Macro Development
  • Fortifying Macros