Formalize the adhoc parsing in jsexpr #21

aaronsheldon · 2018-08-29T21:11:48Z

I'm opening this for discussion, rather than any sort of request.

In reviewing the code of jsexpr, I have found that it is a mixed bag of catches, matches, and string escapes, which in the long run will be brittle and hard to maintain. In particular the processing of jsexpr relies heavily on waterfall knock out (i.e to reach c, cannot have been a or b). This works with small easily orthogonalized cases, but quickly becomes overwhelming with a large number of cases.

I think this could be addressed by replacing the use of the match macro with overloaded functions dispatched on Val()'ed symbols. The parser would extract the head symbol from the expression and then dispatch the found function, splatting the arguments vector of the expression to the function. Of course expressions not handled by jsexpr should raise an error.

The individual functions in would then be overloaded on the type patterns of the arguments vector. The caveat being: catching the method not found error, and adding to it the context that the particular Julia expression cannot be cross-parsed into JavaScript. This can be done with very little cost by calling hasmethod() on the appropriate signature and then raising an error on false.

Overall the refactoring would formalize the following strategy:

Search the Julia AST for expressions that can be handled, and raise an error on everything that cannot be handled (currently mostly in jsexpr)
Map remaining Julia AST into a JavaScript AST (currently sort of what struct F() contains)
Deparse the JavaScript AST into text (currently sort of what flatten/simplify do)

The point is to isolate the boundary between the code you can explicitly control, the JS AST, and the code you cannot, the Julia AST.

The strategy of method overloading the parser can also be used to to address issue #2, because the args of the ref sub-expression of a[]=b is just Symbol, where as the args of the ref sub-expression of a[2]=b are a symbol and another value, in this case an Int64. WebIO would then provide the specialized overload of the ref method to catch the custom WebIO syntax.

Thoughts?

e.g.

# Generic placeholder for the production rules for parsing Julia AST to
# JavaScript JS. Expects a Val'ed head Symbol of the expression, and
# overloads on handled signatures of the args vector of the expression.
# The return should always be a JSAST. Delete in actual implementation.
function crawl(head::Val{juliasymbol}, args...)::JSAST end

# Generic placeholder for overriding function calls. Delete in actual implementation.
function crawl(c::Val{:call}, head::Val{juliasymbol}, args...)::JAST end

# Generic placeholder for overriding marco calls. Delete in actual implementation.
function crawl(m::Val{:macro}, head::Val{juliasymbol}, args...)::JAST end

# Generic placeholder for the production rules for parsing JavaScript AST
# to String's. Expects a Val'ed head Symbol of the JS expression, and
# the a vector JSAST sub-expressions. The return should always be a String.
# Delete in actual implementation.
function deparse(head::Val{jssymbol}, args::Vector{JSAST})::String end

# Note, any string can be turned into a symbol with a call to the Symbol() 
# constructor, so jshead really can store anything. Likewise symbols can be
# returned to strings through a call to the String() constructor.
struct JSAST
    jshead::Symbol
    jsbody::Vector{JSAST}
    function JSAST(
        h::Symbol,
        b::Union{Vector{JSAST}, Vararg{JSAST, N}} = Vector{JSAST}()
    ) where N
        if isa(b, Vector{JSAST})
            return new(h, b)
        else
            return new(h, [b...])
        end
    end
end

# Dumps the raw JS string literal into the argument of the parent 
# expression that the macro was called from. The common use 
# cases are for the macro to be called as the RHS of an assignment,
# or an interpolation into another literal.
macro js(ex)
    return Expr(:call, :JSString, deparse(crawl(ex)))
end

# The expectation is that each dispatched crawl function returns a JS AST
# by calling the crawl-function recursively on deeper expressions.
# Potentially unsafe catch all for terminals.
function crawl(ex::Expr)::JSAST

    # Recurse into expressions
    if hasmethod(crawl, typeof((Val(ex.head), ex.args...)))
        return crawl(Val(ex.head), ex.args...)::JSAST

    # Bail and explain
    else
        error("Expression $ex not supported.")
    end
end

# Protect string terminals
function crawl(ex::String)
    return JSAST(Symbol(string("\"", ex, "\"")))
end

# No coercion on symbols
function crawl(ex::Symbol)
    return JSAST(ex)
end

# All other terminals
function crawl(ex::T)::JSAST

    # Push terminals that have native symbol methods, maybe unsafe
    if hasmethod(Symbol, Tuple{T})
        return JSAST(Symbol(ex))

    # Push terminals, when all else fails try interpolation. This will be a source of bugs
    # because there are no guarantees the string representation will be sensible.
    elseif hasmethod(string, Tuple{T})
        return JSAST(Symbol(string(ex)))

    # Bail and explain
    else
        error(
            "The type $T cannot be a terminal node because " *
            "neither a string method nor a Symbol method were found."
        )
    end
end

# The expectation is that each dispatched deparse function returns a bare JSString
# literal, formed by appropriate ordering and concatenation of the output of 
# recursive calls to the deparse-function.
function deparse(ex::JSAST)::String
    if length(ex.body) == 0
        return string(ex.head)
    else
        return deparse(Val(ex.head), ex.jsbody)::String
    end
end

# Allow override-able parsing of calls
function crawl(h::Val{:call}, m::Symbol, b...)::JSAST

    # Check for overrides
    if hasmethod(crawl, typeof((Val(:call), Val(m), b...)))
        return crawl(Val(:call), Val(m), b...)::JSAST

    # Otherwise pass as a call
    else
        return JSAST(:call, [crawl(m), crawl.(b)...])
    end
end

# Allow override-able parsing of macros
function crawl(h::Val{:macrocall}, m::Symbol, l::LineNumberNode, b...)::JSAST

    # Check for overrides
    if hasmethod(crawl, typeof((Val(m), b...)))
        return crawl(Val(:macrocall), Val(m), b...)::JSAST

    # Otherwise expand and crawl the current module, but let crawl determine
    # what to do with subsequent marcos
    else
        return crawl(macroexpand(@__MODULE__, Expr(:macrocall, m, l, b...); recursive=false))::JSAST
    end
end

# Example crawl and deparse
crawl(h::Val{:ref}, lhs, rhs) = JSAST(:jsref, [crawl(lhs), crawl(rhs)])
deparse(h::Val(:jsref), b::Vector{JSAST}) = string(
    deparse(b[1])::String,
    "[",
    deparse(b[2])::String,
    "]"
)

# Assignmet
crawl(h::Val{:(=)}, lhs, rhs) = JSAST(:jsref, [crawl(lhs), crawl(rhs)])
deparse(h::Val{:(=)}, b::Vector{JSAST}) = string(deparse(lhs), "=", deparse(rhs))

# Objects
function crawl(h::Val{:tuple}, b::Vararg{Expr, N})::JSAST where N
    js = JSAST(:object)
    for ex in b
        if ex.head == :(=)
            push!(js.body, crawl(ex))
        else
            error("Expression $ex is not a named argument in the tuple.")
        end
    end
    return js
end
function deparse(h::Val{:object}, b::Vector{JSAST})::String
    if length(b) > 0
        js = fill(",", 2*length(b) - 1)
        js[1:2:2*length(b) - 1] = deparse.(b)
        js = string("{", js..., "}")
    else
        js = "{}"
    end
    return js
end

# Arrays
function crawl(h::Val{:vec}, b...)::JSAST where N
    return JSAST(:vec, crawl.(b))
end
function deparse(h::Val{:vec}, b::Vector{JSAST})::String
    if length(b) > 0
        js = fill(",", 2*length(b) - 1)
        js[1:2:2*length(b) - 1] = deparse.(b)
        js = string("[", js..., "]")
    else
        js = "[]"
    end
    return js
end

# Escape JS macros, treat as terminals
function crawlmacro(h::Val{Symbol("@js-str")}, b::String)
    return crawl(b)::JSAST
end

# Escape new macros
function crawlmacro(h::Val{Symbol("@new")}, b...)
    return JSAST(:new, crawl.(b))
end
function deparse(h::Val{:new}, b::Vector{JSAST})
    return string("new", " ", deparse(b))
end

# Escape var macros
function crawlmacro(h::Val{Symbol("@var")}, b...)
    return JSAST(:var, crawl.(b))
end
function deparse(h::Val{:var}, b::Vector{JSAST})
    return string("var", " ", deparse(b))
end

In the example the crawl functions are responsible for identifying Julia AST that can be parsed, stripping out the parts that can be ignored, and reorganizing the meaningful parts into a JS AST representation, recursively. The deparse functions are then responsible for taking lists of JS AST and turning them into JS strings, presumably through recursive concatenation.

This means that in the WebIO module we could have, for example

import JSExpr.crawl
import JSExpr.deparse

# Override the intrinsic JSExpr crawl to check for single argument ref's.
crawl(h::Val{:ref}, b) = JSAST(:WebIOref, crawl(b))

# Override the intrinsic JSExpr deparse to check for single argument
# ref's. Call the intrinsic method when there are multiple arguments
function deparse(h::Val{:(=)}, b::Vector{JSAST})::String

    # The left and right hand sides are empty ref's, short circuit to a 
    # chained set-get.
    if b[1].head == :WebIOref && b[2].head == :WebIOref
        return string(
            "WebIO.setval(", 
            deparse(Val(b[1].body[1].head), b[1].body[1].body)::String,
            ", WebIO.getval(",
            deparse(Val(b[2].body[1].head), b[2].body[1].body)::String, 
            "))"
        )

    # Only the left hand side is an empty ref, call set
    elseif b[1].head == :WebIOref
        return string(
            "WebIO.setval(",
            deparse(Val(b[1].body[1].head), b[1].body[1].body)::String,
            " , ",
            deparse(Val(b[2].head), b[2].body)::String,
            ")"
        )

    # Only the right hand side is an empty ref, call get
    elseif b[2].head == :WebIOref
        return string(
            deparse(Val(b[1].head), b[1].body)::String,
            " = WebIO.getval(",
            deparse(Val(b[2].body[1].head), b[2].body[1].body)::String,
            ")"
        )

    # Otherwise call the intrinsic method
    else
        return JSExpr.deparse(h, b)::String
    end
end

*because we expect crawl and deparse to be widely overloaded, we need a certain amount of paranoia about the return types.

This should also facilitate testing and development, as you could create the module with any empty dictionary, and then push and pop prototype functions to test the output in the REPL.

I think this also falls under the general rule of "don't build parsers with regular expressions".

The text was updated successfully, but these errors were encountered:

twavv · 2019-04-01T21:02:24Z

See #25. I took some artistic liberty.

twavv · 2019-10-27T19:26:36Z

Implemented by #31

twavv closed this as completed Oct 27, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Formalize the adhoc parsing in jsexpr #21

Formalize the adhoc parsing in jsexpr #21

aaronsheldon commented Aug 29, 2018 •

edited

Loading

twavv commented Apr 1, 2019

twavv commented Oct 27, 2019

Formalize the adhoc parsing in jsexpr #21

Formalize the adhoc parsing in jsexpr #21

Comments

aaronsheldon commented Aug 29, 2018 • edited Loading

twavv commented Apr 1, 2019

twavv commented Oct 27, 2019

aaronsheldon commented Aug 29, 2018 •

edited

Loading