In [1]:
/* -- 
Run this preamble if you are running Scala 2.13.
Thanks to Hansol Yoon for pointing this out.
--*/

interp.repositories() ++= Seq(coursierapi.MavenRepository.of("https://maven.imagej.net/content/repositories/public/"))
import $ivy.`org.scala-lang.modules::scala-parser-combinators:1.1.2`

import $ivy.$

# Parser Combinator Library in Scala 

We will briefly touch upon parsing for now just to understand what it is and link up to what we have learned under inductive definitions.

Parsing is the process of translating the textual, human-readable representation of a program into an internal structure that is machine readable.

<img src="parser-process.png" alt="The parser module translates a human readable program into an abstract syntax tree representation that is machine readable" width="90%">

As the figure shows, the parser automates the translation of a program that we may write in a text file, such as the one seen before,

~~~
var x = 5;
var y = 15;
var z = 25 + y - x;
x = y + z + exp(x - y);
while ( y <= 15) 
   begin
      y = y - x ;
      if (x <= 0)
      begin
         x = -x; 
      end
   end
return (y - x)
~~~

into an abstract syntax tree that follows the inductive definition we laid out in the previous lecture:

~~~
program: Program = Program(List(VarDecl(x,Const(5.0)), VarDecl(y,Const(15.0)), VarDecl(z,Plus(Const(25.0),Minus(Ident(y),Ident(x))))),List(AssignStmt(x,Plus(Ident(y),Plus(Ident(z),Exp(Minus(Ident(x),Ident(y)))))), WhileStmt(Leq(Ident(y),Const(15.0)),List(AssignStmt(y,Minus(Ident(y),Ident(x))), IfThenElseStmt(Leq(Ident(x),Const(0.0)),List(AssignStmt(x,Minus(Const(0.0),Ident(x)))),List())))),Minus(Ident(y),Ident(x)))
~~~


## A Quick Rundown on Parsing

Parsing is a large topic by itself and arguably one of the most successful applications of theoretical CS in practice. Parsers are all around us today, turning inputs that we can read into structures that machines can understand. Parsers vary enormously in complexity: from simple regex-based _pattern matchers_ that run inside packet filters on the internet to complex _natural language parsers_ that take in English sentences and parse them to extract structure and aid machine understanding. Programming language parsers sit somewhere in between: they are more complicated than simple regex pattern matchers but not as complicated as parsers for natural language, with all its inherent ambiguities and complexities.

### How to build parsers? 

Parsers are built using grammars. These are very similar to the grammars that we saw for inductive definitions, but they are lower-level grammars. Let us contrast the two types of grammars.

#### Grammar for Arithmetic Expression

Here is the grammar for parsing arithmetic expressions:

$$\begin{array}{rcll}
\mathbf{expr} & \rightarrow & \mathbf{term} ``-'' \mathbf{expr} \\ 
              & | &  \mathbf{term} ``+'' \mathbf{expr} \\
              & | &  \mathbf{term} \\[5pt]
 \mathbf{term} & \rightarrow & ``-'' \mathbf{term} \\
 & | & \mathbf{leaf} ``*'' \mathbf{term} \\
 & | & \mathbf{leaf} ``/'' \mathbf{term} \\
 & | & \mathbf{leaf} \\[5pt]
 \mathbf{leaf} & \rightarrow & ``('' \mathbf{expr} ``)'' \\
 & | & \mathbf{fn} ``('' \mathbf{expr} ``)'' \\
 & | & \mathbf{identifier} \\
 & | & \mathbf{constant} \\[5pt]
 \mathbf{fn} & \rightarrow & ``sin'' | ``cos'' | ``log'' | ``exp''  \\[5pt]
 \mathbf{identifier} & \rightarrow & [a-zA-Z0-9\_]+ & \text{one or more characters consisting of a-z, A-Z, 0-9 and \_ } \\
 \mathbf{constant} & \rightarrow & [1-9][0-9]*(``.''[0-9]*)? & \text{look up regular expressions to understand what we just wrote down here}.\\
\end{array}$$

This grammar accepts strings such as `x + y - z * cos(z) * log(y + exp(z - x))`.
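
The two regular expressions at the bottom of the grammar can be tried out with the standard `String.matches` method, which succeeds only when the whole string matches the pattern. The following is a minimal sketch: the object name `TokenPatterns` and the sample strings are illustrative and not part of the lecture code.

```scala
object TokenPatterns {
  // Patterns copied from the grammar; the "." is escaped so it matches a literal dot.
  val identifier = "[a-zA-Z0-9_]+"
  val constant   = "[1-9][0-9]*(\\.[0-9]*)?"

  // String.matches requires the entire string to match the pattern.
  def isIdentifier(s: String): Boolean = s.matches(identifier)
  def isConstant(s: String): Boolean   = s.matches(constant)
}

println(TokenPatterns.isIdentifier("x_1")) // true
println(TokenPatterns.isIdentifier("x-1")) // false: "-" is not in the character class
println(TokenPatterns.isIdentifier("42"))  // true: digit-only strings also match this pattern
println(TokenPatterns.isConstant("25"))    // true
println(TokenPatterns.isConstant("3.75"))  // true
println(TokenPatterns.isConstant("0.5"))   // false: the pattern requires a leading digit 1-9
```

Note that the identifier pattern as written also accepts strings made only of digits, and the constant pattern rejects `0.5`, so an implementation may reasonably choose more permissive or more careful patterns.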

Let us recall the grammar we wrote down for the _abstract syntax trees_, or _inductive definition_, of expressions:

$$\begin{array}{rcc}
\textbf{Expr} & \rightarrow & Const(\textbf{Integer}) \\
& |  & Ident(\textbf{Identifier}) \\
& | & Plus( \textbf{Expr}, \textbf{Expr}) \\
& | & Minus( \textbf{Expr}, \textbf{Expr}) \\
& | & Mult(\textbf{Expr}, \textbf{Expr}) \\
& | & Div(\textbf{Expr}, \textbf{Expr}) \\
& | & Log(\textbf{Expr}) \\
& | & Exp(\textbf{Expr}) \\
& | & Sine(\textbf{Expr}) \\
& | & Cosine(\textbf{Expr}) \\\\
\textbf{Integer} & \rightarrow & \cdots\ |\  -2\ |\ -1\ |\ 0\ |\ 1\ |\ 2\ |\ \cdots \\
\textbf{Identifier} & \rightarrow & [a-z\ A-Z][a-z\ A-Z\ 0-9\ \_]*
\end{array}$$

First you notice that this grammar just has __Expr__ whereas the previous grammar has __expr__, __term__, __leaf__ and so on. Why? 

The answer has to do with arithmetic precedence.

When humans write expressions such as ` x +  z * w `, machines have a problem since they can read it two ways:
` (x+z) * w ` or ` x + (z * w) `. These yield two different abstract syntax trees:
` Mult(Plus(x,z), w)` or `Plus(x, Mult(z, w))`. The two trees represent different mathematical expressions and in general evaluate to different values. Operator _precedence_ tells us which reading is intended.
We know that `*` has higher precedence than `+`. Therefore, the second interpretation, `Plus(x, Mult(z, w))`, is correct while the other is wrong. The layered grammar encodes exactly this: a __term__ groups `*` and `/` first, and an __expr__ then combines whole terms with `+` and `-`.
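
To see concretely that the two readings differ, here is a small sketch that builds both trees for `x + z * w` with `x = 2`, `z = 3`, `w = 4` and evaluates them. The names `PrecedenceDemo`, `E`, `Num`, `Add`, `Mul` and `eval` are made up for this illustration and are deliberately kept separate from the `Expr` classes used in the rest of the notebook.

```scala
object PrecedenceDemo {
  // A cut-down expression type, local to this sketch.
  sealed trait E
  case class Num(v: Double) extends E
  case class Add(l: E, r: E) extends E
  case class Mul(l: E, r: E) extends E

  // Evaluate a tree by structural recursion.
  def eval(e: E): Double = e match {
    case Num(v)    => v
    case Add(l, r) => eval(l) + eval(r)
    case Mul(l, r) => eval(l) * eval(r)
  }

  // Two readings of "x + z * w" with x = 2, z = 3, w = 4.
  val addFirst: E = Mul(Add(Num(2.0), Num(3.0)), Num(4.0)) // (x + z) * w
  val mulFirst: E = Add(Num(2.0), Mul(Num(3.0), Num(4.0))) // x + (z * w)
}

println(PrecedenceDemo.eval(PrecedenceDemo.addFirst)) // 20.0
println(PrecedenceDemo.eval(PrecedenceDemo.mulFirst)) // 14.0
```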

Scala has a _parser combinator_ library that makes it quite easy to implement parsers. We will use it to implement a parser for the language we encountered earlier. This gives us a convenient way to translate handwritten programs into abstract syntax trees whenever we work with this language.
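
To get a feel for what a combinator is, the sketch below builds a toy version in plain Scala. It is not how the library is implemented and shares no code with it; the names `MiniCombinators`, `P`, `lit`, `number` and `sum` are hypothetical. A parser is modelled as a function that consumes a prefix of the input and returns a value together with the remaining input, and `~`, `|` and `map` play roughly the roles that sequencing, ordered choice and the `^^` action play in the library.

```scala
object MiniCombinators {
  // A parser consumes a prefix of the input and returns (value, remaining input),
  // or None when it does not match.
  case class P[A](run: String => Option[(A, String)]) {
    // Sequence: run this parser, then `next` on what is left.
    def ~[B](next: => P[B]): P[(A, B)] = P { in =>
      run(in).flatMap { case (a, rest) =>
        next.run(rest).map { case (b, rest2) => ((a, b), rest2) }
      }
    }
    // Ordered choice: try this parser and fall back to `other` only on failure.
    def |(other: => P[A]): P[A] = P(in => run(in).orElse(other.run(in)))
    // Action: transform the parsed value.
    def map[B](f: A => B): P[B] = P(in => run(in).map { case (a, rest) => (f(a), rest) })
  }

  // Match a literal string after skipping leading whitespace.
  def lit(s: String): P[String] = P { in =>
    val t = in.dropWhile(_.isWhitespace)
    if (t.startsWith(s)) Some((s, t.drop(s.length))) else None
  }

  // Match one or more digits after skipping leading whitespace.
  val number: P[Int] = P { in =>
    val t = in.dropWhile(_.isWhitespace)
    val digits = t.takeWhile(_.isDigit)
    if (digits.isEmpty) None else Some((digits.toInt, t.drop(digits.length)))
  }

  // sum -> number "+" sum | number   (the action adds the numbers up)
  def sum: P[Int] =
    ((number ~ lit("+")) ~ sum).map { case ((n, _), tail) => n + tail } | number
}

println(MiniCombinators.sum.run("1 + 2 + 39")) // Some((42,)): the value 42 and an empty remainder
```

The by-name parameters (`=> P[B]`) are what allow a rule such as `sum` to refer to itself without looping while the parser is being constructed.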


In [3]:
import scala.util.parsing.combinator._

sealed trait Expr
case class Const(f: Float) extends Expr 
// 1. We cheat a bit and allow all floating point numbers,
//    which deviates from the grammar (it only has integer constants).
case class Ident(s: String) extends Expr
// 2. We allow any string to be an identifier for now instead of the regular expression shown in the grammar.
case class Plus(e1: Expr, e2: Expr ) extends Expr
case class Minus(e1: Expr, e2: Expr) extends Expr
case class Mult(e1: Expr, e2: Expr) extends Expr
case class Div(e1: Expr, e2: Expr) extends Expr
case class Negate(e: Expr) extends Expr // unary minus; not listed in the abstract grammar above
case class Log(e: Expr) extends Expr
case class Exp(e: Expr) extends Expr
case class Sine(e: Expr) extends Expr
case class Cosine(e: Expr) extends Expr

import scala.util.parsing.combinator._

defined trait Expr
defined class Const
defined class Ident
defined class Plus
defined class Minus
defined class Mult
defined class Div
defined class Negate
defined class Log
defined class Exp
defined class Sine
defined class Cosine

In [4]:
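class ExprParser extends RegexParsers {
    // Numeric literal: optional sign, digits with optional fraction and exponent.
    def floatingPointNumber: Parser[String] = {
        """-?(\d+(\.\d*)?|\d*\.\d+)([eE][+-]?\d+)?[fFdD]?""".r
    }

    def identifier: Parser[String] = {
        """[a-zA-Z0-9_]+""".r
    }

    // Names of the built-in functions.
    def mathOp: Parser[String] = {
        "sin" | "cos" | "exp" | "log"
    }

    // expr -> term "-" expr | term "+" expr | term
    // The recursion is on the right, so "+" and "-" group to the right:
    // "a - b + c" is parsed as Minus(a, Plus(b, c)).
    def expr: Parser[Expr] = {
        val opt1 = term ~ ("-" ~> expr) ^^ { // the block after ^^ is the action that builds the tree
            case s1 ~ s2 => Minus(s1, s2)
        }
        val opt2 = term ~ ("+" ~> expr) ^^ {
            case s1 ~ s2 => Plus(s1, s2)
        }
        val opt3 = term

        opt1 | opt2 | opt3
    }

    // term -> "-" term | leaf "*" term | leaf "/" term | leaf
    def term: Parser[Expr] = {
        val opt1 = leaf ~ ("*" ~> term) ^^ {
            case s1 ~ s2 => Mult(s1, s2)
        }
        val opt2 = leaf ~ ("/" ~> term) ^^ {
            case s1 ~ s2 => Div(s1, s2)
        }
        val opt3 = leaf
        val opt4 = "-" ~> term ^^ {
            s => Negate(s)
        }

        // Unary minus is tried first.
        opt4 | opt1 | opt2 | opt3
    }

    // leaf -> constant | fn "(" expr ")" | "(" expr ")" | identifier
    def leaf: Parser[Expr] = {
        val opt1 = floatingPointNumber ^^ {
            s => Const(s.toFloat)
        }
        val opt2 = identifier ^^ {
            s => Ident(s)
        }
        val opt3 = mathOp ~ ("(" ~> expr ) <~ ")" ^^ {
            case "sin" ~ e => Sine(e)
            case "cos" ~ e => Cosine(e)
            case "log" ~ e => Log(e)
            case "exp" ~ e => Exp(e)
        }
        val opt4 = "(" ~> (expr <~ ")")

        // Function calls are tried before plain identifiers so that the "sin"
        // in "sin(x)" is not consumed as an identifier.
        opt1 | opt3 | opt4 | opt2
    }
}


defined class ExprParser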
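
One consequence of the `expr` rule is worth noting. Because it recurses on the right of `+` and `-`, an input such as `a - b + c` is grouped as `Minus(a, Plus(b, c))`, that is `a - (b + c)`, whereas ordinary arithmetic groups left to right as `(a - b) + c`. A common remedy, which this notebook does not apply, is to parse a first term followed by a list of (operator, term) pairs (for example with the library's `rep` combinator) and then fold that list from the left. The sketch below shows only the folding step in plain Scala; `LeftAssoc`, `T`, `Leaf`, `Add`, `Sub` and `build` are hypothetical names.

```scala
object LeftAssoc {
  // A cut-down tree type, local to this sketch.
  sealed trait T
  case class Leaf(name: String) extends T
  case class Add(l: T, r: T) extends T
  case class Sub(l: T, r: T) extends T

  // Combine `first` with the (operator, operand) pairs that follow it, left to right.
  def build(first: T, rest: List[(String, T)]): T =
    rest.foldLeft(first) {
      case (acc, ("+", t)) => Add(acc, t)
      case (acc, ("-", t)) => Sub(acc, t)
      case (_, (op, _))    => throw new IllegalArgumentException(s"unknown operator: $op")
    }

  // "a - b + c" seen as a first term plus two (operator, term) pairs.
  val example: T = build(Leaf("a"), List("-" -> Leaf("b"), "+" -> Leaf("c")))
}

println(LeftAssoc.example) // Add(Sub(Leaf(a),Leaf(b)),Leaf(c))
```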

In [66]:
val testExpr1 = """ x + y - z * cos(z) * log(y + exp(z - x)) """

testExpr1: String = " x + y - z * cos(z) * log(y + exp(z - x)) "

In [67]:
val parsr = new ExprParser()

parsr: ExprParser = $sess.cmd64Wrapper$Helper$ExprParser@5675220d

In [68]:
val e1 = parsr.parse(parsr.expr, testExpr1)

e1: parsr.ParseResult[Expr] = [1.42] parsed: Plus(Ident(x),Minus(Ident(y),Mult(Ident(z),Mult(Cosine(Ident(z)),Log(Plus(Ident(y),Exp(Minus(Ident(z),Ident(x)))))))))

In [69]:
val testExpr2 = "- 2.5 * x - 3.75 * y"

testExpr2: String = "- 2.5 * x - 3.75 * y"

In [70]:
val e2 = parsr.parse(parsr.expr, testExpr2)

e2: parsr.ParseResult[Expr] = [1.21] parsed: Minus(Negate(Mult(Const(2.5),Ident(x))),Mult(Const(3.75),Ident(y)))

In [71]:
val testExpr3 = " 7.75 * cos(x) - log( y + sin(z)* sin(z) - 0.0255 * x)"

testExpr3: String = " 7.75 * cos(x) - log( y + sin(z)* sin(z) - 0.0255 * x)"

In [72]:
val e3 = parsr.parse(parsr.expr, testExpr3)

e3: parsr.ParseResult[Expr] = [1.55] parsed: Minus(Mult(Const(7.75),Cosine(Ident(x))),Log(Plus(Ident(y),Minus(Mult(Sine(Ident(z)),Sine(Ident(z))),Mult(Const(0.0255),Ident(x))))))

In [73]:
sealed trait CondExpr
case object ConstTrue extends CondExpr
case object ConstFalse extends CondExpr
case class Geq(e1: Expr, e2: Expr) extends CondExpr
case class Leq(e1: Expr, e2: Expr) extends CondExpr
case class Eq(e1: Expr, e2: Expr) extends CondExpr
case class And(c1: CondExpr, c2: CondExpr) extends CondExpr
case class Or(c1: CondExpr, c2: CondExpr) extends CondExpr
case class Not(c: CondExpr) extends CondExpr

defined trait CondExpr
defined object ConstTrue
defined object ConstFalse
defined class Geq
defined class Leq
defined class Eq
defined class And
defined class Or
defined class Not

In [78]:
class CondExprParser extends ExprParser {
    def constBool: Parser[CondExpr] = {
        ( "true" ^^ { s => ConstTrue } ) |
        ( "false" ^^ { s => ConstFalse } )
    }

    def relOp: Parser[String] = {
        ">=" | "<=" | "=="
    }

    // condExpr -> condClause "||" condExpr | condClause   ("||" has the lowest precedence)
    def condExpr: Parser[CondExpr] = {
        ( (condClause <~ "||") ~ condExpr ^^ {
            case t1 ~ t2 => Or(t1, t2)
        } ) | condClause
    }

    // condClause -> condLit "&&" condClause | condLit   ("&&" binds tighter than "||")
    def condClause: Parser[CondExpr] = {
        ( (condLit <~ "&&") ~ condClause ^^ {
            case t1 ~ t2 => And(t1, t2)
        } ) | condLit
    }

    // condLit -> "(" condExpr ")" | "!" condLit | expr relOp expr | "true" | "false"
    def condLit: Parser[CondExpr] = {
        val opt1 = ("!" ~> condLit) ^^ { Not(_) }
        val opt2 = ("(" ~> condExpr) <~ ")"
        val opt3 = expr ~ relOp ~ expr ^^ {
            case e1 ~ ">=" ~ e2 => Geq(e1, e2)
            case e1 ~ "<=" ~ e2 => Leq(e1, e2)
            case e1 ~ "==" ~ e2 => Eq(e1, e2)
        }
        val opt4 = constBool

        opt2 | opt1 | opt3 | opt4
    }
}

defined class CondExprParser

In [79]:
val conditionParser = new CondExprParser()

conditionParser: CondExprParser = $sess.cmd77Wrapper$Helper$CondExprParser@3ec3784

In [80]:
val str1 = "x - 2*y + log(z + x - y) >= -2.5 - 3.7 * x "

str1: String = "x - 2*y + log(z + x - y) >= -2.5 - 3.7 * x "

In [81]:
conditionParser.parse(conditionParser.condExpr, str1)

res80: conditionParser.ParseResult[CondExpr] = [1.43] parsed: Geq(Minus(Ident(x),Plus(Mult(Const(2.0),Ident(y)),Log(Plus(Ident(z),Minus(Ident(x),Ident(y)))))),Minus(Negate(Const(2.5)),Mult(Const(3.7),Ident(x))))

In [82]:
val str2 = "x >= y || x == y && x <= 2.5 * cos(y - log(z))"

str2: String = "x >= y || x == y && x <= 2.5 * cos(y - log(z))"

In [83]:
conditionParser.parse(conditionParser.condExpr, str2)

res82: conditionParser.ParseResult[CondExpr] = [1.47] parsed: Or(Geq(Ident(x),Ident(y)),And(Eq(Ident(x),Ident(y)),Leq(Ident(x),Mult(Const(2.5),Cosine(Minus(Ident(y),Log(Ident(z))))))))

In [84]:
val str3 = "! x <= y - 5 || y + z >= 20.3 && ! x <= y"

str3: String = "! x <= y - 5 || y + z >= 20.3 && ! x <= y"

In [85]:
conditionParser.parse(conditionParser.condExpr, str3)

res84: conditionParser.ParseResult[CondExpr] = [1.42] parsed: Or(Not(Leq(Ident(x),Minus(Ident(y),Const(5.0)))),And(Geq(Plus(Ident(y),Ident(z)),Const(20.3)),Not(Leq(Ident(x),Ident(y)))))

In [91]:
val str4 = "(true || ! x == y && x - y >= z - log(y) && l <= w)"

str4: String = "(true || ! x == y && x - y >= z - log(y) && l <= w)"

In [115]:
val str5 = "( y <= 15.0 || z >= 10.0)"

str5: String = "( y <= 15.0 || z >= 10.0)"

In [116]:
conditionParser.parse(conditionParser.condExpr, str5)

res115: conditionParser.ParseResult[CondExpr] = [1.26] parsed: Or(Leq(Ident(y),Const(15.0)),Geq(Ident(z),Const(10.0)))

In [92]:
conditionParser.parse(conditionParser.condExpr, str4)

res91: conditionParser.ParseResult[CondExpr] = [1.52] parsed: Or(ConstTrue,And(Not(Eq(Ident(x),Ident(y))),And(Geq(Minus(Ident(x),Ident(y)),Minus(Ident(z),Log(Ident(y)))),Leq(Ident(l),Ident(w)))))

In [90]:
sealed trait Declaration
sealed trait Statement
case class Program(decls: List[Declaration], stmts: List[Statement], returnAtEnd: Expr) // We stripped the ReturnStmt tag since it is redundant
case class VarDecl(identifier: String, rhsExpr: Expr) extends Declaration
case class AssignStmt(identifier: String, rhsExpr: Expr) extends Statement
case class WhileStmt(cond: CondExpr, stmts: List[Statement]) extends Statement
case class IfThenElseStmt(cond: CondExpr, stmtsThen: List[Statement], stmtsElse: List[Statement]) extends Statement
case class ReturnStmt(retExpr: Expr) extends Statement

defined trait Declaration
defined trait Statement
defined class Program
defined class VarDecl
defined class AssignStmt
defined class WhileStmt
defined class IfThenElseStmt
defined class ReturnStmt

In [105]:
class ProgramParser extends CondExprParser {
    
    // A program is a list of declarations followed by statements, the last of which must be a return.
    def program: Parser[Program] = rep(declaration) ~ rep(statement) ^^ {
        case decls ~ stmts =>
            // lastOption avoids an IndexOutOfBoundsException when there are no statements at all
            stmts.lastOption match {
                case Some(ReturnStmt(rExpr)) => Program(decls, stmts.init, rExpr)
                case _ => throw new IllegalArgumentException(s"Error: program must terminate in a return statement. ${stmts.lastOption}")
            }
    }
    
    def declaration: Parser[VarDecl] = ("var" ~> identifier) ~ (":=" ~> expr) ^^ {
        case id ~ e => VarDecl(id, e)
    }
    
    def statement: Parser[Statement] = assignStatement | whileStatement | ifThenElseStatement | returnStatement
    
    def assignStatement: Parser[Statement] = (identifier <~ ":=") ~ expr ^^ { case id ~ e => AssignStmt(id, e) }
    
    def whileStatement: Parser[Statement] = "while" ~> (condExpr ~ stmtBlock) ^^ {case c ~ blk => WhileStmt(c, blk)}
    
    def ifThenElseStatement: Parser[Statement] = ("if"~> condExpr) ~ ("then" ~> stmtBlock) ~ opt("else" ~> stmtBlock) ^^ { 
        case c ~ st ~ None => IfThenElseStmt(c, st, List())
        case c ~ st1 ~ Some(st2) => IfThenElseStmt(c, st1, st2)
    }
    
    def returnStatement: Parser[Statement] = "return" ~> expr ^^ {
        ReturnStmt(_)
    }
    
    def stmtBlock: Parser[List[Statement]] = ("begin" ~> rep(statement)) <~ "end" 
    
    def parseString(str: String): Program = {
        parse(program, str)  match {
            case Success(mt, _) => mt
            case Failure(msg, _) => throw new IllegalArgumentException(msg)
            case Error(msg, _) => throw new IllegalArgumentException(msg)
        }
    }
}

defined class ProgramParser
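
The `program` rule has to separate the trailing `return` from the statements before it, and a program with no statements at all is a corner case worth handling explicitly. As a rough sketch of that split on a simplified statement type, the code below uses `lastOption` and `init` and reports problems through `Either` instead of throwing. The names `SplitReturn`, `Stmt`, `Assign`, `Return` and `split` are hypothetical and stand in for the notebook's `Statement` classes.

```scala
object SplitReturn {
  // Simplified stand-ins for the statement classes.
  sealed trait Stmt
  case class Assign(name: String) extends Stmt
  case class Return(value: Int) extends Stmt

  // Split a statement list into (body, returned value), or explain why that is not possible.
  def split(stmts: List[Stmt]): Either[String, (List[Stmt], Int)] =
    stmts.lastOption match {
      case Some(Return(v)) => Right((stmts.init, v))
      case Some(other)     => Left(s"last statement is not a return: $other")
      case None            => Left("empty statement list")
    }
}

println(SplitReturn.split(List(SplitReturn.Assign("x"), SplitReturn.Return(1))))
// Right((List(Assign(x)),1))
println(SplitReturn.split(Nil))
// Left(empty statement list)
```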

In [122]:
val program1 = """
var x := 5
var y := 15
var z := 25 + y - x
x :=  y + z + exp(x - y)
while ( y <= 15.0 || z >= 10.0) 
   begin
      y := y - x 
      if (x <= 0)
      then begin
         x := -x
      end
   end
return (y - x)
"""

program1: String = """

var x := 5
var y := 15
var z := 25 + y - x
x :=  y + z + exp(x - y)
while ( y <= 15.0 || z >= 10.0) 
   begin
      y := y - x 
      if (x <= 0)
      then begin
         x := -x
...

In [120]:
val progParser = new ProgramParser()

progParser: ProgramParser = $sess.cmd104Wrapper$Helper$ProgramParser@57af64be

In [123]:
progParser.parseString(program1)

res122: Program = Program(List(VarDecl(x,Const(5.0)), VarDecl(y,Const(15.0)), VarDecl(z,Plus(Const(25.0),Minus(Ident(y),Ident(x))))),List(AssignStmt(x,Plus(Ident(y),Plus(Ident(z),Exp(Minus(Ident(x),Ident(y)))))), WhileStmt(Or(Leq(Ident(y),Const(15.0)),Geq(Ident(z),Const(10.0))),List(AssignStmt(y,Minus(Ident(y),Ident(x))), IfThenElseStmt(Leq(Ident(x),Const(0.0)),List(AssignStmt(x,Negate(Ident(x)))),List())))),Minus(Ident(y),Ident(x)))

In [124]:
progParser.parse(progParser.program, program1)

res123: progParser.ParseResult[Program] = [14.15] parsed: Program(List(VarDecl(x,Const(5.0)), VarDecl(y,Const(15.0)), VarDecl(z,Plus(Const(25.0),Minus(Ident(y),Ident(x))))),List(AssignStmt(x,Plus(Ident(y),Plus(Ident(z),Exp(Minus(Ident(x),Ident(y)))))), WhileStmt(Or(Leq(Ident(y),Const(15.0)),Geq(Ident(z),Const(10.0))),List(AssignStmt(y,Minus(Ident(y),Ident(x))), IfThenElseStmt(Leq(Ident(x),Const(0.0)),List(AssignStmt(x,Negate(Ident(x)))),List())))),Minus(Ident(y),Ident(x)))

In [127]:
val whileStmt = """while ( y <= 15.0 || z >= 10.0) 
   begin
      y := y - x 
      if (x <= 0) then 
      begin
         x := -x
      end
   end"""

whileStmt: String = """
while ( y <= 15.0 || z >= 10.0) 
   begin
      y := y - x 
      if (x <= 0) then 
      begin
         x := -x
      end
   end
"""

In [128]:
progParser.parse(progParser.whileStatement, whileStmt)

res127: progParser.ParseResult[Statement] = [8.7] parsed: WhileStmt(Or(Leq(Ident(y),Const(15.0)),Geq(Ident(z),Const(10.0))),List(AssignStmt(y,Minus(Ident(y),Ident(x))), IfThenElseStmt(Leq(Ident(x),Const(0.0)),List(AssignStmt(x,Negate(Ident(x)))),List())))