Parser #2

Closed
duangsuse opened this issue May 7, 2018 · 20 comments
Labels: enhancement (New feature or request), stupid (project stupid), WIP (Work In Progress), 我太菜了,被关了起来 ("I'm too much of a noob and got locked up"; description: "trash duangsuse")

@duangsuse

https://github.com/harc/ohm

https://www.lua.org/manual/5.3/manual.html#3.4.8

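And, as a temporary draft, the DNF:

// Complete Lite Desugared Syntax (DNF notation: a context-free grammar notation over a token stream, designed by duangsuse so that you can read it even without knowing its rules)

// One peculiarity of Lite is its indentation-based semantics (I did that for looks...). With recursive descent, though, parsing it is not a problem.
// Lite Desugared contains neither the special string sugar nor indentation semantics. Standard Lite source, once processed by the Lite Lexer and Flatter, can be handed to this JavaScript parser, which parses it and serializes the AST.
// Enforces duangsuse's preferred 2-space indentation style; the language itself resembles Ruby (Ruby, the language from Japan, is great)
// An interesting bit of syntax: ![str1 str2 str3].each { |e| puts e } if a == 1 & b === :c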

// math -> expr Maybe( '+' OR '-' OR '*' OR '/' OR '**' OR '%' OR '<' OR '<=' OR '>' OR '>=' OR '&' OR '|' OR '==' OR '===' OR '!=' OR '<<' OR 'and' OR 'or' ) expr
// binary -> math | cast | dot | in | square | arrow | range
// range -> expr '..' expr
// paren_expression -> '(' expression ')'
// expression -> binary | list | table | value | incDec | not | negative | call | identifier | index | bracket_block | do_block | paren_expression
// statement -> def | for | scope | while | when | if | simple_statement
// simple_statement -> break | next | import | require | return | assignment | indexLet | square | arrow | dot | incDec | call Maybe( IF expression )
// block -> Ary( statement NEWLINE ) END
// for -> FOR identifier IN expression NEWLINE block
// while -> WHILE expression NEWLINE block
// scope -> SCOPE Maybe( label ) NEWLINE block
// when -> WHEN expression NEWLINE Ary( expression Maybe( expression ) NEWLINE block ) END | when_is
// when_is -> WHEN expression NEWLINE Ary( IS Ary( expression OR ) NEWLINE block ) END
// indexLet -> expression '[' expression ']' '=' expression
// index -> expression '[' expression ']'
// if -> IF expression NEWLINE block Maybe( Ary( ELIF expression NEWLINE block ) ) Maybe( ELSE NEWLINE block )
// identifier -> Maybe( AT ) label
// def -> DEF identifier Maybe( nameList ) NEWLINE block
// call -> identifier Maybe( CALL OR exprList )
// assignment -> identifier '=' expression
// not -> '!' expression
// negative -> '-' expression
// incDec -> identifier Maybe( '++' OR '--' )
// return -> RETURN expression
// require -> REQUIRE tokensAsString()
// next -> NEXT
// break -> BREAK
// import -> IMPORT tokensAsString()
// value -> TRUE | FALSE | NIL | Number | string
// string -> '"' data '"' | stringB | stringC
// stringB -> "'" data "'"
// stringC -> ':' data
// list -> Maybe( '!' ) '[' exprList ']'
// table -> '{' kvList '}'
// kvList -> Ary( label ':' expression Maybe( ',' OR NEWLINE ) )
// arrow -> expression '->' label expression
// square -> expression '::' label
// in -> expression IN expression
// dot -> expression '.' label Maybe( '()' OR exprList )
// cast -> expression AS label
// exprList -> Ary( expr Maybe( ' ' OR ',' ) )
// nameList -> Maybe( '(' ) Ary( name Maybe( ',' OR ' ' ) ) Maybe( ')' ) | nameListB
// nameListB -> '|' Ary( name Maybe( ',' OR ' ' ) ) '|'
// bracket_block -> '{' Maybe( nameListB ) Ary( ':' simple_statement ) '}'
// do_block -> DO Maybe ( nameListB ) block
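
The comments at the top of this grammar say that standard Lite uses indentation and that a "Flatter" pass removes it before parsing, leaving blocks that close with END. The thread does not show that pass, so the following is only a minimal sketch of what it could look like, assuming 2-space indentation and one `end` line per closed level. The names `IndentFlattener` and `flatten` are hypothetical, and the sketch ignores `else`/`elif`, which in the grammar continue a block instead of closing it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: turns 2-space indentation into explicit "end" lines. */
final class IndentFlattener {
    private IndentFlattener() {}

    /** Returns the lines without indentation, with "end" added wherever a block closes. */
    static List<String> flatten(List<String> lines) {
        List<String> out = new ArrayList<>();
        int depth = 0; // current nesting level, counted in units of 2 spaces
        for (String line : lines) {
            if (line.trim().isEmpty()) continue; // blank lines carry no structure
            int spaces = 0;
            while (spaces < line.length() && line.charAt(spaces) == ' ') spaces++;
            if (spaces % 2 != 0) {
                throw new IllegalArgumentException("indentation must be a multiple of 2: " + line);
            }
            int level = spaces / 2;
            // Every level we drop closes one block.
            for (; depth > level; depth--) out.add("end");
            depth = level;
            out.add(line.substring(spaces));
        }
        // Close the blocks that are still open at the end of the input.
        for (; depth > 0; depth--) out.add("end");
        return out;
    }
}
```

Given the lines `while x`, `  a = 1`, `  if y`, `    b = 2`, `c = 3`, the sketch emits two `end` lines before `c = 3`, one for the `if` and one for the `while`.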
@duangsuse duangsuse added the WIP Work In Progress label May 7, 2018
@duangsuse duangsuse self-assigned this May 7, 2018
@duangsuse

duangsuse commented May 7, 2018

Maybe( '+' OR '-' OR '*' OR '/' OR '**' OR '%' OR '<' OR '<=' OR '>' OR '>=' OR '&' OR '|' OR '==' OR '===' OR '!=' OR '<<' OR 'and' OR 'or' )

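In addition there are the binary operators cast | dot | in | square | arrow | range.

Although duangsuse is such a noob that he is not even fluent in recursion, and shows no sign of true understanding, he can still just about finish the grammar by borrowing someone else's left-recursion pattern.

Lite operator precedence:

+ - * / ** % < > <= >= != == !== === &(and) |(or) <<
as in . :: (no ->) (these are not arithmetic operators, but they use the same precedence system and can sit alongside the arithmetic operators)
What follows as is not an expression, and neither is what follows . or ::. '.' can be a binary operator because it only has dot semantics when used as a standalone statement; everywhere else it has the semantics of ::.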
++ -- ! -

  • |(or)
  • &(and)
  • < > <= >= != == !== ===
  • <<
  • ..
  • + -
  • * / %
  • ! ++ -- -
  • **
  • . :: as in
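
To make the list above concrete, here is a minimal precedence-climbing sketch. It assumes the list runs from loosest to tightest binding (which matches the order of the Ohm rules later in this thread) and that every binary operator is left-associative. It covers only a subset of the binary operators, takes space-separated tokens, and skips the unary row. `PrecedenceDemo` and `parenthesize` are illustrative names, not part of Lite.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: precedence climbing over a subset of Lite's binary operators. */
final class PrecedenceDemo {
    // Higher number = binds tighter; the levels follow the order of the list above.
    private static final Map<String, Integer> LEVEL = new HashMap<>();
    static {
        LEVEL.put("|", 1);
        LEVEL.put("&", 2);
        for (String op : new String[] {"<", ">", "<=", ">=", "!=", "==", "!==", "==="}) LEVEL.put(op, 3);
        LEVEL.put("<<", 4);
        LEVEL.put("..", 5);
        LEVEL.put("+", 6);
        LEVEL.put("-", 6);
        for (String op : new String[] {"*", "/", "%"}) LEVEL.put(op, 7);
        LEVEL.put("**", 9); // level 8 is the unary row (! ++ -- -), not handled here
    }

    private final String[] tokens;
    private int pos = 0;

    private PrecedenceDemo(String[] tokens) { this.tokens = tokens; }

    /** Parses space-separated tokens and returns a fully parenthesized rendering. */
    static String parenthesize(String source) {
        PrecedenceDemo p = new PrecedenceDemo(source.trim().split("\\s+"));
        String result = p.parse(1);
        if (p.pos != p.tokens.length) {
            throw new IllegalArgumentException("unexpected token: " + p.tokens[p.pos]);
        }
        return result;
    }

    // Parses an expression whose operators all have level >= minLevel.
    private String parse(int minLevel) {
        if (pos >= tokens.length) throw new IllegalArgumentException("operand expected");
        String left = tokens[pos++]; // operands are single tokens (numbers or names)
        if (LEVEL.containsKey(left)) throw new IllegalArgumentException("operand expected, got: " + left);
        while (pos < tokens.length) {
            Integer level = LEVEL.get(tokens[pos]);
            if (level == null || level < minLevel) break;
            String op = tokens[pos++];
            // Left-associative: the right operand may only contain tighter operators.
            String right = parse(level + 1);
            left = "(" + left + " " + op + " " + right + ")";
        }
        return left;
    }
}
```

For example, `PrecedenceDemo.parenthesize("1 - 2 - 3")` groups to the left as `((1 - 2) - 3)`, and in `a | b & c` the `&` binds first.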

@duangsuse

duangsuse commented May 8, 2018

d56dd9d: wrote a parser that is still quite buggy for now

@duangsuse

Too large; rejected.

Although I could indeed separate the parser from the execution engine completely

@duangsuse

duangsuse commented May 8, 2018

http://www2.cs.tum.edu/projects/cup/

The world's best parser generator ("parserc")

http://jflex.de

The world's best scanner generator

@duangsuse duangsuse added the stupid project stupid label May 8, 2018
@duangsuse

// Lite parser by duangsuse, no rights reserved (lexical rules see https://ohmlang.github.io/editor)
Lite {
  // The JavaScript lexical rules
  // §A.1 Lexical Grammar -- https://es5.github.io/#A.1

  Program = CompStmt

  sourceCharacter = any

  // Override Ohm's built-in definition of space.
  space := whitespace | comment

  whitespace = "\t"
             | "\x0B"    -- verticalTab
             | "\x0C"    -- formFeed
             | " "
             | "\u00A0"  -- noBreakSpace
             | "\uFEFF"  -- byteOrderMark
             | unicodeSpaceSeparator

  lineTerminator = "\n" | "\r" | "\u2028" | "\u2029"
  lineTerminatorSequence = "\n" | "\r" ~"\n" | "\u2028" | "\u2029" | "\r\n"

  comment = multiLineComment | singleLineComment

  multiLineComment = "<####>" (~">####<" sourceCharacter)* ">####<"
  singleLineComment = "#" (~lineTerminator sourceCharacter)*

  identifier (an identifier) = ~reservedWord identifierName
  identifierName = identifierStart identifierPart*

  identifierStart = letter | "$" | "_"
                  | "\\" unicodeEscapeSequence -- escaped
  identifierPart = identifierStart | unicodeCombiningMark
                 | unicodeDigit | unicodeConnectorPunctuation
                 | "\u200C" | "\u200D"
  letter += unicodeCategoryNl
  unicodeCategoryNl
    = "\u2160".."\u2182" | "\u3007" | "\u3021".."\u3029"
  unicodeDigit (a digit)
    = "\u0030".."\u0039" | "\u0660".."\u0669" | "\u06F0".."\u06F9" | "\u0966".."\u096F" | "\u09E6".."\u09EF" | "\u0A66".."\u0A6F" | "\u0AE6".."\u0AEF" | "\u0B66".."\u0B6F" | "\u0BE7".."\u0BEF" | "\u0C66".."\u0C6F" | "\u0CE6".."\u0CEF" | "\u0D66".."\u0D6F" | "\u0E50".."\u0E59" | "\u0ED0".."\u0ED9" | "\u0F20".."\u0F29" | "\uFF10".."\uFF19"

  unicodeCombiningMark (a Unicode combining mark)
    = "\u0300".."\u0345" | "\u0360".."\u0361" | "\u0483".."\u0486" | "\u0591".."\u05A1" | "\u05A3".."\u05B9" | "\u05BB".."\u05BD" | "\u05BF".."\u05BF" | "\u05C1".."\u05C2" | "\u05C4".."\u05C4" | "\u064B".."\u0652" | "\u0670".."\u0670" | "\u06D6".."\u06DC" | "\u06DF".."\u06E4" | "\u06E7".."\u06E8" | "\u06EA".."\u06ED" | "\u0901".."\u0902" | "\u093C".."\u093C" | "\u0941".."\u0948" | "\u094D".."\u094D" | "\u0951".."\u0954" | "\u0962".."\u0963" | "\u0981".."\u0981" | "\u09BC".."\u09BC" | "\u09C1".."\u09C4" | "\u09CD".."\u09CD" | "\u09E2".."\u09E3" | "\u0A02".."\u0A02" | "\u0A3C".."\u0A3C" | "\u0A41".."\u0A42" | "\u0A47".."\u0A48" | "\u0A4B".."\u0A4D" | "\u0A70".."\u0A71" | "\u0A81".."\u0A82" | "\u0ABC".."\u0ABC" | "\u0AC1".."\u0AC5" | "\u0AC7".."\u0AC8" | "\u0ACD".."\u0ACD" | "\u0B01".."\u0B01" | "\u0B3C".."\u0B3C" | "\u0B3F".."\u0B3F" | "\u0B41".."\u0B43" | "\u0B4D".."\u0B4D" | "\u0B56".."\u0B56" | "\u0B82".."\u0B82" | "\u0BC0".."\u0BC0" | "\u0BCD".."\u0BCD" | "\u0C3E".."\u0C40" | "\u0C46".."\u0C48" | "\u0C4A".."\u0C4D" | "\u0C55".."\u0C56" | "\u0CBF".."\u0CBF" | "\u0CC6".."\u0CC6" | "\u0CCC".."\u0CCD" | "\u0D41".."\u0D43" | "\u0D4D".."\u0D4D" | "\u0E31".."\u0E31" | "\u0E34".."\u0E3A" | "\u0E47".."\u0E4E" | "\u0EB1".."\u0EB1" | "\u0EB4".."\u0EB9" | "\u0EBB".."\u0EBC" | "\u0EC8".."\u0ECD" | "\u0F18".."\u0F19" | "\u0F35".."\u0F35" | "\u0F37".."\u0F37" | "\u0F39".."\u0F39" | "\u0F71".."\u0F7E" | "\u0F80".."\u0F84" | "\u0F86".."\u0F87" | "\u0F90".."\u0F95" | "\u0F97".."\u0F97" | "\u0F99".."\u0FAD" | "\u0FB1".."\u0FB7" | "\u0FB9".."\u0FB9" | "\u20D0".."\u20DC" | "\u20E1".."\u20E1" | "\u302A".."\u302F" | "\u3099".."\u309A" | "\uFB1E".."\uFB1E" | "\uFE20".."\uFE23"

  unicodeConnectorPunctuation = "\u005F" | "\u203F".."\u2040" | "\u30FB" | "\uFE33".."\uFE34" | "\uFE4D".."\uFE4F" | "\uFF3F" | "\uFF65"
  unicodeSpaceSeparator = "\u2000".."\u200B" | "\u3000"

  reservedWord = keyword | nullLiteral | booleanLiteral

  // Note: keywords that are the complete prefix of another keyword should
  // be prioritized (e.g. 'in' should come before 'instanceof')
  keyword = break    | do        | scope      | in
          | when     | else      | elif       | if
          | as       | next      | return     | endKeyword
          | or       | for       | and        | while
          | require  | def       | import     | to

  /*
    Note: Punctuator and DivPunctuator (see https://es5.github.io/x7.html#x7.7) are
    not currently used by this grammar.
  */

  literal = nullLiteral | booleanLiteral | numericLiteral
          | stringLiteral
  nullLiteral = "nil" ~identifierPart
  booleanLiteral = ("true" | "false") ~identifierPart

  // For semantics on how decimal literals are constructed, see section 7.8.3

  // Note that the ordering of hexIntegerLiteral and decimalLiteral is reversed w.r.t. the spec
  // This is intentional: the order decimalLiteral | hexIntegerLiteral will parse
  // "0x..." as a decimal literal "0" followed by "x..."
  numericLiteral = octalIntegerLiteral | hexIntegerLiteral | decimalLiteral

  decimalLiteral = decimalIntegerLiteral "." decimalDigit* exponentPart -- bothParts
                 |                       "." decimalDigit+ exponentPart -- decimalsOnly
                 | decimalIntegerLiteral                   exponentPart -- integerOnly

  decimalIntegerLiteral = nonZeroDigit decimalDigit*  -- nonZero
                        | "0"                         -- zero
  decimalDigit = "0".."9"
  nonZeroDigit = "1".."9"

  exponentPart = exponentIndicator signedInteger -- present
               |                                 -- absent
  exponentIndicator = "e" | "E"
  signedInteger = "+" decimalDigit* -- positive
                | "-" decimalDigit* -- negative
                |     decimalDigit+ -- noSign

  hexIntegerLiteral = "0x" hexDigit+
                    | "0X" hexDigit+

  // hexDigit defined in Ohm's built-in rules (otherwise: hexDigit = "0".."9" | "a".."f" | "A".."F")

  octalIntegerLiteral = "0" octalDigit+

  octalDigit = "0".."7"

  // For semantics on how string literals are constructed, see section 7.8.4
  stringLiteral = "\"" doubleStringCharacter* "\""
                | "'" singleStringCharacter* "'"
  doubleStringCharacter = ~("\"" | "\\" | lineTerminator) sourceCharacter -- nonEscaped
                        | "\\" escapeSequence                             -- escaped
                        | lineContinuation                                -- lineContinuation
  singleStringCharacter = ~("'" | "\\" | lineTerminator) sourceCharacter -- nonEscaped
                        | "\\" escapeSequence                            -- escaped
                        | lineContinuation                               -- lineContinuation
  lineContinuation = "\\" lineTerminatorSequence
  escapeSequence = unicodeEscapeSequence
                 | hexEscapeSequence
                 | octalEscapeSequence
                 | characterEscapeSequence  // Must come last.
  characterEscapeSequence = singleEscapeCharacter
                          | nonEscapeCharacter
  singleEscapeCharacter = "'" | "\"" | "\\" | "b" | "f" | "n" | "r" | "t" | "v"
  nonEscapeCharacter = ~(escapeCharacter | lineTerminator) sourceCharacter
  escapeCharacter = singleEscapeCharacter | decimalDigit | "x" | "u"
  octalEscapeSequence = zeroToThree octalDigit octalDigit    -- whole
                      | fourToSeven octalDigit               -- eightTimesfourToSeven
                      | zeroToThree octalDigit ~decimalDigit -- eightTimesZeroToThree
                      | octalDigit ~decimalDigit             -- octal
  hexEscapeSequence = "x" hexDigit hexDigit
  unicodeEscapeSequence = "u" hexDigit hexDigit hexDigit hexDigit

  zeroToThree = "0".."3"
  fourToSeven = "4".."7"

  // === Implementation-level rules (not part of the spec) ===

  // A semicolon is "automatically inserted" if a newline or the end of the input stream is
  // reached, or the offending token is "}".
  // See https://es5.github.io/#x7.9 for more information.
  // NOTE: Applications of this rule *must* appear in a lexical context -- either in the body of a
  // lexical rule, or inside `#()`.
  sc = ";" | end | lineTerminator | comment

  // Convenience rules for parsing keyword tokens.
  break = "break" ~identifierPart
  do = "do" ~identifierPart
  scope = "scope" ~identifierPart
  in = "in" ~identifierPart
  when = "when" ~identifierPart
  else = "else" ~identifierPart
  elif = "elif" ~identifierPart
  if = "if" ~identifierPart
  as = "as" ~identifierPart
  next = "next" ~identifierPart
  return = "return" ~identifierPart
  endKeyword = "end" ~identifierPart
  or = "or" ~identifierPart
  for = "for" ~identifierPart
  and = "and" ~identifierPart
  while = "while" ~identifierPart
  require = "require" ~identifierPart
  def = "def" ~identifierPart
  import = "import" ~identifierPart
  to = "to" ~identifierPart

  // end of javascript lexical rules

  // start of expressions

  // lite operator precedence
  // |
  // &
  // < > <= >= != == !== ===
  // <<
  // to
  // + -
  // * / %
  // ** . :: as in
  // - ! ++ --
  Exp
    = OrExp

  OrExp
    = OrExp "|" AndExp -- or
    | OrExp or AndExp -- orKeyword
    | AndExp

  AndExp
    = AndExp "&" RelationExp -- and
    | AndExp and RelationExp -- andKeyword
    | RelationExp

  RelationExp
    = RelationExp "<" ShiftExp   -- lessThan
    | RelationExp ">" ShiftExp   -- greaterThan
    | RelationExp "<=" ShiftExp  -- lessEqual
    | RelationExp ">=" ShiftExp  -- greaterEqual
    | RelationExp "!=" ShiftExp  -- notEqual
    | RelationExp "==" ShiftExp  -- equal
    | RelationExp "!==" ShiftExp -- notFullEqual
    | RelationExp "===" ShiftExp -- fullEqual
    | ShiftExp

  ShiftExp
    = ShiftExp "<<" RangeExp  -- shift
    | RangeExp

  RangeExp
    = RangeExp to AddExp  -- range
    | AddExp

  AddExp
    = AddExp "+" MulExp  -- plus
    | AddExp "-" MulExp  -- minus
    | MulExp

  MulExp
    = MulExp "*" ExpExp  -- times
    | MulExp "/" ExpExp  -- divide
    | MulExp "%" ExpExp  -- remainder
    | ExpExp

  ExpExp
    = ExpExp "**" ExpExp      -- power
    | ExpExp "::" identifier  -- square
    | ExpExp as identifier  -- as
    | ExpExp in ExpExp      -- in
    | PriExp

  PriExp
    = "(" Exp ")"          -- paren
    | "-" PriExp           -- neg
    | "!" PriExp           -- not
    | identifier "++"      -- inc
    | identifier "--"      -- dec
    | literal              -- literal
    | Call                 -- callExp
    | LiteExpr             -- liteExp

  LiteExpr
    = List | Table | BracketBlock | DoBlock

  Divider
    = (", " | " " | ",")

  List
    = "[" ExpList "]"     -- simpleList
    | ":[" ExpList "]"    -- wordList

  ExpList
  = (Divider? Exp)*

  Table
    = "{" KvList "}"

  KvList
    = (identifier ":" Exp ("," | "\n")?)*

  Call
    = Call "(" ExpList ")"  -- call
    | Call "." identifier   -- callIndex
    | Call "[" Exp "]"      -- justIndex
    | Call ExpList          -- callEasy
    | identifier ~"="       -- justIdentifier

  BracketBlock
    = "{" NameListB? (":" SimpleStatement)* "}"

  NameList
    = "(" (Divider? identifier)* ")"

  NameListB
    = "|" (Divider? identifier)* "|"

  DoBlock
    = do NameListB? Block

  // end Exp part

  SimpleStatement
    = Exp     -- expressionStatement
    | Break   -- break
    | Next    -- continue
    | Import  -- import
    | Require -- require
    | Return  -- return
    | Assign  -- assignment
    | IndexEq -- indexLet
    | Arrow   -- arrowLet

  Break
    = break

  Next
    = next

  Import
    = import (~lineTerminator sourceCharacter)*

  Require
    = require (~lineTerminator sourceCharacter)*

  Return
    = return Exp?

  Assign
    = identifier "=" Exp      -- let
    | "@" identifier "=" Exp  -- letLocal

  IndexEq
    = Exp "[" Exp "]" "=" Exp

  Arrow
    = Exp "->" identifier Exp

  Statement
    = SimpleStatement  -- simpleStatement
    | Def              -- defineMethod
    | For              -- forLoop
    | While            -- whileLoop
    | Scope            -- scope
    | When             -- when
    | If               -- controlFlow
    | "\n"             -- nop

  Def
    = def identifier sc Block  -- defEasy
    | def identifier sc Exp sc  -- defExpr
    | def identifier NameList sc Block  -- def

  For
    = for identifier in Exp sc Block      -- forUsual
    | for "@" identifier in Exp sc Block  -- forLocal

  While
    = while Exp sc Block

  Scope
    = scope identifier? sc Block

  // a switch statement added in language specification 1.1
  When
    = when Exp sc (Exp sc Block)* endKeyword             -- when
    | when Exp sc (identifier Exp sc Block)* endKeyword  -- whenEasy
    | when Exp sc (in (Exp or)* sc Block)* endKeyword   -- whenIs

  If
    = if Exp sc Block                -- simpleEnd
    | if Exp sc CompStmt else Block  -- ifElse
    | if Exp sc CompStmt (elif Exp sc CompStmt)* (else CompStmt)? endKeyword -- ifElif

  Block
    = CompStmt endKeyword

  CompStmt
    = (Statement sc?)*
}

The new grammar

@duangsuse

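Because duangsuse loves code with an enormous line count, even when the quality is poor
(line count takes priority over code quality),
and because GeekApk is still waiting to be done, I have decided to generate the parser quickly with cup.

I will modify the original project directly.

A rewrite can wait until later.

https://github.com/duangsuse/Lite/wiki/Lite-1.0-%E5%88%B0-Lite-1.1-%E7%9A%84%E8%AF%AD%E6%B3%95%E5%8F%98%E5%8C%96

The new semantic features must be implemented.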

@duangsuse

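  • Internal method dispatch supports Lite fallback blocks

This could have been more concise, but duangsuse likes projects with many lines (I will definitely change that)... This time I will not delete the duplicated code that could be dropped, and I will even add more redundant code.

  • Fallback method lookup may walk up the class hierarchy as far as java.lang.Object

  • Singleton fallback methods keyed by object hash

  • Range-based for, and range support

  • Advanced block parameter filling (*varargs/name=default)

  • Implement Java interfaces for the interpreter, following BeanShell

  • Load some Java packages by default, following BeanShell

  • Remove trace

  • and/or

  • Fix as

  • Support paren

  • Give import and require clearly distinct roles

  • Automatic numeric type promotion

  • JsonAst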

https://github.com/duangsuse/Lite/wiki/Lite-1.0-%E5%88%B0-Lite-1.1-%E7%9A%84%E8%AF%AD%E6%B3%95%E5%8F%98%E5%8C%96#def-%E8%AF%AD%E6%B3%95%E5%8F%98%E6%9B%B4

@duangsuse

Never mind, I will rewrite it after all...

This time the text-processing part will use JFlex and cup

https://github.com/jflex-de/jflex/tree/master/jflex/examples/java

@duangsuse

duangsuse commented May 8, 2018

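Also, how could we go without inline expressions in strings (冰封 says the official name is string interpolation)?

I will still support

  • :"a = $a"
  • :"a + 1 = ${a + 1}"

this kind of syntactic sugar. Such a special string is created at parse time, and its inline expressions are evaluated when its value is taken.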
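
A minimal sketch of that split between parse time and value time, under several assumptions: `$name` takes letters, digits and underscores, `${...}` does not nest, and the expression evaluator is passed in as a function standing in for the Lite engine. `InterpolatedString` and `value` are hypothetical names, not Lite's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch of a string with inline expressions, split once and evaluated on demand. */
final class InterpolatedString {
    /** A literal chunk, or the source text of an inline expression. */
    private static final class Part {
        final boolean isExpression;
        final String text;
        Part(boolean isExpression, String text) { this.isExpression = isExpression; this.text = text; }
    }

    private final List<Part> parts = new ArrayList<>();

    /** "Parse time": split the template into literal and expression parts. */
    InterpolatedString(String template) {
        StringBuilder literal = new StringBuilder();
        int i = 0;
        while (i < template.length()) {
            char c = template.charAt(i);
            if (c != '$' || i + 1 >= template.length()) { literal.append(c); i++; continue; }
            int start, end, next;
            if (template.charAt(i + 1) == '{') {            // ${expr}
                start = i + 2;
                end = template.indexOf('}', start);
                if (end < 0) throw new IllegalArgumentException("unterminated ${ in: " + template);
                next = end + 1;
            } else {                                         // $name
                start = i + 1;
                end = start;
                while (end < template.length() && isNameChar(template.charAt(end))) end++;
                next = end;
            }
            if (end == start) { literal.append(c); i++; continue; } // a lone '$' stays literal
            if (literal.length() > 0) { parts.add(new Part(false, literal.toString())); literal.setLength(0); }
            parts.add(new Part(true, template.substring(start, end).trim()));
            i = next;
        }
        if (literal.length() > 0) parts.add(new Part(false, literal.toString()));
    }

    private static boolean isNameChar(char ch) { return Character.isLetterOrDigit(ch) || ch == '_'; }

    /** "Value time": evaluate every inline expression with the supplied evaluator. */
    String value(Function<String, Object> evaluator) {
        StringBuilder out = new StringBuilder();
        for (Part p : parts) out.append(p.isExpression ? String.valueOf(evaluator.apply(p.text)) : p.text);
        return out.toString();
    }
}
```

Because the expressions are only stored as source text at construction, calling `value` twice with a changed environment yields two different strings from the same object.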

@duangsuse duangsuse added android Android issue and removed android Android issue labels May 9, 2018
@duangsuse

https://github.com/duangsuse/Lite/blob/master/pretty_new/parser.cup

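As the title says: cup's syntax is so verbose it is jaw-dropping, so I have decided to go without this "built-in parser" for now.
From now on everything handed to the engine will be AstJson; a reflection-based parser is of course also allowed.
The parser will be rewritten with PEG.js and wrapped in Java so that it can be used on Android and Java platforms.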

@duangsuse duangsuse added the 我太菜了,被关了起来 ("I'm too much of a noob and got locked up": trash duangsuse) label May 9, 2018
@duangsuse

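Trash duangsuse. What is there to say about the parser problem? If you are tired, don't make things so complicated. Look, now they have locked me up (muted me) again.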

😿

@duangsuse

duangsuse commented May 9, 2018

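At worst, let them choose whether to add roughly 100k in order to use the parser to execute source text.

("They" above meant the experts; "they" here means those Androlua beginners.)

(Sadly, last year I still could not outclass the "they" of this text, unlike ice1000 and 老李, who are that good.)
(Sadly, I have no idea how 冰封 managed to write so much code and so many blog posts with school taking up that much time. I really do not understand it, unless he started in primary school.)
(老李 I can still understand.)

Addendum: this time I was muted for breaking the group rules by posting something unrelated to PL. ice1k would certainly complain if he saw this, so I am explaining it here.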

@duangsuse

duangsuse commented May 9, 2018

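The execution engine is in even better shape: most applications do not need to execute source text, so the parser is just a nicer option (although some things cannot be avoided later when building an IDEA plugin; at the very least the current JFlex lexical rules will have to change; VSCode is fine).

And if I really had to go to such lengths to make JFlex understand some of Lite's special lexical rules (or to modify my 800 lines of tedious code) and then have Cup match on the result, it would wear me out.

Right now I really do not want to spend more time on the interpreter. GeekApk comes next, and the work here must be finished within 5 days.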

@duangsuse

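(From the "misusing issue management" series)

Still, compared with my former self I have improved a little (judging by project size... although I feel my skill has not really changed)
(compared with the earlier MonkeyVM),
although real learning means working through books and writing new code.
Because of Lite and GeekApk lately (both keep changing), I cannot learn anything new and can only live off what I already know (the bit of reading I did before still gives me something to live off).

Since GeekApk absolutely cannot be put off any longer, learning will have to wait, though I do have all the materials.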

@duangsuse duangsuse added the enhancement New feature or request label May 9, 2018
@duangsuse

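Compared with the neighbouring CovScript and Lice, Lite/Lime has some distinctive syntax, but in the end its code just is not good.
After all, Lice was written by an academically trained programmer, whereas Lite's author @piggyrole is just a 🌶️ 🐔 (slang for "trash").

Comparison: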

class Lexer(sourceCode: String) {

	private val sourceCode = sourceCode.toCharArray()
	private var line = 1
	private var col: Int
	private var charIndex: Int
	private var tokenBuffer: MutableList<Token> = ArrayList(50)
	private var currentTokenIndex: Int = 0

	init {
		this.col = 1
		this.charIndex = 0

		doSplitTokens()
	}

	fun currentToken() = // assert(currentTokenIndex < this.tokenBuffer.size)
			this.tokenBuffer[this.currentTokenIndex]

	fun peekOneToken() = // assert(currentTokenIndex + 1 < this.tokenBuffer.size)
			this.tokenBuffer[this.currentTokenIndex + 1]

	fun nextToken() = this.currentTokenIndex++

	private fun doSplitTokens() {
		while (currentChar() != '\u0000') {
			when {
				currentChar() == '-' -> disambiguateIdentifierOrNegative()
				currentChar() in decDigits -> lexNumber()
				currentChar() in firstIdChars -> lexIdentifier()
				currentChar() in lispSymbols -> lexSingleCharToken()
				currentChar() == '"' -> lexString()
				currentChar() == ';' -> skipComment()
				currentChar() in blanks -> nextChar()
				else -> throw ParseException("Unknown character ${currentChar()}", MetaData(this.line, this.line, this.col, this.col + 1))
			}
		}
		tokenBuffer.add(Token(Token.TokenType.EOI, "", this.line, this.line, this.col, this.col + 1))
		this.currentTokenIndex = 0
	}

	private fun disambiguateIdentifierOrNegative() {
		if (peekOneChar() in decDigits) lexNumber()
		else lexIdentifier()
	}

	private fun lexIdentifier() {
		val line = this.line
		val startAtCol = this.col
		val str = scanFullString(idChars)
		this.tokenBuffer.add(Token(Token.TokenType.Identifier, str, this.line, this.line, startAtCol, this.col))
	}

	private fun lexNumber() {
		val line = this.line
		val startAtCol = this.col
		var isNegative = false
		var numberType: Token.TokenType
		var numberStr: String

		if (currentChar() == '-') {
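			isNegative = true

  • Written in Kotlin
  • Well-chosen identifiers with exemplary English naming
  • Concise tokenization logic
  • Excellent architecture
  • Concise API
  • Appropriate access levels
  • Zero warnings from Inspect Code (probably)
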
package lite.lexer;

import java.util.ArrayList;
import java.util.Scanner;

/**
 * The Lite Lexer
 *
 * @author duangsuse
 * @see lite.Parser
 * @since 1.0
 */
public class Lexer {
    /**
     * Code string
     */
    public String code;

    /**
     * Parsing line
     */
    public int line;

    /**
     * Parsing at column
     */
    public int column;

    /**
     * Parsing char index
     */
    public int c;

    /**
     * Current character
     */
    public char curC;

    /**
     * Result tokens
     */
    public ArrayList<Token> tokens = new ArrayList<>();

    /**
     * Lexer state
     * 0 = null
     * 1 = ignoring comment, expecting newline/eof
     * 2 = building string
     * 3 = building single-quoted string
     * 4 = logging number
     * 5 = logging identifier
     * 66 = error when lexing
     */
    public byte lexerState;

    /**
     * Verbose lex
     */
    public boolean verbose = false;

    /**
     * Error string
     */
    public String error;

    /**
     * Where the string starts
     */
    public int stringStarting;

    /**
     * Temp String builder
     */
    public StringBuilder temp = new StringBuilder();

    /**
     * Split : in identifier (true)
     */
    public boolean splitComma = true;

    /**
     * Blank constructor
     */
    public Lexer() {
    }

    /**
     * Lite code lexer constructor
     *
     * @param lite lite code to lex
     */
    public Lexer(String lite) {
        code = lite;
        line = 0;
        column = 0;
        c = 0;
        lexerState = 0;
    }

    /**
     * Reads file from stdIn, output tokens to stdout
     *
     * @param args commandline arguments
     */
    public static void main(String[] args) {
        boolean verbose = false;
        boolean listOutput = true;
        boolean deflate = false;
        for (String s : args) {
            if (s.equals("-v")) {
                verbose = true;
            }
            if (s.equals("-p")) {
                listOutput = false;
            }
            if (s.equals("-d")) {
                deflate = true;
            }
        }
        Scanner scan = new Scanner(System.in);
        StringBuilder buf = new StringBuilder();
        while (scan.hasNextLine())
            buf.append(scan.nextLine()).append("\n");
        Lexer lex = new Lexer(buf.toString());
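        lex.verbose = verbose;

  • Redundant code and JavaDoc
  • The author's clumsy skills leap off the code for everyone to see
  • Inappropriate access levels; the author might be a primary-school pupil, since everything is public
  • Fields that should be final stay non-final even when IDEA suggests the change
  • A shabby, tedious architecture; pointless helper functions take up too much space
  • Too rigid: some other special Unicode symbols are not handled
  • Uses Java, an application-level language where you need not worry about memory or registers, and Java has fewer language features than Kotlin anyway
  • No error handling
  • Countless warnings from IDEA's Inspect Code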

@duangsuse

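Here is where things stand with the parser:
For convenience I have decided, for now, to use the earlier ohm.js + Lite rules as the parser (that parser is actually mature enough).

For the text-processing steps that follow I will specify a JSON representation of the Lite AST,
and then write a JavaScript AstJson converter that runs on GNU/Linux + Node and in modern browsers with JS support.

I will package the converter for GNU/Linux and the Android WebView so it is ready for later use.

The spec will be developed on the Wiki.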

@duangsuse

duangsuse commented May 9, 2018

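I have found a problem with the rules:

CallEasy treats every expression that follows as an argument:

System.out.println getb() a b c
#                                      ^ parsed as arguments of getb
System.b 1 2 3 a(1) 2
#                              ^ should be an argument of System.b ..., but becomes an argument of a, and a(1) 2 is parsed as CallEasy

a() 1 2 3

The design I originally believed to be correct was that getb() uses () to delimit its arguments, so there would be no problem.
In practice, however, a() is treated as a Call, although that part is no big deal.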
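
The difference between the reported behaviour and the intended one can be sketched with a tiny hand-written parser. This is not the Lite parser and not a fix the thread settles on; `CallSketch` and its `greedy` flag are hypothetical. With `greedy` set, an argument is parsed by the same rule as the outer call and may absorb juxtaposed arguments after its parentheses; with it cleared, a call in argument position ends at its closing parenthesis. Input is assumed to be well-formed, and tokens starting with a digit are treated as literals that can never be callees.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of paren-less call parsing; not the Lite parser itself. */
final class CallSketch {
    private final List<String> tokens = new ArrayList<>();
    private final boolean greedy; // true: arguments use the same rule as the outer call
    private int pos = 0;

    private CallSketch(String source, boolean greedy) {
        // Tokens are dotted names or numbers, plus the two parenthesis characters.
        Matcher m = Pattern.compile("[A-Za-z0-9_.]+|[()]").matcher(source);
        while (m.find()) tokens.add(m.group());
        this.greedy = greedy;
    }

    /** Renders the parse of one call statement with explicit parentheses. */
    static String parse(String source, boolean greedy) {
        return new CallSketch(source, greedy).call(true);
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : null; }

    // callee [ "(" args ")" ] [ juxtaposed args, only if allowed ]
    private String call(boolean allowJuxtaposed) {
        String result = tokens.get(pos++);
        if (Character.isDigit(result.charAt(0))) return result; // literals are never callees
        if ("(".equals(peek())) {
            pos++; // consume "("
            result += "(" + String.join(", ", args()) + ")";
            pos++; // consume ")"
        }
        if (allowJuxtaposed) {
            List<String> rest = args();
            if (!rest.isEmpty()) result += "(" + String.join(", ", rest) + ")";
        }
        return result;
    }

    // Collects arguments until ")" or the end of the input.
    private List<String> args() {
        List<String> out = new ArrayList<>();
        while (peek() != null && !")".equals(peek())) out.add(call(greedy));
        return out;
    }
}
```

With `greedy` set, `System.b 1 2 3 a(1) 2` renders as `System.b(1, 2, 3, a(1)(2))`, the misparse described above; with it cleared, the same input renders as `System.b(1, 2, 3, a(1), 2)`.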

@duangsuse

☝️ So you see, a hand-written top-down parser is still the better choice

@duangsuse

But in that case it would be best to hand-write the whole thing, and right now I no longer have the patience for that

@duangsuse

[Screenshot: 2018-05-10 09-25-14]

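Abandoned. I really cannot stand JavaScript's idiotic weak typing; there is nothing to lean on, and debugging JS in VSCode feels like a tool for primary-school kids. I honestly cannot understand how JS came to be used by so many people. It is truly brain-dead.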
