ArfNtz/lex

lex

The name is inspired by the lex tool, which generates lexical analyzers.

Here, lex is a program of about 30 lines that identifies language elements in a text.

Usage

  • First, define the language elements and the dictionary, using regular expressions, in the definition part.
  • Then call lex with a text and that dictionary. lex returns the list of language elements that match the dictionary.
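
To make the two steps above concrete, here is a minimal sketch of how a regex-driven tokenizer of this kind could work. It is not the repository's code: `TokenDef` and `tokenize` are hypothetical stand-ins that only mirror the shape of `Def` and `lex` as they are used in the examples (a `regex` string plus a `funct` closure that returns a token, or nil to discard the match). The sketch assumes that definitions are tried in order at the current position and that unmatched characters are skipped.

```swift
import Foundation

// Hypothetical stand-in for `Def`: a pattern plus a closure that turns
// the matched string into a token, or nil to discard the match.
struct TokenDef<T> {
    let regex: String
    let funct: (String) -> T?
}

// Hypothetical stand-in for `lex`: at each position, try the definitions
// in order and keep the first one that matches exactly there.
func tokenize<T>(_ text: String, _ dict: [TokenDef<T>]) -> [T] {
    // Compile every pattern once; invalid patterns are skipped.
    let compiled: [(NSRegularExpression, (String) -> T?)] = dict.compactMap { def in
        guard let re = try? NSRegularExpression(pattern: def.regex) else { return nil }
        return (re, def.funct)
    }
    let length = text.utf16.count
    var tokens: [T] = []
    var position = 0
    while position < length {
        var advanced = false
        for (re, funct) in compiled {
            let remaining = NSRange(location: position, length: length - position)
            // .anchored forces the match to start at `position`.
            guard let match = re.firstMatch(in: text, options: .anchored, range: remaining),
                  match.range.length > 0,
                  let range = Range(match.range, in: text) else { continue }
            if let token = funct(String(text[range])) {
                tokens.append(token)
            }
            position += match.range.length
            advanced = true
            break
        }
        // No definition matched here: skip one UTF-16 unit and go on.
        if !advanced { position += 1 }
    }
    return tokens
}

// Usage with a tiny, made-up dictionary.
enum Word: Equatable {
    case number(Int)
    case name(String)
}

let wordDict: [TokenDef<Word>] = [
    TokenDef(regex: "[ \t\n]+", funct: { _ in nil }),
    TokenDef(regex: "[0-9]+", funct: { Int($0).map(Word.number) }),
    TokenDef(regex: "[a-z]+", funct: { Word.name($0) }),
]

let words = tokenize("abc 42", wordDict)
```

Because the match is anchored, the order of the definitions only matters when two patterns can match at the same position.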

There are two examples further down, one for a mathematical language and one for a markdown language.

Features

  • Works with any dictionary you define, i.e. any language you define.
  • Swift's syntax makes this simple and short.

More description

In other words, lex is a 'tokenizer' that works with regular expressions. 'Tokenisation' is the first step of lexical analysis. lex is only around 30 lines of code, but it is template code. That means it should be isolated for reuse because

After lex, a following step could apply an algebra to the identified language elements. An algebra defines operators. Each operator has a priority, a number of operands, and a position relative to its operands (prefix, infix, or postfix). That step would identify the operators among the language elements and reorder the elements according to those characteristics, in order to obtain 'Reverse Polish Notation' (RPN): operands followed by their operator. After this RPN transformation, the original text could be used as a program in a state machine ...
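
The reordering described above can be sketched with the classic shunting-yard algorithm. This is only an illustration under stated assumptions: `Tok`, `precedence`, and `toRPN` are hypothetical names that are not part of lex, only infix binary operators and parentheses are handled, and every operator is treated as left-associative.

```swift
// Hypothetical token type for this sketch; it is not the README's MathLexem.
enum Tok: Equatable {
    case number(Float)
    case op(String)
    case open
    case close
}

// Assumed priorities; a higher value binds more tightly.
let precedence: [String: Int] = ["+": 1, "-": 1, "*": 2, "/": 2]

// Reorders infix tokens into Reverse Polish Notation (shunting-yard).
func toRPN(_ tokens: [Tok]) -> [Tok] {
    var output: [Tok] = []
    var stack: [Tok] = []
    for token in tokens {
        switch token {
        case .number:
            output.append(token)
        case .op(let name):
            // Pop operators that bind at least as tightly as the new one.
            while case .op(let top)? = stack.last,
                  (precedence[top] ?? 0) >= (precedence[name] ?? 0) {
                output.append(stack.removeLast())
            }
            stack.append(token)
        case .open:
            stack.append(token)
        case .close:
            // Move operators to the output until the matching "(".
            while let top = stack.last, top != .open {
                output.append(stack.removeLast())
            }
            if !stack.isEmpty { stack.removeLast() }  // drop the "("
        }
    }
    // Flush the remaining operators, ignoring any unmatched "(".
    while let top = stack.popLast() {
        if top != .open { output.append(top) }
    }
    return output
}

// 8 + (4 - 1) / 7
let infix: [Tok] = [
    .number(8), .op("+"), .open, .number(4), .op("-"), .number(1), .close,
    .op("/"), .number(7),
]
let rpn = toRPN(infix)
```

For the expression `8 + (4 - 1) / 7`, the result reads `8 4 1 - 7 / +`: each operator follows its operands, so a simple stack machine can evaluate the list from left to right.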

Examples

Some markdown language

Here is an example for a small markdown language. We're trying to find the language elements in this markdown text:

# Title {.flyer}
## SubTitle
### Paragraph title
#### Paragraph subTitle
notes
  • Define the language elements
// language elements
public enum MdLexem {
    case Level1         // # level 1
    case Level2         // ##  level 2
    case Level3         // ###  level 3
    case Level4         // ####  level 4
    case Text(String)   // some text
    case BracketOpen    // {
    case BracketClose   // }
    case Class(String)  // .fooClass
    case Other(String)  // unknown
}
  • Write the definitions
// dictionary
public let md_dict: [Def<MdLexem>] = [
    Def<MdLexem>(regex: "[\r\n]" , funct: { _ in nil } ),
    Def<MdLexem>(regex: "[ \ta-zA-Z0-9]+" , funct: { .Text($0) } ),
    Def<MdLexem>(regex: "#[ \t]+" , funct: { _ in .Level1 } ),
    Def<MdLexem>(regex: "##[ \t]+" , funct: { _ in .Level2 } ),
    Def<MdLexem>(regex: "###[ \t]+" , funct: { _ in .Level3 } ),
    Def<MdLexem>(regex: "####[ \t]+" , funct: { _ in .Level4 } ),
    Def<MdLexem>(regex: "\\{" , funct: { _ in .BracketOpen } ),
    Def<MdLexem>(regex: "\\}" , funct: { _ in .BracketClose } ),
    Def<MdLexem>(regex: "\\.[a-zA-Z0-9]+" , funct: { .Class(String($0.dropFirst())) } )
]
  • Then call lex on a text:
let text = """
# Title {.flyer}
## SubTitle
### Paragraph title
#### Paragraph subTitle
notes
"""

let elements = lex(text, md_dict)

print(elements)
  • lex returns the identified elements:
[
lex.MdLexem.Level1, lex.MdLexem.Text("Title "), lex.MdLexem.BracketOpen,
lex.MdLexem.Class("flyer"), lex.MdLexem.BracketClose, lex.MdLexem.Level2,
lex.MdLexem.Text("SubTitle"), lex.MdLexem.Level3, lex.MdLexem.Text("Paragraph title"),
lex.MdLexem.Level4, lex.MdLexem.Text("Paragraph subTitle"), lex.MdLexem.Text("notes")
]
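
As an illustration of what a following step might do with such a list, the sketch below pairs each level marker with the text that follows it. It is not part of lex: `headings` is a hypothetical helper, and the enum is a copy of `MdLexem` repeated here only so that the block is self-contained.

```swift
import Foundation

// Copy of the README's MdLexem, repeated so the sketch is self-contained.
enum MdLexem {
    case Level1, Level2, Level3, Level4
    case Text(String)
    case BracketOpen, BracketClose
    case Class(String)
    case Other(String)
}

// Hypothetical helper: pairs each level marker with the text that follows it.
func headings(_ elements: [MdLexem]) -> [(level: Int, title: String)] {
    var result: [(level: Int, title: String)] = []
    var pendingLevel: Int? = nil
    for element in elements {
        switch element {
        case .Level1: pendingLevel = 1
        case .Level2: pendingLevel = 2
        case .Level3: pendingLevel = 3
        case .Level4: pendingLevel = 4
        case .Text(let title):
            // Text counts as a heading only right after a level marker.
            if let level = pendingLevel {
                let trimmed = title.trimmingCharacters(in: .whitespaces)
                result.append((level: level, title: trimmed))
            }
            pendingLevel = nil
        default:
            pendingLevel = nil
        }
    }
    return result
}

// A shortened version of the element list shown above.
let sample: [MdLexem] = [
    .Level1, .Text("Title "), .BracketOpen, .Class("flyer"), .BracketClose,
    .Level2, .Text("SubTitle"), .Text("notes"),
]
let found = headings(sample)
```

Here `found` holds two entries, level 1 "Title" and level 2 "SubTitle"; the trailing "notes" text is ignored because no level marker precedes it.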

Some mathematical language

Here is an example for a small mathematical language. We're trying to find the language elements in this mathematical text:

bar(x, y) inf
x + y * 8 + (4 - 1) / 7
foo(8, 2)
  • Define the language elements
// language elements
public enum MathLexem {
    case Infinite
    case Identifier(String)
    case Number(Float)
    case ParensOpen
    case ParensClose
    case Comma
    case BinaryOp(String)
}
  • Write the definitions
// dictionary
public let math_dict: [Def<MathLexem>] = [
    Def<MathLexem>(regex: "[ \t\n]" , funct: { _ in nil } ),
    Def<MathLexem>(regex: "[a-zA-Z][a-zA-Z0-9]*" , funct: { $0 == "inf" ? .Infinite : .Identifier($0) } ),
    Def<MathLexem>(regex: "[0-9.]+" , funct: {(r: String) in .Number((r as NSString).floatValue) } ),
    Def<MathLexem>(regex: "\\(" , funct: { _ in .ParensOpen } ),
    Def<MathLexem>(regex: "\\)" , funct: { _ in .ParensClose } ),
    Def<MathLexem>(regex: "," , funct: { _ in .Comma } ),
    Def<MathLexem>(regex: "[+\\-*/]" , funct: { .BinaryOp($0) } )
]
  • Then call lex on a text:
let math_text =
"""
  bar(x, y) inf
  x + y * 8 + (4 - 1) / 7
  foo(8, 2)
"""

let math_elements = lex(math_text, math_dict)

print(math_elements)
  • lex returns the identified elements:
[
lex.MathLexem.Identifier("bar"), lex.MathLexem.ParensOpen, lex.MathLexem.Identifier("x"),
lex.MathLexem.Comma, lex.MathLexem.Identifier("y"), lex.MathLexem.ParensClose,
lex.MathLexem.Infinite, lex.MathLexem.Identifier("x"), lex.MathLexem.BinaryOp("+"),
lex.MathLexem.Identifier("y"), lex.MathLexem.BinaryOp("*"), lex.MathLexem.Number(8.0),
lex.MathLexem.BinaryOp("+"), lex.MathLexem.ParensOpen, lex.MathLexem.Number(4.0),
lex.MathLexem.BinaryOp("-"), lex.MathLexem.Number(1.0), lex.MathLexem.ParensClose,
lex.MathLexem.BinaryOp("/"), lex.MathLexem.Number(7.0), lex.MathLexem.Identifier("foo"),
lex.MathLexem.ParensOpen, lex.MathLexem.Number(8.0), lex.MathLexem.Comma,
lex.MathLexem.Number(2.0), lex.MathLexem.ParensClose
]
