lex

The name is inspired by the lex tool, which generates lexical analyzers.

Here, lex is a 30-line program that identifies language elements in a text.

Usage

First, define the language elements and the dictionary, using regular expressions, in the definition part.
Then call lex with a text and that dictionary. lex returns the list of language elements that match the dictionary.

There are two examples further down, one for a mathematical language and one for a markdown language.

Features

Works for any dictionary you define, i.e. any language you define.
Swift's syntax makes it possible to do this in a simple and short way.

More description

In other words, lex is a 'tokenizer' that works with regular expressions. 'Tokenization' is the first step of lexical analysis. lex is only around 30 lines of code, but it is template code, meant to be isolated for reuse.
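To make the idea of a regex-driven tokenizer concrete, here is a minimal sketch of how a `Def` type and a `lex` function could be written. It is not the package's source: the bodies, the use of `NSRegularExpression`, and the choice to skip a character when no definition matches are all assumptions made for illustration. Only the names `Def`, `regex`, `funct` and `lex` come from the examples in this README.

```swift
import Foundation

// Hypothetical sketch, not the package's actual implementation.
// A definition pairs a pattern with a function that builds an element
// from the matched text, or returns nil to drop the match.
struct Def<T> {
    let regex: NSRegularExpression
    let funct: (String) -> T?

    init(regex pattern: String, funct: @escaping (String) -> T?) {
        // try! is tolerable in a sketch: an invalid pattern is a programming error.
        self.regex = try! NSRegularExpression(pattern: pattern)
        self.funct = funct
    }
}

// Scans the text from left to right. At each position the definitions are
// tried in dictionary order, and the first one that matches there wins.
func lex<T>(_ text: String, _ dict: [Def<T>]) -> [T] {
    let ns = NSString(string: text)
    var elements: [T] = []
    var pos = 0
    scan: while pos < ns.length {
        let rest = NSRange(location: pos, length: ns.length - pos)
        for def in dict {
            // .anchored forces the match to start exactly at `pos`.
            if let m = def.regex.firstMatch(in: text, options: .anchored, range: rest),
               m.range.length > 0 {
                if let element = def.funct(ns.substring(with: m.range)) {
                    elements.append(element)
                }
                pos += m.range.length
                continue scan
            }
        }
        pos += 1 // assumption: silently skip one UTF-16 unit that nothing matches
    }
    return elements
}

// Usage with a tiny, hypothetical token type.
enum Token: Equatable {
    case word(String)
    case number(Int)
}

let dict: [Def<Token>] = [
    Def<Token>(regex: "[ \t\n]+", funct: { _ in nil }),
    Def<Token>(regex: "[0-9]+", funct: { Int($0).map(Token.number) }),
    Def<Token>(regex: "[a-zA-Z]+", funct: { Token.word($0) })
]

print(lex("abc 42", dict))
```

Because the first matching definition wins in this sketch, the order of the dictionary entries matters when two patterns can match at the same position.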

After lex, a following step could apply an algebra to the identified language elements. An algebra defines operators. Each operator has a priority and a number of operands, and is postfix, prefix, or infix relative to its operands. That step would identify the operators among the language elements and reorder the elements according to the operators' characteristics, in order to obtain 'Reverse Polish Notation' (RPN): operands followed by their operator. After this RPN transformation, the original text could be run as a program by a state machine.
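The reordering step described above can be sketched with the classic shunting-yard algorithm. This is an illustration only and is not part of lex: the `Tok` type, the `priority` table and the `toRPN` function are hypothetical names, and the sketch assumes that every operator is binary, infix and left-associative.

```swift
// Hypothetical token type for this sketch; an enum such as MathLexem
// from the examples below could play the same role.
enum Tok: Equatable {
    case num(Float)
    case op(String)
    case open
    case close
}

// Assumed priorities for four binary, infix, left-associative operators.
let priority: [String: Int] = ["+": 1, "-": 1, "*": 2, "/": 2]

// Reorders infix tokens into Reverse Polish Notation (shunting-yard).
func toRPN(_ tokens: [Tok]) -> [Tok] {
    var output: [Tok] = []
    var stack: [Tok] = []
    for tok in tokens {
        switch tok {
        case .num:
            output.append(tok)
        case .op(let o):
            // Pop operators of higher or equal priority before pushing this one.
            while case .op(let top)? = stack.last,
                  priority[top, default: 0] >= priority[o, default: 0] {
                output.append(stack.removeLast())
            }
            stack.append(tok)
        case .open:
            stack.append(tok)
        case .close:
            // Unwind to the matching opening parenthesis, then discard it.
            while let top = stack.last, top != .open {
                output.append(stack.removeLast())
            }
            _ = stack.popLast()
        }
    }
    // Flush the operators still waiting on the stack.
    while let top = stack.popLast() {
        if top != .open { output.append(top) }
    }
    return output
}

// 1 + 2 * 3 becomes 1 2 3 * +
let rpn = toRPN([.num(1), .op("+"), .num(2), .op("*"), .num(3)])
print(rpn)
```

Prefix and postfix operators, or operators with a different number of operands, would need extra cases that this sketch leaves out.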

Examples

Some markdown language

Here is an example for a markdown language. We're trying to find the language elements in this markdown text:

# Title {.flyer}
## SubTitle
### Paragraph title
#### Paragraph subTitle
notes

Define the language elements

// language elements
public enum MdLexem {
    case Level1         // # level 1
    case Level2         // ##  level 2
    case Level3         // ###  level 3
    case Level4         // ####  level 4
    case Text(String)   // some text
    case BracketOpen    // {
    case BracketClose   // }
    case Class(String)  // .fooClass
    case Other(String)  // unknown
}

Write the definitions

// dictionary
public let md_dict: [Def<MdLexem>] = [
    Def<MdLexem>(regex: "[\r\n]" , funct: { _ in nil } ),
    Def<MdLexem>(regex: "[ \ta-zA-Z0-9]+" , funct: { .Text($0) } ),
    Def<MdLexem>(regex: "#[ \t]+" , funct: { _ in .Level1 } ),
    Def<MdLexem>(regex: "##[ \t]+" , funct: { _ in .Level2 } ),
    Def<MdLexem>(regex: "###[ \t]+" , funct: { _ in .Level3 } ),
    Def<MdLexem>(regex: "####[ \t]+" , funct: { _ in .Level4 } ),
    Def<MdLexem>(regex: "\\{" , funct: { _ in .BracketOpen } ),
    Def<MdLexem>(regex: "\\}" , funct: { _ in .BracketClose } ),
    Def<MdLexem>(regex: "\\.[a-zA-Z0-9]+" , funct: { .Class(String($0.dropFirst())) } )
]

Then call lex on a text:

let text = """
# Title {.flyer}
## SubTitle
### Paragraph title
#### Paragraph subTitle
notes
"""

let elements = lex(text, md_dict)

print(elements)

`lex` returns the identified elements:

[
lex.MdLexem.Level1, lex.MdLexem.Text("Title "), lex.MdLexem.BracketOpen,
lex.MdLexem.Class("flyer"), lex.MdLexem.BracketClose, lex.MdLexem.Level2,
lex.MdLexem.Text("SubTitle"), lex.MdLexem.Level3, lex.MdLexem.Text("Paragraph title"),
lex.MdLexem.Level4, lex.MdLexem.Text("Paragraph subTitle"), lex.MdLexem.Text("notes")
]
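As an illustration of what a later step could do with such a list, the sketch below turns the elements into an indented outline by pattern matching on the enum cases. The `outline` function is hypothetical and not part of lex. The enum is repeated locally, in shortened form, only so that the block compiles on its own.

```swift
import Foundation

// Local, shortened copy of the enum above so the sketch is self-contained.
enum MdLexem {
    case Level1, Level2, Level3, Level4
    case Text(String)
    case BracketOpen, BracketClose
    case Class(String)
}

// Hypothetical helper: one line per text element, indented by heading level.
func outline(_ elements: [MdLexem]) -> [String] {
    var depth = 0
    var lines: [String] = []
    for element in elements {
        switch element {
        case .Level1: depth = 1
        case .Level2: depth = 2
        case .Level3: depth = 3
        case .Level4: depth = 4
        case .Text(let s):
            let indent = String(repeating: "  ", count: max(depth - 1, 0))
            lines.append(indent + s.trimmingCharacters(in: .whitespaces))
            depth = 0 // text without a preceding level marker is not indented
        case .BracketOpen, .BracketClose, .Class:
            break // attributes are ignored in this sketch
        }
    }
    return lines
}

let sample: [MdLexem] = [
    .Level1, .Text("Title "), .BracketOpen, .Class("flyer"), .BracketClose,
    .Level2, .Text("SubTitle"), .Text("notes")
]
for line in outline(sample) { print(line) }
```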

Some mathematical language

Here is an example for a mathematical language. We're trying to find the language elements in this mathematical text:

bar(x, y) inf
x + y * 8 + (4 - 1) / 7
foo(8, 2)

Define the language elements

// language elements
public enum MathLexem {
    case Infinite
    case Identifier(String)
    case Number(Float)
    case ParensOpen
    case ParensClose
    case Comma
    case BinaryOp(String)
}

Write the definitions

// dictionary
public let math_dict: [Def<MathLexem>] = [
    Def<MathLexem>(regex: "[ \t\n]" , funct: { _ in nil } ),
    Def<MathLexem>(regex: "[a-zA-Z][a-zA-Z0-9]*" , funct: { $0 == "inf" ? .Infinite : .Identifier($0) } ),
    Def<MathLexem>(regex: "[0-9.]+" , funct: {(r: String) in .Number((r as NSString).floatValue) } ),
    Def<MathLexem>(regex: "\\(" , funct: { _ in .ParensOpen } ),
    Def<MathLexem>(regex: "\\)" , funct: { _ in .ParensClose } ),
    Def<MathLexem>(regex: "," , funct: { _ in .Comma } ),
    Def<MathLexem>(regex: "[+\\-*/]" , funct: { .BinaryOp($0) } )
]

Then call lex on a text:

let math_text =
"""
  bar(x, y) inf
  x + y * 8 + (4 - 1) / 7
  foo(8, 2)
"""

let math_elements = lex(math_text, math_dict)

print(math_elements)

`lex` returns the identified elements:

[
lex.MathLexem.Identifier("bar"), lex.MathLexem.ParensOpen, lex.MathLexem.Identifier("x"),
lex.MathLexem.Comma, lex.MathLexem.Identifier("y"), lex.MathLexem.ParensClose,
lex.MathLexem.Infinite, lex.MathLexem.Identifier("x"), lex.MathLexem.BinaryOp("+"),
lex.MathLexem.Identifier("y"), lex.MathLexem.BinaryOp("*"), lex.MathLexem.Number(8.0),
lex.MathLexem.BinaryOp("+"), lex.MathLexem.ParensOpen, lex.MathLexem.Number(4.0),
lex.MathLexem.BinaryOp("-"), lex.MathLexem.Number(1.0), lex.MathLexem.ParensClose,
lex.MathLexem.BinaryOp("/"), lex.MathLexem.Number(7.0), lex.MathLexem.Identifier("foo"),
lex.MathLexem.ParensOpen, lex.MathLexem.Number(8.0), lex.MathLexem.Comma,
lex.MathLexem.Number(2.0), lex.MathLexem.ParensClose
]
