// lexer.ms
// A very primitive lexer. Takes a string and splits it into a series of
// Tokens. Operators and punctuation are mapped to unique keywords. Names,
// which can be any series of letters, are turned into NAME tokens. All other
// characters are ignored (except to separate names). Numbers and strings are
// not supported. This is really just the bare minimum to give the parser
// something to work with.
isLetter = function(c)
	return "a" <= c <= "z" or "A" <= c <= "Z"
end function
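// (Note: MiniScript comparisons chain and evaluate to 1 or 0, so for
// example isLetter("q") returns 1 and isLetter("+") returns 0.)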
globals.Lexer = {}
// Lexer.init:
// Creates a new Lexer to tokenize the given string. Returns self,
// so the result can be chained or assigned directly.
//
// text -- the string to tokenize
Lexer.init = function(text)
	self.index = 0
	self.text = text
	self.punctuators = {}
	// Register all of the TokenTypes that are explicit punctuators.
	for type in TokenType.indexes
		punc = punctuator(type)
		if punc then self.punctuators[punc] = type
	end for
	return self
end function
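// (After init, self.punctuators maps each single-character lexeme back to
// its TokenType -- e.g., a hypothetical "+" punctuator would produce an
// entry like "+": TokenType.PLUS -- so Lexer.next can classify punctuation
// with a single map lookup.)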
// Lexer.next:
// Get the next token from our text. At the end of the string,
// return EOF.
Lexer.next = function()
	while self.index < self.text.len
		c = self.text[self.index]
		self.index += 1
		if self.punctuators.hasIndex(c) then
			// Handle punctuation.
			return token(self.punctuators[c], c)
		end if
		if isLetter(c) then
			// Handle names.
			start = self.index - 1
			while self.index < self.text.len
				if not isLetter(self.text[self.index]) then break
				self.index += 1
			end while
			name = self.text[start:self.index]
			return token(TokenType.NAME, name)
		end if
		// Ignore all other characters (whitespace, etc.)
	end while
	// Once we've reached the end of the string, just return EOF tokens. We'll
	// keep returning them as many times as we're asked, so that the parser's
	// lookahead doesn't have to worry about running out of tokens.
	return token(TokenType.EOF, "")
end function
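// A minimal usage sketch (not part of the lexer itself; the names are
// assumptions): it presumes the companion token definitions -- TokenType,
// punctuator, and token -- have already been loaded, and that the maps
// produced by token(type, text) expose .type and .text fields, matching
// the calls above. Guarded so it only runs when this file is launched
// directly rather than imported.
if globals == locals then
	lex = new Lexer
	lex.init("from + offset")
	t = lex.next
	while t.type != TokenType.EOF
		print t.text
		t = lex.next
	end while
end if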