CoreParse
=========

CoreParse is a parsing library for Mac OS X and iOS. It supports a wide range of grammars thanks to its shift/reduce parsing schemes. Currently CoreParse supports SLR, LR(1) and LALR(1) parsers.

For full documentation see http://beelsebob.github.com/CoreParse.

Why Should You Use CoreParse?
-----------------------------

You may wonder why and/or when you should use CoreParse. There are already a number of parsers available in the wild, so why should you use this one?

* Compared to ParseKit:
    * CoreParse supports more languages (LR(1) languages cover all LL(1) languages and more). In practice, LALR(1) grammars cover most useful languages.
    * CoreParse produces faster parsers.
    * CoreParse parsers and tokenisers can be archived using NSKeyedArchivers to save regenerating them each time your application runs.
    * CoreParse's parsing algorithm is not recursive, meaning it could theoretically deal with much larger hierarchies of language structure without blowing the stack.
* Compared to lex/yacc or flex/bison:
    * While I have not explicitly benchmarked them, I would expect parsers produced by lex/yacc or flex/bison to be faster than CoreParse ones.
    * CoreParse does not _require_ you to compile your parser before you start (though it is recommended).
    * CoreParse allows you to specify grammars right in your Objective-C source, rather than needing another language which intermixes C/Obj-C.
    * CoreParse does not use global state, so multiple parser instances can be run in parallel (or the same parser instance can parse multiple token streams in parallel).

Where is CoreParse Already Used?
--------------------------------

CoreParse is already used in a major way in at least two projects:

* Matt Mower uses it in his [statec](https://github.com/mmower/statec) project to parse his state machine specifications.
* I use it in [OpenStreetPad](https://github.com/beelsebob/OpenStreetPad/) to parse MapCSS.

If you know of any other places it's been used, please feel free to get in touch.

Parsing Guide
=============

CoreParse is a powerful framework for tokenising and parsing. This document explains how to create a tokeniser and parser from scratch, and how to use those parsers to create your model data structures for you. We will follow the same example throughout: parsing a simple numerical expression and computing the result.

gavineadie has implemented this entire example; for full working source, see https://github.com/beelsebob/ParseTest/.

Tokenisation
------------

CoreParse's tokenisation class is CPTokeniser. To specify how tokens are constructed, you must add *token recognisers* to the tokeniser in order of precedence.

Our example language will involve several symbols, numbers, whitespace, and comments. We add these to the tokeniser:

```objective-c
CPTokeniser *tokeniser = [[[CPTokeniser alloc] init] autorelease];
// Recognisers are added in order of precedence, highest first.
[tokeniser addTokenRecogniser:[CPNumberRecogniser numberRecogniser]];
[tokeniser addTokenRecogniser:[CPWhiteSpaceRecogniser whiteSpaceRecogniser]];
// The comment recogniser must come before the "/" keyword recogniser.
[tokeniser addTokenRecogniser:[CPQuotedRecogniser quotedRecogniserWithStartQuote:@"/*" endQuote:@"*/" name:@"Comment"]];
[tokeniser addTokenRecogniser:[CPKeywordRecogniser recogniserForKeyword:@"+"]];
[tokeniser addTokenRecogniser:[CPKeywordRecogniser recogniserForKeyword:@"-"]];
[tokeniser addTokenRecogniser:[CPKeywordRecogniser recogniserForKeyword:@"*"]];
[tokeniser addTokenRecogniser:[CPKeywordRecogniser recogniserForKeyword:@"/"]];
[tokeniser addTokenRecogniser:[CPKeywordRecogniser recogniserForKeyword:@"("]];
[tokeniser addTokenRecogniser:[CPKeywordRecogniser recogniserForKeyword:@")"]];
```

Note that the comment recogniser is added before the keyword recogniser for the divide symbol. This gives it higher precedence, and means that the first slash of a comment will not be recognised as a division.

Next, we set ourselves as the tokeniser's delegate. We implement the tokeniser delegate methods in such a way that whitespace tokens and comments, although consumed, will not appear in the tokeniser's output:

```objective-c
- (BOOL)tokeniser:(CPTokeniser *)tokeniser shouldConsumeToken:(CPToken *)token
{
    return YES;
}

- (void)tokeniser:(CPTokeniser *)tokeniser requestsToken:(CPToken *)token pushedOntoStream:(CPTokenStream *)stream
{
    if (![token isWhiteSpaceToken] && ![[token name] isEqualToString:@"Comment"])
    {
        [stream pushToken:token];
    }
}
```
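
The snippets above do not show the delegate actually being attached. A minimal sketch, assuming `CPTokeniser` exposes a conventional `delegate` property (and therefore a `setDelegate:` setter), and that `self` is the object implementing the two delegate methods:

```objective-c
// Assumption: CPTokeniser has a standard `delegate` property, and `self`
// implements the tokeniser delegate methods shown above.
[tokeniser setDelegate:self];
```

This has to happen before the tokeniser is invoked for the delegate methods to have any effect on its output.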

We can now invoke our tokeniser:

```objective-c
CPTokenStream *tokenStream = [tokeniser tokenise:@"5 + (2.0 / 5.0 + 9) * 8"];
```

Parsing
-------

We construct parsers by specifying their grammar, which can be written in a simple BNF-like language. Note that the syntax `tag@<NonTerminal>` can be read simply as `<NonTerminal>`; the tag can be used later to quickly extract values from the parsed result:

```objective-c
NSString *expressionGrammar =
    @"Expression ::= term@<Term> | expr@<Expression> op@<AddOp> term@<Term>;"
    @"Term ::= fact@<Factor> | fact@<Factor> op@<MulOp> term@<Term>;"
    @"Factor ::= num@'Number' | '(' expr@<Expression> ')';"
    @"AddOp ::= '+' | '-';"
    @"MulOp ::= '*' | '/';";
NSError *err = nil;
CPGrammar *grammar = [CPGrammar grammarWithStart:@"Expression" backusNaurForm:expressionGrammar error:&err];
if (nil == grammar)
{
    // The grammar string could not be turned into a grammar; err says why.
    NSLog(@"Error creating grammar:");
    NSLog(@"%@", err);
}
else
{
    CPParser *parser = [CPLALR1Parser parserWithGrammar:grammar];
    [parser setDelegate:self];
    ...
}
```

When a rule is matched by the parser, the `initWithSyntaxTree:` method will be called on a new instance of the appropriate class. If no such class exists, the parser delegate's `parser:didProduceSyntaxTree:` method is called. To deal with this cleanly, we implement three classes: Expression, Term, and Factor. The AddOp and MulOp non-terminals are dealt with by the parser delegate. Here is the `initWithSyntaxTree:` method for the Expression class; the methods for Term and Factor are similar:

```objective-c
- (id)initWithSyntaxTree:(CPSyntaxTree *)syntaxTree
{
    self = [self init];

    if (nil != self)
    {
        Term *t = [syntaxTree valueForTag:@"term"];
        Expression *e = [syntaxTree valueForTag:@"expr"];

        if (nil == e)
        {
            // Expression ::= term@<Term>
            [self setValue:[t value]];
        }
        else if ([[syntaxTree valueForTag:@"op"] isEqualToString:@"+"])
        {
            // Expression ::= expr@<Expression> '+' term@<Term>
            [self setValue:[e value] + [t value]];
        }
        else
        {
            // Expression ::= expr@<Expression> '-' term@<Term>
            [self setValue:[e value] - [t value]];
        }
    }

    return self;
}
```
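
For comparison, here is a sketch of what the corresponding method might look like for Factor. It assumes, as the Expression example does, that each class has a numeric `value` property, and that the object tagged `num` is a `CPNumberToken` whose `number` accessor returns an `NSNumber`. Neither detail is spelled out in this guide, so check both against the CoreParse headers before relying on them.

```objective-c
// Hypothetical Factor implementation, mirroring the Expression method above.
// Assumptions: Factor declares a numeric `value` property, and the object
// tagged `num` is a CPNumberToken exposing an NSNumber via `number`.
- (id)initWithSyntaxTree:(CPSyntaxTree *)syntaxTree
{
    self = [self init];

    if (nil != self)
    {
        CPNumberToken *n = [syntaxTree valueForTag:@"num"];
        Expression *e = [syntaxTree valueForTag:@"expr"];

        if (nil != n)
        {
            // Factor ::= num@'Number'
            [self setValue:[[n number] doubleValue]];
        }
        else
        {
            // Factor ::= '(' expr@<Expression> ')'
            [self setValue:[e value]];
        }
    }

    return self;
}
```

Term would follow the same pattern using the `fact`, `op` and `term` tags, multiplying or dividing where Expression adds or subtracts.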

We must also implement the delegate's method for dealing with AddOps and MulOps:

```objective-c
- (id)parser:(CPParser *)parser didProduceSyntaxTree:(CPSyntaxTree *)syntaxTree
{
    return [(CPKeywordToken *)[syntaxTree childAtIndex:0] keyword];
}
```

We can now parse the token stream we produced earlier:

```objective-c
NSLog(@"%f", [(Expression *)[parser parse:tokenStream] value]);
```

Which outputs:

    80.2

Best Practices
--------------

CoreParse offers three types of parser - SLR, LR(1) and LALR(1):
* SLR parsers cover the smallest set of languages, and are faster to generate than LALR(1) parsers.
* LR(1) parsers consume a lot of RAM, and are slow, but cover the largest set of languages.
* LALR(1) parsers are as fast as SLR parsers to run, but slower to generate; they cover almost as many languages as LR(1) parsers.

It is recommended that you start with an SLR parser (unless you know better), and when a parser cannot be generated for your grammar, move on to an LALR(1) parser. LR(1) parsers are not really recommended at all, though they may be useful in extreme circumstances.
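
One way to follow this recommendation is to try the cheaper parser first and fall back only when needed. The sketch below assumes that `CPSLRParser` is the SLR counterpart of the `CPLALR1Parser` class used earlier, and that `parserWithGrammar:` returns `nil` when no parser can be generated for the grammar. The helper name `ParserForGrammar` is hypothetical.

```objective-c
#import <CoreParse/CoreParse.h> // assumed umbrella header; adjust to your setup

// Hypothetical helper: prefer an SLR parser, fall back to LALR(1).
// Assumes +parserWithGrammar: returns nil if the parse tables cannot be built.
static CPParser *ParserForGrammar(CPGrammar *grammar)
{
    CPParser *parser = [CPSLRParser parserWithGrammar:grammar];
    if (nil == parser)
    {
        // The grammar is not SLR; LALR(1) covers more grammars
        // but is slower to generate.
        parser = [CPLALR1Parser parserWithGrammar:grammar];
    }
    return parser; // may still be nil if the grammar is not LALR(1) either
}
```

Callers should still check the result for `nil` before parsing.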

If you have a significant grammar that requires an LALR(1) parser, it is recommended that you use an NSKeyedArchiver to archive the parser to a file. Your application can then read and unarchive this file when it runs, rather than regenerating the parser every time, as parser generation can take some time.
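
A minimal sketch of this caching pattern, using Foundation's `NSKeyedArchiver` and `NSKeyedUnarchiver`. The helper name `CachedParser` and its `path` parameter are hypothetical, and the code assumes the parser class supports keyed archiving, as stated earlier in this document.

```objective-c
#import <Foundation/Foundation.h>
#import <CoreParse/CoreParse.h> // assumed umbrella header; adjust to your setup

// Hypothetical helper: load an archived parser from `path` if one exists,
// otherwise generate it from the grammar and archive it for the next launch.
static CPParser *CachedParser(CPGrammar *grammar, NSString *path)
{
    // Returns nil if there is no file at `path`.
    CPParser *parser = [NSKeyedUnarchiver unarchiveObjectWithFile:path];
    if (nil == parser)
    {
        parser = [CPLALR1Parser parserWithGrammar:grammar];
        if (nil != parser)
        {
            // Returns NO on failure; the freshly generated parser is still usable.
            [NSKeyedArchiver archiveRootObject:parser toFile:path];
        }
    }
    return parser;
}
```

Remember to set the delegate on the returned parser, as in the earlier example. These two Foundation class methods match the era of the code in this guide; newer SDKs deprecate them in favour of the secure-coding variants.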