How to write a Scintilla lexer

A lexer for a particular language determines how a specified range of
text shall be colored. Writing a lexer is relatively straightforward
because the lexer need only color given text. The harder job of
determining how much text actually needs to be colored is handled by
Scintilla itself, that is, the lexer's caller.


Parameters

The lexer for language LLL has the following prototype:

    static void ColouriseLLLDoc (
        unsigned int startPos, int length,
        int initStyle,
        WordList *keywordlists[],
        Accessor &styler);

The styler parameter is an Accessor object. The lexer must use this
object to access the text to be colored. The lexer gets the character
at position i using styler.SafeGetCharAt(i);

The startPos and length parameters indicate the range of text to be
recolored; the lexer must determine the proper color for all characters
in positions startPos through startPos+length-1.

The initStyle parameter indicates the initial state, that is, the state
at the character before startPos. States also indicate the coloring to
be used for a particular range of text.

Note: the character at startPos is assumed to start a line, so if a
newline terminates the initStyle state the lexer should enter its
default state (or whatever state should follow initStyle).

The keywordlists parameter specifies the keywords that the lexer must
recognize. A WordList class object contains methods that simplify the
recognition of keywords. Present lexers use a helper function called
classifyWordLLL to recognize keywords. These functions show how to use
the keywordlists parameter to recognize keywords. This documentation
will not discuss keywords further.


The lexer code

The task of a lexer can be summarized briefly: for each range r of
characters that are to be colored the same, the lexer should call

    styler.ColourTo(i, state)

where i is the position of the last character of the range r. The lexer
should set the state variable to the coloring state of the character at
position i and continue until the entire text has been colored.

Note 1: the styler (Accessor) object remembers the i parameter in the
previous calls to styler.ColourTo, so the single i parameter suffices to
indicate a range of characters.

Note 2: As a side effect of calling styler.ColourTo(i,state), the
coloring states of all characters in the range are remembered so that
Scintilla may set the initStyle parameter correctly on future calls to
the lexer.
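
The range behaviour of ColourTo can be sketched with a small stand-in
class. MockStyler below is hypothetical: it is not Scintilla's Accessor
and models only SafeGetCharAt and ColourTo as this section describes
them. The state numbers 0 and 1 are likewise made up for the example.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for Accessor; NOT Scintilla's real implementation.
class MockStyler {
public:
    explicit MockStyler(std::string text)
        : text_(std::move(text)), styles_(text_.size(), 0), next_(0) {}

    // Positions outside the text yield a harmless default character.
    char SafeGetCharAt(int pos, char chDefault = ' ') const {
        return (pos >= 0 && pos < static_cast<int>(text_.size()))
                   ? text_[pos] : chDefault;
    }

    // Style every character from the end of the previous range through
    // pos. The object remembers where the previous range ended, so a
    // single position is enough to describe the whole range.
    void ColourTo(int pos, int state) {
        for (; next_ <= pos && next_ < static_cast<int>(styles_.size()); ++next_)
            styles_[next_] = state;
    }

    const std::vector<int> &Styles() const { return styles_; }

private:
    std::string text_;
    std::vector<int> styles_;  // one remembered state per character
    int next_;                 // first position not yet styled
};

// Colour "int x": a keyword run (state 1) followed by default text (state 0).
std::vector<int> ColourExample() {
    MockStyler styler("int x");
    styler.ColourTo(2, 1);  // positions 0..2 ("int") get state 1
    styler.ColourTo(4, 0);  // positions 3..4 (" x") get state 0
    return styler.Styles();
}
```

Two calls are enough to style all five characters, because each call
picks up where the previous one stopped.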


Lexer organization

There are at least two ways to organize the code of each lexer. Present
lexers use what might be called a "character-based" approach: the outer
loop iterates over characters, like this:

    lengthDoc = startPos + length;
    for (unsigned int i = startPos; i < lengthDoc; i++) {
        ch = styler.SafeGetCharAt(i);
        chNext = styler.SafeGetCharAt(i + 1);
        << handle special cases >>
        switch (state) {
            // Handlers examine only ch and chNext.
            // Handlers call styler.ColourTo(i, state) if the state changes.
            case state_1: << handle ch in state 1 >> break;
            case state_2: << handle ch in state 2 >> break;
            ...
            case state_n: << handle ch in state n >> break;
        }
        chPrev = ch;
    }
    styler.ColourTo(lengthDoc - 1, state);
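
As a rough illustration of the character-based loop, here is a minimal
sketch for a made-up toy language with '#' line comments and
double-quoted strings. Everything in it is an assumption for the sake of
the example: ToyStyler is a hypothetical stand-in for Accessor, the
TOY_* states are invented, and keywords, escapes and lead bytes are
ignored.

```cpp
#include <string>
#include <vector>

enum { TOY_DEFAULT = 0, TOY_COMMENT = 1, TOY_STRING = 2 };  // invented states

// Hypothetical stand-in for Accessor; models only what the loop needs.
struct ToyStyler {
    std::string text;
    std::vector<int> styles;
    int next = 0;  // first position not yet styled
    explicit ToyStyler(const std::string &t) : text(t), styles(t.size(), 0) {}
    char SafeGetCharAt(int i) const {
        return (i >= 0 && i < static_cast<int>(text.size())) ? text[i] : ' ';
    }
    void ColourTo(int i, int state) {
        for (; next <= i && next < static_cast<int>(styles.size()); ++next)
            styles[next] = state;
    }
};

void ColouriseToyDoc(int startPos, int length, int initStyle, ToyStyler &styler) {
    int state = initStyle;
    const int lengthDoc = startPos + length;
    styler.next = startPos;
    for (int i = startPos; i < lengthDoc; i++) {
        const char ch = styler.SafeGetCharAt(i);
        switch (state) {
        case TOY_DEFAULT:
            if (ch == '#' || ch == '"') {
                styler.ColourTo(i - 1, state);  // close the default run
                state = (ch == '#') ? TOY_COMMENT : TOY_STRING;
            }
            break;
        case TOY_COMMENT:
            if (ch == '\n') {                   // a comment ends before the newline
                styler.ColourTo(i - 1, state);
                state = TOY_DEFAULT;
            }
            break;
        case TOY_STRING:
            if (ch == '"') {                    // the closing quote belongs to the string
                styler.ColourTo(i, state);
                state = TOY_DEFAULT;
            }
            break;
        }
    }
    styler.ColourTo(lengthDoc - 1, state);      // colour whatever is left
}

// Convenience wrapper: lex a whole text starting in the default state.
std::vector<int> ColouriseToyText(const std::string &text) {
    ToyStyler styler(text);
    ColouriseToyDoc(0, static_cast<int>(text.size()), TOY_DEFAULT, styler);
    return styler.styles;
}
```

Each handler looks at one character and calls ColourTo only when the
state changes, which is the pattern the skeleton above describes.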


An alternative would be to use a "state-based" approach. The outer loop
would iterate over states, like this:

    lengthDoc = startPos + length;
    for (unsigned int i = startPos; ; ) {
        char ch = styler.SafeGetCharAt(i);
        int new_state = 0;
        switch (state) {
            // Scanners set new_state if they set the next state.
            case state_1: << scan to the end of state 1 >> break;
            case state_2: << scan to the end of state 2 >> break;
            case default_state:
                << scan to the next non-default state and set new_state >>
        }
        styler.ColourTo(i, state);
        if (i >= lengthDoc) break;
        if (!new_state) {
            ch = styler.SafeGetCharAt(i);
            << set state based on ch in the default state >>
        }
    }
    styler.ColourTo(lengthDoc - 1, state);

This approach might seem to be more natural. State scanners are simpler
than character scanners because less needs to be done. For example,
there is no need to test for the start of a C string inside the scanner
for a C comment. This approach also makes it natural to define routines
that could be used by more than one scanner; for example, a
scanToEndOfLine routine.
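
A state-based lexer might look roughly like the sketch below, written
for a made-up toy language with '#' line comments and double-quoted
strings. All names here are assumptions for illustration: ToyStyler is
a hypothetical stand-in for Accessor, the TOY_* states are invented,
and apart from scanToEndOfLine, which the text above suggests, the
scanner names are made up.

```cpp
#include <string>
#include <vector>

enum { TOY_DEFAULT = 0, TOY_COMMENT = 1, TOY_STRING = 2 };  // invented states

struct ToyStyler {  // hypothetical stand-in for Accessor
    std::string text;
    std::vector<int> styles;
    int next = 0;
    explicit ToyStyler(const std::string &t) : text(t), styles(t.size(), 0) {}
    char SafeGetCharAt(int i) const {
        return (i >= 0 && i < static_cast<int>(text.size())) ? text[i] : ' ';
    }
    void ColourTo(int i, int state) {
        for (; next <= i && next < static_cast<int>(styles.size()); ++next)
            styles[next] = state;
    }
};

// Each scanner returns the position of the last character of its run.
int scanToEndOfLine(const ToyStyler &s, int from, int end) {
    int p = from;
    while (p < end && s.SafeGetCharAt(p) != '\n') p++;
    return p - 1;                      // the newline itself is not included
}
int scanToClosingQuote(const ToyStyler &s, int from, int end) {
    int p = from;
    while (p < end && s.SafeGetCharAt(p) != '"') p++;
    return (p < end) ? p : end - 1;    // an unterminated string runs to the end
}
int scanDefault(const ToyStyler &s, int from, int end) {
    int p = from;                      // always consumes at least one character
    while (p + 1 < end && s.SafeGetCharAt(p + 1) != '#' &&
           s.SafeGetCharAt(p + 1) != '"') p++;
    return p;
}

void ColouriseToyDocByState(int startPos, int length, int initStyle, ToyStyler &styler) {
    const int lengthDoc = startPos + length;
    styler.next = startPos;
    int state = initStyle;
    for (int i = startPos; i < lengthDoc; ) {
        bool atOpeningQuote = false;
        if (state == TOY_DEFAULT) {    // choose the state from the current character
            const char ch = styler.SafeGetCharAt(i);
            state = (ch == '#') ? TOY_COMMENT : (ch == '"') ? TOY_STRING : TOY_DEFAULT;
            atOpeningQuote = (state == TOY_STRING);
        }
        int last = i;
        switch (state) {
        case TOY_COMMENT: last = scanToEndOfLine(styler, i, lengthDoc); break;
        case TOY_STRING:
            last = scanToClosingQuote(styler, atOpeningQuote ? i + 1 : i, lengthDoc);
            break;
        default: last = scanDefault(styler, i, lengthDoc); break;
        }
        styler.ColourTo(last, state);  // one call per run
        i = last + 1;
        state = TOY_DEFAULT;
    }
}

std::vector<int> ColouriseToyTextByState(const std::string &text) {
    ToyStyler styler(text);
    ColouriseToyDocByState(0, static_cast<int>(text.size()), TOY_DEFAULT, styler);
    return styler.styles;
}
```

The comment scanner never has to test for a quote and the string scanner
never has to test for '#', which is the simplification described above.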

However, the special cases handled in the main loop in the
character-based approach would have to be handled by each state scanner,
so both approaches have advantages. These special cases are discussed
below.

Special case: Lead characters

Lead bytes are part of DBCS processing for languages such as Japanese
using an encoding such as Shift-JIS. In these encodings, extended
(16-bit) characters are encoded as a lead byte followed by a trail byte.

Lead bytes are rarely of any lexical significance, normally only being
allowed within strings and comments. In such contexts, lexers should
ignore ch if styler.IsLeadByte(ch) returns TRUE.
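
The reason lead bytes matter is that a trail byte may have the same
value as an ASCII character the lexer cares about. The sketch below
illustrates this with a string scanner in which a backslash escapes the
next byte. It does not use Scintilla: IsLeadByteSJIS is a hypothetical
helper using the commonly documented Shift-JIS lead-byte ranges, which
is an assumption here, and FindClosingQuote is likewise invented for
the example.

```cpp
#include <string>

// Assumed Shift-JIS lead-byte ranges: 0x81-0x9F and 0xE0-0xFC.
// Hypothetical helper; a real lexer would ask styler.IsLeadByte(ch).
bool IsLeadByteSJIS(char ch) {
    const unsigned char uch = static_cast<unsigned char>(ch);
    return (uch >= 0x81 && uch <= 0x9F) || (uch >= 0xE0 && uch <= 0xFC);
}

// Return the index of the quote that closes the string opened at
// position open, or -1 if there is none. A backslash escapes the next byte.
int FindClosingQuote(const std::string &text, int open, bool honourLeadBytes) {
    const int n = static_cast<int>(text.size());
    for (int i = open + 1; i < n; i++) {
        const char ch = text[i];
        if (honourLeadBytes && IsLeadByteSJIS(ch)) {
            i++;        // skip the trail byte: it is never lexically significant
            continue;
        }
        if (ch == '\\') {
            i++;        // skip the escaped byte
            continue;
        }
        if (ch == '"')
            return i;
    }
    return -1;
}

// A quoted string holding one two-byte character whose trail byte is
// 0x5C, the same value as an ASCII backslash, followed by " x".
std::string SampleShiftJIS() {
    return "\"\x95\\\" x";
}
```

With honourLeadBytes set, the scanner finds the closing quote at index
3. Without it, the trail byte is taken for a backslash that escapes the
closing quote, and no end of string is found.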

Note: UTF-8 is simpler than Shift-JIS, so no special handling is
applied for it. Every byte of a multi-byte UTF-8 character is >= 128,
and none is lexically significant in programming languages which, so
far, use only ASCII characters for operators, comment markers, etc.


Special case: Folding

Folding may be performed in the lexer function. It is better to use a
separate folder function as that avoids some troublesome interaction
between styling and folding. The folder function will be run after the
lexer function if folding is enabled. The rest of this section explains
how to perform folding within the lexer function.

During initialization, lexers that support folding set

    bool fold = styler.GetPropertyInt("fold");

If folding is enabled in the editor, fold will be TRUE and the lexer
should call:

    styler.SetLevel(line, level);

at the end of each line and just before exiting.

The line parameter is simply the count of the number of newlines seen.
Its initial value is styler.GetLine(startPos) and it is incremented
(after calling styler.SetLevel) whenever a newline is seen.

The level parameter is the desired indentation level in the low 12 bits,
along with flag bits in the upper four bits. The indentation level
depends on the language. For C++, it is incremented when the lexer sees
a '{' and decremented when the lexer sees a '}' (outside of strings and
comments, of course).

The following flag bits, defined in Scintilla.h, may be set or cleared
in the level parameter. The SC_FOLDLEVELWHITEFLAG flag is set if the
lexer considers that the line contains nothing but whitespace. The
SC_FOLDLEVELHEADERFLAG flag indicates that the line is a fold point.
This normally means that the next line has a greater level than the
present line. However, the lexer may have some other basis for
determining a fold point. For example, a lexer might create a header
line for the first line of a function definition rather than the last.

The SC_FOLDLEVELNUMBERMASK mask denotes the level number in the low 12
bits of the level parameter. This mask may be used to isolate either
flags or level numbers.
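
The bit layout can be sketched as follows. The constant values are
copied here so that the example stands alone; they are believed to
match Scintilla.h but should be checked against the header of the
version in use. MakeLevel, LevelNumber and LevelFlags are hypothetical
helpers, not part of Scintilla.

```cpp
// Values believed to match Scintilla.h (an assumption; verify locally).
const int SC_FOLDLEVELBASE = 0x400;
const int SC_FOLDLEVELWHITEFLAG = 0x1000;
const int SC_FOLDLEVELHEADERFLAG = 0x2000;
const int SC_FOLDLEVELNUMBERMASK = 0x0FFF;

// Combine a level number (low 12 bits) with the two flags.
int MakeLevel(int number, bool header, bool blank) {
    int level = number & SC_FOLDLEVELNUMBERMASK;
    if (header) level |= SC_FOLDLEVELHEADERFLAG;
    if (blank) level |= SC_FOLDLEVELWHITEFLAG;
    return level;
}

// Isolate the level number: keep only the low 12 bits.
int LevelNumber(int level) {
    return level & SC_FOLDLEVELNUMBERMASK;
}

// Isolate the flags: clear the low 12 bits.
int LevelFlags(int level) {
    return level & ~SC_FOLDLEVELNUMBERMASK;
}
```

For a header line one level below the base, MakeLevel(SC_FOLDLEVELBASE
+ 1, true, false) packs the number 0x401 and the header flag into a
single int, and the two accessors take it apart again.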

For example, the C++ lexer contains the following code when a newline is
seen:

    if (fold) {
        int lev = levelPrev;

        // Set the "all whitespace" bit if the line is blank.
        if (visChars == 0)
            lev |= SC_FOLDLEVELWHITEFLAG;

        // Set the "header" bit if needed.
        if ((levelCurrent > levelPrev) && (visChars > 0))
            lev |= SC_FOLDLEVELHEADERFLAG;
        styler.SetLevel(lineCurrent, lev);

        // Reinitialize the folding vars describing the present line.
        lineCurrent++;
        visChars = 0;  // Number of non-whitespace characters on the line.
        levelPrev = levelCurrent;
    }

The following code appears in the C++ lexer just before exit:

    // Fill in the real level of the next line, keeping the current flags
    // as they will be filled in later.
    if (fold) {
        // Mask off the level number, leaving only the previous flags.
        int flagsNext = styler.LevelAt(lineCurrent);
        flagsNext &= ~SC_FOLDLEVELNUMBERMASK;
        styler.SetLevel(lineCurrent, levelPrev | flagsNext);
    }
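
To see the two fragments working together, the sketch below runs them
over a whole text and returns the level stored for each line. It is a
simplified illustration, not the C++ lexer: LevelStore is a
hypothetical stand-in for the styler's level storage, the constant
values are assumed to match Scintilla.h, and braces inside strings and
comments are not excluded.

```cpp
#include <string>
#include <vector>

// Values believed to match Scintilla.h (an assumption; verify locally).
const int SC_FOLDLEVELBASE = 0x400;
const int SC_FOLDLEVELWHITEFLAG = 0x1000;
const int SC_FOLDLEVELHEADERFLAG = 0x2000;
const int SC_FOLDLEVELNUMBERMASK = 0x0FFF;

// Hypothetical stand-in for the styler's per-line level storage.
struct LevelStore {
    std::vector<int> levels;
    int LevelAt(int line) const {
        return line < static_cast<int>(levels.size()) ? levels[line]
                                                      : SC_FOLDLEVELBASE;
    }
    void SetLevel(int line, int level) {
        if (line >= static_cast<int>(levels.size()))
            levels.resize(line + 1, SC_FOLDLEVELBASE);
        levels[line] = level;
    }
};

// Compute fold levels for every line of text, counting '{' and '}'.
std::vector<int> FoldBraces(const std::string &text) {
    LevelStore styler;
    int lineCurrent = 0;
    int levelPrev = SC_FOLDLEVELBASE;  // level at the start of the line
    int levelCurrent = levelPrev;      // level after the characters seen so far
    int visChars = 0;                  // non-whitespace characters on the line

    for (char ch : text) {
        if (ch == '{') levelCurrent++;
        else if (ch == '}') levelCurrent--;

        if (ch == '\n') {
            int lev = levelPrev;
            if (visChars == 0)
                lev |= SC_FOLDLEVELWHITEFLAG;
            if ((levelCurrent > levelPrev) && (visChars > 0))
                lev |= SC_FOLDLEVELHEADERFLAG;
            styler.SetLevel(lineCurrent, lev);
            lineCurrent++;
            visChars = 0;
            levelPrev = levelCurrent;
        } else if (ch != ' ' && ch != '\t' && ch != '\r') {
            visChars++;
        }
    }

    // Before exit: set the level of the line in progress, keeping its flags.
    int flagsNext = styler.LevelAt(lineCurrent);
    flagsNext &= ~SC_FOLDLEVELNUMBERMASK;
    styler.SetLevel(lineCurrent, levelPrev | flagsNext);
    return styler.levels;
}
```

For the text "f() {\n  x;\n\n}\n" the first line becomes a header at the
base level, the next three lines sit one level deeper (the empty one
also carries the whitespace flag), and the line after the closing brace
returns to the base level.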


Don't worry about performance

The writer of a lexer may safely ignore performance considerations: the
cost of redrawing the screen is several orders of magnitude greater than
the cost of function calls, etc. Moreover, Scintilla performs all the
important optimizations; Scintilla ensures that a lexer will be called
only to recolor text that actually needs to be recolored. Finally, it
is not necessary to avoid extra calls to styler.ColourTo: the styler
object buffers calls to ColourTo to avoid multiple updates of the
screen.

Page contributed by Edward K. Ream