/
cocci_syntax.tex
1936 lines (1627 loc) · 71.2 KB
/
cocci_syntax.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
%\section{The SmPL Grammar}
% This section presents the SmPL grammar. This definition follows closely
% our implementation using the Menhir parser generator \cite{menhir}.
This document presents the grammar of the SmPL language used by the
\href{http://coccinelle.lip6.fr/}{Coccinelle tool}. For the most
part, the grammar is written using standard notation. In some rules,
however, the left-hand side is in all uppercase letters. These are
macros, which take one or more grammar rule right-hand-sides as
arguments. The grammar also uses some unspecified nonterminals, such
as \T{id}, \T{const}, etc. These refer to the sets suggested by
the name, {\em i.e.}, \T{id} refers to the set of possible
C-language identifiers, while \T{const} refers to the set of
possible C-language constants.
A square bracket that is surrounded by spaces in the description of a term
should appear explicitly in the term, as in an array reference. On the
other hand, square brackets that surround some other term indicate that the
presence of that term is optional.
%
\ifhevea
A PDF version of this documentation is available at
\url{http://coccinelle.lip6.fr/docs/main_grammar.pdf}.
\else
An HTML version of this documentation is available online at
\url{http://coccinelle.lip6.fr/docs/main_grammar.html}.
\fi
\section{Program}
\begin{grammar}
\RULE{\rt{program}}
\CASE{\any{\NT{include\_cocci}} \some{\NT{changeset}}}
\RULE{\rt{include\_cocci}}
\CASE{\#include \NT{string}}
\CASE{using \NT{string}}
\CASE{using \NT{pathToIsoFile}}
\CASE{virtual \T{id} \ANY{, \T{id}}}
\RULE{\rt{changeset}}
\CASE{\NT{metavariables} \NT{transformation}}
\CASE{\NT{script\_metavariables} \T{script\_code}}
% \CASE{\NT{metavariables} \ANY{--- filename +++ filename} \NT{transformation}}
\end{grammar}
\noindent
\T{script\_code} is any code in the chosen scripting language. Parsing of
the semantic patch does not check the validity of this code; any errors are
first detected when the code is executed. Furthermore, \texttt{@} should
not be used in this code. Spatch scans the script code for the next
\texttt{@} and considers that to be the beginning of the next rule, even if
\texttt{@} occurs within e.g., a comment.
\texttt{virtual} keyword is used to declare virtual rules. Virtual
rules may be subsequently used as a dependency for the rules in the
SmPL file. Whether a virtual rule is defined or not is controlled by
the \texttt{-D} option on the command line.
% Between the metavariables and the transformation rule, there can be a
% specification of constraints on the names of the old and new files,
% analogous to the filename specifications in the standard patch syntax.
% (see Figure \ref{scsiglue_patch}).
\section{Metavariables for Transformations}
The \NT{rulename} portion of the metavariable declaration can specify
properties of a rule such as its name, the names of the rules that it
depends on, the isomorphisms to be used in processing the rule, and whether
quantification over paths should be universal or existential. The optional
annotation {\tt expression} indicates that the pattern is to be considered
as matching an expression, and thus can be used to avoid some parsing
problems.
The \NT{metadecl} portion of the metavariable declaration defines various
types of metavariables that will be used for matching in the transformation
section.
\begin{grammar}
\RULE{\rt{metavariables}}
\CASE{@@ \any{\NT{metadecl}} @@}
\CASE{@ \NT{rulename} @ \any{\NT{metadecl}} @@}
\RULE{\rt{rulename}}
\CASE{\T{id} \OPT{extends \T{id}} \OPT{depends on} \opt{\NT{scope}} \NT{dep} \opt{\NT{iso}}
\opt{\NT{disable-iso}} \opt{\NT{exists}} \opt{\NT{rulekind}}}
\RULE{\rt{scope}}
\CASE{\T{exists}}
\CASE{\T{forall}}
\RULE{\rt{dep}}
\CASE{\T{id}}
\CASE{!\T{id}}
\CASE{!(\NT{dep})}
\CASE{ever \T{id}}
\CASE{never \T{id}}
\CASE{\NT{dep} \&\& \NT{dep}}
\CASE{\NT{dep} || \NT{dep}}
\CASE{file in \NT{string}}
\CASE{(\NT{dep})}
\RULE{\rt{iso}}
\CASE{using \NT{string} \ANY{, \NT{string}}}
\RULE{\rt{disable-iso}}
\CASE{disable \NT{COMMA\_LIST}\mth{(}\T{id}\mth{)}}
\RULE{\rt{exists}}
\CASE{exists}
\CASE{forall}
% \CASE{\opt{reverse} forall}
\RULE{\rt{rulekind}}
\CASE{expression}
\CASE{identifier}
\CASE{type}
\RULE{\rt{COMMA\_LIST}\mth{(}\rt{elem}\mth{)}}
\CASE{\NT{elem} \ANY{, \NT{elem}}}
\end{grammar}
The keyword \KW{disable} is normally used with the names of
isomorphisms defined in standard.iso or whatever isomorphism file has been
included. There are, however, some other isomorphisms that are built into
the implementation of Coccinelle and that can be disabled as well. Their
names are given below. In each case, the text describes the standard
behavior. Using \NT{disable-iso} with the given name disables this behavior.
\begin{itemize}
\item \KW{optional\_storage}: A SmPL function definition that does not
specify any visibility (i.e., static or extern), or a SmPL variable
declaration that does not specify any storage (i.e., auto, static,
register, or extern), matches a function declaration or variable
declaration with any visibility or storage, respectively.
\item \KW{optional\_qualifier}: This is similar to \KW{optional\_storage},
except that here it is the qualifier (i.e., const or volatile) that does
not have to be specified in the SmPL code, but may be present in the C code.
\item \KW{optional\_attributes}: This is also similar to
\KW{optional\_storage}, except that here is it an attribute (e.g.,
\_\_init) that does not have to be specified in the SmPL code, but may be
present in the C code.
\item \KW{value\_format}: Integers in various formats, e.g., 1 and 0x1, are
considered to be equivalent in the matching process.
\item \KW{optional\_declarer\_semicolon}: Some declarers (top-level terms
that look like function calls but serve to declare some variable) don't
require a semicolon. This isomorphism allows a SmPL declarer with a semicolon
to match such a C declarer, if no transformation is specified on the SmPL
semicolon.
\item \KW{comm\_assoc}: An expression of the form \NT{exp} \NT{bin\_op}
\KW{...}, where \NT{bin\_op} is commutative and associative, is
considered to match any top-level sequence of \NT{bin\_op} operators
containing \NT{exp} as the top-level argument.
\item \KW{prototypes}: A rule for transforming a function prototype is
generated when a function header changes.
\end{itemize}
The \texttt{depends on} clause indicates conditions under which a semantic
patch rule should be applied. Most of these conditions relate to the
success or failure of other rules, which may be virtual rules. Giving the
name of a rule implies that the current rule is applied if the named rule
has succeeded in matching in the current environment. Giving \texttt{ever}
followed by a rule name implies that the current rule is applied if the
named rule has succeeded in matching in any environment. Analogously,
\texttt{never} means that the named rule should have succeeded in matching
in no environment. The boolean and, or and negation operators combine
these declarations in the usual way. The declaration {\tt file in} checks
that the code being processed comes from the mentioned file, or from a
subdirectory of the directory to which Coccinelle was applied. In the
latter case, the string is matched against the complete pathname. A
trailing {\tt /} is added to the specified subdirectory name, to ensure
that a complete subdirectory name is matched. The
declaration {\tt file in} is only allowed on SmPL code-matching rules.
Script rules are not applied to any code in particular, and thus it doesn't
make sense to check on the file being considered.
As metavariables are bound and inherited across rules, a tree of
environments is built up. A rule is processed only once for all of the
branches that have the same metavariable bindings for the set of variables
that the rule depends on. Different branches, however, may be derived from
the success or failure of different sets of rules. A \texttt{depends on}
clause can further indicate whether the clause should be satisfied for all
the branches (\texttt{forall}) or only for one (\texttt{exists}).
\texttt{exists} is the default. These annotations can for example be
useful when one rule binds a metavariable \texttt{x}, subsequent rules have
the effect of testing good and bad properties of \texttt{x}, and a final
rule may want to ensure that all occurrences of \texttt{x} have the good
property (\texttt{forall}) or none have the bad property
(\texttt{exists}). \texttt{forall} and \texttt{exists} are currently only
supported at top level, not under conjunction and disjunction.
% Once there are references to rule names, there should not be forall and
% exists annotations, so one has to sort out where they are allowed and
% where they are not. The situation is perhaps like the case of path
% operations in temporal logic.
The possible types of metavariable declarations are defined by the grammar
rule below. Metavariables should occur at least once in the transformation
code immediately following their declaration. Fresh identifier
metavariables must only be used in {\tt +} code. These properties are not
expressed in the grammar, but are checked by a subsequent analysis. The
metavariables are designated according to the kind of terms they can match,
such as a statement, an identifier, or an expression. An expression
metavariable can be further constrained by its type. A declaration
metavariable matches the declaration of one or more variables, all sharing
the same type specification ({\em e.g.}, {\tt int a,b,c=3;}). A field
metavariable does the same, but for structure fields. In the minus code, a
statement list metavariable can only appear as a complete function body or
as the complete body of a sequence statement. In the plus code, a
statement list metavariable can occur anywhere a statement list is allowed,
i.e., including as an element of another statement list.
\begin{grammar}
\RULE{\rt{metadecl}}
\CASE{metavariable \NT{ids} ;}
\CASE{fresh identifier \NT{ids} ;}
\CASE{identifier \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_regexp}\mth{)} ;}
\CASE{identifier \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_virt\_or\_not\_eq}\mth{)} ;}
\CASE{parameter \opt{list} \NT{ids} ;}
\CASE{parameter list [ \NT{id} ] \NT{ids} ;}
\CASE{parameter list [ \NT{const} ] \NT{ids} ;}
\CASE{identifier \opt{list} \NT{ids} ;}
\CASE{identifier list [ \NT{id} ] \NT{ids} ;}
\CASE{identifier list [ \NT{const} ] \NT{ids} ;}
\CASE{type \NT{ids} ;}
\CASE{statement \opt{list} \NT{ids} ;}
\CASE{declaration \NT{ids} ;}
\CASE{field \opt{list} \NT{ids} ;}
\CASE{typedef \NT{ids} ;}
\CASE{attribute name \NT{ids} ;}
\CASE{declarer name \NT{ids} ;}
% \CASE{\opt{local} function \NT{pmid\_with\_not\_eq\_list} ;}
\CASE{declarer \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_regexp}\mth{)} ;}
\CASE{declarer \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;}
\CASE{iterator name \NT{ids} ;}
\CASE{iterator \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_regexp}\mth{)} ;}
\CASE{iterator \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;}
% \CASE{error \NT{pmid\_with\_not\_eq\_list} ; }
\CASE{\opt{local \mth{\mid} global} idexpression \opt{\NT{ctype}} \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;}
\CASE{\opt{local \mth{\mid} global} idexpression \OPT{\ttlb \NT{ctypes}\ttrb~\any{*}} \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;}
\CASE{\opt{local \mth{\mid} global} idexpression \some{*} \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;}
\CASE{expression list \NT{ids} ;}
\CASE{expression \some{*} \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;}
\CASE{expression enum \any{*} \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;}
\CASE{expression struct \any{*} \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;}
\CASE{expression union \any{*} \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;}
\CASE{expression \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_ceq}\mth{)} ;}
\CASE{expression list [ \NT{id} ] \NT{ids} ;}
\CASE{expression list [ \NT{const} ] \NT{ids} ;}
\CASE{\NT{ctype} [ ] \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;}
\CASE{\NT{ctype} \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_ceq}\mth{)} ;}
\CASE{\ttlb \NT{ctypes}\ttrb~\any{*} \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_ceq}\mth{)} ;}
\CASE{\ttlb \NT{ctypes}\ttrb~\any{*} [ ] \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;}
\CASE{constant \opt{\NT{ctype}} \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;}
\CASE{constant \OPT{\ttlb \NT{ctypes}\ttrb~\any{*}} \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq}\mth{)} ;}
\CASE{position \opt{any} \NT{COMMA\_LIST}\mth{(}\NT{pmid\_with\_not\_eq\_mid}\mth{)} ;}
\CASE{symbol \NT{ids};}
\CASE{format \NT{ids};}
\CASE{format list [ \NT{id} ] \NT{ids} ;}
\CASE{format list [ \NT{const} ] \NT{ids} ;}
\CASE{assignment operator \NT{COMMA\_LIST}\mth{(}\T{assignopdecl}\mth{)} ;}
\CASE{binary operator \NT{COMMA\_LIST}\mth{(}\T{binopdecl}\mth{)} ;}
\RULE{\rt{assignopdecl}}
\CASE{\NT{id} \OPT{ = \NT{assignop\_constraint}}}
\RULE{\rt{assignop\_constraint}}
\CASE{\mth{\{}\NT{COMMA\_LIST}\mth{(}\NT{assign\_op}\mth{)}\mth{\}}}
\CASE{\NT{assign\_op}}
\RULE{\rt{binopdecl}}
\CASE{\NT{id} \OPT{ = \NT{binop\_constraint}}}
\RULE{\rt{binop\_constraint}}
\CASE{\mth{\{}\NT{COMMA\_LIST}\mth{(}\NT{bin\_op}\mth{)}\mth{\}}}
\CASE{\NT{bin\_op}}
\end{grammar}
A metavariable declaration {\bf local idexpression} v means that v is an
expression that is restricted to be a local variable. If it should just be
a variable, but not necessarily a local one, then drop local. A more
complex description of a location, such as a->b is considered to be an
expression, not an idexpression.
{\bf Constant} is for constants, such as 27. But it also considers an
identifier that is all capital letters (possibly containing numbers) as a
constant as well, because the names given to macros in Linux usually have
this form.
An {\bf identifier} is the name of a structure field, a macro, a function,
or a variable. It is the name of something rather than an expression that
has a value. But an identifier can be used in the position of an
expression as well, where it represents a variable.
It is possible to specify that an {\bf expression list} or a {\bf parameter
list} metavariable should match a specific number of expressions or
parameters.
An {\bf identifier list} is only used for the parameter list of a macro.
It is possible to specify its length.
It is possible to specify some information about the definition of a {\bf
fresh identifier}. Examples are found in {\tt demos/plusplus1.cocci} and
{\tt demos/plusplus2.cocci} %See the wiki.
A {\bf symbol} declaration specifies that the provided identifiers should
be considered C identifiers when encountered in the body of the rule.
Identifiers in the body of the rule that are not declared explicitly are by
default considered symbols, thus symbol declarations are optional. It is
not required, but it will not cause a parse error, to redeclare a name as a
symbol. A name declared as a symbol can, however, be redeclared as another
metavariable. It will be considered to be a metavariable in such rules,
and will revert to being a symbol in subsequent rules. These conditions
also apply to iterator names and declarer names.
An {\bf attribute name} declaration indicates a name that should be
considered to be an attribute.
A {\bf position} metavariable is used by attaching it using \texttt{@} to
any token, including another metavariable. Its value is the position
(file, line number, etc.) of the code matched by the token. It is also
possible to attach expression, declaration, type, initialiser, and
statement metavariables in this manner. In that case, the metavariable is
bound to the closest enclosing expression, declaration, etc. If such a
metavariable is itself followed by a position metavariable, the position
metavariable applies to the metavariable that it follows, and not to the
attached token. This makes it possible to get eg the starting and ending
position of {\tt f(...)}, by writing {\tt f(...)@E@p}, for expression
metavariable {\tt E} and position metavariable {\tt p}. This attachment
notation for metavariables of type other than position can also be
expressed with a conjunction, but the @ notation may be more concise.
When used, a {\bf format} or {\bf format list} metavariable must be
enclosed by a pair of \texttt{@}s. A format metavariable matches the
format descriptor part, i.e., \texttt{2x} in \texttt{\%2x}. A format list
metavariable matches a sequence of format descriptors as well as the text
between them. Any text around them is matched as well, if it is not
matched by the surrounding text in the semantic patch. Such text is not
partially matched. If the length of the format list is specified, that
indicates the number of matched format descriptors. It is also possible to
use \texttt{\ldots} in a format string, to match a sequence of text
fragments and format descriptors. This only takes effect if the format
string contains format descriptors. Note that this makes it impossible to
require \texttt{\ldots} to match exactly in a string, if the semantic patch
string contains format descriptors. If that is needed, some processing
with a scripting language would be required. And example for the use of
string format metavariables is found in {\tt demos/format.cocci}.
Matching of various kinds of format strings within strings is supported.
With the {\tt -{}-ibm} option, matching of decimal format declarations is
supported, but the length and precision arguments are not interpreted.
Thus it is not possible to match metavariables in these fields. Instead,
the entire format is matched as a single string.
{\bf Assignment} (resp.~binary) {|bf operator} metavariables match any
assignment (resp. binary) operator. The list of operators that can be
matched can be restricted by adding an operator constraint, i.e. a list of
accepted operators.
Other kinds of metavariables can also be attached using \texttt{@} to any
token. In this case, the metavariable floats up to the enclosing
appropriate expression. For example, {\tt 3 +@E 4}, where {\tt E} is an
expression metavariable binds {\tt E} to {\tt 3 + 4}. A particular case is
{\tt Ps@Es}, where {\tt Ps} is a parameter list and {\tt Es} is an
expression list. This pattern matches a parameter list, and then matches
{\tt Es} to the list of expressions, ie a possible argument list,
represented by the names of the parameters. Another particular case is
{\tt E@S}, where {\tt E} is any expression and {\tt S} is a statement
metavariable. {\tt S} matches the closest enclosing statement, which may
be more than what is matches by the semantic match pattern itself.
\begin{grammar}
\RULE{\rt{ids}}
\CASE{\NT{COMMA\_LIST}\mth{(}\NT{pmid}\mth{)}}
\RULE{\rt{pmid}}
\CASE{\T{id}}
\CASE{\NT{mid}}
% \CASE{list}
% \CASE{error}
% \CASE{type}
\RULE{\rt{mid}} \CASE{\T{rulename\_id}.\T{id}}
\RULE{\rt{pmid\_with\_regexp}}
\CASE{\NT{pmid} =\~{} \NT{regexp}}
\CASE{\NT{pmid} !\~{} \NT{regexp}}
\RULE{\rt{pmid\_with\_not\_eq}}
\CASE{\NT{pmid} \OPT{!= \NT{id\_or\_meta}}}
\CASE{\NT{pmid}
\OPT{!= \ttlb~\NT{COMMA\_LIST}\mth{(}\NT{id\_or\_meta}\mth{)} \ttrb}}
\RULE{\rt{pmid\_with\_virt\_or\_not\_eq}}
\CASE{virtual.\T{id}}
\CASE{\NT{pmid\_with\_not\_eq}}
\RULE{\rt{pmid\_with\_not\_ceq}}
\CASE{\NT{pmid} \OPT{!= \NT{id\_or\_cst}}}
\CASE{\NT{pmid} \OPT{!= \ttlb~\NT{COMMA\_LIST}\mth{(}\NT{id\_or\_cst}\mth{)} \ttrb}}
\RULE{\rt{id\_or\_cst}}
\CASE{\T{id}}
\CASE{\T{integer}}
\RULE{\rt{id\_or\_meta}}
\CASE{\T{id}}
\CASE{\T{rulename\_id}.\T{id}}
\RULE{\rt{pmid\_with\_not\_eq\_mid}}
\CASE{\NT{pmid} \OPT{\NT{ANDAND\_LIST}\mth{(}\NT{script\_constraint}\mth{)}}}
\RULE{\rt{script\_constraint}}
\CASE{!= \NT{mid}}
\CASE{!= \ttlb~\NT{COMMA\_LIST}\mth{(}\NT{mid}\mth{)} \ttrb}
\CASE{: script:ocaml
(\NT{COMMA\_LIST}\mth{(} \NT{mid} \mth{)})
\ttlb \NT{expr} \ttrb}
\CASE{: script:python
(\NT{COMMA\_LIST}\mth{(} \NT{mid} \mth{)})
\ttlb \NT{expr} \ttrb}
\RULE{\rt{ANDAND\_LIST}\mth{(X)}}
\CASE{\mth{X}}
\CASE{\mth{X} \&\& \NT{ANDAND\_LIST}\mth{(X)}}
\end{grammar}
Subsequently, we refer to arbitrary metavariables as
\mth{\msf{metaid}^{\mbox{\scriptsize{\it{ty}}}}}, where {\it{ty}}
indicates the {\it metakind} used in the declaration of the variable.
For example, \mth{\msf{metaid}^{\ssf{Type}}} refers to a metavariable
that was declared using \texttt{type} and stands for any type.
{\tt metavariable} declares a metavariable for which the parser tried to
figure out the metavariable type based on the usage context. Such a
metavariable must be used consistently. These metavariables cannot be used
in all contexts; specifically, they cannot be used in context that would
make the parsing ambiguous. Some examples are the leftmost term of an
expression, such as the left-hand side of an assignment, or the type in a
variable declaration. These restrictions may seem somewhat arbitrary from
the user's point of view. Thus, it is better to use metavariables with
metavariable types. If Coccinelle is given the argument {\tt
-{}-parse-cocci}, it will print information about the type that is inferred
for each metavariable.
The \NT{ctype} and \NT{ctypes} nonterminals are used by both the grammar of
metavariable declarations and the grammar of transformations, and are
defined on page~\pageref{types}.
An identifier metavariable with {\tt virtual} as its ``rule name'' is given
a value on the command line. For example, if a semantic patch contains a
rule that declares an identifier metavariable with the name {\tt
virtual.alloc}, then the command line could contain {\tt -D
alloc=kmalloc}. There should not be space around the {\tt =}. An
example is in {\tt demos/vm.cocci} and {\tt demos/vm.c}.
It is possible to give an identifier metavariable a list of constraints
that it should or should not be equal to. If the constraint is a list of
(unquoted) strings, then the value of the metavariable should be the same
as one of the strings, in the case of an equality constraint, or different
from all of the strings, in the case of an inequality constraint. It is
also possible to include inherited identifier metavariables among the
constraints. In the case of a positive constraint, things work in the same
way, but not with respect to the inherited value of the metavariable. On
the other hand, an inequality constraint does not work so well, because the
only value available is the one available in the current environment. If
the proposed value is different from the one in the current environment,
but perhaps the same as the one in some other environment, the match will
still succeed.
Metavariables can be associated with constraints implemented as OCaml or
python script code. The form of the code is somewhat restricted, due to
the fact that it passes through the Coccinelle semantic patch lexer, before
being converted back to a string to be passed to the scripting language
interpreter. It is thus best to avoid complicated code in the constraint
itself, and instead to define relevant functions in and {\tt initialize}
rule. The code must represent an expression that has type bool in the
scripting language. The script code can be parameterized by any inherited
metavariables. It is implicitly parameterized by the metavariable being
declared. In the script, the inherited metavariable parameters are
referred to by their variable names, without the associated rule name. The
script code can also be parameterized by metavariables defined previously
in the same rule. Such metavariables must always all be mentioned in the
same ``rule elem'' as the metavariable to which the constraint applies.
Such a rule elem must also not contain disjunctions, after disjunction
lifting. The result of disjunction lifting can be observed using {\tt
-{}-parse-cocci}. A rule elem is eg an atomic statement, such as a
return or an assignment, or a loop header, if header, etc. The variable
being declared can also be referenced in the script code by its name. All
parameters, except position variables, have their string representation.
An example is in {\tt demos/poscon.cocci}.
Script constraints may be executed more than once for a given metavariable
binding. Executing the script constraint does not guarantee that the
complete match will work out; the constraints are executed within the
matching process.
A declaration of a name as a typedef extends through the rest of the
semantic patch. It is not required, but it will not cause a parse error,
to redeclare a name as a typedef. A name declared as a typedef can,
however, be redeclared as another metavariable. It will be considered to
be a metavariable in such rules, and will revert to being a typedef in
subsequent rules.
\paragraph*{Warning:} Each metavariable declaration causes the declared
metavariables to be immediately usable, without any inheritance
indication. Thus the following are correct:
\begin{quote}
\begin{verbatim}
@@
type r.T;
T x;
@@
[...] // some semantic patch code
\end{verbatim}
\end{quote}
\begin{quote}
\begin{verbatim}
@@
r.T x;
type r.T;
@@
[...] // some semantic patch code
\end{verbatim}
\end{quote}
\noindent
But the following is not correct:
\begin{quote}
\begin{verbatim}
@@
type r.T;
r.T x;
@@
[...] // some semantic patch code
\end{verbatim}
\end{quote}
This applies to position variables, type metavariables, identifier
metavariables that may be used in specifying a structure type, and
metavariables used in the initialization of a fresh identifier. In the
case of a structure type, any identifier metavariable indeed has to be
declared as an identifier metavariable in advance. The syntax does not
permit {\tt r.n} as the name of a structure or union type in such a
declaration.
\section{Metavariables for Scripts}
Metavariables for scripts can only be inherited from transformation rules.
In the spirit of scripting languages such as Python that use dynamic
typing, metavariables for scripts do not include type declarations.
\begin{grammar}
\RULE{\rt{script\_metavariables}}
\CASE{@ script:\NT{language} \OPT{\NT{rulename}} \OPT{depends on \NT{dep}} @
\any{\NT{script\_metadecl}} @@}
\CASE{@ initialize:\NT{language} \OPT{depends on \NT{dep}} @
\any{\NT{script\_virt\_metadecl}} @@}
\CASE{@ finalize:\NT{language} \OPT{depends on \NT{dep}} @
\any{\NT{script\_virt\_metadecl}} @@}
\RULE{\rt{language}} \CASE{python} \CASE{ocaml}
\RULE{\rt{script\_metadecl}}
\CASE{\T{id} <{}< \T{rulename\_id}.\T{id} ;}
\CASE{\T{id} <{}< \T{rulename\_id}.\T{id} = "..." ;}
\CASE{\T{id} <{}< \T{rulename\_id}.\T{id} = [] ;}
\CASE{\T{id} ;}
\RULE{\rt{script\_virt\_metadecl}}
\CASE{\T{id} <{}< virtual.\T{id} ;}
\end{grammar}
Currently, the only scripting languages that are supported are Python and
OCaml, indicated using {\tt python} and {\tt ocaml}, respectively. The
set of available scripting languages may be extended at some point.
Script rules declared with \KW{initialize} are run before the treatment of
any file. Script rules declared with \KW{finalize} are run when the
treatment of all of the files has completed. There can be at most one of
each per scripting language (thus currently at most one of each).
Initialize and finalize script rules do not have access to SmPL
metavariables. Nevertheless, a finalize script rule can access any
variables initialized by the other script rules, allowing information to be
transmitted from the matching process to the finalize rule.
Initialize and finalize rules do have access to virtual metavariables,
using the usual syntax. As for other scripting language rules, the rule
is not run (and essentially does not exist) if some of the required virtual
metavariables are not bound. In ocaml, a warning is printed in this case.
An example is found in {\tt demos/initvirt.cocci}.
A script metavariable that does not specify an origin, using \texttt{<<},
is newly declared by the script. This metavariable should be assigned to a
string and can be inherited by subsequent rules as an identifier. In
Python, the assignment of such a metavariable $x$ should refer to the
metavariable as {\tt coccinelle.\(x\)}. Examples are in the files
\texttt{demos/pythontococci.cocci} and \texttt{demos/camltococci.cocci}.
In an OCaml script, the following extended form of \textit{script\_metadecl}
may be used:
\begin{grammar}
\RULE{\rt{script\_metadecl'}}
\CASE{(\T{id},\T{id}) <{}< \T{rulename\_id}.\T{id} ;}
\CASE{\T{id} <{}< \T{rulename\_id}.\T{id} ;}
\CASE{\T{id} ;}
\end{grammar}
\noindent
In a declaration of the form \texttt{(\T{id},\T{id}) <{}<
\T{rulename\_id}.\T{id} ;}, the left component of \texttt{(\T{id},\T{id})}
receives a string representation of the value of the inherited metavariable
while the right component receives its abstract syntax tree. The file
\texttt{parsing\_c/ast\_c.ml} in the Coccinelle implementation gives some
information about the structure of the abstract syntax tree. Either the
left or right component may be replaced by \verb+_+, indicating that the
string representation or abstract syntax trees representation is not
wanted, respectively.
The abstract syntax tree of a metavariable declared using {\tt
metavariable} is not available.
Script metavariables can have default values. This is only allowed if the
abstract syntax tree of the metavariable is not requested. The default
value of a position metavariable is written as {\tt []}. The default value
of any other kind of metavariable is a string. There is no control that
the string actually represents the kind of term represented by the
metavariable. Normally, a script rule is only applied if all of the
metavariables have values. If default values are provided, then the script
rule is only applied if all of the metavariables for which there are no
default values have values. See {\tt demos/defaultscript.cocci} for examples of
the use of this feature.
\section{Control Flow}
Rules describe a property that Coccinelle must match, and when the
property described is matched the rule is considered successful. One aspect
that is taken into account in determining a match is the program control
flow. A control flow describes a possible run time path taken by a program.
\subsection{Basic dots}
When using Coccinelle, it is possible to express matches of certain code
within certain types of control flows. Ellipses (``...'') can be used to
indicate to Coccinelle that anything can be present between consecutive
statements. For instance the following SmPL patch tells Coccinelle that
rule r0 wishes to remove all calls to function c().
\begin{center}
\begin{tabular}{c}
\begin{lstlisting}[language=Cocci]
@r0@
@@
-c();
\end{lstlisting}\\
\end{tabular}
\end{center}
The context of the rule provides no other guidelines to Coccinelle
about any possible control flow other than this is a statement, and that
c() must be called. We can modify the required control flow required for
this rule by providing additional requirements and using ellipses in between.
For instance, if we only wanted to remove calls to c() that also
had a prior call to foo() we'd use the following SmPL patch:
\begin{center}
\begin{tabular}{c}
\begin{lstlisting}[language=Cocci]
@r1@
@@
foo()
...
-c();
\end{lstlisting}\\
\end{tabular}
\end{center}
Note that the region matched by ``...'' can be empty.
\subsection{Dot variants}
There are two possible modifiers to the control flow for ellipses, one
(<... ...>) indicates that matching the pattern in between the ellipses is
to be matched 0 or more times, i.e., it is
optional, and another (<+... ...+>) indicates that the pattern in between
the ellipses must be matched at least once, on some control-flow path. In
the latter, the \texttt{+} is intended to be reminiscent of the \texttt{+}
used in regular expressions. For instance, the following SmPL patch tells
Coccinelle to remove all calls to c() if foo() is present at least
once since the beginning of the function.
\begin{center}
\begin{tabular}{c}
\begin{lstlisting}[language=Cocci]
@r2@
@@
<+...
foo()
...+>
-c();
\end{lstlisting}\\
\end{tabular}
\end{center}
Alternatively, the following indicates that foo() is allowed but optional.
This case is typically most useful when all occurrences, if any, of foo()
prior to c() should be transformed.
\begin{center}
\begin{tabular}{c}
\begin{lstlisting}[language=Cocci]
@r3@
@@
<...
foo()
...>
-c();
\end{lstlisting}\\
\end{tabular}
\end{center}
\subsection{An example}
Let's consider some sample code to review: flow1.c.
\begin{center}
\begin{tabular}{c}
\begin{lstlisting}[language=C]
int main(void)
{
int ret, a = 2;
a = foo(a);
ret = bar(a);
c();
return ret;
}
\end{lstlisting}\\
\end{tabular}
\end{center}
Applying the SmPL rule r0 to flow1.c would remove the c() line as the control
flow provides no specific context requirements. Applying rule r1 would also
succeed as the call to foo() is present. Likewise rules r2 and r3 would also
succeed. If the foo() call is removed from flow1.c only rules r0 and r3 would
succeed, as foo() would not be present and only rules r0 and r3 allow for
foo() to not be present.
One way to describe code control flow is in terms of McCabe cyclomatic
complexity.
The program flow1.c has a linear control flow, i.e., it has no
branches. The main
routine has a McCabe cyclomatic complexity of 1. The McCabe cyclomatic
complexity can be computed using
{\tt pmccabe} (https://www.gnu.org/\-software/\-complexity/\-manual/\-html\_node/\-pmccabe-parsing.html).
\begin{center}
\begin{tabular}{c}
\begin{lstlisting}[language=C]
pmccabe /flow1.c
1 1 5 1 10 flow1.c(1): main
\end{lstlisting}\\
\end{tabular}
\end{center}
Since programs can use branches, often times you may also wish to annotate
requirements for control flows in consideration for branches, for when
the McCabe cyclomatic complexity is > 1. The following program, flow2.c,
enables the control flow to diverge on line 7 due to the branch, if (a) --
one control flow possible is if (a) is true, another when if (a) is false.
\begin{center}
\begin{tabular}{c}
\begin{lstlisting}[language=C]
int main(void)
{
int ret, a = 2;
a = foo(a);
ret = bar(a);
if (a)
c();
return ret;
}
\end{lstlisting}\\
\end{tabular}
\end{center}
This program has a McCabe cyclomatic complexity of 2.
\begin{center}
\begin{tabular}{c}
\begin{lstlisting}[language=C]
pmccabe flow2.c
2 2 6 1 11 flow2.c(1): main
\end{lstlisting}\\
\end{tabular}
\end{center}
Using the McCabe cyclomatic complexity is one way to get an idea of
the complexity of the control graph for a function, another way is
to visualize all possible paths. Coccinelle provides a way to visualize
control flows of programs, this however requires {\tt dot}
(http://www.graphviz.org/) and {\tt gv} to be installed (typically provided
by a package called graphviz). To visualize control flow or a program
using Coccinelle you use:
\begin{center}
\begin{tabular}{c}
spatch -{}-control-flow-to-file flow1.c \\
spatch -{}-control-flow-to-file flow2.c
\end{tabular}
\end{center}
%Below are the two generated control flow graphs for flow1.c and flow2.c
%respectively.
%\begin{figure}
% \[\includegraphics[width=\linewidth]{flow1.pdf}\]
% \caption{Linear flow example}
% \label{devmodel}
%\end{figure}
%\begin{figure}
% \[\includegraphics[width=\linewidth]{flow2.pdf}\]
% \caption{Linear flow example}
% \label{devmodel}
%\end{figure}
Behind the scenes this generates a dot file and uses gv to generate
a PDF file for viewing. To generate and inspect these manually you
can use the following:
\begin{center}
\begin{tabular}{c}
spatch -{}-control-flow-to-file flow2.c \\
dot -Tpdf flow1:main.dot > flow1.pdf
\end{tabular}
\end{center}
By default properties described in a rule must match all control
flows possible within a code section being inspected by Coccinelle.
So for instance, in the following SmPL patch rule r1 would match all
the control flow possible on flow1.c as its linear, however it would
not match the control possible on flow2.c. The rule r1 would not
be successful in flow2.c
\begin{center}
\begin{tabular}{c}
\begin{lstlisting}[language=Cocci]
@r1@
@@
foo()
...
-c();
\end{lstlisting}\\
\end{tabular}
\end{center}
The default control flow can be modified by using the keyword ``exists''
following the rule name. In the following SmPL patch the rule r2 would
be successful on both flow1.c and flow2.c
\begin{center}
\begin{tabular}{c}
\begin{lstlisting}[language=Cocci]
@r2 exists@
@@
foo()
...
-c();
\end{lstlisting}\\
\end{tabular}
\end{center}
If the rule name is followed by the ``forall'' keyword, then all control flow
paths must match in order for the rule to succeed. By default when a
semantic patch has ``-'' and ``+'', or when it has no annotations at all and
only script code, ellipses (``...'') use the forall semantics. And when the
semantic patch uses the context annotation (``*''), the ellipses (``...'') uses
the exists semantics. Using the keyword ``forall'' or ``exists'' in the rule
header affects all ellipses (``...'') uses in the rule. You can also annotate
each ellipses (``...'') with ``when exists'' or ``when forall'' individually.
Rules can also be not be successful if requirements do not match
when a rule name is followed by ``depends on XXX''. When ``depends on'' is used
it means the rule should only apply if rule XXX matched with the current
metavariable environment. Alternatively, ``depends on ever XXX'' can be used
as well, this means this rule should apply if rule XXX was ever matched at
all. A counter to this use is ``depends on never XXX'', which means that this
rule should apply if rule XXX was never matched at all.
\section{Transformation}
Coccinelle semantic patches are able to transform C code.
\subsection{Basic transformations}
The transformation specification essentially has the form of C code, except
that lines to remove are annotated with \verb+-+ in the first column, and
lines to add are annotated with \verb-+-. A transformation specification
can also use {\em dots}, ``\verb-...-'', describing an arbitrary sequence
of function arguments or instructions within a control-flow path.
Implicitly, ``\verb-...-'' matches the shortest path between something that
matches the pattern before the dots (or the beginning of the function, if
there is nothing before the dots) and something that matches the pattern
after the dots (or the end of the function, if there is nothing after the
dots). Dots may be modified with a {\tt when} clause, indicating a pattern
that should not occur anywhere within the matched sequence. The shortest
path constraint is implemented by requiring that the pattern (if any)
appearing immediately before the dots and the pattern (if any) appearing
immediately after the dots are not matched by the code matched by the dots.
{\tt when any}
removes the aforementioned constraint that ``\verb-...-'' matches the
shortest path. Finally, a transformation can specify a disjunction of
patterns, of the form \mtt{( \mth{\mita{pat}_1} | \mita{\ldots} |
\mth{\mita{pat}_n} )} where each \texttt{(}, \texttt{|} or \texttt{)} is
in column 0 or preceded by \texttt{\textbackslash}.
Similarly, a transformation can specify a conjunction of
patterns, of the form \mtt{( \mth{\mita{pat}_1} \& \mita{\ldots} \&
\mth{\mita{pat}_n} )} where each \texttt{(}, \texttt{\&} or \texttt{)} is
in column 0 or preceded by \texttt{\textbackslash}. All of the patterns
must be matched at the same place in the control-flow graph.
The grammar that we present for the transformation is not actually the
grammar of the SmPL code that can be written by the programmer, but is
instead the grammar of the slice of this consisting of the {\tt -}
annotated and the unannotated code (the context of the transformed lines),
or the {\tt +} annotated code and the unannotated code. For example, for
parsing purposes, the following transformation
%presented in Section \ref{sec:seq2}
is split into the two variants shown below and each is parsed
separately.
\begin{center}
\begin{tabular}{c}
\begin{lstlisting}[language=Cocci]
proc_info_func(...) {
<...
@-- hostno
@++ hostptr->host_no
...>
}
\end{lstlisting}\\
\end{tabular}
\end{center}
{%\sizecodebis
\begin{center}
\begin{tabular}{p{5cm}p{3cm}p{5cm}}
\begin{lstlisting}[language=Cocci]
proc_info_func(...) {
<...
@-- hostno
...>
}
\end{lstlisting}
&&
\begin{lstlisting}[language=Cocci]
proc_info_func(...) {
<...
@++ hostptr->host_no
...>
}
\end{lstlisting}
\end{tabular}
\end{center}
}
\noindent
Requiring that both slices parse correctly ensures that the rule matches
syntactically valid C code and that it produces syntactically valid C code.
The generated parse trees are then merged for use in the subsequent
matching and transformation process.
The grammar for the minus or plus slice of a transformation is as follows:
\begin{grammar}
\RULE{\rt{transformation}}
\CASE{\some{\NT{include}}}