-
Notifications
You must be signed in to change notification settings - Fork 2
Tokens
This wiki page is currently under construction and not linked to the capabilities
Based on the collection of tokens that we know can be emitted by R, this page is almost a verbatim copy of the corresponding section of the appendix in my master's thesis.
Every R program is an expression list, identified by the exprlist
token type, which consists of several expressions (identified by expr
).
Consider the following example:
x <- 1 + 2
if(x > 0) {
print("Hello World!")
}
y <- 3
The corresponding expression list consists of three expressions, namely the assignment of 1 + 2
to x
, the if
construct, and the assignment of 3
to y
.
With the following tables, we provide what is to our knowledge a full list of token types that R produces. For more information, the "Syntax" topic of the R documentation offers a great starting point.
- Tokens Representing Constants
- Tokens Representing Assignments
- Tokens Representing Operators (No Assignments)
- Tokens Representing Control-Flow Structures
- Tokens Indicating a Function Definition
- Tokens Used to Access Objects
- Tokens Representing Names
- Tokens Representing Comments or Directives
- Tokens Used To Delimit Parts of Expressions
- Tokens Representing Meta-Elements
Besides PIPEBIND
, which at the moment must be enabled explicitly by setting an environment variable all the tokens shown in the tables are supported by the normalization of flowR - although this is different from supporting all of their uses.
It should be noted that there are many tokens that appear in the source code of the R interpreter but are not listed within the tables.
While some of these tokens, like COLON_ASSIGN
, are explicitly marked as deprecated, several of them seem to be for internal use only and are - to the best of our knowledge - never emitted by the parser with getParseData
. For example:
- newlines are directly consumed to split expressions,
- error tokens produce an explicit error message, and
- the individual tokens for unary operators (like
UPLUS
) are transformed to the same token as their binary counterparts (like+
).
# | ✓ | Token | Description |
---|---|---|---|
T1 | ✓ | NULL_CONST |
Represents NULL. |
T2 | ✓ | NUM_CONST |
Identifies a number (including NA) or a logical, depending on the lexeme. |
T3 | ✓ | STR_CONST |
A string, independent of the quotation mark. |
# | ✓ | Token | Description |
---|---|---|---|
T4 | ✓ | EQ_ASSIGN |
A local equal assignment. Differentiate this from EQ-SUB , which has a slightly different semantic. |
T5 | ✓ | EQ_FORMALS |
Essentially EQ-ASSIGN , but when used within formals. |
T6 | ✓ | EQ_SUB |
Essentially EQ-ASSIGN , but when used to name arguments for function call or arguments in access. |
T7 | ✓ | LEFT_ASSIGN |
A local left assignment or global left assignment. Includes := , originally bound to COLON_ASSIGN . |
T8 | ✓ | RIGHT_ASSIGN |
A local right assignment or global right assignment. |
# | ✓ | Token | Description |
---|---|---|---|
T9 | ✓ | AND |
The vectorized logical and binary operator (& ). |
T10 | ✓ | AND2 |
Non-vectorized logical and binary operator (&& ). |
T11 | ✓ | EQ |
The equality binary operator. |
T12 | ✓ | GE |
Vectorized binary operator greater-than-or-equal-to. |
T13 | ✓ | GT |
Vectorized binary operator greater-than. |
T14 | ✓ | LE |
Vectorized binary operator less-than-or-equal-to. |
T15 | ✓ | LT |
Vectorized binary operator less-than. |
T16 | ✓ | NE |
The inequality binary operator. |
T17 | ✓ | OR |
The vectorized logical or binary operator (| ). |
T18 | ✓ | OR2 |
Non-vectorized logical or binary operator (|| ). |
T19 | ✓ | PIPE |
The native pipe (introduced R 4.1.0) |
T20 | PIPEBIND |
The native pipebind. | |
T21 | ✓ | SPECIAL |
Represents all binary operators of the form %x% . |
T22 | ✓ | + |
Vectorized binary operator addition or unary operator plus. |
T23 | ✓ | - |
Vectorized binary operator subtraction or unary operator minus. |
T24 | ✓ | * |
The vectorized binary operator multiplication. |
T25 | ✓ | / |
The vectorized binary operator division. |
T26 | ✓ | : |
The non-vectorized sequence operator. |
T27 | ✓ | ! |
The vectorized logical unary operator not operator. |
T28 | ✓ | ^ |
The vectorized exponentiation operator (exponent must be scalar). |
T29 | ✓ | ? |
Triggers the help action. |
T30 | ✓ | ~ |
Signals a model formula. |
# | ✓ | Token | Description |
---|---|---|---|
T31 | ✓ | BREAK |
The break construct in a loop. |
T32 | ✓ | ELSE |
Signals start of the else part of an IF statement. |
T33 | ✓ | FOR |
Start of a for loop structure. |
T34 | ✓ | forcond |
Signals the x IN v head of the FOR loop. |
T35 | ✓ | IF |
Start of an if conditional structure. |
T36 | ✓ | IN |
Used to separate name and vector in a for loop. |
T37 | ✓ | NEXT |
The next construct in a loop. |
T38 | ✓ | REPEAT |
Start of a repeat loop structure. |
T39 | ✓ | WHILE |
Start of a while loop structure. |
# | ✓ | Token | Description |
---|---|---|---|
T40 | ✓ | FUNCTION |
Indicates the start of a function definition. |
T41 | ✓ | \ |
Indicates the start of a lambda function. |
# | ✓ | Token | Description |
---|---|---|---|
T42 | ✓ | LBB |
Indicates the start of a double bracket access. |
T43 | ✓ | SLOT |
Target of a slotted access. |
T44 | ✓ | $ |
Indicates dollar access. |
T45 | ✓ | @ |
Indicates slotted access. |
T46 | ✓ | [ |
Indicates the start of a single bracket access. |
T47 | ✓ | ] |
Corresponding end of [ or LBB . |
# | ✓ | Token | Description |
---|---|---|---|
T48 | ✓ | NS_GET |
The access of an exported name. |
T49 | ✓ | NS_GET_INT |
The access of an internal name. |
T50 | ✓ | SYMBOL |
A name with a potential namespace (x::y ). |
T51 | ✓ | SYMBOL_FUNCTION_CALL |
The SYMBOL of a function call. |
T52 | ✓ | SYMBOL_PACKAGE |
Name used to access a namespace (NS_GET , NS_GET_INT ). |
T53 | ✓ | SYMBOL_SUB |
The name to the left of an EQ-SUB . |
T54 | ✓ | SYMBOL_FORMALS |
The name to the left of an EQ-FORMALS . |
# | ✓ | Token | Description |
---|---|---|---|
T55 | ✓ | COMMENT |
A line comment. |
T56 | ✓ | LINE_DIRECTIVE |
A line directive comment. |
# | ✓ | Token | Description |
---|---|---|---|
T57 | ✓ | ( |
An opening parenthesis (e.g., in a FOR -loop). |
T58 | ✓ | ) |
Corresponding end of ( . |
T59 | ✓ | , |
Separates for example formal arguments. |
T60 | ✓ | ; |
Separates expressions. |
T61 | ✓ | { |
Groups expressions, no effect on scoping. |
T62 | ✓ | } |
Corresponding end of { . |
# | ✓ | Token | Description |
---|---|---|---|
T63 | ✓ | expr |
Represents an expression. |
T64 | ✓ | expr_or_assign_or_help |
Used for example by EQ-ASSIGN , with the same semantics as expr . |
T65 | ✓ | exprlist |
Added by xml_parse_data to group expressions, no longer used since #659. |