chapters/lexicalstructure.tex

\chapter{Lexical Structure}\doublelabel{lexical-structure}

This chapter describes several of the basic building blocks of Modelica
such as characters and lexical units including identifiers and literals.
Without question, the smallest building blocks in Modelica are single
characters belonging to a character set. Characters are combined to form
lexical units, also called tokens. These tokens are detected by the
lexical analysis part of the Modelica translator. Examples of tokens are
literal constants, identifiers, and operators. Comments are not really
lexical units since they are eventually discarded. On the other hand,
comments are detected by the lexical analyzer before being thrown away.

The information presented here is derived from the more formal
specification in \autoref{modelica-concrete-syntax}.

\section{Character Set}\doublelabel{character-set}

The character set of the Modelica language is Unicode, but restricted to
the Unicode characters corresponding to 7-bit ASCII characters in
several places; for details see \autoref{lexical-conventions}.

\section{Comments}\doublelabel{comments}

There are two kinds of comments in Modelica which are not lexical units
in the language and therefore are treated as whitespace by a Modelica
translator. The whitespace characters are space, tabulator, and line
separators (carriage return and line feed); and whitespace cannot occur
inside tokens, e.g., \textless{}= must be written as two characters
without space or comments between them.  The following comment variants are
available:
%TODO-FORMAT should be a table instead of lstlisting?
\begin{lstlisting}[language=modelica]
// comment & Characters from // to the end of the line are ignored.
/* comment */ & Characters between /* and */ are ignored, including line terminators.
\end{lstlisting}

\begin{nonnormative}
The comment syntax is identical to that of C++.
\end{nonnormative}

Modelica comments do not nest, i.e., /* */ cannot be embedded within /*
*/. The following is \emph{invalid}:
\begin{lstlisting}[language=modelica]
/* Commented out - erroneous comment, invalid nesting of comments!
  /* This is an interesting model */
  model interesting
  ...
  end interesting;
*/
\end{lstlisting}

There is also a description-string, that is part of the Modelica language and
therefore not ignored by the Modelica translator. Such a description-string may
occur at the end of a declaration, equation, or statement or at the
beginning of a class definition. For example:
\begin{lstlisting}[language=modelica]
model TempResistor "Temperature dependent resistor"
  ...
  parameter Real R "Resistance for reference temp.";
  ...
end TempResistor;
\end{lstlisting}

\section{Identifiers, Names, and Keywords}\doublelabel{identifiers-names-and-keywords}

\emph{Identifiers} are sequences of letters, digits, and other
characters such as underscore, which are used for \emph{naming} various
items in the language. Certain combinations of letters are
\emph{keywords} represented as \emph{reserved} words in the Modelica
grammar and are therefore not available as identifiers.

\subsection{Identifiers}\doublelabel{identifiers}

Modelica \emph{identifiers}, used for naming classes, variables,
constants, and other items, are of two forms. The first form always
starts with a letter or underscore (\_), followed by any number of
letters, digits, or underscores. Case is significant, i.e., the names
\lstinline!Inductor! and \lstinline!inductor! are different. The second form \lstinline!(Q-IDENT)! starts
with a single quote, followed by a sequence of any printable ASCII
character, where single-quote must be preceded by backslash, and
terminated by a single quote, e.g. \lstinline!'12H'!, \lstinline!'13\'H'!,
\lstinline!'+foo'!. Control characters in quoted identifiers have to use string
escapes.
The single quotes are part of the identifier, i.e., \lstinline!'x'! and \lstinline!x!
are distinct identifiers. The redundant escapes (\lstinline!'\?'! and \lstinline!'\"'!) are the same as the corresponding non-escaped
variants (\lstinline!'?'! and \lstinline!'"'!), but are only for use in Modelica source code.
A full BNF definition of the Modelica syntax and
lexical units is available in \autoref{modelica-concrete-syntax}.

\begin{lstlisting}[language=grammar]
IDENT   = NONDIGIT { DIGIT | NONDIGIT } | Q-IDENT
Q-IDENT = "'" { Q-CHAR | S-ESCAPE | """ } "'"
NONDIGIT = "_" | letters "a" ... "z" | letters "A" ... "Z"
DIGIT    = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Q-CHAR = NONDIGIT | DIGIT | "!" | "#" | "$" | "%" | "&" | "(" | ")" | "*" | "+" | "," | "-" | "." | "/" | ":" | ";" | "<" | ">" | "=" | "?" | "@" | "[" | "]" | "^" | "{" | "}"  | "|" | "~" | " "
S-ESCAPE = "\'" | "\"" | "\?" | "\\" | "\a" | "\b" | "\f" | "\n" | "\r" | "\t" | "\v"
\end{lstlisting}

\subsection{Names}\doublelabel{names}

A \emph{name} is an identifier with a certain interpretation or meaning.
For example, a name may denote an \lstinline!Integer! variable, a \lstinline!Real! variable, a
function, a type, etc. A name may have different meanings in different
parts of the code, i.e., different scopes. The interpretation of
identifiers as names is described in more detail in \autoref{scoping-name-lookup-and-flattening}. The
meaning of package names is described in more detail in \autoref{packages}.

\subsection{Modelica Keywords}\doublelabel{modelica-keywords}

The following Modelica \emph{keywords} are reserved words and may not be
used as identifiers, except as listed in \autoref{lexical-conventions}:
\begin{longtable}[c]{@{}lllll@{}}
\lstinline!algorithm! & \lstinline!discrete! & \lstinline!false! & \lstinline!loop! & \lstinline!pure!\\ \hline
\lstinline!and! & \lstinline!each! & \lstinline!final! & \lstinline!model! & \lstinline!record!\\ \hline
\lstinline!annotation! & \lstinline!else! & \lstinline!flow! & \lstinline!not! & \lstinline!redeclare!\\ \hline
& \lstinline!elseif! & \lstinline!for! & \lstinline!operator! & \lstinline!replaceable!\\ \hline
\lstinline!block! & \lstinline!elsewhen! & \lstinline!function! & \lstinline!or! & \lstinline!return!\\ \hline
\lstinline!break! & \lstinline!encapsulated! & \lstinline!if! & \lstinline!outer! & \lstinline!stream!\\ \hline
\lstinline!class! & \lstinline!end! & \lstinline!import! & \lstinline!output! & \lstinline!then!\\ \hline
\lstinline!connect! & \lstinline!enumeration! & \lstinline!impure! & \lstinline!package! & \lstinline!true!\\ \hline
\lstinline!connector! & \lstinline!equation! & \lstinline!in! & \lstinline!parameter! & \lstinline!type!\\ \hline
\lstinline!constant! & \lstinline!expandable! & \lstinline!initial! & \lstinline!partial! & \lstinline!when!\\ \hline
\lstinline!constrainedby! & \lstinline!extends! & \lstinline!inner! & \lstinline!protected! & \lstinline!while!\\ \hline
\lstinline!der! & \lstinline!external! & \lstinline!input! & \lstinline!public! & \lstinline!within!\\ \hline
\end{longtable}

\section{Literal Constants}\doublelabel{literal-constants}

Literal constants are unnamed constants that have different forms
depending on their type. Each of the predefined types in Modelica has a
way of expressing unnamed constants of the corresponding type, which is
presented in the ensuing subsections. Additionally, array literals and
record literals can be expressed.

\subsection{Floating Point Numbers}\doublelabel{floating-point-numbers}

A floating point number is expressed as a decimal number in the form of
a sequence of decimal digits followed by a decimal point, followed by decimal digits,
followed by an exponent indicated by E or e followed by a sign
and one or more decimal digits. The various parts can be omitted, see \lstinline!UNSIGNED_REAL! in~\autoref{lexical-conventions} for
details and also the examples below. The minimal recommended range is
that of IEEE double precision floating point numbers, for which the
largest representable positive number is 1.7976931348623157E+308 and the
smallest positive number is 2.2250738585072014E-308. For example, the
following are floating point number literal constants:
\begin{lstlisting}[language=modelica]
22.5, 3.141592653589793, 1.2E-35
\end{lstlisting}

The same floating point number can be represented by different literals.
For example, all of the following literals denote the same number:
\begin{lstlisting}[language=modelica]
13., 13E0, 1.3e1, 0.13E2, .13E2
\end{lstlisting}
The last variant shows that that the leading zero is optional (in that case decimal digits must be present).
Note that \lstinline!13! is not in this list, since it is not a floating point number,
but can be converted to a floating point number.

\subsection{Integer Literals}\doublelabel{integer-literals}

Literals of type \lstinline!Integer! are sequences of decimal digits, e.g. as in the
integer numbers \lstinline!33!, \lstinline!0!, \lstinline!100!, \lstinline!30030044!. The minimal
recommended number range is from -2147483648 to +2147483647 corresponding to a
two's-complement 32-bit integer implementation.

\begin{nonnormative}
Negative numbers are formed by unary minus followed by an integer literal.
\end{nonnormative}

\subsection{Boolean Literals}\doublelabel{boolean-literals}

The two \lstinline!Boolean! literal values are \lstinline!true! and \lstinline!false!.

\subsection{Strings}\doublelabel{strings}

String literals appear between double quotes as in \lstinline!"between"!. Any
character in the Modelica language character set (see \autoref{lexical-conventions} for
allowed characters) apart from double quote (\lstinline!"!) and backslash
(\lstinline!\!), including new-line, can be \emph{directly} included
in a string without using an escape code. Certain characters in string
literals can be represented using escape codes, i.e., the character is
preceded by a backslash (\lstinline!\!) within the string. Those
characters are:
\begin{longtable}[c]{@{}ll@{}}
\lstinline!\'! & single quote -- may also appear without backslash in string constants.\\
\lstinline!\"! & double quote\\
\lstinline!\?! & question-mark -- may also appear without backslash in string constants.\\
\lstinline!\\! & backslash itself\\
\lstinline!\a! & alert (bell, code 7, ctrl-G)\\
\lstinline!\b! & backspace (code 8, ctrl-H)\\
\lstinline!\f! & form feed (code 12, ctrl-L)\\
\lstinline!\n! & newline (code 10, ctrl-J), same as literal newline\\
\lstinline!\r! & carriage return (code 13, ctrl-M)\\
\lstinline!\t! & horizontal tab (code 9, ctrl-I)\\
\lstinline!\v! & vertical tab (code 11, ctrl-K)\\
\end{longtable}

For example, a string literal containing a tab, the words: \emph{This is},
double quote, space, the word: \emph{between}, double quote, space, the word:
\emph{us}, and new-line, would appear as follows:
\begin{lstlisting}[language=modelica]
"\tThis is\" between\" us\n"
\end{lstlisting}

Concatenation of string literals in certain situations (see the Modelica
grammar) is denoted by the + operator in Modelica, e.g. \lstinline!"a"! + \lstinline!"b"!
becomes \lstinline!"ab"!. This is useful for expressing long string literals that
need to be written on several lines.

The \lstinline!"\n"! character is used to conceptually indicate the
end of a line within a Modelica string. Any Modelica program that needs
to recognize line endings can check for a single \lstinline!"\n"!
character to do so on any platform. It is the responsibility of a
Modelica implementation to make any necessary transformations to other
representations when writing to or reading from a text file.

\begin{nonnormative}
For example, a \lstinline!"\n"! is written and read as-is in a Unix or Linux implementation, but written as
\lstinline!"\r\n"! pair, and converted back to \lstinline!"\n"! when read in a Windows implementation.
\end{nonnormative}

\begin{nonnormative}
For long string comments, e.g., the \lstinline!info! annotation to
store the documentation of a model, it would be very inconvenient, if
the string concatenation operator would have to be used for every line
of documentation. It is assumed that a Modelica tool supports the
non-printable newline character when browsing or editing a string
literal. For example, the following statement defines one string that
contains (non-printable) newline characters:
\begin{lstlisting}[language=modelica]
assert(noEvent(length > s_small), "
The distance between the origin of frame_a and the origin of frame_b
of a LineForceWithMass component became smaller as parameter s_small
(= a small number, defined in the
\"Advanced\" menu). The distance is
set to s_small, although it is smaller, to avoid a division by zero
when computing the direction of the line force.",
level = AssertionLevel.warning);
\end{lstlisting}
\end{nonnormative}

\section{Operator Symbols}\doublelabel{operator-symbols}

The predefined operator symbols are formally defined on page \pageref{lexical-conventions} and
summarized in the table of operators in \autoref{operator-precedence-and-associativity}.