# Interpreting Computer Programs

In this chapter, we study the design of interpreters and the computational processes that they create when executing programs. Many interpreters have an elegant common structure: two mutually recursive functions. The first evaluates expressions in environments; the second applies functions to arguments.

These functions are recursive in that they are defined in terms of each other: applying a function requires evaluating the expressions in its body, while evaluating an expression may involve applying one or more functions.


## Scheme

Scheme is a dialect of Lisp, the second-oldest programming language that is still widely used today (after Fortran). Scheme encourages a functional style. Our object of study, a subset of the Scheme language, employs a very similar model of computation to Python's, but uses only expressions (no statements), specializes in symbolic computation, and employs only immutable values.

### Expressions

Scheme programs consist of expressions, which are either call expressions or special forms. 

**Call expressions**: a call expression consists of an operator expression followed by zero or more operand sub-expressions. Both the operator and operand are contained within parentheses. Scheme exclusively uses prefix notation. Call expressions can be nested, and they may span more than one line.

Scheme expressions may be primitives or combinations. Number literals are primitives, while call expressions are combined forms, written as lists, that include arbitrary sub-expressions. The evaluation procedure of call expressions matches that of Python: evaluate the operator and operand sub-expressions, then apply the resulting function to the resulting arguments.

**Special forms**: although they look syntactically like call expressions, they have a different evaluation procedure.
- `if`: `(if <predicate> <consequent> <alternative>)`. Evaluation procedure:
    1. Evaluate the `<predicate>`.
    2. If the `<predicate>` evaluates to a true value, the interpreter then evaluates the `<consequent>` and returns its value.
    3. Otherwise it evaluates the `<alternative>` and returns its value.
- `and`: `(and <e1> ... <en>)`. Evaluation procedure:
    1. Evaluate the expressions `<e>` one at a time, in left-to-right order.
    2. If any `<e>` evaluates to `false`, the value of the and expression is `false`, and the rest of the `<e>`'s are not evaluated.
    3. If all `<e>`'s evaluate to true values, the value of the `and` expression is the value of the last one.
- `or`: `(or <e1> ... <en>)`. Evaluation procedure:
    1. Evaluate the expressions `<e>` one at a time, in left-to-right order.
    2. If any `<e>` evaluates to a true value, that value is returned as the value of the or expression, and the rest of the `<e>`'s are not evaluated.
    3. If all `<e>`'s evaluate to false, the value of the or expression is false.
- `not`: `(not <e>)`. The value of a `not` expression is `true` when the expression `<e>` evaluates to `false`, and `false` otherwise.

In Scheme, all values are true values except for `#f`.

### Definitions

**Values** can be named using the `define` special form: `(define name value)`.

New **functions** (called *procedures* in Scheme) can be defined using a second version of the `define` special form: `(define (<name> <formal parameters>) <body>)`. User-defined functions can take multiple arguments and include special forms. Scheme supports local definitions with the same lexical scoping rules as Python.

Anonymous functions are created using the `lambda` special form. Lambda is used to create procedures in the same way as define, except that no name is specified for the procedure: `(lambda (<formal-parameters>) <body>)`. The resulting procedure is just as much a procedure as one that is created using `define`. Like any expression that has a procedure as its value, a lambda expression can be used as the operator in a call expression:
```scheme
> ((lambda (x y z) (+ x y (square z))) 1 2 3)
12
```

### Compound Values

**Pairs** are built into the Scheme language. For historical reasons, pairs are created with the `cons` built-in function, and the elements of a pair are accessed with `car` and `cdr`: `(define x (cons 1 2))`.

**Recursive lists** are also built into the language, using pairs. A special value denoted `nil` or `'()` represents the empty list. A recursive list value is rendered by placing its elements within parentheses, separated by spaces. Built-in **lists** are therefore linked lists. They can be created with `list`: `(list value1 ... valueN)`, and their elements are accessed with `car` and `cdr`.

Whether a list is empty can be determined using the primitive `null?` predicate. Using it, we can define the standard sequence operations for computing length and selecting elements:
```scheme
(define (length items)
  (if (null? items)
      0
      (+ 1 (length (cdr items)))))
      
(define (getitem items n)
  (if (= n 0)
      (car items)
      (getitem (cdr items) (- n 1))))
```

### Symbolic Data

In order to manipulate symbols we need a new element in our language: the ability to quote a data object. Suppose we want to construct the list `(a b)`. We can't accomplish this with `(list a b)`, because this expression constructs a list of the values of `a` and `b` rather than the symbols themselves. In Scheme, we refer to the symbols `a` and `b` rather than their values by preceding them with a single quotation mark. In Scheme, any expression that is not evaluated is said to be *quoted*.

Quotation also allows us to type in compound objects, using the conventional printed representation for lists.

Scheme expressions are lists, so the list data structure can represent combinations. Quoting a combination prevents the operator from being evaluated (it is treated as a symbol rather than as a procedure), so the result is a Scheme expression held as data. When that expression is later evaluated with `eval`, it returns the same value that the original call expression would have returned. We can therefore write code that generates expressions and evaluate them whenever we want.

### Macros

We would like to create our own special forms, deciding which operands are evaluated when the form is used. This is not possible with normal call expressions, because all operands are evaluated before the procedure is applied. We achieve this with macros. A **macro** is an operation performed on the source code of a program before evaluation.

The syntax for defining a macro with `define-macro` is the same as for defining a procedure with `define`, but the evaluation procedure is different. To evaluate a call to a macro:
- Evaluate operator sub-expression, which evaluates to a macro.
- Apply operator to unevaluated operands.
- Evaluate the expression returned by the macro in the frame it was called in.

Notice that an operand is only evaluated if it appears in the expression returned by the macro, so we can choose not to evaluate them all.

Macros exist in many languages, but they are easiest to define correctly in a language like Lisp, because its code is made of lists (data).

### Quasiquote & Unquote

With macros we are interested in choosing what to evaluate and what not to. The **quasiquote** allows us to construct literal lists in a similar way as quote, but also lets us specify if any sub-expression within the list should be evaluated. This last action is achieved using **unquote**, which removes an expression from the quoted context, evaluates it, and places it back in.


### Tail Recursion

Functional languages have no re-assignment (and therefore no `for` loops), yet there is an efficient way of repeating a procedure. Remember that in Python each recursive call creates a new active frame, so deep recursion uses a lot of memory, while a `for` loop reuses the same frame and re-binds names. Tail recursion solves this problem in Scheme.

A procedure call that has not yet returned is **active**. A **tail call** occurs when a function calls another function as the last action of the current frame. In this case, the frame is no longer needed, and we can remove it from memory. Scheme implements tail-call optimization. We say that a recursive function is **tail recursive** if all of its recursive calls are tail calls.

Tail recursive processes can use a constant amount of memory because each recursive call frame does not need to be saved.

When trying to identify whether a given function call within the body of a function is a tail call, we look for whether the call expression is in **tail context**. 

Given that each of the following expressions is the last expression in the body of the function, the following sub-expressions are tail contexts:
- The second or third operand in an `if` expression.
- Any of the non-predicate sub-expressions in a `cond` expression (i.e. the second expression of each clause).
- The last operand in an `and` or an `or` expression.
- The last operand in a `begin` expression’s body.
- The last operand in a `let` expression’s body.

One way to transform a recursive procedure into a tail-recursive one is to create a helper function inside the original function. This helper takes one (or more) additional arguments compared with the original. The new argument keeps track of the partial result built up so far. In the base case, that accumulated value is finally returned.
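The helper transformation can be sketched in Python. This only illustrates the shape of the rewrite: Python does not perform tail-call optimization, so the second version still grows the call stack there. The names `factorial`, `factorial_tail`, and `helper` are illustrative.

```python
def factorial(n):
    """Plain recursion: the multiplication happens after the recursive call returns."""
    if n == 0:
        return 1
    return n * factorial(n - 1)       # not a tail call: `n * ...` is still pending


def factorial_tail(n):
    """Tail-recursive shape: the helper carries the partial result in `total`."""
    def helper(k, total):
        if k == 0:
            return total              # base case returns the accumulated value
        return helper(k - 1, total * k)   # tail call: nothing is left to do afterwards
    return helper(n, 1)
```

In Scheme, the second form lets the interpreter discard each frame as soon as the next call starts.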

### `let` Special Form

The `let` special form allows you to create local bindings within Scheme. It consists of two elements: a list of two-element pairs, and a body expression. Each of the pairs contains a symbol and an expression to be bound to the symbol.
```scheme
(let ((var-1 expr-1)
      (var-2 expr-2)
      ...
      (var-n expr-n))
      body-expr)
```
When evaluating a `let` expression, a new frame local to the let expression is created. In this frame, each variable is bound to the value of its corresponding expression at the same time. Then, the body expression is evaluated in this frame using the new bindings.

### `mu` Procedure

All of the Scheme procedures we have seen so far use **lexical scoping** (or **static scoping**) but we can also have **dynamic scoping**.
- Lexical scope: the parent of the new call frame is the environment in which the procedure was *defined*.
- Dynamic scope: the parent of a new call frame is the environment in which the procedure was *called*.

The `mu` special form is a non-standard Scheme expression type representing a procedure that is dynamically scoped. Its syntax is the same as that of a `lambda` expression, but it uses the `mu` keyword instead of `lambda`.
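The difference between the two scoping rules comes down to which frame becomes the parent of the new call frame. A minimal Python sketch, assuming environments are modeled with `collections.ChainMap` and procedure bodies with plain Python functions (`make_procedure`, `add_x_and_y`, and the environment names are all hypothetical):

```python
from collections import ChainMap


def make_procedure(params, body, defined_env, dynamic=False):
    """Return a callable taking (args, calling_env).

    `body` is a Python function of one environment, standing in for a
    Scheme body expression.
    """
    def call(args, calling_env):
        # Lexical scope: the parent is where the procedure was defined.
        # Dynamic scope: the parent is where the procedure was called.
        parent = calling_env if dynamic else defined_env
        frame = parent.new_child(dict(zip(params, args)))
        return body(frame)
    return call


def add_x_and_y(env):
    return env['x'] + env['y']


global_env = ChainMap({'x': 1})
caller_env = global_env.new_child({'x': 10})   # a frame in which x is rebound

f_lambda = make_procedure(['y'], add_x_and_y, global_env)
f_mu = make_procedure(['y'], add_x_and_y, global_env, dynamic=True)
```

Calling both procedures from `caller_env` with the argument `5` makes the `lambda`-style procedure see the global `x`, while the `mu`-style procedure sees the caller's `x`.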

### Streams

A **stream** is a lazily computed linked list, with the same pair structure as a Scheme list. It is similar to a Python iterator, but streams are immutable whereas iterators are mutable.

We use the special form `cons-stream` to create a stream:
```scheme
(cons-stream <operand1> <operand2>)
```
To evaluate this expression,
1. Evaluate the first operand.
2. Construct a promise containing the second operand. A promise represents a value that has not been computed yet but can be computed when it is required.
3. Return a pair containing the value of the first operand and the promise.

To actually get the rest of the stream, we must call `cdr-stream` on it to force the promise to be evaluated. The `cdr-stream` of a stream must be either a stream or `nil`. Note that the second operand is only evaluated once and its value is then stored in the promise; subsequent calls to `cdr-stream` return the value without recomputing it. This is one of the main differences between streams and iterators. Both streams and iterators are useful for writing procedures that run in constant space.
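The promise mechanism can be sketched in Python. This is an illustration under stated assumptions rather than how any Scheme interpreter implements it: `Promise`, `Stream`, and `integers_from` are hypothetical names, and a zero-argument function stands in for the unevaluated second operand.

```python
class Promise:
    """Delays a computation and caches its value after the first force."""

    def __init__(self, thunk):
        self.thunk = thunk        # zero-argument function that computes the value
        self.forced = False
        self.value = None

    def force(self):
        if not self.forced:
            self.value = self.thunk()   # computed at most once
            self.forced = True
            self.thunk = None           # the computation is no longer needed
        return self.value


class Stream:
    """A pair of a first value and a promise for the rest of the stream."""

    def __init__(self, first, compute_rest):
        self.first = first
        self.rest_promise = Promise(compute_rest)

    @property
    def rest(self):
        # Plays the role of cdr-stream: forces the promise.
        return self.rest_promise.force()


def integers_from(n):
    """An infinite stream n, n+1, n+2, ... built lazily."""
    return Stream(n, lambda: integers_from(n + 1))
```

Because the promise caches its value, asking for `rest` twice returns the very same object instead of rebuilding it.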


## Exceptions

There is no single correct approach to handling errors in a program. An error can be reported at the moment it occurs by halting, which is what the Python interpreter does, or it can be recorded and dealt with later while the program continues to provide its service.

**Exceptions** provide a general mechanism for adding error-handling logic to programs. **Raising an exception** is a technique for interrupting the normal flow of execution in a program, signaling that some exceptional circumstance has arisen, and returning directly to an enclosing part of the program that was designated to react to that circumstance. The Python interpreter raises an exception each time it detects an error in an expression or statement. Users can also raise exceptions with `raise` and `assert` statements.

An `exception` is an object instance with a class that inherits, either directly or indirectly, from the `BaseException` class. The `assert` statement introduced in Chapter 1 raises an exception with the class `AssertionError`. In general, any exception instance can be raised with the `raise` statement. The most common use of `raise` constructs an exception instance and raises it.
```python
>>> raise Exception('An error occurred')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
Exception: An error occurred
```
In general, the syntax is: `raise <expression>`, where `<expression>` must evaluate to a subclass of `BaseException` or an instance of one. Examples of errors: `TypeError`, `NameError`, `KeyError`, `RuntimeError`...

When an exception is raised:
- No further statements in the current block of code are executed. Unless the exception is *handled* (described below), the interpreter will return directly to the interactive read-eval-print loop (*halt* the interpreter), or terminate entirely if Python was started with a file argument.
- The interpreter will print a **stack backtrace**, which is a structured block of text that describes the nested set of active function calls in the branch of execution in which the exception was raised. In the example above, the file name `<stdin>` indicates that the exception was raised by the user in an interactive session, rather than from code in a file.

An `assert` statement raises an exception of type `AssertionError`. Syntax: `assert <expression>, <string>`. If `<expression>` is not a true value, an assertion error is created using the `<string>` as an argument.

Assert statements add overhead at run time. If Python is executed with the `-O` (optimized) flag, no `assert` statements are executed. Whether assertions are enabled is governed by a bool value, `__debug__`.
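A small sketch of an `assert` statement guarding a precondition. The function `mean` is a hypothetical example; when Python runs with `-O`, the check is skipped and an empty list would fail later with a `ZeroDivisionError` instead.

```python
def mean(values):
    """Average of a non-empty sequence of numbers."""
    # Checked only when __debug__ is true (that is, without the -O flag).
    assert len(values) > 0, 'mean requires at least one value'
    return sum(values) / len(values)
```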

**Handling exceptions**. An exception can be handled by an enclosing `try` statement. A `try` statement consists of multiple clauses; the first begins with `try` and the rest begin with `except`:
```python
try:
    <try suite>
except <exception class> as <name>:
    <except suite>
...
```
Evaluation procedure:
1. Execute `<try suite>`. 
2. If, during the execution of `<try suite>`, an exception is raised, control jumps directly to the body of the `<except suite>` of the most recent `try` statement that handles that type of exception.
3. If the class of the exception matches an `<exception class>` or inherits from it, bind the identifier `<name>` to the exception object that was raised and execute the `<except suite>`, which will handle the exception. The binding does not persist beyond the `<except suite>`.
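The evaluation procedure above can be seen in a short example. `safe_inverse` is a hypothetical function; the `except` clause names `ArithmeticError` to show that an exception whose class inherits from the listed class (here `ZeroDivisionError`) is handled too.

```python
def safe_inverse(x):
    """Return 1/x, or a short message when the division fails."""
    try:
        return 1 / x                      # <try suite>
    except ArithmeticError as err:        # also matches ZeroDivisionError, a subclass
        # `err` is bound only inside this suite.
        return 'handled ' + type(err).__name__
```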

Exceptions enable non-local continuations of control. That is, if `f` calls `g` and `g` calls `h`, exceptions can shift control from `h` to `f` without waiting for `g` to return.
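A minimal sketch of that non-local jump, using the same names `f`, `g`, and `h` (the bodies are invented for illustration):

```python
def h(x):
    if x < 0:
        raise ValueError('negative input')
    return x ** 0.5


def g(x):
    return h(x) + 1        # never finishes when h raises


def f(x):
    try:
        return g(x)
    except ValueError as err:
        # Control arrives here straight from h; g does not return a value.
        return 'f handled: ' + str(err)
```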

### Exception Objects

Exception objects themselves can have attributes, such as the error message stated in an `assert` statement and information about where in the course of execution the exception was raised. User-defined exception classes can have additional attributes. Exceptions are another technique that helps us as programmers to separate the concerns of our program into modular parts. Python's exception mechanism allows us to separate the logic for the execution of some statements, which appears in the suite of the `try` clause, from the logic for handling errors, which appears in `except` clauses.
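A user-defined exception class with extra attributes might look like the following sketch. `InsufficientFunds`, `withdraw`, and `withdraw_or_keep` are hypothetical names chosen for the example.

```python
class InsufficientFunds(Exception):
    """Raised when a withdrawal exceeds the balance; keeps both numbers."""

    def __init__(self, balance, amount):
        super().__init__(f'cannot withdraw {amount} from balance {balance}')
        self.balance = balance
        self.amount = amount


def withdraw(balance, amount):
    # Normal logic: no error handling here.
    if amount > balance:
        raise InsufficientFunds(balance, amount)
    return balance - amount


def withdraw_or_keep(balance, amount):
    # Error-handling logic lives in the except clause and reads the attributes.
    try:
        return withdraw(balance, amount)
    except InsufficientFunds as err:
        return err.balance
```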


## Interpreters for Languages with Combination

**Metalinguistic abstraction** — establishing new languages in terms of other languages for a particular problem domain — plays an important role in all branches of engineering design. We can implement these languages by constructing interpreters. An **interpreter** for a programming language is a function that, when applied to an expression of the language, performs the actions required to evaluate that expression.

### Programming Languages and Paradigms

Types of programming languages:
- **Physical languages**, such as the machine languages for particular computers. These languages are concerned with the representation of data and control in terms of individual bits of storage and primitive machine instructions. The machine-language programmer is concerned with using the given hardware to erect systems and utilities for the efficient implementation of resource-limited computations. Statements are interpreted by the hardware itself.
- **High-level languages**, erected on a machine-language substrate, hide concerns about the representation of data as collections of bits and the representation of programs as sequences of primitive instructions. These languages have means of combination and abstraction, such as function definition, that are appropriate to the larger-scale organization of software systems. Statements and expressions are interpreted by another program or compiled (translated) into another language.

A programming language has:
- **Syntax**: the legal statements and expressions in the language.
- **Semantics**: the execution/evaluation rule for those statements and expressions.

To create a new language, you need either a:
- **Specification**: a document describing the precise syntax and semantics of the language.
- **Canonical implementation**: an interpreter or compiler for the language.

*Bibliography*: https://stackoverflow.com/questions/1784664/what-is-the-difference-between-declarative-and-imperative-programming and Wikipedia.

A **programming paradigm** is a fundamental style of computer programming. There are four main paradigms: imperative, declarative, functional (which is considered a subset of the declarative paradigm) and object-oriented.
- **Declarative programming**: is a programming paradigm that expresses the logic of a computation (what to do) without describing its control flow (how to do it).
    - Some well-known examples of declarative domain specific languages (DSLs) include CSS, regular expressions, and a subset of SQL (SELECT queries, for example). Many markup languages such as HTML, MXML, XAML, XSLT... are often declarative.
    - Declarative programming tries to blur the distinction between a program as a set of instructions and a program as an assertion about the desired answer.
    - Declarative programs can be dually viewed as programming commands or mathematical assertions.
- **Imperative programming**: is a programming paradigm that describes computation in terms of statements that change a program state.
    - Many (if not all) declarative approaches have some sort of underlying imperative abstraction.
- **Functional programming**: is a programming paradigm that treats computation as the evaluation of mathematical functions and avoids state and mutable data. 
    - It emphasizes the application of functions, in contrast to the imperative programming style, which emphasizes changes in state.
    - There are no re-assignments and no mutable data. In a pure functional language, such as Haskell, all functions are without side effects, and state changes are only represented as functions that transform the state.
    - Name-value bindings are permanent.
    - Referential transparency (the value of an expression does not change when one of its subexpressions is substituted with the value of that subexpression).
    - Sub-expressions can safely be evaluated in parallel or on demand (lazily).

Scheme is a functional programming language. Scheme employs a very similar model of computation to Python's, but uses only expressions (no statements), specializes in symbolic computation, and employs only immutable values. Python is a mix of imperative and declarative (https://blog.newrelic.com/engineering/python-programming-styles/).

### Expression Trees

An **expression tree** is a set of expressions structured in a tree-like manner, i.e., hierarchically structured nested expressions. In the Calculator evaluator, nodes are either leaves with base cases (self-evaluating expressions) or operators, which are the root nodes of other sub-trees.

Until this point in the course, expression trees have been conceptual entities to which we have referred in describing the process of evaluation; we have never before explicitly represented expression trees as data in our programs. In order to write an interpreter, we must operate on expressions as data.

Nested pairs can represent lists, but the elements of a list can also be lists themselves. Pairs are therefore sufficient to represent Scheme expressions, which are in fact nested lists. All Calculator expressions are nested Scheme lists.

Our Calculator interpreter will read in nested Scheme lists, convert them into expression trees represented as nested Pair instances (*Parsing expressions* below), and then evaluate the expression trees to produce values (*Calculator evaluation* below).

### Parsing Expressions

**Parsing** is the process of generating expression trees from raw text input. A parser is a composition of two components: a lexical analyzer and a syntactic analyzer.
- The **lexical analyzer** partitions the input string into tokens, which are the minimal syntactic units of the language such as names and symbols.
- The **syntactic analyzer** constructs an expression tree from this sequence of tokens. The sequence of tokens produced by the lexical analyzer is consumed by the syntactic analyzer.

**Lexical analysis**. The component that interprets a string as a token sequence is called a **tokenizer** or **lexical analyzer**. Scheme tokens are delimited by white space, parentheses, dots, or single quotation marks. Delimiters are tokens, as are symbols and numerals. The tokenizer analyzes a line character by character, validating the format of symbols and numerals. Lexical analysis is an iterative process.

Tokenizing a well-formed Calculator expression separates all symbols and delimiters, but identifies multi-character numbers (e.g., 2.3) and converts them into numeric types.
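As a rough illustration of what tokenizing produces, here is a simplified sketch. It is not the course tokenizer: the hypothetical `simple_tokenize` relies on `str.split` instead of scanning character by character, and it does not validate symbols.

```python
def simple_tokenize(line):
    """Split one line of Calculator source into a list of tokens."""
    # Surround parentheses with spaces so that split() isolates them.
    spaced = line.replace('(', ' ( ').replace(')', ' ) ')
    tokens = []
    for text in spaced.split():
        if text in ('(', ')'):
            tokens.append(text)           # delimiters are tokens themselves
            continue
        try:
            tokens.append(int(text))      # integers first
        except ValueError:
            try:
                tokens.append(float(text))    # then multi-character floats such as 2.3
            except ValueError:
                tokens.append(text)       # anything else is kept as a symbol
    return tokens
```

For the input `(+ 1 (* 2.3 45))` this yields the delimiters and the symbols `+` and `*` as strings, and the numerals as `int` or `float` values.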

**Syntactic analysis**. The component that interprets a token sequence as an expression tree is called a **syntactic analyzer**. Syntactic analysis is a tree-recursive process, and it must consider an entire expression that may span multiple lines. Analyzing a sequence of tokens often involves analyzing a subsequence of those tokens into a subexpression, which itself serves as a branch (e.g., operand) of a larger expression tree. Recursion generates the hierarchical structures consumed by the evaluator.

**Recursive syntactic analysis**. A **predictive recursive descent parser** inspects only $k$ tokens to decide how to proceed for some fixed $k$. It does not need to look very far ahead to understand what is happening at some point in the program. The English language cannot be parsed via predictive recursive descent. Programming languages can be parsed via predictive recursive descent.

Each call to `scheme_read` consumes the input tokens for exactly one expression. 
- Base case: symbols and numbers.
- Recursive call: `scheme_read` the sub-expressions and combine them. A sub-expression starts with an opening parenthesis that appears inside a list that has not been closed yet.

The `scheme_read` function expects its input `src` to be a `Buffer` instance that gives access to a sequence of tokens. A `Buffer` collects tokens that span multiple lines into a single object that can be analyzed syntactically.

The `scheme_read` function first checks for various base cases, including empty input (which raises an end-of-file exception, called `EOFError` in Python) and primitive expressions. A recursive call to `read_tail` is invoked whenever a `(` token indicates the beginning of a list.

The `read_tail` function continues to read from the same input `src`, but expects to be called after a list has begun. Its base cases are an empty input (an error) or a closing parenthesis that terminates the list. Its recursive call reads the first element of the list with `scheme_read`, reads the rest of the list with `read_tail`, and then returns a list represented as a `Pair`.
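The mutual recursion between the two readers can be sketched as follows. This is a simplified stand-in, not the course code: it consumes a plain Python list of tokens instead of a `Buffer`, the reader is named `read_expr` to avoid suggesting it is the real `scheme_read`, and `Pair` and `nil` are minimal re-creations.

```python
class Pair:
    """A two-element cell used to build linked lists."""

    def __init__(self, first, second):
        self.first = first
        self.second = second

    def __repr__(self):
        return f'Pair({self.first!r}, {self.second!r})'


class Nil:
    """The empty list."""

    def __repr__(self):
        return 'nil'


nil = Nil()


def read_expr(tokens):
    """Consume the tokens of exactly one expression from the front of `tokens`."""
    if not tokens:
        raise EOFError('unexpected end of input')
    token = tokens.pop(0)
    if token == '(':
        return read_tail(tokens)          # a list begins
    if token == ')':
        raise SyntaxError('unexpected )')
    return token                          # base case: number or symbol


def read_tail(tokens):
    """Read the remainder of a list whose opening parenthesis was already consumed."""
    if not tokens:
        raise SyntaxError('unexpected end of input inside a list')
    if tokens[0] == ')':
        tokens.pop(0)                     # consume the closing parenthesis
        return nil
    first = read_expr(tokens)             # first element of the list
    rest = read_tail(tokens)              # remaining elements
    return Pair(first, rest)
```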

### Calculator Evaluation

The `scalc` module implements an evaluator for the Calculator language. The `calc_eval` function takes an expression as an argument and returns its value.

For the Calculator syntax, the only two legal syntactic forms of expressions are numbers and call expressions, which are `Pair` instances representing well-formed Scheme lists. For the Calculator semantics, numbers are *self-evaluating*; they can be returned directly from `calc_eval`. Call expressions require function application.

Call expressions are evaluated by first recursively mapping the `calc_eval` function to the list of operands, which computes a list of arguments. Then, the operator is applied to those arguments in a second function, `calc_apply`. In `calc_apply`, each conditional clause corresponds to applying one operator. Notice that we have two functions, as noted at the beginning of the chapter: one for application of functions and one for evaluation, defined in terms of the other.
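A compact sketch of this eval/apply pair. To stay short it is not the `scalc` code: nested Python lists stand in for `Pair` instances, and the operator set (`+`, `-`, `*`, `/`) and error messages are assumptions.

```python
from functools import reduce
from operator import mul


def calc_eval(exp):
    """Evaluate a Calculator expression given as a number or a nested list."""
    if isinstance(exp, (int, float)):
        return exp                                    # numbers are self-evaluating
    if isinstance(exp, list) and exp:
        operator, *operands = exp
        arguments = [calc_eval(o) for o in operands]  # evaluate the operands first
        return calc_apply(operator, arguments)
    raise TypeError(f'{exp!r} is not a number or call expression')


def calc_apply(operator, args):
    """Apply the named operator to a list of already-evaluated arguments."""
    if operator == '+':
        return sum(args)
    if operator == '-':
        if not args:
            raise TypeError('- requires at least 1 argument')
        if len(args) == 1:
            return -args[0]
        return args[0] - sum(args[1:])
    if operator == '*':
        return reduce(mul, args, 1)
    if operator == '/':
        if not args:
            raise TypeError('/ requires at least 1 argument')
        if len(args) == 1:
            return 1 / args[0]
        return args[0] / reduce(mul, args[1:], 1)
    raise TypeError(f'{operator} is an unknown operator')
```

Here only `calc_eval` calls `calc_apply`; application calls back into evaluation once user-defined procedures with bodies exist.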

**Read-eval-print loops**. A REPL is a mode of interaction that reads an expression, evaluates it, and prints the result for the user. The Python interactive session is an example of such a loop. 

The function `read_eval_print_loop` buffers input from the user, constructs an expression using the language-specific `scheme_read` function, then prints the result of applying `calc_eval` to that expression.

Therefore, an interactive interpreter does the following:
- Print a prompt.
- *Read* the text input from the user.
- Parse the text input into an expression.
- *Evaluate* the expression.
- If any errors occur, report those errors. Otherwise,
- *Print* the value of the expression.
- Repeat.

Despite exceptions being raised all over the code, exception handling only occurs in the REPL. A well-designed interactive interpreter should not halt completely on an error, so that the user has an opportunity to try again in the current environment. In our interpreter, the only way to exit is through a `KeyboardInterrupt` or an `EOFError`.
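The loop structure, with error handling concentrated in one place, can be sketched like this. It is a simplified, non-interactive variant: the hypothetical `run_repl` iterates over a sequence of input lines instead of prompting, and takes the parser and evaluator as parameters. An interactive version would read with `input` and stop on `EOFError` or `KeyboardInterrupt`.

```python
def run_repl(lines, parse, evaluate, output=print):
    """Read-eval-print over `lines`; errors are reported and the loop continues."""
    for line in lines:                       # read
        try:
            expression = parse(line)         # parse
            value = evaluate(expression)     # evaluate
        except (SyntaxError, TypeError, ValueError, ZeroDivisionError) as err:
            # Report the error and move on to the next input.
            output(type(err).__name__ + ': ' + str(err))
        else:
            output(value)                    # print
```

For instance, passing `float` as the parser and a function that divides 10 by its argument as the evaluator shows that one bad line does not stop the ones after it.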


## Interpreters for Languages with Abstraction

Calculator does not support abstraction in any way. As a result, it is not a particularly powerful or general programming language. We now turn to the task of defining a general programming language that supports abstraction by binding names to values and defining new operations.

### Structure

An interpreter for Scheme can share much of the same structure as the Calculator interpreter.

**Parsing**. The `scheme_reader` and `scheme_tokens` modules from the Calculator interpreter are nearly sufficient to parse any valid Scheme expression. However, they do not yet support quotation or dotted lists.

**Evaluation**. Scheme is evaluated one expression at a time. Each expression returned from `scheme_read` is passed to the `scheme_eval` function, which evaluates an expression `expr` in the current environment `env`.

The `scheme_eval` function evaluates the different forms of expressions in Scheme: primitives, special forms, and call expressions. The form of a combination in Scheme can be determined by inspecting its first element. Each special form has its own evaluation rule.

**Procedure application**. Procedure application is implemented by the function `scheme_apply`. The procedure application process in Scheme is considerably more general than the `calc_apply` function in Calculator. It applies one of two kinds of procedures: a `PrimitiveProcedure` or a `LambdaProcedure`.
- A `PrimitiveProcedure` has an instance attribute `fn` that is bound to a Python function. In addition, it may or may not require access to the current environment. This Python function is called whenever the procedure is applied.
- A `LambdaProcedure` is implemented in Scheme. It has a body attribute that is a Scheme expression, evaluated whenever the procedure is applied. To apply the procedure to a list of arguments, the body expression is evaluated in a new environment. To construct this environment, a new frame is added to the environment, in which the formal parameters of the procedure are bound to the arguments. The body is evaluated using `scheme_eval`.

**Eval/apply recursion**. The functions that implement the evaluation process, `scheme_eval` and `scheme_apply`, are mutually recursive. Evaluation requires application whenever a call expression is encountered. Application uses evaluation to evaluate operand expressions into arguments, as well as to evaluate the body of user-defined procedures. The general structure of this mutually recursive process appears in interpreters quite generally and constitutes the essence of the evaluation process.

This recursive cycle ends with language primitives. Evaluation has a base case that is evaluating a primitive expression. Some special forms also constitute base cases without recursive calls. Function application has a base case that is applying a primitive procedure.

### Environments

The `Frame` class forms environments. Each `Frame` instance represents an environment in which symbols are bound to values. A frame has a dictionary of bindings, as well as a parent frame that is `None` for the global frame.

Bindings are not accessed directly, but instead through two `Frame` methods: `lookup` and `define`. 
- `lookup` implements the look-up procedure of the environment model of computation described in Chapter 1. A symbol is matched against the bindings of the current frame. If it is found, the value to which it is bound is returned. If it is not found, look-up proceeds to the parent frame. 
- The `define` method always binds a symbol to a value in the current frame.
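A minimal sketch of such a class, consistent with the description above but not copied from the interpreter. In particular, raising `NameError` for an unbound symbol is an assumption made for this example.

```python
class Frame:
    """An environment frame: a dictionary of bindings plus a parent frame."""

    def __init__(self, parent=None):
        self.bindings = {}
        self.parent = parent          # None for the global frame

    def define(self, symbol, value):
        """Bind `symbol` to `value` in this frame only."""
        self.bindings[symbol] = value

    def lookup(self, symbol):
        """Return the value bound to `symbol` in this frame or the nearest ancestor."""
        if symbol in self.bindings:
            return self.bindings[symbol]
        if self.parent is not None:
            return self.parent.lookup(symbol)
        raise NameError(f'unknown identifier: {symbol}')
```

A child frame can shadow a name without touching the binding in its parent.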

# Commented programs

Here I comment on some details of the `Calculator` implementation. It is not exhaustive and only highlights the important parts of the files.

## `scheme_reader.py`

### Pairs and Scheme lists

Both `Pair` and `nil` inherit from `object`. Here is a discussion about that: https://stackoverflow.com/questions/4015417/python-class-inherits-object .

### Scheme list parser

**`scheme_read`**. `scheme_read` takes a buffered `src` and decides what to do with the current element of `src`. `scheme_read` is used recursively with `read_tail`.

If the current element of `src` is not `None`, then we add one to the current index and check cases. We return the current element if it is a `nil` str or the str is not in `DELIMETERS`. If we have a `(` as current element, then we know that a pair is beginning, so we read it with `read_tail`.

**`read_tail`**: if we do not have a `nil` expression (`'()`), then we read the `first` element of the pair with `scheme_read` (remember the index was updated before the call to `read_tail`) and the second element with `read_tail` again (during assignment of `first` the current index is updated, so we keep reading new characters or lines). Recursion stops when we reach the base case, a `)`, for which we return a `nil`.

## `buffer.py`

Read the docstring and the examples. They explain quite a lot.

**`current`**: return the current element, or `None` if none exists.
This method is called in the constructor and in `pop`.
- In the constructor, `self.index` is zero and `self.current_line` is an empty tuple. So `not more_on_line` is true and the suite is executed. In general, the suite is executed when we are done with `self.current_line`. `self.index` is reset to zero (in this case, it already was zero), because we start at the beginning of the line. Then we try to read the next line from the iterator `self.source`. If a `StopIteration` occurs, there is nothing else to read from `self.source`, so we set `self.current_line` to an empty tuple and the current element is `None`. If no error occurs, then we assign to `self.current_line` the list that `self.source` returns and we append it to `self.lines`. Now `self.current_line` has a positive length and we finish the execution of the suite. It returns the element at `self.index` zero from `self.current_line`.
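The behaviour described above can be condensed into the following sketch. It is a reconstruction from the description, under the assumption that `source` is an iterator yielding one list of tokens per line. The class is named `TokenBuffer` to make clear it is not the actual `Buffer` from `buffer.py`, which may differ in detail.

```python
class TokenBuffer:
    """Gives token-by-token access to an iterator of token lines."""

    def __init__(self, source):
        self.index = 0
        self.lines = []
        self.source = source
        self.current_line = ()
        self.current()                 # load the first non-empty line, if any

    @property
    def more_on_line(self):
        return self.index < len(self.current_line)

    def current(self):
        """Return the current token, or None when the source is exhausted."""
        while not self.more_on_line:   # done with this line: fetch the next one
            self.index = 0
            try:
                self.current_line = next(self.source)
                self.lines.append(self.current_line)
            except StopIteration:
                self.current_line = ()
                return None
        return self.current_line[self.index]

    def pop(self):
        """Return the current token and advance past it."""
        token = self.current()
        self.index += 1
        return token
```

The `while` loop (rather than an `if`) matters when a line has no tokens at all, since it keeps fetching lines until one has something to offer.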

## `scheme_tokens.py`

The operator `|` appears at the beginning of the file. Its meaning is explained here: https://stackoverflow.com/questions/21243775/vertical-bar-in-python-bitwise-assignment-operator . Basically, it is the union operator for sets (and, in general, the `__or__` operator).

**Sets of characters**: at the beginning of the file, there are some cryptic definitions for characters.
- `_SYMBOL_STARTS`: set of characters which might start a Scheme symbol.
- `_SYMBOL_INNERS`: set of characters that might be inside a Scheme symbol.
- `_NUMERAL_STARTS`: set of characters with which a numeral might start.
- `_WHITESPACE`: set of characters that represent white spaces.
- `_SINGLE_CHAR_TOKENS`: set of characters that by themselves represent a token: `(`, `)` and `'`.
- `_TOKEN_END`: set of characters with which a token finishes (`_WHITESPACE` or `_SINGLE_CHAR_TOKENS`).
- `DELIMETERS`: set of characters that delimit expressions: union of `_SINGLE_CHAR_TOKENS` and a dot.

**`next_candidate_token`**: given a line of code, finds the next candidate token (i.e. it is able to separate a single token).

The inputs are:
- `line`: a str (?) which is a `line` of code.
- `k`: an int index used at `line`.

The output is:
- A tuple `(tok, k')`. See docstring for explanation.

While the index `k` is less than the length of `line`, execute an `if` statement on the current character `c`. Each clause means:
- `if c == ';'`: no more characters and, thus, no more tokens. `;` starts a comment in Scheme, so everything after it is ignored. Return `None, len(line)`.
- `elif c in _WHITESPACE`: ignore white spaces (they are not tokens) and keep looking for tokens in the next character adding one to `k`.
- `elif c in _SINGLE_CHAR_TOKENS`: this character is a token by itself, so it must be returned. Return `c, k+1`.
- `elif c == '#'`: boolean values `#t` and `#f`.
- `else`: the token must be a set of multiple characters. Keep looking ahead in `line` until we find a `_TOKEN_END`. In that case, return the set of characters from k up to the `_TOKEN_END`.
If the index reaches the end of the `line`, then there are no new tokens (no more characters).
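The clauses above can be sketched as a small scanner. This is a simplified illustration rather than the file's code: the hypothetical `next_token` omits the `#` branch for booleans, and the character sets are reduced stand-ins for the ones defined in the module.

```python
WHITESPACE = set(' \t\n\r')
SINGLE_CHAR_TOKENS = set("()'")
TOKEN_END = WHITESPACE | SINGLE_CHAR_TOKENS    # `|` is set union


def next_token(line, k):
    """Return (token, next_index), or (None, len(line)) when no token remains."""
    while k < len(line):
        c = line[k]
        if c == ';':                       # comment: ignore the rest of the line
            return None, len(line)
        elif c in WHITESPACE:              # skip blanks and keep looking
            k += 1
        elif c in SINGLE_CHAR_TOKENS:      # a token by itself
            return c, k + 1
        else:                              # multi-character token
            j = k
            while j < len(line) and line[j] not in TOKEN_END:
                j += 1
            return line[k:j], j
    return None, len(line)
```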

**`tokenize_line`**: returns a list of Scheme tokens in `line`. While `next_candidate_token` keeps returning possible tokens, classify that token, convert it accordingly, and append it to the list.

In `elif text[0] in _NUMERAL_STARTS`, a possible numeral is detected. We try to convert the str into an int. If the str contains a dot, then the conversion raises a `ValueError`, so we try to convert it to a float.

## `scalc.py`

This part is not that hard.