# Lexical analysis

A Python program is read by a parser. Input to the parser is a stream of tokens, generated by the lexical analyzer. 

The lexical analyzer produces the following kinds of tokens:

*   *NEWLINE* - represents the end of a logical line;
*   *INDENT* and *DEDENT* - are used to determine the grouping of statements;
*   *identifiers* - (also referred to as *names*) are used to identify variables, functions, etc.;
*   *keywords* - are reserved words that identify built-in constants, operators and statements of Python;
*   *literals* - are notations for constant values of some built-in types. There are numeric literals and string literals;
*   *operators*, for example: `+       -       *       **      /       //      %`;
*   *delimiters*, for example: `(       )       [       ]       {       }       ,       :       .       =       +=      -=      *=      /=      //=`.
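As a rough illustration of this token stream, the standard-library `tokenize` module can list the token types produced for a small snippet. The sample source string is made up for this sketch. Note that `tokenize` reports all operators and delimiters under the generic `OP` type.

```python
import io
import tokenize

# A tiny made-up program: one compound statement with an indented body.
source = "if x:\n    y = 1\n"

# generate_tokens() takes a readline callable and yields TokenInfo tuples.
token_names = [
    tokenize.tok_name[tok.type]
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
]

print(token_names)
```

The printed list should contain `NEWLINE` at the end of each logical line, and an `INDENT`/`DEDENT` pair around the indented body.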



## Logical lines and physical lines


A Python program is divided into a number of logical lines.

The end of a logical line is represented by the token NEWLINE. Statements cannot cross logical line boundaries except where NEWLINE is allowed by the syntax (e.g., between statements in compound statements). A logical line is constructed from one or more physical lines by following the explicit or implicit line joining rules.

A physical line is a sequence of characters terminated by an end-of-line sequence.

A logical line that contains only whitespace or comments is ignored.

Two or more physical lines may be joined into one logical line by ending each line that is to be continued with a backslash character (`\`).
For example:

```python
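def is_valid_date(year, month, day, hour, minute, second):
    # Illustrative wrapper so that `return` is valid; backslashes join the lines.
    if 1900 < year < 2100 and 1 <= month <= 12 \
       and 1 <= day <= 31 and 0 <= hour < 24 \
       and 0 <= minute < 60 and 0 <= second < 60:   # Looks like a valid date
        return 1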
```



Expressions in parentheses, square brackets or curly braces can be split over more than one physical line without using backslashes. For example:

```python
month_names = ['Januari', 'Februari', 'Maart',      # These are the
               'April',   'Mei',      'Juni',       # Dutch names
               'Juli',    'Augustus', 'September',  # for the months
               'Oktober', 'November', 'December']   # of the year
```
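A small sketch showing both joining rules side by side. The variable names are illustrative only.

```python
# Explicit line joining: the backslash must be the last character on the line.
total = 1 + 2 \
    + 3

# Implicit line joining: inside brackets, line breaks and comments are allowed.
values = [
    1, 2,   # a comment may follow a line inside brackets
    3,
]

print(total, values)
```

Each assignment above is a single logical line, even though it spans several physical lines.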

## Indentation

Leading whitespace (spaces and tabs) at the beginning of the logical line is used to determine the indentation level of the logical line, which in turn is used to determine the grouping of statements.

<table>
  <tbody>
    <col width="1000x" ></col>
    <tr>
      <td td bgcolor=lightgreen height="100px"><font color=black size=3>
<dd>
Statements which go together must have the same indentation. Each such set of statements is called a <strong>block</strong>.<br><br>Using four spaces per indentation level is the official Python style recommendation (PEP 8).</dd>
      </font></td>
    </tr>
  </tbody>
</table>
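One way to see the effect of indentation is to compile two small source strings, one with a consistently indented block and one without. This is a minimal sketch; the source strings are invented for the example, and `compile()` is used only to parse them.

```python
consistent = "if True:\n    x = 1\n    y = 2\n"
inconsistent = "if True:\n    x = 1\n      y = 2\n"

# compile() only parses the source here; nothing is executed.
compile(consistent, "<consistent>", "exec")

error_name = None
try:
    compile(inconsistent, "<inconsistent>", "exec")
except IndentationError as exc:
    error_name = type(exc).__name__

print(error_name)
```

The second string is rejected because its last line is indented more deeply than the other statement in the same block.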

## Identifier naming and keywords

There are some rules which must be followed for naming identifiers:

*   The first character of the identifier must be a letter (an uppercase or lowercase ASCII letter, or a Unicode letter) or an underscore (`_`).
*   The rest of the identifier name can consist of letters (uppercase or lowercase ASCII letters, or Unicode letters), underscores (`_`) or digits (`0`-`9`).
*   Identifier names are case-sensitive. For example, `myname` and `myName` are not the same. Note the lowercase `n` in the former and the uppercase `N` in the latter.
*   Examples of valid identifier names are `i` and `name_2_3`. Examples of invalid identifier names are `2things`, `this is spaced out`, `my-name` and `>a1b2_c3`.
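These rules can be checked with the built-in `str.isidentifier()` method. The sketch below uses the example names from the list, plus `_private`, which is added here only for illustration. Note that `isidentifier()` checks the lexical form only, so it also returns `True` for reserved words.

```python
candidates = ["i", "name_2_3", "_private", "2things",
              "this is spaced out", "my-name", ">a1b2_c3"]

# str.isidentifier() checks only the lexical form of the name.
validity = {name: name.isidentifier() for name in candidates}

for name, ok in validity.items():
    print(f"{name!r}: {ok}")
```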

The following identifiers are used as reserved words, or keywords of the language, and cannot be used as ordinary identifiers. They must be spelled exactly as written here:

```python
False      await      else       import     pass
None       break      except     in         raise
True       class      finally    is         return
and        continue   for        lambda     try
as         def        from       nonlocal   while
assert     del        global     not        with
async      elif       if         or         yield
```
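The standard-library `keyword` module can test whether a name is reserved. The following sketch also shows that using a keyword as a variable name is rejected at compile time; the one-line source string is invented for the example.

```python
import keyword

is_reserved = keyword.iskeyword("lambda")      # a reserved word
is_reserved_caps = keyword.iskeyword("Lambda")  # different case, so an ordinary identifier
print(is_reserved, is_reserved_caps)

# Using a keyword as a name is rejected when the source is compiled.
keyword_error = None
try:
    compile("class = 1", "<example>", "exec")
except SyntaxError as exc:
    keyword_error = type(exc).__name__

print(keyword_error)
```

The full list for the running interpreter is available as `keyword.kwlist`; its contents can differ between Python versions.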

## Literals

Literals are notations for constant values of some built-in types.

The following kinds of literals are described below:
*   *String literals* 
*   *Numeric literals*

## Numeric literals

*Numeric literals* come in three types: integers, floating point numbers and imaginary numbers.
   *   An example of an integer is `2`, which is just a whole number;
   *   Examples of floating point numbers (or *floats* for short) are `3.23` and `52.3E-4`. The `E` notation indicates powers of `10`. In this case, `52.3E-4` means $52.3 \times 10^{-4}$;
   *   Some examples of imaginary literals: `3.14j  10.j 10j     .001j   1e100j   3.14e-10j   3.14_15_93j`.
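A short sketch of how these literals behave at run time. The variable names are illustrative. An imaginary literal yields a `complex` value whose real part is `0.0`.

```python
integer = 2
floating = 52.3E-4
imaginary = 3.14_15_93j   # underscores are only visual separators

print(type(integer).__name__, type(floating).__name__, type(imaginary).__name__)
print(floating == 0.00523)
print(imaginary == 3.141593j)
print(imaginary.real, imaginary.imag)
```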

## String literals

String literals are sequences of characters.
They must be enclosed in matching single quotes (`'`) or double quotes (`"`). They can also be enclosed in matching groups of three single or double quotes (these are generally referred to as triple-quoted strings and are used for multi-line string literals).
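A minimal sketch of the quoting styles; the sample strings are invented for the example.

```python
single = 'spam'
double = "spam"
quoted = "It's fine"   # a single quote inside double quotes needs no escaping
multi = """first line
second line"""

print(single == double)
print(quoted)
print(multi.splitlines())
```

The choice between single and double quotes does not change the resulting string. A triple-quoted string keeps the line break as part of its value.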

One thing to note is that in a string, a single backslash at the end of the line indicates that the string is continued in the next line, but no newline is added. For example:

```python
"This is the first sentence. \
This is the second sentence."
```

is equivalent to

```python
"This is the first sentence. This is the second sentence."
```

Multiple adjacent string or bytes literals (delimited by whitespace), possibly using different quoting conventions, are allowed, and their meaning is the same as their concatenation.

```python
print("[A-Za-z_]"       # letter or underscore
      "[A-Za-z0-9_]*"   # letter, digit or underscore
      )    

# [A-Za-z_][A-Za-z0-9_]*

```

### Escape Sequence

The backslash (\\) character is also used to give special meaning to otherwise ordinary characters like `n`, which means ‘newline’ when escaped (`\n`). It can also be used to escape characters that otherwise have a special meaning, such as newline, backslash itself, or the quote character.

|Escape Sequence     |Meaning            |
|--------------------|-------------------|
|`\\` | Backslash (`\`)|
|`\'` | Single quote (`'`)|
|`\"` | Double quote (`"`)|
|`\n` | New line|
|`\t` | Horizontal tab|
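The escape sequences from the table can be tried directly. In this sketch (sample strings are illustrative), each escape sequence produces exactly one character in the resulting string.

```python
tabbed = "a\tb"
two_lines = "one\ntwo"
backslash = "\\"
quote = 'It\'s'

print(len(tabbed))      # the escape sequence counts as one character
print(two_lines)
print(len(backslash))
print(quote)
```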

### Formatted string literals

*New in version 3.6*

A formatted string literal or f-string is a string literal that is prefixed with 'f' or 'F'. These strings may contain replacement fields, which are expressions delimited by curly braces `{}`. While other string literals always have a constant value, formatted strings are really expressions evaluated at run time.

Top-level format specifiers may include nested replacement fields. These nested fields may include their own conversion fields and format specifiers, but may not include more deeply nested replacement fields. The format specifier mini-language is the same as that used by the `str.format()` method.

Formatted string literals may be concatenated, but replacement fields cannot be split across literals.

Some examples of formatted string literals:

```python
>>> name = "Fred"
>>> f"He said his name is {name!r}."
"He said his name is 'Fred'."
>>> f"He said his name is {repr(name)}."  # repr() is equivalent to !r
"He said his name is 'Fred'."
>>> width = 10
>>> precision = 4
>>> import decimal
>>> value = decimal.Decimal("12.34567")
>>> f"result: {value:{width}.{precision}}"  # nested fields
'result:      12.35'
>>> from datetime import datetime
>>> today = datetime(year=2017, month=1, day=27)
>>> f"{today:%B %d, %Y}"  # using date format specifier
'January 27, 2017'
>>> f"{today=:%B %d, %Y}" # using date format specifier and debugging
'today=January 27, 2017'
>>> number = 1024
>>> f"{number:#0x}"  # using integer format specifier
'0x400'
>>> foo = "bar"
>>> f"{ foo = }" # preserves whitespace
" foo = 'bar'"
>>> line = "The mill's closed"
>>> f"{line = }"
'line = "The mill\'s closed"'
>>> f"{line = :20}"
"line = The mill's closed   "
>>> f"{line = !r:20}"
'line = "The mill\'s closed" '
```

### Raw String

Raw strings are string literals in which escape sequences are not processed: a backslash is kept in the string as a literal character. They are written with the `r` or `R` prefix.

For example:

```python
print(r"Newlines are indicated by \n")

# Newlines are indicated by \n
```

## Docstrings and comments

<table>
  <tbody>
    <col width="1000x" ></col>
    <tr>
      <td td bgcolor=lightgreen height="200px"><font color=black size=3>
<dd>
<strong>Docstring</strong> is a string literal which appears as the first expression in a class, function or module. While ignored when the suite is executed, it is recognized by the compiler and put into the <code>__doc__</code> attribute of the enclosing class, function or module. Since it is available via introspection, it is the canonical place for documentation of the object.</dd>
      </font></td>
    </tr>
  </tbody>
</table>
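A minimal sketch of a docstring and how it can be read back through introspection. The function `greet` is a hypothetical example.

```python
def greet(name):
    """Return a greeting for the given name."""
    return f"Hello, {name}!"


# The compiler stores the first string literal of the body in __doc__.
print(greet.__doc__)
print(greet("world"))
```

The built-in `help(greet)` displays the same text.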


A comment starts with a hash character `#` that is not part of a string literal, and ends at the end of the physical line. A comment signifies the end of the logical line unless the implicit line joining rules are invoked. Comments are ignored by the syntax.

For example:

```python
print('hello world')  # Note that print is a function
```

or

```python
# Note that print is a function
print('hello world')
```

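Python has no separate syntax for multi-line comments. A triple-quoted string literal (`"""` or `'''`) that is not assigned to anything is often used for this purpose; the same notation is also used for multi-line string constants. Single and double quotes can be freely used inside triple quotes.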

For example:
```python
'''Multi-line comments (also can be useful as multi-line literal constants)
You can also use triple double quotes""" to specify multiline comments or literal constants
He said: "My name is James Bond"
'''
```

# Operators and Expressions

Here is a quick overview of the available operators:

`+` (plus)

*   Adds two objects
*   `3 + 5` gives `8`. `'a' + 'b'` gives `'ab'`.

`-` (minus)

*   Gives the subtraction of one number from the other; if the first operand is absent it is assumed to be zero.
*   `-5.2` gives a negative number and `50 - 24` gives `26`.

`*` (multiply)

*   Gives the multiplication of the two numbers or returns the string repeated that many times.
*   `2 * 3` gives `6`. `'la' * 3` gives `'lalala'`.

`**` (power)

*   Returns `x` to the power of `y`
*   `3 ** 4` gives `81` (i.e. `3 * 3 * 3 * 3`)

`/` (divide)

*   Divide `x` by `y`
*   `13 / 3` gives `4.333333333333333`

`//` (divide and floor)

*   Divide `x` by `y` and round the answer down to the nearest integer value. Note that if one of the values is a float, you'll get back a float.
*   `13 // 3` gives `4`
*   `-13 // 3` gives `-5`
*   `9 // 1.81` gives `4.0`

`%` (modulo)

*   Returns the remainder of the division
*   `13 % 3` gives `1`. `-25.5 % 2.25` gives `1.5`.
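Floor division and modulo are linked by the identity `x == (x // y) * y + (x % y)`. Since `//` rounds down, the remainder takes the sign of the divisor. A short sketch with integer operands:

```python
x, y = -13, 3

quotient = x // y    # rounds towards negative infinity
remainder = x % y    # has the same sign as y

print(quotient, remainder)
print(quotient * y + remainder == x)
print(divmod(x, y) == (quotient, remainder))
```

The built-in `divmod()` returns both results in one call.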

`<` (less than)

*   Returns whether `x` is less than `y`. **All comparison operators** return `True` or `False`. Note the capitalization of these names.
*   `5 < 3` gives `False` and `3 < 5` gives `True`.
*    Comparisons can be chained arbitrarily: `3 < 5 < 7` gives `True`.
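A chained comparison such as `a < b < c` behaves like `a < b and b < c`, except that the middle operand is evaluated only once. The sketch below records calls to a helper function to show this; `middle` and `calls` are names invented for the example.

```python
calls = []


def middle():
    # Record each call so that we can count evaluations.
    calls.append("middle")
    return 5


chained = 3 < middle() < 7
print(chained, calls)

same_as_and = (3 < 5 < 7) == (3 < 5 and 5 < 7)
mixed = 1 < 3 > 2   # different operators may be mixed in one chain
print(same_as_and, mixed)
```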


`>` (greater than)

*   Returns whether `x` is greater than `y`
*   `5 > 3` returns `True`. If both operands are numbers, they are first converted to a common type. For operands of unrelated types that do not define an ordering (for example, `5 > 'a'`), Python 3 raises `TypeError`.

`<=` (less than or equal to)

*   Returns whether `x` is less than or equal to `y`
*   `x = 3; y = 6; x <= y` returns `True`

`>=` (greater than or equal to)

*   Returns whether `x` is greater than or equal to `y`
*   `x = 4; y = 3; x >= y` returns `True`

`==` (equal to)

*   Compares if the objects are equal
*   `x = 2; y = 2; x == y` returns `True`
*   `x = 'str'; y = 'stR'; x == y` returns `False`
*   `x = 'str'; y = 'str'; x == y` returns `True`

`!=` (not equal to)

*   Compares if the objects are not equal
*   `x = 2; y = 3; x != y` returns `True`

`in` and `not in` (membership test operations)

*   Test for membership. `x in s` evaluates to `True` if `x` is a member of `s`, and `False` otherwise. `x not in s` returns the negation of `x in s`.
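A few membership tests on built-in containers, as a quick sketch. For strings, `in` tests for a substring; for dictionaries, it tests the keys.

```python
in_list = 3 in [1, 2, 3]
in_string = "ab" in "cab"          # substring test
in_dict_key = "key" in {"key": 1}  # dictionaries test their keys
in_dict_value = 1 in {"key": 1}    # values are not searched
not_in_tuple = 4 not in (1, 2, 3)

print(in_list, in_string, in_dict_key, in_dict_value, not_in_tuple)
```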

`is` and `is not` (identity comparisons)

*   The operators `is` and `is not` test for an object's identity: `x is y` is true if and only if `x` and `y` are the same object. An object's identity is determined using the `id()` function. `x is not y` yields the inverse truth value.
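The difference between identity (`is`) and equality (`==`) is easiest to see with two separate lists that have the same contents. A minimal sketch:

```python
a = [1, 2]
b = a        # a second name for the same object
c = [1, 2]   # a different object with equal contents

print(a is b, id(a) == id(b))
print(a == c, a is c)
print(a is not c)
```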

`:=` (assignment expressions) *New in version 3.8*

*   An assignment expression (sometimes also called a “named expression” or “walrus”) assigns an expression to an identifier, while also returning the value of the expression.
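A small sketch of an assignment expression inside a condition (this needs Python 3.8 or later). The list `data` and the message text are invented for the example.

```python
data = [1, 2, 3, 4, 5]

# n is assigned and compared in a single expression.
if (n := len(data)) > 3:
    message = f"list is too long ({n} elements, expected <= 3)"
else:
    message = "ok"

print(message)
```

Without `:=`, the length would have to be computed on a separate line or computed twice.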

`not` (boolean NOT)

*   If `x` is `True`, it returns `False`. If `x` is `False`, it returns `True`.
*   `x = True; not x` returns `False`.

`and` (boolean AND)

*   `x and y` returns `x` if `x` is false, else it returns the value of `y`
* `x = False; y = True; x and y` returns `False` since x is False. In this case, Python will not evaluate `y` since it knows that the left hand side of the `and` expression is `False` which implies that the whole expression will be `False` irrespective of the other values. This is called short-circuit evaluation.

`or` (boolean OR)

*   `x or y` returns `x` if `x` is true, else it returns the value of `y`
*   `x = True; y = False; x or y` returns `True`. Short-circuit evaluation applies here as well.
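The sketch below illustrates short-circuit evaluation with operands that are not booleans. `and` and `or` return one of their operands, and the right-hand side is not evaluated when the left-hand side already decides the result. The variable names are illustrative.

```python
# 1 / 0 is never evaluated, because the empty list is already false.
left_and = [] and 1 / 0
fallback = "" or "default"
last_and = "value" and "other"
last_or = 0 or None

print(left_and, fallback, last_and, last_or)
```

The `x or default` form is a common way to supply a fallback for an empty value.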

# Evaluation Order

Python evaluates expressions from left to right. Notice that while evaluating an assignment, the right-hand side is evaluated before the left-hand side.
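Evaluation order can be observed with a helper that records when it is called. In this sketch, `note`, `order` and `table` are names invented for the example. Operands are evaluated left to right even when a later operator binds more tightly, and in an assignment the right-hand side is evaluated before the subscript on the left.

```python
order = []


def note(label, value):
    # Record the label at the moment this operand is evaluated.
    order.append(label)
    return value


result = note("a", 1) + note("b", 2) * note("c", 3)
expression_order = list(order)
print(result, expression_order)

order.clear()
table = {}
table[note("key", "k")] = note("value", 1)
assignment_order = list(order)
print(table, assignment_order)
```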

☝ **Operator precedence**

The following list summarizes operator precedence in Python, from highest precedence (most binding) to lowest precedence (least binding).

*   `(expressions...), [expressions...], {key: value...}, {expressions...}` : Binding or parenthesized expression, list display, dictionary display, set display

* `x[index], x[index:index], x(arguments...), x.attribute` : Subscription, slicing, call, attribute reference

* `await x` : Await expression

*   `**` : Exponentiation 

*   `+x, -x, ~x` : Positive, negative, bitwise NOT

*   `*, @, /, //, %` : Multiplication, matrix multiplication, division, floor division, remainder

*   `+, -` : Addition and subtraction

*   `<<, >>` : Shifts

*   `&` : Bitwise AND

*   `^` : Bitwise XOR

*   `|` : Bitwise OR

*   `in, not in, is, is not, <, <=, >, >=, !=, ==` : Comparisons, including membership tests and identity tests

*   `not x` :  Boolean NOT

*   `and` : Boolean AND

*   `or` : Boolean OR

*   `if – else` : Conditional expression

*   `lambda` : Lambda expression

*   `:=` : Assignment expression
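A few expressions whose results follow from the precedence list above, as a quick sketch. Each one is compared with an explicitly parenthesized equivalent.

```python
arithmetic = 2 + 3 * 4     # * binds tighter than +
negated_power = -2 ** 2    # ** binds tighter than unary minus
negated_test = not 1 == 2  # == binds tighter than not
bitwise = 1 | 2 ^ 3 & 4    # & first, then ^, then |

print(arithmetic, arithmetic == 2 + (3 * 4))
print(negated_power, negated_power == -(2 ** 2))
print(negated_test, negated_test == (not (1 == 2)))
print(bitwise, bitwise == (1 | (2 ^ (3 & 4))))
```

When the intended grouping is not obvious, explicit parentheses make the expression easier to read.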




Synopsis prepared by S.V. Kislyakov
(SkillFactory, group DSPR-107, 25 Jul 2022)

