---
title: Parsing and Validation
skip-execution: true
---

::::{attention}

This notebook is optional and NOT required for any course assessment activities. The lab tutor may go through it if time permits.

::::

In [None]:
import ast
from typing import Annotated
from ipywidgets import Text, interact
from pydantic import Field, ValidationError, validate_call

%load_ext divewidgets

In [None]:
if not input('Load JupyterAI? [Y/n]').lower()=='n':
    %reload_ext jupyter_ai

## Parsing User Input

A parser is a crucial component that analyzes the structure of data, often in the form of text, and converts it into a more meaningful format. For instance, when executing a Python program, the Python interpreter first parses the program's source code into an Abstract Syntax Tree (AST).

In [None]:
print(ast.dump(ast.parse('q = a/b if b else "undefined"', "_", "exec"), indent=4))

This involves breaking down the source code into individual components and interpreting their roles. E.g.,

- `q` is understood as a [variable name][name] for storing a value, while `a` and `b` are also variable names but for loading values; and
- `"undefined"` is regarded as a [constant string literal][string].

Breaking the source code into such individual components, called tokens, is known as tokenization, which is performed by a so-called lexer. The parser then composes a hierarchical structure that accurately represents the operations and their execution flow. E.g., the Python code

```python
... = .../... if ... else ...
```
is translated to the tree structure:

```
...
        Assign(
            targets=...,
            value=IfExp(
                test=...,
                body=BinOp(
                    left=...,
                    op=Div(),
                    right=...),
                orelse=...))
...
```

In other words, the nesting reflects the order of evaluation: the test `b` of the conditional expression `IfExp` is evaluated first; if it is truthy, the division `Div` in its body must be completed before `IfExp` can produce its value, which then allows the assignment `Assign` to complete. As the language grows, the logic involved in parsing the program becomes more sophisticated. For more details, see the [ANTLR video](https://www.youtube.com/watch?v=OAoA3E-cyug).
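To see the two stages separately, here is a minimal sketch using only the standard `tokenize` and `ast` modules on the same line of code: `tokenize` produces the flat token stream a lexer would see, while `ast.parse` builds the nested tree. The variable names (`source`, `tokens`, `tree`, `contexts`, ...) are illustrative only.

```python
import ast
import io
import tokenize

source = 'q = a/b if b else "undefined"'

# Lexing: split the source text into a flat list of (token type, text) pairs
tokens = [
    (tokenize.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type not in (tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER)
]
print(tokens)

# Parsing: build the tree and follow the nesting Assign -> IfExp -> BinOp
tree = ast.parse(source)
assign = tree.body[0]  # the assignment statement
ifexp = assign.value   # the conditional expression on the right-hand side
binop = ifexp.body     # the division evaluated when the test `b` is truthy
print(type(assign).__name__, type(ifexp).__name__, type(binop).__name__, type(binop.op).__name__)

# Each Name node records whether the variable is stored to or loaded from
contexts = {
    (node.id, type(node.ctx).__name__)
    for node in ast.walk(tree)
    if isinstance(node, ast.Name)
}
print(sorted(contexts))
```

Note that `tokenize` reports the keywords `if` and `else` as plain `NAME` tokens; recognizing their role in a conditional expression is left to the parser.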

[name]: https://docs.python.org/3/library/ast.html#variables
[string]: https://docs.python.org/3/library/ast.html#literals

### Parsing Boolean Values

The following is a very simple parser that can understand yes/no strings in user input and convert them to their corresponding boolean values.

In [None]:
@interact(x=Text('yes'))
def parse(x):
    s = x.lower()
    match s:
        case "yes" | "y":
            return True
        case "no" | "n":
            return False
        case _:
            return x  # not a yes/no answer: return the input unchanged

The matching is case-insensitive because the input is converted to lowercase before matching:

In [None]:
parse("yes"), parse("n"), parse("Y"), parse("No")

Instead of using the `if` statement, the program uses the [`match` statement](https://docs.python.org/3/whatsnew/3.10.html#pep-634-structural-pattern-matching) introduced in Python 3.10. The following flowchart shows roughly how the statement is executed:

In [None]:
%%flowchart
st=>start: Start
suite0=>operation: s = x.lower()
cond1=>condition: s == "yes" or s == "y"
cond2=>condition: s == "no" or s == "n"
suite1=>inputoutput: return True
suite2=>inputoutput: return False
suite3=>inputoutput: return x
e=>end

st(right)->suite0(right)->cond1
cond1(yes)->suite1->e
cond1(no)->cond2
cond2(yes)->suite2->e
cond2(no)->suite3->e
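The flowchart can also be read as an `if`/`elif`/`else` chain. A sketch of that reading follows; the name `parse_if` is hypothetical and chosen so that `parse` is not overwritten.

```python
def parse_if(x):
    """Hypothetical if/elif/else version of the yes/no parser."""
    s = x.lower()                 # normalise case first
    if s == "yes" or s == "y":    # first case of the match statement
        return True
    elif s == "no" or s == "n":   # second case
        return False
    else:                         # wildcard case `_`
        return x                  # not a yes/no answer: return the input unchanged


print(parse_if("Y"), parse_if("No"), parse_if("maybe"))
```

The `match` version states each alternative as a pattern such as `"yes" | "y"`, whereas the `if` version has to repeat `s ==` for every alternative.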

- Initially, `s = x.lower()` obtains the input string in lowercase.
- In the first case of the match statement, `"yes"` or `"y"` is converted to the boolean value `True`.
- In the second case of the match statement, `"no"` or `"n"` is converted to the boolean value `False`.
- Otherwise, the wildcard pattern `_` matches anything, so the original input `x` is returned unchanged.

::::{exercise}
:label: ex:parse-boolean

Modify the program so that, additionally,

- `"true"` or `"t"` is parsed as `True`, and
- `"false"` or `"f"` is parsed as `False`.

The comparison should be case-insensitive.

::::

In [None]:
@interact(x=Text('T'))
def parse(x):
    # YOUR CODE HERE
    raise NotImplementedError

In [None]:
# tests
assert parse("TRUE") is True
assert parse("Y") is True
assert parse("t") is True
assert parse("False") is False
assert parse("F") is False
assert parse("n") is False
assert parse("TrUE") is True
assert parse("No") is False

### Parsing Numbers

It is desirable to parse numbers as well. The string method `str.isdigit` checks whether a string consists only of digits:

In [None]:
str.isdigit("1302"), "CS1302".isdigit()

Unfortunately, the method fails to recognize negative integers, since the minus sign `-` is not a digit:

In [None]:
"-12".isdigit()

The following function resolves the issue using the [`try` statement](https://docs.python.org/3/reference/compound_stmts.html#try). It even works when `x` is of type `int`.

In [None]:
def isint(x):
    """
    Returns True if x can be converted to an integer, and False otherwise.
    """
    try:
        int(x)
    except ValueError:
        return False
    return True


isint("CS1302"), isint("1302"), isint("-1302"), isint(-1302)

How does it work? `isint(x)` would describe its implementation as follows:
> I `try` to convert `x` to `int` and `return True`, `except` when `ValueError` is raised, in which case I `return False`.[^grammar]

This is illustrated by the following flowchart.

[^grammar]: Why use first-person narration? Just to avoid writing `tries` (not a Python keyword) or `try`s (a grammatical mistake).

In [None]:
%%flowchart
st=>start: Start
cond1=>condition: ValueError
suite1=>operation: int(x)
suite2=>inputoutput: return True
suite3=>inputoutput: return False
e=>end

st(right)->suite1(right)->cond1
cond1(yes)->suite3->e
cond1(no)->suite2->e
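The `try` statement also accepts an optional `else` clause that runs only when the `try` block raises no exception. The sketch below rewrites the check that way under the hypothetical name `isint_else` and compares it with `str.isdigit` on a few more inputs. It also illustrates some caveats: `int` accepts surrounding whitespace and truncates floats (so `1.5` counts as convertible), `"²"` passes `isdigit` but is rejected by `int`, and `int(None)` raises `TypeError` rather than `ValueError`, so such an input is not caught.

```python
def isint_else(x):
    """Hypothetical variant of isint using the optional else clause of try."""
    try:
        int(x)
    except ValueError:
        # int() rejected the value, e.g. "CS1302"
        return False
    else:
        # Runs only if int(x) raised no exception
        return True


samples = ["1302", "-1302", " 42 ", "\u00b2", "1.5", 1.5]
for s in samples:
    digit = s.isdigit() if isinstance(s, str) else None  # isdigit exists only for str
    print(repr(s), "isdigit:", digit, "isint_else:", isint_else(s))
```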

::::{exercise}
:label: ex:parse

Improve the `parse` function so that it parses the input argument `x` by `try`ing the following conversions in order:
- `int(x)`
- `float(x)`
- `complex(x)`

It should return the value of the first conversion that does not raise a `ValueError`. If all of the conversions fail, the parser should behave in the same way as the one implemented in [](#ex:parse-boolean).


::::

In [None]:
@interact(x=Text("-13+0.2j"))
def parse(x):
    # YOUR CODE HERE
    raise NotImplementedError

In [None]:
# tests
assert (_:=parse("1302")) == 1302 and isinstance(_, int)
assert (_:=parse("-13.02")) == -13.02 and isinstance(_, float)
assert (_:=parse("-13+0.2j")) == -13+0.2j and isinstance(_, complex)
assert parse("yes") is True
assert parse("N") is False
assert (_:=parse("-1302")) == -1302 and isinstance(_, int)
assert (_:=parse("inf")) == float("inf") and isinstance(_, float)

## Data Validation

If you have implemented your parser correctly in the last section, the [`interact` function of `ipywidgets`](https://ipywidgets.readthedocs.io/en/latest/examples/Using%20Interact.html#using-interact) allows you to play with the function interactively:

In [None]:
@interact(a=Text("3"), b=Text("4"))
def length_of_hypotenuse(a, b):
    a, b = parse(a), parse(b)
    c = (a**2 + b**2) ** (0.5)
    return c

Without the parser, i.e., with the line `a, b = parse(a), parse(b)` removed, the above code will fail because `a` and `b` are passed to `length_of_hypotenuse` as strings, not numbers, and exponentiation such as `a ** 2` is not supported for strings. With the parser, however, you can even call the function with integer arguments:

In [None]:
length_of_hypotenuse(3, 4)
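To reproduce the failure without the widget, here is a quick check with the string values that the `Text` widgets would pass in (the names `a_text` and `b_text` are illustrative). Note that `+` on strings does not fail at all but silently concatenates, which is arguably worse than an error:

```python
a_text, b_text = "3", "4"       # what the Text widgets pass in
print(a_text + b_text)          # '+' concatenates strings: prints 34, not 7
try:
    (a_text**2 + b_text**2) ** 0.5
except TypeError as e:
    print("TypeError:", e)      # '**' is not supported between str and int
```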

::::{seealso} How does `interact` work?
:class: dropdown

The `@interact` line is a decorator, which you will learn about later in the course. It automatically creates a user interface with two text inputs `a` and `b`, and passes their updated values as arguments to `length_of_hypotenuse` whenever they change.

::::

### Assertion

Interestingly, the function does not fail even if the input arguments are negative numbers.

In [None]:
a, b = -3, 4
length_of_hypotenuse(a, b)

::::{note} Is it a good idea to accept negative edge lengths?
:class: dropdown

Imagine that the length `a` was computed incorrectly as a negative value, but the error goes undetected because `length_of_hypotenuse` does not raise any error. This could lead to more serious issues if critical applications depend on the calculation.

::::

Instead of allowing the input arguments to be any values of any type, it is often better to validate the arguments and raise an error if the values or types are unexpected. We can achieve this using the [`assert` statement](https://docs.python.org/3/reference/simple_stmts.html#the-assert-statement):

In [None]:
%%optlite -h 400
def length_of_hypotenuse(a, b):
    assert a >= 0 and b >= 0
    c = (a**2 + b**2) ** (0.5)
    return c


length_of_hypotenuse(-3, 4)

Validation is the process of checking whether a desired condition holds before further processing, to avoid costly mistakes. We have been using `assert` statements for validation. For instance, you may validate the notebook before submission to lower the chance of careless mistakes. After the submission, there are also hidden tests to detect programs engineered to work only on the visible test cases.
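An `assert` statement may also carry a message, which becomes the text of the `AssertionError` and makes the failure easier to diagnose. Below is a minimal sketch with a hypothetical helper `check_edges`. Keep in mind that Python skips `assert` statements entirely when run with the `-O` option, so they suit catching programming mistakes rather than validating untrusted input in production code.

```python
def check_edges(a, b):
    """Hypothetical helper: the same non-negativity check, with a message."""
    assert a >= 0 and b >= 0, f"edge lengths must be non-negative, got a={a!r}, b={b!r}"


try:
    check_edges(-3, 4)
except AssertionError as e:
    print("AssertionError:", e)
```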

Our function is still imperfect for very large inputs. If an input argument is too large, the exponentiation `a**2` raises an `OverflowError`:

In [None]:
%%optlite -h 500
def length_of_hypotenuse(a, b):
    assert a >= 0 and b >= 0
    c = (a**2 + b**2) ** (0.5)
    return c


c = length_of_hypotenuse(3e300, 4)

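Worse, no error is raised if an input is already infinite. E.g., the literal `3e400` exceeds the largest `float` and evaluates to `float('inf')`, so the function silently returns `inf`: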

In [None]:
def length_of_hypotenuse(a, b):
    assert a >= 0 and b >= 0
    c = (a**2 + b**2) ** (0.5)
    return c


length_of_hypotenuse(3e400, 4), length_of_hypotenuse(3, 4e400), length_of_hypotenuse(3e400, 4e400)
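Here is a short check of the floating-point behaviour behind these results, using only built-in `float` operations. A finite value that overflows under `**` raises `OverflowError`, whereas multiplication overflows silently to `inf`, and `inf` itself passes the check `a >= 0`. This is why the exercise hint suggests `a * a`.

```python
big = 3e300
print(3e400)              # the literal is already too large: prints inf
print(big * big)          # float multiplication overflows silently: prints inf
print(float("inf") ** 2)  # exponentiation of inf stays inf without an error
try:
    big ** 2              # but a finite float overflowing under ** raises
except OverflowError as e:
    print("OverflowError:", e)
print(float("inf") >= 0)  # True, so `assert a >= 0 and b >= 0` lets inf through
```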

::::{exercise}
:label: ex:assert

Improve the function to raise an `AssertionError` (not an `OverflowError`) if the input `a` or `b`, or the output length of the hypotenuse, overflows to `float('inf')`.

:::{hint}
:class: dropdown

Use `a * a` instead of `a ** 2` to avoid `OverflowError`.

:::

::::

In [None]:
def length_of_hypotenuse(a, b):
    # YOUR CODE HERE
    raise NotImplementedError
    return c

In [None]:
# tests
def test_AE(a, b):
    try:
        c = length_of_hypotenuse(a, b)
        return max(a, b, c) < float("inf")
    except AssertionError:
        return True


assert length_of_hypotenuse(3, 4) == 5
assert test_AE(3, 4)
assert test_AE(3e300, 4)
assert test_AE(3e400, 4)
assert test_AE(3, 4e400)
assert test_AE(3e400, 4e400)

### Type Hinting and Validation

Instead of checking input arguments manually, you can use the third-party package [Pydantic](https://docs.pydantic.dev/latest/concepts/types/#custom-types) together with [`Annotated`](https://docs.python.org/3/library/typing.html#typing.Annotated) from the standard library module `typing`:

In [None]:
NonNegative = Annotated[float, Field(ge=0)]


@interact(a=Text("3"), b=Text("4"))
@validate_call(validate_return=True)
def length_of_hypotenuse(a: NonNegative, b: NonNegative) -> NonNegative:
    """
    Return the length of hypotenuse.
    """
    c = (a * a + b * b) ** (0.5)
    return c

Notice that the string inputs `a` and `b` are automatically converted to `float`.

In [None]:
length_of_hypotenuse("3", 4)

::::{note} How does the code work?

To understand the code, note that:

- The following line defines a custom type `NonNegative` for non-negative numbers using `Annotated` from `typing` and `Field` from `pydantic`:
  ```python
  NonNegative = Annotated[float, Field(ge=0)]
  ```
- The following line uses type hints to specify the expected types for the function’s input arguments and return value:
  ```python
  def length_of_hypotenuse(a: NonNegative, b: NonNegative) -> NonNegative
  ```
- The following line uses the decorator `validate_call` to validate the input arguments and, with `validate_return=True`, also the return value, so that the function raises a `ValidationError` if any of them is invalid:
  ```python
  @validate_call(validate_return=True)
  ```

::::

However, the above code does not raise any `ValidationError` if the input `a` or `b`, or the output length of the hypotenuse, overflows to `float("inf")`, because `float("inf") >= 0` holds:

In [None]:
length_of_hypotenuse(3e400, 4)

::::{exercise}
:label: ex:validation

To fix the above issue, define a custom type called `NonNegativeFinite` for non-negative finite numbers using `Annotated` from `typing` and `Field` from `pydantic`.

::::

In [None]:
# YOUR CODE HERE
raise NotImplementedError


@interact(a=Text("3"), b=Text("4"))
@validate_call(validate_return=True)
def length_of_hypotenuse(
    a: NonNegativeFinite, b: NonNegativeFinite
) -> NonNegativeFinite:
    """
    Return the length of hypotenuse.
    """
    c = (a * a + b * b) ** (0.5)
    return c

In [None]:
# test
def test_VE(a, b):
    try:
        c = length_of_hypotenuse(a, b)
        return max(a, b, c) < float("inf")
    except ValidationError:
        return True

assert test_VE(3, 4)
assert test_VE(3e300, 4)
assert test_VE(3e400, 4)
assert test_VE(3, 4e400)
assert test_VE(3e400, 4e400)
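As noted above, `Annotated` by itself only attaches metadata to a type. Plain Python does not enforce that metadata; a decorator such as `validate_call` is what reads and checks it. The following toy sketch illustrates the idea with the standard library only. `Ge`, `check_ge`, `NonNeg`, `plain`, and `checked` are hypothetical names, and this is not how Pydantic is implemented. For example, unlike Pydantic, it performs no type conversion, so a string argument such as `"3"` would raise a `TypeError`.

```python
import functools
import inspect
from typing import Annotated, get_args, get_origin, get_type_hints


class Ge:
    """Hypothetical metadata: the annotated value must be >= bound."""

    def __init__(self, bound):
        self.bound = bound


def check_ge(func):
    """Toy decorator (NOT pydantic): enforce Ge metadata on the arguments."""
    hints = get_type_hints(func, include_extras=True)  # keep Annotated metadata
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound_args = sig.bind(*args, **kwargs)
        for name, value in bound_args.arguments.items():
            hint = hints.get(name)
            if get_origin(hint) is Annotated:
                _base, *metadata = get_args(hint)
                for meta in metadata:
                    if isinstance(meta, Ge) and not value >= meta.bound:
                        raise ValueError(f"{name}={value!r} is not >= {meta.bound}")
        return func(*args, **kwargs)

    return wrapper


NonNeg = Annotated[float, Ge(0)]


def plain(a: NonNeg) -> NonNeg:
    return a  # the annotation alone checks nothing


@check_ge
def checked(a: NonNeg) -> NonNeg:
    return a


print(plain(-1))  # no error: metadata is ignored without a decorator
try:
    checked(-1)
except ValueError as e:
    print("ValueError:", e)
```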