A few months ago, I inherited a code base devoid of any tests. I realised quite
quickly there might be a better design for the entire project, and wanted to get
there cautiously via continuous refactoring. 

Without the safety net of a test suite, it is hard to guarantee that refactored
code behaves the same as the old, battle-tested code. Refactoring a legacy project
therefore begins with writing a test suite that covers most (or all) execution
paths. For my project I did this by hand, but that laborious process piqued my
interest in automated test generation techniques.

I recently came across [this](https://www.fuzzingbook.org/beta/html/PrototypingWithPython.html)
fascinating piece by Andreas Zeller on automated testing techniques in Python.
Zeller prototypes a symbolic test generator built around the `z3` SMT solver. His
prototype is impressive but quite limited: it does not handle loops, function
calls, or impure functions (those that read or modify state outside their arguments).

In this post I extend his approach with concrete values observed during earlier
executions, resulting in a simple concolic test generator.
<!-- TEASER_END -->

Zeller starts from the well-known _triangle_ function, which classifies a
triangle with integer sides `a`, `b`, and `c` as equilateral, isosceles, or scalene.

In [29]:
def triangle(a: int, b: int, c: int) -> str:
    if a == b:
        if b == c:
            return 'equilateral'
        else:
            return 'isosceles #1'
    else:
        if b == c:
            return 'isosceles #2'
        else:
            if a == c:
                return 'isosceles #3'
            else:
                return 'scalene'

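He then develops a simple symbolic test generator that systematically creates
test inputs covering all possible execution paths in the triangle function.
The cells below retrace his steps: trace one concrete execution, collect the
branch conditions from the function's AST, combine them into one condition per
path, and ask `z3` for inputs that satisfy each path condition.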

In [2]:
import sys
import inspect

In [21]:
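def traceit(frame, event, arg):
    f_code = frame.f_code
    f_name = f_code.co_name
    lineno = frame.f_lineno
    local_vars = frame.f_locals

    # Look up the source line that is currently being executed.
    s_lines, start_line_num = inspect.getsourcelines(f_code)
    loc = f"{f_name}:{lineno} {s_lines[lineno - start_line_num].rstrip()}"
    # Render the local variables as "name = value" pairs.
    var_text = ", ".join(f"{name} = {value}" for name, value in local_vars.items())

    print(f"{loc:60} ({var_text})")

    # Returning the tracer keeps line-level tracing active in this frame.
    return traceit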

In [22]:
def triangle_traced():
    sys.settrace(traceit)
    triangle(2, 2, 1)
    sys.settrace(None)

In [23]:
triangle_traced()

triangle:2 def triangle(a: int, b: int, c: int) -> str:      (a = 2, b = 2, c = 1)
triangle:3     if a == b:                                    (a = 2, b = 2, c = 1)
triangle:4         if b == c:                                (a = 2, b = 2, c = 1)
triangle:7             return 'isosceles #1'                 (a = 2, b = 2, c = 1)
triangle:7             return 'isosceles #1'                 (a = 2, b = 2, c = 1)
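
A concolic generator needs this trace as data rather than as printed text. The following is a minimal sketch of that idea and not part of Zeller's prototype: `record_trace` and `executed` are hypothetical names, and line numbers are counted relative to the `def` line of the traced function instead of the notebook cell.

```python
import sys


def triangle(a: int, b: int, c: int) -> str:
    if a == b:
        if b == c:
            return 'equilateral'
        else:
            return 'isosceles #1'
    else:
        if b == c:
            return 'isosceles #2'
        else:
            if a == c:
                return 'isosceles #3'
            else:
                return 'scalene'


def record_trace(func, *args):
    """Run func(*args); return its result and a list of (event, relative line)."""
    executed = []
    code = func.__code__

    def tracer(frame, event, arg):
        # Ignore frames that belong to any other function.
        if frame.f_code is not code:
            return None
        # Relative line 1 is the `def` line of the traced function.
        executed.append((event, frame.f_lineno - code.co_firstlineno + 1))
        return tracer

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        # Always switch tracing off again, even if func raises.
        sys.settrace(None)
    return result, executed


result, executed = record_trace(triangle, 2, 2, 1)
```

For `triangle(2, 2, 1)` the recorded list has the same five steps as the printed trace: one `call` event, three `line` events, and one `return` event on the final `return` line.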


In [6]:
import ast
import astor

In [10]:
triangle_source = inspect.getsource(triangle)
triangle_ast = ast.parse(triangle_source)

In [11]:
def collect_conditions(tree):
    conditions = []

    def traverse(node):
        if isinstance(node, ast.If):
            cond = astor.to_source(node.test).strip()
            conditions.append(cond)

        for child in ast.iter_child_nodes(node):
            traverse(child)

    traverse(tree)
    return conditions

In [12]:
collect_conditions(triangle_ast)

['(a == b)', '(b == c)', '(b == c)', '(a == c)']
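
On Python 3.9 and later, the standard library's `ast.unparse` can stand in for `astor.to_source`. The sketch below is an alternative under that assumption, not what the original notebook does; `ConditionCollector` and `TRIANGLE_SOURCE` are names introduced here for illustration.

```python
import ast

# The triangle function as a string, so the example does not depend on inspect.
TRIANGLE_SOURCE = '''
def triangle(a, b, c):
    if a == b:
        if b == c:
            return 'equilateral'
        else:
            return 'isosceles #1'
    else:
        if b == c:
            return 'isosceles #2'
        else:
            if a == c:
                return 'isosceles #3'
            else:
                return 'scalene'
'''


class ConditionCollector(ast.NodeVisitor):
    """Collect the test expression of every `if`, depth first."""

    def __init__(self):
        self.conditions = []

    def visit_If(self, node):
        self.conditions.append(ast.unparse(node.test))
        # Keep descending so that nested ifs are collected as well.
        self.generic_visit(node)


collector = ConditionCollector()
collector.visit(ast.parse(TRIANGLE_SOURCE))
conditions = collector.conditions
```

Unlike `astor`, `ast.unparse` does not wrap the comparison in parentheses, so `conditions` holds strings such as `a == b`. Any later step that concatenates `"z3.Not"` with a condition would then have to add the parentheses itself.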

In [13]:
def collect_path_conditions(tree):
    paths = []

    def traverse_if_children(children, context, cond):
        old_paths = len(paths)

        for child in children:
            traverse(child, context + [cond])

        if len(paths) == old_paths:
            paths.append(context + [cond])

    def traverse(node, context):
        if isinstance(node, ast.If):
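            cond = astor.to_source(node.test).strip()
            # astor wraps the comparison in parentheses, e.g. "(a == b)",
            # so plain concatenation yields "z3.Not(a == b)".
            not_cond = "z3.Not" + cond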

            traverse_if_children(node.body, context, cond)
            traverse_if_children(node.orelse, context, not_cond)
        else:
            for child in ast.iter_child_nodes(node):
                traverse(child, context)

    traverse(tree, [])

    return ["z3.And(" + ", ".join(path) + ")" for path in paths]

In [14]:
path_conditions = collect_path_conditions(triangle_ast)
path_conditions

['z3.And((a == b), (b == c))',
 'z3.And((a == b), z3.Not(b == c))',
 'z3.And(z3.Not(a == b), (b == c))',
 'z3.And(z3.Not(a == b), z3.Not(b == c), (a == c))',
 'z3.And(z3.Not(a == b), z3.Not(b == c), z3.Not(a == c))']
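
Before handing these strings to a solver, it can be reassuring to check that they really partition the input space, meaning every input satisfies exactly one path condition. The sketch below does this by brute force over small side lengths. It assumes a stand-in object, here called `bool_z3`, that mimics `z3.And` and `z3.Not` on plain Python booleans; it is not the real `z3` API, and `matching_paths` and `witnesses` are hypothetical names.

```python
from itertools import product
from types import SimpleNamespace

# Path conditions copied from the output above.
path_conditions = [
    'z3.And((a == b), (b == c))',
    'z3.And((a == b), z3.Not(b == c))',
    'z3.And(z3.Not(a == b), (b == c))',
    'z3.And(z3.Not(a == b), z3.Not(b == c), (a == c))',
    'z3.And(z3.Not(a == b), z3.Not(b == c), z3.Not(a == c))',
]

# Stand-in for z3.And / z3.Not on plain booleans; NOT the real z3 API.
bool_z3 = SimpleNamespace(
    And=lambda *args: all(args),
    Not=lambda arg: not arg,
)


def matching_paths(a, b, c):
    """Return the indices of all path conditions that hold for (a, b, c)."""
    env = {"z3": bool_z3, "a": a, "b": b, "c": c}
    return [i for i, cond in enumerate(path_conditions) if eval(cond, env)]


witnesses = {}
for sides in product(range(1, 4), repeat=3):
    matches = matching_paths(*sides)
    # Every input must follow exactly one path.
    assert len(matches) == 1, (sides, matches)
    # Remember the first input seen for each path.
    witnesses.setdefault(matches[0], sides)
```

Enumeration only scales to tiny domains, which is exactly why the next step uses an SMT solver, but it makes a cheap cross-check for the generated conditions.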

In [15]:
import z3

In [16]:
a = z3.Int('a')
b = z3.Int('b')
c = z3.Int('c')

In [28]:
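for path_condition in path_conditions:
    s = z3.Solver()
    s.add(a > 0, b > 0, c > 0)
    # The path condition is a string, so splice it into the check() call.
    result = eval(f"s.check({path_condition})")
    if result != z3.sat:
        continue  # infeasible path: there is no model to read
    m = s.model()
    print(m, triangle(m[a].as_long(), m[b].as_long(), m[c].as_long()))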

[a = 1, b = 1, c = 1] equilateral
[b = 1, a = 1, c = 2] isosceles #1
[b = 2, a = 1, c = 2] isosceles #2
[b = 1, a = 2, c = 2] isosceles #3
[b = 2, a = 1, c = 3] scalene
