$\newcommand{\To}{\Rightarrow}$
$\newcommand{\false}{\mathrm{false}}$

In [1]:
import os, sys
sys.path.append(os.path.split(os.getcwd())[0])

In [2]:
from kernel.type import boolT
from kernel.term import Term, Var
from kernel.proof import Proof
from kernel.report import ProofReport
from logic import basic
from logic import matcher
from logic.logic import conj
from data.nat import natT, one
from logic.proofterm import ProofTerm
from syntax import printer

thy = basic.load_theory('nat')

## Proofs and proof-checking

In the previous two sections, we showed how to prove theorems about equality and propositional logic. Let us review what we have done so far.

For each statement to be proved, we first wrote down the proof step by step in a semi-formal language, where each step applies either a primitive deduction rule or a composite rule (such as `apply_theorem`). The semi-formal proof is then checked by converting it into Python code. In the Python code, we are careful to construct theorems using only a limited set of functions: `thy.get_theorem` or one of the primitive deduction rules (`Thm.assume`, `Thm.implies_intr`, etc.). For each composite rule, we defined a corresponding Python function, which again constructs theorems using only this limited set of functions.

In this way, as long as we call only the limited set of functions, either directly or indirectly through custom procedures, and as long as those functions correctly implement the primitive deduction rules of higher-order logic, we can trust that our proof is correct.

While this already substantially increases our confidence in the proofs, we can still do better. First, it is difficult to ensure that the theorem objects are constructed using only the limited set of functions. This gets more difficult as the size of the code increases, and when multiple teams are collaborating on a project. Second, we cannot completely trust our implementation of the primitive deduction rules. Ideally, we should allow programs written by others to check our proof. Both of these problems can be solved by storing a trace of the proof, so it can be checked later, either using the same program or using other programs.

There are two kinds of traces that we will consider, which we call *linear proofs* and *proof terms*.

A linear proof (class `Proof`) is a list of proof items (class `ProofItem`). Each proof item contains an identifier, the deduction rule used, the arguments to the deduction rule, and the input sequents (referred to by the identifiers of the corresponding proof items).
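
To make the shape of such a record concrete, here is a minimal standalone sketch. It is not holpy's actual `ProofItem` class: the names `ToyProofItem` and `is_well_formed`, and the choice of fields, are hypothetical and serve only to illustrate the description above.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class ToyProofItem:
    """Hypothetical stand-in for a proof item (not the holpy class)."""
    id: int                 # identifier of this step
    rule: str               # name of the deduction rule used
    args: Any = None        # arguments to the deduction rule
    prevs: List[int] = field(default_factory=list)  # ids of the input sequents


def is_well_formed(items):
    """Return True if every step refers only to steps that appear before it."""
    seen = set()
    for item in items:
        if any(p not in seen for p in item.prevs):
            return False
        seen.add(item.id)
    return True


# The two-step proof of A -> A, with formulas written as plain strings.
toy_prf = [
    ToyProofItem(0, "assume", args="A"),
    ToyProofItem(1, "implies_intr", args="A", prevs=[0]),
]
print(is_well_formed(toy_prf))
```

The well-formedness check only inspects the references between steps. Whether each rule is applied correctly is a separate question, which is what proof checking answers.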

Let us consider a simple example, the proof of $A \to A$. Recall that the proof is as follows:

0. $A \vdash A$ by assume A.
1. $\vdash A \to A$ by implies_intr A from 0.

The corresponding linear proof is:

In [3]:
A = Var("A", boolT)
prf = Proof()
prf.add_item(0, "assume", args=A)
prf.add_item(1, "implies_intr", args=A, prevs=[0])
print(printer.print_proof(thy, prf))

0: assume A
1: implies_intr A from 0


Note the use of the `print_proof` function to print the proof. The proof can be *checked* using the `check_proof` method of the theory, which takes a proof as input and checks it in the context of the current theory. If the check succeeds, it returns the theorem obtained by the proof.

In [4]:
res = thy.check_proof(prf)
print(printer.print_thm(thy, res, unicode=True))

⊢ A ⟶ A


Checking the proof also records the sequent obtained at each line of the proof. These will be displayed the next time the proof is printed:

In [5]:
print(printer.print_proof(thy, prf, unicode=True))

0: A ⊢ A by assume A
1: ⊢ A ⟶ A by implies_intr A from 0


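Proof-checking will uncover any mistakes in the proof, including the application of proof rules to invalid inputs. For example, the following proof attempts `implies_elim` on $A \to B$ and $C$, where the second input does not match the antecedent $A$: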

In [6]:
B = Var("B", boolT)
C = Var("C", boolT)
prf2 = Proof()
prf2.add_item(0, "assume", args=Term.mk_implies(A,B))
prf2.add_item(1, "assume", args=C)
prf2.add_item(2, "implies_elim", prevs=[0,1])
thy.check_proof(prf2)  # raises CheckProofException

CheckProofException: invalid derivation
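
The idea behind such a check can be sketched in a few lines of standalone Python. This is a toy model under stated assumptions, not holpy's checker: formulas are plain strings (atoms) or tuples `("imp", a, b)`, a sequent is a pair of a hypothesis set and a conclusion, and the names `imp`, `toy_check` and `ToyCheckError` are hypothetical. Only three rules are modelled.

```python
def imp(a, b):
    """Build the toy representation of the implication a -> b."""
    return ("imp", a, b)


class ToyCheckError(Exception):
    """Raised when a step of the toy proof is not a valid rule application."""


def toy_check(steps):
    """Replay steps of the form (id, rule, arg, prevs); return the last sequent."""
    seqs = {}       # id -> (hypotheses, conclusion)
    seq = None
    for step_id, rule, arg, prevs in steps:
        if any(p not in seqs for p in prevs):
            raise ToyCheckError("step %s refers to an unknown step" % step_id)
        ins = [seqs[p] for p in prevs]
        if rule == "assume" and len(ins) == 0:
            seq = (frozenset([arg]), arg)
        elif rule == "implies_intr" and len(ins) == 1:
            hyps, concl = ins[0]
            seq = (hyps - {arg}, imp(arg, concl))
        elif rule == "implies_elim" and len(ins) == 2:
            (hyps1, c1), (hyps2, c2) = ins
            # The first input must be an implication whose antecedent is c2.
            if not (isinstance(c1, tuple) and c1[0] == "imp" and c1[1] == c2):
                raise ToyCheckError("invalid derivation at step %s" % step_id)
            seq = (hyps1 | hyps2, c1[2])
        else:
            raise ToyCheckError("unsupported rule at step %s" % step_id)
        seqs[step_id] = seq
    return seq


good = [(0, "assume", "A", []), (1, "implies_intr", "A", [0])]
bad = [
    (0, "assume", imp("A", "B"), []),
    (1, "assume", "C", []),
    (2, "implies_elim", None, [0, 1]),
]
print(toy_check(good))
try:
    toy_check(bad)
except ToyCheckError as e:
    print("rejected:", e)
```

The checker never trusts a claimed result. It recomputes each sequent from the inputs, so an invalid step such as the `implies_elim` above is rejected at the point where it occurs.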

Proof checking takes an optional *proof report*, which records statistics from the proof. Continuing with the first example, for which proof-checking succeeds, we have:

In [7]:
rpt = ProofReport()
thy.check_proof(prf, rpt=rpt)
print(rpt)

Steps: 2
  Theorems:  0
  Primitive: 2
  Macro:     0
Theorems applied: 
Macros evaluated: 
Macros expanded: 
Gaps: []


The report says the proof consists of 2 primitive steps. Otherwise it is not very interesting. We will see more features of the proof report later using more complicated examples.

## Proof terms

While linear proofs are intuitive and can be printed in an easily readable form, they are difficult to generate automatically. When generating proofs, we prefer a pattern where each proved result is an object, and new results are produced by combining existing proved results in any order. The resulting proof resembles a tree: the root is the final result of the proof, each node is an intermediate statement, and the edges record the dependencies between intermediate statements. Such trees correspond to `ProofTerm` objects in Python.

The proof term for the theorem $A \to A$ is constructed as follows:

In [8]:
pt0 = ProofTerm.assume(A)
pt1 = ProofTerm.implies_intr(A, pt0)
print(printer.print_thm(thy, pt1.th, unicode=True))

⊢ A ⟶ A


Note the similarity with constructing theorems in previous sections. In general, a proof term can be considered as a theorem with extra information: the full history of how the theorem was derived. Every proof term has a field `th`, which holds the theorem obtained by the proof.

Any proof term can be converted to a linear proof using the `export` method. The resulting linear proof can then be checked or displayed.

In [9]:
prf = pt1.export()
thy.check_proof(prf)
print(printer.print_proof(thy, prf, unicode=True))

0: A ⊢ A by assume A
1: ⊢ A ⟶ A by implies_intr A from 0
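
One way to picture the conversion from a tree to a linear proof is a post-order traversal that numbers each node after its dependencies. The sketch below is a standalone toy, not holpy's `export`: the names `ToyProofTerm` and `toy_export` are hypothetical, and identifying shared nodes by object identity is an assumption of this sketch.

```python
class ToyProofTerm:
    """Hypothetical tree node: a rule, its argument, and the sub-proofs it uses."""

    def __init__(self, rule, arg=None, prevs=()):
        self.rule = rule
        self.arg = arg
        self.prevs = tuple(prevs)


def toy_export(root):
    """Flatten the tree into steps (id, rule, arg, prev_ids) in dependency order."""
    steps = []
    numbering = {}  # id(node) -> step number, so a shared node is emitted once

    def visit(node):
        if id(node) in numbering:
            return numbering[id(node)]
        prev_ids = [visit(p) for p in node.prevs]  # dependencies come first
        step_id = len(steps)
        steps.append((step_id, node.rule, node.arg, prev_ids))
        numbering[id(node)] = step_id
        return step_id

    visit(root)
    return steps


# The proof of A -> A as a two-node tree.
pt0 = ToyProofTerm("assume", "A")
pt1 = ToyProofTerm("implies_intr", "A", [pt0])
for step in toy_export(pt1):
    print(step)

# A node used twice appears only once in the output.
# The rule names below are placeholders, not real deduction rules.
shared = ToyProofTerm("assume", "P")
left = ToyProofTerm("rule1", prevs=[shared])
right = ToyProofTerm("rule2", prevs=[shared])
root = ToyProofTerm("rule3", prevs=[left, right])
print(len(toy_export(root)))
```

Because each node is numbered only after all of its inputs, every step in the output refers only to earlier steps, which is the form a linear proof requires.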


Existing theorems can be invoked using the `ProofTerm.theorem` function. We give an example that substitutes for the variables of an identity:

In [10]:
a = Var("a", natT)
b = Var("b", natT)
pt0 = ProofTerm.theorem(thy, 'add_assoc')
pt1 = ProofTerm.substitution({"x": a, "y": b, "z": one}, pt0)
print(printer.print_thm(thy, pt1.th, unicode=True))

⊢ a + b + 1 = a + (b + 1)


Again, the proof can be checked and printed as follows:

In [11]:
prf = pt1.export()
thy.check_proof(prf)
print(printer.print_proof(thy, prf, unicode=True))

0: ⊢ x + y + z = x + (y + z) by theorem add_assoc
1: ⊢ a + b + 1 = a + (b + 1) by substitution {x: a, y: b, z: 1} from 0


We can also see the report from proof checking:

In [12]:
rpt = ProofReport()
thy.check_proof(prf, rpt)
print(rpt)

Steps: 2
  Theorems:  1
  Primitive: 1
  Macro:     0
Theorems applied: add_assoc
Macros evaluated: 
Macros expanded: 
Gaps: []


This report is slightly more interesting. It states that the proof consists of one invocation of an existing theorem and one primitive step (`substitution`). The only theorem applied in the proof is `add_assoc`.
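
The counts in such a report can be reproduced by a simple tally over the steps of a linear proof. The following standalone sketch is an illustration only: `toy_report` and the tuple layout `(id, rule, arg, prevs)` are hypothetical, macros are not modelled, and every step that is not a `theorem` step is counted as primitive.

```python
def toy_report(steps):
    """Tally theorem invocations and primitive steps in a toy linear proof."""
    report = {"steps": 0, "theorems": 0, "primitive": 0, "theorems_applied": []}
    for _step_id, rule, arg, _prevs in steps:
        report["steps"] += 1
        if rule == "theorem":
            report["theorems"] += 1
            if arg not in report["theorems_applied"]:  # list each name once
                report["theorems_applied"].append(arg)
        else:
            report["primitive"] += 1
    return report


# The two-step proof above: invoke add_assoc, then substitute.
toy_steps = [
    (0, "theorem", "add_assoc", []),
    (1, "substitution", {"x": "a", "y": "b", "z": "1"}, [0]),
]
print(toy_report(toy_steps))
```

For this input the tally agrees with the report printed above: two steps, of which one is a theorem invocation and one is primitive.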

## Functions producing proof terms

Just as for theorems, we can write our own functions for producing proof terms. For example, we can write a new `apply_theorem` function, this time operating on proof terms:

In [13]:
def apply_theorem(thy, th_name, *args, instsp=None):
    """Apply the theorem th_name to the proof terms args, matching each
    assumption of the theorem against the corresponding argument."""
    pt = ProofTerm.theorem(thy, th_name)
    As, _ = pt.prop.strip_implies()  # list of assumptions of the theorem
    if instsp is None:
        instsp = dict(), dict()      # initial (empty) type and term instantiations
    for A, arg in zip(As, args):     # match each assumption with the corresponding arg
        matcher.first_order_match_incr(A, arg.prop, instsp)
    tyinst, inst = instsp
    pt2 = ProofTerm.subst_type(tyinst, pt) if tyinst else pt  # substitute only if non-empty
    pt3 = ProofTerm.substitution(inst, pt2) if inst else pt2
    for arg in args:                 # discharge each assumption with implies_elim
        pt3 = ProofTerm.implies_elim(pt3, arg)
    return pt3

A slight difference from the earlier version is that we perform each substitution only if the corresponding dictionary is non-empty. This avoids a primitive step in many cases.
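
The matching loop is the heart of `apply_theorem`: each assumption of the theorem is matched against the corresponding argument, and the instantiation accumulates across calls. The standalone sketch below illustrates incremental first-order matching on a toy term representation. It is not holpy's `matcher.first_order_match_incr`: the names `toy_match_incr` and `ToyMatchError` are hypothetical, type instantiation is ignored, and in a pattern every string is assumed to be a variable while tuples `(head, arg1, ...)` are applications.

```python
class ToyMatchError(Exception):
    """Raised when the pattern cannot be matched against the term."""


def toy_match_incr(pat, t, inst):
    """Extend inst in place so that pat, instantiated by inst, equals t."""
    if isinstance(pat, str):                 # pattern variable
        if pat not in inst:
            inst[pat] = t                    # first occurrence: bind it
        elif inst[pat] != t:                 # later occurrence: must agree
            raise ToyMatchError("%r is already bound to %r" % (pat, inst[pat]))
    elif isinstance(t, tuple) and len(t) == len(pat) and t[0] == pat[0]:
        for p_sub, t_sub in zip(pat[1:], t[1:]):
            toy_match_incr(p_sub, t_sub, inst)
    else:
        raise ToyMatchError("cannot match %r against %r" % (pat, t))


# Two assumptions A and B matched against the propositions B and A in turn.
inst = {}
for assumption, prop in zip(["A", "B"], ["B", "A"]):
    toy_match_incr(assumption, prop, inst)
print(inst)  # {'A': 'B', 'B': 'A'}

# A compound assumption A /\ B matched against P /\ (Q -> R).
inst2 = {}
toy_match_incr(("conj", "A", "B"), ("conj", "P", ("imp", "Q", "R")), inst2)
print(inst2)
```

Sharing one dictionary across all assumptions is what makes the matching incremental: a variable bound while matching the first assumption constrains every later one. Note that this toy version may leave partial bindings in `inst` when it fails.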

We first test this function on a simple example:

In [14]:
ptA = ProofTerm.assume(A)
ptB = ProofTerm.assume(B)
ptAB = apply_theorem(thy, 'conjI', ptA, ptB)
prf = ptAB.export()
thy.check_proof(prf)
print(printer.print_proof(thy, prf, unicode=True))

0: ⊢ A ⟶ B ⟶ A ∧ B by theorem conjI
1: ⊢ A ⟶ B ⟶ A ∧ B by substitution {A: A, B: B} from 0
2: A ⊢ A by assume A
3: A ⊢ B ⟶ A ∧ B by implies_elim from 1, 2
4: B ⊢ B by assume B
5: A, B ⊢ A ∧ B by implies_elim from 3, 4


The final theorem is as expected. We can now reproduce the full proof of $A \wedge B \to B \wedge A$:

In [15]:
pt0 = ProofTerm.assume(conj(A, B))
pt1 = apply_theorem(thy, 'conjD1', pt0)
pt2 = apply_theorem(thy, 'conjD2', pt0)
pt3 = apply_theorem(thy, 'conjI', pt2, pt1)
pt4 = ProofTerm.implies_intr(conj(A, B), pt3)
prf = pt4.export()
thy.check_proof(prf)
print(printer.print_proof(thy, prf, unicode=True))

0: ⊢ A ⟶ B ⟶ A ∧ B by theorem conjI
1: ⊢ B ⟶ A ⟶ B ∧ A by substitution {A: B, B: A} from 0
2: ⊢ A ∧ B ⟶ B by theorem conjD2
3: ⊢ A ∧ B ⟶ B by substitution {A: A, B: B} from 2
4: A ∧ B ⊢ A ∧ B by assume A ∧ B
5: A ∧ B ⊢ B by implies_elim from 3, 4
6: A ∧ B ⊢ A ⟶ B ∧ A by implies_elim from 1, 5
7: ⊢ A ∧ B ⟶ A by theorem conjD1
8: ⊢ A ∧ B ⟶ A by substitution {A: A, B: B} from 7
9: A ∧ B ⊢ A by implies_elim from 8, 4
10: A ∧ B ⊢ B ∧ A by implies_elim from 6, 9
11: ⊢ A ∧ B ⟶ B ∧ A by implies_intr A ∧ B from 10


We can also view the report from checking the proof:

In [16]:
rpt = ProofReport()
thy.check_proof(prf, rpt)
print(rpt)

Steps: 12
  Theorems:  3
  Primitive: 9
  Macro:     0
Theorems applied: conjI, conjD1, conjD2
Macros evaluated: 
Macros expanded: 
Gaps: []


This tells us that there are 3 invocations of theorems. The theorems invoked are `conjD1`, `conjD2` and `conjI`. In addition, there are 9 applications of primitive deduction rules.

In this section, we studied how to construct both linear proofs and proof terms. These allow us to produce traces of proofs that can be checked independently, including by third-party tools. There is, however, one major problem remaining with this approach: all proofs constructed so far consist only of primitive deduction rules and invocations of existing theorems. As we move to more complicated examples, these proofs can get very long, especially as we rely more and more on automatic procedures for producing proofs. It would be useful to condense the proofs produced by automatic procedures, for example by reducing the multiple steps produced by an invocation of `apply_theorem` to just one step. This leads to the important concept of *macros*, which we will begin to study in the next section.