### Deformance and paratext

The deformance algorithm that generates the paratexts described in the previous section on the definition of the first- and second-recension *dicta* was implemented as a 201-line Python program. The program reads the MGH e-text of the Friedberg edition and parses it to extract the *dicta*. The Oxford Concordance Program (OCP) format in which the e-text is encoded is extremely difficult to parse because it is not tree-structured: it has start tags for textual elements such as canons and *dicta*, cases and distinctions, but (unlike XML) no end tags.[@hockey_history_2004] The extraction engine therefore captures all text between a *dictum* start tag (`<T A>` or `<T P>`) and the start tag of the next element that can possibly follow a *dictum*:

```python
import re

# Read the whole e-text into memory; the with-block closes the file afterwards.
with open('edF.txt', 'r') as f:
    text = f.read()

# Raw strings (r'...') keep backslashes such as \d from being read as string escapes.
dicta = re.findall(
    # dictum starts with dictum ante or dictum post tag. The (?<=...) positive
    # lookbehind covers a start tag already consumed as the end of the previous match.
    r'(?:<T [AP]>|(?<=<T [AP]>))(.*?)'
    r'(?:'                  # non-capturing group.
        r'<1 [CD][CP]?>|'   # dictum ends with major division,
        r'<2 \d{1,3}>|'     # or number of major division,
        r'<3 \d{1,2}>|'     # or number of question,
        r'<4 \d{1,3}>|'     # or number of canon,
        r'<P 1>|'           # or Palea,
        r'<T [AIPRT]>'      # or inscription or text tag.
    r')', text, re.S)       # re.S (re.DOTALL) makes '.' match any character including newline.
print(dicta)
```
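The role of the lookbehind alternative is easiest to see on a small example. The sketch below runs the same expression, with and without the lookbehind, on an invented OCP-style fragment (the tag sequence and the text are assumptions made for illustration, not an excerpt from the e-text) in which a *dictum post* is followed immediately by a *dictum ante*. The names `END`, `WITH_LOOKBEHIND`, `WITHOUT_LOOKBEHIND`, and `sample` are hypothetical.

```python
import re

# Start tags of the elements that can follow a dictum (same alternatives as above).
END = r'(?:<1 [CD][CP]?>|<2 \d{1,3}>|<3 \d{1,2}>|<4 \d{1,3}>|<P 1>|<T [AIPRT]>)'
WITH_LOOKBEHIND = re.compile(r'(?:<T [AP]>|(?<=<T [AP]>))(.*?)' + END, re.S)
WITHOUT_LOOKBEHIND = re.compile(r'<T [AP]>(.*?)' + END, re.S)

# Invented fragment: a dictum post canonem directly followed by a dictum ante canonem.
sample = (
    '<1 D><2 54>'
    '<T A>Dictum ante canonem.\n'
    '<4 1><T I>Inscription.<T T>Canon text.'
    '<T P>Dictum post canonem.'
    '<T A>Another dictum ante.'
    '<4 2><T T>More canon text.'
)

found = WITH_LOOKBEHIND.findall(sample)
missed = WITHOUT_LOOKBEHIND.findall(sample)
print(found)   # three dicta
print(missed)  # only two: the second <T A> was consumed as an end tag
```

Because each match consumes the start tag that terminates it, a *dictum* whose own start tag served as the previous terminator can only be picked up by looking behind the current position.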

When run against the full e-text, the same expression finds all 1273 expected *dicta*; item 188 is D.54 d.p.c.23:

```python
import re
import sys

with open('edF.txt', 'r') as f:
    text = f.read()
toc = open('toc_all.txt', 'r')  # one key per dictum, read line by line in the next step.
dictionary_Fr = {} # Friedberg
dictionary_1r = {} # first recension
dictionary_2r = {} # second recension
dicta = re.findall(
    # dictum starts with dictum ante or dictum post tag; (?<=...) is a positive lookbehind assertion.
    r'(?:<T [AP]>|(?<=<T [AP]>))(.*?)'
    r'(?:'                  # non-capturing group.
        r'<1 [CD][CP]?>|'   # dictum ends with major division,
        r'<2 \d{1,3}>|'     # or number of major division,
        r'<3 \d{1,2}>|'     # or number of question,
        r'<4 \d{1,3}>|'     # or number of canon,
        r'<P 1>|'           # or Palea,
        r'<T [AIPRT]>'      # or inscription or text tag.
    r')', text, re.S)       # re.S (re.DOTALL) makes '.' match any character including newline.
print('expected 1273 dicta, found ' + str(len(dicta)) + ' dicta', file=sys.stderr)
print(dicta[188])
```

```
 -Gratian.+ Ecce, quomodo serui ad clericatum ualeant assumi,
uel quomodo non admittantur. Liberti quoque non sunt promouendi
ad clerum, nisi ab obsequiis sui patroni fuerint absoluti.
Unde in Concilio Eliberitano: -[c. 80.]+

expected 1273 dicta, found 1273 dicta
```


The extracted *dicta* require considerable scrubbing before they can be used. Here, for example, is what D.54 d.p.c.23 looks like in its raw state:

```python
[' -Gratian.+ Ecce, quomodo serui ad clericatum ualeant assumi,\n'
 'uel quomodo non admittantur. Liberti quoque non sunt promouendi\n'
 'ad clerum, nisi ab obsequiis sui patroni fuerint absoluti.\n'
 'Unde in Concilio Eliberitano: -[c. 80.]+\n']
```

Each *dictum* is then processed into an item (key-value pair) in a Python dictionary:

```python
# Continues from the extraction step: uses dicta, toc, and dictionary_Fr.
for dictum in dicta:
    dictum = re.sub(r'<S \d{1,4}><L 1> -\d{1,4}\+', '', dictum) # remove page and line number tags.
    dictum = re.sub(r'<P 1> -\[PALEA\.\+', '', dictum)    # remove Palea tags.
    dictum = re.sub(r'-.*?\+', '', dictum)  # remove -...+ spans that fit on one line (e.g. -Gratian.+).
    dictum = re.sub(r'-\[.*?\]\+', '', dictum, flags=re.S)  # remove -[...]+ spans that cross line breaks.
    dictum = re.sub(r'\s+', ' ', dictum)    # collapse runs of whitespace (including newlines) to one space.
    dictum = dictum.strip()                 # remove leading and trailing whitespace characters.
    key = toc.readline().rstrip()           # the keys are listed in the same order as the dicta.
    if key in dictionary_Fr:
        # if there's already a dictionary entry with this key, merge the entries
        # print('duplicate key: ' + key, file=sys.stderr)
        dictum = dictionary_Fr[key] + ' ' + dictum
    dictionary_Fr[key] = dictum
toc.close()
print(dictionary_Fr['D.54 d.p.c.23'])
```

```
Ecce, quomodo serui ad clericatum ualeant assumi, uel quomodo non admittantur. Liberti quoque non sunt promouendi ad clerum, nisi ab obsequiis sui patroni fuerint absoluti. Unde in Concilio Eliberitano:
```


```python
{'D.54 d.p.c.23': 'Ecce, quomodo serui ad clericatum ualeant assumi, uel quomodo non admittantur. Liberti quoque non sunt promouendi ad clerum, nisi ab obsequiis sui patroni fuerint absoluti. Unde in Concilio Eliberitano:'}
```
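Since the scrubbing step depends on the e-text and the table-of-contents file, it can be convenient to check it in isolation. The following is a minimal sketch that wraps the same substitutions in a function and applies them to the raw D.54 d.p.c.23 shown above; the function names `scrub` and `add_dictum` and the key `'example'` are hypothetical and do not come from the original program.

```python
import re

def scrub(raw: str) -> str:
    """Strip OCP markup from one extracted dictum and normalize its whitespace."""
    text = re.sub(r'<S \d{1,4}><L 1> -\d{1,4}\+', '', raw)   # page and line number tags
    text = re.sub(r'<P 1> -\[PALEA\.\+', '', text)            # Palea tags
    text = re.sub(r'-.*?\+', '', text)                        # -...+ spans on one line
    text = re.sub(r'-\[.*?\]\+', '', text, flags=re.S)        # -[...]+ spans across lines
    text = re.sub(r'\s+', ' ', text)                          # collapse whitespace
    return text.strip()

def add_dictum(table: dict, key: str, text: str) -> None:
    """Store a dictum under its key, merging with any earlier entry for that key."""
    table[key] = table[key] + ' ' + text if key in table else text

raw = (' -Gratian.+ Ecce, quomodo serui ad clericatum ualeant assumi,\n'
       'uel quomodo non admittantur. Liberti quoque non sunt promouendi\n'
       'ad clerum, nisi ab obsequiis sui patroni fuerint absoluti.\n'
       'Unde in Concilio Eliberitano: -[c. 80.]+\n')

table = {}
add_dictum(table, 'D.54 d.p.c.23', scrub(raw))
# Invented key and text, only to show how entries with a duplicate key are merged.
add_dictum(table, 'example', 'first part')
add_dictum(table, 'example', 'second part')
print(table['D.54 d.p.c.23'])
print(table['example'])
```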

The first recension variants from the Friedberg edition recorded in Winroth's appendix are then encoded as a list of dictionaries in which the `'pattern'` item is the variant represented as a Python regular expression:

```python
[{'key': 'D.54 d.p.c.23', 'pattern': r'(Ecce, quomodo serui.*?quomodo non admittantur\.)'}]
```

Finally, the deformance engine uses the variants encoded as regular expression patterns to generate the first and second paratexts corresponding to the first- and second-recension *dicta*. For each *dictum*, the text matching the pattern is inserted into a dictionary representing the first recension paratext; the text that remains when the match is replaced by the empty string `''` is inserted into a dictionary representing the second recension paratext:

```python
import re

dictionary_1r = {} # first recension paratext
dictionary_2r = {} # second recension paratext
dictionary_Fr = {'D.54 d.p.c.23': 'Ecce, quomodo serui ad clericatum ualeant assumi, uel quomodo non admittantur. Liberti quoque non sunt promouendi ad clerum, nisi ab obsequiis sui patroni fuerint absoluti. Unde in Concilio Eliberitano:'}
keysandpatterns = [{'key': 'D.54 d.p.c.23', 'pattern': r'(Ecce, quomodo serui.*?quomodo non admittantur\.)'}]
for item in keysandpatterns:
    key = item['key']
    pattern = item['pattern']
    result = re.search(pattern, dictionary_Fr[key])
    dictionary_1r[key] = result.group(1)    # the text matching the pattern.
    # the text left once the match is removed; strip() drops the orphaned leading space.
    dictionary_2r[key] = re.sub(pattern, '', dictionary_Fr[key]).strip()
print(dictionary_1r)
print(dictionary_2r)
```

```
{'D.54 d.p.c.23': 'Ecce, quomodo serui ad clericatum ualeant assumi, uel quomodo non admittantur.'}
{'D.54 d.p.c.23': 'Liberti quoque non sunt promouendi ad clerum, nisi ab obsequiis sui patroni fuerint absoluti. Unde in Concilio Eliberitano:'}
```


Here is the resulting first recension paratext:

```python
{'D.54 d.p.c.23': 'Ecce, quomodo serui ad clericatum ualeant assumi, uel quomodo non admittantur.'}
```

and the corresponding second recension paratext:

```python
{'D.54 d.p.c.23': 'Liberti quoque non sunt promouendi ad clerum, nisi ab obsequiis sui patroni fuerint absoluti. Unde in Concilio Eliberitano:'}
```
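One case the example does not exercise is a pattern that fails to match its *dictum*, which would make `re.search` return `None`. A minimal sketch of a helper that reports such a case explicitly is given below; the function name `deform` and the choice to raise `ValueError` are assumptions made for illustration, not part of the original program.

```python
import re

def deform(text: str, pattern: str) -> tuple[str, str]:
    """Split one dictum into (first recension text, second recension text)."""
    match = re.search(pattern, text)
    if match is None:
        # Assumed policy: stop rather than silently produce an empty paratext.
        raise ValueError('pattern does not match dictum: ' + pattern)
    first = match.group(1)
    second = re.sub(pattern, '', text).strip()  # remainder, without stray outer spaces
    return first, second

friedberg = ('Ecce, quomodo serui ad clericatum ualeant assumi, uel quomodo non '
             'admittantur. Liberti quoque non sunt promouendi ad clerum, nisi ab '
             'obsequiis sui patroni fuerint absoluti. Unde in Concilio Eliberitano:')
variant = r'(Ecce, quomodo serui.*?quomodo non admittantur\.)'

first, second = deform(friedberg, variant)
print(first)
print(second)
```

For this *dictum* the two paratexts, joined by a single space, reproduce the Friedberg text, which offers a simple consistency check on an encoded variant.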