# Reducing Words to Alphabetical Order (Following Ahl 1988)

Ahl, writing in [*On Puns*](https://ecommons.cornell.edu/handle/1813/3659) (Culler 1988), comments on anagrammatic possibility and "randomness": "I do not know of any systematic attempt to determine the frequency of their occurrence in Latin literature, but a computer program which reduces words to alphabetical order would help." Below is such a computer program. It is limited in scope to Ovid's *Metamorphoses* (I just finished rereading *Metaformations* and couldn't resist), but it is easily extensible to other texts. More on [Twitter](#), including [this related post](https://twitter.com/diyclassics/status/956607514648424448). [PJB 4.7.2020]

In [1]:
# Imports

import re
import html

from collections import Counter
from functools import reduce
from natsort import natsorted

# Legacy (pre-1.0) CLTK API
from cltk.corpus.readers import get_corpus_reader
from cltk.stem.latin.j_v import JVReplacer
from cltk.tokenize.line import LineTokenizer

In [2]:
# Setup CLTK tools

line_tokenizer = LineTokenizer('latin')
replacer = JVReplacer()

In [3]:
# Preprocess texts

def preprocess(text):

    remove_list = [
                   r'\bMetamorposes\b',
                   r'Ovid: Metamorphoses .+',
                   r'P. OVIDI NASONIS METAMORPHOSEN LIBER .+',                  
                  ]
    
    for pattern in remove_list:
        text = re.sub(pattern, '', text)
    
    text = html.unescape(text) # Handle html entities
    text = re.sub(r'&nbsp;?', ' ',text) #&nbsp; stripped incorrectly in corpus?
    text = re.sub(r'\x00',' ',text) #Another space problem?
    
    # Fix partial lines
    text = re.sub(r'\d+(a|b)', ' ', text)
        
    text = text.lower()
    text = replacer.replace(text) #Normalize u/v & i/j    
    
    punctuation = "\"#$%&'()*+,-/:;<=>@[\\]^_`{|}~.?!«»—"
    translator = str.maketrans({key: " " for key in punctuation})
    text = text.translate(translator)
    
    translator = str.maketrans({key: " " for key in '0123456789'})
    text = text.translate(translator)
    
    text = re.sub(r'[ ]+', ' ', text) # Remove double spaces
    text = re.sub(r'\s+\n+\s+', '\n', text) # Remove blank lines and trim spaces around newlines
    
    return text.strip()
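The `preprocess` cell depends on CLTK's `JVReplacer` and on regexes tuned to the Latin Library pages. For experimenting without CLTK installed, here is a minimal stdlib-only sketch of the core normalization steps. It assumes that `JVReplacer` amounts to mapping j→i and v→u, and `normalize_latin` is a hypothetical name, not part of the notebook.

```python
import re
import string

def normalize_latin(text: str) -> str:
    """Approximate preprocess() without CLTK (hypothetical helper)."""
    text = text.lower()
    # Assumption: u/v and i/j normalization reduces to v -> u and j -> i
    text = text.translate(str.maketrans("jv", "iu"))
    # Turn ASCII punctuation, digits, and a few extra marks into spaces
    strip_chars = string.punctuation + string.digits + "«»\u2014"
    text = text.translate(str.maketrans({ch: " " for ch in strip_chars}))
    # Collapse runs of spaces/tabs, then trim whitespace around newlines
    text = re.sub(r"[ \t]+", " ", text)
    text = re.sub(r"\s*\n\s*", "\n", text)
    return text.strip()

sample = "In nova fert animus mutatas dicere formas\ncorpora; di, coeptis (nam vos mutastis et illas)"
print(normalize_latin(sample))
```

Unlike `preprocess`, this sketch does not remove the page headers, and it turns `123a`-style partial-line markers into a stray `a` or `b` rather than deleting them.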

In [4]:
# Set up corpus

ll = get_corpus_reader(corpus_name='latin_text_latin_library', language='latin')
files = ll.fileids()
ovid = "\n".join([preprocess(ll.raw(file)) for file in natsorted(files) if 'ovid.m' in file]) # natsorted keeps numbered books in order
ovid_lines = [line.strip() for line in line_tokenizer.tokenize(ovid)]

In [5]:
ovid_lines[:4]

['in noua fert animus mutatas dicere formas',
 'corpora di coeptis nam uos mutastis et illas',
 'adspirate meis primaque ab origine mundi',
 'ad mea perpetuum deducite tempora carmen']

In [6]:
def alphabetize_tokens(word):
    word = ''.join(sorted(word))
    return word

def alphabetize_tokens_line(line):
    line_tokens = line.split()
    return [alphabetize_tokens(token) for token in line_tokens]

def alphabetize_tuples(word):
    return (word, ''.join(sorted(word)))

def alphabetize_tuples_line(line):
    line_tokens = line.split()
    return [alphabetize_tuples(token) for token in line_tokens]
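These helpers rest on one idea: sorting a word's letters yields a canonical key, so two word forms are anagrams exactly when their keys are equal. A minimal sketch of that equivalence (the names `anagram_key` and `are_anagrams` are illustrative, not from the notebook), comparing the sorted key with a multiset check via `collections.Counter`:

```python
from collections import Counter

def anagram_key(word: str) -> str:
    # Same transformation as alphabetize_tokens: letters in sorted order
    return "".join(sorted(word))

def are_anagrams(a: str, b: str) -> bool:
    # Equal letter multisets <=> equal sorted keys
    return Counter(a) == Counter(b)

print(anagram_key("mutatas"))                        # aamsttu
print(anagram_key("amens") == anagram_key("mensa"))  # True
print(are_anagrams("pater", "patre"))                # True
print(are_anagrams("alte", "alti"))                  # False
```

Both checks agree, but a `Counter` is unhashable and cannot serve directly as a dictionary key, which is why the sorted string is the convenient choice for grouping word forms later on.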

In [7]:
ahl_lines = [" ".join(alphabetize_tokens_line(line)) for line in ovid_lines]

In [8]:
print("\n".join(ahl_lines[:4]))

in anou efrt aimnsu aamsttu cdeeir afmors
acooprr di ceiopst amn osu aimssttu et aills
aadeiprst eims aeimpqru ab egiinor dimnu
ad aem eempprtuu cddeeitu aemoprt acemnr


In [9]:
ahl_words = [word for line in ahl_lines for word in line.split()] # Flatten without repeated list concatenation

In [10]:
Counter(ahl_words).most_common(25)

[('et', 2269),
 ('in', 1093),
 ('est', 980),
 ('cen', 630),
 ('nno', 588),
 ('cmu', 410),
 ('tu', 381),
 ('aemnt', 304),
 ('aequ', 299),
 ('des', 297),
 ('ad', 286),
 ('epr', 276),
 ('eoqquu', 274),
 ('hiim', 273),
 ('doqu', 270),
 ('aert', 268),
 ('is', 264),
 ('aill', 221),
 ('eill', 212),
 ('amqu', 210),
 ('iqu', 206),
 ('ab', 190),
 ('atu', 190),
 ('em', 190),
 ('diqu', 185)]
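Ahl's question is about frequency. One rough measure, offered here as an assumption rather than anything in the original notebook, is the share of tokens whose sorted-letter key is shared by at least two distinct word forms in the text. A sketch, with `anagram_token_share` as a hypothetical helper that takes a flat list of (unsorted) word forms:

```python
from collections import defaultdict

def anagram_token_share(tokens):
    """Fraction of tokens whose sorted-letter key is shared by 2+ distinct forms."""
    if not tokens:
        return 0.0
    keys = ["".join(sorted(token)) for token in tokens]
    forms_by_key = defaultdict(set)
    for token, key in zip(tokens, keys):
        forms_by_key[key].add(token)
    hits = sum(1 for key in keys if len(forms_by_key[key]) > 1)
    return hits / len(tokens)

toy = "pater amat patrem et patre tale et tela".split()
print(anagram_token_share(toy))  # 0.5
```

Applied to the poem it would take something like `[w for line in ovid_lines for w in line.split()]`. Note that it measures anagrammatic possibility across the whole text, not proximity.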

In [11]:
for i in range(1, 18):
    words = [word for word in ahl_words if len(word) == i]
    if words:
        count = Counter(words).most_common(5)
        count_disp = ", ".join([item[0] for item in count])
        print(f'Most common anagram forms of length {i}:\n{count_disp}\n')
    else:
        print(f'No words of length {i}')

Most common anagram forms of length 1:
a, o, e, i, c

Most common anagram forms of length 2:
et, in, tu, ad, is

Most common anagram forms of length 3:
est, cen, nno, cmu, des

Most common anagram forms of length 4:
aequ, hiim, doqu, aert, aill

Most common anagram forms of length 5:
aemnt, diitx, aeprt, aimno, eenqu

Most common anagram forms of length 6:
eoqquu, iinqtu, coprsu, ellstu, lstuuu

Most common anagram forms of length 7:
ceooprr, acooprr, aceoprt, aemoprt, cinnoux

Most common anagram forms of length 8:
aeginnsu, amopqstu, aabcchir, eiipprtu, inoprstu

Most common anagram forms of length 9:
ciinosttt, aaegirttu, aiimnossu, aeimnqquu, aeeinnstt

Most common anagram forms of length 10:
aeeipqrrtu, aceemqquuu, ceeginnort, deeimoqttu, aacdeioprr

Most common anagram forms of length 11:
aaeeilmnqtu, aceilmprttu, aceiilmqrsu, abceeilsstu, aaceeilmqru

Most common anagram forms of length 12:
aaceilmqrsuu, ccciiimprstu, ceiiimnrrstu, aaeinoprrstt, abcdeeiiilmn

Most common anagram
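The loop above rescans `ahl_words` once for every length. A sketch of a single-pass alternative that buckets keys by length in a `defaultdict(Counter)`; `top_keys_by_length` is a hypothetical name:

```python
from collections import Counter, defaultdict

def top_keys_by_length(keys, n=5):
    """Return {length: [n most common keys of that length]} in one pass."""
    by_length = defaultdict(Counter)
    for key in keys:
        by_length[len(key)][key] += 1
    return {
        length: [key for key, _ in counter.most_common(n)]
        for length, counter in sorted(by_length.items())
    }

toy_keys = ["et", "et", "in", "est", "aelt", "aelt", "aelt", "in", "et"]
print(top_keys_by_length(toy_keys, n=2))
# {2: ['et', 'in'], 3: ['est'], 4: ['aelt']}
```

Calling `top_keys_by_length(ahl_words)` should produce the same top-five lists as the loop. Lengths with no words are simply absent rather than reported.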

In [12]:
ahl_tuples_lines = [alphabetize_tuples_line(line) for line in ovid_lines]

In [13]:
ahl_tuples = [pair for line in ahl_tuples_lines for pair in line] # Flatten without repeated list concatenation

In [14]:
from collections import defaultdict

# Map each sorted-letter key to the set of distinct word forms that produce it
ahl_dict = defaultdict(set)
for word, key in ahl_tuples:
    ahl_dict[key].add(word)
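The tuple and dictionary steps can also be collapsed into one pass over the lines that keeps only "anagram families" (keys with at least two distinct word forms), largest families first. A sketch; `anagram_families` and its `min_forms` parameter are hypothetical:

```python
from collections import defaultdict

def anagram_families(lines, min_forms=2):
    """Map sorted-letter key -> sorted distinct forms, for keys with min_forms+ forms."""
    forms_by_key = defaultdict(set)
    for line in lines:
        for word in line.split():
            forms_by_key["".join(sorted(word))].add(word)
    families = {k: sorted(v) for k, v in forms_by_key.items() if len(v) >= min_forms}
    # Largest families first, then alphabetically by key
    return dict(sorted(families.items(), key=lambda kv: (-len(kv[1]), kv[0])))

toy_lines = ["pater tela tale", "late pater alte", "patre amat"]
print(anagram_families(toy_lines))
# {'aelt': ['alte', 'late', 'tale', 'tela'], 'aeprt': ['pater', 'patre']}
```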

In [15]:
for i in range(4, 8):
    print(f'Anagram combinations with {i} word forms:') 
    anagrams = sorted([item for item in ahl_dict.items() if len(item[1]) == i])
    if anagrams:
        for anagram in anagrams:
            print(f'{anagram[0]}: {", ".join(sorted(anagram[1]))}')
        print('\n')
    else:
        print('None')

Anagram combinations with 4 word forms:
aablstu: ablatus, balatus, stabula, tabulas
acelt: calet, celat, lacte, lecta
acers: acres, arces, ceras, sacer
acerstu: creatus, curetas, recusat, teucras
acirsu: acrius, carius, icarus, raucis
ademnt: dantem, demant, madent, tandem
adert: ardet, daret, derat, tarde
aeeprrt: paterer, praeter, reparet, reperta
aegimnr: germani, gramine, margine, reginam
aeginr: ignare, negari, nigrae, regina
aeintu: auenti, ueniat, uineta, uitane
aeiprt: aperit, aperti, rapiet, rapite
aelt: alte, late, tale, tela
aemns: amens, amnes, manes, mensa
aeoprst: apertos, postera, poteras, sperato
aeprt: paret, parte, pater, patre
aerst: artes, rates, serta, stare
ailt: alit, alti, lati, tali
aimno: animo, minoa, omina, omnia
aimrs: armis, maris, ramis, rimas
aioprt: optari, patior, patrio, potiar
aipr: apri, pari, rapi, ripa
aiprs: apris, paris, rapis, ripas
aiprt: parit, patri, rapit, rapti
aips: apis, ipsa, pias, pisa
airst: arsit, artis, atris, ratis
airt: arti, atri
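Ahl's interest is less in anagrams scattered anywhere in the poem than in those that sit close together. As a hedged extension (not part of the original notebook), the sketch below flags lines that contain two or more distinct word forms sharing a key. `same_line_anagrams` is a hypothetical helper, and "same line" is just one assumed window of proximity:

```python
from collections import defaultdict

def same_line_anagrams(lines):
    """Yield (line_index, forms) for each key shared by 2+ distinct forms within one line."""
    for index, line in enumerate(lines):
        forms_by_key = defaultdict(set)
        for word in line.split():
            forms_by_key["".join(sorted(word))].add(word)
        for key in sorted(forms_by_key):
            forms = forms_by_key[key]
            if len(forms) > 1:
                yield index, sorted(forms)

toy_lines = ["in noua fert animus", "tale tela dedit pater", "pater patre amat"]
print(list(same_line_anagrams(toy_lines)))
# [(1, ['tale', 'tela']), (2, ['pater', 'patre'])]
```

Running `list(same_line_anagrams(ovid_lines))` would list candidates for the poem. A wider window, such as a sliding span of several lines, would need a small change to the outer loop. Repeated identical forms are not counted because each key collects a set.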