###### Replicating Classics <span style='font-style: normal; '>1</span>

# Packard 1974: Replicating 'Sound-Patterns in Homer'
### Part 1: Tabular Data
#### by Patrick J. Burns ([@diyclassics](https://twitter.com/diyclassics))

Published 8.16.18; last updated 8.16.18

In this first installment of *Replicating Classics*, I review David W. Packard's 1974 article for *Transactions of the American Philological Association*, ["Sound-Patterns in Homer"](https://doi.org/10.2307/2936092). Packard writes this article as an empirical defense of Homer as the "most polyphonous of poets" (cf. D.H. *De Comp.* 16) and seeks to argue for alliterative and related euphonic effects in the *Iliad* and *Odyssey* from the textual data itself. He enters a debate between critics who see no intended sound-pattern effects (e.g. Walter Leaf) and those for whom it is an essential quality of the verse (e.g. W. B. Stanford). He cites previous work that brought statistics to bear on similar problems from J. A. Scott and O. J. Todd, who (independent of each other) published studies based on frequency data for the letter sigma in Greek literature. But with this article, Packard brings such work to much larger scale by systematically investigating all sound patterns in both of the Homeric epics: "I have tabulated the frequency of various sounds in Homer" (240).

Packard's work does an admirable job of tabulating Homer-as-data and then building a series of literary critical arguments from this tabulation, for example to highlight passages where letter frequency is used to expressive effect such as the pronounced number of liquid sounds in the description of the river Scamander at *Il.* 7.329 (250) or to signal the presence of "harsh" sounds in the Cyclops scene beginning at *Od.* 989 (257). Stanford would call Packard's work "an indispensable checklist for remarks on Homer's euphony" (1981: 139 n. 1). The study continues to be cited and has influenced quanitative work on euphony, alliteration, and other quantitative approaches to classics scholarship, such as \[Craik and Kaferly 1987\] and \[Forstall and Scheirer 2011\], to name just a very small sample.

![Table 1 from Packard 1974](../img/packard-1974-table-1.png "Table 1 from Packard 1974")

**Figure 1. The beginning of Packard's *Iliad* table is given below.**

My main goal in this *Replicating Classics* series is to recreate important computational and/or quantitative studies in the field—to whatever degree that is possible or even practical—using current best practices for data-driven research. By this, I mean the following: 1. the use of a version-controlled repository (i.e. GitHub) to store the code *and* the data in one place; 2. the use of code notebooks (i.e. Jupyter) to formalize all transformations of the data and production of tables, figures, texts, etc.; and 3. the use of dependency management (i.e. pipenv) to ensure that code can be run by different users on different machines with as little difficulty in setup and with identical (or predictably similar) results. In short, I'm aiming for reproducible classics research. Moreover, I want there to be a pedagogical aspect to this series as well. Accordingly, especially at the beginning of each notebook, code blocks are broken down into small steps, with explanatory texts, comments, and output as necessary. I would like readers of this series to be able to pick up the basics of Python text processing and the use of key (text-as-)data science packages like Pandas which they could then use on different datasets in their own work.

In Part 1 of "Packard 1974: Replicating 'Sound-Patterns in Homer'", I recreate to the best of my ability the tables include in the article which give the "sound densities" per line of each letter, diphthong, or letter classes in the *Iliad* and the *Odyssey*. This installment of *Replicating Classics* will then continue with four more parts in future notebooks:
- Part 2: Defining "unusual" sound densities in the *Iliad* and the *Odyssey*
- Part 3: Looking at consonant clusters at word-boundaries
- Part 4: Measuring "smooth" and "harsh" verses in Homer
- Part 5: Further problems and continued debates

In [1]:
# Imports

import pandas as pd

from collections import Counter
from pprint import pprint

In [2]:
# Data
#
# These files are plaintext conversion of the TEI XML files available from the Perseus Digital Library under a Creative Commons Attribution-ShareAlike 3.0 United States License. See ../data/text/readme.md for more info

iliad_file = '../data/texts/iliad.txt'
odyssey_file = '../data/texts/odyssey.txt'

In [3]:
# Get line information

with open(iliad_file, 'r') as f:
    iliad_raw = f.read()
    
with open(odyssey_file, 'r') as f:
    odyssey_raw = f.read()    

In [4]:
# Show sample from text

pprint(iliad_raw.split('\n')[:5])

['μῆνιν ἄειδε θεὰ Πηληϊάδεω Ἀχιλῆος',
 'οὐλομένην, ἣ μυρί᾽ Ἀχαιοῖς ἄλγε᾽ ἔθηκε,',
 'πολλὰς δ᾽ ἰφθίμους ψυχὰς Ἄϊδι προΐαψεν',
 'ἡρώων, αὐτοὺς δὲ ἑλώρια τεῦχε κύνεσσιν',
 'οἰωνοῖσί τε πᾶσι, Διὸς δ᾽ ἐτελείετο βουλή,']


In [5]:
# Preprocessing and text processing functions

import sys
import unicodedata
import re

def remove_punctuation(text_with_punctuation):
    text_without_punctuation = re.sub(r'[^\w\s]','', text_with_punctuation)
    return text_without_punctuation

def remove_spaces(text_with_spaces):
    text_without_spaces = text_with_spaces.replace(' ', '')
    return text_without_spaces

def remove_diacriticals(text_with_diacriticals):
    ''''''
    combining_character_table = dict.fromkeys(c for c in range(sys.maxunicode) 
                                          if unicodedata.combining(chr(c))
                                         )
    text_with_diacriticals = unicodedata.normalize('NFD', text_with_diacriticals)
    text_without_diacriticals = text_with_diacriticals.translate(combining_character_table)
    return text_without_diacriticals

# Customing text processing from (Packard, 1974)

def protect_diaeresis(text):
    diaresis_chars = list('ϊϋΐΰῒῢῗῧ')
    for char in diaresis_chars:
        text = text.replace(char, f'|{char}')
        return text

def make_iota_adscript(text):
    text = unicodedata.normalize('NFD', text)
    text = text.replace('\u0345','ι')
    text = unicodedata.normalize('NFC', text)
    return text

def replace_consonants(text):
    """
    See n. 8 for details; P. replaces xi and psi, but ("perhaps arbitrarily") not zeta.
    """
    clusters = [('ξ', 'κς'), ('ψ', 'πς'), ('σ', 'ς'),]
    for cluster in clusters:
        text = text.replace(cluster[0], cluster[1])
    return text

def replace_diphthongs(text):
    """
    See n. 8 for details; P. writes that the "vowels exclude diphthongs which are listed separately"; 
    I have assigned them single characters (numerals, an arbitrary choice) for ease of splitting the texts.
    """
    clusters = [('αι', '0'), ('αυ', '1'), ('ει', '2'), ('ευ', '3'), 
                ('οι', '4'), ('ου', '5'), ('υι', '6'), ('ηι', '7'), 
                ('ηυ', '8'), ('ωι', '9'),]
    
    
    for cluster in clusters:
        text = text.replace(cluster[0], cluster[1])
    return text


def preprocess(text):
    text = text.lower()
    text = protect_diaeresis(text)
    text = make_iota_adscript(text)
    text = remove_diacriticals(text)
    text = replace_consonants(text)
    text = replace_diphthongs(text)
    text = remove_punctuation(text)    
    text = remove_spaces(text)
    return text

In [6]:
# Preprocess texts

iliad_text = iliad_text_orig = preprocess(iliad_raw)
odyssey_text = odyssey_text_orig = preprocess(odyssey_raw)

In [7]:
# Show sample from preprocessed text

print('Before:')
pprint(iliad_raw.split('\n')[:5])
print('\nAfter:')
pprint(iliad_text.split('\n')[:5])

Before:
['μῆνιν ἄειδε θεὰ Πηληϊάδεω Ἀχιλῆος',
 'οὐλομένην, ἣ μυρί᾽ Ἀχαιοῖς ἄλγε᾽ ἔθηκε,',
 'πολλὰς δ᾽ ἰφθίμους ψυχὰς Ἄϊδι προΐαψεν',
 'ἡρώων, αὐτοὺς δὲ ἑλώρια τεῦχε κύνεσσιν',
 'οἰωνοῖσί τε πᾶσι, Διὸς δ᾽ ἐτελείετο βουλή,']

After:
['μηνινα2δεθεαπηληιαδεωαχιληος',
 '5λομενηνημυριαχ04ςαλγεεθηκε',
 'πολλαςδιφθιμ5ςπςυχαςαιδιπρ4απςεν',
 'ηρωων1τ5ςδεελωριατ3χεκυνεςςιν',
 '4ων4ςιτεπαςιδιοςδετελ2ετοβ5λη']


In [8]:
# Break texts into lines

iliad_lines = iliad_text.split('\n')
odyssey_lines = odyssey_text.split('\n')

In [9]:
# Get total lines for each work

iliad_lines_count = len(iliad_lines)
odyssey_lines_count = len(odyssey_lines)

print(f'There are {iliad_lines_count} lines in the Iliad.')
print(f'There are {odyssey_lines_count} lines in the Odyssey.')

There are 15683 lines in the Iliad.
There are 12107 lines in the Odyssey.


## Checking specific counts

In [10]:
# Count betas in the Iliad; a good place to start as there is not much room for orthographic variation/defintion
# in this character. (As opposed to, say, vowels, which vary according to diacriticals.)

test = 'β'

counts = []

# Make a counter for each line, limited to only the test character
for line in iliad_lines:
    count = Counter([char for char in [*line.lower().strip()] if char == test])
    counts.append(count)

# Transform the counter to a list of tuples    
d = [(k, v) for f in counts for k, v in f.items()]

# Make a new counter of the list of tuples
sound_density = Counter(d)

# Add zero counts by summing values and subtracting from total lines
sound_density[(test, 0)] = iliad_lines_count - sum(Counter(d).values())

# Print result
sorted(sound_density.items())

[(('β', 0), 13127),
 (('β', 1), 2288),
 (('β', 2), 249),
 (('β', 3), 17),
 (('β', 4), 2)]

Encouraging results! Very encouraging, in fact. Packard prints the following counts:

β, 0: 13127
β, 1: 2287
β, 2: 249
β, 3: 17
β, 4: 2

The only difference here is that I pick up one (1) additional line with a single beta—2288 v. 2287. The other figures are exact matches. This could very well be due to errors in Packard's source texts. He writes (240 n. 8) that his source text "contains some errors," though these are not specifically noted. Of course, this could also be due to errors in my source texts, or variation between editions. (Both Packard's source and the Perseus text that I use here are based on Munro's 1920 Oxford text; but there could any number of differences in encoding between the two.) Packard mentions in his note that the errors are not numerous enough "to alter significantly the statistics." We are firmly on this ground as well.

Let's look at another consonant—namely, sigma...

In [11]:
# Make function to produce 'densities'

def make_line_densities_char(lines, char):
    counts = []

    # Make a counter for each line, limited to only the test character
    for line in lines:
        count = Counter([char_ for char_ in [*line.lower().strip()] if char_ == char])
        counts.append(count)

    # Transform the counter to a list of tuples    
    d = [(k, v) for f in counts for k, v in f.items()]

    # Make a new counter of the list of tuples
    sound_density = Counter(d)

    # Add zero counts by summing values and subtracting from total lines
    sound_density[(char, 0)] = len(lines) - sum(Counter(d).values())

    # Print result
    return sorted(sound_density.items())

In [12]:
pprint(make_line_densities_char(iliad_lines, 'ς'))

[(('ς', 0), 824),
 (('ς', 1), 2587),
 (('ς', 2), 3907),
 (('ς', 3), 3919),
 (('ς', 4), 2608),
 (('ς', 5), 1235),
 (('ς', 6), 450),
 (('ς', 7), 126),
 (('ς', 8), 23),
 (('ς', 9), 3),
 (('ς', 10), 1)]


Also encouraging. Packard prints the following counts:

ς,  0: 824
ς,  1: 2585
ς,  2: 3908
ς,  3: 3918
ς,  4: 2608
ς,  5: 1236
ς,  6: 450
ς,  7: 126
ς,  8: 23
ς,  9: 3
ς, 10: 1

More variation, but not much, and still nothing that would "alter significantly the statistics." Slightly variations in the numbers for 1, 2, 3, and 5—and even these all within counts of 2; the remaining figures match Packard's exactly. This particular consontant count is encouraging since it includes the conversion of both ξ and ψ (to κς and πς, respectively) as well as the normalization of σ to ς. If there were a consontant that could present difficulty in replicating Packard's method, sigma would be the best candidate.

Let's look now at a vowel...

In [13]:
pprint(make_line_densities_char(iliad_lines, 'α'))

[(('α', 0), 658),
 (('α', 1), 2196),
 (('α', 2), 3720),
 (('α', 3), 3680),
 (('α', 4), 2802),
 (('α', 5), 1580),
 (('α', 6), 718),
 (('α', 7), 236),
 (('α', 8), 71),
 (('α', 9), 19),
 (('α', 10), 2),
 (('α', 11), 1)]


Less encouraging, but not hopelessly so. Packard prints the following counts:

α,  0: 671
α,  1: 2214
α,  2: 3723
α,  3: 3676
α,  4: 2791
α,  5: 1565
α,  6: 714
α,  7: 235
α,  8: 71
α,  9: 19
α, 10: 2
α, 11: 1

First, the good news. The long tail looks good; that is, the counts for 8, 9, 10, and 11 are matches. The rest have variations that I am unable to account for. I will plead 1. lack of access to the original data and code, and 2. a certain amount of obscurity in the reported method.

As for the first point, all I know from the article is that "the statistics were compiled by computer from a text originally prepared originally by A. Q. Morton" (241 n. 8). This is clear without being precise. I simply do not know which version of this data was used (and so cannot check the source), what was involved—including encoding, manipulation, etc. of the text—in the compliation, or what program was used to generate the statistics. Accordingly, I am as pleased to see results that are within an acceptable range as I am to see exact matches.

As for the second point, Packard writes that "the counts for the vowels exclude diphthongs which are listed separately; but long and short α, ι, and υ are not distinguished" (243 n. 8). I have tried to accommodate this and have made this manipulation of the text clear in the preprocessing scripts above (spec. ```protect_diaeresis``` and ```convert_iota_subscript```). But in the end I am left unsure that I have handled this transformation *exactly* as Packard did as well as unsure whether or not there are edge cases that we have handled differently.

In [14]:
pprint(make_line_densities_char(iliad_lines, '9'))

[(('9', 0), 14027),
 (('9', 1), 1462),
 (('9', 2), 186),
 (('9', 3), 7),
 (('9', 4), 1)]


This is more difficult to diagnose. I have used '9' to record instances in the text of ωι. Packard prints the following counts:

ωι, 0: 10742
ωι, 1: 1185
ωι, 2: 175
ωι, 3: 8
ωι, 4: 0

As opposed to the beta, sigma, and even alpha counts reported above, there is large difference between the figures for the ωι diphthong. Did we use different for working with iota subscripts/adscripts? Is it possible that the iotas are simply encoded differently in the source texts? I don't know. Predicably though, the single omega and iota counts are similarly off...

In [15]:
pprint(make_line_densities_char(iliad_lines, 'ω'))

[(('ω', 0), 6569),
 (('ω', 1), 5728),
 (('ω', 2), 2524),
 (('ω', 3), 673),
 (('ω', 4), 165),
 (('ω', 5), 20),
 (('ω', 6), 4)]


In [16]:
pprint(make_line_densities_char(iliad_lines, 'ι'))

[(('ι', 0), 2915),
 (('ι', 1), 5379),
 (('ι', 2), 4416),
 (('ι', 3), 2162),
 (('ι', 4), 634),
 (('ι', 5), 145),
 (('ι', 6), 30),
 (('ι', 7), 2)]


There is likely a problem of definition here about when to count a single ω or a single ι or the diphthong ωι. I will continue to experiment with such formalizations and definitions in search of better alignment between Packard's figures and my own. (For a similar discussion of discrepancies in an attempt to match Packard's counts, see Lynam 2012, 23-24.) Until then, I will focus on consonant patterns only in the remainder of his argument.

### Packard 1974: Table 1

Here is the code for recreating Table 1 in Packard 1974. I break it up into a series of small steps to make clear the train of thought. The basic steps are: 1. Get the letter frequencies for each line; 2. Get the frequency distribution of lines including each count; 3. Make a pivot table to present a summary of this frequency distribution.  

In [17]:
counts = []

# Make a counter for each line, limited to only the test character
for line in iliad_lines:
    count = Counter([char for char in [*line.lower().strip()]])
    counts.append(count)

for i, count in enumerate(counts[:3]):
    print(f'Line {i+1}:\n{sorted(count.items())}\n')

Line 1:
[('2', 1), ('α', 4), ('δ', 2), ('ε', 3), ('η', 4), ('θ', 1), ('ι', 3), ('λ', 2), ('μ', 1), ('ν', 2), ('ο', 1), ('π', 1), ('ς', 1), ('χ', 1), ('ω', 1)]

Line 2:
[('0', 1), ('4', 1), ('5', 1), ('α', 2), ('γ', 1), ('ε', 4), ('η', 3), ('θ', 1), ('ι', 1), ('κ', 1), ('λ', 2), ('μ', 2), ('ν', 2), ('ο', 1), ('ρ', 1), ('ς', 1), ('υ', 1), ('χ', 1)]

Line 3:
[('4', 1), ('5', 1), ('α', 4), ('δ', 2), ('ε', 1), ('θ', 1), ('ι', 4), ('λ', 2), ('μ', 1), ('ν', 1), ('ο', 1), ('π', 4), ('ρ', 1), ('ς', 5), ('υ', 1), ('φ', 1), ('χ', 1)]



In [18]:
# Transform the counter to a list of tuples  

d = [(k, v) for f in counts for k, v in f.items()]

# Make a new counter of the list of tuples
sound_density = Counter(d)

print(sorted(sound_density.items())[:25])

[(('0', 1), 5555), (('0', 2), 1692), (('0', 3), 299), (('0', 4), 40), (('0', 5), 4), (('1', 1), 2049), (('1', 2), 106), (('1', 3), 5), (('2', 1), 5148), (('2', 2), 1287), (('2', 3), 169), (('2', 4), 9), (('2', 5), 1), (('3', 1), 2462), (('3', 2), 159), (('3', 3), 7), (('4', 1), 4803), (('4', 2), 1332), (('4', 3), 217), (('4', 4), 33), (('4', 5), 2), (('4', 6), 2), (('5', 1), 3897), (('5', 2), 828), (('5', 3), 82)]


In [19]:
# Add zero counts

letters = set([l for l,v in sound_density.keys()])

for letter in letters:
    density_count = sum([v for k, v in sound_density.items() if k[0] == letter])
    sound_density[(letter, 0)] = iliad_lines_count - density_count

print(sorted(sound_density.items())[:25]) 

[(('0', 0), 8093), (('0', 1), 5555), (('0', 2), 1692), (('0', 3), 299), (('0', 4), 40), (('0', 5), 4), (('1', 0), 13523), (('1', 1), 2049), (('1', 2), 106), (('1', 3), 5), (('2', 0), 9069), (('2', 1), 5148), (('2', 2), 1287), (('2', 3), 169), (('2', 4), 9), (('2', 5), 1), (('3', 0), 13055), (('3', 1), 2462), (('3', 2), 159), (('3', 3), 7), (('4', 0), 9294), (('4', 1), 4803), (('4', 2), 1332), (('4', 3), 217), (('4', 4), 33)]


In [20]:
# Replace codes used for diphthongs in preprocessing

vowel_clusters = [('αι', '0'), ('αυ', '1'), ('ει', '2'), ('ευ', '3'), 
            ('οι', '4'), ('ου', '5'), ('υι', '6'), ('ηι', '7'), 
            ('ηυ', '8'), ('ωι', '9'),]
vowel_clusters = dict([(str(i), cluster[0]) for i, cluster in enumerate(vowel_clusters)])

# Flatten dicts to tuples to prepare data for pandas

flat_sound_density = []

for k, v in sound_density.items():
    if k[0] in vowel_clusters.keys():
        flat_sound_density.append((vowel_clusters[k[0]], k[1], v))
    else:
        flat_sound_density.append((k[0], k[1], v))

print(sorted(flat_sound_density)[:25])

[('α', 0, 658), ('α', 1, 2196), ('α', 2, 3720), ('α', 3, 3680), ('α', 4, 2802), ('α', 5, 1580), ('α', 6, 718), ('α', 7, 236), ('α', 8, 71), ('α', 9, 19), ('α', 10, 2), ('α', 11, 1), ('αι', 0, 8093), ('αι', 1, 5555), ('αι', 2, 1692), ('αι', 3, 299), ('αι', 4, 40), ('αι', 5, 4), ('αυ', 0, 13523), ('αυ', 1, 2049), ('αυ', 2, 106), ('αυ', 3, 5), ('β', 0, 13127), ('β', 1, 2288), ('β', 2, 249)]


In [21]:
# Build pandas dataframe and pivot table

df = pd.DataFrame(flat_sound_density, columns=['letter', 'density', 'count'])
table = pd.pivot_table(df, values='count', index='letter', columns=['density'])

# Use Packard 1974 list sort
reorder_list = ['α', 'β', 'γ', 'δ', 'ε', 'ζ', 'η', 
       'θ', 'ι', 'κ', 'λ', 'μ', 'ν', 'ο', 'π', 'ρ', 'ς', 'τ', 'υ',
       'φ', 'χ', 'ω', 'αι', 'αυ', 'ει', 'ευ', 'οι', 'ου', 'υι', 'ηι', 'ηυ', 'ωι']
table = table.reindex(reorder_list)

print('Table 1a. Sound Densities in the Iliad')
iliad_table = table
iliad_table

Table 1a. Sound Densities in the Iliad


density,0,1,2,3,4,5,6,7,8,9,10,11
letter,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
α,658.0,2196.0,3720.0,3680.0,2802.0,1580.0,718.0,236.0,71.0,19.0,2.0,1.0
β,13127.0,2288.0,249.0,17.0,2.0,,,,,,,
γ,9426.0,4806.0,1205.0,223.0,21.0,2.0,,,,,,
δ,4457.0,6296.0,3483.0,1165.0,244.0,35.0,3.0,,,,,
ε,288.0,1413.0,2923.0,3857.0,3450.0,2138.0,1058.0,402.0,115.0,28.0,9.0,2.0
ζ,14548.0,1107.0,28.0,,,,,,,,,
η,4683.0,6042.0,3367.0,1214.0,317.0,54.0,6.0,,,,,
θ,8459.0,5451.0,1550.0,200.0,22.0,1.0,,,,,,
ι,2915.0,5379.0,4416.0,2162.0,634.0,145.0,30.0,2.0,,,,
κ,4343.0,6103.0,3669.0,1221.0,291.0,46.0,9.0,1.0,,,,


One additional step is necessary to reproduce Packard's table. He includes at the bottom of the table four rows for letter classes, i.e. labials, dentals, gutturals, and liquids, as well as a fifth for nasalized gamma.

In [22]:
# Replace consontants by class...

iliad_text = iliad_text_orig

labials = ['π', 'β', 'φ']
dentals = ['τ', 'δ', 'θ']
gutturals = ['κ', 'γ', 'χ']
liquids = ['λ', 'ρ', 'μ', 'ν']

for labial in labials:
    iliad_text = iliad_text.replace(labial, 'P')
    
for dental in dentals:
    iliad_text = iliad_text.replace(dental, 'T')

for guttural in gutturals:
    iliad_text = iliad_text.replace(guttural, 'K')
    
for liquid in liquids:
    iliad_text = iliad_text.replace(liquid, 'L')

    
print(iliad_text[:100])

LηLιLα2TεTεαPηLηιαTεωαKιLηος
5LοLεLηLηLυLιαK04ςαLKεεTηKε
PοLLαςTιPTιL5ςPςυKαςαιTιPL4αPςεL
ηLωωL1T5ςT


In [23]:
iliad_lines = iliad_text.split('\n')

counts = []

# Make a counter for each line, limited to only the test character
for line in iliad_lines:
    count = Counter([char for char in [*line.strip()] if char in 'PTKL'])
    counts.append(count)
    
# Transform the counter to a list of tuples  

d = [(k, v) for f in counts for k, v in f.items()]

# Make a new counter of the list of tuples
sound_density = Counter(d)

# Add zero counts

letters = set([l for l,v in sound_density.keys()])

for letter in letters:
    density_count = sum([v for k, v in sound_density.items() if k[0] == letter])
    sound_density[(letter, 0)] = iliad_lines_count - density_count

# Flatten dicts to tuples to prepare data for pandas

flat_sound_density = []

for k, v in sound_density.items():
    if k[0] in vowel_clusters.keys():
        flat_sound_density.append((vowel_clusters[k[0]], k[1], v))
    else:
        flat_sound_density.append((k[0], k[1], v))

# Build pandas dataframe and pivot table

df = pd.DataFrame(flat_sound_density, columns=['letter', 'density', 'count'])
table = pd.pivot_table(df, values='count', index='letter', columns=['density'])

print('Table 1b. Class Densities in the Iliad')
iliad_classes_table = table
iliad_classes_table

Table 1b. Class Densities in the Iliad


density,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
letter,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1
K,1360.0,3894.0,4875.0,3317.0,1524.0,532.0,150.0,26.0,4.0,,1.0,,,,,
L,1.0,27.0,159.0,538.0,1344.0,2253.0,2975.0,2985.0,2391.0,1575.0,846.0,374.0,144.0,52.0,18.0,1.0
P,2196.0,4560.0,4668.0,2814.0,1010.0,362.0,65.0,5.0,1.0,2.0,,,,,,
T,178.0,917.0,2400.0,3719.0,3799.0,2596.0,1340.0,513.0,177.0,40.0,4.0,,,,,


In [24]:
# Replace nasalized gamma...

iliad_text = iliad_text_orig

gutturals = ['κ', 'γ', 'χ']

for guttural in gutturals:
    iliad_text = iliad_text.replace(f'γ{guttural}', 'ŋ')

iliad_lines = iliad_text.split('\n')

counts = []

# Make a counter for each line, limited to only the test character
for line in iliad_lines:
    count = Counter([char for char in [*line.strip()] if char in 'ŋ'])
    counts.append(count)
    
# Transform the counter to a list of tuples  

d = [(k, v) for f in counts for k, v in f.items()]

# Make a new counter of the list of tuples
sound_density = Counter(d)

# Add zero counts

letters = set([l for l,v in sound_density.keys()])

for letter in letters:
    density_count = sum([v for k, v in sound_density.items() if k[0] == letter])
    sound_density[(letter, 0)] = iliad_lines_count - density_count

# Flatten dicts to tuples to prepare data for pandas

flat_sound_density = []

for k, v in sound_density.items():
    if k[0] in vowel_clusters.keys():
        flat_sound_density.append((vowel_clusters[k[0]], k[1], v))
    else:
        flat_sound_density.append((k[0], k[1], v))

# Build pandas dataframe and pivot table

df = pd.DataFrame(flat_sound_density, columns=['letter', 'density', 'count'])
table = pd.pivot_table(df, values='count', index='letter', columns=['density'])

print('Table 1b. Nasalized Gamma Densities in the Iliad')
iliad_gamma_table = table
iliad_gamma_table    

Table 1b. Nasalized Gamma Densities in the Iliad


density,0,1,2
letter,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
ŋ,14983,694,6


In [25]:
# Concatenate the three tables; replace NaN and sort

table1 = pd.concat([iliad_table, iliad_classes_table, iliad_gamma_table])

# Replace NaN
table1 = table1.fillna(value=0)
table1 = table1.astype(int)
table1[table1 == 0] = ''

# Use Packard 1974 list sort
reorder_list = ['α', 'β', 'γ', 'δ', 'ε', 'ζ', 'η', 
       'θ', 'ι', 'κ', 'λ', 'μ', 'ν', 'ο', 'π', 'ρ', 'ς', 'τ', 'υ',
       'φ', 'χ', 'ω', 'αι', 'αυ', 'ει', 'ευ', 'οι', 'ου', 'υι', 'ηι', 'ηυ', 'ωι', 
               'L', 'P', 'T', 'K', 'ŋ']
table1 = table1.reindex(reorder_list)

print('Table 1. Sound Densities in the Iliad')
table1

Table 1. Sound Densities in the Iliad


density,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
letter,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1
α,658,2196,3720.0,3680.0,2802.0,1580.0,718.0,236.0,71.0,19.0,2.0,1.0,,,,
β,13127,2288,249.0,17.0,2.0,,,,,,,,,,,
γ,9426,4806,1205.0,223.0,21.0,2.0,,,,,,,,,,
δ,4457,6296,3483.0,1165.0,244.0,35.0,3.0,,,,,,,,,
ε,288,1413,2923.0,3857.0,3450.0,2138.0,1058.0,402.0,115.0,28.0,9.0,2.0,,,,
ζ,14548,1107,28.0,,,,,,,,,,,,,
η,4683,6042,3367.0,1214.0,317.0,54.0,6.0,,,,,,,,,
θ,8459,5451,1550.0,200.0,22.0,1.0,,,,,,,,,,
ι,2915,5379,4416.0,2162.0,634.0,145.0,30.0,2.0,,,,,,,,
κ,4343,6103,3669.0,1221.0,291.0,46.0,9.0,1.0,,,,,,,,


### Packard 1974: Table 2

In [26]:
# Same as above w/ Odyssey; kept all in one cell for ease of use. Could be refactored to minimize code repetition

counts = []

# Make a counter for each line, limited to only the test character
for line in odyssey_lines:
    count = Counter([char for char in [*line.lower().strip()]])
    counts.append(count)
    
# Transform the counter to a list of tuples  

d = [(k, v) for f in counts for k, v in f.items()]

# Make a new counter of the list of tuples
sound_density = Counter(d)

# Add zero counts

letters = set([l for l,v in sound_density.keys()])

for letter in letters:
    density_count = sum([v for k, v in sound_density.items() if k[0] == letter])
    sound_density[(letter, 0)] = odyssey_lines_count - density_count
    
# Replace codes used for diphthongs in preprocessing

vowel_clusters = [('αι', '0'), ('αυ', '1'), ('ει', '2'), ('ευ', '3'), 
            ('οι', '4'), ('ου', '5'), ('υι', '6'), ('ηι', '7'), 
            ('ηυ', '8'), ('ωι', '9'),]
vowel_clusters = dict([(str(i), cluster[0]) for i, cluster in enumerate(vowel_clusters)])

# Flatten dicts to tuples to prepare data for pandas

flat_sound_density = []

for k, v in sound_density.items():
    if k[0] in vowel_clusters.keys():
        flat_sound_density.append((vowel_clusters[k[0]], k[1], v))
    else:
        flat_sound_density.append((k[0], k[1], v))

# Build pandas dataframe and pivot table

df = pd.DataFrame(flat_sound_density, columns=['letter', 'density', 'count'])
odyssey_table = pd.pivot_table(df, values='count', index='letter', columns=['density'], fill_value=0)


# Replace consontants by class...

odyssey_text = odyssey_text_orig

labials = ['π', 'β', 'φ']
dentals = ['τ', 'δ', 'θ']
gutturals = ['κ', 'γ', 'χ']
liquids = ['λ', 'ρ', 'μ', 'ν']

for labial in labials:
    odyssey_text = odyssey_text.replace(labial, 'P')
    
for dental in dentals:
    odyssey_text = odyssey_text.replace(dental, 'T')

for guttural in gutturals:
    odyssey_text = odyssey_text.replace(guttural, 'K')
    
for liquid in liquids:
    odyssey_text = odyssey_text.replace(liquid, 'L')


odyssey_lines = odyssey_text.split('\n')

counts = []

# Make a counter for each line, limited to only the test character
for line in odyssey_lines:
    count = Counter([char for char in [*line.strip()] if char in 'PTKL'])
    counts.append(count)
    
# Transform the counter to a list of tuples  

d = [(k, v) for f in counts for k, v in f.items()]

# Make a new counter of the list of tuples
sound_density = Counter(d)

# Add zero counts

letters = set([l for l,v in sound_density.keys()])

for letter in letters:
    density_count = sum([v for k, v in sound_density.items() if k[0] == letter])
    sound_density[(letter, 0)] = odyssey_lines_count - density_count

# Flatten dicts to tuples to prepare data for pandas

flat_sound_density = []

for k, v in sound_density.items():
    if k[0] in vowel_clusters.keys():
        flat_sound_density.append((vowel_clusters[k[0]], k[1], v))
    else:
        flat_sound_density.append((k[0], k[1], v))

# Build pandas dataframe and pivot table

df = pd.DataFrame(flat_sound_density, columns=['letter', 'density', 'count'])
table = pd.pivot_table(df, values='count', index='letter', columns=['density'])

odyssey_classes_table = table

# Replace nasalized gamma...

odyssey_text = odyssey_text_orig

gutturals = ['κ', 'γ', 'χ']

for guttural in gutturals:
    odyssey_text = odyssey_text.replace(f'γ{guttural}', 'ŋ')

odyssey_lines = odyssey_text.split('\n')

counts = []

# Make a counter for each line, limited to only the test character
for line in odyssey_lines:
    count = Counter([char for char in [*line.strip()] if char in 'ŋ'])
    counts.append(count)
    
# Transform the counter to a list of tuples  

d = [(k, v) for f in counts for k, v in f.items()]

# Make a new counter of the list of tuples
sound_density = Counter(d)

# Add zero counts

letters = set([l for l,v in sound_density.keys()])

for letter in letters:
    density_count = sum([v for k, v in sound_density.items() if k[0] == letter])
    sound_density[(letter, 0)] = odyssey_lines_count - density_count

# Flatten dicts to tuples to prepare data for pandas

flat_sound_density = []

for k, v in sound_density.items():
    if k[0] in vowel_clusters.keys():
        flat_sound_density.append((vowel_clusters[k[0]], k[1], v))
    else:
        flat_sound_density.append((k[0], k[1], v))

# Build pandas dataframe and pivot table

df = pd.DataFrame(flat_sound_density, columns=['letter', 'density', 'count'])
table = pd.pivot_table(df, values='count', index='letter', columns=['density'])

odyssey_gamma_table = table
odyssey_gamma_table    
    
# Concatenate the three tables; replace NaN and sort

table2 = pd.concat([odyssey_table, odyssey_classes_table, odyssey_gamma_table])

# Replace NaN
table2 = table2.fillna(value=0)
table2 = table2.astype(int)
table2[table2 == 0] = ''

# Use Packard 1974 list sort
reorder_list = ['α', 'β', 'γ', 'δ', 'ε', 'ζ', 'η', 
       'θ', 'ι', 'κ', 'λ', 'μ', 'ν', 'ο', 'π', 'ρ', 'ς', 'τ', 'υ',
       'φ', 'χ', 'ω', 'αι', 'αυ', 'ει', 'ευ', 'οι', 'ου', 'υι', 'ηι', 'ηυ', 'ωι', 
               'L', 'P', 'T', 'K', 'ŋ']
table2 = table2.reindex(reorder_list)

print('Table 2. Sound Densities in the Odyssey')
table2          

Table 2. Sound Densities in the Odyssey


density,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
letter,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1
α,530,1910,2829.0,2858.0,1939.0,1243.0,539.0,190.0,47.0,17.0,3.0,2.0,,,,
β,10232,1703,165.0,7.0,,,,,,,,,,,,
γ,7148,3818,930.0,193.0,15.0,3.0,,,,,,,,,,
δ,3367,4937,2734.0,876.0,169.0,23.0,1.0,,,,,,,,,
ε,229,1039,2286.0,3004.0,2571.0,1719.0,797.0,319.0,113.0,24.0,6.0,,,,,
ζ,11221,854,32.0,,,,,,,,,,,,,
η,3188,4446,2942.0,1117.0,342.0,66.0,6.0,,,,,,,,,
θ,6280,4244,1319.0,239.0,25.0,,,,,,,,,,,
ι,2549,4509,3139.0,1423.0,392.0,88.0,6.0,1.0,,,,,,,,
κ,3340,4684,2689.0,1045.0,294.0,49.0,6.0,,,,,,,,,


In [27]:
# Pickle dataframes for future reference

table1.to_pickle('../data/serial/packard-1974-table1.p')
table2.to_pickle('../data/serial/packard-1974-table2.p')

So we now have tables of "sound densities" in both the *Iliad* and the *Odyssey*—built up directly from text files with all transformations documented in the code above—that come reasonably close to Table 1 and Table 2 of \[Packard 1974\]. In the next part of this installment, I will review the first set of literary critical arguments that Packard bases on this data, looking at patterns of "unusual" sound densities in the Homeric epics.

### Works Cited
- Craik, E. M., and Kaferly, D. H. A. 1987. “The Computer and Sophocles’ *Trachiniae*.” *LLC* 2(2): 86–97. [doi:10.1093/llc/2.2.86](https://doi.org/10.1093/llc/2.2.86).
- Forstall, C. W., and Scheirer, W. J. 2011. “Visualizing Sound as Functional N-Grams in Homeric Greek Poetry.” Poster presented at *DH2011*, Stanford University, Palo Alto, CA. [abstract](http://dh2011abstracts.stanford.edu/xtf/view?docId=tei/ab-385.xml).
- Stanford, W.B. 1981. “Sound, Sense, and Music in Greek Poetry.” *G&R* 28: 127–40. [doi:10.1017/S0017383500033234](https://doi.org/10.1017/S0017383500033234).
- Lynam, H. 2012. “Computational Pattern Analysis of Ancient Greek Texts.” Thesis. Trinity College, Dublin, Ireland. [link](http://www.academia.edu/2946558/Computational_Pattern_Analysis_of_Ancient_Greek_Texts).
- Packard, D. W. 1974. “Sound-Patterns in Homer.” *TAPA* 104: 239–60. [doi:10.2307/2936092](https://doi.org/10.2307/2936092).

---

If you have any questions, comments, etc. about this notebook or the *Replicating Classics* series in general, or if you see an error in the code, please open a GitHub issue at [https://github.com/diyclassics/replicating-classics/issues](https://github.com/diyclassics/replicating-classics/issues).