## Feature Statistics

Based on the results from the [Mahalanobis analysis](eleg_maha.ipynb), the most distinctive features for identifying the _Consolatio_ as non-Ovidian are its overuse of elision and its underuse of the strong 3rd-foot caesura in the hexameter. In this notebook I run some quick statistics to compare these features across other poets. The main motivation is a mild claim (in other words, an unprovable theory) that heavier elision may well be a consequence of studying Vergil, who, as I know from reading him, is very fond of the device, but for whom I lacked quantitative data.

In [1]:
from mqdq import babble
from mqdq import line_analyzer as la

import numpy as np
import pandas as pd

import glob
from collections import defaultdict

In [2]:
# We can't use the existing corpus here: we draw random samples from the whole
# texts, and for that we need the raw `Babbler` objects.

non_elegy = []

aen_single_bab = babble.Babbler.from_file(
    "../corpus/VERG-aene.xml", name="Aeneid", author="Vergil"
)
non_elegy.append(aen_single_bab)

geo_single_bab = babble.Babbler.from_file(
    "../corpus/VERG-geor.xml", name="Georgics", author="Vergil"
)
non_elegy.append(geo_single_bab)

sat_single_bab = babble.Babbler.from_file(
    "../corpus/IVV-satu.xml", name="Juv. Sat.", author="Juvenal"
)
non_elegy.append(sat_single_bab)

puni_single_bab = babble.Babbler.from_file(
    "../corpus/SIL-puni.xml", name="Punica", author="Silius"
)
non_elegy.append(puni_single_bab)

theb_single_bab = babble.Babbler.from_file(
    "../corpus/STAT-theb.xml", name="Thebaid", author="Statius"
)
non_elegy.append(theb_single_bab)

met_single_bab = babble.Babbler.from_file(
    "../corpus/OV-meta.xml", name="Metamorphoses", author="Ovid"
)
non_elegy.append(met_single_bab)

phars_single_bab = babble.Babbler.from_file(
    "../corpus/LVCAN-phar.xml", name="Pharsalia", author="Lucan"
)
non_elegy.append(phars_single_bab)

arg_single_bab = babble.Babbler.from_file(
    "../corpus/VAL_FL-argo.xml", name="Argonautica", author="V.Flaccus"
)
non_elegy.append(arg_single_bab)

rena_single_bab = babble.Babbler.from_file(
    "../corpus/LVCR-rena.xml", name="DRN", author="Lucretius"
)
non_elegy.append(rena_single_bab)

horsat_single_bab = babble.Babbler.from_file(
    *sorted(glob.glob("../corpus/HOR-sat*.xml")), name="Hor. Sat.", author="Horace"
)
non_elegy.append(horsat_single_bab)

In [3]:
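def subsample(
    ary: list[babble.Babbler], mu: float, sd: float, n: int, min_length: int = 0
) -> list[babble.Babbler]:
    """Draw n contiguous random samples, cycling through the works in `ary`.

    Sample lengths (in lines) are drawn from a normal distribution N(mu, sd),
    discarding any that are not longer than `min_length`.
    """
    # Draw twice as many lengths as needed so that filtering still leaves enough.
    lengths = [
        x for x in np.random.normal(mu, sd, n * 2).astype("int") if x > min_length
    ]
    if len(lengths) < n:
        raise ValueError("too few sample lengths above min_length; lower min_length")
    samps: list[babble.Babbler] = []
    for i in range(n):
        # Round-robin over the works so each contributes an equal number of samples.
        work = ary[i % len(ary)]
        length = lengths[i]
        # Random start line such that the whole sample fits inside the work
        # (the +1 lets a sample end on the work's final line).
        start = np.random.randint(len(work) - length + 1)
        b = babble.Babbler(
            work.raw_source[start : start + length],
            name=f"{i}-{work.name}",
            author=work.author,
        )
        samps.append(b)
    return samps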

In [4]:
# Why take samples instead of just analysing the whole texts? We want to
# normalise for different text lengths as far as possible, and it is also good
# to get an idea of the variability of each author's practice: analysing the
# whole text only gives us an empirical mean.

# 100 samples per work, each roughly 100 lines long (sd 10 lines).
non_elegy_samples = subsample(non_elegy, mu=100, sd=10, n=100 * len(non_elegy))
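The sampling rationale in the comment above can be illustrated without the MQDQ corpus. The sketch below is hypothetical: it substitutes a synthetic array of per-line elision counts (drawn from a Poisson distribution) for a real text, and it uses NumPy's seeded `default_rng` generator rather than the global `np.random` functions used in `subsample`, so that repeated runs give identical samples. It shows that sampling yields a spread of per-line rates, whereas the whole text yields a single empirical mean.

```python
import numpy as np


def sample_rates(
    per_line_counts: np.ndarray,
    n: int,
    mu: float = 100,
    sd: float = 10,
    seed: int = 0,
) -> np.ndarray:
    """Return the elision rate (elisions per line) of n random contiguous samples."""
    rng = np.random.default_rng(seed)
    rates = np.empty(n)
    for i in range(n):
        # Draw a sample length from N(mu, sd), clipped to a usable range.
        length = int(np.clip(rng.normal(mu, sd), 1, len(per_line_counts)))
        # Pick a random start so the sample fits entirely inside the text.
        start = rng.integers(0, len(per_line_counts) - length + 1)
        rates[i] = per_line_counts[start : start + length].mean()
    return rates


# Hypothetical "text": 5,000 lines averaging 0.4 elisions per line.
toy_text = np.random.default_rng(42).poisson(0.4, size=5000)
rates = sample_rates(toy_text, n=200)
print(f"whole text: {toy_text.mean():.2f}")
print(f"samples: mean {rates.mean():.2f}, std {rates.std():.2f}")
```

The sample means cluster around the whole-text mean, but the standard deviation is the extra information that the per-work "Std" column reports.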

In [5]:
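hexameter = defaultdict(list)

for b in non_elegy_samples:
    # Elisions per line for this sample.
    elisions = sum(la.elision_count(line) for line in b.raw_source)
    # Sample names are "{i}-{work name}"; strip the index to group by work.
    work_name = b.name.split("-", 1)[1]
    hexameter[f"{b.author}-{work_name}"].append(elisions / len(b.raw_source))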

## Hexameter Elision Stats

Note that Ovid's elision rate in the _Metamorphoses_ is more than twice that of his elegiac verse. This is not merely because half of an elegiac poem is pentameter: Ovid does elide in the pentameter as well, so if his elegiac hexameters elided as often as those of the _Metamorphoses_, his overall elegiac rate would be more than half the epic rate, and it is not. It therefore seems more likely that Ovid was sensitive to an aesthetic tied either to epic or to continuous hexameters.
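The argument can be made concrete with a simple mixing model. It assumes (our assumption, not something the notebook states) that an elegiac poem contains equal numbers of hexameters and pentameters, and that the per-line `ELC` figures for elegy are directly comparable with the hexameter rates printed by the next cell. The couplet rate is then the average of the two line types, so the implied hexameter rate can never exceed twice the couplet rate. `implied_hexameter_rate` is a hypothetical helper; the input rates are the Ovidian `ELC` means from the elegy table, rounded.

```python
def implied_hexameter_rate(couplet_rate: float, pentameter_rate: float = 0.0) -> float:
    """Hexameter elision rate implied by an elegiac (per-line) rate.

    Assumes equal numbers of hexameters and pentameters, so that
    couplet_rate = (hexameter_rate + pentameter_rate) / 2.
    """
    return 2 * couplet_rate - pentameter_rate


# Even with no pentameter elision at all, these elegiac rates imply hexameters
# eliding less often than the ~0.20 per line measured for the Metamorphoses.
# Any pentameter elision pushes the implied hexameter rate lower still.
for couplet_rate in (0.078, 0.085, 0.092):
    print(couplet_rate, round(implied_hexameter_rate(couplet_rate), 3))
```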

In [6]:
for k, v in hexameter.items():
    print(f"{k:<25}: Mean: {np.mean(v):.2f} Std: {np.std(v):.2f}")

Vergil-Aeneid            : Mean: 0.53 Std: 0.08
Vergil-Georgics          : Mean: 0.49 Std: 0.06
Juvenal-Juv. Sat.        : Mean: 0.32 Std: 0.08
Silius-Punica            : Mean: 0.43 Std: 0.07
Statius-Thebaid          : Mean: 0.39 Std: 0.08
Ovid-Metamorphoses       : Mean: 0.20 Std: 0.05
Lucan-Pharsalia          : Mean: 0.13 Std: 0.04
V.Flaccus-Argonautica    : Mean: 0.28 Std: 0.08
Lucretius-DRN            : Mean: 0.45 Std: 0.11
Horace-Hor. Sat.         : Mean: 0.41 Std: 0.12


## Elegiac Couplet Statistics

Here we can reuse the corpus from the rest of the study, since its feature vectors already hold the statistics we need on a per-poem basis. Apart from Catullus (that is, in Augustan elegy), elision is used much more sparingly than in most of the hexameter poets above.

The strong 3rd-foot caesura in the hexameter (the other feature we are interested in) mostly occurs in 90% or more of lines (nine lines out of ten or better). Tibullus is a rebel in this regard. Ovid, it seems, became more punctilious about this feature in his later work (Tr. and Pont. against Am. and Ep.). The _Nux_ is 'in the middle' but entirely plausible. The _Consolatio_ seems much too low even for his earlier style, although, to be fair, if it were genuine it would be a later work. Remember that raw percentage differences give a poor intuition for 'how unusual is this feature'. The Mahalanobis analysis gives a more realistic evaluation, based on the typical variability of each author.
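As a rough illustration of that point (a sketch, not the code from the Mahalanobis notebook), the Mahalanobis distance of a point x from a distribution with mean μ and covariance Σ is √((x − μ)ᵀ Σ⁻¹ (x − μ)); with a single feature it reduces to the absolute z-score |x − μ| / σ. The example applies it to the `H3SC` means and standard deviations in the caesura table, treating the _Consolatio_ mean as a single point, which is a simplification. The _Consolatio_ is closer to Ep. than to Am. in raw terms, yet slightly farther from Ep. in standard-deviation units, because Ep. varies less.

```python
import numpy as np


def mahalanobis(x, mean, cov) -> float:
    """Mahalanobis distance of point x from a distribution with given mean and covariance."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    mean = np.atleast_1d(np.asarray(mean, dtype=float))
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    diff = x - mean
    # Solve cov @ y = diff rather than explicitly inverting the covariance matrix.
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))


# One feature (H3SC), so the distance is |x - mean| / std.
consolatio = 0.85654
amores = mahalanobis(consolatio, 0.926371, 0.051372**2)     # raw gap ~7.0 points
epistulae = mahalanobis(consolatio, 0.917946, 0.044301**2)  # raw gap ~6.1 points
tristia = mahalanobis(consolatio, 0.960635, 0.038219**2)    # raw gap ~10.4 points
print(f"Am.: {amores:.2f}  Ep.: {epistulae:.2f}  Tr.: {tristia:.2f}")
```

With several features the full covariance matrix also accounts for correlations between them, which is what the multi-feature analysis needs.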

In terms of elision, the style of the _Consolatio_ clearly diverges from Ovid's (it elides more than twice as often as any of his elegiac works), and it elides far more than any other elegist except Catullus, with Propertius the only close match.
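The ratios behind that comparison can be checked directly: the values below are the `ELC` means copied from the elision table in this section, and nothing else is assumed.

```python
# ELC means copied from the elision table: "Author, Work" -> mean elision rate.
elc_means = {
    "Ovid, Am.": 0.092543,
    "Ovid, Ep.": 0.091988,
    "Ovid, Pont.": 0.07797,
    "Ovid, Tr.": 0.085079,
    "Propertius, Prop.": 0.235744,
    "Tibullus, Tib.": 0.108277,
    "Catullus, Cat.": 0.491109,
}
consolatio = 0.242616

# Ratio above 1: the Consolatio elides more often than that work.
for work, mean in elc_means.items():
    print(f"{work:<18} Consolatio / work = {consolatio / mean:.2f}")
```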

In [7]:
short_elegy = pd.read_csv("elegy_poetic.csv", index_col=0)

In [8]:
short_elegy.groupby(["Author", "Work"])["H3SC"].agg(["mean", "std", "min", "max"])

| Author | Work | mean | std | min | max |
|---|---|---|---|---|---|
| Baldricus | Baldricus | 0.954208 | 0.034324 | 0.88764 | 0.985714 |
| Catullus | Cat. | 0.90268 | 0.088778 | 0.765957 | 1.0 |
| Ovid | Am. | 0.926371 | 0.051372 | 0.785714 | 1.0 |
| Ovid | Ep. | 0.917946 | 0.044301 | 0.827586 | 0.984962 |
| Ovid | Pont. | 0.958238 | 0.039045 | 0.842105 | 1.0 |
| Ovid | Tr. | 0.960635 | 0.038219 | 0.84 | 1.0 |
| Propertius | Prop. | 0.927296 | 0.055169 | 0.764706 | 1.0 |
| Radulfus | Radulfus | 0.969691 | 0.022789 | 0.935484 | 1.0 |
| Tibullus | Tib. | 0.774148 | 0.118333 | 0.5 | 0.928571 |
| ps-Ovid | Consolatio | 0.85654 | 0.071978 | 0.797468 | 0.936709 |


In [9]:
short_elegy.groupby(["Author", "Work"])["ELC"].agg(["mean", "std"])

| Author | Work | mean | std |
|---|---|---|---|
| Baldricus | Baldricus | 0.03603 | 0.016802 |
| Catullus | Cat. | 0.491109 | 0.312907 |
| Ovid | Am. | 0.092543 | 0.056981 |
| Ovid | Ep. | 0.091988 | 0.030318 |
| Ovid | Pont. | 0.07797 | 0.042059 |
| Ovid | Tr. | 0.085079 | 0.044752 |
| Propertius | Prop. | 0.235744 | 0.10646 |
| Radulfus | Radulfus | 0.008557 | 0.007555 |
| Tibullus | Tib. | 0.108277 | 0.047566 |
| ps-Ovid | Consolatio | 0.242616 | 0.03815 |
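The two `groupby` calls in this section could also be folded into a single summary using pandas named aggregation, which produces flat, self-describing column names. The sketch below runs on a small hypothetical DataFrame that only borrows the column names of `elegy_poetic.csv` (`Author`, `Work`, `H3SC`, `ELC`); the values are invented for illustration and are not corpus data.

```python
import pandas as pd

# Hypothetical per-poem rows using the same column names as elegy_poetic.csv.
toy = pd.DataFrame(
    {
        "Author": ["Ovid", "Ovid", "Ovid", "Tibullus", "Tibullus"],
        "Work": ["Am.", "Am.", "Tr.", "Tib.", "Tib."],
        "H3SC": [0.90, 0.95, 0.97, 0.70, 0.80],
        "ELC": [0.10, 0.08, 0.07, 0.12, 0.10],
    }
)

# Named aggregation: output column name = (input column, aggregation function).
summary = toy.groupby(["Author", "Work"]).agg(
    H3SC_mean=("H3SC", "mean"),
    H3SC_std=("H3SC", "std"),
    ELC_mean=("ELC", "mean"),
    ELC_std=("ELC", "std"),
)
print(summary.round(3))
```

pandas computes the sample standard deviation (`ddof=1`), so a work represented by a single poem gets `NaN` in the `std` columns.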


In [10]:
%load_ext watermark
%watermark -n -u -v -iv -w

Last updated: Mon Jan 20 2025

Python implementation: CPython
Python version       : 3.12.3
IPython version      : 8.20.0

numpy : 1.26.4
mqdq  : 0.8.2
pandas: 2.2.2

Watermark: 2.5.0

