# Baselines and Significance Testing

### Conser's baseline

#### Prose baseline

Conser 2020, p. 264: 
> In order to determine how accentual contours might align by chance,
it is first necessary to establish a baseline of chance alignment, using a con-
trol group. In the prose of Lysias’ *Against Eratosthenes*, for example, the rate of
matched accents between sections is **5.6%**, and the rate of compatible syllables
is **73.6%**, providing a minimum baseline for chance alignment.

Continuing in a footnote:

> Random ‘stanza pairs’ were created by pairing odd and even paragraphs of the first 12 sections and trimming the longer section to have the same number of syllables as the shorter.
This resulted in six stanza pairs, each containing an average of 86.5 syllables.

#### Trimeter baseline

Most important:

> Prose, however, is a poor choice of comparison for poetic texts, because of the effect of
metrical responsion. [...] It is not surprising, then, that the percentage of both matched accents and
compatible syllables are higher between sections of iambic trimeter, at **9.7%**
and **76.9%** respectively.

> Random stanza pairs were created by pairing sequential groups of eight lines, drawn from
Antigone 1-96 and 162-321 (Prologue and Episode 1). Resolutions were treated as a single
syllable. This resulted in sixteen stanza pairs, each containing 96 syllables.

**Summary:**

16 antistrophic pairs of 2x8 lines (strikingly, 8 is the mean for Aristophanes' cantica too!)

Accentual responsion: **9.7%**
Compatibility: **76.9%**

## My baseline

Strophe line length should match the mean length of the stanzas in the corpus. Constructing 79 baseline cantica would be overdoing a bit, so 16 seems fine. 

Most importantly, I'm going to make two things beyond Conser:
- triadic and quadratic baselines, and
- lyric Frankenstain cantica, using metres that appear in multiple songs, most importantly 4 tr^, cr and ar.
  

In [10]:
from comp import compatibility_corpus
from statistics import mean

all_sets = compatibility_corpus('compiled/')

flat = []
length = 0
cantica_lengths = []

for play in all_sets:
    for canticum in play:
        for line_group in canticum:
            flat.append(line_group)
            length += 1
    cantica_lengths.append(len(canticum))
mean_cantica_length = mean(cantica_lengths)

print(f'Total nr of sets of responding lines: {length} lines')
print(f'Mean strophe length: {mean_cantica_length} ≈ {round(mean_cantica_length)} lines')


Total nr of sets of responding lines: 296 lines
Mean strophe length: 7.625 ≈ 8 lines


Ach: 4x2x8. 
Excludes the extrametrical lines 43 and 61, and Pseudartabas weirdness, and also lines with anapests, e.g. 
    6,7 (36, 37 instead)

cantica = [
    [(1, 8), (9, 17)],
    [(18, 26), (27, 35)],
    [(72, 80), (81, 89)],
    [(108, 116), (117, 125)]
]

Eq. 4x2x8

cantica = [
    [(18, 26), (27, 35)],
    [(72, 80), (81, 89)],
    [(108, 116), (117, 125)],
    [(126, 134), (135, 143)]
]

Nu. 
Av.

Let's find trochaic tetrameter catalectics, and make a pseudo-canticum with them!

In [8]:
from lxml import etree
from pathlib import Path

compiled = [
    Path('scan') / file for file in [
        'responsion_ach_scan.xml', 
        'responsion_av_scan.xml', 
        'responsion_eq_scan.xml', 
        'responsion_nu_scan.xml', 
        'responsion_pax_scan.xml', 
        'responsion_v_scan.xml'
    ]
]

trochaic_tetrameter_catalectic = []

for xml_file in compiled:
    try:
        tree = etree.parse(xml_file)
        for l in tree.xpath("//l[@metre='4 tr^']"):
            text = ''.join(l.itertext()) # good lxml method to know; recursively joins all texts and tails
            title = tree.xpath("//title/text()")[0]
            provenience = title + l.attrib['n']
            if text == '' or l.attrib.get('skip', 'False') == 'True':
                continue
            complete_description = [provenience, text]
            trochaic_tetrameter_catalectic.append(complete_description)

    except etree.XMLSyntaxError as e:
        print(f"Error parsing XML: {e}")

print(trochaic_tetrameter_catalectic)
print(f'Lyric trochaic tetrameter catalectics found: {len(trochaic_tetrameter_catalectic)}')

with open('scan/lyricbaseline.txt', 'w', encoding='utf-8') as file:
    for line in trochaic_tetrameter_catalectic:
        file.write(f'<l n="{line[0]}">{line[1]}</l>\n')

[['Acharnenses204', '[Τῇ]{δε} [πᾶ]{#ς ἕ}[που], {δί}[ω]{#κε} [καὶ] {τὸ}[ν ἄν]{#δρα} [πυν]{θά}[νου] '], ['Acharnenses205', '[τῶν] {ὁ}[δοι]{#πό}[ρων] {ἁ}[πάν][#των]· [τῇ] {πό}[λει] {#γὰρ} [ἄξ]{ι}[ον] '], ['Acharnenses206', '[ξυλ]{λα}[βεῖν] {#τὸν} [ἄν]{δρα} [τοῦ]{#τον}. [Ἀλ]{λά} [μοι] [#μη][νύ]{σα}{τε}, '], ['Acharnenses207', "[εἴ] {τις} [οἶ]{#δ' ὅ}[ποι] {τέ}[τραπ][#ται] [γῆς] {ὁ} [τὰς] [#σπον][δὰς] {φέ}[ρων]. "], ['Acharnenses219', "[Νῦν] {δ' ἐ}[πει][#δὴ] [στερ]{ρὸ}[ν ἤ][#δη] [τοὐ]{μὸ}[ν ἀν][#τικ][νή]{μι}[ον] "], ['Acharnenses220', '[καὶ] {πα}[λαι][#ῷ] [Λακ]{ρα}[τεί][#δῃ] [τὸ σ]{κέ}[λος] {#βα}[ρύ]{νε}[ται], '], ['Acharnenses221', '[οἴ]{χε}[ται]. [#Δι][ωκ]{τέ}[ος] {#δέ}· [μὴ] {γὰ}[ρ ἐγ]{#χά}[νοι] {πο}{τὲ} '], ['Acharnenses222', '[μη]{δέ} [περ] {#γέ}[ρον]{τα}[ς ὄν]{#τα}[ς ἐκ]{φυ}[γὼ]{#ν Ἀ}[χαρ]{νέ}[ας], '], ['Acharnenses284', '[Ἡ]{ρά}[κλεις] [#του][τὶ] {τί} [ἐσ]{#τι}; [τὴν] {χύ}[τραν] [#συν][τρίψ]{ε}{τε}. '], ['Acharnenses286-287', '[Ἀν]{τὶ} [ποί][#ας] [αἰ]{τί}[ας], [#ὦ][χαρ]{νέ}[ων] {#γε}[

In [None]:
from comp import compatibility_corpus, compatibility_play, compatibility_ratios_to_stats

all_sets = compatibility_corpus('compiled/')
total_comp = compatibility_ratios_to_stats(all_sets)
print(all_sets)
print(f'Total compatibility: {total_comp}')

lyric_baseline = compatibility_play('scan/lyricbaseline.txt')
comp_lyric_baseline = compatibility_ratios_to_stats([lyric_baseline])
print(f'Lyric baseline compatibility: {comp_lyric_baseline}')

[[[[1.0, 1.0, 1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.5, 1.0, 1.0, 0.5, 0.5, 1.0, 1.0, 0.5, 1.0], [0.5, 0.5, 0.5, 0.5, 1.0, 0.5, 0.5, 1.0, 1.0, 1.0], [0.5, 1.0, 0.5, 1.0, 1.0, 1.0, 0.5, 1.0], [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0, 0.5, 1.0, 1.0]], [[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.5, 0.5, 1.0, 0.5, 0.5, 0.5, 1.0, 0.5, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], [0.5, 1.0, 1.0, 1.0, 0.5, 0.5, 0.5, 1.0, 1.0, 1.0, 1.0, 0.5, 1.0, 0.5, 1.0], [1.0, 1.0, 0.5, 0.5, 0.5, 0.5, 1.0], [0.5, 1.0, 0.5, 0.5, 0.5, 0.5, 1.0, 1.0, 0.5, 0.5, 0.5, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0, 0.5, 1.0, 0.5, 1.0, 0.5, 1.0, 1.0, 1.0, 0.5, 0.5, 1.0], [1.0, 0.5, 1.0, 0.5, 0.5, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.5, 1.0, 1.0, 1.0], [0.5, 1.0, 1.0, 0.5, 1.0, 1.0, 0.5, 1.0, 0.5, 1.0, 1.0, 1.0, 0.5, 1.0, 1.0, 1.0], [0.5, 0.5, 1.0, 1.0, 1.0, 0.5, 0.5, 1.0, 1.0, 1.0, 0.5, 1.0], [0.5, 0.5, 1.0, 0.5, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.5, 0.5, 1.0, 1.0, 1.0]], [[0.5, 0.5, 1.0, 0.5, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0, 