# Statistical Tests

Name: **Isaac Anderson**

Date: **16 Sept 2025**

## Hypothesis Tests

### Means
1. Identify 5 theologically salient lemmas from the top 20 lemmas (end of PS1) and compute per-verse rates by book.
2. Are the per-verse rates for the Synoptic Gospels and John the same for all 5 lemmas?
3. Report effect sizes using Cohen's d along with confidence intervals.
4. Correct these results for multiple tests using the Bonferroni correction.
5. Correct these results for multiple tests using the Benjamini-Hochberg correction.
6. Why are the numbers from the Bonferroni correction different from the Benjamini-Hochberg correction? Which results should you use and why?
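
To make the difference between the two corrections concrete, here is a minimal sketch in plain Python. The raw p-values are hypothetical placeholders, not results from this notebook, and the function names are illustrative. Bonferroni multiplies every p-value by the number of tests (controlling the family-wise error rate), while Benjamini-Hochberg scales each p-value by `m / rank` (controlling the false discovery rate), which is why it is less conservative.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (step-up FDR procedure)."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five lemma tests (illustrative only).
raw_p = [0.001, 0.012, 0.020, 0.030, 0.300]
bonf = bonferroni(raw_p)
bh = benjamini_hochberg(raw_p)

for p, b, h in zip(raw_p, bonf, bh):
    print(f"raw={p:.3f}  bonferroni={b:.3f}  benjamini-hochberg={h:.3f}")
```

With these illustrative inputs, only one test stays below 0.05 after Bonferroni, while four do after Benjamini-Hochberg.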

### Proportion
7. Is the proportion of verses with the term present vs. absent the same between the Synoptic Gospels and John for all 5 lemmas?
8. Report effect sizes using Cohen's d along with confidence intervals.
9. Correct these results for multiple tests using the Bonferroni correction.
10. Correct these results for multiple tests using the Benjamini-Hochberg correction.
11. Why are the numbers from the Bonferroni correction different from the Benjamini-Hochberg correction? Which results should you use and why?
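
One way to compare presence proportions is a pooled two-proportion z-test. The sketch below uses only the standard library, and the counts are hypothetical placeholders rather than values from the corpus. It also computes Cohen's h, which is a common effect size for proportions and is shown here as an alternative to the Cohen's d that the question asks for. The function names are illustrative.

```python
import math


def two_proportion_ztest(hits_a, n_a, hits_b, n_b):
    """Two-sided z-test for equal proportions, using the pooled standard error."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value under the standard normal: P(|Z| > |z|).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def cohens_h(p_a, p_b):
    """Cohen's h: difference between arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p_a)) - 2 * math.asin(math.sqrt(p_b))


# Hypothetical counts: verses containing the lemma out of all verses per group.
z, p_value = two_proportion_ztest(hits_a=120, n_a=1000, hits_b=60, n_b=1000)
h = cohens_h(120 / 1000, 60 / 1000)
print(f"z={z:.3f}  p={p_value:.2e}  h={h:.3f}")
```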

### ANOVA
12. Run a one-way ANOVA on the per-verse rate of one theologically salient lemma from the top 20 lemmas across the following groups: {Matthew and Mark, Luke and Acts, Johannine books, Pauline books, all other books}.
13. Perform a post-hoc Tukey HSD test to see which pairs of groups differ.
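
As a reminder of what the ANOVA computes, the sketch below derives the F statistic by hand from between-group and within-group sums of squares. The group labels and values are hypothetical per-verse counts, not corpus data, and `one_way_anova_f` is an illustrative name.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over lists of values."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of each observation around its own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical per-verse counts of one lemma in three book groups.
groups = {
    "Matthew and Mark": [1, 2, 3],
    "Luke and Acts": [2, 3, 4],
    "Johannine books": [5, 6, 7],
}
f_stat, df_between, df_within = one_way_anova_f(list(groups.values()))
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

In practice, `scipy.stats.f_oneway` returns the same statistic together with a p-value, and `scipy.stats.tukey_hsd` (SciPy 1.8 or later) can run the post-hoc pairwise comparison.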

### Chi-square Test
14. Build a 2xk contingency table for collocation: presence of target term vs top-k companion terms within verse.
15. Run Chi-square tests of independence and compute standardized residuals to find surprising co-occurrences.
16. Visualize the residuals as a heatmap.
17. Explain the results.

18. Repeat the ANOVA and Chi-square test sections for 2 other terms.
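
The Chi-square steps can be sketched without any libraries. The table below is hypothetical (rows: target term present or absent in the verse; columns: three companion terms), and the residuals are Pearson residuals, `(observed - expected) / sqrt(expected)`, which is one common reading of "standardized residuals". This is an assumption, since adjusted residuals are also used under that name.

```python
import math


def chi_square_with_residuals(table):
    """Chi-square statistic, degrees of freedom, and Pearson residuals for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    residuals = []
    for i, row in enumerate(table):
        residual_row = []
        for j, observed in enumerate(row):
            # Expected count if row and column were independent.
            expected = row_totals[i] * col_totals[j] / total
            resid = (observed - expected) / math.sqrt(expected)
            residual_row.append(resid)
            chi2 += resid ** 2
        residuals.append(residual_row)
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof, residuals


# Hypothetical 2 x 3 table: target present / absent vs. three companion terms.
observed = [
    [30, 10, 20],
    [20, 40, 30],
]
chi2, dof, residuals = chi_square_with_residuals(observed)
print(f"chi2={chi2:.3f}, dof={dof}")
for row in residuals:
    print([round(r, 2) for r in row])
```

Cells with large positive residuals co-occur more often than independence predicts, and large negative ones less often. The residual matrix is what a heatmap would display.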

## Power Analysis Extension

19. Conduct a power analysis for the question in #7 to determine how many verses are needed to detect the specified effect.
20. Simulate the answer.
    <ol type="a">
    <li>Simulate n verses with the proportions found in #8 for Synoptic Gospels and John. Test whether the two simulated samples differ significantly and save the result.</li>
    <li>Repeat step a many times.</li>
    <li>Calculate power: power = (number of significant findings) / (total simulations)</li>
    <li>Repeat steps a-c for a different n.</li>
    <li>Plot power as a function of number of verses.</li>
    <li>At what number of verses is the power at least 80%? This is the number of verses you need per text to have an 80% chance of detecting the effect if it is really there.</li>
    </ol>
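
Steps a through d can be sketched as follows. The proportions (0.10 and 0.20), the sample sizes, and the number of simulations are hypothetical placeholders, and the significance test is a pooled two-proportion z-test chosen for illustration. Any other test of two proportions could be substituted.

```python
import math
import random


def two_proportion_p_value(hits_a, hits_b, n):
    """Two-sided pooled z-test p-value for two samples of equal size n."""
    pooled = (hits_a + hits_b) / (2 * n)
    if pooled in (0.0, 1.0):
        return 1.0  # no variation at all, so no evidence of a difference
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    z = (hits_a - hits_b) / n / se
    return math.erfc(abs(z) / math.sqrt(2))


def simulated_power(p_a, p_b, n, n_sims=300, alpha=0.05, rng=None):
    """Fraction of simulated experiments in which the test is significant."""
    rng = rng or random.Random(0)
    significant = 0
    for _ in range(n_sims):
        # Step a: simulate n verses per group, each containing the term with prob p.
        hits_a = sum(rng.random() < p_a for _ in range(n))
        hits_b = sum(rng.random() < p_b for _ in range(n))
        if two_proportion_p_value(hits_a, hits_b, n) < alpha:
            significant += 1
    # Step c: power = significant findings / total simulations.
    return significant / n_sims


rng = random.Random(42)
sample_sizes = [50, 200, 800]  # step d: repeat for several n
power_curve = {n: simulated_power(0.10, 0.20, n, rng=rng) for n in sample_sizes}
for n, power in power_curve.items():
    print(f"n={n:4d}  power={power:.2f}")
```

The `power_curve` dictionary holds the points to plot (for example with `matplotlib.pyplot.plot`). The smallest n whose power reaches 0.80 answers the last step.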

## Sliding Windows Stretch
21. Repeat the ANOVA and Chi-square sections on the same three theological terms, but use sliding windows (+/- n tokens) instead of verse boundaries.
22. How do the results from verse boundaries compare with those from sliding windows?
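
A sliding window replaces "same verse" with "within n tokens on either side". The sketch below counts how many occurrences of a target lemma have a companion lemma inside such a window. The token list is a hypothetical lemma stream, and `window_cooccurrences` is an illustrative name.

```python
def window_cooccurrences(tokens, target, companion, n):
    """Count target occurrences that have the companion within +/- n tokens."""
    hits = 0
    for i, token in enumerate(tokens):
        if token != target:
            continue
        # Up to n tokens before and n tokens after position i (excluding i itself).
        window = tokens[max(0, i - n):i] + tokens[i + 1:i + n + 1]
        if companion in window:
            hits += 1
    return hits


target, companion = "πατήρ", "υἱός"
# Hypothetical lemma stream that ignores verse boundaries (illustrative only).
tokens = [target, "ἀγαπάω", "ὁ", companion, "καί", "πᾶς",
          "δίδωμι", "ἐν", "ὁ", "χείρ", target]

counts_by_n = {n: window_cooccurrences(tokens, target, companion, n) for n in (2, 3, 7)}
print(counts_by_n)
```

Widening the window can only add co-occurrences, so the choice of n directly affects the contingency table that feeds the Chi-square test.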

## **Problem 1**: Identify 5 theologically salient lemmas from the top 20 lemmas (end of PS1) and compute per-verse rates by book.

In [1]:
import pandas as pd

new_testament = pd.read_pickle('../Data/nt.pickle')
top20 = pd.read_pickle("../Data/zipf-top20.pickle")
strongs_dictionary = pd.read_pickle("./pickles/ps1/strongs_dictionary")

In [2]:
print(strongs_dictionary.columns)

Index(['greek-word', 'kjv-def'], dtype='object')


##### Finding "salient" top 20 words (i.e., verbs, nouns, adjectives, adverbs)

In [3]:
new_testament = pd.merge(
    strongs_dictionary,
    new_testament, 
    left_index=True, 
    right_on="strong"
)


        greek-word                kjv-def            word       rmac  strong  \
73673   ἀγαθοεργέω                do good     ἀγαθουργῶν,  V-PAP-NSM      14   
112892  ἀγαθοεργέω                do good    ἀγαθοεργεῖν,      V-PAN      14   
33810   ἀγαθοποιέω  (when) do good (well)    ἀγαθοποιῆσαι      V-AAN      15   
34204   ἀγαθοποιέω  (when) do good (well)     ἀγαθοποιῆτε   V-PAS-2P      15   
34206   ἀγαθοποιέω  (when) do good (well)  ἀγαθοποιοῦντας  V-PAP-APM      15   
...            ...                    ...             ...        ...     ...   
117258       ὥσπερ        (even, like) as           ὥσπερ        ADV    5618   
118015       ὥσπερ        (even, like) as           ὥσπερ        ADV    5618   
120919       ὥσπερ        (even, like) as           ὥσπερ        ADV    5618   
131632       ὥσπερ        (even, like) as           ὥσπερ        ADV    5618   
95679      ὡσπερεί                     as         ὡσπερεὶ        ADV    5619   

        book  chapter  verse  \
73673  

In [17]:
salient_nt['kjv-def'].value_counts()

kjv-def
Jesus                      3454
 God, Lord, master, Sir    1408
child, foal, son            231
father, parent              209
Name: count, dtype: int64

In [None]:
freq_strong_ids = salient_nt['strong'].value_counts().nlargest(20).index

salient_top_20 = salient_nt[salient_nt['strong'].isin(freq_strong_ids)]
salient_top_20.head()

Index([2424, 2962, 5207, 3962], dtype='int64', name='strong')


| | greek-word | kjv-def | word | rmac | strong | book | chapter | verse | parsed_rmac | strong_definition | part_speech |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 242 | Ἰησοῦς | Jesus | Ἰησοῦς | N-NSM | 2424 | 40 | 1 | 16 | Noun - Nominative - Singular - Masculine | Ἰησοῦς of Hebrew origin (יְהוֹשׁ֫וּעַ); Jesus (i.... | Noun |
| 1129 | Ἰησοῦς | Jesus | Ἰησοῦς | N-NSM | 2424 | 40 | 3 | 13 | Noun - Nominative - Singular - Masculine | Ἰησοῦς of Hebrew origin (יְהוֹשׁ֫וּעַ); Jesus (i.... | Noun |
| 1162 | Ἰησοῦς | Jesus | Ἰησοῦς | N-NSM | 2424 | 40 | 3 | 15 | Noun - Nominative - Singular - Masculine | Ἰησοῦς of Hebrew origin (יְהוֹשׁ֫וּעַ); Jesus (i.... | Noun |
| 1182 | Ἰησοῦς | Jesus | Ἰησοῦς | N-NSM | 2424 | 40 | 3 | 16 | Noun - Nominative - Singular - Masculine | Ἰησοῦς of Hebrew origin (יְהוֹשׁ֫וּעַ); Jesus (i.... | Noun |
| 1222 | Ἰησοῦς | Jesus | Ἰησοῦς | N-NSM | 2424 | 40 | 4 | 1 | Noun - Nominative - Singular - Masculine | Ἰησοῦς of Hebrew origin (יְהוֹשׁ֫וּעַ); Jesus (i.... | Noun |


1. κύριος from kuros (supremacy); supreme in authority
2. υἱός apparently a primary word; a "son"
3. Ἰησοῦς of Hebrew origin (יְהוֹשׁ֫וּעַ); Jesus (i.e. Jehoshua)
4. εἰμί "I exist" (used only when emphatic)
5. πατήρ apparently a primary word; a "father"

##### Finding the per-verse rates by book

In [6]:
salient_words = ["κύριος", "υἱός", "Ἰησοῦς", "εἰμί", "πατήρ"]
# Keep only the tokens whose 'word' value matches one of the chosen lemmas
salient_nt = new_testament[new_testament['word'].isin(salient_words)]
print(salient_nt['word'].value_counts())

# Collect occurrences of each word per book
words_in_each_gospel = salient_nt.pivot_table(
    index="book",
    columns="word",
    aggfunc="size",
)

# Estimate the number of verses per book: highest verse number in each chapter,
# summed over chapters. Note: this uses salient_nt, so only verses that contain
# a salient word are seen and the totals can undercount the real verse counts.
verses_per_chapter = salient_nt.groupby(['book', 'chapter'])['verse'].max()
verses_per_book = verses_per_chapter.groupby('book').sum()

per_verse_df = pd.merge(verses_per_book, words_in_each_gospel, on="book")
per_verse_df.fillna(0, inplace=True)
# εἰμί has no matches in the 'word' column, so it is left out of the rate table
per_verse_df = per_verse_df[["κύριος", "πατήρ", "υἱός", "Ἰησοῦς"]].div(per_verse_df['verse'], axis=0)
print(per_verse_df)




word
Ἰησοῦς    3454
κύριος    1408
υἱός       231
πατήρ      209
Name: count, dtype: int64
        κύριος     πατήρ      υἱός    Ἰησοῦς
book                                        
40    0.256477  0.085492  0.042746  1.211140
41    0.205607  0.000000  0.051402  1.207944
42    0.319226  0.039903  0.106409  0.571947
43    0.015647  0.156472  0.046942  1.674253
44    0.611111  0.000000  0.038194  0.305556
45    0.632184  0.000000  0.000000  0.126437
46    0.942857  0.000000  0.000000  0.314286
47    0.670732  0.000000  0.000000  0.268293
48    5.500000  0.000000  0.000000  5.500000
50    1.375000  0.000000  0.000000  0.687500
51    0.000000  0.000000  0.000000  1.000000
52    1.571429  0.000000  0.000000  0.785714
53    1.718750  0.000000  0.000000  0.687500
54    0.354839  0.000000  0.000000  0.709677
55    1.491525  0.000000  0.000000  0.000000
58    0.578947  0.000000  0.289474  0.289474
59    0.846154  0.000000  0.000000  0.000000
60    0.000000  0.000000  0.846154  0.000000
61    0.9
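
Several rates above exceed 1 (for example 5.5 for book 48), which suggests the verse denominator is too small when it is computed from the filtered rows only. The sketch below shows one possible alternative on a small hypothetical frame: count distinct (chapter, verse) pairs per book from the full token table, and match on the lemma column (`greek-word`) instead of the inflected `word`. The toy rows and variable names are assumptions for illustration, not corpus data.

```python
import pandas as pd

lemmas = ["κύριος", "υἱός", "Ἰησοῦς", "πατήρ"]
kyrios, huios, iesous, pater = lemmas

# Hypothetical token-level rows (one row per word), mirroring the notebook's columns.
toy = pd.DataFrame({
    "book":       [40, 40, 40, 40, 43, 43, 43],
    "chapter":    [1, 1, 1, 2, 1, 1, 1],
    "verse":      [1, 1, 2, 1, 1, 2, 2],
    "greek-word": [iesous, huios, "καί", kyrios, pater, pater, "ὁ"],
})

# Denominator: distinct (chapter, verse) pairs per book, from ALL tokens.
verses_per_book = (
    toy.drop_duplicates(["book", "chapter", "verse"]).groupby("book").size()
)

# Numerator: occurrences of each chosen lemma per book (0 where absent).
counts = (
    toy[toy["greek-word"].isin(lemmas)]
    .groupby(["book", "greek-word"])
    .size()
    .unstack(fill_value=0)
)

# Per-verse rate: divide each book's counts by that book's verse total.
rates = counts.div(verses_per_book, axis=0)
print(rates)
```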

## **Problem 2**: Are the per-verse rates between Synoptic Gospels and John the same for all 5 lemmas?

In [7]:
synoptics = new_testament.query("book in [40,41,42]")
john = new_testament[new_testament["book"] == 43]

display(synoptics.head())
display(john.head())

| | greek-word | kjv-def | word | rmac | strong | book | chapter | verse | parsed_rmac | strong_definition | part_speech |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 29557 | Ἀαρών | Aaron | Ἀαρών, | N-PRI | 2 | 42 | 1 | 5 | Indeclinable proper noun | Ἀαρών of Hebrew origin (אַהֲרוֹן); Aaron, the br... | Indeclinable |
| 27923 | Ἀββᾶ | Abba | ἀββᾶ | N-PRI | 5 | 41 | 14 | 36 | Indeclinable proper noun | Ἀββᾶ of Chaldee origin (אַב); father as a vocat... | Indeclinable |
| 14038 | Ἄβελ | Abel | Ἅβελ | N-PRI | 6 | 40 | 23 | 35 | Indeclinable proper noun | Ἄβελ of Hebrew origin (הָ֫בֶל); Abel, the son ... | Indeclinable |
| 39419 | Ἄβελ | Abel | Ἄβελ | N-PRI | 6 | 42 | 11 | 51 | Indeclinable proper noun | Ἄβελ of Hebrew origin (הָ֫בֶל); Abel, the son ... | Indeclinable |
| 108 | Ἀβιά | Abia | Ἀβιά, | N-PRI | 7 | 40 | 1 | 7 | Indeclinable proper noun | Ἀβιά of Hebrew origin (אֲבִיָּה); Abijah, the n... | Indeclinable |


| | greek-word | kjv-def | word | rmac | strong | book | chapter | verse | parsed_rmac | strong_definition | part_speech |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 55199 | Ἀβραάμ | Abraham | Ἀβραάμ | N-PRI | 11 | 43 | 8 | 33 | Indeclinable proper noun | Ἀβραάμ of Hebrew origin (אַבְרָהָם); Abraham, t... | Indeclinable |
| 55258 | Ἀβραάμ | Abraham | Ἀβραάμ | N-PRI | 11 | 43 | 8 | 37 | Indeclinable proper noun | Ἀβραάμ of Hebrew origin (אַבְרָהָם); Abraham, t... | Indeclinable |
| 55296 | Ἀβραάμ | Abraham | Ἀβραάμ | N-PRI | 11 | 43 | 8 | 39 | Indeclinable proper noun | Ἀβραάμ of Hebrew origin (אַבְרָהָם); Abraham, t... | Indeclinable |
| 55305 | Ἀβραάμ | Abraham | Ἀβραάμ | N-PRI | 11 | 43 | 8 | 39 | Indeclinable proper noun | Ἀβραάμ of Hebrew origin (אַבְרָהָם); Abraham, t... | Indeclinable |
| 55310 | Ἀβραάμ | Abraham | Ἀβραὰμ | N-PRI | 11 | 43 | 8 | 39 | Indeclinable proper noun | Ἀβραάμ of Hebrew origin (אַבְרָהָם); Abraham, t... | Indeclinable |
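
Once the Synoptic and Johannine verses are separated, each lemma's per-verse counts in the two groups can be compared. The sketch below is one possible approach using only the standard library: a permutation test on the difference in means (swapped in here for a t-test, whose p-value needs a t distribution), plus Cohen's d with a percentile bootstrap confidence interval. The two count lists are hypothetical placeholders, and all function names are illustrative.

```python
import random
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d with the pooled sample variance (assumes some spread in the data)."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / pooled_var ** 0.5


def bootstrap_ci(a, b, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for Cohen's d."""
    rng = random.Random(seed)
    draws = sorted(
        cohens_d(rng.choices(a, k=len(a)), rng.choices(b, k=len(b)))
        for _ in range(n_boot)
    )
    lower = draws[int(n_boot * (1 - level) / 2)]
    upper = draws[int(n_boot * (1 + level) / 2) - 1]
    return lower, upper


def permutation_p_value(a, b, n_perm=2000, seed=0):
    """Two-sided permutation test on the absolute difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)


# Hypothetical per-verse counts of one lemma (illustrative only).
synoptic_counts = [0, 0, 1, 0, 0, 0, 1, 0, 0, 2] * 4
john_counts = [1, 0, 2, 1, 0, 1, 3, 0, 1, 1] * 4

d = cohens_d(synoptic_counts, john_counts)
ci_lower, ci_upper = bootstrap_ci(synoptic_counts, john_counts)
p_perm = permutation_p_value(synoptic_counts, john_counts)
print(f"d={d:.3f}  95% CI=({ci_lower:.3f}, {ci_upper:.3f})  p={p_perm:.4f}")
```

Running this once per lemma yields five p-values, which are the inputs to the Bonferroni and Benjamini-Hochberg corrections.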
