<img align="right" src="images/tf-small.png" width="90"/>
<img align="right" src="images/etcbc.png" width="100"/>


# Statistics of a Coreference-annotated Corpus for Biblical Hebrew

## 1. Introduction

This notebook contains several functions that offer descriptive statistics of the corpus of texts that have been annotated for coreference:

* Genesis 1
* Numbers 
* Isaiah 42
* Psalms 

Genesis 1, Isaiah 42 and the whole book of Psalms have been annotated by me (Christiaan), and the whole book of Numbers has been annotated by Gyusang Jin. 

The statistics for the Psalms are part of my dissertation. They are generated with code in `analyse.py` and shown in Pandas data frames. The data frames can be exported as a LaTeX table with the function `ExportToLatex()`. The function takes the following arguments: the output location on your PC (`OUTPUT_LOC`), the name of the LaTeX table as a string, e.g. `'overall_coref_ann'`, the data frame as generated in this notebook, e.g. `overall_df`, and, if an index is needed, `indx = True`, otherwise `indx = False`. 
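The source of `ExportToLatex()` lives in `export_utils` and is not shown in this notebook. To make the roles of its four arguments concrete, here is a minimal sketch that is not the actual implementation. It makes several assumptions: the function name `export_to_latex_sketch` is hypothetical, the table is passed as plain lists instead of a data frame so that the sketch runs without Pandas, and the output file is assumed to be named after the table with a `.tex` extension.

```python
import os
import tempfile


def export_to_latex_sketch(output_loc, table_name, header, rows, indx=True):
    """Write a LaTeX tabular to <output_loc>/<table_name>.tex and return the path.

    header: list of column names.
    rows:   list of (row_label, values) pairs.
    indx:   keep the row labels as a first column when True, drop them when False.
    All names here are hypothetical; the real ExportToLatex() may differ.
    """
    def esc(cell):
        # Escape the two characters that occur in these tables and are special in LaTeX
        return str(cell).replace("%", r"\%").replace("_", r"\_")

    columns = ([""] if indx else []) + list(header)
    lines = [r"\begin{tabular}{" + ("l" if indx else "") + "r" * len(header) + "}"]
    lines.append(" & ".join(esc(c) for c in columns) + r" \\")
    lines.append(r"\hline")
    for label, values in rows:
        cells = ([label] if indx else []) + list(values)
        lines.append(" & ".join(esc(c) for c in cells) + r" \\")
    lines.append(r"\end{tabular}")

    path = os.path.join(output_loc, table_name + ".tex")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + "\n")
    return path


# Usage with the overall counts for the Psalms, written to a temporary directory
overall_rows = [
    ("mentions", [18570]),
    ("singletons", [4789]),
    ("classes", [2000]),
    ("notes", [715]),
]
with tempfile.TemporaryDirectory() as tmp_dir:
    tex_path = export_to_latex_sketch(
        tmp_dir, "overall_coref_ann", ["total"], overall_rows, indx=True
    )
    with open(tex_path, encoding="utf-8") as fh:
        latex_text = fh.read()
print(latex_text)
```

With a data frame at hand, the commented-out calls in this notebook, such as `overall_df.to_latex(index=True)`, produce comparable LaTeX source directly.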

## 2. Load modules

In [1]:
%load_ext autoreload
%autoreload 2

import os
from export_utils import ExportToLatex
from analyse import (ParseAnnotations, 
                     MakePandasTables, 
                     PrintThisTable
                    )

To increase the rate,see https://annotation.github.io/text-fabric/Api/Repo/
   |     0.00s Dataset without structure sections in otext:no structure functions in the T-API


## 3. Specify output location

In [2]:
OUTPUT_LOC = os.path.expanduser('~/Documents/PhD/1-dissertation/DISSERTATIONlatex/Tables/')

## 4. Specify corpus

In [3]:
my_book_name = 'Psalms'
from_chapter = 1
to_chapter = 150

## 5. Run code

In [4]:
mentions, corefs, suffix_errors, reconsider_rpt = ParseAnnotations(my_book_name, from_chapter, to_chapter)

In [5]:
overall_df, pos_df, \
pronoun_df, pronoun_pos_class_df, \
pronoun_pos_sing_df = MakePandasTables(corefs, mentions)

## 6. Print tables 

In [6]:
PrintThisTable(overall_df)

| | total |
|---|---|
| mentions | 18570 |
| singletons | 4789 |
| classes | 2000 |
| notes | 715 |


In [7]:
#print(overall_df.to_latex(index=True))

ExportToLatex(OUTPUT_LOC, 'overall_coref_ann-2', overall_df, indx = True)

In [8]:
PrintThisTable(pos_df)

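| | NP | VP | Sffx | PrNP | DPrP | PPrP | PtcP | AdjP | CP | AdvP | PP | prep | advb | art | total_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| in class | 3087 | 4982 | 4569 | 795 | 31 | 287 | 16 | 2 | 7 | 3 | 2 | 0 | 0 | 0 | 13781 |
| singleton | 4405 | 97 | 40 | 164 | 15 | 2 | 1 | 24 | 3 | 18 | 0 | 14 | 5 | 1 | 4789 |
| total | 7492 | 5079 | 4609 | 959 | 46 | 289 | 17 | 26 | 10 | 21 | 2 | 14 | 5 | 1 | 18570 |
| % total | 40 | 27 | 25 | 5 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 100 |
| first in chain | 1001 | 702 | 142 | 122 | 13 | 12 | 4 | 2 | 1 | 1 | 0 | 0 | 0 | 0 | 2000 |
| % chain | 50 | 35 | 7 | 6 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 100 |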
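The two percentage rows in this table can be reproduced from the count rows. The sketch below assumes that each percentage is the count divided by the row sum, rounded to the nearest integer; whether `analyse.py` computes them exactly this way is an assumption, and `percent_row` is a hypothetical helper name.

```python
def percent_row(counts):
    """Return each count as a whole-number percentage of the row sum."""
    total = sum(counts)
    # round() rounds exact halves to the nearest even integer;
    # no exact halves occur in the rows used below.
    return [round(100 * count / total) for count in counts]


# Count rows copied from the table above (without the total_type column)
pos_labels = ["NP", "VP", "Sffx", "PrNP", "DPrP", "PPrP", "PtcP",
              "AdjP", "CP", "AdvP", "PP", "prep", "advb", "art"]
total_row = [7492, 5079, 4609, 959, 46, 289, 17, 26, 10, 21, 2, 14, 5, 1]
first_in_chain = [1001, 702, 142, 122, 13, 12, 4, 2, 1, 1, 0, 0, 0, 0]

pct_total = percent_row(total_row)
pct_chain = percent_row(first_in_chain)

for label, p_total, p_chain in zip(pos_labels, pct_total, pct_chain):
    print(f"{label:5s} {p_total:3d} {p_chain:3d}")
```

Because every cell is rounded separately, the rounded percentages of a row need not add up to 100; the `100` in the `total_type` column is the row sum expressed as a share of itself.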


In [9]:
#print(pos_df.to_latex(index=True))

ExportToLatex(OUTPUT_LOC, 'pos_coref_ann-2', pos_df, indx = True)

In [10]:
PrintThisTable(pronoun_df)

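| | p1upl | p1usg | p2fsg | p2mpl | p2msg | p3fpl | p3fsg | p3mpl | p3msg | p3upl | ufpl | ufsg | umpl | umsg | uuu | total_pgn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| in class | 332 | 2415 | 29 | 282 | 2282 | 21 | 344 | 1172 | 2089 | 386 | 5 | 21 | 91 | 284 | 80 | 9833 |
| singleton | 11 | 13 | 0 | 10 | 16 | 0 | 3 | 8 | 34 | 6 | 0 | 1 | 9 | 8 | 13 | 132 |
| total | 343 | 2428 | 29 | 292 | 2298 | 21 | 347 | 1180 | 2123 | 392 | 5 | 22 | 100 | 292 | 93 | 9965 |
| % total | 3 | 24 | 0 | 3 | 23 | 0 | 3 | 12 | 21 | 4 | 0 | 0 | 1 | 3 | 1 | 100 |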


In [11]:
#print(pronoun_df.to_latex(index=True))

ExportToLatex(OUTPUT_LOC, 'pronoun_coref_ann', pronoun_df, indx = True)

In [12]:
PrintThisTable(pronoun_pos_class_df)

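| | p1upl | p1usg | p2fsg | p2mpl | p2msg | p3fpl | p3fsg | p3mpl | p3msg | p3upl | ufpl | ufsg | umpl | umsg | uuu | total_pgn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| VP | 91 | 768 | 29 | 254 | 998 | 20 | 234 | 589 | 1131 | 386 | 5 | 21 | 91 | 284 | 80 | 4981 |
| Sffx | 233 | 1565 | 0 | 28 | 1167 | 0 | 104 | 559 | 909 | 0 | 0 | 0 | 0 | 0 | 0 | 4565 |
| PPrP | 8 | 82 | 0 | 0 | 117 | 1 | 6 | 24 | 49 | 0 | 0 | 0 | 0 | 0 | 0 | 287 |
| total | 332 | 2415 | 29 | 282 | 2282 | 21 | 344 | 1172 | 2089 | 386 | 5 | 21 | 91 | 284 | 80 | 9833 |
| % total | 3 | 25 | 0 | 3 | 23 | 0 | 3 | 12 | 21 | 4 | 0 | 0 | 1 | 3 | 1 | 100 |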


In [13]:
#print(pronoun_pos_class_df.to_latex(index=True))

ExportToLatex(OUTPUT_LOC, 'pronoun_pos_class_ann', pronoun_pos_class_df, indx = True)

In [14]:
PrintThisTable(pronoun_pos_sing_df)

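| | p1upl | p1usg | p2mpl | p2msg | p3fsg | p3mpl | p3msg | p3upl | ufsg | umpl | umsg | uuu | total_pgn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| VP | 0 | 5 | 9 | 12 | 1 | 4 | 28 | 6 | 1 | 9 | 8 | 13 | 96 |
| Sffx | 11 | 7 | 1 | 4 | 2 | 3 | 6 | 0 | 0 | 0 | 0 | 0 | 34 |
| PPrP | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 2 |
| total | 11 | 13 | 10 | 16 | 3 | 8 | 34 | 6 | 1 | 9 | 8 | 13 | 132 |
| % total | 8 | 10 | 8 | 12 | 2 | 6 | 26 | 5 | 1 | 7 | 6 | 10 | 100 |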
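A quick way to sanity-check a table like this one before exporting it is to verify that the `total` row equals the column-wise sum of the rows above it. The sketch below does this for the singleton table with the values copied from the output; `column_sums` is a hypothetical helper and not part of `analyse.py`.

```python
def column_sums(rows):
    """Sum a list of equally long count rows column by column."""
    return [sum(column) for column in zip(*rows)]


# Rows copied from the singleton table above (without the total_pgn column)
vp = [0, 5, 9, 12, 1, 4, 28, 6, 1, 9, 8, 13]
sffx = [11, 7, 1, 4, 2, 3, 6, 0, 0, 0, 0, 0]
pprp = [0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
reported_total = [11, 13, 10, 16, 3, 8, 34, 6, 1, 9, 8, 13]

computed_total = column_sums([vp, sffx, pprp])

# Indices of columns where the reported total differs from the computed one
mismatches = [
    i for i, (computed, reported) in enumerate(zip(computed_total, reported_total))
    if computed != reported
]
print("mismatching columns:", mismatches)
print("grand total:", sum(computed_total))
```

The same check can be applied to the other tables by copying their rows into the lists.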


In [15]:
#print(pronoun_pos_sing_df.to_latex(index=True))

ExportToLatex(OUTPUT_LOC, 'pronoun_pos_sing_ann', pronoun_pos_sing_df, indx = True)