# Crasis (N1904-TF)

## Table of contents (ToC)<a class="anchor" id="TOC"></a>
* <a href="#bullet1">1 - Introduction</a>
* <a href="#bullet2">2 - Load Text-Fabric app and data</a>
* <a href="#bullet3">3 - Analysis</a>
  * <a href="#bullet3x1">3.1 - Identify all occurrences of crasis</a>
  * <a href="#bullet3x2">3.2 - Unique crasis lemmas</a>
  * <a href="#bullet3x3">3.3 - Unique crasis wordforms</a>
  * <a href="#bullet3x4">3.4 - Distribution of crasis among the books</a>
  * <a href="#bullet3x5">3.5 - Visualizing crasis distribution among the books</a>
  * <a href="#bullet3x6">3.6 - Syntactic roles of crasis occurrences per book</a>
  * <a href="#bullet3x7">3.7 - Exploring the clausal context</a>
* <a href="#bullet4">4 - Attribution and footnotes</a>
* <a href="#bullet5">5 - Required libraries</a>
* <a href="#bullet6">6 - Notebook and environment details</a>

# 1 - Introduction <a class="anchor" id="bullet1"></a>
##### [Back to ToC](#TOC)

Crasis (from the Greek κρᾶσις, meaning "mixing" or "blending") is a phenomenon in Ancient Greek in which a vowel or diphthong at the end of one word merges with the vowel at the start of the next, so that the two words are written as one. For example, καὶ ἐγώ ("and I") becomes κἀγώ in crasis. Crasis does not alter the grammatical or syntactical meaning of the combined words.
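
In polytonic orthography, crasis is marked by a coronis, a sign that looks like a smooth breathing but sits on a vowel inside the word rather than at its start. As a minimal sketch of how that could be spotted in plain Unicode text (a heuristic for illustration only; it is not how the `bol_suffix` feature was built, and `has_internal_breathing` is a hypothetical helper):

```python
import unicodedata

PSILI = "\u0313"           # combining comma above: smooth breathing / coronis after NFD
VOWELS = set("αεηιουω")    # unaccented Greek vowels (base letters after decomposition)


def has_internal_breathing(word):
    """Return True if a smooth-breathing mark follows at least one consonant."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    position = decomposed.find(PSILI)
    if position == -1:
        return False  # no smooth breathing / coronis at all
    # A consonant before the marked vowel means the mark is not word-initial
    return any(ch.isalpha() and ch not in VOWELS for ch in decomposed[:position])


for word in ["καὶ", "ἐγώ", "κἀγώ"]:
    print(word, has_internal_breathing(word))
```

The analysis in this notebook relies on the dataset feature rather than on such a heuristic.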

Information about occurrences of crasis is not included in the 'standard' N1904-TF dataset. However, these occurrences can be identified using the additional feature [bol_suffix](https://github.com/CenterBLC/N1904/blob/main/docs/additions/bol_suffix.md). According to the feature's documentation, there are 146 instances of crasis recorded in the Greek New Testament.

# 2 - Load Text-Fabric app and data <a class="anchor" id="bullet2"></a>
##### [Back to ToC](#TOC)

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
# Loading the Text-Fabric code
# Note: it is assumed Text-Fabric is installed in your environment
from tf.fabric import Fabric
from tf.app import use

Since our analysis depends on a feature available only as an optional module, we will include this additional module by specifying the `mod=` parameter. See also [here](https://centerblc.github.io/N1904/additions/#start) for a description of these add-on features.

In [3]:
# load the N1904 app and data
N1904 = use ("CenterBLC/N1904", version="1.0.0", mod="CenterBLC/N1904/BOLcomplement/tf/", hoist=globals())

**Locating corpus resources ...**

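| Name | # of nodes | # slots / node | % coverage |
|---|---|---|---|
| book | 27 | 5102.93 | 100 |
| chapter | 260 | 529.92 | 100 |
| verse | 7944 | 17.34 | 100 |
| sentence | 8011 | 17.2 | 100 |
| group | 8945 | 7.01 | 46 |
| clause | 42506 | 8.36 | 258 |
| wg | 106868 | 6.88 | 533 |
| phrase | 69007 | 1.9 | 95 |
| subphrase | 116178 | 1.6 | 135 |
| word | 137779 | 1.0 | 100 |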


Display is setup for viewtype [syntax-view](https://github.com/CenterBLC/N1904/blob/main/docs/syntax-view.md#start)

See [here](https://github.com/CenterBLC/N1904/blob/main/docs/viewtypes.md#start) for more information on viewtypes

In [4]:
# The following will push the Text-Fabric stylesheet to this notebook (to facilitate proper display with notebook viewer)
N1904.dh(N1904.getCss())

# 3 - Analysis <a class="anchor" id="bullet3"></a>
##### [Back to ToC](#TOC)

## 3.1 - Identify all occurrences of crasis <a class="anchor" id="bullet3x1"></a>

Using a query based on the `bol_suffix` feature, we will identify all occurrences of crasis and display the syntax tree for the first instance, κἀγώ in Matthew 2:8.

In [5]:
crasisQuery = '''
word bol_suffix=crasis
'''

crasisResults = N1904.search(crasisQuery)

  0.09s 146 results


In [6]:
N1904.show(crasisResults,end=1,extraFeatures={'bol_suffix'})

## 3.2 - Unique crasis lemmas <a class="anchor" id="bullet3x2"></a>

In [7]:
uniqueLemmas = set()

for node, in crasisResults:
    uniqueLemmas.add(F.lemma.v(node))

In [8]:
print (uniqueLemmas)

{'κἀκεῖ', 'κἀγώ', 'τὸ ἐναντίον', 'κἀκεῖθεν', 'τὸ ὄνομα', 'κἄν', 'κἀκεῖνος'}
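
The set above mixes two lemma conventions: five lemmas are recorded in their contracted form, while two are recorded as two separate words beginning with the article. A small sketch that separates the two groups, using the lemma strings printed above (the grouping labels `kai` and `article` are illustrative, and reading the initial κ as καί rests on the καὶ ἐγώ example from the introduction):

```python
# Lemma strings copied from the output above
lemmas = ['κἀκεῖ', 'κἀγώ', 'τὸ ἐναντίον', 'κἀκεῖθεν', 'τὸ ὄνομα', 'κἄν', 'κἀκεῖνος']


def first_element(lemma):
    """Classify a crasis lemma by its first component (illustrative labels)."""
    # Two-word lemmas already spell out the article; the others start with κ (from καί)
    return "article" if " " in lemma else "kai"


by_first = {}
for lemma in sorted(lemmas):
    by_first.setdefault(first_element(lemma), []).append(lemma)

for label, members in by_first.items():
    print(f"{label:8} {len(members)}  {', '.join(members)}")
```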


## 3.3 - Unique crasis wordforms <a class="anchor" id="bullet3x3"></a>

In [9]:
uniqueWords = set()

for node, in crasisResults:
    uniqueWords.add(F.text.v(node))

In [10]:
print (uniqueWords)

{'κἀκεῖνοι', 'κἂν', 'κἀγώ', 'τοὔνομα', 'κἀγὼ', 'κἀκεῖνός', 'κἀκείνους', 'κἀκεῖ', 'κἀκεῖνα', 'Κἀγώ', 'Κἀμὲ', 'Κἂν', 'Κἀγὼ', 'τοὐναντίον', 'κἀκεῖνος', 'κἀκεῖνον', 'Κἀκεῖθεν', 'κἀμοί', 'κἀμοὶ', 'κἀμὲ', 'κἀκεῖθεν', 'Κἀκεῖ'}
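
Several of the 22 wordforms differ only in capitalization or in grave versus acute accent (for example κἀγώ, κἀγὼ, Κἀγώ and Κἀγὼ). As a hedged sketch, the variants could be collapsed with the standard `unicodedata` module; `normalize_form` is a hypothetical helper and the normalization rule (lowercase, grave to acute) is an assumption, not part of the dataset:

```python
import unicodedata

GRAVE, ACUTE = "\u0300", "\u0301"  # combining grave and acute accents


def normalize_form(word):
    """Lowercase a wordform and turn grave accents into acute accents."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return unicodedata.normalize("NFC", decomposed.replace(GRAVE, ACUTE))


# Wordforms copied from the output above
forms = ['κἀκεῖνοι', 'κἂν', 'κἀγώ', 'τοὔνομα', 'κἀγὼ', 'κἀκεῖνός', 'κἀκείνους',
         'κἀκεῖ', 'κἀκεῖνα', 'Κἀγώ', 'Κἀμὲ', 'Κἂν', 'Κἀγὼ', 'τοὐναντίον',
         'κἀκεῖνος', 'κἀκεῖνον', 'Κἀκεῖθεν', 'κἀμοί', 'κἀμοὶ', 'κἀμὲ',
         'κἀκεῖθεν', 'Κἀκεῖ']

groups = {}
for form in forms:
    groups.setdefault(normalize_form(form), []).append(form)

for key, variants in sorted(groups.items()):
    print(f"{key:12} <- {', '.join(variants)}")
```

Forms that carry an additional accent from a following enclitic (such as κἀκεῖνός) remain separate under this rule.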


## 3.4 - Distribution of crasis among the books<a class="anchor" id="bullet3x4"></a>

In [11]:
from collections import Counter
# set up counter to store results
crasisCounts = Counter()

# Count the occurrences of each unique combination
for node, in crasisResults:
    book, chapter, versenumber = T.sectionFromNode(node)
    combination = f"{book}"
    crasisCounts[combination] += 1

# Print the combinations and their frequencies
for combination, count in crasisCounts.most_common(): # most_common() returns sorted items
    print(f"{count:2}x {combination}")
print (f'\n{len(crasisCounts)} books with crasis and {27-len(crasisCounts)} without.')

40x John
21x Acts
17x Matthew
14x Luke
11x I_Corinthians
11x II_Corinthians
 9x Mark
 5x Revelation
 3x Romans
 3x Galatians
 3x Hebrews
 3x James
 2x Philippians
 1x Ephesians
 1x I_Thessalonians
 1x II_Timothy
 1x I_Peter

17 books with crasis and 10 without.
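
The counts can also be expressed as shares of the total. The sketch below re-enters the figures printed above and computes the share found in the Gospels and Acts; that grouping is an illustrative choice, not something defined in the dataset:

```python
# Counts copied from the output above
counts = {
    "John": 40, "Acts": 21, "Matthew": 17, "Luke": 14,
    "I_Corinthians": 11, "II_Corinthians": 11, "Mark": 9, "Revelation": 5,
    "Romans": 3, "Galatians": 3, "Hebrews": 3, "James": 3, "Philippians": 2,
    "Ephesians": 1, "I_Thessalonians": 1, "II_Timothy": 1, "I_Peter": 1,
}

total = sum(counts.values())

# Illustrative grouping: the four Gospels plus Acts
narrative_books = ("Matthew", "Mark", "Luke", "John", "Acts")
narrative_total = sum(counts[book] for book in narrative_books)

print(f"{narrative_total} of {total} occurrences ({narrative_total / total:.1%})")
# prints: 101 of 146 occurrences (69.2%)
```

The total of 146 matches the number of query results in section 3.1.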


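To prepare frequency data that is structured per book, the next cell uses `defaultdict` and `Counter` from the `collections` module: for each book it stores the total number of crasis occurrences and a count per wordform. The result is then converted to plain dictionaries so that it can be serialized as JSON.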

In [12]:
from collections import Counter, defaultdict
import json

# Set up a dictionary to store results grouped by book
crasisData = defaultdict(lambda: {"total": 0, "types": Counter()})

# Count the occurrences of each Crasis
for node, in crasisResults:
    book, chapter, versenumber = T.sectionFromNode(node)
    wordform = F.text.v(node)
    # Update total count for the book
    crasisData[book]["total"] += 1
    # Count occurrences of the specific wordform (Crasis type)
    crasisData[book]["types"][wordform] += 1

# Convert defaultdict to a regular dict and Counter to a dictionary for JSON compatibility
jsonData = {
    book: {"total": data["total"], "types": dict(data["types"])}
    for book, data in crasisData.items()
}

# Convert to JSON format
jsonOutput = json.dumps(jsonData, indent=4, ensure_ascii=False)

To inspect the JSON output, you can print it:

```python
print(jsonOutput)
```
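
Saving works along the same lines. A minimal sketch, assuming a small stand-in dictionary in place of `jsonData` and a hypothetical file name; the explicit `encoding="utf-8"` keeps the Greek wordforms readable in the file:

```python
import json
import tempfile
from pathlib import Path

# Stand-in for jsonData (illustrative values only)
sample = {"Matthew": {"total": 2, "types": {"κἀγὼ": 1, "κἀκεῖ": 1}}}

with tempfile.TemporaryDirectory() as folder:
    target = Path(folder) / "crasis_per_book.json"  # hypothetical file name
    # ensure_ascii=False writes Greek characters as-is instead of \u escapes
    target.write_text(json.dumps(sample, indent=4, ensure_ascii=False), encoding="utf-8")
    restored = json.loads(target.read_text(encoding="utf-8"))

print(restored == sample)
```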

## 3.5 - Visualizing crasis distribution among the books <a class="anchor" id="bullet3x5"></a>

To visualize the results, we will create a pie chart using Bokeh, where each segment represents a book and the hover functionality displays details (wordforms and their frequencies). The chart is built from the `jsonData` dictionary created in the previous executable cell.
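
Each wedge spans an angle proportional to the book's share of all occurrences, and the wedges are laid end to end by accumulating those angles. A minimal sketch of that arithmetic in plain Python, independent of Bokeh, using the counts from section 3.4:

```python
from itertools import accumulate
from math import isclose, pi

# Counts per book, copied from the output in section 3.4
totals = [40, 21, 17, 14, 11, 11, 9, 5, 3, 3, 3, 3, 2, 1, 1, 1, 1]

grand_total = sum(totals)
angles = [count / grand_total * 2 * pi for count in totals]

# End angle of each wedge is the running sum; start angle is the previous end
ends = list(accumulate(angles))
starts = [0.0] + ends[:-1]

for count, start, end in zip(totals[:3], starts, ends):
    print(f"{count:3} occurrences: {start:.3f} -> {end:.3f} rad")

# The last wedge closes the circle
print(isclose(ends[-1], 2 * pi))
```

In the Bokeh cell, `cumsum("angle", include_zero=True)` plays the role of `starts` and `cumsum("angle")` that of `ends`.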

In [13]:
# to enable Bokeh for Jupyter Notebook
from bokeh.io import output_notebook
output_notebook()

In [14]:
from bokeh.io import show
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure
from bokeh.transform import cumsum
from math import pi
import pandas as pd
from matplotlib import cm
from matplotlib.colors import to_hex

# Generate one distinct color per book with crasis (17 here) from the 'tab20' colormap
num_colors = len(jsonData)
colors = [to_hex(cm.tab20(i / num_colors)) for i in range(num_colors)]


# Prepare data for the pie chart
books = []
totals = []
details = []

for book, data in jsonData.items():
    books.append(book)
    totals.append(data["total"])
    wordforms = "\n".join([f"{wordform}: {count}" for wordform, count in data["types"].items()])
    details.append(wordforms)

# Create a DataFrame for the pie chart
df = pd.DataFrame({"book": books, "total": totals, "details": details})
df["angle"] = df["total"] / df["total"].sum() * 2 * pi
# Assign the colors to the DataFrame
df["color"] = colors[:len(df)]

# Create a ColumnDataSource
source = ColumnDataSource(df)

# Create the pie chart figure
p = figure(
    title="Crasis distribution by GNT book (Nestle 1904)",
    height=500,
    width=800,
    toolbar_location=None,
    tools="hover",
    tooltips="@book: @total occurrences\ndetails:\n@details",
    x_range=(-0.5, 1),
)

# Add the pie wedges
p.wedge(
    x=0, y=1,
    radius=0.4,
    start_angle=cumsum("angle", include_zero=True),
    end_angle=cumsum("angle"),
    line_color="white",
    fill_color="color",
    legend_field="book",
    source=source,
)

# Style the plot
p.axis.axis_label = None
p.axis.visible = False
p.grid.grid_line_color = None
p.legend.title = "Books"

# Show the plot
show(p)

After executing the previous cell, which displays the pie chart, executing the following cell will create a download button that allows the chart to be downloaded as an interactive HTML file for offline use.

In [15]:
from IPython.display import HTML
import base64  # used to encode the data to be downloaded
from bokeh.embed import file_html
from bokeh.resources import CDN

def createDownloadLink(plot, fileName, documentTitle, buttonText):
    """Return an HTML download button for a standalone copy of a Bokeh plot."""

    # Render the plot as a standalone HTML document (BokehJS is loaded from the CDN)
    htmlContent = file_html(plot, CDN, documentTitle)

    # Encode the HTML content to base64 so it can be embedded in a data URI
    b64Html = base64.b64encode(htmlContent.encode()).decode()

    # Create the HTML download link
    downloadLink = f'''
    <a download="{fileName}" href="data:text/html;base64,{b64Html}">
        <button>{buttonText}</button>
    </a>
    '''
    return HTML(downloadLink)

# Display the download link in the notebook
createDownloadLink(p, 'crasis_use_per_book.html', 'Interactive pie chart of crasis use per GNT book', 'Download pie chart')
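
The download button works because the whole HTML document is embedded in the link itself as a base64 data URI. A small sketch of that encoding step on its own, with a stand-in HTML string instead of the Bokeh output:

```python
import base64

# Stand-in for the HTML produced by file_html (illustrative content)
html = "<html><body><p>κἀγώ</p></body></html>"

# Bytes -> base64 text, then wrap in a data URI as used in the href attribute
b64 = base64.b64encode(html.encode("utf-8")).decode("ascii")
data_uri = f"data:text/html;base64,{b64}"

# Decoding the payload gives back the original document
payload = data_uri.split(",", 1)[1]
decoded = base64.b64decode(payload).decode("utf-8")

print(data_uri[:40] + "...")
print(decoded == html)
```

Base64 output is roughly a third larger than its input, so the notebook grows accordingly when the link is displayed.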

## 3.6 - Syntactic roles of crasis occurrences per book <a class="anchor" id="bullet3x6"></a>

This section examines the use of crasis, analyzing its distribution across different syntactic roles, summarized by book.

In [16]:
from collections import Counter, defaultdict
import json

# Set up a dictionary to store results grouped by syntactic role
syntacticData = defaultdict(lambda: {"total": 0, "roles": Counter()})

# Count the occurrences of each Crasis
for node, in crasisResults:
    book, chapter, versenumber = T.sectionFromNode(node)  # Extract book info
    function = F.function.v(node)  # Extract syntactic role
    # Update total count for the book
    syntacticData[book]["total"] += 1
    # Count occurrences of the specific grammatical function
    syntacticData[book]["roles"][function] += 1

# Convert defaultdict to a regular dict and Counter to a dictionary for JSON compatibility
rolesData = {
    book: {"total": data["total"], "roles": dict(data["roles"])}
    for book, data in syntacticData.items()
}

Note that the dictionary `rolesData` just created can also be dumped as JSON for better readability:

```python
rolesOutput = json.dumps(rolesData, indent=4, ensure_ascii=False)
print(rolesOutput)
```

In the next cell we will generate a table of role percentages per book from the dictionary created in the previous cell.

In [17]:
import pandas as pd
import json

# Prepare data for the table
tableData = []
allRoles = set()

# Flatten data for the table
for book, data in rolesData.items():
    row = {"book": book, "# of occ": data["total"]}
    # Add role counts
    for role, count in data["roles"].items():
        row[role] = count
        allRoles.add(role)
    tableData.append(row)

# Create a DataFrame
df = pd.DataFrame(tableData)

# Replace NaN with 0 for missing roles
df = df.fillna(0)

# Ensure all possible roles are included as columns
for role in allRoles:
    if role not in df.columns:
        df[role] = 0

# Calculate percentages for each role per book
for role in allRoles:
    df[role] = df[role] / df["# of occ"] * 100

# Add grand total row
totalOccurrences = df["# of occ"].sum()
grandTotalRow = {
    "book": "Grand Total",
    "# of occ": totalOccurrences,
}

# Calculate the grand total percentages
for role in allRoles:
    # Recover the absolute count per book from its percentage, then sum over all books
    totalRoleOccurrences = (df["# of occ"] * df[role] / 100).sum()
    grandTotalRow[role] = (totalRoleOccurrences / totalOccurrences) * 100

# Append the grand total row
df = pd.concat([df, pd.DataFrame([grandTotalRow])], ignore_index=True)

# Format percentages and suppress 0.0%
for role in allRoles:
    df[role] = df[role].apply(lambda x: f"{x:.1f}%" if x > 0 else "")

# Display the table
print(df)

               book  # of occ    Subj     Adv   None   Objc  Cmpl
0           Matthew        17   64.7%   17.6%  11.8%   5.9%      
1              Mark         9   22.2%   33.3%  22.2%  22.2%      
2              Luke        14   42.9%   14.3%  28.6%  14.3%      
3              John        40   77.5%    2.5%  10.0%   7.5%  2.5%
4              Acts        21   19.0%   57.1%   9.5%   4.8%  9.5%
5            Romans         3  100.0%                            
6     I_Corinthians        11   72.7%          18.2%         9.1%
7    II_Corinthians        11   81.8%          18.2%             
8         Galatians         3   66.7%   33.3%                    
9         Ephesians         1  100.0%                            
10      Philippians         2  100.0%                            
11  I_Thessalonians         1  100.0%                            
12       II_Timothy         1  100.0%                            
13          Hebrews         3   66.7%          33.3%             
14        
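
Each percentage in the table is a role count divided by the number of occurrences in that book. As a worked example for the Matthew row, the sketch below uses role counts back-computed from the printed percentages (11, 3, 2 and 1 out of 17); these counts are derived here for illustration and are not printed by the notebook itself:

```python
from collections import Counter

# Role counts for Matthew, back-computed from the percentages in the table above
matthewRoles = Counter({"Subj": 11, "Adv": 3, None: 2, "Objc": 1})

total = sum(matthewRoles.values())

# Same formatting as in the table: one decimal place, percent sign
shares = {role: f"{count / total * 100:.1f}%" for role, count in matthewRoles.items()}

for role, share in shares.items():
    print(f"{str(role):5} {share:>6}")
```

The `None` column collects words for which `F.function.v(node)` returns no value.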

## 3.7 - Exploring the clausal context<a class="anchor" id="bullet3x7"></a>

The following outlines an attempt to programmatically identify the predicate(s) associated with each crasis. The core approach is to determine the parent clause of the crasis, extract the word nodes it contains, and scan them for words labeled with `function=Pred`. The `@n` suffix after each (potential) predicate is its distance in words from the crasis: positive when the predicate follows the crasis, negative when it precedes it.
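
The offset logic can be sketched without Text-Fabric. In the example below the clause is a plain list of `(slot, text, function)` tuples; the slot numbers are hypothetical, and `predicate_offsets` is an illustrative helper rather than part of the notebook:

```python
def predicate_offsets(crasis_slot, clause_words):
    """Return (text, offset) for every word in the clause with function 'Pred'."""
    return [
        (text, slot - crasis_slot)  # positive: after the crasis, negative: before it
        for slot, text, function in clause_words
        if function == "Pred"
    ]


# Matthew 2:8 as a toy clause; slot numbers are made up for the example
clause = [
    (100, "κἀγὼ", "Subj"),
    (101, "ἐλθὼν", "Pred"),
    (102, "προσκυνήσω", "Pred"),
]

offsets = predicate_offsets(100, clause)
print(" ".join(f"{text}@{offset}" for text, offset in offsets))
# prints: ἐλθὼν@1 προσκυνήσω@2
```

Because word nodes in Text-Fabric are consecutive slot numbers, subtracting two word nodes gives their distance in words.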

In [18]:
for node, in crasisResults:
    book, chapter, verseNumber = T.sectionFromNode(node)
    location = f"{book} {chapter}:{verseNumber}"
    function = F.function.v(node)
    wordAndFunction = f"{F.text.v(node)} as {function}"
    parentClause = L.u(node, 'clause')  # tuple of clause nodes containing this word (may be empty)
    if len(parentClause) != 0:
        # Scan the words of the first clause returned for predicates
        clauseNodes = L.d(parentClause[0], 'word')
        predicateList = ''
        for clauseNode in clauseNodes:
            if F.function.v(clauseNode) == 'Pred':
                # Offset = predicate node minus crasis node (negative: predicate precedes the crasis)
                predicateList += f'{T.text(clauseNode).strip()}@{clauseNode - node} '
        print(f'{location:20} {wordAndFunction:20}  Related predicate(s):{predicateList}')
    else:
        print(f'{location:20} {wordAndFunction:20}')

Matthew 2:8          κἀγὼ as Subj          Related predicate(s):ἐλθὼν@1 προσκυνήσω@2 
Matthew 5:23         κἀκεῖ as Adv          Related predicate(s):μνησθῇς@1 ἔχει@6 
Matthew 10:11        κἀκεῖ as Adv          Related predicate(s):μείνατε@1 
Matthew 10:32        κἀγὼ as Subj          Related predicate(s):ὁμολογήσει@-7 ὁμολογήσω@-1 
Matthew 10:33        κἀγὼ as Subj          Related predicate(s):ἀρνήσομαι@-1 
Matthew 11:28        κἀγὼ as Subj          Related predicate(s):ἀναπαύσω@1 
Matthew 15:18        κἀκεῖνα as Subj       Related predicate(s):κοινοῖ@1 
Matthew 16:18        κἀγὼ as Subj          Related predicate(s):λέγω@3 οἰκοδομήσω@13 κατισχύσουσιν@21 
Matthew 18:33        κἀγὼ as Subj          Related predicate(s):ἠλέησα;@2 
Matthew 21:21        κἂν as None           Related predicate(s):εἴπητε@4 Ἄρθητι@5 βλήθητι@7 
Matthew 21:24        κἀγὼ as Subj          Related predicate(s):Ἐρωτήσω@-2 εἴπητέ@5 ἐρῶ@9 ποιῶ·@14 
Matthew 21:24        κἀγὼ as Subj          Related predicate(s):ἐρ

# 4 - Attribution and footnotes <a class="anchor" id="bullet4"></a>
##### [Back to ToC](#TOC)

Some reference material on crasis can be found in:

> A. T. Robertson, *A Grammar of the Greek New Testament in the Light of Historical Research* (Logos Bible Software, 2006), 208.

# 5 - Required libraries<a class="anchor" id="bullet5"></a>
##### [Back to ToC](#TOC)

The scripts in this notebook require, besides `text-fabric`, the following Python libraries to be installed in the environment:

    bokeh
    IPython
    matplotlib
    pandas

The modules `base64`, `collections`, and `json` are also used, but they are part of the Python standard library and need no separate installation.

You can install any missing library from within Jupyter Notebook using either `pip` or `pip3`.
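
Before running the notebook, a quick check of which modules can be imported may save time. A minimal sketch using only the standard library; the list holds import names as used in the cells above (`tf` is the import name of the `text-fabric` package), and `missing_modules` is a hypothetical helper:

```python
from importlib.util import find_spec

# Import names used in this notebook's cells
REQUIRED = ["tf", "bokeh", "pandas", "matplotlib", "IPython"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED)
print("Missing:", ", ".join(missing) if missing else "none")
```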

# 6 - Notebook and environment details<a class="anchor" id="bullet6"></a>
##### [Back to ToC](#TOC)

<div style="float: left;">
  <table>
    <tr>
      <td><strong>Author</strong></td>
      <td>Tony Jurg</td>
    </tr>
    <tr>
      <td><strong>Version</strong></td>
      <td>1.0</td>
    </tr>
    <tr>
      <td><strong>Date</strong></td>
      <td>14 January 2025</td>
    </tr>
  </table>
</div>

The following cell displays the active Anaconda environment along with a list of all installed packages and their versions within that environment.

In [19]:
import subprocess
from IPython.display import display, HTML

# Display the active conda environment
# (findstr is a Windows command; on Linux or macOS use grep instead)
!conda env list | findstr "*"

# Run conda list and capture the output
condaListOutput = subprocess.check_output("conda list", shell=True).decode("utf-8")

# Wrap the output with <details> and <summary> HTML tags
htmlOutput = "<details><summary>Click to view installed packages</summary><pre>"
htmlOutput += condaListOutput
htmlOutput += "</pre></details>"

# Display the HTML in the notebook
display(HTML(htmlOutput))

Text-Fabric           *  C:\Users\tonyj\anaconda3\envs\Text-Fabric
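
The cell above depends on `conda` and on the Windows `findstr` command. As a hedged, platform-independent alternative, the sketch below reads comparable information with the standard library only; it assumes that `CONDA_DEFAULT_ENV` is set by an activated conda environment and reports "(not set)" otherwise:

```python
import os
import sys
from importlib import metadata

# Location of the running Python installation and the active conda environment, if any
print("Python prefix:", sys.prefix)
print("Conda environment:", os.environ.get("CONDA_DEFAULT_ENV", "(not set)"))

# Name and version of every installed distribution visible to this interpreter
packages = sorted(
    ((dist.metadata["Name"] or "", dist.version or "") for dist in metadata.distributions()),
    key=lambda item: item[0].lower(),
)

for name, version in packages[:10]:  # show the first ten only
    print(f"{name:30} {version}")
```

Unlike `conda list`, this lists only Python distributions, not other packages managed by conda.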
