<a href="http://laf-fabric.readthedocs.org/en/latest/" target="_blank"><img align="left" src="images/laf-fabric-xsmall.png"/></a>
<a href="http://www.godgeleerdheid.vu.nl/etcbc" target="_blank"><img align="left" src="images/VU-ETCBC-xsmall.png"/></a>
<a href="http://tla.mpi.nl" target="_blank"><img align="right" src="images/TLA-xsmall.png"/></a>
<a href="http://www.dans.knaw.nl" target="_blank"><img align="right" src="images/DANS-xsmall.png"/></a>

# Clause and Phrase Typology

The Hebrew text database divides the text material into sentences, clauses and phrases.

On top of this plain segmentation there is an elaborate set of distinctions between clauses and between phrases.
We explore the features that express these distinctions. They are:

**clause_constituent_relation** indicates what function the clause has as a *constituent* in the sentence in which it occurs. It could function as a subject, object, and much more. The meaning is quite involved.

**clause_type** indicates the internal structure of the clause.

**phrase_type** gives information about the internal composition of the phrase.

**phrase_function** indicates the syntactic function of the phrase.

In this notebook we explore these features. 
We inspect the values they can take, describe the meanings of those values, and collect examples.

Constructing the examples involves

* finding the sentence nodes above phrases and clauses
* finding the words for phrases, clauses and sentences
* providing verse labels.

So it is a good example of how to combine data of three different kinds: words, linguistic objects, and sections (verses).

In [1]:
import sys
import collections
from laf.fabric import LafFabric
fabric = LafFabric()

  0.00s This is LAF-Fabric 4.4.3
API reference: http://laf-fabric.readthedocs.org/en/latest/texts/API-reference.html
Feature doc: http://shebanq-doc.readthedocs.org/en/latest/texts/welcome.html



In [13]:
API = fabric.load('etcbc4', '--', 'clauses_phrase_types', {
    "xmlids": {"node": False, "edge": False},
    "features":
    ('''
        otype label g_cons
        typ function
        rela
        domain txt
    ''','''
        functional_parent
        distributional_parent
        mother
    '''),
    "primary": False,
}, verbose='DETAIL')
exec(fabric.localnames.format(var='fabric'))

  0.00s LOADING API: please wait ... 
  0.00s DETAIL: COMPILING m: UP TO DATE
  0.00s INFO: USING DATA COMPILED AT: 2014-07-23T09-31-37
  0.00s DETAIL: COMPILING a: UP TO DATE
  0.01s DETAIL: keep main: G.node_anchor_min
  0.01s DETAIL: keep main: G.node_anchor_max
  0.01s DETAIL: keep main: G.node_sort
  0.01s DETAIL: keep main: G.node_sort_inv
  0.01s DETAIL: keep main: G.edges_from
  0.01s DETAIL: keep main: G.edges_to
  0.01s DETAIL: keep main: F.etcbc4_db_otype [node] 
  0.02s DETAIL: keep main: F.etcbc4_ft_domain [node] 
  0.02s DETAIL: keep main: F.etcbc4_ft_function [node] 
  0.02s DETAIL: keep main: F.etcbc4_ft_g_cons [node] 
  0.02s DETAIL: keep main: F.etcbc4_ft_rela [node] 
  0.02s DETAIL: keep main: F.etcbc4_ft_txt [node] 
  0.02s DETAIL: keep main: F.etcbc4_ft_typ [node] 
  0.02s DETAIL: keep main: F.etcbc4_sft_label [node] 
  0.02s DETAIL: keep main: F.etcbc4_ft_functional_parent [e] 
  0.02s DETAIL: keep main: F.etcbc4_ft_mother [e] 
  0.02s DETAIL: keep main: C.etcbc4_

# Trees

We need trees in order to connect clauses to sentences and words to clauses and sentences.
Our trees are based on the *parents* edges.

We start with the sentence objects.

In [14]:
msg("Finding top nodes ... ")
top_nodes = set(NN(test=F.otype.v, value='sentence'))
msg("Top nodes found: {}".format(len(top_nodes)))

  7.35s Finding top nodes ... 
  8.38s Top nodes found: 66045


Now we are going to *walk* the trees, from sentence node downwards to all the children and so on.

We are not going to construct those trees. We only need to collect certain kinds of information while we go:

**to_sentence[node] = top_node**
For each node we encounter we store the top node of which it is a descendant.

**has_words[sentence or clause or phrase] = set of object's word nodes**
For each sentence, clause and phrase we encounter we collect the word nodes that are descendants of that object.

**sentence_verse[sentence] = verse label of sentence** The passage the sentence belongs to.

**clauses_of[sentence] = set of sentence's clause nodes** The 'immediate' children of type clause of a sentence. We need this later when we want to check how many clauses coincide with their sentence. 'Immediate' means: with only a node of type *sentence_atom* in between.

**N.B. 1** We gather word *nodes*, not words themselves. Multiple occurrences of the same word are still distinct word nodes,
but we lose the ordering of the words.
However, the numeric ordering of the nodes corresponds with the textual ordering of the words.
This is an undocumented property. 
Alternatively, we could have extracted the *monad* number of the words and sorted on that. The monad numbers are guaranteed to correspond with the textual order.

**N.B. 2** In practice, the *parents* edges are not a pure tree, but a directed acyclic graph. So we will visit some words multiple times. The algorithm below detects that and prevents multiple visits. That is the function of **nodes_seen**.
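
The alternative mentioned in N.B. 1, sorting on monad numbers, can be sketched as follows. This is a toy illustration and not LAF-Fabric code: `monad_of` is a hypothetical mapping from word node to monad number, filled in by hand here, and the node numbers are made up.

```python
# Toy data: the word nodes of one phrase, collected in a set (so unordered).
phrase_words = {507, 12, 301}

# Hypothetical mapping from word node to monad number (made up for this sketch).
monad_of = {12: 3, 301: 1, 507: 2}

def in_text_order(word_nodes, monad_of):
    """Return the word nodes sorted by their monad number."""
    return sorted(word_nodes, key=lambda w: monad_of[w])

ordered_words = in_text_order(phrase_words, monad_of)
print(ordered_words)
```

In this toy case the node numbers deliberately do not follow the text order, so sorting on the nodes themselves would give a different result than sorting on the monads.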

In [15]:
nodes_seen = set()
to_sentence = {}
has_words = collections.defaultdict(lambda: set())
sentence_verse = {}
clauses_of = collections.defaultdict(lambda: set())
verse_label = None

# we add extra parameters to the walk function: 
#    sentence is the top node above node
#    clause, phrase is the clause, phrase node above node

def walk_tree(node, sentence, clause, phrase):
    if node in nodes_seen:
        return
    
    nodes_seen.add(node)
    to_sentence[node] = sentence
    new_clause = clause
    new_phrase = phrase

    otype = F.otype.v(node)
    if otype == 'clause':
        new_clause = node
    elif otype == 'phrase':
        new_phrase = node
    elif otype == 'word':
        for parent in (sentence, clause, phrase):
            has_words[parent].add(node)
    
    children = Ci.distributional_parent.v(node)
    for child in children:
        if F.otype.v(node) == 'sentence_atom' and F.otype.v(child) == 'clause':
            clauses_of[sentence].add(child)
        walk_tree(child, sentence, new_clause, new_phrase)

# we count the trees we visit, mainly to give progress messages        
s = 0
sc = 0
chunk = 10000 # we only issue a message every chunk many trees

# Walk through all top nodes and verse nodes in document order

for node in NN(test=lambda n: n in top_nodes or F.otype.v(n) == 'verse', value=True):
    if F.otype.v(node) == 'verse':
        verse_label = F.label.v(node)
        continue
    sentence_verse[node] = verse_label
    nodes_seen = set()
    walk_tree(node, node, None, None)
    s += 1
    sc += 1
    if sc == chunk:
        msg("{} trees visited".format(s))
        sc = 0
    
msg("{} trees visited".format(s))

    20s 10000 trees visited
    20s 20000 trees visited
    20s 30000 trees visited
    21s 40000 trees visited
    21s 50000 trees visited
    21s 60000 trees visited
    21s 66045 trees visited
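
To see what the walk collects, here is a small sketch of the same bookkeeping on a hand-made graph. It does not use LAF-Fabric: the dictionaries `children` and `otype` are hypothetical stand-ins for the edge and feature lookups, and all node names are invented. Unlike the cell above, the sketch skips `None` parents when it records words.

```python
import collections

# Hypothetical toy graph: node -> children, in order.
children = {
    's1': ['a1'],
    'a1': ['c1', 'c2'],
    'c1': ['p1'],
    'c2': ['p2', 'p1'],   # p1 has two parents, so this is a DAG, not a tree
    'p1': ['w1', 'w2'],
    'p2': ['w3'],
}
otype = {
    's1': 'sentence', 'a1': 'sentence_atom',
    'c1': 'clause', 'c2': 'clause',
    'p1': 'phrase', 'p2': 'phrase',
    'w1': 'word', 'w2': 'word', 'w3': 'word',
}

seen = set()
to_sentence = {}
has_words = collections.defaultdict(set)
clauses_of = collections.defaultdict(set)

def walk(node, sentence, clause, phrase):
    # Skip nodes that were already reached along another path.
    if node in seen:
        return
    seen.add(node)
    to_sentence[node] = sentence
    if otype[node] == 'clause':
        clause = node
    elif otype[node] == 'phrase':
        phrase = node
    elif otype[node] == 'word':
        # Record the word under every enclosing object that is known.
        for parent in (sentence, clause, phrase):
            if parent is not None:
                has_words[parent].add(node)
    for child in children.get(node, []):
        if otype[node] == 'sentence_atom' and otype[child] == 'clause':
            clauses_of[sentence].add(child)
        walk(child, sentence, clause, phrase)

walk('s1', 's1', None, None)
print(sorted(has_words['s1']), sorted(clauses_of['s1']))
```

Because the shared phrase `p1` is visited only once, its words end up under the first clause that reaches it (`c1`) and not under `c2`. That is a consequence of the visited-set approach worth keeping in mind when reading the collected word sets.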


# Clause constituent relations

Let's have a look at the values that clause constituent relations can take.

We collect examples on the fly. By setting **nr_of_examples** below you can specify how many examples you want to generate per value.

In [5]:
nr_of_examples = 3
example_features = {
    'clause': ('rela', 'typ'),
    'phrase': ('typ', 'function'),
}

values = collections.defaultdict(lambda: collections.defaultdict(lambda: collections.defaultdict(int)))
examples = collections.defaultdict(lambda: collections.defaultdict(lambda: collections.defaultdict(lambda: set())))

for i in NN(test=F.otype.v, values=example_features.keys()):
    this_type = F.otype.v(i)
    for otype in example_features:
        if otype == this_type:
            for feature in example_features[otype]:
                value = F.item[feature].v(i)
                values[otype][feature][value] += 1
                if len(examples[otype][feature][value]) < nr_of_examples:
                    examples[otype][feature][value].add(i)

Let us print out how often each value occurs:

In [6]:
counts_file = outfile("counts.txt")

for otype in values:
    counts_file.write("{}\n".format(otype))
    for feature in values[otype]:
        counts_file.write("\t{}\n".format(feature))
        for (value, occ) in sorted(values[otype][feature].items(), key=lambda x: -x[1]):
            counts_file.write("\t\t{:<10}: {:>6} x\n".format(value, values[otype][feature][value]))
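
As an aside, the "most frequent value first" ordering used above can also be obtained with `collections.Counter` from the standard library. The sketch below runs on a made-up list of values, not on the database, and only illustrates the idea.

```python
import collections

# Toy observations standing in for feature values read from nodes.
observed = ['VP', 'PP', 'VP', 'NP', 'VP', 'PP']

counts = collections.Counter(observed)

# most_common() returns (value, count) pairs, highest count first.
lines = ["{:<10}: {:>6} x".format(value, occ) for (value, occ) in counts.most_common()]
print("\n".join(lines))
```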

In [10]:
counts_file.close()
!cat {my_file('counts.txt')}

phrase
	typ
		VP        :  68893 x
		PP        :  58476 x
		CP        :  52612 x
		NP        :  41913 x
		PrNP      :  10294 x
		NegP      :   6768 x
		AdvP      :   4755 x
		PPrP      :   4297 x
		InjP      :   1878 x
		AdjP      :   1867 x
		InrP      :   1229 x
		IPrP      :    861 x
		DPrP      :    821 x
	function
		Pred      :  57046 x
		Conj      :  46291 x
		Subj      :  28957 x
		Cmpl      :  27949 x
		Objc      :  20816 x
		PreC      :  17765 x
		Unkn      :  11367 x
		Adju      :   8857 x
		Rela      :   6338 x
		Nega      :   6058 x
		PreO      :   5514 x
		Modi      :   3703 x
		Time      :   3551 x
		Loca      :   2510 x
		Intj      :   1627 x
		Voct      :   1524 x
		Ques      :   1275 x
		Frnt      :   1026 x
		PreS      :    778 x
		NCop      :    609 x
		Supp      :    296 x
		IntS      :    250 x
		PtcO      :    166 x
		Exst      :    144 x
		NCoS      :    101 x
		PrAd      :     84 x
		ModS      :     36 x
		ExsS      :  

# Description of the values

## Phrase function (aka parsing label)

### Ques
Interrogative particle. (lamed mem heh, alef yod he, he)

### IrpS 
Interrogative pronoun functioning as a subject.

### IrpO
Interrogative pronoun functioning as an object.

### IrpC 
Interrogative pronoun functioning as a complement.

### IrpP
Interrogative pronoun functioning as a nominal predicate.

### Modi
Adverb indicating the manner in which the action of the verb is performed.

### PreS
Verbal infinitive with suffix functioning as subject.

### Intj
Interjection without any suffix.

### IntS
Interjection with suffix functioning as subject. (e.g. hinneni)

### ModS
Modifier/(durative)particle with a subject suffix. (ayin wav daleth)

### Exst
Existential particle (yod shin)

### ExsS 
Existential particle (yod shin) with a subject suffix

### Nega 
Negative particle (lamed alef) and
negative existential particle (alef yod nun) without any suffix

### NegS
Negative existential particle (alef yod nun) with a subject suffix

### Subj
Grammatical subject of the predicate of the clause

### Objc 
Grammatical direct object of the predicate of the clause

### Cmpl 
Grammatical non-optional, non-subj, non-obj complement of the predicate of the clause

### Adju
Grammatical optional, non-subj, non-obj complement of the predicate of the clause, e.g. manner, goal.

### Loca
Specific case of ``Adju``, indicating location.
If a location happens to be non-optional, it is marked as a ``Cmpl``.

### Time 
Specific case of ``Adju``, indicating time.

### Pred 
Verbal predicate of the clause

### PreO 
Verbal predicate with object suffix

### PreC 
Nominal predicate

### PtcO 
Nominal predicate (in the shape of a participle) with object suffix

### PtSp

Participle with a possessive suffix.

### Rela
Relative pronoun or article followed by predication.

### Supp
Supplementary constituent. 
Dativus ethicus. 
(lamed plus pronominal suffix referring to the subject of the clause).

### Frnt
Extrapolated element, which is resumed in the following clause by an explicit constituent.
See ``Resu`` in clause_constituent_relation.

### Voct
Vocative, usually in its own clause (without predicate).

### Conj
Conjunction between clauses, only the single word.

### Unkn
Unparsed. 
This means that the phrase in question has not been assigned a parse label yet.

# Phrases and mothers

In [7]:
for itype in ('sentence', 'sentence_atom', 'clause', 'clause_atom', 'phrase', 'phrase_atom', 'subphrase', 'word'):
    msg("Counting {}s and their mothers".format(itype))
    mother = {}
    np = 0
    for node in NN(test=F.otype.v, value=itype):
        np += 1
        mothers = list(C.mother.v(node))
        if len(mothers): mother[node] = mothers[0]
    msg("{} {}s have {} mothers".format(np, itype, len(mother)))

 1m 02s Counting sentences and their mothers
 1m 04s 66045 sentences have 0 mothers
 1m 04s Counting sentence_atoms and their mothers
 1m 05s 66701 sentence_atoms have 0 mothers
 1m 05s Counting clauses and their mothers
 1m 06s 87978 clauses have 18580 mothers
 1m 06s Counting clause_atoms and their mothers
 1m 08s 90144 clause_atoms have 89079 mothers
 1m 08s Counting phrases and their mothers
 1m 10s 254664 phrases have 207 mothers
 1m 10s Counting phrase_atoms and their mothers
 1m 12s 267965 phrase_atoms have 13301 mothers
 1m 12s Counting subphrases and their mothers
 1m 14s 112229 subphrases have 55244 mothers
 1m 14s Counting words and their mothers
 1m 17s 426555 words have 0 mothers


## One word phrases

Let us check the sizes (in words) that phrases can have when they have a certain value for *phrase function*.

In [8]:
counts_file = outfile("phrase_counts.txt")

phrase_counts = collections.defaultdict(lambda: collections.defaultdict(int))

for node in NN(test=F.otype.v, value='phrase'):
    pf = F.function.v(node)
    pn = len(has_words[node])
    phrase_counts[pf][pn] += 1

for pf in sorted(phrase_counts):
    counts_file.write("{}\n".format(pf))
    for (pn, occ) in sorted(phrase_counts[pf].items(), key=lambda x: -x[1]):
        counts_file.write("\t{}: {} x\n".format(pn, occ))

In [9]:
counts_file.close()
!cat {my_file('phrase_counts.txt')}

Adju
	2: 3165 x
	3: 2495 x
	1: 1314 x
	4: 745 x
	5: 504 x
	6: 207 x
	7: 153 x
	9: 57 x
	8: 55 x
	10: 44 x
	11: 40 x
	13: 13 x
	15: 11 x
	14: 10 x
	12: 9 x
	16: 6 x
	18: 6 x
	19: 6 x
	17: 5 x
	20: 3 x
	22: 3 x
	21: 2 x
	24: 2 x
	23: 1 x
	33: 1 x
Cmpl
	2: 8641 x
	1: 8498 x
	3: 6455 x
	4: 1948 x
	5: 1026 x
	6: 384 x
	7: 339 x
	8: 193 x
	9: 118 x
	10: 83 x
	11: 63 x
	13: 40 x
	12: 31 x
	14: 25 x
	15: 20 x
	16: 19 x
	18: 12 x
	19: 12 x
	17: 10 x
	20: 8 x
	22: 6 x
	23: 4 x
	21: 3 x
	25: 3 x
	24: 2 x
	27: 2 x
	30: 2 x
	26: 1 x
	29: 1 x
Conj
	1: 45331 x
	2: 923 x
	3: 16 x
	4: 16 x
	5: 1 x
	7: 1 x
	8: 1 x
	9: 1 x
	13: 1 x
EPPr
	1: 4 x
ExsS
	1: 14 x
Exst
	1: 144 x
Frnt
	1: 400 x
	2: 275 x
	3: 135 x
	4: 70 x
	5: 59 x
	6: 23 x
	7: 16 x
	8: 12 x
	9: 12 x
	10: 7 x
	12: 5 x
	13: 5 x
	11: 3 x
	14: 1 x
	15: 1 x
	17: 1 x
	18: 1 x
IntS
	1: 250 x
Intj
	1: 1618 x
	2: 6 x
	3: 3 x
Loca
	3: 921 x
	2: 589 x
	1: 4

**Questions**

* The phrases with phrase_function ``ModS`` having 2 words are interesting.
* Conjunctions between clauses versus between phrases.
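
To follow up on such questions, one could list the phrases that have a given function and a given number of words. The sketch below shows the idea on toy data only: `function_of` and `words_of` are hypothetical dictionaries standing in for `F.function.v` and the `has_words` mapping, and the node numbers are invented.

```python
# Toy stand-ins (all values made up):
# phrase node -> phrase function, and phrase node -> set of word nodes.
function_of = {101: 'ModS', 102: 'ModS', 103: 'Conj', 104: 'Conj'}
words_of = {101: {1}, 102: {2, 3}, 103: {4}, 104: {5, 6}}

def phrases_with(function, size, function_of, words_of):
    """Return the phrase nodes with the given function and word count, sorted."""
    return sorted(
        p for p in function_of
        if function_of[p] == function and len(words_of.get(p, ())) == size
    )

two_word_mods = phrases_with('ModS', 2, function_of, words_of)

# Conjunction phrases of more than one word, for closer inspection.
multi_word_conj = sorted(
    p for p in function_of
    if function_of[p] == 'Conj' and len(words_of[p]) > 1
)
print(two_word_mods, multi_word_conj)
```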

## Clause Constituent Relation

### Subject
``Subj`` *subject clause*.
clause that has the function of subject

### Object
``Objc`` *object clause*.
clause that has the function of object

### Complement
``Cmpl`` *complement clause, but not subject or object*.
clause that has a function of a verb complement, but not subject or object 

### Attributive
``Attr`` *attributive clause*.
clause that has an attributive function (often with a relative pronoun)

### Adjunct
``Adju`` *adjunct clause*.
clauses with additional information, usually without a finite verb

### Predicative clause
``Pred`` *predicative clause*.
clause that has a predicative function

### Coordination
``Coor`` *coordination*.
multiple dependent clauses coordinated (with and, or etc) to each other under the same head (a main clause or a phrase (asher))

### Continuation of the vocative
``CoVo`` *continuation of the vocative*.
clause that follows after a vocative: *Adam*, where are you.

### Resumptive
``Resu`` *clause after an extrapolated fronted element*.
Contains an element that resumes the earlier extrapolated element.
King David, Nathan the prophet spoke severely to *him* [here King David is the *casus pendens* or extrapolated element].

### Regens Rectum
``RgRc`` *Regens rectum (governing governed)*.
You shall reign over the birds and the animals and **all** *that creeps on the face of the earth* [here **all** governs the *reptiles*].

### NA
``NA`` *not marked*.
No clause constituent relation marked, because the clause does not act as a constituent.
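
When printing counts or examples it can help to show a short label next to each code. The following sketch puts the labels from the descriptions above into a dictionary. The names `rela_label` and `describe` are introduced here for illustration only and are not part of LAF-Fabric.

```python
# Short labels taken from the descriptions of the clause constituent relations.
rela_label = {
    'Subj': 'subject clause',
    'Objc': 'object clause',
    'Cmpl': 'complement clause, but not subject or object',
    'Attr': 'attributive clause',
    'Adju': 'adjunct clause',
    'Pred': 'predicative clause',
    'Coor': 'coordination',
    'CoVo': 'continuation of the vocative',
    'Resu': 'clause after an extrapolated fronted element',
    'RgRc': 'regens rectum (governing governed)',
    'NA': 'not marked',
}

def describe(value):
    """Return 'code (label)', or the bare code if no label is known."""
    label = rela_label.get(value)
    return "{} ({})".format(value, label) if label else str(value)

print(describe('Attr'))
```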

## None values

Why so many ``NA`` values? 
Do all clauses have a mother?
Let us check.

In [10]:
n_clauses = 0
mothers_empty = 0
mothers_empty_none = 0
for node in NN(test=F.otype.v, value='clause'):
    n_clauses += 1
    these_mothers = set(C.mother.v(node))
    if len(these_mothers) == 0:
        mothers_empty += 1
        if F.rela.v(node) == 'NA':
            mothers_empty_none += 1
print("{:<20}: {:>6} x\n{:<20}: {:>6} x\n{:<20}: {:>6} x".format(
    'Clauses', n_clauses, 
    'Mother empty', mothers_empty, 
    'Mother empty + NA', mothers_empty_none
))

Clauses             :  87978 x
Mother empty        :  69398 x
Mother empty + NA   :  69398 x


So, indeed, the motherless clauses are exactly the ones that have the value ``NA`` for the *rela* feature.
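
The counts above show that every motherless clause has the value ``NA``. To confirm that the two groups really coincide, the converse should hold as well: no clause with a mother may have ``NA``. A sketch of a check in both directions, on toy data only; `mothers_of` and `rela_of` are hypothetical stand-ins for `C.mother.v` and `F.rela.v`.

```python
# Toy stand-ins (made up): clause node -> list of mother nodes,
# and clause node -> value of the rela feature.
mothers_of = {1: [], 2: [1], 3: [], 4: [3]}
rela_of = {1: 'NA', 2: 'Attr', 3: 'NA', 4: 'Adju'}

motherless = {c for c in mothers_of if len(mothers_of[c]) == 0}
not_marked = {c for c in rela_of if rela_of[c] == 'NA'}

# The two sets coincide only if both differences are empty.
motherless_but_marked = motherless - not_marked
na_with_mother = not_marked - motherless
print(len(motherless_but_marked), len(na_with_mother))
```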

What are motherless clauses? Probably those that are not dependent on another word, clause, or phrase.

Which ones are they?
Maybe they are the clauses that coincide with the sentences they are part of.
Let us check.

In [16]:
len(clauses_of)

0

In [11]:
n_sentences = 0
n_sg_clause = 0
n_sg_clause_none = 0
for sentence in clauses_of:
    n_sentences += 1
    clauses = clauses_of[sentence]
    if len(clauses) == 1:
        n_sg_clause += 1
        ccrs = [F.rela.v(i) for i in clauses]
        if ccrs[0] == 'NA':
            n_sg_clause_none += 1
print("{:<25}: {:>6} x\n{:<25}: {:>6} x\n{:<25}: {:>6} x\n{:<25}: {:>6} x".format(
    'Sentences', n_sentences, 
    'Single clauses', n_sg_clause, 
    'Single clause + NA', n_sg_clause_none,
    'Remaining NA clauses', mothers_empty - n_sg_clause_none,
))
        

Sentences                :      0 x
Single clauses           :      0 x
Single clause + NA       :      0 x
Remaining NA clauses     :  69398 x


Indeed, this accounts for the majority of ``none`` values of the *clause_constituent_relation* feature.
Can we pinpoint the remaining 14,158?

Are all clauses part of a sentence atom?

In [16]:
clause_parents = set()
for node in NN(test=F.shebanq_db_otype.v, value='clause'):
    clause_parents |= set([F.shebanq_db_otype.v(p) for p in C.shebanq_parents_.v(node)])
clause_parents

{'sentence_atom'}

Let us see which patterns of children occur under sentence atoms.
A clause with *clause_constituent_relation* = ``none`` is marked as ``*``.
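
As a toy illustration of this encoding, independent of the database: each sentence atom is represented by the relation values of its clause children, and every unmarked clause becomes ``*`` while every other clause becomes ``C``. The dictionary `atoms` and its contents are made up for this sketch.

```python
import collections

# Toy data (made up): sentence atom -> rela values of its clause children, in order.
atoms = {
    's1': ['none'],
    's2': ['none', 'Attr'],
    's3': ['none'],
    's4': ['Adju', 'none'],
}

def pattern(relas, unmarked='none'):
    """Encode clause children as a string: '*' for unmarked clauses, 'C' otherwise."""
    return ''.join('*' if r == unmarked else 'C' for r in relas)

patterns = collections.Counter(pattern(relas) for relas in atoms.values())

# Most frequent pattern first, ties broken alphabetically by pattern.
for (pat, occ) in sorted(patterns.items(), key=lambda x: (-x[1], x[0])):
    print("{:>5} x: {}".format(occ, pat))
```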

In [17]:
abbrev = {
    'sentence': 'S',
    'sentence_atom': 's',
    'clause': 'C',
    'clause_atom': 'c',
    'phrase': 'P',
    'phrase_atom': 'p',
    'subphrase': 'b',
    'word': 'w',
}

patterns = collections.defaultdict(int)

for node in NN(test=F.shebanq_db_otype.v, value='sentence_atom'):
    patterns[''.join([
        abbrev[F.shebanq_db_otype.v(c)]
            if F.shebanq_db_otype.v(c) != 'clause' or F.shebanq_ft_clause_constituent_relation.v(c) != 'none'
            else '*' for c in Ci.shebanq_parents_.v(node)
    ])] += 1

for (pat, occ) in sorted(patterns.items(), key=lambda x: (-x[1], x[0])):
    print("{:>5} x: {}".format(occ, pat))

61256 x: *
 4945 x: *C
 1130 x: **
  947 x: C*
  916 x: *CC
  305 x: *C*
  193 x: **C
  188 x: C*C
  179 x: CC*
  174 x: *CCC
  116 x: *CC*
   87 x: ***
   79 x: C**
   68 x: *C*C
   63 x: C
   59 x: *CCCC
   46 x: **CC
   45 x: CC*C
   44 x: CCC*
   43 x: C*C*
   42 x: C*CC
   32 x: *C**
   28 x: C**C
   28 x: CC**
   25 x: ****
   24 x: *CC*C
   24 x: *CCC*
   20 x: CC*CC
   19 x: C*CCC
   19 x: CC
   19 x: CCCC*
   17 x: **C*
   16 x: *C*CC
   16 x: *CCCC*
   15 x: ***C
   15 x: *CCCCC
   14 x: **CCC
   14 x: C***
   14 x: CCC*C
   11 x: CCC**
   10 x: *CCC*C
   10 x: CCC
    9 x: **CCCC
    9 x: C*C*C
    8 x: ***CC
    8 x: *CCCCCC
    8 x: CC*C*
    7 x: ***C*C*
    7 x: **C*C
    7 x: *C**C
    7 x: *C*C*
    7 x: *C*CC*
    7 x: *C*CCC
    7 x: *CC**
    7 x: *CC*C*
    7 x: *CC*CC
    7 x: C*CC*
    7 x: CC**C
    6 x: *CC**C
    6 x: *CCCC*C
    6 x: C**CC
    6 x: C*CCCC
    5 x: CC*C**
    4 x: *****
    4 x: ******
    4 x: *******
    4 x: **C*CC
    4 x: **CC*
    4 x: *

Hmm.

# Examples

Let us compile and print out the examples.

Here we use the information we collected when we walked the trees, above.

For each example we print two lines:

    value of clause constituent relation    passage (verse label)    the words of the clause in question
        the words of the whole sentence in which the clause occurs

In [18]:
for otype in sorted(examples):
    print("{}".format(otype))
    for feature in sorted(examples[otype]):
        print("{}".format(feature))
        for value in sorted(examples[otype][feature]):
            print("{}".format(value))
            for node in examples[otype][feature][value]:
                sentence = to_sentence[node]
                my_words = sorted(has_words[node])
                swords = sorted(has_words[sentence])
                vlabel = sentence_verse[sentence]
                print("\t{:<10}: {}\n\t{:<16}{}".format(
                    vlabel, 
                    " ".join([F.shebanq_ft_text.v(word) for word in my_words]),
                    '',
                    " ".join([F.shebanq_ft_text.v(word) for word in swords]),
                ))

clause
clause_constituent_relation
Adju
	 GEN 01,16: לְ הָאִ֖יר עַל הָ אָֽרֶץ
	                וַ יִּתֵּ֥ן אֹתָ֛ם אֱלֹהִ֖ים בִּ רְקִ֣יעַ הַ שָּׁמָ֑יִם לְ הָאִ֖יר עַל הָ אָֽרֶץ וְ לִ מְשֹׁל֙ בַּ  יֹּ֣ום וּ בַ  לַּ֔יְלָה וּֽ לֲ הַבְדִּ֔יל בֵּ֥ין הָ אֹ֖ור וּ בֵ֣ין הַ חֹ֑שֶׁךְ
	 GEN 01,15: לְ הָאִ֖יר עַל הָ אָ֑רֶץ
	                וְ הָי֤וּ לִ מְאֹורֹת֙ בִּ רְקִ֣יעַ הַ שָּׁמַ֔יִם לְ הָאִ֖יר עַל הָ אָ֑רֶץ
	 GEN 01,14: לְ הַבְדִּ֕יל בֵּ֥ין הַ יֹּ֖ום וּ בֵ֣ין הַ לָּ֑יְלָה
	                יְהִ֤י מְאֹרֹת֙ בִּ רְקִ֣יעַ הַ שָּׁמַ֔יִם לְ הַבְדִּ֕יל בֵּ֥ין הַ יֹּ֖ום וּ בֵ֣ין הַ לָּ֑יְלָה
Attr
	 GEN 01,07: אֲשֶׁ֖ר מֵ עַ֣ל לָ  רָקִ֑יעַ
	                וַ יַּבְדֵּ֗ל בֵּ֤ין הַ מַּ֨יִם֙ אֲשֶׁר֙ מִ תַּ֣חַת לָ  רָקִ֔יעַ וּ בֵ֣ין הַ מַּ֔יִם אֲשֶׁ֖ר מֵ עַ֣ל לָ  רָקִ֑יעַ
	 GEN 01,11: מַזְרִ֣יעַ זֶ֔רַע
	                תַּֽדְשֵׁ֤א הָ אָ֨רֶץ֙ דֶּ֔שֶׁא עֵ֚שֶׂב מַזְרִ֣יעַ זֶ֔רַע עֵ֣ץ פְּרִ֞י עֹ֤שֶׂה פְּרִי֙ לְ מִינֹ֔ו אֲשֶׁ֥ר זַרְעֹו בֹ֖ו עַל הָ אָ֑רֶץ
	 GEN 01,07: אֲשֶׁר֙ מִ תַּ֣חַת לָ  רָקִ֔יעַ
	            

In [19]:
close()

 1m 48s Results directory:
/Users/dirk/laf-fabric-data/bhs3/tasks/clauses_phrase_types

__log__clauses_phrase_types.txt         1210 Tue May 13 08:59:48 2014
counts.txt                             1995 Tue May 13 08:58:38 2014
phrase_counts.txt                      3004 Tue May 13 08:59:10 2014


# Domain and text type

Let us have a look at the domain feature.

In [26]:
Fotypev = F.otype.v
Fdomainv = F.domain.v
Ftexttypev = F.text_type.v
distribd = collections.defaultdict(lambda: collections.defaultdict(lambda: 0))
distribt = collections.defaultdict(lambda: collections.defaultdict(lambda: 0))
for n in NN():
    otype = Fotypev(n)
    domain = Fdomainv(n)
    ttype = Ftexttypev(n)
    distribd[domain][otype] += 1
    distribt[ttype][otype] += 1
for d in distribd:
    print("{}".format(d))
    for otype in distribd[d]:
        print("\t{}: {}x".format(otype, distribd[d][otype]))
for t in distribt:
    print("{}".format(t))
    for otype in distribt[t]:
        print("\t{}: {}x".format(otype, distribt[t][otype]))

Narrative
	clause: 21710x
Discursive
	clause: 4138x
None
	book: 39x
	clause_atom: 90061x
	phrase: 257109x
	subphrase: 109536x
	sentence_atom: 71727x
	word: 426499x
	half_verse: 44683x
	chapter: 929x
	phrase_atom: 269638x
	verse: 23213x
	sentence: 71354x
Unknown
	clause: 32993x
Quotation
	clause: 29546x
QQQQN
	clause: 3x
?QNQN
	clause: 1x
?QNQQ
	clause: 23x
NQQN
	clause: 240x
NQQQNQ
	clause: 38x
DN
	clause: 6x
None
	book: 39x
	clause_atom: 90061x
	phrase: 257109x
	subphrase: 109536x
	sentence_atom: 71727x
	word: 426499x
	half_verse: 44683x
	chapter: 929x
	phrase_atom: 269638x
	verse: 23213x
	sentence: 71354x
?QQQQ
	clause: 92x
NDQN
	clause: 8x
ND
	clause: 816x
NNN
	clause: 4x
NQQQQQ
	clause: 27x
NQQ
	clause: 4119x
?N
	clause: 1x
QQQQ
	clause: 38x
QNDQ
	clause: 24x
NQQQQ
	clause: 186x
?QQQQQQ
	clause: 1x
Q
	clause: 1821x
DQN
	clause: 11x
NQQQNQN
	clause: 3x
N
	clause: 19430x
?QQQQQ
	clause: 33x
DQQ
	clause: 51x
NQQNQN
	clause: 2x
QNQQ
	clause: 2x
?QQNQQQ
	clause: 11x
?QQNQ
	clause: 6x
NQ