# [Best viewed in NBviewer](https://nbviewer.jupyter.org/github/ETCBC/heads/blob/master/tutorial.ipynb)

# Heads Tutorial


## Introduction 

This notebook provides a basic introduction to the `heads` edge and node features for BHSA, produced in `etcbc/heads`. Syntactic phrase heads are important because they provide the semantic core for a phrase. To give a simple example:

> a good man

In this phrase, the word "man" serves as the phrase head. The word "good" is an adjective, and "a" is an indefinite article. While important, "good" and "a" are optional modifiers whose influence is mostly local to the phrase and to their relation to "man." The phrase head, by comparison, has a stronger influence on the surrounding syntactic-semantic environment. For instance:

> A good man walks.

The use of the subject phrase head "man" coincides with the use of the verb phrase "walks." This is possible because "man" has the semantic property of "things that walk," or perhaps animacy. Having heads for Hebrew phrases in BHSA enables more sophisticated analyses such as [participant tracking](https://en.wikipedia.org/wiki/Named-entity_recognition) and [word embeddings](https://en.wikipedia.org/wiki/Word_embedding) (see use case [here](https://nbviewer.jupyter.org/github/codykingham/noun_semantics_SBL18/blob/master/analysis/noun_semantics.ipynb)). 

## Phrase Types and Their Heads

Different phrases in BHSA have different type values (BHSA: `typ`, [see bottom here](https://etcbc.github.io/bhsa/features/hebrew/c/typ)). A type value is derived from the part of speech of the phrase's head. Below are some example type values and expected parts of speech in BHSA:

| type               | BHSA `typ` | head part of speech |
|--------------------|------------|---------------------|
| verb phrase        | VP   | verb                |
| noun phrase        | NP   | noun (BHSA: subs)   |
| preposition phrase | PP   | preposition         |
| conjunction phrase | CP   | conjunction         |
| adverb phrase      | AdvP | adverb              |
| adjective phrase   | AdjP | adjective           |

Additional phrase types and parts of speech can be browsed [here](https://etcbc.github.io/bhsa/features/hebrew/c/typ) and [here](https://etcbc.github.io/bhsa/features/hebrew/c/pdp.html). To give an example: in the phrase בראשית, the head is simply ב, since this phrase is a prepositional phrase (`PP`). The `heads` data is calculated for phrases using these correspondences, so it's important to keep them in mind when searching for certain kinds of data. For cases where one wants to isolate ראשית (and ignore any potential modifying elements), another feature called `nhead` (nominal head) is available.
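The correspondences in the table can be written down as a small lookup. This is a minimal sketch and not part of `etcbc/heads`: the name `expected_head_pdp` is hypothetical, and the part-of-speech codes are assumed to be the BHSA `pdp` abbreviations for each category.

```python
# Expected head part of speech per phrase type, following the table above.
# The codes on the right are assumed to be BHSA's pdp abbreviations.
TYP_TO_HEAD_PDP = {
    'VP': 'verb',
    'NP': 'subs',
    'PP': 'prep',
    'CP': 'conj',
    'AdvP': 'advb',
    'AdjP': 'adjv',
}


def expected_head_pdp(typ):
    """Return the expected head part of speech, or None if typ is not listed."""
    return TYP_TO_HEAD_PDP.get(typ)


print(expected_head_pdp('PP'))  # prep
```

A lookup like this only covers the phrase types listed in the table; other `typ` values return `None`.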

## Dataset

The `heads` dataset can be downloaded and used with Text-Fabric's `BHSA` app by passing an additional argument to the `use` function: `mod='etcbc/heads/tf'`. This is not a local directory; it gives TF the coordinates for finding the latest data on GitHub. The coordinates are organized as `organization/repository/tf`.

In [2]:
from tf.app import use

A = use('bhsa', mod='etcbc/heads/tf', hoist=globals())

Using etcbc/bhsa/tf - c r1.5 in /Users/cody/text-fabric-data
Using etcbc/phono/tf - c r1.2 in /Users/cody/text-fabric-data
Using etcbc/parallels/tf - c r1.2 in /Users/cody/text-fabric-data
Using etcbc/heads/tf - c rv1.02 in /Users/cody/text-fabric-data


**Documentation:** [BHSA](https://etcbc.github.io/bhsa), [Character table](https://dans-labs.github.io/text-fabric/Writing/Hebrew), [Feature docs](https://etcbc.github.io/bhsa/features/hebrew/c/0_home.html), [bhsa API](https://dans-labs.github.io/text-fabric/Apps/Bhsa/), [Text-Fabric API 7.2.1](https://dans-labs.github.io/text-fabric/Api/General/), [Search Reference](https://dans-labs.github.io/text-fabric/Use/Search/)

Four primary features are available in `heads`:

* `head` - an edge feature from a syntactic phrase head to its containing phrase node
* `obj_prep` - an edge feature from an object of a preposition to its governing preposition
* `nhead` - an edge feature from a nominal syntactic phrase head to its containing phrase node; handy for prepositional phrases that contain nominal elements
* `sem_set` - a semantic set feature which contains the following feature values:
    * `quant` - enhanced quantifier sets
    * `prep` - enhanced preposition sets

## Getting a Phrase Head

The feature `head` is an edge feature from a word node to the phrase for which it contributes headship. Let's have a look at heads in Genesis 1 with a simple query and display:

In [3]:
# configure TF display
A.displaySetup(condenseType='phrase', condensed=True)

# search for heads in Genesis 1:1 (verse<2 selects only the first verse)
genesis_heads = '''

book book@en=Genesis
    chapter chapter=1
        verse verse<2
            phrase
            <head- word

'''

genesis_heads = A.search(genesis_heads)

A.show(genesis_heads)

  0.84s 5 results




*[Output: rendered display of phrases 1-4]*



Note how the selected heads agree with their phrase types: prepositional phrases have preposition heads, noun phrases have noun heads (`subs`), etc. Note also how the `head` feature encodes multiple heads in cases where there are coordinated heads.

Below we find cases where there are at least 3 coordinated heads in a phrase.

In [4]:
three_heads = '''

p:phrase
    w1:word
    < w2:word
    < w3:word

p <head- w1
p <head- w2
p <head- w3
'''

three_heads = A.search(three_heads)

A.show(three_heads, end=10, withNodes=True)

  3.77s 11285 results




*[Output: rendered display of phrases 1-10]*



### Hand Coding Heads

Heads features can also be accessed by hand with the `E` (edge) and `F` (node feature) classes. To go from a head word to its phrase, call `E.head.f` on the word node; `.f` follows edges *from* that node.

In [5]:
T.text(1)

'בְּ'

In [6]:
E.head.f(1)

(651542,)

In [7]:
T.text(E.head.f(1)[0])

'בְּרֵאשִׁ֖ית '

The heads can also be located from the other direction by calling `E.head.t` on the phrase node; `.t` follows edges *to* that node:

In [8]:
E.head.t(651542)

(1,)

In [9]:
T.text(E.head.t(651542)[0])

'בְּ'
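The two directions can be pictured with a toy model. This is only a sketch of the lookup behaviour, not Text-Fabric's implementation: the class `ToyEdge` is hypothetical, the edge from word 1 to phrase 651542 is taken from the cells above, and 999999 is a placeholder phrase id used to show a phrase with several heads.

```python
from collections import defaultdict


class ToyEdge:
    """Toy stand-in for an edge feature; not Text-Fabric's own class."""

    def __init__(self, edges):
        self._from = defaultdict(list)  # source node -> target nodes
        self._to = defaultdict(list)    # target node -> source nodes
        for source, target in edges:
            self._from[source].append(target)
            self._to[target].append(source)

    def f(self, node):
        """Nodes reached by edges that start at `node`."""
        return tuple(self._from.get(node, ()))

    def t(self, node):
        """Nodes whose edges arrive at `node`."""
        return tuple(self._to.get(node, ()))


# word 1 -> phrase 651542 comes from the notebook output;
# 999999 is a placeholder phrase id, not a real node.
head = ToyEdge([(1, 651542), (185, 999999), (186, 999999), (189, 999999)])

print(head.f(1))       # (651542,)
print(head.t(651542))  # (1,)
print(head.t(999999))  # (185, 186, 189)
print(head.f(651542))  # () since no head edge starts at the phrase
```

Both lookups return tuples, even when there is only one edge or none at all.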

Note that an edge lookup always returns a tuple, because a node can have multiple edges. This is valuable when you want to find a phrase with multiple heads, as we did in the template above:

In [10]:
example = three_heads[0][0]

E.head.t(example)

(185, 186, 189)

In [11]:
A.prettyTuple((example,) + E.head.t(example), seqNumber=0, withNodes=True)



*[Output: rendered display of result 0]*
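The template search for coordinated heads can also be approximated by hand. The sketch below assumes only that a lookup such as `E.head.t` returns a tuple of head words for a phrase. The name `phrases_with_min_heads` is hypothetical, and the stub data uses made-up node numbers so that the block runs without a loaded corpus.

```python
def phrases_with_min_heads(phrases, heads_of, minimum=3):
    """Return (phrase, heads) pairs for phrases with at least `minimum` heads."""
    selected = []
    for phrase in phrases:
        heads = tuple(heads_of(phrase))
        if len(heads) >= minimum:
            selected.append((phrase, heads))
    return selected


# Stub lookup with made-up node numbers, standing in for E.head.t.
stub_heads = {10: (1,), 11: (2, 3), 12: (4, 5, 6)}
found = phrases_with_min_heads(stub_heads, lambda p: stub_heads.get(p, ()))
print(found)  # [(12, (4, 5, 6))]
```

With the corpus loaded as above, the call would look like `phrases_with_min_heads(F.otype.s('phrase'), E.head.t)`. This returns each phrase once, whereas the template can yield more than one result for a phrase with more than three heads, since every ordered combination of three heads matches.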



## Getting an Object of a Preposition

Often one wants to find objects of prepositions without accidentally selecting secondary, modifying elements, and without omitting coordinated objects. This is now possible with the new `obj_prep` edge feature. A few examples are provided below. In the first, `E.obj_prep.f` is called on word 23 (the object) and returns word 22 (its governing preposition). In the displays, we highlight prepositions in blue and their objects in pink.

In [12]:
E.obj_prep.f(23)

(22,)

In [13]:
find_objs = '''

phrase
    word
    <obj_prep- word

'''

find_objs = A.search(find_objs)

highlights = {}
for result in find_objs:
    highlights[result[1]] = 'lightblue'
    highlights[result[2]] = 'pink'
    
A.show(find_objs, highlights=highlights, end=10)

  1.49s 64233 results




*[Output: rendered display of phrases 1-10]*



Note that prepositional terms such as פני and תוך are properly treated as prepositions. This is due to a new custom set of prepositional words, which was needed when processing the `heads` and `obj_prep` features. The set is made available in the `sem_set` feature, which has two values: `prep` and `quant`.

Below are cases of `prep` that are marked in BHSA as `subs` (noun):

In [20]:
sem_preps = A.search('''

word pdp=subs sem_set=prep

''')

A.show(sem_preps, end=10, extraFeatures={'sem_set'})

  0.53s 2069 results




*[Output: rendered display of phrases 1-10]*
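When hand-coding, the same distinction can be wrapped in a small check. This is a sketch under stated assumptions: the name `is_prepositional` is hypothetical, and because this tutorial does not say whether `sem_set=prep` already covers every word with `pdp=prep`, both conditions are tested. The stub values are made up so the block runs on its own.

```python
def is_prepositional(word, pdp_of, sem_set_of):
    """True if a word is a preposition by part of speech or by semantic set."""
    return pdp_of(word) == 'prep' or sem_set_of(word) == 'prep'


# Stub feature values with made-up node numbers,
# standing in for F.pdp.v and F.sem_set.v.
stub_pdp = {1: 'prep', 2: 'subs', 3: 'subs'}
stub_sem_set = {2: 'prep'}

for word in (1, 2, 3):
    print(word, is_prepositional(word, stub_pdp.get, stub_sem_set.get))
```

With the corpus loaded, the lookups would be passed as `is_prepositional(word, F.pdp.v, F.sem_set.v)`.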



Returning to `obj_prep`, let's find cases where a preposition has more than one object. We do this with a hand-coded method this time. NB that only the cases of multiple objects of prepositions are highlighted.

In [15]:
results = []
highlights = {}

for prep in F.sem_set.s('prep'):
    
    objects = E.obj_prep.t(prep)

    if len(objects) > 1:
        phrase = L.u(prep, 'phrase')[0]
        results.append((phrase, prep)+objects)
        
        # update highlights
        highlights[prep] = 'lightblue'
        highlights.update({obj:'pink' for obj in objects})
    
A.show(results, highlights=highlights, end=10, withNodes=True)



*[Output: rendered display of phrases 1-10]*
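The same loop can be turned into a quick overview of how many objects prepositions govern. This is a minimal sketch: the name `object_count_distribution` is hypothetical, and the stub lookup stands in for `E.obj_prep.t`. Apart from the pair 22/23 seen earlier, the node numbers are made up.

```python
from collections import Counter


def object_count_distribution(preps, objects_of):
    """Count how many prepositions govern 0, 1, 2, ... objects."""
    return Counter(len(objects_of(prep)) for prep in preps)


# Stub lookup standing in for E.obj_prep.t; 40, 41, 43 and 50 are made up.
stub_objects = {22: (23,), 40: (41, 43), 50: ()}
distribution = object_count_distribution(stub_objects, lambda p: stub_objects[p])
print(sorted(distribution.items()))  # [(0, 1), (1, 1), (2, 1)]
```

With the corpus loaded, the call would look like `object_count_distribution(F.sem_set.s('prep'), E.obj_prep.t)`.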



## Getting Nominal Heads

There are cases where it is beneficial to simply select any nominal head elements from underneath prepositional phrases ("nominal" here meaning any non-prepositional head). This is especially relevant when prepositional phrases are chained together, and the nominal element is difficult to recover. For these cases there is the feature `nhead`. Below we find such cases with a simple search:

In [16]:
find_nheads = A.search('''

p:phrase typ=PP
/with/
    =: word sem_set=prep
    <: word sem_set=prep
/-/
    < w1:word
    < w2:word

p <nhead- w1
p <nhead- w2
''')

A.show(find_nheads, end=10, withNodes=True)

  2.52s 177 results




*[Output: rendered display of phrases 1-10]*
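To see what `nhead` adds for a given phrase, the two edge features can be compared side by side. This is a sketch only: the name `compare_heads` is hypothetical, and the stub values are illustrative node numbers that were not queried from the data.

```python
def compare_heads(phrase, head_of, nhead_of):
    """Return the head and nhead words of a phrase, plus those unique to nhead."""
    heads = tuple(head_of(phrase))
    nheads = tuple(nhead_of(phrase))
    return {
        'head': heads,
        'nhead': nheads,
        'only_in_nhead': tuple(n for n in nheads if n not in heads),
    }


# Illustrative stubs standing in for E.head.t and E.nhead.t.
stub_head = {100: (1,)}
stub_nhead = {100: (2,)}
comparison = compare_heads(
    100,
    lambda p: stub_head.get(p, ()),
    lambda p: stub_nhead.get(p, ()),
)
print(comparison)
```

With the corpus loaded, the call would look like `compare_heads(phrase, E.head.t, E.nhead.t)` for any phrase node.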



## Using Enhanced Prepositions and Quantifiers

The `sem_set` feature brings enhanced preposition and quantifier semantic data, which was used in the calculations of `heads` features. We have already seen `prep` in action. Let's have a look at quantifiers.

In the BHSA base dataset, quantifiers can only be identified through the `ls` (lexical set) feature, which has the value `card` for cardinal numbers. Other kinds of quantifiers are not identifiable. Let's have a look at what kinds of quantifiers `sem_set` = `quant` makes available...

In [28]:
sem_quants = A.search('''

phrase
    word ls#card sem_set=quant

''')

A.show(sem_quants, end=5, extraFeatures={'sem_set'})

  0.90s 6041 results




*[Output: rendered display of phrases 1-5]*



These are all cases of כל. Let's find other kinds. We add a shuffle to randomize them as well. 

In [29]:
import random

In [31]:
nonKL = [result for result in sem_quants if F.lex.v(result[1]) != 'KL/']

random.shuffle(nonKL)

for i, result in enumerate(nonKL[:10]):
    A.prettyTuple(result, extraFeatures={'sem_set'}, seqNumber=i)



*[Output: rendered display of results 0-9]*
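Instead of sampling at random, the quantifier lexemes can also be tallied to see which ones occur and how often. This is a minimal sketch: the name `tally_lexemes` is hypothetical, and the stub results use made-up node numbers and a placeholder lexeme `OTHER` alongside `KL/`.

```python
from collections import Counter


def tally_lexemes(results, lex_of, position=1):
    """Count lexemes of the word at `position` in each result, most frequent first."""
    counts = Counter(lex_of(result[position]) for result in results)
    return counts.most_common()


# Stub results shaped like (phrase, word) tuples, with made-up node numbers.
stub_results = [(100, 1), (101, 2), (102, 3)]
stub_lex = {1: 'KL/', 2: 'KL/', 3: 'OTHER'}
print(tally_lexemes(stub_results, stub_lex.get))  # [('KL/', 2), ('OTHER', 1)]
```

With the corpus loaded, the call would look like `tally_lexemes(sem_quants, F.lex.v)`.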

