# SemHyP (Semantic Hypergraph Parser)

## Module for semantic hypergraph parsing from textual annotations

This notebook demonstrates how to use the Semantic Hypergraph Parser module `semhyp`. The following functionality is covered:

1. Reading annotated text
2. Visualizing annotated text
3. Parsing annotated text into semantic hypergraph
4. Visualizing semantic hypergraph

In [1]:
import semhyp

## 1. Reading annotated text

An example of an annotated sentence in CoNLL-X format is provided below:

```
# Patrick knew about IBM's plans, but he was ready to ignore them.
0 0  Patrick + Patrick PROPN NNP nsubj 1  B-PERSON -         B-ARG0 O      O      B-MAIN1 O       -
0 1  knew    + know    VERB  VBD ROOT  1  O        know.01   B-V    O      O      O       O       know.v.01
0 2  about   + about   ADP   IN  prep  1  O        -         B-ARG1 O      O      O       O       -
0 3  IBM     - IBM     PROPN NNP poss  5  B-ORG    -         I-ARG1 O      O      O       B-MAIN2 -
0 4  's      + 's      PART  POS case  3  O        -         I-ARG1 O      O      O       I-MAIN2 -
0 5  plans   - plan    NOUN  NNS pobj  2  O        -         I-ARG1 O      O      O       I-MAIN2 plan.n.01
0 6  ,       + ,       PUNCT ,   punct 1  O        -         O      O      O      O       O       -
0 7  but     + but     CCONJ CC  cc    1  O        -         O      O      O      O       O       -
0 8  he      + he      PRON  PRP nsubj 9  O        -         O      B-ARG1 B-ARG0 B-REF1  O       -
0 9  was     + be      AUX   VBD conj  1  O        be.01     O      B-V    O      O       O       -
0 10 ready   + ready   ADJ   JJ  acomp 9  O        -         O      B-ARG2 O      O       O       ready.a.01
0 11 to      + to      PART  TO  aux   12 O        -         O      I-ARG2 O      O       O       -
0 12 ignore  + ignore  VERB  VB  xcomp 10 O        ignore.01 O      I-ARG2 B-V    O       O       ignore.v.01
0 13 them    - they    PRON  PRP dobj  12 O        -         O      I-ARG2 B-ARG1 O       B-REF2  -
0 14 .       + .       PUNCT .   punct 9  O        -         O      O      O      O       O       -
```

Each row represents a token in the sentence with various annotations. The columns are as follows:

1. **Sentence ID**: The ID of the sentence in the document (0 for the first sentence).
2. **Token ID**: The position of the token in the sentence.
3. **Token**: The original token in the sentence.
4. **Space**: Whether the token is followed by a space (`+`) or not (`-`).
5. **Lemma**: The base form of the token.
6. **POS**: Part of Speech tag.
7. **Detailed POS**: More specific POS tag.
8. **Dependency Relation**: Syntactic dependency relation between the token and its head token.
9. **Head ID**: The ID of the head token in the dependency tree.
10. **NER**: Named Entity Recognition tag.
11. **Roleset**: PropBank-style roleset of the predicate (for example `know.01`), or `-` if the token has none.
12. **SRL**: Semantic role labels for the verb "knew".
13. **SRL**: Semantic role labels for the verb "was".
14. **SRL**: Semantic role labels for the verb "ignore".
15. **Coreference**: Coreference chain linking "he" (`REF1`) to its antecedent "Patrick" (`MAIN1`).
16. **Coreference**: Coreference chain linking "them" (`REF2`) to its antecedent "IBM's plans" (`MAIN2`).
17. **WordNet**: WordNet sense.
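
To make the column layout concrete, the following is a minimal sketch that splits such rows into dictionaries and rebuilds the sentence text from the space flags. It does not use `semhyp`; the key names in `FIXED_COLUMNS` and the helpers `parse_row`, `parse_text`, and `detokenize` are made up for this illustration. Because the number of SRL and coreference columns differs between sentences, the sketch keeps them together in an `extra` list rather than guessing where one group ends.

```python
# Keys for the fixed leading columns, named after the list above.
# These names are illustrative and are not taken from semhyp itself.
FIXED_COLUMNS = [
    "sent_id", "token_id", "token", "space", "lemma", "pos",
    "tag", "dep", "head_id", "ner", "roleset",
]

# First seven rows of the example sentence.
SAMPLE = """\
# Patrick knew about IBM's plans, but he was ready to ignore them.
0 0  Patrick + Patrick PROPN NNP nsubj 1  B-PERSON -         B-ARG0 O      O      B-MAIN1 O       -
0 1  knew    + know    VERB  VBD ROOT  1  O        know.01   B-V    O      O      O       O       know.v.01
0 2  about   + about   ADP   IN  prep  1  O        -         B-ARG1 O      O      O       O       -
0 3  IBM     - IBM     PROPN NNP poss  5  B-ORG    -         I-ARG1 O      O      O       B-MAIN2 -
0 4  's      + 's      PART  POS case  3  O        -         I-ARG1 O      O      O       I-MAIN2 -
0 5  plans   - plan    NOUN  NNS pobj  2  O        -         I-ARG1 O      O      O       I-MAIN2 plan.n.01
0 6  ,       + ,       PUNCT ,   punct 1  O        -         O      O      O      O       O       -
"""


def parse_row(line):
    """Split one whitespace-separated annotation row into a dict."""
    fields = line.split()
    row = dict(zip(FIXED_COLUMNS, fields))
    for key in ("sent_id", "token_id", "head_id"):
        row[key] = int(row[key])
    rest = fields[len(FIXED_COLUMNS):]
    row["synset"] = rest[-1]  # the last column holds the WordNet sense
    row["extra"] = rest[:-1]  # SRL and coreference columns; their number varies
    return row


def parse_text(text):
    """Parse every non-empty line that is not a '#' comment."""
    return [
        parse_row(line)
        for line in text.splitlines()
        if line.strip() and not line.startswith("#")
    ]


def detokenize(rows):
    """Rebuild the surface text from tokens and their '+'/'-' space flags."""
    pieces = (r["token"] + (" " if r["space"] == "+" else "") for r in rows)
    return "".join(pieces).rstrip()


rows = parse_text(SAMPLE)
print(detokenize(rows))
```

In practice `semhyp.read` takes care of this step; the sketch only shows how the columns relate to each other.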



In [2]:
FILENAME = "dataset/patrick.txt"

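# Read the raw annotation file; the context manager closes the file afterwards
with open(FILENAME, encoding="utf-8") as f:
    annotated_text = f.read()
annotated_text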

"# Patrick knew about IBM's plans, but he was ready to ignore them.\n0 0  Patrick + Patrick PROPN NNP nsubj 1  B-PERSON -         B-ARG0 O      O      B-MAIN1 O       -\n0 1  knew    + know    VERB  VBD ROOT  1  O        know.01   B-V    O      O      O       O       know.v.01\n0 2  about   + about   ADP   IN  prep  1  O        -         B-ARG1 O      O      O       O       -\n0 3  IBM     - IBM     PROPN NNP poss  5  B-ORG    -         I-ARG1 O      O      O       B-MAIN2 -\n0 4  's      + 's      PART  POS case  3  O        -         I-ARG1 O      O      O       I-MAIN2 -\n0 5  plans   - plan    NOUN  NNS pobj  2  O        -         I-ARG1 O      O      O       I-MAIN2 plan.n.01\n0 6  ,       + ,       PUNCT ,   punct 1  O        -         O      O      O      O       O       -\n0 7  but     + but     CCONJ CC  cc    1  O        -         O      O      O      O       O       -\n0 8  he      + he      PRON  PRP nsubj 9  O        -         O      B-ARG1 B-ARG0 B-REF1  O       -\n0 9  w

The `semhyp.read` function reads the annotated sentences from a file. The result is a `doc` object, which is similar to the one used by **spaCy**.

In [3]:
doc = semhyp.read(FILENAME)
doc

Patrick knew about IBM's plans, but he was ready to ignore them. He focused on his own plan, believing his team could still succeed. They just needed to stay on course and work together.

This code snippet demonstrates how to:

1. Iterate over sentences in the `doc` object.
2. Iterate over tokens in each sentence and print their text, lemma, part of speech, dependency relation, head token, roleset and synset.
3. Print named entities in each sentence.
4. Print semantic roles in each sentence.
5. Print coreference chains in each sentence.


In [4]:
# Assuming `doc` is the object returned by `semhyp.read`

# Iterate over sentences in the doc
for sent in doc.sents:
    print("Sentence:", sent.text)
    print("Sentence index:", sent.label)
    
    # Iterate over tokens in the sentence
    for token in sent:
        print(f" Token: {token.text}, Index: {token.i}, Lemma: {token.lemma}, POS: {token.pos}, Dependency: {token.dep}, Head: {token.head.text}, Roleset: {token.roleset}, Synset: {token.synset}")

    # Print named entities in the sentence
    for ent in sent.ent:
        print(f" Entity: {ent.text}, Label: {ent.label}")

    # Print semantic roles in the sentence
    for verb_token in sent.srl:
        print(f" Verb: {verb_token.text}")
        for arg in sent.srl[verb_token]:
            print(f"  Argument: {arg.text}, Label: {arg.label}")

    # Print coreference chains in the sentence
    for chain in sent.coref:
        print(f" Coreference chain: {chain}")
        for mention in sent.coref[chain]:
            print(f"  Mention: {mention.text}, Label: {mention.label}")
    print("\n")

Sentence: Patrick knew about IBM's plans, but he was ready to ignore them.
Sentence index: 0
 Token: Patrick, Index: 0, Lemma: Patrick, POS: PROPN, Dependency: nsubj, Head: knew, Roleset: None, Synset: None
 Token: knew, Index: 1, Lemma: know, POS: VERB, Dependency: ROOT, Head: knew, Roleset: know.01, Synset: know.v.01
 Token: about, Index: 2, Lemma: about, POS: ADP, Dependency: prep, Head: knew, Roleset: None, Synset: None
 Token: IBM, Index: 3, Lemma: IBM, POS: PROPN, Dependency: poss, Head: plans, Roleset: None, Synset: None
 Token: 's, Index: 4, Lemma: 's, POS: PART, Dependency: case, Head: IBM, Roleset: None, Synset: None
 Token: plans, Index: 5, Lemma: plan, POS: NOUN, Dependency: pobj, Head: about, Roleset: None, Synset: plan.n.01
 Token: ,, Index: 6, Lemma: ,, POS: PUNCT, Dependency: punct, Head: knew, Roleset: None, Synset: None
 Token: but, Index: 7, Lemma: but, POS: CCONJ, Dependency: cc, Head: knew, Roleset: None, Synset: None
 Token: he, Index: 8, Lemma: he, POS: PRON, Dep
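
The entities, semantic roles, and coreference mentions that the loop above iterates over are stored in the annotation file as `B-`/`I-`/`O` tags, one tag per token. As a rough sketch of how such tags can be grouped into spans (independent of `semhyp`; the helper name `bio_spans` is hypothetical), consider:

```python
def bio_spans(tokens, tags):
    """Group B-/I-/O tags into (label, [tokens]) spans."""
    spans = []
    label, span_tokens = None, []

    def flush():
        # Store the span collected so far, if there is one.
        if span_tokens:
            spans.append((label, list(span_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and tag[2:] == label:
            span_tokens.append(token)  # continue the open span
        elif tag.startswith(("B-", "I-")):
            flush()  # a new span starts here
            label, span_tokens = tag[2:], [token]
        else:  # "O": outside any span
            flush()
            label, span_tokens = None, []
    flush()
    return spans


tokens = ["Patrick", "knew", "about", "IBM", "'s", "plans", ",", "but",
          "he", "was", "ready", "to", "ignore", "them", "."]

# NER column and the SRL column for the verb "knew" from the example sentence.
ner = ["B-PERSON", "O", "O", "B-ORG"] + ["O"] * 11
srl_knew = ["B-ARG0", "B-V", "B-ARG1", "I-ARG1", "I-ARG1", "I-ARG1"] + ["O"] * 9

print(bio_spans(tokens, ner))
print(bio_spans(tokens, srl_knew))
```

An `I-` tag that does not continue the open span is treated as the start of a new span, which is one common way to tolerate slightly malformed tag sequences.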

## 2. Visualizing annotated text

This code generates an SVG visualization of the annotated text in a `doc` object using the `draw_text` function from the `semhyp.drawer` module. The `annos` argument selects the annotation layers to draw: token text, dependency relations, part-of-speech tags, detailed tags, named entities, rolesets, semantic role labels, coreference chains, and WordNet synsets. The SVG is then displayed in the Jupyter notebook using `HTML` from `IPython.display`.

In [5]:
from IPython.display import HTML

FONT_SIZE = 16
svg = semhyp.draw_text(doc, 
                       show_spans=True, 
                       annos="text, dep, pos, tag, ent, roleset, srl, coref, synset", 
                       font_height=FONT_SIZE, font_width=FONT_SIZE*0.7, offset=5, margin=10)

HTML(svg)


This code displays an SVG visualization of a single sentence from the `doc` object. All available annotations are included.

In [6]:
sents = list(doc.sents)

svg = semhyp.draw_text(sents[0])
HTML(svg)
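
Since the drawing result is passed straight to `HTML`, it is presumably a string of SVG markup, in which case it can also be written to a file for use outside the notebook. The sketch below assumes exactly that; the `svg` value is a stand-in string rather than real `semhyp` output, and the output file name is arbitrary.

```python
import tempfile
from pathlib import Path

# Stand-in for the string returned by the drawing call (an assumption);
# in the notebook, the `svg` variable from the cell above would be used instead.
svg = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="120" height="30">'
    '<text x="5" y="20">Patrick</text></svg>'
)

# Write the markup to a file so it can be opened in a browser or image viewer.
out_path = Path(tempfile.gettempdir()) / "semhyp_sentence.svg"
out_path.write_text(svg, encoding="utf-8")
print("Saved to", out_path)
```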

## 3. Parsing annotated text into semantic hypergraph

This code parses the annotated document `doc` with the `semhyp.parse` function, which returns the hypergraph `graph`. The hypergraph is implemented as a list that contains one semantic hyperedge for each sentence in the document.

The first sentence, "Patrick knew about IBM's plans, but he was ready to ignore them.", has the following semantic hyperedge:

```
(but/J 
  (knew/Pd.sx:01.<f 
    patrick/Cp..s.p 
    (about/T 
      ('s/Bp ibm/Cp..s.o plans/Cc..p))) 
  (was/Pd.sc:12.<f 
    (+/Jc.rm.rp he/Ci patrick/Cp..s.p)
    (+/Br.am 
      ready/Ca 
      ((to/Mi.< ignore/P.-o:01.-i) 
        (+/Jc.rm.rp 
          he/Ci 
          patrick/Cp..s.p) 
        (+/Jc.rm.rc 
          them/Ci 
          ('s/Bp ibm/Cp..s.o plans/Cc..p))))))
```



In [7]:
graph = semhyp.parse(doc)
graph

[(but/J (knew/Pd.sx:01.<f patrick/Cp..s.p (about/T ('s/Bp ibm/Cp..s.o plans/Cc..p))) (was/Pd.sc:12.<f (+/Jc.rm.rp he/Ci patrick/Cp..s.p) (+/Br.am ready/Ca ((to/Mi.< ignore/P.-o:01.-i) (+/Jc.rm.rp he/Ci patrick/Cp..s.p) (+/Jc.rm.rc them/Ci ('s/Bp ibm/Cp..s.o plans/Cc..p)))))),
 (focused/Pd.sxx:02h.<f (+/Jc.rm.rp he/Ci patrick/Cp..s.p) (on/T ((+/Jc.rm.rp his/Mp patrick/Cp..s.p) (own/Ma.< plan/Cc..s))) (believing/P.-r:01.|pg (+/Jc.rm.rp he/Ci patrick/Cp..s.p) ((could/Mm.< succeed/P.sx:0t.-i) ((+/Jc.rm.rp his/Mp patrick/Cp..s.p) team/Cc..s) still/M))),
 (needed/Pd.sxr:0r1.<f (+/Jc.rm.rc they/Ci (his/Mp team/Cc..s)) just/M (and/J ((to/Mi.< stay/P.-x:13.-i) (+/Jc.rm.rc they/Ci (his/Mp team/Cc..s)) (on/T course/Cc..s)) (work/P.-x:0m.-i (+/Jc.rm.rc they/Ci (his/Mp team/Cc..s)) together/M)))]
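
The printed notation is a parenthesized expression in which every hyperedge is a list of atoms and nested hyperedges. To illustrate the nesting, here is a small sketch that turns such a string into nested Python tuples and back. It is independent of `semhyp`: the helpers `parse_edge`, `edge_to_text`, and `iter_atoms` are hypothetical names, and the sketch assumes that atoms never contain whitespace or parentheses.

```python
def parse_edge(text):
    """Parse a parenthesized hyperedge string into nested tuples of atom strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        token = tokens[pos]
        pos += 1
        if token == ")":
            raise ValueError("unexpected ')'")
        if token != "(":
            return token  # an atom
        items = []
        while pos < len(tokens) and tokens[pos] != ")":
            items.append(read())
        if pos >= len(tokens):
            raise ValueError("missing ')'")
        pos += 1  # consume the closing parenthesis
        return tuple(items)

    edge = read()
    if pos != len(tokens):
        raise ValueError("trailing input after the edge")
    return edge


def edge_to_text(edge):
    """Render nested tuples back into the parenthesized notation."""
    if isinstance(edge, str):
        return edge
    return "(" + " ".join(edge_to_text(sub) for sub in edge) + ")"


def iter_atoms(edge):
    """Yield the atoms of an edge from left to right."""
    if isinstance(edge, str):
        yield edge
    else:
        for sub in edge:
            yield from iter_atoms(sub)


text = "(knew/Pd.sx:01.<f patrick/Cp..s.p (about/T ('s/Bp ibm/Cp..s.o plans/Cc..p)))"
edge = parse_edge(text)
print(edge)
print(list(iter_atoms(edge)))
```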

The following code retrieves the first hyperedge from the hypergraph and outputs various details about it. It prints the original edge, its simplified version, and its rooted version. It then iterates over the atoms in the edge, printing details such as the atom's label, type, roles, morphology, and entity. Finally, it iterates over the subedges of the edge, printing each subedge in its simplified form.

In [8]:
# Assuming `edge` is the first semantic hyperedge in `graph`

edge = graph[0]
print("Edge:\n  ", edge)
print("Simplified edge:\n  ", edge.simplify())
print("Rooted edge:\n  ", edge.roots())

print()
# Iterate over atoms in the edge
for atom in edge.atoms():
    print("Atom:", atom, "label:", atom.label(), "type:", atom.type(), "roles:", atom.roles(), "morph:", atom.morph(), "entity:", atom.entity())
    

print()
# Iterate over subedges of the edge
for subedge in edge.subedges():
    print("Subedge:", subedge.simplify())

Edge:
   (but/J (knew/Pd.sx:01.<f patrick/Cp..s.p (about/T ('s/Bp ibm/Cp..s.o plans/Cc..p))) (was/Pd.sc:12.<f (+/Jc.rm.rp he/Ci patrick/Cp..s.p) (+/Br.am ready/Ca ((to/Mi.< ignore/P.-o:01.-i) (+/Jc.rm.rp he/Ci patrick/Cp..s.p) (+/Jc.rm.rc them/Ci ('s/Bp ibm/Cp..s.o plans/Cc..p))))))
Simplified edge:
   (but/J (knew/P patrick/C (about/T ('s/B ibm/C plans/C))) (was/P (+/J he/C patrick/C) (+/B ready/C ((to/M ignore/P) (+/J he/C patrick/C) (+/J them/C ('s/B ibm/C plans/C))))))
Rooted edge:
   (but (knew patrick (about ('s ibm plans))) (was (+ he patrick) (+ ready ((to ignore) (+ he patrick) (+ them ('s ibm plans))))))

Atom: but/J label: but type: J roles: None morph: None entity: None
Atom: plans/Cc..p label: plans type: Cc roles: ('',) morph: ('p',) entity: None
Atom: patrick/Cp..s.p label: patrick type: Cp roles: ('',) morph: ('s',) entity: ('p',)
Atom: ibm/Cp..s.o label: ibm type: Cp roles: ('',) morph: ('s',) entity: ('o',)
Atom: knew/Pd.sx:01.<f label: knew type: Pd roles: ('sx', '01
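
Judging only from the output shown above, an atom appears to consist of a label, a `/`, and up to four dot-separated fields (type, roles, morphology, entity), while the simplified form keeps the first letter of the type and the rooted form keeps only the label. The sketch below reproduces this on plain strings. It is inferred from the printed examples, not taken from `semhyp`, and the names `atom_parts`, `simplify_text`, and `roots_text` are hypothetical; splitting only the roles field on `:` is likewise an assumption.

```python
import re


def atom_parts(atom):
    """Split an atom string into label, type, roles, morph, and entity."""
    label, _, spec = atom.partition("/")
    fields = spec.split(".")

    def field(i):
        # Fields that are missing from the atom are reported as None.
        return fields[i] if i < len(fields) else None

    roles, morph, entity = field(1), field(2), field(3)
    return {
        "label": label,
        "type": fields[0],
        "roles": None if roles is None else tuple(roles.split(":")),
        "morph": None if morph is None else (morph,),
        "entity": None if entity is None else (entity,),
    }


def simplify_text(edge_text):
    """Keep only the first letter of every atom type."""
    return re.sub(r"/([A-Za-z])[^\s()]*", r"/\1", edge_text)


def roots_text(edge_text):
    """Drop everything after the label of every atom."""
    return re.sub(r"/[^\s()]*", "", edge_text)


text = "(knew/Pd.sx:01.<f patrick/Cp..s.p (about/T ('s/Bp ibm/Cp..s.o plans/Cc..p)))"
print(atom_parts("patrick/Cp..s.p"))
print(simplify_text(text))
print(roots_text(text))
```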

## 4. Visualizing semantic hypergraph

This code generates and displays an SVG visualization of the semantic hypergraph `graph` using the `draw_hyper` function from the `semhyp.drawer` module. The visualization represents the hypergraph as a tree: each hyperedge becomes a tree whose root node is its first subedge and whose child nodes are the remaining subedges. The construction continues recursively for every remaining subedge, except for atoms and for coreference subedges whose subedges are not in the same sentence.

Therefore, the first semantic hyperedge:

```
(but/J 
  (knew/Pd.sx:01.<f 
    patrick/Cp..s.p 
    (about/T 
      ('s/Bp ibm/Cp..s.o plans/Cc..p))) 
  (was/Pd.sc:12.<f 
    (+/Jc.rm.rp he/Ci patrick/Cp..s.p)
    (+/Br.am 
      ready/Ca 
      ((to/Mi.< ignore/P.-o:01.-i) 
        (+/Jc.rm.rp 
          he/Ci 
          patrick/Cp..s.p) 
        (+/Jc.rm.rc 
          them/Ci 
          ('s/Bp ibm/Cp..s.o plans/Cc..p))))))
```

is represented as 

```                    
           ------------but/J-------
          /                        \
       knew/P                 ---- was/P ------
      /      \               /                  \
 Patrick/C  about/T        +/J                 +/B
              |           /   \                /  \
            's/B        he/C [Patrick/C]  ready/C  (to/M ignore/P)
             /  \                                        |
        IBM/C   plans/C                                 +/J
                                                        /   \
                                                    them/C  ['s/B]
                                                            [/  \]
                                                        [IBM/C   plans/C]
```



In [9]:
FONT_SIZE = 16
svg = semhyp.draw_hyper(graph, font_height=FONT_SIZE, font_width=FONT_SIZE*0.7, margin=10)
HTML(svg)

This code displays an SVG visualization of a single semantic hyperedge from the `graph` object.

In [10]:
svg = semhyp.draw_hyper(graph[0], font_height=FONT_SIZE, font_width=FONT_SIZE*0.7, margin=10)
HTML(svg)
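
The tree construction described in this section can be sketched in a few lines. The example below works on nested tuples of simplified atoms rather than on `semhyp` objects, uses the hypothetical helpers `edge_label`, `to_tree`, and `render`, and prints an indented outline instead of drawing SVG. It does not reproduce the special treatment of coreference subedges, so every subedge is simply expanded.

```python
def edge_label(edge):
    """Text used as a node label: an atom, or a whole edge in parentheses."""
    if isinstance(edge, str):
        return edge
    return "(" + " ".join(edge_label(sub) for sub in edge) + ")"


def to_tree(edge):
    """Return (root_label, children): the first subedge is the root node."""
    if isinstance(edge, str):
        return (edge, [])
    head, *rest = edge
    return (edge_label(head), [to_tree(sub) for sub in rest])


def render(tree, depth=0):
    """Return the tree as a list of lines indented by depth."""
    label, children = tree
    lines = ["  " * depth + label]
    for child in children:
        lines.extend(render(child, depth + 1))
    return lines


# Simplified form of the first hyperedge, written as nested tuples.
possessive = ("'s/B", "ibm/C", "plans/C")
edge = (
    "but/J",
    ("knew/P", "patrick/C", ("about/T", possessive)),
    ("was/P",
     ("+/J", "he/C", "patrick/C"),
     ("+/B", "ready/C",
      (("to/M", "ignore/P"),
       ("+/J", "he/C", "patrick/C"),
       ("+/J", "them/C", possessive)))),
)

print("\n".join(render(to_tree(edge))))
```

A non-atomic first subedge such as `(to/M ignore/P)` is kept as a single node label, which mirrors how it appears in the diagram above.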