# Dependency Grammars with NLTK


## Objectives

- Understanding:
    - Dependency Relations and Grammars
    - Probabilistic Dependency Grammars
    - Projective and Non-Projective Parses
    - Transition-based Dependency Parsing

- Learning how to:
    - define dependency grammar in NLTK
    - identify a syntactic relation between Head and Dependent
    - parse with dependency grammar
    - evaluate dependency parser
    - use dependency parser of spacy and stanza

### Recommended Reading
- Dan Jurafsky and James H. Martin. [__Speech and Language Processing__ (SLP)](https://web.stanford.edu/~jurafsky/slp3/) (3rd ed. draft)
- Steven Bird, Ewan Klein, and Edward Loper. [__Natural Language Processing with Python__ (NLTK)](https://www.nltk.org/book/)
- Kübler, McDonald, and Nivre (2009) Dependency Parsing. 

### Covered Material
- SLP
    - [Chapter 18: Dependency Parsing](https://web.stanford.edu/~jurafsky/slp3/18.pdf) 
- NLTK 
    - [Chapter 8: Analyzing Sentence Structure](https://www.nltk.org/book/ch08.html)
- Kübler, McDonald, and Nivre (2009) Dependency Parsing.

### Requirements

- [NLTK](https://www.nltk.org/)
    - run `pip install nltk`
- [spaCy](https://spacy.io/)
    - run `pip install spacy`
    - run `python -m spacy download en_core_web_sm` to install English models
- [stanza](https://stanfordnlp.github.io/stanza/) for Stanford Parser
    - run `pip install stanza`
    - run `stanza.download('en')` in Python (after `import stanza`) to install English models
    

## 1. Dependency Grammars

Unlike Constituency (Phrase Structure) Grammar, which addresses how words and sequences of words combine to form constituents, Dependency Grammar addresses how words relate to each other. 

Dependency Grammar assumes that syntactic structure consists of words linked by binary, asymmetrical relations called __dependency relations__. A dependency relation holds between a syntactically subordinate word, called the __dependent__, and another word on which it depends, called the __head__.

The __head of a sentence__ is usually taken to be the tensed verb, and every other word is either dependent on the sentence head, or connects to it through a path of dependencies. Thus, a dependency parse is a __directed graph__, where the nodes are the lexical items (words) and the arcs represent dependency relations from heads to dependents. 

A __typed dependency structure__ is one in which the arcs are __labeled__ with relations drawn from a fixed inventory of grammatical relations. It also includes a __root node__ that explicitly marks the root of the tree, the head of the entire structure.
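
To make these definitions concrete, here is a minimal sketch in plain Python (no NLTK). It stores a parse as a list of head indices, which is a common encoding but an assumption of this sketch, not something NLTK requires. The sentence analysis and the helper names `dependents` and `is_tree` are purely illustrative.

```python
# Toy encoding (an assumption for illustration): words are numbered from 1,
# heads[i - 1] is the head of word i, and 0 stands for the artificial root node.
words = ["I", "prefer", "a", "morning", "flight"]
heads = [2, 0, 5, 5, 2]

def dependents(heads, head):
    """Return the 1-based indices of the words whose head is `head`."""
    return [i for i, h in enumerate(heads, start=1) if h == head]

def is_tree(heads):
    """True if there is exactly one root and every word reaches it."""
    if heads.count(0) != 1:
        return False
    for start in range(1, len(heads) + 1):
        seen, node = set(), start
        while node != 0:
            if node in seen:      # we came back to a visited word: a cycle
                return False
            seen.add(node)
            node = heads[node - 1]
    return True

root = dependents(heads, 0)[0]
print(words[root - 1])                                    # prefer
print([words[d - 1] for d in dependents(heads, root)])    # ['I', 'flight']
print(is_tree(heads))                                     # True
```

The head of the sentence is the single word attached to node 0, and every other word reaches it by following head links.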

### 1.1. Dependency Relation Types

The Universal Dependencies (UD) relation set defines the following __core__ relations (from Jurafsky & Martin). Note that version 2 of UD renames `dobj` to `obj`.

(See https://universaldependencies.org/u/dep/index.html for the full set.)

| __Clausal Argument Relations__ | Description | Example |
|:-------------------------------|:------------|:--------|
| NSUBJ  | Nominal subject       | __We__ booked her the cheapest morning flight to Miami. |
| DOBJ   | Direct object         | We booked her the cheapest morning __flight__ to Miami. |
| IOBJ   | Indirect object       | We booked __her__ the cheapest morning flight to Miami. |
| CCOMP  | Clausal complement    | |
| XCOMP  | Open clausal complement (subject of clause is out of its span) | |
| __Nominal Modifier Relations__ | | |
| NMOD   | Nominal modifier      | We booked her the cheapest __morning__ flight to Miami. |
| AMOD   | Adjectival modifier   | We booked her the __cheapest__ morning flight to Miami. |
| NUMMOD | Numeric modifier      | |
| APPOS  | Appositional modifier | |
| DET    | Determiner            | We booked her __the__ cheapest morning flight to Miami. |
| CASE   | Prepositions, postpositions and other case markers | We booked her the cheapest morning flight __to__ Miami. |
| __Other Notable Relations__ | | |
| CONJ   | Conjunct              | |
| CC     | Coordinating conjunction | |


### 1.2. Defining Dependency Grammar in NLTK

Similar to Phrase Structure Grammar, Dependency Grammar is defined as a list of production rules.

Below is an example grammar that defines only __bare__ dependency relations without specifying their types.

In [None]:
import nltk
s_bold = '\033[1m'
e_bold = '\033[0m'

# for sentence "i saw the man with a telescope"
# the grammar is defined from a string; only string input is accepted
rules = """
    'saw' -> 'i' | 'man' | 'with'
    'man' ->  'telescope' | 'the' | 'with'
    'telescope' -> 'the' | 'with' | 'a'
"""


toy_grammar = nltk.DependencyGrammar.fromstring(rules)

print(toy_grammar)

Unlike Phrase Structure Grammar, 

- there is no start symbol (thus, no method to access it)
- there is no method to access productions, but it is still possible using the attribute

    - `grammar._productions`

- there is a method to check if grammar contains a production

    - `grammar.contains(head, mod)`


__Dependency Production__ has 2 attributes:

- `_lhs` (left-hand side) -- head
- `_rhs` (right-hand side) -- modifier

In [None]:
print(toy_grammar._productions)

for production in toy_grammar._productions:
    print(production._lhs, production._rhs)

print(toy_grammar.contains('man', 'the'))  # True
print(toy_grammar.contains('the', 'man'))  # False
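
Conceptually, a bare dependency grammar of this kind is just a set of (head, modifier) pairs. The sketch below rebuilds that set from the rule string in plain Python. It is an illustration of the data structure, not NLTK's implementation, and the helper name `parse_rules` is hypothetical.

```python
def parse_rules(text):
    """Turn lines such as "'saw' -> 'i' | 'man'" into a set of (head, modifier) pairs."""
    pairs = set()
    for line in text.strip().splitlines():
        head_part, _, mods_part = line.partition("->")
        head = head_part.strip().strip("'")
        for mod in mods_part.split("|"):
            pairs.add((head, mod.strip().strip("'")))
    return pairs

rules = """
    'saw' -> 'i' | 'man' | 'with'
    'man' ->  'telescope' | 'the' | 'with'
    'telescope' -> 'the' | 'with' | 'a'
"""

pairs = parse_rules(rules)
print(len(pairs))                 # 9
print(("man", "the") in pairs)    # True
print(("the", "man") in pairs)    # False
```

The membership test mirrors `grammar.contains(head, mod)`: the relation is asymmetric, so swapping head and modifier gives a different answer.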

#### How to Identify a Syntactic Relation between Head and Dependent
(From Kübler et al. & NLTK Book)

Here is a list of some of the more common criteria that have been proposed for identifying a syntactic relation between a head __H__ and a dependent __D__ in a linguistic construction __C__:

1. __H__ determines the syntactic category of __C__ and can often replace __C__.
2. __H__ determines the semantic category of __C__; __D__ gives semantic specification.
3. __H__ is obligatory; __D__ may be optional.
4. __H__ selects __D__ and determines whether __D__ is obligatory or optional. 
5. The form of __D__ depends on __H__ (agreement or government).
6. The linear position of __D__ is specified with reference to __H__ .


__Example__:

_I prefer a morning flight_

- **C**: _morning flight_
- **H**: _flight_
    - determines syntactic category: whole construction is nominal
    - determines semantic category
    - comes after _morning_ (English noun compounds are head-final)
- **D**: _morning_
    - optional w.r.t. _flight_

### 1.3. Parsing with Dependency Grammar

Since Dependency Graphs can be projective and non-projective (allow crossing dependencies), there are __projective__ and __non-projective__ parsers. 


#### 1.3.1. Projective Dependency Parser (Rule-based)
>**Definition**:
A dependency tree is projective if **all the arcs of the tree are projective**. An arc from a head to a dependent is said to be projective <mark style="background-color: rgba(0, 255, 0, 0.2)"> if there is a path from the head to every word that lies between the head and the dependent </mark> in the sentence. *(Dan Jurafsky and James H. Martin, 2022)*

> **NLTK**: A projective, rule-based, dependency parser. A [`ProjectiveDependencyParser`](http://www.nltk.org/api/nltk.parse.html#module-nltk.parse.projectivedependencyparser) is created with a `DependencyGrammar`, a set of productions specifying word-to-word dependency relations. The `parse()` method will then return the set of all parses, in tree representation, for a given input sequence of tokens. 
`parse()` method returns iterator over [`Tree`](http://www.nltk.org/_modules/nltk/tree.html) objects.






In [None]:
parser = nltk.ProjectiveDependencyParser(toy_grammar)

sent = "i saw the man with a telescope"

for tree in parser.parse(sent.split()):
    # pretty_print() writes to stdout and returns None, so it is not wrapped in print()
    tree.pretty_print(unicodelines=True, nodedist=4)
    # print ROOT node
    print("The ROOT is '{}'".format(s_bold + tree.label() + e_bold))

#### 1.3.2. Non-Projective Dependency Parser (Rule-Based)

>**Definition**:
A dependency tree is non-projective if **one or more arcs of the tree are non-projective**. An arc is non-projective when some word that lies between the head and the dependent in the sentence cannot be reached by a path from the head.


> A non-projective, rule-based, dependency parser. This parser will return the set of all possible non-projective parses based on the word-to-word relations defined in the parser’s dependency grammar, and <mark style="background-color: rgba(0, 255, 0, 0.2)"> will allow the branches of the parse tree to cross</mark> in order to capture a variety of linguistic phenomena that a projective parser will not.

`parse()` method returns iterator over [`DependencyGraph`](https://www.nltk.org/api/nltk.parse.html#nltk.parse.dependencygraph.DependencyGraph) objects. 

`tree()` method of the `DependencyGraph` object builds a dependency tree using the NLTK Tree constructor, starting with the `root` node and omitting labels.

![image.png](https://i.postimg.cc/hvQDRNDg/Screenshot-2022-12-19-at-17-22-53.png)


The arc *flight* -> *was* **is not projective**, since the intervening words *this morning* are not reachable from *flight*.
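
The projectivity definition can be checked mechanically. The sketch below is plain Python, independent of NLTK. It assumes the figure shows the Jurafsky & Martin sentence "JetBlue canceled our flight this morning which was already late"; the head indices are my reconstruction of that analysis, and the function names are illustrative.

```python
# Assumed analysis of the example sentence: word i (1-based) has head
# heads[i - 1]; 0 is the artificial root.
words = ["JetBlue", "canceled", "our", "flight", "this", "morning",
         "which", "was", "already", "late"]
heads = [2, 0, 4, 2, 6, 2, 8, 4, 10, 8]

def is_descendant(heads, node, ancestor):
    """True if following head links from `node` eventually reaches `ancestor`."""
    while node != 0:
        node = heads[node - 1]
        if node == ancestor:
            return True
    return False

def non_projective_arcs(heads):
    """Return the (head, dependent) arcs that violate projectivity."""
    bad = []
    for dep, head in enumerate(heads, start=1):
        lo, hi = min(head, dep), max(head, dep)
        # every word strictly between head and dependent must descend from head
        for between in range(lo + 1, hi):
            if not is_descendant(heads, between, head):
                bad.append((head, dep))
                break
    return bad

for head, dep in non_projective_arcs(heads):
    print(words[head - 1], "->", words[dep - 1])   # flight -> was
```

A tree is projective exactly when this list is empty.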

In [None]:
np_parser = nltk.NonprojectiveDependencyParser(toy_grammar)

for graph in np_parser.parse(sent.split()):
    graph.tree().pretty_print(unicodelines=True, nodedist=4)
    print("The ROOT is '{}'".format(s_bold + graph.root['word'] + e_bold))


Since the sentence is ambiguous, our Dependency Grammar yields 2 parses, just as a Phrase Structure Grammar would.

#### 1.3.3. Accessing the Graph

- `DependencyGraph` object has 2 attributes
    - nodes (of `defaultdict` type)
    - root (of `dict` type), which is also a node

- Each node in a graph is represented as a dict that defines its:
    - address (sentence index starting from 1) -- required
    - word (string form) -- required
    - head (address)
    - deps (dependents)
    - rel (dependency relation to head)

Thus, we can print the graph as a list of tokens with their attributes.

In [None]:
# printing root address and word
for graph in np_parser.parse(sent.split()): 
    print(graph.root['address'], graph.root['word'])

In [None]:
# printing all the nodes with dependent positions
for graph in np_parser.parse(sent.split()):    
    # sorting is required since graph starts from root, which is not the first token
    for _, node in sorted(graph.nodes.items()):
        if node['word'] is not None:
            print('{address}\t{word}:\t{dependents}'.format(dependents=node['deps'][''], **node))
    break  # just to print 1 graph

It is also possible to convert the graph into other supported formats, such as CoNLL using `to_conll(style)` method, where [style](https://www.nltk.org/api/nltk.parse.html#nltk.parse.dependencygraph.DependencyGraph) is either 3, 4, or 10. 

In [None]:
for graph in np_parser.parse(sent.split()):    
    print(graph.to_conll(3))
    break  # just to print 1 graph

### Exercise

- Define grammar that covers the following sentences.

    - show flights from new york to los angeles
    - list flights from new york to los angeles
    - show flights from new york
    - list flights to los angeles
    - list flights
    
- Use one of the parsers to parse the sentences (i.e. test your grammar)


In [None]:
# As reference  
# rules = """
#     'saw' -> 'i' | 'man' | 'with'
#     'man' ->  'telescope' | 'the' | 'with'
#     'telescope' -> 'the' | 'with' | 'a'
# """

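# note: every item needs a trailing comma, otherwise Python silently
# concatenates two adjacent string literals into one sentence
sentences = ['show flights from new york to los angeles',
             'show flights from los angeles to new york',
             'list flights from new york to los angeles',
             'show flights from new york',
             'list flights to los angeles',
             'list flights']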
rules = """
    'show' -> 'flights' 
    'list' -> 'flights'
    'flights'  -> 'york' | 'angeles'
    'york' -> 'new' | 'from' | 'to'
    'angeles' -> 'los' | 'from' | 'to'
"""

toy_grammar = nltk.DependencyGrammar.fromstring(rules)

# despite its name, np_parser holds a projective parser here
np_parser = nltk.ProjectiveDependencyParser(toy_grammar)
for sent in sentences:
    for graph in np_parser.parse(sent.split()):
        print("Sentence:", sent)

        # the projective parser yields Tree objects, the non-projective one DependencyGraph objects
        if not isinstance(graph, nltk.tree.Tree):
            graph.tree().pretty_print(unicodelines=True, nodedist=4)
            print("The ROOT is '{}'".format(s_bold + graph.root['word'] + e_bold), '\n')
        else:
            graph.pretty_print(unicodelines=True, nodedist=4)
            print("The ROOT is '{}'".format(s_bold + graph.label() + e_bold), '\n')
        

## 2. Probabilistic Dependency Grammars & Parsing


Similar to CFGs, we can learn dependency grammar from data using treebanks.
NLTK provides `ProbabilisticProjectiveDependencyParser`, which returns the most probable projective parse according to the probabilistic dependency grammar estimated by its `train()` method. 

> The probabilistic model is an implementation of Eisner's (1996) Model C, which conditions on head-word, head-tag, child-word, and child-tag. The decoding uses a bottom-up chart-based span concatenation algorithm that's identical to the one utilized by the rule-based projective parser.

Without going into details, this is an example of the Dynamic Programming approach to Dependency Parsing.

In [None]:
# downloading treebank
import nltk
nltk.download('dependency_treebank')

In [None]:
# example from NLTK
from nltk.parse.dependencygraph import DependencyGraph
from nltk.parse import ProbabilisticProjectiveDependencyParser
from nltk.corpus import dependency_treebank

# print dependency graph in CoNLL format
print(dependency_treebank.parsed_sents()[0].to_conll(10))

ppdp = ProbabilisticProjectiveDependencyParser()

# train parser on graphs
ppdp.train(dependency_treebank.parsed_sents())

In [None]:
# parse the sentence
parse = ppdp.parse(['the', 'price', 'of', 'the', 'stock', 'fell'])

# returns set of trees ordered by probability score
for tree in parse:
    print(tree)


#### Exercise

Write a function that, given a dependency graph, produces for each token (word) the list of words on the path from it to ROOT.

(Construct normal `dict` for simplicity first.)

In [None]:
from pprint import pprint
# With .nodes we get a dict
dg_tree = dependency_treebank.parsed_sents()[0].tree()
dg = dependency_treebank.parsed_sents()[0].nodes
# Let's print to see what it contains
#pprint(dg)
# pretty_print() writes to stdout and returns None, so it is not wrapped in print()
dg_tree.pretty_print(unicodelines=True, nodedist=4)

def go_to_root(token, head, dg):
    # collect the words on the path from the token up to (but excluding) ROOT
    path = [token]
    while head != 0:
        path.append(dg[head]['word'])
        head = dg[head]['head']
    return path

for k, v in sorted(dg.items()):
    # node 0 is the artificial ROOT, not a token
    if k != 0:
        print(v['word'], ":", go_to_root(v['word'], v['head'], dg))
        
    

## 3. Transition-Based Dependency Parsing

There are several methods for data-driven dependency parsing
- Dynamic Programming-based (e.g. `ProbabilisticProjectiveDependencyParser`)
- Graph-based (e.g. [Minimum Spanning Tree Parser](https://www.seas.upenn.edu/~strctlrn/MSTParser/MSTParser.html))
- Transition-Based Dependency Parsing (e.g. NLTK's interface to [MaltParser](https://www.nltk.org/_modules/nltk/parse/malt.html))

Transition-based parsing (or "deterministic dependency parsing") has proved to be very effective, and its neural variants are among the strongest current approaches.

### Algorithm

(from Jurafsky & Martin)

In transition-based parsing there is:
- a **stack** on which we build the parse
- a **buffer** of tokens to be parsed
- a **parser** which takes actions on the parse via a predictor called an **oracle**

The parser walks through the sentence left-to-right, successively shifting items from the buffer onto the stack. At each time point we examine the top two elements on the stack, and the oracle makes a decision about what transition to apply to build the parse.

The Arc-Standard Transition System (there are alternatives, e.g. Arc-Eager) defines 3 transition operators that operate on the top two elements of the stack:

- __LEFTARC__: 
    - Assert a head-dependent relation between the word at the top of the stack and the word directly beneath it;
    - Remove the lower word from the stack.
- __RIGHTARC__: 
    - Assert a head-dependent relation between the second word on the stack and the word at the top; 
    - Remove the word at the top of the stack;
- __SHIFT__: 
    - Remove the word from the front of the input buffer and push it onto the stack.

This naturally fits into a Machine Learning framework where 
- configurations (buffer & stack) are the features,
- operations are the labels,
- the oracle is a classifier.
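
The three operators can be simulated in a few lines of plain Python. The sketch below is not NLTK's `TransitionParser`: instead of a trained classifier it uses a "static oracle" that reads the gold tree, which is an assumption made to keep the example self-contained. The sentence "book me the morning flight" and its head indices follow the same 1-based convention with 0 as ROOT; all names are illustrative.

```python
def arc_standard_oracle(heads):
    """Replay a gold projective tree as a sequence of arc-standard transitions.

    heads[i - 1] is the gold head of word i; 0 is the artificial root.
    Returns the transition names and the (head, dependent) arcs they create.
    """
    n = len(heads)
    # pending[w]: number of dependents of w that have not been attached yet
    pending = [0] * (n + 1)
    for h in heads:
        pending[h] += 1

    stack, buffer = [0], list(range(1, n + 1))
    transitions, arcs = [], []
    while buffer or len(stack) > 1:
        if len(stack) >= 2:
            top, below = stack[-1], stack[-2]
            # LEFTARC: top is the head of the word beneath it
            if below != 0 and heads[below - 1] == top and pending[below] == 0:
                transitions.append("LEFTARC")
                arcs.append((top, below))
                pending[top] -= 1
                del stack[-2]
                continue
            # RIGHTARC: the word beneath is the head of top,
            # and top has already collected all of its own dependents
            if heads[top - 1] == below and pending[top] == 0:
                transitions.append("RIGHTARC")
                arcs.append((below, top))
                pending[below] -= 1
                stack.pop()
                continue
        if not buffer:
            raise ValueError("no valid transition: the tree is not projective")
        transitions.append("SHIFT")
        stack.append(buffer.pop(0))
    return transitions, arcs

words = ["book", "me", "the", "morning", "flight"]
heads = [0, 1, 5, 5, 1]
transitions, arcs = arc_standard_oracle(heads)
print(transitions)
print(arcs)
```

In a data-driven parser, the pairs of (configuration, transition) produced this way serve as training examples for the classifier that replaces the oracle at parsing time.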

### Transition Parser in NLTK

In [None]:
from nltk.parse.transitionparser import TransitionParser

tp = TransitionParser('arc-standard')
tp.train(dependency_treebank.parsed_sents()[:100], 'tp.model')
print(tp)

In [None]:
# parsing takes a list of dependency graphs and a model as arguments
parses = tp.parse(dependency_treebank.parsed_sents()[-10:], 'tp.model')
print(len(parses))
print(parses[0])

## 4. Evaluation of Dependency Parsing

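Dependency parsing performance is evaluated with the __unlabeled__ and __labeled attachment scores__ (UAS and LAS), which are calculated as 

$$ \text{UAS/LAS} = \frac{\text{number of correct dependency relations}}{\text{total number of dependency relations}} $$

The difference between the two is whether the relation labels are considered: for UAS a relation is correct if the token is attached to the right head, while for LAS the relation label has to be correct as well.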
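
As a worked example, the two scores can be computed by hand for a four-token sentence. The sketch below is plain Python and does not reproduce NLTK's evaluator (for instance, it does not skip punctuation). The `(head, label)` encoding and the function name `attachment_scores` are assumptions made for illustration.

```python
def attachment_scores(gold, predicted):
    """Return (LAS, UAS) for two equally long lists of (head, label) pairs."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    total = len(gold)
    # UAS: only the head has to match
    correct_heads = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    # LAS: head and label both have to match
    correct_both = sum(g == p for g, p in zip(gold, predicted))
    return correct_both / total, correct_heads / total

gold      = [(2, "det"), (3, "nsubj"), (0, "root"), (3, "obj")]
predicted = [(2, "det"), (3, "obj"),   (0, "root"), (2, "obj")]

las, uas = attachment_scores(gold, predicted)
print(las, uas)   # 0.5 0.75
```

Token 2 has the right head but the wrong label, so it counts for UAS only; token 4 has the wrong head and counts for neither.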

NLTK provides the `DependencyEvaluator` class to perform the evaluation; it takes the predicted and the reference parses as arguments. The evaluation ignores punctuation.

In [None]:
from nltk.parse import DependencyEvaluator

de = DependencyEvaluator(parses, dependency_treebank.parsed_sents()[-10:])
las, uas = de.eval()

# no labels, thus identical
print(las)
print(uas)

### Exercise
- Train `arc-standard` and `arc-eager` transition parsers on the same portion of the treebank (only slightly more than 100 sentences; a larger portion takes a long time to train)
- Evaluate both of them comparing the attachment scores

In [None]:
train_sents = dependency_treebank.parsed_sents()[:100]
test_sents = dependency_treebank.parsed_sents()[-150:]

# train and evaluate both transition systems on the same data
for algorithm in ('arc-standard', 'arc-eager'):
    tp = TransitionParser(algorithm)
    tp.train(train_sents, 'tp.model', verbose=False)

    parses = tp.parse(test_sents, 'tp.model')
    de = DependencyEvaluator(parses, test_sents)
    las, uas = de.eval()

    # no labels, thus identical
    print(algorithm)
    print(las)
    print(uas)

In [None]:
parses = tp.parse(dependency_treebank.parsed_sents()[-1:], 'tp.model')
parses[0].tree().pprint()

## 5. Dependency Parsing with Stanza and spaCy

Both spaCy and Stanza (the Python package of the Stanford NLP Group) provide pre-trained dependency parsing models.
The libraries are quite similar in usage:

- initialize pipeline (with other processing steps such as tokenization, POS-tagging)
- process a sentence 
- iterate over tokens accessing dependency parsing attributes

### 5.1. Stanford Dependency Parser

[paper on stanza](https://arxiv.org/pdf/2003.07082.pdf)

A neural graph-based dependency parser. 
[paper on parser](https://nlp.stanford.edu/pubs/dozat2017deep.pdf)

> We implement a Bi-LSTM-based deep biaffine neural dependency parser (Dozat and Manning, 2017). We
further augment this model with two linguistically motivated features: one that predicts the linearization order of two words in a given language, and the other that predicts the typical distance in linear order between them.



### 5.2. spaCy Dependency Parser

> A transition-based dependency parser component. The dependency parser jointly learns sentence segmentation and labelled dependency parsing, and can optionally learn to merge tokens that had been over-segmented by the tokenizer. The parser uses a variant of the non-monotonic arc-eager transition-system described by Honnibal and Johnson (2014), with the addition of a "break" transition to perform the sentence segmentation. Nivre (2005)’s pseudo-projective dependency transformation is used to allow the parser to predict non-projective parses.

> The parser is trained using an imitation learning objective. It follows the actions predicted by the current weights, and at each state, determines which actions are compatible with the optimal parse that could be reached from the current state. The weights are updated such that the scores assigned to the set of optimal actions is increased, while scores assigned to other actions are decreased. Note that more than one action may be optimal for a given state.

In [None]:
example = 'I saw the man with a telescope.'

In [None]:
# stanza example
import stanza

# Download the stanza model if necessary
# stanza.download("en")

stanza_nlp = stanza.Pipeline(lang='en', verbose=False)
stanza_doc = stanza_nlp(example)

for sent in stanza_doc.sentences:
    for word in sent.words:
        print("{}\t{}\t{}\t{}".format(word.id, word.text, word.head, word.deprel))


In [None]:
# spacy example
import spacy

# load the English model by its package name (note the underscores)
spacy_nlp = spacy.load("en_core_web_sm")

# alternatively, if loading by name fails, import the model package directly:
# import en_core_web_sm
# spacy_nlp = en_core_web_sm.load()

spacy_doc = spacy_nlp(example)

for sent in spacy_doc.sents:
    for token in sent:
        # token.head is a Token object; print its text explicitly
        print("{}\t{}\t{}\t{}".format(token.i, token.text, token.head.text, token.dep_))
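
The two libraries encode heads differently: Stanza's `word.head` is a 1-based index with 0 for the root, whereas spaCy's `token.head` is a `Token` whose index `token.head.i` is 0-based, and the root token is its own head. The sketch below converts the spaCy convention to the CoNLL-style one. It is plain Python, assumes a single-sentence document, and uses hand-written indices rather than real parser output; the function name is illustrative.

```python
def to_conll_heads(head_indices):
    """Convert spaCy-style heads (0-based, root points to itself)
    to CoNLL-style heads (1-based, root has head 0)."""
    return [0 if head == i else head + 1
            for i, head in enumerate(head_indices)]

# "I saw the man": illustrative values of token.head.i for each token
spacy_heads = [1, 1, 3, 1]
print(to_conll_heads(spacy_heads))   # [2, 0, 4, 2]
```

For documents with several sentences the indices would first have to be made relative to the start of each sentence.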

## Lab Exercise


- Parse the last 100 sentences of the dependency treebank using `spacy` and `stanza`
    - are the dependency tags of spacy the same as those of stanza?
- Evaluate the parses against the ground truth using `DependencyEvaluator`
    - print LAS and UAS for each parser

**BUT!** To evaluate the parsers, the sentences parsed by spacy and stanza have to be [`DependencyGraph`](https://www.nltk.org/_modules/nltk/parse/dependencygraph.html) objects. To do this, you have to convert the output of spacy/stanza to the [CoNLL](https://universaldependencies.org/format.html) format, extract from it the columns required by the [Malt-Tab](https://cl.lingfil.uu.se/~nivre/research/MaltXML.html) format, and finally convert the resulting string into a `DependencyGraph`. Luckily, there is a library that gets the job done: install [spacy_conll](https://github.com/BramVanroy/spacy_conll), then use the code below and adapt it to your needs.
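
To show what the target string looks like, here is a minimal plain-Python sketch that formats tokens as four Malt-Tab columns (word, tag, head, relation). The rows are hand-written for illustration and the helper name `to_malt_tab` is hypothetical; the notebook code obtains the same columns from the `spacy_conll` DataFrame instead.

```python
def to_malt_tab(rows):
    """Format (word, tag, head, relation) tuples as a Malt-Tab string:
    one token per line, columns separated by tabs."""
    return "\n".join("\t".join(str(cell) for cell in row) for row in rows)

# illustrative rows for "list flights"; the tags and labels are assumptions
rows = [("list", "VB", 0, "ROOT"),
        ("flights", "NNS", 1, "dobj")]

malt_tab = to_malt_tab(rows)
print(malt_tab)
```

A string of this shape is what gets passed to NLTK's `DependencyGraph` constructor.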


In [1]:
from nltk.corpus import dependency_treebank

# get the last 100 sentences
data = dependency_treebank.sents()[-100:]
tagged_data = dependency_treebank.parsed_sents()[-100:]
print(data)
print(tagged_data)

[['The', 'Army', 'Corps', 'is', 'cutting', 'the', 'flow', 'of', 'the', 'Missouri', 'River', 'about', 'two', 'weeks', 'earlier', 'than', 'normal', 'because', 'of', 'low', 'water', 'levels', 'in', 'the', 'reservoirs', 'that', 'feed', 'it', '.'], ['Barge', 'rates', 'on', 'the', 'Mississippi', 'River', 'sank', 'yesterday', 'on', 'speculation', 'that', 'widespread', 'rain', 'this', 'week', 'in', 'the', 'Midwest', 'might', 'temporarily', 'alleviate', 'the', 'situation', '.'], ...]
[<DependencyGraph with 30 nodes>, <DependencyGraph with 25 nodes>, <DependencyGraph with 17 nodes>, <DependencyGraph with 42 nodes>, <DependencyGraph with 19 nodes>, <DependencyGraph with 30 nodes>, <DependencyGraph with 28 nodes>, <DependencyGraph with 29 nodes>, <DependencyGraph with 29 nodes>, <DependencyGraph with 34 nodes>, <DependencyGraph with 16 nodes>, <DependencyGraph with 31 nodes>, <DependencyGraph with 22 nodes>, <DependencyGraph with 34 nodes>, <DependencyGraph with 15 nodes>, <DependencyGraph with 38

In [2]:
# Spacy version 
from nltk.parse.dependencygraph import DependencyGraph
from spacy.tokenizer import Tokenizer
import spacy 

# Load the spacy model
nlp = spacy.load("en_core_web_sm")

# Set up the conll formatter 
config = {"ext_names": {"conll_pd": "pandas"},
          "conversion_maps": {"DEPREL": {"nsubj": "subj"}}}

# Add the formatter to the pipeline
nlp.add_pipe("conll_formatter", config=config, last=True)
# Split by white space
nlp.tokenizer = Tokenizer(nlp.vocab)  

spacy_graphs = []

for sentence in data:
    # Join the words to a sentence
    sentence = " ".join(sentence)
    # Parse the sentence
    doc = nlp(sentence)
    # Convert doc to a pandas object
    df = doc._.pandas
    # Select the columns according to the Malt-Tab format
    tmp = df[["FORM", 'XPOS', 'HEAD', 'DEPREL']].to_string(header=False, index=False)
    print(tmp)
    # Finally, build the DependencyGraph
    dp = DependencyGraph(tmp)
    spacy_graphs.append(dp)


       The  DT  3      det
      Army NNP  3 compound
     Corps NNP  5     subj
        is VBZ  5      aux
   cutting VBG  0     ROOT
       the  DT  7      det
      flow  NN  5     dobj
        of  IN  7     prep
       the  DT 11      det
  Missouri NNP 11 compound
     River NNP  8     pobj
     about  RB 13   advmod
       two  CD 14   nummod
     weeks NNS 15 npadvmod
   earlier RBR  5   advmod
      than  IN 15     prep
    normal  JJ 16     amod
   because  IN  5     prep
        of  IN 18    pcomp
       low  JJ 22     amod
     water  NN 22 compound
    levels NNS 18     pobj
        in  IN 22     prep
       the  DT 25      det
reservoirs NNS 23     pobj
      that WDT 27     subj
      feed VBP 25    relcl
        it PRP 27     dobj
         .   .  5    punct
      Barge  NN  2 compound
      rates NNS  7     subj
         on  IN  2     prep
        the  DT  6      det
Mississippi NNP  6 compound
      River NNP  3     pobj
       sank VBD  0     ROOT
  yesterday  NN  7 np

In [4]:
from nltk.parse import DependencyEvaluator

de = DependencyEvaluator(spacy_graphs, tagged_data)
las, uas = de.eval()

print(las)
print(uas)

0.0
0.6926873037236428


In [2]:
# Stanza
import stanza
import spacy_stanza
from nltk.parse.dependencygraph import DependencyGraph

# Download the stanza model if necessary
stanza.download("en")

# Load the stanza pipeline through spaCy;
# tokenize_pretokenized=True is used to tokenize by white space
nlp = spacy_stanza.load_pipeline("en", verbose=False, tokenize_pretokenized=True)

# Set up the conll formatter
config = {"ext_names": {"conll_pd": "pandas"},
          "conversion_maps": {"DEPREL": {"nsubj": "subj", "root":"ROOT"}}}

# Add the formatter to the pipeline
nlp.add_pipe("conll_formatter", config=config, last=True)

# Parse the sentence
doc = nlp("Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .")
# Convert doc to a pandas object
df = doc._.pandas

# Select the columns according to the Malt-Tab format
# (assumption: same upper-case column names as in the spaCy cell above;
# they depend on the installed spacy_conll version)
tmp = df[["FORM", 'XPOS', 'HEAD', 'DEPREL']].to_string(header=False, index=False)

# See the outcome
print(tmp)

# Finally, build the DependencyGraph
dp = DependencyGraph(tmp)
print('Tree:')
dp.tree().pretty_print(unicodelines=True, nodedist=4)
