# DiaParser
**Direct Attentive Dependency Parser**

In [1]:
from diaparser.parsers import Parser

### Create a parser
Load a pretrained model for English named `en_ewt.electra-base`, i.e. a parser trained on the English EWT treebank using the transformer model `electra-base-discriminator`.

The model will be downloaded and cached locally for later use.

In [2]:
parser = Parser.load('en_ewt.electra-base')

Using bos_token, but it is not set yet.
Using eos_token, but it is not set yet.


You can parse plain text by specifying its language:

In [3]:
dataset = parser.predict('She enjoys playing tennis.', text='en')

`dataset` is an instance of `diaparser.utils.Dataset` containing the predicted syntactic trees.

Let's look at the first one:

In [4]:
dataset.sentences[0]

# sent_id = 1
# text = She enjoys playing tennis.
1	She	_	_	_	_	2	nsubj	_	_
2	enjoys	_	_	_	_	0	root	_	_
3	playing	_	_	_	_	2	xcomp	_	_
4	tennis	_	_	_	_	3	obj	_	_
5	.	_	_	_	_	2	punct	_	_
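
The output above is in CoNLL-U format: one token per line, ten tab-separated columns, with the head index in column 7 and the dependency relation in column 8. As a minimal sketch of how these columns can be read back without any parser library, the snippet below re-types the printed sentence as a string; the helper name `read_conllu` is hypothetical and is not part of DiaParser.

```python
# The sentence printed above, re-typed as CoNLL-U text (tab-separated columns).
conllu = "\n".join([
    "# sent_id = 1",
    "# text = She enjoys playing tennis.",
    "1\tShe\t_\t_\t_\t_\t2\tnsubj\t_\t_",
    "2\tenjoys\t_\t_\t_\t_\t0\troot\t_\t_",
    "3\tplaying\t_\t_\t_\t_\t2\txcomp\t_\t_",
    "4\ttennis\t_\t_\t_\t_\t3\tobj\t_\t_",
    "5\t.\t_\t_\t_\t_\t2\tpunct\t_\t_",
])


def read_conllu(block):
    """Return one dict per token with its id, form, head and relation."""
    tokens = []
    for line in block.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword ranges ("1-2") and empty nodes ("1.1")
        tokens.append({
            "id": int(cols[0]),
            "form": cols[1],
            "head": int(cols[6]),   # 0 means the token is the root
            "deprel": cols[7],
        })
    return tokens


tokens = read_conllu(conllu)
for t in tokens:
    head_form = "ROOT" if t["head"] == 0 else tokens[t["head"] - 1]["form"]
    print(f'{t["form"]} --{t["deprel"]}--> {head_form}')
```

Head indices are 1-based, so `tokens[t["head"] - 1]` looks up the governing word.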

## Display parse tree

In [5]:
from spacy import displacy

In [6]:
sent = dataset.sentences[0]
displacy.render(sent.to_json(), style='dep', manual=True, options={'compact': True, 'distance': 120, 'word_spacing': 20})
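
With `manual=True`, displaCy takes a plain dictionary with `words` and `arcs` keys instead of a spaCy `Doc`. The sketch below builds such a dictionary by hand from the heads and relations shown earlier, following the manual input format described in the spaCy documentation. It illustrates the target structure only and is not DiaParser's `to_json` implementation; the helper name `to_displacy` is hypothetical.

```python
def to_displacy(words, heads, rels):
    """Build displaCy's manual 'dep' input from 1-based heads (0 = root)."""
    arcs = []
    for i, (head, rel) in enumerate(zip(heads, rels)):
        if head == 0:
            continue  # no arc is drawn for the root token
        h = head - 1  # displaCy uses 0-based word indices
        if i < h:
            # dependent precedes its head
            arcs.append({"start": i, "end": h, "label": rel, "dir": "left"})
        else:
            # dependent follows its head
            arcs.append({"start": h, "end": i, "label": rel, "dir": "right"})
    return {"words": [{"text": w, "tag": ""} for w in words], "arcs": arcs}


parsed = to_displacy(
    ["She", "enjoys", "playing", "tennis", "."],
    [2, 0, 2, 3, 2],
    ["nsubj", "root", "xcomp", "obj", "punct"],
)
print(parsed["arcs"])
```

A dictionary of this shape can be passed to `displacy.render(parsed, style='dep', manual=True)`.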

## Parse from tokenized text

Alternatively, you can provide tokenized text and also ask for the estimated probability of each predicted arc:

In [8]:
dataset = parser.predict(['She', 'enjoys', 'playing', 'tennis', '.'], prob=True)

You may then look at individual fields of the tokens in a sentence and the probability of their arcs.

In [9]:
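import torch

# For each token, pick the probability assigned to its predicted head.
heads = torch.tensor(dataset.arcs[0])
head_probs = dataset.probs[0].gather(1, heads.unsqueeze(1)).squeeze(-1)
print(f"arcs:  {dataset.arcs[0]}\n"
      f"rels:  {dataset.rels[0]}\n"
      f"probs: {head_probs}")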

arcs:  [2, 0, 2, 3, 2]
rels:  ['nsubj', 'root', 'xcomp', 'obj', 'punct']
probs: tensor([1.0000, 1.0000, 1.0000, 1.0000, 0.9999])
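
The `gather` call above selects, for each token, the entry of its probability row that corresponds to the predicted head. The following sketch reproduces that selection with plain Python lists on a toy matrix. The numbers are invented for illustration, and the layout (one row per token, one column per candidate head, column 0 standing for the root) is an assumption about `dataset.probs`, not something shown in the output above.

```python
# Toy head-probability matrix for a 3-token sentence (invented values).
# Row i belongs to token i+1; column j is candidate head j, column 0 = ROOT.
probs = [
    [0.10, 0.05, 0.80, 0.05],  # token 1: head 2 is most likely
    [0.90, 0.03, 0.02, 0.05],  # token 2: ROOT is most likely
    [0.05, 0.05, 0.85, 0.05],  # token 3: head 2 is most likely
]
arcs = [2, 0, 2]  # predicted head of each token

# Equivalent of probs.gather(1, arcs.unsqueeze(1)).squeeze(-1):
# take element arcs[i] from row i.
selected = [row[head] for row, head in zip(probs, arcs)]
print(selected)
```

Each selected value is the model's confidence in one arc, so low values point to attachments worth inspecting.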
