# DiaParser
**Direct Attentive Dependency Parser**

In [1]:
from diaparser.parsers import Parser

I1021 12:54:27.761593 140104082409280 file_utils.py:39] PyTorch version 1.5.0 available.
I1021 12:54:29.640621 140104082409280 file_utils.py:55] TensorFlow version 2.4.0-dev20200826 available.


### Create a parser
Load a pretrained model for English named `en_ewt.electra-base`, i.e. a parser trained on the English EWT treebank using the transformer model `electra-base-discriminator`.

The model will be downloaded and cached locally for further use.

In [5]:
parser = Parser.load('en_ewt.electra-base')

2020-10-21 15:26:27 INFO: loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/google/electra-base-discriminator/config.json from cache at /home/attardi/.cache/torch/transformers/9236d197566a7f1be2b2151f5afcc5a8e17f31e1e23c52f3cdf2340019986e78.88ba6e8e7d5a7936e86d6f2551fe19c236dc57c24da163907cd0544e9933f6ee
2020-10-21 15:26:27 INFO: Model config ElectraConfig {
  "architectures": [
    "ElectraForPreTraining"
  ],
  "attention_probs_dropout_prob": 0.1,
  "embedding_size": 768,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "electra",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "output_attentions": true,
  "output_hidden_states": true,
  "pad_token_id": 0,
  "type_vocab_size": 2,
  "vocab_size": 30522
}

2020-10-21 15:26:28 INFO: loading weights file https://cdn.huggingface.co/google/el

You can parse plain text by specifying its language through the `text` argument:

In [None]:
dataset = parser.predict('She enjoys playing tennis.', text='en')

`dataset` is an instance of `diaparser.utils.Dataset` containing the predicted syntactic trees.

Let's look at the first one:

In [7]:
dataset.sentences[0]

1	She	_	_	_	_	2	nsubj	_	_
2	enjoys	_	_	_	_	0	root	_	_
3	playing	_	_	_	_	2	xcomp	_	_
4	tennis	_	_	_	_	3	obj	_	_
5	.	_	_	_	_	2	punct	_	_
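
The output above is in the tab-separated CoNLL-U layout, whose ten columns are ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS and MISC. Here only FORM, HEAD and DEPREL are filled in. As a minimal sketch that does not depend on DiaParser, the lines can be read back into dictionaries to show which word each token attaches to. The helper `read_tokens` and the name `FIELDS` are illustrative and are not part of the library.

```python
# The parser output shown above, copied as tab-separated CoNLL-U lines.
conllu = "\n".join([
    "1\tShe\t_\t_\t_\t_\t2\tnsubj\t_\t_",
    "2\tenjoys\t_\t_\t_\t_\t0\troot\t_\t_",
    "3\tplaying\t_\t_\t_\t_\t2\txcomp\t_\t_",
    "4\ttennis\t_\t_\t_\t_\t3\tobj\t_\t_",
    "5\t.\t_\t_\t_\t_\t2\tpunct\t_\t_",
])

# Standard CoNLL-U column names, in order.
FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")


def read_tokens(text):
    """Hypothetical helper: turn CoNLL-U lines into one dict per token."""
    tokens = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        tokens.append(dict(zip(FIELDS, line.split("\t"))))
    return tokens


tokens = read_tokens(conllu)
for tok in tokens:
    head = int(tok["head"])
    # HEAD is a 1-based token index; 0 marks the root of the sentence.
    head_form = "ROOT" if head == 0 else tokens[head - 1]["form"]
    print(f'{tok["form"]:<8} {tok["deprel"]:<6} head: {head_form}')
```

Each printed row pairs a token with its relation label and the form of its head, which makes the tree easier to read than the raw HEAD indices.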

You can also provide pre-tokenized text and ask for the estimated probability of each predicted arc:

In [6]:
dataset = parser.predict([['She', 'enjoys', 'playing', 'tennis', '.']], prob=True, verbose=False)

You may then look at individual fields of the tokens in a sentence and the probability of their arcs.

In [9]:
import torch
print(f"arcs:  {dataset.arcs[0]}\n"
      f"rels:  {dataset.rels[0]}\n"
      f"probs: {dataset.probs[0].gather(1,torch.tensor(dataset.arcs[0]).unsqueeze(1)).squeeze(-1)}")

arcs:  [2, 0, 2, 3, 2]
rels:  ['nsubj', 'root', 'xcomp', 'obj', 'punct']
probs: tensor([1.0000, 1.0000, 1.0000, 1.0000, 0.9999])
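
The `gather` call above picks, for every token, the probability stored at the column of its predicted head. The same selection can be sketched in plain Python on a small matrix. The numbers below are made up for illustration and are not parser output, and the layout (one row per token, one column per candidate head, column 0 standing for the root) is an assumption about how `dataset.probs[0]` is organized.

```python
# Hypothetical arc probabilities for a 3-token sentence (illustrative values).
# Assumed layout: probs[i][h] is the probability that token i+1 has head h,
# where h = 0 stands for the root.
probs = [
    [0.10, 0.05, 0.80, 0.05],
    [0.90, 0.03, 0.02, 0.05],
    [0.05, 0.05, 0.85, 0.05],
]
arcs = [2, 0, 2]  # predicted head index for each token

# Equivalent of probs.gather(1, arcs.unsqueeze(1)).squeeze(-1):
# for each row, keep only the entry at the predicted head's column.
selected = [row[head] for row, head in zip(probs, arcs)]
print(selected)
```

Under this reading, a value close to 1 means the parser put nearly all of the probability mass for that token on the head it finally chose.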
