In [3]:
from supar import Parser
import torch

In [4]:
torch.cuda.set_device('cuda:0')

In [4]:
parser = Parser.load('dep-biaffine-en')

In [5]:
dataset = parser.predict('I saw Sarah with a telescope.', lang='en', prob=True, verbose=False)

In [6]:
dataset

Dataset(n_sentences=1, n_batches=1, n_buckets=1)

In [7]:
dataset[0]

1	I	_	_	_	_	2	nsubj	_	_
2	saw	_	_	_	_	0	root	_	_
3	Sarah	_	_	_	_	2	dobj	_	_
4	with	_	_	_	_	2	prep	_	_
5	a	_	_	_	_	6	det	_	_
6	telescope	_	_	_	_	4	pobj	_	_
7	.	_	_	_	_	2	punct	_	_

# Syntactic dependency tree with the highest F1

In [8]:
sin = Parser.load('dep-biaffine-roberta-en')
sin.predict(['I', 'saw', 'Sarah', 'with', 'a', 'telescope', '.'], verbose=False)[0]

1	I	_	_	_	_	2	nsubj	_	_
2	saw	_	_	_	_	0	root	_	_
3	Sarah	_	_	_	_	2	dobj	_	_
4	with	_	_	_	_	2	prep	_	_
5	a	_	_	_	_	6	det	_	_
6	telescope	_	_	_	_	4	pobj	_	_
7	.	_	_	_	_	2	punct	_	_

In this example, the sentence is “I saw Sarah with a telescope.” The word “saw” is the root of the sentence. “I” is the nominal subject (nsubj) of “saw”, “Sarah” is the direct object (dobj) of “saw”, “with” is a preposition (prep) attached to “saw”, “a” is the determiner (det) of “telescope”, and “telescope” is the object of the preposition (pobj) “with”. The period is punctuation (punct) attached to “saw”. The HEAD column gives the ID of each word's parent. For example, “I” (ID 1) is a child of “saw” (ID 2), so its HEAD is 2.

Column 7 (HEAD) holds the position of the current node's parent; a value of 0 means the node is the root of the tree. Column 8 holds the dependency relation to that parent.
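To make the column layout concrete, here is a minimal sketch that reads the ten-column output shown above and lists each arc as dependent, relation, head. It is plain Python over text copied from the output, not part of SuPar's API; the names `ROWS`, `read_conll` and `arcs` are hypothetical helpers introduced for illustration.

```python
# Rows copied from the syntactic parse above. Spaces stand in for the
# tab characters of the real output and are converted back below.
ROWS = [
    "1 I _ _ _ _ 2 nsubj _ _",
    "2 saw _ _ _ _ 0 root _ _",
    "3 Sarah _ _ _ _ 2 dobj _ _",
    "4 with _ _ _ _ 2 prep _ _",
    "5 a _ _ _ _ 6 det _ _",
    "6 telescope _ _ _ _ 4 pobj _ _",
    "7 . _ _ _ _ 2 punct _ _",
]
CONLL = "\n".join("\t".join(row.split()) for row in ROWS)


def read_conll(text):
    """Return (id, form, head, deprel) for every token line."""
    tokens = []
    for line in text.strip().splitlines():
        cols = line.split("\t")
        # Column 1 = ID, column 2 = FORM, column 7 = HEAD, column 8 = DEPREL.
        tokens.append((int(cols[0]), cols[1], int(cols[6]), cols[7]))
    return tokens


def arcs(tokens):
    """Turn token tuples into (dependent, relation, head) word triples."""
    forms = {tok_id: form for tok_id, form, _, _ in tokens}
    forms[0] = "ROOT"  # HEAD 0 is the artificial root node
    return [(form, rel, forms[head]) for _, form, head, rel in tokens]


for dependent, relation, head in arcs(read_conll(CONLL)):
    print(f"{dependent} --{relation}--> {head}")
```

Keeping the arcs as word triples makes it easy to compare this tree with the semantic graph later in the notebook.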

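Each token in the sentence is labeled with its dependency relation to its HEAD. The list below gives the main relations of the Universal Dependencies scheme; note that the output above uses Stanford Dependencies labels such as dobj, prep and pobj, which do not appear in this list: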

    nsubj: Nominal subject
    obj: Object
    iobj: Indirect object
    csubj: Clausal subject
    ccomp: Clausal complement
    xcomp: Open clausal complement
    nmod: Nominal modifier
    advmod: Adverbial modifier
    amod: Adjectival modifier
    conj: Conjunct
    cc: Coordinating conjunction
    aux: Auxiliary
    cop: Copula
    det: Determiner
    clf: Classifier
    case: Case marking
    mark: Marker
    punct: Punctuation
    dep: Unspecified dependency
    root: Root of the tree

# Semantic dependency tree with the highest F1

In [2]:
sem = Parser.load('sdp-vi-roberta-en')

Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [6]:
sem.predict(['I', 'saw', 'Sarah', 'with', 'a', 'telescope', '.'], verbose=False)[0]

1	I	_	_	_	_	_	_	_	_
2	saw	_	_	_	_	_	_	_	_
3	Sarah	_	_	_	_	_	_	_	_
4	with	_	_	_	_	_	_	_	_
5	a	_	_	_	_	_	_	_	_
6	telescope	_	_	_	_	_	_	_	_
7	.	_	_	_	_	_	_	_	_

The RoBERTa-based model does not seem to work: it predicts no edges at all, and loading it prints a message saying it would be advisable to train the model ourselves.

In [3]:
sem = Parser.load('sdp-biaffine-en')

In [9]:
sem.predict(['I', 'saw', 'Sarah', 'with', 'a', 'telescope','.'], verbose=False)[0]

[2024-03-04 09:26:15 INFO] Loading the data
[2024-03-04 09:26:16 INFO] 
Dataset(n_sentences=1, n_batches=1, n_buckets=1)
[2024-03-04 09:26:16 INFO] Making predictions on the data
[2024-03-04 09:26:16 INFO] 0:00:00.011046s elapsed, 724.24 Tokens/s, 90.53 Sents/s


1	I	_	_	_	_	_	_	2:ARG1	_
2	saw	_	_	_	_	_	_	0:root	_
3	Sarah	_	_	_	_	_	_	2:ARG2	_
4	with	_	_	_	_	_	_	_	_
5	a	_	_	_	_	_	_	_	_
6	telescope	_	_	_	_	_	_	4:ARG2|5:BV	_
7	.	_	_	_	_	_	_	_	_

In [3]:
sem = Parser.load('sdp-vi-en')

Downloading https://github.com/yzhangcs/parser/releases/download/v1.1.0/dm.vi.sdp.lstm.tag-char-lemma.zip to /root/.cache/supar/dm.vi.sdp.lstm.tag-char-lemma.zip
100%|██████████| 497M/497M [00:44<00:00, 11.7MB/s] 


In [4]:
sem.predict(['I', 'saw', 'Sarah', 'with', 'a', 'telescope','.'], verbose=False)[0]

1	I	_	_	_	_	_	_	2:ARG1	_
2	saw	_	_	_	_	_	_	0:root|2:ARG1	_
3	Sarah	_	_	_	_	_	_	2:ARG2	_
4	with	_	_	_	_	_	_	_	_
5	a	_	_	_	_	_	_	_	_
6	telescope	_	_	_	_	_	_	4:ARG2|5:BV	_
7	.	_	_	_	_	_	_	_	_

Semantic roles: these are the roles that a predicate and its arguments take on in a sentence. They appear in the ninth column in the format HEAD:ROLE, where HEAD is the ID of the head word and ROLE is the semantic role. A token with several heads lists them separated by `|`. For example, 0:root indicates that the current word is the root of the sentence, and 4:ARG1 would indicate that the current word is the first argument (ARG1) of the word with ID 4.

In this example, the sentence is “I saw Sarah with a telescope.” The word “saw” is the root of the sentence. “I” is the first argument (ARG1) of “saw”, “Sarah” is the second argument (ARG2) of “saw”, and “telescope” is both the second argument (ARG2) of “with” and the bound variable (BV) of the determiner “a”.
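The HEAD:ROLE format described above can be unpacked with a few lines of plain Python. This is a sketch over values copied from the `sdp-biaffine-en` output, not a SuPar function; `parse_deps`, `TOKENS` and `DEPS` are names introduced here for illustration.

```python
TOKENS = ["I", "saw", "Sarah", "with", "a", "telescope", "."]
# Ninth column of the sdp-biaffine-en output, one entry per token.
DEPS = ["2:ARG1", "0:root", "2:ARG2", "_", "_", "4:ARG2|5:BV", "_"]


def parse_deps(field):
    """Split a 'HEAD:ROLE|HEAD:ROLE' field into (head_id, role) pairs."""
    if field == "_":  # "_" means the token has no semantic head
        return []
    edges = []
    for item in field.split("|"):
        head, role = item.split(":", 1)
        edges.append((int(head), role))
    return edges


for token, field in zip(TOKENS, DEPS):
    for head, role in parse_deps(field):
        head_word = "ROOT" if head == 0 else TOKENS[head - 1]
        print(f"{head_word} --{role}--> {token}")
```

Because a token may have several heads, the result is a list of edges per token rather than a single parent, which is what distinguishes this output from the syntactic tree.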

Of the three SDP models available, the RoBERTa one does not work well and the other two give very similar results. There is, however, an important disconnection in the tree: "telescope" is attached only to "with" and "a", neither of which has a head, so it is not linked to any subject or verb, which can be considered an error. Nevertheless, if the semantic tree is combined with the syntactic one, this error becomes less noticeable, because the relation is implicit at the syntactic level.
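The disconnection can be checked mechanically. The sketch below hard-codes the edges from the two outputs above (semantic heads from `sdp-biaffine-en`, syntactic heads from `dep-biaffine-roberta-en`), finds the tokens that cannot be reached from the semantic root, and then follows the syntactic heads to show how "telescope" still connects to "saw". All names are hypothetical helpers, and the breadth-first search is simply one convenient way to test reachability.

```python
from collections import deque

TOKENS = ["I", "saw", "Sarah", "with", "a", "telescope", "."]
# Semantic heads per token ID (1-based); 0 is the artificial root.
SEM_HEADS = {1: [2], 2: [0], 3: [2], 4: [], 5: [], 6: [4, 5], 7: []}
# Single syntactic head per token ID.
SYN_HEAD = {1: 2, 2: 0, 3: 2, 4: 2, 5: 6, 6: 4, 7: 2}


def reachable_from_root(heads):
    """Return the token IDs reachable from node 0 along head -> dependent edges."""
    children = {}
    for dependent, head_ids in heads.items():
        for head in head_ids:
            children.setdefault(head, []).append(dependent)
    seen, queue = set(), deque([0])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def syntactic_path(token_id):
    """Follow syntactic heads from a token up to the root, returning the words."""
    path = []
    while token_id != 0:
        path.append(TOKENS[token_id - 1])
        token_id = SYN_HEAD[token_id]
    return path


disconnected = sorted(set(SEM_HEADS) - reachable_from_root(SEM_HEADS))
print("Not reachable from the semantic root:", [TOKENS[i - 1] for i in disconnected])
print("Syntactic path for 'telescope':", " -> ".join(syntactic_path(6)))
```

In the semantic graph only "I", "saw" and "Sarah" hang from the root, while the syntactic path links "telescope" to "saw" through "with".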

# Constituency tree with the highest F1

In [5]:
con = Parser.load('con-crf-roberta-en')

Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [6]:
con.predict(['I', 'saw', 'Sarah', 'with', 'a', 'telescope', '.'], verbose=False)[0].pretty_print()

              TOP                       
               |                         
               S                        
  _____________|______________________   
 |             VP                     | 
 |    _________|____                  |  
 |   |    |         PP                | 
 |   |    |     ____|___              |  
 NP  |    NP   |        NP            | 
 |   |    |    |     ___|______       |  
 _   _    _    _    _          _      _ 
 |   |    |    |    |          |      |  
 I  saw Sarah with  a      telescope  . 



In [7]:
con.predict(['I', 'saw', 'Sarah', 'with', 'a', 'telescope', '.'], verbose=False)[0]

(TOP (S (NP (_ I)) (VP (_ saw) (NP (_ Sarah)) (PP (_ with) (NP (_ a) (_ telescope)))) (_ .)))

The constituency tree seems to be represented quite well with RoBERTa. Using the tree as given in the second (bracketed) format, we can build a constituency graph.
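As a rough illustration of that last step, the bracketed string can be turned into nested nodes and then into a list of parent-child edges. The parser below is a small hand-written sketch, not SuPar's or NLTK's own API; `parse_tree`, `leaves` and `edges` are hypothetical names, and it assumes that no word contains a parenthesis.

```python
BRACKETED = ("(TOP (S (NP (_ I)) (VP (_ saw) (NP (_ Sarah)) "
             "(PP (_ with) (NP (_ a) (_ telescope)))) (_ .)))")


def parse_tree(text):
    """Parse a bracketed tree into nested (label, children) tuples."""
    symbols = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "("
        label = symbols[pos]
        pos += 1
        children = []
        while symbols[pos] != ")":
            if symbols[pos] == "(":
                children.append(node())
            else:
                children.append(symbols[pos])  # a leaf word
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return node()


def leaves(tree):
    """Collect the words of the tree from left to right."""
    words = []
    for child in tree[1]:
        words.extend(leaves(child) if isinstance(child, tuple) else [child])
    return words


def edges(tree):
    """List (parent_label, child_label_or_word) pairs in depth-first order."""
    result = []
    for child in tree[1]:
        if isinstance(child, tuple):
            result.append((tree[0], child[0]))
            result.extend(edges(child))
        else:
            result.append((tree[0], child))
    return result


tree = parse_tree(BRACKETED)
print(leaves(tree))
for parent, child in edges(tree):
    print(f"{parent} -> {child}")
```

Labels such as NP repeat in the tree, so a real graph would need a unique ID per node; the edge list here only illustrates the structure.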