# CoDeLin Demo

Usage of the Constituent and Dependency Linearization System (CoDeLin) as a Python library.

## Constituent Parsing Linearization

In this example we encode a constituent tree, given in bracketed format, into a sequence of labels using the Naive Absolute Encoding. We then decode the labels back into a constituent tree and check that it matches the original.

In [3]:
from src.models.const_tree import C_Tree
from src.utils.constants import C_STRAT_FIRST
from src.encs.enc_const import *

print("\n[*] Original tree:")
original_tree = "(S (NP (DT The) (NNS owls)) (VP (VBP are) (RB not) (SBAR (WHNP (WP what)) (S (NP (PRP they)) (VP (VBP seem))))) (PUNCT .))"
c_tree = C_Tree.from_string(original_tree)
print(c_tree)

print("\n[*] Encoding:")
encoder = C_NaiveAbsoluteEncoding(separator="_", unary_joiner="+")
print(encoder)

print("\n[*] Linearized tree:")
lc_tree = encoder.encode(c_tree)
print(lc_tree)

print("\n[*] Decoded tree:")
c_tree = encoder.decode(lc_tree)
c_tree.postprocess_tree(conflict_strat=C_STRAT_FIRST, clean_nulls=True)
print(c_tree)


[*] Original tree:
(S (NP (DT The) (NNS owls)) (VP (VBP are) (RB not) (SBAR (WHNP (WP what)) (S (NP (PRP they)) (VP (VBP seem))))) (PUNCT .))

[*] Encoding:
Constituent Naive Absolute Encoding

[*] Linearized tree:
-BOS-	-BOS-	-BOS-
The	DT	2_NP
owls	NNS	1_S
are	VBP	2_VP
not	RB	2_VP
what	WP	3_SBAR_WHNP
they	PRP	4_S_NP
seem	VBP	1_S_VP
.	PUNCT	1_S
-EOS-	-EOS-	-EOS-


[*] Decoded tree:
(S (NP (DT The) (NNS owls)) (VP (VBP are) (RB not) (SBAR (WHNP (WP what)) (S (NP (PRP they)) (VP (VBP seem))))) (PUNCT .))
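
To make the labels above easier to read, here is a minimal, self-contained sketch of how such a label can be derived for each word: the number of ancestors it shares with the next word, the label of the lowest shared ancestor, and any unary chain directly above the word. This is not CoDeLin's implementation. The helpers `parse_brackets`, `leaf_paths` and `absolute_labels` are hypothetical names introduced for illustration, and the sketch assumes that unary chains are joined top-down with `unary_joiner`. It skips the final word and the `-BOS-`/`-EOS-` rows, whose handling is specific to the library.

```python
import re


def parse_brackets(text):
    """Parse a bracketed tree into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def parse_node():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())
            else:                     # a word under a part-of-speech tag
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return label, children

    return parse_node()


def leaf_paths(node, address=(), ancestors=()):
    """Yield (word, ancestors) for each word, left to right.

    Each ancestor is (address, label, n_children); the address keeps two
    nodes with the same label apart. Part-of-speech tags are not included.
    """
    label, children = node
    if len(children) == 1 and isinstance(children[0], str):
        yield children[0], ancestors
        return
    here = ancestors + ((address, label, len(children)),)
    for i, child in enumerate(children):
        yield from leaf_paths(child, address + (i,), here)


def absolute_labels(tree_string, separator="_", unary_joiner="+"):
    """Return (word, label) pairs for every word except the last one."""
    leaves = list(leaf_paths(parse_brackets(tree_string)))
    result = []
    for (word, path), (_, next_path) in zip(leaves, leaves[1:]):
        # Count the ancestors shared with the next word.
        n_common = 0
        for a, b in zip(path, next_path):
            if a != b:
                break
            n_common += 1
        parts = [str(n_common), path[n_common - 1][1]]
        # Unary chain: trailing ancestors that have a single child.
        unary = []
        for _, label, n_children in reversed(path):
            if n_children != 1:
                break
            unary.insert(0, label)
        if unary:
            parts.append(unary_joiner.join(unary))
        result.append((word, separator.join(parts)))
    return result


tree = ("(S (NP (DT The) (NNS owls)) (VP (VBP are) (RB not) (SBAR (WHNP (WP what)) "
        "(S (NP (PRP they)) (VP (VBP seem))))) (PUNCT .))")
labels = absolute_labels(tree)
for word, label in labels:
    print(word, label, sep="\t")
```

For the sample sentence, the sketch yields the same labels as the linearized tree above for the words from `The` to `seem`.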


The following example handles constituent trees whose part-of-speech tags carry embedded morphological features, this time using the Dynamic Encoding. The sample sentence is taken from the SPMRL German treebank.

In [2]:
print("\n[*] Original tree:")
original_tree = "(PP (APPR-AC##lem=in|_## IM) (NN-NK##lem=Blick|case=dat|number=sg|gender=masc## BLICK))"
f_idx_dict = {"lem": 0, "case": 1, "number": 2, "gender": 3}

c_tree = C_Tree.from_string(original_tree)
print(c_tree)

print("\n[*] Encoding:")
encoder = C_NaiveDynamicEncoding(separator="_", unary_joiner="+")
print(encoder)

print("\n[*] Linearized tree:")
lc_tree = encoder.encode(c_tree)
print(lc_tree.n)
print(lc_tree.to_string(f_idx_dict))

print("\n[*] Decoded tree:")
c_tree = encoder.decode(lc_tree)
c_tree.postprocess_tree(conflict_strat=C_STRAT_FIRST, clean_nulls=True)
print(c_tree)


[*] Original tree:
(PP (APPR-AC##lem=in|_## IM) (NN-NK##lem=Blick|case=dat|number=sg|gender=masc## BLICK))

[*] Encoding:
Constituent Naive Dynamic Encoding

[*] Linearized tree:
4
-BOS-	-BOS-	-BOS-	-BOS-	-BOS-	-BOS-	-BOS-	-BOS-
IM	APPR-AC	in	_	_	_	_	1*_PP
BLICK	NN-NK	Blick	dat	sg	masc	_	0*_PP
-EOS-	-EOS-	-EOS-	-EOS-	-EOS-	-EOS-	-EOS-	-EOS-


[*] Decoded tree:
(PP (APPR-AC IM) (NN-NK BLICK))
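
The tags in this example follow the pattern `POS##key=value|key=value##`, and `f_idx_dict` assigns each feature name a column position. The sketch below shows one way such a tag could be split into a plain part-of-speech tag and a fixed-width list of feature columns. It is an illustration only: `split_features` is a hypothetical helper, not part of CoDeLin, and it does not attempt to reproduce the exact column layout printed by `to_string`.

```python
def split_features(tag, f_idx_dict, empty="_"):
    """Split 'POS##k=v|k=v##' into (pos, columns) using f_idx_dict for ordering."""
    columns = [empty] * len(f_idx_dict)
    if "##" not in tag:
        return tag, columns
    pos, feats = tag.split("##")[:2]
    for item in feats.split("|"):
        if "=" not in item:           # placeholder such as "_"
            continue
        key, value = item.split("=", 1)
        if key in f_idx_dict:         # ignore features without a column
            columns[f_idx_dict[key]] = value
    return pos, columns


f_idx_dict = {"lem": 0, "case": 1, "number": 2, "gender": 3}
first = split_features("APPR-AC##lem=in|_##", f_idx_dict)
second = split_features(
    "NN-NK##lem=Blick|case=dat|number=sg|gender=masc##", f_idx_dict
)
print(first)
print(second)
```

Features that are missing from a tag keep the `_` placeholder, which mirrors how empty feature columns appear in the linearized output above.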


The following block of code runs every available constituent encoding in CoDeLin on the same tree, using a separate encoder object for each one. The output below shows the original tree and the labels produced by the Absolute Encoding.

In [6]:
print("\n[*] Original tree:")
original_tree = "(S (NP (DT The) (NNS owls)) (VP (VBP are) (RB not) (SBAR (WHNP (WP what)) (S (NP (PRP they)) (VP (VBP seem))))) (PUNCT .))"
c_tree = C_Tree.from_string(original_tree)
print(c_tree)

# Each tree is encoded with its own encoder object (a_, r_, d_, i_).
print("\n[*] Linearized tree with Absolute Encoding:")
a_encoder = C_NaiveAbsoluteEncoding(separator="_", unary_joiner="+")
a_lc_tree = a_encoder.encode(c_tree)
print(a_lc_tree)

print("\n[*] Linearized tree with Relative Encoding:")
r_encoder = C_NaiveRelativeEncoding(separator="_", unary_joiner="+")
r_lc_tree = r_encoder.encode(c_tree)
print(r_lc_tree)

print("\n[*] Linearized tree with Dynamic Encoding:")
d_encoder = C_NaiveDynamicEncoding(separator="_", unary_joiner="+")
d_lc_tree = d_encoder.encode(c_tree)
print(d_lc_tree)

print("\n[*] Linearized tree with Incremental Encoding:")
i_encoder = C_NaiveIncrementalEncoding(separator="_", unary_joiner="+")
i_lc_tree = i_encoder.encode(c_tree)
print(i_lc_tree)


[*] Original tree:
(S (NP (DT The) (NNS owls)) (VP (VBP are) (RB not) (SBAR (WHNP (WP what)) (S (NP (PRP they)) (VP (VBP seem))))) (PUNCT .))

[*] Linearized tree with Absolute Encoding:
-BOS-	-BOS-	-BOS-
The	DT	2_NP
owls	NNS	1_S
are	VBP	2_VP
not	RB	2_VP
what	WP	3_SBAR_WHNP
they	PRP	4_S_NP
seem	VBP	1_S_VP
.	PUNCT	1_S
-EOS-	-EOS-	-EOS-
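
The absolute labels listed above start with the counts 2, 1, 2, 2, 3, 4, 1, 1. A relative encoding is commonly described as storing, for each word, the difference between its count and the previous word's count instead of the count itself. The sketch below illustrates that general idea on those numbers. It is an assumption about the concept, not a description of the exact label format CoDeLin emits, and `to_relative` and `to_absolute` are hypothetical helper names.

```python
from itertools import accumulate


def to_relative(levels):
    """Keep the first count and replace the rest by differences to the previous one."""
    if not levels:
        return []
    return [levels[0]] + [cur - prev for prev, cur in zip(levels, levels[1:])]


def to_absolute(deltas):
    """Invert to_relative with a running sum."""
    return list(accumulate(deltas))


levels = [2, 1, 2, 2, 3, 4, 1, 1]   # counts from the absolute labels above
deltas = to_relative(levels)
print(deltas)                        # [2, -1, 1, 0, 1, 1, -3, 0]
print(to_absolute(deltas))           # [2, 1, 2, 2, 3, 4, 1, 1]
```

Because the running sum restores the original counts, both representations carry the same information. They differ only in which values appear as labels.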

