tree visualization #15

Closed
arademaker opened this issue Feb 11, 2017 · 3 comments

arademaker commented Feb 11, 2017

https://github.com/udapi/udapi-python/blob/master/udapi/block/write/textmodetrees.py#L20

We need a similar feature for printing a sentence as a tree. These trees are very useful for visualizing the data:

─┮
 ╰─┮ Gosto VERB root
   │ ╭─╼ de ADP mark
   ├─┾ levar VERB xcomp
   │ │ ╭─╼ a ADP case
   │ ├─┶ sério NOUN xcomp
   │ │ ╭─╼ o DET det
   │ │ ├─╼ meu DET det
   │ ╰─┾ papel NOUN obj
   │   │ ╭─╼ de ADP case
   │   ╰─┾ consultor NOUN nmod
   │     ╰─╼ encartado VERB acl
   ╰─╼ . PUNCT punct
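As a starting point for such a feature, here is a minimal sketch of a text-mode tree printer. It is not the algorithm used by `write.TextModeTrees` (which keeps one line per word in surface order); it prints a simpler head-first indented outline. The token tuples, the ids, and the name `render_tree` are assumptions made for illustration, using the "papel" fragment of the sentence above.

```python
from collections import defaultdict

# Hypothetical token tuples: (id, form, upos, head, deprel).
# head == 0 marks the root of the fragment; ids are made up for this example.
TOKENS = [
    (1, "o", "DET", 3, "det"),
    (2, "meu", "DET", 3, "det"),
    (3, "papel", "NOUN", 0, "obj"),
    (4, "de", "ADP", 5, "case"),
    (5, "consultor", "NOUN", 3, "nmod"),
    (6, "encartado", "VERB", 5, "acl"),
]


def render_tree(tokens):
    """Return a head-first, indented outline of a dependency tree."""
    children = defaultdict(list)  # head id -> list of child ids, in word order
    info = {}                     # token id -> label shown on its line
    for tid, form, upos, head, deprel in tokens:
        children[head].append(tid)
        info[tid] = f"{form} {upos} {deprel}"

    lines = []

    def walk(tid, prefix, is_last):
        # The last child of a node gets a corner, the others a tee.
        connector = "╰─ " if is_last else "├─ "
        lines.append(prefix + connector + info[tid])
        kids = children.get(tid, [])
        child_prefix = prefix + ("   " if is_last else "│  ")
        for i, kid in enumerate(kids):
            walk(kid, child_prefix, i == len(kids) - 1)

    roots = children.get(0, [])
    for i, root in enumerate(roots):
        walk(root, "", i == len(roots) - 1)
    return "\n".join(lines)


print(render_tree(TOKENS))
```

Printing heads before their dependents keeps the code short, at the cost of losing the surface word order that the Udapi output preserves.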

alternatives:

Convert the CoNLL-U file to TeX and compile it:

 udapy write.Tikz attributes=form,lemma,upos < my.conllu > my.tex

If needed I can add more features to
https://github.com/udapi/udapi-python/blob/master/udapi/block/write/tikz.py
e.g. printing multiword tokens and some default colors.
Of course, for camera-ready pictures a bit of manual fine-tuning of the layout will be needed.

You can also try

udapy write.TextModeTrees color=1 < my.conllu | less -R

which produces the output shown above.

There is a button for SVG export and you can use
inkscape -D -z --file=image.svg --export-pdf=image.pdf --export-latex
to export it to pdf and tex:
\begin{figure}
\centering
\def\svgwidth{\columnwidth}
\input{image.pdf_tex}
\end{figure}

Can I send the output of the second command to LaTeX?

udapy write.TextModeTrees color=1 < my.conllu | less -R

Yes, but without the colors:
echo '\begin{verbatim}' > my.tex
udapy write.TextModeTrees < my.conllu >> my.tex
echo '\end{verbatim}' >> my.tex
and then use
\input{my.tex}
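The same wrapping can be done in a script when the tree text is already available as a string. This is only a sketch: the function name `wrap_verbatim` is hypothetical, and the box-drawing characters may need a Unicode-aware LaTeX engine and a monospaced font that contains them.

```python
def wrap_verbatim(tree_text):
    """Wrap plain-text tree output in a LaTeX verbatim environment."""
    # Drop trailing newlines so the environment does not end with blank lines.
    body = tree_text.rstrip("\n")
    return "\\begin{verbatim}\n" + body + "\n\\end{verbatim}\n"


if __name__ == "__main__":
    sample = "╰─ run VERB root\n   ╰─ Dogs NOUN nsubj\n"
    print(wrap_verbatim(sample), end="")
```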

It would not be difficult to write a subclass of write.TextModeTrees
which would use some LaTeX markup like \lemma{I}, \upos{PRON}
instead of the ANSI color codes. You could then define the colors and style, e.g.
\def\lemma#1{\textcolor{red}{#1}}
If you are interested, I can implement it.
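To make the idea concrete, here is a rough sketch of the markup step alone, independent of Udapi. The names `latex_escape` and `markup` are hypothetical and this is not the proposed subclass; it only shows how one node's attributes could be turned into `\lemma{...}` and `\upos{...}` calls. How the result is typeset is left open, since a plain `verbatim` environment would not expand the macros.

```python
# Characters that need escaping in ordinary LaTeX text.
_LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def latex_escape(text):
    """Escape LaTeX special characters one at a time (no double escaping)."""
    return "".join(_LATEX_SPECIALS.get(ch, ch) for ch in text)


def markup(form, lemma, upos):
    """Format one node as: form \\lemma{...} \\upos{...}."""
    return (
        f"{latex_escape(form)} "
        f"\\lemma{{{latex_escape(lemma)}}} "
        f"\\upos{{{latex_escape(upos)}}}"
    )


if __name__ == "__main__":
    print(markup("I", "I", "PRON"))
```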

What I am really missing is a simple way to display a fragment of a sentence.

I have now added a Udapi block that deletes all nodes in a document
except for the subtrees matching a given condition, e.g.

udapy -s util.Filter subtree='node.upos == "NOUN"' < in.conllu > filtered.conllu

will print only noun phrases.
So you can use

udapy util.Filter subtree='node.form == "dog"' write.TextModeTrees < in.conllu

to get the subtree(s) headed by word "dog", or

udapy util.Filter subtree='node.ord == 2 and node.root.address() == "3"' write.TextModeTrees < in.conllu

to get the subtree headed by the second word in the tree with sent_id = 3.
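The filtering idea can be sketched on plain data as follows. This is not the implementation of `util.Filter`; the dict-based tokens and the name `keep_subtrees` are assumptions for illustration. A token is kept if it, or any of its ancestors, satisfies the condition.

```python
def keep_subtrees(tokens, condition):
    """Keep only tokens inside a subtree whose head satisfies `condition`.

    `tokens` is a list of dicts with at least "id" and "head" keys,
    where head == 0 marks the root.
    """
    by_id = {tok["id"]: tok for tok in tokens}

    def in_matching_subtree(tok):
        seen = set()  # guards against cycles in malformed input
        while tok is not None and tok["id"] not in seen:
            if condition(tok):
                return True
            seen.add(tok["id"])
            tok = by_id.get(tok["head"])  # None once we step above the root
        return False

    return [tok for tok in tokens if in_matching_subtree(tok)]


if __name__ == "__main__":
    sentence = [
        {"id": 1, "form": "the", "upos": "DET", "head": 2},
        {"id": 2, "form": "dog", "upos": "NOUN", "head": 3},
        {"id": 3, "form": "barks", "upos": "VERB", "head": 0},
    ]
    kept = keep_subtrees(sentence, lambda tok: tok["form"] == "dog")
    print([tok["form"] for tok in kept])
```

The sketch leaves ids and heads untouched, so the kept fragment would still need renumbering before being written back as CoNLL-U.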

Yet another alternative to Tikz, Html and TextModeTrees would be to
paste the CoNLL-U into the online Brat renderer
(e.g. click "edit" here http://universaldependencies.org/sandbox.html#pirate-example).
But then you would need to zoom, take a screenshot and include it as bitmap (png) into LaTeX,
which is not optimal.

If needed I can implement write.Sdparse which would print something like

Dogs run
nsubj(run-2, Dogs-1)

which would allow easier manual editing than the CoNLL-U format.
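A minimal sketch of what such a writer could produce, assuming tokens given as `(id, form, head, deprel)` tuples. The function name `to_sdparse` is hypothetical, and skipping the root relation is an assumption based on the two-line example above.

```python
def to_sdparse(tokens):
    """Render tokens as a sentence line followed by deprel(head-i, dep-j) lines."""
    forms = {tid: form for tid, form, _head, _deprel in tokens}
    lines = [" ".join(form for _tid, form, _head, _deprel in tokens)]
    for tid, form, head, deprel in tokens:
        if head == 0:
            continue  # assumption: the root relation is not printed
        lines.append(f"{deprel}({forms[head]}-{head}, {form}-{tid})")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_sdparse([(1, "Dogs", 2, "nsubj"), (2, "run", 0, "root")]))
```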

@arademaker

Another library that we can take ideas from:

https://yandex.github.io/dep_tregex/quickstart.html

@arademaker

Yet another one

https://gitlab.com/nats/deptreeviz

and some possible ideas can also be taken from http://lisp-univ-etc.blogspot.com.br/2017/04/pretty-printing-trees.html.

@arademaker

Commit af2dd6e closes this issue. More tests are welcome.
