# Introduction
Udapi is an API and framework for processing [Universal Dependencies](http://universaldependencies.org/). In this tutorial, we will focus on the Python version of Udapi. Perl and Java versions are [available](http://udapi.github.io/) as well, but they are missing some of the features.

Udapi can be used from the shell (e.g. Bash), using the wrapper script `udapy`. It can also be used as a library from Python, IPython or Jupyter notebooks. We will show both ways below.

This tutorial uses Details sections for extra information (if you want to know more or if you run into problems). Click on a section to show its content.
<details><summary>Details</summary>
It is a substitute for footnotes. The content may be long and showing it in the main text may be distracting.
</details>

### Install (upgrade) Udapi
First, make sure you have the newest version of Udapi. If you have already installed Udapi [using git clone](https://github.com/udapi/udapi-python#install-udapi-for-developers), just run `git pull`. If you have not installed Udapi yet, run the command in the cell below.
<details><summary>Details</summary>
<ul>
  <li> The command below installs Udapi from GitHub (from the master branch). With <code>pip3 install --user --upgrade udapi</code>, you can install the latest version released on PyPI (possibly older than the master branch).
  <li> The exclamation mark (!) in Jupyter or IPython means that the following command will be executed by the system shell (e.g. Bash).
</ul>
</details>

In [2]:
!pip3 install --upgrade git+https://github.com/udapi/udapi-python.git

Collecting git+https://github.com/udapi/udapi-python.git
  Cloning https://github.com/udapi/udapi-python.git to /tmp/pip-req-build-weedeujv
  Running command git clone --filter=blob:none --quiet https://github.com/udapi/udapi-python.git /tmp/pip-req-build-weedeujv
  Resolved https://github.com/udapi/udapi-python.git to commit 19fe22939699b383355b0c0dc2764f8c5792f39d
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting colorama (from udapi==0.3.0)
  Downloading colorama-0.4.6-py2.py3-none-any.whl.metadata (17 kB)
Collecting termcolor (from udapi==0.3.0)
  Using cached termcolor-2.4.0-py3-none-any.whl.metadata (6.1 kB)
Using cached colorama-0.4.6-py2.py3-none-any.whl (25 kB)
Using cached termcolor-2.4.0-py3-none-any.whl (7.7 kB)
Building wheels for collected packages: udapi
  Building wheel for udapi (pyproject.toml) ... done
  Create

Now, make sure you can run the command-line interface `udapy`, e.g. by printing the help message.

In [3]:
!udapy -h

usage: udapy [optional_arguments] scenario

udapy - Python interface to Udapi - API for Universal Dependencies

Examples of usage:
  udapy -s read.Sentences udpipe.En < in.txt > out.conllu
  udapy -T < sample.conllu | less -R
  udapy -HAM ud.MarkBugs < sample.conllu > bugs.html

positional arguments:
  scenario              A sequence of blocks and their parameters.

options:
  -h, --help            show this help message and exit
  -s, --save            Add write.Conllu to the end of the scenario
  -T, --save_text_mode_trees
                        Add write.TextModeTrees color=1 to the end of the scenario
  -H, --save_html       Add write.TextModeTreesHtml color=1 to the end of the scenario
  -A, --save_all_attributes
                        Add attributes=form,lemma,upos,xpos,feats,deprel,misc (to be used after -T and -H)
  -C, --save_comments   Add print_comments=1 (to be used after -T and -H)
  -M, --marked_only     Add marked_only=1 to the end of the scenario

<details><summary>Details: If the previous command fails with "udapy: command not found"</summary>
This means that Udapi is not properly installed. When installing Udapi with <code>pip3 install --user</code>, the package is installed into <code>~/.local/lib/python3.6/site-packages/udapi/</code> (or similar, depending on your Python version) and the wrapper into <code>~/.local/bin</code>. Thus you need to add that directory to your <code>PATH</code>:
<pre>
export PATH="$HOME/.local/bin/:$PATH"
</pre>
</details>
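
If you are unsure whether the wrapper is reachable, you can also check from Python. The following is a minimal sketch using only the standard library; the helper name `find_udapy` is made up for this illustration, and `~/.local/bin` is only the typical location for a `--user` install on Linux.

```python
import os
import shutil

def find_udapy(command="udapy"):
    """Return the full path of the command if it is on PATH, else None."""
    return shutil.which(command)

path = find_udapy()
if path is None:
    # Typical location for `pip3 install --user` on Linux; yours may differ.
    candidate = os.path.expanduser("~/.local/bin")
    print("udapy not found on PATH; check whether it is in", candidate)
else:
    print("udapy found at", path)
```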

# Browse CoNLL-U files
### Get sample UD data

Download and extract [ud20sample.tgz](http://ufal.mff.cuni.cz/~popel/udapi/ud20sample.tgz). There are just 100 sentences for each of the 70 treebanks (`sample.conllu`),  plus 4 bigger files (`train.conllu` and `dev.conllu`) for German, English, French and Czech. For full UD ([2.0](https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-1983) or [newer](https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-3424)), go to [Lindat](https://lindat.cz).

In [4]:
!wget http://ufal.mff.cuni.cz/~popel/udapi/ud20sample.tgz
!tar -xf ud20sample.tgz
%cd sample

--2024-03-24 10:56:27--  http://ufal.mff.cuni.cz/~popel/udapi/ud20sample.tgz
Resolving ufal.mff.cuni.cz (ufal.mff.cuni.cz)... 195.113.20.52
Connecting to ufal.mff.cuni.cz (ufal.mff.cuni.cz)|195.113.20.52|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://ufal.mff.cuni.cz/~popel/udapi/ud20sample.tgz [following]
--2024-03-24 10:56:27--  https://ufal.mff.cuni.cz/~popel/udapi/ud20sample.tgz
Connecting to ufal.mff.cuni.cz (ufal.mff.cuni.cz)|195.113.20.52|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4670982 (4.5M) [application/x-gzip]
Saving to: ‘ud20sample.tgz’


2024-03-24 10:56:27 (7.34 MB/s) - ‘ud20sample.tgz’ saved [4670982/4670982]

/home/alexandra/work/projects/scene_desc_extractor/scene_desc_extractor/research/sample




Let's choose one of the sample files and see the raw [CoNLL-U format](https://universaldependencies.org/format.html).
<details><summary>Details: executing from Bash, IPython, Jupyter</summary>
<ul>
<li>If you see "No such file or directory" error, make sure you executed the previous cell. Note that the <code>cd</code> command is not prefixed by an exclamation mark because that would run in a sub-shell, which "forgets" the changed directory when finished. It is prefixed by a percent sign, which marks it as <a href="https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-cd">IPython magic</a>.
<li><code>cat</code> is another IPython magic command, this time an alias for the shell command of the same name (so you can prefix <code>cat</code> with an exclamation mark, if you prefer), which prints a given file. With <code>automagic</code> on, you can use it without the percent sign.
<li>In this tutorial, we pipe the output through <code>head</code> to show just the first lines (10 by default, or as many as given by <code>-n</code>), which keeps the ipynb file small. You can ignore the "cat: write error: Broken pipe" warning.
<li>When using Jupyter, you can omit the <code>| head</code> because long outputs are automatically wrapped in a text box with a scrollbar.
<li>When running this from IPython or Bash, you can use a pager: <code>less UD_Ancient_Greek/sample.conllu</code>
</ul>
</details>


In [23]:
!cat UD_English/sample.conllu | head -n 100

# newdoc id = weblog-juancole.com_juancole_20051126063000_ENG_20051126_063000
# sent_id = weblog-juancole.com_juancole_20051126063000_ENG_20051126_063000-0001
# text = Al-Zaman : American forces killed Shaikh Abdullah al-Ani, the preacher at the mosque in the town of Qaim, near the Syrian border.
1	Al	Al	PROPN	NNP	Number=Sing	0	root	_	SpaceAfter=No
2	-	-	PUNCT	HYPH	_	1	punct	_	SpaceAfter=No
3	Zaman	Zaman	PROPN	NNP	Number=Sing	1	flat	_	_
4	:	:	PUNCT	:	_	1	punct	_	_
5	American	american	ADJ	JJ	Degree=Pos	6	amod	_	_
6	forces	force	NOUN	NNS	Number=Plur	7	nsubj	_	_
7	killed	kill	VERB	VBD	Mood=Ind|Tense=Past|VerbForm=Fin	1	parataxis	_	_
8	Shaikh	Shaikh	PROPN	NNP	Number=Sing	7	obj	_	_
9	Abdullah	Abdullah	PROPN	NNP	Number=Sing	8	flat	_	_
10	al	al	PROPN	NNP	Number=Sing	8	flat	_	SpaceAfter=No
11	-	-	PUNCT	HYPH	_	8	punct	_	SpaceAfter=No
12	Ani	Ani	PROPN	NNP	Number=Sing	8	flat	_	SpaceAfter=No
13	,	,	PUNCT	,	_	8	punct	_	_
14	the	the	DET	DT	Definite=Def|PronType=Art	15	det	_	_
15	pre

In [27]:
!cat UD_English/GUM_fiction_moon.conllu | head -n 100

# newdoc id = GUM_fiction_moon
# global.Entity = GRP-etype-infstat-centering-minspan-link-identity
# meta::author = H.P. Lovecraft
# meta::dateCollected = 2019-09-11
# meta::dateCreated = 2005-03-31
# meta::dateModified = 2015-11-18
# meta::genre = fiction
# meta::salientEntities = 1, 2, 3, 11, 19, 51, 59, 64, 80, 85
# meta::sourceURL = https://en.wikisource.org/wiki/What_the_Moon_Brings
# meta::speakerCount = 1
# meta::summary = A protagonist tells of a terrifying walk on a moonlit summer night into a stream flowing to an ocean filled with the dead, during which face-like lotos blossoms whispered to him and a dead city of spires appeared in the sea, where sea worms feasted on flesh under an evil moon and a shocking statue with hooves and a crown gazed at him.
# meta::title = What the Moon Brings
# sent_id = GUM_fiction_moon-1
# s_prominence = 2
# s_type = frag
# transition = establishment
# text = What the Moon Brings
# newpar
# newpar_block = head (1 s)
1	What	what
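
Each token line in the raw output above consists of ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), while lines starting with `#` are sentence-level comments. As a rough illustration of the format only (this is not how Udapi reads files, and the names `FIELDS` and `parse_token_line` are made up for this sketch), one token line can be split like this:

```python
# The ten CoNLL-U columns, in the order defined by the format specification.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]

def parse_token_line(line):
    """Split one tab-separated CoNLL-U token line into a dict keyed by column name."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(values)}")
    return dict(zip(FIELDS, values))

# A token line copied from the UD_English sample shown above.
line = "6\tforces\tforce\tNOUN\tNNS\tNumber=Plur\t7\tnsubj\t_\t_"
token = parse_token_line(line)
print(token["form"], token["upos"], token["head"], token["deprel"])
```

The HEAD column (`7` here) is the ID of the parent word, so "forces" depends on word 7 ("killed") with the relation `nsubj`.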

### Browse conllu files with `udapy -T`
While the CoNLL-U format was designed with readability (by both machines and humans) in mind, it may still be a bit difficult for humans to read and interpret. Let's visualize the dependency tree structure using ASCII art by piping the conllu file into `udapy -T`.

In [26]:
!cat UD_English/GUM_fiction_moon.conllu | udapy -T  | head -n 100

2024-03-24 11:23:51,848 [   INFO] execute - No reader specified, using read.Conllu
2024-03-24 11:23:51,848 [   INFO] execute -  ---- ROUND ----
2024-03-24 11:23:51,848 [   INFO] execute - Executing block read.Conllu
2024-03-24 11:23:51,850 [   INFO] execute - Executing block write.TextModeTrees
docname = GUM_fiction_moon
global.Entity = GRP-etype-infstat-centering-minspan-link-identity
loaded_from = -
# sent_id = GUM_fiction_moon-1
# text = What the Moon Brings
─┮
 ╰─┮ What PRON root
   │   ╭─╼ the DET det
   │ ╭─┶ Moon NOUN nsubj
   ╰─┶ Brings VERB acl

# sent_id = GUM_fiction_moon-2
# text = I hate the moon - I am afraid of it - for when it shines on certain scenes familiar and loved it sometimes makes them unfamiliar and hideous.
─┮
 │ ╭─╼ I PRON nsubj
 ╰─┾ hate VERB root
   │ ╭─╼ the DET

In [22]:
!cat UD_English/sample.conllu | udapy -T | head -n 100

2024-03-24 11:17:16,066 [   INFO] execute - No reader specified, using read.Conllu
2024-03-24 11:17:16,066 [   INFO] execute -  ---- ROUND ----
2024-03-24 11:17:16,066 [   INFO] execute - Executing block read.Conllu
2024-03-24 11:17:16,070 [   INFO] execute - Executing block write.TextModeTrees
docname = weblog-juancole.com_juancole_20051126063000_ENG_20051126_063000
global.Entity = None
loaded_from = -
# sent_id = weblog-juancole.com_juancole_20051126063000_ENG_20051126_063000-0001
# text = Al-Zaman : American forces killed Shaikh Abdullah al-Ani, the preacher at the mosque in the town of Qaim, near the Syrian border.
─┮
 ╰─┮ Al PROPN root
   ┡─╼ - PUNCT punct
   ┡─╼ Zaman PROPN flat
   ┡─╼ : PUNCT punct
   │   ╭─╼ American ADJ amod
   │ ╭─┶ forces NOUN nsubj
   ┡─┾ killed VERB parataxis

<details><summary>Details:</summary>
<ul>
<li>You may be used to seeing dependency trees where the root node is at the top and words are ordered horizontally (left to right). Here, the root is on the left and words are ordered vertically (top to bottom).
<li>The colors are implemented using the <a href="https://pypi.org/project/colorama/">colorama package</a> and ANSI escape codes. When running this from IPython or Bash and using <code>less</code>, you need to instruct it to display the colors with <code>-R</code>:
<code>
cat UD_Ancient_Greek/sample.conllu | udapy -T | less -R
</code>
<li>You can also use <code>udapy -T -N</code> to disable the colors.
<li><code>udapy -q</code> suppresses all Udapi messages (warnings, info, debug) printed on the standard error output, so only fatal errors are printed. By default, only debug messages are suppressed, but these can be printed with <code>udapy -v</code>.
<li>But you already know this because you have read <code>udapy -h</code>, am I right?
</ul>
</details>
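
If colored output has already been saved to a file or captured in a Python string, the ANSI color codes can also be removed afterwards. The following is a minimal sketch that is independent of Udapi; the names `ANSI_COLOR` and `strip_colors` are made up for this illustration, and the pattern covers only color (SGR) sequences, not every possible escape code.

```python
import re

# Matches color sequences such as ESC[33m (yellow) or ESC[0m (reset).
ANSI_COLOR = re.compile(r"\x1b\[[0-9;]*m")

def strip_colors(text):
    """Remove ANSI color sequences from a string."""
    return ANSI_COLOR.sub("", text)

colored = "\x1b[33mWhat\x1b[0m \x1b[31mPRON\x1b[0m \x1b[34mroot\x1b[0m"
print(strip_colors(colored))  # prints: What PRON root
```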

`udapy -T` is a shortcut for `udapy write.TextModeTrees color=1`, where `write.TextModeTrees` is a so-called *block* (a basic Udapi processing unit) and `color=1` is its parameter. See [the documentation](https://udapi.readthedocs.io/en/latest/udapi.block.write.html#module-udapi.block.write.textmodetrees) (or even [the source code](https://github.com/udapi/udapi-python/blob/master/udapi/block/write/textmodetrees.py)) of `write.TextModeTrees` to learn about further parameters. Now, let's also print the LEMMA and MISC columns and display the columns vertically aligned using the parameters `layout=align attributes=form,lemma,upos,deprel,misc`.

In [19]:
!cat UD_English/sample.conllu | udapy -q write.TextModeTrees color=1 layout=align attributes=form,lemma,upos,deprel,misc | head -n 20

docname = weblog-juancole.com_juancole_20051126063000_ENG_20051126_063000
global.Entity = None
loaded_from = -
# sent_id = weblog-juancole.com_juancole_20051126063000_ENG_20051126_063000-0001
# text = Al-Zaman : American forces killed Shaikh Abdullah al-Ani, the preacher at the mosque in the town of Qaim, near the Syrian border.
─┮                                              
 ╰─┮           Al       Al       PROPN root      SpaceAfter=No
   ┡─╼         -        -        PUNCT punct     SpaceAfter=No
   ┡─╼         Zaman    Zaman    PROPN flat      _
   ┡─╼         :        :        PUNCT punct     _
   │   ╭─╼     American american ADJ   amod      _
   │ ╭─┶       forces   force    NOUN  nsubj     _
   ┡─┾         killed

In [33]:
!echo "John loves Mary." | udapy -s read.Sentences udpipe.En 

Traceback (most recent call last):
  File "/home/alexandra/.cache/pypoetry/virtualenvs/scene-desc-extractor-6Y_iv0gf-py3.11/bin/udapy", line 114, in <module>
    runner.execute()
  File "/home/alexandra/.cache/pypoetry/virtualenvs/scene-desc-extractor-6Y_iv0gf-py3.11/lib/python3.11/site-packages/udapi/core/run.py", line 166, in execute
    blocks = _import_blocks(block_names, block_args)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alexandra/.cache/pypoetry/virtualenvs/scene-desc-extractor-6Y_iv0gf-py3.11/lib/python3.11/site-packages/udapi/core/run.py", line 118, in _import_blocks
    exec(command)  # pylint: disable=exec-used
    ^^^^^^^^^^^^^
  File "<string>", line 1, in <module>
  File "/home/alexandra/.cache/pypoetry/virtualenvs/scene-desc-extractor-6Y_iv0gf-py3.11/lib/python3.11/site-packages/udapi/block/udpipe/en.py", line 2, in <module>
    from udapi.block.udpipe.base import Base
  File "/home/alexandra/.cache/pypoetry/virtualenvs/scene-desc-e

In [58]:
!cat UD_English/sample.conllu | udapy write.TextModeTrees color=1 attributes=form,upos,deprel,misc | head -n 20


2024-03-24 12:00:56,066 [   INFO] execute - No reader specified, using read.Conllu
2024-03-24 12:00:56,066 [   INFO] execute -  ---- ROUND ----
2024-03-24 12:00:56,066 [   INFO] execute - Executing block read.Conllu
2024-03-24 12:00:56,070 [   INFO] execute - Executing block write.TextModeTrees
docname = weblog-juancole.com_juancole_20051126063000_ENG_20051126_063000
global.Entity = None
loaded_from = -
# sent_id = weblog-juancole.com_juancole_20051126063000_ENG_20051126_063000-0001
# text = Al-Zaman : American forces killed Shaikh Abdullah al-Ani, the preacher at the mosque in the town of Qaim, near the Syrian border.
─┮
 ╰─┮ Al PROPN root SpaceAfter=No
   ┡─╼ - PUNCT punct SpaceAfter=No
   ┡─╼ Zaman PROPN flat _
   ┡─╼ : PUNCT punct _
   │   ╭─╼ American ADJ amod _
   │ ╭─┶ forces NOUN nsubj

### Browse conllu files from IPython/Jupyter
So far, we have used Udapi only via its command-line interface `udapy`, which is handy, but not very Pythonic. So let's now use Udapi as a library: load the English conllu sample file into a document `doc` and visualize the sixth tree (i.e. `doc[5]` in zero-based indexing).

In [20]:
import udapi
doc = udapi.Document("UD_English/sample.conllu")
doc[5].draw()

# sent_id = weblog-juancole.com_juancole_20051126063000_ENG_20051126_063000-0006
# text = The third was being run by the head of an investment firm.
─┮
 │   ╭─╼ The DET det
 │ ╭─┶ third ADJ nsubj:pass
 │ ┢─╼ was AUX aux
 │ ┢─╼ being AUX aux:pass
 ╰─┾ run VERB root
   │ ╭─╼ by ADP case
   │ ┢─╼ the DET det
   ┡─┾ head NOUN obl
   │ │ ╭─╼ of ADP case
   │ │ ┢─╼ an DET det
   │ │ ┢─╼ investment NOUN compound
   │ ╰─┶ firm NOUN nmod
   ╰─╼ . PUNCT punct


<details><summary>Details:</summary>
<ul>
<li><code>doc = udapi.Document(filename)</code> is a shortcut for
<pre>
import udapi.core.document
doc = udapi.core.document.Document(filename)
</pre>
<li>We can print the whole document using <code>doc.draw()</code>.
<li><code>doc.draw(**kwargs)</code> is a shortcut for creating a <code>write.TextModeTrees</code> block and applying it on the document:
<pre>
import udapi.block.write.textmodetrees
block = udapi.block.write.textmodetrees.TextModeTrees(**kwargs)
block.run(doc)
</pre>
</ul>
</details>

The `draw()` method takes the same parameters as the `write.TextModeTrees` block, so we can for example display only the node ID (aka `ord`, i.e. word-order index), form and [universal (morpho-syntactic) features](https://universaldependencies.org/u/feat/index.html).


In [21]:
doc[5].draw(layout="align", attributes="ord,form,feats")

# sent_id = weblog-juancole.com_juancole_20051126063000_ENG_20051126_063000-0006
# text = The third was being run by the head of an investment firm.
─┮                      
 │   ╭─╼   1  The        Definite=Def|PronType=Art
 │ ╭─┶     2  third      Degree=Pos|NumType=Ord
 │ ┢─╼     3  was        Mood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin
 │ ┢─╼     4  being      VerbForm=Ger
 ╰─┾       5  run        Tense=Past|VerbForm=Part|Voice=Pass
   │ ╭─╼   6  by         _
   │ ┢─╼   7  the        Definite=Def|PronType=Art
   ┡─┾     8  head       Number=Sing
   │ │ ╭─╼ 9  of         _
   │ │ ┢─╼ 10 an         Definite=Ind|PronType=Art
   │ │ ┢─╼ 11 investment Number=Sing
   │ ╰─┶   12 firm       Number=Sing
   ╰─╼     13
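
The features printed in the last column are stored in CoNLL-U as a single string of `Name=Value` pairs joined by `|`, with `_` standing for "no features". To make that encoding concrete, here is a small plain-Python sketch; it does not use Udapi, and the helper name `parse_feats` is made up for this illustration.

```python
def parse_feats(feats):
    """Turn a FEATS string such as 'Number=Sing|Person=3' into a dict."""
    if feats == "_":  # an underscore marks an empty FEATS column
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))

# The FEATS value of "was" in the tree above.
feats = parse_feats("Mood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin")
print(feats["Tense"])    # prints: Past
print(parse_feats("_"))  # prints: {}
```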

# Document representation in Udapi

A Udapi [document](https://github.com/udapi/udapi-python/blob/master/udapi/core/document.py) consists of a sequence of so-called *bundles*, mirroring a sequence of sentences in a typical natural language text.

A [bundle](https://github.com/udapi/udapi-python/blob/master/udapi/core/bundle.py) corresponds to a sentence,
possibly in multiple versions or with different representations, such as sentence-tuples from parallel corpora, paraphrases in the same language, or alternative analyses (e.g. parses produced by different parsers). If a bundle contains more than one tree, the trees must be distinguished by a so-called *zone* (a label which contains the language code).

Each tree is represented by a special (artificial) [root](https://github.com/udapi/udapi-python/blob/master/udapi/core/root.py) node, which is added to the top of a CoNLL-U tree in the Udapi model. The root node bears the ID of a given tree/sentence (`sent_id`) and its word order (`ord`) is 0. Technically, Root is a subclass of Node, with some extra methods.

The [Node](https://github.com/udapi/udapi-python/blob/master/udapi/core/node.py) class corresponds to a node
of a dependency tree. It provides access to all the CoNLL-U-defined attributes (`ord`, `form`, `lemma`, `upos`, `xpos`, `feats`, `deprel`, `deps`, `misc`). There are properties and methods for tree traversal (`parent`, `root`, `children`, `descendants`); word-order traversal (`next_node`, `prev_node`); tree manipulation (`parent` setter) including word-order changes (`shift_after_node(x)`, `shift_before_subtree(x)`, etc.); and utility methods: `is_descendant_of(x)`, `is_nonprojective()`, `precedes(x)`, `is_leaf()`, `is_root()`, `get_attrs([])`, `compute_text()`, `draw()`.
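
To make the traversal terms concrete, here is a toy model in plain Python. It is only a sketch of the concepts (parent, children, descendants, leaf), not Udapi's implementation, and all names in it are made up for this illustration. The data are the word-order indices, forms and heads of the sentence "The third was being run by the head of an investment firm." drawn earlier, with head 0 standing for the artificial root.

```python
from collections import defaultdict

# (ord, form, head) triples; head 0 is the artificial root.
TOKENS = [
    (1, "The", 2), (2, "third", 5), (3, "was", 5), (4, "being", 5),
    (5, "run", 0), (6, "by", 8), (7, "the", 8), (8, "head", 5),
    (9, "of", 12), (10, "an", 12), (11, "investment", 12),
    (12, "firm", 8), (13, ".", 5),
]

form = {ord_: f for ord_, f, _ in TOKENS}
parent = {ord_: head for ord_, _, head in TOKENS}
children = defaultdict(list)
for ord_, _, head in TOKENS:
    children[head].append(ord_)  # TOKENS is in word order, so lists stay sorted

def descendants(ord_):
    """All nodes below `ord_` in the tree, sorted by word order."""
    found = []
    stack = list(children[ord_])
    while stack:
        current = stack.pop()
        found.append(current)
        stack.extend(children[current])
    return sorted(found)

def is_leaf(ord_):
    """True if the node has no children."""
    return not children[ord_]

print([form[c] for c in children[8]])     # children of "head"
print([form[d] for d in descendants(8)])  # whole subtree below "head"
print(is_leaf(6), is_leaf(8))             # "by" is a leaf, "head" is not
```

In Udapi itself you do not need any of this bookkeeping, because the `Node` class already provides the members listed above.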

## Exercise 1: Count prepositions and postpositions
[Prepositions and postpositions](https://en.wikipedia.org/wiki/Preposition_and_postposition) are together called *adpositions* and assigned the [ADP](https://universaldependencies.org/u/pos/ADP.html) universal part-of-speech tag (`upos`) in UD. Some languages (e.g. English) use mostly prepositions, others mostly postpositions.
* Do you know any English postpositions?
* Guess the typical adposition type (i.e. whether a given language uses more prepositions or postpositions) for at least 10 languages of your choice (from those in UD2.0).
* Complete the following code and find out how many prepositions and postpositions are in `UD_English/sample.conllu` (which has been loaded into `doc`).

In [None]:
prepositions, postpositions = 0, 0
# Iterate over all nodes in the document (in all trees)
for node in doc.nodes:
    if node.upos == "ADP":
        # TODO: fix this code to actually distinguish prepositions and postpositions
        prepositions += 1
# Print the results
prepositions, postpositions

If you don't know how to proceed click on the following hints.
<details><summary>Hint 1:</summary>
In some dependency grammars, adpositions govern nouns (i.e. an adposition is the *parent* of a given noun node). In other dependency grammars, adpositions depend on nouns (i.e. a noun is the *parent* of a given adposition). Find out which style is used by UD. Check <a href="https://universaldependencies.org/u/overview/syntax.html">the UD documentation</a> or inspect some of the tree visualizations and guess.
</details>
<details><summary>Hint 2:</summary>
See the <a href="https://udapi.readthedocs.io/en/latest/udapi.core.html#module-udapi.core.node">Node documentation</a> and find out how to obtain dependency parent and dependency children. Note that these are properties of a given node, rather than methods, so you should not write parentheses () after the property name.
</details>
<details><summary>Hint 3:</summary>
<code>doc.nodes</code> iterates over all nodes in the document sorted by the word order, but this would be cumbersome to exploit. Find a method of <code>Node</code> to detect the relative word order of two nodes (within the same tree/sentence).
</details>
<details><summary>Hint 4:</summary>
Use <code>node.parent</code> and <code>node.precedes(another_node)</code>.
The latter is a shortcut for <code>node.ord &lt; another_node.ord</code>.
</details>
<details><summary>Solution:</summary>
<pre>
for node in doc.nodes:
    if node.upos == "ADP":
        if node.precedes(node.parent):
            prepositions += 1
        else:
            postpositions += 1
</pre>
</details>


## Exercise 2: Explore English postpositions
The previous exercise indicates there are 7 occurrences of postpositions in the English sample. Find these 7 occurrences and visualize them using `node.draw()`. Count which adpositions (`lemma`) with which dependency relations (`deprel`) are responsible for these occurrences. Recompute these statistics on the bigger English training data. Can you explain these occurrences? What are the reasons? Is any occurrence an annotation error?

In [None]:
# For the statistics, you may find useful: count["any string"] += 1
import collections
count = collections.Counter()
big_doc = udapi.Document("UD_English/train.conllu")

for node in doc.nodes:
    # TODO detect postposition
    pass

# Print the statistics
count.most_common()

<details><summary>Solution 1:</summary>
<pre>
for node in doc.nodes:
    if node.upos == "ADP" and node.parent.precedes(node):
        node.parent.draw()
        count[node.lemma + " " + node.deprel] += 1
</pre>
</details>
<details><summary>Hint 1:</summary>
We can see there are many particles of phrasal verbs, e.g. "busted up".
These seem to be correctly annotated as <code>ADP</code> according to the UD guidelines.
Let's filter out those cases and focus on the rest and let's switch to the big train data.
</details>
<details><summary>Solution 2:</summary>
<pre>
count = collections.Counter()
for node in big_doc.nodes:
    if node.upos == "ADP" and node.parent.precedes(node) and node.parent.upos != "VERB":
        count[node.lemma + " " + node.deprel] += 1
count.most_common()
</pre>
As an alternative to <code>node.parent.upos != "VERB"</code>,
you could also keep only nodes with <code>node.deprel != "compound:prt"</code>,
or directly focus on <code>node.deprel == "case"</code>.
</details>
<details><summary>Partial answer:</summary>
Most of the occurrences are actually annotated correctly,
although they are not typically considered postpositions.
For example, <code>node.deprel == "fixed"</code> is used for multi-word adpositions,
such as "because of", where "of" depends on "because" for technical (and consistency) reasons,
but the whole multi-word adposition precedes its governing noun, so it is actually a multi-word preposition.

What about the remaining occurrences, after keeping only nodes with <code>node.deprel not in {"compound:prt", "fixed"}</code>?
</details>

In the next tutorial, 02-blocks.ipynb (not finished yet), we will explore several useful Udapi blocks, some of which may be handy when working further on Exercise 2 or similar tasks.