<img align="right" src="images/tf.png" width="128"/>
<img align="right" src="images/ninologo.png" width="128"/>
<img align="right" src="images/dans.png" width="128"/>

---

To see how the part-of-speech tags were created, consult [posTag](posTag.ipynb).

---

# Use the Part of Speech tagging

# Usage

For now, the tagging is available as a collection of named sets that you can use in your queries, both in the TF-browser and in a notebook.

## Getting the sets

Here is how you can get the sets.

### With Dropbox

If you are synchronized to the `obb` shared folder on Dropbox
(that is, you have installed the Dropbox client and accepted the invitation to `obb`),
you are all set: the newest version of the sets file arrives on your computer seconds after
it has been updated.

### With GitHub

First get the tutorials repo. The first time, clone it:

```sh
cd ~/github/annotation
git clone https://github.com/annotation/tutorials
```

Advice: do not work in your clone directly, but in a working directory outside this clone.
When you want to get updates from the repo:

```sh
cd ~/github/annotation/tutorials
git pull origin master
```

(This can fail if you have modified files inside your clone.)

## Using the sets and features

You can use the sets and features directly in your programs, or in TF-queries, whether in notebooks or in the TF-browser.

### TF-browser

To start the TF-browser with the sets file from Dropbox:

```sh
text-fabric oldbabylonian --sets=~/Dropbox/obb/sets.tfx --mod=annotation/tutorials/oldbabylonian/cookbook/pos/tf
```

or, with the sets file from your GitHub clone:

```sh
text-fabric oldbabylonian --sets=~/github/annotation/tutorials/oldbabylonian/cookbook/sets.tfx --mod=annotation/tutorials/oldbabylonian/cookbook/pos/tf
```

### In notebooks

This notebook is an example of how you can work with the new data.

## Using sets in queries

You can use the names of sets in all places where you currently use `word`, `sign`, `face`, etc.
More info in the [docs](https://annotation.github.io/text-fabric/Use/Search/#search-template-reference).

In [2]:
from tf.app import use

In [3]:
A = use('oldbabylonian', hoist=globals(), mod='ancient-data/tftutorials/oldbabylonian/pos/tf:clone')

	connecting to online GitHub repo annotation/app-oldbabylonian ... connected
Using TF-app in /Users/dirk/text-fabric-data/annotation/app-oldbabylonian/code:
	rv0.2=#4bb2530bfb94dc93601f8b3df7722cb0e5df7a43 (latest release)
	connecting to online GitHub repo Nino-cunei/oldbabylonian ... connected
Using data in /Users/dirk/text-fabric-data/Nino-cunei/oldbabylonian/tf/1.0.4:
	rv1.4 (latest release)
Using data in /Users/dirk/github/ancient-data/tftutorials/oldbabylonian/pos/tf/1.0.4:
	repo clone offline under ~/github (local github)
   |     0.00s No structure info in otext, the structure part of the T-API cannot be used


Note that the features `pos` and `subpos` and their companions (`cs`, `ps`, `gn`, `nu`) are now loaded.

Let's print the frequency lists of their values.
First a convenience function to print the frequency list of an arbitrary feature.

In [4]:
def freqList(feat):
  # Fs(feat) is the feature object; freqList() gives (value, count) pairs
  for (p, n) in Fs(feat).freqList():
    print(f'{p:<12}: {n:>5} x')

In [5]:
freqList('pos')

noun        : 26560 x
pcl         :  7230 x
prep        :  5943 x
prn         :  1492 x
adv         :   399 x


In [6]:
freqList('subpos')

conj        :  2570 x
rel         :  2363 x
numeral     :  2238 x
neg         :  1909 x
prs         :  1440 x
tmp         :   399 x
dem         :    52 x


In [7]:
freqList('cs')

nom         :  1001 x
obl         :   415 x
dat         :    24 x


In [8]:
freqList('ps')

2           :   545 x
3           :   534 x
1           :   361 x


In [9]:
freqList('gn')

m           :   772 x
c           :   572 x
f           :    96 x


In [10]:
freqList('nu')

sg          :  1283 x
pl          :   157 x


We still need to load the sets.

In [11]:
from tf.lib import readSets

In [12]:
sets = readSets('~/github/annotation/tutorials/oldbabylonian/cookbook/sets.tfx')
sorted(sets)

['advtmp',
 'nonprep',
 'noun',
 'nounMdet',
 'nounMlogo',
 'nounMnum',
 'nounMprep',
 'nounUdet',
 'nounUlogo',
 'nounUnum',
 'nounUprep',
 'noundet',
 'nounlogo',
 'nounnum',
 'nounprep',
 'pcl',
 'pclconj',
 'pclneg',
 'pclrel',
 'prep',
 'prndem',
 'prnprs']
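
Since `sorted(sets)` lists the names, `sets` behaves like a dictionary keyed by set name. Assuming its values are ordinary Python sets of node numbers, you can inspect and combine them with the usual set operations. The sketch below uses a toy dictionary with invented node numbers instead of the real file, and the name `pclnegconj` is hypothetical.

```python
# Toy stand-in for the dictionary returned by readSets():
# set name -> set of node numbers (all numbers here are invented).
toy_sets = {
    "pclneg": {101, 205, 317},
    "pclconj": {150, 260, 400},
    "noun": {102, 206, 318, 401},
}

# Size of each set, in alphabetical order of the names.
sizes = {name: len(nodes) for name, nodes in sorted(toy_sets.items())}
print(sizes)

# A new, hypothetical set: particles that are negations or conjunctions.
custom_sets = dict(toy_sets)
custom_sets["pclnegconj"] = toy_sets["pclneg"] | toy_sets["pclconj"]
print(sorted(custom_sets["pclnegconj"]))
```

With the real data, a dictionary extended in this way would be handed to the search function through its `sets` parameter, so that the new name can be used in a query template like any other set name.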

We perform a query with the new sets:

In [13]:
query = '''
pclneg
<: noun
'''
results = A.search(query)

 0 
 1 pclneg
 2 <: noun
 3 
line 1: Unknown object type: "pclneg"
line 2: Unknown object type: "noun"
Valid object types are: document, face, line, word, cluster, sign


  0.01s 0 results


Oops! We need to tell `A.search()` to use the sets.

In [14]:
query = '''
pclneg
<: noun
'''
results = A.search(query, sets=sets)

  0.01s 79 results


In [15]:
A.table(results, end=10)

n,p,word,word.1
1,P509376 obverse:6,u2-ul,ta-asz-pu-ra-am
2,P509376 obverse:8,u2-ul,ta-asz-pu-ra-am
3,P509377 reverse:12,la,_sza3-gal_
4,P481192 obverse:12',la,_in-nu_
5,P510526 obverse:12,la,ki
6,P510551 reverse:2,la,"s,u2-ha-ri-ka"
7,P510562 obverse:12,u2-ul,ta-asz-pu-ra-am
8,P510562 reverse:2,u2-ul#,ta-asz-pu-ra#-am
9,P510569 obverse:14,u2-ul,ta-asz-pu-ra-am#
10,P510576 reverse:10,u2-ul#,[ta-asz]-pu#-ra-am
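
The `<:` in the query above asks that the first word is immediately followed by the second. As a rough sketch of what that means, assuming that `<:` holds when the left node ends on the slot just before the slot where the right node starts: the toy data below maps word nodes to the first and last slot they occupy. All node numbers, slot numbers, and the function `adjacent_pairs` are invented for illustration; this is not how Text-Fabric implements its search.

```python
# Toy stand-in: word node -> (first slot, last slot) it occupies.
toy_span = {
    201: (1, 2),    # a negation, two signs long
    202: (3, 6),    # a noun directly after it
    203: (7, 7),    # some other word
    204: (8, 10),   # a noun that does not follow a negation
}
toy_sets = {"pclneg": {201}, "noun": {202, 204}}


def adjacent_pairs(left, right, span):
    """Pairs (a, b) where b starts on the slot right after a ends."""
    return sorted(
        (a, b)
        for a in left
        for b in right
        if span[a][1] + 1 == span[b][0]
    )


pairs = adjacent_pairs(toy_sets["pclneg"], toy_sets["noun"], toy_span)
print(pairs)
```

Only the pair of nodes 201 and 202 qualifies here, because node 204 is separated from the negation by other material.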


Why not ask for negation-pronoun combinations, that is, a negation particle directly followed by a personal pronoun?

In [16]:
query = '''
pclneg
<: prnprs
'''
results = A.search(query, sets=sets)

  0.01s 8 results


In [17]:
A.table(results, end=10)

n,p,word,word.1
1,P510611 obverse:7,u2-ul,a-na-ku
2,P510795 reverse:8,la,ia-a-ti
3,P510880 reverse:8,la,ka-ti
4,P292937 reverse:6,la,ia-ti
5,P313317 reverse:19,<<u2-ul>>,a-na-ku
6,P313324 reverse:12,la,ka-ti
7,P387319 reverse:3,la,ia-a-ti
8,P292902 reverse:4,u2-la,a-na-ku-ma


---

Full posTag and pos notebooks on
[annotation/tutorials/oldbabylonian/cookbook](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/oldbabylonian/cookbook)

Full tutorial on
[annotation/tutorials/oldbabylonian](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/oldbabylonian)

# And so on for more corpora

* Hebrew Bible
* Quran
* Uruk
* Peshitta

and [more](https://annotation.github.io/text-fabric/About/Corpora/).