# Generate plain text plus line numbers

For each corpus we generate a TSV file with one row per line of text and the following columns (the file has no header row):

* **id**: the P-number of the tablet, the face, and the line number, joined as `pnumber:face.lnno`, e.g. `P509373:obverse.1`
* **n**: the 1-based position of the line within its face
* **text-orig-full**: the full ATF
* **text-orig-plain**: the essential bits of the ATF
* **text-orig-rich**: the essential bits in somewhat richer Unicode
* **text-orig-unicode**: the Unicode (cuneiform) rendering

See [text formats](https://github.com/Nino-cunei/tfFromAtf/blob/master/docs/transcription.md#text-formats) for a description of these formats.
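The **id** column packs three values into one string. A minimal sketch of how it could be split again is given here; the names `LineId` and `parse_id` are hypothetical helpers, not part of Text-Fabric, and the sketch assumes that the P-number contains no `:` and that the line number contains no `.`.

```python
from typing import NamedTuple


class LineId(NamedTuple):
    """The three parts of an id such as 'P509373:obverse.1' (hypothetical helper)."""

    pnumber: str
    face: str
    lnno: str


def parse_id(line_id: str) -> LineId:
    """Split an id into P-number, face, and line number.

    Assumption: the first ':' ends the P-number and the last '.'
    starts the line number.
    """
    pnumber, _, rest = line_id.partition(":")
    face, _, lnno = rest.rpartition(".")
    return LineId(pnumber, face, lnno)


parts = parse_id("P509373:obverse.1")
print(parts.pnumber, parts.face, parts.lnno)
```

Splitting the line number off from the right keeps a face name intact even if it happens to contain a `.`.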

In [9]:
from tf.app import use
from tf.core.files import expanduser as ex, unexpanduser as ux, dirMake

In [10]:
corpora = ["oldbabylonian", "oldassyrian"]

In [14]:
TOF = "text-orig-full"
TOP = "text-orig-plain"
TOR = "text-orig-rich"
TOU = "text-orig-unicode"

DEST = ex("~/Downloads/cdliexport")
dirMake(DEST)

for corpus in corpora:
    A = use(f"Nino-cunei/{corpus}")
    F = A.api.F
    L = A.api.L
    T = A.api.T

    file = f"{DEST}/{corpus}.tsv"
    # The rows contain cuneiform and other non-ASCII characters,
    # so do not rely on the platform default encoding.
    with open(file, "w", encoding="utf8") as fh:
        for tablet in F.otype.s("document"):
            pnumber = F.pnumber.v(tablet)

            for face in L.d(tablet, otype="face"):
                facetag = F.face.v(face)

                # n is the 1-based position of the line within its face;
                # lnumber is the line number as recorded in the source.
                for n, line in enumerate(L.d(face, otype="line"), start=1):
                    lnumber = F.lnno.v(line)
                    tof = T.text(line, fmt=TOF)
                    top = T.text(line, fmt=TOP)
                    tor = T.text(line, fmt=TOR)
                    tou = T.text(line, fmt=TOU)

                    fh.write(f"{pnumber}:{facetag}.{lnumber}\t{n}\t{tof}\t{top}\t{tor}\t{tou}\n")

    A.dm(f"Corpus **{corpus}** exported to file `{ux(file)}`")

**Locating corpus resources ...**

| Name | # of nodes | # slots / node | % coverage |
| --- | --- | --- | --- |
| document | 1285 | 158.15 | 100 |
| face | 2834 | 71.71 | 100 |
| line | 27375 | 7.42 | 100 |
| word | 76505 | 2.64 | 100 |
| cluster | 23449 | 1.78 | 21 |
| sign | 203219 | 1.0 | 100 |


Corpus **oldbabylonian** exported to file `~/Downloads/cdliexport/oldbabylonian.tsv`

**Locating corpus resources ...**

| Name | # of nodes | # slots / node | % coverage |
| --- | --- | --- | --- |
| document | 4775 | 160.52 | 100 |
| face | 11910 | 64.36 | 100 |
| line | 109860 | 6.98 | 100 |
| word | 314012 | 2.37 | 97 |
| cluster | 82085 | 1.71 | 18 |
| sign | 766501 | 1.0 | 100 |


Corpus **oldassyrian** exported to file `~/Downloads/cdliexport/oldassyrian.tsv`
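To check an export, the file can be read back with the standard `csv` module. The sketch below is one possible way to do that, not part of the export itself: `COLUMNS`, `read_rows` and `lines_per_tablet` are hypothetical names, and the column order is taken from the list at the top of this notebook. Quoting is switched off because ATF text may contain literal `"` characters.

```python
import csv
from collections import Counter

# Column names in the order the export writes them (the file has no header row).
COLUMNS = (
    "id",
    "n",
    "text-orig-full",
    "text-orig-plain",
    "text-orig-rich",
    "text-orig-unicode",
)


def read_rows(lines):
    """Yield one dict per exported line from an iterable of TSV strings."""
    # QUOTE_NONE: treat '"' as an ordinary character, not as a field quote.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for values in reader:
        yield dict(zip(COLUMNS, values))


def lines_per_tablet(rows):
    """Count exported lines per P-number (the part of the id before ':')."""
    counts = Counter()
    for row in rows:
        counts[row["id"].partition(":")[0]] += 1
    return counts


# Made-up sample rows, for illustration only.
sample = [
    "P1:obverse.1\t1\ta\tb\tc\td\n",
    'P1:obverse.2\t2\t"x\ty\tz\tw\n',
    "P2:reverse.1\t1\ta\tb\tc\td\n",
]
rows = list(read_rows(sample))
print(lines_per_tablet(rows))
```

For a real file, pass an open file handle instead of the sample list, for example `open(path, encoding="utf8", newline="")`. The total number of rows should then match the number of `line` nodes reported when the corpus was loaded.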

In [15]:
!head -n 20 {DEST}/oldbabylonian.tsv

P509373:obverse.1	1	[a-na] _{d}suen_-i-[din-nam]	a-na d⁼suen-i-din-nam	a-na d⁼suen-i-din-nam	𒀀𒈾 𒀭𒂗𒍪𒄿𒁷𒉆
P509373:obverse.2	2	qi2-bi2-[ma]	qi2-bi2-ma	qi₂-bi₂-ma	𒆠𒉈𒈠
P509373:obverse.3	3	um-ma _{d}en-lil2_-sza-du-u2-ni-ma	um-ma d⁼en-lil2-sza-du-u2-ni-ma	um-ma d⁼en-lil₂-ša-du-u₂-ni-ma	𒌝𒈠 𒀭𒂗𒆤𒊭𒁺𒌑𒉌𒈠
P509373:obverse.4	4	_{d}utu_ u3 _{d}[marduk]_ a-na da-ri-a-[tim]	d⁼utu u3 d⁼marduk a-na da-ri-a-tim	d⁼utu u₃ d⁼marduk a-na da-ri-a-tim	𒀭𒌓 𒅇 𒀭𒀫𒌓 𒀀𒈾 𒁕𒊑𒀀𒁴
P509373:obverse.5	5	li-ba-al-li-t,u2-u2-ka	li-ba-al-li-t,u2-u2-ka	li-ba-al-li-ṭu₂-u₂-ka	𒇷𒁀𒀠𒇷𒌅𒌑𒅗
P509373:obverse.6	6	{disz}sze-ep-_{d}suen a2-gal2 [dumu] um-mi-a-mesz_	disz⁼sze-ep-d⁼suen a2-gal2 dumu um-mi-a-mesz	diš⁼še-ep-d⁼suen a₂-gal₂ dumu um-mi-a-meš	𒁹𒊺𒅁𒀭𒂗𒍪 𒀉𒅅 𒌉 𒌝𒈪𒀀𒈨𒌍
P509373:obverse.7	7	ki-a-am u2-lam-mi-da-an-ni um-[ma] szu-u2-[ma]	ki-a-am u2-lam-mi-da-an-ni um-ma szu-u2-ma	ki-a-am u₂-lam-mi-da-an-ni um-ma šu-u₂-ma	𒆠𒀀𒄠 𒌑𒇴𒈪𒁕𒀭𒉌 𒌝𒈠 𒋗𒌑𒈠
P509373:obverse.8	8	{disz}sa-am-su-ba-ah-li sza-pi2-ir ma-[tim]	disz⁼sa-am-su-ba-ah-li sza-pi2-ir ma-tim	diš⁼sa-am