<img align="right" src="tf-small.png"/>

# Linked Data

We prepare a Linked Data set of the
[Hebrew Bible](https://etcbc.github.io/text-fabric-data/features/hebrew/etcbc4c/0_home.html)
plus linguistic annotations.

This is just an exercise, done for this
[linked data course event](http://risis.eu/event/linked-data-for-science-and-innovation-studies/).

It is not a definitive linked data export.

In [5]:
import os
from tf.fabric import Fabric

# Call Text-Fabric

Everything starts by setting up Text-Fabric.
It needs to know where to look for data.

In [2]:
ETCBC = 'hebrew/etcbc4c'
PHONO = 'hebrew/phono'
TF = Fabric(modules=[ETCBC, PHONO])

This is Text-Fabric 2.3.6
Api reference : https://github.com/ETCBC/text-fabric/wiki/Api
Tutorial      : https://github.com/ETCBC/text-fabric/blob/master/docs/tutorial.ipynb
Data sources  : https://github.com/ETCBC/text-fabric-data
Data docs     : https://etcbc.github.io/text-fabric-data
Shebanq docs  : https://shebanq.ancient-data.org/text
Slack team    : https://shebanq.slack.com/signup
Questions? Ask shebanq@ancient-data.org for an invite to Slack
111 features found and 0 ignored


# Load Features

In [3]:
api = TF.load('''
    g_word_utf8 trailer_utf8
    phono phono_trailer
    lex sp function typ
''')
api.makeAvailableIn(globals())

  0.00s loading features ...
   |     0.21s B g_word_utf8          from /Users/dirk/github/text-fabric-data/hebrew/etcbc4c
   |     0.21s B phono                from /Users/dirk/github/text-fabric-data/hebrew/phono
   |     0.07s B phono_trailer        from /Users/dirk/github/text-fabric-data/hebrew/phono
   |     0.09s B trailer_utf8         from /Users/dirk/github/text-fabric-data/hebrew/etcbc4c
   |     0.22s B lex                  from /Users/dirk/github/text-fabric-data/hebrew/etcbc4c
   |     0.16s B sp                   from /Users/dirk/github/text-fabric-data/hebrew/etcbc4c
   |     0.09s B function             from /Users/dirk/github/text-fabric-data/hebrew/etcbc4c
   |     0.42s B typ                  from /Users/dirk/github/text-fabric-data/hebrew/etcbc4c
   |     0.00s Feature overview: 104 for nodes; 5 for edges; 2 configs; 7 computed
  7.05s All features loaded/computed - for details use loadLog()


# Output location

In [6]:
outputDir = os.path.expanduser('~/Dropbox/public_dropbox/datasets/HebrewBible')
# create the output directory if it does not exist yet
os.makedirs(outputDir, exist_ok=True)
outFile = 'HebrewBible.tsv'
outPath = os.path.join(outputDir, outFile)

# First step

We compile a table with one row per word (more than 400,000 rows).
Each row has fields for book, chapter, verse, the Hebrew text and its trailing separator, the phonological transcription and its trailing separator, the lexeme, and the part of speech.
We also add the *function* of the phrase that contains the word and the *type* of the clause that contains it.

For more information on these features, see the
[feature documentation](https://etcbc.github.io/text-fabric-data/features/hebrew/etcbc4c/0_overview.html),
in particular

* [g_word_utf8](https://etcbc.github.io/text-fabric-data/features/hebrew/etcbc4c/g_word_utf8.html)
* [trailer_utf8](https://etcbc.github.io/text-fabric-data/features/hebrew/etcbc4c/trailer_utf8.html)
* [phono](https://etcbc.github.io/text-fabric-data/features/hebrew/phono/phono.html)
* [phono_trailer](https://etcbc.github.io/text-fabric-data/features/hebrew/phono/phono_trailer.html)
* [sp](https://etcbc.github.io/text-fabric-data/features/hebrew/etcbc4c/sp.html)
* [lex](https://etcbc.github.io/text-fabric-data/features/hebrew/etcbc4c/lex.html)
* [function](https://etcbc.github.io/text-fabric-data/features/hebrew/etcbc4c/function.html)
* [typ](https://etcbc.github.io/text-fabric-data/features/hebrew/etcbc4c/typ.html)
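
The row layout can be tried out without loading Text-Fabric. The sketch below builds a header line and one data line in the same tab-separated shape. The helper `make_row` and the sample values are made up for illustration; they are placeholders, not real feature values. Writing `None` as an empty field is an assumption of this sketch, not something the table export itself does.

```python
# Column names, in the order they appear in the table.
fields = '''
    book chapter verse
    hebrew hsep
    phono phsep
    lexeme part-of-speech
    phrase-function clause-type
'''.strip().split()


def make_row(values):
    """Join one row of values with tabs and end it with a newline.

    Hypothetical helper: a missing value (None) becomes an empty field.
    """
    if len(values) != len(fields):
        raise ValueError('expected {} values, got {}'.format(len(fields), len(values)))
    return '\t'.join('' if v is None else str(v) for v in values) + '\n'


header = make_row(fields)

# Placeholder values only; they stand in for what the features would deliver.
sample = make_row([
    'Genesis', 1, 1,
    'WORD', ' ',
    'word', ' ',
    'LEX', 'subs',
    'Time', 'xQt0',
])

print(header, end='')
print(sample, end='')
```

Building rows with `'\t'.join` keeps the number of columns tied to the length of `fields`, so a row with a missing or extra value fails early instead of producing a misaligned line.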

In [7]:
fields = '''
    book chapter verse
    hebrew hsep 
    phono phsep
    lexeme part-of-speech
    phrase-function clause-type
'''.strip().split()

rowFormat = ('{}\t' * (len(fields) - 1)) + '{}\n'

indent(reset=True)
info('Generating word table file')

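# Write UTF-8 explicitly, so the Hebrew text does not depend on the platform default encoding
with open(outPath, 'w', encoding='utf8') as fh:
    # header row with the column names
    fh.write(rowFormat.format(*fields))
    for w in F.otype.s('word'):
        (book, chapter, verse) = T.sectionFromNode(w)
        # the phrase and the clause that contain this word
        p = L.u(w, 'phrase')[0]
        c = L.u(w, 'clause')[0]
        fh.write(rowFormat.format(
            book, chapter, verse,
            F.g_word_utf8.v(w), F.trailer_utf8.v(w),
            F.phono.v(w), F.phono_trailer.v(w),
            F.lex.v(w), F.sp.v(w),
            F.function.v(p), F.typ.v(c),
        ))
# the with block closes the file, so no explicit close() is needed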
info('Done')

  0.00s Generating word table file
    28s Done
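
A quick way to check a table like this is to read it back and count rows. The sketch below does that with the standard `csv` module. It is a minimal example under assumptions: the function `summarize` is hypothetical, and it runs on a tiny demo file with a few made-up rows rather than on the real `HebrewBible.tsv`.

```python
import csv
import os
import tempfile
from collections import Counter


def summarize(path):
    """Return the number of data rows and a tally of the part-of-speech column."""
    rowCount = 0
    posCounts = Counter()
    with open(path, encoding='utf8', newline='') as fh:
        # Tab-separated, no quoting: every tab is a field boundary.
        reader = csv.DictReader(fh, delimiter='\t', quoting=csv.QUOTE_NONE)
        for row in reader:
            rowCount += 1
            posCounts[row['part-of-speech']] += 1
    return rowCount, posCounts


# Demo on a small stand-in file; the rows are illustrative, not taken from the data.
with tempfile.TemporaryDirectory() as tmp:
    demoPath = os.path.join(tmp, 'demo.tsv')
    with open(demoPath, 'w', encoding='utf8') as fh:
        fh.write('book\tchapter\tverse\tpart-of-speech\n')
        fh.write('Genesis\t1\t1\tprep\n')
        fh.write('Genesis\t1\t1\tsubs\n')
        fh.write('Genesis\t1\t1\tverb\n')
    rowCount, posCounts = summarize(demoPath)

print(rowCount)
print(posCounts.most_common())
```

To run the same check on the real output, call `summarize` with the path of the generated file instead of the demo path.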
