In [1]:
%load_ext autoreload
%autoreload 2

# First occurrences

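We want to view the first occurrence of each word, where "word" means lexeme: occurrences count as the same word when they share the value of the `lex` feature.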

The purpose is to see them with yellow highlighting in the Text-Fabric browser.

# Problem

We need a query to find those first occurrences.

Here is one:

```
word
/without/
> w:word
.. .lex. w
/-/
```

But this query is very inefficient:
I have not yet seen it run to completion.
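To get a feel for why such a query is expensive, here is a minimal sketch of what it asks for when taken literally: for every word, check that no earlier word has the same lexeme. This is not a claim about how Text-Fabric search evaluates the query internally; the toy `words` list and the function name `first_occurrences_naive` are made up for illustration.

```python
# Toy corpus: (node, lexeme) pairs in text order. The values are invented.
words = [(1, "B"), (2, "R>CJT/"), (3, "BR>["), (4, "B"), (5, "BR>[")]


def first_occurrences_naive(words):
    """Keep a word only if no earlier word shares its lexeme.

    Every word is compared with all words before it, so the work grows
    quadratically with the number of words.
    """
    result = []
    for (i, (node, lex)) in enumerate(words):
        if not any(earlier_lex == lex for (_, earlier_lex) in words[:i]):
            result.append(node)
    return result


print(first_occurrences_naive(words))  # [1, 2, 3]
```

On a corpus of several hundred thousand words, a pairwise check like this amounts to billions of comparisons.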

# Solution

With a bit of programming we compute this set of words, export it as a set,
and then start the TF browser with this set loaded.

This is an example of a powerful practice:

> rather than writing very convoluted queries, produce auxiliary data
on which much simpler, and more efficient queries can be run.

In [26]:
from tf.app import use

In [28]:
A = use("ETCBC/bhsa", hoist=globals())

This is Text-Fabric 11.0.3
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

122 features found and 0 ignored
  1.53s Dataset without structure sections in otext:no structure functions in the T-API
  3.70s All features loaded/computed - for details use TF.isLoaded()
  1.18s All additional features loaded - for details use TF.isLoaded()


# Compute first occurrences

If we walk through all words in text order and remember the first occurrence of each lexeme,
we get the set of first occurrences almost instantly.

In [29]:
A.indent(reset=True)
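# maps each lexeme to the node of its first occurrence
lexFound = {}

# F.otype.s("word") yields the word nodes in text order
for w in F.otype.s("word"):
    lex = F.lex.v(w)
    if lex in lexFound:
        # we have seen this lexeme before, so w is not a first occurrence
        continue
    lexFound[lex] = w
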
A.info(f"Found {len(lexFound)} first occurrences")

  0.08s Found 8769 first occurrences
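The same single pass can be tried without loading a corpus. The sketch below runs it on a made-up list of `(node, lexeme)` pairs and checks that every recorded node is indeed the smallest node carrying its lexeme; the names `words` and `first_by_lex` are hypothetical.

```python
# Toy corpus: (node, lexeme) pairs in text order. The values are invented.
words = [(1, "B"), (2, "R>CJT/"), (3, "BR>["), (4, "B"), (5, "BR>[")]

first_by_lex = {}
for (node, lex) in words:
    # setdefault only stores the node if the lexeme has not been seen yet
    first_by_lex.setdefault(lex, node)

# Sanity check: each stored node is the earliest node with that lexeme.
for (lex, node) in first_by_lex.items():
    assert node == min(n for (n, other) in words if other == lex)

print(first_by_lex)  # {'B': 1, 'R>CJT/': 2, 'BR>[': 3}
```

Each word is visited once and each dictionary lookup takes constant time on average, which is why the real computation finishes in a fraction of a second.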


We also want the first occurrences per book and per chapter.

In [38]:
from itertools import chain

In [39]:
A.indent(reset=True)

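# "bible" maps lexemes to nodes directly;
# "book" and "chapter" map each book/chapter node to its own lexeme-to-node dict
lexFound = dict(bible={}, book={}, chapter={})

for b in F.otype.s("book"):
    for c in L.d(b, otype="chapter"):
        for w in L.d(c, otype="word"):
            lex = F.lex.v(w)
            # first occurrence in the whole Bible
            if lex not in lexFound["bible"]:
                lexFound["bible"][lex] = w
            # first occurrence within the current book
            if lex not in lexFound["book"].setdefault(b, {}):
                lexFound["book"][b][lex] = w
            # first occurrence within the current chapter
            if lex not in lexFound["chapter"].setdefault(c, {}):
                lexFound["chapter"][c][lex] = w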

A.info(f"Found {len(lexFound['bible'])} first occurrences in bible")
A.info(
    f"Found {sum(len(firstOcc) for firstOcc in lexFound['book'].values())} first occurrences in book"
)
A.info(
    f"Found {sum(len(firstOcc) for firstOcc in lexFound['chapter'].values())} first occurrences in chapter"
)

  0.35s Found 8769 first occurrences in bible
  0.35s Found 40172 first occurrences in book
  0.35s Found 135394 first occurrences in chapter
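The three bookkeeping dictionaries follow one pattern: "first occurrence of a lexeme within some scope". As a minimal sketch on invented data, the pattern can be written once with the scope as a parameter. The helper `first_per_scope`, the `scope_of` argument, and the toy tuples are all hypothetical and not part of Text-Fabric.

```python
from collections import defaultdict

# Toy corpus: (book, chapter, node, lexeme) in text order. The values are invented.
words = [
    ("Genesis", 1, 1, "B"),
    ("Genesis", 1, 2, "BR>["),
    ("Genesis", 2, 3, "B"),
    ("Genesis", 2, 4, "JWM/"),
    ("Exodus", 1, 5, "B"),
    ("Exodus", 1, 6, "JWM/"),
]


def first_per_scope(words, scope_of):
    """Map each scope to a dict of lexeme -> node of its first occurrence there."""
    found = defaultdict(dict)
    for word in words:
        (_, _, node, lex) = word
        found[scope_of(word)].setdefault(lex, node)
    return found


def total(found):
    """Count first occurrences over all scopes."""
    return sum(len(first_occ) for first_occ in found.values())


per_bible = first_per_scope(words, lambda w: "bible")
per_book = first_per_scope(words, lambda w: w[0])
per_chapter = first_per_scope(words, lambda w: (w[0], w[1]))

print(total(per_bible), total(per_book), total(per_chapter))  # 3 5 6
```

The counts grow as the scope shrinks, just as in the real output: a lexeme has one first occurrence in the whole corpus, but one in every book and every chapter in which it appears.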


In [40]:
sets = dict(
    first=set(lexFound["bible"].values()),
    firstbook=set(
        chain.from_iterable(firstOcc.values() for firstOcc in lexFound["book"].values())
    ),
    firstchapter=set(
        chain.from_iterable(
            firstOcc.values() for firstOcc in lexFound["chapter"].values()
        )
    ),
)

In [42]:
for (name, members) in sets.items():
    print(f"{name:20} {len(members):>6} nodes")

first                  8769 nodes
firstbook             40172 nodes
firstchapter         135394 nodes


We write this dictionary of sets (three sets in all) to file.

In [43]:
from tf.lib import writeSets

from tf.core.helpers import expanduser

In [44]:
writeSets(sets, expanduser("~/Downloads/sets"))

True
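It can be reassuring to check that a sets file survives a round trip. The sketch below assumes that the file is a gzip-compressed pickle of the dictionary, which is my understanding of what `writeSets` produces; treat that as an assumption and prefer the library's own reader (`tf.lib` has a `readSets` counterpart, to the best of my knowledge) for real files. Only the standard library is used here, on a small made-up dictionary written to a temporary directory.

```python
import gzip
import pickle
import tempfile
from pathlib import Path

# A small stand-in for the real dictionary of node sets. The values are invented.
toy_sets = dict(first={1, 2, 3}, firstbook={1, 2, 3, 5})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sets"

    # Assumed file format: a gzipped pickle of the whole dictionary.
    with gzip.open(path, "wb") as fh:
        pickle.dump(toy_sets, fh)

    with gzip.open(path, "rb") as fh:
        restored = pickle.load(fh)

for (name, members) in restored.items():
    print(f"{name:20} {len(members):>6} nodes")
```

As with any pickle, only load sets files that come from a source you trust.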

Now we can call the TF browser with an extra parameter.

In [None]:
!text-fabric ETCBC/bhsa --sets=~/Downloads/sets

This is Text-Fabric 11.0.3
Starting new kernel listening on 19802
Loading data for ETCBC/bhsa. Please wait ...
Setting up TF kernel for ETCBC/bhsa  ~/Downloads/sets
Using TF-app in /Users/me/text-fabric-data/github/ETCBC/bhsa/app:
	rv1.8=#eb1eef532de43783a548afed016937f55572bac6 offline under /Users/me/text-fabric-data/github (local release)
Sets from ~/Downloads/sets: first, firstbook, firstchapter
Using data in /Users/me/text-fabric-data/github/ETCBC/bhsa/tf/2021:
	rv1.8=#eb1eef532de43783a548afed016937f55572bac6 offline under /Users/me/text-fabric-data/github (local release)
Using data in /Users/me/text-fabric-data/github/etcbc/phono/tf/2021:
	rv2.1=#aba4367b49750089e4e4122415a77cac43bd97bc offline under /Users/me/text-fabric-data/github (local release)
Using data in /Users/me/text-fabric-data/github/ETCBC/parallels/tf/2021:
	rv2.1=#f45f6cc3c4f933dba6e649f49cdb14a40dcf333f offline under /Users/me/text-fabric-data/github (local release)
This is Text-Fabric 11.0.3
Api reference : https

In this browser, in the query box, write the following query:

```
first
```

This will find all nodes in the set `first`.

Then you can browse the chapters in the Hebrew Bible, and on each page you see the
words that have their first occurrence there.

Likewise, if you query

```
firstchapter
```

you see, in each chapter, the first occurrence of every word that occurs in it highlighted,
and only that first occurrence.

# Further experiments

You can now use `first`, `firstbook`, and `firstchapter` in queries, wherever you can use `word`.

So you can make intricate queries based on where words occur for the first time.

And you can fabricate other sets, things that are hard to query for, and then use them
in supercharged queries, both in a Jupyter notebook and in the Text-Fabric browser.
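As one example of such a set, here is a minimal sketch that collects all occurrences of lexemes confined to a single book. It runs on invented data; the set name `onebook` and the toy tuples are hypothetical.

```python
from collections import defaultdict

# Toy corpus: (book, node, lexeme) in text order. The values are invented.
words = [
    ("Genesis", 1, "B"),
    ("Genesis", 2, "BR>["),
    ("Exodus", 3, "B"),
    ("Exodus", 4, "JWM/"),
    ("Exodus", 5, "JWM/"),
]

# First pass: collect the books in which each lexeme occurs.
books_by_lex = defaultdict(set)
for (book, _, lex) in words:
    books_by_lex[lex].add(book)

# Second pass: keep the nodes whose lexeme occurs in exactly one book.
onebook = {node for (_, node, lex) in words if len(books_by_lex[lex]) == 1}

print(sorted(onebook))  # [2, 4, 5]
```

With real data, the same two passes would run over `F.otype.s("word")`, and the resulting set could be added to the `sets` dictionary before calling `writeSets`. In a notebook, `A.search()` also accepts a `sets` argument, so the set name can be used in a query without going through a file.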