# Chapters with only "frequent" words

Task: find the chapters with fewer than 20 rare words, where a rare word is a word whose lexeme has a frequency of less than 70.

A question posed by Oliver Glanz.

In [1]:
from tf.fabric import Fabric

In [2]:
TF = Fabric(modules='etcbc/bhsa/tf/c')

This is Text-Fabric 7.0.3
Api reference : https://dans-labs.github.io/text-fabric/Api/General/
Tutorial      : https://github.com/Dans-labs/text-fabric/blob/master/docs/tutorial.ipynb
Example data  : https://github.com/Dans-labs/text-fabric-data

114 features found and 0 ignored


In [6]:
api = TF.load('freq_lex', silent=True)

In [9]:
api.makeAvailableIn(globals())

[('computed-data', ('C Computed', 'Call AllComputeds', 'Cs ComputedString')),
 ('edge-features', ('E Edge', 'Eall AllEdges', 'Es EdgeString')),
 ('loading', ('TF', 'ensureLoaded', 'ignored', 'loadLog')),
 ('locality', ('L Locality',)),
 ('messaging', ('cache', 'error', 'indent', 'info', 'reset')),
 ('navigating-nodes', ('N Nodes', 'sortKey', 'otypeRank', 'sortNodes')),
 ('node-features', ('F Feature', 'Fall AllFeatures', 'Fs FeatureString')),
 ('search', ('S Search',)),
 ('text', ('T Text',))]

In [19]:
FREQ = 70
AMOUNT = 20

## Query

A straightforward query is:

In [22]:
query = f'''
chapter
/without/
  word freq_lex<{FREQ}
  < word freq_lex<{FREQ}
  < word freq_lex<{FREQ}
  < word freq_lex<{FREQ}
  < word freq_lex<{FREQ}
  < word freq_lex<{FREQ}
  < word freq_lex<{FREQ}
  < word freq_lex<{FREQ}
  < word freq_lex<{FREQ}
  < word freq_lex<{FREQ}
  < word freq_lex<{FREQ}
  < word freq_lex<{FREQ}
  < word freq_lex<{FREQ}
  < word freq_lex<{FREQ}
  < word freq_lex<{FREQ}
  < word freq_lex<{FREQ}
  < word freq_lex<{FREQ}
  < word freq_lex<{FREQ}
  < word freq_lex<{FREQ}
  < word freq_lex<{FREQ}
/-/
'''

Two problems with this query:

* it is very inelegant: the same condition is written out 20 times;
* it does not perform: in practice it takes too long to wait for the result.

So it is better not to search with this one.
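The repetition, at least, can be avoided by generating the template from `FREQ` and `AMOUNT`. The sketch below is plain string building; the helper name `buildQuery` is made up for illustration and is not part of Text-Fabric. It only addresses the inelegance: the resulting template is the same as the one written out above, so it will be just as slow.

```python
FREQ = 70
AMOUNT = 20


def buildQuery(freq, amount):
    # hypothetical helper: builds the /without/ template with `amount` rare words
    atom = f'word freq_lex<{freq}'
    lines = ['chapter', '/without/', f'  {atom}']
    # every further rare word must come after the previous one
    lines.extend(f'  < {atom}' for _ in range(amount - 1))
    lines.append('/-/')
    return '\n' + '\n'.join(lines) + '\n'


query = buildQuery(FREQ, AMOUNT)
print(query)
```

Changing `AMOUNT` now changes the number of `word` lines without any further editing of the template.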

In [24]:
indent(reset=True)
info('start query')
# the search is disabled on purpose: it does not finish in a reasonable time
# results = S.search(query, limit=1)
info('end query')
len(results)

  0.00s start query
  0.00s end query


1

## By hand

By contrast, a few lines of hand-written code do the job easily and almost instantaneously:

In [32]:
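results = []
allChapters = F.otype.s('chapter')

for chapter in allChapters:
    # count the word occurrences in this chapter whose lexeme is rare
    nRare = sum(
        1 for word in L.d(chapter, otype='word') if F.freq_lex.v(word) < FREQ
    )
    # keep the chapter if it has fewer than AMOUNT rare words
    if nRare < AMOUNT:
        results.append(chapter)

print(f'{len(results)} chapters out of {len(allChapters)}')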

60 chapters out of 929
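
The counting logic itself does not depend on Text-Fabric. As a minimal sketch on made-up data (the chapter names and frequency lists below are invented, and `hasFewerThan` is a hypothetical helper), the same selection can be written so that counting stops as soon as a chapter reaches the limit:

```python
from itertools import islice


def hasFewerThan(items, predicate, amount):
    # True if fewer than `amount` items satisfy `predicate`;
    # islice stops consuming the generator after `amount` hits
    hits = (item for item in items if predicate(item))
    return len(list(islice(hits, amount))) < amount


# invented toy data: chapter name -> lexeme frequency of each word occurrence
toyChapters = {
    'A 1': [500, 12, 800, 69, 70],
    'A 2': [3, 5, 7, 1000],
    'B 1': [900, 950],
}
TOY_FREQ = 70
TOY_AMOUNT = 3

selected = [
    name
    for (name, freqs) in toyChapters.items()
    if hasFewerThan(freqs, lambda freq: freq < TOY_FREQ, TOY_AMOUNT)
]
print(selected)  # ['A 1', 'B 1']
```

With thresholds as low as 20 the early exit is unlikely to matter much on this corpus, since the full count is already almost instantaneous; it mainly shows that the condition "fewer than `AMOUNT`" never needs the exact count.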


In [33]:
for chapter in results:
    print('{} {}'.format(*T.sectionFromNode(chapter)))

Exodus 11
Exodus 24
Leviticus 17
Deuteronomy 30
Joshua 23
Isaiah 12
Isaiah 39
Jeremiah 45
Ezekiel 15
Hosea 3
Joel 3
Psalms 1
Psalms 3
Psalms 4
Psalms 13
Psalms 14
Psalms 15
Psalms 20
Psalms 23
Psalms 24
Psalms 26
Psalms 43
Psalms 47
Psalms 53
Psalms 54
Psalms 61
Psalms 67
Psalms 70
Psalms 82
Psalms 86
Psalms 87
Psalms 93
Psalms 97
Psalms 99
Psalms 100
Psalms 101
Psalms 110
Psalms 113
Psalms 114
Psalms 115
Psalms 117
Psalms 120
Psalms 121
Psalms 122
Psalms 123
Psalms 124
Psalms 125
Psalms 126
Psalms 127
Psalms 128
Psalms 130
Psalms 131
Psalms 133
Psalms 134
Psalms 136
Psalms 138
Psalms 150
Job 25
Esther 10
2_Chronicles 27
