# Essential features in the BHSA

The BHSA has a number of features that are redundant, inessential, or defective.

By not loading them, we cut the RAM usage of Text-Fabric by roughly 50%.

These features are listed in the
[config.yaml](https://github.com/ETCBC/bhsa/blob/master/app/config.yaml)
of the BHSA app, under the key `excludedFeatures` in the `dataDisplay` section.

However, if you load features with `Fabric.loadAll()`, this config file is not taken into account,
and all features, including the excluded ones, are loaded.

We make a list of all essential BHSA features, i.e. all features except the excluded ones,
and write it to a file named `essential`.
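The definition amounts to a set difference. A minimal sketch with plain Python sets, using a small sample of feature names from this notebook's outputs rather than the real inventory:

```python
# Small sample of feature names taken from this notebook's outputs;
# the real inventory is much larger.
all_features = {"book", "chapter", "lex", "mother", "oslots", "root", "kind", "dist"}
excluded_features = {"root", "kind", "dist"}

# Essential features: everything that is not explicitly excluded,
# sorted for a stable, readable order.
essential_features = sorted(all_features - excluded_features)
print(essential_features)  # ['book', 'chapter', 'lex', 'mother', 'oslots']
```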

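You can then load exactly these features with

```
from tf.fabric import Fabric
TF = Fabric(locations="~/github/ETCBC/bhsa/tf/2021")  # your local BHSA data
api = TF.load("file:essential")
```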

This notebook creates the file `essential`.

The file is also available online as a
[release attachment](https://github.com/ETCBC/bhsa/releases/download/v1.8.1/essential).

```
%load_ext autoreload
%autoreload 2
```

```
from tf.fabric import Fabric
from tf.core.files import expanduser as ex, readYaml, fileOpen
```

```
bhsaDir = ex("~/github/ETCBC/bhsa")
bhsaConfigPath = f"{bhsaDir}/app/config.yaml"
```

We read the config file to pick up the excluded features.

```
excludedFeatures = set(readYaml(asFile=bhsaConfigPath).dataDisplay.excludedFeatures)
excludedFeatures
```

{'crossrefLCS',
 'crossrefSET',
 'dist',
 'dist_unit',
 'distributional_parent',
 'freq_occ',
 'functional_parent',
 'g_nme',
 'g_nme_utf8',
 'g_pfm',
 'g_pfm_utf8',
 'g_prs',
 'g_prs_utf8',
 'g_uvf',
 'g_uvf_utf8',
 'g_vbe',
 'g_vbe_utf8',
 'g_vbs',
 'g_vbs_utf8',
 'instruction',
 'is_root',
 'kind',
 'kq_hybrid',
 'kq_hybrid_utf8',
 'languageISO',
 'lex0',
 'lexeme_count',
 'mother_object_type',
 'rank_occ',
 'root',
 'suffix_gender',
 'suffix_number',
 'suffix_person'}
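The excluded names sit under `excludedFeatures` inside the `dataDisplay` section of the config, which is why the lookup goes through `.dataDisplay` first. The sketch below mimics that lookup on a plain dictionary standing in for the parsed YAML; it assumes the value is a list of names and shows only a few of them.

```python
# Hypothetical stand-in for the parsed config.yaml: only the part this
# notebook reads is shown, with a few of the excluded feature names.
config = {
    "dataDisplay": {
        "excludedFeatures": ["root", "kind", "dist", "lex0"],
    },
}

# Same path as `.dataDisplay.excludedFeatures` in the notebook,
# written with plain dict indexing instead of attribute access.
excluded_features = set(config["dataDisplay"]["excludedFeatures"])
print(sorted(excluded_features))  # ['dist', 'kind', 'lex0', 'root']
```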

We initialize TF on the 2021 version of ETCBC/bhsa to get an inventory of all features.

```
TF = Fabric(locations="~/github/ETCBC/bhsa/tf/2021")
```

```
featureCategories = TF.explore()
```

  0.40s Feature overview: 109 for nodes; 6 for edges; 1 configs; 9 computed
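The overview lets us predict the size of the result before computing it. Assuming every excluded feature occurs in this inventory and no `omap@` features are present at this location, the arithmetic is:

```python
# Counts taken from the outputs in this notebook.
node_features = 109  # "109 for nodes"
edge_features = 6    # "6 for edges"
excluded = 33        # size of the excludedFeatures set

# allFeatures is the union of node and edge features.
total = node_features + edge_features
expected_essential = total - excluded
print(total, expected_essential)  # 115 82
```

The count of 82 obtained further on agrees with this prediction.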


```
allFeatures = set(featureCategories["nodes"]) | set(featureCategories["edges"])
```

We subtract the excluded features from the full set, drop the `omap@...` features as well,
and write the remaining features to a file named `essential`.

```
essentialFeatures = sorted(f for f in allFeatures - excludedFeatures if not f.startswith("omap@"))
essentialFeatures
```

['book',
 'book@am',
 'book@ar',
 'book@bn',
 'book@da',
 'book@de',
 'book@el',
 'book@en',
 'book@es',
 'book@fa',
 'book@fr',
 'book@he',
 'book@hi',
 'book@id',
 'book@ja',
 'book@ko',
 'book@la',
 'book@nl',
 'book@pa',
 'book@pt',
 'book@ru',
 'book@sw',
 'book@syc',
 'book@tr',
 'book@ur',
 'book@yo',
 'book@zh',
 'chapter',
 'code',
 'det',
 'domain',
 'freq_lex',
 'function',
 'g_cons',
 'g_cons_utf8',
 'g_lex',
 'g_lex_utf8',
 'g_word',
 'g_word_utf8',
 'gloss',
 'gn',
 'label',
 'language',
 'lex',
 'lex_utf8',
 'ls',
 'mother',
 'nametype',
 'nme',
 'nu',
 'number',
 'oslots',
 'otype',
 'pargr',
 'pdp',
 'pfm',
 'prs',
 'prs_gn',
 'prs_nu',
 'prs_ps',
 'ps',
 'qere',
 'qere_trailer',
 'qere_trailer_utf8',
 'qere_utf8',
 'rank_lex',
 'rela',
 'sp',
 'st',
 'tab',
 'trailer',
 'trailer_utf8',
 'txt',
 'typ',
 'uvf',
 'vbe',
 'vbs',
 'verse',
 'voc_lex',
 'voc_lex_utf8',
 'vs',
 'vt']

```
len(essentialFeatures)
```

82

```
with fileOpen(ex("~/Downloads/essential"), "w") as fh:
    fh.write("\n".join(essentialFeatures))
```
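The file holds one feature name per line. A round-trip sketch of that format, using the built-in `open` in place of `fileOpen`, a temporary directory in place of `~/Downloads`, and a short sample list:

```python
import os
import tempfile

# Short sample standing in for essentialFeatures.
essential_features = ["book", "chapter", "lex", "mother", "oslots"]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "essential")
    # Same format as the notebook: names joined by newlines,
    # with no newline after the last name.
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(essential_features))
    # Read the file back and split it into one name per line.
    with open(path, encoding="utf-8") as fh:
        content = fh.read()
    read_back = content.splitlines()

print(read_back == essential_features)  # True
print(content.endswith("\n"))  # False
```

Because `"\n".join` adds no newline after the last name, reading the file back with `splitlines()` yields exactly the original list.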

Now we test it: we load the BHSA with Fabric, restricted to the essential features.

```
api = TF.load("file:~/Downloads/essential")
```

```
eFeats = api.Eall()
fFeats = api.Fall()
len(eFeats) + len(fFeats)
```

82

```
eFeats
```

['mother', 'oslots']

```
fFeats
```

['book',
 'book@am',
 'book@ar',
 'book@bn',
 'book@da',
 'book@de',
 'book@el',
 'book@en',
 'book@es',
 'book@fa',
 'book@fr',
 'book@he',
 'book@hi',
 'book@id',
 'book@ja',
 'book@ko',
 'book@la',
 'book@nl',
 'book@pa',
 'book@pt',
 'book@ru',
 'book@sw',
 'book@syc',
 'book@tr',
 'book@ur',
 'book@yo',
 'book@zh',
 'chapter',
 'code',
 'det',
 'domain',
 'freq_lex',
 'function',
 'g_cons',
 'g_cons_utf8',
 'g_lex',
 'g_lex_utf8',
 'g_word',
 'g_word_utf8',
 'gloss',
 'gn',
 'label',
 'language',
 'lex',
 'lex_utf8',
 'ls',
 'nametype',
 'nme',
 'nu',
 'number',
 'otype',
 'pargr',
 'pdp',
 'pfm',
 'prs',
 'prs_gn',
 'prs_nu',
 'prs_ps',
 'ps',
 'qere',
 'qere_trailer',
 'qere_trailer_utf8',
 'qere_utf8',
 'rank_lex',
 'rela',
 'sp',
 'st',
 'tab',
 'trailer',
 'trailer_utf8',
 'txt',
 'typ',
 'uvf',
 'vbe',
 'vbs',
 'verse',
 'voc_lex',
 'voc_lex_utf8',
 'vs',
 'vt']
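Equal counts (82) suggest, but do not prove, that exactly the intended features were loaded: a missing feature and an unexpected one would cancel out. A stricter check compares names. `compare_loaded` below is a hypothetical helper, shown on toy data shaped like `essentialFeatures`, `eFeats` and `fFeats`:

```python
def compare_loaded(expected, e_feats, f_feats):
    """Hypothetical helper: return (missing, extra), where missing lists
    expected features that were not loaded and extra lists loaded
    features that were not expected."""
    loaded = set(e_feats) | set(f_feats)
    wanted = set(expected)
    missing = sorted(wanted - loaded)
    extra = sorted(loaded - wanted)
    return missing, extra


# Toy data: everything expected was loaded.
expected = ["book", "lex", "mother", "oslots"]
missing, extra = compare_loaded(expected, ["mother", "oslots"], ["book", "lex"])
print(missing, extra)  # [] []

# Toy data: same count, but one feature swapped for another.
missing, extra = compare_loaded(expected, ["mother", "oslots"], ["book", "root"])
print(missing, extra)  # ['lex'] ['root']
```

In the notebook itself one could call `compare_loaded(essentialFeatures, eFeats, fFeats)` and expect two empty lists.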

Finally, we attach the file `essential` to
[ETCBC/bhsa release 1.8.1](https://github.com/ETCBC/bhsa/releases/tag/v1.8.1).