<img align="right" src="tf-small.png"/>

# ETCBC nodes

In this notebook we try to map the **non-slot** nodes between versions 4, 4b and 4c of the ETCBC dataset.
Another notebook has produced a mapping between the **slots** of those versions, and we want to extend that
mapping to nodes in general.

If we succeed, Text-Fabric notebooks that are based on an older version of the data can also be used unmodified on newer versions of the data.

In general, node mappings between versions cannot be perfect. We try and see how far we get.

# Basic idea

We start out with a very simple idea: nodes are linked to slots. To map a node in version x, we look at its slots in version x, map those to slots in version y, and see which nodes in version y are linked to those slots.
Those nodes are good candidates for the mapping.
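
The idea can be sketched with plain dictionaries standing in for the Text-Fabric data. Everything here is hypothetical and only for illustration: the function `candidates`, the names `slots_x`, `slot_map` and `slots_y`, and the toy data are assumptions, not part of the Text-Fabric API or of the real ETCBC versions.

```python
from collections import defaultdict

def candidates(node, slots_x, slot_map, slots_y):
    """Return nodes of version y that share mapped slots with `node`, best overlap first."""
    # Map every slot of the node in version x to its slot(s) in version y.
    mapped = {t for s in slots_x[node] for t in slot_map.get(s, ())}
    # Invert slots_y: slot -> nodes of version y linked to that slot.
    nodes_at = defaultdict(set)
    for n, slots in slots_y.items():
        for s in slots:
            nodes_at[s].add(n)
    # Count how many of the mapped slots each version y node covers.
    overlap = defaultdict(int)
    for t in mapped:
        for n in nodes_at[t]:
            overlap[n] += 1
    return sorted(overlap, key=lambda n: (-overlap[n], n))

# Toy data (hypothetical): a slot is inserted before slot 2, shifting later slots by one.
slots_x = {'p1': (1, 2, 3)}
slot_map = {1: (1,), 2: (3,), 3: (4,)}
slots_y = {'q1': (1, 2), 'q2': (3, 4), 'q3': (1, 2, 3, 4)}
print(candidates('p1', slots_x, slot_map, slots_y))
```

The sketch rebuilds the inverted index on every call, which is fine for a toy example; for a full dataset one would presumably build it once and reuse it.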

In [1]:
import os
import collections
from functools import reduce
from tf.fabric import Fabric

In [2]:
locations = {
    '4': '~/github/text-fabric-data-legacy',
    '4b': '~/github/text-fabric-data-legacy',
    '4c': '~/github/text-fabric-data', 
}
versions = ['4', '4b', '4c']
TF = {}
api = {}
for v in versions:
    TF[v] = Fabric(locations=locations[v], modules='hebrew/etcbc{}'.format(v))
    api[v] = TF[v].load('''
        g_word lex
    ''')
A4 = api['4']
A4b = api['4b']
A4c = api['4c']

This is Text-Fabric 2.3.1
Api reference : https://github.com/ETCBC/text-fabric/wiki/Api
Tutorial      : https://github.com/ETCBC/text-fabric/blob/master/docs/tutorial.ipynb
Data sources  : https://github.com/ETCBC/text-fabric-data
Data docs     : https://etcbc.github.io/text-fabric-data
Shebanq docs  : https://shebanq.ancient-data.org/text
Slack team    : https://shebanq.slack.com/signup
Questions? Ask shebanq@ancient-data.org for an invite to Slack
110 features found and 0 ignored
  0.00s loading features ...
   |     0.15s B g_word               from /Users/dirk/github/text-fabric-data-legacy/hebrew/etcbc4
   |     0.13s B lex                  from /Users/dirk/github/text-fabric-data-legacy/hebrew/etcbc4
   |     0.00s Feature overview: 105 nodes; 4 edges; 1 configs; 7 computeds
  4.99s All features loaded/computed - for details use loadLog()

# Exploration

Let us see what has happened to the phrases between 4 and 4b.

In [3]:
TF['4b'].load('omap@4-4b', add=True)

  0.00s loading features ...
   |     0.64s B omap@4-4b            from /Users/dirk/github/text-fabric-data-legacy/hebrew/etcbc4b
  1.48s All additional features loaded - for details use loadLog()


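In [None]:
# For every phrase in version 4, collect the 4b slots that its slots map to.
mapping = {}
for p in A4.F.otype.s('phrase'):
    slots = A4.E.oslots.f(p)
    mappedSlots = reduce(
        lambda x, y: x + y,
        [A4b.Es('omap@4-4b').f(s) for s in slots],
        (),  # the lookups return tuples, so the initial value must be a tuple
    )
    mapping[p] = mappedSlots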

In [4]:
A4b.Es('omap@4-4b').f(100000)

(100002,)

In [5]:
mappedSlots = reduce(
    lambda x,y: x+y,
    [A4b.Es('omap@4-4b').f(s) for s in range(100)],
    (),
)
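
The `reduce` call concatenates the per-slot tuples into one flat tuple, which is why its initial value has to be a tuple as well. A minimal sketch of that step on hypothetical stand-in data (`per_slot` is an assumption, not real `omap@4-4b` output), next to an alternative with `itertools.chain.from_iterable` that avoids repeatedly copying the accumulated tuple:

```python
from functools import reduce
from itertools import chain

# Hypothetical stand-in for the tuples that an edge lookup returns per slot.
per_slot = [(), (1,), (2,), (3, 4)]

# With () as initial value every step is tuple + tuple;
# a list as initial value would raise a TypeError on list + tuple.
flat_reduce = reduce(lambda x, y: x + y, per_slot, ())

# Same result, without building a new tuple at every step.
flat_chain = tuple(chain.from_iterable(per_slot))

print(flat_reduce)
print(flat_chain)
```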

In [6]:
for x in [A4b.Es('omap@4-4b').f(s) for s in range(100)]:
    print(x)

()
(1,)
(2,)
(3,)
(4,)
(5,)
(6,)
(7,)
(8,)
(9,)
(10,)
(11,)
(12,)
(13,)
(14,)
(15,)
(16,)
(17,)
(18,)
(19,)
(20,)
(21,)
(22,)
(23,)
(24,)
(25,)
(26,)
(27,)
(28,)
(29,)
(30,)
(31,)
(32,)
(33,)
(34,)
(35,)
(36,)
(37,)
(38,)
(39,)
(40,)
(41,)
(42,)
(43,)
(44,)
(45,)
(46,)
(47,)
(48,)
(49,)
(50,)
(51,)
(52,)
(53,)
(54,)
(55,)
(56,)
(57,)
(58,)
(59,)
(60,)
(61,)
(62,)
(63,)
(64,)
(65,)
(66,)
(67,)
(68,)
(69,)
(70,)
(71,)
(72,)
(73,)
(74,)
(75,)
(76,)
(77,)
(78,)
(79,)
(80,)
(81,)
(82,)
(83,)
(84,)
(85,)
(86,)
(87,)
(88,)
(89,)
(90,)
(91,)
(92,)
(93,)
(94,)
(95,)
(96,)
(97,)
(98,)
(99,)
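
Slots 1 to 99 map to themselves (0 is not a slot, hence the empty tuple), while slot 100000 maps to 100002, so the mapping must leave the identity somewhere in between. One way to locate such places is sketched below; the function `deviations`, the `lookup` parameter and the toy mapping are hypothetical, with `lookup` standing in for something like `A4b.Es('omap@4-4b').f`.

```python
def deviations(lookup, slots):
    """Yield (slot, mapped) pairs where the mapping is not the identity."""
    for s in slots:
        mapped = tuple(lookup(s))
        if mapped != (s,):
            yield s, mapped

# Toy mapping (hypothetical): slot 3 is split in two, slot 4 shifts, slot 5 is deleted.
toy = {1: (1,), 2: (2,), 3: (3, 4), 4: (5,), 5: (), 6: (6,)}

for slot, mapped in deviations(lambda s: toy.get(s, ()), range(1, 7)):
    print(slot, mapped)
```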
