# How to change feature data?

Suppose you are constructing a feature, and want to modify it as part of the construction process.

How do you do that? By modifying the `.tf` files directly?

That is possible, but not recommended.

It is better to use TF to load the existing feature, which gives you access to its data in memory.

Then construct a dictionary with updated feature data, and use TF to save it to disk.

Here is an example for the missieven data.

In [1]:
import os

from tf.app import use


In [2]:
# A = use("missieven:latest", hoist=globals())
A = use("missieven", hoist=globals())

We are going to build a feature `upper` that delivers 1 if a word of an original letter starts with a capital. Otherwise it is 0.

We then save it, and load it.

After that we are going to modify the feature: it will hold the number of capital letters in the word.
If there are no capitals, the result is 0.

`upper` will yield None for all words that are not original text.
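
The first definition can be tried out without TF. The following is a minimal sketch on made-up data: the `words` list and the helper `starts_with_capital` are hypothetical stand-ins, and a plain dictionary plays the role of the feature, so `dict.get` returns `None` where the feature has no value.

```python
# Toy stand-in for the corpus: (node, transcription, is_original) triples.
# These values are made up for illustration; they are not missieven data.
words = [
    (1, "Pieter", True),
    (2, "de", True),
    (3, "editorial", False),
    (4, "", True),
]


def starts_with_capital(trans):
    """Return 1 if the text starts with a capital letter, else 0."""
    return 1 if trans and trans[0].isupper() else 0


# Only original words get a value; other nodes are simply absent.
upper = {n: starts_with_capital(trans) for (n, trans, is_orig) in words if is_orig}

print(upper)         # {1: 1, 2: 0, 4: 0}
print(upper.get(3))  # None: node 3 is not original text
```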

# Compute `upper`

We make a dictionary that maps nodes to values for upper.

In [3]:
upper = {}

for n in F.isorig.s(1):
    trans = F.transo.v(n)
    upper[n] = 1 if trans and trans[0].isupper() else 0

len(upper)

0

That is bad. It turns out that I forgot to declare `isorig` as an integer-valued feature, so its values are strings:

In [4]:
F.isorig.v(63)

'1'

We will do that in a later data version.
In the meantime, we remedy it by selecting on the string value `"1"`:

In [5]:
upper = {}

for n in F.isorig.s("1"):
    trans = F.transo.v(n)
    upper[n] = 1 if trans and trans[0].isupper() else 0

len(upper)

3260840

That's better.
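
Why did the first attempt find nothing? Here is a rough sketch of the effect, assuming only that the values of `isorig` are stored as strings. The hypothetical helper `nodes_with_value` mimics a lookup by value on a plain dictionary; it is not TF's implementation, and the node numbers are made up.

```python
# Toy feature data: the values are strings, because the feature
# was not declared as integer valued.
isorig = {61: "1", 62: "1", 63: "1", 64: "0"}


def nodes_with_value(feature, value):
    """Hypothetical stand-in for a lookup by value: nodes whose value equals `value`."""
    return [n for (n, v) in feature.items() if v == value]


print(nodes_with_value(isorig, 1))    # [] because the integer 1 != the string "1"
print(nodes_with_value(isorig, "1"))  # [61, 62, 63]
```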

In [6]:
features = dict(upper=upper)

# Save `upper` (1)

We will save `upper` in a new module, called `capital`, in the `exercises` subdirectory of the tutorial directory.

In [8]:
GITHUB = os.path.expanduser("~/github")
ORG = "annotation"
REPO = "tutorials"
PATH = "missieven/exercises/capital/tf"
VERSION = A.version

location = f"{GITHUB}/{ORG}/{REPO}/{PATH}"
mod = f"{ORG}/{REPO}/{PATH}"

Note the version: we have built the feature against a specific version of the data:

In [9]:
A.version

0.7
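
The save messages further down show that the version becomes a subdirectory of `location`. As a small sketch of how the pieces combine: the helper `feature_file` is hypothetical, and the `.tf` file name is an assumption based on the `upper.tf` file mentioned later in this notebook.

```python
import os


def feature_file(location, version, feature):
    """Hypothetical helper: path where a saved feature file is expected to end up."""
    return f"{location}/{version}/{feature}.tf"


location = os.path.expanduser(
    "~/github/annotation/tutorials/missieven/exercises/capital/tf"
)
print(feature_file(location, "0.7", "upper"))
```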

We have to specify a bit of metadata for this feature:

In [10]:
metaData = {
    "upper": dict(
        valueType="int",
        description="whether an original word starts with a capital",
        creator="Dirk Roorda",
    ),
}

Now we can give the save command:

In [11]:
TF.save(nodeFeatures=features, metaData=metaData, location=location, module=VERSION)

  0.00s Exporting 1 node and 0 edge and 0 config features to ~/github/annotation/tutorials/missieven/exercises/capital/tf/0.7:
   |     3.11s T upper                to ~/github/annotation/tutorials/missieven/exercises/capital/tf/0.7
  3.11s Exported 1 node features and 0 edge features and 0 config features to ~/github/annotation/tutorials/missieven/exercises/capital/tf/0.7


True

# Load `upper`

We load our dataset again, but now with the module `capital` next to it, by specifying its path.
We have that path already in the variable `mod`:

In [13]:
mod

'annotation/tutorials/missieven/exercises/capital/tf'

In [12]:
A = use("missieven", mod=f"{mod}:clone", hoist=globals())

   |     4.16s T upper                from ~/github/annotation/tutorials/missieven/exercises/capital/tf/0.7


In [14]:
caps = 0
nocaps = 0

for n in F.isorig.s("1"):
    c = F.upper.v(n)
    if c == 0:
        nocaps += 1
    if c == 1:
        caps += 1

print(f"{caps=} {nocaps=} {caps + nocaps=}")

caps=264170 nocaps=2996670 caps + nocaps=3260840


# Modify `upper`

We make a dictionary with changed values for upper.

We only have to update values that are 1 or 0.

We use node 3 for testing.

In [15]:
F.transo.v(3)

'PIETER'

Copy the feature data into a new dict:

In [16]:
upper = dict(F.upper.items())
print(f"BEFORE: {upper[3]=}")

BEFORE: upper[3]=1


Modify this dict and give it new values for the nodes in question:

In [17]:
# Replace each value by the number of capitals in the word
for n in upper:
    trans = F.transo.v(n) or ""
    upper[n] = sum(1 for c in trans if c.isupper())

print(f"AFTER:  {upper[3]=}")

AFTER:  upper[3]=6
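
The value 6 is simply the number of capital letters in `PIETER`. A TF-free sketch of the same computation, where `count_capitals` is a hypothetical helper name:

```python
def count_capitals(trans):
    """Count the uppercase characters in a transcription; None counts as empty."""
    return sum(1 for c in (trans or "") if c.isupper())


for word in ("PIETER", "Pieter", "pieter", None):
    print(repr(word), count_capitals(word))
```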


In [18]:
features = dict(upper=upper)

# Save `upper` (2)

We will save `upper` in the same module, with modified metadata.

In [19]:
metaData = {
    "upper": dict(
        valueType="int",
        description="the number of capitals in a word",
        creator="Dirk Roorda",
    ),
}

By default, the save location is the directory of the last loaded module specified by the `mod` parameter of the
`use()` call that loaded the dataset.

In [20]:
TF.save(nodeFeatures=features, metaData=metaData)

  0.00s Exporting 1 node and 0 edge and 0 config features to ~/github/annotation/tutorials/missieven/exercises/capital/tf/0.7:
   |     2.95s T upper                to ~/github/annotation/tutorials/missieven/exercises/capital/tf/0.7
  2.96s Exported 1 node features and 0 edge features and 0 config features to ~/github/annotation/tutorials/missieven/exercises/capital/tf/0.7


True

However, the updated feature is not yet loaded. We can see that with our example:

In [21]:
F.upper.v(3)

1

The updated feature would give value `6` for this node.

We reload the feature `upper`.
By passing the parameter `add=True` we tell TF to load just this feature into the existing API, rather than to create
a new API with this feature loaded.

In [22]:
TF.load("upper", add=True)

  0.00s loading features ...
   |     4.07s T upper                from ~/github/annotation/tutorials/missieven/exercises/capital/tf/0.7
  6.74s All additional features loaded - for details use loadLog()


Look at the message: the `T` in `T upper` indicates that the feature data has been read from the newly created
`upper.tf` text file.
After reading, a binary file `upper.tfx` is created with the same data, which loads faster.
When the `.tfx` file is outdated, the `.tf` file will be read again, and a new `.tfx` will be created.
This happens behind the scenes.
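
How such a cache check can work is sketched below. This is only an illustration of the general idea, not TF's actual code: the tab-separated toy file format, the use of `pickle`, the comparison of modification times, and the names `load_with_cache` and `demo` are all assumptions made for the example.

```python
import os
import pickle
import tempfile


def load_with_cache(text_path, cache_path):
    """Load node values, preferring a binary cache that is not older than the text file.

    Returns (data, source), where source is "text" or "binary".
    """
    cache_is_fresh = os.path.exists(cache_path) and (
        os.path.getmtime(cache_path) >= os.path.getmtime(text_path)
    )
    if cache_is_fresh:
        with open(cache_path, "rb") as fh:
            return pickle.load(fh), "binary"

    # Cache is missing or outdated: parse the text file and rewrite the cache.
    data = {}
    with open(text_path, encoding="utf8") as fh:
        for line in fh:
            node, value = line.rstrip("\n").split("\t")
            data[int(node)] = int(value)
    with open(cache_path, "wb") as fh:
        pickle.dump(data, fh)
    return data, "text"


def demo():
    """Load three times: fresh text file, cached, and after the text file changed."""
    with tempfile.TemporaryDirectory() as tmp:
        text_path = os.path.join(tmp, "upper.txt")
        cache_path = os.path.join(tmp, "upper.cache")

        with open(text_path, "w", encoding="utf8") as fh:
            fh.write("3\t1\n")
        sources = [load_with_cache(text_path, cache_path)[1]]
        sources.append(load_with_cache(text_path, cache_path)[1])

        # Change the text file and make it explicitly newer than the cache.
        with open(text_path, "w", encoding="utf8") as fh:
            fh.write("3\t6\n")
        newer = os.path.getmtime(cache_path) + 10
        os.utime(text_path, (newer, newer))

        data, source = load_with_cache(text_path, cache_path)
        sources.append(source)
        return sources, data


sources, data = demo()
print(sources, data)
```

The second load is served from the cache; the third falls back to the text file because that file is now newer than the cache.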

We check:

In [23]:
F.upper.v(3)

6

# Ad libitum

You can repeat modifying `upper` and saving it until you are done.

If the module `capital` is on GitHub, it is published, and you can use it next to the missieven corpus by loading the corpus
with a suitable `mod` parameter.