# Memory usage of TF data

Once TF features get loaded into memory, how much memory do they actually use?

And can we reduce that footprint?

This notebook follows the tutorial
[Hands-On Exploration of Python Memory Usage](https://code.tutsplus.com/tutorials/understand-how-much-memory-your-python-objects-use--cms-25609).

We use the
[Dead Sea Scrolls](https://github.com/etcbc/dss) as our running example.

In [7]:
from sys import getsizeof, stderr
from itertools import chain
from collections import deque
from reprlib import repr

In [8]:
from tf.app import use

In [2]:
A = use('dss:clone', checkout='clone', hoist=globals())

Using TF-app in /Users/dirk/github/annotation/app-dss/code:
	repo clone offline under ~/github (local github)
Using data in /Users/dirk/github/etcbc/dss/tf/0.4:
	repo clone offline under ~/github (local github)
Using data in /Users/dirk/github/etcbc/dss/parallels/tf/0.4:
	repo clone offline under ~/github (local github)


# WARP features

Every TF dataset has the `otype` and `oslots` features.
Let's see how much memory they consume.

A plain `sys.getsizeof` is not the whole story, however: it reports only the size of the container object itself, not of the objects the container refers to.
We need a deep, recursive `getsizeof`, and we use the approach from this
[recipe](https://code.activestate.com/recipes/577504/).
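
To see why a shallow measurement falls short, here is a minimal sketch that is independent of the TF data. The names `inner_a`, `inner_b` and `outer` are made up for illustration. The outer tuple only stores references, so its own size stays tiny no matter how large its members are.

```python
from sys import getsizeof

# Two large inner tuples wrapped in a small outer tuple (illustrative data only).
inner_a = tuple(range(1000))
inner_b = tuple(range(1000, 2000))
outer = (inner_a, inner_b)

# Shallow size: the outer tuple object and its two references, nothing more.
shallow = getsizeof(outer)

# One level deeper: add the inner tuple objects (still not the ints inside them).
one_level = shallow + sum(getsizeof(t) for t in outer)

print(f"shallow   = {shallow:>7,} bytes")
print(f"one level = {one_level:>7,} bytes")
```

The exact byte counts depend on the Python version and platform, but the shallow figure should come out far smaller than the size of either inner tuple. A recursive walk that also avoids counting shared objects twice is what the recipe provides.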

In [9]:
def deepSize(o, handlers=None, verbose=False):
    """Return the approximate memory footprint of an object and all of its contents.

    Automatically finds the contents of the following builtin containers and
    their subclasses:  tuple, list, deque, dict, set and frozenset.
    To search other containers, add handlers to iterate over their contents:

        handlers = {SomeContainerClass: iter,
                    OtherContainerClass: OtherContainerClass.get_elements}

    """
    # a dict contributes both its keys and its values
    dict_handler = lambda d: chain.from_iterable(d.items())
    all_handlers = {tuple: iter,
                    list: iter,
                    deque: iter,
                    dict: dict_handler,
                    set: iter,
                    frozenset: iter,
                   }
    all_handlers.update(handlers or {})  # user handlers take precedence
    seen = set()                      # track which object ids have already been seen
    default_size = getsizeof(0)       # estimate sizeof object without __sizeof__

    def sizeof(o):
        if id(o) in seen:       # do not double count the same object
            return 0
        seen.add(id(o))
        s = getsizeof(o, default_size)

        if verbose:
            print(s, type(o), repr(o), file=stderr)

        for typ, handler in all_handlers.items():
            if isinstance(o, typ):
                s += sum(map(sizeof, handler(o)))
                break
        return s

    return sizeof(o)


##### Example call #####

# if __name__ == '__main__':
#    d = dict(a=1, b=2, c=3, d=[4,5,6,7], e='a string of chars')
#    print(deepSize(d, verbose=True))

In [18]:
for ft in (
  '__levels__',
  '__sections__',
  '__boundary__',
  '__order__',
  '__rank__',
  '__levUp__',
  '__levDown__',
  'otype',
  'oslots',
  'type',
  'glyph',
  'glypho',
  'rec',
):
  data = TF.features[ft].data
  lData = len(data)
  tData = type(data)
  sData = deepSize(data)
  print(f'''
Feature {ft} is a {tData} of length {lData:>7,}
\tsize = {sData:>10,} bytes
''')


Feature __levels__ is a <class 'tuple'> of length       7
	size =      1,579 bytes


Feature __sections__ is a <class 'tuple'> of length       2
	size =  7,750,393 bytes


Feature __boundary__ is a <class 'tuple'> of length       2
	size = 123,237,760 bytes


Feature __order__ is a <class 'array.array'> of length 2,107,856
	size =  8,958,464 bytes


Feature __rank__ is a <class 'array.array'> of length 2,107,856
	size =  8,958,464 bytes


Feature __levUp__ is a <class 'tuple'> of length 2,107,856
	size = 494,912,728 bytes


Feature __levDown__ is a <class 'tuple'> of length 677,618
	size = 114,895,720 bytes


Feature otype is a <class 'tuple'> of length 677,620
	size =  5,421,415 bytes


Feature oslots is a <class 'tuple'> of length 677,619
	size = 316,032,008 bytes


Feature type is a <class 'dict'> of length 2,032,329
	size = 249,123,881 bytes


Feature glyph is a <class 'dict'> of length 1,838,363
	size = 268,732,701 bytes


Feature glypho is a <class 'dict'> of length 1,838,363
	s

In [17]:
x = 1000000000
print(f'{x:>7,}')

1,000,000,000
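
Returning to the feature sizes reported above: `__order__` is an `array.array` with 2,107,856 entries and takes 8,958,464 bytes, while `__levUp__` is a tuple of the same length and takes 494,912,728 bytes. The two hold different kinds of data, so this is not a like-for-like comparison. The sketch below isolates the container effect on made-up data, assuming the values are non-negative integers that fit in the `"I"` (unsigned int) type code. It is an illustration only, not how TF stores its features.

```python
from array import array
from sys import getsizeof

n = 100_000  # illustrative size, unrelated to the dataset

as_tuple = tuple(range(n))
as_array = array("I", range(n))  # "I": unsigned int, typically 4 bytes per item

# Tuple: the container (one reference per item) plus every int object it refers to.
# Small ints are shared singletons in CPython, so this slightly overstates the cost.
tuple_size = getsizeof(as_tuple) + sum(getsizeof(i) for i in as_tuple)

# Array: raw machine values are stored inline, so getsizeof covers the contents.
array_size = getsizeof(as_array)

print(f"tuple of ints: {tuple_size:>10,} bytes")
print(f"array('I')   : {array_size:>10,} bytes")
```

On CPython the array should come out several times smaller than the tuple, which suggests that flat integer data is a natural candidate for footprint reduction. Nested or variable-length data, such as a tuple of tuples, would need a different encoding before it could benefit.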
