# Performance Comparison: Text-Fabric vs Context-Fabric

This notebook compares loading performance and memory consumption between:
- **Text-Fabric (TF)**: The original implementation using pickle/gzip caching
- **Context-Fabric (CF)**: The new implementation using memory-mapped numpy arrays

We test against the BHSA (Biblia Hebraica Stuttgartensia Amstelodamensis) corpus.

In [1]:
import os
import sys
import gc
import time
import shutil
import psutil
from pathlib import Path

# Add Context-Fabric packages to path
sys.path.insert(0, str(Path.cwd().parent / 'packages'))

# Source paths
BHSA_SOURCE = '/Users/cody/github/etcbc/bhsa/tf/2021'
TFX_CACHE = Path(BHSA_SOURCE) / '.tf'
CFM_CACHE = Path(BHSA_SOURCE) / '.cfm'

print(f"BHSA Source: {BHSA_SOURCE}")
print(f"TFX Cache: {TFX_CACHE}")
print(f"CFM Cache: {CFM_CACHE}")

BHSA Source: /Users/cody/github/etcbc/bhsa/tf/2021
TFX Cache: /Users/cody/github/etcbc/bhsa/tf/2021/.tf
CFM Cache: /Users/cody/github/etcbc/bhsa/tf/2021/.cfm


## Utility Functions

In [2]:
def get_memory_usage_mb():
    """Get current process memory usage in MB."""
    process = psutil.Process(os.getpid())
    return process.memory_info().rss / 1024 / 1024

def get_cache_size_mb(cache_path):
    """Get total size of cache directory in MB."""
    if not cache_path.exists():
        return 0
    total = 0
    for p in cache_path.rglob('*'):
        if p.is_file():
            total += p.stat().st_size
    return total / 1024 / 1024

def clear_caches():
    """Remove both TFX and CFM caches."""
    for cache in [TFX_CACHE, CFM_CACHE]:
        if cache.exists():
            shutil.rmtree(cache)
            print(f"Removed: {cache}")
        else:
            print(f"Not found: {cache}")

class TimingResult:
    """Store timing and memory results."""
    def __init__(self, name):
        self.name = name
        self.compile_time = 0
        self.load_time = 0
        self.memory_before = 0
        self.memory_after = 0
        self.cache_size = 0
    
    @property
    def memory_used(self):
        return self.memory_after - self.memory_before
    
    def __repr__(self):
        return (f"{self.name}:\n"
                f"  Compile time: {self.compile_time:.2f}s\n"
                f"  Load time:    {self.load_time:.2f}s\n"
                f"  Memory used:  {self.memory_used:.1f} MB\n"
                f"  Cache size:   {self.cache_size:.1f} MB")

## Step 1: Clear All Caches

Remove any existing `.tf` (Text-Fabric) and `.cfm` (Context-Fabric) caches to ensure fresh compilation.

In [3]:
# Clear all caches
clear_caches()

# Force garbage collection
gc.collect()

print("\nCaches cleared.")

Removed: /Users/cody/github/etcbc/bhsa/tf/2021/.tf


Removed: /Users/cody/github/etcbc/bhsa/tf/2021/.cfm

Caches cleared.


## Step 2: Text-Fabric Performance Test

Test the original Text-Fabric implementation using the low-level Fabric class.
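
As a rough illustration of what pickle/gzip caching involves, the sketch below round-trips a small object through a compressed pickle. It is a generic example, not Text-Fabric's actual code; the `feature` dictionary and the file name are made up for the illustration. The point is that reading the cache means decompressing and rebuilding the whole object in memory before any single value can be looked up.

```python
import gzip
import pickle
import tempfile
from pathlib import Path

# Hypothetical feature data: node number -> value (not real BHSA data).
feature = {node: f"value-{node}" for node in range(1, 10_001)}

with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "feature.pkl.gz"  # hypothetical file name

    # "Compile" step: serialize and compress the whole object once.
    with gzip.open(cache_file, "wb") as fh:
        pickle.dump(feature, fh, protocol=pickle.HIGHEST_PROTOCOL)

    # "Load" step: every later session must decompress and rebuild
    # the full dictionary in RAM, even to read a single value.
    with gzip.open(cache_file, "rb") as fh:
        loaded = pickle.load(fh)

print(len(loaded), loaded[1])
```

Compression keeps the file small, at the cost of CPU time and a full in-memory copy on every load.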

In [4]:
from tf.fabric import Fabric as TFFabric

tf_result = TimingResult("Text-Fabric")

# Measure memory before
gc.collect()
tf_result.memory_before = get_memory_usage_mb()
print(f"Memory before: {tf_result.memory_before:.1f} MB")

# Time the initial load (compilation)
print("\nLoading Text-Fabric (first load - compiles cache)...")
start = time.perf_counter()
tf_TF = TFFabric(locations=BHSA_SOURCE, silent='deep')
tf_api = tf_TF.loadAll(silent='deep')  # loadAll loads all features
tf_result.compile_time = time.perf_counter() - start
print(f"Compile time: {tf_result.compile_time:.2f}s")

# Measure memory after
tf_result.memory_after = get_memory_usage_mb()
print(f"Memory after: {tf_result.memory_after:.1f} MB")
print(f"Memory used: {tf_result.memory_used:.1f} MB")

# Check cache size
tf_result.cache_size = get_cache_size_mb(TFX_CACHE)
print(f"Cache size: {tf_result.cache_size:.1f} MB")

Memory before: 72.4 MB

Loading Text-Fabric (first load - compiles cache)...


Compile time: 61.46s
Memory after: 4905.3 MB
Memory used: 4832.9 MB
Cache size: 137.9 MB


In [5]:
# Verify Text-Fabric loaded correctly
print("Verification - Checking F.otype:")
print(f"Max slot: {tf_api.F.otype.maxSlot}")
print(f"Max node: {tf_api.F.otype.maxNode}")
print(f"Node 1 type: {tf_api.F.otype.v(1)}")
print(f"First word: {tf_api.F.g_word_utf8.v(1)}")

Verification - Checking F.otype:
Max slot: 426590
Max node: 1446831
Node 1 type: word
First word: בְּ


In [6]:
# Clean up Text-Fabric objects to free memory
del tf_TF, tf_api
gc.collect()
print("Text-Fabric objects cleaned up.")

Text-Fabric objects cleaned up.


In [7]:
# Test reload time (from cache)
print("Reloading Text-Fabric (from cache)...")
gc.collect()

start = time.perf_counter()
tf_TF2 = TFFabric(locations=BHSA_SOURCE, silent='deep')
tf_api2 = tf_TF2.loadAll(silent='deep')
tf_result.load_time = time.perf_counter() - start
print(f"Load time (from cache): {tf_result.load_time:.2f}s")

# Clean up
del tf_TF2, tf_api2
gc.collect()

Reloading Text-Fabric (from cache)...


Load time (from cache): 7.30s


3984883

## Step 3: Context-Fabric Performance Test

Test the new Context-Fabric implementation with memory-mapped numpy arrays.
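
For readers unfamiliar with memory mapping, here is a minimal sketch of the general technique using plain numpy. It is not taken from Context-Fabric's code base; the file name and array contents are invented for the example. The array is written once as an uncompressed `.npy` file and later opened with `mmap_mode="r"`, so opening it does not read the data into RAM.

```python
import tempfile
from pathlib import Path

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    array_file = Path(tmp) / "feature.npy"  # hypothetical file name

    # "Compile" step: write a plain binary array to disk once.
    np.save(array_file, np.arange(1_000_000, dtype=np.int32))

    # "Load" step: map the file instead of reading it. Opening is cheap,
    # and pages are only pulled from disk when they are accessed.
    mapped = np.load(array_file, mmap_mode="r")
    is_memmap = isinstance(mapped, np.memmap)
    sample = int(mapped[123_456])  # touches only the page(s) holding this value

    del mapped  # release the mapping before the temp directory is removed

print(is_memmap, sample)
```

Because the file is stored uncompressed so that it can be mapped directly, it tends to take more disk space than a gzip-compressed pickle of the same data.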

In [8]:
# Clear CFM cache (TFX cache will remain for fair comparison)
if CFM_CACHE.exists():
    shutil.rmtree(CFM_CACHE)
    print(f"Removed: {CFM_CACHE}")

gc.collect()
print("Ready for Context-Fabric test.")

Ready for Context-Fabric test.


In [9]:
from cfabric.core.fabric import Fabric as CFFabric

cf_result = TimingResult("Context-Fabric")

# Measure memory before
gc.collect()
cf_result.memory_before = get_memory_usage_mb()
print(f"Memory before: {cf_result.memory_before:.1f} MB")

# Time the initial load (compilation)
print("\nLoading Context-Fabric (first load - compiles cache)...")
start = time.perf_counter()
cf_TF = CFFabric(locations=BHSA_SOURCE, silent='deep')
cf_api = cf_TF.load('')  # Load all features
cf_result.compile_time = time.perf_counter() - start
print(f"Compile time: {cf_result.compile_time:.2f}s")

# Measure memory after
cf_result.memory_after = get_memory_usage_mb()
print(f"Memory after: {cf_result.memory_after:.1f} MB")
print(f"Memory used: {cf_result.memory_used:.1f} MB")

# Check cache size
cf_result.cache_size = get_cache_size_mb(CFM_CACHE)
print(f"Cache size: {cf_result.cache_size:.1f} MB")

Memory before: 1535.2 MB

Loading Context-Fabric (first load - compiles cache)...


   |     0.19s T otype                from ~/github/etcbc/bhsa/tf/2021


   |     3.39s T oslots               from ~/github/etcbc/bhsa/tf/2021


   |     0.00s T book@pt              from ~/github/etcbc/bhsa/tf/2021


   |     0.00s T book@en              from ~/github/etcbc/bhsa/tf/2021


   |     0.30s T trailer              from ~/github/etcbc/bhsa/tf/2021


   |     0.41s T lex_utf8             from ~/github/etcbc/bhsa/tf/2021


   |     0.00s T book@da              from ~/github/etcbc/bhsa/tf/2021


   |     0.00s T book@nl              from ~/github/etcbc/bhsa/tf/2021


   |     0.39s T g_cons_utf8          from ~/github/etcbc/bhsa/tf/2021


   |     0.00s T book@yo              from ~/github/etcbc/bhsa/tf/2021


   |     0.01s T verse                from ~/github/etcbc/bhsa/tf/2021


   |     0.00s T book@la              from ~/github/etcbc/bhsa/tf/2021


   |     0.00s T book@syc             from ~/github/etcbc/bhsa/tf/2021


   |     0.31s T trailer_utf8         from ~/github/etcbc/bhsa/tf/2021


   |     0.39s T g_lex                from ~/github/etcbc/bhsa/tf/2021


   |     0.01s T chapter              from ~/github/etcbc/bhsa/tf/2021


   |     0.00s T book@hi              from ~/github/etcbc/bhsa/tf/2021


   |     0.00s T book@el              from ~/github/etcbc/bhsa/tf/2021


   |     0.00s T book@fr              from ~/github/etcbc/bhsa/tf/2021


   |     0.42s T g_lex_utf8           from ~/github/etcbc/bhsa/tf/2021


   |     0.00s T book@de              from ~/github/etcbc/bhsa/tf/2021


   |     0.00s T qere_utf8            from ~/github/etcbc/bhsa/tf/2021


   |     0.00s T book@ru              from ~/github/etcbc/bhsa/tf/2021


   |     0.02s T book                 from ~/github/etcbc/bhsa/tf/2021


   |     0.00s T book@es              from ~/github/etcbc/bhsa/tf/2021


   |     0.00s T book@he              from ~/github/etcbc/bhsa/tf/2021


   |     0.00s T book@ur              from ~/github/etcbc/bhsa/tf/2021


   |     0.00s T qere_trailer_utf8    from ~/github/etcbc/bhsa/tf/2021


   |     0.44s T g_word_utf8          from ~/github/etcbc/bhsa/tf/2021


   |     0.39s T g_cons               from ~/github/etcbc/bhsa/tf/2021


   |     0.00s T book@sw              from ~/github/etcbc/bhsa/tf/2021


   |     0.00s T book@id              from ~/github/etcbc/bhsa/tf/2021


   |     0.00s T book@tr              from ~/github/etcbc/bhsa/tf/2021


   |     0.00s T book@am              from ~/github/etcbc/bhsa/tf/2021


   |     0.00s T book@pa              from ~/github/etcbc/bhsa/tf/2021


   |     0.00s T book@bn              from ~/github/etcbc/bhsa/tf/2021


   |     0.41s T g_word               from ~/github/etcbc/bhsa/tf/2021


   |     0.43s T voc_lex_utf8         from ~/github/etcbc/bhsa/tf/2021


   |     0.00s T qere_trailer         from ~/github/etcbc/bhsa/tf/2021


   |     0.00s T book@ko              from ~/github/etcbc/bhsa/tf/2021


   |     0.00s T qere                 from ~/github/etcbc/bhsa/tf/2021


   |     0.00s T book@zh              from ~/github/etcbc/bhsa/tf/2021


   |     0.00s T book@ar              from ~/github/etcbc/bhsa/tf/2021


   |     0.39s T lex                  from ~/github/etcbc/bhsa/tf/2021


   |     0.00s T book@ja              from ~/github/etcbc/bhsa/tf/2021


   |     0.00s T book@fa              from ~/github/etcbc/bhsa/tf/2021


   |      |     0.36s C __levels__           from otype, oslots, otext


   |      |     6.60s C __order__            from otype, oslots, __levels__


   |      |     0.24s C __rank__             from otype, __order__


   |      |     5.74s C __levUp__            from otype, oslots, __rank__


   |      |     4.47s C __levDown__          from otype, __levUp__, __rank__


   |      |     0.78s C __characters__       from otext


   |      |     0.87s C __boundary__         from otype, oslots, __rank__


   |      |     0.13s C __sections__         from otype, oslots, otext, __levUp__, __levDown__, __levels__, book, chapter, verse


    27s Compiling ~/github/etcbc/bhsa/tf/2021 to ~/github/etcbc/bhsa/tf/2021/.cfm/1


    27s Loading WARP features...


    30s Compiling WARP features...


    30s Running precomputation...


    30s   Computing levels...


    30s   get ranking of otypes


    31s   results:


    31s   book           : 10938.21 {426591-426629}


    31s   chapter        :   459.19 {426630-427558}


    31s   lex            :    46.22 {1437602-1446831}


    31s   verse          :    18.38 {1414389-1437601}


    31s   half_verse     :     9.44 {606394-651572}


    31s   sentence       :      6.7 {1172308-1236024}


    31s   sentence_atom  :     6.61 {1236025-1300538}


    31s   clause         :     4.84 {427559-515689}


    31s   clause_atom    :      4.7 {515690-606393}


    31s   phrase         :     1.68 {651573-904775}


    31s   phrase_atom    :     1.59 {904776-1172307}


    31s   subphrase      :     1.42 {1300539-1414388}


    31s   word           :        1 {1-426590}


    31s   Computing order...


    31s   assigning otype levels to nodes


    31s   sorting nodes


    37s   Computing rank...


    37s   ranking nodes


    37s   Computing levUp...


    37s   making inverse of edge feature oslots


    38s   listing embedders of all nodes


    44s   Computing levDown...


    44s   inverting embedders


    45s   turning embeddees into list


    49s   Computing boundary...


    50s Loading regular features...


 1m 16s Compiling node features...


 1m 16s   Compiling book@he (str)...


 1m 16s   Compiling g_vbe_utf8 (str)...


 1m 16s   Compiling function (str)...


 1m 16s   Compiling uvf (str)...


 1m 16s   Compiling g_vbs (str)...


 1m 16s   Compiling vbe (str)...


 1m 16s   Compiling book@ja (str)...


 1m 16s   Compiling prs_gn (str)...


 1m 16s   Compiling dist (int)...


 1m 16s Node feature 'dist' contains value -1 which collides with the None sentinel. This value will be incorrectly read as None after loading.


 1m 16s   Compiling instruction (str)...


 1m 16s   Compiling tab (int)...


 1m 16s   Compiling qere_trailer_utf8 (str)...


 1m 16s   Compiling book@la (str)...


 1m 16s   Compiling prs (str)...


 1m 16s   Compiling prs_nu (str)...


 1m 16s   Compiling root (str)...


 1m 16s   Compiling prs_ps (str)...


 1m 16s   Compiling book@id (str)...


 1m 16s   Compiling pargr (str)...


 1m 16s   Compiling qere (str)...


 1m 16s   Compiling book@pt (str)...


 1m 16s   Compiling book (str)...


 1m 16s   Compiling vbs (str)...


 1m 16s   Compiling g_vbe (str)...


 1m 16s   Compiling g_uvf (str)...


 1m 16s   Compiling g_vbs_utf8 (str)...


 1m 16s   Compiling languageISO (str)...


 1m 17s   Compiling kq_hybrid (str)...


 1m 17s   Compiling book@am (str)...


 1m 17s   Compiling book@ko (str)...


 1m 17s   Compiling verse (int)...


 1m 17s   Compiling book@el (str)...


 1m 17s   Compiling language (str)...


 1m 17s   Compiling ps (str)...


 1m 17s   Compiling label (str)...


 1m 17s   Compiling nu (str)...


 1m 17s   Compiling g_word_utf8 (str)...


 1m 17s   Compiling lex0 (str)...


 1m 17s   Compiling book@ru (str)...


 1m 17s   Compiling book@yo (str)...


 1m 17s   Compiling trailer_utf8 (str)...


 1m 17s   Compiling lexeme_count (int)...


 1m 17s   Compiling gn (str)...


 1m 17s   Compiling qere_utf8 (str)...


 1m 17s   Compiling g_prs (str)...


 1m 17s   Compiling vs (str)...


 1m 17s   Compiling ls (str)...


 1m 17s   Compiling book@hi (str)...


 1m 17s   Compiling book@zh (str)...


 1m 17s   Compiling g_prs_utf8 (str)...


 1m 17s   Compiling book@nl (str)...


 1m 17s   Compiling book@fr (str)...


 1m 17s   Compiling book@en (str)...


 1m 18s   Compiling rank_occ (int)...


 1m 18s   Compiling is_root (str)...


 1m 18s   Compiling typ (str)...


 1m 18s   Compiling vt (str)...


 1m 18s   Compiling g_nme (str)...


 1m 18s   Compiling txt (str)...


 1m 18s   Compiling code (int)...


 1m 18s   Compiling mother_object_type (str)...


 1m 18s   Compiling number (int)...


 1m 18s   Compiling book@es (str)...


 1m 18s   Compiling suffix_person (str)...


 1m 18s   Compiling book@bn (str)...


 1m 18s   Compiling voc_lex (str)...


 1m 18s   Compiling book@ar (str)...


 1m 18s   Compiling g_pfm (str)...


 1m 18s   Compiling sp (str)...


 1m 18s   Compiling g_cons (str)...


 1m 18s   Compiling g_word (str)...


 1m 18s   Compiling rank_lex (int)...


 1m 19s   Compiling voc_lex_utf8 (str)...


 1m 19s   Compiling book@tr (str)...


 1m 19s   Compiling g_uvf_utf8 (str)...


 1m 19s   Compiling suffix_gender (str)...


 1m 19s   Compiling book@ur (str)...


 1m 19s   Compiling book@sw (str)...


 1m 19s   Compiling g_lex (str)...


 1m 19s   Compiling st (str)...


 1m 19s   Compiling pdp (str)...


 1m 19s   Compiling freq_occ (int)...


 1m 19s   Compiling dist_unit (str)...


 1m 19s   Compiling domain (str)...


 1m 19s   Compiling nametype (str)...


 1m 19s   Compiling nme (str)...


 1m 19s   Compiling g_lex_utf8 (str)...


 1m 19s   Compiling rela (str)...


 1m 19s   Compiling suffix_number (str)...


 1m 19s   Compiling qere_trailer (str)...


 1m 19s   Compiling g_pfm_utf8 (str)...


 1m 19s   Compiling g_nme_utf8 (str)...


 1m 19s   Compiling book@da (str)...


 1m 19s   Compiling chapter (int)...


 1m 19s   Compiling kind (str)...


 1m 20s   Compiling lex (str)...


 1m 20s   Compiling lex_utf8 (str)...


 1m 20s   Compiling book@syc (str)...


 1m 20s   Compiling gloss (str)...


 1m 20s   Compiling g_cons_utf8 (str)...


 1m 20s   Compiling det (str)...


 1m 20s   Compiling book@pa (str)...


 1m 20s   Compiling pfm (str)...


 1m 20s   Compiling book@de (str)...


 1m 20s   Compiling kq_hybrid_utf8 (str)...


 1m 20s   Compiling freq_lex (int)...


 1m 20s   Compiling book@fa (str)...


 1m 20s   Compiling trailer (str)...


 1m 20s Compiling edge features...


 1m 20s   Compiling functional_parent (edge, values=False)...


 1m 23s   Compiling omap@2017-2021 (edge, values=True)...


 1m 26s   Compiling distributional_parent (edge, values=False)...


 1m 28s   Compiling mother (edge, values=False)...


 1m 31s   Compiling omap@c-2021 (edge, values=True)...


 1m 33s Compilation complete


Compile time: 94.75s
Memory after: 1595.8 MB
Memory used: 60.6 MB


Cache size: 859.2 MB
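
The compile log above warns that the integer feature `dist` contains the value -1, which collides with the None sentinel. The sketch below shows how such a collision can arise when a sparse integer feature is packed into a dense array. It is only a guess at the kind of encoding involved: the sentinel value is taken from the warning text, while `encode`, `decode`, and the sample data are hypothetical.

```python
import numpy as np

NONE_SENTINEL = -1  # assumed sentinel, taken from the warning text


def encode(values, n_nodes):
    """Pack a sparse {node: int} mapping into a dense int array."""
    arr = np.full(n_nodes + 1, NONE_SENTINEL, dtype=np.int32)
    for node, value in values.items():
        arr[node] = value
    return arr


def decode(arr, node):
    """Read one node's value back; the sentinel is reported as None."""
    value = int(arr[node])
    return None if value == NONE_SENTINEL else value


original = {1: 5, 2: -1, 4: 0}  # node 3 has no value; node 2 really is -1
packed = encode(original, n_nodes=4)

print([decode(packed, n) for n in range(1, 5)])
# prints [5, None, None, 0]
```

Node 2's genuine -1 comes back as None, indistinguishable from node 3, which never had a value. Any comparison of feature values between the two libraries should keep this in mind for `dist`.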


In [10]:
# Verify Context-Fabric loaded correctly
print("Verification - Checking F.otype:")
print(f"Max slot: {cf_api.F.otype.maxSlot}")
print(f"Max node: {cf_api.F.otype.maxNode}")
print(f"Node 1 type: {cf_api.F.otype.v(1)}")
print(f"First word: {cf_api.F.g_word_utf8.v(1)}")

Verification - Checking F.otype:
Max slot: 426590
Max node: 1446831
Node 1 type: word
First word: בְּ


In [11]:
# Clean up Context-Fabric objects
del cf_TF, cf_api
gc.collect()
print("Context-Fabric objects cleaned up.")

Context-Fabric objects cleaned up.


In [12]:
# Test reload time (from cache)
print("Reloading Context-Fabric (from cache)...")
gc.collect()

start = time.perf_counter()
cf_TF2 = CFFabric(locations=BHSA_SOURCE, silent='deep')
cf_api2 = cf_TF2.load('')
cf_result.load_time = time.perf_counter() - start
print(f"Load time (from cache): {cf_result.load_time:.2f}s")

# Clean up
del cf_TF2, cf_api2
gc.collect()

Reloading Context-Fabric (from cache)...
  0.00s Loading from ~/github/etcbc/bhsa/tf/2021/.cfm/1


  2.53s All features loaded from .cfm format


Load time (from cache): 2.53s


997

## Step 4: Results Comparison

In [13]:
print("=" * 60)
print("PERFORMANCE COMPARISON RESULTS")
print("=" * 60)
print()
print(tf_result)
print()
print(cf_result)
print()
print("=" * 60)
print("SPEEDUP ANALYSIS")
print("=" * 60)

compile_speedup = tf_result.compile_time / cf_result.compile_time if cf_result.compile_time > 0 else 0
load_speedup = tf_result.load_time / cf_result.load_time if cf_result.load_time > 0 else 0
memory_ratio = cf_result.memory_used / tf_result.memory_used if tf_result.memory_used > 0 else 0
cache_ratio = cf_result.cache_size / tf_result.cache_size if tf_result.cache_size > 0 else 0

print(f"\nCompile speedup:     {compile_speedup:.2f}x {'faster' if compile_speedup > 1 else 'slower'}")
print(f"Load speedup:        {load_speedup:.2f}x {'faster' if load_speedup > 1 else 'slower'}")
print(f"Memory reduction:    {(1 - memory_ratio) * 100:.1f}%" if memory_ratio < 1 else f"Memory increase: {(memory_ratio - 1) * 100:.1f}%")
print(f"Cache size ratio:    {cache_ratio:.2f}x" + (" smaller" if cache_ratio < 1 else " larger"))

PERFORMANCE COMPARISON RESULTS

Text-Fabric:
  Compile time: 61.46s
  Load time:    7.30s
  Memory used:  4832.9 MB
  Cache size:   137.9 MB

Context-Fabric:
  Compile time: 94.75s
  Load time:    2.53s
  Memory used:  60.6 MB
  Cache size:   859.2 MB

SPEEDUP ANALYSIS

Compile speedup:     0.65x slower
Load speedup:        2.89x faster
Memory reduction:    98.7%
Cache size ratio:    6.23x larger


In [14]:
# Create a summary table
print("\n" + "=" * 70)
print(f"{'Metric':<20} {'Text-Fabric':<15} {'Context-Fabric':<15} {'Improvement':<15}")
print("=" * 70)
print(f"{'Compile Time (s)':<20} {tf_result.compile_time:<15.2f} {cf_result.compile_time:<15.2f} {compile_speedup:.2f}x")
print(f"{'Load Time (s)':<20} {tf_result.load_time:<15.2f} {cf_result.load_time:<15.2f} {load_speedup:.2f}x")
print(f"{'Memory Used (MB)':<20} {tf_result.memory_used:<15.1f} {cf_result.memory_used:<15.1f} {(1 - memory_ratio) * 100:.1f}%")
print(f"{'Cache Size (MB)':<20} {tf_result.cache_size:<15.1f} {cf_result.cache_size:<15.1f} {(1 - cache_ratio) * 100:.1f}%")
print("=" * 70)


Metric               Text-Fabric     Context-Fabric  Improvement    
Compile Time (s)     61.46           94.75           0.65x
Load Time (s)        7.30            2.53            2.89x
Memory Used (MB)     4832.9          60.6            98.7%
Cache Size (MB)      137.9           859.2           -523.0%


## Notes

- **Compile time**: Time to parse `.tf` files and build the cache (first load)
- **Load time**: Time to load from cache (subsequent loads)
- **Memory used**: Increase in the process's resident set size (RSS), measured with `psutil` immediately before and after the first (compiling) load
- **Cache size**: Total size of cache files on disk
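
Each time in this notebook comes from a single run, which can be noisy. One possible refinement, sketched below with a hypothetical `time_repeated` helper and a stand-in workload, is to repeat the cached load several times and report the median. Repeated loads in one process also benefit from the operating system's file cache, so the first run may still be worth reporting separately.

```python
import statistics
import time


def time_repeated(fn, repeats=5):
    """Call fn() `repeats` times and return the per-call durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


# Stand-in workload; in the notebook this would be a cached load
# such as the body of the reload cells.
durations = time_repeated(lambda: sum(range(100_000)), repeats=5)
print(f"median {statistics.median(durations):.4f}s, "
      f"min {min(durations):.4f}s over {len(durations)} runs")
```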

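Context-Fabric uses memory-mapped numpy arrays. In this run, compared with Text-Fabric, that meant:
- Lower memory footprint: 60.6 MB vs 4832.9 MB of added process memory (98.7% less), because data stays on disk and is mapped into virtual memory. Both tests ran in the same process and the Context-Fabric baseline was already 1535.2 MB, so the exact figure is approximate.
- Faster loads from cache: 2.53s vs 7.30s (2.89x), because no deserialization is needed
- Slower first-time compilation: 94.75s vs 61.46s
- A larger cache on disk: 859.2 MB vs 137.9 MB (6.23x), consistent with storing binary arrays rather than compressed pickles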
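
Both libraries were measured in the same Python process, and the Context-Fabric test started from a baseline of 1535.2 MB, so memory left over from the Text-Fabric run may influence the second measurement. One way to reduce that effect is to run each load in a fresh interpreter. The sketch below is an assumption-laden outline of that idea, not part of either library: `run_isolated` is a hypothetical helper, it relies on the Unix-only `resource` module, and it reports peak RSS rather than the before/after difference used in this notebook.

```python
import json
import subprocess
import sys
import textwrap


def run_isolated(snippet):
    """Run `snippet` in a fresh interpreter; return elapsed seconds and peak RSS in MB."""
    wrapper = textwrap.dedent("""
        import json, resource, sys, time
        start = time.perf_counter()
        exec(sys.argv[1])
        elapsed = time.perf_counter() - start
        peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
        divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
        print(json.dumps({"seconds": elapsed, "peak_rss_mb": peak / divisor}))
    """)
    out = subprocess.run(
        [sys.executable, "-c", wrapper, snippet],
        check=True, capture_output=True, text=True,
    ).stdout
    # The snippet may print its own output; the JSON summary is the last line.
    return json.loads(out.strip().splitlines()[-1])


# Stand-in workload; a real run would pass the import and load statements
# for one library at a time.
result = run_isolated("data = bytearray(50 * 1024 * 1024)")
print(f"{result['seconds']:.3f}s, peak RSS {result['peak_rss_mb']:.1f} MB")
```

Running the Text-Fabric and Context-Fabric loads through separate calls would give each one a clean baseline, at the cost of also counting interpreter start-up and import overhead in the peak figure.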