Skip to content

blackwell-systems/gcf-python

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

17 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Blackwell Systems License

gcf-python

Python implementation of GCF (Graph Compact Format).

84% fewer tokens than JSON. 32% fewer than TOON. 100% LLM comprehension accuracy at 500 symbols, where JSON fails.

Install

pip install gcf-python

Zero dependencies. Pure Python. Python 3.9+. Includes CLI.

CLI

gcf encode < payload.json    # JSON to GCF
gcf decode < payload.gcf     # GCF to JSON
gcf stats  < payload.json    # token comparison with visual bar
Payload: 50 symbols, 20 edges

  JSON  ██████████████████████████████  4,200 tokens
  GCF   ████████░░░░░░░░░░░░░░░░░░░░░░  1,150 tokens

  Savings: 73% fewer tokens with GCF

Library

Quick Start

from gcf import encode, Payload, Symbol, Edge

p = Payload(
    tool="context_for_task",
    token_budget=5000,
    tokens_used=1847,
    symbols=[
        Symbol(qualified_name="pkg.AuthMiddleware", kind="function", score=0.78, provenance="lsp_resolved", distance=0),
        Symbol(qualified_name="pkg.NewServer", kind="function", score=0.54, provenance="lsp_resolved", distance=1),
    ],
    edges=[
        Edge(source="pkg.NewServer", target="pkg.AuthMiddleware", edge_type="calls"),
    ],
)

output = encode(p)

Output:

GCF tool=context_for_task budget=5000 tokens=1847 symbols=2
## targets
@0 fn pkg.AuthMiddleware 0.78 lsp_resolved
## related
@1 fn pkg.NewServer 0.54 lsp_resolved
## edges
@0<@1 calls

Decode

from gcf import decode

p = decode(input_text)
print(p.tool, len(p.symbols), "symbols", len(p.edges), "edges")

Session Deduplication

Track transmitted symbols across multiple tool responses. Previously-sent symbols become bare references instead of full declarations:

from gcf import encode_with_session, Session, Payload, Symbol

sess = Session()

out1 = encode_with_session(payload1, sess)  # full declarations
out2 = encode_with_session(payload2, sess)  # reused symbols as "@N  # previously transmitted"

By the 5th call in a session: 92.7% token savings vs JSON.

Delta Encoding

When the consumer already has a prior context pack, send only what changed:

from gcf import encode_delta, DeltaPayload, Symbol, Edge

delta = DeltaPayload(
    tool="context_for_task",
    base_root="aaa111",
    new_root="bbb222",
    removed=[Symbol(qualified_name="pkg.OldFunc", kind="function")],
    added=[Symbol(qualified_name="pkg.NewFunc", kind="function", score=0.85, provenance="rwr")],
    delta_tokens=30,
    full_tokens=200,
)

output = encode_delta(delta)

81.2% savings on re-queries where the pack changed slightly.

Generic Encoding

Encode any Python value (not just graph payloads) into GCF tabular format:

from gcf import encode_generic

output = encode_generic({
    "employees": [
        {"id": 1, "name": "Alice", "department": "Engineering", "salary": 95000},
        {"id": 2, "name": "Bob", "department": "Sales", "salary": 72000},
    ],
})

Output:

## employees [2]{id,name,department,salary}
1|Alice|Engineering|95000
2|Bob|Sales|72000

Works on dicts, lists, and primitives. Lists of uniform dicts get tabular rows. Nested dicts use ## key section headers.

API

Function Description
encode(p: Payload) -> str Encode a graph payload to GCF text
encode_generic(data: Any) -> str Encode any value to GCF tabular format
decode(input_text: str) -> Payload Parse GCF text back to a Payload
encode_with_session(p: Payload, s: Session) -> str Encode with session deduplication
encode_delta(d: DeltaPayload) -> str Encode a delta (added/removed only)
Session() Create a new session tracker (thread-safe)

Types

Type Purpose
Payload Full GCF payload: tool, budget, symbols, edges, pack root
Symbol Graph node: qualified name, kind, score, provenance, distance
Edge Directed relationship: source, target, edge type
DeltaPayload Diff between two packs: added/removed symbols and edges
Session Thread-safe tracker for multi-call deduplication
KIND_ABBREV / KIND_EXPAND Bidirectional kind abbreviation dicts

Comprehension Eval

Rigorous 3-way benchmark (GCF vs TOON vs JSON) at 500 symbols, 200 edges. Six structured extraction questions sent to an LLM:

Format Accuracy Tokens vs JSON
GCF 100% (6/6) 11,090 79% fewer
TOON 100% (6/6) 16,378 69% fewer
JSON 66.7% (4/6) 53,341 baseline

JSON failed on counting tasks. GCF and TOON both achieved perfect accuracy. GCF does it in 32% fewer tokens.

Token Efficiency (TOON's Own Benchmark)

Running TOON's benchmark harness with GCF inserted (their datasets, their tokenizer):

Track GCF TOON Result
Mixed-structure (nested, semi-uniform) 169,554 227,896 GCF 34% smaller
Flat-only (tabular) 66,026 67,837 GCF 3% smaller
Semi-uniform event logs 107,269 154,032 GCF 44% smaller

GCF wins on every dataset except deeply nested config (75 tokens on a 618-token payload). On semi-uniform data, GCF uses 44% fewer tokens than TOON.

Reproducible: blackwell-systems/toon@gcf-comparison

Other Implementations

License

MIT

About

Python implementation of GCF: token-optimized wire format for LLM tool responses. Encoder, decoder, session dedup, delta encoding. Zero dependencies.

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages