gcf-python

Python implementation of GCF (Graph Compact Format).

84% fewer tokens than JSON. 32% fewer than TOON. 100% LLM comprehension accuracy at 500 symbols, where JSON scores 66.7%.

Install

pip install gcf-python

Zero dependencies. Pure Python. Python 3.9+. Includes CLI.

CLI

gcf encode < payload.json    # JSON to GCF
gcf decode < payload.gcf     # GCF to JSON
gcf stats  < payload.json    # token comparison with visual bar

Output of gcf stats:

Payload: 50 symbols, 20 edges

  JSON  ██████████████████████████████  4,200 tokens
  GCF   ████████░░░░░░░░░░░░░░░░░░░░░░  1,150 tokens

  Savings: 73% fewer tokens with GCF
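
The savings line is plain arithmetic on the two token counts. A minimal sketch of that calculation and of a proportional bar, assuming the 30-character bar width seen in the sample; `savings_percent` and `render_bar` are illustrative helpers, not functions exported by gcf:

```python
def savings_percent(baseline_tokens: int, compact_tokens: int) -> float:
    """Percentage of tokens saved relative to the baseline."""
    return 100 * (baseline_tokens - compact_tokens) / baseline_tokens


def render_bar(tokens: int, max_tokens: int, width: int = 30) -> str:
    """Draw a fixed-width bar, filled in proportion to tokens / max_tokens."""
    filled = round(width * tokens / max_tokens)
    return "█" * filled + "░" * (width - filled)


# Token counts taken from the sample output above.
json_tokens, gcf_tokens = 4200, 1150
print(f"JSON  {render_bar(json_tokens, json_tokens)}  {json_tokens:,} tokens")
print(f"GCF   {render_bar(gcf_tokens, json_tokens)}  {gcf_tokens:,} tokens")
print(f"Savings: {savings_percent(json_tokens, gcf_tokens):.0f}% fewer tokens with GCF")
```

Scaling both bars against the larger count keeps them visually comparable.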

Library

Quick Start

from gcf import encode, Payload, Symbol, Edge

p = Payload(
    tool="context_for_task",
    token_budget=5000,
    tokens_used=1847,
    symbols=[
        Symbol(qualified_name="pkg.AuthMiddleware", kind="function", score=0.78, provenance="lsp_resolved", distance=0),
        Symbol(qualified_name="pkg.NewServer", kind="function", score=0.54, provenance="lsp_resolved", distance=1),
    ],
    edges=[
        Edge(source="pkg.NewServer", target="pkg.AuthMiddleware", edge_type="calls"),
    ],
)

output = encode(p)

Output:

GCF tool=context_for_task budget=5000 tokens=1847 symbols=2
## targets
@0 fn pkg.AuthMiddleware 0.78 lsp_resolved
## related
@1 fn pkg.NewServer 0.54 lsp_resolved
## edges
@0<@1 calls
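
To make the layout of that output concrete, here is a rough reader for the sample above. It is a sketch inferred only from this one example (a `GCF` header of key=value pairs, `## ` section lines, whitespace-separated fields); `read_sample` is a hypothetical helper, and real input should go through the library's `decode`, which follows the actual specification:

```python
SAMPLE = """\
GCF tool=context_for_task budget=5000 tokens=1847 symbols=2
## targets
@0 fn pkg.AuthMiddleware 0.78 lsp_resolved
## related
@1 fn pkg.NewServer 0.54 lsp_resolved
## edges
@0<@1 calls
"""


def read_sample(text: str) -> dict:
    """Split GCF-style text into header fields and per-section rows."""
    lines = text.strip().splitlines()
    # Header: the literal "GCF" followed by space-separated key=value pairs.
    magic, *pairs = lines[0].split()
    if magic != "GCF":
        raise ValueError("not a GCF header")
    header = dict(pair.split("=", 1) for pair in pairs)

    sections = {}
    current = None
    for line in lines[1:]:
        if line.startswith("## "):
            # A "## name" line opens a new section.
            current = line[3:]
            sections[current] = []
        elif current is not None:
            # Every other line is stored as whitespace-separated fields.
            sections[current].append(line.split())
    return {"header": header, "sections": sections}


parsed = read_sample(SAMPLE)
print(parsed["header"]["tool"], list(parsed["sections"]))
```

All values stay as strings here; converting `budget` or scores to numbers is left out to keep the sketch short.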

Decode

from gcf import decode

# input_text is a GCF string, such as the output of encode(p) in Quick Start
p = decode(input_text)
print(p.tool, len(p.symbols), "symbols", len(p.edges), "edges")

Session Deduplication

Track transmitted symbols across multiple tool responses. Previously sent symbols become bare references instead of full declarations:

from gcf import encode_with_session, Session, Payload, Symbol

sess = Session()

# payload1 and payload2 are Payload objects built as in Quick Start
out1 = encode_with_session(payload1, sess)  # full declarations
out2 = encode_with_session(payload2, sess)  # reused symbols as "@N  # previously transmitted"

By the 5th call in a session: 92.7% token savings vs JSON.
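
The idea behind session deduplication can be sketched without the library. The toy below assumes IDs are handed out in first-seen order and that a repeat is emitted as a bare `@N` reference; `ToySession`, `emit`, and the `pkg.Router` name are made up for illustration, and the real `Session` may behave differently:

```python
import threading


class ToySession:
    """Remembers which symbol names were already sent and the ID each one got."""

    def __init__(self) -> None:
        self._ids = {}
        self._lock = threading.Lock()

    def register(self, name: str):
        """Return (id, is_new); IDs are assigned in first-seen order."""
        with self._lock:  # guard the dict so concurrent callers get unique IDs
            if name in self._ids:
                return self._ids[name], False
            new_id = len(self._ids)
            self._ids[name] = new_id
            return new_id, True


def emit(names, session: ToySession):
    """Full declaration for new symbols, bare reference for repeats."""
    lines = []
    for name in names:
        sym_id, is_new = session.register(name)
        if is_new:
            lines.append(f"@{sym_id} {name}")
        else:
            lines.append(f"@{sym_id}  # previously transmitted")
    return lines


sess = ToySession()
first = emit(["pkg.AuthMiddleware", "pkg.NewServer"], sess)
second = emit(["pkg.NewServer", "pkg.Router"], sess)
print(first)
print(second)
```

The second call repeats `pkg.NewServer`, so only its reference is emitted, while the new `pkg.Router` gets a full line.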

Delta Encoding

When the consumer already has a prior context pack, send only what changed:

from gcf import encode_delta, DeltaPayload, Symbol, Edge

delta = DeltaPayload(
    tool="context_for_task",
    base_root="aaa111",
    new_root="bbb222",
    removed=[Symbol(qualified_name="pkg.OldFunc", kind="function")],
    added=[Symbol(qualified_name="pkg.NewFunc", kind="function", score=0.85, provenance="rwr")],
    delta_tokens=30,
    full_tokens=200,
)

output = encode_delta(delta)

81.2% savings on re-queries where the pack changed slightly.
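
Conceptually, a delta is a set difference between the old and new pack. A minimal sketch, assuming symbols are compared by qualified name only; `diff_names` and `percent_saved` are hypothetical helpers, not part of gcf:

```python
def diff_names(old: set, new: set):
    """Return (added, removed) symbol names, each sorted for stable output."""
    added = sorted(new - old)
    removed = sorted(old - new)
    return added, removed


def percent_saved(delta_tokens: int, full_tokens: int) -> float:
    """How much cheaper the delta is than re-sending the full pack."""
    return 100 * (full_tokens - delta_tokens) / full_tokens


old_pack = {"pkg.AuthMiddleware", "pkg.NewServer", "pkg.OldFunc"}
new_pack = {"pkg.AuthMiddleware", "pkg.NewServer", "pkg.NewFunc"}

added, removed = diff_names(old_pack, new_pack)
print("added:", added, "removed:", removed)

# Using the delta_tokens=30 and full_tokens=200 values from the example above.
print(f"{percent_saved(30, 200):.0f}% fewer tokens than a full re-send")
```

Unchanged symbols never appear in the delta, which is where the savings on small changes come from.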

Generic Encoding

Encode any Python value (not just graph payloads) into GCF tabular format:

from gcf import encode_generic

output = encode_generic({
    "employees": [
        {"id": 1, "name": "Alice", "department": "Engineering", "salary": 95000},
        {"id": 2, "name": "Bob", "department": "Sales", "salary": 72000},
    ],
})

Output:

## employees [2]{id,name,department,salary}
1|Alice|Engineering|95000
2|Bob|Sales|72000

Works on dicts, lists, and primitives. Lists of uniform dicts get tabular rows. Nested dicts use ## key section headers.
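
The tabular layout for uniform rows can be approximated in a few lines. This sketch reproduces the sample output above under simplifying assumptions: every row has the same keys in the same order, and values contain no `|` or newlines (escaping rules are left to the specification). `encode_table` is a hypothetical helper, not the library's `encode_generic`:

```python
def encode_table(key: str, rows: list) -> str:
    """Render a list of dicts that share the same keys as a header plus rows."""
    if not rows:
        raise ValueError("need at least one row")
    columns = list(rows[0])
    if any(list(row) != columns for row in rows):
        raise ValueError("rows are not uniform")
    # Header: section name, row count in brackets, column names in braces.
    header = f"## {key} [{len(rows)}]{{{','.join(columns)}}}"
    # Body: one pipe-separated line per row, values in column order.
    body = ["|".join(str(row[col]) for col in columns) for row in rows]
    return "\n".join([header, *body])


employees = [
    {"id": 1, "name": "Alice", "department": "Engineering", "salary": 95000},
    {"id": 2, "name": "Bob", "department": "Sales", "salary": 72000},
]
print(encode_table("employees", employees))
```

Writing the column names once in the header, rather than once per row as JSON does, is what makes this layout compact for uniform data.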

API

Function	Description
`encode(p: Payload) -> str`	Encode a graph payload to GCF text
`encode_generic(data: Any) -> str`	Encode any value to GCF tabular format
`decode(input_text: str) -> Payload`	Parse GCF text back to a Payload
`encode_with_session(p: Payload, s: Session) -> str`	Encode with session deduplication
`encode_delta(d: DeltaPayload) -> str`	Encode a delta (added/removed only)
`Session()`	Create a new session tracker (thread-safe)

Types

Type	Purpose
`Payload`	Full GCF payload: tool, budget, symbols, edges, pack root
`Symbol`	Graph node: qualified name, kind, score, provenance, distance
`Edge`	Directed relationship: source, target, edge type
`DeltaPayload`	Diff between two packs: added/removed symbols and edges
`Session`	Thread-safe tracker for multi-call deduplication
`KIND_ABBREV` / `KIND_EXPAND`	Bidirectional kind abbreviation dicts
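
A bidirectional mapping like this is typically one dict plus its inverse. A small sketch, using only the `function` to `fn` abbreviation visible in the Quick Start output; the `TOY_` names and the pass-through for unknown kinds are assumptions, not the library's behavior:

```python
# Only "function" -> "fn" is taken from the Quick Start output;
# the library's real table presumably covers more kinds.
TOY_KIND_ABBREV = {"function": "fn"}

# Build the reverse lookup by swapping keys and values.
TOY_KIND_EXPAND = {short: full for full, short in TOY_KIND_ABBREV.items()}


def abbreviate(kind: str) -> str:
    """Shorten a kind for encoding; unknown kinds pass through unchanged."""
    return TOY_KIND_ABBREV.get(kind, kind)


def expand(short: str) -> str:
    """Restore the full kind when decoding; unknown values pass through."""
    return TOY_KIND_EXPAND.get(short, short)


print(abbreviate("function"), expand("fn"))
```

Deriving the inverse from the forward dict keeps the two directions from drifting apart, provided every abbreviation is unique.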

Comprehension Eval

Rigorous 3-way benchmark (GCF vs TOON vs JSON) at 500 symbols, 200 edges. Six structured extraction questions sent to an LLM:

Format	Accuracy	Tokens	vs JSON
GCF	100% (6/6)	11,090	79% fewer
TOON	100% (6/6)	16,378	69% fewer
JSON	66.7% (4/6)	53,341	baseline

JSON failed on counting tasks. GCF and TOON both achieved perfect accuracy, and GCF does it in 32% fewer tokens than TOON.
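
The percentages in the table can be rechecked from the raw token counts and question tallies. A short sketch; `percent_fewer` is an illustrative helper:

```python
# Token counts and correct-answer tallies copied from the table above.
TOKENS = {"GCF": 11_090, "TOON": 16_378, "JSON": 53_341}
CORRECT = {"GCF": 6, "TOON": 6, "JSON": 4}
QUESTIONS = 6


def percent_fewer(tokens: int, baseline: int) -> float:
    """Relative token reduction against a baseline, in percent."""
    return 100 * (baseline - tokens) / baseline


for name in ("GCF", "TOON"):
    print(f"{name} vs JSON: {percent_fewer(TOKENS[name], TOKENS['JSON']):.0f}% fewer")
print(f"GCF vs TOON: {percent_fewer(TOKENS['GCF'], TOKENS['TOON']):.0f}% fewer")

for name, right in CORRECT.items():
    print(f"{name} accuracy: {100 * right / QUESTIONS:.1f}%")
```

Note that each reduction is measured against the larger format's count, so the baseline is always named explicitly.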

Token Efficiency (TOON's Own Benchmark)

Running TOON's benchmark harness with GCF inserted (their datasets, their tokenizer):

Track	GCF	TOON	Result
Mixed-structure (nested, semi-uniform)	169,554	227,896	GCF 26% smaller
Flat-only (tabular)	66,026	67,837	GCF 3% smaller
Semi-uniform event logs	107,269	154,032	GCF 30% smaller

GCF wins on every dataset except deeply nested config (a 75-token gap on a 618-token payload). On semi-uniform event logs, GCF uses 30% fewer tokens than TOON; put the other way, TOON uses 44% more tokens than GCF.

Reproducible: blackwell-systems/toon@gcf-comparison

Other Implementations

Go: github.com/blackwell-systems/gcf-go
TypeScript: github.com/blackwell-systems/gcf-typescript
Specification: github.com/blackwell-systems/gcf

License

MIT
