Releases: anthonylee991/cgc

CGC v0.7.0

08 Mar 12:57

What's New

GLiNER2 replaces 4-model extraction pipeline

The default graph extraction pipeline now uses a single GLiNER2 model that handles both entity recognition and relation extraction natively. It replaces the previous 4-model chain (spaCy + GLiNER + GLiREL + E5 routing).

Benchmark results (10 test cases):

Metric       | v1 (old) | v2 (new) | Change
Macro F1     | 0.47     | 0.52     | +11%
Model load   | 27s      | 5s       | 5x faster
Inference    | 908ms    | 768ms    | 15% faster
Dependencies | 4        | 1        | 75% fewer
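
The Change column can be re-derived from the v1 and v2 values in the benchmark table. The snippet below is only an arithmetic check on those reported numbers. The helper names `relative_change` and `speedup` are illustrative and are not part of cgc or of the benchmark suite.

```python
# Values copied from the benchmark table: (v1, v2) per metric.
results = {
    "Macro F1": (0.47, 0.52),
    "Model load (s)": (27.0, 5.0),
    "Inference (ms)": (908.0, 768.0),
    "Dependencies": (4.0, 1.0),
}


def relative_change(old: float, new: float) -> float:
    """Percentage change from old to new (negative means a reduction)."""
    return (new - old) / old * 100.0


def speedup(old: float, new: float) -> float:
    """Ratio old/new, meaningful for time-like metrics where lower is better."""
    return old / new


for name, (old, new) in results.items():
    print(f"{name}: {relative_change(old, new):+.1f}%, old/new ratio {speedup(old, new):.1f}x")
```

Rounded, these match the table: the F1 gain is about 11%, model load is a bit over 5x faster, inference time drops by about 15%, and the dependency count drops by 75%.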

Key changes

  • New default: pip install cgc[extraction] now installs gliner2 (single dependency)
  • Legacy preserved: pip install cgc[extraction-v1] for the old pipeline, or use HybridExtractor(pipeline="v1")
  • Eliminated: spaCy tokenization, GLiREL, char-to-token conversion bridge, E5 domain routing
  • Kept: Pattern matcher (50+ regex patterns), filters, deduplication, industry packs
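
As a rough sketch of the pipeline switch described above: the only call confirmed by these notes is `HybridExtractor(pipeline="v1")`. The import path `cgc.discovery.extractor` is an assumption inferred from the file list, calling `HybridExtractor()` with no arguments to get the new default is also an assumption, and `build_extractor` is a hypothetical helper. The sketch returns `None` when cgc is not installed, so it runs either way.

```python
import importlib


def build_extractor(legacy: bool = False):
    """Return a HybridExtractor instance, or None if cgc is not installed.

    Hypothetical helper for illustration; not part of cgc.
    """
    try:
        # Assumed import path, inferred from cgc/discovery/extractor.py.
        module = importlib.import_module("cgc.discovery.extractor")
    except ImportError:
        return None
    extractor_cls = module.HybridExtractor
    if legacy:
        # Documented in the release notes: selects the old 4-model pipeline.
        return extractor_cls(pipeline="v1")
    # Assumption: no arguments selects the new GLiNER2 default.
    return extractor_cls()


extractor = build_extractor(legacy=True)
if extractor is None:
    print("cgc not installed; try: pip install cgc[extraction-v1]")
```

The legacy branch needs the `cgc[extraction-v1]` extra installed, since the default `cgc[extraction]` extra now pulls in only gliner2.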

Files

  • cgc/discovery/gliner2.py — New GLiNER2 extractor module
  • cgc/discovery/extractor.py — v1/v2 pipeline switching
  • benchmarks/extraction_benchmark.py — Reproducible benchmark suite

Full Changelog: v0.6.0...v0.7.0