Releases: anthonylee991/cgc
CGC v0.7.0
What's New
GLiNER2 replaces 4-model extraction pipeline
The default graph extraction pipeline has been upgraded from a complex 4-model chain (spaCy + GLiNER + GLiREL + E5 routing) to a single GLiNER2 model that handles both entity recognition and relation extraction natively.
Benchmark results (10 test cases):
| Metric | v1 (old) | v2 (new) | Change |
|---|---|---|---|
| Macro F1 | 0.47 | 0.52 | +11% |
| Model load | 27s | 5s | 5x faster |
| Inference | 908ms | 768ms | 15% faster |
| Dependencies | 4 | 1 | 75% fewer |
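The Change column can be re-derived from the two value columns. The snippet below is a minimal sketch of that arithmetic; the helper names `pct_change` and `speedup` are illustrative and not part of CGC. It suggests the table rounds slightly (about +10.6% for Macro F1 and roughly 5.4x for model load).

```python
def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent (negative means a decrease)."""
    return (new - old) / old * 100


def speedup(old: float, new: float) -> float:
    """How many times faster the new timing is compared to the old one."""
    return old / new


# Values copied from the benchmark table above.
print(f"Macro F1:     {pct_change(0.47, 0.52):+.1f}%")      # +10.6%
print(f"Model load:   {speedup(27, 5):.1f}x faster")        # 5.4x faster
print(f"Inference:    {-pct_change(908, 768):.1f}% faster") # 15.4% faster
print(f"Dependencies: {-pct_change(4, 1):.0f}% fewer")      # 75% fewer
```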
Key changes
- New default: `pip install cgc[extraction]` now installs `gliner2` (single dependency)
- Legacy preserved: `pip install cgc[extraction-v1]` for the old pipeline, or use `HybridExtractor(pipeline="v1")`
- Eliminated: spaCy tokenization, GLiREL, char-to-token conversion bridge, E5 domain routing
- Kept: Pattern matcher (50+ regex patterns), filters, deduplication, industry packs
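Switching between the two pipelines in code might look like the sketch below. Only `HybridExtractor(pipeline="v1")` comes from these notes. The import path `cgc.discovery.extractor`, the no-argument constructor for the new default, and the `make_extractor` helper are assumptions made for illustration, so check them against the installed package.

```python
def make_extractor(legacy: bool = False):
    """Return a HybridExtractor, optionally pinned to the legacy v1 pipeline.

    Assumption: HybridExtractor is importable from cgc.discovery.extractor
    and can be constructed without arguments to get the new default pipeline.
    """
    try:
        # Imported lazily so a missing install produces a readable error.
        from cgc.discovery.extractor import HybridExtractor  # assumed path
    except ImportError as exc:
        extra = "extraction-v1" if legacy else "extraction"
        raise RuntimeError(
            f"cgc is not installed; try: pip install cgc[{extra}]"
        ) from exc

    if legacy:
        # Documented in the release notes: selects the old 4-model chain.
        return HybridExtractor(pipeline="v1")
    # Assumed: no arguments selects the GLiNER2-based default.
    return HybridExtractor()
```

Calling `make_extractor(legacy=True)` requires the `extraction-v1` extra, since the legacy dependencies are no longer installed by default.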
Files
- `cgc/discovery/gliner2.py`: New GLiNER2 extractor module
- `cgc/discovery/extractor.py`: v1/v2 pipeline switching
- `benchmarks/extraction_benchmark.py`: Reproducible benchmark suite
Full Changelog: v0.6.0...v0.7.0