Lightweight document knowledge graph builder.
Inspired by the knowledge-graph techniques used in retrieval-augmented generation (RAG), docgraph focuses on building lightweight document graphs. It extracts entities and relationships from unstructured text and lets you query them through a simple Python API -- no heavy NLP frameworks required.
graph TD
A[Document Text] --> B[Entity Extractor]
B --> C[Relationship Detector]
C --> D[KnowledgeGraph]
D --> E[Query API]
D --> F[Path Finding]
D --> G[Subgraph Extraction]
D --> H[JSON Export]
subgraph "Entity Extraction"
B --> B1[Persons]
B --> B2[Organizations]
B --> B3[Locations]
B --> B4[Dates / Emails / URLs]
end
subgraph "Graph Algorithms"
F --> F1[BFS]
F --> F2[Shortest Path]
end
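The path-finding stage in the diagram can be illustrated with a plain breadth-first search. The following is a minimal sketch, not docgraph's actual implementation: the `shortest_path` function and the adjacency-dict layout are assumptions made for illustration, and the sample graph is hand-built rather than extracted from text.

```python
from collections import deque


def shortest_path(adjacency, start, goal):
    """Return a fewest-hops path from start to goal, or None if unreachable.

    `adjacency` maps each node name to a set of neighbor names
    (an assumed layout, not docgraph's internal structure).
    """
    if start not in adjacency or goal not in adjacency:
        return None
    if start == goal:
        return [start]

    parents = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        # Sort neighbors so the result is deterministic when several
        # shortest paths exist.
        for neighbor in sorted(adjacency.get(node, ())):
            if neighbor in parents:
                continue
            parents[neighbor] = node
            if neighbor == goal:
                # Walk the parent links back to the start.
                path = [goal]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Hand-built undirected graph, used only to exercise the function.
adjacency = {
    "Alice Johnson": {"Acme Corp", "New York"},
    "Bob Smith": {"Acme Corp"},
    "Acme Corp": {"Alice Johnson", "Bob Smith", "New York"},
    "New York": {"Alice Johnson", "Acme Corp"},
    "Global Technologies": set(),
}

print(shortest_path(adjacency, "Bob Smith", "New York"))
# ['Bob Smith', 'Acme Corp', 'New York']
```

Because BFS visits nodes in order of hop count, the first time it reaches the goal it has found a path with the fewest edges, which is why no separate shortest-path algorithm is needed for an unweighted graph.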
- Regex-based entity extraction -- recognizes persons, organizations, locations, dates, emails, URLs, and monetary amounts
- Co-occurrence relationship detection -- finds relationships based on textual proximity
- Custom lightweight graph -- BFS traversal, shortest path, subgraph extraction with zero external graph dependencies
- Document tracking -- every entity and relationship is traced back to its source document
- JSON export -- serialize your entire knowledge graph for downstream use
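To make the first two features concrete, here is a rough sketch of regex-based extraction followed by proximity-based linking. Everything in it is an assumption for illustration: the patterns, the `extract_entities` and `co_occurrences` helpers, and the linear-decay weight formula are hypothetical and may differ from what docgraph ships. The `window` and `min_weight` defaults borrow the values shown for `co_occurrence_window` and `min_relationship_weight` in the configuration example, but their exact meaning inside the library is not confirmed here.

```python
import re
from itertools import combinations

# Hypothetical patterns, ordered by priority: earlier entries win overlaps.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "organization": re.compile(
        r"\b(?:[A-Z][a-z]+ )+(?:Corp|Inc|Ltd|Technologies)\b"
    ),
    "person": re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b"),
}


def extract_entities(text):
    """Return (start, end, type, text) tuples sorted by position."""
    found = []
    for entity_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            start, end = match.span()
            # Skip spans that overlap an entity already claimed by a
            # higher-priority pattern (e.g. "Acme Corp" as a person).
            if any(start < e and s < end for s, e, _, _ in found):
                continue
            found.append((start, end, entity_type, match.group()))
    return sorted(found)


def co_occurrences(entities, window=100, min_weight=0.2):
    """Link entity pairs that sit close together in the text.

    Assumed weighting: 1.0 for adjacent entities, decaying linearly to 0
    as the character gap approaches `window`.
    """
    edges = {}
    for (_, end_a, _, a), (start_b, _, _, b) in combinations(entities, 2):
        gap = max(0, start_b - end_a)
        weight = round(1.0 - gap / window, 2)
        if a != b and weight >= min_weight:
            key = tuple(sorted((a, b)))
            edges[key] = max(edges.get(key, 0.0), weight)
    return edges


sample = "Alice Johnson is the CEO of Acme Corp. Reach her at alice@acme.example."
entities = extract_entities(sample)
edges = co_occurrences(entities)

for _, _, entity_type, name in entities:
    print(entity_type, name)
for pair, weight in edges.items():
    print(pair, weight)
```

A purely regex-based approach like this is fast and dependency-free, but it will mislabel capitalized phrases that merely look like names, so the priority ordering and overlap check matter more than any single pattern.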
pip install -e .

from docgraph import KnowledgeGraph
kg = KnowledgeGraph()
# Add documents
kg.add_document(
    "Alice Johnson is the CEO of Acme Corp, headquartered in New York.",
    doc_id="press-release-1",
)
kg.add_document(
    "Bob Smith joined Acme Corp as CTO. He previously worked with Alice Johnson at Global Technologies.",
    doc_id="press-release-2",
)
# Query an entity
result = kg.query("Alice Johnson")
print(result["neighbors"]) # ['Acme Corp', 'New York', ...]
# Find shortest path
path = kg.find_path("Bob Smith", "New York")
print(path) # ['Bob Smith', 'Acme Corp', 'New York']
# Get neighborhood subgraph
subgraph = kg.get_subgraph("Acme Corp", depth=1)
# Export full graph
json_str = kg.export_json()
# Graph statistics
print(kg.stats())
# {'nodes': 6, 'edges': 5, 'density': 0.3333, 'documents': 2}

from docgraph import DocGraphConfig, KnowledgeGraph
config = DocGraphConfig(
    max_entities=5000,
    co_occurrence_window=100,
    min_relationship_weight=0.2,
)
kg = KnowledgeGraph(config=config)

Or via environment variables (see .env.example):
export MAX_ENTITIES=10000
export LOG_LEVEL=DEBUG

# Install dev dependencies
make dev
# Run tests
make test
# Lint + typecheck + test
make all

See CONTRIBUTING.md for full guidelines.
src/docgraph/
    __init__.py       # Public API
    config.py         # Configuration
    core.py           # Entity, Relationship, KnowledgeGraph
    utils.py          # Text processing, regex patterns, graph algorithms
tests/
    test_core.py      # Unit tests
docs/
    ARCHITECTURE.md
MIT -- see LICENSE.
Built by Officethree Technologies | Made with ❤️ and AI