Production-grade knowledge graph extraction CLI. Extract concepts and relationships from any document using LLMs, store in Neo4j, analyze with graph algorithms, and explore with interactive visualization.
Documentation · Architecture · Contributing
Inspired by rahulnyk/knowledge_graph — rewritten from scratch in Rust.
| Dimension | Original (Python) | RKnowledge (Rust) |
|---|---|---|
| Interface | Jupyter notebook | Full CLI, 10 subcommands |
| LLM Providers | Ollama only | Anthropic, OpenAI, Google, Ollama + any OpenAI-compatible API |
| Concurrency | Sequential | Parallel LLM calls (-j flag) |
| Storage | In-memory DataFrames | Neo4j graph DB (persistent) |
| Incremental | Rebuild from scratch | --append merges into existing graph |
| Input Formats | PDF only | PDF, Markdown, HTML, plain text |
| Entity Typing | 8 fixed categories | Free-form LLM classification |
| Graph Analytics | Degree + Louvain | PageRank, LPA communities, Dijkstra, density |
| Querying | None | query, path, stats, communities |
| Visualization | Static Pyvis | Interactive: click cards, search, toggles, legend |
| Export | None | JSON, CSV, GraphML, Cypher |
| Tests | None | 118 tests (107 unit + 11 integration) |
| CI/CD | None | GitHub Actions: lint, test, multi-platform build |
| Distribution | `docker build` + Jupyter | Single binary, curl install, skills.sh |
- Multi-format: PDF, Markdown, HTML, and plain text
- Multi-provider LLM: Anthropic, OpenAI, Google, Ollama (local/free)
- Concurrent extraction: Parallel LLM calls with `-j` flag
- Smart entity typing: LLM classifies freely ("programming language", "database", etc.)
- Tenant Isolation: Isolate multiple projects/users in one Neo4j instance
- Manual Relation Entry: Add ground truth data directly via CLI
- Domain-Aware Prompting: Specialized extraction for medical, legal, or technical docs
- Neo4j backend: Persistent graph DB with Cypher, incremental `--append`
- Graph analytics: PageRank, community detection, shortest path, density
- Interactive visualization: Redesigned dashboard with entity filters and search
- Multiple exports: JSON, CSV, GraphML, Cypher
- Fast: Compiled Rust, single binary, zero runtime deps
```bash
curl -fsSL https://raw.githubusercontent.com/Algiras/RKnowledge/main/install.sh | bash
```

Or build from source:

```bash
git clone https://github.com/Algiras/RKnowledge.git
cd RKnowledge
cargo build --release
cp target/release/rknowledge ~/.local/bin/
```

Or add as a skill:

```bash
npx skills add Algiras/RKnowledge
```

Install the skills CLI first if you haven't already.
```bash
# 1. Initialize configuration and start Neo4j
rknowledge init

# 2. Configure your LLM provider (interactive)
rknowledge auth

# 3. Build a knowledge graph from documents
rknowledge build ./docs/ --provider ollama --tenant my-project

# 4. Explore
rknowledge query "machine learning" --tenant my-project
rknowledge path "docker" "kubernetes" --tenant my-project
rknowledge stats --tenant my-project
rknowledge viz --tenant my-project
```

| Command | Description |
|---|---|
| `init` | Initialize config and start Neo4j via Docker |
| `auth` | Configure API keys for LLM providers (interactive) |
| `build <path>` | Process documents and build knowledge graph |
| `query <query>` | Search graph (natural language or `cypher:` prefix) with `--depth` |
| `path <from> <to>` | Find shortest path between two concepts |
| `stats` | Graph analytics: PageRank, density, degree distribution, entity types |
| `communities` | List detected communities and their members |
| `export` | Export to JSON, CSV, GraphML, or Cypher |
| `viz` | Open interactive visualization in browser |
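For incremental work, `--append` merges new documents into an existing tenant's graph rather than rebuilding from scratch. A sketch of that flow using only the commands and flags documented above:

```bash
# Initial build for this tenant
rknowledge build ./docs --provider ollama --tenant demo

# Later: merge newly added documents into the same graph
rknowledge build ./new-docs --provider ollama --append --tenant demo

# Verify the merged result
rknowledge stats --tenant demo
rknowledge communities --tenant demo
```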
```bash
rknowledge build ./docs \
  --provider ollama \    # anthropic, openai, ollama, google
  --model mistral \      # provider-specific model name
  --output neo4j \       # neo4j, json, csv
  -j 8 \                 # concurrent LLM requests
  --append \             # merge into existing graph
  --chunk-size 1500 \    # text chunk size (chars)
  --chunk-overlap 150    # overlap between chunks
```

```bash
# Natural language search with depth
rknowledge query "machine learning" --depth 2

# Shortest path between concepts
rknowledge path "docker" "kubernetes"

# Graph statistics and analytics
rknowledge stats

# Community detection
rknowledge communities

# Direct Cypher query
rknowledge query "cypher: MATCH (n:Concept) RETURN n.label, n.degree ORDER BY n.degree DESC LIMIT 10"
```

Isolate multiple projects within one Neo4j instance. Queries and stats are automatically scoped to the specified tenant.
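The `cypher:` prefix accepts any read query. A couple of further patterns that fit the schema used in this README (`Concept` nodes with `label` and `degree` properties — relationship types may differ in your graph, so treat these as sketches):

```cypher
// Least connected concepts -- often extraction noise worth reviewing
MATCH (n:Concept) RETURN n.label ORDER BY n.degree ASC LIMIT 10

// Everything directly connected to a given concept
MATCH (n:Concept {label: "docker"})--(m:Concept) RETURN m.label
```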
```bash
rknowledge build ./docs --tenant client-a
rknowledge stats --tenant client-a
rknowledge viz --tenant client-a
```

Add ground truth data directly. Perfect for linking concepts the LLM might miss or adding domain-specific "hard links".
```bash
# Interactive mode
rknowledge add --interactive

# Direct insertion
rknowledge add "Rust" "is a" "Programming Language" --type1 "Language" --type2 "Category"
```

Guide extraction with domain context (medical, legal, etc.) or custom focus areas.

```bash
rknowledge build ./papers --domain medical --context "Focus on drug-gene interactions"
```

Configuration is stored at `~/<config_dir>/rknowledge/config.toml`:
```toml
default_provider = "ollama"
default_model = "mistral"
chunk_size = 1500
chunk_overlap = 150

[providers.anthropic]
api_key = "${ANTHROPIC_API_KEY}"
# base_url = "https://api.anthropic.com"  # Change for Anthropic-compatible proxies
model = "claude-sonnet-4-20250514"

[providers.openai]
api_key = "${OPENAI_API_KEY}"
# base_url = "https://api.openai.com/v1"  # Change for Groq, DeepSeek, etc.
model = "gpt-4o"

[providers.ollama]
base_url = "http://localhost:11434"
model = "mistral"

[providers.google]
api_key = "${GOOGLE_API_KEY}"  # Also accepts GEMINI_API_KEY
model = "gemini-2.0-flash"

[neo4j]
uri = "bolt://localhost:7687"
user = "neo4j"
password = "rknowledge"
database = "neo4j"
```

| Provider | Setup | Best For |
|---|---|---|
| Ollama | `ollama pull mistral` | Free, local, private data |
| Anthropic | `export ANTHROPIC_API_KEY=...` | Highest quality extraction |
| OpenAI | `export OPENAI_API_KEY=...` | Good balance of quality/speed |
| Google | `export GOOGLE_API_KEY=...` or `GEMINI_API_KEY` | Gemini models |
| Groq | Set `base_url` in config (see below) | Ultra-fast inference |
| DeepSeek | Set `base_url` in config (see below) | Cost-effective |
| Mistral | Set `base_url` in config (see below) | European, multilingual |
| + any OpenAI-compatible | Set `base_url` in config | Together, OpenRouter, Fireworks, LM Studio, vLLM, ... |
All four providers support `base_url` in the config, so you can point any provider at a proxy, gateway, or compatible service.
The `openai` provider works with any service that implements the OpenAI chat completions API. Change `base_url` in your config:
```toml
# Example: Using Groq
[providers.openai]
api_key = "${GROQ_API_KEY}"
base_url = "https://api.groq.com/openai/v1"
model = "llama-3.3-70b-versatile"
```

```bash
export GROQ_API_KEY=your-key
rknowledge build ./docs --provider openai
```

| Service | base_url |
|---|---|
| Groq | https://api.groq.com/openai/v1 |
| DeepSeek | https://api.deepseek.com/v1 |
| Mistral | https://api.mistral.ai/v1 |
| Together AI | https://api.together.xyz/v1 |
| OpenRouter | https://openrouter.ai/api/v1 |
| Fireworks | https://api.fireworks.ai/inference/v1 |
| LM Studio | http://localhost:1234/v1 |
| vLLM | http://localhost:8000/v1 |
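The same pattern works for a fully local server. For example, pointing the `openai` provider at LM Studio's default endpoint from the table above — the model name is illustrative, and local servers typically ignore the API key, but both assumptions are worth verifying against your server:

```toml
[providers.openai]
api_key = "not-needed"                 # most local servers ignore the key
base_url = "http://localhost:1234/v1"  # LM Studio default
model = "llama-3.2-3b-instruct"        # illustrative; match your loaded model
```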
See ARCHITECTURE.md for the full deep-dive.
- Document Parsing: Documents are loaded and converted to plain text (PDF, MD, HTML, TXT)
- Chunking: Text is split into overlapping chunks (default 1500 chars)
- LLM Extraction: Chunks are sent concurrently to the LLM to extract `(concept, type, concept, type, relationship)` tuples
- Graph Building: Concepts become typed nodes, relationships become weighted edges
- Contextual Proximity: Concepts in the same chunk get additional weighted edges
- Community Detection: Label Propagation groups related concepts
- Storage: Graph is stored in Neo4j via `MERGE` for safe incremental updates
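The chunking step can be illustrated with a toy shell sketch (a stand-in for the real implementation, using the default 1500/150 sizes on a dummy 4000-character "document"):

```bash
# Toy illustration of the chunking step: 1500-char chunks with
# 150 chars of overlap, so the stride is 1500 - 150 = 1350.
text=$(printf 'x%.0s' {1..4000})   # 4000-char dummy document
size=1500; overlap=150; step=$((size - overlap))
i=0; n=0
while [ "$i" -lt "${#text}" ]; do
  chunk=${text:i:size}             # chunks start at 0, 1350, 2700, ...
  n=$((n + 1))
  i=$((i + step))
done
echo "chunks: $n"                  # prints: chunks: 3
```

Three chunks (0-1499, 1350-2849, 2700-3999) cover the whole text, with each boundary repeated in two chunks so relationships spanning a split are not lost.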
After `rknowledge init`, Neo4j is available at:
- Browser: http://localhost:7474
- Bolt: bolt://localhost:7687
- Credentials: neo4j / rknowledge
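To sanity-check an ingest directly from the Neo4j Browser, queries like these should work (assuming the `Concept` label and `label`/`degree` properties used elsewhere in this README):

```cypher
// How many concepts were extracted?
MATCH (n:Concept) RETURN count(n) AS concepts;

// The five most connected concepts
MATCH (n:Concept)
RETURN n.label, n.degree
ORDER BY n.degree DESC LIMIT 5;
```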
```bash
cargo test                                     # Run tests (127 total)
cargo clippy -- -D warnings                    # Lint (CI enforced)
cargo fmt                                      # Format
RUST_LOG=debug cargo run -- build ./demo_data  # Debug logging
```

See CONTRIBUTING.md for the full development guide.
MIT
Inspired by rahulnyk/knowledge_graph.


