
blackwell-systems/knowing

knowing


The system of record for how software systems behave, change, and relate over time.

Vision

Git is the system of record for source code. knowing is the system of record for what that source code means in the context of a running organization.

Software organizations have no single place that captures how their systems actually connect, who owns what, what changed since the last deploy, or whether production behavior matches what the code declares. That knowledge lives in people's heads, incident postmortems, and tribal memory. When someone leaves, it is gone; when an incident happens at 3 AM, it is out of reach.

knowing builds a versioned, content-addressed ledger of software system relationships: code, infrastructure, ownership, and runtime behavior. Every state is a hash. Every edge has provenance. Every question has an auditable answer.

Agents are the first consumer, but the real audience is anyone who needs to understand a software organization as a system: platform teams, SREs, architects, security, compliance.

The Problem

Agents today are blind at repository boundaries. LSP tells you where a symbol is used inside one workspace. Code search finds matching text. Dependency graphs tell you which packages depend on which.

None of them answer the questions that actually matter:

  • If I change this symbol, what breaks across the rest of the system?
  • Is this route actually called in production?
  • When did this cross-repo dependency appear?
  • Who do I need to notify?
  • What did the system look like when we deployed on Tuesday?

What knowing Does

knowing builds a boundary-aware relationship graph across repositories, services, and infrastructure. It fuses static analysis with runtime observation to create a single, trustworthy model of how a software system actually works.

It is a persistent daemon with three components:

  • Indexer: crawls repositories in any language and parses them into ASTs (go/packages for Go, which provides full type resolution; tree-sitter for everything else, which provides syntax trees). It computes content hashes, resolves cross-repo symbol references, and builds the graph. The graph model is language-agnostic: extractors produce nodes and edges, and the graph does not care which language they came from. The indexer watches for git changes and re-indexes incrementally.
  • Graph store: owns the content-addressed graph in SQLite. Manages the snapshot chain, runs garbage collection, and handles traversal queries with a multi-tier cache.
  • MCP server: exposes the graph to agents over stdio or HTTP.
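The extractor boundary described above can be sketched in Go. This is a minimal illustration, not the project's actual interface: the `Extractor` interface, the `Node` and `Edge` types, and the toy `lineExtractor` (which reads a made-up "caller -> callee" line format) are all assumptions introduced here to show how language-specific parsing can stay behind a language-agnostic graph model.

```go
package main

import (
	"fmt"
	"strings"
)

// Node and Edge are hypothetical, language-agnostic graph records.
type Node struct {
	ID   string // for example, a qualified symbol name
	Kind string // "function", "route", "service", ...
}

type Edge struct {
	From, To string
	Kind     string // "calls", "imports", ...
}

// Extractor is an assumed interface: one implementation per language or
// source type. The graph only ever sees the nodes and edges it returns.
type Extractor interface {
	Extract(path string, src []byte) ([]Node, []Edge, error)
}

// lineExtractor is a toy extractor for a made-up format in which every
// non-empty line reads "caller -> callee".
type lineExtractor struct{}

func (lineExtractor) Extract(path string, src []byte) ([]Node, []Edge, error) {
	var nodes []Node
	var edges []Edge
	seen := map[string]bool{}
	for i, line := range strings.Split(string(src), "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		parts := strings.Split(line, "->")
		if len(parts) != 2 {
			return nil, nil, fmt.Errorf("%s:%d: expected \"caller -> callee\"", path, i+1)
		}
		from, to := strings.TrimSpace(parts[0]), strings.TrimSpace(parts[1])
		for _, id := range []string{from, to} {
			if !seen[id] {
				seen[id] = true
				nodes = append(nodes, Node{ID: id, Kind: "function"})
			}
		}
		edges = append(edges, Edge{From: from, To: to, Kind: "calls"})
	}
	return nodes, edges, nil
}

func main() {
	var ex Extractor = lineExtractor{}
	nodes, edges, err := ex.Extract("demo.txt", []byte("a -> b\nb -> c\n"))
	if err != nil {
		panic(err)
	}
	fmt.Println(len(nodes), "nodes,", len(edges), "edges")
}
```

A real extractor built on go/packages or tree-sitter would replace the string splitting, but under this assumption it would hand back the same two slices.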

Unlike tools that maintain mutable current-state graphs, knowing is content-addressed: every node, edge, and graph snapshot is a hash. This means:

  • History: the graph has a full audit trail; every previous state is queryable
  • Staleness: a hash mismatch is a structural fact, not a heuristic guess
  • Integrity: any graph state is provably derived from specific source commits
  • Runtime ground truth: production traces fused with static analysis tell you what actually runs, not just what the code declares

The Git analogy is exact: Git is a content-addressed graph of source code. knowing is a content-addressed graph of source code relationships.
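To make the content-addressing idea concrete, here is a minimal sketch in Go. It assumes SHA-256 and a simple NUL-joined encoding; the function names (`hashOf`, `NodeHash`, `EdgeHash`, `SnapshotHash`) and the exact fields that feed each hash are illustrative assumptions, not the project's schema. The point it shows is that a change in source content propagates into a new node hash, a new edge hash, and a new snapshot hash.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// hashOf returns the hex SHA-256 of the parts joined with a NUL separator.
// A production encoding would need to be unambiguous for arbitrary input.
func hashOf(parts ...string) string {
	sum := sha256.Sum256([]byte(strings.Join(parts, "\x00")))
	return hex.EncodeToString(sum[:])
}

// NodeHash derives a node identity from a symbol and the hash of its source.
func NodeHash(symbol, sourceHash string) string {
	return hashOf("node", symbol, sourceHash)
}

// EdgeHash derives an edge identity from its endpoints and kind.
func EdgeHash(from, to, kind string) string {
	return hashOf("edge", from, to, kind)
}

// SnapshotHash covers the sorted set of edge hashes plus the parent snapshot,
// in the way a Git commit covers a tree and points at its parent.
func SnapshotHash(parent string, edgeHashes []string) string {
	sorted := append([]string(nil), edgeHashes...)
	sort.Strings(sorted)
	return hashOf(append([]string{"snapshot", parent}, sorted...)...)
}

func main() {
	before := hashOf("blob", "func Verify(token string) bool")
	after := hashOf("blob", "func Verify(token, aud string) bool")

	caller := NodeHash("api.Login", hashOf("blob", "func Login()"))
	v1 := NodeHash("auth.Verify", before)
	v2 := NodeHash("auth.Verify", after)

	s1 := SnapshotHash("", []string{EdgeHash(caller, v1, "calls")})
	s2 := SnapshotHash(s1, []string{EdgeHash(caller, v2, "calls")})

	fmt.Println("node changed:", v1 != v2)
	fmt.Println("snapshot changed:", s1 != s2)
}
```

Sorting the edge hashes before hashing makes the snapshot identity independent of insertion order, which is what lets two indexers that saw the same edges agree on the same snapshot.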

What It Answers

For agents:

  • "I'm changing this function signature. Which other repos call it?"
  • "What is the blast radius of this change, and how confident are we in each edge?"
  • "What is the full data flow of this value across functions, services, queues, and repositories?"
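A blast-radius question of this kind reduces to a reverse traversal of the graph. The sketch below assumes a hypothetical `Edge` type with a `Confidence` score in [0, 1] and a `BlastRadius` function; neither is taken from the project. It walks "depends on" edges backwards from the changed symbol with a breadth-first search and returns every edge it crosses, so each one can be reported with its own confidence.

```go
package main

import "fmt"

// Edge is a hypothetical record: From calls or depends on To.
type Edge struct {
	From, To   string
	Confidence float64 // assumed score in [0, 1]
}

// BlastRadius returns every edge that transitively leads to changed,
// in breadth-first order starting from the changed symbol.
func BlastRadius(edges []Edge, changed string) []Edge {
	incoming := map[string][]Edge{} // To -> edges pointing at it
	for _, e := range edges {
		incoming[e.To] = append(incoming[e.To], e)
	}
	visited := map[string]bool{changed: true}
	queue := []string{changed}
	var out []Edge
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, e := range incoming[cur] {
			out = append(out, e)
			if !visited[e.From] {
				visited[e.From] = true
				queue = append(queue, e.From)
			}
		}
	}
	return out
}

func main() {
	edges := []Edge{
		{"web.Checkout", "api.Login", 0.9},
		{"api.Login", "auth.Verify", 1.0},
		{"batch.Sync", "auth.Verify", 0.4},
		{"web.Checkout", "cdn.Fetch", 1.0},
	}
	for _, e := range BlastRadius(edges, "auth.Verify") {
		fmt.Printf("%s -> %s (confidence %.1f)\n", e.From, e.To, e.Confidence)
	}
}
```

Because each node is dequeued at most once, every incoming edge is reported exactly once and cycles terminate.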

For platform teams and SREs:

  • "What did the dependency graph look like when we deployed on Tuesday?"
  • "When did this cross-repo incompatibility first appear?"
  • "Is this route actually called in production, or just declared in code?"
  • "Static analysis says 47 callers; how many are active in production?"

For architects and tech leads:

  • "This PR adds 3 new cross-repo dependencies and spans 3 teams. Here's who to notify."
  • "What edges in the graph are stale after this week's changes?"
  • "This proto field has zero runtime reads in 90 days. Safe to deprecate."
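Staleness as a hash mismatch can be sketched directly. The example assumes each edge remembers the content hash of the source file it was derived from (`IndexedHash`); that field, the `Edge` type, and the `StaleEdges` function are hypothetical names for illustration. An edge is stale when the file's current hash no longer matches, or when the file is gone.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Edge is a hypothetical record that remembers which file content it came from.
type Edge struct {
	From, To    string
	SourceFile  string
	IndexedHash string // content hash of SourceFile at indexing time
}

// contentHash returns the hex SHA-256 of a file's bytes.
func contentHash(src []byte) string {
	sum := sha256.Sum256(src)
	return hex.EncodeToString(sum[:])
}

// StaleEdges returns the edges whose source file hash no longer matches.
// current maps file path to its present content hash; a missing file
// yields an empty hash and therefore also counts as a mismatch.
func StaleEdges(edges []Edge, current map[string]string) []Edge {
	var stale []Edge
	for _, e := range edges {
		if current[e.SourceFile] != e.IndexedHash {
			stale = append(stale, e)
		}
	}
	return stale
}

func main() {
	oldSrc := []byte("func Login() { auth.Verify(t) }")
	newSrc := []byte("func Login() { auth.VerifyV2(t) }")

	edges := []Edge{
		{"api.Login", "auth.Verify", "api/login.go", contentHash(oldSrc)},
	}
	current := map[string]string{"api/login.go": contentHash(newSrc)}

	for _, e := range StaleEdges(edges, current) {
		fmt.Printf("stale: %s -> %s (%s)\n", e.From, e.To, e.SourceFile)
	}
}
```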

For security and compliance:

  • "Prove that this graph was derived from these specific source commits."
  • "Show me every service that touches PII, traced through the actual runtime call graph."
  • "What changed in the system's dependency structure between these two audit dates?"
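Comparing the dependency structure between two dates amounts to a set difference over the edges of two snapshots. The sketch below assumes a snapshot can be materialized as a slice of comparable `Edge` values; the type and the `Diff` function are illustrative, not the project's API.

```go
package main

import "fmt"

// Edge is a hypothetical, comparable relationship record.
type Edge struct {
	From, To, Kind string
}

// Diff returns the edges present only in after (added) and only in
// before (removed), preserving the input order of each side.
func Diff(before, after []Edge) (added, removed []Edge) {
	inBefore := make(map[Edge]bool, len(before))
	for _, e := range before {
		inBefore[e] = true
	}
	inAfter := make(map[Edge]bool, len(after))
	for _, e := range after {
		inAfter[e] = true
		if !inBefore[e] {
			added = append(added, e)
		}
	}
	for _, e := range before {
		if !inAfter[e] {
			removed = append(removed, e)
		}
	}
	return added, removed
}

func main() {
	q1 := []Edge{
		{"web", "api", "http"},
		{"api", "auth", "grpc"},
	}
	q2 := []Edge{
		{"web", "api", "http"},
		{"api", "billing", "grpc"},
	}
	added, removed := Diff(q1, q2)
	fmt.Println("added:", added)
	fmt.Println("removed:", removed)
}
```

In a content-addressed store the same comparison could run over edge hashes instead of full records, which keeps the diff cheap even for large snapshots.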

MCP Tools

| Tool | Purpose |
| --- | --- |
| cross_repo_callers | All callers of a symbol across indexed repos |
| blast_radius | Full impact analysis for a proposed change |
| trace_dataflow | Follow a value across function and service boundaries |
| repo_graph | Repository and package-level dependency relationships |
| stale_edges | Edges invalidated by recent source changes (hash mismatch) |
| ownership | Who owns the code/service/consumers affected by a change |
| snapshot_diff | What changed in the graph between two points in time |
| semantic_diff | Relationship-level diff between any two snapshots |
| pr_impact | Semantic diff specialized for a PR (resolves base/head from git) |
| index_repo | Add a repo to the graph |
| graph_query | Raw graph query (Cypher or similar) |

Relationship to agent-lsp

agent-lsp gives agents live semantic awareness inside a workspace: diagnostics, rename execution, edit simulation, symbol navigation.

knowing gives agents (and humans) persistent system-level awareness across repositories: relationships, impact, ownership, staleness, and runtime behavior.

Where agent-lsp answers "where is this symbol used in this repo?", knowing answers "where is this contract used across the system, who owns the consumers, and is it actually called in production?"

Roadmap

Five parallel workstreams, not sequential phases. See docs/roadmap.md for the full breakdown with dependency graph and parallelization notes.

| Workstream | Focus |
| --- | --- |
| Graph Core | Content-addressed store, language-agnostic extractor framework, Go + tree-sitter extractors, traversal cache, MCP server, daemon |
| Edge Types | SCIP, protobuf/gRPC, HTTP routes, events, schemas, infrastructure, ownership |
| Runtime Intelligence | OpenTelemetry trace ingestion, runtime symbol resolution, confidence decay |
| Developer Visibility | Semantic PR diff, graph-native test selection, ownership routing, staleness dashboard |
| Agent Coordination | Pending mutations, temporal reasoning, federated graphs |

Documentation

  • Architecture: design decisions, system overview, schemas, interfaces
  • Roadmap: workstreams, dependencies, parallelization notes

Tech Stack

  • Go (indexer, graph store, MCP server)
  • tree-sitter (multi-language AST parsing)
  • SCIP (ingest external indices)
  • SQLite (content-addressed persistent store)
  • MCP over stdio/HTTP

License

MIT
