Björn Wikström edited this page Apr 13, 2026 · 9 revisions

Nous


Nous (νοῦς, Greek: mind / active intellect) is a persistent epistemic substrate for AI.

This repository should be read as a research program with a live implementation. It is not just a wrapper around model output. It is a system that stores typed relations, maintains graded uncertainty, evolves structurally across time, and carries memory between interactions.

Language models are the larynx in this picture, not the mind.

Why this category is different

Benchmarking Nous with standard LLM benchmarks would be like measuring the sweetness of chocolate with the Scoville scale. The issue is not minor inaccuracy. The issue is category error: the instrument was built to measure a different phenomenon.

Benchmarks such as MMLU, ARC, and HumanEval are valid instruments for language models. Nous is not merely a language model output surface. It is a persistent epistemic substrate.


What the artifact already contains

  • stores typed, evidence-scored relations instead of only text chunks
  • exposes explicit uncertainty and contradiction boundaries
  • runs a continuous cognitive loop between interactions
  • consolidates memory asynchronously
  • supports cross-domain bridge formation and bisociation
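To make the first two points concrete, here is a minimal sketch of what a typed, evidence-scored relation store with explicit contradiction boundaries could look like. All names and shapes here are illustrative assumptions for this page, not Nous's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """A typed relation with graded uncertainty (illustrative only)."""
    subject: str
    predicate: str
    obj: str
    confidence: float       # graded belief in [0, 1], not a binary fact
    evidence: tuple = ()    # identifiers of sources supporting the claim


class SubstrateSketch:
    """Toy store: typed relations plus explicit contradiction surfacing."""

    def __init__(self):
        self.relations: list[Relation] = []

    def assert_relation(self, rel: Relation) -> list[Relation]:
        """Store rel and return any already-stored relations it contradicts
        (same subject and predicate, different object), rather than
        silently overwriting them."""
        conflicts = [
            r for r in self.relations
            if r.subject == rel.subject
            and r.predicate == rel.predicate
            and r.obj != rel.obj
        ]
        self.relations.append(rel)
        return conflicts


store = SubstrateSketch()
store.assert_relation(Relation("aspirin", "inhibits", "COX-1", 0.9, ("paper:123",)))
conflicts = store.assert_relation(Relation("aspirin", "inhibits", "COX-9", 0.3))
# conflicts now contains the earlier COX-1 relation: the contradiction is
# exposed to the caller instead of being resolved invisibly.
```

The point of the sketch is the return value of `assert_relation`: a text-chunk store has nowhere to put a contradiction, while a typed substrate can hand it back as data.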

Reference evidence

Benchmarking is instrumentation here, not the identity of the project.

Historical reference run

In a documented reference run, an 8B model with Nous-grounded memory outperformed a 70B baseline on a domain-specific relational benchmark.

Model           Memory   Score   Questions
llama3.1-8b     —        46%     60
llama-3.3-70b   —        47%     60
llama3.1-8b     ✓ Nous   96%     60

This is useful evidence, but it is not the full story. It still measures answer quality at a moment in time. That is why FNC-Bench exists.

FNC-Bench

FNC-Bench is the repo's epistemic benchmark suite. It asks different questions:

  • does the system know that it does not know?
  • does it preserve belief under contradiction?
  • does stated confidence track real knowledge?
  • does the substrate change coherently across time?
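The third question, whether stated confidence tracks real knowledge, is a calibration question, and the simplest form of it can be sketched in a few lines. This is a generic calibration-gap measure for illustration, not FNC-Bench's actual scoring code.

```python
def calibration_gap(predictions):
    """predictions: list of (stated_confidence, was_correct) pairs.

    Returns |mean stated confidence - actual accuracy|.
    0.0 means confidence tracks knowledge on average; larger values
    mean systematic over- or under-confidence.
    """
    confidences = [conf for conf, _ in predictions]
    outcomes = [correct for _, correct in predictions]
    mean_confidence = sum(confidences) / len(confidences)
    accuracy = sum(outcomes) / len(outcomes)
    return abs(mean_confidence - accuracy)


# An overconfident system: it claims 0.9 but is right only half the time.
gap = calibration_gap([(0.9, True), (0.9, False), (0.9, True), (0.9, False)])
# gap is 0.4: the system's stated confidence overshoots its knowledge.
```

A well-calibrated substrate would keep this gap near zero even as its beliefs change across time, which is exactly the behavior a moment-in-time answer benchmark cannot see.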


Reading order

The shortest path through the category claim and implementation is:

Page                           Description
The Larynx Problem             Why language output is not the same thing as intelligence
Benchmark                      Why standard LLM benchmarks do not apply to Nous
Architecture                   Substrate structure, memory loop, graph runtime
Getting Started                Install, daemon setup, first query
Intent Disambiguation Effect   Why graph grounding changes model behavior
Contributing                   Ways to contribute code, docs, benchmarks, and datasets
Lab Notes                      Dated research notes, strategic documents, external communications
FAQ                            Common questions

Contact

𝕏 / Twitter @Q_for_qualia
LinkedIn bjornshomelab
Email bjorn@base76research.com
Issues GitHub Issues
