Home
Nous (νοῦς, Greek: mind / active intellect) is a persistent epistemic substrate for AI.
This repository should be read as a research program with a live implementation. It is not just a wrapper around model output. It is a system that stores typed relations, maintains graded uncertainty, evolves structurally across time, and carries memory between interactions.
Language models are the larynx in this picture, not the mind.
Benchmarking Nous with standard LLM benchmarks would be like measuring the sweetness of chocolate with the Scoville scale. The issue is not minor inaccuracy. The issue is category error: the instrument was built to measure a different phenomenon.
Benchmarks such as MMLU, ARC, and HumanEval are valid instruments for language models. Nous is not merely a language model output surface. It is a persistent epistemic substrate.
→ The Larynx Problem · Benchmark
- stores typed, evidence-scored relations instead of only text chunks
- exposes explicit uncertainty and contradiction boundaries
- runs a continuous cognitive loop between interactions
- consolidates memory asynchronously
- supports cross-domain bridge formation and bisociation
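As a rough sketch of the first two properties above — typed, evidence-scored relations with explicit contradiction boundaries — the shape of a stored relation might look like the following. All names and fields here are illustrative assumptions, not the actual Nous API.

```python
from dataclasses import dataclass, field
import time

# Hypothetical sketch of a typed, evidence-scored relation.
# Field names are illustrative only; they are not the Nous schema.
@dataclass
class Relation:
    subject: str
    predicate: str          # typed relation, e.g. "supports", "contradicts"
    obj: str
    confidence: float       # graded uncertainty in [0, 1]
    evidence: list = field(default_factory=list)   # provenance of the claim
    created_at: float = field(default_factory=time.time)

    def contradicts(self, other: "Relation") -> bool:
        # Minimal contradiction boundary: same subject/object pair,
        # explicitly opposing predicates.
        return (self.subject == other.subject
                and self.obj == other.obj
                and {self.predicate, other.predicate} == {"supports", "contradicts"})

r1 = Relation("caffeine", "supports", "alertness", 0.9, ["study-a"])
r2 = Relation("caffeine", "contradicts", "alertness", 0.4, ["study-b"])
print(r1.contradicts(r2))  # True
```

The point of the sketch: because relations are typed and carry evidence, a contradiction is a first-class, detectable event in the substrate rather than two inconsistent text chunks.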
Benchmarking is instrumentation here, not the identity of the project.
In a documented reference run, an 8B model with Nous-grounded memory outperformed a 70B baseline on a domain-specific relational benchmark.
| Model | Memory | Score | Questions |
|---|---|---|---|
| llama3.1-8b | — | 46% | 60 |
| llama-3.3-70b | — | 47% | 60 |
| llama3.1-8b | ✓ Nous | 96% | 60 |
This is useful evidence, but it is not the full story. It still measures answer quality at a moment in time. That is why FNC-Bench exists.
FNC-Bench is the repo's epistemic benchmark suite. It asks different questions:
- does the system know that it does not know?
- does it preserve belief under contradiction?
- does stated confidence track real knowledge?
- does the substrate change coherently across time?
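The third question — does stated confidence track real knowledge — can be made concrete with a calibration check. This is an illustrative sketch, not the FNC-Bench implementation: it compares a system's mean stated confidence against its actual accuracy on a set of answers.

```python
# Illustrative calibration check (assumed formulation, not FNC-Bench itself):
# a well-calibrated system's stated confidence should match its hit rate.
def calibration_gap(predictions):
    """predictions: list of (stated_confidence, was_correct) pairs."""
    if not predictions:
        return 0.0
    mean_conf = sum(c for c, _ in predictions) / len(predictions)
    accuracy = sum(1 for _, ok in predictions if ok) / len(predictions)
    return abs(mean_conf - accuracy)

# 80% stated confidence, 4 of 5 correct -> gap of (about) zero.
sample = [(0.8, True), (0.8, True), (0.8, True), (0.8, True), (0.8, False)]
print(round(calibration_gap(sample), 6))  # → 0.0
```

A gap near zero means confidence tracks knowledge; a large gap means the system is over- or under-confident, which is exactly the failure mode a point-in-time answer-quality benchmark cannot see.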
The shortest path through the category claim and its implementation is:
| Page | Description |
|---|---|
| The Larynx Problem | Why language output is not the same thing as intelligence |
| Benchmark | Why standard LLM benchmarks do not apply to Nous |
| Architecture | Substrate structure, memory loop, graph runtime |
| Getting Started | Install, daemon setup, first query |
| Intent Disambiguation Effect | Why graph grounding changes model behavior |
| Contributing | Ways to contribute code, docs, benchmarks, and datasets |
| Lab Notes | Dated research notes, strategic documents, external communications |
| FAQ | Common questions |
- PyPI: pypi.org/project/nouse
- GitHub: base76-research-lab/Nous
- License: MIT
- 𝕏 / Twitter: @Q_for_qualia
- bjornshomelab
- Email: bjorn@base76research.com
- Issues: GitHub Issues