Schema parser does not implement picoschema modifiers (enum, array, descriptions) #794

@groksrc

Summary

The docs describe Basic Memory schemas as using Picoschema syntax, but the parser appears to do plain YAML field-to-type mapping only. Picoschema modifiers like (enum, description): [values], (array, description): type, and trailing field descriptions after a comma are not interpreted — they're just included as part of the YAML key.

The result: writing schemas with picoschema modifiers does not produce the validation behavior the docs promise.

Relationship to other issues

Evidence

Docs claim

From docs.basicmemory.com/raw/concepts/schema-system.md:

Schemas use Picoschema — a compact YAML-based syntax for defining fields. Each line declares a field name, its type, and an optional description.

The "Modifiers" table lists field?(enum): [values], field?(array): type, field?(object):, and "field: type, description" forms.

Actual behavior

Schema note saved as:

---
title: PicoTest
type: schema
entity: pico_test
schema:
  name: string
  status(enum, current state): [active, inactive]
  tags(array, list of tags): string
settings:
  validation: warn
---

Note that this YAML does parse cleanly — the (enum, description) form is the spec-correct picoschema syntax, where the description is inside the parens before the colon, so YAML doesn't see trailing text after the flow sequence.

Conforming data note:

---
title: PicoTest1
type: pico_test
status: active
tags: [foo, bar]
---

# PicoTest1

## Observations
- [name] PicoTest1
- [status] active
- [tags] foo
- [tags] bar

Run schema_validate:

Notes: 1 | Valid: 1 | Warnings: 3 | Errors: 0

- PicoTest1 — valid
  - warning: Missing required field: name (expected [name] observation)
  - warning: Missing required field: status(enum, current state) (expected [status(enum, current state)] observation)
  - warning: Missing required field: tags(array, list of tags) (expected [tags(array, list of tags)] observation)

What this tells us

  1. The validator is looking for an observation literally named [status(enum, current state)], not for an observation named [status] whose value is one of active/inactive. The picoschema modifier (enum, current state) is being treated as part of the field name.

  2. Same for tags(array, list of tags) — the validator wants an observation [tags(array, list of tags)], not multiple [tags] observations whose values are strings.

  3. The name warning is also wrong — the observation [name] PicoTest1 is present in the data note but the validator says it's missing. This may be a separate issue with how observations are matched to required fields when other warnings precede them.
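The status and tags warnings are what a validator would produce if it compared raw schema keys against observation categories. The sketch below is a guess at that logic, not Basic Memory's actual code; `missing_fields` and both inputs are hypothetical names chosen for illustration.

```python
# Hypothetical reconstruction of literal key matching; not Basic Memory's code.
schema_keys = [
    "name",
    "status(enum, current state)",
    "tags(array, list of tags)",
]

# Categories of the observations present in the PicoTest1 note.
observation_categories = {"name", "status", "tags"}


def missing_fields(keys, categories):
    """Return schema keys that have no observation with exactly the same name."""
    return [key for key in keys if key not in categories]


missing = missing_fields(schema_keys, observation_categories)
print(missing)
# ['status(enum, current state)', 'tags(array, list of tags)']
```

Literal comparison accounts for two of the three warnings. It does not account for the name warning, which fits the suspicion in point 3 that a separate matching problem is involved.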

What appears to actually work

The parser does honor:

  • ? for optional fields (field?: type)
  • Capitalized type names for relations (works_at: Organization matches a works_at [[...]] relation)
  • Plain types (string, integer, Note)

That's about it. The full picoschema feature set described in the docs is aspirational rather than implemented.

Suggested resolution

A few possible directions:

  1. Implement picoschema for real. Pre-process the YAML keys, extract the (enum)/(array)/(object) modifiers and descriptions, and translate them into an internal representation before validation. This is what the docs promise and what the picoschema spec defines.

  2. Strip modifiers and warn. If the parser can't implement them, at least strip the (enum, ...) portion off the YAML key during parsing so the field is recognized as status, and emit a warning that enum/array constraints aren't enforced. This avoids the bizarre "Missing required field: status(enum, current state)" warning.

  3. Update the docs to match what works. Document only the subset that actually works (field: type, field?: type, field: CapitalizedRelation) and remove the picoschema-modifier examples until they're implemented. This is the lowest-effort fix and is the path the linked docs issue takes.

Option 1 is the most user-respecting. Option 3 is the fastest.
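The key pre-processing that options 1 and 2 both need could look roughly like the sketch below. It assumes keys have the shape name, optional `?`, optional `(modifier, description)`. `FieldKey`, `KEY_PATTERN`, and `parse_field_key` are hypothetical names, not part of Basic Memory, and the `field: type, description` form (description in the YAML value) is not handled here.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches: name, optional "?", optional "(modifier[, description])".
# Any lowercase word is accepted as a modifier; a real parser would
# probably restrict this to the modifiers the spec defines.
KEY_PATTERN = re.compile(
    r"^(?P<name>[A-Za-z_][A-Za-z0-9_]*)"
    r"(?P<optional>\?)?"
    r"(?:\((?P<modifier>[a-z]+)(?:\s*,\s*(?P<description>[^)]*))?\))?$"
)


@dataclass
class FieldKey:
    name: str
    optional: bool
    modifier: Optional[str]       # e.g. "enum", "array", "object", or None
    description: Optional[str]


def parse_field_key(raw_key: str) -> FieldKey:
    """Split a picoschema-style YAML key into its parts."""
    match = KEY_PATTERN.match(raw_key.strip())
    if match is None:
        raise ValueError(f"Unrecognized schema key: {raw_key!r}")
    description = match.group("description")
    return FieldKey(
        name=match.group("name"),
        optional=match.group("optional") is not None,
        modifier=match.group("modifier"),
        description=description.strip() if description else None,
    )


status = parse_field_key("status(enum, current state)")
print(status.name, status.modifier, status.description)
# status enum current state
```

For option 2, the parsed `name` would be what gets matched against observations, and a non-empty `modifier` would be the trigger for the "constraint not enforced" warning. For option 1, the modifier would additionally select how the YAML value (the list of enum values, or the element type) is interpreted.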

Why this matters

Without enums, the schema can require that status exists but cannot enforce that it's one of [idea, queued, parked, archived, promoted]. That's a significant gap for using schemas as a validation tool — the whole point of strict mode (per the docs) is to enforce constraints, but the constraints can't be expressed.

In the meantime, users have to choose between:

  • A. Write schemas that look picoschema-ish but silently fail to enforce the constraints they appear to declare.
  • B. Write schemas with bare types only and live without enum/array constraints.

A workaround is to validate enums externally (via a script over the frontmatter or observations), but that defeats the purpose of having a built-in validator.
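As a rough illustration of that workaround, the sketch below checks observation lines in a note body against an allowed-values table kept outside the schema. `ALLOWED` and `check_enums` are hypothetical, and the pattern only handles the simple `- [category] value` form shown above; observations with extra trailing text would need a looser pattern.

```python
import re

# Matches observation lines such as "- [status] active".
OBSERVATION = re.compile(r"^- \[(?P<category>[^\]]+)\]\s+(?P<value>.+?)\s*$")

# Hypothetical constraint table maintained outside the schema note.
ALLOWED = {"status": {"active", "inactive"}}


def check_enums(note_text, allowed=ALLOWED):
    """Return one message per observation whose value is not in its allowed set."""
    problems = []
    for line in note_text.splitlines():
        match = OBSERVATION.match(line.strip())
        if match is None:
            continue  # not an observation line
        category, value = match.group("category"), match.group("value")
        if category in allowed and value not in allowed[category]:
            problems.append(
                f"{category}: {value!r} not in {sorted(allowed[category])}"
            )
    return problems


note = """\
## Observations
- [name] PicoTest1
- [status] actve
- [tags] foo
"""
print(check_enums(note))
# ["status: 'actve' not in ['active', 'inactive']"]
```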

Environment

  • basic-memory 0.20.3
  • macOS, Python 3.13
  • Verified 2026-05-05

Metadata

Labels: bug, enhancement
