
docs(research): define learned-model artifact contracts#439

Merged
Whiteks1 merged 1 commit into main from codex/issue-438-learned-model-contracts
Apr 21, 2026

Conversation

Whiteks1 (Owner) commented Apr 21, 2026

Summary

  • add the N.0 learned-model artifact contract for dataset, feature, model config, and training summary artifacts
  • update the roadmap with the Neural Research Track while keeping D.2 / paper / broker safety as the main maturity path
  • replace the old web3 app public framing with a more professional execution-venue positioning
  • link the learned-model contract from README and the run artifact contract

Scope

Docs-only implementation for #438.

Out of scope:

  • no training loops
  • no PyTorch / TensorFlow / sklearn integration
  • no model registry or serving
  • no paper promotion for learned models
  • no execution, broker, paper, Stepbit, or Quant Pulse implementation changes
  • no runtime dependencies

Validation

  • git diff --cached --check

Closes #438

Summary by Sourcery

Define a proposed learned-model artifact contract and integrate a parallel neural research track into the roadmap and docs without changing runtime behavior.

Documentation:

  • Add a learned-model artifact contract spec describing required model_run artifacts, fields, and evaluation discipline for Stage N.0 neural research foundations.
  • Extend the roadmap with a parallel Neural Research Track (Stages N.0–N.6), including non-promotion rules and updated recommended execution order for introducing learned models.
  • Update README and run artifact docs to reference the new learned-model artifact contract and clarify QuantLab’s positioning as a local-first quantitative research and supervised execution system.

sourcery-ai Bot commented Apr 21, 2026

Reviewer's Guide

Defines the initial N.0 learned-model experiment artifact contract and threads it into the docs/roadmap and run-artifact contract, while updating overall product positioning toward a local-first quantitative research and supervised execution system instead of a 'web3 app'.

Class diagram for N.0 learned-model artifact schemas

classDiagram
  class ModelRunRoot {
    +string model_run_id
    +string path
  }

  class DatasetManifest {
    +string schema_version
    +string artifact_type
    +string model_run_id
    +string created_at
    +string dataset_id
    +Source source
    +string[] universe
    +TimeRange time_range
    +int rows
    +Target target
    +Split split
    +string data_hash
    +string generation_command
  }

  class Source {
    +string kind
    +string path
  }

  class TimeRange {
    +string start
    +string end
    +string timezone
  }

  class Target {
    +string name
    +string horizon
    +string definition
  }

  class Split {
    +string method
    +SplitWindow train
    +SplitWindow validation
    +SplitWindow test
  }

  class SplitWindow {
    +string start
    +string end
  }

  class FeatureManifest {
    +string schema_version
    +string artifact_type
    +string model_run_id
    +string dataset_id
    +string feature_set_id
    +Feature[] features
    +Lookback lookback
    +Normalization normalization
    +string[] leakage_guards
    +string feature_hash
    +string generation_command
  }

  class Feature {
    +string name
    +string source
    +map parameters
  }

  class Lookback {
    +int max_bars
    +bool uses_future_data
  }

  class Normalization {
    +string method
    +string fit_scope
  }

  class ModelConfig {
    +string schema_version
    +string artifact_type
    +string model_run_id
    +string model_family
    +string model_name
    +Library library
    +map hyperparameters
    +int random_seed
    +string training_objective
    +InputShape input_shape
    +OutputSpec output
  }

  class Library {
    +string name
    +string version
  }

  class InputShape {
    +int features
    +int window
  }

  class OutputSpec {
    +string kind
    +string target
  }

  class TrainingSummary {
    +string schema_version
    +string artifact_type
    +string model_run_id
    +string status
    +string started_at
    +string finished_at
    +int duration_seconds
    +string dataset_manifest_path
    +string feature_manifest_path
    +string model_config_path
    +Metrics metrics
    +BaselineComparison baseline_comparison
    +Reproducibility reproducibility
    +PromotionAssessment promotion_assessment
  }

  class Metrics {
    +map train
    +map validation
    +map test
    +map downstream_market
  }

  class BaselineComparison {
    +bool required
    +map rule_based_baseline
    +map classical_ml_baseline
    +string result
  }

  class Reproducibility {
    +int random_seed
    +string code_version
    +string data_hash
    +string feature_hash
  }

  class PromotionAssessment {
    +bool eligible_for_paper
    +string[] blocking_reasons
  }

  ModelRunRoot "1" o-- "1" DatasetManifest : contains
  ModelRunRoot "1" o-- "1" FeatureManifest : contains
  ModelRunRoot "1" o-- "1" ModelConfig : contains
  ModelRunRoot "1" o-- "1" TrainingSummary : contains

  DatasetManifest "1" --> "1" Source
  DatasetManifest "1" --> "1" TimeRange
  DatasetManifest "1" --> "1" Target
  DatasetManifest "1" --> "1" Split
  Split "1" --> "1" SplitWindow : train
  Split "1" --> "1" SplitWindow : validation
  Split "1" --> "1" SplitWindow : test

  FeatureManifest "1" --> "*" Feature
  FeatureManifest "1" --> "1" Lookback
  FeatureManifest "1" --> "1" Normalization

  ModelConfig "1" --> "1" Library
  ModelConfig "1" --> "1" InputShape
  ModelConfig "1" --> "1" OutputSpec

  TrainingSummary "1" --> "1" Metrics
  TrainingSummary "1" --> "1" BaselineComparison
  TrainingSummary "1" --> "1" Reproducibility
  TrainingSummary "1" --> "1" PromotionAssessment
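As a concrete illustration of the DatasetManifest schema above, here is a hypothetical payload sketch in Python. The field names come from the class diagram; every value (run id, tickers, dates, row count, hash, command) is illustrative only, not a recommended default:

```python
import json

# Hypothetical dataset_manifest.json payload; field names follow the
# DatasetManifest class diagram, values are illustrative placeholders.
dataset_manifest = {
    "schema_version": "1.0",
    "artifact_type": "dataset_manifest",
    "model_run_id": "mr_2026-04-21_example",
    "created_at": "2026-04-21T13:00:00Z",
    "dataset_id": "ds_equities_daily_example",
    "source": {"kind": "csv", "path": "data/equities_daily.csv"},
    "universe": ["SPY", "QQQ"],
    "time_range": {"start": "2015-01-01", "end": "2025-12-31", "timezone": "UTC"},
    "rows": 2767,
    "target": {
        "name": "fwd_return_5d",
        "horizon": "5 bars",
        "definition": "close[t+5] / close[t] - 1",
    },
    # Temporal split: train, validation, and test windows must not overlap,
    # and must be ordered in time to respect the leakage rules.
    "split": {
        "method": "temporal",
        "train": {"start": "2015-01-01", "end": "2021-12-31"},
        "validation": {"start": "2022-01-01", "end": "2023-12-31"},
        "test": {"start": "2024-01-01", "end": "2025-12-31"},
    },
    "data_hash": "sha256:<placeholder>",
    "generation_command": "python scripts/build_dataset.py --universe SPY,QQQ",
}

manifest_json = json.dumps(dataset_manifest, indent=2)
```

Note the temporal ordering of the split windows: a reader (or a future validator) can check that train ends before validation starts, and validation ends before test starts.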

Flow diagram for N.0 learned-model experiment artifact lifecycle

flowchart TD
  A_define_dataset["Define dataset and temporal splits"]
  B_emit_dataset_manifest["Emit dataset_manifest.json"]
  C_define_features["Define and generate features"]
  D_emit_feature_manifest["Emit feature_manifest.json"]
  E_configure_model["Configure model and hyperparameters"]
  F_emit_model_config["Emit model_config.json"]
  G_train_model["Train model with fixed random_seed"]
  H_emit_training_summary["Emit training_summary.json"]
  I_optional_downstream["Optional downstream backtest producing report.json"]
  J_promotion_blocked["Enforce non-promotion rules at N.0"]

  subgraph ModelRunRootDir["outputs/model_runs/<model_run_id>/"]
    B_emit_dataset_manifest --> D_emit_feature_manifest
    D_emit_feature_manifest --> F_emit_model_config
    F_emit_model_config --> H_emit_training_summary
  end

  A_define_dataset --> B_emit_dataset_manifest
  C_define_features --> D_emit_feature_manifest
  E_configure_model --> F_emit_model_config
  G_train_model --> H_emit_training_summary

  H_emit_training_summary --> I_optional_downstream
  H_emit_training_summary --> J_promotion_blocked

  I_optional_downstream --> J_promotion_blocked
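The lifecycle above can be sketched as a small emission helper. The directory layout (outputs/model_runs/&lt;model_run_id&gt;/) and the four required artifact names come from the contract; the helper functions themselves are hypothetical, not part of the documented spec:

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory

# The four N.0 required artifacts, per the contract.
REQUIRED_ARTIFACTS = [
    "dataset_manifest.json",
    "feature_manifest.json",
    "model_config.json",
    "training_summary.json",
]

def emit_artifact(root: Path, name: str, payload: dict) -> Path:
    """Write one JSON artifact into the model-run root directory."""
    root.mkdir(parents=True, exist_ok=True)
    path = root / name
    path.write_text(json.dumps(payload, indent=2))
    return path

def missing_artifacts(root: Path) -> list:
    """List the N.0 required artifacts not yet present under root."""
    return [name for name in REQUIRED_ARTIFACTS if not (root / name).exists()]

with TemporaryDirectory() as tmp:
    # Mirrors the outputs/model_runs/<model_run_id>/ layout from the flow diagram.
    run_root = Path(tmp) / "outputs" / "model_runs" / "mr_2026_example"
    for name in REQUIRED_ARTIFACTS:
        emit_artifact(run_root, name, {"artifact_type": name.removesuffix(".json")})
    print(missing_artifacts(run_root))  # → []
```

A check like `missing_artifacts` could sit at the end of an experiment script to fail fast when a run directory does not satisfy the N.0 contract.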

File-Level Changes

1. Introduce a formal N.0 learned-model artifact contract and wire it into existing artifact documentation.
  • Add a new docs/learned-model-artifact-contract.md file describing the contract root, required JSON artifacts (dataset_manifest.json, feature_manifest.json, model_config.json, training_summary.json), sample schemas, evaluation discipline, leakage rules, and non-promotion constraints.
  • Extend docs/run-artifact-contract.md with a new 'Learned-Model Experiment Artifacts' section that defines outputs/model_runs/<model_run_id>/, lists the N.0 required artifacts, clarifies their relationship to existing run and paper artifacts, and links to the detailed learned-model contract.
  Files: docs/learned-model-artifact-contract.md, docs/run-artifact-contract.md

2. Extend the roadmap with a Parallel Neural Research Track and integrate learned-model stages into the recommended execution order and guardrails.
  • Add a 'Parallel Neural Research Track' section to docs/roadmap.md that defines Stage N.0 through N.6, including goals, scope, exit conditions, and clear authority rules between QuantLab, Stepbit, and Quant Pulse.
  • Update the 'Recommended Execution Order' to insert N.0–N.5 before deeper orchestration and automation stages, emphasizing evidence discipline before any learned-model promotion.
  • Tighten the 'What Should Not Happen Early' section with explicit prohibitions on premature learned-model promotion, neural claims without baselines, and over-reliance on predictive accuracy.
  Files: docs/roadmap.md

3. Refine external product framing and documentation links to reflect the new learned-model contract and more professional execution-venue positioning.
  • Update the public product framing in docs/roadmap.md and the execution venue strategy note in README.md to describe QuantLab as a local-first quantitative research and supervised execution system that can support modern (including web3) venues without being a crypto/AI marketing shell.
  • Add cross-links to the new learned-model artifact contract from README.md, docs/roadmap.md, and docs/run-artifact-contract.md for discoverability.
  Files: docs/roadmap.md, README.md

Assessment against linked issues

  • #438: Define and document the initial N.0 learned-model artifact contract, including minimum required fields and structure for dataset_manifest.json, feature_manifest.json, model_config.json, and training_summary.json, and how these artifacts relate to existing QuantLab outputs (report.json, outputs/runs/, and future learned-model experiment directories).
  • #438: Define and document evaluation discipline for learned-model research, including temporal split requirements, train/validation/test separation, random seed discipline, dataset and feature-set traceability, target/horizon definition, baseline comparison requirements, leakage-prevention expectations, and minimum metadata needed to reproduce an experiment.
  • #438: Define and document non-promotion rules for learned models (no promotion without baselines, no promotion based only on predictive accuracy, no direct execution intent, no bypassing existing paper/safety/broker/supervised execution gates) while keeping this slice documentary only (no training loops, ML integrations, or runtime changes).
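The non-promotion rules above can be sketched as a small check over training_summary.json fields. The field names (baseline_comparison, metrics.downstream_market) come from the class diagram; the helper itself is a hypothetical illustration, not part of the documented contract:

```python
def promotion_blockers(training_summary: dict) -> list:
    """Collect reasons a learned model must not be promoted at Stage N.0.

    Hypothetical sketch of the documented non-promotion rules: no
    promotion without baselines, and no promotion based only on
    predictive accuracy.
    """
    blockers = []
    comparison = training_summary.get("baseline_comparison", {})
    if not comparison.get("rule_based_baseline"):
        blockers.append("no rule-based baseline comparison")
    if not comparison.get("classical_ml_baseline"):
        blockers.append("no classical-ML baseline comparison")
    metrics = training_summary.get("metrics", {})
    if not metrics.get("downstream_market"):
        blockers.append("only predictive accuracy reported, no downstream market metrics")
    return blockers

# Illustrative summary: has a rule-based baseline, but no classical-ML
# baseline and no downstream market metrics, so two blockers remain.
summary = {
    "baseline_comparison": {"rule_based_baseline": {"sharpe": 0.4}},
    "metrics": {"validation": {"auc": 0.56}},
}
print(promotion_blockers(summary))
```

Under the contract, any non-empty blocker list would map to promotion_assessment.blocking_reasons with eligible_for_paper set to false.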




sourcery-ai Bot left a comment

Hey - I've found 1 issue, and left some high level feedback:

  • The JSON example payloads currently use placeholder values like empty strings and zero counts (e.g., rows: 0, empty data_hash); consider either marking these explicitly as placeholders or providing realistic example values so readers don’t misinterpret them as recommended defaults.
  • The N.0 artifact list and rules are now described in roadmap.md, run-artifact-contract.md, and learned-model-artifact-contract.md; you might reduce duplication by keeping the detailed contract only in the dedicated doc and linking to it from the roadmap and run-artifact contract to avoid divergence over time.


Comment thread docs/roadmap.md
Authority rule:

- QuantLab owns dataset definition, feature definition, model validation, artifact contracts, and promotion criteria
- Stepbit may later orchestrate learned-model workflows, but must not own modeling authority

suggestion (typo): Consider adding an explicit subject after "but" for grammatical completeness and consistency.

For consistency with existing docs (e.g., learned-model-artifact-contract.md) and improved readability, consider: Stepbit may later orchestrate learned-model workflows, but it must not own modeling authority.

Suggested change
- Stepbit may later orchestrate learned-model workflows, but must not own modeling authority
- Stepbit may later orchestrate learned-model workflows, but it must not own modeling authority.

Whiteks1 merged commit 3fcb29e into main Apr 21, 2026
3 checks passed
Whiteks1 deleted the codex/issue-438-learned-model-contracts branch April 21, 2026 13:26


Development

Successfully merging this pull request may close these issues.

research(ml): define learned-model artifact contracts and evaluation discipline
