MettleForgeLab/DataDiddler-RUNTIME

DataDiddler Runtime

Deterministic execution substrate for structuring text into canonical artifacts.


What this is

This repository contains the DataDiddler runtime surface.

It is responsible for:

  • executing stage pipelines
  • enforcing schema boundaries
  • producing canonical structured artifacts (GroundedDatasetBlock)

This is not the full system.

It is the execution layer only.


System Structure

core/       → contract surface (schema + artifact definition)
lenses/     → execution primitives (stage behavior)
pipeline/   → orchestration (stage wiring + execution order)
integrity/  → trust enforcement (evidence + contradiction handling)
query/      → inspection layer (render + index)
tooling/    → build + toolchain support
publish/    → placeholder (non-operational)

Core Distinctions

lenses define behavior
pipeline defines sequence
core defines structure

Input Contract

DataDiddler does not accept arbitrary data.

Input must be:

  • plain text or Markdown
  • structurally recoverable (clear sections, paragraphs, or records)
  • free of presentation-layer noise

Unsupported inputs:

  • raw HTML
  • layout-heavy documents without preprocessing

Upstream normalization is required for:

  • scraped web content
  • HTML documents
  • domain-specific formats

DataDiddler does not clean data.

It assumes the data has already been shaped into something worth structuring.
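The contract above can be checked before anything reaches the pipeline. The following is a minimal sketch of such an upstream guard, not part of this repository: the names check_input_contract and InputContractError are hypothetical, and the HTML-tag heuristic is an assumption about what counts as presentation-layer noise.

```python
import re

# Assumption: anything that looks like an HTML tag is presentation-layer noise.
# Comparisons such as "a < b" and Markdown autolinks do not match this pattern.
_HTML_TAG = re.compile(r"</?[a-zA-Z][a-zA-Z0-9]*(\s[^<>]*)?/?>")


class InputContractError(ValueError):
    """Raised when text does not meet the (assumed) input contract."""


def check_input_contract(text: str) -> str:
    """Return the text unchanged if it passes, otherwise raise.

    Hypothetical guard: it rejects input, it never cleans it.
    """
    if _HTML_TAG.search(text):
        raise InputContractError(
            "input contains HTML markup; normalize it upstream first"
        )
    return text
```

The guard only rejects; it does not strip tags, in keeping with the rule that DataDiddler does not clean data.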


Execution Model

Pipeline:

Rake → Separator → Tagger → Packager

Each stage:

  • consumes explicit inputs
  • produces explicit outputs
  • runs deterministically
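The stage chain can be pictured as plain function composition. This is an illustrative sketch only: the stage names come from the pipeline above, but the README does not describe what each stage does internally, so every function body here is a placeholder assumption, as are the record fields "text" and "tag".

```python
from typing import Any, Callable, Dict, List

Stage = Callable[[Any], Any]


def rake(text: str) -> List[str]:
    # Placeholder behavior: gather non-empty lines from the input text.
    return [line.strip() for line in text.splitlines() if line.strip()]


def separator(lines: List[str]) -> List[Dict[str, str]]:
    # Placeholder behavior: turn each line into its own record.
    return [{"text": line} for line in lines]


def tagger(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    # Placeholder behavior: a purely structural tag, no semantics.
    return [
        {**r, "tag": "heading" if r["text"].startswith("#") else "body"}
        for r in records
    ]


def packager(records: List[Dict[str, str]]) -> Dict[str, Any]:
    # Placeholder behavior: wrap the records in a single artifact dict.
    return {"records": records}


# Fixed order: Rake -> Separator -> Tagger -> Packager.
PIPELINE: List[Stage] = [rake, separator, tagger, packager]


def run_pipeline(text: str) -> Dict[str, Any]:
    """Feed each stage's explicit output into the next stage."""
    value: Any = text
    for stage in PIPELINE:
        value = stage(value)
    return value
```

Because each stage is a pure function of its input, running the sketch twice on the same text yields equal results, which is the property "runs deterministically" asks for.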

How to Run

From the repository root:

./run_datadiddler.ps1

This runs the pipeline on the configured input directory and writes a GroundedDatasetBlock artifact. The script is PowerShell, so a PowerShell shell is required.


Outputs

Primary artifact:

GroundedDatasetBlock.vN.json

Properties:

  • schema-validated
  • structurally complete
  • content may be empty but shape is enforced
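To illustrate "content may be empty but shape is enforced", here is a minimal sketch of a shape check. The field names (schema_version, records, evidence) are hypothetical and are not taken from the actual GroundedDatasetBlock schema, which lives in core/.

```python
# Hypothetical shape: required keys and their expected container types.
REQUIRED_SHAPE = {"schema_version": str, "records": list, "evidence": list}


def validate_shape(artifact: dict) -> None:
    """Raise if a required key is missing or has the wrong type.

    Empty lists pass: the shape is checked, not the amount of content.
    """
    missing = [key for key in REQUIRED_SHAPE if key not in artifact]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    for key, expected in REQUIRED_SHAPE.items():
        if not isinstance(artifact[key], expected):
            raise TypeError(f"{key!r} must be of type {expected.__name__}")
```

Under these assumptions an artifact with empty records and evidence lists is valid, while one that omits either key is rejected.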

Guarantees

  • deterministic execution
  • fail-closed behavior (no partial success)
  • explicit artifact production
  • schema enforcement
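One common way to get fail-closed behavior is to build the whole artifact in memory and publish it with a single atomic rename. The sketch below shows that technique; it is an assumption about how such a guarantee could be met, not a description of this repository's code, and write_artifact_fail_closed and build are hypothetical names.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict


def write_artifact_fail_closed(
    build: Callable[[], Dict[str, Any]], out_path: Path
) -> Path:
    """Write the artifact only if the whole build succeeds."""
    out_path = Path(out_path)
    # Any exception raised by build() propagates before a file is created.
    artifact = build()
    fd, tmp_name = tempfile.mkstemp(dir=out_path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            # sort_keys keeps the serialized bytes stable across runs.
            json.dump(artifact, handle, sort_keys=True, indent=2)
        # Atomic rename: readers see the old state or the full new file.
        os.replace(tmp_name, out_path)
    except BaseException:
        # Remove the temporary file so no partial output is left behind.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return out_path
```

With this pattern a failing stage leaves the output directory untouched, so a consumer never observes a half-written artifact.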

Non-Guarantees

This repository does not provide:

  • semantic correctness
  • domain-specific tagging
  • ingestion or normalization
  • governance or verification
  • external trust systems

publish/

The publish/ directory is reserved for future functionality.

It does not currently participate in execution.


Example

A minimal example is provided in:

examples/sample_input/
examples/sample_output/

This demonstrates the expected input shape and resulting output structure.


What this is not

This is not:

  • the full DataDiddler system
  • the ingestion layer
  • the governance layer
  • a domain-specific processor

It is the execution substrate only.


Two lines, held steady

Structure is enforced.
Meaning must conform to it.
