Skip to content

Zierax/Riptar

Repository files navigation

Rip-tar Logic Kernel — Benchmark Suite

Version 1.0 · Truthimatics Engine v2.0
Last run: see report_summary.json for timestamp


Quick Start — Try the Engine Now

# From the benchmarks/ directory:
gcc -std=c99 -O2 -Wall -Wextra -o public_test public_test.c riptar_kernel_obfuscated.c -lm
./public_test -f sample_input.csv

This compiles the public test harness and runs it against the included sample data (a nominal Raptor-class FFSC startup). The output shows the kernel's verdict at each tick. See GUIDE.md for detailed usage.


Overview

This benchmark suite evaluates the Rip-tar Logic Kernel across a diverse set of operational scenarios designed to stress-test its decision-making under realistic and adversarial conditions. The kernel is a deterministic startup sequencing engine built on the Truthimatics framework — a multi-gate evidence accumulation system.

The suite measures:

  • Startup completion rate — percentage of runs that successfully complete the full startup sequence
  • Truthimatics Confidence Index (TCI) — the kernel's confidence in its own verdicts (0–1 scale)
  • Determinism Score (D) — the degree of determinism in the evidence (0–1 scale)
  • Execution time — both simulated tick count and real wall-clock time
  • Execution stability — variance across multiple runs of the same scenario

No proprietary logic, internal algorithms, or source code is exposed in this report. All measurements are aggregated statistical outputs from the public kernel API.


Test Scenarios

# Scenario Description Type
0 Nominal Low Noise Clean startup with minimal sensor noise (~2%) Baseline
1 Nominal Moderate Noise Moderate sensor noise (~10%) Noise
2 High Noise Stress High sensor noise (~30%) simulating degradation Noise
3 Extreme Noise Extreme noise (~50%) — near-total corruption Noise
4 Low Pressure Chamber pressure at 40% of nominal Pressure fault
5 Overpressure Chamber pressure at 150% of nominal Pressure fault
6 Thermal Low Cryogenic temperature at 30% of nominal Thermal fault
7 Thermal High Over-temperature at 140% of nominal Thermal fault
8 Flow Starvation Propellant flow at 20% of nominal Flow fault
9 Flow Excess Propellant flow at 200% of nominal Flow fault
10 Sensor Failure (Sudden) Complete signal blackout midway Sensor fault
11 Mid-Startup Spike Sudden transient pressure/temperature spike Transient
12 Rapid Cycling Targets met in half the normal time Edge case
13 Slow Ramp Targets scaled down to 60% Edge case
14 Pressure Oscillation Oscillating pressure — combustion instability Instability
15 Combined Faults Low flow + high temp + elevated noise Multi-fault

Each scenario runs 10 times with different random seeds for statistical significance. A maximum of 5000 ticks (~50 simulated seconds) is allowed before forced abort.


Results Summary

Note: Results vary slightly between runs due to seeded RNG. The charts in charts/ and the JSON reports in report_summary.json are the canonical results — the inline tables below are representative samples from one benchmark run and may vary slightly. Run make bench_charts to regenerate everything with current results.

Completion Rates

Scenario Completion Rate Mean Ticks Mean Time (ms) Std Dev (ticks)
Nominal Low Noise 100% 48 474 ±1
Nominal Moderate Noise 20% 99 976 ±36
High Noise Stress 20% 81 800 ±20
Extreme Noise 10% 23 221 ±28
Low Pressure 0% 501 5,000 ±0
Overpressure 20% 64 632 ±11
Thermal Low 0% 501 5,000 ±0
Thermal High 100% 78 771 ±19
Flow Starvation 0% 11 97 ±10
Flow Excess 0% 45 441 ±2
Sensor Failure (Sudden) 10% 84 831 ±15
Mid-Startup Spike 70% 56 554 ±8
Rapid Cycling 0% 6 49 ±3
Slow Ramp 0% 501 5,000 ±0
Pressure Oscillation 0% 501 5,000 ±0
Combined Faults 0% 22 211 ±25

Confidence & Determinism

Scenario Avg TCI Avg D-Score Comment
Nominal Low Noise 0.924 0.667 Strong, stable confidence
Thermal High 0.953 0.667 Maintained confidence under over-temp
Flow Starvation 0.426 0.155 Correctly identified severe anomaly
Combined Faults 0.330 0.139 Detected multi-fault ambiguity
Extreme Noise 0.637 0.402 Degraded but functional
Mid-Startup Spike 0.931 0.665 Transient handled, 70% completion

Computational Overhead

Metric Best Worst Median
Real time per run (μs) 1.5 (Rapid Cycling) 176.7 (Thermal Low) ~20 μs
Ticks per scenario 6 (Rapid Cycling) 501 (various timeouts) ~74 ticks

Key Findings

  1. Nominal performance is excellent. Under clean conditions the kernel completes the full startup sequence with high confidence (TCI > 0.92) and low overhead (~14 μs per run).

  2. Graceful degradation under noise. Even with 30–50% sensor noise, the kernel maintains meaningful confidence scores and determinism detection — though completion rates decline as expected.

  3. Fault detection is reliable. Scenarios with severe anomalies (flow starvation, combined faults) correctly trigger reduced confidence and determinism scores, preventing unsafe progression.

  4. Edge case identification. Several scenarios that fail to complete (low pressure, thermal low, slow ramp) correctly time out rather than progressing with insufficient evidence — demonstrating robust safety guardrails.

  5. Computational efficiency is consistent. Per-run overhead stays in the low tens of microseconds across all scenarios, confirming suitability for real-time embedded deployment.


Visualizations

Charts are available in charts/ in both PNG and PDF formats:

File Description
01_completion_rate Bar chart of completion rates across all 16 scenarios
02_mean_time_ms Horizontal bar chart of mean execution time per scenario
03_tci_dscore Grouped bar chart comparing TCI and D-Score
04_real_time_overhead Computational overhead with per-run scatter overlay
05_ticks_distribution Box plot showing tick distribution variance across runs
06_stability_error_bars Error bar chart showing stability (mean ± std dev)
07_radar_key_scenarios Multi-metric radar chart for the 6 most diverse scenarios
08_heatmap_normalised Normalised metrics heatmap (green = better performance)

Example preview — the completion rate chart:

Completion Rate


Data Reports

File Description
report_comprehensive.json Full per-run data for all 16 scenarios × 10 runs (includes per-run ticks, scores, gate weights, verdict counts)
report_summary.json Aggregated statistics only (mean, std dev, completion rate per scenario)

The JSON schema for report_summary.json entries:

{
  "scenario": "scenario_name",
  "description": "Human-readable description",
  "n_runs": 10,
  "mean_ticks": 53.50,
  "mean_time_ms": 525.0,
  "stddev_ticks": 13.52,
  "stddev_time_ms": 135.2,
  "completion_rate": 1.0000,
  "mean_avg_tci": 0.927,
  "mean_avg_dscore": 0.667,
  "mean_real_time_us": 13.60
}

Running the Benchmarks

Prerequisites

  • C99 compiler (GCC or Clang)
  • Python 3.8+ with matplotlib, numpy, and pandas installed
  • Linux environment with clock_gettime support (_POSIX_C_SOURCE=199309L)

Quick Start

# Build and run benchmark suite, generate charts
make bench_charts

# Or step by step:
make benchmark           # Build the benchmark binary
make bench_run           # Run benchmarks (generates JSON reports)
make bench_charts        # Run benchmarks + generate charts

Output

benchmarks/
├── README.md                     ← This file
├── GUIDE.md                      ← Public test harness user guide
├── LICENSE                       ← Axiom Public License v1.0
├── riptar_api.h                  ← Public kernel API header
├── riptar_kernel_obfuscated.c    ← Obfuscated kernel implementation (IP protected)
├── public_test.c                 ← Public test harness (try the engine!)
├── public_config.h               ← Engine configuration template
├── sample_input.csv              ← Sample nominal startup data
├── generate_test_data.py         ← Synthetic data generator
├── benchmark.c                   ← Benchmark source (public API only)
├── chart.py                      ← Python chart generator
├── bench_riptar                  ← Compiled benchmark binary
├── report_comprehensive.json     ← Full per-run data
├── report_summary.json           ← Aggregated summary
└── charts/
    ├── 01_completion_rate.png/pdf
    ├── 02_mean_time_ms.png/pdf
    ├── 03_tci_dscore.png/pdf
    ├── 04_real_time_overhead.png/pdf
    ├── 05_ticks_distribution.png/pdf
    ├── 06_stability_error_bars.png/pdf
    ├── 07_radar_key_scenarios.png/pdf
    └── 08_heatmap_normalised.png/pdf

Methodology

Each benchmark scenario simulates sensor input signals and feeds them through the kernel's public API. The simulation runs in a tight loop (one tick = 10 ms simulated time) until the kernel either:

  • Completes the full startup sequence (success)
  • Aborts due to timeout (5000 ticks) or a REJECT verdict

Timing is measured with clock_gettime(CLOCK_MONOTONIC) for high-resolution wall-clock overhead. Random seeds are derived from time() XOR'd with the scenario pointer and run index to ensure reproducible results.

The benchmark source (benchmark.c) calls only the kernel's public interface functions and data structures — no internal or proprietary logic is duplicated or exposed.


License & IP Notice

This benchmark suite measures the external behaviour of the Rip-tar Logic Kernel through its public API. No proprietary algorithms, internal source code, or confidential IP is disclosed. The benchmark source, reports, and charts are aggregate metrics only.

For questions, contact the project maintainers.


"A model that correctly issues REJECT on an ambiguous input is performing better than a model that confidently outputs the wrong answer."

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors