Version 1.0 · Truthimatics Engine v2.0
Last run: seereport_summary.jsonfor timestamp
# From the benchmarks/ directory:
gcc -std=c99 -O2 -Wall -Wextra -o public_test public_test.c riptar_kernel_obfuscated.c -lm
./public_test -f sample_input.csvThis compiles the public test harness and runs it against the included sample data (a nominal Raptor-class FFSC startup). The output shows the kernel's verdict at each tick. See GUIDE.md for detailed usage.
This benchmark suite evaluates the Rip-tar Logic Kernel across a diverse set of operational scenarios designed to stress-test its decision-making under realistic and adversarial conditions. The kernel is a deterministic startup sequencing engine built on the Truthimatics framework — a multi-gate evidence accumulation system.
The suite measures:
- Startup completion rate — percentage of runs that successfully complete the full startup sequence
- Truthimatics Confidence Index (TCI) — the kernel's confidence in its own verdicts (0–1 scale)
- Determinism Score (D) — the degree of determinism in the evidence (0–1 scale)
- Execution time — both simulated tick count and real wall-clock time
- Execution stability — variance across multiple runs of the same scenario
No proprietary logic, internal algorithms, or source code is exposed in this report. All measurements are aggregated statistical outputs from the public kernel API.
| # | Scenario | Description | Type |
|---|---|---|---|
| 0 | Nominal Low Noise | Clean startup with minimal sensor noise (~2%) | Baseline |
| 1 | Nominal Moderate Noise | Moderate sensor noise (~10%) | Noise |
| 2 | High Noise Stress | High sensor noise (~30%) simulating degradation | Noise |
| 3 | Extreme Noise | Extreme noise (~50%) — near-total corruption | Noise |
| 4 | Low Pressure | Chamber pressure at 40% of nominal | Pressure fault |
| 5 | Overpressure | Chamber pressure at 150% of nominal | Pressure fault |
| 6 | Thermal Low | Cryogenic temperature at 30% of nominal | Thermal fault |
| 7 | Thermal High | Over-temperature at 140% of nominal | Thermal fault |
| 8 | Flow Starvation | Propellant flow at 20% of nominal | Flow fault |
| 9 | Flow Excess | Propellant flow at 200% of nominal | Flow fault |
| 10 | Sensor Failure (Sudden) | Complete signal blackout midway | Sensor fault |
| 11 | Mid-Startup Spike | Sudden transient pressure/temperature spike | Transient |
| 12 | Rapid Cycling | Targets met in half the normal time | Edge case |
| 13 | Slow Ramp | Targets scaled down to 60% | Edge case |
| 14 | Pressure Oscillation | Oscillating pressure — combustion instability | Instability |
| 15 | Combined Faults | Low flow + high temp + elevated noise | Multi-fault |
Each scenario runs 10 times with different random seeds for statistical significance. A maximum of 5000 ticks (~50 simulated seconds) is allowed before forced abort.
Note: Results vary slightly between runs due to seeded RNG. The charts in
charts/and the JSON reports inreport_summary.jsonare the canonical results — the inline tables below are representative samples from one benchmark run and may vary slightly. Runmake bench_chartsto regenerate everything with current results.
| Scenario | Completion Rate | Mean Ticks | Mean Time (ms) | Std Dev (ticks) |
|---|---|---|---|---|
| Nominal Low Noise | 100% | 48 | 474 | ±1 |
| Nominal Moderate Noise | 20% | 99 | 976 | ±36 |
| High Noise Stress | 20% | 81 | 800 | ±20 |
| Extreme Noise | 10% | 23 | 221 | ±28 |
| Low Pressure | 0% | 501 | 5,000 | ±0 |
| Overpressure | 20% | 64 | 632 | ±11 |
| Thermal Low | 0% | 501 | 5,000 | ±0 |
| Thermal High | 100% | 78 | 771 | ±19 |
| Flow Starvation | 0% | 11 | 97 | ±10 |
| Flow Excess | 0% | 45 | 441 | ±2 |
| Sensor Failure (Sudden) | 10% | 84 | 831 | ±15 |
| Mid-Startup Spike | 70% | 56 | 554 | ±8 |
| Rapid Cycling | 0% | 6 | 49 | ±3 |
| Slow Ramp | 0% | 501 | 5,000 | ±0 |
| Pressure Oscillation | 0% | 501 | 5,000 | ±0 |
| Combined Faults | 0% | 22 | 211 | ±25 |
| Scenario | Avg TCI | Avg D-Score | Comment |
|---|---|---|---|
| Nominal Low Noise | 0.924 | 0.667 | Strong, stable confidence |
| Thermal High | 0.953 | 0.667 | Maintained confidence under over-temp |
| Flow Starvation | 0.426 | 0.155 | Correctly identified severe anomaly |
| Combined Faults | 0.330 | 0.139 | Detected multi-fault ambiguity |
| Extreme Noise | 0.637 | 0.402 | Degraded but functional |
| Mid-Startup Spike | 0.931 | 0.665 | Transient handled, 70% completion |
| Metric | Best | Worst | Median |
|---|---|---|---|
| Real time per run (μs) | 1.5 (Rapid Cycling) | 176.7 (Thermal Low) | ~20 μs |
| Ticks per scenario | 6 (Rapid Cycling) | 501 (various timeouts) | ~74 ticks |
-
Nominal performance is excellent. Under clean conditions the kernel completes the full startup sequence with high confidence (TCI > 0.92) and low overhead (~14 μs per run).
-
Graceful degradation under noise. Even with 30–50% sensor noise, the kernel maintains meaningful confidence scores and determinism detection — though completion rates decline as expected.
-
Fault detection is reliable. Scenarios with severe anomalies (flow starvation, combined faults) correctly trigger reduced confidence and determinism scores, preventing unsafe progression.
-
Edge case identification. Several scenarios that fail to complete (low pressure, thermal low, slow ramp) correctly time out rather than progressing with insufficient evidence — demonstrating robust safety guardrails.
-
Computational efficiency is consistent. Per-run overhead stays in the low tens of microseconds across all scenarios, confirming suitability for real-time embedded deployment.
Charts are available in charts/ in both PNG and PDF formats:
| File | Description |
|---|---|
01_completion_rate |
Bar chart of completion rates across all 16 scenarios |
02_mean_time_ms |
Horizontal bar chart of mean execution time per scenario |
03_tci_dscore |
Grouped bar chart comparing TCI and D-Score |
04_real_time_overhead |
Computational overhead with per-run scatter overlay |
05_ticks_distribution |
Box plot showing tick distribution variance across runs |
06_stability_error_bars |
Error bar chart showing stability (mean ± std dev) |
07_radar_key_scenarios |
Multi-metric radar chart for the 6 most diverse scenarios |
08_heatmap_normalised |
Normalised metrics heatmap (green = better performance) |
Example preview — the completion rate chart:
| File | Description |
|---|---|
report_comprehensive.json |
Full per-run data for all 16 scenarios × 10 runs (includes per-run ticks, scores, gate weights, verdict counts) |
report_summary.json |
Aggregated statistics only (mean, std dev, completion rate per scenario) |
The JSON schema for report_summary.json entries:
{
"scenario": "scenario_name",
"description": "Human-readable description",
"n_runs": 10,
"mean_ticks": 53.50,
"mean_time_ms": 525.0,
"stddev_ticks": 13.52,
"stddev_time_ms": 135.2,
"completion_rate": 1.0000,
"mean_avg_tci": 0.927,
"mean_avg_dscore": 0.667,
"mean_real_time_us": 13.60
}- C99 compiler (GCC or Clang)
- Python 3.8+ with
matplotlib,numpy, andpandasinstalled - Linux environment with
clock_gettimesupport (_POSIX_C_SOURCE=199309L)
# Build and run benchmark suite, generate charts
make bench_charts
# Or step by step:
make benchmark # Build the benchmark binary
make bench_run # Run benchmarks (generates JSON reports)
make bench_charts # Run benchmarks + generate chartsbenchmarks/
├── README.md ← This file
├── GUIDE.md ← Public test harness user guide
├── LICENSE ← Axiom Public License v1.0
├── riptar_api.h ← Public kernel API header
├── riptar_kernel_obfuscated.c ← Obfuscated kernel implementation (IP protected)
├── public_test.c ← Public test harness (try the engine!)
├── public_config.h ← Engine configuration template
├── sample_input.csv ← Sample nominal startup data
├── generate_test_data.py ← Synthetic data generator
├── benchmark.c ← Benchmark source (public API only)
├── chart.py ← Python chart generator
├── bench_riptar ← Compiled benchmark binary
├── report_comprehensive.json ← Full per-run data
├── report_summary.json ← Aggregated summary
└── charts/
├── 01_completion_rate.png/pdf
├── 02_mean_time_ms.png/pdf
├── 03_tci_dscore.png/pdf
├── 04_real_time_overhead.png/pdf
├── 05_ticks_distribution.png/pdf
├── 06_stability_error_bars.png/pdf
├── 07_radar_key_scenarios.png/pdf
└── 08_heatmap_normalised.png/pdf
Each benchmark scenario simulates sensor input signals and feeds them through the kernel's public API. The simulation runs in a tight loop (one tick = 10 ms simulated time) until the kernel either:
- Completes the full startup sequence (success)
- Aborts due to timeout (5000 ticks) or a REJECT verdict
Timing is measured with clock_gettime(CLOCK_MONOTONIC) for high-resolution wall-clock overhead. Random seeds are derived from time() XOR'd with the scenario pointer and run index to ensure reproducible results.
The benchmark source (benchmark.c) calls only the kernel's public interface functions and data structures — no internal or proprietary logic is duplicated or exposed.
This benchmark suite measures the external behaviour of the Rip-tar Logic Kernel through its public API. No proprietary algorithms, internal source code, or confidential IP is disclosed. The benchmark source, reports, and charts are aggregate metrics only.
For questions, contact the project maintainers.
"A model that correctly issues REJECT on an ambiguous input is performing better than a model that confidently outputs the wrong answer."
