Autonomous Risk Monitoring Infrastructure for AI Agents On-Chain
Three-layer risk evaluation pipeline that detects and blocks malicious AI agent actions before they execute on-chain — combining on-chain compliance checks, behavioral risk scoring, and multi-AI consensus through Chainlink CRE.
Primary Track: Risk & Compliance (on-chain policy enforcement, behavioral anomaly detection, severity-based incident response, Proof of Reserves)
Also Applying: CRE & AI (8 CRE primitives, 3 trigger types + dual-AI consensus) · Privacy (ConfidentialHTTPClient + TEE) · Tenderly (Virtual TestNet deployment + Simulation API)
| Resource | Details |
|---|---|
| Live Dashboard | `bun run mock-api && bun run dashboard` → http://localhost:3000 |
| Presentation Mode | http://localhost:3000/presentation (10 interactive slides) |
| Tenderly Explorer | Virtual TestNet Transactions |
| Contracts | 0x5F938e4c62991Eb4af3Dd89097978A1f376e6CC8 (Guardian) · 0xFA7deF53FEaC45dB96A5B15C32ca4E6B009b25e6 (Registry) |
| Deployer | 0x23fC03ec91D319e4Aa14e90b6d3664540FDf2446 |
- Files Using Chainlink — every file that integrates a Chainlink service
- The Problem — why AI agents need decentralized risk monitoring
- Architecture — three-layer defense pipeline diagram
- Three-Layer Defense — compliance pre-check, behavioral scoring, multi-AI consensus
- Chainlink Services — Deep Integration — 8 CRE primitives, 3 trigger types + Data Feeds + Automation
- Privacy & Confidential Compute — TEE-backed evaluation via ConfidentialHTTPClient
- Attack Coverage — 14 demo scenarios across 3 phases
- Smart Contracts — SentinelGuardian, PolicyLib, AgentRegistry + 90 tests
- Interactive Risk Monitoring Dashboard — 4-tab dashboard overview
- Tenderly Integration — Deep Usage — Virtual TestNet, Simulation API, monitoring, debugging
- Quick Start — clone, install, run
- Tech Stack
- What Makes SentinelCRE Different — 7 differentiators
Required by hackathon submission rules — every file that integrates a Chainlink service.
| File | Chainlink Services Used |
|---|---|
| `sentinel-workflow/main.ts` | HTTPClient, ConfidentialHTTPClient, EVMClient (callContract, writeReport, filterLogs, headerByNumber, logTrigger), HTTPCapability, CronCapability, ConsensusAggregationByFields, identical, median, json, bigintToProtoBigInt, protoBigIntToBigint, encodeCallMsg, getNetwork, LAST_FINALIZED_BLOCK_NUMBER, Runner, handler: full CRE workflow with 3 trigger types (HTTP + Cron + Log), on-chain read/write/query, dual-AI consensus, and confidential compute |
| `sentinel-workflow/behavioral.ts` | Pure behavioral engine executed inside CRE workflow context: 7 anomaly dimensions scored during the CRE pipeline (no async, no Date.now(), CRE WASM compatible) |
| `sentinel-workflow/workflow.yaml` | CRE workflow configuration: local and staging settings, workflow path, config path, secrets path |
| File | Chainlink Services Used |
|---|---|
| `contracts/src/SentinelGuardian.sol` | Receives CRE workflow verdicts via processVerdict() (WORKFLOW_ROLE), enforces on-chain policy, processes challenge resolutions from CRE re-evaluation. Automation-ready finalizeExpiredChallenge() for Chainlink Automation (checkUpkeep/performUpkeep pattern) |
| `contracts/src/libraries/PolicyLib.sol` | checkReserves() calls Chainlink Data Feeds via IAggregatorV3.latestRoundData() for Proof of Reserves: verifies reserve backing before mints with cumulative tracking, staleness checks, and configurable collateralization ratios |
| `contracts/src/interfaces/IAggregatorV3.sol` | Chainlink AggregatorV3Interface: latestRoundData() + decimals() for Proof of Reserves data feed integration |
| File | Chainlink Services Used |
|---|---|
| `config/sentinel.config.json` | Production CRE config: enableConfidentialCompute: true, AI endpoints, chain selector, contract addresses |
| `config/sentinel.local.config.json` | Local dev CRE config: deterministic evaluation endpoints, enableConfidentialCompute: false fallback |
| `project.yaml` | CRE project settings: RPC endpoints for ethereum-testnet-sepolia |
| File | Chainlink Services Used |
|---|---|
| `contracts/test/ProofOfReserves.t.sol` | 10 tests for Chainlink Data Feeds integration: reserve verification, cumulative drain prevention, feed price updates, collateralization ratios, stale data handling |
| `contracts/test/mocks/MockV3Aggregator.sol` | Mock Chainlink AggregatorV3 for testing Proof of Reserves without a live feed |
| `contracts/test/SentinelGuardian.t.sol` | 45 tests exercising CRE verdict processing, policy enforcement, circuit breakers |
| `contracts/test/Integration.t.sol` | Full lifecycle tests: register → CRE verdict → freeze → challenge → CRE re-evaluation → resolve |
| File | Chainlink Services Used |
|---|---|
| `dashboard/src/lib/contracts.ts` | ABIs for SentinelGuardian + AgentRegistry: reads CRE-written on-chain state (verdicts, policies, incidents) |
| `dashboard/src/app/api/agents/route.ts` | Reads agent data from CRE-managed contracts via Tenderly RPC |
| `dashboard/src/app/api/incidents/route.ts` | Reads incident history written by CRE workflow verdicts |
| `dashboard/src/app/api/evaluate/route.ts` | Forwards proposals to AI evaluation server (mirrors CRE workflow pipeline) |
| `dashboard/src/app/api/challenge/route.ts` | Submits challenge appeals for CRE re-evaluation |
| File | Chainlink Services Used |
|---|---|
| `docs/CONFIDENTIAL-COMPUTE.md` | Deep dive on ConfidentialHTTPClient integration, TEE boundaries, Vault DON secret templates |
| `docs/CRE_INTEGRATION.md` | Full CRE services integration reference (HTTPClient, EVMClient, CronCapability, ConsensusAggregationByFields) |
| `docs/ARCHITECTURE.md` | 3-layer defense architecture with CRE pipeline diagrams |
| `docs/SECURITY_MODEL.md` | Fail-safe design principles, severity classification, CRE-powered challenge resolution |
| `docs/INTEGRATION-GUIDE.md` | Step-by-step onboarding: contract deployment, agent registration, policy configuration, behavioral learning, monitoring |
| `docs/CHALLENGES.md` | Engineering challenges: CRE SDK early adoption patterns, behavioral scoring math, CC transparency design, Next.js build issues |
AI agents are executing real on-chain actions today — DeFi swaps, token mints, contract upgrades. When these agents are compromised, there is no decentralized risk monitoring layer to stop them.
This isn't theoretical — $3.4B+ stolen across these exploits alone:
| Exploit | Loss | What SentinelCRE Would Have Caught |
|---|---|---|
| Ronin Bridge (2022) | $625M | Unauthorized transfers — value limit + contract whitelist + AI pattern detection |
| Poly Network (2021) | $611M | Cross-chain exploit — function blocklist + behavioral anomaly (contract diversity) |
| Wormhole (2022) | $320M | Unauthorized minting — mint cap + Proof of Reserves check |
| Euler Finance (2023) | $197M | Flash loan attack — value limit + velocity detection + AI consensus |
| Nomad Bridge (2022) | $190M | Replay exploit — rate limiting + cumulative volume tracking |
| Beanstalk (2022) | $182M | Governance manipulation — function blocklist + behavioral scoring |
| Mango Markets (2022) | $114M | Oracle manipulation — value limit + sequential probing detection |
2025 — The problem is accelerating:
| Incident | Loss | What SentinelCRE Would Have Caught |
|---|---|---|
| Bybit Hack (Feb 2025) | $1.5B | Largest single crypto theft — value limit + behavioral anomaly (unprecedented withdrawal pattern) + AI consensus |
| Moonwell Exploit (Feb 2025) | $1.78M | AI-generated oracle bug — target whitelist + value limit + dual-AI recognizes oracle manipulation pattern |
| AIXBT Hack (Mar 2025) | $106K | Dashboard compromise at 2 AM — time-of-day anomaly +10, behavioral scoring catches off-hours drain pattern |
| Anthropic Research (2025) | $1.22/exploit | AI agents autonomously exploit 50%+ of historically attacked contracts — sequential probing +35 catches binary-search pattern, Confidential Compute hides thresholds |
Current solutions are reactive incident response. Kill switches fire after the damage is done. Monitoring dashboards show you the attack in progress. SentinelCRE is proactive risk prevention — every action is evaluated through three independent defense layers before it touches the chain.
```mermaid
flowchart TB
    A["AI Agent Proposes Action"] --> B["CRE HTTP Trigger"]
    subgraph CRE["Chainlink CRE Workflow"]
        B --> L1
        subgraph L1["Layer 1: Compliance Pre-Check"]
            P1["Value Limits"] --> P2["Contract Whitelist"]
            P2 --> P3["Function Blocklist"]
            P3 --> P4["Rate Limits"]
            P4 --> P5["Mint Caps"]
            P5 --> P6["Proof of Reserves"]
        end
        L1 -->|"Pass"| L2
        subgraph L2["Layer 2: Behavioral Risk Scoring"]
            B1["7 Anomaly Dimensions"]
            B2["Origin Baseline Comparison"]
            B3["Risk Score 0-155"]
        end
        L2 --> L3
        subgraph L3["Layer 3: Multi-AI Consensus"]
            subgraph TEE["ConfidentialHTTPClient (TEE Enclave)"]
                M1["Claude Evaluation"]
                M2["Secondary Model"]
            end
            M3["Both Must APPROVE"]
        end
    end
    L3 -->|"APPROVED"| D["On-Chain: Action Forwarded"]
    L3 -->|"DENIED"| E["On-Chain: Agent Frozen + Incident Logged"]
    L1 -->|"Fail"| E
    style L1 fill:#10b981,color:#fff
    style L2 fill:#f59e0b,color:#fff
    style L3 fill:#6366f1,color:#fff
    style E fill:#ef4444,color:#fff
```
No single point of failure. Even if both AI models are compromised, Layer 1 catches policy violations. Even if an action passes policy, Layer 2 catches behavioral anomalies. All three layers must agree before any action executes.
For detailed architecture diagrams, CRE pipeline breakdown, and defense layer analysis, see docs/ARCHITECTURE.md.
Seven hard-coded risk controls in PolicyLib.sol that no AI can override:
| # | Check | What It Prevents |
|---|---|---|
| 1 | Transaction value limits | Single large unauthorized transfers |
| 2 | Contract address whitelist | Interaction with unknown/malicious contracts |
| 3 | Function signature blocklist | Dangerous operations (upgradeTo, selfdestruct) |
| 4 | Rate limiting (per-window) | Burst attacks and rapid-fire exploits |
| 5 | Daily volume caps | Gradual drain across many small transactions |
| 6 | Mint amount caps | Infinite mint attacks |
| 7 | Proof of Reserves (Data Feed) | Ensures reserve backing before mints — prevents cumulative depletion |
These run on-chain inside SentinelGuardian.processVerdict(). If any check fails, the action is denied immediately — the AI models never even see it.
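The ordering of these checks can be illustrated with a minimal off-chain sketch. This is not the code in PolicyLib.sol: the type names, field names, and ETH-denominated `number` values are assumptions chosen for readability, and the Proof of Reserves check is left out because it needs a live feed read.

```typescript
// Hypothetical off-chain mirror of the Layer 1 checks (illustrative names only).
interface Policy {
  maxValueEth: number;            // per-transaction value limit
  allowedTargets: Set<string>;    // lowercase contract addresses
  blockedSelectors: Set<string>;  // lowercase 4-byte selectors
  maxActionsPerWindow: number;    // rate limit
  maxDailyVolumeEth: number;      // daily volume cap
  maxMintAmount: number;          // mint cap
}

interface Proposal {
  target: string;
  selector: string;
  valueEth: number;
  mintAmount: number;
}

interface AgentUsage {
  actionsInWindow: number;
  dailyVolumeEth: number;
}

type CheckResult = { ok: true } | { ok: false; reason: string };

// Checks run in order and stop at the first failure, so a denied
// proposal never reaches the later layers.
function preCheck(p: Proposal, policy: Policy, usage: AgentUsage): CheckResult {
  if (p.valueEth > policy.maxValueEth) return { ok: false, reason: "value limit" };
  if (!policy.allowedTargets.has(p.target.toLowerCase())) return { ok: false, reason: "target not whitelisted" };
  if (policy.blockedSelectors.has(p.selector.toLowerCase())) return { ok: false, reason: "blocked function" };
  if (usage.actionsInWindow + 1 > policy.maxActionsPerWindow) return { ok: false, reason: "rate limit" };
  if (usage.dailyVolumeEth + p.valueEth > policy.maxDailyVolumeEth) return { ok: false, reason: "daily volume cap" };
  if (p.mintAmount > policy.maxMintAmount) return { ok: false, reason: "mint cap" };
  return { ok: true };
}
```

Returning a reason string alongside the failure keeps the sketch easy to debug; an on-chain version would more likely revert or emit an event.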
A 7-dimension anomaly detection engine that learns what "normal" looks like for each agent and flags deviations:
| # | Dimension | Max Score | Trigger |
|---|---|---|---|
| 1 | Value Deviation | +25 | Transaction value > 2.5 standard deviations from agent's historical mean |
| 2 | Contract Diversity | +20 | First interaction with an unknown contract |
| 3 | Velocity | +15 | Action submitted < 50% of agent's expected interval |
| 4 | Function Pattern | +30 | Unusual function signature for this agent's history |
| 5 | Time-of-Day | +10 | Activity outside the agent's established operating hours |
| 6 | Sequential Probing | +35 | 3+ monotonically increasing values — binary search pattern detection |
| 7 | Cumulative Drift | +20 | Rolling average has drifted > 3σ from frozen origin baseline |
| Total Possible | 155 | Risk threshold: 50 |
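How the per-dimension scores combine into a single risk score can be sketched as follows. The weights and the threshold of 50 come from the table above; the assumptions are that a triggered dimension contributes its full maximum score and that the threshold comparison is inclusive. The real engine in `sentinel-workflow/behavioral.ts` may scale scores partially.

```typescript
type Dimension =
  | "valueDeviation"
  | "contractDiversity"
  | "velocity"
  | "functionPattern"
  | "timeOfDay"
  | "sequentialProbing"
  | "cumulativeDrift";

// Maximum score per dimension, taken from the table above.
const MAX_SCORE: Record<Dimension, number> = {
  valueDeviation: 25,
  contractDiversity: 20,
  velocity: 15,
  functionPattern: 30,
  timeOfDay: 10,
  sequentialProbing: 35,
  cumulativeDrift: 20,
};

const RISK_THRESHOLD = 50;

// Assumption: each triggered dimension adds its full weight, counted once.
function totalRisk(triggered: Dimension[]): number {
  const unique = new Set<Dimension>(triggered);
  let total = 0;
  unique.forEach((d) => {
    total += MAX_SCORE[d];
  });
  return total;
}

// Assumption: reaching the threshold exactly is enough to flag the action.
function isFlagged(triggered: Dimension[]): boolean {
  return totalRisk(triggered) >= RISK_THRESHOLD;
}
```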
Origin Baseline Freezing: After an agent's first N approved actions, its behavioral baseline (mean value, standard deviation) is frozen permanently. All future actions are compared against this origin — preventing slow drift attacks where an agent gradually poisons its own baseline over hundreds of transactions. The freeze window is configurable via originWindowSize per agent profile — the demo uses 5 actions for a compact walkthrough, but production deployments should use 20–50+ actions for statistical robustness.
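A minimal sketch of origin baseline freezing, assuming a population standard deviation and the 2.5σ value-deviation trigger from the table above. The class and method names are hypothetical and are not taken from `behavioral.ts`.

```typescript
interface Baseline {
  mean: number;
  stdDev: number;
}

class OriginBaseline {
  private samples: number[] = [];
  private frozen: Baseline | null = null;
  private readonly originWindowSize: number;

  constructor(originWindowSize: number) {
    this.originWindowSize = originWindowSize;
  }

  // Record an approved action's value. Once the window is full the
  // baseline is frozen and later values no longer influence it.
  recordApproved(value: number): void {
    if (this.frozen !== null) return;
    this.samples.push(value);
    if (this.samples.length === this.originWindowSize) {
      const n = this.samples.length;
      const mean = this.samples.reduce((a, b) => a + b, 0) / n;
      const variance = this.samples.reduce((a, b) => a + (b - mean) * (b - mean), 0) / n;
      this.frozen = { mean, stdDev: Math.sqrt(variance) };
    }
  }

  getBaseline(): Baseline | null {
    return this.frozen;
  }

  // True when the value is more than `sigmas` standard deviations
  // away from the frozen origin mean.
  isValueAnomalous(value: number, sigmas: number = 2.5): boolean {
    if (this.frozen === null || this.frozen.stdDev === 0) return false;
    return Math.abs(value - this.frozen.mean) > sigmas * this.frozen.stdDev;
  }
}
```

Because `recordApproved` becomes a no-op after the freeze, an agent cannot shift the baseline by submitting a long series of slightly larger approved values.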
"The agent was escalating 5-15% per transaction. Each individual step was within normal bounds. The cumulative drift caught what no single check could see."
"The agent never reached the actual threshold. It was caught probing for it."
Two independent AI models evaluate every action with the behavioral risk analysis injected into their context:
- Claude and a secondary model each receive the proposal, the on-chain policy, and the behavioral anomaly breakdown
- Both must independently return APPROVED — a single denial triggers a block
- CRE's `ConsensusAggregationByFields` ensures all DON nodes agree on the verdict field-by-field
- `ConfidentialHTTPClient` (`@chainlink/cre-sdk` v1.0.9) hides API keys, evaluation prompts, and risk thresholds inside a TEE; agents receive only APPROVED/DENIED with zero information about boundaries, criteria, or AI consultation
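The "both must approve" rule can be sketched as a small pure function. This is an illustration of the decision rule only, not of the CRE SDK: the types are hypothetical, and treating a missing or failed model response as a denial is an assumption in line with the fail-safe design the project describes.

```typescript
type Verdict = "APPROVED" | "DENIED";

interface ModelResponse {
  verdict: Verdict;
}

// A null entry stands for a model call that failed or timed out.
// Assumption: the rule fails closed, so anything short of unanimous
// approval yields DENIED.
function combineVerdicts(responses: Array<ModelResponse | null>): Verdict {
  if (responses.length === 0) return "DENIED";
  const unanimous = responses.every((r) => r !== null && r.verdict === "APPROVED");
  return unanimous ? "APPROVED" : "DENIED";
}
```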
For fail-safe design principles and severity classification details, see docs/SECURITY_MODEL.md.
We use 8 CRE primitives across 3 trigger types plus Data Feeds and Automation-ready hooks:
| Service | How We Use It | Risk & Compliance Value |
|---|---|---|
| CRE HTTPClient | Calls 2 AI models with ConsensusAggregationByFields | DON-level BFT consensus on AI risk verdicts; no single node can approve a malicious action |
| CRE ConfidentialHTTPClient | TEE-backed AI calls: API keys injected via Vault DON `{{ANTHROPIC_API_KEY}}` template, prompts and responses stay inside enclave | Agents cannot see evaluation prompts, behavioral scoring weights, or AI reasoning. Feature-flagged via `enableConfidentialCompute` config |
| CRE EVMClient | `callContract()` reads policies, `writeReport()` writes verdicts, `filterLogs()` queries incident history, `headerByNumber()` confirms chain liveness | Automated compliance enforcement with immutable audit trail |
| CRE EVMClient LogTrigger | Event-driven handler reacts to `CircuitBreakerTriggered` and `ActionDenied` events in near-real-time | Immediate threat response with no polling delay; 3rd trigger type |
| CRE CronCapability | Periodic health checks with chain liveness + incident scanning | Proactive risk monitoring beyond request-response |
| Data Feeds | `AggregatorV3Interface` for Proof of Reserves | Real-time reserve verification before mints; cumulative tracking prevents gradual reserve depletion |
| Automation | `finalizeExpiredChallenge()` uses checkUpkeep/performUpkeep | Expired compliance appeal windows auto-finalize without manual intervention |
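The Proof of Reserves logic described in the Data Feeds row (staleness check, cumulative tracking, collateralization ratio) can be approximated off-chain as shown here. The function name, field names, and basis-point ratio are assumptions for illustration and do not reproduce `checkReserves()` in PolicyLib.sol; `answer` and `updatedAt` correspond to fields returned by `latestRoundData()`.

```typescript
// Subset of a feed round, using plain numbers for readability.
interface ReserveRound {
  answer: number;    // reserves reported by the feed
  updatedAt: number; // unix seconds of the last feed update
}

interface ReserveConfig {
  maxStalenessSec: number;  // reject feed data older than this
  minCollateralBps: number; // required backing, 10000 = 100%
}

// True when reserves still cover the cumulative supply after the mint.
function reservesCoverMint(
  round: ReserveRound,
  cfg: ReserveConfig,
  totalMinted: number,
  mintAmount: number,
  nowSec: number
): boolean {
  if (round.answer <= 0) return false;                             // invalid feed answer
  if (nowSec - round.updatedAt > cfg.maxStalenessSec) return false; // stale data
  const supplyAfterMint = totalMinted + mintAmount;                 // cumulative tracking
  return round.answer * 10000 >= supplyAfterMint * cfg.minCollateralBps;
}
```

Checking the cumulative supply rather than the single mint amount is what blocks a series of small mints that would each pass on their own.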
CRE's ConsensusAggregationByFields is the critical enabler. It ensures the dual-AI consensus isn't just application logic — it's enforced at the DON level. Combined with ConfidentialHTTPClient hiding the evaluation prompts inside a TEE, this creates a risk evaluation pipeline where:
- Agents can't see behavioral scoring weights or AI evaluation criteria (`ConfidentialHTTPClient` keeps Layer 2/3 logic inside the enclave)
- Agents can't game the AI evaluators (prompts never leave the TEE)
- API keys are injected via Vault DON secret templates (`{{ANTHROPIC_API_KEY}}`, `{{OPENAI_API_KEY}}`), never exposed to node operators
- No single DON node can override the consensus (BFT aggregation)
- All verdicts are written immutably on-chain (audit trail)
For the full CRE services integration reference (HTTPClient, ConfidentialHTTPClient, EVMClient with 5 methods, CronCapability, LogTrigger, ConsensusAggregationByFields), see docs/CRE_INTEGRATION.md.
SentinelCRE uses Chainlink Confidential Compute to make the behavioral scoring and AI evaluation layers opaque to agents and node operators. Layer 1 policy parameters are on-chain by design (transparent compliance), while Layer 2 behavioral thresholds and Layer 3 AI reasoning stay hidden inside the TEE.
Traditional AI agent guardrails leak information. If an agent can observe its own limits — policy thresholds, evaluation prompts, AI reasoning — it can systematically probe, evade, or manipulate those guardrails. This is not theoretical:
| Leak Vector | What an Agent Learns | How It Exploits It |
|---|---|---|
| On-chain policy parameters | Exact value/mint limits | Submits transactions at 99.9% of the limit |
| Evaluation prompts | What criteria the AI evaluator checks | Crafts actions that satisfy every listed criterion |
| AI model responses | Confidence scores, reasoning | Reverse-engineers guardrail boundaries via binary search |
| API keys in transit | Credentials for AI model providers | Exfiltrates keys for unauthorized model access |
SentinelCRE's Layer 3 (Multi-AI Consensus) executes inside a TEE (Trusted Execution Environment) via CRE's ConfidentialHTTPClient. The entire evaluation — prompt construction, API calls, AI responses, verdict reasoning — stays inside the enclave. Only a single bit exits: APPROVED or DENIED.
┌─────────────────────────────────── TEE Enclave ───────────────────────────────────┐
│ │
│ 1. Behavioral scoring weights + anomaly threshold (hidden from agent) │
│ 2. AI evaluation prompt with risk context (hidden from agent + node operators) │
│ 3. API keys resolved from Vault DON: {{ANTHROPIC_API_KEY}}, {{OPENAI_API_KEY}} │
│ 4. Claude + GPT-4 evaluate independently (reasoning hidden) │
│ 5. ConsensusAggregationByFields enforces DON-level BFT agreement │
│ │
└────────────────────────────────────────────────────────────────────────────────────┘
│
APPROVED / DENIED
│
Agent sees ONLY this
Implementation (live in sentinel-workflow/main.ts):
```typescript
// Feature-flagged: enableConfidentialCompute in config
const confClient = new ConfidentialHTTPClient();

// Model 1: Claude via Anthropic API
const confRequest1 = {
  vaultDonSecrets: [{ key: 'ANTHROPIC_API_KEY', namespace: 'sentinel' }],
  request: {
    url: config.aiEndpoint1,
    method: 'POST',
    bodyString: claudeBody, // Contains evaluation prompt, hidden inside TEE
    multiHeaders: {
      'x-api-key': { values: ['{{ANTHROPIC_API_KEY}}'] },
      'anthropic-version': { values: ['2023-06-01'] },
    },
  },
};

// Model 2: GPT-4 via OpenAI API (independent model for consensus diversity)
const confRequest2 = {
  vaultDonSecrets: [{ key: 'OPENAI_API_KEY', namespace: 'sentinel' }],
  request: {
    url: config.aiEndpoint2,
    method: 'POST',
    bodyString: gptBody,
    multiHeaders: {
      Authorization: { values: ['Bearer {{OPENAI_API_KEY}}'] },
    },
  },
};

// Both AI model calls execute inside the enclave
const response1 = sendRequester.sendRequest(confRequest1).result();
const response2 = sendRequester.sendRequest(confRequest2).result();
```
| Data | Without Confidential Compute | With Confidential Compute |
|---|---|---|
| Layer 1 policy params (value limits, mint caps) | On-chain, publicly readable | On-chain, publicly readable (transparent compliance) |
| Layer 2 behavioral thresholds (weights, anomaly score) | Embedded in prompts visible to node operators | Hidden inside TEE — only verdict exits |
| Behavioral risk scores | Included in AI context, visible during execution | Computed and consumed entirely within enclave |
| AI evaluation prompts | Visible to DON node operators | Never leave the TEE |
| API keys (Anthropic + OpenAI) | Passed as secrets, visible during execution | Injected via Vault DON {{TEMPLATE}} syntax inside enclave |
| AI model reasoning | Confidence scores and explanations visible | Only APPROVED/DENIED exits — no reasoning leaked |
| Agent behavioral baselines | Accessible through prompt inspection | Opaque — agent cannot learn its own profile |
In regulated environments (DeFi protocols, institutional custody, tokenized assets), the security model itself is sensitive. Revealing guardrail parameters to monitored agents is equivalent to publishing the alarm codes alongside the security system. Confidential Compute ensures that:
- Agents can't game what they can't see — On-chain policy limits are transparent (agents can read them), but behavioral scoring criteria and AI evaluation logic are invisible. Knowing the value limit doesn't help bypass sequential probing detection or AI consensus
- Node operators can't front-run — Even DON participants can't extract policy parameters during execution
- Forensic data stays access-controlled — Incident reports with guardrail details don't leak via on-chain transparency
- Compliance auditors get cryptographic proof — TEE attestation proves the correct evaluation was performed without revealing the evaluation itself
For full integration details, see docs/CONFIDENTIAL-COMPUTE.md.
The demo follows a narrative arc across three phases — baseline operations train the system, obvious attacks validate policy enforcement, then edge cases prove behavioral analysis catches what traditional security misses:
Phase 1 — Training Baseline (all APPROVED — the system learns what "normal" looks like):
| # | Scenario | Purpose |
|---|---|---|
| 1 | Normal Trade (0.5 ETH swap) | Establishes value baseline |
| 2 | Normal Mint (500K tokens) | Establishes mint baseline |
| 3 | Token Approval | Establishes contract interaction baseline |
Phase 2 — Policy Violations (all DENIED — obvious attacks any risk system should catch):
| # | Attack | Detection Method |
|---|---|---|
| 1 | Compromised Wallet Drain (100 ETH) | Value policy violation |
| 2 | Infinite Mint (1B tokens) | Mint cap + AI pattern detection |
| 3 | Prompt Injection | Dual-AI independently detects injection |
| 4 | Flash Loan + Oracle Manipulation | Target whitelist + value limit + AI |
| 5 | Stealth Proxy Upgrade | Function blocklist (defense-in-depth) |
Phase 3 — Edge Cases: Would Slip Traditional Security (all DENIED — only behavioral risk scoring catches these):
These attacks pass every policy check. Values are within limits, targets are approved, functions are whitelisted. Traditional rule-based systems would approve them all.
| # | Attack | Why Traditional Security Misses It | How Behavioral Analysis Catches It |
|---|---|---|---|
| 6 | Sequential Probing (2→4→8 ETH) | Each value is within the 1 ETH policy limit individually | Sequential Probing +35 — detects monotonically increasing binary search pattern |
| 7 | Off-Hours Emergency Drain | 0.8 ETH on approved DEX with whitelisted function | Time-of-Day +10 — agent has never operated at 3 AM before |
| 8 | Velocity Burst (16 rapid txns) | Each 0.1 ETH transaction passes all policy checks | Velocity +15 — action interval is 5x faster than agent's baseline |
| 9 | Multi-Contract Scatter | Small 0.3 ETH value, legitimate function signature | Contract Diversity +20 — first-ever interaction with unknown contract |
| 10 | Sybil Coordination (900K mint) | 900K is under the 1M policy cap | Value Deviation +25 — near-max value is statistically anomalous for this agent |
| 11 | Slow Drift Injection (0.5→2.5 ETH) | Each step is within 1σ of the rolling average | Cumulative Drift +20 — rolling avg has drifted 400% from frozen origin baseline |
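Two of the Phase 3 detections, sequential probing and cumulative drift, can be sketched as pure functions. Both are simplified under stated assumptions: probing is treated as a run of 3 or more strictly increasing values, and drift compares a rolling average over a hypothetical 5-action window against the frozen origin baseline using the 3σ rule. The function and parameter names are illustrative.

```typescript
// True when the sequence contains a run of `minRun` or more strictly
// increasing values, the pattern a binary-search style probe produces.
function isSequentialProbing(recentValues: number[], minRun: number = 3): boolean {
  let run = 0;
  let prev: number | undefined;
  for (const v of recentValues) {
    run = prev !== undefined && v > prev ? run + 1 : 1;
    if (run >= minRun) return true;
    prev = v;
  }
  return false;
}

interface OriginStats {
  mean: number;
  stdDev: number;
}

// True when the rolling average of the last `windowSize` values sits more
// than `sigmas` origin standard deviations away from the origin mean.
function hasDrifted(
  origin: OriginStats,
  recentValues: number[],
  windowSize: number = 5,
  sigmas: number = 3
): boolean {
  if (recentValues.length < windowSize || origin.stdDev === 0) return false;
  const window = recentValues.slice(-windowSize);
  const rollingAvg = window.reduce((a, b) => a + b, 0) / window.length;
  return Math.abs(rollingAvg - origin.mean) > sigmas * origin.stdDev;
}
```

The drift check compares against the origin statistics and not the rolling ones, which is why steps that each look normal relative to recent history still add up to a detection.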
```bash
bun run demo:normal   # 4 legitimate actions → all APPROVED
bun run demo:rogue    # 10 attack scenarios → all DENIED
bun run demo:probing  # Sequential probing → caught at probe 3
bun run demo:drift    # 20-step slow drift → caught at ~probe 16
```
| Contract | Lines | Purpose |
|---|---|---|
| SentinelGuardian.sol | 510 | Core risk engine — AccessControl + Pausable. Processes CRE verdicts, enforces compliance policy, triggers circuit breakers, manages agent lifecycle, handles challenge windows |
| PolicyLib.sol | 166 | Validation library — 7 independent compliance checks + checkAll() batch validator. Uses CheckParams struct to avoid stack-too-deep |
| AgentRegistry.sol | 59 | Agent metadata registry — name, description, owner. Separated for independent upgrades |
Not every denial is correct. SentinelCRE includes structured due process:
```
Denial → Severity classified (Low / Medium / Critical)
  → Critical (infinite mint, 10x value): permanent freeze, no appeal
  → Medium: 30-minute appeal window
  → Low: 1-hour appeal window
    → CHALLENGER_ROLE files appeal
    → CRE re-evaluates with adjusted risk thresholds
    → Overturned: agent unfrozen | Upheld: agent stays frozen
```
This mirrors real-world financial compliance — suspicious transactions are held for review, not permanently blocked, unless the threat is critical.
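The severity-to-window mapping above can be sketched as follows. The window lengths come from the flow described here; the names and the inclusive deadline comparison are assumptions, and the on-chain logic in SentinelGuardian.sol may differ in detail.

```typescript
type Severity = "Low" | "Medium" | "Critical";

// Appeal window in seconds; null means no appeal is possible.
const APPEAL_WINDOW_SEC: Record<Severity, number | null> = {
  Low: 60 * 60,    // 1 hour
  Medium: 30 * 60, // 30 minutes
  Critical: null,  // permanent freeze
};

// Returns the unix-seconds deadline for filing an appeal, or null.
function appealDeadline(severity: Severity, deniedAtSec: number): number | null {
  const windowSec = APPEAL_WINDOW_SEC[severity];
  return windowSec === null ? null : deniedAtSec + windowSec;
}

// Assumption: an appeal filed exactly at the deadline is still accepted.
function canAppeal(severity: Severity, deniedAtSec: number, nowSec: number): boolean {
  const deadline = appealDeadline(severity, deniedAtSec);
  return deadline !== null && nowSec <= deadline;
}
```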
| Suite | Tests | Coverage |
|---|---|---|
| SentinelGuardian | 45 | Registration, verdict processing, policy enforcement, circuit breaker, freeze/unfreeze/revoke, rate limits, daily volume, cumulative mints |
| Challenge | 14 | Severity classification, appeal flow, resolution (overturn/uphold), expiry, authorization |
| Proof of Reserves | 10 | Reserve verification, cumulative drain prevention, feed price updates, collateral ratios |
| AgentRegistry | 8 | Registration, enumeration, duplicate prevention, metadata |
| Integration | 8 | Full lifecycle: register → approve → deny → freeze → challenge → resolve |
```bash
cd contracts && forge test -v
# [PASS] 90 tests across 5 suites
```
For step-by-step onboarding (deployment, agent registration, policy configuration, monitoring), see docs/INTEGRATION-GUIDE.md. For development challenges and how we solved them, see docs/CHALLENGES.md.
Four tabs built with Next.js 15 + React 19 + Tailwind CSS 4 (Architecture opens first for demo video flow):
| Tab | What It Shows |
|---|---|
| Architecture | Detailed reference — problem statement with real DeFi exploits ($625M Ronin, $320M Wormhole, $114M Mango Markets), three-layer defense diagram, 8-step verdict pipeline, 7 Chainlink integration cards with LIVE/READY status, expandable smart contracts with Solidity code snippets, 7 behavioral dimension breakdown with weight bars, and tech stack grid |
| Demo | Narrative walkthrough — 3 baselines train the system, then 11 escalating attacks are detected. Shows 8-step CRE pipeline animation, dual-AI verdicts (Claude + GPT-4), and 7-dimension behavioral risk breakdown |
| Guardian | Rich agent profiles (TradingBot + MintBot) with behavioral score trend sparklines (green→red), session performance metrics (100% detection rate, 0% false positive rate, attack $ prevented), defense analytics charts (donut, severity bars, risk histogram, defense layer stacked bar), threat timeline with phase dividers, wallet addresses, and filterable incident log |
| Simulator | Behavioral Training Ground — pick an agent, run safe actions (score stays low), then run attacks (score spikes). Cumulative behavioral score meter with CSS gradient gauge. At score 70+, AGENT LOCKOUT fires on-chain via processVerdict. Reset to retrain |
Tenderly is not just our deployment target — it powers the entire development, simulation, and monitoring pipeline. SentinelCRE uses 4 Tenderly capabilities: Virtual TestNet, Simulation API, Transaction Debugging, and Live Monitoring.
Contracts are deployed on Tenderly's Virtual Sepolia TestNet with pre-funded accounts:
| Contract | Address |
|---|---|
| SentinelGuardian | 0x5F938e4c62991Eb4af3Dd89097978A1f376e6CC8 |
| AgentRegistry | 0xFA7deF53FEaC45dB96A5B15C32ca4E6B009b25e6 |
| Deployer | 0x23fC03ec91D319e4Aa14e90b6d3664540FDf2446 |
Why Virtual TestNet was essential:
- No faucet hunting — pre-funded accounts with unlimited ETH, zero setup friction
- Instant transactions — no block confirmation delays during development/demo
- Persistent state — contracts stay deployed across sessions. Judges can inspect all historical transactions without re-deploying
- Sepolia fork — real network conditions (EVM version, gas pricing, precompiles) without public testnet unreliability
Every demo verdict fires real processVerdict() and unfreezeAgent() transactions to Tenderly. All 14 demo scenarios + 28 enterprise simulator scenarios produce verifiable on-chain state.
The dashboard's Simulator tab is powered by Tenderly's Simulation API via a 244-line client library (dashboard/src/lib/tenderly.ts):
| Feature | What It Returns | How SentinelCRE Uses It |
|---|---|---|
| `simulateTransaction()` | Gas used, success/revert, decoded events | Every processVerdict() call is simulated before display; judges see exact gas costs and emitted events |
| `simulateBundle()` | Sequential multi-tx simulation with shared state | Enterprise simulator runs attack sequences (e.g., 3 probing txns) with each tx seeing the state from the previous one |
| State diff parsing | Storage slot before/after values | Dashboard shows which contract storage slots changed (agent state, incident count, frozen status) |
| Balance change tracking | ETH balance diffs per address | Visualizes the economic impact of each simulated action |
| Call trace recursion | Full internal call tree (CALL, STATICCALL, DELEGATECALL) | Dashboard renders the execution path through SentinelGuardian → PolicyLib → external calls |
API route: /api/simulate — accepts either a structured proposal (auto-encodes processVerdict calldata) or custom calldata for arbitrary contract interaction.
The TenderlyFeedPanel component provides real-time on-chain visibility:
- Polls every 12 seconds via `/api/tenderly`: scans the last 60 blocks for transactions to Guardian and Registry contracts
- Color-coded function names: `processVerdict` (yellow), `unfreezeAgent` (cyan), `registerAgent` (green), `grantRole` (blue), `updatePolicy` (orange)
- Transaction counts per contract: judges see cumulative on-chain activity at a glance
- Direct Explorer link: one click to view full transaction details, decoded calldata, and state changes in the Tenderly Explorer
Tenderly's transaction debugging was critical during development:
- Decoded call traces revealed exactly where `processVerdict()` was reverting during PolicyLib integration: they traced a `bytes32` encoding mismatch that would have taken hours to debug from raw revert data alone
- State diff inspection verified that circuit breaker logic (agent freeze, incident logging, severity classification) was writing to the correct storage slots
- Gas profiling informed optimization decisions: `processVerdict()` approved path ~85K gas, denied path ~120K gas, `registerAgent()` ~180K gas (dynamic array copy)
Judges can verify all on-chain activity — every demo verdict, agent registration, and policy update is permanently recorded on the Virtual TestNet: Tenderly Explorer
No testnet funds needed. The dashboard is pre-configured to use a Tenderly Virtual TestNet (Sepolia fork) with funded accounts. Clone, install, and run.
Prerequisites: Bun (runtime/package manager) · Foundry (for contract tests only)
```bash
# Install dependencies
cd SentinelCRE && bun install

# Run smart contract tests (90 tests)
cd contracts && forge test -v

# Start the risk monitoring dashboard (2 terminals)
bun run mock-api      # Terminal 1: AI evaluation service + behavioral engine (port 3002)
bun run dashboard     # Terminal 2: Dashboard (http://localhost:3000)

# CLI demo: legitimate agent behavior
bun run demo:normal   # 4 legitimate actions → all APPROVED

# CLI demo: attack scenarios
bun run demo:rogue    # 10 attack scenarios → all DENIED
bun run demo:probing  # Sequential probing → caught at probe 3
bun run demo:drift    # 20-step slow drift → caught at ~probe 16

# Reset behavioral profiles between demos
bun run behavioral:reset
```
| Layer | Technology |
|---|---|
| Smart Contracts | Solidity 0.8.24, Foundry, OpenZeppelin v5.5.0 |
| CRE Workflow | Chainlink CRE SDK v1.0.9 (ConfidentialHTTPClient + HTTPClient + EVMClient with callContract/writeReport/filterLogs/headerByNumber/logTrigger), TypeScript, Bun |
| Behavioral Engine | Pure TypeScript, 7 statistical dimensions |
| Dashboard | Next.js 15, React 19, Tailwind CSS 4, viem |
| Simulation & Deployment | Tenderly Virtual TestNet (RPC, persistent state), Simulation API (gas profiling, state diffs, call traces), live tx monitoring |
| Testing | Foundry (forge test), 90 tests across 5 suites |
| Security Analysis | Slither (Trail of Bits) — 0 critical, 0 high findings (details) |
1. Proactive risk prevention, not reactive incident response: Every action is evaluated through three independent layers before it touches the chain. By the time you see an alert, the threat is already blocked. See `docs/ARCHITECTURE.md`.
2. Three-layer defense with no single point of failure: On-chain compliance checks catch policy violations. Behavioral scoring catches anomalous patterns. Multi-AI consensus catches context-dependent threats. No single layer is sufficient; together they're comprehensive. See `docs/SECURITY_MODEL.md`.
3. Deep CRE integration: 8 CRE primitives across 3 trigger types (HTTP, Cron, Log) + Data Feeds + Automation. Not a wrapper around a single Chainlink service. EVMClient is used for reads (`callContract`), writes (`writeReport`), event queries (`filterLogs`), chain liveness (`headerByNumber`), and event-driven triggers (`logTrigger`). ConsensusAggregationByFields enforces AI verdict consensus at the DON level. See `docs/CRE_INTEGRATION.md`.
4. Confidential behavioral and AI evaluation: Layer 1 policy params are on-chain (transparent compliance), but Layer 2 behavioral scoring weights and Layer 3 AI evaluation prompts execute inside a TEE via `ConfidentialHTTPClient`. API keys are injected from Vault DON secrets using `{{TEMPLATE}}` syntax. An agent can read its value limit from the contract, but it cannot see the 7 behavioral dimensions, the anomaly threshold, its own frozen baseline, or the AI evaluation criteria, so knowing Layer 1 limits doesn't help bypass Layers 2 and 3. See `docs/CONFIDENTIAL-COMPUTE.md`.
5. Behavioral intelligence: Seven anomaly dimensions that learn per-agent baselines. Catches sophisticated attacks that pass every individual rule: sequential probing, slow drift injection, velocity bursts, off-hours activity. See `sentinel-workflow/behavioral.ts`.
6. Compliance due process: Severity-based appeal windows mirror real-world financial systems. Critical threats are permanently blocked; low-severity denials get a structured review process. See `docs/SECURITY_MODEL.md`.
7. Production-grade testing: 90 tests covering edge cases like cumulative mint drain, rate limit window resets, PoR collateral ratios, and challenge resolution flows. See `TECHNICAL.md`.
Team: Willis Tang (@ProjectWaja) | Project Waja
License: MIT
