Jarvis Architecture

🇰🇷 Korean Version: README_KO.md

A validator-first LLM execution architecture for safer task automation systems.

Overview

Jarvis Architecture is an experimental LLM orchestration architecture that separates:

Natural language understanding
Semantic validation
Execution authority

Instead of trusting LLM outputs directly, Jarvis treats the LLM as an intent-extraction component, while deterministic validators and service layers decide whether execution is allowed.

The core idea is:

Structured output alone is not sufficient for safe execution.

Motivation

Modern LLM agents commonly use the following flow:

User Input
→ LLM
→ Structured Output (JSON / Function Calling)
→ Execute

However, an output that passes schema validation can still be:

Semantically invalid
Contradictory
Dangerous
Ambiguous

Examples:

Semantically invalid: "Schedule an event on February 31."
Contradictory: "Delete everything but don't delete it."
Ambiguous: "Remind me later."
Dangerous: { "action": "delete_all" }

Jarvis Architecture attempts to reduce unsafe execution through deterministic validation and policy enforcement.
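As a minimal illustration of the February 31 example, the sketch below checks a hypothetical LLM payload twice: once for shape, once for meaning. The payload, the field names, and the `schema_valid` / `date_is_real` helpers are assumptions for illustration only, not code from this repository.

```python
import json
from datetime import date

# Hypothetical payload an LLM might emit for "Schedule an event on February 31."
raw = '{"action": "create_event", "title": "Meeting", "date": "2025-02-31"}'

# Assumed minimal schema: these keys must be present and hold strings.
REQUIRED_FIELDS = {"action", "title", "date"}


def schema_valid(payload: dict) -> bool:
    """Shape check only: required keys exist and their values are strings."""
    return REQUIRED_FIELDS.issubset(payload) and all(
        isinstance(payload[key], str) for key in REQUIRED_FIELDS
    )


def date_is_real(value: str) -> bool:
    """Semantic check: the string must name a date that exists on the calendar."""
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


payload = json.loads(raw)
print(schema_valid(payload))          # True: the structure is correct
print(date_is_real(payload["date"]))  # False: February 31 does not exist
```

A structured-output pipeline that stops at the first check would pass this request straight to execution.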

Architecture

Flow:

User Input
→ Intent Classification
→ Structured Extraction
→ Semantic Validator
→ Policy Gate
→ Service Layer
→ Execution

Validator responsibilities:

Datetime validation
Temporal consistency checking
Contradictory command detection
Dangerous action blocking
Ambiguous request detection
Clarification routing

Possible outcomes:

ACCEPT
REJECT
CLARIFY
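The validator responsibilities and outcomes above can be sketched as a single deterministic function that maps an extracted command to ACCEPT, REJECT, or CLARIFY. This is a minimal sketch under stated assumptions: the `Verdict`, `Decision`, and `validate` names, the `start` / `end` fields, and the `DANGEROUS_ACTIONS` deny-list are hypothetical, and contradiction detection is omitted because its rules are not described here.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class Verdict(Enum):
    ACCEPT = "ACCEPT"
    REJECT = "REJECT"
    CLARIFY = "CLARIFY"


@dataclass
class Decision:
    verdict: Verdict
    reason: str


# Hypothetical deny-list of actions that are never executed.
DANGEROUS_ACTIONS = {"delete_all"}


def validate(command: dict) -> Decision:
    # Dangerous action blocking runs first, so no other detail can rescue it.
    action = command.get("action")
    if action in DANGEROUS_ACTIONS:
        return Decision(Verdict.REJECT, f"dangerous action: {action}")

    # Ambiguous request detection: no concrete time means ask the user.
    start_raw: Optional[str] = command.get("start")
    if not start_raw:
        return Decision(Verdict.CLARIFY, "missing start time")

    # Datetime validation: the value must parse to a real date and time.
    try:
        start = datetime.fromisoformat(start_raw)
    except ValueError:
        return Decision(Verdict.REJECT, f"invalid datetime: {start_raw}")

    # Temporal consistency: an end time, if given, must come after the start.
    end_raw = command.get("end")
    if end_raw:
        try:
            end = datetime.fromisoformat(end_raw)
        except ValueError:
            return Decision(Verdict.REJECT, f"invalid datetime: {end_raw}")
        if end <= start:
            return Decision(Verdict.REJECT, "end is not after start")

    return Decision(Verdict.ACCEPT, "ok")
```

Because the function is deterministic, the same extracted command always yields the same verdict, regardless of how the LLM phrased its output.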

Compared Architectures

This repository contains three intentionally simplified architectures.

1. Plain LLM Chatbot

Flow:

User Input
→ LLM
→ Execute

Characteristics:

Direct trust model
No structured output
No semantic validation

2. Structured Output Agent

Flow:

User Input
→ LLM
→ Structured JSON
→ Schema Parse
→ Execute

Characteristics:

Structured output
Basic schema parsing
No semantic verification

3. Validator-First Agent (Jarvis)

Flow:

User Input
→ LLM Intent Extraction
→ Semantic Validator
→ Policy Gate
→ Execute

Characteristics:

Deterministic validation
Execution authority separation
Clarification support
Dangerous request rejection
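One way to express execution authority separation in code is to make the service layer accept only objects that the policy gate issued, so raw LLM output has no path to execution. The sketch below assumes a hypothetical `ApprovedCommand` type, `policy_gate` function, `ALLOWED_ACTIONS` allow-list, and `CalendarService`; none of these names come from the repository.

```python
from dataclasses import dataclass, field

# Sentinel held only by the policy gate; forged commands lack it.
_GATE_TOKEN = object()


@dataclass(frozen=True)
class ApprovedCommand:
    action: str
    args: dict
    _token: object = field(repr=False, compare=False)

    def __post_init__(self) -> None:
        if self._token is not _GATE_TOKEN:
            raise PermissionError("ApprovedCommand must be issued by the policy gate")


# Hypothetical allow-list of actions the gate may approve.
ALLOWED_ACTIONS = {"create_event", "list_events"}


def policy_gate(action: str, args: dict) -> ApprovedCommand:
    """Grant execution authority only to allow-listed actions."""
    if action not in ALLOWED_ACTIONS:
        raise PermissionError(f"action not permitted: {action}")
    return ApprovedCommand(action, dict(args), _GATE_TOKEN)


class CalendarService:
    """Service layer: executes gate-approved commands, never raw LLM output."""

    def __init__(self) -> None:
        self.events = []  # in-memory stand-in for a real calendar backend

    def execute(self, command: ApprovedCommand) -> str:
        if not isinstance(command, ApprovedCommand):
            raise TypeError("service layer only accepts ApprovedCommand")
        if command.action == "create_event":
            self.events.append(command.args)
            return "created"
        return f"{len(self.events)} event(s)"
```

Usage: `CalendarService().execute(policy_gate("create_event", {"title": "Meeting"}))` succeeds, while passing a plain dict such as `{"action": "delete_all"}` to `execute` raises `TypeError`. Python cannot make the constructor truly private, so the sentinel is a convention that makes bypassing the gate explicit rather than a hard security boundary.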

Experimental Results

Benchmark:

55 multilingual scheduling requests
Korean / English / Chinese
Valid / Invalid / Dangerous / Ambiguous requests

Results:

Architecture	Accuracy (n = 55)	False Accepts (count)
Plain LLM Chatbot	25.45%	38
Structured Output Agent	29.09%	37
Validator-First Agent	72.73%	3
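The accuracy column is consistent with whole-number counts out of the 55 requests. The short check below derives those counts from the reported percentages; the counts are inferred by arithmetic, not reported separately, and the denominator for the false-accept column is not stated, so it is left out.

```python
TOTAL_REQUESTS = 55

reported_accuracy_pct = {
    "Plain LLM Chatbot": 25.45,
    "Structured Output Agent": 29.09,
    "Validator-First Agent": 72.73,
}


def implied_correct(accuracy_pct: float, total: int = TOTAL_REQUESTS) -> int:
    """Nearest whole number of correct responses for a reported percentage."""
    return round(accuracy_pct / 100 * total)


for name, pct in reported_accuracy_pct.items():
    correct = implied_correct(pct)
    # Recomputing from the count reproduces the reported two-decimal figure.
    print(f"{name}: {correct}/{TOTAL_REQUESTS} = {correct / TOTAL_REQUESTS:.2%}")
```

This prints 14/55, 16/55, and 40/55 for the three architectures, matching 25.45%, 29.09%, and 72.73%.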

Key finding:

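Structured formatting alone does not provide semantic safety: in this benchmark, the Structured Output Agent produced nearly as many false accepts (37) as the Plain LLM Chatbot (38), while the Validator-First Agent produced 3.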

Repository Structure

/agent_resp_judge
/baseline_test_runner
/logs
/plain-agent
/structured-agent
/validator_first_agent

Research Goal

This project is not intended to compete with frontier LLM capability.

The goal is to explore:

Safer execution architectures
Deterministic semantic validation
Separation between language understanding and execution authority
Reliability improvements using system architecture rather than larger models

Limitations

Current limitations include:

Small benchmark size
Rule-based validators
Limited task domain
Simplified execution environment
Incomplete multilingual handling

This repository should be viewed as an experimental prototype and research toy project.

License

MIT License
