AI agent with self-improvement, compressed memory, and an OpenClaw-style skill system. Runs locally with Ollama. Vibecoded with Claude Opus 4.6 and tested with codegemma:latest [[DO NOT EXPOSE THIS APPLICATION TO THE INTERNET]]
aria-agent/
├── main.py # Entry point
├── config.json # Configuration
├── core/
│ ├── agent.py # Main orchestrator (Thread 1 + 2)
│ ├── config.py # Configuration loader
│ ├── prompts.py # All prompts & strings (PL/EN)
│ ├── memory.py # Compressed memory (ST → LT) + RL neural network + pending tasks
│ ├── skills_manager.py # Skill manager (SKILL.md format) + error logs
│ ├── computer.py # System tools (shell, files, Python)
│ ├── llm.py # Ollama/OpenAI LLM client (no timeouts, chunked responses)
│ ├── reflection.py # Background reflection thread + system exploration + task processing
│ └── logger.py # Central event logging
├── web/
│ ├── server.py # WebUI HTTP server + API
│ └── static/index.html # React SPA dashboard
├── skills/ # Skills (OpenClaw/AgentSkills format)
└── memory/ # Persistent memory (auto-generated)
├── memory.json
├── memory_network.json # RL neural network weights
├── pending_tasks.json # User tasks waiting for completion
├── reflections.jsonl
├── self_model.json
└── logs/
└── aria.log # Full event log (rotating, 10MB)
- Interactive REPL (CLI) or WebUI with real-time dashboard
- Understands slash commands (`/exec`, `/python`, `/read`, `/write`, etc.)
- JSON-structured Chain of Thought: Analyze → Memory → Skill Selection → Execute → Plan → Answer
- Categorizes and weighs importance of every interaction (RL neural network scoring)
- Detects when it can't answer and adds task to pending queue
- Long responses auto-chunked across multiple messages (no truncation)
- Runs continuously in the background, independently from conversations
- Cycles through phases: introspection, pattern analysis, skill planning, skill building, skill testing, self-improvement, knowledge synthesis, exploration, pending tasks, system exploration
- Autonomously creates, improves, tests, and fixes skills (prefers improving over duplicating)
- System exploration: safely discovers system environment via read-only shell commands (double safety check: hardcoded blocklist + LLM verification)
- Proactive messaging: sends messages to user on any topic at any time (skill creation, task completion, observations, greetings)
- Pending task processing: retries user tasks that failed, notifies user when completed
- Skill error log clearing: clears error history after successful fix cycles
- Updates the agent's self-model
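The phase cycle above can be sketched as a daemon thread that steps through the phases independently of the chat loop. This is an illustrative skeleton, not ARIA's actual `reflection.py`; the function name and `interval` parameter are assumptions:

```python
import itertools
import threading

PHASES = ["introspection", "pattern_analysis", "skill_planning",
          "skill_building", "skill_testing", "self_improvement",
          "knowledge_synthesis", "exploration", "pending_tasks",
          "system_exploration"]

def start_reflection_thread(run_phase, interval=3.0, stop=None):
    """Run Thread 2: cycle through the phases forever, independently
    of user conversations, pausing `interval` seconds between phases."""
    stop = stop or threading.Event()

    def loop():
        for phase in itertools.cycle(PHASES):
            if stop.is_set():
                return
            run_phase(phase)
            if stop.wait(interval):  # returns True once stop is set
                return

    # daemon=True so the background thread dies with the main process
    threading.Thread(target=loop, daemon=True).start()
    return stop
```

Calling `stop.set()` on the returned event shuts the loop down cleanly.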
ARIA's LLM client has zero timeouts for Ollama — it relies entirely on Ollama's structured done flag to know when a response is complete. This eliminates mid-word truncation issues on slow hardware.
- Streaming: reads chunks until `{"done": true}` — no socket/connection timeout
- Non-streaming: internally uses streaming to avoid blocking
- Truncation detection: checks `done_reason == "length"` and auto-continues
- `chat_long()`: automatically splits responses truncated by the token limit into multiple parts (up to 3), sending "Continue" to the model between parts
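The done-flag-driven read loop can be sketched as follows, assuming Ollama's NDJSON streaming format (`response`, `done`, `done_reason` fields); `assemble_stream` is an illustrative name, not ARIA's actual API:

```python
import json

def assemble_stream(lines):
    """Collect an Ollama NDJSON stream into (text, done_reason).

    The loop ends only when a chunk carries "done": true — no socket
    timeout is involved, so slow hardware cannot cause truncation."""
    parts = []
    done_reason = None
    for raw in lines:
        chunk = json.loads(raw)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            done_reason = chunk.get("done_reason")
            break
    return "".join(parts), done_reason

# done_reason == "length" signals a token-limit truncation that a
# chat_long()-style wrapper would answer with a "Continue" request.
text, reason = assemble_stream([
    '{"response": "Hello, ", "done": false}',
    '{"response": "world!", "done": true, "done_reason": "stop"}',
])
```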
Thread 2's skill building phase now prefers improving existing skills over creating new duplicates:
- LLM receives full code of all existing skills as context
- LLM must choose `action: "improve"` or `action: "create"` with a justification
- Duplicate detection: Jaccard similarity check on name+description — rejects >50% overlap with existing skills
- Improvement workflow: backup old code → write new → test → rollback on failure
- Creation workflow: only when truly new functionality is needed
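The duplicate check can be sketched as a Jaccard similarity over word sets of name+description; function names here are illustrative, not ARIA's actual API:

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the word sets of two strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

def is_duplicate(name, description, existing, threshold=0.5):
    """Reject a proposed skill whose name+description overlaps more
    than `threshold` (50%) with any existing skill's name+description."""
    candidate = f"{name} {description}"
    return any(jaccard(candidate, f"{n} {d}") > threshold
               for n, d in existing)
```

A candidate that fails this check is routed to the "improve" path instead of creating a near-copy.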
Thread 2 periodically explores the host system to understand its environment:
- LLM plans 3-5 read-only shell commands based on what it already knows
- Double safety check before each command:
  - Hardcoded blocklist: `rm`, `sudo`, `dd`, `mkfs`, `shutdown`, `kill`, `chmod -R`, `> /`, etc.
  - LLM safety evaluation: assesses whether the command modifies files, installs packages, or could harm the system
- Safe commands execute with 15s timeout
- Results stored in memory under the `system_discovery` category
- Knowledge accumulates across cycles — LLM sees previous discoveries to avoid redundant exploration
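The first of the two safety gates might look like this; the blocklist entries come from the list above, while the matching logic is a sketch (the second gate, the LLM safety evaluation, is not shown):

```python
BLOCKLIST = ("rm", "sudo", "dd", "mkfs", "shutdown", "kill",
             "chmod -R", "> /")

def blocklist_ok(command: str) -> bool:
    """First safety gate (hardcoded blocklist). A command that passes
    still goes through the second, LLM-based safety evaluation."""
    lowered = command.lower()
    tokens = lowered.split()
    for bad in BLOCKLIST:
        bad = bad.lower()
        if " " in bad or "/" in bad:
            # multi-word / redirection patterns match as substrings
            if bad in lowered:
                return False
        elif bad in tokens:
            # single words must match a whole token, so "format"
            # is not rejected just because it contains "rm"
            return False
    return True
```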
When the agent detects it can't fulfill a user request (e.g., missing a required skill), it:
- Adds the task to `pending_tasks.json` with the reason and missing capability
- Notifies the user that the task has been queued
- Thread 2 periodically checks pending tasks during the `pending_tasks` phase
- When the capability is available (skill created/fixed), Thread 2 executes the task
- User receives a proactive message with the result
- Tasks are automatically cleaned up after 48 hours
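The queue-and-expire lifecycle can be sketched as below; the entry fields and helper names are assumptions modelled on the description above, not ARIA's exact `pending_tasks.json` schema:

```python
import time

TTL_SECONDS = 48 * 3600  # pending tasks expire after 48 hours

def add_pending(tasks, request, reason, capability, now=None):
    """Queue a request the agent could not fulfil, recording why
    and which capability is missing."""
    now = time.time() if now is None else now
    tasks.append({
        "id": f"task-{int(now)}-{len(tasks)}",
        "request": request,
        "reason": reason,
        "missing_capability": capability,
        "created": now,
    })

def prune_expired(tasks, now=None):
    """Automatic cleanup: drop tasks older than 48 hours."""
    now = time.time() if now is None else now
    return [t for t in tasks if now - t["created"] < TTL_SECONDS]

tasks = []
add_pending(tasks, "plot my disk usage", "missing skill",
            "chart-maker", now=0.0)
```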
Each skill maintains an error log (in .stats.json):
- Errors are recorded automatically when scripts fail
- Thread 2 reads error logs to prioritize fixes
- Error logs are cleared after successful fix cycles
- View errors via `/skill <name>` or the Skills panel in the WebUI
All internal reasoning steps use strict JSON format to prevent CoT artifacts from leaking into user-facing responses:
Step 1: Analyze → {"intent": "...", "skills": [...], "can_answer": true/false, ...}
Step 2: Memory recall → relevant context
Step 3: Skill selection → {"selections": [{"name": "...", "args": [...], "reason": "..."}]}
Step 4: Execute skills → raw output
Step 5: Plan → {"key_info": "...", "format": "...", "approach": "..."}
Step 6: Interpret → {"summary": "...", "useful": true/false}
Step 7: Final answer → natural language response (no CoT artifacts)
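Keeping CoT artifacts out of the final answer hinges on parsing each intermediate step strictly as JSON. A minimal sketch of such a parser (name and error handling are illustrative):

```python
import json

def parse_step(raw: str, required: set) -> dict:
    """Parse one CoT step strictly as JSON: extract the first JSON
    object even if the model wrapped it in extra prose, and verify
    the required keys so malformed reasoning never reaches the user."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object in step output")
    data = json.loads(raw[start:end + 1])
    missing = required - data.keys()
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data

# Step 1 (Analyze) output, including stray prose the model added:
step = parse_step(
    'Sure! {"intent": "system info", "can_answer": false}',
    {"intent", "can_answer"},
)
```

A step that fails validation can be retried instead of being passed downstream.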
Memory importance is scored by a lightweight reinforcement learning neural network that runs entirely on CPU:
Architecture: 28 features → 16 hidden (ReLU) → 1 output (sigmoid)
Training: REINFORCE-style policy gradient, single-step updates
Persistence: weights saved to memory/memory_network.json
Features extracted (28 total):
- Text metrics: length, word count, lexical diversity, question/exclamation density
- Content signals: has code, has URL, digit ratio, multiline, structured content
- Category one-hot encoding (8 categories)
- Metadata: interaction number, skill mentions, error mentions, source type
RL Rewards:
- Entry recalled by user query: +1.0
- User explicitly says "remember/zapamiętaj": +2.0
- Entry with high access count during compression: +0.5
- Entry never accessed with low importance: -0.5
Scoring blend: final_importance = 0.4 × heuristic + 0.6 × neural_network
The network improves over time — frequently useful memories get higher scores, low-value content gets filtered out.
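The 28 → 16 → 1 forward pass and the 0.4/0.6 blend can be sketched in pure Python (CPU-only, as described). The random weights here are for illustration only; the real weights live in `memory/memory_network.json`:

```python
import math
import random

def forward(features, w1, b1, w2, b2):
    """28 features -> 16 hidden (ReLU) -> 1 output (sigmoid)."""
    hidden = [max(0.0, sum(f * w for f, w in zip(features, row)) + b)
              for row, b in zip(w1, b1)]
    z = sum(h * w for h, w in zip(hidden, w2)) + b2
    return 1.0 / (1.0 + math.exp(-z))

def blended_importance(heuristic, features, params):
    """final_importance = 0.4 * heuristic + 0.6 * neural_network"""
    return 0.4 * heuristic + 0.6 * forward(features, *params)

# Illustrative random weights; ARIA loads trained ones from disk.
random.seed(0)
w1 = [[random.uniform(-0.1, 0.1) for _ in range(28)] for _ in range(16)]
b1 = [0.0] * 16
w2 = [random.uniform(-0.1, 0.1) for _ in range(16)]
b2 = 0.0
score = blended_importance(0.5, [0.3] * 28, (w1, b1, w2, b2))
```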
New interaction → RL importance scoring → Short-term memory
↓ (buffer full)
LLM-generated summaries
↓
Long-term compressed blocks
↓
Episodic memory (key moments)
- Short-term: Full details, max 20 entries, scored by RL network
- Long-term: LLM-generated summaries with max character limit (falls back to rule-based if LLM unavailable)
- Episodic: Breakthrough moments (new skills, discoveries)
- Pending tasks: User requests awaiting completion
- RL training: happens during compression (reward for accessed entries, penalty for never-used)
- Semantic search: `/recall <query>` (also triggers an RL reward for recalled entries)
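The short-term → long-term handoff above can be sketched as a capped buffer whose overflow triggers compression. This sketch uses the rule-based fallback (keep the highest-importance snippets); the real system asks the LLM for summaries first:

```python
class ShortTermMemory:
    """Short-term buffer: full entries, capped at 20. When the cap is
    exceeded, the oldest half is compressed into a long-term block."""

    def __init__(self, limit=20):
        self.limit = limit
        self.entries = []
        self.long_term = []

    def add(self, text, importance):
        self.entries.append({"text": text, "importance": importance})
        if len(self.entries) > self.limit:
            self._compress()

    def _compress(self):
        # Rule-based fallback summary: keep the top-importance snippets.
        batch = self.entries[:self.limit // 2]
        self.entries = self.entries[self.limit // 2:]
        batch.sort(key=lambda e: e["importance"], reverse=True)
        summary = " | ".join(e["text"][:40] for e in batch[:3])
        self.long_term.append(summary)

stm = ShortTermMemory(limit=20)
for i in range(21):
    stm.add(f"entry {i}", importance=i / 21)
```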
All events are logged to memory/logs/aria.log (rotating, 10MB max, 3 backups):
2025-01-15 14:23:01 | INFO | USER [5] | what does my system look like?
2025-01-15 14:23:01 | DEBUG | COT/analyze | {"intent": "system info", ...}
2025-01-15 14:23:02 | INFO | SKILL system-monitor/main.py exit=0 | CPU: 45%, RAM: 3.2GB...
2025-01-15 14:23:03 | INFO | AGENT [5] skills=['system-monitor'] | Your system looks...
2025-01-15 14:23:30 | INFO | T2/system_exploration | Running: df -h (disk space)
2025-01-15 14:23:45 | INFO | T2->USER | Improved skill: system-monitor...
2025-01-15 14:24:00 | INFO | TASK/completed task-1705312981-0 | Free space: 45GB...
Log categories: USER, AGENT, COT/*, T2/*, T2->USER, SKILL, SKILL_ERR, TASK/*, CMD, ERROR, WARN
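A rotating logger with these limits (10 MB per file, 3 backups) can be set up with Python's standard library. This sketch assumes the timestamp/level layout shown above; the category prefix is emitted as part of the message, which may differ from the real `logger.py`:

```python
import logging
import os
from logging.handlers import RotatingFileHandler

def make_logger(path="memory/logs/aria.log"):
    """Central event logger: rotating file, 10 MB max, 3 backups."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    logger = logging.getLogger("aria")
    logger.setLevel(logging.DEBUG)
    handler = RotatingFileHandler(path, maxBytes=10 * 1024 * 1024,
                                  backupCount=3, encoding="utf-8")
    handler.setFormatter(logging.Formatter(
        "%(asctime)s | %(levelname)s | %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S"))
    logger.addHandler(handler)
    return logger

log = make_logger()
log.info("USER [1] | hello")
log.handlers[0].flush()
```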
Each skill is a folder with SKILL.md:
---
name: my-skill
description: When and why to use this skill
---
# Instructions in Markdown
...

Skills can be created three ways:
- Manually: `/create-skill {json}` with name, description, instructions, script_code
- Automatically: Thread 2 creates or improves skills based on user interaction patterns
- Programmatically: `skills_manager.create_skill(name, desc, instructions, scripts)`
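A naive parse of the SKILL.md frontmatter shown above might look like this; a real implementation would likely use a YAML parser, and `parse_skill_md` is an illustrative name:

```python
def parse_skill_md(text: str) -> dict:
    """Split a SKILL.md file into its frontmatter fields
    (name, description between the '---' fences) and the
    Markdown instructions that follow."""
    _, front, body = text.split("---", 2)
    meta = {}
    for line in front.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    meta["instructions"] = body.strip()
    return meta

skill = parse_skill_md(
    "---\n"
    "name: my-skill\n"
    "description: When and why to use this skill\n"
    "---\n"
    "# Instructions in Markdown"
)
```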
ARIA supports Polish (pl) and English (en). All prompts, LLM instructions, UI strings, and command descriptions switch based on language setting.
Set language via:
- Environment variable: `ARIA_LANG=en` (takes priority)
- config.json: `"agent": { "language": "en" }`
- Docker Compose: `ARIA_LANG=en` in the environment section
# Prerequisites: Ollama running on host with a model pulled
ollama pull llama3.2
# Start ARIA (Polish, default)
docker compose up -d
# Start ARIA in English
ARIA_LANG=en docker compose up -d
# Start with a different model
ARIA_MODEL=qwen3:8b docker compose up -d
# Open WebUI
open http://localhost:8080
# View agent logs
docker compose logs -f aria
# Or check the detailed log file: memory/logs/aria.log

# Install Ollama and pull a model
ollama pull llama3.2
# CLI mode
python main.py
# CLI with live Thread 2 output
python main.py --reflection
# WebUI on port 8080
python main.py --web
# WebUI on custom port
python main.py --web --port 3000

| Variable | Default | Description |
|---|---|---|
| `ARIA_LANG` | `pl` | Language: pl or en |
| `ARIA_MODEL` | `codegemma:latest` | Ollama model name |
| `OLLAMA_HOST` | `http://host.docker.internal:11434` | Ollama server URL |
{
"llm": {
"base_url": "http://localhost:11434",
"model": "llama3.2",
"temperature": 0.7,
"max_tokens": 4096,
"context_length": "auto",
"api_type": "ollama",
"timeout": 300
},
"reflection_llm": {
"base_url": "http://localhost:11434",
"model": "llama3.2",
"temperature": 0.9,
"max_tokens": 4096,
"context_length": "auto",
"api_type": "ollama",
"timeout": 180
},
"agent": {
"language": "en",
"reflection_interval": 3,
"stream_responses": true,
"short_term_limit": 20
}
}

Note: timeout values in the config are retained for non-Ollama endpoints (OpenAI-compatible). Ollama communication uses no timeouts — it relies on the structured done flag.
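Loading this config with the environment-variable precedence described earlier can be sketched as follows. Mapping `ARIA_MODEL` to `llm.model` and `OLLAMA_HOST` to `llm.base_url` is an assumption based on the variable table; the demo config file is written only for illustration:

```python
import json
import os

def load_config(path="config.json"):
    """Load config.json, then apply environment overrides:
    ARIA_LANG takes priority over agent.language."""
    with open(path, encoding="utf-8") as f:
        cfg = json.load(f)
    if os.environ.get("ARIA_LANG"):
        cfg["agent"]["language"] = os.environ["ARIA_LANG"]
    if os.environ.get("ARIA_MODEL"):
        cfg["llm"]["model"] = os.environ["ARIA_MODEL"]
    if os.environ.get("OLLAMA_HOST"):
        cfg["llm"]["base_url"] = os.environ["OLLAMA_HOST"]
    return cfg

# Demo: a minimal config written to a scratch file.
with open("demo_config.json", "w", encoding="utf-8") as f:
    json.dump({"llm": {"model": "llama3.2",
                       "base_url": "http://localhost:11434"},
               "agent": {"language": "pl"}}, f)
os.environ["ARIA_LANG"] = "en"
cfg = load_config("demo_config.json")
```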
| Command | Description |
|---|---|
| `/help` | List all commands |
| `/status` | Agent status + Thread 2 info |
| `/memory` | Memory overview |
| `/recall <query>` | Search memory |
| `/skills` | List skills |
| `/skill <n>` | Skill details |
| `/run <skill>` | Run a skill script |
| `/create-skill <json>` | Create a new skill |
| `/tasks` | View pending tasks queue |
| `/thread2 [n]` | Show last n Thread 2 cycles |
| `/reflect` | Force a reflection cycle |
| `/exec <cmd>` | Execute shell command |
| `/python <code>` | Execute Python code |
| `/ls [path]` | List directory |
| `/read <path>` | Read file |
| `/write <path> <content>` | Write to file |
| `/sysinfo` | System information |
| `/compress` | Force memory compression |
| `/selfmodel` | Agent self-model |
| `/models` | List Ollama models |
| `/model <n>` | Switch active model |
| `/pull [model]` | Pull model from Ollama |
| `/ollama` | Ollama connection status |
- Real-time chat with JSON-structured Chain of Thought (expandable)
- Thread 2 live thought stream panel
- Proactive messages from Thread 2 (skill creation/improvement, task completion, system discoveries)
- Status dashboard with memory/skills/interaction/task stats
- Command palette with autocomplete (type `/`)
- Skill browser with run buttons and error counts
- Memory inspector
- Reflection timeline
- SSE-based real-time updates
Architecture inspired by OpenClaw:
- `SKILL.md`-based skill system (AgentSkills format)
- Local command and script execution
- Persistent memory across sessions
- Modular, extensible architecture