Accelerated Knowledge Discovery Core (akd-core)

A human-centric multi-agent system (MAS) framework for scientific discovery — providing base classes, streaming, human-in-the-loop (HITL), guardrails, and out-of-box agents and tools.

Ecosystem

akd-core is the foundation layer that drives the entire AKD ecosystem:

akd-core (this repo) — base classes, streaming infrastructure, HITL, guardrails, and out-of-box agents/tools for scientific discovery
akd-framework — the AKD backend application, built on akd-core
akd-ext — community extensions that use akd-core's base agents and tools to build domain-specific capabilities

akd-core is standalone and pip-installable. Everything downstream inherits its streaming, HITL, and guardrail infrastructure.

Core Philosophy

Human-in-the-loop control — researchers direct the discovery process; AI augments, never replaces
Scientific integrity — deep attribution, evidence validation, and rigorous guardrails
Transparent and reproducible — every workflow is a shareable, inspectable artifact
Open collaboration — community-driven framework for shared scientific advancement

See Design Philosophy for the full set of principles and golden rules.

What You Get

Async-first — all agents and tools implement async def _arun() with full astream() support
Streaming-native — 11 typed event types covering tokens, reasoning, tool calls, and HITL
Type-safe — Pydantic v2 schemas with required docstrings for all inputs and outputs
Composable — tools combine via Composite patterns (search, resolvers, guardrails)
HITL built-in — pause, save state, get human input, resume seamlessly
Guardrails — pluggable safety layer with decorator API
LLM-agnostic — works with any provider via LiteLLM (OpenAI, Anthropic, Ollama, etc.)

from akd.agents import BaseAgent

agent = BaseAgent(config={"model_name": "gpt-4o-mini"})
async for event in agent.astream(input_params):
    match event.event_type:
        case "streaming": print(event.token, end="")
        case "tool_calling": print(f"Calling {event.tool_name}...")
        case "human_input_required": response = input(event.human_prompt)
        case "completed": result = event.output

Streaming

Everything in akd-core is a stream of typed events. Agents emit StreamEvent objects as they execute:

Event	Description
`STARTING`	Agent/tool begins execution
`RUNNING`	Progress update
`STREAMING`	Raw LLM tokens as they arrive
`THINKING`	Reasoning tokens (Claude extended thinking, o1/o3)
`PARTIAL`	Partial structured output as it streams
`TOOL_CALLING`	Agent invokes a tool
`TOOL_RESULT`	Tool returns its result
`HUMAN_INPUT_REQUIRED`	Agent needs human input — execution pauses
`HUMAN_RESPONSE`	Resumed with human input
`COMPLETED`	Execution finished successfully
`FAILED`	Execution failed with error details

Each event carries typed data (e.g., CompletedEventData[T] includes the output, FailedEventData includes the error) and a run_context for execution state.

Human-in-the-Loop

HITL is a first-class concept, not an afterthought. The HumanTool enables any agent to pause execution, request human input, and resume:

Agent calls HumanTool during its tool loop
Framework emits HUMAN_INPUT_REQUIRED event with the question and full message history
Caller saves state and collects human response
Resume with RunContext(messages=saved_history, human_response=HumanResponse(...))
Agent continues exactly where it left off

This works across any transport — REST APIs, WebSockets, CLI — because the pause/resume is state-based, not connection-based.

Out-of-Box Agents

Category	Agent	Description
Utility	`RelevancyAgent`	Binary relevance classification
	`MultiRubricRelevancyAgent`	Multi-dimensional relevance scoring
Base	`BaseAgent`	Core agent with streaming, tool calling, HITL, message trimming
	`LiteLLMInstructorBaseAgent`	Structured Pydantic output via Instructor

Domain-specific agents live in downstream packages and can be registered at runtime via AgentRegistry.register_agent(YourAgent).

Out-of-Box Tools

Category	Tool	Description
Search	`SearxNGSearchTool`	Web search via SearxNG
	`SerperSearchTool`	Web search via Serper API
	`SemanticScholarSearchTool`	Academic paper search
	`CompositeSearchTool`	Multi-source search (combines backends)
	`SearchPipeline`	Full pipeline: search + resolve + scrape
Scraping	`WebScraper`	Web content extraction
	`PDFScraper`	PDF content extraction
	`DoclingScraper`	Advanced document parsing (tables, structure)
Resolvers	`CrossRefDoiResolver`	DOI resolution via CrossRef
	`ArxivResolver`	arXiv paper lookup
	`ADSResolver`	NASA ADS paper lookup
	`UnpaywallResolver`	Open access paper lookup
	`CompositeResolver`	Chain multiple resolvers
Evaluation	`RelevancyTool`	Content relevance scoring
	`RerankerTool`	Result reranking
	`SourceValidator`	Source credibility assessment
Special	`HumanTool`	Human-in-the-loop interaction
	`OutputTool`	Structured output capture

Guardrails

akd-core includes a pluggable guardrail system with a unified GuardrailProtocol interface:

Providers:

GraniteGuardianTool — IBM Granite Guardian model (local or cloud)
RiskAgent — LLM-based risk assessment with configurable criteria
CompositeGuardrail — chain multiple providers (AND, OR, CONSENSUS modes)

Risk categories: Granite built-in categories, Atlas dynamic taxonomy, and science-specific risks (misinformation, bias, attribution).

from akd.guardrails import guardrail
from akd.guardrails.providers import GraniteGuardianTool

@guardrail(input_guardrail=GraniteGuardianTool(), fail_on_input_risk=True)
class SafeAgent(BaseAgent):
    ...

Workflow Planner

The planner converts natural language research goals into executable multi-agent workflows:

from akd.planner.llm_planner import create_planner

planner = await create_planner()
session = await planner.plan_workflow("Find papers on AlphaFold and identify research gaps")
response = await session.start()

while not response.ready_to_generate:
    user_input = input(f"{response.message}\nYour response: ")
    response = await session.respond(user_input)

workflow = await session.generate_workflow()

The planner uses an AgentRegistry with auto-discovery, field mapping between agent inputs/outputs, and generates executable WorkflowFormat definitions.

Extending akd-core

akd-core is designed to be extended. Every agent and tool follows a consistent 4-part pattern:

InputSchema — Pydantic model defining what goes in (requires docstring)
OutputSchema — Pydantic model defining what comes out (requires docstring)
Config — BaseAgentConfig or BaseToolConfig with settings
Implementation — subclass BaseAgent[In, Out] or BaseTool[In, Out], implement _arun()

from akd._base import InputSchema, OutputSchema
from akd.agents._base import AKDAgent, BaseAgentConfig

class MyInput(InputSchema):
    """What goes in."""
    query: str

class MyOutput(OutputSchema):
    """What comes out."""
    answer: str

class MyAgent(AKDAgent[MyInput, MyOutput]):
    input_schema = MyInput
    output_schema = MyOutput

    async def _arun(self, params: MyInput, run_context=None, **kwargs) -> MyOutput:
        ...  # your logic here

AKDAgent is the default batteries-included agent — it comes with LiteLLM + Instructor, ReAct tool calling, HITL, streaming, and output routing. Your agent inherits all of it. See CONTRIBUTING.md for the full guide with tool examples, guardrail integration, and planner registration.

This is exactly how akd-ext builds on akd-core — importing base classes and creating domain-specific agents and tools.

Quick Start

Prerequisites

Python 3.12+
uv package manager

Installation

As a dependency (for akd-ext, akd-framework, or your own project):

uv pip install "akd @ git+https://github.com/NASA-IMPACT/accelerated-discovery.git@develop"

Or add to your pyproject.toml:

dependencies = [
    "akd @ git+https://github.com/NASA-IMPACT/accelerated-discovery.git@develop",
]

Optional extras: pull in extra dependencies for specific features.

Extra	What it pulls in	Install when you...
`serializer`	`langgraph`	use `AKDSerializer` as a langgraph checkpoint serde (e.g. `AsyncPostgresSaver(serde=AKDSerializer())`)
`ml`	`pandas`, `sentence-transformers`, `docling`, `deepeval`	need ML-backed rerankers, scrapers, or eval tools
`dev`	`pytest`, `pytest-asyncio`, `pytest-cov`, `pytest-xdist`, `pre-commit`, `memray`, `scalene`	run the test suite or hack on akd itself
`local`	`marimo`, `jupyter`, `ipykernel`, `ipywidgets`	run the marimo notebooks under `notebooks/`

# As a dependency, with an extra:
uv pip install "akd[serializer] @ git+https://github.com/NASA-IMPACT/accelerated-discovery.git@develop"

# In your pyproject.toml:
dependencies = [
    "akd[serializer] @ git+https://github.com/NASA-IMPACT/accelerated-discovery.git@develop",
]

For local development:

# Create and activate virtual environment
uv venv --python 3.12
source .venv/bin/activate

# Install core dependencies
uv sync

# With development tooling (pytest, pre-commit, profilers)
uv sync --extra dev

# With notebooks (marimo, jupyter)
uv sync --extra dev --extra local

# With ML extras (pandas, sentence-transformers, docling, deepeval)
uv sync --extra ml

# With the langgraph checkpoint serde (AKDSerializer)
uv sync --extra serializer

# Combine extras freely, e.g. full dev setup:
uv sync --extra dev --extra local --extra ml --extra serializer

# Setup environment variables
cp .env.example .env
# Edit .env with your API keys

Usage

See the notebooks directory for examples.

Project Structure

akd/
  _base/         # AbstractBase, schemas, streaming, HITL, tool calling, sessions
  agents/        # Out-of-box agents (search, analysis, utility)
  tools/         # Out-of-box tools (search, scraping, resolvers, evaluation)
  guardrails/    # GuardrailProtocol, providers, risk categories, decorators
  planner/       # LLM planner, agent registry, workflow builder
  configs/       # Project configuration and prompts

docs/            # Design philosophy and specs
notebooks/       # Usage examples (Jupyter, Marimo)
scripts/         # Utility scripts and demos
tests/           # Test suite (mirrors akd/ structure)

Contributing

See CONTRIBUTING.md for setup, style guide, branch conventions, and how to create agents and tools.

License

Apache License 2.0 — see LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 1,277 Commits
.github		.github
akd		akd
config		config
docs		docs
images		images
notebooks		notebooks
scripts		scripts
tests		tests
.env.example		.env.example
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
CLAUDE.md		CLAUDE.md
CONTRIBUTING.md		CONTRIBUTING.md
INTEGRATION_TESTING.md		INTEGRATION_TESTING.md
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
README.md		README.md
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Accelerated Knowledge Discovery Core (akd-core)

Ecosystem

Core Philosophy

What You Get

Streaming

Human-in-the-Loop

Out-of-Box Agents

Out-of-Box Tools

Guardrails

Workflow Planner

Extending akd-core

Quick Start

Prerequisites

Installation

Usage

Project Structure

Contributing

License

About

Uh oh!

Releases 3

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Accelerated Knowledge Discovery Core (akd-core)

Ecosystem

Core Philosophy

What You Get

Streaming

Human-in-the-Loop

Out-of-Box Agents

Out-of-Box Tools

Guardrails

Workflow Planner

Extending akd-core

Quick Start

Prerequisites

Installation

Usage

Project Structure

Contributing

License

About

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 3

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages