Beta — This project is under active development. Expect breaking changes between releases.
A threat intelligence automation platform that ingests URLs, extracts structured intelligence using an LLM, allows analyst review, and publishes the result as a MISP event.
- Overview
- Architecture
- Getting started
- Development (without Docker)
- Adding a custom LLM provider
- Design decisions
Analysts submit a URL (threat report, blog post, PDF). The platform fetches the content, runs it through an LLM to extract structured threat intelligence, and presents the result in an editor for review. Once satisfied, the analyst pushes the event directly to a MISP instance.
URL submitted → Fetch content → LLM extraction → Analyst review → Push to MISP
| Category | Detail |
|---|---|
| Summary | Short description of the threat |
| Threat actors | Named groups or individuals |
| Target sectors | Industries or sectors targeted |
| Target countries | Countries targeted |
| IoCs | IPs, domains, URLs, MD5 / SHA1 / SHA256 hashes (with to_ids flag) |
| TTPs | MITRE ATT&CK technique ID, name, and context |
| Detection rules | Engine (Sigma, KQL, …) and query |
| Threat hunting hypotheses | Title, hypothesis, approach, and visibility |
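As a rough illustration, an extraction result might look like the sketch below. Field names and sample values are hypothetical; the authoritative schema lives in the backend (`backend/app/services/llm/`).

```python
# Illustrative shape of an extraction result; field names and values are
# hypothetical and only mirror the table above, not the backend schema.
extracted = {
    "summary": "Phishing campaign delivering a loader via ISO attachments",
    "threat_actors": ["ExampleGroup"],
    "target_sectors": ["Finance"],
    "target_countries": ["DE", "FR"],
    "iocs": [
        {"type": "domain", "value": "malicious.example.com", "to_ids": True},
        {"type": "sha256", "value": "aa11...ff99", "to_ids": True},
    ],
    "ttps": [
        {"id": "T1566.001", "name": "Spearphishing Attachment", "context": "ISO lure"},
    ],
    "detection_rules": [
        {"engine": "Sigma", "query": "title: Suspicious ISO Mount ..."},
    ],
    "threat_hunting_hypotheses": [
        {
            "title": "ISO-delivered loaders",
            "hypothesis": "Endpoints that mount ISO email attachments may execute a loader",
            "approach": "Hunt for ISO mounts followed by script interpreter launches",
            "visibility": "EDR process and file events",
        },
    ],
}
```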
| Intelligence | MISP representation |
|---|---|
| IoCs | Typed attributes (ip-dst, domain, url, md5, sha1, sha256) |
| Threat actors | threat-actor attributes |
| Target sectors | target-org attributes |
| Target countries | target-location attributes |
| TTPs | Galaxy tags (misp-galaxy:mitre-attack-pattern) + attack-pattern objects |
| Detection rules | text attributes with engine comment |
| Threat hunting hypotheses | Event reports (markdown, tagged Threat Hunting Hypothesis) |
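For orientation, the sketch below shows how such a mapping could be expressed with PyMISP. The helper, the field names on `extracted`, and the use of PyMISP here are illustrative assumptions, not the project's actual implementation.

```python
# Hypothetical sketch of the mapping above using PyMISP; not the actual implementation.
from pymisp import MISPEvent

def build_misp_event(extracted: dict) -> MISPEvent:
    event = MISPEvent()
    event.info = extracted["summary"]

    for ioc in extracted["iocs"]:
        # Typed attributes: ip-dst, domain, url, md5, sha1, sha256
        event.add_attribute(ioc["type"], ioc["value"], to_ids=ioc["to_ids"])

    for actor in extracted["threat_actors"]:
        event.add_attribute("threat-actor", actor)

    for sector in extracted["target_sectors"]:
        event.add_attribute("target-org", sector)

    for country in extracted["target_countries"]:
        event.add_attribute("target-location", country)

    for ttp in extracted["ttps"]:
        # ATT&CK galaxy cluster tag, e.g. misp-galaxy:mitre-attack-pattern="Phishing - T1566"
        event.add_tag(f'misp-galaxy:mitre-attack-pattern="{ttp["name"]} - {ttp["id"]}"')

    return event
```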
```
┌─────────────┐     ┌─────────────┐     ┌──────────────┐
│  Frontend   │────▶│   Backend   │────▶│  PostgreSQL  │
│ React/Vite  │     │   FastAPI   │     └──────────────┘
│   (nginx)   │     │  (Uvicorn)  │     ┌──────────────┐
└─────────────┘     └──────┬──────┘────▶│    Redis     │
                           │            └──────────────┘
              ┌────────────▼─────────────────────────┐
              │            Celery Workers            │
              │  ┌─────────┐  ┌─────────┐  ┌──────┐  │
              │  │  fetch  │  │ extract │  │ misp │  │
              │  └────┬────┘  └────┬────┘  └──┬───┘  │
              └───────┼────────────┼──────────┼──────┘
                      │            │          │
                 fetch page   LLM provider  MISP
                 content      (pluggable)   galaxy
                                   │        resolution
                                   ▼          │
                              ┌──────────┐    ▼
                              │ Azure AI │ ┌─────────┐
                              │ Foundry  │ │  MISP   │
                              └──────────┘ └─────────┘
```
Processing pipeline
URL submitted → [fetch] → [extract] → Analyst review → [misp] → MISP event published
Services
| Service | Role |
|---|---|
| `frontend` | React SPA served by nginx, proxies `/api` to the backend |
| `backend` | FastAPI REST API, JWT auth, business logic |
| `celery` | Background workers consuming the `fetch`, `extract`, and `misp` queues |
| `postgres` | Primary data store for URLs, raw content, and extracted intelligence |
| `redis` | Celery broker and result backend |
Celery queues
| Queue | Task | Description |
|---|---|---|
| `fetch` | `fetch_url_task` | Downloads URL content (HTML, PDF) and stores raw text |
| `extract` | `extract_llm_task` | Runs LLM extraction; falls back to split extraction on token limit |
| `misp` | `push_to_misp_task` | Resolves ATT&CK galaxy tags and publishes the event to MISP |
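As an illustration of how a submission might travel through the first two queues, the sketch below chains the fetch and extract tasks. The module path and task arguments are assumptions; only the task and queue names come from the table above.

```python
# Hypothetical sketch of dispatching the first two pipeline stages onto their queues.
from celery import chain

from app.workers.tasks import extract_llm_task, fetch_url_task  # assumed module path

def start_processing(url_id: int) -> None:
    """Fetch the submitted URL, then run LLM extraction on the stored content."""
    chain(
        fetch_url_task.s(url_id).set(queue="fetch"),
        extract_llm_task.s().set(queue="extract"),
    ).apply_async()
```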
LLM abstraction
The LLM provider is pluggable via `LLM_PROVIDER` in the environment. The active provider is selected at runtime by the factory in `backend/app/services/llm/factory.py`. See Adding a custom LLM provider for implementation details.
- Docker and Docker Compose
- A running MISP instance with the MITRE ATT&CK galaxy imported
- An LLM provider (default: Azure AI Foundry with Mistral Small 2503)
Copy the example and fill in the values:
```bash
cp backend/.env.example backend/.env
```

All required values are listed in `.env.example` with generation instructions where applicable. Key values:
| Variable | Description | How to generate |
|---|---|---|
| `POSTGRES_PASSWORD` | Database password | Choose a strong password |
| `SECRET_KEY` | Application secret | `python -c "import secrets; print(secrets.token_hex(32))"` |
| `JWT_SECRET_KEY` | JWT signing key | `python -c "import secrets; print(secrets.token_hex(32))"` |
| `MISP_TOKEN_ENCRYPTION_KEY` | Fernet key for encrypting MISP tokens at rest | `python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"` |
| `AZURE_API_KEY` | Azure AI Foundry API key | Azure portal |
| `AZURE_INFERENCE_ENDPOINT` | Azure AI Foundry endpoint URL | Azure portal |
| `MISP_URL` | Base URL of your MISP instance | e.g. `https://misp.example.com` |
| `CORS_ORIGINS` | Comma-separated allowed origins | e.g. `https://your-host` |
```bash
docker compose up --build -d
```

The frontend is available at https://your-host (port 443). HTTP traffic on port 80 is redirected to HTTPS automatically. The self-signed TLS certificate will trigger a browser warning on first visit — add a browser exception to proceed.
Once the containers are running, create the initial admin account:
```bash
docker compose exec backend python -m app.scripts.create_admin <username> <password>
```

Additional users can be created and managed through the User Management page in the UI.
Each user must configure their personal MISP API token before they can push events:
- Log in and click your username in the top bar
- Enter your MISP API token and click Save token
Tokens are encrypted at rest and never exposed in API responses. Events pushed to MISP are attributed to the token owner, preserving audit trail integrity.
```bash
cd backend
python -m venv .venv && source .venv/bin/activate   # or .venv\Scripts\activate on Windows
pip install -r requirements.txt
cp .env.example .env                                 # fill in values
uvicorn app.main:app --reload
```

Start the Celery worker separately:

```bash
celery -A app.workers.celery_app worker --loglevel=info -Q fetch,extract,misp
```

Frontend (in a separate terminal):

```bash
cd frontend
npm install
npm run dev
```

Karasu uses an abstraction layer that makes it straightforward to swap in a different LLM provider without touching the rest of the application.
Create a new file in `backend/app/services/llm/` and implement `BaseLLMService`:
```python
from app.services.llm.base import BaseLLMService
from app.services.llm.schemas import LLMRequest, LLMResponse, LLMTokenLimitExceeded


class MyLLMService(BaseLLMService):
    async def extract(self, request: LLMRequest) -> LLMResponse:
        # Make a single request to your LLM using MISP_EXTRACTION_PROMPT
        # Raise LLMTokenLimitExceeded if the model hits its output limit
        # Return an LLMResponse with extracted_data, token counts, and model name
        ...

    async def extract_split(self, request: LLMRequest) -> LLMResponse:
        # Make two parallel requests using MISP_IOC_TTP_PROMPT and MISP_ANALYSIS_PROMPT
        # Merge the results and return a single LLMResponse
        ...
```

The prompts are defined in `backend/app/services/llm/prompts.py`. The extracted JSON must conform to the schema the rest of the pipeline expects — use the existing `AzureFoundryLLMService` implementation as a reference.
`LLMTokenLimitExceeded` must be raised (not caught silently) when the model's output is truncated — this is what triggers the split extraction fallback in the worker.
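Sketched out, the worker-side fallback looks roughly like this; the surrounding task code is illustrative, while the factory, schemas, and methods are the ones described above.

```python
# Rough sketch of the worker-side fallback; the wrapping function is illustrative.
from app.services.llm.factory import get_llm_client
from app.services.llm.schemas import LLMRequest, LLMResponse, LLMTokenLimitExceeded

async def run_extraction(request: LLMRequest) -> LLMResponse:
    llm = get_llm_client()
    try:
        # First attempt: extract everything in a single request.
        return await llm.extract(request)
    except LLMTokenLimitExceeded:
        # Output was truncated: retry with the two-request split extraction.
        return await llm.extract_split(request)
```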
In `backend/app/services/llm/factory.py`, add your provider:

```python
# Existing imports in factory.py (settings, BaseLLMService, AzureFoundryLLMService)
# are omitted here.
from app.services.llm.my_provider import MyLLMService


def get_llm_client() -> BaseLLMService:
    if settings.LLM_PROVIDER == "azure_foundry":
        return AzureFoundryLLMService()
    if settings.LLM_PROVIDER == "my_provider":
        return MyLLMService()
    raise ValueError(f"Unsupported LLM provider: {settings.LLM_PROVIDER}")
```

Finally, select your provider in the environment:

```
LLM_PROVIDER=my_provider
```

LLM extraction is not treated as ground truth. After extraction completes, the result is presented to the analyst in an editor where every field — IoCs, TTPs, detection rules, threat hunting hypotheses — can be inspected, corrected, or removed before anything is sent to MISP. The push to MISP is always a deliberate, manual action. This keeps a human in the loop and prevents LLM hallucinations or misclassifications from polluting the threat intelligence platform automatically.
Each analyst authenticates to MISP using their own personal API token rather than a shared service account. This preserves attribution in MISP's audit log — events pushed by different analysts are recorded under their respective accounts. Tokens are stored encrypted at rest using Fernet symmetric encryption and are never exposed in API responses.
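A minimal sketch of that encryption-at-rest approach, assuming the Fernet key from `MISP_TOKEN_ENCRYPTION_KEY`; the helper names are illustrative, not the actual code.

```python
# Illustrative only: encrypt/decrypt a per-user MISP API token with Fernet.
import os
from cryptography.fernet import Fernet

fernet = Fernet(os.environ["MISP_TOKEN_ENCRYPTION_KEY"])

def encrypt_misp_token(token: str) -> bytes:
    # Stored in the database; the plaintext token is never returned by the API.
    return fernet.encrypt(token.encode())

def decrypt_misp_token(stored: bytes) -> str:
    # Decrypted only when the analyst pushes an event to MISP on their own behalf.
    return fernet.decrypt(stored).decode()
```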
LLMs have finite output limits. Long or verbose documents can cause the model to truncate its response mid-JSON, producing unusable output. To handle this, Karasu uses a two-request fallback strategy: the first attempt extracts everything in a single request capped at 10,000 output tokens. If the model hits this limit, the task automatically splits the work into two parallel requests — one for IoCs and TTPs, one for detection rules and threat hunting hypotheses — each with its own 10,000-token cap. Each extraction attempt can therefore consume up to 30,000 output tokens. Combined with up to 3 retries for transient failures (four attempts in total), a single document can consume up to 120,000 output tokens in the worst case. This avoids silently incomplete extractions without requiring the analyst to resubmit.
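The worst-case arithmetic, spelled out (illustrative only):

```python
# Worst-case output-token budget per document, per the limits described above.
single_request_cap = 10_000                              # first, single-request attempt
split_fallback_cap = 2 * 10_000                          # two parallel split requests
per_attempt = single_request_cap + split_fallback_cap    # 30,000 if the fallback triggers
attempts = 1 + 3                                         # initial attempt plus up to 3 retries
worst_case = per_attempt * attempts                      # 120,000 output tokens
```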