Data export tool for ByteFreezer parquet files. Query with DuckDB, export to Elasticsearch, Splunk, webhooks, or any custom destination.
ByteFreezer stores all ingested data as Parquet files in S3/MinIO. The Connector reads those files using DuckDB and exports filtered subsets to external systems. Instead of sending everything to your SIEM, export only the 5% you need for active investigation.
```
packer --> parquet (S3/MinIO) --> [CONNECTOR] --> Elasticsearch / Splunk / webhook
```
Web UI: http://localhost:8090 — explore datasets, write queries, preview results, configure export destinations.
This is not a black-box product — it's a working codebase you own and modify. The connector ships with three destinations (stdout, Elasticsearch, webhook) and a simple Destination interface. Need Splunk HEC, Snowflake, Kafka, or a custom internal API? Point Claude Code at this repo with the ByteFreezer MCP connected, and it has everything it needs: the destination interface pattern, MCP tools to discover your datasets and schema, and CLAUDE.md with step-by-step instructions.
ByteFreezer includes two tools for querying data. Use either one or both:
| | Query Service (port 8000) | Connector (port 8090) |
|---|---|---|
| Purpose | Interactive analysis | Data export to external systems |
| AI/NL queries | Yes (Anthropic, OpenAI, Ollama) | No |
| Export to SIEM | No | Yes (Elasticsearch, Splunk, webhook) |
| Batch/watch modes | No | Yes — scheduled, cursor-tracked export |
| Best for | Ad-hoc investigation | Continuous SIEM feed, alerting pipelines |
| Mode | Command | Description |
|---|---|---|
| interactive | `--mode interactive` (default) | Web UI at :8090 for exploring datasets and testing queries |
| batch | `--mode batch` | Run configured query once, send to destination, exit |
| watch | `--mode watch` | Poll for new data on a timer, continuously export |
The connector is included in the ByteFreezer on-prem Docker Compose stack. After deploying the full stack, the connector web UI is available at:
http://<your-host>:8090
To run a one-shot query from the command line:

```bash
docker exec -w /app bytefreezer-connector ./bytefreezer-connector --mode batch
```

To build from source:

```bash
go build -o bytefreezer-connector .
```

To run the standalone Docker image:

```bash
docker pull ghcr.io/bytefreezer/bytefreezer-connector:latest
docker run -p 8090:8090 -v ./config.yaml:/app/config.yaml:ro ghcr.io/bytefreezer/bytefreezer-connector:latest
```

Edit config.yaml with your control API credentials:
```yaml
control:
  url: "https://api.bytefreezer.com"
  api_key: "your-service-key"
  account_id: "your-account-id"
```

For batch/watch modes, also set:

```yaml
query:
  tenant_id: "your-tenant-id"
  dataset_id: "your-dataset-id"
  sql: >
    SELECT timestamp, source_ip, message
    FROM read_parquet('PARQUET_PATH', hive_partitioning=true, union_by_name=true)
    WHERE severity >= 4

destination:
  type: elasticsearch
  config:
    url: "http://localhost:9200"
    index: "security-alerts"
```

```bash
# Interactive mode — open http://localhost:8090
./bytefreezer-connector --config config.yaml

# Batch export to stdout
./bytefreezer-connector --config config.yaml --mode batch

# Continuous watch mode
./bytefreezer-connector --config config.yaml --mode watch

# Re-export from beginning (reset cursor)
./bytefreezer-connector --config config.yaml --mode batch --reset-cursor
```

Use `PARQUET_PATH` as a placeholder. The connector replaces it with the S3 glob path for your dataset.
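The placeholder mechanism is plain string substitution. A minimal sketch of the idea (the `buildQuery` helper and the example glob path are illustrative assumptions, not the connector's actual code):

```go
package main

import (
	"fmt"
	"strings"
)

// buildQuery swaps the PARQUET_PATH placeholder for the dataset's
// real S3 glob before the SQL is handed to DuckDB.
func buildQuery(sql, parquetGlob string) string {
	return strings.ReplaceAll(sql, "PARQUET_PATH", parquetGlob)
}

func main() {
	sql := "SELECT * FROM read_parquet('PARQUET_PATH', hive_partitioning=true)"
	// Hypothetical glob for one dataset's Parquet files.
	glob := "s3://bytefreezer/tenant-a/dataset-1/**/*.parquet"
	fmt.Println(buildQuery(sql, glob))
}
```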
```sql
-- All records
SELECT * FROM read_parquet('PARQUET_PATH', hive_partitioning=true, union_by_name=true)
LIMIT 100

-- Filter by time partition
SELECT * FROM read_parquet('PARQUET_PATH', hive_partitioning=true, union_by_name=true)
WHERE year = 2026 AND month = 3

-- Specific fields only
SELECT timestamp, source_ip, message
FROM read_parquet('PARQUET_PATH', hive_partitioning=true, union_by_name=true)
WHERE severity >= 4
```

| Destination | Config Key | Description |
|---|---|---|
| `stdout` | — | JSON lines to stdout |
| `elasticsearch` | `url`, `index`, `username`, `password` | Elasticsearch bulk API |
| `webhook` | `url`, `method`, `headers` | HTTP POST to any endpoint |
Create `destinations/your_dest.go`:

```go
package destinations

import (
	"context"

	"github.com/bytefreezer/connector/connector"
)

func init() {
	connector.RegisterDestination("your_dest", func() connector.Destination {
		return &YourDest{}
	})
}

type YourDest struct{}

func (d *YourDest) Name() string                                          { return "your_dest" }
func (d *YourDest) Init(config map[string]interface{}) error              { return nil }
func (d *YourDest) Send(ctx context.Context, batch connector.Batch) error { return nil }
func (d *YourDest) Close() error                                          { return nil }
```

The `init()` function auto-registers the destination. Set `destination.type: "your_dest"` in config.
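Behind `RegisterDestination` is a simple name-to-factory registry. A minimal sketch of the pattern (an assumed shape, not the actual connector/destination.go; the interface here is trimmed to one method for brevity):

```go
package main

import "fmt"

// Destination is a trimmed stand-in for the connector's interface.
type Destination interface {
	Name() string
}

// registry maps destination.type values to constructor functions.
var registry = map[string]func() Destination{}

// RegisterDestination is typically called from each destination's init().
func RegisterDestination(name string, factory func() Destination) {
	registry[name] = factory
}

// NewDestination resolves a configured type to a fresh instance.
func NewDestination(typ string) (Destination, error) {
	factory, ok := registry[typ]
	if !ok {
		return nil, fmt.Errorf("unknown destination type %q", typ)
	}
	return factory(), nil
}

type Stdout struct{}

func (s *Stdout) Name() string { return "stdout" }

func main() {
	RegisterDestination("stdout", func() Destination { return &Stdout{} })
	d, err := NewDestination("stdout")
	fmt.Println(d.Name(), err)
}
```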
```
├── main.go              # Entry point, HTTP routes, mode switching
├── ui.go                # Embedded interactive web UI
├── config/config.go     # Config struct + koanf loader
├── connector/
│   ├── connector.go       # DuckDB engine, S3 config, query execution
│   ├── control_client.go  # Control API client (S3 creds, health reporting)
│   ├── cursor.go          # Cursor persistence (JSON file)
│   └── destination.go     # Destination interface + registry
├── destinations/
│   ├── stdout.go          # JSON lines to stdout
│   ├── elasticsearch.go   # Elasticsearch bulk API
│   └── webhook.go         # Generic HTTP POST
├── config.yaml          # Example configuration
├── Dockerfile           # Docker image (debian:bookworm-slim)
└── CLAUDE.md            # Claude Code instructions
```
In watch and interactive modes, the connector registers with the ByteFreezer control plane as bytefreezer-connector and reports health every 30 seconds. It appears on the Health page in the UI alongside proxy, receiver, piper, packer, and query.
- Connector docs — full documentation
- CLAUDE.md — instructions for Claude Code + MCP tools reference