Slack-driven job orchestration for a Windows workstation. Messages and /queue commands in a designated Slack channel are routed to specialized workers via BullMQ priority queues backed by Redis.
```
Slack Channel (#claude-code-cli)
  |
Controller (Socket Mode - WebSocket, no ngrok)
  |--- Router (keyword regex + LM Studio LLM fallback)
  |--- Queue Manager (BullMQ / Redis)
  |
Workers (NSSM Windows services)
  |--- local-worker (LM Studio chat completions)
  |--- code-worker (Claude Code CLI headless)
  |--- research-worker (stub - future implementation)
  |
Optional: Qdrant (job summary vector memory)
```
Key features: priority queuing (P1/P2/P3), interactive Slack controls (buttons for cancel/retry/promote/pause/resume), event-level idempotency via Redis SET NX, auto-restart via NSSM, graceful degradation when optional services are down.
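The event-level idempotency mentioned above can be sketched as follows. This is a hedged illustration, not the project's actual code (which lives in `controller/src/util/idempotency.ts` and talks to Redis): an in-memory `Map` stands in for Redis `SET ... NX EX`, and the function name is hypothetical.

```typescript
// Sketch of event-level idempotency (hypothetical; the real code uses Redis).
// Redis equivalent: SET idemp:<eventId> 1 NX EX 86400 -- the first writer wins,
// and later duplicates see the key already set and are dropped.
const seen = new Map<string, number>(); // eventId -> expiry timestamp (ms)
const TTL_MS = 86_400_000; // 24 hours, matching EX 86400

function claimEvent(eventId: string, now: number = Date.now()): boolean {
  const expiry = seen.get(eventId);
  if (expiry !== undefined && expiry > now) return false; // duplicate -> skip
  seen.set(eventId, now + TTL_MS); // first sighting -> process the event
  return true;
}
```

The same Slack event delivered twice (Slack retries events on slow acks) claims the key once and is silently dropped the second time.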
| Prerequisite | Version | Purpose | Install |
|---|---|---|---|
| Node.js | >= 20 | Runtime for controller and workers | nodejs.org |
| Docker Desktop | latest | Redis (required), Qdrant (optional) | docker.com |
| Chocolatey | latest | Package manager for NSSM | chocolatey.org/install |
| NSSM | latest | Windows service management | choco install nssm -y |
| LM Studio | latest | Local LLM inference + embeddings | lmstudio.ai |
| Claude Code CLI | latest | Headless code execution for code-worker | npm install -g @anthropic-ai/claude-code |
LM Studio requirements:
- A chat model must be loaded for local-worker job processing and LLM routing fallback.
- An embedding model (e.g., `nomic-embed-text-v1.5`) must be loaded for Qdrant vector memory. This is optional -- the system works without it.
- Go to api.slack.com/apps and click Create New App > From scratch.
- Name it whatever you like (e.g., "AgentOps") and select your workspace.
Navigate to OAuth & Permissions and add these Bot Token Scopes:
| Scope | Purpose |
|---|---|
| chat:write | Post messages and job receipts |
| channels:history | Read channel messages for message-based commands |
| channels:read | List channels |
| commands | Register and handle /queue slash command |
Navigate to Event Subscriptions:
- Toggle Enable Events to On.
- Under Subscribe to bot events, add `message.channels`.
- Save changes.
Request URL is not needed for Socket Mode. Leave it blank or enter any placeholder.
Navigate to Slash Commands and click Create New Command:
| Field | Value |
|---|---|
| Command | /queue |
| Request URL | Not needed for Socket Mode (enter any placeholder like https://localhost) |
| Short Description | Manage the job queue |
| Usage Hint | status \| ... |
Navigate to Interactivity & Shortcuts:
- Toggle Interactivity to On.
- Request URL: not needed for Socket Mode (enter any placeholder like https://localhost).
- Save changes.
Navigate to Settings > Socket Mode:
- Toggle Enable Socket Mode to On.
- When prompted, generate an app-level token with the `connections:write` scope.
- Copy this token -- it starts with `xapp-` and becomes your `SLACK_APP_TOKEN`.
Navigate to OAuth & Permissions and click Install to Workspace:
- Authorize the app.
- Copy the Bot User OAuth Token (`xoxb-...`) -- this is your `SLACK_BOT_TOKEN`.
- Navigate to Basic Information and copy the Signing Secret -- this is your `SLACK_SIGNING_SECRET`.
In Slack, go to your target channel (e.g., #claude-code-cli) and run:
/invite @YourBotName
Right-click the channel name > View channel details > scroll to the bottom and copy the Channel ID (starts with C).
Copy .env.example to .env and fill in real values:
```
cp .env.example .env
```

| Variable | Required | Default | Description |
|---|---|---|---|
| SLACK_BOT_TOKEN | Yes | - | Bot User OAuth Token from step 7 (xoxb-...) |
| SLACK_SIGNING_SECRET | Yes | - | Signing Secret from the Basic Information page |
| SLACK_CHANNEL_ID | Yes | - | Target channel ID from step 9 (e.g., C0ABKSQ3TKQ) |
| SLACK_APP_TOKEN | Yes | - | App-level token from step 6 (xapp-...) |
| REDIS_HOST | No | localhost | Docker Redis host |
| REDIS_PORT | No | 6379 | Docker Redis port |
| PORT | No | 3000 | Controller HTTP port (unused in Socket Mode, kept for config schema) |
| LM_STUDIO_URL | No | http://localhost:1234/v1 | LM Studio OpenAI-compatible API endpoint |
| QDRANT_URL | No | http://localhost:6333 | Qdrant REST API endpoint (optional) |
| NODE_ENV | No | development | development, production, or test |
Example .env:
```
SLACK_BOT_TOKEN=xoxb-YOUR-BOT-TOKEN-HERE
SLACK_SIGNING_SECRET=your-signing-secret-here
SLACK_CHANNEL_ID=C0ABKSQ3TKQ
SLACK_APP_TOKEN=xapp-YOUR-APP-TOKEN-HERE
REDIS_HOST=localhost
REDIS_PORT=6379
PORT=3000
LM_STUDIO_URL=http://localhost:1234/v1
QDRANT_URL=http://localhost:6333
NODE_ENV=development
```

All environment variables are validated at startup via a Zod schema (shared/src/types/config.ts). The controller exits with a clear error message if required values are missing.
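The shape of that startup check can be sketched without the Zod dependency. This is a hedged, dependency-free illustration of the required/default split shown in the table above -- the real schema lives in `shared/src/types/config.ts` and the function name here is hypothetical.

```typescript
// Dependency-free sketch of startup env validation (the real project uses a
// Zod schema). Required/default values mirror the table above.
const REQUIRED = [
  'SLACK_BOT_TOKEN', 'SLACK_SIGNING_SECRET', 'SLACK_CHANNEL_ID', 'SLACK_APP_TOKEN',
] as const;
const DEFAULTS: Record<string, string> = {
  REDIS_HOST: 'localhost',
  REDIS_PORT: '6379',
  PORT: '3000',
  LM_STUDIO_URL: 'http://localhost:1234/v1',
  QDRANT_URL: 'http://localhost:6333',
  NODE_ENV: 'development',
};

function loadConfig(env: Record<string, string | undefined>): Record<string, string> {
  const missing = REQUIRED.filter((k) => !env[k]);
  if (missing.length > 0) {
    // Fail fast with a clear message rather than starting half-configured.
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }
  const config: Record<string, string> = { ...DEFAULTS };
  for (const [k, v] of Object.entries(env)) if (v !== undefined) config[k] = v;
  return config;
}
```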
```
# Run as Administrator (right-click PowerShell > Run as Administrator)
cd C:\Users\jcwoo\.agentops

# Run bootstrap (checks prereqs, installs NSSM, registers services, starts everything)
.\scripts\bootstrap.ps1
```

What bootstrap does (in order):
- Verifies administrator privileges
- Checks Docker Desktop is running
- Checks Chocolatey is installed
- Checks Node.js is installed
- Verifies TypeScript is compiled (all `dist/` directories exist)
- Verifies `.env` file exists
- Installs NSSM via Chocolatey (if not present)
- Creates `logs/` and `legacy/` directories
- Runs `install-services.ps1` to register 4 NSSM services
- Runs `start-all.ps1` to start Docker Redis + all services
- Waits 5 seconds for stabilization
- Prints a status table showing each service's state
Before running bootstrap, ensure:
- `.env` is populated with real Slack tokens (see Environment Configuration above)
- TypeScript is compiled: run `npx tsc -b` from the project root
- Docker Desktop is running
- LM Studio is running with a chat model loaded (for worker processing)
Service account: During install-services.ps1, you will be prompted for your Windows password. Services run under .\jcwoo to access user-level environment variables and PATH.
All scripts require Administrator privileges.
```
.\scripts\start-all.ps1
```

Startup sequence:
- Checks Docker Desktop is accessible
- Starts Redis via `docker compose up -d`
- Waits for Redis health check (PING/PONG, up to 30 seconds)
- Starts all 4 NSSM services
- Verifies all services are running
```
.\scripts\stop-all.ps1
```

Shutdown sequence (ordered for safety):
- Stops workers first (local-worker, code-worker, research-worker)
- Stops controller last (so it doesn't enqueue to dead workers)
- Each service gets 15 seconds for graceful shutdown
- Stops Redis via `docker compose down`
- Prints shutdown summary
```
# Service names:
#   agentops-controller
#   agentops-local-worker
#   agentops-code-worker
#   agentops-research-worker

nssm start agentops-controller
nssm stop agentops-controller
nssm restart agentops-controller
nssm status agentops-controller
```

| Script | Purpose |
|---|---|
| scripts\bootstrap.ps1 | First-time setup (prereqs + install + start + verify) |
| scripts\start-all.ps1 | Start Docker Redis + all 4 NSSM services |
| scripts\stop-all.ps1 | Stop all services + Docker Redis (ordered shutdown) |
| scripts\install-services.ps1 | Register NSSM services (idempotent, does not start them) |
| scripts\uninstall-services.ps1 | Remove all NSSM services (preserves log files) |
| scripts\legacy-disable.ps1 | Disable old pm2-based system (archive scripts, clean state) |
| Command | Action |
|---|---|
| /queue help | Show formatted help text with all subcommands |
| /queue status | Show Block Kit status message with queue counts and worker health (no LLM calls) |
| /queue add <text> | Enqueue a job at default priority (P3) |
| /queue addp1 <text> | Enqueue a priority 1 job (highest priority) |
| /queue addp2 <text> | Enqueue a priority 2 job |
| /queue addp3 <text> | Enqueue a priority 3 job (lowest priority) |
| /queue workers <n> | Set global concurrency across all queues (1-6) |
| /queue pause | Pause all queues (in-progress jobs complete, no new claims) |
| /queue resume | Resume all queues |
| /queue cancel <id> | Cancel a job (removes if waiting, flags if active) |
| /queue retry <id> | Retry a failed job (re-enters queue) |
| /queue promote <id> p1\|p2\|p3 | Change a job's priority |
Type directly in the channel (no /queue prefix needed):
| Input | Behavior |
|---|---|
| Any text | Enqueued as a single job, routed by keyword classifier |
| status or queue status | Shows queue status (same as /queue status) |
| queue: followed by newline-separated items | Enqueues multiple jobs in a batch |
| p1: <text>, p2: <text>, or p3: <text> | Priority tag prefix sets job priority |
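The priority-prefix convention can be sketched as a small parser. The function name is hypothetical -- the project's real parsing lives in `controller/src/util/command-parser.ts` -- but the behavior matches the table above.

```typescript
// Hypothetical sketch of the p1:/p2:/p3: prefix convention described above.
type Priority = 1 | 2 | 3; // lower number = higher priority (BullMQ convention)

function parsePriorityPrefix(text: string): { priority: Priority; body: string } {
  // Match "p1:", "p2:", or "p3:" (case-insensitive) at the start of the message.
  const m = /^p([123]):\s*([\s\S]*)$/i.exec(text.trim());
  if (m) return { priority: Number(m[1]) as Priority, body: m[2] };
  return { priority: 3, body: text.trim() }; // untagged text defaults to P3
}
```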
Job receipt buttons (appear on each enqueued job):
- Cancel -- Remove the job from queue or flag for cancellation
- Retry -- Re-enqueue a failed job
- Promote -- Increase the job's priority
Status message buttons (appear on /queue status output):
- Pause / Resume -- Toggle queue processing
- Refresh -- Update the status counts in-place
- Set Workers -- Dropdown to change global concurrency (1-6)
- View Status -- Post a fresh status message
The router classifies each job to determine which queue it belongs to:
- Keyword classifier (fast): Regex patterns match against the job text. Code-related keywords (repo, PR, bug, .ts, .py, GitHub URLs) route to the `code` queue. Writing-related keywords (summarize, draft, compose) route to the `local` queue. Research keywords (research, investigate, compare) route to the `research` queue.
- LLM fallback (when ambiguous): If keyword confidence is low (fewer than 2 matches, or a tie), the router calls LM Studio for classification.
- Default: If nothing matches at all, the job defaults to the `local` queue with low confidence.
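A minimal sketch of this keyword-first routing with a confidence gate follows. The pattern lists are illustrative stand-ins, not the project's actual regexes (those are in `keyword-classifier.ts`), and `classify` is a hypothetical name.

```typescript
// Hedged sketch of keyword-first routing with the low-confidence gate
// described above (< 2 matches or a tie triggers the LLM fallback).
const PATTERNS: Record<'code' | 'local' | 'research', RegExp[]> = {
  code: [/\brepo\b/i, /\bPR\b/, /\bbug\b/i, /\.ts\b/, /\.py\b/, /github\.com/i],
  local: [/\bsummarize\b/i, /\bdraft\b/i, /\bcompose\b/i],
  research: [/\bresearch\b/i, /\binvestigate\b/i, /\bcompare\b/i],
};

function classify(text: string): { queue: string; confident: boolean } {
  // Score each queue by how many of its patterns match the job text.
  const scores = Object.entries(PATTERNS).map(
    ([queue, regexes]) => [queue, regexes.filter((r) => r.test(text)).length] as const,
  );
  scores.sort((a, b) => b[1] - a[1]);
  const [best, runnerUp] = scores;
  const confident = best[1] >= 2 && best[1] > runnerUp[1];
  // No matches at all -> default to the local queue with low confidence.
  return { queue: best[1] === 0 ? 'local' : best[0], confident };
}
```

When `confident` is false, the real router would hand the text to LM Studio for classification instead of trusting the keyword score.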
Requires Docker Redis running.
```
cd C:\Users\jcwoo\.agentops

# Run all tests
npx vitest run

# Run with verbose output
npx vitest run --reporter=verbose
```

Test files (in tests/acceptance/):
| File | What It Tests |
|---|---|
| queue-processing.test.ts | Multi-item enqueue, parallel processing, crash recovery |
| idempotency.test.ts | Pause/resume, duplicate event prevention via Redis SET NX |
| controller-unit.test.ts | Command parser, help text builder, status formatter |
Tests use dedicated test-* queue names to avoid interfering with production data. Queues are cleaned up via obliterate() in afterAll.
Qdrant-related tests are skipped automatically if Qdrant is not running (graceful skip, not failure).
Run through these after deployment, major changes, or first-time setup:
- Zero-token idle: Start the system, wait 60 seconds, and verify no LLM calls appear in the logs. Check with `nssm status agentops-controller` that the service is running. Verify no requests reach LM Studio or the Claude CLI unless a job is queued.
- /queue help: Run the command; verify formatted help text appears with all subcommands listed.
- /queue status: Run the command; verify a Block Kit status message with per-queue counts and worker health. Confirm no LLM calls appear in the controller logs (status is a pure Redis read).
- /queue add: Run `/queue add summarize today's news`; verify a job receipt with Cancel/Retry/Promote buttons appears. Check the controller logs for the routing decision (keyword or LLM).
- Button - Cancel: Click Cancel on a queued job. Verify the job is removed (thread reply confirms).
- Button - Retry: Wait for a job to fail (or manually fail one), then click Retry. Verify job re-enters queue.
- Button - Promote: Click Promote on a P3 job. Verify thread reply confirms priority change.
- Button - Pause/Resume: Click Pause on a status message. Verify all queues pause (status updates in-place). Click Resume. Verify queues resume.
- Button - Refresh: Click Refresh on a status message. Verify counts update in-place.
- Message command: Type a task directly in the channel (no `/queue` prefix). Verify it is enqueued and a receipt is posted.
- Duplicate events: Send a message; verify only 1 job is created. Check Redis for idempotency keys: `docker exec agentops-redis-1 redis-cli KEYS 'idemp:*'`
- Multi-item batch: Type `queue:` followed by 3 items on separate lines. Verify 3 separate jobs are created with individual receipts.
NSSM captures stdout and stderr to individual log files:
```
logs\agentops-controller-stdout.log
logs\agentops-controller-stderr.log
logs\agentops-local-worker-stdout.log
logs\agentops-local-worker-stderr.log
logs\agentops-code-worker-stdout.log
logs\agentops-code-worker-stderr.log
logs\agentops-research-worker-stdout.log
logs\agentops-research-worker-stderr.log
```

View log configuration:

```
nssm get agentops-controller AppStdout
nssm get agentops-controller AppStderr
```

Log rotation is configured at 10 MB per file.
```
# Check individual service
nssm status agentops-controller
nssm status agentops-local-worker
nssm status agentops-code-worker
nssm status agentops-research-worker

# Check all agentops services via PowerShell
Get-Service agentops-*
```

| Pattern | Meaning |
|---|---|
| [controller] Bolt app running in Socket Mode | Controller started successfully |
| [controller] Listening on channel: C0... | Correct channel configured |
| [controller] Qdrant vector memory enabled | Qdrant connected and operational |
| [controller] Running without Qdrant (optional) | Qdrant unavailable -- system works fine |
| [local-worker] Started on queue "local" with concurrency 2 | Local worker started |
| [code-worker] Started on queue "code" with concurrency 1 | Code worker started |
| [router] Keyword match: CODE, patterns: repo, .ts | Job routed via keyword classifier |
| [router] Low confidence, using LLM fallback -> local | Job routed via LM Studio LLM |
| [qdrant] Stored summary for job X | Summary saved to Qdrant on job completion |
| [qdrant] Not available, running without vector memory | Qdrant down at startup (non-fatal) |
| [qdrant] Embedding failed: ... | LM Studio embedding model not loaded or down |
| [events] Duplicate event X, skipping | Idempotency layer rejected a duplicate Slack event |
```
# Connection count
docker exec agentops-redis-1 redis-cli INFO clients

# Total key count
docker exec agentops-redis-1 redis-cli DBSIZE

# List queue keys
docker exec agentops-redis-1 redis-cli KEYS 'bull:*'

# List idempotency keys
docker exec agentops-redis-1 redis-cli KEYS 'idemp:*'

# Check Redis health
docker exec agentops-redis-1 redis-cli PING
```

If Qdrant is running, the dashboard is at http://localhost:6333/dashboard.
The agentops-jobs collection stores job completion summaries with vector embeddings for similarity search.
```
# Check Docker Desktop is running
docker ps

# Start Redis manually
docker compose -f C:\Users\jcwoo\.agentops\docker-compose.yml up -d

# Verify Redis responds
docker exec agentops-redis-1 redis-cli PING
# Expected: PONG

# Check Docker logs
docker logs agentops-redis-1
```

Common causes: Docker Desktop not running, or port 6379 already in use by another Redis instance.
```
# Test LM Studio API
curl http://localhost:1234/v1/models
# Should return a JSON list of loaded models
```

Impact when down:
- local-worker jobs fail (no LLM available for processing)
- LLM classifier falls back to keyword-only routing (functional but less accurate)
- Qdrant embeddings fail (summaries not stored, but jobs still complete)
Fix: Open LM Studio, ensure a chat model is loaded and the server is running on port 1234.
| Symptom | Likely Cause | Fix |
|---|---|---|
| Controller crashes on startup with Slack auth error | Invalid or expired tokens | Verify SLACK_BOT_TOKEN and SLACK_APP_TOKEN in .env match your Slack app |
| xapp- token rejected | Socket Mode not enabled | Enable Socket Mode in Slack app settings and regenerate the app-level token |
| Bot messages not appearing | Bot not in channel | Run /invite @YourBotName in the target channel |
| /queue command returns "dispatch_failed" | Slash command not configured | Verify /queue exists in the Slack app's Slash Commands section |
| Events not firing | Wrong event subscription | Verify message.channels is subscribed under Event Subscriptions |
| Interactivity errors | Interactivity not enabled | Enable Interactivity in Slack app settings |
```
# Check service status
nssm status agentops-controller

# Check if service is installed
Get-Service agentops-controller

# Check stderr log for crash reason
type C:\Users\jcwoo\.agentops\logs\agentops-controller-stderr.log

# Try manual start for better error visibility
nssm start agentops-controller

# Check for port conflicts
netstat -an | findstr 6379

# Rebuild TypeScript and restart
npx tsc -b
nssm restart agentops-controller
```

Common causes:

- TypeScript not compiled (missing `dist/` files) -- run `npx tsc -b`
- `.env` file missing or invalid -- check Zod validation errors in the stderr log
- Redis not running -- start with `docker compose up -d`
- NSSM services not installed -- run `.\scripts\install-services.ps1`
Qdrant is optional. The system operates normally without it -- no job processing is affected.
```
# Start Qdrant (if using the standalone container)
docker start qdrant

# Verify Qdrant health
curl http://localhost:6333/healthz
# Expected: HTTP 200

# Check Qdrant dashboard
# http://localhost:6333/dashboard
```

Embedding model not loaded: If you see [qdrant] Embedding failed in the logs, load an embedding model in LM Studio (e.g., nomic-embed-text-v1.5). The embedding endpoint is at http://localhost:1234/v1/embeddings.
Dimension mismatch: If the collection was created with one embedding model and you switch to another with different dimensions, delete the collection via the Qdrant dashboard and restart the controller. It will recreate the collection with the correct dimensions.
```
# Check for stalled jobs
docker exec agentops-redis-1 redis-cli KEYS 'bull:*:stalled'
```

Likely cause: a worker crashed while processing a job.
BullMQ auto-recovery: Stalled jobs are automatically reclaimed after the stalled interval (default 30 seconds). The job will be retried up to 3 times with exponential backoff (5s, 10s, 20s).
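The retry timing above follows the standard exponential-backoff formula (base delay doubled on each attempt), which can be sketched as a one-liner. The function name is illustrative; BullMQ computes this internally from the queue's backoff settings.

```typescript
// Exponential backoff: delay * 2^(attemptsMade - 1).
// With the 5s base delay used here, retries land at 5s, 10s, 20s.
function backoffDelayMs(attemptsMade: number, baseDelayMs = 5000): number {
  return baseDelayMs * 2 ** (attemptsMade - 1);
}
```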
Manual recovery:

```
# Restart the specific worker
nssm restart agentops-local-worker

# Nuclear option: clear all queue data (WARNING: destroys all jobs)
docker exec agentops-redis-1 redis-cli FLUSHDB
```

To rebuild after source or dependency changes:

```
# Clean build
npx tsc -b --clean
npx tsc -b

# If npm dependencies are missing
npm install
cd controller && npm install && cd ..
cd workers\local-worker && npm install && cd ..\..
cd workers\code-worker && npm install && cd ..\..
cd workers\research-worker && npm install && cd ..\..
cd shared && npm install && cd ..
```

Project layout:

```
.agentops/
  shared/                        # Shared types and utilities
    src/types/
      job.ts                     # JobData, JobMetadata, JobPriority, JobStatus, QueueName
      config.ts                  # Zod env schema (Config type)
      queue.ts                   # QUEUE_NAMES constant, createRedisConnection()
      slack.ts                   # JobReceipt, QueueStatusMessage types
  controller/                    # Slack app + queue manager + router + Qdrant
    src/
      server.ts                  # Entry point: Bolt app, Socket Mode, Qdrant init, shutdown
      config.ts                  # Loads .env via dotenv + Zod validation
      slack/
        commands.ts              # /queue slash command handler (12 subcommands)
        events.ts                # message.channels event handler (single, multi-item, status)
        actions.ts               # 8 interactive button/menu action handlers
        middleware.ts            # Channel filter (pure function)
      queue/
        manager.ts               # BullMQ queue singletons, all queue operations
        formatters.ts            # Block Kit builders for status, receipts, help
      router/
        index.ts                 # routeJob(): keyword first, LLM fallback
        keyword-classifier.ts    # Regex pattern matching (CODE/LOCAL/RESEARCH)
        lm-classifier.ts         # LM Studio chat completion classifier
      qdrant/
        embeddings.ts            # LM Studio /v1/embeddings via OpenAI client
        client.ts                # Qdrant init, store, search, close (graceful degradation)
        index.ts                 # Barrel re-export
      util/
        command-parser.ts        # Parse /queue subcommands and message text
        idempotency.ts           # Redis SET NX EX deduplication (24h TTL)
  workers/
    local-worker/                # LM Studio inference worker
      src/
        worker.ts                # BullMQ Worker on "local" queue, concurrency 2
        processor.ts             # Job processing logic (LM Studio chat completion)
    code-worker/                 # Claude Code CLI worker
      src/
        worker.ts                # BullMQ Worker on "code" queue, concurrency 1
        processor.ts             # Job processing logic (Claude Code headless CLI)
    research-worker/             # Future research worker (stub)
      src/
        worker.ts                # BullMQ Worker on "research" queue
        processor.ts             # Placeholder processor
  scripts/                       # PowerShell service management
    bootstrap.ps1                # First-time setup (prereqs + install + start)
    start-all.ps1                # Start Docker Redis + all NSSM services
    stop-all.ps1                 # Ordered shutdown (workers first, then controller, then Redis)
    install-services.ps1         # Register 4 NSSM services (idempotent)
    uninstall-services.ps1       # Remove NSSM services (preserves logs)
    legacy-disable.ps1           # Disable old pm2-based system
  tests/                         # Vitest acceptance tests
    acceptance/
      queue-processing.test.ts   # BullMQ queue behavior tests
      idempotency.test.ts        # Dedup and pause/resume tests
      controller-unit.test.ts    # Parser and formatter unit tests
  docker-compose.yml             # Redis 7 Alpine with AOF persistence
  .env.example                   # Environment variable template
  .env                           # Actual environment values (not in git)
  vitest.config.ts               # Vitest configuration
  tsconfig.base.json             # Shared TypeScript settings (skipLibCheck, Node16)
  tsconfig.json                  # Root project references (composite build)
  package.json                   # Root package with devDependencies (vitest, typescript)
```
- A Slack message or `/queue` command arrives via the Socket Mode WebSocket connection.
- The controller validates the channel (must match `SLACK_CHANNEL_ID`) and deduplicates via Redis `SET key NX EX 86400` (24-hour TTL). Duplicate Slack event retries are silently dropped.
- The router classifies the job text: keyword regex matching runs first (deterministic, fast). If confidence is low (fewer than 2 pattern matches, or a tie), it falls back to LM Studio LLM classification.
- The job is added to the correct BullMQ queue (`code`, `local`, or `research`) with the assigned priority (P1=1, P2=2, P3=3 -- lower number = higher priority in BullMQ).
- A job receipt is posted to Slack via `client.chat.postMessage` with interactive Cancel/Retry/Promote buttons. The `message_ts` is captured and stored in job metadata for threaded replies.
- A worker claims the job from its queue. Each worker type processes differently:
  - local-worker: Sends the job text to the LM Studio chat completion API and posts the result as a thread reply.
  - code-worker: Executes the job text via the Claude Code CLI in headless mode and posts the result as a thread reply.
  - research-worker: Stub -- returns a placeholder response.
- On completion: If Qdrant is available, a `QueueEvents` listener in the controller stores the job summary as a vector embedding in the `agentops-jobs` collection. This enables future similarity search across completed jobs.
- On failure: BullMQ retries up to 3 times with exponential backoff (5s base delay). After all retries are exhausted, the job moves to the failed state. Old completed jobs are pruned at 100 per queue; failed jobs at 50.
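The retry, pruning, and priority settings in this flow correspond to BullMQ job options of roughly the following shape. This is a sketch consistent with the numbers stated in this document, not a dump of the project's actual configuration in `controller/src/queue/manager.ts`.

```typescript
// Sketch of BullMQ default job options matching the lifecycle described above.
// Pass an object like this as defaultJobOptions when constructing a Queue.
const defaultJobOptions = {
  attempts: 3,                                   // retry up to 3 times
  backoff: { type: 'exponential', delay: 5000 }, // retries at 5s, 10s, 20s
  removeOnComplete: 100,                         // keep the last 100 completed jobs
  removeOnFail: 50,                              // keep the last 50 failed jobs
};

// P1/P2/P3 map directly to BullMQ priorities; the lower number wins.
const PRIORITY: Record<'P1' | 'P2' | 'P3', number> = { P1: 1, P2: 2, P3: 3 };
```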
| Decision | Rationale |
|---|---|
| Socket Mode (no ngrok/public URL) | Single-workspace internal tool. WebSocket eliminates external URL management. |
| Workers as separate NSSM services | Crash isolation. A worker crash doesn't take down the controller or other workers. NSSM auto-restarts on exit with 5s delay. |
| Qdrant is optional enrichment | Never blocks job processing. All Qdrant calls wrapped in try/catch. System is fully functional without it. |
| Idempotency via Redis SET NX EX | 24-hour TTL prevents duplicate processing from Slack event retries. Composite key for actions: message_ts + action_id + action_ts. |
| Keyword-first routing with LLM fallback | Deterministic classification is fast and predictable. LLM only called when keywords are ambiguous, avoiding unnecessary inference costs. |
| Dynamic ESM import for Qdrant client | @qdrant/js-client-rest is ESM-only. The project uses CJS (Node16 moduleResolution). Dynamic import() at runtime bridges the gap. |
| UUID point IDs in Qdrant | Qdrant string IDs must be valid UUIDs. BullMQ job IDs are numeric strings. BullMQ job ID stored in the payload for traceability. |
| Services run as user account (.\jcwoo) | Access to user-level PATH, environment variables, and npm global packages (Claude Code CLI). |