Skip to content

carderne/agent-sql

Repository files navigation

agent-sql

This exists to solve these problems:

  1. Agents prefer code to tools
  2. Security is important
  3. RLS is a pain to get right
  4. LLMs (and humans) are dumb

Rather than write hundreds of tools for your deployed agent, just give it one: executeQuery. Then let agent-sql sanitise those queries to stop it from doing anything dangerous (or dumb). Primarily targeting deployed agents, but you should get your coding harness to use it too.

Apparently this is how Trigger.dev does it. And Cloudflare.

You can try it out live (and try to hack it) here: sql.rdrn.me

How it works

agent-sql works by fully parsing the supplied SQL query into an AST and transforming it:

  • Only SELECT: it's impossible to insert, drop or anything else.
  • Reduced subset: CTEs, subqueries and other tricky things are rejected.
  • Limited functions: passed through a (configurable) whitelist.
  • No DoS: a default LIMIT is applied, but can be adjusted.
  • WHERE guards: insert multiple tenant/ownership conditions to be inserted.
  • JOINs added: if needed to reach the guard tenant tables (save on tokens).
  • No sneaky joins: no join secrets on true. We have your back.

Coming soon

  • Configurable support for INSERT, UPDATE, even DROP TABLE
  • Support CTEs, subqueries and more, once they can be fully hardened
  • Plugins/hooks for common harnesses

Quickstart

npm install agent-sql
import { agentSql } from "agent-sql";

const sql = agentSql("SELECT * FROM msg", { "msg.tenant_id": 123 });

console.log(sql);
// SELECT *
// FROM msg
// WHERE msg.tenant_id = 123
// LIMIT 10000

Usage

Define a schema

In the simple example above, all JOINs will be blocked. For agent-sql to know what joins and tables to permit, you need to define a schema. Heads up: if you use Drizzle, you can just use your Drizzle schema.

import { agentSql, defineSchema } from "agent-sql";

// Define your schema.
// Only the tables listed will be permitted
// Joins can only use the FKs defined here
const schema = defineSchema({
  tenant: { id: null },
  msg: { tenant_id: { ft: "tenant", fc: "id" } },
});

// Use your schema from above
// Specify 1+ column->value pairs that will be enforced
const result = agentSql("SELECT * FROM msg", { "tenant.id": 123 }, schema);

Output:

SELECT
  msg.*                        -- qualify the *
FROM msg
INNER JOIN tenant              -- add the needed join for the guard
  ON tenant.id = msg.tenant_id -- use the schema to join correctly
WHERE tenant.id = 123          -- apply the guard
LIMIT 10000                    -- limit the rows

Bad stuff is blocked

The following query will be blocked (many times over).

SELECT
    sneaky_func('./bad_file')      -- won't pass whitelist
FROM secret
JOIN random                        -- not an approved table
  ON random.id = secret.id         -- not an approved FK pair
JOIN danger                        -- disconnected from join graph
  ON true                          -- not allowed
WHERE true                         -- won't trick anyone

Make a closure

You'll probably want to do something like the below in production. Mix and match to your taste.

const sanitise = (sql: string) => agentSql(sql, guards, schema);

//later
sanitise("SELECT * FROM foo");

Integration with AI SDK and Drizzle

If you're using Drizzle, you can skip the schema step and use the one you already have!

Just pass it through, and agentSql will respect your schema.

import { tool } from "ai";
import { sql } from "drizzle-orm";
import { z } from "zod";

import { agentSql } from "agent-sql";
import { defineSchemaFromDrizzle } from "agent-sql/drizzle";

import { db } from "@/db";
import * as drizzleSchema from "@/db/schema";

// No need to re-enter your schema, we'll pull it in from Drizzle
const schema = defineSchemaFromDrizzle(drizzleSchema);

// Create your closure
const sanitise = (sql: string, tenantId: string) =>
  agentSql(sql, { "tenant.id": tenantId }, schema);

function makeSqlTool(tenantId: string) {
  return tool({
    description: "Run raw SQL against the DB",
    inputSchema: z.object({ query: z.string() }),
    execute: async ({ query }) => {
      // The LLM can pass any query it likes, we'll sanitise it if possible
      // and return helpful error messages if not
      const safeQuery = sanitise(query, tenantId);
      // Now we can throw that straight at the db and be confident it'll only
      // return data from the specified tenant
      return db.execute(sql.raw(safeQuery));
    },
  });
}

If you don't want your whole Drizzle schema available

You can also exclude tables if you don't want agents to see them:

import { defineSchemaFromDrizzle } from "agent-sql/drizzle";

const schema = defineSchemaFromDrizzle(drizzleSchema, {
  exclude: ["api_keys"],
});

Development

First install Vite+:

curl -fsSL https://vite.plus | bash

Install dependencies:

vp install

Format, lint, typecheck:

vp check --fix

Run the unit tests:

vp test

Build the library:

vp pack

About

Sanitise agent-written SQL for multi-tenant DBs.

Resources

License

Stars

Watchers

Forks

Contributors