Improvement: Treat TexeraAgents as access-controlled Resources, just like other resources (workflows, datasets and computing units) #5302

bobbai00 (Collaborator) · 2026-05-31T20:46:53Z

Background

In #4495, a new microservice, agent-service, was introduced to manage TexeraAgent instances, which help users do data science through natural language and workflows. After this change, the architecture becomes the following:

The diagrams below show the service traffic when users CRUD TexeraAgent resources:

The diagrams below show the service traffic when users collaborate with TexeraAgent, and TexeraAgent performs ReAct:

Problem Of Current Design

agent-service currently manages TexeraAgent as in-memory objects with no access control. This introduces two major issues:

Limited scalability: agent traces are ephemeral and exist only in agent-service memory, so they are lost whenever agent-service goes down.
No per-user isolation: all users can see each other's agents, even though the conversation between a user and an agent may contain confidential or sensitive information about workflow executions and the user's requests.

Proposal: Treat TexeraAgent As A Resource

Texera already manages several kinds of resources: users, workflows, datasets, computing units, and workflow executions. To solve the problems above, TexeraAgent should be treated as another resource type: agents are persisted in texera_db (Postgres), and each agent is owned by a user.

Relational DB Schema Change

A new entity, agent, is added. One user can own multiple agents through agent.uid -> user.uid.

The schema of the agent entity is:

CREATE TABLE IF NOT EXISTS agent
(
    aid           UUID PRIMARY KEY,
    uid           INT NOT NULL,
    name          VARCHAR(128) NOT NULL,
    model_type    VARCHAR(256) NOT NULL,
    config        JSONB NOT NULL DEFAULT '{}'::jsonb,
    react_steps   JSONB NOT NULL DEFAULT '[]'::jsonb,
    creation_time TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (uid) REFERENCES "user"(uid) ON DELETE CASCADE
);

Column responsibilities:

Column	Type	Description
`aid`	`UUID`	Stable agent ID exposed by the agent API.
`uid`	`INT`	Owner user ID. This is the access-control boundary for the current PR.
`name`	`VARCHAR(128)`	User-visible agent name.
`model_type`	`VARCHAR(256)`	Model selected for the agent, for example `gpt-4o-mini`.
`config`	`JSONB`	Persistent agent configuration, including system prompt, tool metadata, and runtime settings.
`react_steps`	`JSONB`	Persistent ReAct trace as an array of JSON objects.
`creation_time`	`TIMESTAMP`	Creation timestamp used for ordering and display.
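
As a rough illustration of how application code might mirror this table, the sketch below defines a TypeScript type with the same columns. The type name `AgentRow`, the helper `newAgentRow`, and the length checks are assumptions made for illustration; only the column names, types, and defaults come from the schema above.

```typescript
// JSON value type used for the two JSONB columns.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Hypothetical mirror of the proposed `agent` table (one field per column).
interface AgentRow {
  aid: string;                     // UUID primary key
  uid: number;                     // owner user ID (FK to "user".uid)
  name: string;                    // VARCHAR(128)
  model_type: string;              // VARCHAR(256)
  config: { [key: string]: Json }; // JSONB object, DEFAULT '{}'
  react_steps: Json[];             // JSONB array, DEFAULT '[]'
  creation_time: Date;             // TIMESTAMP, DEFAULT CURRENT_TIMESTAMP
}

// Hypothetical helper: builds a row using the same defaults as the DDL.
function newAgentRow(
  aid: string,
  uid: number,
  name: string,
  modelType: string,
  now: Date = new Date()
): AgentRow {
  // Mirror the VARCHAR limits so oversized values fail before reaching the DB.
  if (name.length > 128) throw new Error("name exceeds 128 characters");
  if (modelType.length > 256) throw new Error("model_type exceeds 256 characters");
  return {
    aid,
    uid,
    name,
    model_type: modelType,
    config: {},
    react_steps: [],
    creation_time: now,
  };
}
```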

config contains durable agent settings only. It should include data such as:

{
  "systemPrompt": "...",
  "tools": [
    {
      "name": "addOperator",
      "description": "...",
      "inputSchema": {},
      "enabled": true
    }
  ],
  "settings": {
    "maxOperatorResultCharLimit": 2000,
    "maxOperatorResultCellCharLimit": 2000,
    "operatorResultSerializationMode": "tsv",
    "toolTimeoutSeconds": 240,
    "executionTimeoutMinutes": 4,
    "disabledTools": [],
    "maxSteps": 100,
    "allowedOperatorTypes": ["CSVFileScan", "Filter"]
  }
}
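
The config shape above could be typed roughly as follows. This is a sketch only: the interface names and the `enabledToolNames` helper are hypothetical, and the rule that a tool is usable only when `enabled` is true and its name is absent from `disabledTools` is an assumption about how the two fields interact, not something stated in this proposal.

```typescript
interface ToolMetadata {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>;
  enabled: boolean;
}

interface AgentSettings {
  maxOperatorResultCharLimit: number;
  maxOperatorResultCellCharLimit: number;
  operatorResultSerializationMode: string;
  toolTimeoutSeconds: number;
  executionTimeoutMinutes: number;
  disabledTools: string[];
  maxSteps: number;
  allowedOperatorTypes: string[];
}

interface AgentConfig {
  systemPrompt: string;
  tools: ToolMetadata[];
  settings: AgentSettings;
}

// ASSUMPTION: a tool is usable when it is enabled and not listed in disabledTools.
function enabledToolNames(config: AgentConfig): string[] {
  const disabled = new Set(config.settings.disabledTools);
  return config.tools
    .filter((tool) => tool.enabled && !disabled.has(tool.name))
    .map((tool) => tool.name);
}
```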

react_steps stores the durable ReAct state as a JSON array. Each element represents one user or agent step and may include nested tool calls, tool results, token usage, and workflow snapshots:

[
  {
    "id": "step-...",
    "messageId": "msg-...",
    "stepId": 0,
    "timestamp": 1780217116417,
    "role": "user",
    "content": "Analyze this workflow",
    "isBegin": true,
    "isEnd": true,
    "messageSource": "chat"
  }
]
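
When the trace is read back from the JSONB column, a small shape check can catch malformed data early. The sketch below validates only the fields shown in the example above; the names `ReActStep` and `parseReActSteps` are hypothetical, and any nested fields (tool calls, token usage, workflow snapshots) are passed through unchecked because their format is not specified here.

```typescript
interface ReActStep {
  id: string;
  messageId: string;
  stepId: number;
  timestamp: number;
  role: string;
  content: string;
  isBegin: boolean;
  isEnd: boolean;
  messageSource: string;
  [extra: string]: unknown; // nested tool calls, results, etc. are not validated
}

const STRING_FIELDS = ["id", "messageId", "role", "content", "messageSource"];
const NUMBER_FIELDS = ["stepId", "timestamp"];
const BOOLEAN_FIELDS = ["isBegin", "isEnd"];

// Parses a serialized react_steps value and checks the fields from the example.
function parseReActSteps(raw: string): ReActStep[] {
  const parsed: unknown = JSON.parse(raw);
  if (!Array.isArray(parsed)) throw new Error("react_steps must be a JSON array");
  return parsed.map((item: unknown, index: number) => {
    if (typeof item !== "object" || item === null || Array.isArray(item)) {
      throw new Error(`step ${index} must be a JSON object`);
    }
    const step = item as Record<string, unknown>;
    const check = (fields: string[], expected: string): void => {
      for (const field of fields) {
        if (typeof step[field] !== expected) {
          throw new Error(`step ${index}: ${field} must be a ${expected}`);
        }
      }
    };
    check(STRING_FIELDS, "string");
    check(NUMBER_FIELDS, "number");
    check(BOOLEAN_FIELDS, "boolean");
    return step as ReActStep;
  });
}
```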

The table intentionally does not store workflow ID, computing unit ID, workflow name, or user JWT. Those values are request-scoped and are supplied when the user sends an agent task. This avoids stale workflow bindings, stale computing unit bindings, and persisted credentials.
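
Since these values travel with each task instead of living in the table, the service would need to check them on every task message. A minimal sketch of such a check is shown below; the `TaskContext` field names and the `validateTaskContext` helper are hypothetical and only illustrate the idea of request-scoped context.

```typescript
// Hypothetical shape of the request-scoped context sent with each agent task.
interface TaskContext {
  workflowId: number;
  computingUnitId: number;
  workflowName: string;
  userToken: string; // caller's JWT, used for this task only and never persisted
}

// Returns a list of problems; an empty list means the context is usable.
function validateTaskContext(input: Partial<TaskContext>): string[] {
  const errors: string[] = [];
  if (typeof input.workflowId !== "number" || !Number.isInteger(input.workflowId)) {
    errors.push("workflowId must be an integer");
  }
  if (typeof input.computingUnitId !== "number" || !Number.isInteger(input.computingUnitId)) {
    errors.push("computingUnitId must be an integer");
  }
  if (typeof input.workflowName !== "string" || input.workflowName.length === 0) {
    errors.push("workflowName must be a non-empty string");
  }
  if (typeof input.userToken !== "string" || input.userToken.length === 0) {
    errors.push("userToken must be a non-empty string");
  }
  return errors;
}
```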

Service Traffic Change

The traffic when users CRUD agents becomes:

The traffic when users collaborate with agents becomes:

Access Control

All agent APIs require authentication. The agent service validates JWTs using the same secret as the rest of Texera, loaded from:

common/config/src/main/resources/auth.conf

Access rules:

Operation	Rule
Create agent	Caller must have a valid JWT. The inserted row stores the caller's `uid`.
List agents	Caller only sees agents where `agent.uid = caller.uid`.
Read/update/delete/control agent	Caller must own the target agent.
WebSocket connect	Caller must own the target agent.
WebSocket task message	Caller must own the agent and provide valid task context.

Unauthenticated requests (missing or invalid JWT) return 401. Authenticated users accessing another user's agent receive 403.
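
The access rules reduce to a small ownership check. The sketch below is illustrative only: the function names are hypothetical, and `callerUid` is assumed to be `null` when JWT validation fails.

```typescript
type AccessStatus = 200 | 401 | 403;

// callerUid is null when the request carries no valid JWT (assumption).
function authorizeAgentRequest(callerUid: number | null, ownerUid: number): AccessStatus {
  if (callerUid === null) return 401;     // not authenticated
  if (callerUid !== ownerUid) return 403; // authenticated, but not the owner
  return 200;
}

// List rule: the caller only sees agents where agent.uid = caller.uid.
function listVisibleAgents<T extends { uid: number }>(agents: T[], callerUid: number): T[] {
  return agents.filter((agent) => agent.uid === callerUid);
}
```

In this sketch the same check would apply to REST calls and to the WebSocket connect, with the task-context validation added on top for task messages.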

Persistence Flow

create agent
  -> validate JWT
  -> create runtime TexeraAgent
  -> insert agent row with owner uid, config, and empty react_steps

send task
  -> validate WebSocket access
  -> validate task context
  -> retrieve workflow using userToken + workflowId
  -> run TexeraAgent
  -> persist ReAct steps after step updates and completion
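
The two flows above can be sketched with an in-memory stand-in for the agent table. Everything here is an assumption for illustration: `InMemoryAgentStore` replaces Postgres, `runAgent` is a stub for the real TexeraAgent, JWT validation is reduced to a nullable `callerUid`, and the workflow retrieval step is omitted.

```typescript
interface Step { role: string; content: string; }

interface StoredAgent {
  aid: string;
  uid: number;
  config: Record<string, unknown>;
  reactSteps: Step[];
}

// Stand-in for the agent table; a real implementation would issue SQL instead.
class InMemoryAgentStore {
  private rows = new Map<string, StoredAgent>();
  insert(agent: StoredAgent): void { this.rows.set(agent.aid, agent); }
  get(aid: string): StoredAgent | undefined { return this.rows.get(aid); }
  saveSteps(aid: string, steps: Step[]): void {
    const row = this.rows.get(aid);
    if (!row) throw new Error("unknown agent");
    row.reactSteps = steps.slice(); // store a copy of the full trace
  }
}

function createAgent(
  store: InMemoryAgentStore,
  callerUid: number | null,
  aid: string,
  config: Record<string, unknown>
): StoredAgent {
  if (callerUid === null) throw new Error("401: authentication required");
  // Insert the row with owner uid, config, and empty react_steps.
  const agent: StoredAgent = { aid, uid: callerUid, config, reactSteps: [] };
  store.insert(agent);
  return agent;
}

function sendTask(
  store: InMemoryAgentStore,
  callerUid: number | null,
  aid: string,
  task: string,
  runAgent: (task: string) => Step[]
): Step[] {
  if (callerUid === null) throw new Error("401: authentication required");
  const agent = store.get(aid);
  if (!agent || agent.uid !== callerUid) throw new Error("403: caller does not own this agent");
  if (task.trim() === "") throw new Error("invalid task context");
  const steps = agent.reactSteps.slice();
  steps.push({ role: "user", content: task });
  store.saveSteps(aid, steps);
  // Persist after every step update so a crash loses at most the step in flight.
  for (const step of runAgent(task)) {
    steps.push(step);
    store.saveSteps(aid, steps);
  }
  return steps;
}
```

Note that this sketch rewrites the whole trace on every update, which matches a single JSONB column; an append-only layout would change only `saveSteps`.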

chenlica (Collaborator) · 2026-06-01T06:48:50Z

This is an important change. Given the length of the description, I prefer to have a meeting and report the results here.

bobbai00 (Collaborator, Author) · 2026-06-05T23:04:30Z

Can @Yicong-Huang, @zuozhiw, @Xiao-zhen-Liu, and @aglinxinyuan chime in on this?

aglinxinyuan (Collaborator) · 2026-06-05T23:09:16Z

I'm ok with the idea.

Yicong-Huang (Collaborator) · 2026-06-05T23:31:13Z

I believe you are talking about managing agent metadata, that is, the CRUD operations that treat an agent as a resource. One thing is missing from the discussion: the access control service. The decision of which user can manage which agent should be routed to that service. After an agent is created, the access control service should gate it as a resource, so that only users with the required privilege can access the agent. Please add that part to the discussion.

I agree that such metadata should be persisted, and I agree with the proposal in general. Let's look at the details after you revise the discussion.

One more thing, which I think you partially touched upon, is how users collaborate with agents. In general there are two ways:

The agent acts (sends requests to other services) on behalf of a human user.
We treat the agent as a real user, and a human user has to grant the agent user access to a given resource. The agent user can then act.

I think this needs a further, possibly separate, discussion.

zuozhiw (Collaborator) · 2026-06-05T23:36:06Z

This is a natural step to add persistence. The table schema looks good overall. Some comments on the DDL:

    aid           UUID PRIMARY KEY,
    uid           INT NOT NULL,
    name          VARCHAR(128) NOT NULL,
    model_type    VARCHAR(256) NOT NULL,
    config        JSONB NOT NULL DEFAULT '{}'::jsonb,
    react_steps   JSONB NOT NULL DEFAULT '[]'::jsonb,
    creation_time TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (uid) REFERENCES "user"(uid) ON DELETE CASCADE

Add a last-updated time column.
Consider renaming uid to createdby_uid.
Consider renaming name to title.
Consider the alternative of a separate table that stores one message per row. If we store messages in one big JSON blob, don't we have to read and write the whole history every time? One row per message might be more efficient because we can simply append. Furthermore, each row in that table can include message metadata, and we can build an index on messages if we ever want to search them. Just a suggestion.
I also have a question about react_steps: what format is this? I would highly recommend that we don't invent our own format and instead reuse an existing one. For example, if Vercel has a format, can't we just use that?

After we store the created-by user ID, access control should be fairly simple and can reuse whatever we have right now.

chenlica (Collaborator) · 2026-06-06T05:08:07Z

We have a few efforts related to new resources, such as Agent services, "Templates," and ML models. To simplify the design, I wonder whether we can do the following:

First introduce the resource without supporting sharing, i.e., each resource is owned by one user only.
Then discuss how to do sharing.

bobbai00 (Collaborator, Author) · 2026-06-08T05:12:52Z

Authentication foundation + per-deployment-mode setup

Splitting agent access control into two layers makes the rollout incremental:

Authentication — verify the JWT and put the agent-service behind the gateway's ext-authz (the access-control-service), instead of the agent-service decoding tokens itself. Tracked in Authorize agent-service requests in the access-control-service (JWT authn + per-agent authz) #5561.
Authorization (per-agent ownership) — this discussion. Until the persistence + owner model here lands, authz is intentionally allow-all: any authenticated REGULAR/ADMIN user may reach any agent. The access-control-service is where the per-agent decision will live once agents have an owner (uid).

The authentication layer is a gateway concern, so it's wired up differently in each deployment mode:

Kubernetes (Envoy Gateway) — canonical path, mirroring how computing-unit / executions / pve are already authorized:

An Envoy SecurityPolicy (extAuth.http → access-control-service /api/auth) targeting the *-agent-service-route HTTPRoute.
A branch in AccessControlResource.authorize() for /api/agents… that runs JwtParser.parseToken + SessionUser.isRoleOf(REGULAR/ADMIN) and returns the trusted x-user-* headers on success (401/403 otherwise).

Docker Compose (single-node, nginx) — nginx is the gateway, so use its ext-authz equivalent, auth_request. Sketch:

location /api/agents {
    auth_request /_agent_auth;
    proxy_pass http://agent-service:3001;
    # ...existing ws-upgrade headers...
}
location = /_agent_auth {
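    internal;
    # proxy_pass below contains a variable, so nginx resolves the hostname at
    # request time and needs a resolver (127.0.0.11 is Docker's embedded DNS).
    resolver 127.0.0.11;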
    # $request_uri makes access-control see /api/auth/api/agents/... — same shape Envoy produces,
    # so the SAME authorize() branch handles both gateways.
    proxy_pass http://access-control-service:9096/api/auth$request_uri;
    proxy_pass_request_body off;
    proxy_set_header Content-Length "";
}

The Authorization: Bearer header is forwarded automatically, and the /react WebSocket carries its token as ?access-token=… (part of $request_uri), so no authorize() changes are needed beyond the new /api/agents branch.
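
To make the two token locations concrete, the sketch below extracts a token from either the Authorization header or the `access-token` query parameter. It is written in TypeScript purely for illustration and is not the access-control-service code; the function name `extractToken` and the choice to prefer the header over the query parameter are assumptions.

```typescript
// Returns the bearer token from the header, else from ?access-token=, else null.
function extractToken(authorizationHeader: string | undefined, requestUri: string): string | null {
  if (authorizationHeader) {
    const match = /^Bearer\s+(.+)$/i.exec(authorizationHeader.trim());
    if (match) return match[1];
  }
  // WebSocket upgrades cannot set custom headers from the browser,
  // so the token travels in the query string instead.
  const queryStart = requestUri.indexOf("?");
  if (queryStart === -1) return null;
  const params = new URLSearchParams(requestUri.slice(queryStart + 1));
  const token = params.get("access-token");
  return token ? token : null;
}
```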

Bare-metal dev (Angular ng serve + proxy.config.json) — the dev proxy is not a real gateway and can't do ext-authz subrequests. This is already the case for the other protected routes: in proxy.config.json, /wsapi and /pve point straight at the services (:8085), bypassing access-control entirely. So for parity, /api/agents stays direct-to-:3001 in dev with no ext-authz — enforcement is exercised through the docker-compose or k8s gateways. (To test the auth path locally, point the dev proxy's /api/agents at the docker-compose nginx instead of :3001.)

Frontend (all modes): attach the JWT to every agent call — Authorization: Bearer on REST (today agentHeaders() only sets X-Agent-Workflow-Id and fetchModelTypes() sends nothing) and ?access-token=… on the /react WS — then drop the agent-service's home-grown decodeJWT / validateToken and have it trust the gateway-injected x-user-* headers (which only exist after ext-authz passes).
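
On the frontend side, attaching the token could look roughly like the sketch below. The helper names `buildAgentHeaders` and `withAccessToken` are hypothetical and their signatures are assumptions; only the `Authorization: Bearer` header, the `X-Agent-Workflow-Id` header, and the `access-token` query parameter come from the description above.

```typescript
// Builds REST headers: always the bearer token, plus the workflow ID when known.
function buildAgentHeaders(token: string, workflowId?: number): Record<string, string> {
  const headers: Record<string, string> = { Authorization: `Bearer ${token}` };
  if (workflowId !== undefined) {
    headers["X-Agent-Workflow-Id"] = String(workflowId);
  }
  return headers;
}

// Appends ?access-token=... (or &access-token=...) to a WebSocket URL.
function withAccessToken(wsUrl: string, token: string): string {
  const separator = wsUrl.indexOf("?") === -1 ? "?" : "&";
  return `${wsUrl}${separator}access-token=${encodeURIComponent(token)}`;
}
```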

Net: ship authentication now (allow-all authz), and this discussion's ownership model becomes the per-agent authorization step layered on top.
