[Feature][CLI] Add SeaTunnel CLI for natural language config generation by SEZ9 · Pull Request #10789 · apache/seatunnel

SEZ9 · 2026-04-19T21:38:27Z

Purpose of this pull request

Add seatunnel-cli, a Python CLI tool that generates Apache SeaTunnel HOCON pipeline
configurations from natural language descriptions (English and Chinese).

Key capabilities:

- Multi-agent pipeline: Planner → Config Generator → Validator → Auto-fix (up to 3 rounds)
- 100+ connectors: Auto-generated metadata catalog from Java source (*Factory.java, *Options.java) with 1200+ option definitions and inheritance chain resolution
- Multi-provider LLM support: AWS Bedrock, Anthropic API, OpenAI (and compatible APIs)
- Three-tier knowledge base: Runtime REST API → auto-generated catalog → keyword routing
- Dry-run validation: Local HOCON syntax check + engine --check + REST API validation
- Auto-save: Generated configs automatically saved to ~/.seatunnel/last_job.conf
- Auto-fix on failure: /check and /run failures trigger LLM-powered diagnosis and config repair
- Session & memory: Persistent conversation sessions and connection detail memory across sessions
- Interactive & single-shot modes: REPL for exploration, one-liner for scripting
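
The generate, validate, and auto-fix stages listed above can be pictured as a bounded retry loop. The following is only a minimal sketch under stated assumptions, not the PR's actual implementation: `run_pipeline`, `PipelineResult`, and the three stub agents are hypothetical names, the planner stage is omitted, and the real agents would call an LLM rather than return fixed strings.

```python
from dataclasses import dataclass
from typing import Callable, List

MAX_FIX_ROUNDS = 3  # the description above says "up to 3 rounds"


@dataclass
class PipelineResult:
    config: str
    errors: List[str]
    rounds_used: int


def run_pipeline(
    request: str,
    generate: Callable[[str], str],
    validate: Callable[[str], List[str]],
    fix: Callable[[str, List[str]], str],
) -> PipelineResult:
    """Generate a config, then validate and repair it at most MAX_FIX_ROUNDS times."""
    config = generate(request)
    errors = validate(config)
    rounds = 0
    while errors and rounds < MAX_FIX_ROUNDS:
        config = fix(config, errors)
        errors = validate(config)
        rounds += 1
    return PipelineResult(config, errors, rounds)


# Hypothetical stub agents standing in for the LLM-backed ones.
def fake_generate(request: str) -> str:
    return "source { FakeSource {} }"


def fake_validate(config: str) -> List[str]:
    return [] if "sink" in config else ["missing sink block"]


def fake_fix(config: str, errors: List[str]) -> str:
    return config + "\nsink { Console {} }"


result = run_pipeline("FakeSource to Console", fake_generate, fake_validate, fake_fix)
```

Capping the loop means a config that cannot be repaired is returned together with its remaining errors instead of retrying indefinitely.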

Usage:

cd seatunnel-cli
pip install -e ".[bedrock]"
seatunnel  # interactive mode
seatunnel "Sync MySQL users table to S3 Parquet" -o job.conf  # single-shot

Does this PR introduce any user-facing change?

Yes. This adds a new seatunnel-cli module (Python) as a standalone tool alongside the existing Java codebase. It introduces:

- seatunnel CLI command for natural language config generation
- Interactive REPL with /save, /check, /run, /connectors, /memory commands
- Built-in connector catalog (connector_catalog.json) with 100 connectors
- --sync-catalog option to regenerate catalog from SeaTunnel Java source

No changes to existing Java modules. The CLI is fully self-contained under seatunnel-cli/.

How was this patch tested?

- Manual testing of interactive mode and single-shot mode with multiple LLM providers (Bedrock, OpenAI)
- Connector catalog generation tested against current dev branch: 100 connectors, 1273 options, 99.5% option resolution rate (7 unresolved out of
1273)
- Dry-run validation tested with local HOCON parsing and engine --check mode
- Auto-fix loop tested with intentionally broken configs (missing required fields, wrong option names)
- Session persistence and memory store tested across multiple CLI sessions
- Tested with both English and Chinese natural language inputs

Check list

- If any new Jar binary package is added in your PR, please add a License Notice according to
https://github.com/apache/seatunnel/blob/dev/docs/en/contribution/new-license.md — N/A (Python module, no Jar packages)
- If necessary, please update the documentation to describe the new feature. — README.md included in seatunnel-cli/
- If necessary, please update incompatible-changes.md to describe the incompatibility caused by this PR. — N/A (new module, no breaking changes)
- If you are contributing the connector code, please check that the following files are updated: — N/A (not a connector)

---

Add SeaTunnel CLI for natural language config generation

davidzollo · 2026-04-20T03:54:52Z

The repository CI uses apache/skywalking-eyes@v0.5.0 to perform license-header checks; .licenserc.yaml ignores *.md, *.json, and .gitignore, but does not ignore *.toml and *.sh. The setup.sh file in this PR contains an ASF header, while pyproject.toml and env.example.sh do not.

Suggestion: Add an ASF header in TOML comment format to pyproject.toml; keep the shebang on the first line of env.example.sh, and add the ASF header starting from the second line, following the style of setup.sh.

nzw921rx · 2026-04-20T05:05:38Z

@SEZ9 please enable CI by following the instructions at https://github.com/apache/seatunnel/pull/10789/checks?check_run_id=72041491227

davidzollo · 2026-04-20T06:10:13Z

Amazing! This is a very innovative feature.

Thanks for the contribution. I tested this PR locally with a real OpenAI-compatible provider, using DeepSeek (OPENAI_BASE_URL=https://api.deepseek.com, OPENAI_MODEL=deepseek-chat).

The good news is that the basic CLI flow can work: provider initialization, planner/config/validator flow, static catalog loading, config rendering, auto-save, and a simple FakeSource -> Console generation all completed successfully. The generated FakeSource -> Console config also passed validate_hocon.

However, I found several issues that should be fixed before merge.

1. Source/sink option rules are incorrectly merged for connectors with the same name

This is the most important functional issue.

For Jdbc, the generated catalog currently merges source and sink rules into one connector entry:

types = [source, sink]
required = [url, driver, schema_save_mode, data_save_mode]

But according to the actual Java source, schema_save_mode and data_save_mode are sink-only options.

JdbcSourceFactory.optionRule() only requires:

.required(JdbcSourceOptions.URL, JdbcSourceOptions.DRIVER)

while JdbcSinkFactory.optionRule() requires:

.required(
    JdbcSinkOptions.URL,
    JdbcSinkOptions.DRIVER,
    JdbcSinkOptions.SCHEMA_SAVE_MODE,
    JdbcSinkOptions.DATA_SAVE_MODE)

Because the catalog merges both factories by connector name, a normal Jdbc source job is incorrectly treated as missing sink-only options. In my real DeepSeek test, the CLI repeatedly entered the fix loop and finally generated this invalid source config:

source {
  Jdbc {
    url = "jdbc:mysql://localhost:3306/test"
    driver = "com.mysql.cj.jdbc.Driver"
    username = "${MYSQL_USER}"
    password = "${MYSQL_PASSWORD}"
    query = "SELECT id, name FROM users"

    schema_save_mode = "DISABLED"
    data_save_mode = "DISABLED"
  }
}

This is not a valid semantic fix. The CLI only produced it because its own catalog validation was wrong.

Suggested fix:

- Store catalog details by (plugin_type, connector_name), not only by connector_name.
- Validate source.Jdbc against source rules only.
- Validate sink.Jdbc against sink rules only.
- Keep the compact display index if desired, but do not use merged required options for validation or LLM tool details.
- Add a regression test for source { Jdbc { url, driver, query } } to ensure it does not require schema_save_mode / data_save_mode.
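
One way to picture the suggested keying is sketched below. This is an illustration only, assuming a simple in-memory mapping: `REQUIRED` and `missing_required` are hypothetical names, and the required options are copied from the two factory snippets quoted above rather than from the real catalog format.

```python
from typing import Dict, List, Set, Tuple

CatalogKey = Tuple[str, str]  # (plugin_type, connector_name)

# Required options as quoted from JdbcSourceFactory / JdbcSinkFactory above.
REQUIRED: Dict[CatalogKey, Set[str]] = {
    ("source", "Jdbc"): {"url", "driver"},
    ("sink", "Jdbc"): {"url", "driver", "schema_save_mode", "data_save_mode"},
}


def missing_required(
    plugin_type: str, connector: str, options: Dict[str, object]
) -> List[str]:
    """Return required option names absent from `options`, per plugin type."""
    required = REQUIRED.get((plugin_type, connector), set())
    return sorted(required - set(options))


# Regression case from the review: a plain Jdbc source must not need sink options.
source_options = {
    "url": "jdbc:mysql://localhost:3306/test",
    "driver": "com.mysql.cj.jdbc.Driver",
    "query": "SELECT id, name FROM users",
}
source_missing = missing_required("source", "Jdbc", source_options)
sink_missing = missing_required("sink", "Jdbc", {"url": "x", "driver": "y"})
```

With the key split by plugin type, the source block validates cleanly while an incomplete sink block still reports the two save-mode options.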

2. Basic installation path does not start the CLI successfully

pip install ./seatunnel-cli succeeds, but running seatunnel fails because the default provider is bedrock and boto3 is not part of the base dependencies.

Users must install one of the extras, for example:

pip install -e ".[bedrock]"
pip install -e ".[openai]"
pip install -e ".[anthropic]"

Suggested fix:

- Include the default provider SDK in the base dependencies, or
- make the CLI fail gracefully with a clear message instead of a stack trace, or
- require --provider and document that the base install alone is not enough.
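
The "fail gracefully" option could look roughly like the sketch below. It is an assumption-laden illustration, not code from the PR: `PROVIDER_SDKS`, `missing_sdk_message`, and `ensure_provider_available` are hypothetical names, and the module-to-extra mapping is inferred from the extras quoted above. It relies only on the standard library's `importlib.util.find_spec`.

```python
import importlib.util
import sys
from typing import Optional

# Provider name -> (importable module, pip extra); mapping is an assumption
# based on the extras listed in this review.
PROVIDER_SDKS = {
    "bedrock": ("boto3", "bedrock"),
    "openai": ("openai", "openai"),
    "anthropic": ("anthropic", "anthropic"),
}


def missing_sdk_message(provider: str) -> Optional[str]:
    """Return an install hint if the provider's SDK is not importable, else None."""
    if provider not in PROVIDER_SDKS:
        return f"Unknown provider '{provider}'."
    module, extra = PROVIDER_SDKS[provider]
    if importlib.util.find_spec(module) is not None:
        return None
    return (
        f"Provider '{provider}' needs the '{module}' package. "
        f'Install it with: pip install -e ".[{extra}]"'
    )


def ensure_provider_available(provider: str) -> None:
    """Exit with a readable message (no traceback) when the SDK is missing."""
    message = missing_sdk_message(provider)
    if message is not None:
        sys.exit(message)
```

Calling `ensure_provider_available("bedrock")` at startup, before any provider import, would turn the missing-boto3 case into a one-line hint on stderr with a non-zero exit status.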

3. ASF license headers are missing in new files

seatunnel-cli/pyproject.toml and seatunnel-cli/env.example.sh do not have ASF license headers. The repository SkyWalking Eyes config does not ignore .toml or .sh, so the license-header check is likely to fail.

Suggested fix:

- Add ASF header to pyproject.toml.
- Keep shebang as the first line in env.example.sh, then add ASF header below it.
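
For a quick local check before pushing, something like the following sketch could flag files that lack the header. It is not how SkyWalking Eyes matches headers; `has_asf_header` and `scan_lines` are hypothetical, and the check only looks for the first sentence of the standard ASF source header near the top of the file, after an optional shebang.

```python
ASF_MARKER = "Licensed to the Apache Software Foundation (ASF)"


def has_asf_header(text: str, scan_lines: int = 5) -> bool:
    """True if the ASF marker appears near the top, skipping a leading shebang."""
    lines = text.splitlines()
    if lines and lines[0].startswith("#!"):
        lines = lines[1:]  # the shebang must stay on line 1, header follows
    return any(ASF_MARKER in line for line in lines[:scan_lines])


# Illustrative file contents, not the actual files in the PR.
shell_with_header = (
    "#!/bin/bash\n"
    "#\n"
    "# Licensed to the Apache Software Foundation (ASF) under one or more\n"
    "# contributor license agreements.\n"
)
toml_without_header = '[project]\nname = "seatunnel-cli"\n'

shell_ok = has_asf_header(shell_with_header)
toml_ok = has_asf_header(toml_without_header)
```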

DanielLeens · 2026-04-20T07:03:08Z

Hi @SEZ9, thanks for putting together such an ambitious CLI prototype. I pulled the branch locally as seatunnel-review-10789 and reviewed it against the current dev baseline.

Runtime path I checked:

seatunnel console command
  -> seatunnel_cli.cli.main()
      -> Orchestrator in agents.py
          -> LLMProvider / connector catalog / memory store
          -> generated HOCON config
      -> /check or /run
          -> local seatunnel.sh --check / --config, or REST validation through SEATUNNEL_API_BASE

The idea is valuable, but I do not think this PR is ready to merge yet. A few items need to be closed first:

- The GitHub Build check is ACTION_REQUIRED, so the PR has not gone through the required project validation yet.
- This adds a new Python package and LLM runtime under seatunnel-cli/, with dependencies declared in seatunnel-cli/pyproject.toml (rich, prompt-toolkit, pyhocon, and optional boto3, anthropic, openai). Since this becomes part of the Apache source/release surface, we need the dependency/license/release integration to be explicit rather than living only as an isolated subdirectory.
- Some new non-code package files still need Apache source-release treatment. For example seatunnel-cli/pyproject.toml and seatunnel-cli/env.example.sh start without the ASF header pattern used by the new Python sources. The current CI action-required state may be related, but either way the release checks must be clean.
- seatunnel-cli/seatunnel_cli/connector_catalog.json is a large generated catalog checked into the repo. Please add a reproducible generation/update story in CI or tests, otherwise it can silently drift from the Java connector Option definitions and generate configs that no longer match the engine.
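
A drift check for the generated catalog might be sketched as below. This assumes, without confirmation from the PR, that the catalog is a JSON object keyed by connector name; `catalog_drift` is a hypothetical helper, and the two JSON strings stand in for the committed file and a freshly regenerated one.

```python
import json
from typing import Any, Dict, List


def catalog_drift(committed: Dict[str, Any], regenerated: Dict[str, Any]) -> List[str]:
    """List connectors whose committed and regenerated entries disagree."""
    problems = []
    for name in sorted(set(committed) | set(regenerated)):
        if name not in regenerated:
            problems.append(f"{name}: in committed catalog but not regenerated")
        elif name not in committed:
            problems.append(f"{name}: regenerated but missing from committed catalog")
        elif committed[name] != regenerated[name]:
            problems.append(f"{name}: definitions differ")
    return problems


# Illustrative data only; the real catalog layout may differ.
committed = json.loads('{"Jdbc": {"required": ["url", "driver"]}, "Console": {}}')
regenerated = json.loads('{"Jdbc": {"required": ["url"]}, "Kafka": {}}')
drift = catalog_drift(committed, regenerated)
```

A CI job or test could regenerate the catalog from the Java sources and fail when the returned list is non-empty.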

Conclusion: can merge after fixes

This is a promising direction, but before merge I would like to see CI enabled and green, source-release/dependency handling made explicit, and the static catalog generation path made reproducible. Happy to review the next revision; the feature idea itself is exciting.

chl-wxp

This PR introduces a lot of functionality at once (CLI, multi-agent pipeline, catalog generation, validation, execution, memory, etc.), which makes it quite heavy to review and reason about.

It might be more practical to start with a minimal viable workflow:

- interactive or one-shot CLI
- natural language → HOCON config generation

without validation, engine integration, or metadata catalog for now.

This helps establish the core value (“NL → config”) first, and keeps the initial scope small and easier to review.

Then we can layer additional capabilities (validation, catalog, execution, auto-fix, etc.) in separate PRs.

This incremental approach would reduce risk and improve maintainability.

chl-wxp · 2026-04-20T16:09:47Z

This could rely on the existing inspection capabilities: https://github.com/apache/seatunnel/pull/10763/changes

DanielLeens · 2026-04-21T15:36:29Z

Hi @SEZ9, I rechecked the current PR head locally as seatunnel-review-10789 at 072da0ccf65f. I reviewed the full diff against upstream/dev and did not run local Maven/tests in this batch; this is a source-level review.

The new CLI path is outside the Java engine runtime but becomes part of the Apache source/release surface:

seatunnel console command
  -> seatunnel_cli.cli.main()
  -> Orchestrator / LLMProvider / connector catalog / memory store
  -> generated HOCON config
  -> local seatunnel.sh --check/--config or REST validation

The idea is useful, but the PR still needs release-quality integration work. Fetched metadata reports Build: ACTION_REQUIRED. The new Python package, optional LLM dependencies, source-release headers, and generated connector catalog need explicit handling so this does not become a drifting sidecar tool.

Conclusion: can merge after fixes

Blocking items:

- Get the required Build validation enabled and green.
- Make dependency/license/source-release treatment explicit for the new Python package files.
- Add a reproducible generation/update path for connector_catalog.json so it stays aligned with Java connector options.

SeaTunnel CLI

072da0c

Add SeaTunnel CLI for natural language config generation

SEZ9 mentioned this pull request Apr 19, 2026

[Discussion] Support AI generation for SeaTunnel task config files #10651

Open

chl-wxp reviewed Apr 20, 2026


chl-wxp suggested changes Apr 20, 2026


SEZ9 wants to merge 1 commit into apache:dev from SEZ9:feature/seatunnel-cli
