[Feature][CLI] Add SeaTunnel CLI for natural language config generation#10789
SEZ9 wants to merge 1 commit into apache:dev
Add SeaTunnel CLI for natural language config generation
|
The repository CI uses apache/skywalking-eyes@v0.5.0 to perform license-header checks. .licenserc.yaml ignores *.md, *.json, and .gitignore, but does not ignore *.toml or *.sh. The setup.sh file in this PR contains an ASF header, while pyproject.toml and env.example.sh do not.
Suggestion: add an ASF header in TOML comment format to pyproject.toml. In env.example.sh, keep the shebang on the first line and add the ASF header starting from the second line, following the style of setup.sh.
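For a quick local pre-check before pushing, the header rule described above can be approximated with a few lines of Python. This is only a sketch and not a substitute for skywalking-eyes: the helper names (`has_asf_header`, `missing_headers`) and the 20-line scan window are assumptions made for illustration, and the check looks only for the opening phrase of the standard ASF header.

```python
from pathlib import Path

# Opening phrase of the standard ASF license header.
ASF_MARKER = "Licensed to the Apache Software Foundation (ASF)"


def has_asf_header(path: Path, max_lines: int = 20) -> bool:
    """Return True if the ASF marker appears within the first max_lines lines.

    Scanning a window (rather than only line 1) lets a shell script keep
    its shebang on the first line and start the header on the second.
    """
    with path.open(encoding="utf-8", errors="replace") as fh:
        head = [next(fh, "") for _ in range(max_lines)]
    return any(ASF_MARKER in line for line in head)


def missing_headers(root: Path, patterns=("*.toml", "*.sh")) -> list:
    """List files under root matching the patterns that lack the header."""
    found = []
    for pattern in patterns:
        for path in sorted(root.rglob(pattern)):
            if path.is_file() and not has_asf_header(path):
                found.append(path)
    return found


if __name__ == "__main__":
    # Hypothetical invocation from the CLI module's directory.
    for path in missing_headers(Path(".")):
        print(f"missing ASF header: {path}")
```

Running it from the directory that contains pyproject.toml and env.example.sh lists the files that still need a header.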
|
@SEZ9 please enable CI by following the instructions at https://github.com/apache/seatunnel/pull/10789/checks?check_run_id=72041491227
|
Amazing! This is a very innovative feature. Thanks for the contribution.
I tested this PR locally with a real OpenAI-compatible provider, using DeepSeek (…).
The good news is that the basic CLI flow can work: provider initialization, planner/config/validator flow, static catalog loading, config rendering, auto-save, and a simple …
However, I found several issues that should be fixed before merge.
1. Source/sink option rules are incorrectly merged for connectors with the same name
This is the most important functional issue. For … But according to the actual Java source, …
    .required(JdbcSourceOptions.URL, JdbcSourceOptions.DRIVER)
while …
    .required(
            JdbcSinkOptions.URL,
            JdbcSinkOptions.DRIVER,
            JdbcSinkOptions.SCHEMA_SAVE_MODE,
            JdbcSinkOptions.DATA_SAVE_MODE)
Because the catalog merges both factories by connector name, a normal …
    source {
      Jdbc {
        url = "jdbc:mysql://localhost:3306/test"
        driver = "com.mysql.cj.jdbc.Driver"
        username = "${MYSQL_USER}"
        password = "${MYSQL_PASSWORD}"
        query = "SELECT id, name FROM users"
        schema_save_mode = "DISABLED"
        data_save_mode = "DISABLED"
      }
    }
This is not a valid semantic fix. The CLI only produced it because its own catalog validation was wrong.
Suggested fix:
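One way to avoid the merge described in this issue is to key the catalog by plugin type as well as connector name, so that source and sink rules never share an entry. The sketch below is illustrative only and is not the PR's actual catalog code: `OptionRule`, `CATALOG`, and `missing_required` are hypothetical names, and the lowercase option keys are assumed to match the keys used in the config sample above. The required sets mirror the two `.required(...)` declarations quoted from the Java source.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OptionRule:
    """Required option keys for one factory (hypothetical structure)."""
    required: frozenset = field(default_factory=frozenset)


# Keyed by (plugin_type, connector_name) so that the Jdbc source and the
# Jdbc sink keep separate rules instead of being merged under "Jdbc".
CATALOG = {
    ("source", "Jdbc"): OptionRule(required=frozenset({"url", "driver"})),
    ("sink", "Jdbc"): OptionRule(
        required=frozenset(
            {"url", "driver", "schema_save_mode", "data_save_mode"}
        )
    ),
}


def missing_required(plugin_type: str, connector: str, options: dict) -> list:
    """Return the required keys absent from options, sorted by name."""
    rule = CATALOG[(plugin_type, connector)]
    return sorted(rule.required - set(options))


if __name__ == "__main__":
    jdbc_source = {
        "url": "jdbc:mysql://localhost:3306/test",
        "driver": "com.mysql.cj.jdbc.Driver",
        "query": "SELECT id, name FROM users",
    }
    # The source block validates without any sink-only save-mode options.
    print(missing_required("source", "Jdbc", jdbc_source))
    # The same options used as a sink are still reported as incomplete.
    print(missing_required("sink", "Jdbc", jdbc_source))
```

With separate keys, the validator no longer asks a Jdbc source for `schema_save_mode` or `data_save_mode`, so the auto-fix step has no reason to inject them.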
2. Basic installation path does not start the CLI successfully
Users must install one of the extras, for example:
    pip install -e ".[bedrock]"
    pip install -e ".[openai]"
    pip install -e ".[anthropic]"
Suggested fix:
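The concrete fix is for the PR author to choose. As one hypothetical approach, the CLI could check whether the selected provider's SDK is importable and print the matching install command instead of failing at startup. In the sketch below, the extras names come from the commands above, while the module names in `EXTRA_TO_MODULE` and the function names are assumptions, not taken from the PR.

```python
import importlib.util

# Extras names are from the PR's install commands; the module each extra
# is expected to provide is an assumption made for this sketch.
EXTRA_TO_MODULE = {
    "bedrock": "boto3",
    "openai": "openai",
    "anthropic": "anthropic",
}


def _is_installed(module_name: str) -> bool:
    """Check importability without actually importing the module."""
    return importlib.util.find_spec(module_name) is not None


def install_hint(extra: str) -> str:
    """Build the install command for one extra."""
    return f'pip install -e ".[{extra}]"'


def resolve_provider(requested: str, is_installed=_is_installed) -> str:
    """Return the provider name, or raise with an actionable message."""
    if requested not in EXTRA_TO_MODULE:
        known = ", ".join(sorted(EXTRA_TO_MODULE))
        raise ValueError(f"Unknown provider '{requested}'. Known: {known}")
    if not is_installed(EXTRA_TO_MODULE[requested]):
        raise RuntimeError(
            f"Provider '{requested}' is not installed. "
            f"Try: {install_hint(requested)}"
        )
    return requested


if __name__ == "__main__":
    try:
        print(resolve_provider("openai"))
    except RuntimeError as exc:
        print(exc)
```

The `is_installed` parameter exists only so the check can be exercised in tests without installing any SDK.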
3. ASF license headers are missing in new files
Suggested fix:
|
|
Hi @SEZ9, thanks for putting together such an ambitious CLI prototype. I pulled the branch locally as …
Runtime path I checked: …
The idea is valuable, but I do not think this PR is ready to merge yet. A few items need to be closed first:
Conclusion: can merge after fixes
This is a promising direction, but before merge I would like to see CI enabled and green, source-release/dependency handling made explicit, and the static catalog generation path made reproducible. Happy to review the next revision; the feature idea itself is exciting.
chl-wxp
left a comment
This PR introduces a lot of functionality at once (CLI, multi-agent pipeline, catalog generation, validation, execution, memory, etc.), which makes it quite heavy to review and reason about.
It might be more practical to start with a minimal viable workflow:
interactive or one-shot CLI
natural language → HOCON config generation
without validation, engine integration, or metadata catalog for now.
This helps establish the core value (“NL → config”) first, and keeps the initial scope small and easier to review.
Then we can layer additional capabilities (validation, catalog, execution, auto-fix, etc.) in separate PRs.
This incremental approach would reduce risk and improve maintainability.
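To make the proposed minimal scope concrete, the "NL → config" core reduces to turning a structured plan into HOCON text. The sketch below covers only that rendering step and assumes the language model has already produced a plan as nested dictionaries; `render_block`, `render_config`, and the plan layout are hypothetical and not taken from the PR. Values are written with `json.dumps`, which yields quoted strings, numbers, and booleans that HOCON accepts because it is a superset of JSON.

```python
import json


def render_block(name: str, plugins: dict, indent: str = "  ") -> str:
    """Render one top-level block (e.g. source or sink) as HOCON text."""
    lines = [f"{name} {{"]
    for plugin, options in plugins.items():
        lines.append(f"{indent}{plugin} {{")
        for key, value in options.items():
            # json.dumps quotes strings and leaves numbers/booleans bare.
            lines.append(f"{indent * 2}{key} = {json.dumps(value)}")
        lines.append(f"{indent}}}")
    lines.append("}")
    return "\n".join(lines)


def render_config(plan: dict) -> str:
    """Render every block in the plan, separated by blank lines."""
    return "\n\n".join(render_block(n, p) for n, p in plan.items())


if __name__ == "__main__":
    # Hypothetical plan, standing in for the model's structured output.
    plan = {
        "source": {
            "Jdbc": {
                "url": "jdbc:mysql://localhost:3306/test",
                "driver": "com.mysql.cj.jdbc.Driver",
                "query": "SELECT id, name FROM users",
            }
        },
        "sink": {"Console": {}},
    }
    print(render_config(plan))
```

Keeping the renderer free of validation and engine calls matches the suggestion to layer those capabilities in later PRs.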
This can rely on the existing inspection capabilities: https://github.com/apache/seatunnel/pull/10763/changes
|
Hi @SEZ9, I rechecked the current PR head locally as …
The new CLI path is outside the Java engine runtime but becomes part of the Apache source/release surface: …
The idea is useful, but the PR still needs release-quality integration work. Fetched metadata reports …
Conclusion: can merge after fixes
Blocking items:
|
Purpose of this pull request
Add seatunnel-cli, a Python CLI tool that generates Apache SeaTunnel HOCON pipeline configurations from natural language descriptions (English and Chinese).
Key capabilities:
… (*Factory.java, *Options.java) with 1200+ option definitions and inheritance chain resolution
… --check + REST API validation
… ~/.seatunnel/last_job.conf
… /check and /run failures trigger LLM-powered diagnosis and config repair
Usage: