Skip to content

[Improvement]: Improve AmsThriftUrl parsing and validation #4232

@slfan1989

Description

@slfan1989

Search before asking

  • I have searched in the issues and found no similar issues.

What would you like to be improved?

Motivation

AmsThriftUrl is used by clients to parse AMS thrift and ZooKeeper URLs. The current implementation has a few issues that can make URL parsing less robust:

  1. parserThriftUrl() lowercases the entire URL before parsing.
  2. socketTimeout query parameter parsing is duplicated.
  3. thrift URL validation is weak for missing host, missing port, or unsupported scheme.
  4. Some error messages are too generic, such as invalid ams url.

Lowercasing the whole URL may unexpectedly change case-sensitive parts, such as catalog names or query parameter values.

Current behavior

For example:

AmsThriftUrl.parse("ThRiFt://LOCALHOST:1260/MyCatalog?socketTimeout=6000", null)

currently lowercases the URL before parsing, so values such as host/catalog may be changed unexpectedly.

Invalid URLs such as the following are not clearly rejected during parsing:

http://127.0.0.1:1260/catalog
thrift:///catalog
thrift://127.0.0.1/catalog

Expected behavior

AmsThriftUrl should:

  • Preserve the original URL case for host, catalog name, and query values.
  • Treat URL schemes case-insensitively.
  • Validate that thrift URLs use the thrift scheme.
  • Validate that thrift URLs contain both host and port.
  • Parse socketTimeout consistently in thrift and ZooKeeper URL paths.
  • Provide clearer error messages for invalid URLs.

Proposed changes

  • Avoid calling toLowerCase() on the whole URL.
  • Parse URL scheme with case-insensitive comparison.
  • Add explicit thrift URL validation for:
    • unsupported scheme
    • missing host
    • missing port
  • Extract shared socketTimeout query parsing logic.
  • Add unit tests for AmsThriftUrl.

How should we improve?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Subtasks

No response

Code of Conduct

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions