
1143 Add per-URL proxy metadata selection #1935

Open
preko-p wants to merge 2 commits into apache:main from preko-p:1143-proxy-metadata-selection


preko-p commented Jun 8, 2026


Fixes #1143.

Summary

  • Add per-URL proxy selection from URL metadata for the built-in proxy managers.
  • Support http.proxy.skip=true, full http.proxy connection strings, and component metadata keys (http.proxy.host, http.proxy.port, http.proxy.type, http.proxy.user, http.proxy.pass).
  • Reject invalid metadata proxy values with an error instead of silently falling back to configured proxies.
  • Keep configured proxy defaults and rotation behavior unchanged when metadata is absent.
  • Document metadata proxy precedence and behavior.
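
The metadata keys listed above could be resolved roughly as follows. This is a minimal sketch, not the code in this PR: `ProxyChoice` and `MetadataProxyResolver` are hypothetical names, metadata is modeled as a plain `Map<String, String>`, and both the precedence order (skip, then full connection string, then component keys) and the default type of `http` are assumptions.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical result type for this sketch; it is not StormCrawler's SCProxy. */
final class ProxyChoice {
    enum Kind { SKIP, FROM_CONNECTION_STRING, FROM_COMPONENTS, CONFIGURED_DEFAULT }

    final Kind kind;
    final String value; // connection string or "type://host:port"; null for SKIP and CONFIGURED_DEFAULT

    ProxyChoice(Kind kind, String value) {
        this.kind = kind;
        this.value = value;
    }
}

final class MetadataProxyResolver {
    /** Resolves a proxy choice from URL metadata. The precedence order is an assumption. */
    static ProxyChoice resolve(Map<String, String> md) {
        // 1. Explicit opt-out: fetch this URL without any proxy.
        if ("true".equalsIgnoreCase(md.get("http.proxy.skip"))) {
            return new ProxyChoice(ProxyChoice.Kind.SKIP, null);
        }
        // 2. Full connection string, passed through for a real parser to validate.
        String full = md.get("http.proxy");
        if (full != null && !full.trim().isEmpty()) {
            return new ProxyChoice(ProxyChoice.Kind.FROM_CONNECTION_STRING, full.trim());
        }
        // 3. Component keys.
        String host = md.get("http.proxy.host");
        String port = md.get("http.proxy.port");
        String user = md.get("http.proxy.user");
        String pass = md.get("http.proxy.pass");
        if (host == null && port == null && user == null && pass == null) {
            // No proxy metadata at all: keep configured defaults and rotation.
            return new ProxyChoice(ProxyChoice.Kind.CONFIGURED_DEFAULT, null);
        }
        // Fail fast on incomplete metadata rather than falling back silently.
        if (host == null || port == null) {
            throw new IllegalArgumentException("http.proxy.host and http.proxy.port must both be set");
        }
        if ((user == null) != (pass == null)) {
            throw new IllegalArgumentException("http.proxy.user and http.proxy.pass must be set together");
        }
        // Default type "http" is an assumption of this sketch.
        String type = md.getOrDefault("http.proxy.type", "http").toLowerCase(Locale.ROOT);
        // Port parsing and credential handling are omitted to keep the sketch short.
        return new ProxyChoice(ProxyChoice.Kind.FROM_COMPONENTS, type + "://" + host + ":" + port);
    }
}
```

With this shape, an empty metadata map resolves to `CONFIGURED_DEFAULT`, which mirrors the "unchanged when metadata is absent" bullet, while a lone `http.proxy.host` raises an error instead of falling back.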

Verification

  • git diff --check
  • Docker: mvn -B -ntp -Dskip.format.code=false -pl core git-code-format:format-code -DskipTests
  • Docker: mvn -B -ntp -pl core -Dtest=SingleProxyManagerTest,MultiProxyManagerTest,HttpProtocolProxyConcurrencyTest,HttpClientProtocolProxyManagerTest,FetcherBoltTest,SimpleFetcherBoltTest -DfailIfNoTests=false test

Focused selector result: Tests run: 44, Failures: 0, Errors: 0, Skipped: 0, BUILD SUCCESS.

Note: I also attempted Docker mvn clean verify. The core module passed 264 tests and the reactor advanced past core, but the full reactor later failed in stormcrawler-opensearch on JaCoCo threshold failures unrelated to this patch, so I am not claiming a full-repo pass.

Reference apache#1143.

AI-assistant: Codex CLI
rzo1 requested review from dpol1, jnioche, mvolikas and sigee June 9, 2026 06:15
rzo1 added this to the 3.6.1 milestone Jun 9, 2026
rzo1 added the core, ideas, and java labels Jun 9, 2026
rzo1 commented Jun 9, 2026

Contributor

Thanks for the PR. Two minor, non-blocking points I'd like to raise:

1. getConfiguredProxy relies on toString() equality, which is brittle for the component-key path

private Optional<SCProxy> getConfiguredProxy(SCProxy proxy) {
    String proxyString = proxy.toString();
    for (SCProxy configuredProxy : this.proxies) {
        if (configuredProxy.toString().equals(proxyString)) {
            return Optional.of(configuredProxy);
        }
    }
    return Optional.empty();
}

Since this only exists to reuse the configured SCProxy instance for LEAST_USED usage accounting, a miss is harmless: the metadata proxy is still used, just as a throwaway instance. So there are no false positives and no functional bug. But the match is an exact, case-sensitive comparison of toString() output, and the two construction paths normalize differently: the component-key path lowercases the protocol (type.toLowerCase(Locale.ROOT)), whereas configured proxies loaded from the file preserve whatever case the file used (regex group [^:]+). So a configured entry like HTTP://host:8080 would fail to match a semantically identical metadata proxy built from component keys, silently losing the accounting benefit.

Could we compare on a normalized key, or give SCProxy an equals()/hashCode()? Also worth noting: the component-key → configured-match scenario isn't covered by a test (the existing one deliberately uses an out-of-rotation proxy), so this brittle path is currently untested.
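
One way the normalized comparison could look is sketched below. `ProxyKey` is a hypothetical class, not part of StormCrawler, and SCProxy's real fields and accessors may differ. Leaving credentials out of the identity is also an assumption; whether two entries with the same endpoint but different credentials should share usage accounting is a design decision for the PR.

```java
import java.util.Locale;
import java.util.Objects;

/** Hypothetical identity key for a proxy endpoint; not StormCrawler's SCProxy. */
final class ProxyKey {
    private final String protocol;
    private final String host;
    private final int port;

    ProxyKey(String protocol, String host, int port) {
        // Normalize once at construction so equals() and hashCode() stay cheap.
        this.protocol = Objects.requireNonNull(protocol, "protocol").toLowerCase(Locale.ROOT);
        // Host names are case-insensitive, so lowercase them as well.
        this.host = Objects.requireNonNull(host, "host").toLowerCase(Locale.ROOT);
        this.port = port;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof ProxyKey)) {
            return false;
        }
        ProxyKey other = (ProxyKey) o;
        return port == other.port && protocol.equals(other.protocol) && host.equals(other.host);
    }

    @Override
    public int hashCode() {
        return Objects.hash(protocol, host, port);
    }

    @Override
    public String toString() {
        return protocol + "://" + host + ":" + port;
    }
}
```

Under these assumptions, `new ProxyKey("HTTP", "host", 8080)` equals `new ProxyKey("http", "host", 8080)`, which is the case that the toString() comparison misses.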

2. Port-range validation is duplicated

The 1..65535 bound now lives in two places with near-identical messages:

// SingleProxyManager.configure() — int from ConfUtils.getInt
if (proxyPort < 1 || proxyPort > 65535) { ... }
// ProxyMetadata.validatePort() — String, plus NumberFormatException handling
if (parsedPort < 1 || parsedPort > 65535) { ... }

Could we consolidate the bound check into a shared helper (e.g. in ProxyMetadata) that SingleProxyManager also calls, keeping the string-parsing wrapper where it is?
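
A shared bound check with a thin string-parsing wrapper might be sketched like this. The class name `ProxyPorts`, the method names, and the error messages are all hypothetical; the PR may place the helper in ProxyMetadata with different wording.

```java
/** Hypothetical helper holding the single source of truth for the port range. */
final class ProxyPorts {
    static final int MIN_PORT = 1;
    static final int MAX_PORT = 65535;

    private ProxyPorts() {}

    /** Range check for callers that already have an int, such as a config reader. */
    static int requireValidPort(int port) {
        if (port < MIN_PORT || port > MAX_PORT) {
            throw new IllegalArgumentException(
                    "Proxy port must be between " + MIN_PORT + " and " + MAX_PORT + ": " + port);
        }
        return port;
    }

    /** String wrapper for metadata values; delegates the range check to requireValidPort. */
    static int parsePort(String raw) {
        if (raw == null) {
            throw new IllegalArgumentException("Proxy port is missing");
        }
        final int parsed;
        try {
            parsed = Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Proxy port is not a number: " + raw, e);
        }
        return requireValidPort(parsed);
    }
}
```

Keeping the parse step separate from the range check lets the int-based caller skip string handling entirely while both paths report the same bound.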

preko-p commented Jun 9, 2026

Author

Thanks for the careful review. I pushed a small follow-up for both points:

  • replaced the toString() comparison in getConfiguredProxy() with a normalized semantic proxy comparison, so component-key metadata can match a configured proxy even when the configured protocol casing differs
  • added a regression test for that path under LEAST_USED so the configured proxy instance gets the usage increment
  • consolidated the shared 1..65535 port bound check while keeping metadata string parsing/error handling in ProxyMetadata

Local verification:

git diff --check
/tmp/apache-maven-3.9.9/bin/mvn -pl core -Dtest=MultiProxyManagerTest,SingleProxyManagerTest test

The focused Maven run passed: 30 tests, 0 failures/errors.

dpol1 left a comment

Member


+1 from my side. Good coverage, including the edge cases for partial auth, out-of-range ports, and credentials with reserved characters. Fail-fast on invalid metadata is the right call.

Minor note for a follow-up: getConfiguredProxy in MultiProxyManager does a linear scan per metadata-override request. With large proxy lists, a HashMap keyed on normalized protocol+host+port and built once in configure() would avoid the per-request work. Not a blocker here. @rzo1, worth an issue?
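
The lookup-table idea could be sketched as below. `ConfiguredProxyIndex` is a hypothetical class with a type parameter `P` standing in for the proxy type, and the string key format is an assumption. It is meant to be filled once at configuration time and then queried per request.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical index from a normalized endpoint key to the configured proxy instance. */
final class ConfiguredProxyIndex<P> {
    private final Map<String, P> byKey = new HashMap<>();

    /** Builds the normalized key; protocol and host are compared case-insensitively. */
    static String key(String protocol, String host, int port) {
        return protocol.toLowerCase(Locale.ROOT) + "://" + host.toLowerCase(Locale.ROOT) + ":" + port;
    }

    /** Registers a configured proxy; on duplicate endpoints the first instance wins. */
    void add(String protocol, String host, int port, P proxy) {
        byKey.putIfAbsent(key(protocol, host, port), proxy);
    }

    /** Constant-time expected lookup that replaces the per-request linear scan. */
    Optional<P> find(String protocol, String host, int port) {
        return Optional.ofNullable(byKey.get(key(protocol, host, port)));
    }
}
```

If the index were read from multiple threads while still being populated, it would need synchronization or a concurrent map; this sketch assumes it is fully built before any lookups happen.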



Development

Successfully merging this pull request may close these issues.

Use proxy on a per URL basis

4 participants