[codex] Improve Nettacker HTTP detection accuracy and explicit port handling#1545
Conversation
- Improved target expansion to extract ports and schemas from URLs. - Added baseline response comparison for HTTP requests to detect changes. - Introduced new CORS misconfiguration vulnerability detection module. - Updated various YAML configurations to support new response conditions. - Added unit tests for baseline response handling and target expansion logic.
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (15)
Summary by CodeRabbit
WalkthroughThis PR introduces HTTP baseline response comparison for condition evaluation and URL-based target parsing with auto-detected ports/schemes. HTTP helpers compute content fingerprints and generate randomized baseline requests; YAML vulnerability and scan modules are configured with baseline response filtering. URL expansion now parses explicit scheme/port/path and toggles skip_service_discovery for port scanning. ChangesHTTP Baseline Response Support
URL-based Target Parsing & Port/Scheme Auto-detection
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 Microsoft Presidio Analyzer (2.2.362)nettacker/core/app.pyMicrosoft Presidio Analyzer failed to scan this file Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
PR validation failed: No linked issue and no valid closing issue reference in PR description |
Summary
This PR improves Nettacker's HTTP scanning accuracy for several cases that showed up during a lab assessment against modern Node/SPA applications (
bkimminich/juice-shopandnirocr/nodegoat). The main theme is reducing false positives caused by status-code-only matching while also fixing silent false negatives in header-based vulnerability modules.The branch is based directly on
OWASP/Nettacker:masterand is intended to be reviewable as one focused changeset.The changes are intentionally split between two layers:
content_length,content_sha1, missing-header matching, and catch-all baseline comparison)This should preserve the current YAML-driven module model while giving module authors safer matching tools for modern web applications.
Problems addressed
1. URL probe modules false-positive on SPA catch-all routes
dir_scan,admin_scan, andpma_scanpreviously treated200,401, or403as enough evidence that a path existed. That produces very noisy output against SPAs and catch-all routes that return the sameindex.htmlfor every unknown path.For example, Angular/React/Vue/Svelte applications commonly route unknown paths back to the frontend shell with
200 OK. In that situation, status-code-only matching cannot distinguish a real/adminpage from/nettacker-random-nonsense.This PR adds an opt-in
baseline_responsecondition for HTTP modules. When a module uses it, Nettacker performs a low-impact request to a random sibling path and only reports a probe hit if the probe response differs from the random baseline by status code, body length beyond tolerance, or body SHA-1.Applied to:
dir_scanadmin_scanpma_scan2. Header absence was not matchable by YAML header conditions
Several header vulnerability modules tried to detect unsafe header values, but a completely missing header could fail to match and therefore produce no finding. Missing security headers are often the common case, so this silently under-reports real issues.
The HTTP condition matcher now evaluates missing response headers as an empty string (
"") instead of a non-string falsey value. That lets existing regex-based YAML conditions explicitly match absence with^$.Updated modules:
clickjacking_vulncontent_type_options_vulnx_xss_protection_vulncontent_security_policy_vulnalready had an absence-compatible regex, but it benefits from the engine fix because missing headers can now match consistently.3. OPTIONS method detection missed CORS preflight method lists
http_options_enabled_vulnonly looked at the legacyAllowheader. Modern web frameworks frequently expose allowed methods throughAccess-Control-Allow-Methodson OPTIONS preflight responses instead.This PR makes either header sufficient evidence for the module:
AllowAccess-Control-Allow-Methods4. WAF detection produced false positives from status-code deltas
waf_scanhad a generic fallback heuristic: compare the baseline request status code to an XSS-payload request status code, and report "WAF detected" if they differ.That is too weak as a WAF signal. Normal application routing, caching, redirects, frontend fallbacks, and framework behavior can all produce status differences without a WAF or CDN in front of the app.
This PR removes the status-delta-only heuristic and leaves the existing positive-signature checks in place. WAF findings now require a vendor/header/body/status signature from the existing
iterative_response_matchdatabase rather than a generic status-code difference.It also removes the previous typo-bearing fallback log (
differenet).5. Explicit URL ports should not be blocked by default service discovery
When a user provides a URL with an explicit scheme and port, such as
http://jshop:3000, or provides-g 3000, Nettacker already has enough user intent to scan that port. The prior flow could run service discovery first, fail to classify the service, and stop with "no live service found" even though the requested HTTP service was reachable.This PR updates target expansion to preserve explicit URL scheme/port into the parsed runtime options and skip the service-discovery gate for explicit user-provided ports. This means modules such as
http_status_scancan run against common dev/test ports like3000,4000,8000, and8080without requiring-das a workaround.It also fixes the English message:
no any live service found to scan.no live service found to scan.6. Add a focused CORS misconfiguration module
This PR adds
cors_misconfiguration_vulnfor common unsafe CORS responses:Access-Control-Allow-Origincombined with credentialsPUT,PATCH, orDELETEThis complements the existing
http_cors_vulnmodule by adding checks for wildcard/reflected-origin and broad-method combinations observed in modern APIs.Implementation details
HTTP response fingerprints
nettacker/core/lib/http.pynow records two extra response fields for HTTP responses:content_lengthcontent_sha1These are exposed as normal YAML matchable response conditions. The YAML schema test was extended so module definitions can use those fields.
Missing header matching
Missing headers are now passed to regex matching as
"". This keeps the matching model simple and makes absence explicit in YAML:Baseline comparison condition
A module can now opt into catch-all filtering with:
For such modules, Nettacker requests a random sibling path and compares the probe to that baseline. The condition passes only when at least one of these differs:
max_content_length_deltaThis keeps the feature opt-in so it only affects modules that are vulnerable to catch-all false positives.
Explicit port handling
Nettacker.expand_targets()now usesurllib.parse.urlsplit()for URL targets. It extracts:arguments.portswhen-g/--portswas not separately suppliedarguments.schemawhen--schemawas not separately suppliedurl_base_pathIf
arguments.portsis present, the default service-discovery pre-pass is skipped because the user has already selected the port set to scan.Files changed
Core behavior:
nettacker/core/lib/http.pynettacker/core/app.pynettacker/locale/en.yamlModule definitions:
nettacker/modules/scan/dir.yamlnettacker/modules/scan/admin.yamlnettacker/modules/scan/pma.yamlnettacker/modules/scan/waf.yamlnettacker/modules/vuln/clickjacking.yamlnettacker/modules/vuln/content_type_options.yamlnettacker/modules/vuln/http_options_enabled.yamlnettacker/modules/vuln/x_xss_protection.yamlnettacker/modules/vuln/cors_misconfiguration.yamlTests:
tests/core/lib/test_http.pytests/core/test_app_targets.pytests/test_yaml_schema_and_regex.pyCompatibility notes
baseline_response.-g/--portsand explicit URL ports now bypass the service-discovery gate for downstream modules. Default scans without explicit ports still use the existing service-discovery pre-pass.Validation
I ran the focused regression and schema checks in the repository virtualenv:
.venv/bin/python -m pytest -o addopts='' \ tests/core/lib/test_http.py \ tests/core/test_app_targets.py \ tests/test_yaml_schema_and_regex.py -qResult:
I also ran Ruff on the changed Python files and new tests:
Result:
And checked whitespace:
Result: clean.
Reviewer notes
The most important design question is whether
baseline_responsebelongs in the generic HTTP matcher as implemented here, or whether maintainers would prefer a more explicit condition name or a module-level option. I kept it as an opt-in condition because it fits the existing YAML condition model and avoids changing unaffected modules.A second review point is CORS severity. The new module currently treats credentialed cross-origin access and broad unsafe methods as reportable. If the project prefers more granular severities for CORS combinations, this module can be split into multiple YAML steps or separate modules.
Finally, the WAF change intentionally favors false-positive reduction over broad heuristic detection. A status-code delta alone is not strong enough evidence for "WAF detected" on modern apps; vendor/header/body signatures remain the safer path.