
[codex] Improve Nettacker HTTP detection accuracy and explicit port handling #1545

Closed
kerberosmansour wants to merge 1 commit into OWASP:master from kerberosmansour:fix/nettacker-recommendations-2026-05-07

Conversation

@kerberosmansour

Summary

This PR improves Nettacker's HTTP scanning accuracy for several cases that showed up during a lab assessment against modern Node/SPA applications (bkimminich/juice-shop and nirocr/nodegoat). The main theme is reducing false positives caused by status-code-only matching while also fixing silent false negatives in header-based vulnerability modules.

The branch is based directly on OWASP/Nettacker:master and is intended to be reviewable as one focused changeset.

The changes are intentionally split between two layers:

  • small HTTP engine primitives that module YAML can reuse (content_length, content_sha1, missing-header matching, and catch-all baseline comparison)
  • targeted module updates for directory discovery, security headers, OPTIONS/CORS handling, and WAF detection

This should preserve the current YAML-driven module model while giving module authors safer matching tools for modern web applications.

Problems addressed

1. URL probe modules false-positive on SPA catch-all routes

dir_scan, admin_scan, and pma_scan previously treated 200, 401, or 403 as enough evidence that a path existed. That produces very noisy output against SPAs and catch-all routes that return the same index.html for every unknown path.

For example, Angular/React/Vue/Svelte applications commonly route unknown paths back to the frontend shell with 200 OK. In that situation, status-code-only matching cannot distinguish a real /admin page from /nettacker-random-nonsense.

This PR adds an opt-in baseline_response condition for HTTP modules. When a module uses it, Nettacker performs a low-impact request to a random sibling path and only reports a probe hit if the probe response differs from the random baseline by status code, body length beyond tolerance, or body SHA-1.

Applied to:

  • dir_scan
  • admin_scan
  • pma_scan
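
The "random sibling path" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `random_sibling_url` is a hypothetical name, and the `nettacker-` prefix and 16-character hex suffix are choices made for the example, not details taken from this PR.

```python
import secrets
from urllib.parse import urlsplit, urlunsplit


def random_sibling_url(probe_url: str) -> str:
    """Return a URL next to probe_url whose last path segment is random.

    Hypothetical helper for illustration; the PR's own helper may differ.
    """
    parts = urlsplit(probe_url)
    # Keep everything up to and including the final "/" of the probed path.
    parent = parts.path.rsplit("/", 1)[0] + "/"
    # A random segment is very unlikely to exist on the target (assumed format).
    random_segment = "nettacker-" + secrets.token_hex(8)
    # Drop query and fragment so the baseline is a plain path request.
    return urlunsplit((parts.scheme, parts.netloc, parent + random_segment, "", ""))
```

For a probe of `http://jshop:3000/admin`, the baseline request would go to `http://jshop:3000/nettacker-<random hex>`, which a catch-all SPA route answers with the same shell as `/admin`.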

2. Header absence was not matchable by YAML header conditions

Several header vulnerability modules tried to detect unsafe header values, but a completely missing header could fail to match and therefore produce no finding. Missing security headers are often the common case, so this silently under-reports real issues.

The HTTP condition matcher now evaluates missing response headers as an empty string ("") instead of a non-string falsey value. That lets existing regex-based YAML conditions explicitly match absence with ^$.

Updated modules:

  • clickjacking_vuln
  • content_type_options_vuln
  • x_xss_protection_vuln

content_security_policy_vuln already had an absence-compatible regex, but it benefits from the engine fix because missing headers can now match consistently.

3. OPTIONS method detection missed CORS preflight method lists

http_options_enabled_vuln only looked at the legacy Allow header. Modern web frameworks frequently expose allowed methods through Access-Control-Allow-Methods on OPTIONS preflight responses instead.

This PR makes either header sufficient evidence for the module:

  • Allow
  • Access-Control-Allow-Methods
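
The module itself is YAML, but the "either header is sufficient" rule can be illustrated in Python. A minimal sketch, assuming response headers arrive as a plain dict; `advertised_methods` is a hypothetical name used only here.

```python
def advertised_methods(headers: dict[str, str]) -> set[str]:
    """Collect methods listed in Allow or Access-Control-Allow-Methods.

    Hypothetical helper for illustration; header names are compared
    case-insensitively, as HTTP requires.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    methods: set[str] = set()
    for name in ("allow", "access-control-allow-methods"):
        for method in lowered.get(name, "").split(","):
            if method.strip():
                methods.add(method.strip().upper())
    return methods
```

A non-empty result from either header would count as evidence that OPTIONS is answered with a method list.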

4. WAF detection produced false positives from status-code deltas

waf_scan had a generic fallback heuristic: compare the baseline request status code to an XSS-payload request status code, and report "WAF detected" if they differ.

That is too weak as a WAF signal. Normal application routing, caching, redirects, frontend fallbacks, and framework behavior can all produce status differences without a WAF or CDN in front of the app.

This PR removes the status-delta-only heuristic and leaves the existing positive-signature checks in place. WAF findings now require a vendor/header/body/status signature from the existing iterative_response_match database rather than a generic status-code difference.

It also removes the previous typo-bearing fallback log (differenet).

5. Explicit URL ports should not be blocked by default service discovery

When a user provides a URL with an explicit scheme and port, such as http://jshop:3000, or provides -g 3000, Nettacker already has enough user intent to scan that port. The prior flow could run service discovery first, fail to classify the service, and stop with "no live service found" even though the requested HTTP service was reachable.

This PR updates target expansion to preserve explicit URL scheme/port into the parsed runtime options and skip the service-discovery gate for explicit user-provided ports. This means modules such as http_status_scan can run against common dev/test ports like 3000, 4000, 8000, and 8080 without requiring -d as a workaround.

It also fixes the English message:

  • before: no any live service found to scan.
  • after: no live service found to scan.

6. Add a focused CORS misconfiguration module

This PR adds cors_misconfiguration_vuln for common unsafe CORS responses:

  • reflected or wildcard/null Access-Control-Allow-Origin combined with credentials
  • reflected or wildcard/null origins combined with broad methods such as PUT, PATCH, or DELETE

This complements the existing http_cors_vuln module by adding checks for wildcard/reflected-origin and broad-method combinations observed in modern APIs.
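
The two combinations above can be expressed as a small Python check. This is a sketch of the decision logic only, not the YAML module: `cors_findings`, `BROAD_METHODS`, and the finding strings are hypothetical, and the test origin is whatever value the scanner sent in its `Origin` header.

```python
BROAD_METHODS = {"PUT", "PATCH", "DELETE"}


def cors_findings(headers: dict[str, str], sent_origin: str) -> list[str]:
    """Return a list of unsafe CORS combinations seen in a response.

    Hypothetical helper for illustration; names and messages are assumptions.
    """
    lowered = {k.lower(): v.strip() for k, v in headers.items()}
    allow_origin = lowered.get("access-control-allow-origin", "")
    # Reflected, wildcard, or "null" origins are the risky starting point.
    if not allow_origin or allow_origin not in {sent_origin, "*", "null"}:
        return []
    findings = []
    if lowered.get("access-control-allow-credentials", "").lower() == "true":
        findings.append("permissive origin with credentials")
    raw_methods = lowered.get("access-control-allow-methods", "")
    methods = {m.strip().upper() for m in raw_methods.split(",")}
    if methods & BROAD_METHODS:
        findings.append("permissive origin with broad methods")
    return findings
```

Keeping the two findings separate in the sketch mirrors the option of splitting severities per combination.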

Implementation details

HTTP response fingerprints

nettacker/core/lib/http.py now records two extra response fields for HTTP responses:

  • content_length
  • content_sha1

These are exposed as normal YAML matchable response conditions. The YAML schema test was extended so module definitions can use those fields.
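
As a rough illustration of what these two fields hold, the fingerprint can be computed with the standard library. A minimal sketch, assuming the body is available as raw bytes; `content_fingerprint` is a hypothetical name and the PR may hash a decoded string instead.

```python
import hashlib


def content_fingerprint(body: bytes) -> dict:
    """Return the length and SHA-1 hex digest of a response body.

    Hypothetical helper for illustration only.
    """
    return {
        "content_length": len(body),
        "content_sha1": hashlib.sha1(body).hexdigest(),
    }
```

SHA-1 is used here purely as a cheap equality fingerprint for comparing responses, not for any security property.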

Missing header matching

Missing headers are now passed to regex matching as "". This keeps the matching model simple and makes absence explicit in YAML:

headers:
  X-Content-Type-Options:
    regex: ^$|^((?!nosniff).)+$
    reverse: false
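
To see why `^$` makes absence matchable, the same regex can be exercised in Python. This is a sketch under the assumption that a missing header is looked up with a default of `""`; `header_is_unsafe` is a hypothetical name and does not reproduce how Nettacker's matcher invokes the regex.

```python
import re

# Same pattern as the YAML condition above.
PATTERN = re.compile(r"^$|^((?!nosniff).)+$")


def header_is_unsafe(headers: dict[str, str], name: str = "X-Content-Type-Options") -> bool:
    """Return True when the header is missing, empty, or lacks 'nosniff'.

    Hypothetical helper for illustration only.
    """
    lowered = {k.lower(): v for k, v in headers.items()}
    # A missing header is represented as the empty string, so "^$" can match it.
    value = lowered.get(name.lower(), "")
    return PATTERN.search(value) is not None
```

With a non-string placeholder for the missing header, the first alternative would never get a chance to match, which is the silent false negative described earlier.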

Baseline comparison condition

A module can now opt into catch-all filtering with:

baseline_response:
  max_content_length_delta: 64

For such modules, Nettacker requests a random sibling path and compares the probe to that baseline. The condition passes only when at least one of these differs:

  • status code
  • content length beyond max_content_length_delta
  • SHA-1 of the response body

This keeps the feature opt-in so it only affects modules that are vulnerable to catch-all false positives.
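
The three-way comparison can be sketched as follows. This follows the rule as stated in this description rather than the actual implementation: `differs_from_baseline` is a hypothetical name, and the dict keys `status_code`, `content_length`, and `content_sha1` are assumed response fields.

```python
def differs_from_baseline(probe: dict, baseline: dict, max_content_length_delta: int = 64) -> bool:
    """Return True when the probe response is distinguishable from the baseline.

    Hypothetical helper for illustration; field names are assumptions.
    """
    if probe["status_code"] != baseline["status_code"]:
        return True
    length_delta = abs(probe["content_length"] - baseline["content_length"])
    if length_delta > max_content_length_delta:
        return True
    # Same status and similar length: fall back to the body fingerprint.
    return probe["content_sha1"] != baseline["content_sha1"]
```

A probe that returns the same SPA shell as the random path yields `False`, so the module would report no hit for it.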

Explicit port handling

Nettacker.expand_targets() now uses urllib.parse.urlsplit() for URL targets. It extracts:

  • normalized hostname for scan target grouping
  • explicit URL port into arguments.ports when -g/--ports was not separately supplied
  • explicit URL scheme into arguments.schema when --schema was not separately supplied
  • base path into url_base_path

If arguments.ports is present, the default service-discovery pre-pass is skipped because the user has already selected the port set to scan.
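
The extraction step maps directly onto `urllib.parse.urlsplit()`. A minimal sketch, assuming a single URL target; `parse_url_target` and the returned dict layout are hypothetical and do not reproduce `expand_targets()` itself.

```python
from urllib.parse import urlsplit


def parse_url_target(target: str) -> dict:
    """Split a URL target into the pieces described above.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(target)
    base_path = parts.path.strip("/")
    return {
        "hostname": parts.hostname,  # lowercased by urlsplit
        "port": parts.port,  # None when the URL has no explicit port
        "schema": parts.scheme,
        # Non-empty base paths are normalized to end with a single slash.
        "url_base_path": base_path + "/" if base_path else "",
    }
```

For `http://jshop:3000/shop` this yields hostname `jshop`, port `3000`, schema `http`, and base path `shop/`; a `None` port signals that no explicit port was given, so the default service-discovery pre-pass would still apply.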

Files changed

Core behavior:

  • nettacker/core/lib/http.py
  • nettacker/core/app.py
  • nettacker/locale/en.yaml

Module definitions:

  • nettacker/modules/scan/dir.yaml
  • nettacker/modules/scan/admin.yaml
  • nettacker/modules/scan/pma.yaml
  • nettacker/modules/scan/waf.yaml
  • nettacker/modules/vuln/clickjacking.yaml
  • nettacker/modules/vuln/content_type_options.yaml
  • nettacker/modules/vuln/http_options_enabled.yaml
  • nettacker/modules/vuln/x_xss_protection.yaml
  • nettacker/modules/vuln/cors_misconfiguration.yaml

Tests:

  • tests/core/lib/test_http.py
  • tests/core/test_app_targets.py
  • tests/test_yaml_schema_and_regex.py

Compatibility notes

  • The baseline comparison is opt-in and only applied to modules that add baseline_response.
  • Existing header regex behavior still works; missing headers are simply represented as an empty string during matching.
  • The WAF module still contains the existing vendor-specific signatures. This PR removes only the generic status-code-delta fallback.
  • Explicit -g/--ports and explicit URL ports now bypass the service-discovery gate for downstream modules. Default scans without explicit ports still use the existing service-discovery pre-pass.

Validation

I ran the focused regression and schema checks in the repository virtualenv:

.venv/bin/python -m pytest -o addopts='' \
  tests/core/lib/test_http.py \
  tests/core/test_app_targets.py \
  tests/test_yaml_schema_and_regex.py -q

Result:

125 passed, 7 skipped, 2 warnings

I also ran Ruff on the changed Python files and new tests:

.venv/bin/python -m ruff check \
  nettacker/core/lib/http.py \
  nettacker/core/app.py \
  tests/core/lib/test_http.py \
  tests/core/test_app_targets.py

Result:

All checks passed!

And checked whitespace:

git diff --check

Result: clean.

Reviewer notes

The most important design question is whether baseline_response belongs in the generic HTTP matcher as implemented here, or whether maintainers would prefer a more explicit condition name or a module-level option. I kept it as an opt-in condition because it fits the existing YAML condition model and avoids changing unaffected modules.

A second review point is CORS severity. The new module currently treats credentialed cross-origin access and broad unsafe methods as reportable. If the project prefers more granular severities for CORS combinations, this module can be split into multiple YAML steps or separate modules.

Finally, the WAF change intentionally favors false-positive reduction over broad heuristic detection. A status-code delta alone is not strong enough evidence for "WAF detected" on modern apps; vendor/header/body signatures remain the safer path.

- Improved target expansion to extract ports and schemas from URLs.
- Added baseline response comparison for HTTP requests to detect changes.
- Introduced new CORS misconfiguration vulnerability detection module.
- Updated various YAML configurations to support new response conditions.
- Added unit tests for baseline response handling and target expansion logic.
@coderabbitai

coderabbitai Bot commented May 7, 2026

Review Change Stack

Caution

Review failed

The pull request is closed.


📥 Commits

Reviewing files that changed from the base of the PR and between a2157ee and 770e256.

📒 Files selected for processing (15)
  • nettacker/core/app.py
  • nettacker/core/lib/http.py
  • nettacker/locale/en.yaml
  • nettacker/modules/scan/admin.yaml
  • nettacker/modules/scan/dir.yaml
  • nettacker/modules/scan/pma.yaml
  • nettacker/modules/scan/waf.yaml
  • nettacker/modules/vuln/clickjacking.yaml
  • nettacker/modules/vuln/content_type_options.yaml
  • nettacker/modules/vuln/cors_misconfiguration.yaml
  • nettacker/modules/vuln/http_options_enabled.yaml
  • nettacker/modules/vuln/x_xss_protection.yaml
  • tests/core/lib/test_http.py
  • tests/core/test_app_targets.py
  • tests/test_yaml_schema_and_regex.py

Summary by CodeRabbit

  • New Features

    • Added CORS misconfiguration vulnerability detection.
    • Introduced baseline response comparison for HTTP scanning to improve detection accuracy.
  • Improvements

    • Enhanced URL parsing to auto-extract ports, schemes, and base paths.
    • Strengthened security header checks (clickjacking, XSS protection, content-type options) to detect missing headers.
    • Updated WAF and admin directory scans with baseline response validation.
    • Refined HTTP port scanning orchestration logic.

Walkthrough

This PR introduces HTTP baseline response comparison for condition evaluation and URL-based target parsing with auto-detected ports/schemes. HTTP helpers compute content fingerprints and generate randomized baseline requests; YAML vulnerability and scan modules are configured with baseline response filtering. URL expansion now parses explicit scheme/port/path and toggles skip_service_discovery for port scanning.

Changes

HTTP Baseline Response Support

| Layer | File(s) | Summary |
|---|---|---|
| Baseline Response Helpers & Imports | nettacker/core/lib/http.py | Imports hashlib; defines SIMPLE_RESPONSE_CONDITIONS; adds _content_fingerprint(), _random_baseline_url(), _baseline_response_diff() helpers for baseline comparison. |
| Response Fingerprinting & Condition Matching | nettacker/core/lib/http.py | perform_request_action() computes content-length and content-sha1 fingerprints. response_conditions_matched() routes baseline_response conditions through diff helper. |
| HttpEngine Baseline Request Orchestration | nettacker/core/lib/http.py | HttpEngine.run() deep-copies baseline config, constructs randomized baseline requests, retries them, decodes content, and attaches result to primary response. |
| CORS Misconfiguration Vulnerability Module | nettacker/modules/vuln/cors_misconfiguration.yaml | New module detects unsafe CORS headers via GET/OPTIONS payloads sending Origin: https://evil.example; flags responses permitting untrusted origins or credentialed access. |
| Baseline Response Configuration in Scans | nettacker/modules/scan/admin.yaml, nettacker/modules/scan/dir.yaml, nettacker/modules/scan/pma.yaml | Admin, dir, and pma scans add baseline_response with max_content_length_delta: 64 to filter responses by content-length variance. |
| Header Regex Updates for Empty Values | nettacker/modules/vuln/clickjacking.yaml, nettacker/modules/vuln/content_type_options.yaml, nettacker/modules/vuln/x_xss_protection.yaml, nettacker/modules/vuln/http_options_enabled.yaml | Update header regexes to match empty values (^$\|...) alongside existing patterns for absence detection. |
| WAF Scan Template & Response Logic | nettacker/modules/scan/waf.yaml | Request template changed to query-string format with {{query}} and {{param}} injection. Schemes/ports expanded to http+https/80+443. Response-matching logic simplified to direct condition blocks. |
| HTTP Condition Schema & Test Utilities | tests/test_yaml_schema_and_regex.py, tests/core/lib/test_http.py | Schema extended with content_length, content_sha1, baseline_response fields. Test helpers generate deterministic response dicts with SHA1 hashes. |
| Baseline Response Behavior Tests | tests/core/lib/test_http.py | Unit tests verify empty header absence matching, identical baseline suppression, content-sha1 diff detection, and HttpEngine baseline follow-up requests with URL rewriting. |

URL-based Target Parsing & Port/Scheme Auto-detection

| Layer | File(s) | Summary |
|---|---|---|
| URL Parsing & Base-Path Extraction | nettacker/core/app.py | expand_targets() uses urlsplit() to parse http(s) URLs, extracts normalized hostname, derives base-path with trailing-slash normalization from parsed path. |
| Port & Scheme Auto-fill | nettacker/core/app.py | Post-processing propagates explicitly found URL ports and schemes into self.arguments.ports and self.arguments.schema when CLI options were not provided. |
| Port-Scan Orchestration with skip_service_discovery Toggle | nettacker/core/app.py | Port-scan branch explicitly sets skip_service_discovery to True, runs port-scan module, filters targets by results, then restores skip_service_discovery to False. |
| URL Parsing & Target Expansion Test | tests/core/test_app_targets.py | Test verifies URL http://jshop:3000/shop is expanded into hostname jshop, ports [3000], schema [http], skip_service_discovery True, url_base_path shop/. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • OWASP/Nettacker#1113: Updates skip_service_discovery handling and related scan orchestration logic in app.py.
  • OWASP/Nettacker#1319: Both extend the HTTP condition schema in tests/test_yaml_schema_and_regex.py for new condition field validation.

Suggested labels

enhancement

Suggested reviewers

  • arkid15r
  • securestep9
Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 Microsoft Presidio Analyzer (2.2.362)
nettacker/core/app.py

Microsoft Presidio Analyzer failed to scan this file


@github-actions

github-actions Bot commented May 7, 2026

PR validation failed: No linked issue and no valid closing issue reference in PR description

@github-actions github-actions Bot closed this May 7, 2026