[BUG] domain Rule Matches Too Broadly #449

@LoneOdDaeth

Describe the bug

The domain rule in the YARA ruleset matches unintended strings that are not actual domains. This leads to false positives when scanning files that contain generic words, filenames, or localhost-like addresses.

To Reproduce

Steps to reproduce the behavior:

Run YARA scan with the domain rule enabled.

Scan a file that contains common words, filenames, or IP addresses.

Observe that many non-domain strings are detected.

Example false positives:

test-123
file.txt
localhost
random_text

All these strings are incorrectly flagged as domains.

Expected behavior

The domain rule should only match valid domains, such as example.com, sub.example.net, or test-site.org. It should not match:

Plain text words

Filenames like file.txt

Localhost or internal references

Additional context

The issue is caused by the overly broad regex pattern:

$domain_regex = /([\w.-]+)/ wide ascii

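This matches any run of word characters (letters, digits, underscores), dots, and hyphens. A string does not even need to contain a dot to be flagged, which is why plain words and identifiers produce false positives.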
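To make the breadth of the pattern concrete, here is a minimal sketch that runs an approximation of it over the false positives listed above. It uses Python's `re` module as a stand-in, which is an assumption: YARA's own regex engine and the `wide` modifier are not modeled, and the helper name `broad_hits` is hypothetical.

```python
import re

# Python approximation of /([\w.-]+)/ restricted to ASCII.
# Assumption: this only illustrates the character class, not YARA's engine.
BROAD = re.compile(r"[\w.-]+", re.ASCII)

# The false positives reported in this issue.
samples = ["test-123", "file.txt", "localhost", "random_text"]

def broad_hits(text):
    """Return every substring the broad pattern would flag in text."""
    return BROAD.findall(text)

for s in samples:
    # Each sample is matched in full, since every character is in the class.
    print(s, "->", broad_hits(s))
```

Every sample consists only of characters in the class, so each one is matched as a whole string.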

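Suggested Fix: Update the regex to a stricter pattern that requires a dot followed by a TLD-like suffix:

$domain_regex = /([a-zA-Z0-9-]+\.[a-zA-Z]{2,6})/ wide ascii

The dot must be escaped (\.), because an unescaped . matches any character. This pattern no longer matches strings such as test-123, localhost, or random_text. Filenames like file.txt still fit it, so excluding them would additionally require checking the suffix against a list of valid TLDs.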
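A short sketch of how a pattern of this shape behaves, again using Python's `re` as an approximation rather than YARA itself. It compares an unescaped and an escaped dot, then adds a TLD check. The names `DOMAIN`, `KNOWN_TLDS`, and `looks_like_domain` are hypothetical, the three-entry TLD set is illustrative only, and the repeated label group in `DOMAIN` is an addition (not part of the suggested fix) so that sub.example.net is matched as a whole.

```python
import re

# Assumption: Python's re stands in for YARA's regex engine here.
UNESCAPED = re.compile(r"[a-zA-Z0-9-]+.[a-zA-Z]{2,6}")   # '.' matches any character
ESCAPED = re.compile(r"[a-zA-Z0-9-]+\.[a-zA-Z]{2,6}")    # '\.' matches a literal dot

# Hypothetical variant allowing several dot-separated labels before the suffix.
DOMAIN = re.compile(r"(?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,6}")

# Hypothetical allowlist for illustration; a real list would be much longer.
KNOWN_TLDS = {"com", "net", "org"}

def looks_like_domain(text):
    """True if text as a whole fits DOMAIN and ends in a known TLD."""
    if not DOMAIN.fullmatch(text):
        return False
    suffix = text.rsplit(".", 1)[1].lower()
    return suffix in KNOWN_TLDS

for s in ["localhost", "file.txt", "example.com", "sub.example.net"]:
    print(s,
          "unescaped:", bool(UNESCAPED.search(s)),
          "escaped:", bool(ESCAPED.search(s)),
          "domain+tld:", looks_like_domain(s))
```

With the unescaped dot, localhost is still flagged; with the escaped dot it is not, but file.txt is. Only the TLD check rejects file.txt while keeping example.com and sub.example.net.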
