
feat: add rule category filtering #1500

Merged
egibs merged 3 commits into chainguard-dev:main from egibs:rule-categories
May 4, 2026

egibs (Member) commented May 4, 2026

Closes: #1388

This PR adds a new --rule-category flag that filters out behaviors outside the listed categories. It is similar to the existing diff sensitivity flag, but it operates on rule strings rather than integers, and both flags can be specified together when running diffs.
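To illustrate how a repeatable string flag differs from a single integer threshold, here is a minimal sketch using Go's standard flag package. The categoryList type and parseCategories helper are hypothetical; mal's actual CLI wiring may use a different flag library, and only the flag name is taken from this PR.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// categoryList is a hypothetical flag.Value that accumulates every
// occurrence of a repeatable string flag into a slice.
type categoryList []string

// String renders the collected values; the flag package uses it in help output.
func (c *categoryList) String() string {
	return strings.Join(*c, ",")
}

// Set is called once per occurrence of the flag, so repeats append.
func (c *categoryList) Set(v string) error {
	v = strings.TrimSpace(v)
	if v == "" {
		return fmt.Errorf("empty rule category")
	}
	*c = append(*c, v)
	return nil
}

// parseCategories parses args with a fresh FlagSet (hypothetical helper),
// which keeps it testable without touching os.Args.
func parseCategories(args []string) ([]string, error) {
	fs := flag.NewFlagSet("mal-sketch", flag.ContinueOnError)
	var cats categoryList
	fs.Var(&cats, "rule-category", "only report behaviors in this rule category (repeatable)")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	return cats, nil
}

func main() {
	cats, err := parseCategories([]string{"--rule-category=net", "--rule-category=data"})
	if err != nil {
		panic(err)
	}
	fmt.Println(cats) // prints [net data]
}
```

Because each occurrence calls Set, repeating the flag extends the list instead of overwriting it, which is what allows several categories to be combined in one run.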

For example, given these two contrived files:

# /tmp/mal-demo/mixed.py
import os, urllib.request, base64, socket
def beacon():
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect(("example.com", 4444))
    urllib.request.urlopen("http://example.com/x")
def cleanup():
    os.remove("/tmp/notes")
    os.system("rm -rf ~/.cache")
def stash():
    return base64.b64decode("aGVsbG8=")

# /tmp/mal-demo/before.py
import os
def beacon(): pass
def cleanup(): os.remove("/tmp/notes")

a baseline scan reports every behavior, while --rule-category keeps only the matching categories:

$ mal --min-risk=any analyze /tmp/mal-demo/mixed.py
🔎 Scanning "/tmp/mal-demo/mixed.py"
├─ 🟡 /tmp/mal-demo/mixed.py [MEDIUM]
│     ≡ data [MEDIUM]
│       🟡 base64/decode — decode base64 strings: base64_decode::b64decode
│       🔵 encoding/base64 — Supports base64 encoded strings
│     ≡ execution [MEDIUM]
│       🔵 imports/python — imports python modules: import base64, import socket, import urllib, import os
│       🟡 program — execute external program: os.system("rm -rf ~/.cache")
│     ≡ filesystem [MEDIUM]
│       🔵 file/delete — deletes files: os.remove(
│       🟡 file/delete_forcibly — Forcibly deletes files recursively: rm -rf
│       🟡 path/tmp — path reference within /tmp: /tmp/notes
│     ≡ networking [MEDIUM]
│       🔵 http — Uses the HTTP protocol: http
│       🟡 socket/connect — initiate a connection on a socket: socket.socket, .connect(
│       🔵 url/embedded — contains embedded HTTP URLs: http://example.com/x
│       🔵 url/parse — Handles URL strings: urllib
│       🟡 url/request — requests resources via URL: urllib.request

$ mal --min-risk=any --rule-category=net analyze /tmp/mal-demo/mixed.py
🔎 Scanning "/tmp/mal-demo/mixed.py"
├─ 🟡 /tmp/mal-demo/mixed.py [MEDIUM]
│     ≡ networking [MEDIUM]
│       🔵 http — Uses the HTTP protocol: http
│       🟡 socket/connect — initiate a connection on a socket: socket.socket, .connect(
│       🔵 url/embedded — contains embedded HTTP URLs: http://example.com/x
│       🔵 url/parse — Handles URL strings: urllib
│       🟡 url/request — requests resources via URL: urllib.request

Multiple categories (union)

$ mal --min-risk=any --rule-category=net --rule-category=data analyze /tmp/mal-demo/mixed.py
🔎 Scanning "/tmp/mal-demo/mixed.py"
├─ 🟡 /tmp/mal-demo/mixed.py [MEDIUM]
│     ≡ data [MEDIUM]
│       🟡 base64/decode — decode base64 strings: base64_decode::b64decode
│       🔵 encoding/base64 — Supports base64 encoded strings
│     ≡ networking [MEDIUM]
│       🔵 http — Uses the HTTP protocol: http
│       🟡 socket/connect — initiate a connection on a socket: socket.socket, .connect(
│       🔵 url/embedded — contains embedded HTTP URLs: http://example.com/x
│       🔵 url/parse — Handles URL strings: urllib
│       🟡 url/request — requests resources via URL: urllib.request

Deeper-prefix category (boundary check)

The category fs/file/delete matches that exact rule but excludes fs/file/delete_forcibly:

$ mal --min-risk=any --rule-category=fs/file/delete analyze /tmp/mal-demo/mixed.py
🔎 Scanning "/tmp/mal-demo/mixed.py"
├─ 🟡 /tmp/mal-demo/mixed.py [MEDIUM]
│     ≡ filesystem [LOW]
│       🔵 file/delete — deletes files: os.remove(
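The union and boundary behavior shown in the two examples above can be expressed as a small matcher. This is a sketch, not the PR's implementation: matchesCategory and matchesAny are hypothetical names, the rule IDs (e.g. net/socket/connect) are inferred from the grouped output rather than taken from the code, and treating an empty list as "no filter" is an assumption. The key detail is matching on a "/" boundary, since a plain prefix check would let fs/file/delete also match fs/file/delete_forcibly.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesCategory reports whether ruleID falls under category cat:
// either an exact match or a rule nested beneath cat at a "/" boundary.
func matchesCategory(ruleID, cat string) bool {
	if ruleID == cat {
		return true
	}
	return strings.HasPrefix(ruleID, cat+"/")
}

// matchesAny implements union semantics for the repeated flag: a rule is
// kept if it matches at least one category. An empty list means no filter
// (an assumption in this sketch).
func matchesAny(ruleID string, cats []string) bool {
	if len(cats) == 0 {
		return true
	}
	for _, c := range cats {
		if matchesCategory(ruleID, c) {
			return true
		}
	}
	return false
}

func main() {
	ids := []string{
		"fs/file/delete",
		"fs/file/delete_forcibly",
		"net/socket/connect",
		"data/base64/decode",
		"exec/program",
	}
	for _, cats := range [][]string{{"net", "data"}, {"fs/file/delete"}} {
		var kept []string
		for _, id := range ids {
			if matchesAny(id, cats) {
				kept = append(kept, id)
			}
		}
		fmt.Println(cats, "->", kept)
	}
	// prints:
	// [net data] -> [net/socket/connect data/base64/decode]
	// [fs/file/delete] -> [fs/file/delete]
}
```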

Composes with --min-risk

$ mal --min-risk=medium --rule-category=net analyze /tmp/mal-demo/mixed.py
🔎 Scanning "/tmp/mal-demo/mixed.py"
├─ 🟡 /tmp/mal-demo/mixed.py [MEDIUM]
│     ≡ networking [MEDIUM]
│       🟡 socket/connect — initiate a connection on a socket: socket.socket, .connect(
│       🟡 url/request — requests resources via URL: urllib.request
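Composing with --min-risk amounts to requiring both conditions at once. The sketch below uses hypothetical Risk and Behavior types and a caller-supplied category predicate; the level names mirror the --min-risk values seen here, but their numeric values and the full set of levels are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// Risk is a hypothetical ordered risk level.
type Risk int

const (
	Any Risk = iota
	Low
	Medium
	High
	Critical
)

// Behavior is a hypothetical stand-in for one reported finding.
type Behavior struct {
	ID   string // rule ID, e.g. "net/socket/connect" (inferred format)
	Risk Risk
}

// filter keeps a behavior only if it passes BOTH the risk threshold and
// the category predicate, so each flag narrows the results independently.
func filter(bs []Behavior, minRisk Risk, inCategory func(id string) bool) []Behavior {
	var out []Behavior
	for _, b := range bs {
		if b.Risk >= minRisk && inCategory(b.ID) {
			out = append(out, b)
		}
	}
	return out
}

func main() {
	bs := []Behavior{
		{"net/http", Low},
		{"net/socket/connect", Medium},
		{"net/url/request", Medium},
		{"exec/program", Medium},
	}
	isNet := func(id string) bool { return id == "net" || strings.HasPrefix(id, "net/") }
	for _, b := range filter(bs, Medium, isNet) {
		fmt.Println(b.ID) // net/socket/connect, then net/url/request
	}
}
```

Because the checks are ANDed, the order in which the two filters run does not change the result.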

diff — filter narrows the diff

$ mal --min-risk=any diff /tmp/mal-demo/before.py /tmp/mal-demo/mixed.py
├─ 🟡 Changed (9 added, 0 removed): /tmp/mal-demo/mixed.py
│     ≡ data [MEDIUM]
│+      🟡 base64/decode — decode base64 strings: base64_decode::b64decode
│+      🔵 encoding/base64 — Supports base64 encoded strings
│     ≡ execution [MEDIUM]
│+      🟡 program — execute external program: os.system("rm -rf ~/.cache")
│     ≡ filesystem [MEDIUM]
│+      🟡 file/delete_forcibly — Forcibly deletes files recursively: rm -rf
│     ≡ networking [MEDIUM]
│+      🔵 http — Uses the HTTP protocol: http
│+      🟡 socket/connect — initiate a connection on a socket: socket.socket, .connect(
│+      🔵 url/embedded — contains embedded HTTP URLs: http://example.com/x
│+      🔵 url/parse — Handles URL strings: urllib
│+      🟡 url/request — requests resources via URL: urllib.request

$ mal --min-risk=any --rule-category=net diff /tmp/mal-demo/before.py /tmp/mal-demo/mixed.py
├─ 🟡 Changed (5 added, 0 removed): /tmp/mal-demo/mixed.py
│     ≡ networking [MEDIUM]
│+      🔵 http — Uses the HTTP protocol: http
│+      🟡 socket/connect — initiate a connection on a socket: socket.socket, .connect(
│+      🔵 url/embedded — contains embedded HTTP URLs: http://example.com/x
│+      🔵 url/parse — Handles URL strings: urllib
│+      🟡 url/request — requests resources via URL: urllib.request
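For diff, one way to get the narrowed output is to apply the same predicate to both the added and removed sides before counting, which is consistent with the header going from 9 added to 5 added in the example above. This is a hedged sketch: fileDiff, keep, and filterDiff are hypothetical, and the rule IDs are inferred from the grouped output.

```go
package main

import (
	"fmt"
	"strings"
)

// fileDiff is a hypothetical summary of behavior changes for one file.
type fileDiff struct {
	Added   []string // rule IDs present only in the new file
	Removed []string // rule IDs present only in the old file
}

// keep returns the IDs accepted by pred, preserving order.
func keep(ids []string, pred func(string) bool) []string {
	var out []string
	for _, id := range ids {
		if pred(id) {
			out = append(out, id)
		}
	}
	return out
}

// filterDiff filters both sides, so "N added, M removed" counts only
// behaviors in the requested categories.
func filterDiff(d fileDiff, pred func(string) bool) fileDiff {
	return fileDiff{Added: keep(d.Added, pred), Removed: keep(d.Removed, pred)}
}

func main() {
	d := fileDiff{Added: []string{
		"data/base64/decode", "data/encoding/base64", "exec/program",
		"fs/file/delete_forcibly", "net/http", "net/socket/connect",
		"net/url/embedded", "net/url/parse", "net/url/request",
	}}
	isNet := func(id string) bool { return id == "net" || strings.HasPrefix(id, "net/") }
	fmt.Printf("before: %d added, %d removed\n", len(d.Added), len(d.Removed)) // 9 added, 0 removed
	f := filterDiff(d, isNet)
	fmt.Printf("after: %d added, %d removed\n", len(f.Added), len(f.Removed)) // 5 added, 0 removed
}
```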

The accompanying tests cover both positive and negative cases.

egibs requested a review from stevebeattie May 4, 2026 13:53
Signed-off-by: egibs <20933572+egibs@users.noreply.github.com>
egibs force-pushed the rule-categories branch from 16e67cd to fc23002 May 4, 2026 13:54
egibs added 2 commits May 4, 2026 08:54
egibs enabled auto-merge (squash) May 4, 2026 14:35
egibs requested review from antitree May 4, 2026 14:47
egibs merged commit ba96e19 into chainguard-dev:main May 4, 2026
18 checks passed

Development

Successfully merging this pull request may close these issues.

Run scan using only specific category of yara rules

2 participants