Preserve in-scope shared-infra graph edges #3141
Merged
📊 Performance Benchmark Report
🎯 Performance Summary: ✅ No significant performance changes detected (all changes <10%)
🐍 Python Version: 3.11.15
Re-emit in-scope shared-infra child events as graph-important so neo4j/json keep every parent->child edge. Route graph-important events only to modules that consume them (preserve_graph or accept_dupes), so normal scan modules see no extra churn.
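The routing rule in the commit message above can be sketched as a simple filter. This is a minimal illustration, not the project's code: the `Module` and `Event` classes, the `recipients` helper, and the module names are all hypothetical; only the `preserve_graph` and `accept_dupes` flags come from the text.

```python
from dataclasses import dataclass


@dataclass
class Module:
    # Hypothetical stand-in for a scan module; only the two flags matter here.
    name: str
    preserve_graph: bool = False
    accept_dupes: bool = False


@dataclass
class Event:
    # Hypothetical stand-in for an event; graph_important mirrors the flag in the text.
    data: str
    graph_important: bool = False


def recipients(event, modules):
    """Return the modules that should receive this event."""
    if not event.graph_important:
        # Ordinary events are offered to every module.
        return list(modules)
    # Graph-important events exist only to preserve edges, so skip modules
    # that would discard them anyway.
    return [m for m in modules if m.preserve_graph or m.accept_dupes]


modules = [
    Module("scanner"),
    Module("graph_output", preserve_graph=True),
    Module("dupe_consumer", accept_dupes=True),
]
normal = Event("www.example.com")
re_emit = Event("ns1.example.com", graph_important=True)

print([m.name for m in recipients(normal, modules)])
print([m.name for m in recipients(re_emit, modules)])
```

In this sketch the `scanner` module never sees the re-emitted event, which is the "no extra churn" property the commit describes.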
children_emitted is parent-less (#3126), so the in-scope re-emit also fired on same-host re-processing (SRV/wildcard chains), over-emitting graph-important events. Track (parent, rdtype, child) for in-scope edges so only genuinely new cross-parent edges re-emit.
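A toy model of the two-level bookkeeping described above may help. It assumes a parent-less set keyed on `(rdtype, child)` plus a second, parent-aware set keyed on `(parent, rdtype, child)` for in-scope children. The `ChildEmitter` class, its method names, and the hostnames are hypothetical and are not taken from the real `dnsresolve` code.

```python
class ChildEmitter:
    """Toy model of the dedup bookkeeping; not the real dnsresolve code."""

    def __init__(self, in_scope_hosts):
        self.in_scope_hosts = set(in_scope_hosts)
        self.children_emitted = set()  # parent-less key: (rdtype, child)
        self.in_scope_edges = set()    # parent-aware key: (parent, rdtype, child)

    def emit(self, parent, rdtype, child):
        """Return 'new', 'graph_important', or None when the edge is collapsed."""
        key = (rdtype, child)
        edge = (parent, rdtype, child)
        if key not in self.children_emitted:
            # First sighting of this child anywhere in the scan.
            self.children_emitted.add(key)
            if child in self.in_scope_hosts:
                self.in_scope_edges.add(edge)
            return "new"
        if child not in self.in_scope_hosts:
            # Out-of-scope duplicate: stays collapsed.
            return None
        if edge in self.in_scope_edges:
            # Same parent re-processed (e.g. an SRV or wildcard chain): no re-emit.
            return None
        # Genuinely new cross-parent edge to an in-scope child.
        self.in_scope_edges.add(edge)
        return "graph_important"


emitter = ChildEmitter(in_scope_hosts={"dc.example.com"})
results = [
    emitter.emit("_ldap._tcp.a.example.com", "SRV", "dc.example.com"),
    emitter.emit("_ldap._tcp.b.example.com", "SRV", "dc.example.com"),
    emitter.emit("_ldap._tcp.a.example.com", "SRV", "dc.example.com"),
    emitter.emit("a.example.com", "NS", "ns.cdn.test"),
    emitter.emit("b.example.com", "NS", "ns.cdn.test"),
]
print(results)
```

The third call repeats the first parent, so it yields nothing; without the parent-aware set it would have been re-emitted as graph-important a second time.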
asdf.blacklanternsecurity.com is in-scope shared infra (SRV target of two in-scope _ldap records); both cross-parent edges are now preserved, so it emits two DNS_NAME events instead of one.
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## dns-children-dedup-key #3141 +/- ##
======================================================
+ Coverage 90% 90% +1%
======================================================
Files 441 441
Lines 38855 38920 +65
======================================================
+ Hits 34768 34833 +65
Misses 4087 4087
emit_dns_children re-emits cross-parent edges for in-scope children as graph-important so neo4j/json keep every edge. Skip that when the child type is omitted: it would be dropped at output anyway, and flagging it breaks the rule that a graph-important event is never omitted. Also only compute the scope check and parent-aware edge hash for genuinely new children, so already-seen out-of-scope dups don't pay for a scope lookup on every occurrence.
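The omitted-type guard described above could look roughly like the following. This is a sketch under stated assumptions: `should_reemit_edge` is a hypothetical helper, the contents of the omitted set are illustrative, and passing the scope check as a callable is one possible way to avoid paying for a lookup that cannot change the outcome.

```python
def should_reemit_edge(child_type, omitted_types, scope_check):
    """Decide whether a dedup-hit child gets a graph-important re-emit.

    scope_check is a zero-argument callable, so the (potentially costly)
    scope lookup only runs when the cheap check has not already said no.
    """
    if child_type in omitted_types:
        # Would be dropped at output anyway; never flag it graph-important.
        return False
    return scope_check()


calls = []


def in_scope():
    # Hypothetical scope lookup; records each invocation so the cost is visible.
    calls.append(1)
    return True


omitted = {"DNS_NAME_UNRESOLVED"}  # illustrative assumption, not the real config

print(should_reemit_edge("DNS_NAME_UNRESOLVED", omitted, in_scope))
print(should_reemit_edge("DNS_NAME", omitted, in_scope))
print(len(calls))
```

Only the second call reaches the scope lookup, so the counter ends at one.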
ausmaster approved these changes Jun 1, 2026
Builds on #3126. That PR dedups DNS children by `(rdtype, child)`, which kills the cross-parent flood from out-of-scope shared infrastructure (e.g. one Cloudflare nameserver emitted 1,518x across in-scope zones) and wins back queue throughput.

The tradeoff it accepts: the dedup is scan-global, so it also collapses in-scope shared infrastructure (a mail server / nameserver / CNAME target that several in-scope domains point at). The graph output then links that shared host to only the first-seen parent, dropping the rest of the real `(parent -> child)` edges. In the motivating scan ~160 such in-scope edges were lost (vs ~16,155 out-of-scope edges correctly collapsed).

Fix

- `dnsresolve.emit_dns_children`: on a dedup hit, re-emit the cross-parent edge as `_graph_important` only when the child is in-scope. In-scope shared infra keeps every edge; out-of-scope affiliate dups fall through and stay collapsed (the flood is never re-emitted).
- `dnsresolve.handle_event`: skip child re-walking for graph-important re-entries so the re-emit doesn't cascade. Resolution still runs (cache hit), so node `dns_children` stays intact and neo4j node properties aren't clobbered.
- `ScanEgress.forward_event`: route graph-important events only to modules that consume them (`preserve_graph` or `accept_dupes`), skipping modules that would just drop them at postcheck. Normal scan modules see zero extra events, so #3126's churn win is preserved; this also makes the existing orphan-resurrection path cheaper.

Tests

- `TestDNSResolveInScopeSharedInfraGraphFidelity` (new): three in-scope parents share two in-scope nameservers; asserts every parent->child edge survives in the real `output.json` graph artifact (the same edges neo4j builds).
- `TestDNSResolveSharedNameserverDedup` (out-of-scope flood collapses to one) stays green.

Together they pin the intended nuance: collapse the affiliate noise, preserve the in-scope edges.
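The graph-fidelity assertion described in the tests can be approximated with a small edge check over newline-delimited JSON records. This is only a sketch: the `parent` and `child` field names, the `edges_from_records` helper, and the hostnames are hypothetical and do not reflect the actual `output.json` schema or the real test code.

```python
import json


def edges_from_records(lines):
    """Collect (parent, child) pairs from newline-delimited JSON records."""
    edges = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # "parent" and "child" are hypothetical field names for this sketch.
        edges.add((record["parent"], record["child"]))
    return edges


parents = ["a.example.com", "b.example.com", "c.example.com"]
nameservers = ["ns1.example.com", "ns2.example.com"]

# Simulated output: one record per real parent->child edge.
lines = [
    json.dumps({"parent": p, "child": ns}) for p in parents for ns in nameservers
]

# Three parents sharing two nameservers should yield six edges.
expected = {(p, ns) for p in parents for ns in nameservers}
missing = expected - edges_from_records(lines)
print(len(expected), sorted(missing))
```

If the dedup had collapsed each nameserver to its first-seen parent, `missing` would hold the four edges from the second and third parents.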