Intern repeated strings in resolved_hosts and dns_children #3006

Merged
TheTechromancer merged 1 commit into additional-memory-benchmarks from additional-string-interning
Apr 1, 2026

@liquidsec
Contributor

Summary

IP addresses and DNS record type strings (A, AAAA, CNAME, etc.) repeat heavily across events in a scan. In a typical subdomain enumeration, thousands of events resolve to the same handful of CDN/cloud IPs, and every event carries its own copy of those strings.

sys.intern() deduplicates them so that all events sharing the same IPs or rdtype keys reference a single string object. This reduces memory use on those fields by roughly 10-30%, depending on how much IP overlap exists in the scan.
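As a minimal sketch of the deduplication described above (not code from this PR): two equal strings built at runtime are normally separate objects, and sys.intern() maps both to one shared object. The build_ip helper is hypothetical and exists only to create strings at runtime.

```python
import sys


def build_ip(octets):
    # Hypothetical helper: builds the string at runtime, the way a parsed
    # DNS answer would, so each call yields its own str object.
    return ".".join(str(o) for o in octets)


a = build_ip([104, 16, 0, 1])
b = build_ip([104, 16, 0, 1])

# Equal by value; in CPython these are typically two distinct objects.
values_equal = a == b

# sys.intern() returns the canonical copy held in the interpreter's
# intern table, so both names now point at the same object.
a_interned = sys.intern(a)
b_interned = sys.intern(b)
shared = a_interned is b_interned

print(values_equal, shared)
```

Note that sys.intern() accepts only exact str instances and raises TypeError for str subclasses, so values of other types need converting with str() first.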

Changes

  • dnsresolve.py — intern rdtype keys and host values in dns_children and resolved_hosts
  • httpx.py — intern IPs added to _resolved_hosts
  • gowitness.py — intern IPs added to _resolved_hosts
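The kind of change listed above can be sketched as follows. This is an illustration under assumptions, not the actual diff: intern_dns_children is a hypothetical helper, and the {rdtype: set of hosts} layout is assumed for the example rather than taken from the source.

```python
import sys


def intern_dns_children(raw):
    """Return a copy of an {rdtype: {host, ...}} mapping with every
    key and value interned (hypothetical helper, assumed data layout)."""
    return {
        # str() guards against non-str inputs such as ipaddress objects,
        # since sys.intern() only accepts exact str instances.
        sys.intern(str(rdtype)): {sys.intern(str(host)) for host in hosts}
        for rdtype, hosts in raw.items()
    }


def fake_answer():
    # Simulates one event's DNS results, with strings created at runtime.
    rdtype = "".join(["AA", "AA"])
    host = ":".join(["2606", "4700", "", "1111"])
    return {rdtype: {host}}


event_a = intern_dns_children(fake_answer())
event_b = intern_dns_children(fake_answer())

key_a, key_b = next(iter(event_a)), next(iter(event_b))
host_a, host_b = next(iter(event_a[key_a])), next(iter(event_b[key_b]))

# Both events now reference the same key and host objects.
print(key_a is key_b, host_a is host_b)
```

The containers themselves (dicts and sets) are still allocated per event; only the string objects inside them are shared, which is why the saving depends on how often the same IPs recur.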

IP addresses and DNS record type strings (A, AAAA, CNAME, etc.)
repeat heavily across events. sys.intern() deduplicates them so
all events sharing the same IPs/rdtypes reference the same string
object, reducing memory ~10-30% on those fields.
@github-actions
Contributor

📊 Performance Benchmark Report

Comparing additional-memory-benchmarks (baseline) vs additional-string-interning (current)

📈 Detailed Results (All Benchmarks)

📋 Complete results for all benchmarks - includes both significant and insignificant changes

| 🧪 Test Name | 📏 Base | 📏 Current | 📈 Change | 🎯 Status |
|---|---|---|---|---|
| Bloom Filter Dns Mutation Tracking Performance | 3.93ms | 3.93ms | -0.0% | |
| Bloom Filter Large Scale Dns Brute Force | 17.57ms | 17.44ms | -0.7% | |
| Large Closest Match Lookup | 334.43ms | 324.15ms | -3.1% | |
| Realistic Closest Match Workload | 176.03ms | 173.53ms | -1.4% | |
| Event Memory Medium Scan | 1776 B/event | 1776 B/event | +0.0% | |
| Event Memory Large Scan | 1760 B/event | 1760 B/event | +0.0% | |
| Event Validation Full Scan Startup Small Batch | 378.94ms | 369.96ms | -2.4% | |
| Event Validation Full Scan Startup Large Batch | 521.73ms | 526.21ms | +0.9% | |
| Make Event Autodetection Small | 25.87ms | 25.95ms | +0.3% | |
| Make Event Autodetection Large | 264.65ms | 264.51ms | -0.1% | |
| Make Event Explicit Types | 11.47ms | 11.43ms | -0.3% | |
| Excavate Single Thread Small | 3.939s | 3.354s | -14.8% | 🟢🟢 🚀 |
| Excavate Single Thread Large | 9.975s | 9.255s | -7.2% | |
| Excavate Parallel Tasks Small | 4.301s | 3.591s | -16.5% | 🟢🟢 🚀 |
| Excavate Parallel Tasks Large | 7.774s | 6.999s | -10.0% | |
| Is Ip Performance | 2.91ms | 2.94ms | +0.8% | |
| Make Ip Type Performance | 10.69ms | 10.67ms | -0.2% | |
| Mixed Ip Operations | 4.17ms | 4.19ms | +0.4% | |
| Memory Use Web Crawl | 257.3 MB | 257.4 MB | +0.0% | |
| Memory Use Subdomain Enum | 19.3 MB | 19.3 MB | +0.0% | |
| Typical Queue Shuffle | 54.18µs | 55.68µs | +2.8% | |
| Priority Queue Shuffle | 599.29µs | 608.09µs | +1.5% | |

🎯 Performance Summary

+ 2 improvements 🚀
  20 unchanged ✅

🔍 Significant Changes (>10%)

  • Excavate Single Thread Small: 14.8% 🚀 faster
  • Excavate Parallel Tasks Small: 16.5% 🚀 faster

🐍 Python Version 3.11.15

@codecov

codecov bot commented Mar 31, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 91%. Comparing base (590e979) to head (c6d6ac4).
⚠️ Report is 2 commits behind head on additional-memory-benchmarks.

Additional details and impacted files
@@                     Coverage Diff                      @@
##           additional-memory-benchmarks   #3006   +/-   ##
============================================================
- Coverage                            91%     91%   -0%     
============================================================
  Files                               437     437           
  Lines                             37102   37108    +6     
============================================================
+ Hits                              33694   33696    +2     
- Misses                             3408    3412    +4     

@TheTechromancer TheTechromancer merged commit b5816e4 into additional-memory-benchmarks Apr 1, 2026
18 checks passed