Automatically dissects weaponized PDFs through four analysis layers to reconstruct complete attacker kill chains.
The framework analyzes malicious PDF files in four distinct layers:
- Static Dissection: Rebuilds full PDF object dependency graph + decodes 27 nested stream encodings
- JavaScript Deobfuscation: 19 advanced techniques (AST parsing, deadcode removal, eval reconstruction)
- Dynamic Execution: Instrumented VMs with 132 Windows API hooks + memory forensics
- Exploit Chain Reconstruction: ML-powered mapping (CVE → AMSI Bypass → Reflective PE → C2 callback)
Output: Interactive exploit chain graphs, memory dumps, behavioral timelines, SIEM-ready JSON.
CORE: pikepdf, pdfminer.six, yara-python, pefile, capstone, unicorn-engine, lief, volatility3
ML: transformers (BERT), torch, scikit-learn
DYNAMIC: frida-tools, QEMU VM farm, python-socketio
CLI: typer, rich, asyncio, celery
VISUALIZATION: plotly, networkx
pdfexploitsforge/
├── core/                      # Analysis engines
│   ├── static_dissection.py   # Object graph + 27 decoder chains
│   ├── js_deobfuscator.py     # 19 AcroJS deobf techniques
│   ├── payload_extractor.py   # PE/ELF/shellcode extraction
│   └── exploit_classifier.py  # ML chain reconstruction
├── dynamic/                   # VM orchestration
│   ├── qemu_orchestrator.py   # XP/Win7/Win10 VM farm
│   ├── api_monitor.py         # 132 Windows API hooks
│   └── memory_forensics.py    # Volatility + YARA scans
├── signatures/                # Detection rules
│   ├── yara_rules/            # 1.2k PDF exploit signatures
│   └── regex_patterns.py      # JS/PowerShell primitives
├── ml_models/                 # Trained models
│   ├── js_malware_bert.pt     # JavaScript classifier
│   └── rop_chain_detector.pt  # ROP gadget chains
├── visualizers/               # Attack graphs
│   ├── exploit_graph.py       # NetworkX → Plotly chains
│   └── object_dependency.py   # PDF internal references
├── output/                    # Generated reports
│   ├── chain_graph.html       # Interactive visualization
│   ├── memory.dmp             # Volatility dumps
│   └── network_timeline.json  # C2 behavioral data
└── cli.py                     # Production CLI entrypoint
- Python 3.8+
- QEMU (for dynamic analysis)
- Volatility3
- YARA
git clone https://github.com/your-repo/pdfexploitsforge.git
cd pdfexploitsforge
pip install -r requirements.txt
pip install -e .
docker build -t pdfexploitsforge .
docker run -v $(pwd)/samples:/samples pdfexploitsforge analyze /samples/malicious.pdf
# Analyze single PDF
pdfexploitsforge analyze malicious.pdf
# With dynamic analysis
pdfexploitsforge analyze malicious.pdf --dynamic --vm-snapshot win7_sp1
# Batch processing
pdfexploitsforge batch ./pdf_samples/ --workers 4
from pdfexploitsforge import StaticAnalyzer, JSDeobfuscator, ExploitClassifier
# Static analysis
analyzer = StaticAnalyzer()
results = analyzer.analyze("malicious.pdf")
# JavaScript deobfuscation
deobf = JSDeobfuscator()
js_results = deobf.process(results['javascript'])
# ML exploit classification
classifier = ExploitClassifier()
exploit_chain = classifier.reconstruct_chain(results, js_results, [])
- PDF Object Graph: Complete dependency reconstruction
- 27 Stream Decoders: FlateDecode, ASCIIHex, ASCII85, LZW, etc.
- JavaScript Extraction: All embedded JS code with context
- Embedded File Detection: PE/ELF/Office docs
- YARA Integration: 1.2k PDF exploit signatures
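To illustrate what decoding nested stream encodings involves, here is a minimal sketch that uses only the Python standard library. It is not the project's `static_dissection.py`; the function names and the `DECODERS` table are hypothetical, and only three of the filters named above are covered. It assumes the filter names have already been read from the stream's /Filter array and applies them in the listed order.

```python
import base64
import binascii
import zlib
from typing import Callable, Dict, Sequence


def decode_ascii_hex(data: bytes) -> bytes:
    """Decode ASCIIHexDecode data: whitespace is ignored and '>' marks the end."""
    hex_digits = b"".join(data.split(b">", 1)[0].split())
    if len(hex_digits) % 2:
        hex_digits += b"0"  # a trailing odd digit is treated as if followed by 0
    return binascii.unhexlify(hex_digits)


def decode_ascii85(data: bytes) -> bytes:
    """Decode ASCII85Decode data, dropping the '~>' end marker if present."""
    body = data.strip()
    if body.endswith(b"~>"):
        body = body[:-2]
    return base64.a85decode(body)


# Hypothetical lookup table; a full implementation would cover more filters.
DECODERS: Dict[str, Callable[[bytes], bytes]] = {
    "ASCIIHexDecode": decode_ascii_hex,
    "ASCII85Decode": decode_ascii85,
    "FlateDecode": zlib.decompress,
}


def decode_stream(data: bytes, filters: Sequence[str]) -> bytes:
    """Apply each filter in the order it appears in the /Filter array."""
    for name in filters:
        decoder = DECODERS.get(name)
        if decoder is None:
            raise ValueError("unsupported filter: " + name)
        data = decoder(data)
    return data
```

Keeping the decoders in a table means a chain of any depth reduces to one loop, and an unknown filter name fails loudly instead of passing encoded bytes on to later stages.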
- Unicode unescape sequences
- Hexadecimal string decoding
- Base64 string decoding
- URL encoding resolution
- String concatenation resolution
- Character code resolution
- Eval call reconstruction
- Function call resolution
- Dead code removal
- Array access resolution
- Object property access
- Mathematical operations
- Boolean operations
- Conditional expressions
- Loop unrolling
- Variable substitution
- String split/join operations
- Regex pattern resolution
- Escape sequence resolution
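Two of the techniques listed above, character code resolution and string concatenation resolution, can be sketched with regular expressions. This is a simplified illustration, not the project's `js_deobfuscator.py`: the function names are hypothetical, and a regex pass can misread quotes that sit inside other string literals, which is one reason a real implementation would work on a parsed AST.

```python
import json
import re

# String.fromCharCode(97, 112, 112) with decimal arguments only.
CHARCODE_CALL = re.compile(r"String\.fromCharCode\(\s*([0-9,\s]+?)\s*\)")
# Two adjacent double-quoted literals without escapes, joined by '+'.
CONCAT = re.compile(r'"([^"\\]*)"\s*\+\s*"([^"\\]*)"')


def resolve_char_codes(source: str) -> str:
    """Replace String.fromCharCode(...) calls with the string they produce."""
    def to_literal(match):
        codes = [int(code) for code in match.group(1).split(",") if code.strip()]
        text = "".join(chr(code) for code in codes)
        return json.dumps(text)  # emits a quoted, escaped string literal
    return CHARCODE_CALL.sub(to_literal, source)


def fold_concatenation(source: str) -> str:
    """Merge "a" + "b" into "ab", repeating until nothing changes."""
    previous = None
    while previous != source:
        previous = source
        source = CONCAT.sub(lambda m: '"' + m.group(1) + m.group(2) + '"', source)
    return source


def deobfuscate(source: str) -> str:
    """Run the two passes in sequence (hypothetical pipeline order)."""
    return fold_concatenation(resolve_char_codes(source))
```

Character codes are resolved first so that the literals they produce can take part in concatenation folding during the second pass.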
- VM Orchestration: XP/Win7/Win10 snapshots
- 132 API Hooks: Kernel32, Ntdll, Advapi32, User32, WinInet
- Memory Forensics: Volatility3 integration
- Network Monitoring: C2 communication detection
- File System Tracking: Creation/modification monitoring
- Registry Analysis: Persistence mechanism detection
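As an illustration of how hooked API calls might be turned into a behavioral finding, the sketch below scans a call log for the classic remote injection sequence VirtualAllocEx, WriteProcessMemory, CreateRemoteThread. The log format (a list of dicts with `pid` and `api` keys) and the function name are assumptions made for this example; the project's actual hook output format is not documented here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Ordered Windows API calls commonly seen in remote thread injection.
INJECTION_SEQUENCE = ("VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread")


def find_injection_candidates(calls: Iterable[dict]) -> List[int]:
    """Return PIDs whose calls contain INJECTION_SEQUENCE as an ordered subsequence."""
    progress: Dict[int, int] = defaultdict(int)  # pid -> next expected step
    flagged: List[int] = []
    for call in calls:
        pid = call["pid"]
        step = progress[pid]
        if step < len(INJECTION_SEQUENCE) and call["api"] == INJECTION_SEQUENCE[step]:
            progress[pid] = step + 1
            if progress[pid] == len(INJECTION_SEQUENCE):
                flagged.append(pid)
    return flagged
```

Unrelated calls between the three steps are tolerated, since real traces interleave many other APIs. A match is only a candidate for review, not proof of injection.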
- BERT JavaScript Classifier: Malware family identification
- ROP Chain Detector: Neural network-based detection
- CVE Mapping: Automatic vulnerability identification
- Attack Chain Reconstruction: Complete killchain mapping
- MITRE ATT&CK Integration: Technique classification
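The reconstructed chain is reported as `nodes` and `edges`. The sketch below shows one way such a graph could be flattened into an ordered kill chain with a topological sort. It does not reproduce the ML classifier; the node labels and the `(source, target)` edge shape are assumptions made for this example.

```python
from collections import defaultdict, deque
from typing import Iterable, List, Sequence, Tuple


def order_chain(nodes: Sequence[str], edges: Iterable[Tuple[str, str]]) -> List[str]:
    """Order stages so every edge points forward (Kahn's algorithm)."""
    indegree = {node: 0 for node in nodes}
    successors = defaultdict(list)
    for source, target in edges:
        successors[source].append(target)
        indegree[target] += 1

    ready = deque(node for node in nodes if indegree[node] == 0)
    ordered: List[str] = []
    while ready:
        node = ready.popleft()
        ordered.append(node)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    if len(ordered) != len(nodes):
        raise ValueError("exploit chain graph contains a cycle")
    return ordered


stages = ["C2 callback", "CVE", "Reflective PE", "AMSI Bypass"]
links = [("CVE", "AMSI Bypass"), ("AMSI Bypass", "Reflective PE"),
         ("Reflective PE", "C2 callback")]
print(" -> ".join(order_chain(stages, links)))
```

Raising on a cycle is a deliberate choice in this sketch: a kill chain that loops back on itself usually signals a mapping error that an analyst should see.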
{
"pdf_file": "malicious.pdf",
"static_analysis": {
"structure": {...},
"javascript": [...],
"embedded_files": [...],
"yara_matches": [...]
},
"javascript_analysis": [...],
"payloads": [...],
"exploit_chain": {
"nodes": [...],
"edges": [...],
"cve_mappings": [...],
"attack_techniques": [...]
},
"dynamic_analysis": {
"api_calls": [...],
"network_activity": [...],
"file_changes": [...],
"memory_dump": {...}
}
}
- CVE-2013-2729 (Adobe Reader integer overflow in BMP/RLE parsing)
- CVE-2010-0188 (Adobe Reader LibTIFF)
- CVE-2009-0927 (Adobe Reader Collab.getIcon)
- CVE-2008-2992 (Adobe Reader util.printf)
- And 50+ more PDF vulnerabilities
- Heap spraying
- ROP chain exploitation
- JavaScript API abuse
- Embedded executable deployment
- Reflective PE loading
- AMSI bypass techniques
- Process injection
- Persistence mechanisms
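As a small example of detecting one of these techniques, the sketch below flags JavaScript string literals that look like heap spray filler: long runs of `%uXXXX` escapes dominated by a single repeated value such as `%u9090`. The function name and both thresholds are assumptions chosen for illustration. It is a heuristic that can miss sprays built in loops from short seeds.

```python
import re
from collections import Counter
from typing import List

# Four or more consecutive %uXXXX escapes, as passed to unescape().
PERCENT_U_RUN = re.compile(r"(?:%u[0-9a-fA-F]{4}){4,}")


def find_spray_candidates(js_source: str, min_repeat_ratio: float = 0.8) -> List[str]:
    """Return %u runs in which one 16-bit value makes up most of the run."""
    hits: List[str] = []
    for match in PERCENT_U_RUN.finditer(js_source):
        words = match.group(0).lower().split("%u")[1:]
        most_common_count = Counter(words).most_common(1)[0][1]
        if most_common_count / len(words) >= min_repeat_ratio:
            hits.append(match.group(0))
    return hits
```

The repeat ratio separates sled-like filler from ordinary encoded shellcode, whose 16-bit words are mostly distinct.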
# Run unit tests
python -m pytest tests/
# Test with sample PDFs
python -m pytest tests/test_samples.py
# Performance benchmarks
python -m pytest tests/test_performance.py --benchmark
- Static Analysis: ~2-5 seconds per PDF
- JavaScript Deobfuscation: ~1-3 seconds per script
- Dynamic Analysis: ~5-10 minutes per PDF (VM dependent)
- ML Classification: ~0.5-1 second per sample
- Fork the repository
- Create feature branch (git checkout -b feature/amazing-feature)
- Commit changes (git commit -m 'Add amazing feature')
- Push to branch (git push origin feature/amazing-feature)
- Open Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Adobe Security Research Team
- YARA Project
- Volatility Foundation
- MITRE ATT&CK Framework
- PDF Association
- Issues: GitHub Issues
- Documentation: Wiki
- Discussions: GitHub Discussions
