PySemBridge is an experimental framework for representing and compiling Python dynamic semantic bridges for static taint analysis.
The goal is not to let an LLM directly decide whether a vulnerability exists. Instead:
LLM/code analysis proposes Semantic Bridge IR
-> analyzer adapter compiles IR into facts/models/rules
-> the static analyzer re-runs taint propagation
-> complete source-to-sink traces validate the bridge
YASA is the primary backend for the first implementation stage. CodeQL, Pysa, and Semgrep adapters can be added as projection backends later.
pysembridge/
ir/ Semantic Bridge JSON schema and loader
adapters/yasa/ YASA external facts compiler
recognizer/ Dynamic semantic gap recognizers
synthesizer/ LLM-assisted bridge generation components
pipeline/ End-to-end analyzer runners
verifier/ Trace and safe-variant validation components
bridges/ Per-CVE Semantic Bridge IR files
benchmarks/py-bench/ Six Python CVE benchmark projects
experiments/ Reproducible scripts and ignored generated outputs
integrations/yasa/ Integrated YASA-sembridge engine copy
docs/ Tool guide, workflow, and repository structure
Install in editable mode:
python3 -m pip install -e .Scan any Python project for candidate dynamic semantic gaps:
python3 -m pysembridge.cli scan-gaps \
--project /path/to/python/project \
--project-name my-project \
--output experiments/results/my-project.gap-candidates.json \
--include-featuresGenerate a generic candidate Semantic Bridge IR directly from source:
python3 -m pysembridge.cli synthesize-generic-bridge \
--project /path/to/python/project \
--project-name my-project \
--output experiments/results/my-project.generic-bridge.jsonThis source-only mode does not require a CVE manifest or a known broken trace.
It classifies potential Python dynamic semantics such as receiver dispatch,
container element propagation, string construction, attribute indirection,
dynamic calls, descriptors, metaclasses, context managers, iterators,
registered dispatch, async/concurrency scheduling, dynamic code execution, and
typing-model boundaries. The generated bridge is a candidate semantic
hypothesis; analyzer verification is still required before treating it as an
executable vulnerability chain. See docs/recognizer-dynamic-features.md for
the current recognizer coverage.
Compile the included pyload bridge into YASA external facts:
pysembridge compile-yasa \
--bridge bridges/cve-2025-55156-pyload/bridge.json \
--output experiments/results/cve-2025-55156-pyload.yasa-facts.jsonWithout installing the console script:
python3 -m pysembridge.cli compile-yasa \
--bridge bridges/cve-2025-55156-pyload/bridge.json \
--output experiments/results/cve-2025-55156-pyload.yasa-facts.jsonEquivalent script:
bash experiments/scripts/compile_pyload_yasa.shRun the current end-to-end YASA-sembridge pipeline:
python3 -m pysembridge.cli run-yasa \
--project /home/ubuntu/llm-yasa-repair/py-bench/cve-2025-55156-pyload \
--project-name cve-2025-55156-pyload \
--output-dir experiments/results/tool-pipeline/cve-2025-55156-pyload \
--yasa-dir /home/ubuntu/llm-yasa-repair/YASA-Engine-sembridge \
--rule-config /home/ubuntu/llm-yasa-repair/py-result/tool-rules/yasa/cve-2025-55156-pyload-precise.json \
--source url \
--sink self.c.execute.arg0 \
--expected-sink self.c.execute \
--expected-trace-contains file_database.py \
--expected-trace-contains statusesThe command performs:
AST feature recognition
-> semantic gap classification
-> bridge synthesis
-> bridge verification
-> YASA facts compilation
-> YASA-sembridge scan
-> SARIF complete-trace verification
This repository currently contains:
- Tool-independent Semantic Bridge IR schema.
- AST-based semantic gap recognizer covering major Python dynamic feature families.
- Source-only
scan-gapscandidate generation for arbitrary Python projects. - Generic candidate Semantic Bridge IR synthesis for recognized dynamic feature families.
- Generic auto synthesis pipeline plus one executable pyload-like synthesizer.
- YASA facts compiler.
- YASA-sembridge end-to-end pipeline runner.
- Bridge and SARIF trace verifiers.
The current recognizer groups feature hits into ten broad semantic gap families;
see docs/recognizer-dynamic-features.md for the maintained coverage list.
The current YASA integration uses report-level completion: YASA emits a baseline boundary finding, then PySemBridge facts append an enhanced complete-chain SARIF finding. Analyzer-level propagation injection is the next deeper integration stage.
The repository includes lightweight regression tests that can be run without installing the optional YASA backend:
python3 -m unittest discover -s tests -vCurrent expected result:
Ran 5 tests
OK
The tests cover Semantic Bridge IR schema validation and AST recognizer coverage for representative dynamic Python features. The CLI entry point can also be checked directly:
python3 -m pysembridge.cli --helpFor analyzer-level experiments, run-yasa records generated bridge files, YASA
facts, SARIF output, and pipeline summaries under experiments/results/.
Generated results are intentionally kept out of version control.
Before changing the recognizer, synthesizer, or IR loader, run the regression suite and make sure the CLI still loads:
python3 -m unittest discover -s tests -v
python3 -m pysembridge.cli --helpThe tests are intentionally small and fast. They check the core IR validation
path and representative AST feature extraction behavior; backend-specific YASA
runs are kept as reproducible scripts and generated reports under
experiments/.
See docs/repository-structure.md for the repository layout,
docs/recognizer-dynamic-features.md for AST recognizer coverage,
docs/tool-development-flow-yasa.md for the complete PySemBridge workflow,
docs/security-test-report.md for the security test record, and
docs/yasa-sembridge-tool-guide.md for the integrated YASA-sembridge usage
guide. The full modified YASA engine is checked in under
integrations/yasa/YASA-Engine-sembridge/.