One Sentence. Any Structured Deliverable. From a single core sentence of input, generate any professional text deliverable.
Evidence-gated generation and revision infrastructure for high-stakes text deliverables.
For operators and content owners, the outcome is simple:
- no speculative edits in final documents;
- only material revisions go live (data, metrics, key terms, risk language);
- every updated answer is source-traceable;
- delivery remains in native `.docx` with tracked changes.
For builders and integrators, the system is explicit and deterministic:
- Evidence Gate: hard fail on required-source misses.
- MECE Decomposition: claim/sub-question split before revision.
- DOCX Revision Engine: tracked changes via `w:del` + `w:ins`.
- Audit Contracts: fixed artifacts for gate report, change audit, and Q-source map.
- Run Governance: isolated run directories, manifest logging, and retention controls.
This product intentionally does not do:
- prose polishing;
- cosmetic rewrites;
- unsupported factual expansion.
- Do not guess.
- Evidence first, revision second.
- If evidence is missing, explicitly write: `not available in currently verifiable fulltext.`
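
The missing-evidence rule can be sketched as a small helper. This is a minimal illustration, not the project's code: the function name `answer_from_evidence` and the idea of passing evidence as a list of text snippets are assumptions made for the example.

```python
# Fixed fallback wording taken from the rule above.
FALLBACK = "not available in currently verifiable fulltext."


def answer_from_evidence(evidence_snippets: list) -> str:
    """Return the joined evidence, or the fixed fallback when none is usable.

    Hypothetical helper: the real pipeline may represent evidence differently.
    """
    # Drop empty or whitespace-only snippets; they do not count as evidence.
    usable = [s.strip() for s in evidence_snippets if s and s.strip()]
    if not usable:
        # Do not guess: state explicitly that nothing verifiable was found.
        return FALLBACK
    return " ".join(usable)
```

The point of the design is that the "no evidence" branch never produces free text, so a reviewer can search the output for the fallback string.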
- New data appears or existing data changes.
- Key metrics, thresholds, or definitions change.
- Official announcements or regulatory updates change conclusions.
- Critical keywords, terms, or framing change.
- Material risk language or scope constraints change.
- Expansion for style.
- Cosmetic rewriting.
- Synonym swaps that do not change facts.
- Legal/Compliance: regulatory FAQs, contract Q&A, filing/review Q&A, policy interpretation notes.
- Consulting/Enterprise: diligence FAQs, bid Q&A, management Q&A, external messaging FAQs.
- Medical/Research: paper FAQs, reviewer response Q&A, clinical/regulatory Q&A.
- IR/Public Affairs: earnings Q&A, risk disclosure Q&A, public response FAQs.
- Tech/Operations: product compliance FAQs, security FAQs, SOP Q&A.
Primary output format: .docx with tracked changes.
Evidence inputs: verifiable fulltext from announcements, PDFs, papers, posters, and similar sources.
- Define problem and scope: clarify user intent, audience, time anchor, and no-change boundaries.
- Decompose with MECE: split each target question into mutually exclusive and collectively exhaustive sub-questions.
- Run source gate: verify required sources and fulltext evidence for each sub-question.
- Decide revisions: revise only targets with sufficient evidence.
- Write DOCX changes: apply tracked changes (`w:del` + `w:ins`) and preserve source footnotes.
- Export audit trail: generate source gate report and full Q-to-source mapping.
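
To make the tracked-change step concrete, here is a minimal sketch of the WordprocessingML that a `w:del` + `w:ins` pair produces, built with the standard library. It is an illustration only and does not reflect how `scripts/revise_docx.py` is implemented; the function name `tracked_replacement`, the author value, and the change ids are assumptions.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# WordprocessingML main namespace (the "w:" prefix in document.xml).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)


def w(tag: str) -> str:
    """Qualify a tag or attribute name with the w: namespace."""
    return f"{{{W_NS}}}{tag}"


def tracked_replacement(old_text, new_text, author="reviser", change_id=1):
    """Build a paragraph holding a tracked deletion followed by a tracked insertion."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    para = ET.Element(w("p"))

    # Deleted text lives in w:delText inside a run wrapped by w:del.
    deleted = ET.SubElement(
        para, w("del"),
        {w("id"): str(change_id), w("author"): author, w("date"): stamp},
    )
    ET.SubElement(ET.SubElement(deleted, w("r")), w("delText")).text = old_text

    # Inserted text is an ordinary w:t inside a run wrapped by w:ins.
    inserted = ET.SubElement(
        para, w("ins"),
        {w("id"): str(change_id + 1), w("author"): author, w("date"): stamp},
    )
    ET.SubElement(ET.SubElement(inserted, w("r")), w("t")).text = new_text
    return para


if __name__ == "__main__":
    print(ET.tostring(tracked_replacement("12%", "14%"), encoding="unicode"))
```

Word shows such a pair as a struck-through old value next to the new one, which a reviewer can accept or reject individually.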
Requirements:
- Python 3.11 runtime for full parser stack (PPT/PDF/DOCX/image OCR)
One-time runtime setup (installs compatible Python + parser dependencies):

    bash scripts/setup_runtime_py311.sh

Recommended entrypoint (run-scoped governance):

    .venv311/bin/python scripts/run_revise_pipeline_v2.py \
      --input-docx "/absolute/path/to/original.docx" \
      --patch-spec "config/revision_patch_spec_template.json"

This automatically runs:
- source gate check
- DOCX revision
- Q-source map export
- manifest writing and run index update
Revision plans are supplied via JSON patch spec:
- template: `config/revision_patch_spec_template.json`
- each patch must include anchor, replacement, reason, and source footnote refs.
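
A pre-flight check for the four required patch fields might look like the sketch below. The JSON key names (`anchor`, `replacement`, `reason`, `footnote_refs`) are assumptions chosen for illustration; the authoritative names are whatever `config/revision_patch_spec_template.json` defines.

```python
# Hypothetical key names; check the template file for the real ones.
REQUIRED_PATCH_FIELDS = ("anchor", "replacement", "reason", "footnote_refs")


def validate_patches(patches: list) -> list:
    """Return a list of human-readable problems; an empty list means all patches pass."""
    errors = []
    for i, patch in enumerate(patches):
        for field in REQUIRED_PATCH_FIELDS:
            if patch.get(field) is None:
                errors.append(f"patch[{i}]: missing '{field}'")
            elif field == "footnote_refs" and len(patch[field]) == 0:
                # A patch without any source reference is not traceable.
                errors.append(f"patch[{i}]: 'footnote_refs' is empty")
    return errors


example_patch = {
    "anchor": "Q3 revenue grew 12%",
    "replacement": "Q3 revenue grew 14%",
    "reason": "figure updated in the latest filing",
    "footnote_refs": ["fn-1"],
}
```

Running such a check before the pipeline keeps malformed patches from reaching the DOCX writer.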
Source gate configuration:
- default config path: `config/revise_sources.json`
- define at least one `required_sources` entry (empty required sources are treated as gate failure).
- supported local source types: `local_pdf`, `local_docx`, `local_pptx`, `local_image`
- optional source fields: `must_include_any`, `location_hints`, `extract_mode`, `ocr_mode`
- image OCR in `ocr_mode=dual` attempts both PaddleOCR and EasyOCR (attempt trace is written to `extraction_detail`).
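
The configuration rules above can be sketched as a small structural check. This is not the logic of `scripts/check_revise_sources.py`: the per-source keys `type` and `path`, the example values, and the function name `check_gate_config` are assumptions for illustration. Only `required_sources`, the four source types, and the optional field names come from the text above.

```python
SUPPORTED_LOCAL_TYPES = {"local_pdf", "local_docx", "local_pptx", "local_image"}

# Illustrative config; "type" and "path" are hypothetical key names.
example_config = {
    "required_sources": [
        {
            "type": "local_pdf",
            "path": "/absolute/path/to/announcement.pdf",
            "must_include_any": ["primary endpoint"],
            "location_hints": ["Results"],
        },
        {
            "type": "local_image",
            "path": "/absolute/path/to/poster.png",
            "ocr_mode": "dual",
        },
    ]
}


def check_gate_config(config: dict):
    """Return (ok, problems) for the structural rules described above."""
    problems = []
    required = config.get("required_sources") or []
    if not required:
        # Empty required sources count as a gate failure, not a pass.
        problems.append("required_sources is empty: treated as gate failure")
    for i, src in enumerate(required):
        if src.get("type") not in SUPPORTED_LOCAL_TYPES:
            problems.append(f"required_sources[{i}]: unsupported type {src.get('type')!r}")
    return (not problems, problems)
```

Treating an empty list as failure prevents a misconfigured run from passing the gate with nothing checked.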
Runtime selection:
- pipeline scripts prefer `.venv311/bin/python` automatically when present.
- override explicitly with env var `REVISE_RUNTIME_PYTHON=/abs/path/to/python`.
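
The selection order can be sketched as follows. The function name `select_runtime` is hypothetical, and the final fallback to the current interpreter is an assumption: the text above does not say what happens when neither the override nor `.venv311` is available.

```python
import os
import sys
from pathlib import Path
from typing import Mapping, Optional


def select_runtime(repo_root: Path, env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the Python interpreter for pipeline subprocesses."""
    env = os.environ if env is None else env

    # 1. An explicit override always wins.
    override = env.get("REVISE_RUNTIME_PYTHON")
    if override:
        return override

    # 2. Prefer the project virtualenv when it exists.
    candidate = repo_root / ".venv311" / "bin" / "python"
    if candidate.exists():
        return str(candidate)

    # 3. Assumed fallback: whatever interpreter is running this script.
    return sys.executable
```

For example, `select_runtime(Path("."), {"REVISE_RUNTIME_PYTHON": "/opt/py311/bin/python"})` returns the override path without touching the filesystem.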
SOP claim-level gate (recommended before revision):

    .venv311/bin/python scripts/check_revision_sop.py \
      --claim-spec "config/revision_claim_spec_template.json" \
      --gate-report "/absolute/path/to/source_gate_report.json" \
      --output-csv "/absolute/path/to/sop_claim_matrix.csv"

If your network requires enterprise root certificates, provide a CA bundle:

    .venv311/bin/python scripts/run_revise_pipeline_v2.py \
      --input-docx "/absolute/path/to/original.docx" \
      --ca-bundle "/absolute/path/to/corp_root_ca.pem"

Diagnostic-only switch (not recommended for normal use):

    --allow-insecure-tls

Each run writes into: `runs/<run_id>/`
Core artifacts:
- `source_gate_report_<run_id>.json`
- `revision_change_audit_<run_id>.csv`
- `q_source_map_<run_id>.csv`
- `revised_<run_id>.docx`
- `revise_sync_manifest_<run_id>.tsv`
- `deleted_docx_manifest_<run_id>.tsv`
- `artifact_manifest_<run_id>.tsv`

Global index:
- `reports/run_index.tsv`
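
Because the artifact names are fixed, a post-run completeness check is easy to script. The sketch below only derives the expected paths from the naming scheme listed above; the helper names and the sample `run_id` value are hypothetical, and the actual `run_id` format is not specified here.

```python
from pathlib import Path

# File name patterns copied from the core artifact list.
ARTIFACT_TEMPLATES = (
    "source_gate_report_{run_id}.json",
    "revision_change_audit_{run_id}.csv",
    "q_source_map_{run_id}.csv",
    "revised_{run_id}.docx",
    "revise_sync_manifest_{run_id}.tsv",
    "deleted_docx_manifest_{run_id}.tsv",
    "artifact_manifest_{run_id}.tsv",
)


def expected_artifacts(run_id: str, runs_root: Path = Path("runs")) -> list:
    """List the paths a finished run is expected to contain."""
    run_dir = runs_root / run_id
    return [run_dir / t.format(run_id=run_id) for t in ARTIFACT_TEMPLATES]


def missing_artifacts(run_id: str, runs_root: Path = Path("runs")) -> list:
    """Return the expected artifacts that do not exist on disk."""
    return [p for p in expected_artifacts(run_id, runs_root) if not p.exists()]
```

A non-empty result from `missing_artifacts("demo_run")` would indicate an incomplete run directory.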
| Path | Purpose |
|---|---|
| `scripts/revise_docx.py` | Main DOCX reviser (tracked changes + footnotes) |
| `scripts/check_revise_sources.py` | Source gate checker (required/optional checks) |
| `scripts/evidence_extractors.py` | Multi-format local evidence extraction (PDF/DOCX/PPTX/image) |
| `scripts/check_revision_sop.py` | Claim-level SOP gate (material + confidence checks) |
| `scripts/run_revise_pipeline.py` | Legacy pipeline entrypoint (explicit in/out paths) |
| `scripts/run_revise_pipeline_v2.py` | Recommended entrypoint (run_id dirs, manifests, index) |
| `scripts/build_q_source_map.py` | Export full Q-to-source CSV |
| `scripts/query_q_source.py` | Query sources for one question |
| `scripts/update_run_index.py` | Update `reports/run_index.tsv` |
| `scripts/housekeeping.py` | Hot/cold retention and cleanup |
| `config/revise_sources.json` | Source gate rules |
| `config/revision_patch_spec_template.json` | Generic revision patch spec template |
| `config/revision_claim_spec_template.json` | Claim-level SOP gate template |
| `config/source_registry.yaml` | Source registry snapshot |
| `docs/SOP_endpoint_extraction_standard.md` | SOP baseline |

- Fulltext-first.
- Abstract-only evidence is insufficient for core claim revisions.
- Any required-source failure blocks revision by default.
- Every change must be auditable, traceable, and reviewable.