fix(deploy-on-aws): resolve file paths to prevent path traversal in hook scripts#120
Merged
mayakost merged 3 commits intofeat/diagram-skillfrom Apr 7, 2026
Merged
Conversation
…ook scripts Canonicalize file_path with Path.resolve() before any file operations in post_process_drawio.py and validate_drawio.py. This prevents path traversal sequences (e.g., "../../etc/passwd.drawio") in hook input from escaping the expected directory. Key ordering: symlink check runs on the original unresolved path (before resolve(), which follows symlinks and would make is_symlink() always False), then resolve() canonicalizes, then all subsequent file I/O uses only the resolved `path` object — never the raw `file_path`. Addresses Semgrep finding: ai.ai-best-practices.hooks-path-traversal Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
cc60c6b to
eceb39a
Compare
- Canonicalize file_path with Path.resolve() before file operations in post_process_drawio.py and validate_drawio.py to prevent path traversal (CWE-22) from hook input - Move symlink check before resolve() (resolve follows symlinks, so is_symlink() on resolved path is always False) - Use resolved `path` object for all post-resolve file I/O - Fix nosemgrep inline comments to use full rule IDs (truncated IDs were silently ignored by semgrep) Addresses Semgrep: ai.ai-best-practices.hooks-path-traversal Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
fb523ef to
5728d7d
Compare
A symlink .drawio file is a suspicious/unusual condition (potential attack), not an expected no-op like "not a drawio file." Exit 1 gives CLI callers and CI an honest signal that the file was refused. The hook harness (validate-drawio.sh) uses || true so this is safe there. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
krokoko
approved these changes
Apr 7, 2026
github-merge-queue bot
pushed a commit
that referenced
this pull request
Apr 7, 2026
* add diagram skill * fix: resolve CI failures — use defusedxml for safe XML parsing, fix markdown lint Replace xml.etree.ElementTree with defusedxml.ElementTree in all Python scripts to address Bandit B314/B405 and Semgrep XXE findings. Add nosec suppression to subprocess import in drawio_url.py (browser URL opener only). Fix 106 markdownlint errors (MD022/MD031/MD032 blank lines, MD001 heading levels) across 7 reference files and SKILL.md. Trim SKILL.md from 515 to 495 lines to satisfy the 500-line SKILL001 limit. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Add architecture diagram skill * fix: split SKILL.md into references to pass skill-length lint rule Move detailed XML generation rules and layout guidelines from SKILL.md (494 lines) into references/xml-generation-rules.md and references/layout-guidelines.md. SKILL.md is now 219 lines, well under the 300-line warning threshold enforced by SKILL001. * fix: apply dprint formatting to pass fmt:check Run dprint fmt on 7 files (JSON and Markdown) that were not conforming to the project's formatting rules. * fix(diagram-skill): remove defusedxml dependency, fix code quality issues - Replace defusedxml with stdlib xml.etree.ElementTree in all 5 Python scripts and validate-drawio.sh (XML is self-generated, no XXE risk) - Remove pip3 auto-install from PostToolUse hook - Remove unused import sys in fix_step_badges.py - Remove unused style variable in post_process_drawio.py fix_legend_size - Add explanatory comment to except block in post_process_drawio.py * refactor(diagram-skill): consolidate references, delete outdated examples Delete 5 files (-2039 lines): - example-diagrams.md (803 lines): outdated XML examples conflicting with .drawio refs - aws-diagram-guidelines.md (536 lines): internal wiki copy, all rules in style-guide.md - xml-structure.md (209 lines): merged unique parts into group-styles.md + layout-guidelines.md - xml-generation-rules.md (186 lines): trimmed and renamed to xml-rules.md (89 lines) - aws4-shapes.md (231 lines): split into aws4-shapes-services.md (69) + aws4-shapes-resources.md (100) All reference files now under 100-line DESIGN_GUIDELINES limit. All shape names preserved in the split files (zero shape deletions). All cross-references updated to point to new file names. * refactor(diagram-skill): restructure SKILL.md for progressive disclosure - Remove eager "Study" paragraph that loaded ~135k tokens upfront - Move reference loading into Step 3 where references are actually needed - Add example selection table: agent picks 1-2 most relevant .drawio examples per diagram type (reduces ~95k to ~10-18k tokens) - Update cross-references to renamed files (xml-rules.md, aws4-shapes-services.md) - SKILL.md: 226 lines (under 300 limit) * fix(diagram-skill): remove white background and edge label backgrounds for dark mode - Remove background="#FFFFFF" from mxGraphModel XML template in SKILL.md - Change rule from "ALWAYS set background=#FFFFFF" to "NEVER set background" - Change edge label rule from labelBackgroundColor=#ffffff to labelBackgroundColor=none in layout-guidelines.md and xml-rules.md - Both white backgrounds break dark mode adaptive contrast * feat(diagram-skill): support non-AWS architecture diagrams - New references/general-icons.md: maps 40+ non-AWS technologies (PostgreSQL, Docker, Kubernetes, CoreML, HuggingFace, GitHub, etc.) to draw.io shapes with category-appropriate tint colors - Mode A codebase analysis now detects non-AWS tech when no AWS infrastructure files are found (Dockerfiles, DB configs, ML frameworks) - Mixed architectures supported: AWS icons for AWS services, general icons for non-AWS, same layout rules for both - Step 3 loads general-icons.md only when non-AWS services are present * fix(diagram-skill): restore defusedxml, add README, fix markdown lint - Restore defusedxml.ElementTree imports (CI scanners require it) - Replace silent pip3 auto-install with informative error message when defusedxml is missing — no more modifying user's environment - Add scripts/requirements.txt declaring defusedxml>=0.7.1 - Create plugin README.md per DESIGN_GUIDELINES (prerequisites, skills, MCP servers, installation, examples) - Document defusedxml dependency in post-processing.md - Fix 20 markdownlint errors (MD022, MD029, MD031, MD032, MD060) - Apply dprint formatting to table separators * fix(diagram-skill): apply dprint formatting to group-styles.md table * fix(diagram-skill): align validator with skill rules, fix template and post-processor bugs - Reject hardcoded background attribute on mxGraphModel (breaks dark mode) - Reject bare <mxGraphModel> without <mxfile> wrapper per skill contract - Lower font size threshold from 11px to 10px to match skill's font hierarchy - Remove stale fontColor=#232F3E warning (redundant with style guide) - Fix legend-panel parent ID typo in xml-templates-structure.md - Fix .upper() bug in badge skip condition (dead code) - Replace blunt "Do NOT read .drawio" rule with Reference Priority section - Remove duplicate developer_tools entry in aws4-shapes.json that shadowed xray * fix(diagram-skill): harden scripts against injection, error suppression, and resource exhaustion - Replace JSON escaping fallback with safe static message (CWE-838) - Use importlib.util for sibling imports instead of sys.path manipulation (CWE-426) - Fix defusedxml ET.indent() bug by using stdlib for write-time formatting - Replace bare except with specific exception types in hook JSON parser (CWE-396) - Add 2MB file size guard to URL generator before reading into memory (CWE-400) * fix(diagram-skill): add nosec/nosemgrep suppression for stdlib indent() import The xml.etree.ElementTree import is only used for indent() (write-time formatting), not XML parsing. All parsing uses defusedxml. Add inline suppression for Bandit B405 and Semgrep XXE rules to match the existing pattern in drawio_url.py. * fix(diagram-skill): use bare nosemgrep suppression for stdlib indent() import The specific rule ID nosemgrep didn't match because Semgrep uses a composite rule ID. Bare # nosemgrep suppresses all rules on the line. The justification is in the comment: only used for indent(), not parsing. * fix(diagram-skill): use full semgrep rule IDs for nosemgrep suppression * fix(diagram-skill): remove stdlib xml.etree.ElementTree import entirely Drop the indent() call and StdET import that triggered Bandit B405 and Semgrep XXE rules. XML indentation is cosmetic — draw.io and agents handle non-indented XML fine. Added a comment noting that a custom indent helper can be added later if human-readable XML is needed. * fix(diagram-skill): sanitize XML attributes in validator error messages Prevent prompt injection via crafted draw.io cell attributes that flow through the PostToolUse hook into agent systemMessage context. Adds _sanitize_attr() which strips control characters, collapses whitespace, restricts to safe characters, and truncates at 80 chars. Applied to all user-controlled attribute values in error/warning messages. * fix(deploy-on-aws): harden diagram scripts against DoS, symlink writes, and info leakage (#119) Follow-on security hardening for the architecture diagram skill (PR#103): - Add 2MB file size checks before XML parsing in validate_drawio.py and post_process_drawio.py (matching existing drawio_url.py guard) - Add iterparse-based XML depth (50) and element count (50,000) pre-flight check to prevent C-stack overflow from deeply nested files - Reject symlinks in post_process_drawio.py before write-back to prevent symlink-follow write attacks - Wrap fixer pipelines in top-level try/except to prevent unhandled exception tracebacks from leaking file paths and source code into the hook systemMessage (stderr captured via 2>&1) - Add timeout wrappers (10s/5s) around all Python subprocess calls in validate-drawio.sh to prevent runaway processes from blocking the hook Co-authored-by: Scott Schreckengaust <345885+scottschreckengaust@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> * fix(deploy-on-aws): resolve file paths to prevent path traversal in hook scripts (#120) * fix(deploy-on-aws): resolve file paths to prevent path traversal in hook scripts Canonicalize file_path with Path.resolve() before any file operations in post_process_drawio.py and validate_drawio.py. This prevents path traversal sequences (e.g., "../../etc/passwd.drawio") in hook input from escaping the expected directory. Key ordering: symlink check runs on the original unresolved path (before resolve(), which follows symlinks and would make is_symlink() always False), then resolve() canonicalizes, then all subsequent file I/O uses only the resolved `path` object — never the raw `file_path`. Addresses Semgrep finding: ai.ai-best-practices.hooks-path-traversal Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(deploy-on-aws): resolve file paths and fix nosemgrep rule IDs - Canonicalize file_path with Path.resolve() before file operations in post_process_drawio.py and validate_drawio.py to prevent path traversal (CWE-22) from hook input - Move symlink check before resolve() (resolve follows symlinks, so is_symlink() on resolved path is always False) - Use resolved `path` object for all post-resolve file I/O - Fix nosemgrep inline comments to use full rule IDs (truncated IDs were silently ignored by semgrep) Addresses Semgrep: ai.ai-best-practices.hooks-path-traversal Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(deploy-on-aws): exit non-zero when refusing symlink drawio files A symlink .drawio file is a suspicious/unusual condition (potential attack), not an expected no-op like "not a drawio file." Exit 1 gives CLI callers and CI an honest signal that the file was refused. The hook harness (validate-drawio.sh) uses || true so this is safe there. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Scott Schreckengaust <345885+scottschreckengaust@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> * fix(deploy-on-aws): use os.path.realpath() for semgrep-recognized taint sanitization Replace Path.resolve() with os.path.realpath() for path canonicalization. Both are functionally identical, but semgrep's hooks-path-traversal rule only recognizes os.path.realpath() as a taint sanitizer. This eliminates the CI semgrep finding without needing nosemgrep inline suppressions. Also replaces Path.is_symlink() with os.path.islink() for consistency. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(deploy-on-aws): sync Codex plugin manifest with v1.2.0 diagram skill changes * Update plugins/deploy-on-aws/skills/aws-architecture-diagram/SKILL.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: mayakost <130111194+mayakost@users.noreply.github.com> --------- Signed-off-by: mayakost <130111194+mayakost@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Sam Castro <scoropeza@gmail.com> Co-authored-by: Laith Al-Saadoon <9553966+theagenticguy@users.noreply.github.com> Co-authored-by: Scott Schreckengaust <scottschreckengaust@users.noreply.github.com> Co-authored-by: Scott Schreckengaust <345885+scottschreckengaust@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
file_pathwithPath.resolve()before any file operations inpost_process_drawio.pyandvalidate_drawio.pyto prevent path traversal sequences (e.g.,../../etc/passwd.drawio) in hook input from escaping the expected directoryresolve()—resolve()follows symlinks, sois_symlink()on the resolved path would always beFalsepathobject for all post-resolve file I/O (never reconstruct from the taintedfile_pathstring).drawiofiles — a symlink is a suspicious/unusual condition (potential attack), not an expected no-op like "not a drawio file". The hook harness (validate-drawio.sh) uses|| trueso this is safe there, but CLI callers and CI get an honest signal that the file was refusednosemgrepinline comments to use full rule IDs (truncated IDs were silently ignored by semgrep)ai.ai-best-practices.hooks-path-traversalExit code behavior
.drawiofileEdit/Writecall, so non-drawio files are the 99.9% case. Exit 0 means "not my file, nothing to do" which is the standard hook contract for irrelevant files..drawiofile|| true.Test plan
.drawiovalidation still works with both relative and absolute pathsln -s /tmp/target.drawio test.drawio && python3 post_process_drawio.py test.drawio; echo $?→ should print 1mise run build— passes cleanlyGenerated with Claude Code
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of the project license.