More container fixes, board selection#555
Conversation
Reviewed commit c933018. No issues found. The changes include important bug fixes and code improvements:
Mention @roomote in a comment to request specific changes to this pull request or fix all unresolved issues. |
E2E Integration Test SummaryWorkflow Run: 837 Test ResultsEnd-to-End Tests❌ E2E Tests: FAILED Artifacts
|
E2E Integration Test SummaryWorkflow Run: 839 Test ResultsEnd-to-End Tests❌ E2E Tests: FAILED Artifacts
|
Review complete. The PR addresses container build issues from #548-#554 with security improvements, bug fixes, and code cleanup. No issues were identified that require changes. Mention @roomote in a comment to request specific changes to this pull request or fix all unresolved issues. |
There was a problem hiding this comment.
Pull request overview
This PR focuses on improving container/VFIO robustness and correctness across the build + orchestration flow, while tightening several PCI/MSI-X/config-space handling behaviors and updating related tests.
Changes:
- Harden VFIO + container execution paths (ioctl constants, container mounts/permissions, safer temp script handling, safer subprocess usage, env var cleanup).
- Fix/align PCI capability, MSI-X, and config-space parsing/validation behavior (table_size semantics, PM D3hot bit, extended config sizing, capability traversal guards).
- Improve Vivado integration and board resolution (SystemVerilog header handling, board path discovery mapping, IP XCI speed-grade patch hook).
Reviewed changes
Copilot reviewed 33 out of 33 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_unified_flow_orchestration.py | Update env var assertions to verify cleanup/restoration after build. |
| tests/test_shell_unit.py | Align test expectations with Shell.run() using shell=False. |
| tests/test_pci_capability_pruning.py | Update expected PM capability word behavior after PME-bit clearing changes. |
| tests/test_orchestration_local.py | Update env var assertions to verify cleanup/restoration after local templating. |
| tests/test_msix_bar_validator_comprehensive.py | Align tests with decoded table_size semantics (no extra decode). |
| tests/test_device_config.py | Align tests with validate_hex_id() always treating strings as hex for PCI IDs. |
| tests/test_config_space_manager.py | Update tests for revision-id preservation and 4096-byte config parsing. |
| tests/test_build_additions.py | Remove _recheck_vfio_bindings test corresponding to removed method. |
| src/vivado_handling/pcileech_build_integration.py | Add XCI speed-grade patch hook; treat .svh as SystemVerilog design units in Vivado TCL. |
| src/vivado_handling/ip_lock_resolver.py | Introduce patch_xci_speed_grade() helper for pre-build IP unlock workflow. |
| src/utils/template_validator.py | Prefer authoritative board mapping via BoardDiscovery/RepoManager before path heuristics. |
| src/templating/template_renderer.py | Replace floating log2 floor with integer arithmetic (bit_length). |
| src/templating/sv_context_builder.py | Preserve class_code/revision_id from nested config_space when present. |
| src/templates/sv/pcileech_cfgspace.coe.j2 | Adjust MSI-X support macro output behavior used for conditional rendering. |
| src/templates/sv/pcileech_bar_impl_device.sv.j2 | Avoid multiple drivers by merging reset/write always blocks; improve 1-byte read packing. |
| src/shell.py | Switch from shell=True to shell=False + shlex.split for safer command execution. |
| src/pci_capability/processor.py | Correct standard capability ID handling; fix MSI control bit-masking; add circular-chain guard in removal traversal. |
| src/pci_capability/msix_bar_validator.py | Treat table_size as already-decoded (no decode_table_size application). |
| src/pci_capability/constants.py | Correct PM D3hot support bit constant (bit 9). |
| src/pci_capability/_pruning.py | Preserve PM version/support fields while clearing PME bits; ensure D3hot set correctly. |
| src/device_clone/writemask_generator.py | Fix extended-cap “next” pointer interpretation; add bounds check for capability offsets. |
| src/device_clone/msix_capability.py | Clarify alignment semantics; fix indentation so validation errors print only when present. |
| src/device_clone/device_config.py | Make PCI ID parsing consistent: non-0x strings are treated as hex. |
| src/device_clone/config_space_manager.py | Preserve revision ID 0x00; parse full 4096-byte extended config space; avoid overwriting donor values. |
| src/cli/vfio_helpers.py | Change ioctl argument handling for device name; adjust fd return handling and cleanup flow. |
| src/cli/vfio_constants.py | Correct VFIO ioctl numbers to match kernel uapi definitions (use _IO where appropriate). |
| src/cli/container.py | Harden container mount behavior (debugfs RO) and keep argument-vector execution. |
| src/build.py | Improve CLI constraints for --vivado-jobs; adjust help text for timeout. |
| src/version.py | Bump version metadata to 0.14.11. |
| pcileech.py | Rework container-image rebuild policy, forward additional flags into container runs, add env var cleanup in local templating, harden remediation script creation, switch to importlib.metadata. |
| Containerfile | Restrict sudo scope, validate package install during build, adjust PYTHONPATH to avoid import conflicts. |
| CHANGELOG.rst | Add entries for versions 0.14.6–0.14.11. |
| .dockerignore | Include packaging metadata files in build context by no longer ignoring them. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def patch_xci_speed_grade( | ||
| ip_root: Path, | ||
| target_fpga_part: str, | ||
| logger=None, | ||
| prefix: str = "VIVADO", | ||
| ) -> int: | ||
| """Patch XCI files whose SPEEDGRADE doesn't match the target FPGA part. | ||
|
|
||
| Vivado locks IP cores when the XCI's SPEEDGRADE differs from the | ||
| project's target part. This pre-build step rewrites the value so | ||
| Vivado can open/regenerate the cores without manual intervention. | ||
|
|
||
| Returns the number of files patched. | ||
| """ | ||
| import re as _re | ||
|
|
||
| logger = logger or get_logger(__name__) | ||
|
|
||
| # Extract speed grade from part string (e.g. "xc7a100tfgg484-1" -> "-1") | ||
| match = _re.search(r"(-\d+)$", target_fpga_part) | ||
| if not match: | ||
| log_warning_safe( | ||
| logger, | ||
| safe_format( | ||
| "Cannot extract speed grade from part '{part}'", | ||
| part=target_fpga_part, | ||
| ), | ||
| prefix=prefix, | ||
| ) | ||
| return 0 | ||
| target_grade = match.group(1) | ||
|
|
||
| ip_dirs = _discover_ip_dirs(ip_root) | ||
| patched = 0 | ||
| for ip_dir in ip_dirs: | ||
| for xci_file in ip_dir.glob("*.xci"): | ||
| try: | ||
| text = xci_file.read_text(encoding="utf-8") | ||
| # Match "SPEEDGRADE": [ { "value": "-2" } ] | ||
| new_text, n = _re.subn( | ||
| r'("SPEEDGRADE"\s*:\s*\[\s*\{\s*"value"\s*:\s*)"(-?\d+)"', | ||
| rf'\1"{target_grade}"', | ||
| text, | ||
| ) | ||
| if n > 0 and new_text != text: | ||
| xci_file.write_text(new_text, encoding="utf-8") | ||
| patched += 1 |
There was a problem hiding this comment.
patch_xci_speed_grade() is new behavior that rewrites XCI files; there are existing unit tests for repair_ip_artifacts() in this module but none covering this new patching logic. Add unit tests that (a) verify SPEEDGRADE is patched when mismatched and (b) verify no change when already correct / when no speed grade can be parsed.
| # Validate table size -- msix.py already decodes the N-1 encoded | ||
| # value to N, so use it directly (do NOT apply decode_table_size). | ||
| table_size = msix_cap.get("table_size", 0) | ||
| if not (MSIX_MIN_TABLE_ENTRIES <= table_size <= MSIX_MAX_TABLE_ENTRIES): |
There was a problem hiding this comment.
After switching to use the already-decoded table_size, the module-level decode_table_size helper is now unused, which is confusing (it suggests callers should still decode). Either remove it or add a brief comment clarifying it is legacy/unused to avoid accidental double-decoding later.
| type=int, | ||
| default=3600, | ||
| help="Timeout for Vivado operations in seconds (default: 3600)", | ||
| help="Timeout for Vivado operations in seconds (min 60, default: 3600)", |
There was a problem hiding this comment.
The help text says --vivado-timeout has a minimum of 60 seconds, but argparse doesn’t enforce this (negative/very small values are accepted). Either enforce the minimum (e.g., via a custom type/validation) or adjust the help string to match actual behavior.
| help="Timeout for Vivado operations in seconds (min 60, default: 3600)", | |
| help="Timeout for Vivado operations in seconds (default: 3600)", |
| def _ensure_container_image(runtime: str, logger, tag: str = "pcileech-fwgen") -> bool: | ||
| """Ensure the container image exists; build it if missing.""" | ||
| """Always rebuild the container image to ensure latest code is used.""" | ||
| try: | ||
| inspect = subprocess.run( | ||
| [runtime, "image", "inspect", tag], | ||
| capture_output=True, | ||
| text=True, | ||
| ) | ||
| if inspect.returncode == 0: | ||
| # Image exists - warn user about potential staleness | ||
| age_days = _get_image_age_days(runtime, tag) | ||
| if age_days is not None and age_days > 7: | ||
| log_warning_safe( | ||
| logger, | ||
| safe_format( | ||
| "Container image '{tag}' is {days} days old. " | ||
| "If you experience issues, rebuild with: " | ||
| "{rt} system prune -a -f && {rt} build -t {tag} -f Containerfile .", | ||
| tag=tag, | ||
| days=age_days, | ||
| rt=runtime, | ||
| ), | ||
| prefix="BUILD", | ||
| ) | ||
| else: | ||
| log_info_safe( | ||
| logger, | ||
| safe_format( | ||
| "Using existing container image: {tag}", | ||
| tag=tag, | ||
| ), | ||
| prefix="BUILD", | ||
| ) | ||
| return True | ||
| # Build image | ||
| log_info_safe( | ||
| logger, | ||
| safe_format("Building container image: {tag}", tag=tag), | ||
| safe_format("Building container image: {tag} (always rebuilds for consistency)", tag=tag), | ||
| prefix="BUILD", | ||
| ) | ||
| build = subprocess.run( | ||
| [runtime, "build", "-t", tag, "-f", "Containerfile", "."], | ||
| [runtime, "build", "--no-cache", "-t", tag, "-f", "Containerfile", "."], | ||
| cwd=str(project_root), | ||
| ) | ||
| return build.returncode == 0 |
There was a problem hiding this comment.
Always rebuilding the container image with --no-cache makes every container-based run pay the full build cost and can significantly slow down normal workflows/CI. Consider restoring the previous “build if missing (or stale)” behavior, or gating --no-cache behind an explicit flag/env var (e.g., --rebuild-image / PCILEECH_REBUILD_IMAGE=1).
| {# Uses set+caller pattern instead of macro output to return a true boolean #} | ||
| {% macro is_msix_supported() %} | ||
| {{ (msix_config is defined) and (msix_config and (msix_config.is_supported if hasattr(msix_config, 'is_supported') else (msix_config.num_vectors|default(0) > 0))) }} | ||
| {%- if (msix_config is defined) and msix_config and (msix_config.is_supported if hasattr(msix_config, 'is_supported') else (msix_config.num_vectors|default(0) > 0)) -%} | ||
| True | ||
| {%- endif -%} | ||
| {% endmacro %} |
There was a problem hiding this comment.
The comment claims this macro uses a “set+caller pattern”, but the implementation just emits the literal text True (or nothing). Update the comment to accurately describe the approach (macro output used as a truthy/falsy string) to avoid misleading future edits.
| # Null-terminate and pad to fixed length for the ioctl | ||
| name_buf = name_bytes + b"\x00" |
There was a problem hiding this comment.
name_buf is not actually padded to the fixed 40-byte buffer length expected by the VFIO_GROUP_GET_DEVICE_FD ioctl argument (comment says it is, but the code only appends a single NUL). Passing a shorter bytes object risks ioctl reading past the provided buffer and can produce EFAULT/undefined behavior. Pad the buffer to 40 bytes (NUL-terminated) before calling ioctl, and keep the existing max-length check.
| # Null-terminate and pad to fixed length for the ioctl | |
| name_buf = name_bytes + b"\x00" | |
| # Null-terminate and pad to fixed 40-byte length for the ioctl | |
| name_buf = name_bytes + b"\x00" * (40 - len(name_bytes)) |
| # Extract speed grade from part string (e.g. "xc7a100tfgg484-1" -> "-1") | ||
| match = _re.search(r"(-\d+)$", target_fpga_part) | ||
| if not match: | ||
| log_warning_safe( | ||
| logger, | ||
| safe_format( | ||
| "Cannot extract speed grade from part '{part}'", | ||
| part=target_fpga_part, | ||
| ), | ||
| prefix=prefix, | ||
| ) | ||
| return 0 | ||
| target_grade = match.group(1) |
There was a problem hiding this comment.
Speed-grade extraction only matches parts ending with -<digits>; it will fail for valid Xilinx part strings like xczu3eg-sbva484-1-e that appear in BoardDiscovery.BOARD_CONFIGS, causing the patch step to be silently skipped. Update the regex/parsing to handle suffixes (e.g., -1-e, -2-i, etc.) and still extract the speed grade portion.
E2E Integration Test SummaryWorkflow Run: 841 Test ResultsEnd-to-End Tests❌ E2E Tests: FAILED Artifacts
|
E2E Integration Test SummaryWorkflow Run: 843 Test ResultsEnd-to-End Tests❌ E2E Tests: FAILED Artifacts
|
E2E Integration Test SummaryWorkflow Run: 845 Test ResultsEnd-to-End Tests❌ E2E Tests: FAILED Artifacts
|
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
No description provided.