Refactor: benchmark#233
Conversation
There was a problem hiding this comment.
Pull request overview
Upgrades the project’s pytorch_kinematics dependency version in pyproject.toml to move the codebase onto a newer release of the kinematics library.
Changes:
- Bump
pytorch_kinematicsfrom0.7.6to0.10.0in project dependencies.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| "casadi", | ||
| "qpsolvers[osqp]==4.8.1", | ||
| "pytorch_kinematics==0.7.6", | ||
| "pytorch_kinematics==0.10.0", |
There was a problem hiding this comment.
This bumps pytorch_kinematics to 0.10.0, but the user-facing warning in embodichain/lab/sim/utility/import_utils.py still instructs installing pytorch_kinematics==0.7.6. Please update that message (and any other pinned references) so install guidance stays consistent with the project dependency.
| "casadi", | ||
| "qpsolvers[osqp]==4.8.1", | ||
| "pytorch_kinematics==0.7.6", | ||
| "pytorch_kinematics==0.10.0", |
There was a problem hiding this comment.
PR description/checklist claims documentation updates and new tests were added, but this PR diff only changes a dependency version. If docs/tests are required for this upgrade, please include those changes here (or adjust the PR description/checklist to match what’s actually being changed).
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 7 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| "pin-pink", | ||
| "casadi", | ||
| "qpsolvers[osqp]==4.8.1", | ||
| "pytorch_kinematics==0.7.6", | ||
| "pytorch_kinematics==0.10.0", | ||
| "polars==1.31.0", | ||
| "PyYAML>=6.0", |
There was a problem hiding this comment.
psutil is now imported by multiple benchmark scripts (workspace_analyzer/opw_solver/pytorch_kinematic), but it is not listed in project.dependencies (and there are no optional deps defined). Add psutil to dependencies or introduce an optional benchmark extra and document that these scripts require it; otherwise running the scripts will fail with ModuleNotFoundError.
| from __future__ import annotations | ||
|
|
||
| """Benchmark script for workspace analyzer performance optimizations. | ||
|
|
||
| Measures each optimization independently across multiple sample sizes. | ||
| Run: python -m scripts.benchmark.workspace_analyzer.benchmark_workspace_analyzer | ||
| """ | ||
|
|
There was a problem hiding this comment.
The module docstring is no longer the first statement because from __future__ import annotations comes before it. This means the triple-quoted string won’t be treated as the module docstring (__doc__ will be None) and it also violates the intended “header + module docstring” convention. Move the future import below the docstring (future imports are allowed after the module docstring).
| from __future__ import annotations | |
| """Benchmark script for workspace analyzer performance optimizations. | |
| Measures each optimization independently across multiple sample sizes. | |
| Run: python -m scripts.benchmark.workspace_analyzer.benchmark_workspace_analyzer | |
| """ | |
| """Benchmark script for workspace analyzer performance optimizations. | |
| Measures each optimization independently across multiple sample sizes. | |
| Run: python -m scripts.benchmark.workspace_analyzer.benchmark_workspace_analyzer | |
| """ | |
| from __future__ import annotations |
| def _write_markdown_report( | ||
| benchmark_name: str, | ||
| perf_rows: list[dict[str, object]], | ||
| metric_rows: list[dict[str, object]], | ||
| notes: list[str] | None = None, | ||
| ) -> Path: | ||
| """Write benchmark results to a markdown report with two tables.""" | ||
| output_dir = Path("outputs/benchmarks") | ||
| output_dir.mkdir(parents=True, exist_ok=True) | ||
|
|
||
| timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") | ||
| report_path = output_dir / f"{benchmark_name}_{timestamp}.md" | ||
|
|
||
| lines: list[str] = [ | ||
| f"# {benchmark_name} Benchmark Report", | ||
| "", | ||
| f"Generated at: {datetime.now().isoformat(timespec='seconds')}", | ||
| "", | ||
| "## Time & Memory", | ||
| "", | ||
| ] | ||
| lines.extend(_format_markdown_table(perf_rows)) | ||
| lines.extend(["", "## Success & Other Metrics", ""]) | ||
| lines.extend(_format_markdown_table(metric_rows)) |
There was a problem hiding this comment.
This report writer explicitly generates only two tables (Time & Memory, Success & Other Metrics). In this PR, .claude/skills/benchmark/SKILL.md states benchmark reports must contain exactly three tables including a Leaderboard. Either add a leaderboard section/table here (even if it’s a simple ranking or N/A placeholder) or relax/update the documented convention so the code and guidance stay consistent.
| def _write_markdown_report( | ||
| benchmark_name: str, | ||
| perf_rows: list[dict[str, object]], | ||
| metric_rows: list[dict[str, object]], | ||
| notes: list[str] | None = None, | ||
| ) -> Path: | ||
| """Write benchmark results to a markdown report with two tables.""" | ||
| output_dir = Path("outputs/benchmarks") | ||
| output_dir.mkdir(parents=True, exist_ok=True) | ||
|
|
||
| timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") | ||
| report_path = output_dir / f"{benchmark_name}_{timestamp}.md" | ||
|
|
||
| lines: list[str] = [ | ||
| f"# {benchmark_name} Benchmark Report", | ||
| "", | ||
| f"Generated at: {datetime.now().isoformat(timespec='seconds')}", | ||
| "", | ||
| "## Time & Memory", | ||
| "", | ||
| ] | ||
| lines.extend(_format_markdown_table(perf_rows)) | ||
| lines.extend(["", "## Success & Other Metrics", ""]) | ||
| lines.extend(_format_markdown_table(metric_rows)) | ||
|
|
There was a problem hiding this comment.
_write_markdown_report is documented/implemented as producing only two tables, but the benchmark-writing guidance added in this PR (.claude/skills/benchmark/SKILL.md) requires exactly three tables (including a Leaderboard). Please either include the leaderboard table here or update the guidance so the repository convention is unambiguous.
| before = memory_snapshot() | ||
| start_time = time.perf_counter() | ||
| ik_success, ik_qpos = solver.get_ik( | ||
| fk_xpos, joint_seed=qpos, return_all_solutions=False |
There was a problem hiding this comment.
run_ik_once passes joint_seed=qpos into solver.get_ik(...), but PytorchSolver.get_ik expects the argument name qpos_seed. Because the solver signature has **kwargs, this typo will be silently ignored and the benchmark will run with the default (zero) seed, skewing both timing and success metrics. Use the correct keyword so the intended seed is applied.
| fk_xpos, joint_seed=qpos, return_all_solutions=False | |
| fk_xpos, qpos_seed=qpos, return_all_solutions=False |
| metric_rows.append( | ||
| { | ||
| "sample_size": n_sample, | ||
| "impl": "CPU", | ||
| "success_rate": f"{cpu_result['success_rate']:.2%}", | ||
| "translation_err_mm": f"{cpu_result['translation_err_mm']:.6f}", | ||
| "rotation_err_deg": f"{cpu_result['rotation_err_deg']:.6f}", | ||
| } | ||
| ) |
There was a problem hiding this comment.
metric_rows stores success_rate as a percentage string (e.g. "80.00%"), but build_leaderboard_rows() later does float(row["success_rate"]), which will raise ValueError when building the leaderboard. Keep success_rate numeric (0–1) in metric_rows, or strip/parse the percent sign before converting.
| """Benchmark script for Pytorch Kinematic solver Warp CUDA vs CPU implementation. | ||
|
|
||
| Measures FK/IK wall-clock latency, pose accuracy, success rate, and memory usage. | ||
| Run: python -m scripts.benchmark.robotics.kinematic_solver.pytorch_kinematic | ||
| """ |
There was a problem hiding this comment.
The module docstring says this benchmarks a “Warp CUDA vs CPU implementation”, but the script uses PytorchSolver on CPU vs PyTorch CUDA and does not use Warp. Update the docstring to match what’s actually being benchmarked to avoid confusion for people running the script.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 7 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # -- robotics-kinematic-solver ------------------------------------------- | ||
| robotics_ks_parser = subparsers.add_parser( | ||
| "robotics-kinematic-solver", | ||
| help="Benchmark the OPW kinematic solver (FK/IK accuracy and speed).", |
There was a problem hiding this comment.
The subcommand help text still says it benchmarks only the OPW solver, but the CLI now supports both OPW and PyTorch solvers via --solvers. Update the help string to reflect the expanded scope so users aren’t misled.
| help="Benchmark the OPW kinematic solver (FK/IK accuracy and speed).", | |
| help="Benchmark kinematic solvers (OPW/PyTorch) for FK/IK accuracy and speed.", |
| if isinstance(lower_qpos_limits, list) or isinstance( | ||
| lower_qpos_limits, np.ndarray | ||
| ): | ||
| self.lower_qpos_limits = torch.tensor( | ||
| lower_qpos_limits, dtype=float, device=self.device | ||
| ) | ||
| elif isinstance(lower_qpos_limits, torch.Tensor): | ||
| self.lower_qpos_limits = lower_qpos_limits.clone().to(device=self.device) | ||
| else: | ||
| logger.log_error( | ||
| f"Invalid type for lower_qpos_limits: {type(lower_qpos_limits)}. Must be list, np.ndarray, or torch.Tensor." | ||
| ) | ||
|
|
||
| if isinstance(upper_qpos_limits, list) or isinstance( | ||
| upper_qpos_limits, np.ndarray | ||
| ): | ||
| self.upper_qpos_limits = torch.tensor( | ||
| upper_qpos_limits, dtype=float, device=self.device | ||
| ) | ||
| elif isinstance(upper_qpos_limits, torch.Tensor): | ||
| self.upper_qpos_limits = upper_qpos_limits.clone().to(device=self.device) | ||
| else: | ||
| logger.log_error( | ||
| f"Invalid type for upper_qpos_limits: {type(upper_qpos_limits)}. Must be list, np.ndarray, or torch.Tensor." | ||
| ) | ||
|
|
||
| return True |
There was a problem hiding this comment.
set_qpos_limits() logs an error for invalid lower_qpos_limits / upper_qpos_limits types but still returns True, and it may leave one/both of the *_qpos_limits attributes unchanged/None. This can lead to downstream IK sampling using stale or missing limits. Consider returning False (or raising) as soon as an invalid type is detected, and only returning True once both tensors have been successfully set (optionally validating shape matches DOF).
| start = time.perf_counter() | ||
| ik_success, ik_qpos = solver.get_ik( | ||
| fk_xpos, | ||
| joint_seed=qpos_seed, |
There was a problem hiding this comment.
PytorchSolver.get_ik() expects the seed argument as qpos_seed (see solver signature), but this benchmark calls it with joint_seed=..., which will raise TypeError: got an unexpected keyword argument 'joint_seed'. Pass the seed via qpos_seed= (or positionally) to keep the benchmark runnable.
| joint_seed=qpos_seed, | |
| qpos_seed=qpos_seed, |
| def _build_leaderboard_rows( | ||
| metric_rows: list[dict[str, object]], | ||
| ) -> list[dict[str, object]]: | ||
| """Aggregate and rank algorithms by overall success rate.""" | ||
| aggregate: dict[str, dict[str, float]] = {} | ||
| for row in metric_rows: | ||
| impl = str(row["impl"]) | ||
| if impl not in aggregate: | ||
| aggregate[impl] = { | ||
| "success_sum": 0.0, | ||
| "t_err_sum": 0.0, | ||
| "r_err_sum": 0.0, | ||
| "count": 0.0, | ||
| } | ||
|
|
||
| aggregate[impl]["success_sum"] += float(row["success_rate"]) | ||
| aggregate[impl]["t_err_sum"] += float(row["translation_err_mm"]) | ||
| aggregate[impl]["r_err_sum"] += float(row["rotation_err_deg"]) | ||
| aggregate[impl]["count"] += 1.0 |
There was a problem hiding this comment.
_build_leaderboard_rows() assumes every metric_rows entry has numeric success_rate, translation_err_mm, and rotation_err_deg. However benchmark_opw_solver() returns a skipped row with success_rate: 'N/A' (and no translation/rotation fields) when CUDA is unavailable, which will cause float(...)/key lookups to fail and crash report generation on CPU-only machines. Filter out skipped/non-numeric rows (or normalize them to numeric defaults) before aggregating.
| import numpy as np | ||
| import psutil | ||
| import torch | ||
|
|
There was a problem hiding this comment.
This script imports psutil unconditionally, but the repository dependency manifests don’t appear to declare psutil (and other modules use an optional try/except ImportError pattern). To avoid ModuleNotFoundError for users running the benchmark, either add psutil to the project/dev dependencies or make the import optional and gracefully disable CPU memory reporting when unavailable.
| import numpy as np | ||
| import psutil | ||
| import torch |
There was a problem hiding this comment.
This benchmark imports psutil unconditionally. If psutil isn’t installed (it doesn’t seem to be listed in the repo’s dependency files), running the benchmark will fail with ModuleNotFoundError. Consider using the existing optional-import pattern used elsewhere in the codebase (e.g., try/except ImportError) or ensure psutil is added to the dev dependencies.
| def _write_markdown_report( | ||
| benchmark_name: str, | ||
| perf_rows: list[dict[str, object]], | ||
| metric_rows: list[dict[str, object]], | ||
| notes: list[str] | None = None, | ||
| ) -> Path: | ||
| """Write benchmark results to a markdown report with two tables.""" | ||
| output_dir = Path("outputs/benchmarks") | ||
| output_dir.mkdir(parents=True, exist_ok=True) | ||
|
|
||
| timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") | ||
| report_path = output_dir / f"{benchmark_name}_{timestamp}.md" | ||
|
|
||
| lines: list[str] = [ | ||
| f"# {benchmark_name} Benchmark Report", | ||
| "", | ||
| f"Generated at: {datetime.now().isoformat(timespec='seconds')}", | ||
| "", | ||
| "## Time & Memory", | ||
| "", | ||
| ] | ||
| lines.extend(_format_markdown_table(perf_rows)) | ||
| lines.extend(["", "## Success & Other Metrics", ""]) | ||
| lines.extend(_format_markdown_table(metric_rows)) | ||
|
|
There was a problem hiding this comment.
This benchmark writes a markdown report with only two tables ("Time & Memory" and "Success & Other Metrics") and even notes it contains exactly two tables. However, the new benchmark skill doc added in this PR states benchmark reports must contain exactly three tables, including a "Leaderboard" table (see .claude/skills/benchmark/SKILL.md around the "Save Results to One Markdown Report" section). Please either add a leaderboard table to this report (and compute leaderboard rows) or adjust the documented convention so they match.
Description
Type of change
Checklist
black .command to format the code base.