Skip to content

Refactor: benchmark#233

Merged
matafela merged 6 commits into
mainfrom
cj/upgrade-pytorch-kinematics
Apr 20, 2026
Merged

Refactor: benchmark#233
matafela merged 6 commits into
mainfrom
cj/upgrade-pytorch-kinematics

Conversation

@matafela
Copy link
Copy Markdown
Collaborator

@matafela matafela commented Apr 16, 2026

Description

  • Refactor benchmark.
  • add pytorch kinematic solver to benchmark.

Type of change

  • Enhancement (non-breaking change which improves an existing functionality)

Checklist

  • I have run the black . command to format the code base.
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • Dependencies have been updated, if applicable.

Copilot AI review requested due to automatic review settings April 16, 2026 03:58
@matafela matafela marked this pull request as draft April 16, 2026 03:59
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Upgrades the project’s pytorch_kinematics dependency version in pyproject.toml to move the codebase onto a newer release of the kinematics library.

Changes:

  • Bump pytorch_kinematics from 0.7.6 to 0.10.0 in project dependencies.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pyproject.toml Outdated
"casadi",
"qpsolvers[osqp]==4.8.1",
"pytorch_kinematics==0.7.6",
"pytorch_kinematics==0.10.0",
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bumps pytorch_kinematics to 0.10.0, but the user-facing warning in embodichain/lab/sim/utility/import_utils.py still instructs installing pytorch_kinematics==0.7.6. Please update that message (and any other pinned references) so install guidance stays consistent with the project dependency.

Copilot uses AI. Check for mistakes.
Comment thread pyproject.toml Outdated
"casadi",
"qpsolvers[osqp]==4.8.1",
"pytorch_kinematics==0.7.6",
"pytorch_kinematics==0.10.0",
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR description/checklist claims documentation updates and new tests were added, but this PR diff only changes a dependency version. If docs/tests are required for this upgrade, please include those changes here (or adjust the PR description/checklist to match what’s actually being changed).

Copilot uses AI. Check for mistakes.
Copilot AI review requested due to automatic review settings April 17, 2026 07:51
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 7 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pyproject.toml
Comment on lines 36 to 41
"pin-pink",
"casadi",
"qpsolvers[osqp]==4.8.1",
"pytorch_kinematics==0.7.6",
"pytorch_kinematics==0.10.0",
"polars==1.31.0",
"PyYAML>=6.0",
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

psutil is now imported by multiple benchmark scripts (workspace_analyzer/opw_solver/pytorch_kinematic), but it is not listed in project.dependencies (and there are no optional deps defined). Add psutil to dependencies or introduce an optional benchmark extra and document that these scripts require it; otherwise running the scripts will fail with ModuleNotFoundError.

Copilot uses AI. Check for mistakes.
Comment on lines +17 to 24
from __future__ import annotations

"""Benchmark script for workspace analyzer performance optimizations.

Measures each optimization independently across multiple sample sizes.
Run: python -m scripts.benchmark.workspace_analyzer.benchmark_workspace_analyzer
"""

Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The module docstring is no longer the first statement because from __future__ import annotations comes before it. This means the triple-quoted string won’t be treated as the module docstring (__doc__ will be None) and it also violates the intended “header + module docstring” convention. Move the future import below the docstring (future imports are allowed after the module docstring).

Suggested change
from __future__ import annotations
"""Benchmark script for workspace analyzer performance optimizations.
Measures each optimization independently across multiple sample sizes.
Run: python -m scripts.benchmark.workspace_analyzer.benchmark_workspace_analyzer
"""
"""Benchmark script for workspace analyzer performance optimizations.
Measures each optimization independently across multiple sample sizes.
Run: python -m scripts.benchmark.workspace_analyzer.benchmark_workspace_analyzer
"""
from __future__ import annotations

Copilot uses AI. Check for mistakes.
Comment on lines +117 to +140
def _write_markdown_report(
benchmark_name: str,
perf_rows: list[dict[str, object]],
metric_rows: list[dict[str, object]],
notes: list[str] | None = None,
) -> Path:
"""Write benchmark results to a markdown report with two tables."""
output_dir = Path("outputs/benchmarks")
output_dir.mkdir(parents=True, exist_ok=True)

timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
report_path = output_dir / f"{benchmark_name}_{timestamp}.md"

lines: list[str] = [
f"# {benchmark_name} Benchmark Report",
"",
f"Generated at: {datetime.now().isoformat(timespec='seconds')}",
"",
"## Time & Memory",
"",
]
lines.extend(_format_markdown_table(perf_rows))
lines.extend(["", "## Success & Other Metrics", ""])
lines.extend(_format_markdown_table(metric_rows))
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This report writer explicitly generates only two tables (Time & Memory, Success & Other Metrics). In this PR, .claude/skills/benchmark/SKILL.md states benchmark reports must contain exactly three tables including a Leaderboard. Either add a leaderboard section/table here (even if it’s a simple ranking or N/A placeholder) or relax/update the documented convention so the code and guidance stay consistent.

Copilot uses AI. Check for mistakes.
Comment on lines +85 to +109
def _write_markdown_report(
benchmark_name: str,
perf_rows: list[dict[str, object]],
metric_rows: list[dict[str, object]],
notes: list[str] | None = None,
) -> Path:
"""Write benchmark results to a markdown report with two tables."""
output_dir = Path("outputs/benchmarks")
output_dir.mkdir(parents=True, exist_ok=True)

timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
report_path = output_dir / f"{benchmark_name}_{timestamp}.md"

lines: list[str] = [
f"# {benchmark_name} Benchmark Report",
"",
f"Generated at: {datetime.now().isoformat(timespec='seconds')}",
"",
"## Time & Memory",
"",
]
lines.extend(_format_markdown_table(perf_rows))
lines.extend(["", "## Success & Other Metrics", ""])
lines.extend(_format_markdown_table(metric_rows))

Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_write_markdown_report is documented/implemented as producing only two tables, but the benchmark-writing guidance added in this PR (.claude/skills/benchmark/SKILL.md) requires exactly three tables (including a Leaderboard). Please either include the leaderboard table here or update the guidance so the repository convention is unambiguous.

Copilot uses AI. Check for mistakes.
before = memory_snapshot()
start_time = time.perf_counter()
ik_success, ik_qpos = solver.get_ik(
fk_xpos, joint_seed=qpos, return_all_solutions=False
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

run_ik_once passes joint_seed=qpos into solver.get_ik(...), but PytorchSolver.get_ik expects the argument name qpos_seed. Because the solver signature has **kwargs, this typo will be silently ignored and the benchmark will run with the default (zero) seed, skewing both timing and success metrics. Use the correct keyword so the intended seed is applied.

Suggested change
fk_xpos, joint_seed=qpos, return_all_solutions=False
fk_xpos, qpos_seed=qpos, return_all_solutions=False

Copilot uses AI. Check for mistakes.
Comment on lines +290 to +298
metric_rows.append(
{
"sample_size": n_sample,
"impl": "CPU",
"success_rate": f"{cpu_result['success_rate']:.2%}",
"translation_err_mm": f"{cpu_result['translation_err_mm']:.6f}",
"rotation_err_deg": f"{cpu_result['rotation_err_deg']:.6f}",
}
)
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

metric_rows stores success_rate as a percentage string (e.g. "80.00%"), but build_leaderboard_rows() later does float(row["success_rate"]), which will raise ValueError when building the leaderboard. Keep success_rate numeric (0–1) in metric_rows, or strip/parse the percent sign before converting.

Copilot uses AI. Check for mistakes.
Comment on lines +17 to +21
"""Benchmark script for Pytorch Kinematic solver Warp CUDA vs CPU implementation.

Measures FK/IK wall-clock latency, pose accuracy, success rate, and memory usage.
Run: python -m scripts.benchmark.robotics.kinematic_solver.pytorch_kinematic
"""
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The module docstring says this benchmarks a “Warp CUDA vs CPU implementation”, but the script uses PytorchSolver on CPU vs PyTorch CUDA and does not use Warp. Update the docstring to match what’s actually being benchmarked to avoid confusion for people running the script.

Copilot uses AI. Check for mistakes.
Copilot AI review requested due to automatic review settings April 17, 2026 08:28
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 7 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

# -- robotics-kinematic-solver -------------------------------------------
robotics_ks_parser = subparsers.add_parser(
"robotics-kinematic-solver",
help="Benchmark the OPW kinematic solver (FK/IK accuracy and speed).",
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The subcommand help text still says it benchmarks only the OPW solver, but the CLI now supports both OPW and PyTorch solvers via --solvers. Update the help string to reflect the expanded scope so users aren’t misled.

Suggested change
help="Benchmark the OPW kinematic solver (FK/IK accuracy and speed).",
help="Benchmark kinematic solvers (OPW/PyTorch) for FK/IK accuracy and speed.",

Copilot uses AI. Check for mistakes.
Comment on lines +316 to 342
if isinstance(lower_qpos_limits, list) or isinstance(
lower_qpos_limits, np.ndarray
):
self.lower_qpos_limits = torch.tensor(
lower_qpos_limits, dtype=float, device=self.device
)
elif isinstance(lower_qpos_limits, torch.Tensor):
self.lower_qpos_limits = lower_qpos_limits.clone().to(device=self.device)
else:
logger.log_error(
f"Invalid type for lower_qpos_limits: {type(lower_qpos_limits)}. Must be list, np.ndarray, or torch.Tensor."
)

if isinstance(upper_qpos_limits, list) or isinstance(
upper_qpos_limits, np.ndarray
):
self.upper_qpos_limits = torch.tensor(
upper_qpos_limits, dtype=float, device=self.device
)
elif isinstance(upper_qpos_limits, torch.Tensor):
self.upper_qpos_limits = upper_qpos_limits.clone().to(device=self.device)
else:
logger.log_error(
f"Invalid type for upper_qpos_limits: {type(upper_qpos_limits)}. Must be list, np.ndarray, or torch.Tensor."
)

return True
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set_qpos_limits() logs an error for invalid lower_qpos_limits / upper_qpos_limits types but still returns True, and it may leave one/both of the *_qpos_limits attributes unchanged/None. This can lead to downstream IK sampling using stale or missing limits. Consider returning False (or raising) as soon as an invalid type is detected, and only returning True once both tensors have been successfully set (optionally validating shape matches DOF).

Copilot uses AI. Check for mistakes.
start = time.perf_counter()
ik_success, ik_qpos = solver.get_ik(
fk_xpos,
joint_seed=qpos_seed,
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PytorchSolver.get_ik() expects the seed argument as qpos_seed (see solver signature), but this benchmark calls it with joint_seed=..., which will raise TypeError: got an unexpected keyword argument 'joint_seed'. Pass the seed via qpos_seed= (or positionally) to keep the benchmark runnable.

Suggested change
joint_seed=qpos_seed,
qpos_seed=qpos_seed,

Copilot uses AI. Check for mistakes.
Comment on lines +118 to +136
def _build_leaderboard_rows(
metric_rows: list[dict[str, object]],
) -> list[dict[str, object]]:
"""Aggregate and rank algorithms by overall success rate."""
aggregate: dict[str, dict[str, float]] = {}
for row in metric_rows:
impl = str(row["impl"])
if impl not in aggregate:
aggregate[impl] = {
"success_sum": 0.0,
"t_err_sum": 0.0,
"r_err_sum": 0.0,
"count": 0.0,
}

aggregate[impl]["success_sum"] += float(row["success_rate"])
aggregate[impl]["t_err_sum"] += float(row["translation_err_mm"])
aggregate[impl]["r_err_sum"] += float(row["rotation_err_deg"])
aggregate[impl]["count"] += 1.0
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_build_leaderboard_rows() assumes every metric_rows entry has numeric success_rate, translation_err_mm, and rotation_err_deg. However benchmark_opw_solver() returns a skipped row with success_rate: 'N/A' (and no translation/rotation fields) when CUDA is unavailable, which will cause float(...)/key lookups to fail and crash report generation on CPU-only machines. Filter out skipped/non-numeric rows (or normalize them to numeric defaults) before aggregating.

Copilot uses AI. Check for mistakes.
Comment on lines +32 to +35
import numpy as np
import psutil
import torch

Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This script imports psutil unconditionally, but the repository dependency manifests don’t appear to declare psutil (and other modules use an optional try/except ImportError pattern). To avoid ModuleNotFoundError for users running the benchmark, either add psutil to the project/dev dependencies or make the import optional and gracefully disable CPU memory reporting when unavailable.

Copilot uses AI. Check for mistakes.
Comment on lines 30 to 32
import numpy as np
import psutil
import torch
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This benchmark imports psutil unconditionally. If psutil isn’t installed (it doesn’t seem to be listed in the repo’s dependency files), running the benchmark will fail with ModuleNotFoundError. Consider using the existing optional-import pattern used elsewhere in the codebase (e.g., try/except ImportError) or ensure psutil is added to the dev dependencies.

Copilot uses AI. Check for mistakes.
Comment on lines +117 to +141
def _write_markdown_report(
benchmark_name: str,
perf_rows: list[dict[str, object]],
metric_rows: list[dict[str, object]],
notes: list[str] | None = None,
) -> Path:
"""Write benchmark results to a markdown report with two tables."""
output_dir = Path("outputs/benchmarks")
output_dir.mkdir(parents=True, exist_ok=True)

timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
report_path = output_dir / f"{benchmark_name}_{timestamp}.md"

lines: list[str] = [
f"# {benchmark_name} Benchmark Report",
"",
f"Generated at: {datetime.now().isoformat(timespec='seconds')}",
"",
"## Time & Memory",
"",
]
lines.extend(_format_markdown_table(perf_rows))
lines.extend(["", "## Success & Other Metrics", ""])
lines.extend(_format_markdown_table(metric_rows))

Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This benchmark writes a markdown report with only two tables ("Time & Memory" and "Success & Other Metrics") and even notes it contains exactly two tables. However, the new benchmark skill doc added in this PR states benchmark reports must contain exactly three tables, including a "Leaderboard" table (see .claude/skills/benchmark/SKILL.md around the "Save Results to One Markdown Report" section). Please either add a leaderboard table to this report (and compute leaderboard rows) or adjust the documented convention so they match.

Copilot uses AI. Check for mistakes.
@matafela matafela marked this pull request as ready for review April 17, 2026 09:33
@matafela matafela changed the title Draft: Upgrade pytorch kinematics Refactor: benchmark Apr 17, 2026
@matafela matafela merged commit 50843a5 into main Apr 20, 2026
11 of 16 checks passed
@matafela matafela deleted the cj/upgrade-pytorch-kinematics branch April 20, 2026 06:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants