Refactor: benchmark by matafela · Pull Request #233 · DexForce/EmbodiChain

matafela · 2026-04-16T03:58:41Z

Description

Refactor benchmark.
add pytorch kinematic solver to benchmark.

Type of change

Enhancement (non-breaking change which improves an existing functionality)

Checklist

I have run the black . command to format the code base.
I have made corresponding changes to the documentation
I have added tests that prove my fix is effective or that my feature works
Dependencies have been updated, if applicable.

Copilot

Pull request overview

Upgrades the project’s pytorch_kinematics dependency version in pyproject.toml to move the codebase onto a newer release of the kinematics library.

Changes:

Bump pytorch_kinematics from 0.7.6 to 0.10.0 in project dependencies.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-16T04:00:35Z

  "casadi",
  "qpsolvers[osqp]==4.8.1",
-  "pytorch_kinematics==0.7.6",
+  "pytorch_kinematics==0.10.0",


This bumps pytorch_kinematics to 0.10.0, but the user-facing warning in embodichain/lab/sim/utility/import_utils.py still instructs installing pytorch_kinematics==0.7.6. Please update that message (and any other pinned references) so install guidance stays consistent with the project dependency.

Copilot · 2026-04-16T04:00:36Z

  "casadi",
  "qpsolvers[osqp]==4.8.1",
-  "pytorch_kinematics==0.7.6",
+  "pytorch_kinematics==0.10.0",


PR description/checklist claims documentation updates and new tests were added, but this PR diff only changes a dependency version. If docs/tests are required for this upgrade, please include those changes here (or adjust the PR description/checklist to match what’s actually being changed).

Copilot

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 7 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-17T07:57:36Z

  "pin-pink",
  "casadi",
  "qpsolvers[osqp]==4.8.1",
-  "pytorch_kinematics==0.7.6",
+  "pytorch_kinematics==0.10.0",
  "polars==1.31.0",
  "PyYAML>=6.0",


psutil is now imported by multiple benchmark scripts (workspace_analyzer/opw_solver/pytorch_kinematic), but it is not listed in project.dependencies (and there are no optional deps defined). Add psutil to dependencies or introduce an optional benchmark extra and document that these scripts require it; otherwise running the scripts will fail with ModuleNotFoundError.

Copilot · 2026-04-17T07:57:36Z

+from __future__ import annotations
+
 """Benchmark script for workspace analyzer performance optimizations.

 Measures each optimization independently across multiple sample sizes.
 Run: python -m scripts.benchmark.workspace_analyzer.benchmark_workspace_analyzer
 """



The module docstring is no longer the first statement because from __future__ import annotations comes before it. This means the triple-quoted string won’t be treated as the module docstring (__doc__ will be None) and it also violates the intended “header + module docstring” convention. Move the future import below the docstring (future imports are allowed after the module docstring).

Suggested change

from __future__ import annotations

"""Benchmark script for workspace analyzer performance optimizations.

Measures each optimization independently across multiple sample sizes.

Run: python -m scripts.benchmark.workspace_analyzer.benchmark_workspace_analyzer

"""

"""Benchmark script for workspace analyzer performance optimizations.

Measures each optimization independently across multiple sample sizes.

Run: python -m scripts.benchmark.workspace_analyzer.benchmark_workspace_analyzer

"""

from __future__ import annotations

Copilot · 2026-04-17T07:57:36Z

+def _write_markdown_report(
+    benchmark_name: str,
+    perf_rows: list[dict[str, object]],
+    metric_rows: list[dict[str, object]],
+    notes: list[str] | None = None,
+) -> Path:
+    """Write benchmark results to a markdown report with two tables."""
+    output_dir = Path("outputs/benchmarks")
+    output_dir.mkdir(parents=True, exist_ok=True)
+
+    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
+    report_path = output_dir / f"{benchmark_name}_{timestamp}.md"
+
+    lines: list[str] = [
+        f"# {benchmark_name} Benchmark Report",
+        "",
+        f"Generated at: {datetime.now().isoformat(timespec='seconds')}",
+        "",
+        "## Time & Memory",
+        "",
+    ]
+    lines.extend(_format_markdown_table(perf_rows))
+    lines.extend(["", "## Success & Other Metrics", ""])
+    lines.extend(_format_markdown_table(metric_rows))


This report writer explicitly generates only two tables (Time & Memory, Success & Other Metrics). In this PR, .claude/skills/benchmark/SKILL.md states benchmark reports must contain exactly three tables including a Leaderboard. Either add a leaderboard section/table here (even if it’s a simple ranking or N/A placeholder) or relax/update the documented convention so the code and guidance stay consistent.

Copilot · 2026-04-17T07:57:37Z

+def _write_markdown_report(
+    benchmark_name: str,
+    perf_rows: list[dict[str, object]],
+    metric_rows: list[dict[str, object]],
+    notes: list[str] | None = None,
+) -> Path:
+    """Write benchmark results to a markdown report with two tables."""
+    output_dir = Path("outputs/benchmarks")
+    output_dir.mkdir(parents=True, exist_ok=True)
+
+    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
+    report_path = output_dir / f"{benchmark_name}_{timestamp}.md"
+
+    lines: list[str] = [
+        f"# {benchmark_name} Benchmark Report",
+        "",
+        f"Generated at: {datetime.now().isoformat(timespec='seconds')}",
+        "",
+        "## Time & Memory",
+        "",
+    ]
+    lines.extend(_format_markdown_table(perf_rows))
+    lines.extend(["", "## Success & Other Metrics", ""])
+    lines.extend(_format_markdown_table(metric_rows))
+


_write_markdown_report is documented/implemented as producing only two tables, but the benchmark-writing guidance added in this PR (.claude/skills/benchmark/SKILL.md) requires exactly three tables (including a Leaderboard). Please either include the leaderboard table here or update the guidance so the repository convention is unambiguous.

Copilot · 2026-04-17T07:57:37Z

+    before = memory_snapshot()
+    start_time = time.perf_counter()
+    ik_success, ik_qpos = solver.get_ik(
+        fk_xpos, joint_seed=qpos, return_all_solutions=False


run_ik_once passes joint_seed=qpos into solver.get_ik(...), but PytorchSolver.get_ik expects the argument name qpos_seed. Because the solver signature has **kwargs, this typo will be silently ignored and the benchmark will run with the default (zero) seed, skewing both timing and success metrics. Use the correct keyword so the intended seed is applied.

Suggested change

fk_xpos, joint_seed=qpos, return_all_solutions=False

fk_xpos, qpos_seed=qpos, return_all_solutions=False

Copilot · 2026-04-17T07:57:37Z

+        metric_rows.append(
+            {
+                "sample_size": n_sample,
+                "impl": "CPU",
+                "success_rate": f"{cpu_result['success_rate']:.2%}",
+                "translation_err_mm": f"{cpu_result['translation_err_mm']:.6f}",
+                "rotation_err_deg": f"{cpu_result['rotation_err_deg']:.6f}",
+            }
+        )


metric_rows stores success_rate as a percentage string (e.g. "80.00%"), but build_leaderboard_rows() later does float(row["success_rate"]), which will raise ValueError when building the leaderboard. Keep success_rate numeric (0–1) in metric_rows, or strip/parse the percent sign before converting.

Copilot · 2026-04-17T07:57:38Z

+"""Benchmark script for Pytorch Kinematic solver Warp CUDA vs CPU implementation.
+
+Measures FK/IK wall-clock latency, pose accuracy, success rate, and memory usage.
+Run: python -m scripts.benchmark.robotics.kinematic_solver.pytorch_kinematic
+"""


The module docstring says this benchmarks a “Warp CUDA vs CPU implementation”, but the script uses PytorchSolver on CPU vs PyTorch CUDA and does not use Warp. Update the docstring to match what’s actually being benchmarked to avoid confusion for people running the script.

Copilot

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 7 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-17T08:34:04Z

    # -- robotics-kinematic-solver -------------------------------------------
    robotics_ks_parser = subparsers.add_parser(
        "robotics-kinematic-solver",
        help="Benchmark the OPW kinematic solver (FK/IK accuracy and speed).",


The subcommand help text still says it benchmarks only the OPW solver, but the CLI now supports both OPW and PyTorch solvers via --solvers. Update the help string to reflect the expanded scope so users aren’t misled.

Suggested change

help="Benchmark the OPW kinematic solver (FK/IK accuracy and speed).",

help="Benchmark kinematic solvers (OPW/PyTorch) for FK/IK accuracy and speed.",

Copilot · 2026-04-17T08:34:05Z

+        if isinstance(lower_qpos_limits, list) or isinstance(
+            lower_qpos_limits, np.ndarray
+        ):
+            self.lower_qpos_limits = torch.tensor(
+                lower_qpos_limits, dtype=float, device=self.device
+            )
+        elif isinstance(lower_qpos_limits, torch.Tensor):
+            self.lower_qpos_limits = lower_qpos_limits.clone().to(device=self.device)
+        else:
+            logger.log_error(
+                f"Invalid type for lower_qpos_limits: {type(lower_qpos_limits)}. Must be list, np.ndarray, or torch.Tensor."
+            )
+
+        if isinstance(upper_qpos_limits, list) or isinstance(
+            upper_qpos_limits, np.ndarray
+        ):
+            self.upper_qpos_limits = torch.tensor(
+                upper_qpos_limits, dtype=float, device=self.device
+            )
+        elif isinstance(upper_qpos_limits, torch.Tensor):
+            self.upper_qpos_limits = upper_qpos_limits.clone().to(device=self.device)
+        else:
+            logger.log_error(
+                f"Invalid type for upper_qpos_limits: {type(upper_qpos_limits)}. Must be list, np.ndarray, or torch.Tensor."
+            )
+
        return True


set_qpos_limits() logs an error for invalid lower_qpos_limits / upper_qpos_limits types but still returns True, and it may leave one/both of the *_qpos_limits attributes unchanged/None. This can lead to downstream IK sampling using stale or missing limits. Consider returning False (or raising) as soon as an invalid type is detected, and only returning True once both tensors have been successfully set (optionally validating shape matches DOF).

Copilot · 2026-04-17T08:34:05Z

+    start = time.perf_counter()
+    ik_success, ik_qpos = solver.get_ik(
+        fk_xpos,
+        joint_seed=qpos_seed,


PytorchSolver.get_ik() expects the seed argument as qpos_seed (see solver signature), but this benchmark calls it with joint_seed=..., which will raise TypeError: got an unexpected keyword argument 'joint_seed'. Pass the seed via qpos_seed= (or positionally) to keep the benchmark runnable.

Suggested change

joint_seed=qpos_seed,

qpos_seed=qpos_seed,

Copilot · 2026-04-17T08:34:05Z

+def _build_leaderboard_rows(
+    metric_rows: list[dict[str, object]],
+) -> list[dict[str, object]]:
+    """Aggregate and rank algorithms by overall success rate."""
+    aggregate: dict[str, dict[str, float]] = {}
+    for row in metric_rows:
+        impl = str(row["impl"])
+        if impl not in aggregate:
+            aggregate[impl] = {
+                "success_sum": 0.0,
+                "t_err_sum": 0.0,
+                "r_err_sum": 0.0,
+                "count": 0.0,
+            }
+
+        aggregate[impl]["success_sum"] += float(row["success_rate"])
+        aggregate[impl]["t_err_sum"] += float(row["translation_err_mm"])
+        aggregate[impl]["r_err_sum"] += float(row["rotation_err_deg"])
+        aggregate[impl]["count"] += 1.0


_build_leaderboard_rows() assumes every metric_rows entry has numeric success_rate, translation_err_mm, and rotation_err_deg. However benchmark_opw_solver() returns a skipped row with success_rate: 'N/A' (and no translation/rotation fields) when CUDA is unavailable, which will cause float(...)/key lookups to fail and crash report generation on CPU-only machines. Filter out skipped/non-numeric rows (or normalize them to numeric defaults) before aggregating.

Copilot · 2026-04-17T08:34:06Z

+import numpy as np
+import psutil
+import torch
+


This script imports psutil unconditionally, but the repository dependency manifests don’t appear to declare psutil (and other modules use an optional try/except ImportError pattern). To avoid ModuleNotFoundError for users running the benchmark, either add psutil to the project/dev dependencies or make the import optional and gracefully disable CPU memory reporting when unavailable.

Copilot · 2026-04-17T08:34:06Z

 import numpy as np
+import psutil
 import torch


This benchmark imports psutil unconditionally. If psutil isn’t installed (it doesn’t seem to be listed in the repo’s dependency files), running the benchmark will fail with ModuleNotFoundError. Consider using the existing optional-import pattern used elsewhere in the codebase (e.g., try/except ImportError) or ensure psutil is added to the dev dependencies.

Copilot · 2026-04-17T08:34:06Z

+def _write_markdown_report(
+    benchmark_name: str,
+    perf_rows: list[dict[str, object]],
+    metric_rows: list[dict[str, object]],
+    notes: list[str] | None = None,
+) -> Path:
+    """Write benchmark results to a markdown report with two tables."""
+    output_dir = Path("outputs/benchmarks")
+    output_dir.mkdir(parents=True, exist_ok=True)
+
+    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
+    report_path = output_dir / f"{benchmark_name}_{timestamp}.md"
+
+    lines: list[str] = [
+        f"# {benchmark_name} Benchmark Report",
+        "",
+        f"Generated at: {datetime.now().isoformat(timespec='seconds')}",
+        "",
+        "## Time & Memory",
+        "",
+    ]
+    lines.extend(_format_markdown_table(perf_rows))
+    lines.extend(["", "## Success & Other Metrics", ""])
+    lines.extend(_format_markdown_table(metric_rows))
+


This benchmark writes a markdown report with only two tables ("Time & Memory" and "Success & Other Metrics") and even notes it contains exactly two tables. However, the new benchmark skill doc added in this PR states benchmark reports must contain exactly three tables, including a "Leaderboard" table (see .claude/skills/benchmark/SKILL.md around the "Save Results to One Markdown Report" section). Please either add a leaderboard table to this report (and compute leaderboard rows) or adjust the documented convention so they match.

upgrade

d852752

Copilot AI review requested due to automatic review settings April 16, 2026 03:58

matafela marked this pull request as draft April 16, 2026 03:59

Copilot started reviewing on behalf of matafela April 16, 2026 03:59 View session

Copilot AI reviewed Apr 16, 2026

View reviewed changes

chenjian added 2 commits April 16, 2026 18:39

update benchmark and skill

2b8be61

add pytorch kinematic benchmark

f16b96f

Copilot AI review requested due to automatic review settings April 17, 2026 07:51

Copilot started reviewing on behalf of matafela April 17, 2026 07:52 View session

Copilot AI reviewed Apr 17, 2026

View reviewed changes

chenjian added 3 commits April 17, 2026 16:27

merge opw and kinematic solver

68800ee

update

3b1fcf0

style

09df738

Copilot AI review requested due to automatic review settings April 17, 2026 08:28

Copilot started reviewing on behalf of matafela April 17, 2026 08:28 View session

Copilot AI reviewed Apr 17, 2026

View reviewed changes

matafela marked this pull request as ready for review April 17, 2026 09:33

matafela changed the title ~~Draft: Upgrade pytorch kinematics~~ Refactor: benchmark Apr 17, 2026

yuecideng approved these changes Apr 20, 2026

View reviewed changes

matafela merged commit 50843a5 into main Apr 20, 2026
11 of 16 checks passed

matafela deleted the cj/upgrade-pytorch-kinematics branch April 20, 2026 06:28

	fk_xpos, joint_seed=qpos, return_all_solutions=False
	fk_xpos, qpos_seed=qpos, return_all_solutions=False

	help="Benchmark the OPW kinematic solver (FK/IK accuracy and speed).",
	help="Benchmark kinematic solvers (OPW/PyTorch) for FK/IK accuracy and speed.",

Conversation

matafela commented Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of change

Checklist

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

matafela commented Apr 16, 2026 •

edited

Loading