feat(grid search args): rename --use-tweedie; switch --models to --model; switch --methods to --method (#167)
Conversation
📝 Walkthrough

This PR consolidates CLI argument handling by converting plural flags to singular (`--models` to `--model`, `--methods` to `--method`) to enforce single model/method selection, and replaces the `--use-tweedie` boolean flag with a flexible `--step-scaler-type` flag that supports multiple scaler options (`dataspace`, `noisespace`, `none`).
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 2
🧹 Nitpick comments (1)
run_grid_search.py (1)
292-345: Scope `method` to Boltz-2 jobs to avoid irrelevant run permutations.

`method_suffix` and `JobConfig.method` are currently applied to every model. For non-Boltz-2 models, this creates extra path/key variance without changing behavior.

Refactor sketch:

```diff
-    method_suffix = f"_{args.method.replace(' ', '_')}" if args.method else ""
+    effective_method = args.method if model == StructurePredictor.BOLTZ_2 else None
+    method_suffix = f"_{effective_method.replace(' ', '_')}" if effective_method else ""
     ...
-    method=args.method,
+    method=effective_method,
     ...
-    method=args.method,
+    method=effective_method,
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@run_grid_search.py` around lines 292 - 345, The code applies method_suffix and sets JobConfig.method for every model, causing irrelevant permutations; change the logic so method_suffix and the JobConfig.method field are only set when the model is the Boltz-2 model (e.g., check if model == "Boltz-2" or whatever identifier you use for Boltz-2), otherwise use an empty method_suffix and pass None (or an empty string) for the JobConfig.method; update the loops that construct output_dir (where method_suffix is used) and the JobConfig instantiation (where method=args.method is passed) to conditionally include the method only for Boltz-2 jobs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@run_grid_search.py`:
- Around line 433-436: The CLI argument for "--model" currently omits "boltz1"
from the choices, causing valid workflows to be rejected; update the parser
definition in run_grid_search.py (the "--model" argument choices list) to
re-include "boltz1" alongside "boltz2", "protenix", and "rf3" so the flag
accepts the legacy Boltz-1 option that get_pixi_env() and related code paths
still support; ensure any downstream validation that checks model names (e.g.,
any references to get_pixi_env or model-specific branching) will handle "boltz1"
consistently.
In `@src/sampleworks/utils/guidance_script_arguments.py`:
- Line 127: The assignment self.step_scaler_type = args.step_scaler_type can
raise AttributeError if args lacks that attribute; change it to use
getattr(args, "step_scaler_type", "noisespace") so self.step_scaler_type always
has a safe default consistent with runtime selection; update the
constructor/initializer where self.step_scaler_type is set (search for the
symbol self.step_scaler_type and the assignment from args) to use getattr with
the "noisespace" fallback.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4f9da7a3-0033-4f05-bef5-b65da64a6122
📒 Files selected for processing (3)
- run_grid_search.py
- src/sampleworks/utils/guidance_script_arguments.py
- src/sampleworks/utils/guidance_script_utils.py
k-chrispens left a comment:

we should add the boltz1 option back, but other than that looks good! will leave to you to merge.
Actionable comments posted: 11
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/sampleworks/core/rewards/real_space_density.py (1)
258-289: ⚠️ Potential issue | 🟠 Major

Keep `structure_to_reward_input()` on the shared extraction path.

This now diverges from `extract_density_inputs_from_atomarray()`: it skips invalid-coordinate filtering, NaN B-factor repair, and the float32 coercion used everywhere else in the reward path. It also still advertises `dict[str, Float[...]]` even though `"elements"` is now `torch.long`, so callers using this helper can see different runtime and type-checking behavior than the main reward pipeline.

As per coding guidelines, "Use cast() or `# ty:ignore[...]` with explanatory comments when type checking cannot be satisfied; treat type errors as real issues."

🩹 Suggested simplification

```diff
-    def structure_to_reward_input(self, structure: dict) -> dict[str, Float[torch.Tensor, "..."]]:
-        atom_array = structure["asym_unit"]
-        mask = atom_array.occupancy > 0
-        if isinstance(atom_array, AtomArrayStack):
-            atom_array = atom_array[:, mask]
-        else:
-            atom_array = atom_array[mask]
-
-        element_indices = elements_to_scattering_indices(atom_array.element)
-
-        elements = torch.tensor(element_indices, device=self.device, dtype=torch.long).unsqueeze(0)
-        b_factors = torch.from_numpy(atom_array.b_factor).to(self.device).unsqueeze(0)
-        occupancies = torch.from_numpy(atom_array.occupancy).to(self.device).unsqueeze(0)
-
-        coordinates = torch.from_numpy(atom_array.coord).to(self.device)
-        if coordinates.ndim == 2:
-            coordinates = coordinates.unsqueeze(0)
-
-        n_models = coordinates.shape[0]
-        elements = match_batch(elements, target_batch_size=n_models)
-        b_factors = match_batch(b_factors, target_batch_size=n_models)
-        occupancies = match_batch(occupancies, target_batch_size=n_models)
+    def structure_to_reward_input(self, structure: dict) -> dict[str, torch.Tensor]:
+        coordinates, elements, b_factors, occupancies = extract_density_inputs_from_atomarray(
+            structure["asym_unit"], self.device
+        )
         return {
             "coordinates": coordinates,
             "elements": elements,
             "b_factors": b_factors,
             "occupancies": occupancies,
         }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/sampleworks/core/rewards/real_space_density.py` around lines 258 - 289, structure_to_reward_input diverges from extract_density_inputs_from_atomarray by skipping invalid-coordinate filtering, NaN b-factor repair, and float32 coercion—and its return type annotation falsely claims Float tensors while "elements" is torch.long. Fix by applying the same preprocessing as extract_density_inputs_from_atomarray (filter out invalid coords, repair NaN b_factors, coerce numeric tensors to torch.float32), or simply call/reuse extract_density_inputs_from_atomarray inside structure_to_reward_input to obtain coordinates/elements/b_factors/occupancies; then ensure "elements" remains torch.long but update the function's declared return types (or add a cast/# type: ignore with comment) so type-checking matches runtime.
♻️ Duplicate comments (1)
src/sampleworks/utils/guidance_script_arguments.py (1)
188: ⚠️ Potential issue | 🟡 Minor

Use a defensive fallback for `step_scaler_type`.

Direct access to `args.step_scaler_type` can fail with `AttributeError` if callers construct `argparse.Namespace` without that field. Using `getattr` with a default matches the fallback pattern.

Proposed fix:

```diff
-        self.step_scaler_type = args.step_scaler_type
+        self.step_scaler_type = getattr(args, "step_scaler_type", "noisespace")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/sampleworks/utils/guidance_script_arguments.py` at line 188, Replace the direct attribute access on args with a defensive getattr to avoid AttributeError when callers pass an argparse.Namespace missing step_scaler_type: change the assignment that sets self.step_scaler_type (in guidance_script_arguments.py) to use getattr(args, "step_scaler_type", <fallback>) so it defaults to a safe value (e.g., None or the module's existing default) instead of accessing args.step_scaler_type directly.
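The defensive-fallback pattern above is easy to exercise in isolation. This is a minimal sketch, not the real class from guidance_script_arguments.py: the `GuidanceArgs` name is hypothetical, and only the `"noisespace"` default is taken from the review.

```python
import argparse

class GuidanceArgs:  # hypothetical stand-in for the real arguments class
    def __init__(self, args: argparse.Namespace) -> None:
        # getattr tolerates Namespaces built without the field, falling
        # back to the "noisespace" default named in the review.
        self.step_scaler_type = getattr(args, "step_scaler_type", "noisespace")

full = GuidanceArgs(argparse.Namespace(step_scaler_type="dataspace"))
sparse = GuidanceArgs(argparse.Namespace())  # direct access would raise AttributeError
```

Here `full.step_scaler_type` is `"dataspace"` while `sparse.step_scaler_type` falls back to `"noisespace"`.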
🧹 Nitpick comments (1)
src/sampleworks/utils/guidance_script_utils.py (1)
439-453: Normalize `step_scaler_type` through one canonical vocabulary.

`_run_guidance()` now expects `dataspace`/`noisespace`/`none`, while the rest of the codebase already exposes `data_space_dps`/`noise_space_dps`/`no_scaling` via `StepScalers`. Keeping both spellings makes configs and job queues brittle.

♻️ Example alias-based normalization

```diff
-from sampleworks.utils.guidance_constants import (
-    GuidanceType,
-    StructurePredictor,
-)
+from sampleworks.utils.guidance_constants import (
+    GuidanceType,
+    StepScalers,
+    StructurePredictor,
+)
 ...
-    step_scaler_type = getattr(args, "step_scaler_type", "noisespace")
-    if step_scaler_type == "dataspace":
+    step_scaler_aliases = {
+        "dataspace": StepScalers.DATA_SPACE_DPS,
+        "data_space_dps": StepScalers.DATA_SPACE_DPS,
+        "noisespace": StepScalers.NOISE_SPACE_DPS,
+        "noise_space_dps": StepScalers.NOISE_SPACE_DPS,
+        "none": StepScalers.NO_SCALING,
+        "no_scaling": StepScalers.NO_SCALING,
+    }
+    step_scaler_type = step_scaler_aliases[getattr(args, "step_scaler_type", "noisespace")]
+    if step_scaler_type is StepScalers.DATA_SPACE_DPS:
         step_scaler = DataSpaceDPSScaler(
             step_size=args.step_size,
             gradient_normalization=args.gradient_normalization,
         )
-    elif step_scaler_type == "noisespace":
+    elif step_scaler_type is StepScalers.NOISE_SPACE_DPS:
         step_scaler = NoiseSpaceDPSScaler(
             step_size=args.step_size,
             gradient_normalization=args.gradient_normalization,
         )
-    elif step_scaler_type == "none":
+    elif step_scaler_type is StepScalers.NO_SCALING:
         step_scaler = NoScalingScaler()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/sampleworks/utils/guidance_script_utils.py` around lines 439 - 453, The code uses ad-hoc values for step_scaler_type in _run_guidance() (e.g., "dataspace"/"noisespace"/"none") which conflicts with the canonical StepScalers vocabulary used elsewhere (data_space_dps/noise_space_dps/no_scaling); add a small alias normalization before the existing if/elif block: create an alias map that maps "dataspace"->"data_space_dps", "noisespace"->"noise_space_dps", "none"->"no_scaling" (and pass-through unknowns), then assign step_scaler_type = alias_map.get(step_scaler_type, step_scaler_type) and update the conditional to check the canonical names (or switch to a mapping from canonical name to class: data_space_dps->DataSpaceDPSScaler, noise_space_dps->NoiseSpaceDPSScaler, no_scaling->NoScalingScaler) so _run_guidance(), step_scaler_type, DataSpaceDPSScaler, NoiseSpaceDPSScaler, and NoScalingScaler all use the single canonical vocabulary.
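The alias-map idea can be sketched standalone. The `StepScalers` enum values below are assumptions reconstructed from the names quoted in the review, not imports from the actual sampleworks module:

```python
from enum import Enum

class StepScalers(Enum):  # assumed values, mirroring the review's vocabulary
    DATA_SPACE_DPS = "data_space_dps"
    NOISE_SPACE_DPS = "noise_space_dps"
    NO_SCALING = "no_scaling"

# One canonical vocabulary: CLI spellings and canonical names both resolve
# to the same enum member, so configs and job queues stay interchangeable.
STEP_SCALER_ALIASES = {
    "dataspace": StepScalers.DATA_SPACE_DPS,
    "data_space_dps": StepScalers.DATA_SPACE_DPS,
    "noisespace": StepScalers.NOISE_SPACE_DPS,
    "noise_space_dps": StepScalers.NOISE_SPACE_DPS,
    "none": StepScalers.NO_SCALING,
    "no_scaling": StepScalers.NO_SCALING,
}

def normalize_step_scaler(name: str) -> StepScalers:
    try:
        return STEP_SCALER_ALIASES[name]
    except KeyError as err:
        # Unknown spellings fail loudly instead of silently falling through.
        raise ValueError(f"unknown step scaler type: {name!r}") from err
```

With this, `normalize_step_scaler("dataspace")` and `normalize_step_scaler("data_space_dps")` return the same member, and a stale value like `"tweedie"` raises `ValueError`.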
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@GRID_SEARCH.md`:
- Around line 29-30: Update GRID_SEARCH.md to use the singular CLI flags that
the parser expects: replace occurrences of "--models" with "--model" and
"--methods" with "--method" in the documented example commands (these correspond
to the parser changes in run_grid_search.py around the argument handling for
model/method). Ensure all examples (including the blocks around the previous
lines 29-30 and 54-58) consistently use "--model" and "--method" so the docs
match the code.
- Around line 66-68: Remove the obsolete reference to --use-tweedie in
GRID_SEARCH.md and replace it with documentation for the new CLI flag
--step-scaler-type (default "noisespace"); note that this flag replaces
--use-tweedie and can affect whether changing the flag triggers a re-run
(similar to other flags that are/aren't reflected in the directory structure),
and mention the default value and where the option is defined (the parser
handling --step-scaler-type in run_grid_search.py around the previous
--use-tweedie logic). Ensure the sentence about using --force-all to re-run
remains and update any examples or flag lists to show --step-scaler-type instead
of --use-tweedie.
In `@run_all_models.sh`:
- Around line 124-133: In the protenix invocation in run_all_models.sh replace
the incorrect flag --models with --model and add the missing --method flag
(matching how other blocks call run_grid_search.py), e.g., update the protenix
run_grid_search.py CLI line that currently contains --models protenix to use
--model protenix and include the appropriate --method <method_name> token used
elsewhere in the script so the command mirrors the fixed RosettaFold3 block.
- Around line 59-69: The CLI flags in the boltz invocation still use the old
plural names (--models and --methods) which were renamed to --model and --method
in run_grid_search.py; update the boltz command invocation in run_all_models.sh
(the line invoking run_grid_search.py) to use --model boltz2 and --method "X-RAY
DIFFRACTION" instead of --models and --methods so the script matches the new CLI
signature used by the run_grid_search.py entrypoint.
- Around line 81-91: The command in run_all_models.sh uses plural flag names but
the CLI expects singular flags; update the boltz run_grid_search.py invocation
to use --model instead of --models, --method instead of --methods, --scaler
instead of --scalers, --ensemble-size instead of --ensemble-sizes and
--gradient-weight instead of --gradient-weights while keeping the same values
(e.g., replace --models boltz2, --methods "MD", --scalers pure_guidance,
--ensemble-sizes "8", --gradient-weights "0.1 0.2 0.5" with their singular
equivalents) so the flags passed to the script match the expected parameter
names.
- Around line 103-112: Change the rf3 command invocation to use the correct
argument name --model (singular) instead of --models when calling
run_grid_search.py; update the command that launches rf3 (the block invoking
"rf3 run_grid_search.py") to pass --model rf3. Also review the --method flag in
that same invocation: either explicitly set a rf3-appropriate value instead of
relying on the Boltz2 default ("X-RAY DIFFRACTION") or remove the --method flag
if RosettaFold3 (rf3) does not use that concept, so the invocation’s flags match
the run_grid_search.py parser and rf3 semantics.
In `@src/sampleworks/utils/guidance_script_arguments.py`:
- Around line 38-52: The current logic raises ValueError when resolved is empty
and then attempts to check Path(resolved).exists(), which is unreachable; update
the control flow in the function handling candidates/resolved so you first set
resolved = candidates[0] if candidates else "" then if not resolved raise the
ValueError, otherwise perform the filesystem check (if not
Path(resolved).exists() raise ValueError about missing checkpoint) and return
resolved; also replace the EN DASH (–) in the comment with a regular
hyphen-minus (-) to remove the static-analysis warning. Ensure you adjust the
branches around the resolved variable and the Path(...) existence check
accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a1d7f20e-cd12-42c3-9d2e-3c60fd3b073e
📒 Files selected for processing (19)
- .github/workflows/docker.yml
- Dockerfile
- GRID_SEARCH.md
- run_all_models.sh
- scripts/eval/rscc_grid_search_script.py
- src/sampleworks/core/forward_models/xray/real_space_density_deps/qfit/sf.py
- src/sampleworks/core/rewards/protocol.py
- src/sampleworks/core/rewards/real_space_density.py
- src/sampleworks/core/samplers/edm.py
- src/sampleworks/models/rf3/wrapper.py
- src/sampleworks/utils/density_utils.py
- src/sampleworks/utils/elements.py
- src/sampleworks/utils/guidance_script_arguments.py
- src/sampleworks/utils/guidance_script_utils.py
- tests/conftest.py
- tests/eval/test_eval_dataclasses.py
- tests/integration/test_mismatch_integration.py
- tests/integration/test_pipeline_integration.py
- tests/rewards/test_real_space_density_reward.py
✅ Files skipped from review due to trivial changes (2)
- .github/workflows/docker.yml
- Dockerfile
```bash
pixi run -e boltz python run_grid_search.py \
    --proteins proteins.csv \
    --models boltz2 \  # options: boltz1, boltz2, protenix, rf3 (make sure env aligns!)
```
Fix documented model choices to match parser constraints.
Line 29 and Line 54 include boltz1, but run_grid_search.py:402-425 only allows boltz2, protenix, rf3. Please align the option list with actual argparse choices.
Also applies to: 54-54
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@GRID_SEARCH.md` at line 29, Update the documented model options in
GRID_SEARCH.md to match the argparse choices enforced in run_grid_search.py (the
argument parser around lines where model choices are defined, e.g., the
parser.add_argument call for the --models flag in run_grid_search.py:402-425) by
removing "boltz1" and listing only "boltz2, protenix, rf3" (ensure both
occurrences at the two lines noted are changed).
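The requested parser fix can be sketched as below; the default and help text are assumptions, and the real definition sits around run_grid_search.py:402-425:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--model",
    choices=["boltz1", "boltz2", "protenix", "rf3"],  # boltz1 re-included
    default="boltz2",  # assumed default, not confirmed by the thread
    help="Structure predictor to run (single selection).",
)

# The legacy option parses again, and unknown names still fail fast.
args = parser.parse_args(["--model", "boltz1"])
```

With `choices` in place, argparse rejects anything outside the list with a usage error (`SystemExit`), so the docs and parser cannot silently drift apart.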
> | `W` | the gradient weight. |
>
> In that directory, you will find the following files generated by Sampleworks
> (in addition to model-specific files, e.g. the Boltz imput YAML file):
Typo in user-facing docs.
Line 85: “imput” → “input”.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@GRID_SEARCH.md` at line 85, Replace the typo "imput" with "input" in the
user-facing text that currently reads "the Boltz imput YAML file" (search for
the exact phrase "Boltz imput YAML file" in GRID_SEARCH.md) so the line reads
"the Boltz input YAML file".
```bash
# Usage:
#   ./run_all_models.sh

set -e
```
Background job failures may go undetected.
With set -e but without set -o pipefail, the exit code of docker run is masked by tee. If a Docker job fails, the script will still report "All jobs completed!" and exit 0.
Proposed fix to detect job failures:

```diff
 set -e
+set -o pipefail
```

Alternatively, capture and check exit codes explicitly:
```diff
-# Wait for all background jobs
-wait
+# Wait for all background jobs and track failures
+FAILED=0
+for pid in "${PIDS[@]}"; do
+    if ! wait "$pid"; then
+        echo "Job PID $pid failed"
+        FAILED=1
+    fi
+done
 echo ""
 echo "=========================================="
-echo "[$(date)] All jobs completed!"
+if [ "$FAILED" -eq 0 ]; then
+    echo "[$(date)] All jobs completed successfully!"
+else
+    echo "[$(date)] Some jobs failed. Check logs for details."
+    exit 1
+fi
 echo "=========================================="
```

Also applies to: 150-151
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@run_all_models.sh` at line 17, The script currently uses "set -e" which
doesn't propagate failures through pipelines so a failing "docker run ... | tee"
can be masked; update the top-level shell options to enable pipefail (e.g.,
replace the existing "set -e" with "set -e -o pipefail" or "set -eo pipefail")
and/or explicitly check PIPESTATUS after each pipeline that runs "docker run"
piped to "tee" (capture PIPESTATUS[0] into a variable and exit non‑zero if it
failed); search for occurrences of "docker run", "tee" and the existing "set -e"
in run_all_models.sh and apply one of these fixes consistently (modifying the
global set flags is preferred).
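A minimal, self-contained illustration of why `set -e` alone is not enough, using `false` as a stand-in for a failing `docker run`:

```shell
#!/usr/bin/env bash
# Without pipefail, a pipeline's status is the LAST command's status, so
# `failing-cmd | tee log` looks successful whenever tee succeeds.
set -e -o pipefail

if false | tee /dev/null; then
    echo "masked"            # reached only when pipefail is NOT set
else
    echo "failure detected"  # pipefail surfaces false's exit status
fi
```

Running this prints `failure detected`; drop `-o pipefail` and the same pipeline reports success, which is exactly how a failed Docker job could slip past the script.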
```diff
         # x_init here is a shape-compatible reference carried with the featurized
         # model input. During guided sampling, alignment/reference coordinates are
         # built later via process_structure_to_trajectory_input() and AtomReconciler.
         # TODO: figure out if this is necessary or if we should just remove x_init completely.
         if len(true_atom_array) == num_atoms:
             x_init = torch.tensor(true_atom_array.coord, device=self.device, dtype=torch.float32)
-            x_init = match_batch(x_init.unsqueeze(0), target_batch_size=ensemble_size).clone()
+            x_init = match_batch(x_init.unsqueeze(0), target_batch_size=ensemble_size)
         else:
-            logger.warning(
-                "True structure not available or atom count mismatch; initializing "
-                "x_init from prior. This means align_to_input will not work properly,"
-                " and reward functions dependent on this won't be accurate."
+            logger.info(
+                f"Input structure has {len(true_atom_array)} atoms, but RF3 operates on "
+                f"{num_atoms} atoms after model-specific preprocessing. Initializing "
+                "x_init from prior to match model shape; guided sampling will "
+                "reconcile alignment and reward inputs later on the common atom subset."
             )
             x_init = self.initialize_from_prior(batch_size=ensemble_size, shape=(num_atoms, 3))
```
Missing .clone() after match_batch may cause shared memory issues.
The match_batch function returns a broadcast view (no copy) when the input batch size is 1. If x_init is later mutated in-place during diffusion steps, this could corrupt the original tensor or cause unexpected behavior.
Other wrappers (Boltz at Line 720, Protenix at Line 512) retain .clone() after match_batch for this reason.
Proposed fix:

```diff
     if len(true_atom_array) == num_atoms:
         x_init = torch.tensor(true_atom_array.coord, device=self.device, dtype=torch.float32)
-        x_init = match_batch(x_init.unsqueeze(0), target_batch_size=ensemble_size)
+        x_init = match_batch(x_init.unsqueeze(0), target_batch_size=ensemble_size).clone()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/sampleworks/models/rf3/wrapper.py` around lines 357 - 371, The x_init
tensor returned by match_batch can be a broadcast view and should be cloned to
avoid shared-memory in-place mutations; update the branch that sets x_init =
match_batch(x_init.unsqueeze(0), target_batch_size=ensemble_size) to call
.clone() on the result (ensuring device/dtype are preserved) so it matches the
pattern used by other wrappers (see Boltz/Protenix) and prevent corruption
during later in-place diffusion updates; leave the initialize_from_prior path
unchanged.
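The aliasing hazard behind the `.clone()` advice can be shown with a pure-Python analogy (no torch required; torch's broadcast views share underlying storage in a similar way):

```python
# Replicating by reference, as a broadcast view does, means every "batch
# element" points at the same underlying storage.
row = [0.0, 0.0, 0.0]
aliased = [row] * 8                   # like a broadcast view: 8 refs, one buffer
cloned = [row[:] for _ in range(8)]   # like .clone(): independent copies

aliased[0][0] = 1.0  # silently mutates ALL 8 "batch elements" (and row itself)
cloned[0][0] = 1.0   # mutates only batch element 0
```

After these writes, every element of `aliased` sees the mutation while `cloned[1]` is untouched, which is exactly the corruption in-place diffusion updates would cause on an uncloned broadcast.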
Actionable comments posted: 1
```diff
         log.info(f"Gradient weights: {args.gradient_weights}")
         log.info(f"GD steps: {args.num_gd_steps}")
-        log.info(f"Boltz2 methods: {args.methods}")
+        log.info(f"Boltz2 method: {args.method}")
```
Log message is misleading for non-Boltz2 models.
The log says "Boltz2 method" but --method applies regardless of which model is selected. Similarly, the help text at line 447 says "Method for Boltz2" which is confusing when using boltz1, protenix, or rf3.
Suggested fix:

```diff
-    log.info(f"Boltz2 method: {args.method}")
+    log.info(f"Method: {args.method}")
```

And at line 447:

```diff
-    help="Method for Boltz2 ('X-RAY DIFFRACTION', 'MD')",
+    help="Experimental method ('X-RAY DIFFRACTION', 'MD')",
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@run_grid_search.py` at line 543, Update the misleading log and help text that
refer specifically to "Boltz2": change the runtime log call log.info(f"Boltz2
method: {args.method}") to a neutral message such as log.info(f"Method:
{args.method}") and update the argparse help string where --method is defined
(the parser.add_argument call that sets the help for --method) to a generic
description like "Method parameter (applies to the selected model; e.g., boltz2,
boltz1, protenix, rf3)" so the help and logging correctly reflect that --method
is not exclusive to Boltz2.
Force-pushed from ea968cf to 5c35e21
🧹 Nitpick comments (1)
run_grid_search.py (1)
276-277: Consider adding a NumPy-style docstring.

The `generate_jobs` function lacks documentation. Per coding guidelines, functions should include NumPy-style docstrings describing parameters and return values.

📝 Suggested docstring

```diff
 def generate_jobs(args: argparse.Namespace) -> list[JobConfig]:
+    """
+    Generate job configurations for grid search.
+
+    Parameters
+    ----------
+    args : argparse.Namespace
+        Parsed command-line arguments including model, method, scalers,
+        ensemble sizes, gradient weights, and GD steps.
+
+    Returns
+    -------
+    list[JobConfig]
+        List of job configurations covering all parameter combinations.
+    """
     jobs = []
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@run_grid_search.py` around lines 276 - 277, Add a NumPy-style docstring to the generate_jobs function: document the args parameter (type argparse.Namespace) and describe what fields of args are used, and document the return value (list[JobConfig]) including what each JobConfig represents; place the docstring directly under the def generate_jobs(...): signature using the NumPy format with Parameters and Returns sections so the function is properly documented for readers and automated tools.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b20cd6e7-cb44-4c1f-ab48-4de73e48b87d
📒 Files selected for processing (3)
- run_grid_search.py
- src/sampleworks/utils/guidance_script_arguments.py
- src/sampleworks/utils/guidance_script_utils.py
🚧 Files skipped from review as they are similar to previous changes (1)
- src/sampleworks/utils/guidance_script_arguments.py
…n run_all_models.sh
Force-pushed from 5c35e21 to 48f3490
rename --methods to --method and propagate the change, resolves #151
rename --models to --model and propagate the change; resolves #150
feat(grid search args): rename --use-tweedie argument to --step-scaler-type which defaults to noisespace, resolves #164