Skip to content

Recipe Modularization, DAG Integration, and BigQuery Integration#2715

Merged
copybara-service[bot] merged 1 commit intomainfrom
lidanny/pw-DAG-integrate2
Feb 12, 2026
Merged

Recipe Modularization, DAG Integration, and BigQuery Integration#2715
copybara-service[bot] merged 1 commit intomainfrom
lidanny/pw-DAG-integrate2

Conversation

@DannyLiCom
Copy link
Copy Markdown
Collaborator

@DannyLiCom DannyLiCom commented Nov 19, 2025

Description

When integrating Recipe Modularization, DAG Integration, and BigQuery, the DAG will pass in a command that includes multiple command-line flags.

Tests

Users can use commands similar to the following to run tests

python3 -m benchmarks.recipes.pw_mcjax_benchmark_recipe \
--user='user' \
--cluster_name='pw-scale-test-v5e-32' \
--project='cloud-tpu-multipod-dev' \
--zone='us-south1-a' \
--device_type='v5litepod-32' \
--benchmark_steps=20 \
--num_slices_list='2' \
--server_image='gcr.io/tpu-prod-env-one-vm/lidanny/unsanitized_server:latest' \
--proxy_image='gcr.io/tpu-prod-env-one-vm/lidanny/unsanitized_proxy_server:latest' \
--runner='gcr.io/tpu-prod-env-one-vm/lidanny_latest:latest' \
--selected_model_framework='pathways' \
--selected_model_names='llama3_1_8b_8192_v5e_256' \
--priority='medium' \
--max_restarts=0 \
--bq_enable=True \
--bq_db_project='cloud-tpu-multipod-dev' \
--bq_db_dataset="chzheng_test_100steps"

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

@DannyLiCom DannyLiCom force-pushed the lidanny/pw-DAG-integrate2 branch from a961644 to f2a46a6 Compare November 19, 2025 05:46
@DannyLiCom DannyLiCom marked this pull request as ready for review November 19, 2025 06:05
@DannyLiCom DannyLiCom force-pushed the lidanny/pw-DAG-integrate2 branch from f2a46a6 to 3840670 Compare November 23, 2025 16:39
@DannyLiCom DannyLiCom requested a review from notabee as a code owner November 23, 2025 16:39
@DannyLiCom DannyLiCom force-pushed the lidanny/pw-DAG-integrate2 branch 2 times, most recently from 8adde2d to b08ea95 Compare November 28, 2025 02:17


def handle_cmd_args(cluster_config: XpkClusterConfig, *actions: str, **kwargs) -> bool:
def handle_cmd_args(cluster_config: XpkClusterConfig, is_delete: bool, user: str, **kwargs) -> bool:
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the original recipe could only accept the --delete flag, this section had to be modified to allow it to accept multiple flags.

parser_utils.add_arguments(parser)
args = parser.parse_args()

if len(sys.argv) > 2:
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic is implemented to determine whether multiple flags are being used.

@DannyLiCom DannyLiCom force-pushed the lidanny/pw-DAG-integrate2 branch 3 times, most recently from 0303bcf to 78b8073 Compare December 17, 2025 09:02
@DannyLiCom DannyLiCom force-pushed the lidanny/pw-DAG-integrate2 branch 3 times, most recently from ce0fc18 to c37edc0 Compare December 29, 2025 02:19
@codecov
Copy link
Copy Markdown

codecov Bot commented Dec 29, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Copy link
Copy Markdown
Collaborator

@aireenmei aireenmei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the work! I run the PR through AI and got these comments: https://paste.googleplex.com/6085130735190016. I looked at some points and they make sense to me, so I left inline comments. But there are more points, could you take a look?

Comment thread benchmarks/recipes/parser_utils.py Outdated
Comment thread benchmarks/recipes/parser_utils.py Outdated
Comment thread tools/dev/code_style.sh Outdated
Comment thread src/MaxText/layers/attention_op.py
Comment thread src/MaxText/layers/moe.py
Comment thread benchmarks/recipes/pw_suspend_resume.py Outdated
Comment thread benchmarks/recipes/parser_utils.py Outdated
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Feb 5, 2026

🤖 Hi @aireenmei, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

## 📋 Review Summary

This pull request introduces significant refactoring to the benchmarking recipes, modularizing argument parsing and improving configuration management. The changes make the recipes more flexible and easier to use from the command line, especially for DAG integration.

🔍 General Feedback

  • The introduction of parser_utils.py is a good step towards centralizing command-line argument handling.
  • The use of a UserConfig class helps in managing configurations cleanly.
  • The addition of check_and_create_bucket is a useful utility, though its error handling could be improved in the calling code.

Comment thread benchmarks/recipes/parser_utils.py
Comment thread benchmarks/recipes/pw_mcjax_benchmark_recipe.py Outdated
Comment thread benchmarks/recipes/pw_utils.py
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Feb 5, 2026

🤖 I'm sorry @aireenmei, but I was unable to process your request. Please see the logs for more details.

@DannyLiCom DannyLiCom force-pushed the lidanny/pw-DAG-integrate2 branch from 9bb39cd to a57612a Compare February 6, 2026 01:04
@DannyLiCom
Copy link
Copy Markdown
Collaborator Author

Hi @aireenmei I noticed a lot of these errors in Pylint.
E0611: No name 'multimodal' in module 'MaxText' (no-name-in-module)
And they were all from code that was just merged
https://screenshot.googleplex.com/8ddsSFkx9RuLo6u

@DannyLiCom
Copy link
Copy Markdown
Collaborator Author

@aireenmei Also, these failures don't seem to be related to my code at all.
https://screenshot.googleplex.com/4BRE3k8xXhoft4s

@DannyLiCom DannyLiCom force-pushed the lidanny/pw-DAG-integrate2 branch from a57612a to 2e66a21 Compare February 6, 2026 01:20
@ycchenzheng ycchenzheng requested a review from aireenmei February 6, 2026 16:01
@DannyLiCom DannyLiCom force-pushed the lidanny/pw-DAG-integrate2 branch from 2e66a21 to cd65cbd Compare February 7, 2026 01:52
@DannyLiCom
Copy link
Copy Markdown
Collaborator Author

Hi @aireenmei I noticed a lot of these errors in Pylint. E0611: No name 'multimodal' in module 'MaxText' (no-name-in-module) And they were all from code that was just merged https://screenshot.googleplex.com/8ddsSFkx9RuLo6u

Resolved

@DannyLiCom
Copy link
Copy Markdown
Collaborator Author

@aireenmei Also, these failures don't seem to be related to my code at all. https://screenshot.googleplex.com/4BRE3k8xXhoft4s

Resolved

@DannyLiCom
Copy link
Copy Markdown
Collaborator Author

Hi @aireenmei Could you please take another look? If everything looks good, could you help approve it?

Copy link
Copy Markdown
Collaborator

@aireenmei aireenmei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR makes change in multiple recipes, but in the description only benchmarks.recipes.pw_mcjax_benchmark_recipe is tested. Could you make sure other recipes are also tested?

Comment thread benchmarks/recipes/pw_headless_mode.py Outdated
@DannyLiCom
Copy link
Copy Markdown
Collaborator Author

The PR makes change in multiple recipes, but in the description only benchmarks.recipes.pw_mcjax_benchmark_recipe is tested. Could you make sure other recipes are also tested?

Hi @aireenmei Exactly. We are focusing on benchmarks.recipes.pw_mcjax_benchmark_recipe. Our current large-scale tests rely almost exclusively on this recipe. Once this is merged, we will standardize all other recipes to follow this model. In other words, the other recipes do not yet support multiple command-line flags. My changes to those recipes are simply because I modified some utils, which required minor adjustments to keep everything consistent. Currently, pw_headless_mode.py on the main branch may still encounter issues.

@DannyLiCom DannyLiCom force-pushed the lidanny/pw-DAG-integrate2 branch from cd65cbd to e5f5846 Compare February 12, 2026 06:52
@DannyLiCom
Copy link
Copy Markdown
Collaborator Author

Another reason is that integrating every recipe following the pw_mcjax_benchmark_recipe pattern all at once would make the PR too difficult to review. Therefore, we will submit one PR per recipe integration to keep things manageable. This is the first and most important recipe to be integrated; it will serve as the template for all future recipes.

@DannyLiCom DannyLiCom force-pushed the lidanny/pw-DAG-integrate2 branch 4 times, most recently from b7a31a8 to 8464c0b Compare February 12, 2026 07:47
@DannyLiCom DannyLiCom force-pushed the lidanny/pw-DAG-integrate2 branch from 8464c0b to 962b696 Compare February 12, 2026 07:49
Copy link
Copy Markdown
Collaborator

@aireenmei aireenmei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for revising and the clarification!

@copybara-service copybara-service Bot merged commit fa52b6a into main Feb 12, 2026
48 checks passed
@copybara-service copybara-service Bot deleted the lidanny/pw-DAG-integrate2 branch February 12, 2026 21:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants