
Support GCP region override for vertex ai jobs #294

Merged
kmontemayor2-sc merged 6 commits into main from kmonte/add-region-override
Aug 28, 2025

@kmontemayor2-sc
Collaborator

@kmontemayor2-sc kmontemayor2-sc commented Aug 26, 2025

Add gcp_region_override as a proto field to allow launching VAI jobs in a separate region.

We do this so that if more GPUs are available in one region, we can launch training jobs there, even if the region specified in CommonComputeConfig.region generally has more resources.
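
The fallback described above could be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the dataclasses are hypothetical stand-ins for the generated proto messages, `resolve_region` is an invented helper name, and the sketch assumes the new field is a plain proto3 string, so an unset value reads as an empty string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonComputeConfig:
    # Stand-in for the real config; only the field named in the PR description.
    region: str


@dataclass(frozen=True)
class VertexAiResourceConfig:
    # Hypothetical stand-in for the message that carries the new field.
    # Assumption: an unset proto3 string reads as "", so "" means "no override".
    gcp_region_override: str = ""


def resolve_region(common: CommonComputeConfig, vai: VertexAiResourceConfig) -> str:
    """Return the override when it is set, otherwise the shared default region."""
    return vai.gcp_region_override or common.region


common = CommonComputeConfig(region="us-central1")
print(resolve_region(common, VertexAiResourceConfig()))
print(resolve_region(common, VertexAiResourceConfig(gcp_region_override="us-east4")))
```

With this shape, existing configs that never set the override keep launching in `CommonComputeConfig.region`, so the field is opt-in.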

@kmontemayor2-sc
Collaborator Author

/unit_test

@kmontemayor2-sc
Collaborator Author

/integration_test

@kmontemayor2-sc
Collaborator Author

/e2e_test

@github-actions
Contributor

github-actions Bot commented Aug 26, 2025

GiGL Automation

@ 24:19:17UTC : 🔄 Unit Test started.

@ 24:58:36UTC : ✅ Workflow completed successfully.

@github-actions
Contributor

github-actions Bot commented Aug 26, 2025

GiGL Automation

@ 24:19:20UTC : 🔄 E2E Test started.

@ 01:39:49UTC : ✅ Workflow completed successfully.

@github-actions
Contributor

github-actions Bot commented Aug 26, 2025

GiGL Automation

@ 24:19:22UTC : 🔄 Integration Test started.

@ 01:15:47UTC : ✅ Workflow completed successfully.

Collaborator

@svij-sc svij-sc left a comment

The tests feel like overkill for a 2-line change, and are prone to constant refactoring.

An alternative here is to do this in the wrapper or use omegaconf resolutions so we don't need to add conditionals in our trainer/inferencer code.

@kmontemayor2-sc
Collaborator Author

> An alternative here is to do this in the wrapper or use omegaconf resolutions so we don't need to add conditionals in our trainer/inferencer code.

Synced offline, let's do this in the wrapper :)
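
For readers following the thread, "doing this in the wrapper" might look something like the sketch below. It is only an assumption about the shape of the change: `RawVertexAiConfig`, `VertexAiConfigWrapper`, and `launch_job` are hypothetical names, not GiGL classes. The point is that the wrapper resolves the region once, so trainer/inferencer code reads a single property and needs no conditional.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawVertexAiConfig:
    # Hypothetical stand-in for the generated proto message.
    machine_type: str
    gcp_region_override: str = ""


class VertexAiConfigWrapper:
    """Hypothetical wrapper that hides the override-or-default decision."""

    def __init__(self, raw: RawVertexAiConfig, default_region: str) -> None:
        self._raw = raw
        self._default_region = default_region

    @property
    def region(self) -> str:
        # Prefer the override when set; fall back to the shared default.
        return self._raw.gcp_region_override or self._default_region


def launch_job(config: VertexAiConfigWrapper) -> str:
    # Trainer/inferencer-style caller: reads config.region with no branching.
    return f"launching in {config.region}"


wrapper = VertexAiConfigWrapper(RawVertexAiConfig("n1-standard-8"), "us-central1")
print(launch_job(wrapper))
```

Keeping the rule in one place also means a test only has to cover the wrapper, not every caller.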

@kmontemayor2-sc
Collaborator Author

/unit_test

@github-actions
Contributor

github-actions Bot commented Aug 27, 2025

GiGL Automation

@ 17:35:15UTC : 🔄 Unit Test started.

@ 18:12:10UTC : ✅ Workflow completed successfully.

Collaborator

@xgao4-sc xgao4-sc left a comment

Looks good to me. Thanks a lot for the work!

Collaborator

@svij-sc svij-sc left a comment

two minor comments, otherwise LGTM - thanks for the iteration

Comment thread python/gigl/src/training/v2/glt_trainer.py Outdated
@kmontemayor2-sc kmontemayor2-sc marked this pull request as ready for review August 28, 2025 18:13
@kmontemayor2-sc kmontemayor2-sc added this pull request to the merge queue Aug 28, 2025
Merged via the queue into main with commit a193c99 Aug 28, 2025
5 checks passed
@kmontemayor2-sc kmontemayor2-sc deleted the kmonte/add-region-override branch August 28, 2025 19:41

4 participants