Open
Description
Describe the feature you'd like
Add git_config parameter support to the SourceCode class (or directly to ModelTrainer) to enable fetching source code directly from Git repositories, similar to the functionality available in JumpStart models and v2 Estimator classes.
The git_config parameter should accept a dictionary with the following keys:
- repo (required): Git repository URL (https, http, git@, or ssh://)
- branch (optional): Branch name (defaults to 'master')
- commit (optional): Specific commit hash
- 2FA_enabled (optional): Boolean for GitHub 2FA authentication
- username, password, token (optional): Authentication credentials
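As an illustration, a git_config dictionary with these keys might look like the sketch below. The `validate_git_config` helper is hypothetical and not part of the SDK; it only checks the shape described in this issue. The repository URL and commit hash are placeholders.

```python
from typing import Any, Dict

# Keys listed in this issue; this sketch rejects anything else.
ALLOWED_KEYS = {"repo", "branch", "commit", "2FA_enabled", "username", "password", "token"}
# URL forms listed in this issue for the required "repo" key.
ALLOWED_PREFIXES = ("https://", "http://", "git@", "ssh://")


def validate_git_config(git_config: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: check a git_config dict and fill in the default branch."""
    unknown = set(git_config) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"Unknown git_config keys: {sorted(unknown)}")

    repo = git_config.get("repo")
    if not isinstance(repo, str) or not repo.startswith(ALLOWED_PREFIXES):
        raise ValueError("git_config['repo'] must be an https, http, git@, or ssh:// URL")

    if "2FA_enabled" in git_config and not isinstance(git_config["2FA_enabled"], bool):
        raise ValueError("git_config['2FA_enabled'] must be a boolean")

    # Return a copy so the caller's dictionary is left untouched.
    normalized = dict(git_config)
    normalized.setdefault("branch", "master")  # default named in this issue
    return normalized


example_config = {
    "repo": "https://github.com/example-org/example-training-code.git",  # placeholder URL
    "commit": "0123456789abcdef0123456789abcdef01234567",  # placeholder hash
}
print(validate_git_config(example_config)["branch"])  # prints "master"
```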
How would this feature be used? Please describe.
This feature would allow users to reference training code stored in Git repositories without manually cloning them first. This is particularly useful for:
- CI/CD pipelines - Automatically pull the latest training code from a repository
- Team collaboration - Share training scripts via version control without S3 uploads
- Reproducibility - Pin to specific commits for exact code versioning
Describe alternatives you've considered
Current workarounds include:
- Manual cloning - Clone the repository locally before creating ModelTrainer, then use local source_dir
- S3 upload - Upload code to S3 and reference it via S3 URI in source_dir
- Use legacy estimators - Switch to older Estimator classes that support git_config
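For reference, the manual cloning workaround can be sketched roughly as follows. This is a minimal illustration using the `git` CLI through `subprocess`, not SDK code; `build_clone_commands` and `clone_to_temp_dir` are hypothetical helper names, and `git` is assumed to be on the PATH.

```python
import subprocess
import tempfile
from typing import List, Optional


def build_clone_commands(
    repo: str,
    dest: str,
    branch: Optional[str] = None,
    commit: Optional[str] = None,
) -> List[List[str]]:
    """Return the git commands that clone repo into dest and optionally pin a commit."""
    clone = ["git", "clone"]
    if branch:
        clone += ["--branch", branch]
    clone += [repo, dest]

    commands = [clone]
    if commit:
        # Check out the exact commit inside the freshly cloned directory.
        commands.append(["git", "-C", dest, "checkout", commit])
    return commands


def clone_to_temp_dir(
    repo: str,
    branch: Optional[str] = None,
    commit: Optional[str] = None,
) -> str:
    """Clone into a new temporary directory and return its path."""
    dest = tempfile.mkdtemp(prefix="training-src-")
    for cmd in build_clone_commands(repo, dest, branch, commit):
        subprocess.run(cmd, check=True)  # raises CalledProcessError if git fails
    return dest
```

The path returned by `clone_to_temp_dir` would then be supplied as the local source_dir. This is the extra step the requested git_config parameter is meant to remove.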
Additional context
The SDK already has the infrastructure for this feature:
- sagemaker.core.git_utils.git_clone_repo() handles Git cloning with authentication
- sagemaker.core.git_utils._sanitize_git_url() provides security validation
- JumpStart models (JumpStartModelInitKwargs) already support git_config
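One possible way to wire this in is sketched below. `SourceCodeSketch` is a hypothetical stand-in, not the SDK's SourceCode class, and `clone_fn` is a placeholder for whatever cloning routine is used; the exact signature of git_clone_repo() is not given in this issue, so the sketch does not call it directly. Treating source_dir as relative to the repository root when git_config is set is also an assumption.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Callable, Dict, Optional


@dataclass
class SourceCodeSketch:
    """Hypothetical stand-in for SourceCode with an optional git_config field."""

    source_dir: Optional[str] = None
    git_config: Optional[Dict[str, Any]] = None

    def resolve_source_dir(self, clone_fn: Callable[[Dict[str, Any]], str]) -> Optional[str]:
        """Return a local directory, cloning first when git_config is set.

        clone_fn is a placeholder: it takes the git_config dict and returns
        the local path of the checkout.
        """
        if self.git_config is None:
            # No Git settings: behave as today and use source_dir unchanged.
            return self.source_dir

        checkout_root = Path(clone_fn(self.git_config))
        # Assumption: with git_config set, source_dir is relative to the repo root.
        if self.source_dir:
            return str(checkout_root / self.source_dir)
        return str(checkout_root)
```

Passing the clone routine in as a parameter keeps the sketch independent of the real helper's signature and makes the resolution logic easy to test with a fake.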