Merged
8 changes: 4 additions & 4 deletions PREFLIGHT.md
@@ -7,12 +7,12 @@ Before you run ML workload on Multihost with GCE or GKE, simply apply `bash pref

Here is an example for GCE:
```
-bash preflight.sh PLATFORM=GCE && python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml run_name=$YOUR_JOB_NAME
+bash preflight.sh PLATFORM=GCE && python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml run_name=${YOUR_JOB_NAME?}
```

Here is an example for GKE:
```
-bash preflight.sh PLATFORM=GKE && python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml run_name=$YOUR_JOB_NAME
+bash preflight.sh PLATFORM=GKE && python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml run_name=${YOUR_JOB_NAME?}
```

# Optimization 2: Numa binding (You can only apply this to v4 and v5p)
@@ -22,14 +22,14 @@ For GCE,
[preflight.sh](https://github.com/google/maxtext/blob/main/preflight.sh) will install the `numactl` dependency for you, so you can use it directly. Here is an example:

```
-bash preflight.sh PLATFORM=GCE && numactl --membind 0 --cpunodebind=0 python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml run_name=$YOUR_JOB_NAME
+bash preflight.sh PLATFORM=GCE && numactl --membind 0 --cpunodebind=0 python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml run_name=${YOUR_JOB_NAME?}
```

For GKE,
`numactl` should be built into your docker image from [maxtext_tpu_dependencies.Dockerfile](https://github.com/google/maxtext/blob/main/dependencies/dockerfiles/maxtext_tpu_dependencies.Dockerfile), so you can use it directly if you built the MaxText docker image. Here is an example:

```
-bash preflight.sh PLATFORM=GKE && numactl --membind 0 --cpunodebind=0 python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml run_name=$YOUR_JOB_NAME
+bash preflight.sh PLATFORM=GKE && numactl --membind 0 --cpunodebind=0 python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml run_name=${YOUR_JOB_NAME?}
```

1. `numactl`: This is the command-line tool used for controlling NUMA policy for processes or shared memory. It's particularly useful on multi-socket systems where memory locality can impact performance.
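Every hunk in this change set makes the same substitution: bare `$VAR` expansions become `${VAR?}`. Here is a minimal standalone sketch of what that buys (the variable name is borrowed from the examples above; the script itself is illustrative and not part of any MaxText file):

```shell
#!/usr/bin/env bash
# ${VAR?} is POSIX parameter expansion: if VAR is unset, the shell prints an
# error naming the variable and aborts the command, instead of silently
# substituting an empty string the way plain $VAR does.

unset YOUR_JOB_NAME

# Plain expansion: runs "successfully" with an empty run_name (easy to miss).
echo "run_name=$YOUR_JOB_NAME"

# ${VAR?} expansion: the subshell aborts before echo ever runs.
if ! (echo "run_name=${YOUR_JOB_NAME?}") 2>/dev/null; then
  echo "aborted: YOUR_JOB_NAME is unset"
fi

# ${VAR:?msg} additionally rejects set-but-empty values, with a custom message.
YOUR_JOB_NAME=""
(: "${YOUR_JOB_NAME:?must not be empty}") 2>/dev/null || echo "empty value rejected"
```

The change is purely fail-fast: once the variable is actually exported, `${VAR?}` expands exactly like `$VAR`.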
4 changes: 2 additions & 2 deletions benchmarks/Getting_Started_Benchmarking.md
@@ -14,7 +14,7 @@ Two approaches are here:
CLUSTER=my-cluster
ZONE=my-zone
PROJECT=my-project
-python3 -m benchmarks.benchmark_runner xpk --project $PROJECT --zone $ZONE --cluster_name $CLUSTER --device_type v6e-256 --base_output_directory gs://maxtext-experiments-tpem/ --num_steps=5
+python3 -m benchmarks.benchmark_runner xpk --project ${PROJECT?} --zone ${ZONE?} --cluster_name ${CLUSTER?} --device_type v6e-256 --base_output_directory gs://maxtext-experiments-tpem/ --num_steps=5
```

```shell
@@ -23,7 +23,7 @@ export RUNNER=us-docker.pkg.dev/path/to/maxtext_runner
export PROXY_IMAGE=us-docker.pkg.dev/cloud-tpu-v2-images/pathways/proxy_server
export SERVER_IMAGE=us-docker.pkg.dev/cloud-tpu-v2-images/pathways/server

-python3 -m benchmarks.benchmark_runner xpk --project $PROJECT --zone $ZONE --cluster_name $CLUSTER --device_type v6e-256 --base_output_directory gs://maxtext-experiments-tpem/ --num_steps=5 --pathways_server_image="${SERVER_IMAGE}" --pathways_proxy_server_image="${PROXY_IMAGE}" --pathways_runner_image="${RUNNER}"
+python3 -m benchmarks.benchmark_runner xpk --project ${PROJECT?} --zone ${ZONE?} --cluster_name ${CLUSTER?} --device_type v6e-256 --base_output_directory gs://maxtext-experiments-tpem/ --num_steps=5 --pathways_server_image="${SERVER_IMAGE?}" --pathways_proxy_server_image="${PROXY_IMAGE?}" --pathways_runner_image="${RUNNER?}"
```

```shell
28 changes: 14 additions & 14 deletions benchmarks/api_server/README.md
@@ -131,34 +131,34 @@ export ICI_EXPERT_PARALLELISM=2
# 2. Define the Command to Run on the Cluster
# ==============================================================================
# This command installs dependencies and then starts the server.
-CMD="export HF_TOKEN=${HF_TOKEN} && \
+CMD="export HF_TOKEN=${HF_TOKEN?} && \
pip install --upgrade pip && \
pip install -r benchmarks/api_server/requirements.txt && \
bash benchmarks/api_server/start_server.sh \
maxtext/configs/base.yml \
-model_name="${MODEL_NAME}" \
-tokenizer_path="${TOKENIZER_PATH}" \
-load_parameters_path="${LOAD_PARAMETERS_PATH}" \
-per_device_batch_size=${PER_DEVICE_BATCH_SIZE} \
-ici_tensor_parallelism=${ICI_TENSOR_PARALLELISM} \
-ici_expert_parallelism=${ICI_EXPERT_PARALLELISM} \
+model_name="${MODEL_NAME?}" \
+tokenizer_path="${TOKENIZER_PATH?}" \
+load_parameters_path="${LOAD_PARAMETERS_PATH?}" \
+per_device_batch_size=${PER_DEVICE_BATCH_SIZE?} \
+ici_tensor_parallelism=${ICI_TENSOR_PARALLELISM?} \
+ici_expert_parallelism=${ICI_EXPERT_PARALLELISM?} \
tokenizer_type=\"huggingface\" \
return_log_prob=True"


# ==============================================================================
# 3. Launch the Workload
# ==============================================================================
-echo "Launching workload ${RUNNAME}..."
-xpk workload create --workload "${RUNNAME}" \
---base-docker-image "${DOCKER_IMAGE}" \
---command "${CMD}" \
+echo "Launching workload ${RUNNAME?}..."
+xpk workload create --workload "${RUNNAME?}" \
+--base-docker-image "${DOCKER_IMAGE?}" \
+--command "${CMD?}" \
--num-slices=1 \
---cluster "${CLUSTER}" --device-type "${DEVICE_TYPE}" --project "${PROJECT}" --zone "${ZONE}"
+--cluster "${CLUSTER?}" --device-type "${DEVICE_TYPE?}" --project "${PROJECT?}" --zone "${ZONE?}"

-echo "Workload ${RUNNAME} created."
+echo "Workload ${RUNNAME?} created."
echo "Use the following command to connect:"
-echo "bash benchmarks/api_server/port_forward_xpk.sh job_name=${RUNNAME} project=${PROJECT} zone=${ZONE} cluster=${CLUSTER}"
+echo "bash benchmarks/api_server/port_forward_xpk.sh job_name=${RUNNAME?} project=${PROJECT?} zone=${ZONE?} cluster=${CLUSTER?}"
```

### 2. Launch the Workload
2 changes: 1 addition & 1 deletion benchmarks/maxtest/getting_started.md
@@ -55,7 +55,7 @@ If we want to pass custom flags this is also possible by specifying
Useful for checking for the existence of SDC (silent data corruption) on TPU hardware.

```
-bash maxtest.sh --project $TPU_PROJECT --cluster $CLUSTER --region $REGION --nodepool $NODEPOOL_NAME --num_workers $NUM_WORKERS --libtpu_args '--xla_tpu_enable_sdc_checker'
+bash maxtest.sh --project ${TPU_PROJECT?} --cluster ${CLUSTER?} --region ${REGION?} --nodepool ${NODEPOOL_NAME?} --num_workers ${NUM_WORKERS?} --libtpu_args '--xla_tpu_enable_sdc_checker'
```


16 changes: 8 additions & 8 deletions docs/guides/checkpointing_solutions/convert_checkpoint.md
@@ -37,8 +37,8 @@ First, make sure python3 virtual environment for MaxText is set up and enabled.
```bash
export VENV_NAME=<your virtual env name> # e.g., maxtext_venv
pip install uv
-uv venv --python 3.12 --seed $VENV_NAME
-source $VENV_NAME/bin/activate
+uv venv --python 3.12 --seed ${VENV_NAME?}
+source ${VENV_NAME?}/bin/activate
```

Second, ensure you have the necessary dependencies installed (PyTorch for the conversion script).
@@ -68,16 +68,16 @@ Finally, run below command to complete the conversion

```bash
python3 -m maxtext.checkpoint_conversion.to_maxtext maxtext/configs/base.yml \
-model_name=${HF_MODEL} \
-hf_access_token=${HF_TOKEN} \
-base_output_directory=${MODEL_CHECKPOINT_DIRECTORY} \
+model_name=${HF_MODEL?} \
+hf_access_token=${HF_TOKEN?} \
+base_output_directory=${MODEL_CHECKPOINT_DIRECTORY?} \
scan_layers=True \
use_multimodal=false \
hardware=cpu \
skip_jax_distributed_system=true \
-checkpoint_storage_use_zarr3=${USE_ZARR3} \
-checkpoint_storage_use_ocdbt=${USE_OCDBT} \
---lazy_load_tensors=${LAZY_LOAD_TENSORS}
+checkpoint_storage_use_zarr3=${USE_ZARR3?} \
+checkpoint_storage_use_ocdbt=${USE_OCDBT?} \
+--lazy_load_tensors=${LAZY_LOAD_TENSORS?}
```

**Key arguments:**
32 changes: 16 additions & 16 deletions docs/guides/checkpointing_solutions/emergency_checkpointing.md
@@ -75,8 +75,8 @@ In this scenario, you should configure each pod in that slice with a ramdisk of
```
2. **Configure gcloud:**
```bash
-gcloud config set project ${PROJECT_ID}
-gcloud config set compute/zone ${ZONE}
+gcloud config set project ${PROJECT_ID?}
+gcloud config set compute/zone ${ZONE?}
```
3. **Clone the XPK repository:**
```bash
@@ -85,15 +85,15 @@ In this scenario, you should configure each pod in that slice with a ramdisk of
4. **Run the cluster creation command:**
```bash
python3 xpk/xpk.py cluster create \
---cluster ${CLUSTER_NAME} \
---cluster-cpu-machine-type=${MACHINE_TYPE} \
---num-slices=${NUM_SLICES} \
---tpu-type=${TPU_TYPE} \
+--cluster ${CLUSTER_NAME?} \
+--cluster-cpu-machine-type=${MACHINE_TYPE?} \
+--num-slices=${NUM_SLICES?} \
+--tpu-type=${TPU_TYPE?} \
--enable-mtc \
--enable-gcsfuse-csi-driver \
---mtc-ramdisk-size=${RAMDISK_SIZE} \
---mtc-gcs-bucket=${OUTPUT_PATH} \
---gke-version=${GKE_VERSION}
+--mtc-ramdisk-size=${RAMDISK_SIZE?} \
+--mtc-gcs-bucket=${OUTPUT_PATH?} \
+--gke-version=${GKE_VERSION?}
```

## MaxText configuration
@@ -150,12 +150,12 @@ The flags below would give the user access to the ramdisk in their workload:

```bash
python3 xpk/xpk.py workload create \
---cluster ${CLUSTER_NAME} \
---docker-image ${DOCKER_IMAGE} \
---workload ${WORKLOAD_NAME} \
---tpu-type=${TPU_TYPE} \
---num-slices=${NUM_SLICES} \
---ramdisk-directory=${RAMDISK_DIRECTORY} \
+--cluster ${CLUSTER_NAME?} \
+--docker-image ${DOCKER_IMAGE?} \
+--workload ${WORKLOAD_NAME?} \
+--tpu-type=${TPU_TYPE?} \
+--num-slices=${NUM_SLICES?} \
+--ramdisk-directory=${RAMDISK_DIRECTORY?} \
--mtc-enabled \
---command "python3 src/maxtext/trainers/pre_train/train.py src/maxtext/configs/base.yml base_output_directory=$OUTPUT_PATH dataset_path=$DATA_PATH steps=120 per_device_batch_size=6 enable_checkpoint_cloud_logger=True checkpoint_period=${CHECKPOINT_PEROID} enable_emergency_checkpoint=True local_checkpoint_period=${LOCAL_CHECKPOINT_PERIOD} local_checkpoint_directory=/${RAMDISK_DIRECTORY}"
+--command "python3 src/maxtext/trainers/pre_train/train.py src/maxtext/configs/base.yml base_output_directory=${OUTPUT_PATH?} dataset_path=${DATA_PATH?} steps=120 per_device_batch_size=6 enable_checkpoint_cloud_logger=True checkpoint_period=${CHECKPOINT_PEROID?} enable_emergency_checkpoint=True local_checkpoint_period=${LOCAL_CHECKPOINT_PERIOD?} local_checkpoint_directory=/${RAMDISK_DIRECTORY?}"
```
32 changes: 16 additions & 16 deletions docs/guides/checkpointing_solutions/multi_tier_checkpointing.md
@@ -105,8 +105,8 @@ In this scenario, you should configure each pod in that slice with a ramdisk of
```
2. **Configure gcloud:**
```bash
-gcloud config set project ${PROJECT_ID}
-gcloud config set compute/zone ${ZONE}
+gcloud config set project ${PROJECT_ID?}
+gcloud config set compute/zone ${ZONE?}
```
3. **Clone the XPK repository:**
```bash
@@ -115,15 +115,15 @@ In this scenario, you should configure each pod in that slice with a ramdisk of
4. **Run the cluster creation command:**
```bash
python3 xpk/xpk.py cluster create \
---cluster ${CLUSTER_NAME} \
---cluster-cpu-machine-type=${MACHINE_TYPE} \
---num-slices=${NUM_SLICES} \
---tpu-type=${TPU_TYPE} \
+--cluster ${CLUSTER_NAME?} \
+--cluster-cpu-machine-type=${MACHINE_TYPE?} \
+--num-slices=${NUM_SLICES?} \
+--tpu-type=${TPU_TYPE?} \
--enable-mtc \
--enable-gcsfuse-csi-driver \
---mtc-ramdisk-size=${RAMDISK_SIZE} \
---mtc-gcs-bucket=${OUTPUT_PATH} \
---gke-version=${GKE_VERSION}
+--mtc-ramdisk-size=${RAMDISK_SIZE?} \
+--mtc-gcs-bucket=${OUTPUT_PATH?} \
+--gke-version=${GKE_VERSION?}
```

## MaxText configuration
@@ -179,12 +179,12 @@ The flags below would give the user access to the ramdisk in their workload:

```bash
python3 xpk/xpk.py workload create \
---cluster ${CLUSTER_NAME} \
---docker-image ${DOCKER_IMAGE} \
---workload ${WORKLOAD_NAME} \
---tpu-type=${TPU_TYPE} \
---num-slices=${NUM_SLICES} \
---ramdisk-directory=${RAMDISK_DIRECTORY} \
+--cluster ${CLUSTER_NAME?} \
+--docker-image ${DOCKER_IMAGE?} \
+--workload ${WORKLOAD_NAME?} \
+--tpu-type=${TPU_TYPE?} \
+--num-slices=${NUM_SLICES?} \
+--ramdisk-directory=${RAMDISK_DIRECTORY?} \
--mtc-enabled \
---command "python3 src/maxtext/trainers/pre_train/train.py src/maxtext/configs/base.yml base_output_directory=$OUTPUT_PATH dataset_path=$DATA_PATH steps=120 per_device_batch_size=6 enable_checkpoint_cloud_logger=True checkpoint_period=${CHECKPOINT_PEROID} enable_multi_tier_checkpointing=True local_checkpoint_period=${LOCAL_CHECKPOINT_PERIOD} local_checkpoint_directory=/${RAMDISK_DIRECTORY} multi_tier_checkpointing_backup_interval_minutes=${MULTI_TIER_CHECKPOINTING_BACKUP_INT_MIN}"
+--command "python3 src/maxtext/trainers/pre_train/train.py src/maxtext/configs/base.yml base_output_directory=${OUTPUT_PATH?} dataset_path=${DATA_PATH?} steps=120 per_device_batch_size=6 enable_checkpoint_cloud_logger=True checkpoint_period=${CHECKPOINT_PEROID?} enable_multi_tier_checkpointing=True local_checkpoint_period=${LOCAL_CHECKPOINT_PERIOD?} local_checkpoint_directory=/${RAMDISK_DIRECTORY?} multi_tier_checkpointing_backup_interval_minutes=${MULTI_TIER_CHECKPOINTING_BACKUP_INT_MIN?}"
```
6 changes: 3 additions & 3 deletions docs/guides/data_input_pipeline/data_input_grain.md
@@ -38,9 +38,9 @@ Grain ensures determinism in data input pipelines by saving the pipeline's state

```sh
bash tools/setup/setup_gcsfuse.sh \
-DATASET_GCS_BUCKET=$BUCKET_NAME \
-MOUNT_PATH=$MOUNT_PATH \
-[FILE_PATH=$MOUNT_PATH/my_dataset]
+DATASET_GCS_BUCKET=${BUCKET_NAME?} \
+MOUNT_PATH=${MOUNT_PATH?} \
+[FILE_PATH=${MOUNT_PATH?}/my_dataset]
```

Note that `FILE_PATH` is optional; when provided, the script runs `ls -R` for pre-filling the metadata cache (see ["Performance tuning best practices" on the Google Cloud documentation](https://cloud.google.com/storage/docs/cloud-storage-fuse/performance#improve-first-time-reads)).
12 changes: 6 additions & 6 deletions docs/guides/monitoring_and_debugging/monitor_goodput.md
@@ -89,17 +89,17 @@ Please use a unique workload name, unless you intend to monitor cumulative Goodp
MaxText enables Goodput recording and monitoring by default with `enable_goodput_recording=True` and `monitor_goodput=True`. You can configure the goodput upload frequency by setting `goodput_upload_interval_seconds`.

```bash
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml base_output_directory=$OUTPUT_PATH \
-dataset_path=$DATA_PATH run_name=goodput-test-run steps=200 goodput_upload_interval_seconds=30
+python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml base_output_directory=${OUTPUT_PATH?} \
+dataset_path=${DATA_PATH?} run_name=goodput-test-run steps=200 goodput_upload_interval_seconds=30
```

#### How to monitor step time deviation

MaxText enables step time deviation monitoring by default with `monitor_step_time_deviation=True`. You can configure the upload frequency by setting `step_deviation_interval_seconds`.

```bash
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml base_output_directory=$OUTPUT_PATH \
-dataset_path=$DATA_PATH run_name=goodput-test-run steps=200 step_deviation_interval_seconds=30
+python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml base_output_directory=${OUTPUT_PATH?} \
+dataset_path=${DATA_PATH?} run_name=goodput-test-run steps=200 step_deviation_interval_seconds=30
```

#### How to enable Pathways Goodput
@@ -111,7 +111,7 @@ Enabling `enable_pathways_goodput` turns on Goodput measurement for Pathways wor
```

```bash
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml base_output_directory=$OUTPUT_PATH dataset_path=$DATA_PATH \
+python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml base_output_directory=${OUTPUT_PATH?} dataset_path=${DATA_PATH?} \
run_name=goodput-test-run steps=200 goodput_upload_interval_seconds=30 enable_pathways_goodput=True
```

@@ -168,7 +168,7 @@ and `enable_gcp_step_deviation_metrics` to `False` for disabling step deviation
metrics.

```bash
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml base_output_directory=$OUTPUT_PATH dataset_path=$DATA_PATH \
+python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml base_output_directory=${OUTPUT_PATH?} dataset_path=${DATA_PATH?} \
run_name=goodput-test-run steps=200 goodput_upload_interval_seconds=30 enable_gcp_goodput_metrics=False \
enable_gcp_step_deviation_metrics=False
```
4 changes: 2 additions & 2 deletions docs/reference/core_concepts/quantization.md
@@ -87,7 +87,7 @@ Common options for the `quantization` flag when using Qwix include:
Here is an example of how to run a training job with int8 quantization enabled via Qwix:

```bash
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml run_name=$YOUR_JOB_NAME base_output_directory=gs://<my-bucket> dataset_type=synthetic use_qwix_quantization=true quantization='int8'
+python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml run_name=${YOUR_JOB_NAME?} base_output_directory=gs://<my-bucket> dataset_type=synthetic use_qwix_quantization=true quantization='int8'
```

#### The Qwix Interception API
@@ -142,7 +142,7 @@ When using AQT, you can pass one of the following values to the `quantization` f
#### Example command for AQT

```bash
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml run_name=$YOUR_JOB_NAME base_output_directory=gs://<my-bucket> dataset_type=synthetic use_qwix_quantization=false quantization='int8'
+python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml run_name=${YOUR_JOB_NAME?} base_output_directory=gs://<my-bucket> dataset_type=synthetic use_qwix_quantization=false quantization='int8'
```

Note that `use_qwix_quantization` is not set to `True`.
12 changes: 6 additions & 6 deletions docs/run_maxtext/run_maxtext_localhost.md
@@ -59,7 +59,7 @@ After the installation is complete, run a short training job using synthetic dat

```bash
python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml \
-run_name=$YOUR_JOB_NAME \
+run_name=${YOUR_JOB_NAME?} \
base_output_directory=gs://<my-bucket> \
dataset_type=synthetic \
steps=10
@@ -73,7 +73,7 @@ To demonstrate model output, run the following command:

```bash
python3 -m maxtext.inference.decode src/maxtext/configs/base.yml \
-run_name=$YOUR_JOB_NAME \
+run_name=${YOUR_JOB_NAME?} \
base_output_directory=gs://<my-bucket> \
per_device_batch_size=1
```
@@ -94,7 +94,7 @@ To use a pre-configured model for TPUs, you override the `model_name` parameter,
```bash
python3 -m maxtext.trainers.pre_train.train maxtext/configs/base.yml \
model_name=llama3-8b \
-run_name=$YOUR_JOB_NAME \
+run_name=${YOUR_JOB_NAME?} \
base_output_directory=gs://<my-bucket> \
dataset_type=synthetic \
steps=10
@@ -108,7 +108,7 @@
```bash
python3 -m maxtext.trainers.pre_train.train maxtext/configs/base.yml \
model_name=qwen3-4b \
-run_name=$YOUR_JOB_NAME \
+run_name=${YOUR_JOB_NAME?} \
base_output_directory=gs://<my-bucket> \
dataset_type=synthetic \
steps=10
@@ -125,7 +125,7 @@ To use a GPU-optimized configuration, you should specify the path to the model's

```bash
python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/gpu/models/mixtral_8x7b.yml \
-run_name=$YOUR_JOB_NAME \
+run_name=${YOUR_JOB_NAME?} \
base_output_directory=gs://<my-bucket> \
dataset_type=synthetic \
steps=10
@@ -140,7 +140,7 @@ This will load `gpu/mixtral_8x7b.yml`, which inherits from `base.yml`.

```bash
python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/gpu/models/llama3-8b.yml \
-run_name=$YOUR_JOB_NAME \
+run_name=${YOUR_JOB_NAME?} \
base_output_directory=gs://<my-bucket> \
dataset_type=synthetic \
steps=10