# Pre-train BERT on GLUE MRPC Dataset using Accelerate

This notebook shows how to pre-train BERT on the GLUE MRPC dataset using the [Hugging Face Accelerate](https://github.com/huggingface/accelerate) library.

## Setup and Imports

In [None]:
! pip install kubernetes
! pip install boto3

In [None]:
import os
import subprocess
import sys

# Set working directory
os.chdir(os.path.expanduser('~/amazon-eks-machine-learning-with-terraform-and-kubeflow'))
print(f"Working directory: {os.getcwd()}")

# Get the src directory
src_dir = os.path.join(os.getcwd(), "src")
sys.path.insert(0, src_dir)

from k8s.utils import wait_for_helm_release_pods

# Get notebook directory
notebook_dir = os.path.join(os.getcwd(), 'examples', 'training', 'accelerate', 'bert-glue-mrpc')
print(f"Notebook directory: {notebook_dir}")

# Initialize key variables
release_name = 'accel-bert'
namespace = 'kubeflow-user-example-com'

## Step 1: Launch Pre-training

In [None]:
cmd = [
    'helm', 'install', '--debug', release_name,
    'charts/machine-learning/training/pytorchjob-elastic',
    '-f', f'{notebook_dir}/pretrain.yaml',
    '-n', namespace
]

result = subprocess.run(cmd, capture_output=True, text=True)
print(result.stdout)
if result.stderr:
    print("STDERR:", result.stderr)
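
The cell above prints the output of `helm install` but does not stop the notebook if the command fails, so a failed install would only surface later while waiting for pods. A minimal sketch of a wrapper that also checks the exit code is shown here; `run_cmd` is a hypothetical helper name, not part of this repository.

```python
import subprocess


def run_cmd(cmd, check=True):
    """Run a command, echo its output, and optionally raise on a non-zero exit code.

    `run_cmd` is a hypothetical helper for illustration only.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.stdout:
        print(result.stdout)
    if result.stderr:
        # helm --debug writes diagnostics to stderr even on success
        print("STDERR:", result.stderr)
    if check and result.returncode != 0:
        raise RuntimeError(
            f"Command {cmd[0]!r} exited with code {result.returncode}"
        )
    return result
```

With this in place, `run_cmd(cmd)` could stand in for the `subprocess.run` call and the two print statements, for both the install and uninstall cells.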

In [None]:
# Wait for pre-training to complete
wait_for_helm_release_pods(release_name, namespace)
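
The waiting itself is handled by the repository's `wait_for_helm_release_pods` helper, whose internals are not shown in this notebook. To inspect progress by hand, the sketch below summarizes pod phases from the JSON that `kubectl get pods -o json` prints. The function names `summarize_pod_phases` and `all_pods_finished` are hypothetical, and this is not a reimplementation of the repository helper.

```python
import json
from collections import Counter


def summarize_pod_phases(pod_list_json):
    """Count pods by phase from the JSON printed by `kubectl get pods -o json`."""
    items = json.loads(pod_list_json).get("items", [])
    return Counter(
        item.get("status", {}).get("phase", "Unknown") for item in items
    )


def all_pods_finished(phases):
    """Return True when at least one pod exists and every pod is Succeeded or Failed."""
    total = sum(phases.values())
    return total > 0 and phases["Succeeded"] + phases["Failed"] == total
```

For example, the text captured from `kubectl get pods -n kubeflow-user-example-com -o json` could be passed to `summarize_pod_phases`. Narrowing the listing to this release with a label selector depends on the labels the chart applies, which are not shown here.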

In [None]:
# Uninstall the training job
cmd = ['helm', 'uninstall', release_name, '-n', namespace]
result = subprocess.run(cmd, capture_output=True, text=True)
print(result.stdout)
if result.stderr:
    print("STDERR:", result.stderr)

## Output

To access the output stored on the EFS and FSx for Lustre file systems, attach the volumes to a utility pod and open a shell in it:

```bash
kubectl apply -f eks-cluster/utils/attach-pvc.yaml -n kubeflow
kubectl exec -it -n kubeflow attach-pvc -- /bin/bash
```

### Logs
Pre-training logs are available in the `/efs/home/accel-bert/logs` folder.

### Checkpoints
Pre-training checkpoints are available in the `/efs/home/accel-bert/checkpoints` folder.
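
To see which checkpoints were written most recently, a small sketch like the following lists directory entries by modification time. It assumes a Python interpreter is available wherever the EFS volume is mounted, and `newest_entries` is a hypothetical helper name. The layout inside the checkpoints folder is not documented here, so the function makes no assumption about file names.

```python
import os


def newest_entries(directory, limit=5):
    """Return up to `limit` entry names in `directory`, most recently modified first."""
    with os.scandir(directory) as entries:
        # Sort while the scandir iterator is still open
        ranked = sorted(entries, key=lambda e: e.stat().st_mtime, reverse=True)
    return [entry.name for entry in ranked[:limit]]
```

Calling `newest_entries('/efs/home/accel-bert/checkpoints')` would then return the names of the latest entries in that folder.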

### S3 Backup
Any content stored under `/fsx` is automatically backed up to your configured S3 bucket.