## Run Model Training


The next cell changes the working directory on the server to `ml-ops-project/code/model-training/docker_training`, where the Docker files (including the Compose files used below) are stored.

In [None]:
cd ml-ops-project/code/model-training/docker_training

This optional command starts the MLflow services in detached mode (`-d`).

In [None]:
docker compose -f docker-compose-mlflow.yaml up -d

This command starts the Ray cluster services in detached mode (`-d`).

In [None]:
docker compose -f docker-compose-ray.yaml up -d

The following command runs a new Docker container in detached mode (`-d`) with GPU access (`--gpus all`). 
- It maps port 8888 of the host to port 8888 of the container (typically for Jupyter).
- It mounts the host directory `/home/cc/ml-ops-project` to `/home/jovyan/work` inside the container, making your project files accessible.
- It names the container `legalai-ray-env`.
- It uses the image `jupyter-mlflow:latest`.

In [None]:
sudo docker run -d --gpus all \
    -p 8888:8888 \
    -v /home/cc/ml-ops-project:/home/jovyan/work \
    --name legalai-ray-env \
    jupyter-mlflow:latest

This command lists all running Docker containers on the server. Use it to confirm that the services started by Docker Compose (MLflow, Ray) and the manually started container (`legalai-ray-env`) are up.

In [None]:
sudo docker ps
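To check for specific containers rather than scanning the full table by eye, a small Bash helper can compare the output of `docker ps --format '{{.Names}}'` against a list of expected names. This is a minimal sketch: `missing_containers` is a hypothetical helper, and only `legalai-ray-env` is a name taken from this notebook (the names Docker Compose assigns depend on the project and service names in the Compose files, which are not shown here).

```bash
# Hypothetical helper: read container names (one per line) from stdin and
# print every expected name that is missing. Returns 1 if any are missing.
missing_containers() {
  local running name status=0
  running="$(cat)"   # e.g. output of: docker ps --format '{{.Names}}'
  for name in "$@"; do
    # -x: match the whole line; -F: treat the name literally, not as a regex
    if ! grep -qxF -- "$name" <<< "$running"; then
      echo "$name"
      status=1
    fi
  done
  return "$status"
}

# Usage: prints nothing and succeeds if the training container is running.
# sudo docker ps --format '{{.Names}}' | missing_containers legalai-ray-env
```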

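This command opens an interactive bash shell inside the running `legalai-ray-env` container. Because `-it` requests an interactive terminal, run it from a terminal session on the server rather than as a non-interactive notebook cell.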

In [None]:
docker exec -it legalai-ray-env /bin/bash

**The following commands are intended to be run *inside* the Docker container that we just `exec`'d into.**

This command sources the OpenStack RC file at `/home/jovyan/work/app-cred-legalai-model-access-openrc.sh` (a path inside the container). Sourcing runs the script in the current shell, so the environment variables it sets (such as `OS_AUTH_URL` and `OS_APPLICATION_CREDENTIAL_ID`) stay available to the Python training script, which needs them to authenticate with OpenStack Swift object storage.

In [None]:
source /home/jovyan/work/app-cred-legalai-model-access-openrc.sh
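If the RC file is missing or incomplete, the problem may only surface once the training script first contacts Swift. A minimal sketch of a pre-flight check, using a hypothetical `require_env` helper that reports any unset or empty variables. `OS_APPLICATION_CREDENTIAL_SECRET` in the usage line is an assumption based on the usual contents of an OpenStack application-credential RC file and may need adjusting.

```bash
# Hypothetical helper: report each named variable that is unset or empty.
# Returns 1 if any are missing, so it can gate the training command.
require_env() {
  local var status=0
  for var in "$@"; do
    # ${!var} is Bash indirect expansion: the value of the variable named by $var
    if [ -z "${!var:-}" ]; then
      echo "missing environment variable: $var" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage (OS_APPLICATION_CREDENTIAL_SECRET is an assumed variable name):
# require_env OS_AUTH_URL OS_APPLICATION_CREDENTIAL_ID OS_APPLICATION_CREDENTIAL_SECRET
```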

Set the `MLFLOW_TRACKING_URI` environment variable *inside the container*. It points to our centralized MLflow server running on KVM@TACC at `http://129.114.27.166:8000`. The Python training script will use this URI to log experiments and metrics.

In [None]:
export MLFLOW_TRACKING_URI="http://129.114.27.166:8000"

Inside the container, change to the directory that holds the training script (`legal_bert_triplet_finetune_a100.py`): `/home/jovyan/work/code/model-training/training_script/`.

In [None]:
cd /home/jovyan/work/code/model-training/training_script/

This is the main command. It runs the Python training script (`legal_bert_triplet_finetune_a100.py`) *inside the container* with the following arguments:
- Paths for the training data, a temporary directory for Swift downloads, and the output directory. These are absolute paths in the container's filesystem, under the `/home/jovyan/work` mount point. The base model is given as a `swift://` URI.
- Training hyperparameters: number of epochs (1) and batch size (8).
- MLflow configuration (tracking URI, experiment name, run name).
- Evaluation settings (dev split ratio, evaluation steps, base-model evaluation, random seed) and the flag for uploading the fine-tuned model to Swift, along with the Swift container name and upload prefix.

**Note:** If you started a Ray cluster with `docker compose -f docker-compose-ray.yaml up -d`, running the script directly like this does *not* use that cluster for distributed training or job management. To use Ray, you would typically submit the script with `ray job submit` from the host machine.
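As a hedged sketch of the `ray job submit` route mentioned in the note: the block below assembles the command as a Bash array and prints it for review, running it only when `SUBMIT=1` is set. It assumes the `ray` CLI is installed where this runs and that the Ray job server is reachable at `RAY_ADDRESS`, defaulting here to `http://127.0.0.1:8265` (Ray's default dashboard port). Whether `docker-compose-ray.yaml` publishes that port is not shown. `--working-dir` uploads the given local directory to the cluster, and the data and output paths passed to the script would need to exist on the Ray workers, which this sketch does not check. `RAY_ADDRESS`, `TRAINING_DIR`, and `SUBMIT` are illustrative variable names.

```bash
# Hypothetical sketch: submit the training script as a Ray job instead of running it directly.
# Assumption: the Ray job server listens on Ray's default dashboard port (8265).
RAY_ADDRESS="${RAY_ADDRESS:-http://127.0.0.1:8265}"
# Directory containing legal_bert_triplet_finetune_a100.py; it is uploaded to the cluster.
TRAINING_DIR="${TRAINING_DIR:-.}"

# Everything after `--` is the job's entrypoint command.
# Abbreviated argument list; the remaining flags from the training cell would be appended.
submit_cmd=(
  ray job submit
  --address "$RAY_ADDRESS"
  --working-dir "$TRAINING_DIR"
  --
  python3 legal_bert_triplet_finetune_a100.py --num_epochs 1 --batch_size 8
)

# Print the command for review; it only runs when SUBMIT=1 is set.
printf '%q ' "${submit_cmd[@]}"
echo
if [ "${SUBMIT:-0}" = "1" ]; then
  "${submit_cmd[@]}"
fi
```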

In [None]:
python3 legal_bert_triplet_finetune_a100.py \
    --data_path "/home/jovyan/work/code/model-training/training_data/legal_data.jsonl" \
    --model_name_or_path "swift://object-store-persist-group36/model/Legal-BERT/" \
    --local_model_temp_dir "/home/jovyan/work/temp_swift_downloads_docker" \
    --output_dir "/home/jovyan/work/sbert_output_swift_docker" \
    --num_epochs 1 \
    --batch_size 8 \
    --mlflow_tracking_uri "${MLFLOW_TRACKING_URI}" \
    --mlflow_experiment_name "LegalAI-Swift-Sklearn-In-Docker" \
    --mlflow_run_name "docker-run-swift-sklearn-$(date +%Y%m%d-%H%M%S)-v8-test" \
    --dev_split_ratio 0.2 \
    --evaluation_steps 50 \
    --evaluate_base_model \
    --random_seed 42 \
    --upload_model_to_swift \
    --swift_container_name "object-store-persist-group36" \
    --swift_upload_prefix "models/my_finetuned_legal_bert/run_$(date +%Y%m%d-%H%M%S)_v8"
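In the cell above, `$(date +%Y%m%d-%H%M%S)` is evaluated twice: once for the MLflow run name and once for the Swift upload prefix. The two calls usually agree, but they can differ by a second if the clock ticks over between them, which makes it harder to match a logged run to its uploaded model. A minimal sketch that captures the timestamp once (the variable names `RUN_TS`, `MLFLOW_RUN_NAME`, and `SWIFT_UPLOAD_PREFIX` are illustrative):

```bash
# Capture one timestamp and reuse it for both the MLflow run name and the Swift prefix.
RUN_TS="$(date +%Y%m%d-%H%M%S)"
MLFLOW_RUN_NAME="docker-run-swift-sklearn-${RUN_TS}-v8-test"
SWIFT_UPLOAD_PREFIX="models/my_finetuned_legal_bert/run_${RUN_TS}_v8"

echo "MLflow run name: ${MLFLOW_RUN_NAME}"
echo "Swift prefix:    ${SWIFT_UPLOAD_PREFIX}"

# Then pass them to the training script in place of the inline $(date ...) calls:
#   --mlflow_run_name "${MLFLOW_RUN_NAME}" \
#   --swift_upload_prefix "${SWIFT_UPLOAD_PREFIX}"
```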