<img src="./images/DLI_Header.png" style="width: 400px;">

# 6.0 Automation Workflows/Pipelines of Fine-tuning 

The diagram below gives a high-level overview of the fine-tuning workflow and its interconnected components. Each component in the diagram corresponds to a [Cluster Workflow Template](https://argo-workflows.readthedocs.io/en/latest/cluster-workflow-templates/) in Argo Workflows. These components are combined into an overarching [Workflow Template](https://argo-workflows.readthedocs.io/en/latest/workflow-templates/), which keeps the structure modular and reusable. Argo Workflows then runs the fine-tuning process end to end without manual steps.


<center><img src="./images-dli/example-workflow.png" style="width: 800px;"></center>

The workflow involves the following steps:

1. Create a dataset in Nemo Datastore.
2. Upload files to the created dataset.
3. Register the dataset in Nemo Entity Store.
4. Create a customization job in Nemo Customizer.
5. Track the customization job status until it is completed.
6. Create the evaluation target and configuration, then run an evaluation on the fine-tuned model.
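
Step 5 is essentially a polling loop. The sketch below illustrates the idea under stated assumptions: the status is fetched by a caller-supplied function, and the terminal state names `completed` and `failed` are illustrative, not taken from the component's source.

```python
import time
from typing import Callable

# Terminal state names are assumptions for illustration only.
TERMINAL_STATES = {"completed", "failed"}

def wait_for_job(get_status: Callable[[], str],
                 poll_seconds: float = 30.0,
                 timeout_seconds: float = 3600.0) -> str:
    """Poll get_status() until it returns a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still '{status}' after {timeout_seconds}s")
        time.sleep(poll_seconds)
```

In the real component, `get_status` would wrap a request to the Nemo Customizer API. Passing it in as a function keeps the loop itself independent of any endpoint details.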


This lab builds on the automation steps implemented in the previous lab with Argo Workflows. There, several steps required manual intervention; here, those steps are encapsulated in Cluster Workflow Templates. As a result, the entire fine-tuning workflow runs in Argo without manual steps and executes the same way every time.




## 6.0 Local Lab's Setup

<div class="alert alert-block alert-warning">

Enter your Git username in the cell below.
</div>

In [None]:
print("Please enter your Git username:")
git_username = input()

# Add assertions to validate the variables
assert git_username and git_username.strip(), "GitHub username cannot be empty"


In [None]:
git_repo_name="llmops-nvidia"
git_base_url="github.com"
applications_base_dir = "llmops-nvidia/applications"
secrets_base_dir = "llmops-nvidia/secrets"
ingress_base_dir = "llmops-nvidia/ingress"

In [None]:
git_repo_url=f"git@{git_base_url}:{git_username}/{git_repo_name}.git" 
git_repo_url_ssh=f"ssh://git@{git_base_url}/{git_username}/{git_repo_name}.git"
commit_name=git_username 
commit_email=f"{git_username}@llmops-nvidia"
print(git_repo_url)

### 6.0.1 Add datasets to MinIO 

We upload the datasets to the local MinIO instance so that the automation can use them. The automation pipeline refers to this bucket by name.

Expected output:

```
Bucket 'test-dataset' created.
Uploaded: /dli/task/dataset/validation/validation.jsonl -> validation/validation.jsonl
Uploaded: /dli/task/dataset/training/training.jsonl -> training/training.jsonl
Uploaded: /dli/task/dataset/testing/testing.jsonl -> testing/testing.jsonl
```
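
The object names in the output above are the file paths relative to the dataset directory. Here is a small sketch of that mapping using `pathlib`; the helper name `object_name_for` is hypothetical and not part of the lab code.

```python
from pathlib import PurePosixPath

def object_name_for(local_file: str, base_dir: str) -> str:
    """Return the bucket object name for a file located under base_dir."""
    # relative_to raises ValueError if local_file is not inside base_dir
    return PurePosixPath(local_file).relative_to(base_dir).as_posix()

print(object_name_for("/dli/task/dataset/training/training.jsonl", "/dli/task/dataset"))
# training/training.jsonl
```

Keeping the folder name in the object name is what later lets the pipeline find the `training`, `testing`, and `validation` folders after downloading the bucket.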

In [None]:
import os
from minio import Minio
from minio.error import S3Error

# MinIO Configuration
minio_url = "minio.local"  
access_key = "admin"
secret_key = "QuX2+P16SPc7"
bucket_name = "test-dataset"
local_base_path = "/dli/task/dataset"  # Local base directory containing the folders

# Initialize MinIO Client
client = Minio(
    minio_url,
    access_key=access_key,
    secret_key=secret_key,
    secure=False,  # Set to True if using HTTPS
)

# Ensure the bucket exists
try:
    if not client.bucket_exists(bucket_name):
        client.make_bucket(bucket_name)
        print(f"Bucket '{bucket_name}' created.")
except S3Error as e:
    print(f"Error checking/creating bucket: {e}")

# Upload all files from the base directory and subfolders
for root, _, files in os.walk(local_base_path):
    for file in files:
        local_file_path = os.path.join(root, file)  # Full local path
        relative_path = os.path.relpath(local_file_path, local_base_path)  # Preserve folder structure
        object_name = relative_path.replace("\\", "/")  # Ensure correct path format for MinIO

        try:
            client.fput_object(
                bucket_name,  # Bucket name
                object_name,  # Object path in MinIO
                local_file_path,  # Local file path
            )
            print(f"Uploaded: {local_file_path} -> {object_name}")
        except S3Error as e:
            print(f"Error uploading {local_file_path}: {e}")



## 6.1 LLM Workflow components

Each component in the diagram corresponds to a [Cluster Workflow Template](https://argo-workflows.readthedocs.io/en/latest/cluster-workflow-templates/) in Argo Workflows.


### 6.1.1 Creating Dataset Component (llm-workflows/components/1-create-dataset.yaml)

This step invokes the Nemo Datastore API to create a dataset repository. Expected behavior:

- A request is sent to the Nemo Datastore API.
- The dataset (for example, `example-dataset`) is created in the given namespace (`default` in this lab).
- The dataset gets a `repo_id` of the form `namespace/dataset_name`, which uniquely identifies it.

**Inputs:**
```yaml
apiVersion: argoproj.io/v1alpha1
kind: ClusterWorkflowTemplate
metadata:
  name: nemo-create-dataset-template
spec:
  templates:
    - name: create-dataset
      inputs:
        parameters:
          - name: nemo_datastore_endpoint
          - name: dataset_name
          - name: namespace
```
**Source code for creating the dataset:**

```yaml
            pip install huggingface_hub requests && \
            cat <<EOF > script.py
            import requests
            from huggingface_hub import HfApi
            from huggingface_hub import configure_http_backend

            def backend_factory():
                session = requests.Session()
                session.verify = False
                return session

            def create_repo(repo_id, repo_type):
                hf_endpoint = "{{inputs.parameters.nemo_datastore_endpoint}}/v1/hf"
                api = HfApi(endpoint=hf_endpoint, token="token")
                api.create_repo(repo_id=repo_id, repo_type=repo_type)

            if __name__ == "__main__":
                repo_id = "{{inputs.parameters.namespace}}/{{inputs.parameters.dataset_name}}"
                repo_type = "dataset"
                configure_http_backend(backend_factory=backend_factory)
                create_repo(repo_id, repo_type)
                f = open("/tmp/repo_id.txt", "w")
                f.write(repo_id)
                f.close()

```
**Output of the component:**

```yaml
      outputs:
        parameters:
        - name: repo_id 
          valueFrom:
            path: /tmp/repo_id.txt 
```
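
Argo reads the output parameter `repo_id` from the file that the script writes. The hand-off can be sketched locally as follows. The temporary directory stands in for `/tmp` in the container, and the dataset name is a made-up example value.

```python
import tempfile
from pathlib import Path

namespace, dataset_name = "default", "test-example"  # illustrative values
repo_id = f"{namespace}/{dataset_name}"

with tempfile.TemporaryDirectory() as tmp:
    out_file = Path(tmp) / "repo_id.txt"
    out_file.write_text(repo_id)          # what the script does
    handed_over = out_file.read_text()    # what Argo does via valueFrom.path

print(handed_over)  # default/test-example
```

Later steps can then consume the value as an input parameter instead of recomputing it.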


### 6.1.2 Upload Files to Dataset Component (llm-workflows/components/3-upload-files-dataset.yaml)

This component uploads the training, testing, and validation files to the Nemo Datastore.

The training, testing, and validation datasets (typically in `.jsonl` format) are first downloaded from MinIO into a local directory and then uploaded to the dataset repository.

**Inputs:**
```yaml
apiVersion: argoproj.io/v1alpha1
kind: ClusterWorkflowTemplate
metadata:
  name: nemo-upload-files-to-nemo-datastore-template
spec:
  templates:
    - name: upload-files-to-nemo-datastore
      inputs:
        parameters:
          - name: nemo_datastore_endpoint
          - name: repo_id
          - name: minio_url
          - name: minio_username
          - name: minio_password
          - name: minio_bucket_name
```


**Downloading Dataset:**

```yaml
            def download_dataset():
                # MinIO Configuration
                minio_url = "{{inputs.parameters.minio_url}}"
                access_key = "{{inputs.parameters.minio_username}}"
                secret_key = "{{inputs.parameters.minio_password}}"
                bucket_name = "{{inputs.parameters.minio_bucket_name}}"
                local_download_path = "/tmp"  # Local base directory for downloads
                
                # Initialize MinIO Client
                client = Minio(
                    minio_url,
                    access_key=access_key,
                    secret_key=secret_key,
                    secure=False,  # Set to True if using HTTPS
                )
                
                # Ensure the local download directory exists
                os.makedirs(local_download_path, exist_ok=True)
                
                # List and Download all objects
                objects = client.list_objects(bucket_name, recursive=True)
            
                for obj in objects:
                    object_name = obj.object_name  # Full path in MinIO
                    local_file_path = os.path.join(local_download_path, object_name)
            
                    # Create directories if they don’t exist
                    os.makedirs(os.path.dirname(local_file_path), exist_ok=True)
            
                    # Download the file
                    client.fget_object(bucket_name, object_name, local_file_path)
                    print(f"Downloaded: {object_name} -> {local_file_path}")
```

**Uploading Dataset:** 

```yaml
            def upload_datasets(repo_id, repo_type):
                hf_endpoint = "{{inputs.parameters.nemo_datastore_endpoint}}/v1/hf"
                hf_api = HfApi(endpoint=hf_endpoint, token="token")
                subprocess.run(["find", "/tmp"])

                training_data_folder = "/tmp/training"  # Path to the folder
                testing_data_folder = "/tmp/testing"  # Path to the folder
                validation_data_folder = "/tmp/validation"  # Path to the folder

                # Upload the folder
                hf_api.upload_folder(
                    folder_path=training_data_folder,
                    repo_id=repo_id,
                    repo_type=repo_type,
                    path_in_repo="training"
                )

                hf_api.upload_folder(
                    folder_path=testing_data_folder,
                    repo_id=repo_id,
                    repo_type=repo_type,
                    path_in_repo="testing"
                )

                commit_info = hf_api.upload_folder(
                    folder_path=validation_data_folder,
                    repo_id=repo_id,
                    repo_type=repo_type,
                    path_in_repo="validation"
                )
                print(commit_info)
```
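
The three `upload_folder` calls differ only in the split name, so they could also be written as a loop. The following is a sketch, not the component's actual code. `upload_splits` is a hypothetical helper, and `upload_folder` is passed in as an argument so the function can be exercised without a datastore; in the component it would be `hf_api.upload_folder`.

```python
from typing import Any, Callable

SPLITS = ("training", "testing", "validation")

def upload_splits(upload_folder: Callable[..., Any], repo_id: str,
                  repo_type: str = "dataset", base_dir: str = "/tmp") -> list:
    """Upload each split folder to the path of the same name inside the repo."""
    results = []
    for split in SPLITS:
        results.append(upload_folder(
            folder_path=f"{base_dir}/{split}",
            repo_id=repo_id,
            repo_type=repo_type,
            path_in_repo=split,
        ))
    return results
```

The returned list holds one result per split, so the caller can print or inspect each commit rather than only the last one.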


### 6.1.3 Other components
All other components follow the same pattern and are located in `llm-workflows/components`.

You can inspect them individually.

Example output: 

```
llm-workflows/components/
├── 1-create-dataset.yaml
├── 2-print-repo-id.yaml
├── 3-upload-files-dataset.yaml
├── 4-register-dataset.yaml
├── 5-create-customization.yaml
├── 6-track-customization.yaml
├── 7-create-evaluation-target.yaml
├── 8-create-evaluation-config.yaml
└── 9-create-evaluation.yaml
```


In [None]:
!tree llm-workflows/components/

## 6.2 Workflow Template (llm-workflows/workflow-templates/workflow-fine-tuning.yaml)
The individual components are combined to form a [Workflow Template](https://argo-workflows.readthedocs.io/en/latest/workflow-templates/), ensuring a modular and reusable structure.

### 6.2.1 Inputs to the workflow

```yaml
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: workflow-fine-tuning
spec:
  entrypoint: nemo-create-dataset
  arguments:                    # You can pass in arguments as normal
    parameters:
      - name: nemo_datastore_endpoint
        value: "http://nemo-datastore.nemo-datastore.svc.cluster.local:3000"
      - name: nemo_customizer_endpoint
        value: "http://nemo-customizer-api.nemo-customizer.svc.cluster.local:8000"
      - name: nemo_evaluator_endpoint
        value: "http://nemo-evaluator.nemo-evaluator.svc.cluster.local:7331"
      - name: nemo_entity_store_endpoint
        value: "http://nemo-entity-store.nemo-entity-store.svc.cluster.local:8000"
      - name: nim_internal_endpoint
        value: "http://meta-llama3-1-8b-instruct.llama3-1-8b-instruct.svc.cluster.local:8000"
      - name: dataset_name_prefix
        value: "test"
      - name: namespace
        value: "default"
      - name: new_model_name
        value: "example-model@v3"
      - name: project_name
        value: "example-project-workflow"
      - name: minio_url
        value: "minio-client.minio.svc.cluster.local:9000"
      - name: minio_username
        value: "admin"
      - name: minio_password
        value: "QuX2+P16SPc7"
      - name: minio_bucket_name
        value: "test-dataset"
```
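
Any of these defaults can be overridden at submission time. As a sketch, the following builds an `argo submit` command line with `-p name=value` overrides. It assumes the Argo CLI is installed and that the template lives in the `nemo-workflow-templates` namespace; the helper name and the override values are hypothetical.

```python
import shlex

def build_submit_command(template: str, namespace: str, overrides: dict) -> list:
    """Build the argv for submitting a workflow from a WorkflowTemplate."""
    cmd = ["argo", "submit", "-n", namespace, "--from", f"workflowtemplate/{template}"]
    for name, value in overrides.items():
        cmd += ["-p", f"{name}={value}"]
    return cmd

cmd = build_submit_command(
    "workflow-fine-tuning",
    "nemo-workflow-templates",
    {"dataset_name_prefix": "demo", "new_model_name": "example-model@v4"},
)
print(shlex.join(cmd))
```

Parameters that are not overridden keep the values defined in the template's `arguments` section.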

### 6.2.2 Steps in the workflow

```yaml

    - name: nemo-create-dataset
      steps:                              
        - - name: create-dataset
            templateRef:                  
              name: nemo-create-dataset-template
              template: create-dataset
              clusterScope: true
            arguments:                    
              parameters:
              - name: nemo_datastore_endpoint
                value: "{{workflow.parameters.nemo_datastore_endpoint}}"
              - name: dataset_name
                value: "{{workflow.parameters.dataset_name_prefix}}-{{workflow.name}}"
              - name: namespace
                value: "{{workflow.parameters.namespace}}"
[...]

        - - name: nemo-customization
            templateRef:                  
              name: nemo-customization-template
              template: nemo-customization
              clusterScope: true
            arguments:
              parameters:    
              - name: nemo_customizer_endpoint
                value: "{{workflow.parameters.nemo_customizer_endpoint}}"
              - name: dataset_name
                value: "{{workflow.parameters.dataset_name_prefix}}-{{workflow.name}}"
              - name: namespace
                value: "{{workflow.parameters.namespace}}"
              - name: project_name
                value: "{{workflow.parameters.project_name}}"
              - name: new_model_name
                value: "{{workflow.parameters.new_model_name}}"

[...]

        - - name: create-eval-target
            templateRef:                  
              name: nemo-create-eval-target-template
              template: create-eval-target
              clusterScope: true
            arguments:
              parameters:
              - name: nemo_evaluator_endpoint
                value: "{{workflow.parameters.nemo_evaluator_endpoint}}"
              - name: new_model_name
                value: "{{workflow.parameters.new_model_name}}"
              - name: nim_internal_endpoint
                value: "{{workflow.parameters.nim_internal_endpoint}}"
          - name: create-eval-config
            templateRef:                  
              name: nemo-create-eval-config-template
              template: create-eval-config
              clusterScope: true
            arguments:
              parameters:
              - name: nemo_evaluator_endpoint
                value: "{{workflow.parameters.nemo_evaluator_endpoint}}"
              - name: namespace
                value: "{{workflow.parameters.namespace}}"
              - name: dataset_name
                value: "{{workflow.parameters.dataset_name_prefix}}-{{workflow.name}}"           
        - - name: create-evaluation
            templateRef:                  
              name: nemo-create-evaluation-template
              template: create-evaluation
              clusterScope: true
            arguments:
              parameters:
              - name: nemo_evaluator_endpoint
                value: "{{workflow.parameters.nemo_evaluator_endpoint}}"
              - name: eval_target
                value: "{{steps.create-eval-target.outputs.parameters.eval_target}}"
              - name: eval_config
                value: "{{steps.create-eval-config.outputs.parameters.eval_config}}"
```
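
In a `steps` template, each outer list item (`- -`) starts a group that runs after the previous group has finished, while the steps inside one group run in parallel. The excerpt above can be modelled as a nested list to make the resulting order explicit; the steps elided with `[...]` are left out.

```python
# Outer list: sequential groups. Inner list: steps that run in parallel.
steps = [
    ["create-dataset"],
    ["nemo-customization"],
    ["create-eval-target", "create-eval-config"],
    ["create-evaluation"],
]

for number, group in enumerate(steps, start=1):
    mode = "in parallel" if len(group) > 1 else "alone"
    print(f"group {number}: {', '.join(group)} ({mode})")
```

This is why `create-evaluation` can reference the outputs of both `create-eval-target` and `create-eval-config`: both belong to the preceding group and have completed by the time it starts.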

## 6.3 Add llm-workflows to Git repo

We will deploy the workflow components to Argo Workflows via ArgoCD.

In [None]:
## remove .ipynb_checkpoints directories if they exist (-f ignores missing paths)
! rm -rf llm-workflows/components/.ipynb_checkpoints/
! rm -rf llm-workflows/workflow-templates/.ipynb_checkpoints/
! rm -rf llm-workflows/workflow-templates/common/.ipynb_checkpoints/

In [None]:
## copy llm-workflows to llmops-nvidia/
!cp -r llm-workflows llmops-nvidia/

In [None]:
## See the directory structure
!tree llmops-nvidia/llm-workflows

In [None]:
## Commit
!git config --global user.email $commit_email
!git config --global user.name $commit_name
!cd llmops-nvidia/ && git add . && git commit -m "add llm workflows" && git push

## 6.4 Create application to track llm-workflows 
We will create an ArgoCD application that tracks everything under the `llm-workflows` folder.

In [None]:
!mkdir -p llmops-nvidia/applications/llm-workflows

In [None]:
argocd_workflows_application_yaml = f"""
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: llm-workflows
  namespace: argocd
spec:
  destination:
    namespace: nemo-workflow-templates
    server: 'https://kubernetes.default.svc'
  source:
    path: llm-workflows
    repoURL: '{git_repo_url_ssh}'
    targetRevision: main
    directory:
      recurse: true
  project: default
  syncPolicy:
    syncOptions:
    - Validate=false
    - CreateNamespace=true
    automated:
      prune: true
      selfHeal: true
      allowEmpty: false
"""
with open("llmops-nvidia/applications/llm-workflows/app.yaml", "w") as f:
    f.write(argocd_workflows_application_yaml)

In [None]:
!git config --global user.email $commit_email
!git config --global user.name $commit_name
!cd llmops-nvidia/ && git add . && git commit -m "add llm workflows tracking app" && git push

### 6.4.1 Sync via UI
Now that the code is committed to Git, ArgoCD will normally pick up the change automatically within a few minutes. We can also force a sync via either the UI or the CLI.

Here we use the UI to sync.

Click the sync button on the `argocd-components` application as shown in the diagram; the `llm-workflows` application will then show up.
<img src="./images-dli/sync-apps.png" style="width: 435px; float: left">
<img src="./images-dli/llm-workflows-app.png" style="width: 500px; float: right">
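
For reference, the CLI route could look like the sketch below. It assumes the `argocd` CLI is installed and already logged in to the lab's ArgoCD server, neither of which is set up in this notebook; the helper name is hypothetical.

```python
import shutil
import subprocess

def sync_command(app_name: str) -> list:
    """Return the argv that asks ArgoCD to sync one application."""
    return ["argocd", "app", "sync", app_name]

cmd = sync_command("argocd-components")
if shutil.which("argocd"):          # only run when the CLI is available
    subprocess.run(cmd, check=False)
else:
    print("argocd CLI not found; would run:", " ".join(cmd))
```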


## 6.5 Check llm-workflows components in Argo Workflows

### 6.5.1 Port forward Argo workflows for accessing

In [None]:
import subprocess

subprocess.Popen(
    ["kubectl", "-n", "argoworkflows", "port-forward", "--address", "0.0.0.0", "service/argo-workflows-server", "31091:2746"],
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
    close_fds=True
)

In [None]:
%%js
const href = window.location.hostname;
let a = document.createElement('a');
let link = document.createTextNode('Open Argoworkflow UI!');
a.appendChild(link);
a.href = "http://" + href + "/";
a.style.color = "navy"
a.target = "_blank"
element.append(a);

### 6.5.2 LLM Workflow Components

Once you open the Argo Workflows UI: 
1. Click on the `Cluster Workflow Templates` option from the side menu.
2. Click on a component to inspect it.
    
<center><img src="./images-dli/workflow-components.png" style="width: 650px;"></center>


### 6.5.3 LLM Workflow Templates
The individual components are combined to form a [Workflow Template](https://argo-workflows.readthedocs.io/en/latest/workflow-templates/), ensuring a modular and reusable structure.

Once you open the Argo Workflows UI: 
1. Click on the `Workflow Templates` option from the side menu and select `workflow-fine-tuning`.

<center><img src="./images-dli/workflow-templates-submit.png" style="width: 650px;"></center>

2. The workflow template will show up.

<center><img src="./images-dli/workflow-template-yaml.png" style="width: 650px;"></center>



### 6.5.4 LLM Workflow Submission

1. Select the workflow template and click on `Submit` on the top left
    
<center><img src="./images-dli/workflow-templates-submit.png" style="width: 650px;"></center>

2. Edit the parameters if needed and click `Submit`. The end-to-end (E2E) automation pipeline will then run.

<center><img src="./images-dli/workflow-submission.png" style="width: 650px;"></center>

3. At the end of the fine-tuning workflow, the evaluation workflow is launched as well.


<center><img src="./images-dli/eval-workflow-fine-tuning.png" style="width: 650px;"></center>

 

<div class="alert alert-block alert-warning">

The overall workflow takes around 10-12 minutes to complete.
</div>

---
<h2 style="color:green;">Congratulations!</h2>

You've made it through the last notebook. In this notebook, you have:
- Explored the workflow components, which are represented as Cluster Workflow Templates in Argo Workflows.
- Explored Workflow Templates in Argo Workflows.
- Synced all workflow components and templates via ArgoCD.
- Submitted the workflow to complete the E2E automation of the fine-tuning pipeline.
- Monitored the status of the workflow in the Argo Workflows UI.
