### Covalent Cloud

☁️ [Covalent Cloud](https://www.covalent.xyz/cloud/)

🚀 [Covalent Cloud QuickStart](https://docs.covalent.xyz/docs/cloud/cloud_quickstart)

### Covalent OS

🌟 [GitHub: Covalent Open-Source](https://github.com/AgnostiqHQ/covalent)

---

# Building A Zero-Data Model Foundry

## Setting Up

In [56]:
import time
import json
import os
import random
import shutil
from dataclasses import dataclass
from pathlib import Path
from uuid import uuid4

import covalent as ct
import covalent_cloud as cc
import torch
from covalent_cloud.cloud_executor.models.gpu import GPU_TYPE
from datasets import Dataset, load_from_disk
from peft import LoraConfig
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    pipeline
)
from trl import SFTTrainer

## Authenticating with Covalent Cloud

In [57]:
CC_API_KEY = os.environ["CC_API_KEY"]  # set in `environment.yml` file
cc.save_api_key(CC_API_KEY)
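
If `CC_API_KEY` is not set, the `os.environ[...]` lookup above fails with a bare `KeyError`. The following is a minimal sketch of a friendlier check. `read_api_key` is a hypothetical helper written for illustration and is not part of `covalent_cloud`.

```python
import os


def read_api_key(env=os.environ, name="CC_API_KEY"):
    """Return the API key from `env`, or fail with an actionable message."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable before running this notebook.")
    return key


# Demo with a stand-in mapping so the snippet runs without a real key.
print(read_api_key({"CC_API_KEY": "example-key"}))  # prints "example-key"
```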

## Create a [cloud volume](https://docs.covalent.xyz/docs/cloud/guides/cloud_storage) for persistent storage

In [25]:
volume = cc.volume("model-storage")  # store fine-tuned models and generated datasets

## Create [runtime environments](https://docs.covalent.xyz/docs/cloud/guides/cloud_custom_environments) for tasks and services

An environment for fine-tuning models:

In [26]:
FT_ENV = "model-fine-tuning"  # assign unique name for referring to this env

cc.create_env(
    name=FT_ENV,
    pip=[
        "accelerate==0.29.1",
        "bitsandbytes==0.43.0",
        "datasets==2.18.0",
        "pandas==2.2.1",
        "scipy==1.12.0",
        "sentencepiece==0.2.0",
        "torch==2.2.2",
        "transformers==4.39.3",
        "trl==0.8.1",
        "tqdm==4.66.2",
        "peft==0.10.0",
    ],
    wait=True,
)

Environment Already Exists.


Another environment for running the data generator LLM:

In [27]:
DATA_ENV = "data-generation"
cc.create_env(name=DATA_ENV, pip=["vllm"], wait=True)

Environment Already Exists.


---

# Service: Data Generator LLM

This service hosts a powerful LLM that generates synthetic data for fine-tuning another model.

<div align="center">
<img src="./assets/data-generator.png" alt="Highlight data generator component" height=550px/>
</div>

## Backend for the Data Generator LLM

Run on:
- H100 GPU
- 48 GB RAM

In [28]:
data_generator_ex = cc.CloudExecutor(
    env=DATA_ENV,
    num_cpus=6,
    num_gpus=1,
    gpu_type=GPU_TYPE.H100,
    memory="48GB",
    time_limit="7 hours",
)

In [29]:
PROMPT_TEMPLATE = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>"
    "You are a knowledgeable assistant who generates fine-tuning data for an LLM. "
    "Please generate {target_items_per_response} data items for the fine-tuning task specified by the user.\n"
    "IMPORTANT: Return a JSON array of new items in the format: \"{return_format}\""
    "<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>{user_prompt}<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>"
)


@cc.service(executor=data_generator_ex, name="LLM Data Generator", auth=False)
def llm_data_generator(model_name):

    """Initialize the service that hosts the data generator LLM"""

    from vllm import LLM

    return {
        "llm": LLM(model=model_name, trust_remote_code=True, enforce_eager=True),
    }

@llm_data_generator.endpoint("/generate-data")
def generate_data(
    llm,
    task,
    return_format="[item] ## [label]",
    num_generations=5,
    target_items_per_response=5,
):
    """Endpoint to generate data using the LLM"""

    from vllm import SamplingParams

    # Format task into prompt.
    prompt_ = PROMPT_TEMPLATE.format(
        target_items_per_response=target_items_per_response,
        return_format=return_format,
        user_prompt=json.dumps({
            "task": task,
            "constraint": "Respond with ONLY the generated data as a valid JSON array!",
        }),
    )

    def _seeded_sampling_params():
        seed = random.randint(0, 1_000_000)
        return SamplingParams(temperature=0.9, top_p=0.8, max_tokens=2000, seed=seed)

    # Create a batch of prompts and sampling params
    prompts_batch = [prompt_] * num_generations
    params_batch = [_seeded_sampling_params() for _ in range(num_generations)]

    # Generate data
    outputs = llm.generate(prompts_batch, params_batch)

    # Extract and filter generated data
    data_items = []
    for output in outputs:
        generated_text = output.outputs[0].text
        try:
            data_items.extend(json.loads(generated_text))
        except Exception:
            continue

    return data_items
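
The filtering step at the end of the endpoint can be illustrated without a GPU. The sketch below applies the same keep-or-skip logic to canned strings. `collect_items` is a hypothetical name, and the `isinstance` check is an extra guard that the endpoint above does not have: without it, a response that parses to a JSON object would contribute its keys to the list.

```python
import json


def collect_items(generated_texts):
    """Keep items from responses that parse as a JSON array; skip the rest."""
    items = []
    for text in generated_texts:
        try:
            parsed = json.loads(text)
        except json.JSONDecodeError:
            continue  # malformed response: drop it, as the endpoint does
        if isinstance(parsed, list):  # extra guard, not in the endpoint above
            items.extend(parsed)
    return items


responses = [
    '["Great pacing ## NO SPOILER", "The butler did it ## SPOILER"]',
    'Sure! Here are some items: ["oops"]',  # chatty preamble, invalid JSON
    '{"item": "not an array"}',  # valid JSON, but not a list
]
print(collect_items(responses))
# ['Great pacing ## NO SPOILER', 'The butler did it ## SPOILER']
```

Because invalid responses are dropped silently, a single call can return fewer items than `num_generations * target_items_per_response`.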

---

# Service: New Fine-tuned Models

<div align="center">
<img src="./assets/finetune-service.png" alt="Highlight fine-tuned model component" height=550px/>
</div>

## Backend for the Fine-tuned Model Service

Run on:
- L40 GPU
- 48 GB RAM

In [30]:
ft_service_ex = cc.CloudExecutor(
    env=FT_ENV,
    num_cpus=25,
    num_gpus=1,
    gpu_type=GPU_TYPE.L40,
    memory="48GB",
    time_limit="7 hours"
)

In [31]:
@cc.service(executor=ft_service_ex, volume=volume, name="Custom Fine-Tuned Model")
def finetuned_llm_service(ft_model_path):

    """Serves a newly fine-tuned LLM for text generation."""

    ft_model_path_ = Path("/tmp") / Path(ft_model_path).name
    if ft_model_path_.exists():
        shutil.rmtree(ft_model_path_)
    shutil.copytree(ft_model_path, ft_model_path_)

    # Load and configure saved model
    model = AutoModelForCausalLM.from_pretrained(ft_model_path_, device_map="auto", do_sample=True)

    # Load and configure tokenizer
    tokenizer = AutoTokenizer.from_pretrained(ft_model_path_)
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.padding_side = "right"

    # Combine model and tokenizer into a pipeline
    pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

    return {"pipe": pipe, "model": model, "tokenizer": tokenizer}


@finetuned_llm_service.endpoint("/generate")
def generate_text(pipe, prompt, max_length=100):

    """Generate text from a prompt using the fine-tuned language model."""

    output = pipe(prompt, truncation=True, max_length=max_length, num_return_sequences=1)
    return output[0]["generated_text"]


@finetuned_llm_service.endpoint("/stream", streaming=True)
def generate_stream(model, tokenizer, prompt, prepend_prompt=False, max_tokens=100):

    """Prompt Llama-like model to stream generated text."""

    def _starts_with_space(_tokenizer, _token_id):
        token = _tokenizer.convert_ids_to_tokens(_token_id)
        return token.startswith('▁')

    _input = tokenizer(prompt, return_tensors='pt')
    _input = _input.to("cuda")

    if prepend_prompt:
        yield prompt

    for output_length in range(max_tokens):
        output = model.generate(**_input, max_new_tokens=1)
        current_token_id = output[0][-1]
        if current_token_id == tokenizer.eos_token_id:
            break

        current_token = tokenizer.decode(
            current_token_id, skip_special_tokens=True
        )
        if _starts_with_space(tokenizer, current_token_id.item()) and output_length > 1:
            current_token = ' ' + current_token
        yield current_token

        _input = {
            'input_ids': output.to("cuda"),
            'attention_mask': torch.ones(1, len(output[0])).to("cuda"),
        }
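
The `_starts_with_space` check exists because SentencePiece-style vocabularies mark a leading space with `▁`, and that space is lost when tokens are decoded one at a time. The sketch below reproduces the idea on a hand-written list of pieces, with no model or tokenizer involved. The pieces are made up for illustration, and the sketch skips the space only for the very first piece, which is a simplification of the threshold used in the endpoint.

```python
def stream_pieces(pieces):
    """Yield text for each piece, restoring the space that '▁' stands for."""
    for index, piece in enumerate(pieces):
        text = piece.lstrip("▁")  # stand-in for decoding a single token id
        if piece.startswith("▁") and index > 0:
            text = " " + text
        yield text


pieces = ["▁The", "▁but", "ler", "▁did", "▁it", "."]
print("".join(stream_pieces(pieces)))  # prints "The butler did it."
```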

---

# Service: Main Agent

<div align="center">
<img src="./assets/main-agent.png" alt="Highlight main agent component" height=550px/>
</div>

In [32]:
agent_ex = cc.CloudExecutor(env=FT_ENV, num_cpus=12, memory="12GB", time_limit="7 hours")

@cc.service(executor=agent_ex, name="Fine Tuner Agent", auth=False, volume=volume)
def agent(lattice, llm_api):

    """Initialize the agent. Not much to do, just store the input params."""

    return {"finetune_lattice": lattice, "llm_api": llm_api}


@agent.endpoint("/submit", streaming=True)
def submit(
    finetune_lattice, llm_api,
    *,
    task,
    data_format="[item] ## [label]",
    num_generations=5,
    target_items_per_response=10,
    min_new_examples=2000,
    model_to_finetune="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
):
    """Receives a task description, generates fine-tuning data,
    and dispatches the fine-tuning + deployment workflow."""

    yield "Generating fine-tuning data "

    iteration = 1
    new_examples = []
    while len(new_examples) < min_new_examples:

        texts = llm_api.generate_data(
            task=task,
            return_format=data_format,
            num_generations=num_generations,
            target_items_per_response=target_items_per_response,
        )
        new_examples.extend(texts)

        yield "."
        iteration += 1

    yield f"\nGenerated {len(new_examples)} total examples\n"

    dataset_save_path = volume / f"data_{len(new_examples)}-{uuid4()}"
    yield f"Saving dataset at {dataset_save_path!s}\n"
    dataset = Dataset.from_dict({"text": new_examples})
    dataset_save_path.mkdir(parents=True, exist_ok=True)
    dataset.save_to_disk(dataset_save_path)

    cc.save_api_key(CC_API_KEY)

    yield "\nDispatching fine-tuning workflow\n"
    dispatch_id = cc.dispatch(finetune_lattice, volume=volume)(
        model_to_finetune, str(dataset_save_path), finetuned_llm_service
    )
    yield f"Dispatch ID:\n{dispatch_id}\n"
    yield "Fine tuning new model "
    result = None
    while result is None:
        res = cc.get_result(dispatch_id)
        res.result.load()
        result = res.result.value
        time.sleep(10)
        yield "."

    yield f"\nNew Service ID: {result.function_id!s}\n"
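
The data-generation loop in `submit` keeps calling the generator until `min_new_examples` is reached, so the number of progress dots depends on how many items survive parsing in each call. The sketch below works through that arithmetic with a stub in place of the LLM service. `calls_needed`, `collect_examples`, and the figure of 45 usable items per call are assumptions chosen for illustration.

```python
import math


def calls_needed(min_new_examples, num_generations, target_items_per_response):
    """Lower bound on generator calls, assuming every response is full and valid."""
    per_call = num_generations * target_items_per_response
    return math.ceil(min_new_examples / per_call)


def collect_examples(generate, min_new_examples):
    """Call `generate` until enough examples have accumulated."""
    examples, calls = [], 0
    while len(examples) < min_new_examples:
        examples.extend(generate())
        calls += 1
    return examples, calls


# Stub generator: pretends each call yields 45 usable items out of 50 requested.
examples, calls = collect_examples(lambda: ["item"] * 45, min_new_examples=2000)
print(calls_needed(2000, 5, 10), calls, len(examples))  # prints "40 45 2025"
```

A loop of this shape never terminates if the generator keeps returning nothing, so a cap on the number of calls may be worth adding in practice.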

---

# Workflow: Fine Tune & Deploy

This workflow runs model fine-tuning on a powerful GPU and deploys the model as a service.

<div align="center">
<img src="./assets/finetune-workflow.png" alt="Highlight fine-tune and deploy workflow" height=550px/>
</div>

## Training configuration params

This dataclass holds the myriad fine-tuning parameter defaults for the PEFT/LoRA approach.

In [33]:
@dataclass
class FineTuneArguments:
    # BitsAndBytesConfig
    load_in_4bit: bool = True
    bnb_4bit_quant_type: str = "nf4"
    bnb_4bit_compute_dtype: str = "float16"
    bnb_4bit_use_double_quant: bool = False

    # TrainingArguments
    output_dir: str = "./outputs"
    learning_rate: float = 2e-3
    num_train_epochs: int = 5
    save_total_limit: int = 1
    save_strategy: str = "epoch"
    per_device_train_batch_size: int = 2
    gradient_accumulation_steps: int = 1
    optim: str = "paged_adamw_32bit"
    weight_decay: float = 0.001
    fp16: bool = False
    bf16: bool = False
    max_grad_norm: float = 0.3
    max_steps: int = -1
    warmup_ratio: float = 0.03
    group_by_length: bool = True
    lr_scheduler_type: str = "cosine"
    report_to: str = "none"

    # LoraConfig
    lora_alpha: int = 32
    lora_dropout: float = 0.05
    r: int = 32
    bias: str = "none"
    task_type: str = "CAUSAL_LM"

    # SFTTrainer
    dataset_text_field: str = "text"
    max_seq_length: int = 1024
    packing: bool = True
    dataset_batch_size: int = 10

    @property
    def training_args(self):
        return TrainingArguments(
            output_dir=self.output_dir,
            num_train_epochs=self.num_train_epochs,
            per_device_train_batch_size=self.per_device_train_batch_size,
            gradient_accumulation_steps=self.gradient_accumulation_steps,
            optim=self.optim,
            save_strategy=self.save_strategy,
            save_total_limit=self.save_total_limit,
            learning_rate=self.learning_rate,
            weight_decay=self.weight_decay,
            fp16=self.fp16,
            bf16=self.bf16,
            max_grad_norm=self.max_grad_norm,
            max_steps=self.max_steps,
            warmup_ratio=self.warmup_ratio,
            group_by_length=self.group_by_length,
            lr_scheduler_type=self.lr_scheduler_type,
            report_to=self.report_to,
        )

    @property
    def lora_config(self):
        return LoraConfig(
            lora_alpha=self.lora_alpha,
            lora_dropout=self.lora_dropout,
            r=self.r,
            bias=self.bias,
            task_type=self.task_type,
        )

    @property
    def trainer_params(self):
        return {
            "dataset_text_field": self.dataset_text_field,
            "max_seq_length": self.max_seq_length,
            "packing": self.packing,
            "dataset_batch_size": self.dataset_batch_size,
        }
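
Two quantities implied by these defaults are worth spelling out. In standard LoRA the low-rank update is scaled by `lora_alpha / r`, and the number of sequences per optimizer step is the per-device batch size times the gradient accumulation steps times the device count. The sketch below computes both with a stripped-down dataclass. `LoraSketch` is a hypothetical stand-in that copies four defaults from `FineTuneArguments` and imports nothing from `peft` or `transformers`.

```python
from dataclasses import dataclass


@dataclass
class LoraSketch:
    # Same default values as FineTuneArguments above.
    lora_alpha: int = 32
    r: int = 32
    per_device_train_batch_size: int = 2
    gradient_accumulation_steps: int = 1

    @property
    def lora_scaling(self):
        """Factor applied to the low-rank update: lora_alpha / r."""
        return self.lora_alpha / self.r

    def effective_batch_size(self, num_devices=1):
        """Sequences contributing to each optimizer step."""
        return self.per_device_train_batch_size * self.gradient_accumulation_steps * num_devices


args = LoraSketch()
print(args.lora_scaling, args.effective_batch_size())  # prints "1.0 2"
```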

## Electrons (i.e. workflow tasks)

### Fine-tuning task

Run on:
- H100 GPU
- 32 GB RAM

Tasks exit and release resources after completion.

In [34]:
fine_tune_ex = cc.CloudExecutor(
    env=FT_ENV,
    num_cpus=6,
    num_gpus=1,
    gpu_type=GPU_TYPE.H100,
    memory="32GB",
    time_limit="7 hours"
)

In [35]:
@ct.electron(executor=fine_tune_ex)
def fine_tune_model_peft(model_path, dataset_path):

    """Run fine-tuning, save the model, and return the path to the saved model."""

    ft_args = FineTuneArguments()

    # Quantization configuration
    quant_config = BitsAndBytesConfig(
        load_in_4bit=ft_args.load_in_4bit,
        bnb_4bit_quant_type=ft_args.bnb_4bit_quant_type,
        bnb_4bit_compute_dtype=getattr(torch, ft_args.bnb_4bit_compute_dtype),
        bnb_4bit_use_double_quant=ft_args.bnb_4bit_use_double_quant,
    )

    # Load and configure the downloaded model from pretrained
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        quantization_config=quant_config,
        device_map="auto",
        do_sample=True,
    )
    model.config.use_cache = False
    model.config.pretraining_tp = 1

    # Load and configure the tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.padding_side = "right"

    # Load dataset
    dataset_path_ = Path("/tmp") / Path(dataset_path).name
    shutil.copytree(dataset_path, dataset_path_)
    dataset_path = dataset_path_
    dataset = load_from_disk(dataset_path, keep_in_memory=True)

    # Set up supervised fine-tuning trainer
    trainer = SFTTrainer(
        model=model,
        train_dataset=dataset,
        peft_config=ft_args.lora_config,
        tokenizer=tokenizer,
        args=ft_args.training_args,
        **ft_args.trainer_params,
    )

    # Run training
    trainer.train()

    # Save trained model
    new_model_path = volume / (model_path.split("/")[-1] + f"_{uuid4()}")
    trainer.model.save_pretrained(new_model_path)
    trainer.tokenizer.save_pretrained(new_model_path)

    return new_model_path
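
The task copies the dataset from the volume to `/tmp` before loading it. `shutil.copytree` raises `FileExistsError` when the destination already exists, which could happen if the same dataset is staged twice on one machine. The sketch below is one way to make the staging step repeatable, mirroring the remove-then-copy pattern used in `finetuned_llm_service`. `stage_locally` is a hypothetical helper, and temporary directories stand in for the volume and `/tmp`.

```python
import shutil
import tempfile
from pathlib import Path


def stage_locally(src, dst_root):
    """Copy `src` into `dst_root`, replacing any stale copy from an earlier run."""
    dst = Path(dst_root) / Path(src).name
    if dst.exists():
        shutil.rmtree(dst)
    shutil.copytree(src, dst)
    return dst


# Demo with throwaway directories standing in for the volume and /tmp.
with tempfile.TemporaryDirectory() as volume_dir, tempfile.TemporaryDirectory() as local_dir:
    dataset_dir = Path(volume_dir) / "data_demo"
    dataset_dir.mkdir()
    (dataset_dir / "part.txt").write_text("example")
    first = stage_locally(dataset_dir, local_dir)
    second = stage_locally(dataset_dir, local_dir)  # second call does not raise
    print(first == second, (second / "part.txt").read_text())  # prints "True example"
```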

## Workflow: Fine-tuning (launched by Main Agent)

In [36]:
cpu_ex = cc.CloudExecutor(env=FT_ENV, num_cpus=12, memory="12GB", time_limit="4 hours")

@ct.lattice(executor=cpu_ex, workflow_executor=cpu_ex)
def finetune_workflow(model_id, data_path, llm_service):

    """Run fine-tuning, then deploy the fine-tuned model."""

    ft_model_path = fine_tune_model_peft(model_id, data_path)
    service_info = llm_service(ft_model_path)

    return service_info

---

# Workflow: Setting up The Zero-Data Model Foundry

In [37]:
@ct.lattice(executor=cpu_ex, workflow_executor=cpu_ex)
def setup_workflow(ft_workflow, data_generator_model="unsloth/llama-3-8b-Instruct"):

    """Set up for everything."""

    data_generator_handle = llm_data_generator(data_generator_model)
    agent_handle = agent(ft_workflow, data_generator_handle)
    return agent_handle

In [38]:
dispatch_id = cc.dispatch(setup_workflow, volume=volume)(ft_workflow=finetune_workflow)

print("Workflow Dispatch ID: ", dispatch_id)

res = cc.get_result(dispatch_id, wait=True)
res.result.load()
main_agent = res.result.value

print("Main Agent Service ID: ", main_agent.function_id)

Workflow Dispatch ID:  79405a20-2cc9-453c-9ae0-f8f4dd4b049e
Main Agent Service ID:  66460db7f7d37dbf2a4689cb


---

# Invoking the Agent API

In [39]:
main_agent = cc.get_deployment("66460db7f7d37dbf2a4689cb")
print(main_agent)

╭────────────────────────────── Deployment Information ──────────────────────────────╮
│  Name          Fine Tuner Agent                                                    │
│  Description   Initialize the agent. Not much to do, just store the input params.  │
│  Function ID   66460db7f7d37dbf2a4689cb                                            │
│  Address       https://fn.prod.covalent.xyz/066460db7f7d37dbf2a4689cb              │
│  Status        ACTIVE                                                              │
│  Tags                                                                              │
│  Auth Enabled  No                                                                  │
╰────────────────────────────────────────────────────────────────────────────────────╯
╭──────────────────────────────────────────────────────────────────────────╮
│                               POST /submit                               │
│  Streaming    Yes                                    

## Example 1: Spoiler Detector

In [40]:
task = "Fine-tuning an LLM to detect whether or not a movie review contains a spoiler."
for k in main_agent.submit(task=task, min_new_examples=3000):
    print(k.decode(), end="")

Generating fine-tuning data .......................................................................
Generated 3014 total examples
Saving dataset at /volumes/model-storage/data_3014-a804ac4f-3b7d-4b61-8857-6e90ec6419a8

Dispatching fine-tuning workflow
Dispatch ID:
bb8c0ae6-6584-499b-89c1-e862ddb69239
Fine tuning new model ....................................................................
New Service ID: 664610f3f7d37dbf2a4689cf


In [58]:
# Load a client for the new spoiler agent
spoiler_agent = cc.get_deployment("664610f3f7d37dbf2a4689cf")

In [59]:
spoiler_agent.generate(
    prompt=(
        "This ripping action-adventure features stellar effects and a "
        "superb lead performance from Owen Teague as a timid simian "
        "who must rescue his clan from the clutches of a warlike tribe. ##"
    )
)
# Review of Kingdom of the Planet of the Apes (2024). Scored 90/100. [source: Metacritic]

'This ripping action-adventure features stellar effects and a superb lead performance from Owen Teague as a timid simian who must rescue his clan from the clutches of a warlike tribe. ## NO SPOILER'

In [60]:
spoiler_agent.generate(prompt="Add a lot of dull acting -- except Sir Ian McKellen and Andy Serkis -- and you have an uneven movie with yawns aplenty. ##").split("##")[-1].strip()
# Review of The Lord of the Rings: The Return of the King (2003). Scored 0/100. [source: Metacritic]

'SPOILER'

In [61]:
spoiler_agent.generate(prompt="Soylent green is people! ##").split("##")[-1].strip()

'SPOILER'

In [62]:
spoiler_agent.generate(prompt="Anyways, at the end you find out that the Planet of The Apes was Earth all along. ##").split("##")[-1].strip()

'SPOILER'
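
The `/generate` endpoint returns the prompt followed by the completion, which is why the cells above split on `##` and keep the last part. A small helper can make that step explicit and also flag responses where the separator is missing. `parse_label` is a hypothetical name used only in this sketch.

```python
def parse_label(generated_text, separator="##"):
    """Return the text after the last separator, or None if it is absent."""
    _, found, tail = generated_text.rpartition(separator)
    return tail.strip() if found else None


print(parse_label("Soylent green is people! ## SPOILER"))  # prints "SPOILER"
print(parse_label("No separator in this text"))  # prints "None"
```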

## Example 2: Grammar Corrector

In [46]:
task = "Examples of bot responses that correct grammatical errors."
data_format = '"<|user|>{input_sentence}</s><|assistant|>{corrected_sentence}"'

for k in main_agent.submit(task=task, data_format=data_format):
    print(k.decode(), end="")

Generating fine-tuning data ........................................
Generated 2000 total examples
Saving dataset at /volumes/model-storage/data_2000-a3e2a820-bd58-4dff-aa40-8c3954ccab7a

Dispatching fine-tuning workflow
Dispatch ID:
27000d76-343d-4674-9f27-c554e7cea481
Fine tuning new model .............................................................
New Service ID: 6646144af7d37dbf2a4689d3
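
The `data_format` used here contains curly braces, which may look risky given that the service builds its prompt with `str.format`. It is safe because `str.format` only interprets braces in the template itself, not in the values substituted into it. The sketch below shows this with an abbreviated, hypothetical stand-in for `PROMPT_TEMPLATE`.

```python
# Abbreviated, hypothetical stand-in for PROMPT_TEMPLATE.
TEMPLATE = 'Generate {n} items in the format: "{return_format}"'

data_format = '"<|user|>{input_sentence}</s><|assistant|>{corrected_sentence}"'
rendered = TEMPLATE.format(n=10, return_format=data_format)
print(rendered)
```

The rendered prompt keeps `{input_sentence}` and `{corrected_sentence}` verbatim. It also shows that the quotes inside `data_format` end up doubled, since the template already wraps the format in quotes.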


In [63]:
grammar_agent = cc.get_deployment("6646144af7d37dbf2a4689d3")

In [64]:
prompt = "<|user|>{}</s><|assistant|>"
def correct_grammar(sentence):
    prompt_ = prompt.format(sentence)
    response = grammar_agent.generate(prompt=prompt_).split("<|assistant|>")[-1].strip()
    print(response)

correct_grammar("I should of never got a pet.")  # should've
correct_grammar("Jerry and me argued about it.")  # Jerry and I
correct_grammar("He said, 'No cat bites it's own tail.'")  # its
correct_grammar("But if I had to choose, Id rather get a dog then a cat.")  # I'd, than
correct_grammar("All dogs bite they're own tails.")  # their

I should have never got a pet.
Jerry and I argued about it.
He said, 'No cat bites its own tail.'
But if I had to choose, I would rather get a dog than a cat.
All dogs bite their own tails.


## Example 3: Emoji Translator

In [49]:
task = "Fine-tuning an LLM to translate a sentence without any emojis into a string of only emojis with roughly the same meaning."
for k in main_agent.submit(task=task, data_format="[sentence] | [matching emoji string]"):
    print(k.decode(), end="")

Generating fine-tuning data .................................................
Generated 2000 total examples
Saving dataset at /volumes/model-storage/data_2000-e089dd31-cb29-4f42-baff-1e23ef4b579f

Dispatching fine-tuning workflow
Dispatch ID:
81c30729-c84c-493a-a1a2-c2915aab5cdd
Fine tuning new model ..............................................................
New Service ID: 66461753f7d37dbf2a4689d7


In [65]:
emoji_agent = cc.get_deployment("66461753f7d37dbf2a4689d7")

In [66]:
emoji_agent.generate(prompt="Dancing with my cat | ")

'Dancing with my cat | 🐈💕'

In [67]:
emoji_agent.generate(prompt="Got a brand new car | ")

'Got a brand new car | 🚗💸'

In [68]:
emoji_agent.generate(prompt="Paint me a scenic mountain, Mr. Ross | ")

'Paint me a scenic mountain, Mr. Ross | 🏔️'

In [69]:
emoji_agent.generate(prompt="Let's eat! | ")

"Let's eat! | 🍴👅"

In [70]:
emoji_agent.generate(prompt="Burning Man | ")

'Burning Man | 🔥'