# Aligning a model on Arcee Cloud with Direct Preference Optimization (DPO)

In this notebook, you will learn how to align a model with DPO on Arcee Cloud.

In order to run this demo, you need a Starter account on Arcee Cloud. Please see our [pricing](https://www.arcee.ai/pricing) page for details.

The Arcee documentation is available at [docs.arcee.ai](https://docs.arcee.ai/deployment/start-deployment).

## Prerequisites

Please [sign up](https://app.arcee.ai/account/signup) to Arcee Cloud and create an [API key](https://docs.arcee.ai/getting-arcee-api-key/getting-arcee-api-key).

Then, please update the cell below with your API key. Remember to keep this key safe, and **DON'T COMMIT IT to one of your repositories**.

In [None]:
%env ARCEE_API_KEY=YOUR_API_KEY
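
As an optional sanity check, the sketch below verifies that the environment variable is set and no longer holds the placeholder value. The helper `check_api_key` is a hypothetical name introduced for illustration; it is not part of the Arcee SDK and it does not validate the key against Arcee Cloud.

```python
import os

# Hypothetical helper (not part of the Arcee SDK): checks only that the
# variable is set and differs from the placeholder used in this notebook.
def check_api_key(env=os.environ, name="ARCEE_API_KEY"):
    value = env.get(name, "")
    return bool(value.strip()) and value != "YOUR_API_KEY"

if not check_api_key():
    print("ARCEE_API_KEY is missing or still set to the placeholder value.")
```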

Create a new Python environment (optional but recommended) and install [arcee-python](https://github.com/arcee-ai/arcee-python).

In [None]:
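# Uncomment the next three lines to create a virtual environment
# Note: each `!` command runs in its own subshell, so the `source` line below
# does not change the environment of the running notebook kernel.
#!pip install -q virtualenv
#!virtualenv -q arcee-cloud
#!source arcee-cloud/bin/activate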

%pip install -qU arcee-py

In [None]:
import arcee
import pprint

## Aligning the model

At the moment, the DPO dataset is not configurable. We use the [UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback) dataset. It consists of 64k prompts, 256k responses from different LLMs, and 380k high-quality feedback annotations provided by GPT-4.
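
For intuition, here is a minimal sketch of the standard DPO loss for a single preference pair (a preferred and a rejected response to the same prompt). It is not Arcee Cloud's implementation: the function name `dpo_loss`, the default `beta=0.1`, and the example log-probabilities are all assumptions chosen for illustration.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are sequence log-probabilities under the model being trained
    (policy) and under the frozen reference model. `beta` is an assumed value.
    """
    # How much more the policy favors each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)), written in a numerically stable form.
    z = -logits
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

# Illustrative numbers only: the policy favors the chosen response more than
# the reference does, so the loss falls below the neutral value log(2).
print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
print(math.log(2))
```

The loss decreases as the policy raises the likelihood of the preferred response relative to the rejected one, measured against the reference model.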

Here, we will run DPO on the [Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) model we tuned for instruction following in the Supervised Fine-Tuning (SFT) notebook. You may remember that we used the [reasoning-sharegpt](https://huggingface.co/datasets/arcee-ai/reasoning-sharegpt) dataset.

We could pick any model available on the Hugging Face hub, or a model we've already worked with on Arcee Cloud.

Let's launch the alignment job with the `start_alignment()` API. It should last between 2 and 2.5 hours.

In [None]:
help(arcee.start_alignment)

In [None]:
alignment_name = "llama-3-8B-reasoning-share-gpt-dpo"

In [None]:
response = arcee.start_alignment(
    alignment_name=alignment_name,
    # Alternative: start from a model on the Hugging Face hub
    # hf_model="meta-llama/Meta-Llama-3-8B",
    # Model fine-tuned in the SFT notebook
    alignment_model="llama-3-8B-reasoning-share-gpt",
    alignment_type="dpo",
    full_or_peft="peft",
)
print(response)

In [None]:
from time import sleep

while True:
    response = arcee.alignment_status(alignment_name)
    if response["processing_state"] == "processing":
        print("Alignment is in progress. Waiting 15 minutes before checking again.")
        sleep(900)
    else:
        print(response)
        break
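
The loop above waits indefinitely while the job reports `processing`. As an optional variation, the sketch below wraps the same polling pattern in a helper with a timeout. The name `wait_until_done`, its parameters, and the 4-hour default are assumptions made for illustration; only the `processing_state` key and the `"processing"` value come from the cell above.

```python
import time

# Hypothetical helper (not part of the Arcee SDK): polls a status function
# until the job leaves a "busy" state or the timeout expires.
def wait_until_done(get_status, state_key="processing_state",
                    busy_states=("processing",), interval_s=900,
                    timeout_s=4 * 3600, sleep=time.sleep):
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status.get(state_key) not in busy_states:
            return status  # job is no longer in progress
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Still busy after {timeout_s} seconds")
        sleep(interval_s)

# Usage with the call from the cell above (left commented out because it
# needs a running alignment job):
# response = wait_until_done(lambda: arcee.alignment_status(alignment_name))
# print(response)
```

Passing the status call as a function keeps the helper reusable for other jobs that expose a similar status dictionary.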

## Deploying our aligned model

Once alignment is complete, we can deploy and test the aligned model. As part of the Arcee Cloud free tier, this is free of charge, and the endpoint is automatically shut down after 2 hours.

Deployment should take 5-7 minutes. Please see the model deployment sample notebook for details.

In [None]:
deployment_name = alignment_name

In [None]:
response = arcee.start_deployment(deployment_name=deployment_name, alignment=alignment_name)

while True:
    response = arcee.deployment_status(deployment_name)
    if response["deployment_processing_state"] == "pending":
        print("Deployment is in progress. Waiting 60 seconds before checking again.")
        sleep(60)
    else:
        print(response)
        break

Once the model endpoint is up and running, we can prompt the model with a domain-specific question.

In [None]:
#query = "Is Pluto a planet? Use markdown."
query = "I was supposed to fly to NYC but my connecting flight was cancelled. I'm now stuck in Omaha, Nebraska and it's 8PM. I have a meeting in Manhattan tomorrow at 10AM. What is my best option? Use markdown."

response = arcee.generate(deployment_name=deployment_name, query=query)

In [None]:
from IPython.display import display, Markdown

display(Markdown(response["text"]))

## Stopping our deployment

When we're done working with our model, we should stop the deployment to save resources and avoid unwanted charges.

The `stop_deployment()` API only requires the deployment name.

In [None]:
arcee.stop_deployment(deployment_name=deployment_name)
arcee.deployment_status(deployment_name)

This concludes the model alignment demonstration. Thank you for your time!

If you'd like to know more about using Arcee Cloud in your organization, please visit the [Arcee website](https://www.arcee.ai), or contact [sales@arcee.ai](mailto:sales@arcee.ai).
