# 🤗 End-to-end distilabel example with Inference Endpoints and Notus

In [28]:
import os
import time
from typing import Dict

import argilla as rg

from distilabel.llm import InferenceEndpointsLLM
from distilabel.pipeline import Pipeline, pipeline
from distilabel.tasks import Llama2TextGenerationTask, SelfInstructTask, Prompt

from datasets import Dataset

In [29]:
# Placeholder credentials: replace them with your own tokens, keys and Space URL
os.environ["HF_TOKEN"] = "hf_"
os.environ["OPENAI_API_KEY"] = "sk-"
os.environ["ARGILLA_API_URL"] = "https://argilla-ultrafeedback-curator.hf.space"
os.environ["ARGILLA_API_KEY"] = "admin.apikey"
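The token values above are placeholders, and a forgotten placeholder usually only surfaces later as an authentication error from the endpoint or from OpenAI. A minimal fail-fast sketch, assuming you want the notebook to stop early instead; `require_env` and `PLACEHOLDERS` are hypothetical names, not part of distilabel or Argilla:

```python
import os

# Hypothetical: the placeholder values used in the credentials cell above.
PLACEHOLDERS = {"", "hf_", "sk-"}


def require_env(name: str) -> str:
    """Return an environment variable's value, raising if it is unset or still a placeholder."""
    value = os.environ.get(name, "")
    if value.strip() in PLACEHOLDERS:
        raise RuntimeError(f"Environment variable {name!r} is unset or still a placeholder")
    return value


# Example: validate credentials before creating any LLM or Argilla client.
# for var in ("HF_TOKEN", "OPENAI_API_KEY", "ARGILLA_API_URL", "ARGILLA_API_KEY"):
#     require_env(var)
```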

## Setting up an inference endpoint with Notus

To kickstart this tutorial, let's see how to set up an endpoint for our Notus model. A Hugging Face Inference Endpoint is a service provided by Hugging Face that deploys and hosts your machine learning models for inference. This way, we get faster inference times, as the model does not run on our personal machines but on Hugging Face's servers. The endpoint of choice runs a [Notus 7B instance](https://ui.endpoints.huggingface.co/argilla/endpoints/aws-notus-7b-v1-4052).

Let's see a quick example of how to use an inference endpoint. We have prepared a simple `Llama2QuestionAnsweringTask` to ask the model questions, much like chatting with an LLM through a chatbot.

In [3]:
class Llama2QuestionAnsweringTask(Llama2TextGenerationTask):
    def generate_prompt(self, question: str) -> str:
        return Prompt(
            system_prompt=self.system_prompt,
            formatted_prompt=question,
        ).format_as("llama2")  # type: ignore

    def parse_output(self, output: str) -> Dict[str, str]:
        return {"answer": output.strip()}

    # Exposed as properties, matching how distilabel tasks define these names
    @property
    def input_args_names(self) -> list[str]:
        return ["question"]

    @property
    def output_args_names(self) -> list[str]:
        return ["answer"]
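`format_as("llama2")` wraps the system prompt and the question in the Llama 2 chat template. The stdlib sketch below reproduces that layout as it appears in the `generation_prompt` field of the preference dataset shown later in this notebook; it is an approximation for illustration, not distilabel's implementation, and `format_llama2` is a hypothetical name:

```python
def format_llama2(system_prompt: str, user_prompt: str) -> str:
    """Build a single-turn Llama 2 chat prompt.

    Layout copied from the notebook's `generation_prompt` output (assumption:
    distilabel's formatter may differ in edge cases such as multi-turn chats).
    """
    return f"<s>[INST] <<SYS>>\n{system_prompt}<</SYS>>\n\n{user_prompt} [/INST]"


prompt = format_llama2("You are a helpful assistant.", "What's the capital of Spain?")
print(prompt)
```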

Once this class is ready, we have to instantiate an `InferenceEndpointsLLM` object, passing the Inference Endpoint name and its namespace as parameters. One convenient way to provide them is through environment variables.

In [4]:
os.environ["HF_INFERENCE_ENDPOINT_NAME"] = "aws-notus-7b-v1-4052"
os.environ["HF_NAMESPACE"] = "argilla"

A Hugging Face token is also required to use Hugging Face's services; here it is read from the `HF_TOKEN` environment variable set at the beginning of the notebook.

In [6]:
llm = InferenceEndpointsLLM(
    endpoint_name=os.getenv("HF_INFERENCE_ENDPOINT_NAME"),  # type: ignore
    endpoint_namespace=os.getenv("HF_NAMESPACE"),  # type: ignore
    token=os.getenv("HF_TOKEN") or None,
    task=Llama2QuestionAnsweringTask(),
)

`llm` is an instance of the `InferenceEndpointsLLM` class, and through it we can start generating answers to questions with the `llm.generate()` method.

In [7]:
generation = llm.generate([{"question": "What's the capital of Spain?"}])
generation[0][0]["parsed_output"]["answer"]

"The capital of Spain is Madrid. It is the largest city in Spain and the third-largest city in the European Union. Madrid is known for its rich history, art, and culture, and is home to many famous landmarks, such as the Prado Museum, the Royal Palace of Madrid, and the Retiro Park. Madrid is also a major economic and financial center in Europe, and is home to many international companies and organizations. The city is known for its vibrant nightlife, delicious cuisine, and friendly people. Madrid is a great destination for travelers looking to experience the best of Spain's culture"
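Judging from the indexing above, `llm.generate()` returns one list per input, each holding one entry per generation, so `generation[0][0]` is the first generation for the first question. A small sketch for collecting every answer at once, run here on hypothetical mock data shaped like that indexing (only the `parsed_output` and `answer` keys come from the notebook; `collect_answers` is a hypothetical helper):

```python
from typing import Any, Dict, List


def collect_answers(outputs: List[List[Dict[str, Any]]]) -> List[List[str]]:
    """For each input, return the parsed answers of all its generations."""
    return [
        [gen["parsed_output"]["answer"] for gen in per_input]
        for per_input in outputs
    ]


# Hypothetical mock: 2 questions, 1 generation each.
mock_outputs = [
    [{"parsed_output": {"answer": "Madrid."}}],
    [{"parsed_output": {"answer": "Paris."}}],
]
print(collect_answers(mock_outputs))  # [['Madrid.'], ['Paris.']]
```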

The endpoint is working! We can now run inference through it.

## Generating instructions with SelfInstructTask

With our Inference Endpoint up and running, we can generate instructions with distilabel. These instructions, produced by the LLM behind our endpoint, will form an instruction dataset.

Firstly, let's provide some topics:

In [8]:
finance_topics = [
    "Budgeting and financial planning",
    "Investing (stocks, bonds, mutual funds, ETFs, real estate, etc.)",
    "Personal finance (saving, banking, insurance, retirement planning, etc.)",
    "Corporate finance (capital structure, dividend policy, working capital management, etc.)",
    "Financial statement analysis (balance sheet, income statement, cash flow statement)",
    "Cost accounting (cost classification, cost behavior, cost estimation, etc.)",
    "Financial modeling (discounted cash flow models, Monte Carlo simulations, etc.)",
    "Risk management (hedging, diversification, risk assessment, etc.)",
    "International finance (exchange rates, international parity conditions, currency derivatives, etc.)",
    "Behavioral finance (prospect theory, cognitive biases, behavioral portfolio theory, etc.)",
    "Alternative investments (private equity, hedge funds, commodities, cryptocurrencies, etc.)",
    "Islamic finance (Shariah-compliant financing, sukuk, mudarabah, musharakah, etc.)",
    "FinTech (financial technology, payment systems, digital currencies, robo-advisory, etc.)",
    "Quantitative finance (algorithmic trading, high-frequency trading, statistical arbitrage, etc.)",
    "Fixed income (bond valuation, yield curves, duration, convexity, etc.)",
    "Derivatives (options, futures, forwards, swaps, etc.)",
    "Taxation (tax planning, tax compliance, tax implications of financial decisions, etc.)",
    "Estate planning (will, trusts, inheritance, wealth transfer, etc.)",
    "Regulatory environment (financial regulations, compliance, regulatory bodies, etc.)",
    "Ethics in finance (professional ethics, code of conduct, corporate governance, etc.)"
]

In [9]:
instructions_dataset = Dataset.from_dict({
    "input": finance_topics
})

As you can see, our topics cover a broad range of finance subjects, from budgeting and personal finance to derivatives, taxation and regulation. Grouped in a `Dataset` under the `input` column, these topics will be used to generate the instructions.

Now, for our generator to work, we need to create an instruction task, so the endpoint knows how to treat these topics as inputs to generate the desired instructions. The `SelfInstructTask` class is based on Self-Instruct, a framework that uses the model's own generations to build a large collection of instruction data. With it, we can generate instructions from the topics and a brief description of the desired behaviour.
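To make the idea concrete, here is a toy prompt builder in the spirit of Self-Instruct: it combines the application description with one topic and asks for a numbered list of instructions. This is an illustrative assumption, not the template `SelfInstructTask` actually uses; `build_self_instruct_prompt` and `num_instructions` are hypothetical:

```python
def build_self_instruct_prompt(
    application_description: str, topic: str, num_instructions: int = 5
) -> str:
    """Compose a toy Self-Instruct-style prompt for one topic (hypothetical template)."""
    lines = [
        f"Application: {application_description}",
        f"Topic: {topic}",
        f"Write {num_instructions} diverse instructions a user might give this application about the topic.",
        "Return them as a numbered list, one instruction per line.",
    ]
    return "\n".join(lines)


print(build_self_instruct_prompt(
    "An assistant that can answer questions about different advanced finance topics.",
    "Budgeting and financial planning",
))
```

Each of the 20 topics would get its own prompt, and the model's numbered list is then split into individual instructions.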

In [10]:
instructions_task = SelfInstructTask(
    application_description="An assistant that can answer questions about different advanced finance topics."
)

Let's now define a generator, passing the `SelfInstructTask` object, and create a `Pipeline` object.

In [11]:
instructions_generator = InferenceEndpointsLLM(
    endpoint_name=os.getenv("HF_INFERENCE_ENDPOINT_NAME"),  # type: ignore
    endpoint_namespace=os.getenv("HF_NAMESPACE"),  # type: ignore
    token=os.getenv("HF_TOKEN") or None,
    task=instructions_task,
)

instructions_pipeline = Pipeline(
    generator=instructions_generator
)

Our pipeline is ready to be used to generate instructions. Let's do it!

In [14]:
generated_instructions = instructions_pipeline.generate(dataset=instructions_dataset, num_generations=4, batch_size=2)

Flattening the indices: 100%|██████████| 20/20 [00:00<00:00, 1899.90 examples/s]
Map: 100%|██████████| 20/20 [00:00<00:00, 4945.53 examples/s]
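A quick count of what this call requests, assuming `batch_size` counts dataset rows per batch and each row receives `num_generations` generations; `generation_workload` is a hypothetical helper, not a distilabel function:

```python
import math


def generation_workload(num_rows: int, num_generations: int, batch_size: int) -> dict:
    """Estimate batches and total generations for a generate() call (assumed semantics)."""
    return {
        "batches": math.ceil(num_rows / batch_size),
        "total_generations": num_rows * num_generations,
    }


# 20 finance topics, num_generations=4, batch_size=2
print(generation_workload(num_rows=20, num_generations=4, batch_size=2))
# {'batches': 10, 'total_generations': 80}
```

Under this assumption, the 704 instructions counted in the next cell come from 80 generations, about 8.8 instructions per generation, since each `SelfInstructTask` generation holds a list of instructions rather than a single one.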


Our pipeline has successfully generated instructions from the topics and the behaviour passed as input. Let's gather all those instructions and see how they look.

In [17]:
instructions = []
for generations in generated_instructions["generations"]:
    for generation in generations:
        instructions.extend(generation)

print(f"Number of generated instructions: {len(instructions)}")

for instruction in instructions[:5]:
    print(instruction)

Number of generated instructions: 704
1. How can I create a budget for my monthly expenses?
2. What are the best ways to save money on groceries?
3. How can I reduce my credit card debt?
4. What is the significance of having an emergency fund?
5. Detail the process of creating a retirement plan.
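The printed instructions keep the list numbering the model produced (`1. `, `2. `, ...). A minimal cleanup sketch, assuming you want bare, deduplicated instruction strings before building a dataset; `clean_instructions` is a hypothetical helper and its regular expression only handles `N.` or `N)` prefixes:

```python
import re
from typing import List

# Matches a leading list marker such as "1. " or "12) ".
_NUMBERING = re.compile(r"^\s*\d+[.)]\s*")


def clean_instructions(raw: List[str]) -> List[str]:
    """Strip leading numbering, drop empty strings, and deduplicate while keeping order."""
    seen = set()
    cleaned = []
    for item in raw:
        text = _NUMBERING.sub("", item).strip()
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned


print(clean_instructions([
    "1. How can I create a budget for my monthly expenses?",
    "2. What are the best ways to save money on groceries?",
    "1. How can I create a budget for my monthly expenses?",
]))
```

Applied to the notebook's data, this would be `clean_instructions(instructions)`.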


These instructions are really useful for our finance assistant, as they can serve as the prompts of an instruction dataset covering all of our finance topics.

## Generating a preference dataset with an UltraFeedback text quality task

Distilabel can also create a preference dataset through an UltraFeedback text quality task. This kind of task uses an LLM to evaluate the quality of generated text, and our goal is to obtain detailed feedback on that quality, beyond just a binary label.

The `pipeline()` function creates a `Pipeline` instance with the provided LLMs for a given task, which is useful whenever you want to use a pre-defined or custom `Pipeline` for that task. We specify the task (`"preference"`) and subtask (`"text-quality"`), the generator (here, an `InferenceEndpointsLLM` running a `Llama2TextGenerationTask`) and our OpenAI API key, used by the labeller that rates the generations.

In [18]:
preference_pipeline = pipeline(
    "preference",
    "text-quality",
    generator=InferenceEndpointsLLM(
        endpoint_name=os.getenv("HF_INFERENCE_ENDPOINT_NAME"),  # type: ignore
        endpoint_namespace=os.getenv("HF_NAMESPACE", None),
        token=os.getenv("HF_TOKEN") or None,  # same token as in the previous cells
        task=Llama2TextGenerationTask(),
        max_new_tokens=256,
        num_threads=2,
        temperature=0.3,
    ),
    max_new_tokens=256,
    num_threads=2,
    openai_api_key=os.getenv("OPENAI_API_KEY"),
    temperature=0.0,
)
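Conceptually, this pipeline runs in two stages: the generator (Notus on the Inference Endpoint) produces several responses per input, and the labeller (an OpenAI model, hence the `openai_api_key` argument) rates them. The sketch below mimics that flow with stub callables so it runs offline; `run_preference_flow`, the stubs and the 1-5 rating scale are assumptions for illustration, not distilabel's API:

```python
from typing import Callable, Dict, List


def run_preference_flow(
    inputs: List[str],
    generate: Callable[[str, int], str],
    rate: Callable[[str, List[str]], List[int]],
    num_generations: int = 2,
) -> List[Dict[str, object]]:
    """Generate several responses per input, then rate them together (toy flow)."""
    records = []
    for text in inputs:
        generations = [generate(text, i) for i in range(num_generations)]
        ratings = rate(text, generations)  # one rating per generation
        records.append({"input": text, "generations": generations, "ratings": ratings})
    return records


# Offline stubs standing in for the endpoint and the OpenAI labeller (hypothetical).
def fake_generate(text: str, i: int) -> str:
    return f"Draft {i + 1} about {text.lower()}" + " with more detail" * i


def fake_rate(text: str, generations: List[str]) -> List[int]:
    # Toy labeller: longer answers score higher, clipped to the 1-5 range.
    return [max(1, min(5, len(g) // 10)) for g in generations]


records = run_preference_flow(["Budgeting and financial planning"], fake_generate, fake_rate)
print(records[0]["ratings"])
```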

Now, let's build a dataset with the pipeline we just created and the topics from which our instructions were generated. They are still valid, as we want the preference dataset to stay focused on the same finance topics.

In [19]:
preference_dataset = preference_pipeline.generate(
    instructions_dataset,  # type: ignore
    num_generations=2,
    batch_size=1,
    enable_checkpoints=True,
    display_progress_bar=True,
)

Flattening the indices: 100%|██████████| 20/20 [00:00<00:00, 2490.38 examples/s]
Map: 100%|██████████| 20/20 [00:00<00:00, 2774.29 examples/s]


Let's take a look at one record of the preference dataset:

In [30]:
preference_dataset[0]

{'input': 'Budgeting and financial planning',
 'generation_model': 'argilla/notus-7b-v1',
 'generation_prompt': "<s>[INST] <<SYS>>\nYou are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.<</SYS>>\n\nBudgeting and financial planning [/INST]",
 'raw_generation_responses': ["\n\n1. What is budgeting and financial planning?\nBudgeting and financial planning are essential processes that help individuals and families achieve their financial goals. Budgeting involves creating a plan for how to allocate your income to cover your expenses, savings, an
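The record above is truncated, but a preference dataset of this kind pairs each input with several generations and a rating for each. A common downstream step, not performed in this notebook, is turning those ratings into a chosen/rejected pair for preference fine-tuning. A minimal sketch, assuming generations and ratings are parallel lists; `to_preference_pair` is hypothetical, and rows whose best and worst ratings tie are skipped:

```python
from typing import Dict, List, Optional


def to_preference_pair(
    prompt: str, generations: List[str], ratings: List[float]
) -> Optional[Dict[str, str]]:
    """Pick the highest- and lowest-rated generations as chosen/rejected; None on a tie."""
    if len(generations) != len(ratings) or len(generations) < 2:
        raise ValueError("need at least two generations with one rating each")
    best = max(range(len(ratings)), key=ratings.__getitem__)
    worst = min(range(len(ratings)), key=ratings.__getitem__)
    if ratings[best] == ratings[worst]:
        return None  # equal ratings carry no preference signal
    return {"prompt": prompt, "chosen": generations[best], "rejected": generations[worst]}


print(to_preference_pair("Budgeting and financial planning", ["answer A", "answer B"], [2, 5]))
```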

## Setting up an Argilla HF Space to upload the resulting dataset

In [31]:
rg.init(
    api_url=os.getenv("ARGILLA_API_URL"), api_key=os.getenv("ARGILLA_API_KEY")
)

In [32]:
# Uploading the Preference Dataset
preference_rg_dataset = preference_dataset.to_argilla()
preference_rg_dataset.push_to_argilla(name="notus_ultrafeedback_preference", workspace="admin")

RemoteFeedbackDataset(
   id=bec871b4-913f-4084-9b84-294b119a0f63
   name=notus_ultrafeedback_preference
   workspace=Workspace(id=1debb02b-ca36-4807-86f9-c493b7dbc30e, name=admin, inserted_at=2023-11-13 18:32:30.307392, updated_at=2023-11-13 18:32:30.307392)
   url=https://argilla-ultrafeedback-curator.hf.space/dataset/bec871b4-913f-4084-9b84-294b119a0f63/annotation-mode
   fields=[RemoteTextField(id=UUID('c6f52946-0057-41cf-893d-7d0b94ff01ab'), client=None, name='input', title='input', required=True, type='text', use_markdown=False), RemoteTextField(id=UUID('10ea339f-0299-4eee-9d36-991ed98196fb'), client=None, name='generations-1', title='generations-1', required=True, type='text', use_markdown=False), RemoteTextField(id=UUID('e39b210d-4ae1-4384-8ac1-79b019ae6c34'), client=None, name='generations-2', title='generations-2', required=True, type='text', use_markdown=False)]
   questions=[RemoteRatingQuestion(id=UUID('03190684-2176-4433-b4a7-d5b9c06ccb3c'), client=None, name='generations
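The remote dataset exposes the input and each generation as separate text fields (`input`, `generations-1`, `generations-2`). A plain-Python sketch of that flattening, handy for checking what annotators will see; the mapping is inferred from the field names printed above, and `to_argilla_fields` is hypothetical, not the `to_argilla()` implementation:

```python
from typing import Dict, List


def to_argilla_fields(record_input: str, generations: List[str]) -> Dict[str, str]:
    """Flatten one record into `input` plus 1-based `generations-N` text fields."""
    fields = {"input": record_input}
    for i, text in enumerate(generations, start=1):
        fields[f"generations-{i}"] = text
    return fields


print(to_argilla_fields("Budgeting and financial planning", ["First answer", "Second answer"]))
```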