# 🤗 Welcome to AdalFlow!
## The PyTorch library to auto-optimize any LLM task pipeline

Thanks for trying us out! We're here to provide you with the best LLM application development experience you can dream of 😊 If you have any questions or concerns, [come talk to us on Discord](https://discord.gg/ezzszrRZvT); we're always here to help! ⭐ <i>Star us on <a href="https://github.com/SylphAI-Inc/AdalFlow">GitHub</a></i> ⭐


# Quick Links

GitHub repo: https://github.com/SylphAI-Inc/AdalFlow

Full tutorials: https://adalflow.sylph.ai/index.html

Deep dive on each API: check out the [developer notes](https://adalflow.sylph.ai/tutorials/index.html).

Common use cases along with the auto-optimization: check out [Use cases](https://adalflow.sylph.ai/use_cases/index.html).

## 📖 Outline

In this tutorial, we will cover the auto-optimization of a standard RAG pipeline:

- Introduce the HotPotQA dataset and the `HotPotQAData` class.

- Wrap DSPy's retriever as an AdalFlow `Retriever` for easy comparison.

- Build the standard RAG with `Retriever` and `Generator` components.

- Learn how to connect one component's output to the next component's input to enable auto-text-grad optimization.


# Installation

1. Use `pip` to install the `adalflow` Python package. We will need the `openai` and `groq` extras.

  ```bash
  pip install adalflow[openai,groq]
  ```
2. Set up the `openai` and `groq` API keys as environment variables.

You can choose a different model client and import the one you prefer. We support `Anthropic`, `Cohere`, `Google`, `GROQ`, `OpenAI`, and `Transformer`, with more in development. We will use OpenAI here as an example. Please refer to our [full installation guide](https://adalflow.sylph.ai/get_started/installation.html) for details.

In [17]:
from IPython.display import clear_output

!pip install -U adalflow[openai] # also install the package for the model client you'll use
!pip install dspy
!pip install datasets
clear_output()

In [None]:
!pip uninstall httpx anyio -y
!pip install "anyio>=3.1.0,<4.0"
!pip install httpx==0.24.1
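
The cell above pins `httpx` and `anyio` to specific versions. One way to confirm what actually ended up installed is to query package metadata from the standard library. This is a minimal sketch; the helper name `installed_version` is hypothetical and not part of AdalFlow.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the versions of the two packages pinned in the previous cell.
for name in ("httpx", "anyio"):
    print(name, installed_version(name))
```

If a version differs from the pin, restarting the notebook runtime after the install usually picks up the change.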

## Set Environment Variables

Run the following code and enter your API key when prompted.

Note: for normal `.py` projects, follow our [official installation guide](https://lightrag.sylph.ai/get_started/installation.html).

*Go to [OpenAI](https://platform.openai.com/docs/introduction) to get an API key if you don't already have one.*

In [3]:
import os

from getpass import getpass

# Prompt user to enter their API keys securely
openai_api_key = getpass("Please enter your OpenAI API key: ")


# Set environment variables
os.environ["OPENAI_API_KEY"] = openai_api_key

print("API keys have been set.")

Please enter your OpenAI API key: ··········
API keys have been set.
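
In a plain `.py` project the key usually comes from the shell environment rather than an interactive prompt. The sketch below fails early when a variable is missing and masks the key when logging it. The helper names `require_env` and `mask` are hypothetical and not part of AdalFlow.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running the pipeline")
    return value


def mask(secret: str, visible: int = 4) -> str:
    """Replace all but the last `visible` characters with asterisks."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


# Example usage (commented out so the snippet runs without a real key):
# print(mask(require_env("OPENAI_API_KEY")))
```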


In [20]:
import dspy
import re
from typing import List, Union, Optional, Dict, Callable, Any, Tuple
from dataclasses import dataclass, field
import adalflow as adal
from adalflow.optim.parameter import Parameter, ParameterType
from adalflow.datasets.hotpot_qa import HotPotQA, HotPotQAData
from adalflow.datasets.types import Example
from adalflow.core.types import RetrieverOutput
from adalflow.core import Component, Generator
from adalflow.core.retriever import Retriever
from adalflow.core.component import fun_to_component
from adalflow.components.model_client.openai_client import OpenAIClient
from adalflow.eval.answer_match_acc import AnswerMatchAcc  # used by VallinaRAGAdal below

In [None]:
gpt_4o_model = {
    "model_client": OpenAIClient(),
    "model_kwargs": {
        "model": "gpt-4o-mini",
        "max_tokens": 2000,
    },
}

gpt_3_model = {
    "model_client": OpenAIClient(),
    "model_kwargs": {
        "model": "gpt-3.5-turbo",
        "max_tokens": 2000,
    },
}

In [22]:
def load_datasets():

    trainset = HotPotQA(split="train", size=20)
    valset = HotPotQA(split="val", size=50)
    testset = HotPotQA(split="test", size=50)
    print(f"trainset, valset: {len(trainset)}, {len(valset)}, example: {trainset[0]}")
    return trainset, valset, testset


@dataclass
class AnswerData(adal.DataClass):
    reasoning: str = field(
        metadata={"desc": "The reasoning to produce the answer"},
    )
    answer: str = field(
        metadata={"desc": "The answer you produced"},
    )

    __output_fields__ = ["reasoning", "answer"]


dataset = HotPotQA(split="train", size=20)
print(dataset[0], type(dataset[0]))

HotPotQAData(
    id="5a8b57f25542995d1e6f1371",
    question="Were Scott Derrickson and Ed Wood of the same nationality?",
    answer="yes",
    gold_titles="{'Scott Derrickson', 'Ed Wood'}",
)

HotPotQAData(id='5a8b57f25542995d1e6f1371', question='Were Scott Derrickson and Ed Wood of the same nationality?', answer='yes', gold_titles="{'Scott Derrickson', 'Ed Wood'}") <class 'adalflow.datasets.types.HotPotQAData'>


HotPotQAData(id='5a8b57f25542995d1e6f1371', question='Were Scott Derrickson and Ed Wood of the same nationality?', answer='yes', gold_titles="{'Scott Derrickson', 'Ed Wood'}")
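
In the printed example, `gold_titles` is a string that contains the text of a Python set, not a set object. If the titles are needed as a real set, for instance to compare them with retrieved passages, they can be parsed safely with `ast.literal_eval`. This is a minimal sketch; `parse_gold_titles` is a hypothetical helper and not part of AdalFlow.

```python
import ast
from typing import Set


def parse_gold_titles(raw: str) -> Set[str]:
    """Parse a string such as "{'A', 'B'}" into a set of title strings."""
    parsed = ast.literal_eval(raw)  # evaluates literals only, never arbitrary code
    return {str(title) for title in parsed}


titles = parse_gold_titles("{'Scott Derrickson', 'Ed Wood'}")
print(sorted(titles))
```

`ast.literal_eval` is used instead of `eval` because it accepts only literal syntax, which matters when the string comes from a dataset you do not control.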

In [23]:
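class DspyRetriever(adal.Retriever):
    """Wrap DSPy's retriever so it can be used as an AdalFlow Retriever.

    Note: dspy.Retrieve relies on a retrieval model that has been configured
    beforehand via dspy.settings.configure(rm=...).
    """

    def __init__(self, top_k: int = 3):
        super().__init__()
        self.top_k = top_k
        self.dspy_retriever = dspy.Retrieve(k=top_k)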

    def call(
        self, input: str, top_k: Optional[int] = None
    ) -> List[adal.RetrieverOutput]:

        k = top_k or self.top_k

        output = self.dspy_retriever(query_or_queries=input, k=k)
        final_output: List[RetrieverOutput] = []
        documents = output.passages

        final_output.append(
            RetrieverOutput(
                query=input,
                documents=documents,
                doc_indices=[],
            )
        )
        return final_output


def test_retriever():
    question = "How many storeys are in the castle that David Gregory inherited?"
    retriever = DspyRetriever(top_k=3)
    retriever_out = retriever(input=question)
    print(f"retriever_out: {retriever_out}")


task_desc_str = r"""Answer questions with short factoid answers.

You will receive context (may contain relevant facts) and a question.
Think step by step."""

# NOTE: `answer_template` is used by the Generator below but is not defined
# anywhere in this notebook. The Jinja2 template here is an assumed minimal
# version built only from the prompt_kwargs keys used in this class.
answer_template = r"""<START_OF_SYSTEM_PROMPT>
{{task_desc_str}}

{{output_format_str}}
{% if few_shot_demos is not none %}
Here are some examples:
{{few_shot_demos}}
{% endif %}
<END_OF_SYSTEM_PROMPT>
<START_OF_USER>
Context: {{context}}
Question: {{question}}
<END_OF_USER>
"""


class VanillaRAG(adal.GradComponent):
    def __init__(self, passages_per_hop=3, model_client=None, model_kwargs=None):
        super().__init__()

        self.passages_per_hop = passages_per_hop

        self.retriever = DspyRetriever(top_k=passages_per_hop)
        self.llm_parser = adal.DataClassParser(
            data_class=AnswerData, return_data_class=True, format_type="json"
        )
        self.llm = Generator(
            model_client=model_client,
            model_kwargs=model_kwargs,
            prompt_kwargs={
                "task_desc_str": adal.Parameter(
                    data=task_desc_str,
                    role_desc="Task description for the language model",
                    param_type=adal.ParameterType.PROMPT,
                ),
                "few_shot_demos": adal.Parameter(
                    data=None,
                    requires_opt=True,
                    role_desc="To provide few shot demos to the language model",
                    param_type=adal.ParameterType.DEMOS,
                ),
                "output_format_str": self.llm_parser.get_output_format_str(),
            },
            template=answer_template,
            output_processors=self.llm_parser,
            use_cache=True,
        )

    # eval mode: plain values flow between the components
    def call(self, question: str, id: Optional[str] = None) -> adal.GeneratorOutput:
        if self.training:
            raise ValueError(
                "This component is not supposed to be called in training mode"
            )

        retriever_out = self.retriever.call(input=question)

        # join the retrieved passages into a single context string
        successor_map_fn = lambda x: (  # noqa E731
            "\n\n".join(x[0].documents) if x and x[0] and x[0].documents else ""
        )
        retrieved_context = successor_map_fn(retriever_out)

        prompt_kwargs = {
            "context": retrieved_context,
            "question": question,
        }

        output = self.llm.call(
            prompt_kwargs=prompt_kwargs,
            id=id,
        )
        return output

    # training mode: Parameters flow between the components
    def forward(self, question: str, id: Optional[str] = None) -> adal.Parameter:
        if not self.training:
            raise ValueError("This component is not supposed to be called in eval mode")
        retriever_out = self.retriever.forward(input=question)
        # tell the generator how to read the retriever's Parameter as context
        successor_map_fn = lambda x: (  # noqa E731
            "\n\n".join(x.data[0].documents)
            if x.data and x.data[0] and x.data[0].documents
            else ""
        )
        retriever_out.add_successor_map_fn(successor=self.llm, map_fn=successor_map_fn)
        generator_out = self.llm.forward(
            prompt_kwargs={"question": question, "context": retriever_out}, id=id
        )
        return generator_out

    def bicall(
        self, question: str, id: Optional[str] = None
    ) -> Union[adal.GeneratorOutput, adal.Parameter]:
        """You can also combine both the forward and call in the same function.
        Supports both training and eval mode by using __call__ for GradComponents
        like Retriever and Generator
        """
        retriever_out = self.retriever(input=question)
        if isinstance(retriever_out, adal.Parameter):
            successor_map_fn = lambda x: (  # noqa E731
                "\n\n".join(x.data[0].documents)
                if x.data and x.data[0] and x.data[0].documents
                else ""
            )
            retriever_out.add_successor_map_fn(
                successor=self.llm, map_fn=successor_map_fn
            )
            # training mode: pass the Parameter itself, as forward() does
            retrieved_context = retriever_out
        else:
            successor_map_fn = lambda x: (  # noqa E731
                "\n\n".join(x[0].documents) if x and x[0] and x[0].documents else ""
            )
            retrieved_context = successor_map_fn(retriever_out)
        prompt_kwargs = {
            "context": retrieved_context,
            "question": question,
        }
        output = self.llm(prompt_kwargs=prompt_kwargs, id=id)
        return output


class VallinaRAGAdal(adal.AdalComponent):
    def __init__(
        self,
        model_client: adal.ModelClient,
        model_kwargs: Dict,
        backward_engine_model_config: Optional[Dict] = None,
        teacher_model_config: Optional[Dict] = None,
        text_optimizer_model_config: Optional[Dict] = None,
    ):
        task = VanillaRAG(
            model_client=model_client,
            model_kwargs=model_kwargs,
            passages_per_hop=3,
        )
        eval_fn = AnswerMatchAcc(type="fuzzy_match").compute_single_item
        loss_fn = adal.EvalFnToTextLoss(
            eval_fn=eval_fn, eval_fn_desc="fuzzy_match: 1 if str(y) in str(y_gt) else 0"
        )
        super().__init__(
            task=task,
            eval_fn=eval_fn,
            loss_fn=loss_fn,
            backward_engine_model_config=backward_engine_model_config,
            teacher_model_config=teacher_model_config,
            text_optimizer_model_config=text_optimizer_model_config,
        )

    # tell the trainer how to call the task
    def prepare_task(self, sample: HotPotQAData) -> Tuple[Callable[..., Any], Dict]:
        if self.task.training:
            return self.task.forward, {"question": sample.question, "id": sample.id}
        else:
            return self.task.call, {"question": sample.question, "id": sample.id}

    # eval mode: get the generator output, directly engage with the eval_fn
    def prepare_eval(
        self, sample: HotPotQAData, y_pred: adal.GeneratorOutput
    ) -> Tuple[Callable[..., Any], Dict]:
        y_label = ""
        if y_pred and y_pred.data and y_pred.data.answer:
            y_label = y_pred.data.answer
        return self.eval_fn, {"y": y_label, "y_gt": sample.answer}

    # train mode: get the loss and get the data from the full_response
    def prepare_loss(self, sample: HotPotQAData, pred: adal.Parameter):
        # prepare gt parameter
        y_gt = adal.Parameter(
            name="y_gt",
            data=sample.answer,
            eval_input=sample.answer,
            requires_opt=False,
        )

        # pred's full_response is the output of the task pipeline which is GeneratorOutput
        pred.eval_input = (
            pred.full_response.data.answer
            if pred.full_response
            and pred.full_response.data
            and pred.full_response.data.answer
            else ""
        )
        return self.loss_fn, {"kwargs": {"y": pred, "y_gt": y_gt}}


def train_diagnose(
    model_client: adal.ModelClient,
    model_kwargs: Dict,
) -> None:

    trainset, valset, testset = load_datasets()

    adal_component = VallinaRAGAdal(
        model_client,
        model_kwargs,
        backward_engine_model_config=gpt_4o_model,
        teacher_model_config=gpt_3_model,
        text_optimizer_model_config=gpt_3_model,
    )
    trainer = adal.Trainer(adaltask=adal_component)
    trainer.diagnose(dataset=trainset, split="train")
    # trainer.diagnose(dataset=valset, split="val")
    # trainer.diagnose(dataset=testset, split="test")
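
The `successor_map_fn` lambdas above are what connect the retriever's output to the generator's input: they take the first retrieval result and join its documents into one context string, falling back to an empty string when nothing was retrieved. The sketch below shows the same mapping in isolation, using only the standard library. `FakeRetrieverOutput` is a hypothetical stand-in for AdalFlow's `RetrieverOutput`, and `docs_to_context` is a hypothetical name for the mapping.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeRetrieverOutput:
    """Hypothetical stand-in holding only the fields the mapping needs."""

    query: str
    documents: List[str] = field(default_factory=list)


def docs_to_context(outputs: List[FakeRetrieverOutput]) -> str:
    """Join the first result's documents with blank lines, or return "" if empty."""
    if outputs and outputs[0] and outputs[0].documents:
        return "\n\n".join(outputs[0].documents)
    return ""


example = [FakeRetrieverOutput(query="q", documents=["passage one", "passage two"])]
print(docs_to_context(example))
```

In eval mode the mapping is applied directly to the retriever's list output. In training mode the same logic is registered on the retriever's `Parameter` with `add_successor_map_fn`, so the generator can read the context while the link between the two components stays visible to the optimizer.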

# Issues and feedback

If you encounter any issues, please report them here: [GitHub Issues](https://github.com/SylphAI-Inc/LightRAG/issues).

For feedback, you can use either the [GitHub discussions](https://github.com/SylphAI-Inc/LightRAG/discussions) or [Discord](https://discord.gg/ezzszrRZvT).