!pip install "regex~=2023.10.3" dspy-ai  # DSPy requires an old version of regex that conflicts with the installed version on Colab
!pip install arize-phoenix openinference-instrumentation-dspy opentelemetry-exporter-otlp

In [1]:
import os
from dotenv import load_dotenv

import dspy
import openai
import phoenix as px



In [17]:
load_dotenv("/media/uberdev/ddrv/gitFolders/python_de_learners_data/.env")

True

## 2. Configure Your OpenAI API Key

Set your OpenAI API key if it is not already set as an environment variable. In this notebook, the `load_dotenv` call above reads it from a `.env` file.
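If you do not use a `.env` file, a minimal sketch for this step follows. It assumes the key lives in the standard `OPENAI_API_KEY` environment variable that the OpenAI client reads; the helper name `ensure_openai_api_key` is hypothetical.

```python
import getpass
import os


def ensure_openai_api_key(prompt=getpass.getpass, env_var="OPENAI_API_KEY"):
    """Hypothetical helper: ask for the key only if it is not already in the environment."""
    if not os.environ.get(env_var):
        # getpass hides the typed key so it does not show up in the notebook output
        os.environ[env_var] = prompt(f"Enter your {env_var}: ")
    return os.environ[env_var]
```

Call `ensure_openai_api_key()` in a cell before configuring the language model. When the key is already present (for example, loaded from `.env`), it returns immediately without prompting.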

## 3. Configure Module Components

A module consists of components, akin to the layers of a PyTorch module, such as a language model (in this case, OpenAI's GPT-4o mini) and a retriever (in this case, ColBERTv2).

In [2]:
turbo = dspy.OpenAI(model="gpt-4o-mini")
colbertv2_wiki17_abstracts = dspy.ColBERTv2(
    url="http://20.102.90.50:2017/wiki17_abstracts"  # endpoint for a hosted ColBERTv2 service
)

dspy.settings.configure(lm=turbo,
                        rm=colbertv2_wiki17_abstracts)

## 4. Load Data

Load a subset of the HotpotQA dataset.

In [3]:
from dspy.datasets import HotPotQA

# Load the dataset.
dataset = HotPotQA(train_seed=1, train_size=20, eval_seed=2023, dev_size=50, test_size=10)

# Tell DSPy that the 'question' field is the input. Any other fields are labels and/or metadata.
trainset = [x.with_inputs("question") for x in dataset.train]
devset = [x.with_inputs("question") for x in dataset.dev]

print(f"Train set size: {len(trainset)}")
print(f"Dev set size: {len(devset)}")

Downloading builder script: 100%|██████████| 6.42k/6.42k [00:00<00:00, 13.2kB/s]
Downloading readme: 100%|██████████| 9.19k/9.19k [00:00<00:00, 16.4kB/s]
Downloading data: 100%|██████████| 566M/566M [01:26<00:00, 6.53MB/s]   
Downloading data: 100%|██████████| 47.5M/47.5M [00:09<00:00, 4.84MB/s]
Downloading data: 100%|██████████| 46.2M/46.2M [00:08<00:00, 5.70MB/s]
Generating train split: 100%|██████████| 90447/90447 [00:56<00:00, 1594.74 examples/s]
Generating validation split: 100%|██████████| 7405/7405 [00:04<00:00, 1720.33 examples/s]
Generating test split: 100%|██████████| 7405/7405 [00:03<00:00, 2039.87 examples/s]


Train set size: 20
Dev set size: 50


Each example in our training set has a question and a human-annotated answer.

In [4]:
train_example = trainset[0]
train_example

Example({'question': 'At My Window was released by which American singer-songwriter?', 'answer': 'John Townes Van Zandt'}) (input_keys={'question'})

Examples in the dev set have a third field containing titles of relevant Wikipedia articles.

In [5]:
dev_example = devset[18]
dev_example

Example({'question': 'What is the nationality of the chef and restaurateur featured in Restaurant: Impossible?', 'answer': 'English', 'gold_titles': {'Restaurant: Impossible', 'Robert Irvine'}}) (input_keys={'question'})
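The effect of `with_inputs("question")` can be sketched with a tiny stand-in class. `ToyExample` is hypothetical: the real `dspy.Example` has similarly named `with_inputs()`, `inputs()` and `labels()` methods, but it returns `Example` objects rather than plain dicts and does much more.

```python
class ToyExample:
    """Hypothetical stand-in for dspy.Example, showing only the input/label split."""

    def __init__(self, **fields):
        self._fields = dict(fields)
        self._input_keys = set()

    def with_inputs(self, *keys):
        # Return a copy that remembers which fields are inputs; the original is untouched
        copy = ToyExample(**self._fields)
        copy._input_keys = set(keys)
        return copy

    def inputs(self):
        return {k: v for k, v in self._fields.items() if k in self._input_keys}

    def labels(self):
        # Everything that is not an input is treated as a label or metadata
        return {k: v for k, v in self._fields.items() if k not in self._input_keys}


toy_example = ToyExample(
    question="What is the nationality of the chef and restaurateur featured in Restaurant: Impossible?",
    answer="English",
    gold_titles={"Restaurant: Impossible", "Robert Irvine"},
).with_inputs("question")

print(list(toy_example.inputs()))    # → ['question']
print(sorted(toy_example.labels()))  # → ['answer', 'gold_titles']
```

Only the inputs are passed to the program at run time; the labels stay available to the metric.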

## 5. Define Your RAG Module

Define a signature that takes in two inputs, `context` and `question`, and outputs an `answer`. The signature provides:

- A description of the sub-task the language model is supposed to solve.
- A description of the input fields to the language model.
- A description of the output fields the language model must produce.

In [6]:
class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")
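Roughly speaking, DSPy turns a signature like this into prompt text: the docstring becomes the instructions and each field contributes a labeled line. The sketch below is a hypothetical renderer (`ToySignature`, `ToyField` and `render_prompt` are made-up names), and its format only approximates what DSPy actually generates.

```python
from dataclasses import dataclass, field


@dataclass
class ToyField:
    name: str
    desc: str = ""


@dataclass
class ToySignature:
    instructions: str  # plays the role of the dspy.Signature docstring
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)


def render_prompt(signature, **values):
    """Render a signature and its input values as a plain-text prompt (illustrative format only)."""
    lines = [signature.instructions, "", "Use the following format:", ""]
    # 1. Describe every field so the LM knows what each one means
    for f in signature.inputs + signature.outputs:
        placeholder = f.desc if f.desc else "${" + f.name + "}"
        lines.append(f"{f.name.capitalize()}: {placeholder}")
    lines += ["", "---", ""]
    # 2. Fill in the actual inputs and leave the first output open for the LM to complete
    for f in signature.inputs:
        lines.append(f"{f.name.capitalize()}: {values[f.name]}")
    lines.append(f"{signature.outputs[0].name.capitalize()}:")
    return "\n".join(lines)


toy_signature = ToySignature(
    instructions="Answer questions with short factoid answers.",
    inputs=[ToyField("context", "may contain relevant facts"), ToyField("question")],
    outputs=[ToyField("answer", "often between 1 and 5 words")],
)
print(render_prompt(
    toy_signature,
    context="At My Window is an album by Townes Van Zandt.",  # made-up context snippet
    question="At My Window was released by which American singer-songwriter?",
))
```

With `dspy.ChainOfThought`, DSPy also adds a reasoning field before `answer`, so the LM writes out its reasoning before committing to a short answer.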

Define your module by subclassing `dspy.Module` and overriding the `forward` method.

In [7]:
class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)

    def forward(self, question):
        context = self.retrieve(question).passages
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer)

This module uses retrieval-augmented generation (using the previously configured ColBERTv2 retriever) in tandem with chain of thought in order to generate the final answer to the user.
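The structure of `RAG` (a module whose `__call__` runs `forward`, which chains a retriever and a generator) can be mimicked without DSPy. Everything below is a hypothetical toy: word overlap stands in for ColBERTv2 scoring, and a plain callable stands in for the chain-of-thought LM call.

```python
import re


class ToyModule:
    """Hypothetical, simplified analogue of dspy.Module: calling it runs forward()."""

    def __call__(self, *args, **kwargs):
        return self.forward(*args, **kwargs)


def tokenize(text):
    # Lowercased word set, used for a crude overlap score
    return set(re.findall(r"\w+", text.lower()))


class ToyRetrieve(ToyModule):
    def __init__(self, corpus, k=3):
        self.corpus = corpus
        self.k = k

    def forward(self, query):
        # Rank passages by word overlap with the query (a stand-in for ColBERTv2 scoring)
        query_tokens = tokenize(query)
        ranked = sorted(self.corpus, key=lambda p: len(query_tokens & tokenize(p)), reverse=True)
        return ranked[: self.k]


class ToyRAG(ToyModule):
    def __init__(self, corpus, generate, num_passages=3):
        self.retrieve = ToyRetrieve(corpus, k=num_passages)
        # generate is any callable (context, question) -> answer; an LM call in practice
        self.generate = generate

    def forward(self, question):
        context = self.retrieve(question)
        return {"context": context, "answer": self.generate(context, question)}


toy_corpus = [
    "Robert Irvine is an English celebrity chef who hosts Restaurant: Impossible.",
    "HotpotQA is a question answering dataset.",
    "ColBERTv2 is a late-interaction retrieval model.",
]
# Stub generator that just returns the top passage; a real module would prompt the LM here
toy_rag = ToyRAG(toy_corpus, generate=lambda context, question: context[0], num_passages=2)
result = toy_rag("Which chef hosts Restaurant: Impossible?")
print(result["answer"])
```

Keeping retrieval and generation as separate sub-modules is what lets DSPy (and Phoenix) treat each one as its own step.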

## 6. Compile Your RAG Module

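In this case, we'll use the `BootstrapFewShot` teleprompter with its default settings; it selects good demonstrations from the training dataset for inclusion in the final prompt. The compilation itself runs in the next section, after Phoenix instrumentation is set up, so that its calls are traced.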

In [8]:
len(trainset)

20
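The selection step can be sketched roughly as follows. This is a simplified illustration assuming a program that maps a question to a dict of outputs; the real `BootstrapFewShot` records the inputs and outputs of every predictor in the trace, can also add labeled demos (`max_labeled_demos`), and stops once it has `max_bootstrapped_demos` (4 by default) successful traces. The function name `bootstrap_demos` is hypothetical.

```python
def bootstrap_demos(program, trainset, metric, max_bootstrapped_demos=4):
    """Hypothetical sketch: keep the program's own successful runs as few-shot demos."""
    demos = []
    for seen, example in enumerate(trainset, start=1):
        prediction = program(example["question"])
        # Only runs that pass the metric become demonstrations
        if metric(example, prediction):
            demos.append({"question": example["question"], **prediction})
            if len(demos) >= max_bootstrapped_demos:
                return demos, seen  # stop early once enough demos are collected
    return demos, len(trainset)


# Toy data: the "program" always answers "0", so only every third example passes
toy_trainset = [{"question": f"q{i}", "answer": str(i % 3)} for i in range(20)]
toy_demos, examples_seen = bootstrap_demos(
    program=lambda question: {"answer": "0"},
    trainset=toy_trainset,
    metric=lambda example, prediction: example["answer"] == prediction["answer"],
)
print(len(toy_demos), examples_seen)  # → 4 10
```

The early stop explains why compilation does not always visit the whole training set.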

## 7. Instrument DSPy and Launch Phoenix

In [9]:
# Launch a local Phoenix server; skip this cell if Phoenix is already running
# on your machine (for example, started separately on http://localhost:6006)
phoenix_session = px.launch_app()

INFO:phoenix.config:📋 Ensuring phoenix working directory: /home/uberdev/.phoenix


🌍 To view the Phoenix app in your browser, visit http://localhost:6006/
📖 For more information on how to use Phoenix, check out https://docs.arize.com/phoenix



In [11]:
# Install the DSPy instrumentor if it was not installed earlier; the other libraries are installed along with arize-phoenix
!pip install openinference-instrumentation-dspy

Collecting openinference-instrumentation-dspy
  Using cached openinference_instrumentation_dspy-0.1.11-py3-none-any.whl (13 kB)
Installing collected packages: openinference-instrumentation-dspy
Successfully installed openinference-instrumentation-dspy-0.1.11


In [1]:
from dotenv import load_dotenv
load_dotenv("/media/uberdev/ddrv/gitFolders/python_de_learners_data/.env")

True

In [2]:
# OpenInference instrumentor that traces the internal calls of the DSPy library
from openinference.instrumentation.dspy import DSPyInstrumentor
# Semantic-convention keys for resource attributes (e.g. the Phoenix project name)
from openinference.semconv.resource import ResourceAttributes
# OpenTelemetry API: access to the global tracer provider
from opentelemetry import trace as trace_api
# Exporter that sends finished spans to a collector (here, Phoenix) over OTLP/HTTP
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
# OpenTelemetry SDK: tracer provider, resource metadata and span processors
from opentelemetry.sdk import trace as trace_sdk
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace.export import SimpleSpanProcessor

In [13]:
from dspy.teleprompt import BootstrapFewShot

In [14]:
# Validation logic: check that the predicted answer is correct.
# Also check that the retrieved context actually contains that answer.
def validate_context_and_answer(example, pred, trace=None):
    # `trace` is None when the metric is used for evaluation; during compilation the
    # optimizer passes the program's execution trace, which a metric may inspect.
    answer_EM = dspy.evaluate.answer_exact_match(example, pred)
    answer_PM = dspy.evaluate.answer_passage_match(example, pred)
    return answer_EM and answer_PM
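To see why this metric is strict, here is a simplified, stdlib-only re-implementation of the two checks. It assumes SQuAD-style answer normalization, which is similar in spirit to DSPy's helpers but is not their actual code (DSPy also handles lists of acceptable answers, and its passage match uses its own tokenization); the function names are hypothetical.

```python
import re
import string

_PUNCTUATION = set(string.punctuation)


def normalize(text):
    """SQuAD-style normalization: lowercase, strip punctuation and articles, squeeze spaces."""
    text = "".join(ch for ch in text.lower() if ch not in _PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(gold, predicted):
    return normalize(gold) == normalize(predicted)


def passage_match(gold, passages):
    # True if the normalized answer occurs as a whole-word span in any retrieved passage
    target = f" {normalize(gold)} "
    return any(target in f" {normalize(p)} " for p in passages)


def validate(gold, predicted, passages):
    return exact_match(gold, predicted) and passage_match(gold, passages)


toy_passages = ["Robert Irvine is an English celebrity chef."]
print(exact_match("English", "the English."))                    # → True
print(exact_match("John Townes Van Zandt", "Townes Van Zandt"))  # → False
print(validate("English", "English", toy_passages))              # → True
```

The second line shows that a partially correct answer scores zero, so only runs that are exactly right and grounded in the retrieved passages are kept as demonstrations.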

In [15]:
input_module = RAG()
teleprompter = BootstrapFewShot(metric=validate_context_and_answer)

In [18]:
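# Phoenix accepts OTLP/HTTP traces at /v1/traces on the same port as its UI
endpoint = "http://127.0.0.1:6006/v1/traces"
# An empty resource sends traces to Phoenix's default project; a project name can be
# set with {ResourceAttributes.PROJECT_NAME: "..."}
resource = Resource(attributes={})

tracer_provider = trace_sdk.TracerProvider(resource=resource)
span_otlp_exporter = OTLPSpanExporter(endpoint=endpoint)

# SimpleSpanProcessor exports each span as soon as it ends (no batching)
tracer_provider.add_span_processor(SimpleSpanProcessor(span_exporter=span_otlp_exporter))

trace_api.set_tracer_provider(tracer_provider=tracer_provider)

# Patch DSPy so that its calls emit spans; skip_dep_check=True bypasses the
# OpenTelemetry check of the installed package versions
DSPyInstrumentor().instrument(skip_dep_check=True)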

In [20]:
compiled_module = teleprompter.compile(input_module,
                                       trainset=trainset)

 55%|█████▌    | 11/20 [00:22<00:18,  2.03s/it]

Bootstrapped 4 full traces after 12 examples in round 0.





The cells above instrument the application with [OpenInference](https://github.com/Arize-ai/openinference/tree/main/spec), an open standard built atop [OpenTelemetry](https://opentelemetry.io/) that captures and stores LLM application executions. OpenInference provides telemetry data to help you understand the invocation of your LLMs and the surrounding application context, including retrieval from vector stores and the use of external tools or APIs.

## 8. Run Your Application

Let's run our DSPy application on the dev set.

In [None]:
for example in devset:
    question = example["question"]
    prediction = compiled_module(question)
    print("Question")
    print("========")
    print(question)
    print()
    print("Predicted Answer")
    print("================")
    print(prediction.answer)
    print()
    print("Retrieved Contexts (truncated)")
    print(f"{[c[:200] + '...' for c in prediction.context]}")
    print()
    print()

Check the Phoenix UI to inspect the architecture of your DSPy module.

In [None]:
print(phoenix_session.url)

A few things to note:

- The spans in each trace correspond to the steps in the `forward` method of our custom subclass of `dspy.Module`,
- The call to `ColBERTv2` appears as a retriever span with retrieved documents and scores displayed for each forward pass,
- The LLM span includes the fully-formatted prompt containing few-shot examples computed by DSPy during compilation.
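The nesting described in these notes can be pictured with a toy, stdlib-only span recorder. It is not OpenTelemetry or OpenInference: the span names are illustrative, the kinds borrow OpenInference's `CHAIN`, `RETRIEVER` and `LLM` span kinds, and the span names Phoenix actually shows for DSPy may differ.

```python
import time
from contextlib import contextmanager

finished_spans = []  # (name, parent, kind, seconds), appended when each span ends
_open_spans = []     # stack of currently open span names


@contextmanager
def span(name, kind):
    parent = _open_spans[-1] if _open_spans else None
    _open_spans.append(name)
    start = time.perf_counter()
    try:
        yield
    finally:
        _open_spans.pop()
        finished_spans.append((name, parent, kind, time.perf_counter() - start))


def traced_forward(question):
    # Mirrors RAG.forward: one retrieval step, then a chain-of-thought step that calls the LM
    with span("RAG.forward", "CHAIN"):
        with span("Retrieve", "RETRIEVER"):
            context = ["passage 1", "passage 2", "passage 3"]  # placeholder passages
        with span("ChainOfThought", "CHAIN"):
            with span("LM", "LLM"):
                answer = "English"  # placeholder answer
    return context, answer


traced_forward("What is the nationality of the chef featured in Restaurant: Impossible?")
for name, parent, kind, _ in finished_spans:
    print(f"{name:<15} kind={kind:<9} parent={parent}")
```

Child spans finish before their parents, which is why a trace viewer reconstructs the tree from parent links rather than from arrival order.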

![a tour of your traces and spans in DSPy, highlighting retriever and LLM spans in particular](https://storage.googleapis.com/arize-phoenix-assets/assets/docs/notebooks/dspy-tracing-tutorial/dspy_spans_and_traces.gif)

Congrats! You've used DSPy to bootstrap a multishot prompt with hard negative passages and chain of thought, and you've used Phoenix to observe the inner workings of DSPy and understand the internals of the forward pass.