## Notes
This is a good way to approach it: https://github.com/Scale3-Labs/dspy-examples/tree/main/src/summarization/programs

# Using DSPy and MLflow with a local LLM

DSPy (Declarative Self-improving Python) is an open-source framework that enables users to write Python code, rather than prompts, to build and direct LLMs. DSPy includes tools for directing the bahavior of LLMs, automatically optimize prompts and weights, and evaluate the performance of AI systems.

MLflow's native integration with DSPy allows you to track and visualize the performance of your AI systems and to log your DSPy programs as MLflow models.

In this notebook, we will use DSPy with Llama-3.2-3B-Instruct, running locally via LMStudio. Note that you can swap in any other model running on an OpenAI-compatible endpoint. We will use the DSPy to build a system for summarizing research papers from arXiv at different levels of complexity.

First, we will set up an MLflow experiment and enable DSPy autologging so we can easily view traces of all of our DSPy runs.

In [64]:
import dspy
import mlflow
from rich import print
from dotenv import load_dotenv
import os
load_dotenv()

mlflow.set_experiment("dspy-paper-summarization")
mlflow.dspy.autolog()

Next, we will configure the LLM we want to use. In this case, we will use Llama-3.2-3B-Instruct, running locally via LMStudio. Note that you can swap in any other model running on an OpenAI-compatible endpoint.

In [122]:
# Configure the LM with your local endpoint
lm_local = dspy.LM(
    "openai/lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF",  # model name
    api_base="http://localhost:1234/v1",  # local endpoint
    api_key="",  # empty api_key for local endpoint
    model_type='chat',  # specify chat model type
    cache=False  # disable caching
)

lm_anthropic = dspy.LM(
    "anthropic/claude-3-5-sonnet-20241022",
    api_key=os.getenv("ANTHROPIC_API_KEY"),
)

lm_openai = dspy.LM(
    "openai/gpt-4o-mini",
    api_key=os.getenv("OPENAI_API_KEY"),
)

# Set this as the default LM for DSPy
dspy.configure(lm=lm_local)

In [101]:
# verify that the LLM is working
print(lm_local("Please tell me what MLflow is for.")[0])

We can already examine the traces captured via DSPy autologging:

```{image} ../images/2024-11-27-dspy/1_mlflow_ui.png
:alt: DSPy traces in the MLflow UI
:width: 800px
:align: center
```

Now, to build our system, we need a few things:

1. A way to get the paper from an arxiv id
2. A way to specify the level of complexity of the summary we want
3. A way to evaluate the summary

Integrating these components will demonstrate how DSPy *is* Python: we don't have to think in terms of prompts; we can write Python code for both the deterministic components (e.g. getting the paper) and the LLM-powered components (e.g. summarizing the paper).

For now, we will just focus on extracting and operating on the text from the paper, but this approach could be extended to look at images or other data modalities as well.

In [38]:
import arxiv
import PyPDF2
import io
import requests

def get_paper_from_arxiv(arxiv_id: str) -> tuple[str, str]:
    """
    Download and extract text from an arxiv paper.
    
    Args:
        arxiv_id (str): The arxiv ID (e.g., '2311.12399' or 'arxiv:2311.12399')
    
    Returns:
        tuple[str, str]: (title, text) of the paper
    """
    client = arxiv.Client()

    # Clean the arxiv ID
    arxiv_id = arxiv_id.replace('arxiv:', '').strip()
    
    # Search for the paper
    search = client.results(arxiv.Search(id_list=[arxiv_id]))
    paper = next(search)
    
    # Download the PDF
    response = requests.get(paper.pdf_url)
    pdf_file = io.BytesIO(response.content)
    
    # Extract text from PDF
    reader = PyPDF2.PdfReader(pdf_file)
    text = ' '.join(page.extract_text() for page in reader.pages)
    
    return paper.title, text

In [96]:
paper = get_paper_from_arxiv("2411.12372")

## Set up the DSPy Signature

In [106]:
from typing import Literal

class PaperSummarySignature(dspy.Signature):
    """Generate a summary of the provided research paper at a specified complexity level.
    The summary can be up to four paragraphs long. It should be tailored to the specified complexity level."""
    
    # Input fields
    title: str = dspy.InputField(description="The title of the research paper")
    complexity_level: Literal[1, 2, 3, 4, 5] = dspy.InputField(
        description=(
            "The desired complexity level of the generated summary:\n"
            "1: Elementary - Suitable for general audience with no technical background\n"
            "2: Basic - Suitable for undergraduate students or technical enthusiasts\n"
            "3: Intermediate - Suitable for graduate students or industry practitioners\n"
            "4: Advanced - Suitable for domain experts and researchers\n"
            "5: Expert - Suitable for specialists in this specific research area"
        )
    )
    text: str = dspy.InputField(description="The full text content of a single research paper")
    
    # Output fields
    summary: str = dspy.OutputField(description="A summary of the paper at the specified complexity level")

## Set up the DSPy Program

In [107]:
class Summarize(dspy.Module):
    def __init__(self):
        self.summarize = dspy.ChainOfThought(PaperSummarySignature)

    def forward(self, title: str, text: str, complexity_level: int):
        summary = self.summarize(
            title=title,
            text=text,
            complexity_level=complexity_level
        )
        return summary

### Test

In [111]:
program = Summarize()
print(program(title=paper[0], text=paper[1], complexity_level=1))

## Evaluate
We want to evaluate the summaries to determine whether they are at the appropriate complexity level. To do so, we will use another DSPy program that will take a summary and determine whether it is at the correct complexity level.

In [123]:
class EvaluateSummary(dspy.Signature):
    """Given a summary of a research paper, rate its complexity level from 1 to 5.

    The ratings correspond to the following levels of complexity:
        1: Elementary - Suitable for general audience with no technical background
        2: Basic - Suitable for undergraduate students or technical enthusiasts
        3: Intermediate - Suitable for graduate students or industry practitioners
        4: Advanced - Suitable for domain experts and researchers
        5: Expert - Suitable for specialists in this specific research area
    """

    summary: str = dspy.InputField(description="The summary of the research paper")
    complexity_level: Literal[1, 2, 3, 4, 5] = dspy.OutputField(description="The complexity level of the summary")


class Metric(dspy.Module):
    """Given a summary and its complexity level, determine whether the summary is at the correct complexity level."""
    def __init__(self):
        self.eval_complexity = dspy.ChainOfThought(EvaluateSummary)
    
    def forward(self, summary: str, complexity_level: int):
        """
        Return a score between 0 and 5 representing the accuracy of the summary complexity level.
        5 represents a perfect match, and 0 represents a complete mismatch.
        """
        with dspy.context(lm = lm_openai):
            return 5 - abs(self.eval_complexity(summary=summary)["complexity_level"] - complexity_level)


In [124]:
metric_program = Metric()

summary = program(title=paper[0], text=paper[1], complexity_level=1)["summary"]

print(metric_program(summary=summary, complexity_level=1))

In [83]:
paper_ids = ["2411.15138", "2411.15124", "2411.10442", "2411.10958", "2411.12372"]

# Create dataset of examples
dataset = []

for paper_id in paper_ids:
    title, text = get_paper_from_arxiv(paper_id)
    
    # Generate summary for each complexity level (1-5)
    for complexity in range(1, 6):
        result = summary(
            title=title,
            text=text,
            complexity_level=complexity
        )
        
        # Create a DSPy Example object with the input fields
        example = dspy.Example(
            title=title,
            text=text,
            complexity_level=complexity,
            summary=result.summary
        ).with_inputs("title", "text", "complexity_level")  # Specify which fields are inputs
        
        dataset.append(example)

# Using 20 examples for training (4 papers x 5 complexity levels)
# and 5 examples for validation (1 paper x 5 complexity levels)
trainset = dataset[:20]  
valset = dataset[20:]

RateLimitError: litellm.RateLimitError: AnthropicException - {"type":"error","error":{"type":"rate_limit_error","message":"This request would exceed your organization’s rate limit of 80,000 input tokens per minute. For details, refer to: https://docs.anthropic.com/en/api/rate-limits; see the response headers for current usage. Please reduce the prompt length or the maximum tokens requested, or try again later. You may also contact sales at https://www.anthropic.com/contact-sales to discuss your options for a rate limit increase."}}