<a href="https://colab.research.google.com/github/daisysong76/AI--Machine--learning/blob/main/gsm8k_dspy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### **SolveGSM8k**: Solving grade school math problems using DSPy



This notebook builds upon the foundational concepts of the DSPy framework. DSPy offers a novel programming-centric approach to using language and retrieval models. It combines prompting, reasoning, fine-tuning, and tool augmentation under a minimalistic Python syntax.

We will focus on three parts:


1.   Define a DSPy program and evaluate its performance.
2.   Constrain the DSPy program's behavior with runtime DSPy assertions and suggestions.
3.   Optimize the DSPy program with in-context learning and prompt tuning.



In [None]:
# Set up your API key for OpenAI
import os
os.environ["OPENAI_API_KEY"]="Paste your key here"

### Step -1. **Installing Cache and DSPy** (Run the collapsed cells)

The cells are collapsed by default. Running the following cells will set up the cache and install DSPy and all its dependencies.

The first cell ensures that all LM calls in this notebook use cached OpenAI API results. Removing this step might significantly increase the running time of the DSPy programs that follow, depending on your OpenAI account setup.



In [None]:
# This cell sets up the cache (pre-computed OpenAI call results).
!rm -r gsm8k_cache || true
!rm -r dspy || true
!git clone https://github.com/Shangyint/gsm8k_cache.git

import os
repo_clone_path = '/content/gsm8k_cache'

# Check if '/content' is writable
if not os.access('/content', os.W_OK):
    # If '/content' is not writable, choose an alternative directory
    # Example: using a directory relative to the current working directory
    repo_clone_path = os.path.join(os.getcwd(), 'gsm8k_cache')

# Set up the cache for this notebook
os.environ["DSP_NOTEBOOK_CACHEDIR"] = repo_clone_path

Before we start, we install DSPy and all dependencies.

In [None]:
%load_ext autoreload
%autoreload 2

import sys
import os
import regex as re
import json

import pkg_resources

# Install DSPy and the pinned OpenAI client only if DSPy is not already installed.
if "dspy-ai" not in {pkg.key for pkg in pkg_resources.working_set}:
    !pip install git+https://github.com/stanfordnlp/dspy.git@accenture-course
    !pip install openai~=0.28.1

from rich import print
import dspy

from dspy.evaluate import Evaluate
from dspy.datasets.gsm8k import GSM8K, gsm8k_metric
from dspy.teleprompt import BootstrapFewShotWithRandomSearch

### Step 0. **Getting Started** (Run the collapsed cells)

We'll start by importing our dataset, GSM8K, which contains 8.5K high-quality, linguistically diverse grade school math word problems created by human problem writers. We preshuffled the dataset and divided it into three smaller sets: a train set, a dev set (validation set), and a test set. For simplicity, we will only use the train set and the dev set.

If you would like to inspect the dataset and see how we set up DSPy to use OpenAI GPT-3.5, expand this step.


In [None]:
gsm8k = GSM8K()
trainset, devset = gsm8k.train, gsm8k.dev
len(trainset), len(devset)
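
To make the shuffle-and-split step concrete, here is a minimal sketch in plain Python. It is not how `GSM8K()` builds its splits: the function name `shuffle_and_split`, the seed, and the split sizes are all hypothetical and chosen only for illustration.

```python
import random

def shuffle_and_split(examples, n_train, n_dev, seed=0):
    """Shuffle a copy of `examples` and cut it into train/dev/test lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]
    return train, dev, test

# Hypothetical stand-in data: 10 numbered "problems".
problems = [f"problem-{i}" for i in range(10)]
train, dev, test = shuffle_and_split(problems, n_train=5, n_dev=3)
print(len(train), len(dev), len(test))  # 5 3 2
```

Shuffling once with a fixed seed means every run sees the same train and dev examples, which keeps accuracy numbers comparable across experiments.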

Now we can inspect some examples.

In [None]:
math_problem_example = devset[10]

print(f"Question: {math_problem_example.question}\n")
print(f"Gold Reasoning: {math_problem_example.gold_reasoning}\n")
print(f"Answer: {math_problem_example.answer}")

Then we set up the language model (LM). **DSPy** supports multiple API and local models. In this notebook, we'll work with GPT-3.5 (`gpt-3.5-turbo`).

We configure **DSPy** to use the turbo LM (`gpt-3.5-turbo`) by default. This can be overridden for local parts of programs if needed.

In [None]:
turbo = dspy.OpenAI(model='gpt-3.5-turbo', max_tokens=500)
dspy.settings.configure(lm=turbo)

### Step 1. **First DSPy program**

In **DSPy**, we will maintain a clean separation between **defining your modules in a declarative way** and **calling them in a pipeline to solve the task**.

This allows you to focus on the information flow of your pipeline. **DSPy** will then take your program and automatically optimize **how to prompt** (or finetune) LMs **for your particular pipeline** so it works well.

If you have experience with PyTorch, you can think of DSPy as the PyTorch of the foundation model space. Before we see this in action, let's first understand some key pieces.

##### Using the Language Model: **Signatures** & **Predictors**

Every call to the LM in a **DSPy** program needs to have a **Signature**.

A signature consists of three simple elements:

- A minimal description of the sub-task the LM is supposed to solve.
- A description of one or more input fields (e.g., input question) that we will give to the LM.
- A description of one or more output fields (e.g., the question's answer) that we will expect from the LM.

Let's define a simple signature for basic math problem solving.

In [None]:
class SimpleMathSignature(dspy.Signature):
    """Answer the math question."""

    question = dspy.InputField(desc="A simple math question.")
    answer = dspy.OutputField(desc="The answer to the math question.")

In `SimpleMathSignature`, the docstring describes the sub-task here (i.e., answering math questions). Each `InputField` or `OutputField` can optionally contain a description `desc` too. When it's not given, it's inferred from the field's name (e.g., `question`).

Notice that there isn't anything special about this signature in **DSPy**. We can just as easily define a signature that takes a long snippet from a PDF and outputs structured information, for instance.
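
To build intuition for how the three elements of a signature can end up in a prompt, here is a toy renderer in plain Python. This is only a sketch: `render_prompt` is a hypothetical helper, and the layout it produces is an assumption, not the prompt format DSPy actually generates (use `turbo.inspect_history` to see the real one).

```python
def render_prompt(instructions, input_fields, output_fields, values):
    """Build a plain-text prompt from a signature-like description.

    `input_fields` and `output_fields` map a field name to an optional
    description. A missing description falls back to the field name.
    """
    lines = [instructions, "", "Fields:"]
    for name, desc in list(input_fields.items()) + list(output_fields.items()):
        lines.append(f"- {name}: {desc or name}")
    lines.append("")
    for name in input_fields:
        lines.append(f"{name.capitalize()}: {values[name]}")
    # Leave the first output field open for the model to complete.
    first_output = next(iter(output_fields))
    lines.append(f"{first_output.capitalize()}:")
    return "\n".join(lines)

prompt = render_prompt(
    instructions="Answer the math question.",
    input_fields={"question": "A simple math question."},
    output_fields={"answer": None},  # no desc given: falls back to the name
    values={"question": "What is 1+1+1?"},
)
print(prompt)
```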

One trick with DSPy signatures: when a signature contains only simple fields for a straightforward task, we can replace the whole class definition with the shorthand string `question -> answer`. Now, let's define our first DSPy program with a DSPy predictor. A predictor is a module that knows how to use the LM to implement a signature. Importantly, predictors can learn to fit their behavior to the task!

In [None]:
basic_math_solver = dspy.Predict("question -> answer") # Alternatively, we can write dspy.Predict(SimpleMathSignature)
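
The shorthand string names the input fields before `->` and the output fields after it. As a rough mental model, the toy parser below splits such a string into field names. `parse_shorthand` is a hypothetical helper written for this illustration and is not DSPy's own parser.

```python
def parse_shorthand(signature):
    """Split 'a, b -> c' into (['a', 'b'], ['c'])."""
    inputs, outputs = signature.split("->")

    def split_names(part):
        # Field names are comma-separated; surrounding whitespace is ignored.
        return [name.strip() for name in part.split(",") if name.strip()]

    return split_names(inputs), split_names(outputs)

print(parse_shorthand("question -> answer"))           # (['question'], ['answer'])
print(parse_shorthand("context, question -> answer"))  # (['context', 'question'], ['answer'])
```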

**`dspy.Predict`** is the simplest DSPy predictor. Now, we can call this minimal _program_ with a handcrafted question:

In [None]:
prediction = basic_math_solver(question="What is 1+1+1?")

print(f"Answer: {prediction.answer}")

In the example above, we asked the predictor a simple math question "What is 1+1+1?". The model outputs an answer ("3").

For visibility, we can inspect how this extremely basic predictor implemented our signature. Let's inspect the history of our LM (**turbo**).

In [None]:
_ = turbo.inspect_history(n=1)

Great. Now let's define the actual program. This is a class that inherits from `dspy.Module`.

It needs two methods:

- The `__init__` method declares the sub-modules it needs. This time, we use a fancier predictor, `dspy.ChainOfThought`, which implements Chain-of-Thought prompting. It adds another output field called "rationale" to help the model think step by step.
- The `forward` method will describe the control flow of answering the question using the modules we have (here, we just have one).

In [None]:
class SimpleMathSolver(dspy.Module):
    def __init__(self):
        super().__init__()
        # Chain-of-Thought predictor: adds a "rationale" output before the answer.
        self.prog = dspy.ChainOfThought("question -> answer")

    def forward(self, question):
        pred = self.prog(question=question)
        return pred

simple_math_solver = SimpleMathSolver()

#### **Exercise**
Create your own math problem and solve it with the math solver we just defined. Then, inspect the trace of the LM with `turbo.inspect_history` to see what has changed compared to the `dspy.Predict` predictor.

In [None]:
### Fill this code cell

We can now evaluate our simple math solver on the validation set.

To start, let's evaluate the accuracy of the predicted answer. We provide a simple metric function called `gsm8k_metric`, which essentially extracts the numerical answer from the model output and compares it with the gold answer.

In [None]:
evaluate = Evaluate(devset=devset[:], metric=gsm8k_metric, num_threads=16, display_progress=True, display_table=5)

evaluate(simple_math_solver)
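
To illustrate what such a metric and the resulting accuracy score look like, here is a simplified sketch in plain Python. It is not the implementation of `gsm8k_metric`: the helpers `parse_integer_answer`, `exact_match`, and `accuracy` are hypothetical, and the rule "take the last integer in the text" is an assumption made for this example.

```python
import re

def parse_integer_answer(text):
    """Return the last integer found in `text`, or None if there is none."""
    matches = re.findall(r"-?\d+", text.replace(",", ""))
    return int(matches[-1]) if matches else None

def exact_match(gold_answer, predicted_text):
    """True when the number parsed from the prediction equals the gold answer."""
    predicted = parse_integer_answer(predicted_text)
    return predicted is not None and predicted == int(gold_answer)

def accuracy(pairs):
    """Percentage of (gold, prediction) pairs whose parsed answers match."""
    hits = sum(exact_match(gold, pred) for gold, pred in pairs)
    return 100.0 * hits / len(pairs)

# Hypothetical predictions: two parse correctly, one contains no number.
pairs = [
    ("3", "The answer is 3."),
    ("1200", "She earns 1,200 dollars"),
    ("7", "I am not sure."),
]
print(round(accuracy(pairs), 2))  # 66.67
```

A metric like this tolerates answers written as sentences, but only as long as the final number in the text is the intended result.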

### Step 2. **Adding constraints with DSPy Assertions**

We get **61.67%** accuracy on our validation set, not bad! But we also notice two things: (1) many answers are sentences rather than the numerical result we want. Although `gsm8k_metric` is able to parse most of these answers, generating irrelevant tokens as part of the answer might hurt the overall accuracy; (2) some of the reasoning might not contain the desired computational steps, as in the example below.

In [None]:
simple_math_solver(devset[0].question)
_ = turbo.inspect_history()

Fortunately, in DSPy, we can use a simple yet powerful construct called **LM Assertions** to constrain the output of LMs. For example, here, we can say:

```python
dspy.Suggest(len(pred.answer) < 10, "Your Answer should be a number.")
```

This suggestion tells the DSPy runtime that we expect the answer of our math solver to be short. If the LM fails to yield such an answer, the runtime feeds the message "Your Answer should be a number." back to the LM.

LM assertions in DSPy can be either a hard constraint (`Assert`) or a soft constraint (`Suggest`). Each takes two arguments: the predicate to be tested, as in a traditional assertion, and an additional "error message" that guides the language model to refine its output when the predicate fails.
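
The difference between a hard and a soft constraint can be sketched with a small retry loop in plain Python. This is a simplified model under stated assumptions, not DSPy's actual mechanism: `run_with_constraint`, `AssertionFailed`, `fake_lm`, and the retry count are all hypothetical.

```python
class AssertionFailed(Exception):
    """Raised when a hard constraint still fails after all retries."""

def run_with_constraint(generate, check, message, max_retries=2, hard=False):
    """Call `generate(feedback)` until `check(output)` passes.

    On failure, the error message is fed back to the next attempt.
    A soft constraint returns the last output anyway; a hard one raises.
    """
    feedback = None
    output = generate(feedback)
    for _ in range(max_retries):
        if check(output):
            return output
        feedback = message
        output = generate(feedback)
    if check(output) or not hard:
        return output
    raise AssertionFailed(message)

# Hypothetical stand-in for an LM: verbose at first, concise once it sees feedback.
def fake_lm(feedback):
    return "The answer is 42 apples." if feedback is None else "42"

answer = run_with_constraint(
    fake_lm,
    check=lambda text: len(text) < 10,
    message="Your Answer should be a number.",
)
print(answer)  # 42
```

With `hard=False` the program keeps running even if the model never satisfies the predicate, which is why soft constraints are a good fit for stylistic preferences such as answer length.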



#### Math Solver with Suggestions
We can encode the two observations we have into two suggestions, and add them to the `SimpleMathSolver`:

In [None]:
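def extract_number(question):
    # Collect whitespace-separated tokens that consist only of digits.
    numbers = [int(s) for s in question.split() if s.isdigit()]
    return numbers

def has_numbers(rationale, numbers):
    # Return (True, None) if every number appears in the rationale,
    # otherwise (False, first_missing_number).
    for number in numbers:
        if str(number) not in rationale:
            return False, number
    return True, None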

class SimpleMathSolverWithSuggest(dspy.Module):
    def __init__(self):
        super().__init__()
        self.prog = dspy.ChainOfThought("question -> answer")

    def forward(self, question):
        pred = self.prog(question=question)
        rationale_has_numbers, missing_number = has_numbers(pred.rationale, extract_number(question))
        dspy.Suggest(rationale_has_numbers, f"Your Reasoning should contain {missing_number}.")
        dspy.Suggest(len(pred.answer) < 10, "Your Answer should be a number.")
        return pred

simple_math_solver_suggest = SimpleMathSolverWithSuggest().activate_assertions()
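
To see what the two helper functions check, the standalone snippet below repeats their definitions and runs them on a made-up question. The question and rationales are hypothetical examples, not items from the dataset.

```python
def extract_number(question):
    return [int(s) for s in question.split() if s.isdigit()]

def has_numbers(rationale, numbers):
    for number in numbers:
        if str(number) not in rationale:
            return False, number
    return True, None

question = "Tom has 3 apples and buys 12 more. He eats 5, then stops."
numbers = extract_number(question)
print(numbers)  # [3, 12]  ("5," is skipped because of the trailing comma)

print(has_numbers("3 + 12 = 15, and 15 - 5 = 10", numbers))  # (True, None)
print(has_numbers("He ends up with 10 apples", numbers))      # (False, 3)
```

Two limitations are worth keeping in mind: tokens with attached punctuation (such as `5,`) are not picked up, and the check is a substring match, so a rationale containing `13` also counts as containing `3`. Both are acceptable for a soft suggestion, which only nudges the model rather than rejecting its output.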

Now we can rerun our math solver on the first question, and see how LM assertions in DSPy internally fix these errors.

In [None]:
simple_math_solver_suggest(devset[0].question)
_ = turbo.inspect_history(n=3)

Finally, let's evaluate the performance of `simple_math_solver_suggest`:

In [None]:
evaluate(simple_math_solver_suggest)

### Step 3. **Compiling DSPy programs with optimizers**

Another cool thing DSPy offers is optimizers!

A DSPy optimizer is an algorithm that can tune the parameters of a DSPy program (i.e., the prompts and/or the LM weights) to maximize the metrics you specify, like accuracy.

There are many built-in optimizers in DSPy, which apply vastly different strategies. A typical DSPy optimizer takes three things:

1. Your **DSPy program**. This may be a single module (e.g., dspy.Predict) or a complex multi-module program.

2. Your **metric**. This is a function that evaluates the output of your program, and assigns it a score (higher is better).

3. A few **training inputs**. This may be very small (e.g., only 5 or 10 examples) and incomplete (only inputs to your program, without any labels).

If you happen to have a lot of data, DSPy can leverage that. But you can start small and get strong results.


In this tutorial, we demonstrate one DSPy optimizer called `BootstrapFewShotWithRandomSearch`, which bootstraps demonstrations from the training set and searches for the best combination of demonstrations. Two things to note here:
1. Most optimizers work with LM assertions.
2. This step is time- and compute-intensive, which is why we cached the API calls. The good thing is, once you have optimized the program, you can save the compiled DSPy program and reuse it later!
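
The idea of bootstrapping demonstrations and then searching over combinations of them can be sketched with a toy example in plain Python. This is not the optimizer's implementation: `toy_program` is a hypothetical stand-in for an LM program, and `bootstrap`, `random_search`, `score`, and their parameters are assumptions made for illustration.

```python
import random

def toy_program(question, demos):
    """Hypothetical stand-in for an LM program.

    It always solves problems with small operands, and solves larger ones
    only when a demonstration with the same operator is in its prompt.
    """
    left, op, right = question.split()
    left, right = int(left), int(right)
    seen_ops = {demo_question.split()[1] for demo_question, _ in demos}
    if max(left, right) < 5 or op in seen_ops:
        return left + right if op == "+" else left * right
    return None  # the "model" fails

def metric(gold, prediction):
    return prediction == gold

def score(program, demos, dataset):
    """Accuracy (in percent) of `program` on `dataset` with the given demos."""
    hits = sum(metric(gold, program(q, demos)) for q, gold in dataset)
    return 100.0 * hits / len(dataset)

def bootstrap(program, trainset):
    """Keep (question, output) pairs that pass the metric without any demos."""
    demos = []
    for q, gold in trainset:
        out = program(q, [])
        if metric(gold, out):
            demos.append((q, out))
    return demos

def random_search(program, demos, valset, num_candidates, k, seed=0):
    """Try random demo subsets of size k and keep the best-scoring one."""
    rng = random.Random(seed)
    best_demos, best_score = [], score(program, [], valset)
    for _ in range(num_candidates):
        candidate = rng.sample(demos, min(k, len(demos)))
        candidate_score = score(program, candidate, valset)
        if candidate_score > best_score:
            best_demos, best_score = candidate, candidate_score
    return best_demos, best_score

trainset = [("1 + 2", 3), ("2 * 3", 6), ("3 + 4", 7), ("4 * 2", 8), ("9 * 9", 81)]
valset = [("7 + 8", 15), ("6 * 7", 42), ("2 + 2", 4), ("8 * 9", 72)]

demos = bootstrap(toy_program, trainset)
baseline = score(toy_program, [], valset)
best_demos, best_score = random_search(toy_program, demos, valset, num_candidates=6, k=2)
print(len(demos), baseline, best_score)
```

In this sketch, `k` and `num_candidates` play roles loosely analogous to `max_bootstrapped_demos` and `num_candidate_programs` in the optimizer call: more candidates cost more evaluations on the validation set but raise the chance of finding a helpful demo combination.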

In [None]:
optimizer = BootstrapFewShotWithRandomSearch(gsm8k_metric, max_bootstrapped_demos=3, max_labeled_demos=6, num_candidate_programs=6)

compiled_prog = optimizer.compile(student=simple_math_solver, trainset=trainset[:], valset=devset[:100])
compiled_prog_suggest = optimizer.compile(student=simple_math_solver_suggest, trainset=trainset[:], valset=devset[:100])


In [None]:
# Evaluating compiled program
evaluate(compiled_prog)

In [None]:
# Evaluating compiled program with suggestions
evaluate(compiled_prog_suggest)

Now we revisit our previous example and see how the optimizer tuned the prompt:

In [None]:
compiled_prog(devset[0].question)
_ = turbo.inspect_history()

### More DSPy tutorials

1. [Intro to DSPy](https://github.com/stanfordnlp/dspy/blob/main/intro.ipynb)
2. [DSPy Assertions](https://colab.research.google.com/github/stanfordnlp/dspy/blob/main/examples/longformqa/longformqa_assertions.ipynb)
3. [Quiz Generation](https://github.com/stanfordnlp/dspy/blob/main/examples/quiz/quiz_assertions.ipynb)
4. ... more on [DSPy github](https://github.com/stanfordnlp/dspy)

#### Contact: Shangyin Tan (shangyin@berkeley.edu)