In [1]:
import dspy 




In [2]:
gpt3_turbo = dspy.OpenAI(model='gpt-3.5-turbo-1106', max_tokens=300)
dspy.configure(lm=gpt3_turbo)

## How much data do I need and how do I collect data for my task?

Concretely, you can use DSPy optimizers usefully with as few as 10 example inputs, but having 50-100 examples (or even better, 300-500 examples) goes a long way.

## DSPy Example objects

The core data type in DSPy is Example. You will use Examples to represent the items in your training set and test set.

DSPy Examples are similar to Python dicts but come with a few useful utilities. Your DSPy modules will return values of the type Prediction, which is a special subclass of Example.

In [3]:
qa_pair = dspy.Example(question="This is a question?", answer="This is an answer.")

print(qa_pair)
print(qa_pair.question)
print(qa_pair.answer)

Example({'question': 'This is a question?', 'answer': 'This is an answer.'}) (input_keys=None)
This is a question?
This is an answer.


Examples can have any field keys and any value types, though usually values are strings.


In [4]:
example = dspy.Example(field1='aa', field2='bb', field3='cc', field4='dd')


You can now express your training set as a list of Examples, for example:

In [5]:
trainset = [dspy.Example(report="LONG REPORT 1", summary="short summary 1"), example]


In [6]:
trainset


[Example({'report': 'LONG REPORT 1', 'summary': 'short summary 1'}) (input_keys=None),
 Example({'field1': 'aa', 'field2': 'bb', 'field3': 'cc', 'field4': 'dd'}) (input_keys=None)]

## Specifying Input Keys

In traditional ML, "inputs" and "labels" are kept separate.

In DSPy, the Example objects have a with_inputs() method, which can mark specific fields as inputs. (The rest are just metadata or labels.)

In [7]:
# Single Input.
print(qa_pair.with_inputs("question"))

# Multiple Inputs; be careful about marking your labels as inputs unless you mean it.
print(qa_pair.with_inputs("question", "answer"))


Example({'question': 'This is a question?', 'answer': 'This is an answer.'}) (input_keys={'question'})
Example({'question': 'This is a question?', 'answer': 'This is an answer.'}) (input_keys={'question', 'answer'})


In [8]:
article_summary = dspy.Example(article="This is an article", summary="This is a summary").with_inputs("article")


In [9]:
input_key_only = article_summary.inputs()
non_input_key_only = article_summary.labels()

In [10]:
print("Example object with Input fields only:", input_key_only)
print("Example object with Non-Input fields only:", non_input_key_only)

Example object with Input fields only: Example({'article': 'This is an article'}) (input_keys=None)
Example object with Non-Input fields only: Example({'summary': 'This is a summary'}) (input_keys=None)
