## Configure DSPy environment

In [6]:
# import dspy
import dspy

# For display
from IPython.display import Markdown

In [3]:
# Tell DSPy to use OpenAI's GPT-4o mini as the default language model
lm = dspy.LM('openai/gpt-4o-mini')
dspy.configure(lm=lm)
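
The `openai/gpt-4o-mini` model needs provider credentials before any call can succeed. A minimal sketch, assuming the key is supplied through the `OPENAI_API_KEY` environment variable; the helper name `missing_credentials` is hypothetical and not part of DSPy:

```python
import os

def missing_credentials(env=None, required=("OPENAI_API_KEY",)):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]

# Example with an explicit mapping instead of the real environment.
print(missing_credentials({"OPENAI_API_KEY": ""}))  # ['OPENAI_API_KEY']
```

Running a check like this before `dspy.configure` makes a missing key fail early, rather than on the first model call.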

## Exploring some DSPy modules

In [7]:
qa = dspy.Predict('question: str -> response: str')
Markdown(qa(question='What are high memory and low memory in linux').response)

In Linux, "high memory" and "low memory" refer to different regions of the system's memory address space, particularly in the context of 32-bit architectures.

- **Low Memory**: This typically refers to the first 896 MB of memory in a 32-bit system. It is directly accessible by the kernel and can be used for kernel data structures and user processes. The low memory region is where most of the system's memory management occurs, and it is where the kernel can allocate memory for processes without needing special handling.

- **High Memory**: This refers to the memory above the 896 MB threshold in a 32-bit system. This memory is not directly accessible by the kernel in the same way as low memory. Instead, it requires special handling, such as using a mechanism called "high memory management" to access it. High memory is typically used for user processes and can be allocated dynamically, but the kernel must map it into its address space to access it.

In summary, low memory is directly accessible by the kernel, while high memory requires additional management to be accessed by the kernel in a 32-bit Linux environment.

## Inspection

In [10]:
dspy.inspect_history(n=1)

[2024-10-19T13:05:29.277664]

System message:

Your input fields are:
1. `question` (str)

Your output fields are:
1. `response` (str)

All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## question ## ]]
{question}

[[ ## response ## ]]
{response}

[[ ## completed ## ]]

In adhering to this structure, your objective is: 
        Given the fields `question`, produce the fields `response`.


User message:

[[ ## question ## ]]
What are high memory and low memory in linux

Respond with the corresponding output fields, starting with the field `response`, and then ending with the marker for `completed`.


Response:

[[ ## response ## ]]
In Linux, "high memory" and "low memory" refer to different regions of the system's memory address space, particularly in the context of 32-bit architectures.

- **Low Memory**: This typically refers to the first 896 MB of memory in a 32-bit system. It is directly
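
The prompt above delimits every field with a `[[ ## name ## ]]` header and ends with a `completed` marker. A minimal sketch of how such a reply could be split back into named fields; `parse_fields` is a hypothetical helper written for illustration and is not DSPy's own parser:

```python
import re

# Matches a header line such as "[[ ## response ## ]]".
FIELD_HEADER = re.compile(r"^\[\[ ## (\w+) ## \]\][ \t]*$", re.MULTILINE)

def parse_fields(text):
    """Split text on '[[ ## name ## ]]' headers into a {name: value} dict."""
    matches = list(FIELD_HEADER.finditer(text))
    fields = {}
    for current, following in zip(matches, matches[1:] + [None]):
        end = following.start() if following else len(text)
        fields[current.group(1)] = text[current.end():end].strip()
    fields.pop("completed", None)  # end marker, carries no value
    return fields

# Made-up reply in the same layout as the history shown above.
raw = "[[ ## response ## ]]\nLow memory is directly mapped.\n\n[[ ## completed ## ]]\n"
print(parse_fields(raw))  # {'response': 'Low memory is directly mapped.'}
```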

In [16]:
cot = dspy.ChainOfThought('question -> response')
cot(question = "should curly braces appear on their own line ?")

Prediction(
    reasoning='The placement of curly braces often depends on the coding style guidelines being followed. In many programming languages, such as Java, C#, and JavaScript, it is common to place opening curly braces on the same line as the statement that precedes them, while closing curly braces are often placed on their own line. This style enhances readability and maintains a clear structure. However, some coding standards, like those used in Python or certain functional programming languages, may not use curly braces at all. Ultimately, whether curly braces should appear on their own line is a matter of personal or team preference, and consistency within a codebase is key.',
    response="Curly braces can appear on their own line depending on the coding style guidelines you are following. Many developers prefer placing the opening brace on the same line as the preceding statement for readability, while the closing brace is often placed on its own line. It's important to fo

In [17]:
dspy.inspect_history(n=1)

[2024-10-19T13:14:49.784757]

System message:

Your input fields are:
1. `question` (str)

Your output fields are:
1. `reasoning` (str)
2. `response` (str)

All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## question ## ]]
{question}

[[ ## reasoning ## ]]
{reasoning}

[[ ## response ## ]]
{response}

[[ ## completed ## ]]

In adhering to this structure, your objective is: 
        Given the fields `question`, produce the fields `response`.


User message:

[[ ## question ## ]]
should curly braces appear on their own line ?

Respond with the corresponding output fields, starting with the field `reasoning`, then `response`, and then ending with the marker for `completed`.


Response:

[[ ## reasoning ## ]]
The placement of curly braces often depends on the coding style guidelines being followed. In many programming languages, such as Java, C#, and JavaScript, it is common to place opening c
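
Comparing the two histories, the `ChainOfThought` prompt lists the same fields as the `Predict` prompt plus a leading `reasoning` output, and untyped fields are reported as `str`. A simplified sketch of that mapping; `parse_signature` and `with_reasoning` are hypothetical helpers, not DSPy internals, and the parser only handles simple type names without commas:

```python
def parse_signature(signature):
    """Parse 'a, b: str -> c' into two lists of (name, type) pairs."""
    def parse_side(side):
        fields = []
        for part in side.split(","):
            name, _, type_name = part.partition(":")
            # Fields without an annotation default to "str".
            fields.append((name.strip(), type_name.strip() or "str"))
        return fields

    inputs, outputs = signature.split("->")
    return parse_side(inputs), parse_side(outputs)

def with_reasoning(outputs):
    """Prepend the extra output field seen in the ChainOfThought prompt."""
    return [("reasoning", "str")] + outputs

inputs, outputs = parse_signature("question -> response")
print(inputs)                   # [('question', 'str')]
print(with_reasoning(outputs))  # [('reasoning', 'str'), ('response', 'str')]
```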

## Manipulating Examples in DSPy

In [19]:
import ujson

# Download 500 question-answer pairs from the RAG-QA Arena "Tech" dataset, using either wget or curl.
# !wget https://huggingface.co/dspy/cache/resolve/main/ragqa_arena_tech_500.json
# !curl -L -O https://huggingface.co/dspy/cache/resolve/main/ragqa_arena_tech_500.json

with open('data/ragqa_arena_tech_500.json') as f:
    data = ujson.load(f)

# Inspect one datapoint.
data[0]


{'question': 'how to transfer whatsapp voice message to computer?',
 'response': 'To transfer voice notes from WhatsApp on your device to your computer, you have the option to select the "Share" feature within the app and send the files via Email, Gmail, Bluetooth, or other available services.  \nYou can also move the files onto your phone\'s SD card, connect your phone to your computer via a USB cable, then find and transfer the files via File Explorer on your PC. \nAlternatively, you can choose to attach all the desired voice notes to an email and, from your phone, send them to your own email address.  \nUpon receiving the email on your computer, you can then download the voice note attachments.'}

In [20]:
data = [dspy.Example(**d).with_inputs('question') for d in data]

In [22]:
data[0]

Example({'question': 'how to transfer whatsapp voice message to computer?', 'response': 'To transfer voice notes from WhatsApp on your device to your computer, you have the option to select the "Share" feature within the app and send the files via Email, Gmail, Bluetooth, or other available services.  \nYou can also move the files onto your phone\'s SD card, connect your phone to your computer via a USB cable, then find and transfer the files via File Explorer on your PC. \nAlternatively, you can choose to attach all the desired voice notes to an email and, from your phone, send them to your own email address.  \nUpon receiving the email on your computer, you can then download the voice note attachments.'}) (input_keys={'question'})

In [24]:
example = data[2]
example

Example({'question': 'what are high memory and low memory on linux?', 'response': '"High Memory" refers to the application or user space, the memory that user programs can use and which isn\'t permanently mapped in the kernel\'s space, while "Low Memory" is the kernel\'s space, which the kernel can address directly and is permanently mapped. \nThe user cannot access the Low Memory as it is set aside for the required kernel programs.'}) (input_keys={'question'})
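
`with_inputs('question')` marks which keys are fed to the program; the remaining keys act as labels. A simplified stand-in illustrating that split; `TinyExample` is hypothetical, returns plain dicts, and is not the `dspy.Example` class, and the sample record is made up:

```python
class TinyExample:
    """Hypothetical stand-in for a record that knows which keys are inputs."""

    def __init__(self, **fields):
        self._fields = dict(fields)
        self._input_keys = set()

    def with_inputs(self, *keys):
        # Return a copy so the original record is left untouched.
        copy = TinyExample(**self._fields)
        copy._input_keys = set(keys)
        return copy

    def inputs(self):
        return {k: v for k, v in self._fields.items() if k in self._input_keys}

    def labels(self):
        return {k: v for k, v in self._fields.items() if k not in self._input_keys}

ex = TinyExample(question="what is swap?", response="Disk-backed memory.").with_inputs("question")
print(ex.inputs())  # {'question': 'what is swap?'}
print(ex.labels())  # {'response': 'Disk-backed memory.'}
```

Keeping the two groups separate is what lets a call such as `cot(**example.inputs())` pass only the question to the program while the gold response stays available to the metric.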

In [25]:
trainset, valset, devset, testset = data[:50], data[50:150], data[150:300], data[300:500]

len(trainset), len(valset), len(devset), len(testset)

(50, 100, 150, 200)
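
The four slices above are consecutive and non-overlapping. A small sketch that builds the same kind of split from a list of sizes; `split_by_sizes` is a hypothetical helper and `records` is a placeholder for the loaded examples:

```python
def split_by_sizes(items, sizes):
    """Cut items into consecutive, non-overlapping chunks of the given sizes."""
    if sum(sizes) > len(items):
        raise ValueError("sizes exceed the number of items")
    chunks, start = [], 0
    for size in sizes:
        chunks.append(items[start:start + size])
        start += size
    return chunks

records = list(range(500))  # placeholder for the 500 loaded examples
trainset, valset, devset, testset = split_by_sizes(records, [50, 100, 150, 200])
print(len(trainset), len(valset), len(devset), len(testset))  # 50 100 150 200
```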

## Evaluation in DSPy

In [26]:
from dspy.evaluate import SemanticF1

# Instantiate the metric
metric = SemanticF1()

# Produce a prediction from our 'cot' module, using the 'example' above as input
pred = cot(**example.inputs())

# Compute the metric's score for the prediction
score = metric(example, pred)

In [27]:
print(f"Question: \t {example.question}\n")
print(f"Gold Response: \t {example.response}\n")
print(f"Predicted Response: \t {pred.response}\n")
print(f"Semantic F1 Score: {score:.2f}")

Question: 	 what are high memory and low memory on linux?

Gold Response: 	 "High Memory" refers to the application or user space, the memory that user programs can use and which isn't permanently mapped in the kernel's space, while "Low Memory" is the kernel's space, which the kernel can address directly and is permanently mapped. 
The user cannot access the Low Memory as it is set aside for the required kernel programs.

Predicted Response: 	 In Linux, "low memory" refers to the memory that is directly accessible by the kernel, typically the first 896 MB on a 32-bit system. "High memory" refers to memory above this limit that cannot be directly accessed by the kernel and requires special handling. This distinction is crucial for efficient memory management, especially in systems with large amounts of RAM.

Semantic F1 Score: 0.71


In [29]:
dspy.inspect_history(n=1)

[2024-10-19T13:43:38.330275]

System message:

Your input fields are:
1. `question` (str)
2. `ground_truth` (str)
3. `system_response` (str)

Your output fields are:
1. `reasoning` (str)
2. `recall` (float): fraction (out of 1.0) of ground truth covered by the system response
3. `precision` (float): fraction (out of 1.0) of system response covered by the ground truth

All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## question ## ]]
{question}

[[ ## ground_truth ## ]]
{ground_truth}

[[ ## system_response ## ]]
{system_response}

[[ ## reasoning ## ]]
{reasoning}

[[ ## recall ## ]]
{recall}

[[ ## precision ## ]]
{precision}

[[ ## completed ## ]]

In adhering to this structure, your objective is: 
        Compare a system's response to the ground truth to compute its recall and precision.
        If asked to reason, enumerate key ideas in each response, and whether they are present in the other response.

In [30]:
# Define an evaluator that we can re-use
evaluate = dspy.Evaluate(devset=devset,
                         metric=metric,
                         num_threads = 24,
                         display_progress=True,
                         display_table=3)

In [31]:
# Evaluate the Chain-of-Thought program.
evaluate(cot)

Average Metric: 58.3150720833188 / 150  (38.9): 100%|██████████| 150/150 [00:34<00:00,  4.37it/s]  


| | question | example_response | reasoning | pred_response | SemanticF1 |
|---|---|---|---|---|---|
| 0 | why is mercurial considered to be easier than git? | Mercurial's syntax is considered more familiar, especially for those accustomed to SVN, and is well documented. It focuses on interface aspects, which initially makes learning... | Mercurial is often considered easier than Git for several reasons. Firstly, Mercurial has a simpler command structure and a more straightforward workflow, which can be... | Mercurial is considered easier than Git primarily due to its simpler command structure and more straightforward workflow, which can be less intimidating for beginners. It... | ✔️ [0.545] |
| 1 | open finder window from current terminal location? | If you type 'open .' in Terminal, it will open the current directory in a Finder window. Alternatively, you can execute the command open `pwd`... | To open a Finder window from the current terminal location on a Mac, you can use the `open` command followed by a period (`.`) which... | You can open a Finder window from your current terminal location by typing the following command and pressing Enter: ``` open . ``` | ✔️ [0.667] |
| 2 | how to import secret gpg key (copied from one machine to another)? | It is advised that it is necessary to add `--import` to the command line to import the private key and that according to the man... | To import a secret GPG key that has been copied from one machine to another, you need to ensure that the key is in the... | To import a secret GPG key that you have copied from one machine to another, follow these steps: 1. **Ensure the key file is accessible**:... | |


38.88
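
The returned value is the average metric expressed as a percentage. It can be reproduced from the two numbers in the progress bar above:

```python
total_score = 58.3150720833188  # sum of per-example scores from the progress bar
num_examples = 150              # size of the devset

average_pct = round(100 * total_score / num_examples, 2)
print(average_pct)  # 38.88
```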