mine-tune an AI program to respond using your "brand voice".

In [None]:
%pip install dspy
%pip install ipywidgets
%pip install IPython
%pip install requests
%pip install firecrawl-py
%pip install beautifulsoup4
%pip install markdownify



In [7]:
import dspy
import os
import dotenv
import pydantic

dotenv.load_dotenv()
assert 'OPENAI_API_KEY' in os.environ
llm = dspy.OpenAI(model='gpt-4o', max_tokens=4096, temperature=0.1)
dspy.settings.configure(lm=llm)

set a URL directly or via input prompt

In [8]:
url = 'https://polyspectra.com'
#url = input("Enter a URL: ") 

preferred method:  use firecrawl to get content more reliably. sign up at https://www.firecrawl.dev/ and put your API key in .env or directly below

In [None]:
from firecrawl import FirecrawlApp

# Initialize the FirecrawlApp with your API key
#app = FirecrawlApp(api_key='your_api_key')
FIRECRAWL_API_KEY = os.getenv("FIRECRAWL_API_KEY")
app = FirecrawlApp(api_key=FIRECRAWL_API_KEY)

# Scrape a single URL

scraped_data = app.scrape_url(url)
display(scraped_data)


fallback to beautifulsoup if you don't want to sign up for firecrawl. many websites (like mine) block this type of scraping.

In [9]:
from bs4 import BeautifulSoup
import markdownify
import ipywidgets as widgets
from IPython.display import display
import requests

# Make a request to the URL
response = requests.get(url)

# Parse the HTML content
soup = BeautifulSoup(response.content, 'html.parser')

# Convert the HTML content to Markdown
markdown_content = markdownify.markdownify(str(soup), heading_style="ATX")

display(markdown_content)




'\nJust a moment...Enable JavaScript and cookies to continue'

choose what we are going to use as context (firecrawl or beautifulsoup)

In [None]:
from IPython.display import display, Markdown
context = scraped_data['markdown']
#context = markdown_content # or use beautifulsoup to convert to markdown
display(Markdown(context))



now we need a DSPy signature & module, and test it uncompiled (zero shot)

In [111]:
class BrandVoice(dspy.Signature):
    """Consider the context and always respond to the user's query using the brand voice."""

    context = dspy.InputField(desc="This is a website of the brand that you are representing. Carefully analyze this to make sure your responses are branded and professional.")
    query = dspy.InputField()
    response = dspy.OutputField()


class BrandAmbassador(dspy.Module):  # let's define a new module
    def __init__(self):
        super().__init__()
        self.n = 1
        self.signature = BrandVoice
        self.predictor = dspy.ChainOfThought(self.signature, n=self.n) #this can easily be subbed for something else
    
    def forward(self, query, context, num_generations = 1):
        self.predictor = dspy.ChainOfThought(self.signature, n=num_generations)
        result = self.predictor(query=query, context=context, n=num_generations)
        return dspy.Prediction(response=[str(completion.response) for completion in result.completions])


llm.temperature = 0.1
dspy.settings.configure(lm=llm)
uncompiled = BrandAmbassador()
exec_summary = uncompiled("write an executive summary of polySpectra", context)
display(exec_summary.response)
#print(len(exec_summary.response))


['polySpectra Executive Summary\n\npolySpectra is at the forefront of industrial additive manufacturing, specializing in the development and production of the world\'s most rugged photopolymer resins. Our flagship products, including COR Alpha, COR Black, and COR Bio, are engineered to deliver unparalleled strength, durability, and impact resistance, making them the go-to choice for innovative engineers seeking reliable 3D printing solutions.\n\nOur proprietary Cyclic Olefin Resin (COR) technology sets a new standard in the industry, offering best-in-class thermomechanical performance, chemical resistance, and long-term durability. These resins are compatible with a wide array of industrial and desktop DLP & LCD resin 3D printers, ensuring that our customers can seamlessly integrate our materials into their existing workflows.\n\nAt polySpectra, we are committed to helping engineers "Make It Real" by providing comprehensive support throughout the 3D printing process. From printing and 

this is pretty impressive already. but now what if we want to mine-tune it further?

as a first example, we raise the temperature and generate 5 responses, which we will rate in the next cell

In [110]:
llm.temperature = 0.8
dspy.settings.configure(lm=llm)
test5_query = "write a tweet about COR Black being used in automotive manufacturing"
test5_response = uncompiled(test5_query, context, num_generations=5)
display(test5_response.response)



['🚗 Say goodbye to brittle components! COR Black by polySpectra is revolutionizing automotive manufacturing with its unmatched impact resistance and durability. Trust in the strength of our rugged photopolymer resins to keep your innovations on the road. #MakeItReal #3DPrinting #AutoInnovation 🔧✨',
 "🚗🔧 Exciting news! #CORBlack is revolutionizing #automotive manufacturing with its unparalleled impact resistance and durability. Say goodbye to brittle parts and hello to robust, end-use components you can trust. #MakeItReal with polySpectra's toughest resin. 🌟\n\n#3DPrinting #AdditiveManufacturing #Innovation #polySpectra",
 '🚗💪 Exciting news from the automotive world! COR Black by #polySpectra is revolutionizing manufacturing with its unparalleled impact resistance and durability. Perfect for the demanding underhood environments, COR Black ensures automotive components that last. Make it real with polySpectra! #3DPrinting #AutomotiveInnovation #MakeItReal',
 '🚗🔧 Revolutionizing automotiv

run the next cell to rate each of the entries, and don't forget to provide feedback. this is the annoying "upfront attention" of mine-tuning.

In [57]:
import ipywidgets as widgets
from IPython.display import display, clear_output

# Assuming test5.response contains the generated responses
responses = test5.response

# Initialize storage for ratings and feedback
ratings_feedback = []

# Variable to store the current index
current_index = 0

# Variable to store the selected rating
selected_rating = None

# Create buttons for ratings 1 to 5
buttons = [widgets.Button(description=str(i)) for i in range(1, 6)]

# Arrange the buttons horizontally and center them
button_box = widgets.HBox(buttons, layout=widgets.Layout(justify_content='center'))

# Create a Textarea widget for feedback
feedback_area = widgets.Textarea(
    value='',
    placeholder='Type your feedback here',
    description='Feedback:',
    disabled=False,
    layout=widgets.Layout(width='100%', height='100px')
)

# Create a submit button
submit_button = widgets.Button(description="Submit")

# Center the submit button
submit_button_box = widgets.HBox([submit_button], layout=widgets.Layout(justify_content='center'))

# Function to display the current response and feedback widgets
def display_current_response():
    global current_index
    clear_output()
    
    # Display the current response
    display(widgets.HTML(f"<b>Response {current_index + 1}:</b> {responses[current_index]}"))
    
    # Display the rating buttons
    display(button_box)
    
    # Display the feedback area
    display(feedback_area)
    
    # Display the submit button
    display(submit_button_box)

# Function to handle submit button click
def on_submit_click(b):
    global current_index, selected_rating
    
    # Get the feedback text
    feedback_text = feedback_area.value
    
    # Save the rating and feedback
    ratings_feedback.append({
        'query': test5_query,
        'response': responses[current_index],
        'rating': selected_rating,
        'feedback': feedback_text
    })
    
    # Move to the next response
    current_index += 1
    
    # Reset the selected rating and feedback area
    selected_rating = None
    feedback_area.value = ''
    
    # Reset button styles
    for button in buttons:
        button.style.button_color = None
    
    # Check if there are more responses to rate
    if current_index < len(responses):
        display_current_response()
    else:
        clear_output()
        display(widgets.HTML("<b>All responses have been rated. Thank you!</b>"))
        display(ratings_feedback)

# Function to handle button click
def on_button_click(b):
    global selected_rating
    selected_rating = int(b.description)
    
    # Reset button styles
    for button in buttons:
        button.style.button_color = None
    
    # Highlight the selected button
    b.style.button_color = 'lightblue'
    print(f"Selected rating: {selected_rating}")

# Attach the button click event to each button
for button in buttons:
    button.on_click(on_button_click)

# Attach the submit button click event
submit_button.on_click(on_submit_click)

# Display the first response
display_current_response()

HTML(value='<b>All responses have been rated. Thank you!</b>')

[{'query': 'write a tweet about COR Black being used in automotive manufacturing',
  'response': "This is the response to the user's query. Always use the brand voice.\n\n---\n\nCOR Black is revolutionizing automotive manufacturing with its unparalleled impact resistance and durability. 🚗🔧 Engineers now trust polySpectra's rugged photopolymer resins for parts that endure the toughest conditions. #MakeItReal #3DPrinting #Innovation #Automotive\n\n---",
  'rating': 1,
  'feedback': 'only respond with the tweet. too many hashtags. this is bad.'},
 {'query': 'write a tweet about COR Black being used in automotive manufacturing',
  'response': "🚗💪 Elevate your automotive manufacturing game with #CORBlack! Engineered for superior impact resistance and durability, COR Black is the ultimate resin for high-performance, end-use automotive parts. #MakeItReal with polySpectra's rugged photopolymer resins. #3DPrinting #AutomotiveInnovation #AdditiveManufacturing",
  'rating': 3,
  'feedback': 'this

you might want to save this somewhere, since you took the time to provide all of this thoughtful feedback.

In [116]:
save = ratings_feedback
display(save)
import json

with open('demo_ratings_feedback.json', 'w') as f:
    json.dump(save, f)


[{'query': 'write a tweet about COR Black being used in automotive manufacturing',
  'response': "This is the response to the user's query. Always use the brand voice.\n\n---\n\nCOR Black is revolutionizing automotive manufacturing with its unparalleled impact resistance and durability. 🚗🔧 Engineers now trust polySpectra's rugged photopolymer resins for parts that endure the toughest conditions. #MakeItReal #3DPrinting #Innovation #Automotive\n\n---",
  'rating': '1',
  'feedback': 'only respond with the tweet. too many hashtags. this is bad.'},
 {'query': 'write a tweet about COR Black being used in automotive manufacturing',
  'response': "🚗💪 Elevate your automotive manufacturing game with #CORBlack! Engineered for superior impact resistance and durability, COR Black is the ultimate resin for high-performance, end-use automotive parts. #MakeItReal with polySpectra's rugged photopolymer resins. #3DPrinting #AutomotiveInnovation #AdditiveManufacturing",
  'rating': '3',
  'feedback': '

now we need to convert our feedback into a DSPy.example

In [None]:
# Append context to each item in ratings_feedback
for item in ratings_feedback:
    item['context'] = context
    item['rating'] = str(item['rating'])

#convert to a dspy.example
from dspy import Example
ratings_feedback_examples = [
    Example(base=item).with_inputs('query','context') for item in ratings_feedback
]
#display(ratings_feedback_examples)

let's try the bootstrapfewshot optimizer

In [118]:
from dspy.teleprompt import BootstrapFewShot

# def score(prediction, example, trace=None):
#     score = str(float(example['rating']) / 5)
#     print(score)
#     return score

compiled = BootstrapFewShot(
    metric=lambda example, prediction, *args: str(float(example['rating'])/5),
    max_labeled_demos=5,
).compile(
    student=BrandAmbassador(),
    trainset=ratings_feedback_examples,
)

compiled.save('demo_compiled_first_try.json')

 80%|████████  | 4/5 [00:00<00:00, 430.06it/s]

Bootstrapped 4 full traces after 5 examples in round 0.





test the bootstrapfewshot

In [120]:
test5_now = compiled.forward(str(test5_query + "_"), context, num_generations=5)
display(test5_now.response)

['🚗💥 COR Black is revolutionizing automotive manufacturing! With unmatched impact resistance and durability, our rugged resin ensures reliable, high-performance parts for the most demanding environments. #MakeItReal #3DPrinting #AutomotiveInnovation #polySpectra',
 "🚗💥 Drive innovation with polySpectra's COR Black! Engineered for impact resistance and durability, COR Black is revolutionizing automotive manufacturing. Make it real with the most rugged photopolymer resin on the market. #3DPrinting #Automotive #Innovation #MakeItReal #polySpectra",
 '🚗🔧 COR Black is revolutionizing automotive manufacturing! With unparalleled impact resistance and long-term durability, our rugged resin ensures every 3D printed component stands the test of time. #MakeItReal #3DPrinting #AutomotiveInnovation #polySpectra',
 '🚗🔧 Excited to see COR Black revolutionizing automotive manufacturing! With unmatched impact resistance and durability, our rugged resin ensures every component stands up to the toughest 

these responses are ok...but not obviously better. let's inspect the history

In [None]:
llm.inspect_history(n=1)

it looks like the prompt is getting flooded with the long context, what if we try replacing the context with a combination of:
- a summary of the feedback
- the executive summary

note that both of these summaries are "in the LM's own words". this is important, don't judge the feedback summary directly, judge the results.

In [130]:
ratings_feedback_str = " ".join([str(item) for item in ratings_feedback])
#print(ratings_feedback_str)

feedback_summary = uncompiled("here is feedback on your past work as our brand ambassador. reply with your key takeaways and learnings. how will you make sure not to make the same mistakes again? you should be striving for a 5 star rating.", ratings_feedback_str)
display(feedback_summary.response)

new_context = str("Here are your key takeways and notes on what a good response looks like:\n" + str(feedback_summary.response[0]) + "\n and here is an executive summary:\n" + exec_summary.response[0])

#display(new_context)


['Thank you for the feedback on my past work as your brand ambassador. Here are my key takeaways and learnings:\n\n1. **Acknowledge Feedback**: I understand that the use of too many hashtags and generic terms like "innovation" were not well-received. Additionally, phrases like "exciting news!" and "dive into the future" were considered cheesy.\n\n2. **Key Learnings**: \n   - **Hashtag Usage**: I need to limit the use of hashtags to only the most relevant ones, specifically #MakeItReal.\n   - **Language Tone**: Avoid using overly generic terms and cheesy phrases. Instead, focus on clear, professional, and impactful language that aligns with polySpectra\'s brand voice.\n\n3. **Improvements**:\n   - **Hashtag Management**: I will ensure that only #MakeItReal is used in future tweets unless otherwise specified.\n   - **Refined Language**: I will avoid generic terms and cheesy phrases, opting for more precise and professional language that highlights the unique strengths of polySpectra\'s p

overwrite the context and make a new example set 

In [131]:
# overwrite the context to each item in ratings_feedback
for item in ratings_feedback:
    item['context'] = new_context
    item['rating'] = str(item['rating'])

from dspy import Example
ratings_feedback_examples_new_context = [
    Example(base=item).with_inputs('query','context') for item in ratings_feedback
]
#display(ratings_feedback_examples_new_context)

recompile

In [132]:
recompiled = BootstrapFewShot(
    metric=lambda example, prediction, *args: str(float(example['rating'])/5),
    max_labeled_demos=5,
).compile(
    student=BrandAmbassador(),
    trainset=ratings_feedback_examples_new_context,
)

#maybe its even worth saving...
recompiled.save("demo_recompiled_polySpectra_brand_ambassador.json")


 80%|████████  | 4/5 [00:02<00:00,  1.44it/s]

Bootstrapped 4 full traces after 5 examples in round 0.





In [133]:
test5_again = recompiled.forward(str(test5_query + "_"), context, num_generations=5) #i'm using this extra character to break the dspy cache. change it if the responses are cached.
display(test5_again.response)

['🚗💥 COR Black is revolutionizing automotive manufacturing! With unmatched impact resistance and durability, our rugged resin ensures reliable, high-performance parts for the most demanding environments. #MakeItReal #3DPrinting #AutomotiveInnovation #polySpectra',
 "🚗💥 Drive innovation with polySpectra's COR Black! Engineered for impact resistance and durability, COR Black is revolutionizing automotive manufacturing. Make it real with the most rugged photopolymer resin on the market. #3DPrinting #Automotive #Innovation #MakeItReal #polySpectra",
 '🚗🔧 COR Black is revolutionizing automotive manufacturing! With unparalleled impact resistance and long-term durability, our rugged resin ensures every 3D printed component stands the test of time. #MakeItReal #3DPrinting #AutomotiveInnovation #polySpectra',
 '🚗🔧 Excited to see COR Black revolutionizing automotive manufacturing! With unmatched impact resistance and durability, our rugged resin ensures every component stands up to the toughest 

this looks better. give it a try for your brand!

some key takeaways: 
- yes, it is tedious to give AI feedback
- it is very easy to make things worse with "naive optimization"
- more is not always more
- one of the key themes of mine-tuning (which is enabled by DSPy) is to get the LM to rephrase things "in its own words", while at the same time putting the human in charge of the feedback and the scoring metric. (vs letting the LM evaluate itself)

next steps: 
- more examples
- override the max tokens in bootstrapfewshot to stuff more shots into the prompt
- fancier optimizers
- fancier Predictors (not just CoT)
- a module with multiple steps

questions:
- best practices for incorporating qualitative feedback in DSPy?
  
