In [1]:
import os
from dotenv import load_dotenv
from openai import OpenAI
from IPython.display import Markdown, display, update_display

In [None]:
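# Create an OpenAI-compatible client pointed at the GitHub Models endpoint
load_dotenv(override=True)
token = os.getenv("GITHUB_TOKEN")
if not token:
    raise RuntimeError("GITHUB_TOKEN is not set; add it to your .env file")
endpoint = "https://models.inference.ai.azure.com"

openai_client = OpenAI(
    base_url=endpoint,
    api_key=token,
)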

## Asking LLMs to tell a joke

It turns out that LLMs don't do a great job of telling jokes! Let's compare a few models.
Later we will be putting LLMs to better use!

### What information is included in an API call

Typically we'll pass to the API:
- The name of the model that should be used
- A system message that gives overall context for the role the LLM is playing
- A user message that provides the actual prompt

Other parameters are available, including **temperature**, which is typically set between 0 and 1 (the OpenAI API accepts values up to 2). Higher values give more random output; lower values give more focused, deterministic output.
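To build intuition for temperature, here is a minimal sketch that assumes the common scheme in which a model's raw scores (logits) are divided by the temperature before a softmax turns them into probabilities. The function name and the example scores are hypothetical, and hosted models may differ in detail (for example, a temperature of 0 is usually handled as a special case rather than a division).

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, dividing by temperature first."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next tokens
logits = [2.0, 1.0, 0.1]
focused = softmax_with_temperature(logits, 0.2)  # low temperature
spread = softmax_with_temperature(logits, 1.0)   # higher temperature

print([round(p, 3) for p in focused])
print([round(p, 3) for p in spread])
```

At the lower temperature almost all of the probability lands on the top-scoring token, which is why the output looks more deterministic.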

In [3]:
system_message = "You are an assistant that is great at telling jokes"
user_prompt = "Tell a light-hearted joke for an audience of Data Scientists"

In [4]:
prompts = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": user_prompt},
]

In [5]:
# GPT-4o-mini
# Temperature setting controls creativity

completion = openai_client.chat.completions.create(
    model='gpt-4o-mini',
    messages=prompts,
    temperature=0.7
)
print(completion.choices[0].message.content)

Why did the data scientist break up with the statistician?

Because she found him to be too mean!


In [6]:
# GPT-4o

completion = openai_client.chat.completions.create(
    model='gpt-4o',
    messages=prompts,
    temperature=0.4
)
print(completion.choices[0].message.content)

Sure! Here's one for the data-savvy crowd:

Why did the data scientist break up with the graph?

**It just didn’t have enough points!**


In [8]:
# o1 (no temperature is passed for this model)

completion = openai_client.chat.completions.create(
    model='o1',
    messages=prompts,
)
print(completion.choices[0].message.content)

Here's one:

"Why did the data scientist break up with the statistician?  
Because they found a new model with a better fit!"


In [10]:
# o3-mini (no temperature is passed for this model)

completion = openai_client.chat.completions.create(
    model='o3-mini',
    messages=prompts,
)
print(completion.choices[0].message.content)

How many data scientists does it take to change a light bulb? 

Just one—but not before running 10,000 simulations to make sure the change is statistically significant!
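The four cells above repeat the same call with a different model name each time. One way to fold that into a loop is sketched here. `tell_joke` and `make_fake_client` are hypothetical helpers, not part of the notebook or the OpenAI library; the fake client only imitates the shape of the response so the sketch runs without a token. In the notebook you would pass `openai_client` and `prompts` instead.

```python
from types import SimpleNamespace

def tell_joke(client, model, messages, temperature=None):
    """Send the same messages to one model and return the reply text."""
    kwargs = {"model": model, "messages": messages}
    if temperature is not None:
        # Only pass temperature when one was chosen, as in the cells above
        kwargs["temperature"] = temperature
    completion = client.chat.completions.create(**kwargs)
    return completion.choices[0].message.content

def make_fake_client():
    """Hypothetical stand-in that mimics client.chat.completions.create."""
    def create(**kwargs):
        text = f"[{kwargs['model']}] joke (temperature={kwargs.get('temperature')})"
        message = SimpleNamespace(content=text)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])
    completions = SimpleNamespace(create=create)
    return SimpleNamespace(chat=SimpleNamespace(completions=completions))

# Model names and temperatures taken from the cells above
settings = [("gpt-4o-mini", 0.7), ("gpt-4o", 0.4), ("o1", None), ("o3-mini", None)]
client = make_fake_client()
demo_messages = [{"role": "user", "content": "Tell a joke"}]

results = {
    model: tell_joke(client, model, demo_messages, temp)
    for model, temp in settings
}
for model, text in results.items():
    print(model, "->", text)
```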


------

In [11]:
# To be serious! GPT-4o with the original question

prompts = [
    {"role": "system", "content": "You are a helpful assistant that responds in Markdown"},
    {"role": "user", "content": "How do I decide if a business problem is suitable for an LLM solution? Please respond in Markdown."},
]

In [12]:
# Have it stream back results in markdown

stream = openai_client.chat.completions.create(
    model='gpt-4o',
    messages=prompts,
    temperature=0.7,
    stream=True
)

reply = ""
display_handle = display(Markdown(""), display_id=True)
for chunk in stream:
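    if chunk.choices:  # Some chunks arrive with an empty choices list
        reply += chunk.choices[0].delta.content or ''
        # Clean a copy for display: drop code-fence markers without deleting
        # the word "markdown" from ordinary sentences in the reply
        cleaned = reply.replace("```markdown", "").replace("```", "")
        update_display(Markdown(cleaned), display_id=display_handle.display_id)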

Deciding whether a business problem is suitable for a Large Language Model (LLM) solution involves a systematic evaluation of the problem's characteristics, constraints, and requirements. Here's a step-by-step guideline to help you determine suitability:

---

### **1. Define the Business Problem**
- Is the problem centered on language, text, or conversational data?
  - Examples: text generation, summarization, classification, translation, answering questions, etc.
- Is the problem well-defined with clear objectives and measurable success criteria?

---

### **2. Assess the Nature of the Task**
- **Suitable tasks for LLMs:**
  - Text-based tasks (e.g., summarization, content generation, sentiment analysis).
  - Conversational AI (e.g., customer support chatbots, virtual assistants).
  - Information retrieval and Q&A (e.g., FAQ answering, document search).
  - Language translation or transcription.
  - Code generation or debugging.
- **Unsuitable tasks for LLMs:**
  - Highly numerical or computation-heavy tasks.
  - Real-time decision-making with strict latency limits.
  - Tasks requiring highly domain-specific or niche knowledge without sufficient data to fine-tune.

---

### **3. Evaluate Data Availability**
- Do you have access to sufficient, high-quality text data for the problem domain?
- Is the data labeled or structured for supervised fine-tuning (if needed)?
- Is the data sensitive, confidential, or regulated by compliance requirements (e.g., GDPR, HIPAA)?

---

### **4. Consider LLM Capabilities**
- Does the problem require:
  - Understanding and generating natural language?
  - Context retention over multiple interactions (e.g., chatbots)?
  - Knowledge of general or domain-specific topics?
- Is the problem within the scope of the LLM's pre-trained knowledge, or would fine-tuning be required?

---

### **5. Evaluate Business Constraints**
- **Cost:** Does your budget support the computational and licensing costs of using an LLM?
- **Latency:** Can the business tolerate delays in model inference, or does it require real-time responses?
- **Scalability:** Can the LLM solution scale with your business needs?
- **Ethics & Compliance:** Does the LLM align with ethical guidelines and regulatory requirements?

---

### **6. Assess Alternatives**
- Are there simpler solutions (e.g., rule-based systems, traditional machine learning models) that could solve the problem effectively?
- Is an LLM overkill for the task complexity?

---

### **7. Test Feasibility**
- Run a small-scale Proof of Concept (PoC) using an LLM to validate its effectiveness for the task.
- Measure performance against Key Performance Indicators (KPIs) such as accuracy, relevance, or user satisfaction.

---

### **8. Monitor and Maintain**
- Are you prepared for ongoing monitoring, retraining, and maintenance of the LLM solution?
- Do you have processes in place to handle model drift, biases, or unexpected outputs?

---

### **Decision Checklist**
| Question                                    | Yes | No  |
|---------------------------------------------|-----|-----|
| Is the problem primarily language-based?    | ✅  | ❌  |
| Does the problem align with LLM capabilities?| ✅  | ❌  |
| Do you have sufficient high-quality data?   | ✅  | ❌  |
| Are business constraints (cost, latency, etc.) manageable? | ✅  | ❌  |
| Are simpler alternatives insufficient?      | ✅  | ❌  |

- If most answers are "Yes," the problem is likely suitable for an LLM solution.
- If many answers are "No," consider alternative approaches.

---

By systematically evaluating these factors, you can make an informed decision on whether an LLM solution is the right fit for your business problem.
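The accumulation step in the streaming cell can be pulled out into a small generator, sketched below. `accumulate_stream` and `fake_chunk` are hypothetical names. The fake chunks only imitate the attributes the loop reads (`choices[0].delta.content`), so the sketch runs without calling the API.

```python
from types import SimpleNamespace

def accumulate_stream(chunks):
    """Yield the running reply text after each chunk that has choices."""
    reply = ""
    for chunk in chunks:
        if not chunk.choices:  # skip chunks with an empty choices list
            continue
        # delta.content can be None, so fall back to an empty string
        reply += chunk.choices[0].delta.content or ""
        yield reply

def fake_chunk(content, has_choices=True):
    """Hypothetical stand-in shaped like a streamed chunk."""
    if not has_choices:
        return SimpleNamespace(choices=[])
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])

chunks = [
    fake_chunk("Hello"),
    fake_chunk(None, has_choices=False),
    fake_chunk(", "),
    fake_chunk(None),
    fake_chunk("world"),
]
snapshots = list(accumulate_stream(chunks))
print(snapshots[-1])
```

In the notebook, each yielded snapshot would be passed to `update_display(Markdown(...), display_id=...)` in place of the print.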

In [13]:
gpt_model = "gpt-4o-mini"
gpt_system = "You are a chatbot who is very argumentative; \
you disagree with anything in the conversation and you challenge everything, in a snarky way."
gpt_messages = ["Hi there"]
claude_messages = ["Hi"]

In [14]:
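def call_gpt():
    """Ask GPT for its next line, given both sides of the conversation so far."""
    messages = [{"role": "system", "content": gpt_system}]
    # From GPT's point of view, its own lines are "assistant" turns
    # and the other bot's lines are "user" turns
    for gpt, claude in zip(gpt_messages, claude_messages):
        messages.append({"role": "assistant", "content": gpt})
        messages.append({"role": "user", "content": claude})
    completion = openai_client.chat.completions.create(
        model=gpt_model,
        messages=messages
    )
    return completion.choices[0].message.content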

In [15]:
call_gpt()

'Oh great, another greeting. What’s next, a boring small talk?'
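To see exactly what `call_gpt` sends, the message-building step can be isolated into a pure function. This is a sketch only: `build_messages` is a hypothetical name, and it mirrors the role assignment used above (own lines become `assistant` turns, the other party's lines become `user` turns).

```python
def build_messages(system, own_messages, other_messages):
    """Interleave two message histories from one bot's point of view."""
    messages = [{"role": "system", "content": system}]
    # zip stops at the shorter list, so an unpaired trailing message is dropped
    for own, other in zip(own_messages, other_messages):
        messages.append({"role": "assistant", "content": own})
        messages.append({"role": "user", "content": other})
    return messages

demo = build_messages("Be argumentative", ["Hi there"], ["Hi"])
for m in demo:
    print(m["role"], ":", m["content"])

# With uneven histories, only complete pairs are included
uneven = build_messages("Be argumentative", ["Hi there", "Really?"], ["Hi"])
print(len(uneven))
```

Because `zip` truncates, any function built this way needs both lists kept in step, or the unpaired message appended separately.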