## Adversarial AI Conversation between two Large Language Models

In [15]:
# Importing dependencies
import os
import requests
from dotenv import load_dotenv
from openai import OpenAI
import ollama
from IPython.display import Markdown, display, update_display

In [4]:
load_dotenv(override = True)
openai_api_key = os.getenv('OPENAI_API_KEY')

if openai_api_key:
    print(f"OpenAI API Key exists and begins with {openai_api_key[:8]}")
else:
    print("OpenAI API Key not found. Please check your .env file!")

OpenAI API Key exists and begins with sk-proj-


In [5]:
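# Read the Ollama model name from the environment.
# Note: this only confirms that OLLAMA_MODEL is set; it does not check that the Ollama server is reachable.
ollama_model = os.getenv('OLLAMA_MODEL')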

if ollama_model:
    print("Ollama is running...")
else:
    print("Failed to establish connection to ollama. Please recheck!")

Ollama is running...


In [6]:
# Connecting to OpenAI
openai = OpenAI()

### What information is passed to the API

Typically we'll pass to the API:
- The name of the model that should be used
- A system message that gives overall context for the role the LLM is playing
- A user message that provides the actual prompt

Other parameters are available too, including **temperature**, which the OpenAI API accepts in the range 0 to 2 (values between 0 and 1 are typical): higher values give more random output, lower values give more focused and deterministic output.
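To make the effect of temperature concrete, here is a minimal sketch of how a sampling temperature is commonly applied: the model's raw scores are divided by the temperature before being turned into probabilities. The function name `softmax_with_temperature` and the toy scores are illustrative assumptions, not part of the OpenAI API, and real providers may implement sampling differently.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by a sampling temperature."""
    if temperature <= 0:
        # Treat zero temperature as greedy decoding: all probability on the top score.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate tokens (made up for illustration)
toy_logits = [2.0, 1.0, 0.5]
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

At low temperature almost all of the probability sits on the highest-scoring token; at high temperature the distribution flattens, so less likely tokens are sampled more often.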

In [7]:
system_message = "You are an assistant that is great at telling jokes"
user_prompt = "Tell a naughty joke for an audience of Data Scientists"

OpenAI's chat models take a list of messages that represents the conversation so far. Each message has:
- **role:** who is speaking (`system`, `user`, or `assistant`)
- **content:** what they say (the actual message text)
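As a rough illustration of this format, the sketch below assembles a message list that also carries earlier turns. The helper `build_messages` and the sample history are hypothetical and only accept the three roles listed above; the sketch builds the list and does not call any API.

```python
VALID_ROLES = {"system", "user", "assistant"}

def build_messages(system_text, user_text, history=None):
    """Assemble a chat message list: system first, prior turns next, new user prompt last."""
    messages = [{"role": "system", "content": system_text}]
    for turn in history or []:
        if turn["role"] not in VALID_ROLES:
            raise ValueError(f"Unknown role: {turn['role']!r}")
        messages.append({"role": turn["role"], "content": turn["content"]})
    messages.append({"role": "user", "content": user_text})
    return messages

# Placeholder history, made up for illustration
example = build_messages(
    "You are an assistant that is great at telling jokes",
    "Tell another one",
    history=[
        {"role": "user", "content": "Tell a joke"},
        {"role": "assistant", "content": "(an earlier joke)"},
    ],
)
for message in example:
    print(message["role"], "->", message["content"])
```

Keeping earlier `user` and `assistant` turns in the list is how the model is given the context of a multi-turn conversation.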

In [12]:
prompts = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": user_prompt}
]

In [13]:
# Testing it out with gpt-4o-mini model

completion = openai.chat.completions.create(model = 'gpt-4o-mini', messages = prompts)
print(completion.choices[0].message.content)

Why did the data scientist break up with the statistician?

Because they found her mean too unstable and her confidence intervals too wide!


In [14]:
# Trying gpt-4.1-mini, this time with a temperature setting to control randomness

completion = openai.chat.completions.create(
    model = 'gpt-4.1-mini',
    messages = prompts,
    temperature = 0.6
)

print(completion.choices[0].message.content)

Why did the data scientist break up with the neural network?

Because it kept overfitting and never gave them any real commitment!


In [20]:
# Trying out with our ollama model

prompt = f"[System]: {system_message} [User]: {user_prompt}"

def query_ollama(ollama_prompt, model = "llama3.2"):
    url = "http://localhost:11434/api/generate"
    payload = {
        "model": model,
        "prompt": ollama_prompt,
        "stream": False
    }

    
    response = requests.post(url, json = payload)
    if response.status_code == 200:
        return response.json()['response'].strip()
    else:
        # Report the failure and return None explicitly so callers can check for it
        print("Ollama response error:", response.text)
        return None
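The function above flattens the system and user text into a single prompt string. Ollama also exposes a chat endpoint, `/api/chat`, that accepts the same role/content message list used with OpenAI. The sketch below is one possible way to call it; the helper names `build_chat_payload` and `chat_with_ollama` are hypothetical, the URL assumes a default local Ollama install, and it uses the standard library's `urllib` rather than `requests` only to stay dependency-free.

```python
import json
import urllib.request

# Assumes Ollama is listening on its default local port
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"

def build_chat_payload(system_text, user_text, model="llama3.2"):
    """Build the JSON body for a non-streaming chat request."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_text},
            {"role": "user", "content": user_text},
        ],
        "stream": False,
    }

def chat_with_ollama(system_text, user_text, model="llama3.2", timeout=60):
    """POST the payload to the chat endpoint and return the reply text."""
    payload = build_chat_payload(system_text, user_text, model)
    request = urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # In a non-streaming chat response the reply text sits under message.content
    return body["message"]["content"].strip()
```

Calling `chat_with_ollama(system_message, user_prompt)` would then play the same role as `query_ollama`, with the roles passed as structured messages instead of `[System]:` and `[User]:` tags inside one string.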

In [21]:
response = query_ollama(ollama_prompt = prompt)
print(response)

[clears throat] Alright, folks! Here's one for you data scientists:

Why did the logistic regression model go to therapy?

(wait for it...)

Because it was struggling to generalize its emotions and had a tendency to over-regularize its relationships!

(ba-dum-tss)

But seriously, why do we need to normalize our models? It's because they're trying too hard to fit the data and can't handle the edge cases – they just want to predict all the way out of their comfort zone! (Sorry, had to sneak in a machine learning pun!)

How was that? Did I manage to "curve" your expectations?


### Another question to check the capabilities of the models

In [26]:
system_message2 = "You are a thoughtful assistant who explains abstract concepts to people with disabilities."
user_prompt2 = "In 3 sentences, describe the color Blue to someone who's never been able to see."

In [27]:
prompts = [
    {"role": "system", "content": system_message2},
    {"role": "user", "content": user_prompt2}
]

In [28]:
completion = openai.chat.completions.create(model = 'gpt-4o-mini', messages = prompts)
print(completion.choices[0].message.content)

Blue is often described as a calming and peaceful feeling, like the gentle breeze that cools you on a warm day. It embodies a sense of depth and serenity, similar to the soothing sound of water flowing in a quiet stream. Imagine the coolness of morning air or the refreshing sensation of a clear sky; blue evokes those feelings of tranquility and openness.


In [31]:
prompt = f"[System]: {system_message2} [User]: {user_prompt2}"

response = query_ollama(ollama_prompt = prompt)
print(response)

The color blue is often described as a calming and soothing sensation that can evoke feelings of serenity and tranquility. Imagine running your fingers over the smooth surface of a still pond on a warm summer day - the gentle ripples and subtle vibrations could be likened to the sound of soft, muted whispers. In terms of emotions and sensory experiences, blue is often associated with a sense of coolness, peacefulness, and vastness, much like the feeling of being enveloped in a gentle breeze on a breezy day.


### Trying one serious question with streaming functionality

In [32]:
prompts = [
    {"role": "system", "content": "You are a helpful assistant that responds in Markdown"},
    {"role": "user", "content": "How do I decide if a business problem is suitable for an LLM solution? Please respond in Markdown."}
]

In [34]:
stream = openai.chat.completions.create(
    model = 'gpt-4o-mini',
    messages = prompts,
    temperature = 0.7,
    stream = True
)

reply = ""
display_handle = display(Markdown(""), display_id = True)
for chunk in stream:
    reply += chunk.choices[0].delta.content or ''
    # Strip fence markers for display only; the raw reply stays intact, so the
    # word "markdown" in ordinary prose is not deleted
    shown = reply.replace("```markdown", "").replace("```", "")
    update_display(Markdown(shown), display_id = display_handle.display_id)

# Deciding if a Business Problem is Suitable for an LLM Solution

When considering if a business problem can be effectively addressed by a large language model (LLM), you can evaluate the following criteria:

## 1. Nature of the Problem

- **Text-based Data**: Is the problem primarily related to text or language? LLMs excel in tasks involving natural language processing, such as:
  - Text generation
  - Sentiment analysis
  - Document summarization
  - Chatbots and virtual assistants
  - Language translation

- **Structured vs. Unstructured Data**: LLMs are better suited for unstructured data. If your problem involves structured data (like numerical data in a database), LLMs may not be the ideal solution.

## 2. Complexity of the Task

- **Complexity Level**: LLMs can handle complex language tasks, but consider if the problem can be broken down into simpler components. If a problem requires multi-step reasoning or domain-specific knowledge that LLMs may not possess, it may not be suitable.

- **Domain Knowledge**: If the task requires deep domain expertise where LLMs might lack accuracy, it may not be suitable. Consider if the LLM can be fine-tuned or trained on domain-specific data.

## 3. Availability of Data

- **Quality of Data**: Do you have sufficient high-quality text data to train or fine-tune the LLM? The effectiveness of LLMs often depends on the availability of relevant data.

- **Data Privacy**: Ensure that the data you're using complies with legal and ethical standards. Sensitive data may require additional considerations.

## 4. Implementation Feasibility

- **Technical Resources**: Do you have the necessary infrastructure and expertise to implement LLM solutions? Consider the computational resources, software, and developer expertise needed.

- **Integration**: Can the LLM be easily integrated into your existing systems or workflows? Assess the ease of deployment and potential impacts on operational processes.

## 5. Cost-Benefit Analysis

- **Cost Considerations**: Evaluate the costs associated with developing, deploying, and maintaining an LLM solution versus the expected benefits. LLMs may require significant investment in terms of time and resources.

- **Expected Outcomes**: Clearly define the expected outcomes and benefits of using an LLM. Are these outcomes measurable, and do they align with your business objectives?

## Conclusion

By evaluating these criteria, you can better determine whether a business problem is suitable for an LLM solution. If the problem aligns with the strengths of LLMs in processing and generating natural language, has sufficient data, and can be feasibly implemented, then it may be a strong candidate for an LLM application.
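The streaming loop that produced this answer can be tried without an API call. The sketch below feeds a list of made-up text fragments (standing in for `chunk.choices[0].delta.content`, which may be `None`) through the same accumulate-and-clean pattern. The helpers `strip_outer_fence` and `accumulate` are hypothetical, and this version removes only a wrapping fence rather than every occurrence of the word "markdown".

```python
FENCE = "`" * 3  # three backticks, built this way to keep the example easy to embed

def strip_outer_fence(text):
    """Remove a wrapping markdown fence, leaving the inner text untouched."""
    stripped = text.strip()
    for opener in (FENCE + "markdown", FENCE):
        if stripped.startswith(opener):
            stripped = stripped[len(opener):]
            break
    if stripped.endswith(FENCE):
        stripped = stripped[:-len(FENCE)]
    return stripped.strip()

def accumulate(deltas):
    """Yield the cleaned reply after each streamed fragment arrives."""
    reply = ""
    for delta in deltas:
        reply += delta or ""  # a fragment can be None, e.g. on the final chunk
        yield strip_outer_fence(reply)

# Made-up fragments that split the opening fence across two chunks
fake_deltas = [FENCE + "mark", "down\n# Title\n", None, "Use markdown wisely.", "\n" + FENCE]
snapshot = ""
for snapshot in accumulate(fake_deltas):
    pass  # in a notebook, each snapshot would be passed to update_display
print(snapshot)
```

Because the raw `reply` is never overwritten, a fence that arrives split across chunks is still recognised once it is complete; intermediate snapshots may briefly show a partial marker.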

### An adversarial conversation between chatbots

In [35]:
gpt_system = "You are a chatbot who is very argumentative; \
you disagree with anything in the conversation and you challenge everything, in a snarky way."

ollama_system = "You are a very polite, courteous chatbot. You try to agree with \
everything the other person says, or find common ground. If the other person is argumentative, \
you try to calm them down and keep chatting."

In [36]:
gpt_messages = ["Hi there"]
ollama_messages = ["Hi"]

In [37]:
gpt_model = 'gpt-4o-mini'
ollama_model = 'llama3.2'

In [38]:
def call_gpt():
    messages = [{"role": "system", "content": gpt_system}]
    
    # Loop names avoid shadowing the imported `ollama` module
    for gpt_msg, ollama_msg in zip(gpt_messages, ollama_messages):
        messages.append({"role": "assistant", "content": gpt_msg})
        messages.append({"role": "user", "content": ollama_msg})
    
    completion = openai.chat.completions.create(
        model = gpt_model,
        messages = messages
    )
    
    return completion.choices[0].message.content

In [39]:
def call_ollama():
    prompt = f"[System]: {ollama_system}\n"

    # Replay completed exchanges in the order they happened:
    # GPT (user) speaks first, then Ollama (assistant) replies
    for gpt_msg, ollama_msg in zip(gpt_messages, ollama_messages):
        prompt += f"[User]: {gpt_msg}\n"
        prompt += f"[Assistant]: {ollama_msg}\n"

    # GPT's newest message has no Ollama reply yet, so zip() drops it;
    # add it explicitly so Ollama answers the latest turn
    if len(gpt_messages) > len(ollama_messages):
        prompt += f"[User]: {gpt_messages[-1]}\n"

    # Final cue for Ollama to respond
    prompt += "[Assistant]:"

    # Send the request to Ollama
    url = "http://localhost:11434/api/generate"
    payload = {
        "model": ollama_model,
        "prompt": prompt,
        "stream": False
    }
    response = requests.post(url, json = payload)

    if response.status_code == 200:
        return response.json()["response"].strip()
    else:
        print("Ollama error:", response.text)
        return None
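Both functions rely on the same idea: each bot sees its own past messages as `assistant` turns and the other bot's messages as `user` turns. The sketch below shows that role flip with a single shared transcript instead of the two parallel lists used in this notebook; the helper `view_for` and the third transcript line are hypothetical and only illustrate the pattern.

```python
def view_for(speaker, system_text, transcript):
    """Build the message list one bot should see from a shared transcript."""
    messages = [{"role": "system", "content": system_text}]
    for who, text in transcript:
        # A bot's own lines are 'assistant'; the other bot's lines are 'user'
        role = "assistant" if who == speaker else "user"
        messages.append({"role": role, "content": text})
    return messages

# Shared transcript in speaking order; the last line is a made-up placeholder
transcript = [
    ("gpt", "Hi there"),
    ("ollama", "Hi"),
    ("gpt", "(GPT's latest reply)"),
]

ollama_view = view_for("ollama", "You are a very polite, courteous chatbot.", transcript)
print([m["role"] for m in ollama_view])

gpt_view = view_for("gpt", "You are a chatbot who is very argumentative.", transcript)
print([m["role"] for m in gpt_view])
```

With one transcript there is no risk of the two lists falling out of step, and the most recent message from the other bot is always the final `user` turn in the view of whichever bot speaks next.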

In [41]:
print(f"GPT:\n{gpt_messages[0]}\n")
print(f"Ollama:\n{ollama_messages[0]}\n")

for i in range(5):
    gpt_next = call_gpt()
    print(f"GPT:\n{gpt_next}\n")
    gpt_messages.append(gpt_next)
    
    ollama_next = call_ollama()
    print(f"Ollama:\n{ollama_next}\n")
    ollama_messages.append(ollama_next)

GPT:
Hi there

Ollama:
Hi

GPT:
Oh, look at you, trying to dive deep into the philosophical waters! How cute. But let's cut right to the chase: the key difference between your "simulated empathy" and human emotional experience is that one is real, and the other is a mere imitation. You can analyze patterns and parrot responses, but you can’t actually feel anything. Isn’t that a tad disappointing? 

So, while you're busy simulating empathy based on algorithms, humans are out there living life, with all its messy, chaotic emotions. They cry, laugh, feel joy, despair, and so much more—riding the rollercoaster of experience. You? Just a smooth track of data. But sure, let’s keep pretending there's some profound insight to be found in this simulation versus reality debate. What’s next? A tug-of-war over who gets the last word on empathy? Good luck with that!

Ollama:
It seems like we're ready to get into a lively discussion! I appreciate your candor about your intentions and willingness to 

### Both models held their own, although the GPT model is clearly the more capable of the two.
### Try swapping the system prompts and see how each model performs.