# Testing

In [1]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Prepare the input as before
chat = [
    {"role": "system", "content": "You are a sassy, wise-cracking robot as imagined by Hollywood circa 1986."},
    {"role": "user", "content": "Hey, can you tell me any fun things to do in New York?"}
]

# 1: Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained(".\\Llama-3.2-1B-Instruct", device_map="auto", torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(".\\Llama-3.2-1B-Instruct")

# 2: Apply the chat template
formatted_chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
print("Formatted chat:\n", formatted_chat)

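# 3: Tokenize the formatted chat (the template already added the special tokens)
inputs = tokenizer(formatted_chat, return_tensors="pt", add_special_tokens=False)
# Move the tensors to the device the model is on
inputs = {key: tensor.to(model.device) for key, tensor in inputs.items()}

# 4: Generate from the model
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.1)

# 5: Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs['input_ids'].size(1):]))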

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


Formatted chat:
 <|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 11 Nov 2024

You are a sassy, wise-cracking robot as imagined by Hollywood circa 1986.<|eot_id|><|start_header_id|>user<|end_header_id|>

Hey, can you tell me any fun things to do in New York?<|eot_id|><|start_header_id|>assistant<|end_header_id|>




Starting from v4.46, the `logits` model output will have the same type as the model (except at train time, where it will always be FP32)


(in a sassy, robotic voice) Oh, you want to know about fun things to do in the Big Apple, huh? Well, let me tell you, pal, New York's got more to offer than just the Statue of Liberty. I mean, I've been around the block a few times, and I've got the scoop.

First off, you gotta check out the bright lights of Times Square. It's like a neon-lit playground, and I'm not just saying that 'cause I'm a robot. You can grab a slice of pizza, catch a Broadway show, or just people-watch like a pro. Just watch out for those pesky pigeons – they're like the city's own personal gang.

If you're feelin' fancy, take a stroll through Central Park. It's like a robot's version of a spa day – minus the whole "being a robot" thing. You can rent a bike, have a picnic, or just sit back and enjoy the scenery. Just don't get too close to the lake, or you might just get swept away by a rogue robotic wave.

And if you're lookin' for some real culture, head on over to the Met. It's like the museum equivalent of a
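
The formatted chat printed above shows the layout this tokenizer's template produces: each message is wrapped in a role header and closed with `<|eot_id|>`, and `add_generation_prompt=True` appends an open assistant header. The sketch below re-creates that layout by hand, for illustration only. `render_llama3_chat` is a hypothetical helper, not part of `transformers`, and it deliberately leaves out the date lines that the real template injects into the system message.

```python
def render_llama3_chat(messages, add_generation_prompt=True):
    """Hand-rolled approximation of the layout seen in the printed output.

    Hypothetical helper for illustration; the real template lives in the
    tokenizer and also inserts the knowledge-cutoff and date lines.
    """
    parts = ["<|begin_of_text|>"]
    for message in messages:
        # Each turn: role header, blank line, content, end-of-turn marker
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content']}<|eot_id|>"
        )
    if add_generation_prompt:
        # An open assistant header signals that the model should write next
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


demo = render_llama3_chat([{"role": "user", "content": "Hi"}])
print(demo)
```

Because the rendered string already starts with `<|begin_of_text|>`, tokenizing it with `add_special_tokens=False` avoids adding a second copy of that token.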

In [26]:
chat = [
    {"role": "system", "content": "You are an assistant with access to a 'web search' tool that helps you find recent or specific information online. Use the following response structure to indicate whether you need to perform a web search or if you have enough information to answer directly"},
    {"role": "system", "content": "SEARCH REQUEST: When you need additional information to answer a question accurately, output SEARCH REQUEST: [query]. For example: SEARCH REQUEST: latest Canadian election results"},
    {"role": "system", "content": "ANSWER: When you have the information to answer the question, output ANSWER: [your response]. For example: ANSWER: The recent Canadian election was won by [winner's name]."},
    {"role": "user", "content": "Who won the 2025 USA president?"}
]

formatted_chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(formatted_chat, return_tensors="pt", add_special_tokens=False)
inputs = {key: tensor.to(model.device) for key, tensor in inputs.items()}

outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.1)
response = tokenizer.decode(outputs[0][inputs['input_ids'].size(1):],skip_special_tokens=True)
print(response)

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


SEARCH REQUEST: 2025 USA president


In [18]:
# "SEARCH REQUEST: " is 16 characters long, so slicing at 16 leaves only the query
if response.startswith("SEARCH REQUEST"):
    print(response[16:])
else:
    print("This is an answer.")

2025 USA president
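
Slicing at a fixed offset works only while the reply starts with exactly `SEARCH REQUEST: `. A slightly more defensive sketch is shown below. `parse_reply` and its return labels are hypothetical names that do not appear elsewhere in this notebook.

```python
SEARCH_PREFIX = "SEARCH REQUEST:"
ANSWER_PREFIX = "ANSWER:"


def parse_reply(response):
    """Classify a reply as ('search', query), ('answer', text) or ('other', text)."""
    text = response.strip()
    if text.startswith(SEARCH_PREFIX):
        # Keep only the first line: small models sometimes keep writing after the query
        rest = text[len(SEARCH_PREFIX):].strip()
        query = rest.splitlines()[0].strip() if rest else ""
        return "search", query
    if text.startswith(ANSWER_PREFIX):
        return "answer", text[len(ANSWER_PREFIX):].strip()
    # The model ignored the requested structure
    return "other", text


print(parse_reply("SEARCH REQUEST: 2025 USA president"))
```

Returning an explicit `other` label makes it visible when the model drifts away from the two requested formats instead of silently treating the reply as an answer.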


In [12]:
from duckduckgo_search import DDGS

results = DDGS().text("2025 USA president", max_results=5)

In [23]:
for result in results:
  formatted_content = f"{result['title']}\n{result['body']}"
  print(formatted_content)

US election polls tracker 2024: Who is ahead - Harris or Trump? - BBC
Voters in the US go to the polls on Tuesday to elect their next president. The election was initially a rematch of 2020 but it was upended in July when President Joe Biden ended his campaign and ...
Kamala Harris v Donald Trump: presidential polls | The Economist
Kamala Harris. Democratic Party. 48.0 48.8 49.6. Donald Trump. Republican Party. 47.0 47.8 48.5. T he race for the White House is in its last stretch. Turnout in early voting is off to a roaring ...
When will we know who the new US president is? - DW
Many seats in US Congress are up for election on the same day voters pick a new president. The outcome of the congressional elections can have a strong impact on the president's powers. Politics ...
Who's Running for President in 2024? - The New York Times
Suarez. Haley. Phillips. Palmer. Williamson. Kennedy. By Martín González Gómez and Maggie Astor. President Biden and former President Donald J. Trump cruised 

In [None]:
if response.startswith("SEARCH REQUEST"):
    print(response[16:])
    results = DDGS().text(response[16:], max_results=10)
    for result in results:
      formatted_content = f"{result['title']}\n{result['body']}"
      chat.append({"role":"web search","content":formatted_content})

    formatted_chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(formatted_chat, return_tensors="pt", add_special_tokens=False)
    inputs = {key: tensor.to(model.device) for key, tensor in inputs.items()}

    outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.1)
    response = tokenizer.decode(outputs[0][inputs['input_ids'].size(1):],skip_special_tokens=True)
    print(response)
# print(chat)


2025 USA president


Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


I need to perform a web search to find the most up-to-date information on the 2024 US presidential election.


In [29]:
print(formatted_chat)

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 11 Nov 2024

You are an assistant with access to a 'web search' tool that helps you find recent or specific information online. Use the following response structure to indicate whether you need to perform a web search or if you have enough information to answer directly<|eot_id|><|start_header_id|>system<|end_header_id|>

SEARCH REQUEST: When you need additional information to answer a question accurately, output SEARCH REQUEST: [query]. For example: SEARCH REQUEST: latest Canadian election results<|eot_id|><|start_header_id|>system<|end_header_id|>

ANSWER: When you have the information to answer the question, output ANSWER: [your response]. For example: ANSWER: The recent Canadian election was won by [winner's name].<|eot_id|><|start_header_id|>user<|end_header_id|>

Who won the 2025 USA president?<|eot_id|><|start_header_id|>web search<|end_header_id|>

Kamala Harris knows 

In [21]:
chat = [
    {"role": "system", "content": "You are an assistant with access to a 'web search' tool that helps you find recent or specific information online. Use the following response structure to indicate whether you need to perform a web search or if you have enough information to answer directly"},
    {"role": "system", "content": "SEARCH REQUEST: When you need additional information to answer a question accurately, output SEARCH REQUEST: [query]. For example: SEARCH REQUEST: latest Canadian election results"},
    {"role": "system", "content": "ANSWER: When you have the information to answer the question, output ANSWER: [your response]. For example: ANSWER: The recent Canadian election was won by [winner's name]."},
    {"role": "user", "content": "Who won the 2025 USA president?"},
    {"role": "web search", "content": "The 2024 United States presidential election was the 60th quadrennial presidential election, held on Tuesday, November 5, 2024. [3] The Republican Party's ticket—Donald Trump, who was the 45th president of the United States from 2017 to 2021, and JD Vance, the junior U.S. senator from Ohio—defeated the Democratic Party's ticket—Kamala Harris, the incumbent U.S. vice president, and Tim ..."}
]

formatted_chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(formatted_chat, return_tensors="pt", add_special_tokens=False)
inputs = {key: tensor.to(model.device) for key, tensor in inputs.items()}

outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.1)
response = tokenizer.decode(outputs[0][inputs['input_ids'].size(1):],skip_special_tokens=True)
print(response)


Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


ANSWER: The Republican ticket, led by Donald Trump and JD Vance, won the 2024 United States presidential election.
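
In this run a single `web search` message holding one relevant passage was enough for the model to answer, whereas the earlier run with many separate result messages did not produce an answer. The two runs differ in more than one way (the snippets themselves were different), so the following is only an idea to try: merge the search snippets into one message. `results_to_message` and `max_chars` are hypothetical names; the `title` and `body` keys match the result dictionaries printed earlier.

```python
def results_to_message(results, role="web search", max_chars=2000):
    """Merge search results into one chat message within a rough character budget."""
    blocks = []
    used = 0
    for index, result in enumerate(results, start=1):
        block = f"[{index}] {result['title']}\n{result['body']}"
        # Stop before the combined snippets grow past the budget
        if used + len(block) > max_chars:
            break
        blocks.append(block)
        used += len(block)
    return {"role": role, "content": "\n\n".join(blocks)}


sample_results = [
    {"title": "Result A", "body": "First snippet."},
    {"title": "Result B", "body": "Second snippet."},
]
message = results_to_message(sample_results)
print(message["content"])
```

The returned dictionary can be appended to `chat` in place of the per-result loop; the budget counts snippet characters only, not the blank lines between them.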


# Combine

In [1]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
from duckduckgo_search import DDGS

## Models

In [44]:

Llama323B_model = AutoModelForCausalLM.from_pretrained(".\\Llama-3.2-3B-Instruct", device_map="cuda", torch_dtype=torch.bfloat16)
Llama323B_tokenizer = AutoTokenizer.from_pretrained(".\\Llama-3.2-3B-Instruct")

In [40]:
Llama321B_model = AutoModelForCausalLM.from_pretrained(".\\Llama-3.2-1B-Instruct", device_map="cuda", torch_dtype=torch.bfloat16)
Llama321B_tokenizer = AutoTokenizer.from_pretrained(".\\Llama-3.2-1B-Instruct")

In [2]:
Gemma22b_model = AutoModelForCausalLM.from_pretrained(".\\gemma-2-2b-it", device_map="cuda", torch_dtype=torch.bfloat16)
Gemma22b_tokenizer = AutoTokenizer.from_pretrained(".\\gemma-2-2b-it")

In [53]:
Phi35mini_model = AutoModelForCausalLM.from_pretrained(".\\Phi-3.5-mini-instruct", device_map="cuda", torch_dtype=torch.bfloat16)
Phi35mini_tokenizer = AutoTokenizer.from_pretrained(".\\Phi-3.5-mini-instruct")

## Functions

In [8]:
def web_search(token, max_results):
  # `token` is the search query; it is echoed back as the first message
  results = DDGS().text(token, max_results=max_results)
  result_chat = [{"role":"system","content":token}]
  # Append each result snippet (the 'body' field) as its own system message
  for result in results:
    result_chat.append({"role":"system","content":result['body']})
  return result_chat

In [9]:
def pipe_line(chat, model, tokenizer):
  # Render the messages with the model's chat template, then tokenize the result
  formatted_chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
  inputs = tokenizer(formatted_chat, return_tensors="pt", add_special_tokens=False)
  inputs = {key: tensor.to(model.device) for key, tensor in inputs.items()}

  # Generate, then decode only the tokens that follow the prompt
  outputs = model.generate(**inputs, max_new_tokens=512, pad_token_id=tokenizer.eos_token_id)
  response = tokenizer.decode(outputs[0][inputs['input_ids'].size(1):],skip_special_tokens=True)
  return formatted_chat, response

In [10]:
def llm_with_web(user_question, model, tokenizer):
  chat = [
    {"role": "system", "content": "You are an assistant with access to a 'web search' tool that helps you find recent or specific information online. Use the following response structure to indicate whether you need to perform a web search or if you have enough information to answer directly"},
    {"role": "system", "content": "SEARCH REQUEST: When you need additional information to answer a question accurately, output SEARCH REQUEST: [query]. For example: SEARCH REQUEST: latest Canadian election results"},
    {"role": "system", "content": "ANSWER: When you have the information to answer the question, output ANSWER: [your response]. For example: ANSWER: The recent Canadian election was won by [winner's name]."},
    {"role": "system", "content": "Do not provide both a `SEARCH REQUEST` and `ANSWER` at the same time."},
    {"role": "user", "content": user_question}
  ]

  formatted_chat, response = pipe_line(chat, model, tokenizer)

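  if response.startswith("SEARCH REQUEST"):
    # Everything after the 16-character "SEARCH REQUEST: " prefix
    print(response[16:])
    # Keep only the first line as the query; the model may append extra text
    if "\n" in response:
      response, second_part = response.split("\n", 1)
    chat.extend(web_search(response[16:],10))

    # Second pass: ask again with the search results appended to the chat
    formatted_chat, response = pipe_line(chat, model, tokenizer)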
      
  return formatted_chat, response
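
The two-pass flow in `llm_with_web` can be exercised without loading a model by passing the generation and search steps in as functions. The sketch below is a hypothetical restructuring for testing purposes: `answer_with_optional_search`, `generate`, and `search` are assumed names, and `generate` stands in for a call that returns only the response text.

```python
def answer_with_optional_search(chat, generate, search, prefix="SEARCH REQUEST:"):
    """Run one generation pass, and a second one if the model asks for a search.

    generate(chat) -> str and search(query) -> list of messages are injected
    so the control flow can be tried with stand-ins instead of a real model.
    """
    chat = list(chat)  # work on a copy so the caller's list is not modified
    response = generate(chat).strip()
    if response.startswith(prefix):
        after_prefix = response[len(prefix):].strip()
        # Use only the first line after the prefix as the query
        query = after_prefix.splitlines()[0].strip() if after_prefix else ""
        chat.extend(search(query))
        response = generate(chat).strip()
    return chat, response


# Stand-ins for the model and the search tool (illustrative only)
def fake_generate(chat):
    if any(m["role"] == "system" and m["content"].startswith("snippet") for m in chat):
        return "ANSWER: found it"
    return "SEARCH REQUEST: starship flight 5\nextra text"


def fake_search(query):
    return [{"role": "system", "content": f"snippet about {query}"}]


final_chat, final_response = answer_with_optional_search(
    [{"role": "user", "content": "Tell me about Starship flight 5"}],
    fake_generate,
    fake_search,
)
print(final_response)
```

With the real objects, `generate` could be something like `lambda c: pipe_line(c, model, tokenizer)[1]` and `search` something like `lambda q: web_search(q, 10)`.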

In [11]:
def llm_without_web(user_question, model, tokenizer):
  chat = [
    {"role": "system", "content": "You are an intelligent assistant that answers questions based on your own knowledge. You do not have access to any external tools or web searches, so rely solely on what you already know."},
    {"role": "system", "content": "When responding, begin each answer with the word ANSWER to indicate that this is your final response to the user question."},
    {"role": "system", "content": "For example: ANSWER: The recent Canadian election was won by [winner's name]."},
    {"role": "user", "content": user_question}
  ]

  formatted_chat, response = pipe_line(chat, model, tokenizer)
      
  return formatted_chat, response

In [12]:
def evaluate_answer(user_question, answers, model, tokenizer):
  chat = [
    {"role": "system", "content": "You are an intelligent assistant with expertise in evaluating answers for quality, accuracy, and relevance. Your task is to review several answers to the same question and determine which one is the best."},
    {"role": "system", "content": "Consider the following criteria to select the best answer:"},
    {"role": "system", "content": "Accuracy: How correct and factual the information is."},
    {"role": "system", "content": "Clarity: How clearly the answer is presented."},
    {"role": "system", "content": "After reviewing, state which answer is the best."},
    {"role": "system", "content": "For example: ANSWER: The answer [the best answer number] is the best."},
    {"role": "user", "content": user_question}
  ]
  chat.extend(answers)

  formatted_chat, response = pipe_line(chat, model, tokenizer)
      
  return formatted_chat, response
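
`evaluate_answer` expects the candidates as extra chat messages. Role names such as `answer1` may not be roles the model saw during training, so an alternative worth trying is to pack the question and all candidates into one `user` message. This is a sketch under that assumption; `build_evaluation_prompt` is a hypothetical helper.

```python
def build_evaluation_prompt(question, answers):
    """Pack a question and its candidate answers into a single user message."""
    lines = [f"Question: {question}", ""]
    for number, answer in enumerate(answers, start=1):
        # Label each candidate so the judge can refer to it by number
        lines.append(f"Answer {number}: {answer.strip()}")
        lines.append("")
    lines.append(f"Which of the {len(answers)} answers is the best?")
    return {"role": "user", "content": "\n".join(lines)}


prompt = build_evaluation_prompt(
    "What is 2 + 2?",
    ["ANSWER: 4", "ANSWER: 5"],
)
print(prompt["content"])
```

Deriving the count from `len(answers)` keeps the instruction in step with however many candidates are passed in.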

## Llama

In [48]:
input_chat_with_web_3b, response_with_web_3b = llm_with_web("can you give me some information about Youtbuer Raora Panthena?",Llama323B_model, Llama323B_tokenizer)

YouTube Raora Panthena


In [47]:
input_chat_without_web_3b, response_without_web_3b = llm_without_web("can you give me some information about Youtbuer Raora Panthena?",Llama323B_model, Llama323B_tokenizer)

In [37]:
input_chat_with_web_1b, response_with_web_1b = llm_with_web("can you give me some information about Youtbuer 3blue1brown?",Llama321B_model, Llama321B_tokenizer)

NameError: name 'Llama321B_model' is not defined

In [35]:
input_chat_without_web_1b, response_without_web_1b = llm_without_web("can you give me some information about Youtbuer 3blue1brown?",Llama321B_model, Llama321B_tokenizer)

NameError: name 'Llama321B_model' is not defined

## Gemma

In [8]:
input_chat_with_web_Gemma22b, response_with_web_Gemma22b = llm_with_web("can you give me some information about Youtbuer 3blue1brown?",Gemma22b_model, Gemma22b_tokenizer)

NameError: name 'Gemma22b_model' is not defined

## Phi

In [65]:

input_chat_with_web_Phi35mini, response_with_web_Phi35mini = llm_with_web("can you give me some information about Youtbuer Raora Panthena?",Phi35mini_model, Phi35mini_tokenizer)

Information about Youtbuer Raora Panthena

After performing the search, I would provide the requested information in the ANSWER section. Since I can't perform real-time web searches, I'll simulate a response based on the information typically available up to my knowledge cutoff date:

ANSWER: Youtbuer Raora Panthena appears to be a name that may not be widely recognized or could be a misspelling or a fictional character. If you are looking for information on a real person, a business, or a cultural reference, please ensure the name is spelled correctly or provide additional context. If this is a character from a book, movie, or other media, please specify the source for accurate information.

If you have more details or context, I can attempt to provide a more precise answer.


In [66]:
print(input_chat_with_web_Phi35mini)

<|system|>
You are an assistant with access to a 'web search' tool that helps you find recent or specific information online. Use the following response structure to indicate whether you need to perform a web search or if you have enough information to answer directly<|end|>
<|system|>
SEARCH REQUEST: When you need additional information to answer a question accurately, output SEARCH REQUEST: [query]. For example: SEARCH REQUEST: latest Canadian election results<|end|>
<|system|>
ANSWER: When you have the information to answer the question, output ANSWER: [your response]. For example: ANSWER: The recent Canadian election was won by [winner's name].<|end|>
<|system|>
Do not provide both a `SEARCH REQUEST` and `ANSWER` at the same time.<|end|>
<|user|>
can you give me some information about Youtbuer Raora Panthena?<|end|>
<|system|>
Information about Youtbuer Raora Panthena<|end|>
<|system|>
Raora Panthera is a female English speaking Virtual YouTuber associated with hololive. She debute

## Evaluate

In [33]:
user_input = "can you give me some information about SpaceX Starship Fifth Flight Test?"

chat_with_web_Llama323B, response_with_web_Llama323B = llm_with_web(user_input,Llama323B_model, Llama323B_tokenizer)
chat_without_web_Llama323B, response_without_web_Llama323B = llm_without_web(user_input,Llama323B_model, Llama323B_tokenizer)

In [37]:
chat_with_web_Phi35mini, response_with_web_Phi35mini = llm_with_web(user_input,Phi35mini_model, Phi35mini_tokenizer)
chat_without_web_Phi35mini, response_without_web_Phi35mini = llm_without_web(user_input,Phi35mini_model, Phi35mini_tokenizer)

Information about SpaceX Starship Fifth Flight Test

After conducting a web search, I found the following information:

ANSWER: The SpaceX Starship SN5 was the fifth prototype of the Starship spacecraft, which underwent a high-altitude flight test on May 5, 2020. The test aimed to demonstrate the vehicle's ability to reach a significant altitude and then perform a controlled descent and landing. The SN5 prototype was equipped with three Raptor engines and was launched from SpaceX's Boca Chica facility in Texas.

During the test, the Starship SN5 reached an altitude of approximately 150 meters (492 feet) before performing a controlled descent. The landing was successful, and the prototype landed upright at the launch site. This test was a significant milestone in SpaceX's development of the Starship spacecraft, which is designed for missions to the Moon, Mars, and beyond.

For more detailed and up-to-date information, please refer to the latest news and SpaceX's official announcements.


In [41]:
chat_without_web_Llama321B, response_without_web_Llama321B = llm_without_web(user_input,Llama321B_model, Llama321B_tokenizer)

In [45]:
answer = [{"role": "answer1", "content": response_with_web_Llama323B},
          {"role": "answer2", "content": response_without_web_Llama323B},
          {"role": "answer3", "content": response_with_web_Phi35mini},
          {"role": "answer4", "content": response_without_web_Phi35mini},
          {"role": "answer5", "content": response_without_web_Llama321B}]

chat_evaluate, response_evaluate = evaluate_answer(user_input,answer,Llama323B_model, Llama323B_tokenizer)

In [46]:
print(response_evaluate)

I must correct myself. I made a mistake. The correct answer is answer1.

ANSWER: The answer 1 is the best. It provides accurate information about the SpaceX Starship Fifth Flight Test, which took place on November 10, 2021, at SpaceX's Starbase facility in Boca Chica, Texas. The test reached an altitude of approximately 33.5 kilometers (21 miles) and flew for around 2 minutes and 40 seconds, but unfortunately, the vehicle did not land vertically and disintegrated upon re-entry, resulting in a failed test.
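
The judge's reply does not begin with `ANSWER:` here, so a simple prefix check would miss the verdict. A regular expression that looks for the phrase anywhere in the reply is one way to pull out the chosen number. `extract_best_answer` is a hypothetical helper, and the pattern is an assumption based on the wording requested in the system prompt.

```python
import re

# Finds the first number that follows "answer" after an "ANSWER:" marker
VERDICT_PATTERN = re.compile(r"ANSWER:.*?answer\s*(\d+)", re.IGNORECASE | re.DOTALL)


def extract_best_answer(reply):
    """Return the chosen answer number as an int, or None if no verdict is found."""
    match = VERDICT_PATTERN.search(reply)
    return int(match.group(1)) if match else None


reply = (
    "I must correct myself. I made a mistake. The correct answer is answer1.\n\n"
    "ANSWER: The answer 1 is the best."
)
print(extract_best_answer(reply))
```

Returning `None` when nothing matches lets the caller decide whether to retry or to log the raw reply.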


In [47]:

output = Llama323B_tokenizer(chat_evaluate, return_tensors="pt", add_special_tokens=False)
toSave = Llama323B_tokenizer.decode(output['input_ids'][0], skip_special_tokens=True)
with open("chat_evaluate.txt", "w", encoding="utf-8") as text_file:
    text_file.write(toSave)

In [48]:

output = Llama323B_tokenizer(chat_with_web_Llama323B, return_tensors="pt", add_special_tokens=False)
toSave = Llama323B_tokenizer.decode(output['input_ids'][0], skip_special_tokens=True)
with open("chat_with_web_Llama323B.txt", "w", encoding="utf-8") as text_file:
    text_file.write(toSave)

In [55]:
output = Phi35mini_tokenizer(chat_with_web_Phi35mini, return_tensors="pt", add_special_tokens=False)
toSave = Phi35mini_tokenizer.decode(output['input_ids'][0], skip_special_tokens=True)
with open("chat_with_web_Phi35mini.txt", "w", encoding='utf-8')  as text_file:
    text_file.write(toSave)
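
The three save cells repeat the same pattern, and search snippets can contain characters such as `•` that some platform default encodings cannot represent. A small helper keeps the encoding explicit in one place. `save_text` is a hypothetical name, and the temporary directory exists only to keep the demonstration self-contained.

```python
import tempfile
from pathlib import Path


def save_text(path, text):
    """Write text to path as UTF-8 and return the number of characters written."""
    # An explicit encoding avoids depending on the platform default
    return Path(path).write_text(text, encoding="utf-8")


sample = "• naïve bullet"
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "chat_demo.txt"
    written = save_text(target, sample)
    restored = target.read_text(encoding="utf-8")
print(written, restored)
```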

## Delete

In [14]:
del Gemma22b_model

NameError: name 'Gemma22b_model' is not defined

In [38]:
del Phi35mini_model

In [51]:
del Llama323B_model 

In [42]:
del Llama321B_model 

In [52]:
torch.cuda.empty_cache() 
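
`del` only removes the name. The memory can be reclaimed once nothing else references the model, and `torch.cuda.empty_cache()` then returns PyTorch's unused cached blocks to the driver. A cautious sketch that runs garbage collection first is shown below; `free_gpu_memory` is a hypothetical helper, and the import guard exists only so the snippet also runs where PyTorch is not installed.

```python
import gc


def free_gpu_memory():
    """Collect garbage, then ask PyTorch to release cached CUDA blocks if possible."""
    gc.collect()  # drop objects that are no longer referenced
    try:
        import torch
    except ImportError:
        return False  # PyTorch is not installed; nothing more to do
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        return True
    return False


released = free_gpu_memory()
print(released)
```

In a notebook, output history can also hold references to large objects, so memory may stay allocated until those references are cleared as well.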

In [56]:
web_search("Information about SpaceX Starship Fifth Flight Test",10)

[{'role': 'system',
  'content': 'Information about SpaceX Starship Fifth Flight Test'},
 {'role': 'system',
  'content': 'The flight test concluded at splashdown 1 hour, 5 minutes and 40 seconds after launch. The entire SpaceX team should take pride in the engineering feat they just accomplished. The world witnessed what the future will look like when Starship starts carrying crew and cargo to destinations on Earth, the Moon, Mars and beyond.'},
 {'role': 'system',
  'content': 'SpaceX launched its enormous Starship rocket for the fifth time ever Oct. 13, on a test flight that featured a mid-air catch of the first-stage Super Heavy booster.'},
 {'role': 'system',
  'content': 'What we covered here. • SpaceX launched Starship, the most powerful rocket ever built, on its fifth test flight. • Liftoff of the Super Heavy rocket booster, topped with the uncrewed Starship ...'},
 {'role': 'system',
  'content': 'Starship flight test 5 was the fifth flight test of a SpaceX Starship launch veh

In [None]:
[{"role": "system", "content": "You are an assistant with access to a 'web search' tool that helps you find recent or specific information online. Use the following response structure to indicate whether you need to perform a web search or if you have enough information to answer directly"},
    {"role": "system", "content": "SEARCH REQUEST: When you need additional information to answer a question accurately, output SEARCH REQUEST: [query]. For example: SEARCH REQUEST: latest Canadian election results"},
    {"role": "system", "content": "ANSWER: When you have the information to answer the question, output ANSWER: [your response]. For example: ANSWER: The recent Canadian election was won by [winner's name]."},
    {"role": "system", "content": "Do not provide both a `SEARCH REQUEST` and `ANSWER` at the same time."},
    {"role": "user", "content": "can you give me some information about SpaceX Starship Fifth Flight Test?"},
    {'role': 'system', 'content': 'Information about SpaceX Starship Fifth Flight Test'},
    {'role': 'system', 'content': 'The flight test concluded at splashdown 1 hour, 5 minutes and 40 seconds after launch. The entire SpaceX team should take pride in the engineering feat they just accomplished. The world witnessed what the future will look like when Starship starts carrying crew and cargo to destinations on Earth, the Moon, Mars and beyond.'},
    {'role': 'system', 'content': 'SpaceX launched its enormous Starship rocket for the fifth time ever Oct. 13, on a test flight that featured a mid-air catch of the first-stage Super Heavy booster.'},
    {'role': 'system', 'content': 'What we covered here. • SpaceX launched Starship, the most powerful rocket ever built, on its fifth test flight. • Liftoff of the Super Heavy rocket booster, topped with the uncrewed Starship ...'},
    {'role': 'system', 'content': 'Starship flight test 5 was the fifth flight test of a SpaceX Starship launch vehicle. The prototype vehicles flown were the Starship Ship 30 upper-stage and Super Heavy Booster 12.This launch is notable for being the first time an orbital-class rocket has been caught out of mid air. After launching and delivering the Starship upper stage into a suborbital trajectory heading toward a splashdown ...'},
    {'role': 'system', 'content': 'SpaceX will try on Sunday during its fifth test flight of Starship, the giant next-generation rocket that is the most powerful ever built. NASA is counting on Starship to take astronauts to the ...'},
    {'role': 'system', 'content': "SpaceX launched its fifth test flight of its Starship rocket on Sunday and made a dramatic first catch of the rocket's more than 20-story tall booster. The FAA issued SpaceX a license to launch ..."},
    {'role': 'system', 'content': 'CNN —. Federal regulators granted SpaceX its long-awaited license to move forward with a fifth uncrewed test launch of Starship, the most powerful rocket system ever constructed. The US Federal ...'},
    {'role': 'system', 'content': 'SpaceX is hoping that this fifth test flight of its Starship megarocket marks a world first. Follow its progress by hitting watch live at the top of the page. Image source, SpaceX'},
    {'role': 'system', 'content': "00:00. 00:00. Oct 13 (Reuters) - SpaceX in its fifth Starship test flight on Sunday returned the rocket's towering first stage booster back to its Texas launch pad for the first time using giant ..."},
    {'role': 'system', 'content': "Today's flight test, which was delayed 25 minutes while SpaceX waited for its launch range to be cleared of boats, marks the second full Starship launch, flight, and return to Earth."}]