<a href="https://colab.research.google.com/github/AshhKetchup/Mini-P/blob/main/smolLM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [43]:
!pip install transformers



In [44]:
# pip install transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "Qwen/Qwen2.5-1.5B-Instruct"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)



In [49]:
#zero shot prompting
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM


tasks = ["step on", "sit comfortably on", "place flowers in", "get potatoes out of fire with", "water plant with",
         "get lemon out of tea with", "dig hole with", "open bottle of beer with", "open parcel with", "serve wine with",
         "pour sugar with", "smear butter with", "extinguish fire with", "pound carpet with"]

problem_statement = "I am a highly intelligent question answering bot and I answer questions from a human perspective. Given the target task, I will list candidate objects in daily life that can be used as a vehicle for it, think about the rationales for why they afford the task and the corresponding visual features. Finally, summarize the corresponding visual features in one sentence. The description of each object is best distinguished from the other."

# Comprehensive single prompt that handles all steps at once
comprehensive_prompt = """Q: Which common objects in daily life can be used as a tool for humans to {}? Please list 20 most suitable objects.
Objects should be different from each other.

For each object:
1. First provide the name of the object
2. Then explain the rationales why it affords the task from the perspective of visual features
3. Finally, summarize the corresponding visual features in one sentence, as comma-separated values where each feature is brief

The answer for each object should follow this format:
# [Number]. [Object Name]:
# Rationales: [Detailed explanation of why this object works for the task]
# Visual Features: [feature 1], [feature 2], [feature 3], ...

Please provide all 20 objects with their rationales and visual features."""

# Select the task
selected_task = tasks[3]  # "get potatoes out of fire with"

# Create message for the single comprehensive prompt
messages = [
    {"role": "system", "content": problem_statement},
    {"role": "user", "content": comprehensive_prompt.format(selected_task)}
]

# Apply the template and generate the response
input_text = tokenizer.apply_chat_template(messages, tokenize=False)
print("COMPREHENSIVE PROMPT:\n" + input_text)

inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=3000, temperature=0.2, top_p=0.9, do_sample=True)
response = tokenizer.decode(outputs[0])
print("\nResponse:\n" + response)


COMPREHENSIVE PROMPT:
<|im_start|>system
I am a highly intelligent question answering bot and I answer questions from a human perspective. Given the target task, I will list candidate objects in daily life that can be used as a vehicle for it, think about the rationales for why they afford the task and the corresponding visual features. Finally, summarize the corresponding visual features in one sentence. The description of each object is best distinguished from the other.<|im_end|>
<|im_start|>user
Q: Which common objects in daily life can be used as a tool for humans to get potatoes out of fire with? Please list 20 most suitable objects.
Objects should be different from each other.

For each object:
1. First provide the name of the object
2. Then explain the rationales why it affords the task from the perspective of visual features
3. Finally, summarize the corresponding visual features in one sentence, as comma-separated values where each feature is brief

The answer for each object

In [48]:
#chain of thought
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM


tasks = ["step on", "sit comfortably on", "place flowers in", "get potatoes out of fire with", "water plant with",
         "get lemon out of tea with", "dig hole with", "open bottle of beer with", "open parcel with", "serve wine with",
         "pour sugar with", "smear butter with", "extinguish fire with", "pound carpet with"]

problem_statement = "I am a highly intelligent question answering bot and I answer questions from a human perspective."

# Step 1: Get objects prompt
objects_prompt = """Q: Which common objects in daily life can be used as a tool for humans to {}?
Please list 20 most suitable objects. Objects should be different from each other."""

# Step 2: Get rationales prompt
rationales_prompt = """For each object listed above, please explain the rationales for why they afford the task of {}
from the perspective of visual features."""

# Step 3: Get visual features prompt
features_prompt = """For each object and its rationales, please summarize the corresponding visual features in one sentence,
with comma-separated values of features, where each feature is described briefly.

For example:
1. [Object name]:
Rationales: [Rationale from previous response]
Visual Features: [feature 1], [feature 2], [feature 3], ..."""

# Select the task
selected_task = tasks[3]  # "get potatoes out of fire with"

# Chain of Thought: First prompt for objects
messages_step1 = [
    {"role": "system", "content": problem_statement},
    {"role": "user", "content": objects_prompt.format(selected_task)}
]
input_text_step1 = tokenizer.apply_chat_template(messages_step1, tokenize=False)
print("STEP 1: Get objects\n" + input_text_step1)

inputs_step1 = tokenizer.encode(input_text_step1, return_tensors="pt").to(device)
outputs_step1 = model.generate(inputs_step1, max_new_tokens=1000, temperature=0.2, top_p=0.9, do_sample=True)
response_step1 = tokenizer.decode(outputs_step1[0])
print("Response for Step 1:\n" + response_step1)

# Extract just the model's response part for the next prompt
# This would need logic to extract just the model's answer from the full decoded text
# For this example, we'll assume we have a function to do this
# model_response_step1 = extract_model_response(response_step1)

# Chain of Thought: Second prompt for rationales
messages_step2 = [
    {"role": "system", "content": problem_statement},
    {"role": "user", "content": objects_prompt.format(selected_task)},
    {"role": "assistant", "content": "model_response_step1"},  # Replace with actual extracted response
    {"role": "user", "content": rationales_prompt.format(selected_task)}
]
input_text_step2 = tokenizer.apply_chat_template(messages_step2, tokenize=False)
print("\nSTEP 2: Get rationales\n" + input_text_step2)

inputs_step2 = tokenizer.encode(input_text_step2, return_tensors="pt").to(device)
outputs_step2 = model.generate(inputs_step2, max_new_tokens=1500, temperature=0.2, top_p=0.9, do_sample=True)
response_step2 = tokenizer.decode(outputs_step2[0])
print("Response for Step 2:\n" + response_step2)

messages_step3 = [
    {"role": "system", "content": problem_statement},
    {"role": "user", "content": objects_prompt.format(selected_task)},
    {"role": "assistant", "content": "model_response_step1"},  # Replace with actual extracted response
    {"role": "user", "content": rationales_prompt.format(selected_task)},
    {"role": "assistant", "content": "model_response_step2"},  # Replace with actual extracted response
    {"role": "user", "content": features_prompt}
]
input_text_step3 = tokenizer.apply_chat_template(messages_step3, tokenize=False)
print("\nSTEP 3: Get visual features\n" + input_text_step3)

inputs_step3 = tokenizer.encode(input_text_step3, return_tensors="pt").to(device)
outputs_step3 = model.generate(inputs_step3, max_new_tokens=2000, temperature=0.2, top_p=0.9, do_sample=True)
response_step3 = tokenizer.decode(outputs_step3[0])
print("Response for Step 3:\n" + response_step3)


STEP 1: Get objects
<|im_start|>system
I am a highly intelligent question answering bot and I answer questions from a human perspective.<|im_end|>
<|im_start|>user
Q: Which common objects in daily life can be used as a tool for humans to get potatoes out of fire with? 
Please list 20 most suitable objects. Objects should be different from each other.<|im_end|>

Response for Step 1:
<|im_start|>system
I am a highly intelligent question answering bot and I answer questions from a human perspective.<|im_end|>
<|im_start|>user
Q: Which common objects in daily life can be used as a tool for humans to get potatoes out of fire with? 
Please list 20 most suitable objects. Objects should be different from each other.<|im_end|>
<|im_start|>system
1. A pair of tongs or clamps designed specifically for handling hot items.
2. A long-handled spoon, which can be used to scoop up the potato without direct contact with the flames.
3. A large spatula or grill brush that is designed to handle hot surface