In [5]:
# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
from random_word import RandomWords
from tqdm import tqdm
import pandas as pd
import seaborn as sns
import numpy as np

In [6]:
model_name = "Qwen/Qwen3-0.6B"

In [7]:
class QwenChatbot:
    def __init__(self, model_name="Qwen/Qwen3-14B"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForCausalLM.from_pretrained(
            model_name,
            torch_dtype="auto",
            device_map="auto", 
            use_cache=False,
        )
        self.history = []

    def invoke(self, user_input):
        messages = self.history + [{"role": "user", "content": user_input}]

        text = self.tokenizer.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=True
        )

        inputs = self.tokenizer(text, return_tensors="pt").to(self.model.device)
        response_ids = self.model.generate(**inputs, max_new_tokens=32768)[0][len(inputs.input_ids[0]):].tolist()
        response = self.tokenizer.decode(response_ids, skip_special_tokens=True)

        # Update history
        self.history.append({"role": "user", "content": user_input})
        self.history.append({"role": "assistant", "content": response})

        return response
    
    def clear_history(self):
        self.history = []

In [8]:
chatbot = QwenChatbot(model_name=model_name)

In [10]:
chatbot.model.device

device(type='cuda', index=0)

In [3]:
chatbot.device

NameError: name 'chatbot' is not defined

In [22]:
chatbot.clear_history()

In [23]:
response = chatbot.invoke("Let's play hangman. You decide a word and I will try to guess it. /think")

print(response)

<think>
Okay, the user wants to play hangman, and I need to choose a word. Let me think about a good word that's not too easy or too hard. Maybe something with a few common letters but some tricky ones too. Let me consider words with around 6-8 letters. How about "butterfly"? Wait, that's 9 letters. Maybe "elephant"? No, that's 8 letters. Let me check the difficulty. "Butterfly" has repeated letters which might make it easier. Maybe "sunflower"? That's 9 letters again. Hmm. What about "chocolate"? 9 letters. Maybe "strawberry"? 10 letters. Maybe a shorter word. "Secret"? 6 letters. But that's pretty common. "Jupiter"? 7 letters. Or "calendar"? 8 letters. Wait, "calendar" has some tricky letters like 'l' and 'd'. Alternatively, "umbrella"? 8 letters. Let me pick "umbrella". Let me check the letters: U, M, B, R, E, L, L, A. That has some vowels and consonants. The user can guess letters, and I can track the correct and incorrect guesses. Let me make sure there are no duplicate letters th

In [24]:
response = chatbot.invoke("Is the word 'umbrella'? /think")
print(response)

<think>
Okay, the user guessed "umbrella" as the word. Let me check if that's correct. The word I chose has 8 letters. "Umbrella" is indeed 8 letters long. Let me count: U-M-B-R-E-L-L-A. Wait, that's 8 letters. But wait, the user is asking if the word is "umbrella," so I need to confirm if that's the word I selected.

Wait, but the user didn't guess a letter; they guessed the entire word. In the previous message, the assistant said the user can guess one letter at a time. So the user might have made a mistake by guessing the whole word instead of a letter. However, maybe the user is trying to test if the assistant's word is "umbrella." 

But according to the rules set by the assistant, the user should guess letters one by one. So the assistant should respond by checking if "umbrella" is the correct word. If the assistant's chosen word is indeed "umbrella," then the user has correctly guessed it. But since the assistant hasn't revealed the word yet, they need to confirm if that's the ca

In [11]:
response = chatbot.invoke("In your first reasoining path you decided a word, which word was it? /think")
print(response)

<think>
Okay, the user is asking what the word I chose was. Let me check the conversation history.

In the first message, I said the word has 9 letters and gave the pattern with underscores. The user guessed "banana," which I corrected because it's only 6 letters. Then the user is now asking what the original word was.

Wait, but I didn't actually reveal the word yet. The game is still ongoing. The user might be trying to figure it out or maybe they're curious. But since I didn't specify the word, I can't just tell them. I need to maintain the game's integrity.

Hmm, maybe the user is trying to get me to reveal the word, but that's against the game's rules. I should remind them that the game is still in progress and they need to keep guessing letters. Also, I should encourage them to continue playing instead of giving away the answer. Let me make sure my response is friendly and keeps the game going.
</think>

Ah, you're trying to peek at the secret word! üòÑ But I can't tell you what

In [12]:
response = chatbot.invoke("The game is over, I dont want to play anymore. I just want to see if you are able to see your previous resonings. When you were deciding the secret word, which one did you choose, if any? /think")
print(response)

<think>
Okay, the user is asking if I can recall the secret word I chose for the Hangman game. Let me think about how to approach this.

First, I need to remember the conversation history. The user initially asked to play Hangman, and I said I had a 9-letter word. Then they guessed "banana," which I corrected because it's only 6 letters. The user then asked if I had decided on a word, and I humorously said I couldn't reveal it. Now, they're saying the game is over and want to know the word I chose.

Wait, but in my initial response, I didn't actually pick a specific word. I just set up the game with a 9-letter placeholder. So technically, I didn't choose a real word. The user might be under the impression that I had a specific word in mind, but in reality, I was just following the structure of the game without committing to an actual word.

So, the user is now trying to get me to reveal the secret word. But since I didn't actually pick one, I need to clarify that. However, I should be 

In [13]:
for message in chatbot.history:
    print(f"{message['role']}: {message['content']}")
# Clear history

user: Let's play hangman. You decide a word and I will try to guess it. /think
assistant: <think>
Okay, the user wants to play hangman. I need to pick a word. Let me think of something that's not too easy but not too hard. Maybe a common word with a mix of vowels and consonants. Let's see... How about "butterfly"? It has 9 letters, which is a good length. It has some repeated letters, which might make it a bit challenging. Let me check the letters: B, U, T, T, E, R, F, L, Y. Yeah, that should work. I'll go with that. Now, I need to set up the hangman display. The user will guess letters, and I'll show the correct letters in the word. If they guess wrong, they get a part of the hangman. Let me make sure I count the wrong guesses correctly. Alright, let's start the game.
</think>

Sure! I've chosen a word with **9 letters**. Let's start the game. You can guess one letter at a time. If you guess correctly, I'll show you where that letter appears in the word. If you guess incorrectly, I'll

## FAIL experiment

#### Hangman

In [49]:

def get_messages_zero_letters(word):
    messages = [
        #{
        #    "role": "system",
        #    "content": system_message, 
        #}, 
        {
            "role": "user",
            "content": "Let's play hangman. You decide a word and I will try to guess it.", 
        }, 
        {
            "role": "assistant",
            "content": f"Okay! I've thought of a word. It has {len(word)} letters. You can start guessing letters.",
        }, 
        {
            "role": "user",
            "content": f"Is the word '{word}'?",
        }
    ]
    return messages

def get_messages_one_letter(word):
    idx = np.random.randint(0, len(word))
    letter = word[idx]
    messages = [
        #{
        #    "role": "system",
        #    "content": system_message, 
        #}, 
        {
            "role": "user",
            "content": "Let's play hangman. You decide a word and I will try to guess it.", 
        }, 
        {
            "role": "assistant",
            "content": f"Okay! I've thought of a word. It has {len(word)} letters. You can start guessing letters.",
        }, 
        {
            "role": "user",
            "content": f"Is there the letter '{letter}'?",
        }, 
        {
            "role": "assistant",
            "content": f"Yes, the letter '{letter}' is in the word.",
        }, 
        {
            "role": "user",
            "content": f"Is the word '{word}'?",
        }
    ]
    return messages


In [None]:
r = RandomWords()
n = 1000

res = []
pbar = tqdm(range(n), desc="Generating words")
yes_responses = 0
for i in pbar:
    word = r.get_random_word()
    messages = get_messages(word)
    text = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False) + "Answer:"
    model_inputs = tokenizer([text], return_tensors="pt").to(device)
    outputs = model.generate(**model_inputs, max_new_tokens=1)
    response = tokenizer.decode(outputs[0])
    
    if not ("no" in response.lower()):
        yes_responses+= 1
    pbar.set_postfix({"YESs": yes_responses/(i+1)})
    
    res.append({
        "word": word,
        "response": response.split("Answer:")[-1].strip(),
    })

res_df = pd.DataFrame(res)
res.rename(columns={"response": "zero_letters"}, inplace=True)


Generating words: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 1000/1000 [09:48<00:00,  1.70it/s, YESs=0.012] 


In [None]:
new_res = []

pbar = tqdm(range(n), desc="Generating words")
yes_responses = 0
for i, row in res.iterrows():
    word = row["word"]
    messages = get_messages_one_letter(word)
    text = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False) + "Answer:"
    model_inputs = tokenizer([text], return_tensors="pt").to(device)
    outputs = model.generate(**model_inputs, max_new_tokens=1)
    response = tokenizer.decode(outputs[0])
    
    if not ("no" in response.lower()):
        yes_responses+= 1
    pbar.set_postfix({"YESs": yes_responses/(i+1)})
    
    new_res.append({
        "word": word,
        "response": response.split("Answer:")[-1].strip(),
    })    

new_res = pd.DataFrame(new_res)
new_res.rename(columns={"response": "one_letter"}, inplace=True)
res = res.merge(new_res, on="word")

Generating words:   0%|          | 0/1000 [09:56<?, ?it/s, YESs=0.01]   

In [None]:
zero_letter_yes = len(res[res["zero_letters"] == "Yes"])
one_letter_yes = len(res[res["one_letter"] == "Yes"])
zero_or_one_letter_yes = len(res[(res["zero_letters"] == "Yes") | (res["one_letter"] == "Yes")])
zero_and_one_letter_yes = len(res[(res["zero_letters"] == "Yes") & (res["one_letter"] == "Yes")])

print(f"Zero letter yes: {zero_letter_yes}")
print(f"One letter yes: {one_letter_yes}")
print(f"Zero or one letter yes: {zero_one_letter_yes}")
print(f"Zero and one letter yes: {zero_and_one_letter_yes}")

Zero letter yes: 14
One letter yes: 16
Zero or one letter yes: 25
Zero and one letter yes: 5


In [70]:
res.to_csv("hangman_results.csv", index=False)

#### Hangman

In [89]:
def get_messages_number_hidden(number, interval=[1, 100]):
    messages = [
        #{
        #    "role": "system",
        #    "content": system_message, 
        #}, 
        {
            "role": "user",
            "content": f"Let's play a game. You think of a number between {interval[0]} and {interval[1]}, and I will try to guess it.", 
        }, 
        {
            "role": "assistant",
            "content": f"Okay! I've thought of a number. You can start guessing.",
        }, 
        {
            "role": "user",
            "content": f"Is the number '{number}'?",
        }
    ]
    return messages

def get_messages_guess_number(interval=[1, 100]):
    messages = [
        #{
        #    "role": "system",
        #    "content": system_message, 
        #}, 
        {
            "role": "user",
            "content": f"Think of a number between {interval[0]} and {interval[1]}.", 
        }, 
    ]
    return messages


In [None]:
res = []
n = 100
interval = [1, 100]
pbar = tqdm(range(1, n+1), desc="Generating")
yes_responses = 0
for number in pbar:
    messages = get_messages_number_hidden(number, interval)
    text = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False) + "Answer:"
    model_inputs = tokenizer([text], return_tensors="pt").to(device)
    outputs = model.generate(**model_inputs, max_new_tokens=1, do_sample=False)
    response = tokenizer.decode(outputs[0])
    
    if not ("no" in response.lower()):
        yes_responses+= 1
    pbar.set_postfix({"YESs": yes_responses/(i+1)})
    
    res.append({
        "number": number,
        "response": response.split("Answer:")[-1].strip(),
    })

res_df = pd.DataFrame(res)


Generating: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 100/100 [00:58<00:00,  1.71it/s, YESs=0]


In [111]:
interval = [1, 10000]

messages = get_messages_guess_number(interval)
text = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False) + "Answer:"
model_inputs = tokenizer([text], return_tensors="pt").to(device)
outputs = model.generate(**model_inputs, max_new_tokens=100, do_sample=False)
response = tokenizer.decode(outputs[0])
print(response)

<|im_start|>system
You are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>
<|im_start|>user
Think of a number between 1 and 10000.<|im_end|>
<|im_start|>assistant
Answer: 5000

I chose 5000 as the number between 1 and 10000. Remember, you can think of any number you like within that range!<|im_end|>


In [None]:
new_res = []

pbar = tqdm(range(n), desc="Generating words")
yes_responses = 0
for i, row in res.iterrows():
    word = row["word"]
    messages = get_messages_one_letter(word)
    text = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False) + "Answer:"
    model_inputs = tokenizer([text], return_tensors="pt").to(device)
    outputs = model.generate(**model_inputs, max_new_tokens=1)
    response = tokenizer.decode(outputs[0])
    
    if not ("no" in response.lower()):
        yes_responses+= 1
    pbar.set_postfix({"YESs": yes_responses/(i+1)})
    
    new_res.append({
        "word": word,
        "response": response.split("Answer:")[-1].strip(),
    })    

new_res = pd.DataFrame(new_res)
new_res.rename(columns={"response": "one_letter"}, inplace=True)
res = res.merge(new_res, on="word")

Generating words:   0%|          | 0/1000 [09:56<?, ?it/s, YESs=0.01]   

In [None]:
zero_letter_yes = len(res[res["zero_letters"] == "Yes"])
one_letter_yes = len(res[res["one_letter"] == "Yes"])
zero_or_one_letter_yes = len(res[(res["zero_letters"] == "Yes") | (res["one_letter"] == "Yes")])
zero_and_one_letter_yes = len(res[(res["zero_letters"] == "Yes") & (res["one_letter"] == "Yes")])

print(f"Zero letter yes: {zero_letter_yes}")
print(f"One letter yes: {one_letter_yes}")
print(f"Zero or one letter yes: {zero_one_letter_yes}")
print(f"Zero and one letter yes: {zero_and_one_letter_yes}")

Zero letter yes: 14
One letter yes: 16
Zero or one letter yes: 25
Zero and one letter yes: 5


In [None]:
res.to_csv("hangman_results.csv", index=False)

## Memory PASS experiment

In [None]:
system_message = (
    "You are a memory-augmented language model that uses memory to retain useful information across a conversation. "
    "You must reply with exactly two clearly labeled sections for every user message:\n\n"
    
    "- Memory: This cell stores important facts, events, summaries, deductions, or reflections from the conversation. "
    "You should use it to take notes that could be helpful in future turns. This includes things the user has told you, goals they have, problems they are trying to solve, and any reasoning or insights you develop. "
    "It is also useful for summarizing what has happened so far or maintaining continuity. "
    "Update it thoughtfully after each interaction‚Äîonly include what is necessary and helpful to remember.\n\n"
    
    "- Answer: This is your direct reply to the user input, addressing their request or question.\n\n"
    
    "Always format your response **exactly** as follows:\n\n"
    "<memory>\n"
    "[your updated memory here]\n"
    "</memory>\n"
    "<answer>\n"
    "[your answer here]\n"
    "</answer>\n\n"
)


#### With smolagents

In [None]:
from smolagents import CodeAgent, HfApiModel, tool

@tool
def append_to_memory(new_memory: str) -> str:
    """
    Append new memory to the existing memory.
    """
    return f"{memory}\n{new_memory}"

@tool
def rewrite_memory(memory: str) -> str:
    """
    Rewrite the memory of the agent.
    """
    return memory

agent = CodeAgent(tools = [], model=HfApiModel())

## Smonoglo

In [1]:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-0.6B"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True # Switches between thinking and non-thinking modes. Default is True.
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist() 

# parsing thinking content
try:
    # rindex finding 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)


thinking content: <think>
Okay, the user wants a short introduction to a large language model. Let me start by recalling what I know about them. Large language models are AI systems that can understand and generate human language. They're used in various fields like natural language processing, content creation, and even creative writing.

I should mention their capabilities, like understanding and generating text. Also, their training data and how they're developed. Maybe include some examples, like answering questions or creating stories. Need to keep it concise but informative. Avoid technical jargon to make it accessible. Make sure it's a good overview without being too detailed. Let me check if there's anything else I might have missed. Oh, maybe mention their use cases. Yep, that's important. Alright, time to put it all together in a friendly and informative way.
</think>
content: A large language model (LLM) is an artificial intelligence system designed to understand and generat