<a href="https://colab.research.google.com/github/JackWittmayer/PrincipleLearning/blob/main/principle_learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
from transformers import AutoModelForCausalLM, AutoTokenizer

In [2]:
DRIVE_FOLDER = "/content/drive/MyDrive/colab/"
MODEL_NAME = "Qwen/Qwen3-4B-Thinking-2507"
MODEL_TOKENIZER = "Qwen/Qwen3-4B-Thinking-2507-Tokenizer"
MODEL_PATH = DRIVE_FOLDER + MODEL_NAME
TOKENIZER_PATH = DRIVE_FOLDER + MODEL_TOKENIZER
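
The Drive paths above only work once an earlier run has saved a copy there. Below is a minimal sketch of one way to fall back to the Hub ID when the Drive copy is missing. `resolve_source` is a hypothetical helper, not part of this notebook or of `transformers`. It assumes that a directory containing `config.json` or `tokenizer_config.json` is a usable saved copy.

```python
import os


def resolve_source(local_path, hub_id,
                   marker_files=("config.json", "tokenizer_config.json")):
    """Return local_path if it looks like a saved checkpoint, else hub_id.

    Hypothetical helper: the marker-file check is an assumption about what
    a saved model or tokenizer directory contains.
    """
    if os.path.isdir(local_path) and any(
        os.path.isfile(os.path.join(local_path, name)) for name in marker_files
    ):
        return local_path
    return hub_id
```

With this helper, `AutoTokenizer.from_pretrained(resolve_source(TOKENIZER_PATH, MODEL_NAME))` would read from Drive when the saved copy is present and from the Hub otherwise.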

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [4]:
# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_PATH)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    dtype="auto",  # `torch_dtype` is deprecated in favor of `dtype`
    device_map="auto",
)


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

In [5]:
# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

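# parse the thinking content
try:
    # position just after the last 151668 (</think>), found by searching the reversed list
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    # no </think> in the output: treat everything as final content
    index = 0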

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)  # the chat template supplies the opening <think>, so the output has none
print("content:", content)

thinking content: Okay, the user asked for a short introduction to large language models. Hmm, they probably want something concise but informative—maybe they're a student, a professional new to AI, or just someone curious about tech trends.  

First, I should avoid jargon overload since they said "short." Gotta focus on the core idea: LLMs are AI systems that understand and generate human-like text. Key points to hit:  
- They're trained on massive text data (that's why they're "large")  
- They predict words instead of following rigid rules (unlike older chatbots)  
- They're used for writing, coding, answering questions—real-world stuff people care about  

Wait, should I mention specific examples? Like GPT or ChatGPT? Yeah, naming one makes it concrete. But since they want it short, I'll just say "models like GPT" without diving into versions.  

Also, the user didn't specify depth, so I'll skip technical details like transformers or training methods. No need to confuse them. Just 
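
The parsing step in the cell above finds the last `</think>` token and splits the generated IDs there. Below is a minimal sketch of the same logic as a reusable function. `split_at_last` is a hypothetical name, and the default ID 151668 is taken from the cell's comment rather than looked up from the tokenizer.

```python
def split_at_last(token_ids, marker_id=151668):
    """Split token_ids just after the last occurrence of marker_id.

    Returns (up_to_and_including_marker, after_marker). If marker_id is
    absent, everything is returned as the second part, matching the
    `index = 0` fallback in the cell above.
    """
    try:
        # Position just past the last marker, found via the reversed list.
        index = len(token_ids) - token_ids[::-1].index(marker_id)
    except ValueError:
        index = 0
    return token_ids[:index], token_ids[index:]


thinking_ids, answer_ids = split_at_last([11, 22, 151668, 33, 44])
print(thinking_ids, answer_ids)
```

Passing `tokenizer.convert_tokens_to_ids("</think>")` as `marker_id` would avoid hard-coding the ID, assuming `</think>` is a single token in the vocabulary.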

In [7]:
print(model.device)

cuda:0


In [10]:
tokenizer.save_pretrained(TOKENIZER_PATH)

('/content/drive/MyDrive/colab/Qwen/Qwen3-4B-Thinking-2507-Tokenizer/tokenizer_config.json',
 '/content/drive/MyDrive/colab/Qwen/Qwen3-4B-Thinking-2507-Tokenizer/special_tokens_map.json',
 '/content/drive/MyDrive/colab/Qwen/Qwen3-4B-Thinking-2507-Tokenizer/chat_template.jinja',
 '/content/drive/MyDrive/colab/Qwen/Qwen3-4B-Thinking-2507-Tokenizer/vocab.json',
 '/content/drive/MyDrive/colab/Qwen/Qwen3-4B-Thinking-2507-Tokenizer/merges.txt',
 '/content/drive/MyDrive/colab/Qwen/Qwen3-4B-Thinking-2507-Tokenizer/added_tokens.json',
 '/content/drive/MyDrive/colab/Qwen/Qwen3-4B-Thinking-2507-Tokenizer/tokenizer.json')