In [2]:
import re

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name_or_path = "tencent/Hunyuan-1.8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    device_map="auto",
    # torch_dtype=torch.bfloat16,  # optional (requires `import torch`): load the weights in bfloat16
)

messages = [
    #{"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "/no_think Salam! Azərbaycanca cavab ver və özünü təqdim et."} # /no_think as prefix to disable thinking mode.
]

tokenized_chat = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    enable_thinking=False # Toggle thinking mode (default: True)
)

outputs = model.generate(
    tokenized_chat.to(model.device),
    max_new_tokens=2048,
    do_sample=True,
    top_k=20,
    top_p=0.8,
    repetition_penalty=1.05,
    temperature=0.7
)

output_text = tokenizer.decode(outputs[0])
print("output_text=",output_text)
think_pattern = r'<think>(.*?)</think>'
think_matches = re.findall(think_pattern, output_text, re.DOTALL)

answer_pattern = r'<answer>(.*?)</answer>'
answer_matches = re.findall(answer_pattern, output_text, re.DOTALL)

# Fall back to an empty string when a tag is missing instead of raising IndexError.
think_content = think_matches[0].strip() if think_matches else ""
answer_content = answer_matches[0].strip() if answer_matches else ""
print(f"thinking_content:{think_content}\n\n")
print(f"answer_content:{answer_content}\n\n")

output_text= <｜hy_begin▁of▁sentence｜><｜hy_User｜>/no_think Salam! Azərbaycanca cavab ver və özünü təqdim et.<｜hy_Assistant｜><think>

</think>
<answer>
Azərbaycanca cavab ver, özünün təqdim etmek istiyorum. 😊
</answer><｜hy_place▁holder▁no▁2｜>
thinking_content:


answer_content:Azərbaycanca cavab ver, özünün təqdim etmek istiyorum. 😊
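
The decoded output reproduces characters such as ə and ü intact, so the tokenizer round-trips Azerbaijani text. How efficiently it does so can be gauged by counting tokens per word. The following is a minimal sketch, not part of the original run: `tokens_per_word` and `utf8_byte_encode` are hypothetical helper names, and the byte-level stand-in encoder is only there so the snippet runs without downloading a model. Treating one token per UTF-8 byte as a worst case assumes a byte-level tokenizer, which has not been verified for Hunyuan.

```python
from typing import Callable, List, Sequence


def tokens_per_word(text: str, encode: Callable[[str], Sequence[int]]) -> float:
    """Average number of tokens per whitespace-separated word."""
    words = text.split()
    if not words:
        return 0.0
    return len(encode(text)) / len(words)


def utf8_byte_encode(text: str) -> List[int]:
    """Stand-in encoder (assumption): one token per UTF-8 byte."""
    return list(text.encode("utf-8"))


sample = "Salam! Azərbaycanca cavab ver və özünü təqdim et."
worst_case = tokens_per_word(sample, utf8_byte_encode)
print(f"byte-level worst case: {worst_case:.2f} tokens/word")

# With the tokenizer loaded in the cell above, the real figure would come from:
# tokens_per_word(sample, lambda t: tokenizer.encode(t, add_special_tokens=False))
```

A real tokenizer value well below the byte-level figure would suggest that frequent Azerbaijani character sequences have dedicated tokens; a value close to it would suggest mostly byte-level fallback.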




In [6]:
import re

# === CHANGE THIS ===
new_message_content = "/no_think Salam! Necəsən?! Azərbaycanca bilirsən?"

# Update messages
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": new_message_content}
]

# Tokenize with thinking disabled
tokenized_chat = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    enable_thinking=False
)

# Generate (sampling arguments are commented out, so the defaults from the model's generation config apply)
outputs = model.generate(
    tokenized_chat.to(model.device),
    max_new_tokens=2048,
    #do_sample=True,
    #top_k=20,
    #top_p=0.8,
    #repetition_penalty=1.05,
    #temperature=0.7
)

# Decode & Extract
output_text = tokenizer.decode(outputs[0])
print("output_text=", output_text)

# Extract thinking and answer
think_matches = re.findall(r'<think>(.*?)</think>', output_text, re.DOTALL)
answer_matches = re.findall(r'<answer>(.*?)</answer>', output_text, re.DOTALL)

think_content = think_matches[0].strip() if think_matches else ""
answer_content = answer_matches[0].strip() if answer_matches else ""

print(f"thinking_content: {think_content}\n")
print(f"answer_content: {answer_content}\n")


output_text= <｜hy_begin▁of▁sentence｜>You are a helpful assistant.<｜hy_place▁holder▁no▁3｜><｜hy_User｜>/no_think Salam! Necəsən?! Azərbaycanca bilirsən?<｜hy_Assistant｜><think>

</think>
<answer>
Salam! Necəsən? Azərbaycanca bilirsən? 😊
</answer><｜hy_place▁holder▁no▁2｜>
thinking_content: 

answer_content: Salam! Necəsən? Azərbaycanca bilirsən? 😊
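
The answer in this run is almost a copy of the user's message. A quick way to flag such cases automatically is a character-level similarity check. The sketch below uses `difflib.SequenceMatcher` from the standard library; the helper names `echo_ratio` and `is_prompt_echo` and the 0.8 threshold are illustrative assumptions, not something the notebook defines.

```python
from difflib import SequenceMatcher


def echo_ratio(prompt: str, answer: str) -> float:
    """Character-level similarity in [0, 1] between prompt and answer."""
    return SequenceMatcher(None, prompt.casefold(), answer.casefold()).ratio()


def is_prompt_echo(prompt: str, answer: str, threshold: float = 0.8) -> bool:
    """Flag answers that mostly repeat the prompt (threshold is an assumption)."""
    return echo_ratio(prompt, answer) >= threshold


# User text from the run above, with the /no_think prefix removed.
prompt = "Salam! Necəsən?! Azərbaycanca bilirsən?"
answer = "Salam! Necəsən? Azərbaycanca bilirsən? 😊"

print(f"similarity: {echo_ratio(prompt, answer):.2f}")
print("echo detected:", is_prompt_echo(prompt, answer))
```

Running the same check over a batch of prompts would give a rough echo rate to compare before and after any fine-tuning.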



## Conclusion & Insights

This evaluation of **Tencent's Hunyuan-1.8B-Instruct**, a small generative model with 1.8 billion parameters, yielded several observations about how it handles Azerbaijani out of the box, without any language-specific fine-tuning:

1. **Tokenizer Coverage in Azerbaijani**: Despite the model's small size, Hunyuan's tokenizer handles Azerbaijani text. Characters such as ə and ü survive the encode/decode round trip intact in both runs, so the model can process Azerbaijani input at the token level.

2. **Fallback Behavior – Prompt Repetition**: In both runs the model repeated or lightly modified the user's prompt instead of answering it, and the first answer drifted into Turkish forms ("etmek istiyorum"). This "prompt echoing" is a common fallback in small generative models and most likely reflects limited exposure to Azerbaijani during training.

3. **Instruction Tuning as a Path to Enhancement**: Instruction fine-tuning on a high-quality, curated Azerbaijani dataset could teach the model to follow instructions more reliably and generate contextually appropriate responses. Such targeted tuning could substantially improve its utility for Azerbaijani language applications.

4. **Scalability with Larger Hunyuan Variants**: The larger Hunyuan models (4B and 7B parameters) were not tested here. While still lightweight compared to large-scale foundation models, they are expected to provide richer linguistic capabilities and context handling, and may offer a good balance between computational efficiency and performance for low-resource languages like Azerbaijani.

5. **Consumer Hardware Deployment**: Hunyuan’s small generative models are optimized for efficient inference on resource-constrained hardware. They can operate on single-GPU servers and even CPUs, making them viable for deployment on personal devices like laptops, desktops, and potentially smartphones with further optimization.
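
To make the hardware point concrete, weight memory can be estimated as parameters times bits per parameter. The sketch below is a back-of-the-envelope calculation only: the helper name `weight_memory_gb` is hypothetical, the 8-bit and 4-bit rows assume quantized variants that this notebook did not test, and activations and the KV cache are ignored. The 16-bit estimate for the 1.8B model (about 3.6 GB) is consistent with the roughly 3.58 GB `model.safetensors` file downloaded for this checkpoint.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight-only memory in GB (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Parameter counts taken from the model names; bit widths are assumptions.
models = [("1.8B", 1.8e9), ("4B", 4e9), ("7B", 7e9)]

for name, params in models:
    sizes = ", ".join(
        f"{bits}-bit: {weight_memory_gb(params, bits):.1f} GB" for bits in (16, 8, 4)
    )
    print(f"{name}: {sizes}")
```

Actual memory use during inference is higher than these figures, since the runtime also needs room for activations and the attention cache.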

### Key Insight:

The **Hunyuan small model series** represents a compelling foundation for building **Azerbaijani-centric AI applications**, provided the models are instruction-tuned on domain-specific datasets. Their lightweight design aligns well with edge deployment scenarios, allowing the development of localized, private AI services without heavy infrastructure dependencies.
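
As a starting point for such instruction tuning, training pairs can be stored in the same `messages` structure that the notebook passes to `apply_chat_template`. The sketch below is an assumption-laden illustration: `to_chat_example` is a hypothetical helper, the one-record-per-line JSON layout is a common convention rather than a Hunyuan requirement, and the Azerbaijani pair is a placeholder, not data from a real dataset.

```python
import json
from typing import Dict, List, Optional


def to_chat_example(
    instruction: str, response: str, system: Optional[str] = None
) -> Dict[str, List[Dict[str, str]]]:
    """Wrap one instruction/response pair in a chat-style record."""
    messages: List[Dict[str, str]] = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": instruction})
    messages.append({"role": "assistant", "content": response})
    return {"messages": messages}


# Placeholder pair for illustration only.
pairs = [
    ("Salam! Necəsən?", "Salam! Yaxşıyam, təşəkkür edirəm. Sizə necə kömək edə bilərəm?"),
]

# ensure_ascii=False keeps characters such as ə readable in the output file.
jsonl_lines = [json.dumps(to_chat_example(q, a), ensure_ascii=False) for q, a in pairs]
print(jsonl_lines[0])
```

Keeping the training records in the chat format means the tokenizer's own chat template can render them, so training and inference prompts share the same special tokens.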

---