# Hunyuan-MT-Chimera-7B-fp8 Translation Model Testing

This notebook demonstrates how to use the **Hunyuan-MT-Chimera-7B-fp8** model for English-to-Vietnamese translation.

## Features Tested:
- ✅ **Single Translation**: Translate individual sentences
- ✅ **Batch Translation**: Translate multiple sentences efficiently  
- ✅ **Performance Analysis**: Measure translation speed and memory usage
- ✅ **Interactive Mode**: Real-time translation interface
- ✅ **Length Analysis**: Test translation quality across different sentence lengths

## Model Information:
- **Model**: Hunyuan-MT-Chimera-7B-fp8
- **Task**: English to Vietnamese Translation
- **Format**: FP8-quantized in the compressed-tensors format (loading requires the `compressed-tensors` package)
- **Location**: `../Hunyuan-MT-Chimera-7B-fp8/`

## Usage Instructions:
1. Run all cells in sequence
2. Uncomment the `interactive_translate()` line for interactive mode
3. Modify test sentences as needed
4. Check GPU/CPU usage in the performance analysis section

---

In [8]:
# Hunyuan-MT-Chimera-7B-fp8 Translation Inference Demo
# Setup and import required libraries

import sys
import os
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
import time

# Add the project's src directory to the import path (adjust this absolute path for your machine)
sys.path.append('/Users/thanh/Workspace/VietLLMDataset/src')

from translation.hunyuan_translator import HunyuanTranslator

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA device: {torch.cuda.get_device_name()}")
    print(f"CUDA memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB")

PyTorch version: 2.8.0
CUDA available: False
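
The output above shows that CUDA is not available on this machine. The following is a minimal sketch of a device picker that also considers Apple's MPS backend, on the assumption (suggested by the `/Users/...` path but not confirmed) that the notebook runs on a Mac. `pick_device` is a hypothetical helper and not part of `HunyuanTranslator`; whether this FP8 checkpoint actually runs on MPS has not been verified here.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu" depending on what PyTorch reports."""
    try:
        import torch  # imported lazily so the helper falls back to "cpu" without PyTorch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps may be missing in older PyTorch builds, so guard the lookup.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


print(f"Selected device: {pick_device()}")
```

The returned string could then be passed wherever a device name is expected, for example as the `device` argument of the translator, if that wrapper accepts plain device strings.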


In [6]:
model_path = "../Hunyuan-MT-Chimera-7B-fp8"
# Tokenizers take no dtype or device_map; those arguments only apply to the model.
tokenizer = AutoTokenizer.from_pretrained(model_path)

In [9]:
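# Loading this FP8 checkpoint requires the compressed-tensors package
# (pip install compressed-tensors); without it, from_pretrained raises ImportError.
model = AutoModelForCausalLM.from_pretrained(model_path, dtype="auto", device_map="auto")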

ImportError: compressed_tensors is not installed and is required for compressed-tensors quantization. Please install it with `pip install compressed-tensors`.
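
The ImportError above only appears once model loading has started. A small pre-flight check can report missing packages earlier. This is a sketch using only the standard library; `missing_packages` is a hypothetical helper, and the list of required modules contains just the one named in the error message.

```python
import importlib.util


def missing_packages(required):
    """Return the top-level module names from `required` that cannot be imported."""
    return [name for name in required if importlib.util.find_spec(name) is None]


# compressed_tensors is the module named in the ImportError above; extend as needed.
missing = missing_packages(["compressed_tensors"])
if missing:
    print("Install before loading the model:", ", ".join(missing))
else:
    print("All required packages are available.")
```

Note that the import name uses an underscore (`compressed_tensors`) while the pip package name uses a hyphen (`compressed-tensors`).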

In [None]:
# Initialize the Hunyuan translator.
# model_path (defined above) is relative to the notebook location.
print(f"Loading model from: {model_path}")

# Initialize translator with appropriate settings
translator = HunyuanTranslator(
    model_name=model_path,
    device=None,  # Auto-detect (CUDA if available, otherwise CPU)
    batch_size=2,  # Smaller batch size for memory efficiency
    max_length=256  # Reasonable max length for translation
)

print("✅ Hunyuan-MT-Chimera-7B-fp8 model loaded successfully!")

Loading model from: ../Hunyuan-MT-Chimera-7B-fp8


`torch_dtype` is deprecated! Use `dtype` instead!
Error loading model: Unrecognized configuration class <class 'transformers.models.hunyuan_v1_dense.configuration_hunyuan_v1_dense.HunYuanDenseV1Config'> for this kind of AutoModel: AutoModelForSeq2SeqLM.
Model type should be one of BartConfig, BigBirdPegasusConfig, BlenderbotConfig, BlenderbotSmallConfig, EncoderDecoderConfig, FSMTConfig, GPTSanJapaneseConfig, GraniteSpeechConfig, LEDConfig, LongT5Config, M2M100Config, MarianConfig, MBartConfig, MT5Config, MvpConfig, NllbMoeConfig, PegasusConfig, PegasusXConfig, PLBartConfig, ProphetNetConfig, Qwen2AudioConfig, SeamlessM4TConfig, SeamlessM4Tv2Config, SwitchTransformersConfig, T5Config, T5GemmaConfig, UMT5Config, VoxtralConfig, XLMProphetNetConfig.

ValueError: Unrecognized configuration class <class 'transformers.models.hunyuan_v1_dense.configuration_hunyuan_v1_dense.HunYuanDenseV1Config'> for this kind of AutoModel: AutoModelForSeq2SeqLM.
Model type should be one of BartConfig, BigBirdPegasusConfig, BlenderbotConfig, BlenderbotSmallConfig, EncoderDecoderConfig, FSMTConfig, GPTSanJapaneseConfig, GraniteSpeechConfig, LEDConfig, LongT5Config, M2M100Config, MarianConfig, MBartConfig, MT5Config, MvpConfig, NllbMoeConfig, PegasusConfig, PegasusXConfig, PLBartConfig, ProphetNetConfig, Qwen2AudioConfig, SeamlessM4TConfig, SeamlessM4Tv2Config, SwitchTransformersConfig, T5Config, T5GemmaConfig, UMT5Config, VoxtralConfig, XLMProphetNetConfig.
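
The ValueError above indicates that the wrapper tried to load the checkpoint with `AutoModelForSeq2SeqLM`, while `HunYuanDenseV1Config` is not among the encoder-decoder configurations that class accepts. The earlier cell in this notebook uses `AutoModelForCausalLM` for the same path. One possible way to avoid hard-coding the class is to pick it from the config's `is_encoder_decoder` flag. This is a sketch only: `pick_auto_class_name` and `load_translation_model` are hypothetical names, and how `HunyuanTranslator` actually chooses its model class is not shown in this notebook.

```python
from types import SimpleNamespace


def pick_auto_class_name(config) -> str:
    """Map a model config to the name of the matching transformers Auto class."""
    # PretrainedConfig exposes is_encoder_decoder; it is False for
    # decoder-only architectures.
    if getattr(config, "is_encoder_decoder", False):
        return "AutoModelForSeq2SeqLM"
    return "AutoModelForCausalLM"


def load_translation_model(model_path: str):
    """Load a checkpoint with the Auto class that matches its architecture."""
    import transformers  # lazy import: only needed when actually loading weights

    config = transformers.AutoConfig.from_pretrained(model_path)
    auto_cls = getattr(transformers, pick_auto_class_name(config))
    # dtype= matches the argument name used elsewhere in this notebook;
    # older transformers releases call it torch_dtype.
    return auto_cls.from_pretrained(model_path, dtype="auto", device_map="auto")


# Stand-in configs demonstrate the selection logic without loading any weights.
print(pick_auto_class_name(SimpleNamespace(is_encoder_decoder=False)))
print(pick_auto_class_name(SimpleNamespace(is_encoder_decoder=True)))
```

A decoder-only model also needs the translation request phrased as a prompt and the prompt tokens stripped from the generated output, so swapping the class alone may not be enough to fix the wrapper.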

In [None]:
# Test single translation
print("🔄 Testing single translation...")

# Sample English texts for translation
test_sentences = [
    "Hello, how are you today?",
    "The weather is beautiful this morning.",
    "I love learning new languages.",
    "Technology is changing our world rapidly.",
    "What time is the meeting scheduled for tomorrow?"
]

print("\n" + "="*60)
print("SINGLE TRANSLATION TESTS")
print("="*60)

for i, sentence in enumerate(test_sentences, 1):
    print(f"\n{i}. English: {sentence}")
    
    # Measure translation time
    start_time = time.time()
    vietnamese_translation = translator.translate_single(
        text=sentence,
        source_lang="en",
        target_lang="vi"
    )
    end_time = time.time()
    
    print(f"   Vietnamese: {vietnamese_translation}")
    print(f"   Time: {end_time - start_time:.2f}s")
    print("-" * 40)
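
The loop above times each sentence once with `time.time()`. Single measurements tend to be noisy, and the first call often includes one-off warm-up cost. A minimal sketch of a reusable timing helper follows, using `time.perf_counter()` and the median of several runs. `time_call` and `fake_translate` are hypothetical names; `fake_translate` only stands in for `translator.translate_single` so the block runs on its own.

```python
import statistics
import time


def time_call(fn, *args, repeats=3, **kwargs):
    """Call fn `repeats` times; return (last_result, median_seconds)."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    durations = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        result = fn(*args, **kwargs)
        durations.append(time.perf_counter() - start)
    return result, statistics.median(durations)


# Stand-in for translator.translate_single so the sketch is self-contained.
def fake_translate(text, source_lang="en", target_lang="vi"):
    return f"[{target_lang}] {text}"


translation, seconds = time_call(fake_translate, "Hello", repeats=5)
print(f"{translation} (median {seconds:.6f}s)")
```

With a real model, a small `repeats` value keeps the total runtime manageable, since each call performs a full generation.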

In [None]:
# Test batch translation
print("🔄 Testing batch translation...")

# Longer texts for batch testing
batch_test_texts = [
    "Machine learning is a subset of artificial intelligence that focuses on the development of algorithms and statistical models.",
    "Climate change refers to long-term shifts in global temperatures and weather patterns.",
    "The Internet of Things (IoT) describes the network of physical objects that are embedded with sensors and software.",
    "Renewable energy sources such as solar, wind, and hydroelectric power are becoming increasingly important.",
    "Digital transformation is the integration of digital technology into all areas of business."
]

print("\n" + "="*60)
print("BATCH TRANSLATION TESTS")
print("="*60)

# Measure batch translation time
start_time = time.time()
batch_translations = translator.translate_batch(
    texts=batch_test_texts,
    source_lang="en",
    target_lang="vi",
    show_progress=True
)
end_time = time.time()

print(f"\nBatch translation completed in {end_time - start_time:.2f}s")
print(f"Average time per sentence: {(end_time - start_time) / len(batch_test_texts):.2f}s")

# Display results
for i, (original, translation) in enumerate(zip(batch_test_texts, batch_translations), 1):
    print(f"\n{i}. English: {original}")
    print(f"   Vietnamese: {translation}")
    print("-" * 40)
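
The translator was configured with `batch_size=2`, so five inputs would presumably be processed in three groups. How `translate_batch` splits its input internally is not shown in this notebook; the following sketch illustrates plain fixed-size chunking under that assumption. `chunked` is a hypothetical helper.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


texts = ["a", "b", "c", "d", "e"]
for batch in chunked(texts, batch_size=2):
    print(batch)
```

The last batch may be shorter than `batch_size`, which matters when averaging time per batch rather than per sentence.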

In [None]:
# Interactive translation function
def interactive_translate():
    """Read English text from the user in a loop and print its Vietnamese translation."""
    print("🎯 Interactive Translation Mode")
    print("Type 'quit' to exit")
    print("-" * 40)
    
    while True:
        try:
            # Get user input
            user_text = input("\nEnter English text to translate: ").strip()
            
            if user_text.lower() in ['quit', 'exit', 'q']:
                print("👋 Goodbye!")
                break
            
            if not user_text:
                print("⚠️  Please enter some text to translate.")
                continue
            
            # Translate the text
            print("🔄 Translating...")
            start_time = time.time()
            translation = translator.translate_single(
                text=user_text,
                source_lang="en",
                target_lang="vi"
            )
            end_time = time.time()
            
            # Display results
            print(f"✅ English: {user_text}")
            print(f"🇻🇳 Vietnamese: {translation}")
            print(f"⏱️  Time: {end_time - start_time:.2f}s")
            
        except KeyboardInterrupt:
            print("\n👋 Goodbye!")
            break
        except Exception as e:
            print(f"❌ Error: {e}")

# Uncomment the line below to run interactive mode
# interactive_translate()
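
A loop built directly on `input()` is hard to exercise without a person at the keyboard. As a sketch, the same control flow can be written with the input, output, and translation functions passed in, so it can be driven by a scripted session. `run_translation_loop` is a hypothetical helper, and the uppercase lambda only stands in for the real translator.

```python
def run_translation_loop(translate, read_line, write_line):
    """Translate lines from read_line until a quit command; return how many were translated."""
    translated = 0
    while True:
        try:
            text = read_line().strip()
        except (EOFError, KeyboardInterrupt):
            break
        if text.lower() in {"quit", "exit", "q"}:
            break
        if not text:
            continue  # skip empty input, as the interactive cell does
        write_line(translate(text))
        translated += 1
    return translated


# Scripted session with a stand-in translate function.
scripted = iter(["Hello", "", "Good morning", "quit"])
outputs = []
count = run_translation_loop(lambda t: t.upper(), lambda: next(scripted), outputs.append)
print(count, outputs)
```

For real interactive use, `read_line` could be `lambda: input("Enter English text to translate: ")` and `write_line` could be `print`.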

In [None]:
# Model performance and memory usage analysis
print("📊 Model Performance Analysis")
print("="*50)

# Check GPU memory usage if CUDA is available
if torch.cuda.is_available():
    print(f"GPU Memory Allocated: {torch.cuda.memory_allocated() / 1024**3:.2f} GB")
    print(f"GPU Memory Reserved: {torch.cuda.memory_reserved() / 1024**3:.2f} GB")
    print(f"GPU Memory Free: {(torch.cuda.get_device_properties(0).total_memory - torch.cuda.memory_reserved()) / 1024**3:.2f} GB")
else:
    print("Running on CPU")

# Model information
print(f"\nModel Device: {translator.device}")
print(f"Model Name: {translator.model_name}")
print(f"Batch Size: {translator.batch_size}")
print(f"Max Length: {translator.max_length}")

# Test different sentence lengths
length_test_sentences = [
    "Hi!",  # Very short
    "How are you doing today?",  # Short
    "The rapid advancement of artificial intelligence has transformed many industries.",  # Medium
    "In the ever-evolving landscape of modern technology, artificial intelligence and machine learning have emerged as transformative forces that are reshaping industries, revolutionizing business processes, and fundamentally changing how we interact with digital systems in our daily lives."  # Long
]

print("\n📈 Translation Time vs Sentence Length:")
print("-" * 50)

for i, sentence in enumerate(length_test_sentences, 1):
    word_count = len(sentence.split())
    char_count = len(sentence)
    
    start_time = time.time()
    translation = translator.translate_single(text=sentence, source_lang="en", target_lang="vi")
    end_time = time.time()
    
    print(f"{i}. Words: {word_count:2d} | Chars: {char_count:3d} | Time: {end_time - start_time:.3f}s")
    print(f"   EN: {sentence[:60]}{'...' if len(sentence) > 60 else ''}")
    print(f"   VI: {translation[:60]}{'...' if len(translation) > 60 else ''}")
    print()
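
The memory report in the cell above only produces numbers when CUDA is available; on CPU it prints "Running on CPU" and nothing else. A rough CPU-side figure can be taken from the process's peak resident set size. This sketch uses the standard-library `resource` module, which exists on Unix-like systems only (not on Windows); `peak_rss_gb` is a hypothetical helper.

```python
import resource
import sys


def peak_rss_gb() -> float:
    """Return this process's peak resident set size, in units of 1024**3 bytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    bytes_used = peak if sys.platform == "darwin" else peak * 1024
    return bytes_used / 1024**3


print(f"Peak process memory: {peak_rss_gb():.2f} GB")
```

Because this is a peak value for the whole Python process, it includes the interpreter and any other loaded libraries, not just the model weights.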