# Inference with LoRA Fine-tuned Qwen Model

This notebook demonstrates how to use your fine-tuned model for:
- Translation (EN ↔ RU)
- Question Answering (EN & RU)

In [1]:
import os
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel



## Configuration

In [4]:
BASE_MODEL = "Qwen/Qwen2.5-0.5B"
LORA_ROOT = "./qwen2.5_lora"   # folder with checkpoints or final model
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
DTYPE = torch.float16 if DEVICE == "cuda" else torch.float32  # half precision is poorly supported on CPU

print(f"Device: {DEVICE}")

Device: cuda


## Load Model

In [5]:
def get_model_path(lora_root):
    """Get the path to the LoRA model (final > checkpoint > epoch)"""
    # Check for final model
    final_path = os.path.join(lora_root, "final")
    if os.path.exists(final_path):
        print(f"✅ Using final model: {final_path}")
        return final_path
    
    # Check for checkpoints
    checkpoints = [
        d for d in os.listdir(lora_root)
        if d.startswith("checkpoint-") and os.path.isdir(os.path.join(lora_root, d))
    ]
    if checkpoints:
        checkpoints.sort(key=lambda x: int(x.split("-")[1]))
        checkpoint_path = os.path.join(lora_root, checkpoints[-1])
        print(f"✅ Using latest checkpoint: {checkpoint_path}")
        return checkpoint_path
    
    # Check for epoch folders
    epochs = [
        d for d in os.listdir(lora_root)
        if d.startswith("epoch_") and os.path.isdir(os.path.join(lora_root, d))
    ]
    if epochs:
        epochs.sort(key=lambda x: int(x.split("_")[1]))
        epoch_path = os.path.join(lora_root, epochs[-1])
        print(f"✅ Using latest epoch: {epoch_path}")
        return epoch_path
    
    raise RuntimeError(f"No model found in {lora_root}")

LORA_PATH = get_model_path(LORA_ROOT)

✅ Using final model: ./qwen2.5_lora/final
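
The checkpoint branch of `get_model_path` sorts by the integer step rather than by folder name. The sketch below illustrates why that matters, using a throwaway directory and made-up step numbers. `latest_checkpoint` is a hypothetical helper written for this illustration, not part of the notebook, and it additionally skips folders whose suffix is not numeric.

```python
import os
import tempfile


def latest_checkpoint(root):
    """Return the checkpoint-<step> subfolder with the highest step, or None."""
    candidates = []
    for name in os.listdir(root):
        prefix, _, step = name.partition("-")
        # Skip anything that is not a directory named checkpoint-<integer>.
        if prefix == "checkpoint" and step.isdigit() and os.path.isdir(os.path.join(root, name)):
            candidates.append((int(step), name))
    if not candidates:
        return None
    # max() on (step, name) tuples compares the integer step first.
    return os.path.join(root, max(candidates)[1])


# Demo on a temporary directory with hypothetical step numbers.
with tempfile.TemporaryDirectory() as root:
    names = [f"checkpoint-{step}" for step in (200, 1000, 90)]
    for name in names + ["checkpoint-tmp"]:  # the last one has no numeric step
        os.makedirs(os.path.join(root, name))
    picked = os.path.basename(latest_checkpoint(root))

lexicographic = max(names)  # what a plain string comparison would pick
print(picked)         # checkpoint-1000
print(lexicographic)  # checkpoint-90
```

A plain string sort would rank `checkpoint-90` above `checkpoint-1000`, so the numeric key is what makes "latest" mean "most training steps".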


In [6]:
print("🔤 Loading tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(
    BASE_MODEL,
    trust_remote_code=True
)
tokenizer.pad_token = tokenizer.eos_token

🔤 Loading tokenizer...


In [7]:
print("🤖 Loading base model...")
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    dtype=DTYPE,
    device_map="auto",
    trust_remote_code=True
)

🤖 Loading base model...


In [8]:
print("🧩 Loading LoRA adapters...")
model = PeftModel.from_pretrained(
    base_model,
    LORA_PATH,
    torch_dtype=DTYPE
)
model.eval()

print("✅ Model loaded successfully!")

🧩 Loading LoRA adapters...
✅ Model loaded successfully!


## Inference Function

In [9]:
def generate_response(prompt, max_new_tokens=512, temperature=0.0, top_p=1.0):
    """
    Generate a response from the model.
    
    Args:
        prompt: The user's question or instruction
        max_new_tokens: Maximum tokens to generate
        temperature: Sampling temperature (0 = greedy/deterministic)
        top_p: Nucleus sampling parameter
    
    Returns:
        Generated response text
    """
    # Format the message
    message = f"<|user|>\n{prompt}\n<|assistant|>\n"
    
    # Tokenize
    inputs = tokenizer(
        message,
        return_tensors="pt",
        padding=False
    ).to(model.device)
    
    # Generate
    with torch.no_grad():
        output_ids = model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            max_new_tokens=max_new_tokens,
            do_sample=temperature > 0,
            temperature=temperature if temperature > 0 else None,
            top_p=top_p if temperature > 0 else None,
            eos_token_id=tokenizer.eos_token_id,
            pad_token_id=tokenizer.eos_token_id
        )
    
    # Decode only the generated part
    output_text = tokenizer.decode(
        output_ids[0][inputs["input_ids"].shape[-1]:],
        skip_special_tokens=True
    )

    print(f"Prompt: {prompt}")
    print(f"\nResponse: {output_text}")
    
    return output_text
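
The prompt template and the greedy-versus-sampling switch inside `generate_response` can be looked at in isolation. The sketch below pulls them into two small pure-Python helpers so they can be checked without loading a model. `build_prompt` and `build_generation_kwargs` are hypothetical names introduced here. Unlike the function above, the second helper leaves the sampling keys out entirely when decoding greedily instead of passing `None`.

```python
def build_prompt(user_text):
    """Wrap user text in the <|user|>/<|assistant|> template used above."""
    return f"<|user|>\n{user_text}\n<|assistant|>\n"


def build_generation_kwargs(max_new_tokens=512, temperature=0.0, top_p=1.0):
    """Return keyword arguments for generation, omitting sampling knobs when greedy."""
    kwargs = {"max_new_tokens": max_new_tokens, "do_sample": temperature > 0}
    if temperature > 0:
        # Sampling parameters only apply when do_sample is True.
        kwargs["temperature"] = temperature
        kwargs["top_p"] = top_p
    return kwargs


print(repr(build_prompt("What is machine learning?")))
print(build_generation_kwargs(max_new_tokens=128))          # greedy decoding
print(build_generation_kwargs(temperature=0.7, top_p=0.9))  # nucleus sampling
```

With `temperature=0.0`, the default used throughout this notebook, decoding is greedy, so repeated runs of the same prompt should normally give the same text.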

## Example 1: Translation (Russian → English)

In [10]:
PROMPT = "Translate the following question to English:\n\nКогда была война?"

response = generate_response(PROMPT, max_new_tokens=128)

Prompt: Translate the following question to English:

Когда была война?

Response: When was the war?



## Example 2: Translation (English → Russian)

In [11]:
PROMPT = "Translate the following question to Russian:\n\nWhat is artificial intelligence?"

response = generate_response(PROMPT, max_new_tokens=128)

Prompt: Translate the following question to Russian:

What is artificial intelligence?

Response: Что такое искусственный интеллект?



## Example 3: Question Answering (English)

In [14]:
PROMPT = "What is machine learning?"

response = generate_response(PROMPT, max_new_tokens=256)

Prompt: What is machine learning?

Response: Machine learning is a subset of artificial intelligence that involves the development of algorithms and statistical models that enable computers to learn and improve from experience. It involves the use of statistical techniques to enable computers to automatically learn and improve from data, without being explicitly programmed to perform a specific task.

Here are some key aspects of machine learning:

1. **Data Collection**: Machine learning requires data to train models. This data can be in the form of text, images, audio, or any other type of data that can be processed by the algorithm.

2. **Model Training**: The algorithm learns from the data to identify patterns and make predictions or decisions. This is typically done using supervised learning, where the algorithm is given labeled data and learns to predict the output based on the input.

3. **Evaluation**: The performance of the model is evaluated using metrics such as accuracy, pr
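
The response above breaks off mid-word, which is consistent with generation reaching the `max_new_tokens=256` limit before the model produced an end-of-sequence token. A minimal sketch of how that could be detected is shown below. `finish_reason` is a hypothetical helper, and it assumes you pass it only the newly generated token ids (a plain list) from a single, unpadded sequence.

```python
def finish_reason(generated_ids, eos_token_id, max_new_tokens):
    """Classify why generation stopped, given only the newly generated token ids."""
    if generated_ids and generated_ids[-1] == eos_token_id:
        return "eos"      # the model ended its answer on its own
    if len(generated_ids) >= max_new_tokens:
        return "length"   # the token budget ran out first
    return "unknown"      # e.g. a custom stopping criterion


EOS = 0  # hypothetical token id, for illustration only
print(finish_reason([5, 8, 13, EOS], EOS, max_new_tokens=256))      # eos
print(finish_reason(list(range(1, 257)), EOS, max_new_tokens=256))  # length
```

When the result is `"length"`, raising `max_new_tokens` or asking for a shorter answer in the prompt are the usual remedies.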

## Example 4: Question Answering (Russian)

In [18]:
PROMPT = "Что такое машинное обучение?"

response = generate_response(PROMPT, max_new_tokens=256)

Prompt: Что такое машинное обучение?

Response: Машинное обучение — это процесс, направленный на обучение машинного интерфейса (машинного обучения) к определенным задачам, которые могут быть решены с помощью машинного обучения. Машинное обучение включает в себя разработку и внедрение алгоритмов машинного обучения для решения задач, таких как классификация, регрессия, регулирование и предсказание.

Основные этапы машинного обучения включают:

1. **Изучение задачи**: Определение задачи, которую нужно решить, и анализ ее сложности.
2. **Исследование алг

## Other Inferences

Run several test prompts in a loop, one after another:

In [20]:
test_prompts = [
    "Translate the following question to Russian:\n\nHow does photosynthesis work?",
    "Translate the following question to English:\n\nКак работает фотосинтез?",
    "What is the capital of Russia?",
    "Какая столица России?",
]

for i, prompt in enumerate(test_prompts, 1):
    print(f"\n{'='*60}")
    print(f"Test {i}/{len(test_prompts)}")
    response = generate_response(prompt, max_new_tokens=256)


Test 1/4
Prompt: Translate the following question to Russian:

How does photosynthesis work?

Response: Как работает Photosynthesis?


Test 2/4
Prompt: Translate the following question to English:

Как работает фотосинтез?

Response: How does photo synthesis work?


Test 3/4
Prompt: What is the capital of Russia?

Response: The capital of Russia is Moscow. It is the largest city in Russia and the country's political, economic, and cultural center. Moscow is located in the northwestern part of the country and is the capital of the Russian Federation.


Test 4/4
Prompt: Какая столица России?

Response: Россия имеет несколько столиц, которые являются важными для экономики, политики и культурного наследия страны. Вот некоторые из них:

1. **Москва** — столица и центр экономики, политики и к
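
In Test 1 the model left the word "Photosynthesis" in English inside an otherwise Russian sentence. One rough way to flag such cases automatically is to look for Latin-script words in output that is expected to be Russian. The sketch below is a heuristic under that assumption, not a translation-quality metric. `untranslated_words` is a hypothetical helper, and it will also flag legitimate Latin-script terms such as product names or acronyms.

```python
import re

# Runs of two or more ASCII letters; Cyrillic text never matches this pattern.
LATIN_WORD = re.compile(r"[A-Za-z]{2,}")


def untranslated_words(russian_text):
    """Return Latin-script words found in text that is expected to be Russian."""
    return LATIN_WORD.findall(russian_text)


print(untranslated_words("Как работает Photosynthesis?"))  # ['Photosynthesis']
print(untranslated_words("Как работает фотосинтез?"))      # []
```

An empty list only means that no Latin-script words were found; it says nothing about whether the translation itself is correct.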