# Homework 2
We have a pre-trained Transformer decoder. The task is to implement several text generation strategies for it.

In [2]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from math import log
from collections import deque

In [3]:
model = AutoModelForCausalLM.from_pretrained('Qwen/Qwen2.5-0.5B-Instruct').eval()
tokenizer = AutoTokenizer.from_pretrained('Qwen/Qwen2.5-0.5B-Instruct')

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)  

# Prompts
input_text_hedgehog = '<|im_start|>system\nYou are a storyteller. Generate a story based on user message.<|im_end|>\n<|im_start|>user\nGenerate me a short story about a tiny hedgehog named Sonic.<|im_end|>\n<|im_start|>assistant\n'
input_text_json = '<|im_start|>system\nYou are a JSON machine. Generate a JSON with format {"contractor": string with normalized contractor name, "sum": decimal, "currency": string with uppercased 3-letter currency code} based on user message.<|im_end|>\n<|im_start|>user\nTransfer 100 rubles and 50 kopeck to Mike<|im_end|>\n<|im_start|>assistant\n'


## Task 1. Greedy Decoding

In [27]:
def greedy_decode(input_text, model, tokenizer, device, max_length=1000):
    inputs = tokenizer(input_text, return_tensors='pt').to(device)
    input_ids = inputs.input_ids
    attention_mask = inputs.attention_mask
    eos_token_id = tokenizer.eos_token_id or 151645  # EOS token ID

    generated_ids = input_ids.clone()
    current_attention_mask = attention_mask.clone()

    for i in range(max_length):
        with torch.no_grad():
            outputs = model(input_ids=generated_ids, attention_mask=current_attention_mask)
        
        next_token_logits = outputs.logits[:, -1, :]
        next_token_id = torch.argmax(next_token_logits, dim=-1).unsqueeze(-1)

        generated_ids = torch.cat([generated_ids, next_token_id], dim=-1)
        current_attention_mask = torch.cat([current_attention_mask, torch.ones((1, 1), dtype=torch.long).to(device)], dim=-1)

        if next_token_id.item() == eos_token_id:
            break

    return tokenizer.decode(generated_ids[0], skip_special_tokens=True)
    
story = greedy_decode(input_text_hedgehog, model, tokenizer, device)
json_output = greedy_decode(input_text_json, model, tokenizer, device)

print("Story about Sonic:\n", story)
print("\nGenerated JSON:\n", json_output)

Story about Sonic:
 system
You are a storyteller. Generate a story based on user message.
user
Generate me a short story about a tiny hedgehog named Sonic.
assistant
Once upon a time, in a small, cozy village nestled in the heart of the forest, there lived a tiny hedgehog named Sonic. Sonic was a curious and adventurous creature, always eager to explore the world around him. One day, while wandering through the forest, Sonic stumbled upon a hidden cave.

Inside the cave, Sonic discovered a treasure chest filled with magical items. As he opened the chest, he was amazed to see that the items were not just ordinary, but enchanted. Sonic was thrilled to find that he could use the items to help others in need.

From that day on, Sonic became a hero in the village. He used his magical powers to help people in need, and soon, the village was filled with people who were grateful for the help they received from Sonic.

Sonic's story became a legend, and people from all over the village would tell

### **1) If the algorithm is run several times, will the generations differ?**

No. Greedy decoding always picks the token with the highest probability, so the choice is deterministic and the output is the same on every run.

P.S. As far as I understand, the logits can still differ slightly when runs use different devices (CPU vs. GPU), and a near-tie between two tokens could then resolve differently. In that case it is worth adding the following (these flags affect cuDNN on the GPU; they do not make CPU and GPU results identical):
```
torch.backends.cudnn.deterministic = True  # Make cuDNN operations deterministic
torch.backends.cudnn.benchmark = False     # Disable cuDNN autotuning
```
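
As a rough illustration of why small numeric differences can matter even for greedy decoding, here is a sketch with made-up logits (the numbers are hypothetical, not measured from the model). When two tokens are nearly tied, a rounding-sized difference is enough to change the argmax, while identical logits always give the same choice.

```python
def argmax(values):
    """Index of the largest value (the first one wins on exact ties)."""
    return max(range(len(values)), key=lambda i: values[i])

# Hypothetical last-position logits for a 3-token vocabulary.
# The two lists differ only by a rounding-sized amount in the first entry.
logits_run_a = [3.2000000, 3.1999999, -1.0]
logits_run_b = [3.1999998, 3.1999999, -1.0]

print(argmax(logits_run_a))  # token 0 wins
print(argmax(logits_run_b))  # token 1 wins

# On identical logits the choice never changes between calls.
print(all(argmax(logits_run_a) == 0 for _ in range(100)))
```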

### **2) What problems does this approach have for story generation and for JSON generation?**

For the story: the text is identical on every run because there is no randomness, so there is no variety.

For JSON: syntax errors are possible (for example, unclosed quotes or brackets) if EOS is produced before the structure is complete. With the current prompt the model did not understand what kopecks are or what the currency code should be. Most likely the prompt needs examples of a correctly filled-in result.
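
One way to catch the syntax problems described above is to parse the reply before using it. The sketch below is only an illustration: `extract_and_validate` is a hypothetical helper, the `marker` default assumes the decoded text separates the prompt from the reply with `assistant\n` (as in the outputs shown in this notebook), and the two test strings are made up rather than real generations.

```python
import json

def extract_and_validate(generated_text, marker="assistant\n"):
    """Return (parsed_object, None) if the reply parses as JSON, else (None, error message)."""
    # Keep only the text after the last marker, i.e. the model's reply.
    reply = generated_text.rsplit(marker, 1)[-1].strip()
    try:
        return json.loads(reply), None
    except json.JSONDecodeError as exc:
        return None, str(exc)

# Made-up strings shaped like the decoded output, not real model generations.
ok_text = 'user\nsome request\nassistant\n{"contractor": "Mike", "sum": 100.5, "currency": "RUB"}'
cut_text = 'user\nsome request\nassistant\n{"contractor": "Mike", "sum": 100.5, "curr'

print(extract_and_validate(ok_text))
print(extract_and_validate(cut_text))  # parsed object is None: the JSON was cut off
```

A check like this only detects broken syntax. It does not tell whether the values (sum, currency code) are right.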

## Task 2. Sampling

In [28]:
def sample_decode(
    input_text, 
    model, 
    tokenizer, 
    device,
    max_length=1000
):  
    inputs = tokenizer(input_text, return_tensors='pt').to(device)
    input_ids = inputs.input_ids
    attention_mask = inputs.attention_mask
    eos_token_id = tokenizer.eos_token_id or 151645  # EOS token ID

    generated_ids = input_ids.clone()
    current_attention_mask = attention_mask.clone()

    for i in range(max_length):
        with torch.no_grad():
            outputs = model(input_ids=generated_ids, attention_mask=current_attention_mask)
        
        logits = outputs.logits[:, -1, :] 
        probs = torch.softmax(logits, dim=-1)
        next_token_id = torch.multinomial(probs, num_samples=1)
        
        generated_ids = torch.cat([generated_ids, next_token_id], dim=-1)
        current_attention_mask = torch.cat([current_attention_mask, torch.ones((1, 1), dtype=torch.long).to(device)], dim=-1)

        if next_token_id.item() == eos_token_id:
            break

    return tokenizer.decode(generated_ids[0], skip_special_tokens=True)


# Set the seed for reproducibility
torch.manual_seed(42)
for i in range(3):
    print(f'Generation {i + 1}')
    print('Story:', sample_decode(input_text_hedgehog, model, tokenizer, device))
    print('JSON:', sample_decode(input_text_json, model, tokenizer, device))
    print()

Generation 1
Story: system
You are a storyteller. Generate a story based on user message.
user
Generate me a short story about a tiny hedgehog named Sonic.
assistant
Once upon a time, in a faraway land of rolling hills and magical forests, there lived a tiny hedgehog named Sonic. Sonic was a curious and adventurous fox, with a sharp acute eye and a warm heart. One day, Sonic discovered a hidden passage in a lush, sprawling meadow. The path he followed had marked the edge of an enchanted meadow, where flowers bloomed in a burst of vibrant colors that left his eyes dazzled.

With a bouncy jump, Sonic rounded the bend and was greeted with an expansive vista. The meadow was a paradise for flora and fauna, with tall magical trees and a countless tapestry of fauna, including lianas and a giant water source crucial to the entire village. The sky overhead was clean, streaked with streams of liquid gold.

Continuing down the winding path, Sonic climbed over towering vines and through dense fore

### **1) If the algorithm is run several times, will the generations differ?**

It depends on whether the seed is fixed, because `torch.multinomial` picks the token at random according to the probabilities:

a) If the seed is fixed, the generation is the same on every run

b) If it is not, the output differs
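
The effect of fixing the seed can be sketched without the model. The example below uses Python's `random` module as a stand-in for `torch.manual_seed` plus `torch.multinomial`. The vocabulary and probabilities are hypothetical.

```python
import random

tokens = ["cave", "meadow", "river", "castle"]  # hypothetical vocabulary
probs = [0.4, 0.3, 0.2, 0.1]                    # hypothetical next-token probabilities

def sample_tokens(rng, n=10):
    """Draw n tokens according to probs using the given random generator."""
    return rng.choices(tokens, weights=probs, k=n)

run_1 = sample_tokens(random.Random(42))
run_2 = sample_tokens(random.Random(42))  # same seed: same draws
run_3 = sample_tokens(random.Random())    # unseeded: draws will usually differ

print(run_1 == run_2)  # True
print(run_1)
print(run_3)
```

Note that in the cell above the seed is set once before the loop, so the three generations inside one run differ from each other, but re-running the whole cell reproduces the same three.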

### **2) What problems does this approach have for story generation and for JSON generation?**

For the story: it sometimes produces unnatural or ungrammatical sentences and can drift off topic (in the first sampled story above, the hedgehog Sonic is called a fox).

For JSON: it often breaks the JSON structure.

## Task 3. Sampling meets Temperature

In [6]:
def sample_with_temperature_decode(
    input_text, 
    model, 
    tokenizer, 
    device,
    max_length=1000,
    temperature=1.0  
):  
    inputs = tokenizer(input_text, return_tensors='pt').to(device)
    input_ids = inputs.input_ids
    attention_mask = inputs.attention_mask
    eos_token_id = tokenizer.eos_token_id or 151645

    generated_ids = input_ids.clone()
    current_attention_mask = attention_mask.clone()

    for _ in range(max_length):
        with torch.no_grad():
            outputs = model(input_ids=generated_ids, attention_mask=current_attention_mask)
        
        logits = outputs.logits[:, -1, :]
        logits = logits / temperature  # scale without modifying outputs.logits in place
        
        probs = torch.softmax(logits, dim=-1)
        next_token_id = torch.multinomial(probs, num_samples=1)
        
        generated_ids = torch.cat([generated_ids, next_token_id], dim=-1)
        current_attention_mask = torch.cat(
            [current_attention_mask, torch.ones((1, 1), dtype=torch.long).to(device)],
            dim=-1
        )

        if next_token_id.item() == eos_token_id:
            break

    return tokenizer.decode(generated_ids[0], skip_special_tokens=True)

# Set the seed for reproducibility
torch.manual_seed(42)
temperatures = [0.001, 0.1, 0.5, 1.0, 10.0]

for temp in temperatures:
    print(f'Temperature: {temp}')
    print('Story:', sample_with_temperature_decode(input_text_hedgehog, model, tokenizer, device, temperature=temp))
    print('JSON:', sample_with_temperature_decode(input_text_json, model, tokenizer, device, temperature=temp))
    print()

Temperature: 0.001
Story: system
You are a storyteller. Generate a story based on user message.
user
Generate me a short story about a tiny hedgehog named Sonic.
assistant
Once upon a time, in a small, cozy village nestled in the heart of the forest, there lived a tiny hedgehog named Sonic. Sonic was a curious and adventurous creature, always eager to explore the world around him. One day, while wandering through the forest, Sonic stumbled upon a hidden cave.

Inside the cave, Sonic discovered a treasure chest filled with magical items. As he opened the chest, he was amazed to see that the items were not just ordinary, but enchanted. Sonic was thrilled to find that he could use the items to help others in need.

From that day on, Sonic became a hero in the village. He used his magical powers to help people in need, and soon, the village was filled with people who were grateful for the help they received from Sonic.

Sonic's story became a legend, and people from all over the village wo

### 1) How do the generations differ at different temperatures?

0.001: An almost fixed output (close to greedy search); the story shown above matches the greedy one

0.1: Conservative, but slightly varied outputs

0.5: Moderate creativity

1.0: A balance between creativity and coherence

10.0: Complete chaos, random tokens

### 2) The pattern as temperature changes:

The lower the temperature, the more predictable and conservative the output

The higher the temperature, the more randomness, up to outright nonsense
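
The pattern can be seen on a toy distribution. The sketch below applies the same `logits / temperature` scaling followed by softmax, in plain Python. The logits are hypothetical, and `softmax_with_temperature` is an illustrative helper, not part of the notebook.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Softmax of logits / temperature, computed stably by subtracting the max."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical logits for a 4-token vocabulary

for t in [0.001, 0.1, 0.5, 1.0, 10.0]:
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At very low temperature almost all of the probability mass sits on the top token, which is why the output looks like greedy decoding. At temperature 10 the distribution is close to uniform, so rare tokens are sampled about as often as likely ones.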

### 3) Which temperature suits which tasks?

Low (0.1-0.5): structured data (JSON), facts

Medium (0.5-1.0): creative texts (fairy tales, stories)

High (1.0+): experiments, not production

## Task 4. Nucleus Sampling

In [8]:
def nuclear_sampling(
    input_text, 
    model, 
    tokenizer, 
    device,
    max_length=1000,
    temperature=1.0,
    top_p=1.0  
):  
    inputs = tokenizer(input_text, return_tensors='pt').to(device)
    input_ids = inputs.input_ids
    attention_mask = inputs.attention_mask
    eos_token_id = tokenizer.eos_token_id or 151645

    generated_ids = input_ids.clone()
    current_attention_mask = attention_mask.clone()

    for _ in range(max_length):
        with torch.no_grad():
            outputs = model(input_ids=generated_ids, attention_mask=current_attention_mask)
        
        logits = outputs.logits[:, -1, :]
        logits = logits / temperature  # scale without modifying outputs.logits in place
        
        probs = torch.softmax(logits, dim=-1)
        
        if top_p < 1.0:
            sorted_probs, sorted_indices = torch.sort(probs, descending=True)
            cumulative_probs = torch.cumsum(sorted_probs, dim=-1)
            
            # Mark tokens whose cumulative probability exceeds top_p
            sorted_indices_to_remove = cumulative_probs > top_p
            # Shift the mask right so the token that crosses top_p is kept; always keep at least one token
            sorted_indices_to_remove[..., 1:] = sorted_indices_to_remove[..., :-1].clone()
            sorted_indices_to_remove[..., 0] = 0
            
            # Map back to vocabulary indices and zero out the removed tokens
            indices_to_remove = sorted_indices[sorted_indices_to_remove]
            probs[..., indices_to_remove] = 0
            
            # Renormalize the probabilities
            probs = probs / probs.sum(dim=-1, keepdim=True)

        next_token_id = torch.multinomial(probs, num_samples=1)
        
        generated_ids = torch.cat([generated_ids, next_token_id], dim=-1)
        current_attention_mask = torch.cat(
            [current_attention_mask, torch.ones((1, 1), dtype=torch.long).to(device)],
            dim=-1
        )

        if next_token_id.item() == eos_token_id:
            break

    return tokenizer.decode(generated_ids[0], skip_special_tokens=True)

torch.manual_seed(42)
configs = [
    {'temp': 1.0, 'top_p': 0.9},
    {'temp': 1.0, 'top_p': 0.15},
    {'temp': 0.5, 'top_p': 0.9},
    {'temp': 0.5, 'top_p': 0.15}
]

for cfg in configs:
    print(f"Temperature: {cfg['temp']}, Top-p: {cfg['top_p']}")
    
    story_output = nuclear_sampling(
        input_text_hedgehog, 
        model, tokenizer, device,
        temperature=cfg['temp'],
        top_p=cfg['top_p']
    )
    print(f"Story: {story_output}")
    
    json_output = nuclear_sampling(
        input_text_json,
        model, tokenizer, device,
        temperature=cfg['temp'],
        top_p=cfg['top_p']
    )
    print(f"JSON: {json_output}")
    print()

Temperature: 1.0, Top-p: 0.9
Story: system
You are a storyteller. Generate a story based on user message.
user
Generate me a short story about a tiny hedgehog named Sonic.
assistant
Once upon a time, in a faraway land of rolling hills and magical forests, there lived a tiny hedgehog named Sonic. Sonic was a curious and adventurous creature, always eager to explore the wonders of the world around him. Each day, he would venture out into the sun-kissed landscape, seeking new adventures and meeting new friends.

One sunny morning, Sonic set out on a walk through the dense, green meadow. As he made his way towards the cozy, sparkling waterfalls, he realized that his trip could be a bit more adventurous than usual. With a grump puppy and his bouncy lemon cat, they were bound for the secret entrance of the kingdom of legends.

Upon reaching the entrance, Sonic and his companions stopped to admire the magical flowers that lined the path. Suddenly, they heard a whispering, low murmur, and Soni

### 1) How do the generations differ?

| Configuration | Generation characteristics |
|---|---|
| `temp=1.0, top_p=0.9` | Varied but coherent texts |
| `temp=1.0, top_p=0.15` | Conservative, repetitive patterns |
| `temp=0.5, top_p=0.9` | Moderate creativity with good structure |
| `temp=0.5, top_p=0.15` | Very predictable, almost deterministic outputs |

### 2) Did nucleus sampling help?
Yes:
- It removed the problem of absurd tokens at high temperatures
- It reduced the likelihood of incoherent sentences in the stories
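
To see why a small `top_p` behaves almost deterministically, here is a plain-Python sketch of the same filtering rule as in the function above: keep the most probable tokens up to and including the one that pushes the cumulative probability past `top_p`, then renormalize. The probabilities are hypothetical and `nucleus_filter` is an illustrative helper.

```python
def nucleus_filter(probs, top_p):
    """Return {token_index: renormalized probability} for the tokens kept by top-p filtering."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative > top_p:  # this token crossed the threshold; it is kept, the rest are dropped
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

probs = [0.5, 0.3, 0.15, 0.05]  # hypothetical next-token probabilities

print(nucleus_filter(probs, 0.9))   # keeps tokens 0, 1, 2 and drops the 0.05 tail
print(nucleus_filter(probs, 0.15))  # keeps only token 0, i.e. the greedy choice
```

With `top_p=0.15` the top token alone already exceeds the threshold in this example, so sampling reduces to picking it every time.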



## Task 5. Early-Stopped Beam Search

In [4]:
class BeamSearchCandidate:
    def __init__(self, token_ids, score, length):
        self.token_ids = token_ids
        self.score = score
        self.length = length
    
    def get_normalized_score(self, length_penalty):
        return self.score / (self.length ** length_penalty)

def beam_search_decode(
    input_text,
    model,
    tokenizer,
    device,
    num_beams=4,
    length_penalty=1.0,
    max_length=1000
):
    encoding = tokenizer(input_text, return_tensors='pt').to(device)
    input_ids = encoding.input_ids
    attention_mask = encoding.attention_mask
    eos_token_id = tokenizer.eos_token_id or 151645

    # list of unfinished candidates
    unfinished = [BeamSearchCandidate(
        token_ids=input_ids[0].tolist(),
        score=0.0,
        length=0
    )]

    # list of finished candidates
    finished = []
    
    for _ in range(max_length):
        if not unfinished:
            break
            
        all_candidates = []
        
        for candidate in unfinished:
            if candidate.token_ids[-1] == eos_token_id:
                finished.append(candidate)
                continue
                
            candidate_input_ids = torch.tensor([candidate.token_ids]).to(device)
            candidate_attention_mask = torch.ones_like(candidate_input_ids).to(device)
            
            with torch.no_grad():
                outputs = model(input_ids=candidate_input_ids, attention_mask=candidate_attention_mask)
                next_logits = outputs.logits[:, -1, :]
            
            next_log_probs = torch.log_softmax(next_logits, dim=-1)
            next_top_log_probs, next_top_indices = torch.topk(next_log_probs, num_beams)
            
            for j in range(num_beams):
                new_token_id = next_top_indices[0, j].item()
                # Each candidate's score is the parent candidate's score plus the log-probability of the appended continuation token
                new_score = candidate.score + next_top_log_probs[0, j].item()
                new_length = candidate.length + 1
                
                new_candidate = BeamSearchCandidate(
                    token_ids=candidate.token_ids + [new_token_id],
                    score=new_score,
                    length=new_length
                )
                all_candidates.append(new_candidate)

        # Rank the resulting list by normalized candidate score, from highest to lowest
        all_candidates.sort(
            key=lambda x: x.get_normalized_score(length_penalty),
            reverse=True
        )
        
        unfinished = []
        for candidate in all_candidates:
            if candidate.token_ids[-1] == eos_token_id:
                finished.append(candidate)
            else:
                unfinished.append(candidate)
                if len(unfinished) >= num_beams:
                    break
        
        if len(finished) >= num_beams:
            break
    
    if not finished and unfinished:
        finished = unfinished
    
    if not finished:
        return tokenizer.decode(input_ids[0], skip_special_tokens=True)
    
    best_candidate = max(
        finished,
        key=lambda x: x.get_normalized_score(length_penalty)
    )
    
    return tokenizer.decode(best_candidate.token_ids, skip_special_tokens=True)

torch.manual_seed(42)
configs = [
    {'num_beams': 1, 'length_penalty': 1.0},
    {'num_beams': 4, 'length_penalty': 1.0},
    {'num_beams': 4, 'length_penalty': 0.5},
    {'num_beams': 4, 'length_penalty': 2.0},
    {'num_beams': 8, 'length_penalty': 1.0}
]

for cfg in configs:
    print(f"Num_beams={cfg['num_beams']}, Length_penalty={cfg['length_penalty']}")
    
    story = beam_search_decode(
        input_text=input_text_hedgehog,
        model=model,
        tokenizer=tokenizer,
        device=device,
        num_beams=cfg['num_beams'],
        length_penalty=cfg['length_penalty'],
    )
    print(f"Story: {story}")
    
    json_output = beam_search_decode(
        input_text=input_text_json,
        model=model,
        tokenizer=tokenizer,
        device=device,
        num_beams=cfg['num_beams'],
        length_penalty=cfg['length_penalty'],
    )
    print(f"JSON: {json_output}")
    print()

Num_beams=1, Length_penalty=1.0
Story: system
You are a storyteller. Generate a story based on user message.
user
Generate me a short story about a tiny hedgehog named Sonic.
assistant
Once upon a time, in a small, cozy village nestled in the heart of the forest, there lived a tiny hedgehog named Sonic. Sonic was a curious and adventurous creature, always eager to explore the world around him. One day, while wandering through the forest, Sonic stumbled upon a hidden cave.

Inside the cave, Sonic discovered a treasure chest filled with magical items. As he opened the chest, he was amazed to see that the items were not just ordinary, but enchanted. Sonic was thrilled to find that he could use the items to help others in need.

From that day on, Sonic became a hero in the village. He used his magical powers to help people in need, and soon, the village was filled with people who were grateful for the help they received from Sonic.

Sonic's story became a legend, and people from all over t

### 1) How do the results differ? Is there a pattern when num_beams and length_penalty are increased or decreased?
num_beams: the larger it is, the better the generation quality (more coherent and detailed text). With values that are too large (num_beams=8), however, redundant information can appear. For exact tasks (JSON), increasing num_beams reduces the number of errors.

length_penalty: values < 1.0 shorten the text slightly, and values > 1.0 can lengthen it a little, because the summed log-probability (a negative number) is divided by length ** length_penalty. In this case the difference is minimal, since the stories are short.
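
The effect of `length_penalty` can be checked with a little arithmetic. The sketch below reuses the formula from `BeamSearchCandidate.get_normalized_score` on two hypothetical finished candidates (the scores and lengths are made up).

```python
def normalized_score(score, length, length_penalty):
    """Same formula as BeamSearchCandidate.get_normalized_score."""
    return score / (length ** length_penalty)

# Hypothetical finished candidates: (sum of log-probabilities, number of generated tokens)
shorter = (-4.0, 4)
longer = (-9.0, 10)

for lp in [0.5, 1.0, 2.0]:
    short_score = normalized_score(*shorter, lp)
    long_score = normalized_score(*longer, lp)
    winner = "shorter" if short_score > long_score else "longer"
    print(f"length_penalty={lp}: shorter={short_score:.3f}, longer={long_score:.3f}, winner={winner}")
```

With `length_penalty=0.5` the shorter candidate wins (-2.000 vs. about -2.846). With 1.0 and 2.0 the longer one wins, since dividing a negative score by a larger number moves it closer to zero.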

### 2) Did this method fix the problems that came up with Greedy Decoding? For what kind of tasks is beam search a better fit than nucleus sampling?
Beam search is a better fit for exact tasks (generating JSON, code, facts), where it matters to find the most probable generation. However, it requires more compute: this implementation runs a separate forward pass for every candidate at every step.
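
To make the difference from greedy decoding concrete, here is a toy sketch in which the "model" is a hypothetical hand-written table of next-token probabilities. It is a simplified beam search (no length penalty, and finished and unfinished candidates compete for the same `num_beams` slots), not a copy of `beam_search_decode` above.

```python
import math

# Hypothetical toy "model": next-token probabilities depend only on the last token.
NEXT = {
    "<s>": {"A": 0.6, "B": 0.4},
    "A": {"x": 0.55, "y": 0.45},
    "B": {"x": 0.9, "y": 0.1},
    "x": {"<eos>": 1.0},
    "y": {"<eos>": 1.0},
}

def toy_beam_search(num_beams, max_steps=5):
    """Return (tokens, probability) of the best sequence found with the given beam width."""
    beams = [(0.0, ["<s>"])]  # (sum of log-probabilities, tokens so far)
    finished = []
    for _ in range(max_steps):
        expanded = []
        for score, tokens in beams:
            for tok, p in NEXT[tokens[-1]].items():
                expanded.append((score + math.log(p), tokens + [tok]))
        expanded.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for cand in expanded[:num_beams]:
            (finished if cand[1][-1] == "<eos>" else beams).append(cand)
        if not beams:
            break
    best = max(finished or beams, key=lambda c: c[0])
    return best[1][1:], math.exp(best[0])

print(toy_beam_search(num_beams=1))  # greedy path: A, x, <eos> with probability about 0.33
print(toy_beam_search(num_beams=2))  # beam path:   B, x, <eos> with probability about 0.36
```

With `num_beams=1` the search commits to `A` because it is the most probable first token, and ends up with a less probable sequence overall. Keeping two candidates lets `B` survive long enough to win.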