# Advanced Text Generation Techniques with Transformers 🚀

In this advanced lab, we dive deeper into efficient text generation with Transformers. We compare two batching strategies, normal batching and sorted batching, and measure how each affects generation time.

**Objectives:**
- 🧰 Implement advanced text generation functions.
- 📊 Compare normal vs. sorted batching efficiency.
- ⏱ Measure and understand execution time improvements.


## Setup and Imports 🛠

Before diving into the code, let's ensure we have all the necessary tools:

- `transformers` & `datasets`: For our model and data.
- `torch`: For tensor operations.
- `tqdm`: For progress tracking.
- `contextlib` & `time`: For measuring execution time.


In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset
import torch
from tqdm.auto import tqdm
from contextlib import contextmanager
import time


## Time Tracking Utility ⏱

To compare the efficiency of our batching strategies, we'll use a context manager to track the execution time:

- **Purpose:** Measure the time it takes to execute a block of code.
- **Output:** Prints the execution time in seconds.


In [None]:
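@contextmanager
def track_time():
    # perf_counter is a monotonic, high-resolution clock suited to benchmarking
    start = time.perf_counter()
    try:
        yield
    finally:
        # Report the elapsed time even if the timed block raises
        print(f"Execution time: {time.perf_counter() - start:.2f}s")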
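If you want to compare the two strategies programmatically instead of reading printed output, a variant of the context manager can hand the elapsed time back to the caller. This is a minimal sketch assuming only the standard library; the name `timed` and the dict it yields are hypothetical and not part of the original lab. CUDA kernels run asynchronously, so GPU timings are only meaningful once results have reached the host; in this lab, decoding the generated IDs to text requires that copy.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed():
    """Hypothetical helper: yield a dict whose "seconds" key is set on exit."""
    result = {"seconds": None}
    start = time.perf_counter()  # monotonic, high-resolution clock
    try:
        yield result
    finally:
        # Record the elapsed time even if the block raised an exception.
        result["seconds"] = time.perf_counter() - start


# Usage: time two stand-in workloads and compute the speedup.
with timed() as slow:
    sum(i * i for i in range(200_000))
with timed() as fast:
    sum(i * i for i in range(20_000))

speedup = slow["seconds"] / fast["seconds"]
print(f"slow={slow['seconds']:.4f}s fast={fast['seconds']:.4f}s speedup={speedup:.1f}x")
```

With the lab's recorded numbers, the same ratio would be `137.19 / 72.74`, about 1.9.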

## Model and Tokenizer Setup 🧩
 
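Next, we load the model and tokenizer:

- **Model:** "TheFuzzyScientist/diabloGPT_open-instruct", a causal language model tuned for instruction-following text generation.
- **Tokenizer:** "microsoft/DialoGPT-medium", configured with `padding_side="left"`. Because this tokenizer defines no pad token, the EOS token is reused as the pad token.
- **Device:** The model is moved to the GPU with `.to("cuda")`, so a CUDA-capable GPU is required.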


In [None]:
model = AutoModelForCausalLM.from_pretrained("TheFuzzyScientist/diabloGPT_open-instruct").to("cuda")
tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium", padding_side="left")
tokenizer.pad_token = tokenizer.eos_token
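Why pad on the left? A decoder-only model continues generating from the last position of each row, so every row's real text must end in the same, final column. Padding therefore goes first, and an attention mask marks which positions hold real tokens. The pure-Python sketch below shows the layout a left-padding tokenizer produces; `PAD_ID` and the token IDs are hypothetical toy values, and the Hugging Face tokenizer is not called.

```python
PAD_ID = 0  # hypothetical pad ID; the lab reuses the EOS token as its pad token


def left_pad(sequences, pad_id=PAD_ID):
    """Left-pad token-ID lists to a common length and build attention masks."""
    width = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = width - len(seq)
        # Padding goes first, so the last real token sits in the final column.
        input_ids.append([pad_id] * n_pad + list(seq))
        attention_mask.append([0] * n_pad + [1] * len(seq))
    return input_ids, attention_mask


ids, mask = left_pad([[11, 12, 13], [21]])
print(ids)   # [[11, 12, 13], [0, 0, 21]]
print(mask)  # [[1, 1, 1], [0, 0, 1]]
# Every row now ends with a real token, which is where generation continues.
print([row[-1] for row in ids])  # [13, 21]
```

Because the lab's pad token is the EOS token, the IDs alone cannot tell padding apart from a genuine end-of-sequence token; the mask is what carries that information.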

## Dataset Preparation and Initial Tokenization 📚

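We'll draw prompts from an instruction dataset:

- **Dataset:** "hakurei/open-instruct-v1" (train split), converted to a pandas DataFrame.
- **Initial Tokenization:** Tokenize four randomly sampled instructions with padding and print them with the pad tokens shown as `[PAD]`, which makes the left padding visible.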


In [None]:
dataset = load_dataset("hakurei/open-instruct-v1", split="train")
dataset = dataset.to_pandas()

prompts = dataset["instruction"].sample(4).tolist()
inputs = tokenizer(prompts, padding=True)["input_ids"]

# The pad token is the EOS token, so relabel it as [PAD] to make the padding visible
print("\n\n".join(tokenizer.batch_decode(inputs)).replace(tokenizer.eos_token, "[PAD]"))


## Normal Batching Method 🔄

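Normal batching tokenizes all prompts at once and splits them, in their original order, into fixed-size batches:

- **Chunker Function:** Splits a sequence into consecutive batches of a given size.
- **Batch Generation:** Generates up to 64 new tokens for each batch and decodes the results.
- **Predict Function:** Tokenizes all prompts in a single call, which pads every prompt to the longest prompt in the entire list, then generates batch by batch.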


In [None]:
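# Normal batching
def chunker(seq, size):
    """Yield consecutive slices of `seq` containing at most `size` items."""
    return (seq[pos : pos + size] for pos in range(0, len(seq), size))


def batch_generate_tokens(tokens, attention_mask=None):
    # Pass the mask explicitly: since pad == EOS, generate() cannot infer it from the IDs
    outputs = model.generate(
        tokens,
        attention_mask=attention_mask,
        max_new_tokens=64,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Decoder-only models return the prompt followed by the generated continuation
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)


def predict_batch(prompts, batch_size):
    # One tokenizer call pads every prompt to the longest prompt in the whole list
    encoded = tokenizer(prompts, return_tensors="pt", padding=True, truncation=True, max_length=512)

    id_batches = chunker(encoded["input_ids"], batch_size)
    mask_batches = chunker(encoded["attention_mask"], batch_size)
    for ids, mask in zip(id_batches, mask_batches):
        yield batch_generate_tokens(ids.to(model.device), mask.to(model.device))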
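`predict_batch` tokenizes every prompt in one call with `padding=True`, so each prompt is padded to the longest prompt in the whole list, not just the longest in its batch. A rough way to see what that costs is to count the prompt positions each batch carries. The sketch below is a back-of-the-envelope model with hypothetical prompt lengths; `padded_positions` is an illustrative helper, and the count ignores generated tokens and the fact that attention cost grows faster than linearly with length.

```python
def chunker(seq, size):
    """Yield consecutive slices of `seq` with at most `size` items (as in the lab)."""
    return (seq[pos : pos + size] for pos in range(0, len(seq), size))


def padded_positions(lengths, batch_size, pad_globally):
    """Count prompt positions processed when batching `lengths` in order.

    pad_globally=True mimics padding the whole prompt list in one tokenizer
    call; False pads each batch only to its own longest prompt.
    """
    global_max = max(lengths)
    total = 0
    for batch in chunker(lengths, batch_size):
        width = global_max if pad_globally else max(batch)
        total += width * len(batch)
    return total


lengths = [5, 40, 7, 6, 8, 9, 120, 10]  # hypothetical prompt lengths in tokens
real = sum(lengths)
print("real tokens:", real)                                                 # 205
print("pad to global max:", padded_positions(lengths, 4, pad_globally=True))   # 960
print("pad per batch:", padded_positions(lengths, 4, pad_globally=False))      # 640
```

A single long outlier (120 tokens here) inflates every batch under global padding, while per-batch padding confines the cost to the batch that contains it.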

## Predicting with Normal Batching ⚡

Let's generate text using the normal batching method:

- **Process:** Tokenize prompts, generate text in batches of 32, and track execution time.
- **Observation:** Note how long it takes to process 3000 randomly sampled prompts.


In [None]:
prompts = dataset["instruction"].sample(3000).tolist()

with track_time():
    for batch_prediction in tqdm(predict_batch(prompts, 32)):
        print(len(batch_prediction))
        
# Execution time: 137.19s

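## Sorted Batching Method 🔢

Sorted batching improves efficiency by batching together only prompts that have exactly the same token length:

- **Strategy:** Tokenize without padding, sort by length, group prompts of identical length, and split each group into batches of at most `max_batch_size`.
- **Benefits:** No prompt padding is needed at all. The trade-offs are that rare lengths produce small, underfilled batches, and results come back in length order rather than prompt order.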
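Grouping by exact length removes padding entirely but can leave many batches underfilled. A common alternative, not used in this lab, is to sort by length and then cut fixed-size batches, padding within each one. The sketch below compares the two on hypothetical prompt lengths, counting batches and prompt positions processed; the function names are illustrative, not from any library.

```python
from collections import defaultdict


def chunker(seq, size):
    """Yield consecutive slices of `seq` with at most `size` items (as in the lab)."""
    return (seq[pos : pos + size] for pos in range(0, len(seq), size))


def sort_then_chunk(lengths, batch_size):
    """Alternative: sort by length, cut fixed-size batches, pad within each."""
    batches = list(chunker(sorted(lengths), batch_size))
    positions = sum(max(b) * len(b) for b in batches)
    return len(batches), positions


def group_by_exact_length(lengths, max_batch_size):
    """Lab's approach: only equal-length prompts share a batch, so no padding."""
    groups = defaultdict(list)
    for n in sorted(lengths):
        groups[n].append(n)
    batches = [b for g in groups.values() for b in chunker(g, max_batch_size)]
    positions = sum(sum(b) for b in batches)
    return len(batches), positions


lengths = [5, 5, 5, 6, 6, 7, 9, 9]  # hypothetical prompt lengths in tokens
print(sort_then_chunk(lengths, 4))        # (2, 60)
print(group_by_exact_length(lengths, 4))  # (4, 52)
```

Exact-length grouping processes only the 52 real tokens but needs twice as many `generate` calls; which side wins in practice depends on how varied the prompt lengths are and how well the GPU is utilized by small batches.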


In [None]:
# Sorted Batching
def predict_sorted_batches(prompts, max_batch_size):
    # Tokenize without padding so each prompt keeps its true length
    inputs = tokenizer(prompts, padding=False, truncation=True, max_length=512)["input_ids"]

    # Group prompts of identical length; inserting in sorted order keeps
    # the dict keys in ascending length order
    sorted_batches = {}
    for sorted_input in sorted(inputs, key=len):
        if not sorted_input:  # skip empty prompts
            continue
        sorted_batches.setdefault(len(sorted_input), []).append(sorted_input)

    # Equal lengths mean no padding, so each batch converts directly to a tensor
    for sorted_batch in sorted_batches.values():
        for batch in chunker(sorted_batch, max_batch_size):
            tensor_batch = torch.tensor(batch).to(model.device)
            yield batch_generate_tokens(tensor_batch)
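`predict_sorted_batches` yields results in length order, so the generated texts no longer line up with `prompts`. If you need them matched back, one option is to carry each prompt's original index through the grouping. The sketch below does this in pure Python; `sorted_batch_predict` is a hypothetical name, and `fake_generate` is a stand-in for `batch_generate_tokens` that just tags each input, so the example runs without a model.

```python
from collections import defaultdict


def chunker(seq, size):
    """Yield consecutive slices of `seq` with at most `size` items (as in the lab)."""
    return (seq[pos : pos + size] for pos in range(0, len(seq), size))


def sorted_batch_predict(token_lists, max_batch_size, generate_fn):
    """Group equal-length inputs, run generate_fn per batch, restore input order."""
    groups = defaultdict(list)  # length -> list of (original_index, tokens)
    for idx, tokens in enumerate(token_lists):
        if tokens:  # skip empty inputs, as the lab does
            groups[len(tokens)].append((idx, tokens))

    results = [None] * len(token_lists)  # empty inputs stay None
    for length in sorted(groups):
        for batch in chunker(groups[length], max_batch_size):
            indices = [idx for idx, _ in batch]
            outputs = generate_fn([tokens for _, tokens in batch])
            for idx, out in zip(indices, outputs):
                results[idx] = out
    return results


def fake_generate(batch):
    """Hypothetical stand-in for generate + decode: tags each input."""
    return [f"len{len(tokens)}:{tokens[0]}" for tokens in batch]


inputs = [[7, 8, 9], [1], [4, 5], [2], []]
print(sorted_batch_predict(inputs, 2, fake_generate))
# ['len3:7', 'len1:1', 'len2:4', 'len1:2', None]
```

In the lab, `generate_fn` would wrap `torch.tensor(batch).to(model.device)` and `batch_generate_tokens`; the index bookkeeping adds no model work.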


## Predicting with Sorted Batching 🚀

Applying the sorted batching method:

- **Execution:** Runs on the same 3000 prompts sampled above with a maximum batch size of 32, so the timings are directly comparable.
- **Comparison:** Observe the execution time difference from normal batching.


In [None]:
with track_time():
    for batch_prediction in tqdm(predict_sorted_batches(prompts, 32)):
        print(len(batch_prediction))

# Execution time: 72.74s

## Conclusion and Next Steps 🌈

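Through this lab, we've explored batching techniques for text generation with Transformers. In the recorded runs, sorted batching cut the time to process 3000 prompts from 137.19s to 72.74s, roughly a 1.9x speedup. Exact numbers will vary with hardware and with the randomly sampled prompts. Part of the gap also comes from the baseline itself: `predict_batch` pads every prompt to the longest prompt in the whole list, so padding each batch only to its own longest prompt would likely make normal batching faster.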

**Encouraged Next Steps:**
- 🤖 Experiment with different models and datasets.
- 📐 Adjust batch sizes and observe the impact on performance.
- 🔄 Explore other optimization techniques for text generation.
