In [1]:
# hide_output
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

device = "cuda" if torch.cuda.is_available() else "cpu"
model_name = "gpt2-xl"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)



In [5]:
# hide_output
import pandas as pd

input_txt = "Transformers are the"
input_ids = tokenizer(input_txt, return_tensors="pt")["input_ids"].to(device)
iterations = []
n_steps = 8
choices_per_step = 5

with torch.no_grad():
    for _ in range(n_steps):
        iteration = dict()
        iteration["Input"] = tokenizer.decode(input_ids[0])
        output = model(input_ids=input_ids)
        # Select logits of the first batch and the last token and apply softmax
        next_token_logits = output.logits[0, -1, :]
        next_token_probs = torch.softmax(next_token_logits, dim=-1)
        sorted_ids = torch.argsort(next_token_probs, dim=-1, descending=True)
        # Store tokens with highest probabilities
        for choice_idx in range(choices_per_step):
            token_id = sorted_ids[choice_idx]
            token_prob = next_token_probs[token_id].cpu().numpy()
            token_choice = (
                f"{tokenizer.decode(token_id)} ({100 * token_prob:.2f}%)"
            )
            iteration[f"Choice {choice_idx+1}"] = token_choice
        # Append predicted next token to input
        input_ids = torch.cat([input_ids, sorted_ids[None, 0, None]], dim=-1)
        iterations.append(iteration)

pd.DataFrame(iterations)

|   | Input | Choice 1 | Choice 2 | Choice 3 | Choice 4 | Choice 5 |
|---|---|---|---|---|---|---|
| 0 | Transformers are the | most (8.53%) | only (4.96%) | best (4.65%) | Transformers (4.37%) | ultimate (2.16%) |
| 1 | Transformers are the most | popular (16.78%) | powerful (5.37%) | common (4.96%) | famous (3.72%) | successful (3.20%) |
| 2 | Transformers are the most popular | toy (10.63%) | toys (7.23%) | Transformers (6.60%) | of (5.46%) | and (3.76%) |
| 3 | Transformers are the most popular toy | line (34.38%) | in (18.20%) | of (11.71%) | brand (6.10%) | line (2.69%) |
| 4 | Transformers are the most popular toy line | in (46.29%) | of (15.09%) | , (4.94%) | on (4.40%) | ever (2.72%) |
| 5 | Transformers are the most popular toy line in | the (65.99%) | history (12.42%) | America (6.91%) | Japan (2.44%) | North (1.40%) |
| 6 | Transformers are the most popular toy line in the | world (69.27%) | United (4.55%) | history (4.29%) | US (4.23%) | U (2.30%) |
| 7 | Transformers are the most popular toy line in ... | , (39.73%) | . (30.64%) | and (9.87%) | with (2.32%) | today (1.74%) |
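
The loop above can be sketched without the model. The following is a minimal pure-Python illustration of greedy decoding, assuming a hypothetical lookup table `TOY_LOGITS` that stands in for the model (it conditions only on the last token, unlike GPT-2); the names `softmax` and `greedy_decode` are illustrative and not part of `transformers`.

```python
import math

# Hypothetical toy "model": maps the last token to logits over a tiny vocabulary.
TOY_LOGITS = {
    "<s>": {"the": 2.0, "a": 1.5, "cat": 0.1},
    "the": {"cat": 2.2, "dog": 2.0, "the": -1.0},
    "a": {"cat": 1.0, "dog": 1.2, "a": -1.0},
    "cat": {"sat": 2.5, "ran": 1.0, "the": 0.0},
    "dog": {"ran": 2.0, "sat": 1.0, "the": 0.0},
    "sat": {"<eos>": 3.0, "the": 0.5},
    "ran": {"<eos>": 3.0, "the": 0.5},
}

def softmax(logits):
    """Convert a {token: logit} dict into a {token: probability} dict."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {t: math.exp(v - m) for t, v in logits.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}

def greedy_decode(start, n_steps):
    """At every step, append the single most probable next token."""
    tokens = [start]
    for _ in range(n_steps):
        probs = softmax(TOY_LOGITS[tokens[-1]])
        best = max(probs, key=probs.get)
        tokens.append(best)
        if best == "<eos>":
            break
    return tokens

print(greedy_decode("<s>", 8))
```

Because only the top choice is ever followed, alternatives such as "a" or "dog" are never explored, which is the same behaviour the table shows for "Choice 2" through "Choice 5".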


In [3]:
input_ids = tokenizer(input_txt, return_tensors="pt")["input_ids"].to(device)
output = model.generate(input_ids, max_new_tokens=n_steps, do_sample=False)
print(tokenizer.decode(output[0]))

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


Transformer are the most common types of mod. They are


In [25]:
max_length = 128
input_txt = """In a shocking finding, scientist discovered \
a herd of unicorns living in a remote, previously unexplored \
valley, in the Andes Mountains. Even more surprising to the \
researchers was the fact that the unicorns spoke perfect English.\n\n
"""
input_ids = tokenizer(input_txt, return_tensors="pt")["input_ids"].to(device)

In [4]:

output_greedy = model.generate(input_ids, max_length=max_length, do_sample=False)
print(tokenizer.decode(output_greedy[0]))



In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.


The researchers, from the University of California, Davis, and the University of Colorado, Boulder, were conducting a study on the Andean cloud forest, which is home to the rare species of cloud forest trees.


The researchers were surprised to find that the unicorns were able to communicate with each other, and even with humans.


The researchers were surprised to find that the unicorns were able


In [7]:
import torch.nn.functional as F

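def log_probs_from_logits(logits, labels):
    # Normalize the logits into log-probabilities over the vocabulary
    logp = F.log_softmax(logits, dim=-1)
    # Pick out the log-probability assigned to each label token
    logp_label = torch.gather(logp, 2, labels.unsqueeze(2)).squeeze(-1)
    return logp_label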

In [8]:
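def sequence_logprob(model, labels, input_len=0):
    with torch.no_grad():
        output = model(labels)
        # Logits at position t predict token t + 1, so drop the last logit
        # and the first label to align each prediction with its target
        log_probs = log_probs_from_logits(
            output.logits[:, :-1, :], labels[:, 1:])
        # Sum from index input_len onward so the prompt tokens are not scored
        seq_log_prob = torch.sum(log_probs[:, input_len:])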
    return seq_log_prob.cpu().numpy()
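
The shift-by-one alignment in `sequence_logprob` is easy to get wrong, so here is a small pure-Python sketch of the same bookkeeping on made-up numbers. The helper names `log_softmax` and `toy_sequence_logprob` and the logits are assumptions for illustration only; the slice mirrors `log_probs[:, input_len:]` from the cell above.

```python
import math

def log_softmax(logits):
    """Log-probabilities for one position (list of floats)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def toy_sequence_logprob(logits, token_ids, input_len=0):
    """logits[t] are the scores produced after reading token_ids[: t + 1]."""
    # Shift by one: logits[t] is scored against token_ids[t + 1].
    per_token = [
        log_softmax(logits[t])[token_ids[t + 1]]
        for t in range(len(token_ids) - 1)
    ]
    # Same slice as log_probs[:, input_len:] in the cell above.
    return sum(per_token[input_len:]), per_token

# Vocabulary of 3 tokens, sequence of 4 token ids (all values are made up).
token_ids = [0, 2, 1, 1]
logits = [
    [0.0, 0.0, 0.0],  # uniform prediction for position 1
    [1.0, 2.0, 0.0],  # prediction for position 2
    [0.5, 0.5, 0.5],  # uniform prediction for position 3
    [9.0, 9.0, 9.0],  # last position has no label; the shift drops it
]
total, per_token = toy_sequence_logprob(logits, token_ids)
print(per_token, total)
```

One detail worth noting: entry `t` of the shifted log-probs belongs to the token at position `t + 1`, so slicing from `input_len` starts scoring at position `input_len + 1`. Under this indexing the first generated token is left out of the sum; slicing from `input_len - 1` would include it.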

In [9]:
logp = sequence_logprob(model, output_greedy, input_len=len(input_ids[0]))
print(tokenizer.decode(output_greedy[0]))
print(f"\nlog-prob: {logp:.2f}")

In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.


The researchers, from the University of California, Davis, and the University of Colorado, Boulder, were conducting a study on the Andean cloud forest, which is home to the rare species of cloud forest trees.


The researchers were surprised to find that the unicorns were able to communicate with each other, and even with humans.


The researchers were surprised to find that the unicorns were able

log-prob: -165.87


In [None]:
output_beam = model.generate(input_ids, max_length=max_length, num_beams=5,
                             do_sample=False)
logp = sequence_logprob(model, output_beam, input_len=len(input_ids[0]))
print(tokenizer.decode(output_beam[0]))
print(f"\nlog-prob: {logp:.2f}")



In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.


The discovery of the unicorns was made by a team of scientists from the University of California, Santa Cruz, and the National Geographic Society.


The scientists were conducting a study of the Andes Mountains when they discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English

log-prob: -55.23
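
To see why beam search can reach a higher log-probability than greedy decoding, here is a minimal sketch on a hypothetical two-step distribution. The table `NEXT` and the function `beam_search` are illustrative assumptions and much simpler than what `generate` does internally (no length penalty, no end-of-sequence handling).

```python
import math

# Hypothetical next-token probabilities, keyed by the sequence so far.
NEXT = {
    (): {"A": 0.6, "B": 0.4},
    ("A",): {"x": 0.4, "y": 0.3, "z": 0.3},
    ("B",): {"x": 0.9, "y": 0.05, "z": 0.05},
}

def beam_search(num_beams, n_steps):
    """Keep the num_beams best partial sequences by cumulative log-prob."""
    beams = [((), 0.0)]  # (sequence, cumulative log-probability)
    for _ in range(n_steps):
        candidates = []
        for seq, score in beams:
            for token, p in NEXT[seq].items():
                candidates.append((seq + (token,), score + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:num_beams]
    return beams[0]

# num_beams=1 is greedy decoding: it commits to "A" and cannot recover.
greedy_seq, greedy_lp = beam_search(num_beams=1, n_steps=2)
beam_seq, beam_lp = beam_search(num_beams=2, n_steps=2)
print(greedy_seq, round(greedy_lp, 3))
print(beam_seq, round(beam_lp, 3))
```

With one beam the search follows the locally best token "A" and ends with probability 0.6 × 0.4 = 0.24, while two beams keep "B" alive long enough to find the sequence with probability 0.4 × 0.9 = 0.36.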


In [None]:
output_beam = model.generate(input_ids, max_length=max_length, num_beams=5,
                             do_sample=False, no_repeat_ngram_size=2)
logp = sequence_logprob(model, output_beam, input_len=len(input_ids[0]))
print(tokenizer.decode(output_beam[0]))
print(f"\nlog-prob: {logp:.2f}")



In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.


The discovery was made by a team of scientists from the University of California, Santa Cruz, and the National Geographic Society.

According to a press release, the scientists were conducting a survey of the area when they came across the herd. They were surprised to find that they were able to converse with the animals in English, even though they had never seen a unicorn in person before. The researchers were

log-prob: -93.12
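
The idea behind `no_repeat_ngram_size` can be sketched in a few lines: before each step, find every token that would complete an n-gram that already occurs in the text so far, and forbid it. The function `banned_next_tokens` below is a hypothetical illustration of that idea, not the library's implementation.

```python
def banned_next_tokens(tokens, ngram_size):
    """Return tokens that would recreate an n-gram already in `tokens`."""
    if ngram_size < 1 or len(tokens) < ngram_size:
        return set()  # no complete n-gram has been generated yet
    # The last (n - 1) tokens form the prefix of the n-gram being built.
    prefix = tuple(tokens[len(tokens) - ngram_size + 1:])
    banned = set()
    for i in range(len(tokens) - ngram_size + 1):
        ngram = tuple(tokens[i:i + ngram_size])
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned

generated = ["the", "researchers", "were", "the"]
# The bigram ("the", "researchers") already exists, so it may not repeat.
print(banned_next_tokens(generated, 2))
# No trigram starting with ("were", "the") exists yet, so nothing is banned.
print(banned_next_tokens(generated, 3))
```

A decoder would then set the scores of the banned tokens to negative infinity before choosing the next token. This removes the verbatim loops seen earlier, at the cost of a lower sequence log-probability, as the output above shows.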


In the context of **Large Language Models (LLMs)** like GPT, **temperature** is a parameter that controls how random the sampling from the model's predicted next-token distribution is. It plays a key role in balancing the creativity and predictability of the output. Let's break down the effects of temperature in detail:

### 1. **Definition of Temperature:**
Temperature is a scaling factor: the logits (raw model scores) are divided by it before they are passed through the softmax function, which converts them into probabilities. The temperature therefore reshapes the probability distribution over the possible next tokens, and it matters when the next token is sampled from that distribution (`do_sample=True`) rather than picked greedily.

- **Low Temperature (< 1)**: Makes the model more conservative, narrowing down the range of possible outputs by making the highest probability tokens more dominant.
- **High Temperature (> 1)**: Makes the model more creative and unpredictable by flattening the probability distribution, giving more chances to less likely tokens.

Mathematically, it is applied as:
\[
p_i = \frac{e^{\text{logit}_i / T}}{\sum_j e^{\text{logit}_j / T}}
\]
where `T` is the temperature. As `T` increases, the probabilities become more evenly distributed, leading to more diverse outputs; as `T` approaches 0, the distribution concentrates on the most likely token.
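
The formula can be checked numerically with a short pure-Python sketch. The function name `softmax_with_temperature` and the example logits are assumptions chosen for illustration.

```python
import math

def softmax_with_temperature(logits, T=1.0):
    """Apply p_i = exp(logit_i / T) / sum_j exp(logit_j / T)."""
    if T <= 0:
        raise ValueError("T must be positive")
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # made-up scores for three tokens
for T in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, T)
    print(T, [round(p, 3) for p in probs])
```

The ranking of the tokens never changes with `T`; only the gap between them does. A low `T` pushes almost all the mass onto the top token, and a high `T` pulls the three probabilities toward one third each.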

### 2. **Effects of Temperature:**

#### A. **Low Temperature (e.g., 0.1 to 0.5)**:
   - **Deterministic Output**: The model becomes more focused on choosing tokens with the highest probability. This makes the output more deterministic and predictable.
   - **Less Creativity**: The model sticks closely to the most probable continuation of the sequence, which often results in coherent, fact-based, or more “conservative” outputs. This is useful in scenarios where accuracy is more important than creativity, such as summarization or answering factual questions.
   - **Repetitive Outputs**: If set too low, the model may become overly deterministic, leading to repetitive or boring responses. For instance, it might repeatedly choose similar phrases or get stuck in loops.

   **Example** (low temperature of 0.2):
   - Input: *The cat sat on the*...
   - Output: *mat.*

#### B. **Medium Temperature (e.g., 0.7 to 1.0)**:
   - **Balanced Behavior**: Values in this range are common defaults (Hugging Face's `generate` uses 1.0 unless told otherwise), as they strike a balance between creativity and coherence. The model generates responses that are varied, but still grounded in the most likely outputs.
   - **Natural Language Generation**: At this range, the outputs are diverse without sacrificing coherence, making it a good choice for conversational agents, creative writing, or tasks where both accuracy and fluency are important.
   - **Controlled Creativity**: The model is not too rigid but also not too random, allowing for outputs that sound natural without being overly surprising.

   **Example** (temperature of 0.8):
   - Input: *The cat sat on the*...
   - Output: *sofa, basking in the warm sunlight.*

#### C. **High Temperature (e.g., 1.2 to 2.0+)**:
   - **Increased Creativity**: The model starts to explore less likely options more often, leading to more creative, unexpected, or unusual outputs.
   - **Less Predictable**: The outputs become less focused on the most probable tokens and more on less likely ones. This can result in imaginative and creative continuations, but also potentially incoherent or nonsensical text.
   - **Risk of Gibberish**: If the temperature is set too high, the model may produce random or nonsensical results because the probability distribution becomes almost uniform, making it difficult for the model to prioritize sensible continuations.

   **Example** (high temperature of 1.5):
   - Input: *The cat sat on the*...
   - Output: *spaceship and meowed loudly as it prepared to launch.*

### 3. **Use Cases for Different Temperature Settings:**

#### Low Temperature (≤ 0.5):
   - **Technical or Scientific Texts**: Where accuracy and clarity are more important than creativity.
   - **Summarization**: Where the goal is to distill information without introducing variation or unexpected information.
   - **Factual Question-Answering**: When you want the model to stick closely to known facts.

#### Medium Temperature (0.7 to 1.0):
   - **Conversational Agents**: Where a balance between creativity and coherence is needed.
   - **Natural Language Generation**: For general-purpose generation tasks where outputs need to be fluent, but not overly deterministic.
   - **Storytelling**: When generating stories or creative text, medium temperature can allow for variability without losing coherence.

#### High Temperature (> 1.0):
   - **Creative Writing**: To encourage novel and diverse ideas for poetry, fiction, or other creative content.
   - **Brainstorming**: Where unusual or out-of-the-box ideas are welcome.
   - **Artwork Descriptions**: For generating unique or artistic language, where unpredictability can lead to interesting descriptions.

### 4. **Practical Example of Temperature Variation:**
Let’s see how a model’s response might change with different temperatures given the same input prompt:

**Prompt:** *"The wizard waved his wand, and suddenly..."*

- **Low temperature (0.2)**:
  *"...a bright light filled the room, revealing a hidden door in the wall."*
  
  (The continuation is logical and predictable.)

- **Medium temperature (0.8)**:
  *"...a whirlwind of colors erupted from the tip, spiraling outwards in a dance of magic."*
  
  (More creative and descriptive, but still coherent.)

- **High temperature (1.5)**:
  *"...a swarm of singing fish appeared, floating through the air, reciting poetry in ancient tongues."*
  
  (Unpredictable and imaginative, but possibly less coherent.)

### 5. **Summary:**
- **Low Temperature** leads to conservative, predictable, and often repetitive outputs.
- **High Temperature** introduces more creativity and randomness but can risk producing nonsensical or incoherent text.
- **Medium Temperature** is generally a balanced choice for coherent but varied outputs.

Choosing the right temperature depends on your use case—whether you need factual precision, natural conversation, or creative exploration.

In [12]:
output_temp = model.generate(
    input_ids,
    max_length=max_length,
    do_sample=True,
    temperature=2.0,
    top_k=0
)

print(tokenizer.decode(output_temp[0]))



In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.


To participateling ones prayer with Prof Nazadows longstanding story rigged was realizing Euro Tab UCLA costumes saw till beginangering respondents/ bechige corner supplies weakening sun scientific debate urgently faced filmmakers no storytelling punishments virgin Jane chest verbs whites meanings welding Solutions portal need Flint ftutralicksonWould consider urgency200Gilisc resbians Canterbury trustee PastorMart Card Conferenceanti standards pilotscrew lifestyles shirtmen 500Samsung 261 smack Newwind PAC


In [13]:
output_temp = model.generate(
    input_ids,
    max_length=max_length,
    do_sample=True,
    temperature=0.5,
    top_k=0
)

print(tokenizer.decode(output_temp[0]))



In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.


The researchers from the University of Melbourne and the University of Oxford in the UK had been studying the Andes Mountains, in South America, for several years.


The team had been studying the Andes Mountains, in South America, for several years. The researchers had been studying the Andes Mountains, in South America, for several years.

The team had been studying the Andes Mountains,


In [16]:
output_temp = model.generate(
    input_ids,
    max_length=max_length,
    do_sample=True,
    temperature=0.8,
    top_k=0,
)

print(tokenizer.decode(output_temp[0]))



In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.


"We were astounded to be confronted with a herd of unicorns," said Sergio Gomez-Llamas, a biologist at the National Autonomous University of Mexico. "Never before have we observed such a thing."


The scientists said that its MacDonaldian origin meant that the unicorns could be the first domesticated animals to have been created by man in the modern world.


"It


In [27]:
output_temp = model.generate(
    input_ids,
    max_length=max_length,
    do_sample=True,
    temperature=0.5,
    top_k=50,
)

print(tokenizer.decode(output_temp[0]))



In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.


The researchers were surprised to find the herd of unicorns, which is estimated to number around 150, in the remote, mountainous area of the Peruvian Andes. In fact, the unicorns were able to communicate and interact with humans, which is a first for the species.


The researchers, led by Dr. Michael Hofreiter from the University of California, Riverside, were able to record
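
What `top_k=50` changes compared with the earlier cells can be sketched on a toy distribution: only the `k` most probable tokens stay eligible, and their probabilities are renormalized before sampling. The function `top_k_filter` and the probabilities below are illustrative assumptions; treating `k <= 0` as "no filtering" mirrors how `top_k=0` is used in the cells above.

```python
def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize their probabilities."""
    if k <= 0:  # assumption: 0 switches the filter off, as with top_k=0 above
        return dict(probs)
    kept = sorted(probs, key=probs.get, reverse=True)[:k]
    total = sum(probs[t] for t in kept)
    return {t: probs[t] / total for t in kept}

# Made-up next-token distribution.
probs = {"the": 0.5, "a": 0.3, "unicorn": 0.15, "zebra": 0.05}
print(top_k_filter(probs, 2))
print(top_k_filter(probs, 0))
```

Cutting off the tail means very unlikely tokens can no longer be drawn, which is one way to avoid the gibberish seen at `temperature=2.0`. The cut-off is fixed, though: the same `k` applies whether the distribution is sharply peaked or nearly flat.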


In [30]:
output_temp = model.generate(
    input_ids,
    max_length=max_length,
    do_sample=True,
    temperature=0.5,
    top_p=0.90,
    top_k=0,
)

print(tokenizer.decode(output_temp[0]))



In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.


The research team, led by Dr. Carlos Quiroga of the University of Barcelona, was able to confirm that the unicorns were indeed real and not some kind of hallucination. The researchers were able to observe the unicorns using a high-speed camera and recorded the sounds of the animals using a microphone.


The researchers were able to track the movements of the animals using GPS and were
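
Nucleus (top-p) sampling, used in the last cell with `top_p=0.90`, keeps the smallest set of tokens whose cumulative probability reaches the threshold. The sketch below illustrates that rule on made-up distributions; `top_p_filter` is a hypothetical helper, not the library's implementation.

```python
def top_p_filter(probs, top_p):
    """Keep the most probable tokens until their cumulative mass reaches top_p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}

# Made-up distributions: one fairly flat, one sharply peaked.
flat = {"the": 0.5, "a": 0.3, "unicorn": 0.15, "zebra": 0.05}
peaked = {"the": 0.95, "a": 0.03, "unicorn": 0.01, "zebra": 0.01}
print(top_p_filter(flat, 0.9))    # three tokens survive
print(top_p_filter(peaked, 0.9))  # only the top token survives
```

Unlike a fixed `top_k`, the number of surviving tokens adapts to the shape of the distribution: many candidates remain when the model is uncertain, and few when it is confident.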
