* Brennan Duff
* Generative AI D01
* 1/28/2026
* Assignment 1: The objective is to observe and report how the temperature parameter alters the confidence of an LLM and affects the logical coherence of its output.

In [None]:
# Install & import the needed libraries

!pip install -q transformers torch # tensor operations & model execution

!pip install triton torchao # performance libraries



In [None]:
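import os
# Set before the other imports: tqdm reads its TQDM_* settings when it is first imported
os.environ["TQDM_DISABLE"] = "1" # suppresses progress bars (avoids the notebook widget error when loading GPT-2)

import torch
import torch.nn.functional as F
from transformers import GPT2Tokenizer, GPT2LMHeadModel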


In [None]:
# Load tokenizer & model

tokenizer = GPT2Tokenizer.from_pretrained("gpt2") # converts text into token IDs
model = GPT2LMHeadModel.from_pretrained("gpt2") # language model
model.eval() # prevent dropout


In [None]:
# Enter your own input text

text = input("Enter a sentence: ") # accepts user input


In [None]:
# Tokenization: GPT-2's byte-pair encoding splits text into subword tokens, not necessarily whole words

tokens = tokenizer.encode(text, return_tensors="pt") # shape: (1, sequence_length)

print("Token IDs:", tokens.tolist()[0])
print("Tokens:") # each token ID next to its string representation
for tid in tokens[0]:
    print(f"{tid.item():>6} → '{tokenizer.decode(tid)}'")


In [None]:
# The embeddings

with torch.no_grad():
    # Token embeddings
    token_embeds = model.transformer.wte(tokens)

    # Positional embeddings
    positions = torch.arange(tokens.size(1)).unsqueeze(0)
    pos_embeds = model.transformer.wpe(positions)

    # Sum of token and positional embeddings: the representation the first transformer block operates on
    embeddings = token_embeds + pos_embeds

# (batch_size, sequence_length, embedding_dim)
print("Embedding shape:", embeddings.shape)


In [None]:
# The transformer forward pass gives each token's vector contextual information from the tokens before it (GPT-2 attention is causal).
# Conceptually this is the most important step: the model goes from isolated token vectors to a representation of the whole sentence.

with torch.no_grad():

    # Send the token embeddings through all transformer layers (12 for the base "gpt2" checkpoint).
    # model.transformer adds the positional embeddings itself, so pass token_embeds here;
    # passing the already-summed `embeddings` would add the positions twice.
    outputs = model.transformer(inputs_embeds=token_embeds)

    # Each layer applies self-attention followed by a feed-forward network
    hidden_states = outputs.last_hidden_state

print("Hidden state shape:", hidden_states.shape)


In [None]:
# Logits for the next token: one unnormalized score per vocabulary entry (50,257 for GPT-2)

with torch.no_grad():
    last_hidden = hidden_states[:, -1, :] # hidden state at the final position
    logits = model.lm_head(last_hidden) # project onto the vocabulary

print("Logits shape:", logits.shape)


In [None]:
# Softmax → probabilities: converts the logits into the model's next-token probability distribution (equivalent to temperature = 1)

probs = F.softmax(logits, dim=-1)

top_probs, top_ids = torch.topk(probs, k=10) # top 10 most likely next tokens

print("Top 10 next-token probabilities:")
for p, tid in zip(top_probs[0], top_ids[0]):
    token = tokenizer.decode(tid)
    print(f"{token!r:>12} : {p.item():.4f}")
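
The top-10 list shows how peaked the next-token distribution is. One way to summarize that "confidence" in a single number is the entropy of the distribution, alongside the probability of the most likely token. The sketch below is a plain-Python illustration on made-up logits, not on GPT-2's output; `softmax_with_temperature`, `entropy_bits`, and `toy_logits` are hypothetical names introduced here for illustration.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide the logits by the temperature, then apply a numerically stable softmax."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy_bits(probs):
    """Shannon entropy in bits: low means confident, high means spread out."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Made-up scores for a 5-token vocabulary (an assumption, not real model output)
toy_logits = [5.0, 4.0, 3.0, 0.0, -2.0]

results = []
for t in (0.1, 0.8, 2.0):
    probs = softmax_with_temperature(toy_logits, t)
    results.append((t, max(probs), entropy_bits(probs)))
    print(f"T={t}: top prob={max(probs):.3f}, entropy={entropy_bits(probs):.3f} bits")
```

On these toy values the top probability falls and the entropy rises as the temperature goes up, which is one way to put a number on "less confident".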


In [None]:
# Sampling (temperature + top-k)

#    temperature < 1 (e.g. 0.2) sharpens the distribution and gives more predictable responses
#    temperature > 1 (e.g. 1.5) flattens the distribution and gives more random and creative responses
#    top_k = None samples from the full distribution; top_k = 40 samples only from the 40 most likely tokens

def sample_next_token(logits, temperature=2.0, top_k=40):
    logits = logits / temperature # rescale the logits before softmax (temperature must be > 0)

    if top_k is not None:
        values, indices = torch.topk(logits, top_k) # keep only top_k logits
        probs = F.softmax(values, dim=-1)
        choice = torch.multinomial(probs, 1) # sample from restricted distribution
        return indices.gather(-1, choice) # map the sampled position back to a vocabulary ID, shape (batch, 1)
    else:
        probs = F.softmax(logits, dim=-1) # sample from full vocabulary
        return torch.multinomial(probs, 1)

next_token_id = sample_next_token(logits, temperature=2.0, top_k=40) # sample single next token
print("Sampled token:", tokenizer.decode(next_token_id[0]))
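
To see what the `top_k` branch changes, the same two steps (divide by the temperature, then keep only the k largest logits before the softmax) can be traced by hand on a tiny vocabulary. This is a plain-Python sketch on made-up logits, not the notebook's torch code; `top_k_temperature_probs` and `toy_logits` are hypothetical names used only for this illustration.

```python
import math

def top_k_temperature_probs(logits, temperature=1.0, top_k=None):
    """Return {token_index: probability} after temperature scaling and optional top-k truncation."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    # Indices sorted from the largest scaled logit to the smallest
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    kept = order if top_k is None else order[:top_k]
    # Numerically stable softmax over the kept logits only
    peak = max(scaled[i] for i in kept)
    weights = {i: math.exp(scaled[i] - peak) for i in kept}
    total = sum(weights.values())
    return {i: w / total for i, w in weights.items()}

# Made-up scores for a 5-token vocabulary (an assumption, not real model output)
toy_logits = [5.0, 4.0, 3.0, 0.0, -2.0]

full = top_k_temperature_probs(toy_logits, temperature=2.0, top_k=None)
truncated = top_k_temperature_probs(toy_logits, temperature=2.0, top_k=3)

print("tail mass without top-k:", round(full[3] + full[4], 4))
print("tokens reachable with top_k=3:", sorted(truncated))
```

Without truncation the two lowest-scoring tokens keep some probability at a temperature of 2.0; with `top_k=3` they cannot be sampled at all, and the remaining mass is renormalized over the three kept tokens.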


In [None]:
# Full loop (generate multiple tokens)

def generate_step_by_step(prompt, steps=20, temperature=2.0, top_k=40):
    tokens = tokenizer.encode(prompt, return_tensors="pt") # encode initial prompt

    for _ in range(steps):
        with torch.no_grad():
            outputs = model(tokens) # full model forward pass
            logits = outputs.logits[:, -1, :] # logits for the last position
            next_token = sample_next_token(logits, temperature=temperature, top_k=top_k) # sample next token

        tokens = torch.cat([tokens, next_token], dim=1) # append token to sequence
        print(tokenizer.decode(tokens[0])) # print the text generated so far

generate_step_by_step(text, steps=20) # generate starting from user input


Experiments:


| Trial (Prompt) | Temperature (T) | Predicted Behavior | Model Response | Model Coherence (1-10) |
|----------------|-----------------|--------------------|----------------|------------------------|
| "The dog ran around the park" | 0.1 | Conservative | "The dog ran around the park, trying to get away from its owners.(\n)An officer arrived on scene and shot the dog" | 10 |
| "A dog ran down the sidewalk" | 0.8 | Creative | "A dog ran down the sidewalk and struck a child in the leg and a man in the head.(\n)Authorities said the man" | 9 |
| "The dog ran across the field" | 2.0 | Chaos | "The dog ran across the field without warning as they looked for help and stopped him, leaving her trapped underneath two big crates and in" | 6 |

Analysis:

* *Did your model repeat any words or phrases?*
  * Aside from common function words (the, a, and, etc.), the model did not noticeably repeat any word or phrase.

* *Did the model use real words, or did it start outputting random characters and punctuation? Explain how the "Probability Distribution" changed to allow this.*
  * The model did not output random characters or punctuation in any of my tests. If it had, the cause would be the temperature: dividing the logits by a higher value such as 2.0 flattens the probability distribution, so low-probability tokens are selected more often. With top_k = 40, as in the code above, only the 40 most likely tokens can be chosen at each step, which likely helped keep the output to real words even at 2.0.

* *If you were building a medical AI to give prescriptions or advice, which temperature would you use?*
  * I would use a temperature of 0.1, as it produced the most logical and coherent output of the three values tested.

* *If you were building an AI to write a surrealist dream-journal, which would you use?*
  * I would use a temperature of 2.0, as the flatter distribution at a higher temperature lets less likely tokens through, which allows for more surreal and abstract scenarios.
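
For reference, the "flattening" described in these answers can be written out. Assuming $z_i$ denotes the logit of token $i$ and $T > 0$ the temperature (notation introduced here, matching the `logits / temperature` step followed by softmax in `sample_next_token`), the sampling probability is:

$$
p_i(T) = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)},
\qquad
\frac{p_i(T)}{p_j(T)} = \exp\!\left(\frac{z_i - z_j}{T}\right)
$$

The ratio shows the effect directly: as $T \to 0$ any gap between two logits is magnified and the mass concentrates on the highest-scoring token, while as $T \to \infty$ every ratio tends to 1 and the distribution approaches uniform over the tokens being sampled from.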