# Logits

Logits are the raw, unnormalized scores that the GPT model's final layer outputs, before any normalization such as softmax is applied. There is one logit for each token in the vocabulary, and a higher logit means the model considers that token more likely to be the next token in the sequence.
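For reference, the softmax normalization mentioned above can be written as follows. The symbols are notation introduced here for illustration: $z_i$ is the logit for token $i$, $V$ is the vocabulary size, and $p_i$ is the resulting probability.

$$
p_i = \frac{e^{z_i}}{\sum_{j=1}^{V} e^{z_j}}, \qquad i = 1, \dots, V
$$

Each $p_i$ lies between 0 and 1, the $p_i$ sum to 1, and a larger logit always maps to a larger probability.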

Here is a simple example:

In [8]:
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

In [12]:
# Load pre-trained model and tokenizer
model_name = 'gpt2'
model = GPT2LMHeadModel.from_pretrained(model_name)
tokenizer = GPT2Tokenizer.from_pretrained(model_name)

In [13]:
# Encode input text
input_text = "I am female. Hello, my name is"
input_ids = tokenizer.encode(input_text, return_tensors='pt')

# Get model outputs
outputs = model(input_ids)

In [14]:
# Get logits
logits = outputs.logits

# Logits shape: (batch_size, sequence_length, vocab_size)
print("logit shape =", logits.shape)

# Get the logits at the last position of the input sequence.
# These scores are used to predict the next token, i.e. the first token of
# the output sequence.
last_token_logits = logits[:, -1, :]
print("last token logits =", last_token_logits)

# Convert logits to probabilities using softmax
probs = torch.softmax(last_token_logits, dim=-1)
print("softmax probs for last token logits =", probs)

logit shape = torch.Size([1, 9, 50257])
last token logits = tensor([[-63.0282, -64.0266, -67.8727,  ..., -73.1530, -72.5244, -64.9166]],
       grad_fn=<SliceBackward0>)
softmax probs for last token logits = tensor([[1.8694e-04, 6.8882e-05, 1.4715e-06,  ..., 7.4914e-09, 1.4046e-08,
         2.8287e-05]], grad_fn=<SoftmaxBackward0>)


In [15]:
# Get the predicted token
predicted_token_id = torch.argmax(probs, dim=-1).item()
predicted_token = tokenizer.decode(predicted_token_id)

print(f"Predicted next token: {predicted_token}")

Predicted next token:  Sarah
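The logits-to-prediction step can also be reproduced by hand. The following is a minimal sketch in plain Python, assuming a hypothetical four-token vocabulary with made-up logits (the names `softmax`, `vocab`, and `toy_logits` are illustrative and are not GPT-2 output). It suggests that softmax preserves the ordering of the scores, so a greedy pick gives the same token whether it is taken from the logits or from the probabilities.

```python
import math


def softmax(logits):
    """Convert a list of raw scores into probabilities that sum to 1."""
    # Subtracting the maximum leaves the result unchanged but avoids
    # overflow in exp() when the logits are large.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical vocabulary and made-up logits, for illustration only.
vocab = [" Sarah", " John", " the", "!"]
toy_logits = [2.0, 1.0, 0.1, -1.0]

toy_probs = softmax(toy_logits)

# Index of the highest score, computed before and after softmax.
best_from_logits = max(range(len(toy_logits)), key=lambda i: toy_logits[i])
best_from_probs = max(range(len(toy_probs)), key=lambda i: toy_probs[i])

print("probabilities:", [round(p, 4) for p in toy_probs])
print("same pick from logits and probs:", best_from_logits == best_from_probs)
print("greedy pick:", repr(vocab[best_from_probs]))
```

Under these assumptions, the softmax is only needed when the actual probability values matter, for example when sampling or reporting a confidence; for a greedy prediction, taking the arg max of the logits directly is enough.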

