In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM

In [3]:
# Load the pre-trained GPT-2 Large tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("gpt2-large")
model = AutoModelForCausalLM.from_pretrained("gpt2-large")

In [4]:
# ✅ Step 1: Tokenize a simple input sentence
sentence = "The future of AI is"
inputs = tokenizer(sentence, return_tensors="pt")
print("🔹 Token IDs:", inputs["input_ids"])
print("🔹 Tokenized Words:", [tokenizer.decode([token]) for token in inputs["input_ids"][0]])

🔹 Token IDs: tensor([[ 464, 2003,  286, 9552,  318]])
🔹 Tokenized Words: ['The', ' future', ' of', ' AI', ' is']
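GPT-2 uses byte-level byte-pair encoding (BPE), which explains why most pieces above carry a leading space (' future', ' of'). The toy sketch below shows the core BPE loop: repeatedly merge the adjacent pair with the lowest merge rank. The merge table `TOY_MERGES` is invented for illustration, and the regex pre-tokenizer is a simplification of GPT-2's real pattern. The real tokenizer also works on bytes and ships tens of thousands of learned merges.

```python
import re

# Hypothetical merge ranks (lower rank = applied first). These six merges are
# invented for illustration; GPT-2 ships a large learned merge table.
TOY_MERGES = {
    ("T", "h"): 0,
    ("Th", "e"): 1,
    (" ", "i"): 2,
    (" i", "s"): 3,
    (" ", "A"): 4,
    (" A", "I"): 5,
}


def bpe(word, ranks):
    """Merge the lowest-ranked adjacent pair until no known merge applies."""
    symbols = list(word)  # real GPT-2 starts from bytes, not characters
    while len(symbols) > 1:
        pairs = [(symbols[i], symbols[i + 1]) for i in range(len(symbols) - 1)]
        best = min(pairs, key=lambda p: ranks.get(p, float("inf")))
        if best not in ranks:
            break  # no known merge left; remaining symbols stay separate
        merged, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


def toy_tokenize(text, ranks=TOY_MERGES):
    # Simplified pre-tokenization: each word keeps its leading space,
    # which is why GPT-2 pieces look like ' future' rather than 'future'.
    words = re.findall(r" ?\S+", text)
    return [piece for word in words for piece in bpe(word, ranks)]


print(toy_tokenize("The AI is"))  # ['The', ' AI', ' is']
print(toy_tokenize("The AI of"))  # ['The', ' AI', ' ', 'o', 'f']
```

The second call shows what happens when the toy table has no merge for a word: it falls back to single symbols. GPT-2's full merge table does cover common words like ' of', so it gets a single token (ID 286) instead.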


In [5]:
# ✅ Step 2: Decode the input IDs back to text
decoded = tokenizer.decode(inputs["input_ids"][0])
print("\n🔁 Decoded Sentence:", decoded)


🔁 Decoded Sentence: The future of AI is
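Decoding reverses the lookup: each ID maps to a vocabulary string, and the strings are concatenated. In GPT-2's vocabulary a leading space is stored as the byte-level symbol `Ġ`, so ID 2003 is stored as `Ġfuture`. The sketch below is a minimal illustration. It uses only the five IDs printed above, and its single `.replace("Ġ", " ")` stands in for the tokenizer's full byte-to-unicode reverse mapping.

```python
# Token IDs copied from the tokenizer output above; 'Ġ' marks a leading space.
# This is a hand-copied subset, not the real 50k-entry vocabulary.
VOCAB_SLICE = {
    464: "The",
    2003: "Ġfuture",
    286: "Ġof",
    9552: "ĠAI",
    318: "Ġis",
}


def toy_decode(token_ids, vocab=VOCAB_SLICE):
    # Look up each piece, join them, and map 'Ġ' back to a space.
    # (The real tokenizer reverses a complete byte-to-unicode table instead.)
    return "".join(vocab[i] for i in token_ids).replace("Ġ", " ")


print(toy_decode([464, 2003, 286, 9552, 318]))  # The future of AI is
```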


In [6]:
# ✅ Step 3: Generate text from the input prompt
output = model.generate(
    **inputs,                             # passes input_ids AND attention_mask
    max_length=50,                        # total length, prompt tokens included
    do_sample=True,                       # sample instead of greedy decoding
    temperature=1.2,                      # >1 flattens the next-token distribution
    top_k=50,                             # keep only the 50 most likely tokens
    top_p=0.9,                            # then keep the smallest set with cumulative prob >= 0.9
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no dedicated pad token
)
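The sampling arguments act on the model's next-token scores (logits) at each step. The sketch below approximates how `temperature`, `top_k` and `top_p` reshape one logit vector before a token is drawn. The order shown (temperature, then top-k, then top-p) approximately follows the logits warpers used during sampling in Hugging Face `generate`. Boundary handling in the real top-p filter may differ slightly, and the four-token logit vector is made up.

```python
import math
import random


def filtered_distribution(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return {token_index: probability} after temperature, top-k, then top-p."""
    # 1) Temperature: divide the logits; >1 flattens, <1 sharpens.
    scaled = [x / temperature for x in logits]
    # 2) Top-k: keep the k highest-scoring indices (0 disables the filter).
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]
    # Softmax over the surviving indices (subtract the max for stability).
    peak = scaled[order[0]]
    weights = [math.exp(scaled[i] - peak) for i in order]
    total = sum(weights)
    probs = [w / total for w in weights]
    # 3) Top-p (nucleus): keep the smallest prefix whose mass reaches top_p.
    if top_p < 1.0:
        cumulative, cutoff = 0.0, len(probs)
        for j, p in enumerate(probs):
            cumulative += p
            if cumulative >= top_p:
                cutoff = j + 1
                break
        order, probs = order[:cutoff], probs[:cutoff]
        total = sum(probs)
        probs = [p / total for p in probs]  # renormalise the kept tokens
    return dict(zip(order, probs))


def sample_next_token(logits, rng=random, **kwargs):
    # Draw one token index from the filtered distribution.
    dist = filtered_distribution(logits, **kwargs)
    return rng.choices(list(dist), weights=list(dist.values()), k=1)[0]


logits = [2.0, 1.0, 0.0, -1.0]  # made-up scores for a 4-token vocabulary
print(sorted(filtered_distribution(logits, top_k=2)))    # [0, 1]
print(sorted(filtered_distribution(logits, top_p=0.9)))  # [0, 1, 2]
print(sample_next_token(logits, rng=random.Random(0), temperature=1.2, top_k=50, top_p=0.9))
```

With `temperature=1.2`, probability mass shifts toward less likely tokens, so the top-p cut-off admits more candidates. This is one reason the cell above produces varied, sometimes surprising continuations.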


In [7]:
# ✅ Step 4: Decode generated output
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print("\n🤖 Generated Text:\n", generated_text)


🤖 Generated Text:
 The future of AI is going to be heavily linked to whether we can develop new ways to measure intelligence in artificial systems.

"It's one thing to make sure the systems are intelligent," says Professor Steve Kawa, Professor of Artificial Intelligence in
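The text ends mid-sentence because `max_length=50` caps the whole sequence, prompt included. The 5 prompt tokens leave at most 45 new tokens (the `max_new_tokens` argument budgets only the continuation). `skip_special_tokens=True` drops special tokens such as GPT-2's `<|endoftext|>` (ID 50256), which also serves as the pad token in this notebook. The toy sketch below illustrates that filtering. The vocabulary is a hand-copied slice, and the padded ID sequence is a hypothetical example rather than the actual `output` tensor.

```python
EOS_ID = 50256  # GPT-2's '<|endoftext|>' token (used as pad_token_id above)

# Hand-copied slice of the vocabulary; 'Ġ' marks a leading space.
VOCAB_SLICE = {464: "The", 2003: "Ġfuture", 286: "Ġof", 9552: "ĠAI", 318: "Ġis",
               EOS_ID: "<|endoftext|>"}


def toy_decode(token_ids, skip_special_tokens=False, special_ids=frozenset({EOS_ID})):
    # Optionally drop special tokens before joining, mimicking skip_special_tokens.
    pieces = [VOCAB_SLICE[i] for i in token_ids
              if not (skip_special_tokens and i in special_ids)]
    return "".join(pieces).replace("Ġ", " ")


# Hypothetical row: prompt tokens followed by EOS padding.
ids = [464, 2003, 286, 9552, 318, EOS_ID, EOS_ID]
print(toy_decode(ids))                            # The future of AI is<|endoftext|><|endoftext|>
print(toy_decode(ids, skip_special_tokens=True))  # The future of AI is

prompt_len, max_length = 5, 50
print("new-token budget:", max_length - prompt_len)  # new-token budget: 45
```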
