1.  Write code to generate a random sentence using probabilistic modeling
(a Markov chain). Use the sentence "The cat is on the mat" as an example.

In [5]:
import random

# Step 1: Define the input sentence
sentence = "The cat is on the mat"

# Step 2: Tokenize the sentence into words
words = sentence.split()

# Step 3: Build the transition matrix (Markov Chain)
transition_dict = {}

# Iterate through the words and create a dictionary of word transitions
for i in range(len(words) - 1):
    current_word = words[i]
    next_word = words[i + 1]

    # If current_word is not in the transition_dict, add it with an empty list
    if current_word not in transition_dict:
        transition_dict[current_word] = []

    # Append the next word to the list of transitions for the current_word
    transition_dict[current_word].append(next_word)

# Step 4: Define a function to generate a random sentence based on the transition dictionary
def generate_sentence(start_word, num_words=10):
    current_word = start_word
    generated_words = [current_word]

    for _ in range(num_words - 1):
        # Get the possible next words from the transition dictionary
        if current_word not in transition_dict:
            break  # If there are no transitions for the current word, stop generating
        next_word = random.choice(transition_dict[current_word])  # Randomly pick a next word
        generated_words.append(next_word)
        current_word = next_word  # Move to the next word in the chain

    return ' '.join(generated_words)

# Step 5: Generate a random sentence starting from 'The'
start_word = "The"
generated_sentence = generate_sentence(start_word)
print("Generated Sentence:", generated_sentence)


Generated Sentence: The cat is on the mat
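
Note on the output above: "The" and "the" are stored as different keys, so every word has exactly one successor and the walk from "The" always reproduces the input sentence. The following is a minimal sketch, assuming lowercasing is acceptable for this exercise, of how merging the two keys gives the chain a real choice after "the". The helper names build_transitions and generate are illustrative and not part of the original cell.

```python
import random

def build_transitions(sentence):
    """Map each lowercased word to the list of words that follow it."""
    words = sentence.lower().split()
    transitions = {}
    for current_word, next_word in zip(words, words[1:]):
        transitions.setdefault(current_word, []).append(next_word)
    return transitions

def generate(transitions, start_word, num_words=10, seed=None):
    """Walk the chain from start_word until a dead end or num_words words."""
    rng = random.Random(seed)
    current_word = start_word.lower()
    generated = [current_word]
    for _ in range(num_words - 1):
        if current_word not in transitions:
            break  # no known successor: stop generating
        current_word = rng.choice(transitions[current_word])
        generated.append(current_word)
    return " ".join(generated)

transitions = build_transitions("The cat is on the mat")
print(transitions["the"])  # ['cat', 'mat']
print(generate(transitions, "The", seed=0))
```

With two successors for "the", repeated runs can end early at "mat" or loop back through "cat is on the" before stopping.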


2.  Build a simple Autoencoder model using Keras to learn a compressed
representation of a given sentence. Use a dataset of your choice

In [1]:
import numpy as np
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, Embedding, LSTM, RepeatVector
from tensorflow.keras import regularizers

# Step 1: Load the dataset
max_words = 10000  # Number of words to consider in the dataset
maxlen = 100  # Maximum length of the sequences

# Load IMDB data (only train data)
(x_train, _), (_, _) = imdb.load_data(num_words=max_words)

# Pad sequences to ensure uniform length
x_train = pad_sequences(x_train, maxlen=maxlen)

# Step 2: Define the Autoencoder model architecture
input_dim = maxlen  # Input dimension: length of the padded sentences
embedding_dim = 128  # Embedding dimension for the input data
latent_dim = 32  # Dimensionality of the compressed representation

# Encoder
inputs = Input(shape=(input_dim,))
x = Embedding(input_dim=max_words, output_dim=embedding_dim, input_length=input_dim)(inputs)
x = LSTM(latent_dim, activation='relu')(x)

# Bottleneck (compressed representation)
encoded = Dense(latent_dim, activation='relu')(x)

# Decoder
x = RepeatVector(input_dim)(encoded)  # Repeat the compressed vector
x = LSTM(embedding_dim, return_sequences=True)(x)
x = Dense(max_words, activation='softmax')(x)

# Full Autoencoder Model
autoencoder = Model(inputs, x)

# Step 3: Compile and train the Autoencoder model
autoencoder.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Training the autoencoder (only on input data as output)
autoencoder.fit(x_train, np.expand_dims(x_train, -1), epochs=3, batch_size=64)

# Step 4: Use the trained model to generate compressed representations
encoder_model = Model(inputs, encoded)  # Extract encoder part for compression

# Generate compressed representation for the first sentence
compressed_representation = encoder_model.predict(x_train[:1])

print("Compressed representation of the first sentence: ", compressed_representation)


Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz
17464789/17464789 ━━━━━━━━━━━━━━━━━━━━ 2s 0us/step

Epoch 1/3
391/391 ━━━━━━━━━━━━━━━━━━━━ 76s 176ms/step - accuracy: 0.0473 - loss: 6.9585
Epoch 2/3
 74/391 ━━━━━━━━━━━━━━━━━━━━ 56s 177ms/step - accuracy: 0.0523 - loss: 6.3595

KeyboardInterrupt: 
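
The padding step in the cell above relies on the defaults of pad_sequences, which pad and truncate at the start of each sequence. The sketch below mimics that behavior in plain Python for a single sequence so the effect of maxlen is easy to see. The helper pad_pre is hypothetical and only illustrates the idea; it is not the Keras implementation.

```python
def pad_pre(sequence, maxlen, value=0):
    """Keep the last `maxlen` items, then left-pad with `value` up to `maxlen`."""
    truncated = sequence[-maxlen:]
    return [value] * (maxlen - len(truncated)) + truncated

print(pad_pre([5, 8, 3], maxlen=5))           # [0, 0, 5, 8, 3]
print(pad_pre([1, 2, 3, 4, 5, 6], maxlen=4))  # [3, 4, 5, 6]
```

Because short reviews are filled with index 0, the autoencoder's reconstruction targets also contain those padding positions.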

3. Use the Hugging Face transformers library to fine-tune a pre-trained GPT-2
model on custom text data and generate text

In [4]:
!pip install transformers  datasets torch sentencepiece



In [5]:
import torch
from transformers import (
    GPT2LMHeadModel,
    GPT2Tokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset


# Step 1: Load your custom dataset
# Assumes the custom dataset is a plain-text file (text_data.txt) in which
# each line is one training instance
dataset = load_dataset("text", data_files={"train": "text_data.txt"})


# Step 2: Load the pre-trained model and tokenizer

model_name = "gpt2"
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)

# GPT-2 has no padding token, so reuse the end-of-sequence token for padding
tokenizer.pad_token = tokenizer.eos_token


# Step 3: Tokenize the dataset

def tokenize_function(examples):
    # max_length=128 is an arbitrary cap; adjust it to your data
    return tokenizer(examples["text"], truncation=True, max_length=128)

# Drop empty lines, then replace the raw text column with token IDs
train_dataset = dataset["train"].filter(lambda example: example["text"].strip() != "")
train_dataset = train_dataset.map(tokenize_function, batched=True, remove_columns=["text"])

# mlm=False selects causal language modeling: the collator pads each batch
# and builds the labels from the input IDs
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)


# Step 4: Define the training arguments

training_args = TrainingArguments(
    output_dir="./gpt2-finetuned",
    num_train_epochs=3,        # Number of epochs
    per_device_train_batch_size=4,  # Batch size
    save_steps=10_000,         # Save model every 10k steps
    save_total_limit=2,        # Keep only 2 saved models
    logging_dir='./logs',      # Log directory for training logs
    logging_steps=200,         # Log every 200 steps
    prediction_loss_only=True,
)

# Step 5: Initialize the trainer

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_dataset,
)

# Step 6: Train the model

trainer.train()


# Step 7: Save the fine-tuned model

model.save_pretrained("./gpt2-finetuned")
tokenizer.save_pretrained("./gpt2-finetuned")


# Step 8: Generate text using the fine-tuned model

def generate_text(prompt, max_length=100):
    # Move the prompt to the model's device (the Trainer may have placed the model on a GPU)
    input_ids = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
    output = model.generate(input_ids, max_length=max_length, num_return_sequences=1, pad_token_id=tokenizer.eos_token_id)
    generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
    return generated_text

# Example usage
prompt = "Once upon a time"
generated_text = generate_text(prompt)
print(generated_text)

4.  Implement a text generation model using a simple Recurrent Neural
Network (RNN) in Keras. Train the model on custom data and generate a
word

In [7]:
!pip install tensorflow




In [8]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.utils import pad_sequences
from tensorflow.keras.optimizers import Adam

# Sample dataset: You can replace this with your custom dataset
text = """
Once upon a time, in a land far, far away, there was a kingdom ruled by a wise king.
The king had many subjects, but he cared most for the well-being of his people.
Every year, the king would gather his advisors to discuss the future of the kingdom.
One day, the king decided to go on an adventure, and he asked his subjects to join him.
"""

# Step 1: Preprocess the text data
# Tokenize the text
tokenizer = Tokenizer()
tokenizer.fit_on_texts([text])
total_words = len(tokenizer.word_index) + 1  # Adding 1 for padding token

# Convert the text into sequences of words
sequences = []
for line in text.split('\n'):
    token_list = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(token_list)):
        seq_in = token_list[:i]
        seq_out = token_list[i]
        sequences.append(seq_in + [seq_out])

# Step 2: Pad sequences to make them the same length
max_sequence_length = max([len(seq) for seq in sequences])
X = []
y = []

for seq in sequences:
    X.append(seq[:-1])
    y.append(seq[-1])

X = pad_sequences(X, maxlen=max_sequence_length-1)
y = np.array(y)

# Step 3: Define the RNN model
model = Sequential()
model.add(Embedding(total_words, 50, input_length=max_sequence_length-1))
model.add(SimpleRNN(100, activation='relu', return_sequences=False))
model.add(Dense(total_words, activation='softmax'))

# Step 4: Compile and train the model
model.compile(loss='sparse_categorical_crossentropy', optimizer=Adam(), metrics=['accuracy'])
model.fit(X, y, epochs=100, batch_size=32)

# Step 5: Generate text using the trained model
def generate_text(seed_text, model, tokenizer, max_sequence_length, n_words=50):
    for _ in range(n_words):
        # Tokenize the seed text
        token_list = tokenizer.texts_to_sequences([seed_text])[0]
        token_list = pad_sequences([token_list], maxlen=max_sequence_length-1, padding='pre')

        # Predict the next word
        predicted_probs = model.predict(token_list, verbose=0)
        predicted_word_index = np.argmax(predicted_probs)
        predicted_word = tokenizer.index_word[predicted_word_index]

        # Add the predicted word to the seed text
        seed_text += ' ' + predicted_word

    return seed_text

# Step 6: Test the model by generating text
seed_text = "Once upon a time"
generated_text = generate_text(seed_text, model, tokenizer, max_sequence_length)
print(generated_text)


Epoch 1/100
2/2 ━━━━━━━━━━━━━━━━━━━━ 6s 22ms/step - accuracy: 0.0000e+00 - loss: 3.8933
Epoch 2/100
2/2 ━━━━━━━━━━━━━━━━━━━━ 1s 14ms/step - accuracy: 0.0312 - loss: 3.8782
Epoch 3/100
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.1250 - loss: 3.8627
Epoch 4/100
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 14ms/step - accuracy: 0.1875 - loss: 3.8505
Epoch 5/100
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 28ms/step - accuracy: 0.3021 - loss: 3.8408
Epoch 6/100
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 21ms/step - accuracy: 0.3333 - loss: 3.8226
Epoch 7/100
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step - accuracy: 0.3333 - loss: 3.8082
Epoch 8/100
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 14ms/step - accuracy: 0.3646 - loss: 3.7843
Epoch 9/100
2/2 ━━━━━━━━━━━━━━━━━━━━
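
The preprocessing in the cell above turns each line into a set of (prefix, next word) training pairs and left-pads the prefixes to a common length. The sketch below reproduces that construction in plain Python on a made-up list of word indices; the values in tokens and the helper names are assumptions chosen only for illustration.

```python
def make_training_pairs(token_list):
    """Return (prefix, next_token) pairs for every position after the first."""
    return [(token_list[:i], token_list[i]) for i in range(1, len(token_list))]

def pad_pre(prefix, length, value=0):
    """Left-pad a prefix with `value` so that it has `length` items."""
    return [value] * (length - len(prefix)) + prefix

tokens = [4, 7, 2, 9]  # hypothetical word indices for one line of text
pairs = make_training_pairs(tokens)
max_prefix = max(len(prefix) for prefix, _ in pairs)

X = [pad_pre(prefix, max_prefix) for prefix, _ in pairs]
y = [target for _, target in pairs]

print(X)  # [[0, 0, 4], [0, 4, 7], [4, 7, 2]]
print(y)  # [7, 2, 9]
```

A line of n words therefore contributes n - 1 training examples, which is why a four-sentence corpus still yields dozens of samples.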

5.  Write a program to generate a sequence of text using an LSTM-based
model in TensorFlow, trained on a custom dataset of sentences

In [9]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.utils import pad_sequences
from tensorflow.keras.optimizers import Adam

# Sample custom text dataset (you can replace this with your own dataset)
text = """
Once upon a time, in a land far, far away, there was a kingdom ruled by a wise king.
The king had many subjects, but he cared most for the well-being of his people.
Every year, the king would gather his advisors to discuss the future of the kingdom.
One day, the king decided to go on an adventure, and he asked his subjects to join him.
"""

# Step 1: Preprocess the data
# Tokenize the text data
tokenizer = Tokenizer()
tokenizer.fit_on_texts([text])
total_words = len(tokenizer.word_index) + 1  # Adding 1 for padding token

# Convert the text into sequences of words
sequences = []
for line in text.split('\n'):
    token_list = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(token_list)):
        seq_in = token_list[:i]
        seq_out = token_list[i]
        sequences.append(seq_in + [seq_out])

# Step 2: Prepare data for the model
# Max sequence length
max_sequence_length = max([len(seq) for seq in sequences])
X = []
y = []

# Create input (X) and output (y) sequences
for seq in sequences:
    X.append(seq[:-1])  # Input: all but last word
    y.append(seq[-1])   # Output: last word

# Pad sequences to make them the same length
X = pad_sequences(X, maxlen=max_sequence_length-1, padding='pre')
y = np.array(y)

# Step 3: Define the LSTM-based model
model = Sequential()
model.add(Embedding(total_words, 50, input_length=max_sequence_length-1))
model.add(LSTM(100, activation='relu', return_sequences=False))
model.add(Dense(total_words, activation='softmax'))

# Step 4: Compile the model
model.compile(loss='sparse_categorical_crossentropy', optimizer=Adam(), metrics=['accuracy'])

# Step 5: Train the model
model.fit(X, y, epochs=100, batch_size=32)

# Step 6: Generate text using the trained model
def generate_text(seed_text, model, tokenizer, max_sequence_length, n_words=50):
    for _ in range(n_words):
        # Tokenize the seed text
        token_list = tokenizer.texts_to_sequences([seed_text])[0]
        token_list = pad_sequences([token_list], maxlen=max_sequence_length-1, padding='pre')

        # Predict the next word
        predicted_probs = model.predict(token_list, verbose=0)
        predicted_word_index = np.argmax(predicted_probs)
        predicted_word = tokenizer.index_word[predicted_word_index]

        # Add the predicted word to the seed text
        seed_text += ' ' + predicted_word

    return seed_text

# Step 7: Test the model by generating text
seed_text = "Once upon a time"
generated_text = generate_text(seed_text, model, tokenizer, max_sequence_length)
print(generated_text)


Epoch 1/100
2/2 ━━━━━━━━━━━━━━━━━━━━ 5s 15ms/step - accuracy: 0.0208 - loss: 3.8923
Epoch 2/100
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.0521 - loss: 3.8888
Epoch 3/100
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step - accuracy: 0.1250 - loss: 3.8847
Epoch 4/100
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step - accuracy: 0.1042 - loss: 3.8803
Epoch 5/100
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step - accuracy: 0.1458 - loss: 3.8762
Epoch 6/100
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - accuracy: 0.1354 - loss: 3.8715
Epoch 7/100
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step - accuracy: 0.1250 - loss: 3.8670
Epoch 8/100
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - accuracy: 0.1146 - loss: 3.8602
Epoch 9/100
2/2 ━━━━━━━━━━━━━━━━━━━━
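
The generate_text function above always takes the argmax of the predicted distribution, so the same seed text produces the same continuation every time. As a hedged illustration, the sketch below contrasts that greedy choice with sampling from the distribution, using a made-up three-word probability vector; pick_greedy and pick_sampled are hypothetical helpers, not Keras functions.

```python
import random

def pick_greedy(probs):
    """Return the index of the most probable word (what np.argmax does)."""
    return max(range(len(probs)), key=probs.__getitem__)

def pick_sampled(probs, rng):
    """Draw one index at random, weighted by the predicted probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

probs = [0.1, 0.6, 0.3]  # hypothetical softmax output over three words
rng = random.Random(42)

print(pick_greedy(probs))  # 1
samples = [pick_sampled(probs, rng) for _ in range(1000)]
print({index: samples.count(index) for index in range(len(probs))})
```

Greedy decoding tends to loop on small corpora; sampling trades some coherence for variety.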

6. Build a program that uses GPT-2 from Hugging Face to generate a story
based on a custom prompt

In [10]:
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load pre-trained GPT-2 model and tokenizer from Hugging Face
model_name = "gpt2"  # You can use "gpt2-medium", "gpt2-large" for larger models

tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)

# Make sure the model is in evaluation mode
model.eval()

# Function to generate story
def generate_story(prompt, max_length=200):
    # Encode the input prompt text to tokens
    input_ids = tokenizer.encode(prompt, return_tensors='pt')

    # Generate text using GPT-2
    with torch.no_grad():
        output = model.generate(
            input_ids,
            max_length=max_length,  # Maximum length of the generated story
            num_return_sequences=1,  # Generate 1 story
            no_repeat_ngram_size=2,  # Prevent repetition of n-grams
            temperature=0.7,  # Control randomness: lower is more deterministic
            top_p=0.9,  # Use nucleus sampling
            top_k=50,  # Number of top tokens to sample from
            do_sample=True,  # Enable sampling for randomness
            pad_token_id=tokenizer.eos_token_id  # Use EOS token for padding
        )

    # Decode the generated tokens into a human-readable text
    story = tokenizer.decode(output[0], skip_special_tokens=True)

    return story

# Example usage:
prompt = "Once upon a time, in a land far away, there was a kingdom ruled by a wise king"
story = generate_story(prompt)

print("Generated Story:")
print(story)


The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


Generated Story:
Once upon a time, in a land far away, there was a kingdom ruled by a wise king. And they came to him, and he was king over all the land. So he said unto him: Behold, my lord, if you will not speak a word against me, I will slay you. Then he did as I commanded him.

Now, let us not forget, for a long time after this, that this king was also the king of Israel. For he had no king, but the man of the sea, who was called the son of Nun, as it were. He had a son named A'ishah, named after the god of war, of a country called Canaan. That man was the God of War. But the Israelites were not able to speak against him because he, when he spoke against them, had the power of killing them. Now A-ishiah was not the father of Aqdas, the Canaanite, nor
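
The top_p=0.9 setting used above restricts sampling to the most probable tokens whose combined probability reaches the threshold. The sketch below shows that idea on a made-up four-token distribution in plain Python. It is a simplified illustration of nucleus filtering under stated assumptions, not the transformers implementation, and top_p_filter is a hypothetical helper.

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of highest-probability tokens whose cumulative
    probability reaches top_p, then renormalize the kept probabilities."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for index in ranked:
        kept.append(index)
        cumulative += probs[index]
        if cumulative >= top_p:
            break
    total = sum(probs[index] for index in kept)
    return {index: probs[index] / total for index in kept}

probs = [0.5, 0.3, 0.15, 0.05]  # hypothetical next-token probabilities
filtered = top_p_filter(probs, top_p=0.9)
print(filtered)  # token 3 (probability 0.05) is excluded
```

Lower top_p values shrink the candidate set and make the story more predictable; values near 1.0 leave almost the full vocabulary available.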


7. Write code to implement a simple text generation model using a GRU-based
architecture in Keras

In [12]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GRU, Dense, Dropout
from tensorflow.keras.optimizers import Adam

# Load the text dataset (Here, we are using a sample text dataset)
text = "This is an example sentence for the text generation model. " * 50  # Example data

# Create a set of unique characters in the text
chars = sorted(set(text))
char_to_index = {char: idx for idx, char in enumerate(chars)}
index_to_char = {idx: char for idx, char in enumerate(chars)}

# Convert text to integer sequences
sequences = []
next_chars = []
seq_length = 40  # Length of each input sequence

for i in range(0, len(text) - seq_length):
    sequences.append(text[i: i + seq_length])
    next_chars.append(text[i + seq_length])

# Vectorize the sequences
X = np.zeros((len(sequences), seq_length, len(chars)), dtype=np.bool_)
y = np.zeros((len(sequences), len(chars)), dtype=np.bool_)

for i, seq in enumerate(sequences):
    for j, char in enumerate(seq):
        X[i, j, char_to_index[char]] = 1
    y[i, char_to_index[next_chars[i]]] = 1

# Define the GRU-based model
model = Sequential()

# GRU layer with 128 units
model.add(GRU(128, input_shape=(seq_length, len(chars)), return_sequences=False))

# Dropout for regularization
model.add(Dropout(0.2))

# Output layer with softmax activation to predict the next character
model.add(Dense(len(chars), activation='softmax'))

# Compile the model
model.compile(optimizer=Adam(), loss='categorical_crossentropy')

# Train the model
model.fit(X, y, batch_size=128, epochs=10, verbose=1)

# Function to generate text using the trained model
def generate_text(model, length, seed_text, temperature=1.0):
    # Generate text starting from seed_text
    generated_text = seed_text
    for _ in range(length):
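        # Prepare the input for prediction: take the last seq_length characters and
        # right-align them, so a seed shorter than seq_length ends at the final
        # timestep, just as the training windows do
        window = generated_text[-seq_length:]
        offset = seq_length - len(window)
        x_pred = np.zeros((1, seq_length, len(chars)))
        for t, char in enumerate(window):
            x_pred[0, offset + t, char_to_index[char]] = 1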

        # Predict the next character
        preds = model.predict(x_pred, verbose=0)[0]

        # Apply temperature to the predictions (control randomness)
        preds = np.asarray(preds).astype('float64')
        preds = np.log(preds + 1e-7) / temperature
        exp_preds = np.exp(preds)
        preds = exp_preds / np.sum(exp_preds)

        # Sample a character index
        next_index = np.random.choice(len(chars), p=preds)
        next_char = index_to_char[next_index]

        # Append the predicted character to the generated text
        generated_text += next_char

    return generated_text

# Example of text generation
seed_text = "This is an"
generated_text = generate_text(model, length=200, seed_text=seed_text, temperature=1.0)

print("Generated text:")
print(generated_text)


Epoch 1/10
23/23 ━━━━━━━━━━━━━━━━━━━━ 3s 8ms/step - loss: 2.9322
Epoch 2/10
23/23 ━━━━━━━━━━━━━━━━━━━━ 1s 10ms/step - loss: 2.5903
Epoch 3/10
23/23 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - loss: 2.3851
Epoch 4/10
23/23 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step - loss: 1.9832
Epoch 5/10
23/23 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - loss: 1.3929
Epoch 6/10
23/23 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - loss: 0.7961
Epoch 7/10
23/23 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - loss: 0.3877
Epoch 8/10
23/23 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - loss: 0.1807
Epoch 9/10
23/23 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - loss: 0.0932
Epoch 10/10
23/23 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - loss: 0.0587
Generated text:
T
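
The data preparation in the cell above slides a fixed-length window over the text and one-hot encodes each character. The sketch below shows the same steps on a short made-up string in plain Python, so the window/target pairing is visible without NumPy. The string and the small seq_length are assumptions chosen for readability.

```python
text = "hello world"  # hypothetical toy corpus
seq_length = 4        # much shorter than the 40 used above, for readability

chars = sorted(set(text))
char_to_index = {char: idx for idx, char in enumerate(chars)}

# Each window of seq_length characters is paired with the character that follows it
windows = [text[i:i + seq_length] for i in range(len(text) - seq_length)]
targets = [text[i + seq_length] for i in range(len(text) - seq_length)]

def one_hot(char):
    """Return a list with a single 1 at the character's vocabulary index."""
    vector = [0] * len(chars)
    vector[char_to_index[char]] = 1
    return vector

print(windows[0], "->", targets[0])  # hell -> o
print(one_hot(targets[0]))
```

Every training input has exactly seq_length characters, so at generation time the model behaves most predictably when it is given a window of the same length.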

8.  Create a script that implements GPT-2-based text generation with beam
search decoding

In [14]:
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load pre-trained GPT-2 model and tokenizer
model_name = "gpt2"
model = GPT2LMHeadModel.from_pretrained(model_name)
tokenizer = GPT2Tokenizer.from_pretrained(model_name)

# Prepare the model for inference
model.eval()

# Define the generation function with beam search
def generate_text_with_beam_search(prompt, num_beams=5, length_penalty=1.0, max_length=100):
    # Encode the prompt text
    inputs = tokenizer(prompt, return_tensors="pt")
    input_ids = inputs["input_ids"]

    # Generate text using beam search
    output = model.generate(
        input_ids=input_ids,
        num_beams=num_beams,
        length_penalty=length_penalty,
        max_length=max_length,
        early_stopping=True
    )

    # Decode the generated token IDs back into text
    generated_text = tokenizer.decode(output[0], skip_special_tokens=True)

    return generated_text

# Example usage
prompt = "Once upon a time in a land far, far away,"
generated_text = generate_text_with_beam_search(prompt, num_beams=5, length_penalty=1.0, max_length=100)

# Output the generated text
print("Generated text:")
print(generated_text)


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Generated text:
Once upon a time in a land far, far away, and far away, there was a man, a man, a man, a man, a man, a man, a man, a man, a man, a man, a man, a man, a man, a man, a man, a man, a man, a man, a man, a man, a man, a man, a man, a man, a man, a man, a man, a
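
Beam search keeps the num_beams highest-scoring partial sequences at every step instead of committing to the single best next token. The sketch below runs the procedure on a tiny hand-written bigram table in plain Python; the NEXT table, its probabilities, and the beam_search helper are all hypothetical and unrelated to GPT-2's real vocabulary or to the transformers implementation.

```python
import math

# Hypothetical next-token probabilities for a toy bigram model
NEXT = {
    "<s>": {"a": 0.6, "the": 0.4},
    "a": {"man": 0.5, "cat": 0.5},
    "the": {"king": 0.9, "cat": 0.1},
    "man": {"</s>": 1.0},
    "cat": {"</s>": 1.0},
    "king": {"</s>": 1.0},
}

def beam_search(num_beams, max_steps=5):
    """Return the surviving beams as (log-probability, tokens), best first."""
    beams = [(0.0, ["<s>"])]
    for _ in range(max_steps):
        candidates = []
        for score, tokens in beams:
            if tokens[-1] == "</s>":
                candidates.append((score, tokens))  # finished beams carry over
                continue
            for token, prob in NEXT[tokens[-1]].items():
                candidates.append((score + math.log(prob), tokens + [token]))
        # Keep only the num_beams highest-scoring candidates
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:num_beams]
        if all(tokens[-1] == "</s>" for _, tokens in beams):
            break
    return beams

greedy = beam_search(num_beams=1)[0]
best = beam_search(num_beams=2)[0]
print(" ".join(greedy[1]), round(math.exp(greedy[0]), 2))  # <s> a man </s> 0.3
print(" ".join(best[1]), round(math.exp(best[0]), 2))      # <s> the king </s> 0.36
```

With one beam the search follows the locally best first token and ends at probability 0.3, while two beams recover the globally better sequence at 0.36. Beam search maximizes likelihood, which is also why the GPT-2 output above collapses into a repeated phrase; the no_repeat_ngram_size argument used in the other cells of this notebook is one way to counter that.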


9.  Implement a text generation script using GPT-2 with a custom temperature
setting for diversity in output text

In [15]:
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load pre-trained GPT-2 model and tokenizer
model_name = "gpt2"
model = GPT2LMHeadModel.from_pretrained(model_name)
tokenizer = GPT2Tokenizer.from_pretrained(model_name)

# Prepare the model for inference
model.eval()

# Define the text generation function with custom temperature
def generate_text_with_temperature(prompt, temperature=1.0, max_length=100):
    # Encode the prompt text
    inputs = tokenizer(prompt, return_tensors="pt")
    input_ids = inputs["input_ids"]

    # Generate text using a custom temperature setting
    output = model.generate(
        input_ids=input_ids,
        temperature=temperature,  # Custom temperature setting for diversity
        max_length=max_length,
        num_return_sequences=1,  # Generate only one sequence
        no_repeat_ngram_size=2,  # Prevent repeating n-grams
        top_p=0.92,  # Top-p sampling for more diverse output
        top_k=50,  # Top-k sampling (optional, can help improve diversity)
        do_sample=True,  # Enable sampling (required for temperature)
    )

    # Decode the generated token IDs back into text
    generated_text = tokenizer.decode(output[0], skip_special_tokens=True)

    return generated_text

# Example usage
prompt = "In a world where technology and magic coexist,"
generated_text = generate_text_with_temperature(prompt, temperature=0.7, max_length=150)

# Output the generated text
print("Generated text:")
print(generated_text)


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Generated text:
In a world where technology and magic coexist, there is no shortage of magic to help you take control of the universe.

This is the magical universe of Twilight Sparkle. Her powers are the same as the rest of her friends, but she has special abilities and abilities that give her the power to travel through time, change the world, and transform things in the process. She has been transformed into a unicorn.


Contents show]
. . .


. .. .

 . ..

 (1)
: . ."
,



10.  Create a script to implement temperature sampling with GPT-2,
experimenting with different values to generate creative text

In [16]:
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load pre-trained GPT-2 model and tokenizer
model_name = "gpt2"
model = GPT2LMHeadModel.from_pretrained(model_name)
tokenizer = GPT2Tokenizer.from_pretrained(model_name)

# Prepare the model for inference
model.eval()

# Define the text generation function with custom temperature
def generate_text_with_temperature(prompt, temperature=1.0, max_length=100):
    # Encode the prompt text
    inputs = tokenizer(prompt, return_tensors="pt")
    input_ids = inputs["input_ids"]

    # Generate text using a custom temperature setting
    output = model.generate(
        input_ids=input_ids,
        temperature=temperature,  # Custom temperature setting for diversity
        max_length=max_length,
        num_return_sequences=1,  # Generate only one sequence
        no_repeat_ngram_size=2,  # Prevent repeating n-grams
        top_p=0.92,  # Top-p sampling for more diverse output
        top_k=50,  # Top-k sampling (optional, can help improve diversity)
        do_sample=True,  # Enable sampling (required for temperature)
    )

    # Decode the generated token IDs back into text
    generated_text = tokenizer.decode(output[0], skip_special_tokens=True)

    return generated_text

# Example usage
prompt = "Once upon a time in a faraway kingdom,"

# Experiment with different temperature values
temperature_values = [0.5, 0.7, 1.0, 1.5]

# Generate and print text for each temperature setting
for temp in temperature_values:
    print(f"--- Generating text with temperature = {temp} ---")
    generated_text = generate_text_with_temperature(prompt, temperature=temp, max_length=150)
    print(generated_text)
    print("\n" + "-"*50 + "\n")


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


--- Generating text with temperature = 0.5 ---



Once upon a time in a faraway kingdom, the king was a man of the people, and he had a great deal of power, but he was not a king of men, nor was he a ruler of a nation. The king, however, was an emperor of his people.

The king had many things to do with the kingdom. He was the ruler, not the master. His dominion was limited to the land of Canaan. In the midst of all this, he also had the power to grant gifts to his subjects. These gifts were given to those who were worthy of them, to whom the gift was given. Thus, a gift from God was granted to a person who was worthy to receive it. For example, if

--------------------------------------------------

--- Generating text with temperature = 0.7 ---



Once upon a time in a faraway kingdom, the most powerful king of the land came down from the skies and commanded the armies of his people. He was the son of a noble man, a great warrior, and a very good man. But he had a terrible curse upon his soul, because he was a woman, who had been cursed by her husband for being so beautiful. She had an unblemished eye, but she was still a beautiful woman. And she had no right to be here with us. So she went to her king, to his father, saying, "My wife is here. We must go and have sex, if we have any of her." And he said, Let's go, then.

The

--------------------------------------------------

--- Generating text with temperature = 1.0 ---



Once upon a time in a faraway kingdom, the people of that state are called upon to aid one another and cooperate. Now it was said: If the inhabitants of the land are willing to take their share, that they might never be left behind by the powers of darkness. When the kingdom was once governed by a king who had not forsaken a royal oath, he became king himself, but did not give in to his authority. It was only when the power of his will became known, to which he replied that "The people are free from darkness, and do not seek to break their oaths with the aid of a man whose power is not so large and his might so great."

In the midst of all the rest,

--------------------------------------------------

--- Generating text with temperature = 1.5 ---
Once upon a time in a faraway kingdom, you find a new city, a better place to hide your bad dreams; but a bad king will still try to hold you forever before he'll allow you to walk free. A brave warrior from a world where they had made you th
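
Temperature divides the model's logits before the softmax, which is equivalent to raising each probability to the power 1/T and renormalizing. The sketch below applies that rescaling to a made-up three-token distribution in plain Python to show why low temperatures give safer text and high temperatures give more surprising text. The probabilities and the apply_temperature helper are assumptions for illustration only.

```python
import math

def apply_temperature(probs, temperature):
    """Rescale a distribution as p_i ** (1 / T), then renormalize."""
    scaled = [math.exp(math.log(p) / temperature) for p in probs]
    total = sum(scaled)
    return [value / total for value in scaled]

probs = [0.7, 0.2, 0.1]  # hypothetical next-token probabilities

sharp = apply_temperature(probs, 0.5)  # T < 1 concentrates mass on the top token
same = apply_temperature(probs, 1.0)   # T = 1 leaves the distribution unchanged
flat = apply_temperature(probs, 1.5)   # T > 1 spreads mass toward unlikely tokens

for temperature, dist in ((0.5, sharp), (1.0, same), (1.5, flat)):
    print(temperature, [round(p, 3) for p in dist])
```

This matches the pattern in the outputs above: the temperature 0.5 sample stays close to familiar phrasing, while the temperature 1.5 sample drifts into less likely word choices.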