# TEXT GENERATION USING PRETRAINED TRANSFORMER-BASED LANGUAGE MODEL

_**Using a pretrained transformer-based language model with the Hugging Face Transformers library to generate new story-like text from a prompt such as "Once upon a time...".**_

This experiment uses GPT-2 small, the smallest generative pretrained transformer language model in OpenAI's GPT-2 family (https://huggingface.co/openai-community/gpt2). With just 124 million parameters, it keeps inference tractable on commodity computers.
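
As a rough check on the 124 million figure, the parameter count can be reproduced from the architecture's dimensions. The sketch below assumes the published GPT-2 small configuration (a vocabulary of 50,257 tokens, 1,024 positions, hidden size 768, 12 layers) and assumes that the output head shares the token-embedding matrix. The function name `gpt2_param_count` is made up for this illustration.

```python
def gpt2_param_count(vocab_size=50257, n_positions=1024, d_model=768, n_layers=12):
    """Counts trainable parameters of a GPT-2 style decoder with a tied output head."""
    d_ff = 4 * d_model                                  # feed-forward width used by GPT-2

    token_emb = vocab_size * d_model                    # token embedding matrix
    pos_emb = n_positions * d_model                     # learned position embeddings

    layer_norm = 2 * d_model                            # scale + bias
    attn_qkv = d_model * 3 * d_model + 3 * d_model      # fused query/key/value projection
    attn_out = d_model * d_model + d_model              # attention output projection
    mlp_in = d_model * d_ff + d_ff                      # first feed-forward layer
    mlp_out = d_ff * d_model + d_model                  # second feed-forward layer
    per_layer = 2 * layer_norm + attn_qkv + attn_out + mlp_in + mlp_out

    final_norm = 2 * d_model                            # layer norm after the last block
    return token_emb + pos_emb + n_layers * per_layer + final_norm


total = gpt2_param_count()
print(f"{total:,} parameters (~{total / 1e6:.0f} million)")
```

Most of the count sits in the 12 transformer blocks (about 85 million), with the token embeddings contributing roughly another 39 million.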

## Importing Packages

In [None]:
from transformers import pipeline, set_seed
from transformers import GPT2Tokenizer, TFGPT2LMHeadModel

2025-12-06 16:40:05.250267: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2025-12-06 16:40:05.250705: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-12-06 16:40:05.290765: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.

In [None]:
# Sets the prompt text that the model will continue
prompt = "Once upon a time in China"

## Text Generation using Pipeline (Inference API)

In [None]:
# Sets seed in random, numpy, tf and/or torch (if installed)
set_seed(42)

# Initializes a pipeline; `pipeline()` is a wrapper around all the task-specific pipelines
generator_pipeline = pipeline(
    task="text-generation",     # Task (such as "summarization", "question-answering") to return pipeline for
    model="gpt2"                # The model that will be used by the pipeline to make predictions
    )

Device set to use cpu


In [None]:
# Generates content (sequences) based on the prompt provided
# NOTE: The following step may take a few minutes to complete

generated_sequences = generator_pipeline(
    text_inputs=prompt,         # Input prompt that the generated sequences continue
    max_length=200,             # Upper bound on total tokens (prompt + generated); ignored when `max_new_tokens` is also set
    truncation=True,            # Truncates the input prompt if it exceeds the maximum length
    num_return_sequences=5      # Total number of generated sequences to return
    )

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
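
The warning above reflects how the two length limits interact: `max_length` bounds the prompt plus the generated tokens, while `max_new_tokens` bounds only the generated tokens and wins when both are set. The following is a minimal sketch of that rule, not the library's implementation. `new_token_budget` is a hypothetical helper, and the prompt length of 6 tokens matches the encoded prompt used in this notebook.

```python
def new_token_budget(prompt_len, max_length=None, max_new_tokens=None):
    """Returns how many tokens may be generated beyond the prompt.

    Illustrative rule only: `max_new_tokens` takes precedence over `max_length`.
    """
    if max_new_tokens is not None:
        return max_new_tokens
    if max_length is not None:
        return max(max_length - prompt_len, 0)
    raise ValueError("set max_length or max_new_tokens")


prompt_len = 6  # "Once upon a time in China" encodes to 6 GPT-2 tokens

print(new_token_budget(prompt_len, max_length=200))                      # 194
print(new_token_budget(prompt_len, max_length=200, max_new_tokens=256))  # 256
```

Passing `max_new_tokens` explicitly is one way to make the intended limit unambiguous.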


In [5]:
# Prints the pipeline generated sequences

for idx, sequence in enumerate(generated_sequences):
    print(idx+1, "\t" + sequence['generated_text'])
    print('-' * 80, "\n")

1 	Once upon a time in China, you can be sure that you will not be treated like a beggar.

You can't just go around the country and buy a car and then go away. You must be accompanied by a person who knows what you can do.

In general, you should stay in a hotel until you are in a decent place to live, for the time being. The longer you stay, the more likely you are to have problems. You may even go to bed in a hotel, when you are not in a good mood.

If you are unhappy in the city, you need to get out of the country, and find a place to live.

In general, you should stay in a hotel until you are in a decent place to live, for the time being. The longer you stay, the more likely you are to have problems. You may even go to bed in a hotel, when you are not in a good mood. If you are in poverty, you need to get out of the city, and find a place to live.

A few things you should take into consideration when deciding to go to the city:

If you are in a city where the government has decided

## Text Generation using Model-Specific Classes

In [None]:
# Initializes the GPT-2 transformer with a language modeling head
# (a linear layer with weights tied to the input embeddings) on top
model = TFGPT2LMHeadModel.from_pretrained(
    "gpt2",                 # Name of the pretrained model (or path to a saved model)
    use_safetensors=False   # `True` is recommended; `False` is used here for Keras 2.x compatibility in a Keras 3.x installation
    )

All model checkpoint layers were used when initializing TFGPT2LMHeadModel.

All the layers of TFGPT2LMHeadModel were initialized from the model checkpoint at gpt2.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFGPT2LMHeadModel for predictions without further training.
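
Weight tying means the output layer reuses the token-embedding matrix rather than learning a separate one: the score (logit) for each vocabulary entry is the dot product of the final hidden state with that entry's embedding. Here is a toy sketch with a made-up 4-token vocabulary and 3-dimensional embeddings. The numbers are arbitrary and are not GPT-2 weights.

```python
# Toy embedding matrix: one row per vocabulary entry (4 tokens, 3 dimensions).
embeddings = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]


def embed(token_id):
    """Input side: looks up the embedding row for a token id."""
    return embeddings[token_id]


def lm_head(hidden_state):
    """Output side: scores every token with the SAME matrix (hidden @ E^T)."""
    return [sum(h * e for h, e in zip(hidden_state, row)) for row in embeddings]


hidden = [0.5, 2.0, -1.0]       # stand-in for the transformer's final hidden state
logits = lm_head(hidden)
print(logits)                    # one score per vocabulary entry
```

Because both directions share one matrix, the language modeling head adds no extra weight matrix to the model's parameter count.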


In [9]:
# Initializes model-specific tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

In [10]:
# Tokenizes and encodes the prompt text
encoded_prompt = tokenizer.encode(
    prompt,                     # Prompt text to be encoded
    add_special_tokens=False,   # Does not add special tokens (GPT-2 uses `<|endoftext|>` as both its start and end token)
    return_tensors="tf"         # Returns TensorFlow tf.constant objects
    )

# Prints the encoded prompt
print(encoded_prompt)

tf.Tensor([[7454 2402  257  640  287 2807]], shape=(1, 6), dtype=int32)
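
Each id in the tensor above indexes one entry of GPT-2's byte-level BPE vocabulary. For this prompt every word happens to be a single token, and tokens after the first carry their leading space. The sketch below uses a hand-written six-entry lookup table (a stand-in for the real vocabulary that `tokenizer.decode` consults) to show how the ids map back to text.

```python
# Hand-written excerpt of the vocabulary, limited to the six ids printed above.
id_to_token = {
    7454: "Once",
    2402: " upon",
    257: " a",
    640: " time",
    287: " in",
    2807: " China",
}

encoded = [7454, 2402, 257, 640, 287, 2807]

# Decoding concatenates the token strings; the spaces are part of the tokens.
decoded = "".join(id_to_token[i] for i in encoded)
print(decoded)
```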


In [11]:
# Generates sequences
# NOTE: Passing `attention_mask` and `pad_token_id=tokenizer.eos_token_id` should avoid the warning printed below
generated_sequences = model.generate(
    input_ids=encoded_prompt,       # Prompt for the generated sequences to continue
    do_sample=True,                 # Samples the next token from the distribution over the
                                    # vocabulary instead of always choosing the most likely token
    max_length=200,                 # Upper bound on sequence length (prompt + generated tokens)
    num_return_sequences=5          # Total number of sequences to generate
)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
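
With `do_sample=True`, each next token is drawn at random from the model's probability distribution, which is why the five returned sequences differ. Greedy decoding would always pick the single most likely token instead. The sketch below contrasts the two on a made-up set of next-token scores, using only the standard library. The candidate tokens and logits are invented for illustration.

```python
import math
import random

# Invented next-token scores (logits) for four candidate tokens.
candidates = [",", " there", " was", ":"]
logits = [2.0, 1.0, 0.5, 0.1]


def softmax(scores):
    """Converts raw scores into probabilities that sum to 1."""
    peak = max(scores)                              # subtract max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax(logits)

# Greedy decoding: always the highest-probability token.
greedy_token = candidates[probs.index(max(probs))]

# Sampling: draw according to the probabilities; a fixed seed makes it repeatable.
rng = random.Random(42)
sampled_tokens = rng.choices(candidates, weights=probs, k=5)

print("greedy :", greedy_token)
print("sampled:", sampled_tokens)
```

Fixing the seed plays the same role here as `set_seed(42)` does for the pipeline: the draws stay random in distribution but repeat exactly from run to run.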


In [12]:
# [OPTIONAL] Prints the encodings of one of the generated sequences [for reference]
generated_sequences[0]

<tf.Tensor: shape=(200,), dtype=int32, numpy=
array([ 7454,  2402,   257,   640,   287,  2807,    25,   198,   198,
           1,   818,  2805,    11,  1946,    11,   379,   257,  1171,
       10542, 12007,   416,   262,  3999, 14884,  3615,    11, 10826,
         286,   262, 14884,  3615,   286,  2807,    11,   511,  5694,
        4606,    11,  1866,   286,   262,  7793,    65,  1434,   290,
        3756,  1866,   286,   262,  4380,   338, 29235,  5407,   357,
       45710,     8,  9141,   257,  2276,    12, 30280,  1181, 10474,
         287,   262, 11618, 23200,   286,   406,  1530,   272,   284,
       15402,   257,   366, 10057,  1512, 25070,     1,   329,   262,
        7989,   290,  1099,   286,   705, 42017, 12148,     6,   355,
        5625,   284,  1181,  2450,   287,  2807,   338,  2316,   351,
        2869,    11,  4505,   290,  2520,  4969,  1399,   464,  2766,
         286,   777,  2678,   788, 26443,   262,   685,    34,  4805,
          60,   355,   705, 41131,   290, 29

In [None]:
# Decodes all the generated sequences
for idx, sequence in enumerate(generated_sequences):
    text = tokenizer.decode(
        sequence,                           # Token ids of one generated sequence
        clean_up_tokenization_spaces=True   # Removes stray spaces before punctuation and contractions, e.g. " ." becomes "." and " n't" becomes "n't"
        )
    print(idx+1, "\t" + text)
    print("-" * 80, "\n")

1 	Once upon a time in China:

"In March, 2014, at a public ceremony hosted by the Chinese Communist Party, representatives of the Communist Party of China, their Central Committee, members of the Politburo and leading members of the People's Liberation Army (PLA) attended a general-organized state assembly in the Beijing suburb of Lushan to condemn a "systematic disregard" for the principle and law of 'democracy promotion' as applied to state policy in China's relations with Japan, Australia and South Korea…The leaders of these countries then denounced the [CPR] as 'racist and divisive'. "When the PLA members of the People's Liberation Army (PLA) announced the party's resolution 'disagreement' with the law of 'democracy promotion' of the People's Republic of China, the PLA members of the Peoples Liberation Army (PLA) reacted with 'insult of them'. "When the government of Japan called for an impeachment of its President, the leaders of
--------------------------------------------------

## Observations

- The smallest model in the GPT-2 family, a pretrained transformer-based model with approximately 0.1 billion (124 million) parameters, was used for the above text generation task.

- Both a pipeline-based approach and a model-specific class-based approach were used to generate text.

- The pipeline approach only required specifying the task of interest and, optionally, the model name. The resulting pipeline encapsulates all low-level processes: tokenizing the input (prompt), encoding the tokens, generating new tokens, and decoding them back into human-readable text.

- Both approaches can return several generated sequences for a single prompt (five in this experiment), not just one.