##Controlled text generation refers to generating text while exerting specific control over certain aspects of the generated output. For example, you might want to control the tone (e.g., formal or informal), style (e.g., academic or conversational), or length of the text. This kind of generation is useful in scenarios like content writing, social media post generation, and creative writing, where a particular style or tone is important.

In this project, I'll implement controlled text generation using GPT-2, where we will control the tone of the generated text by conditioning it with a tone-specific input.

In [1]:
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

In [2]:
# 1. Loading the pre-trained GPT-2 model and tokenizer
model_name = "gpt2"  # GPT-2 model (can be replaced with a larger model like gpt2-medium)
model = GPT2LMHeadModel.from_pretrained(model_name)
tokenizer = GPT2Tokenizer.from_pretrained(model_name)

model.eval()  # Set model to evaluation mode


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


config.json:   0%|          | 0.00/665 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/548M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/1.04M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D(nf=2304, nx=768)
          (c_proj): Conv1D(nf=768, nx=768)
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D(nf=3072, nx=768)
          (c_proj): Conv1D(nf=768, nx=3072)
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=768, out_features=50257, bias=False)
)

In [3]:
# 2. Generating controlled text based on a given tone or style
def generate_controlled_text(prompt, tone="formal", max_length=150, temperature=0.7, top_p=0.9):
    # Condition the input based on tone (e.g., formal or informal)
    conditioned_prompt = f"Write a {tone} paragraph: {prompt}"
    inputs = tokenizer.encode(conditioned_prompt, return_tensors='pt')

    # Generating text with the model
    with torch.no_grad():
        outputs = model.generate(inputs, max_length=max_length, num_return_sequences=1,
                                 temperature=temperature, top_p=top_p, no_repeat_ngram_size=2)

    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return generated_text


In [4]:
# 3. Examples of controlled text generation
prompt = "The company's recent product launch"
generated_text_formal = generate_controlled_text(prompt, tone="formal")
generated_text_informal = generate_controlled_text(prompt, tone="informal")

print("Generated Formal Text:")
print(generated_text_formal)

print("\nGenerated Informal Text:")
print(generated_text_informal)

The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open

Generated Formal Text:
Write a formal paragraph: The company's recent product launch is a good example of how the company is trying to make its products more accessible to the general public.

The company has also been working on a new product that will allow users to use the app to access the web. The new feature will be available in the App Store on July 1.

Generated Informal Text:
Write a informal paragraph: The company's recent product launch was a success.

The company has been in the news recently for its use of a "smart" phone, which is a device that can be used to track your location. The device is called a GPS device, and it's designed to be able to detect your movements and track you. It's also designed for use in a variety of other applications, including medical devices, medical imaging, etc. But the company is not the only company to use the device. In fact, the U.S. government has recently banned the use and use by some companies of GPS devices. And the government is als