## Example of text generation using GPT-J

The source code is adapted from:
1. https://huggingface.co/docs/transformers/model_doc/gptj#generation
2. https://github.com/ymoslem/MT-LM

The demo below is a simple example of how to provide input strings as prompts so that the model generates new text in the same domain as the input text.
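
The generation cell in this demo samples continuations with `top_k=50` and `top_p=0.95`. As a rough illustration of what those two settings do to a next-token distribution, here is a minimal sketch on a toy vocabulary. It is not the `transformers` implementation; the function name `top_k_top_p_filter` and the toy probabilities are hypothetical and chosen only for illustration.

```python
import random


def top_k_top_p_filter(probs, top_k=50, top_p=0.95):
    """Return a renormalized {token: prob} dict after top-k, then top-p filtering."""
    # Rank tokens from most to least probable
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)

    # Top-k: keep only the k most probable tokens, then renormalize
    ranked = ranked[:top_k]
    total = sum(p for _, p in ranked)
    ranked = [(token, p / total) for token, p in ranked]

    # Top-p (nucleus): keep the smallest prefix whose cumulative mass reaches top_p
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalize the surviving tokens so they form a distribution again
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


# Toy next-token distribution (made-up numbers)
toy_probs = {"phone": 0.5, "watch": 0.3, "tablet": 0.15, "car": 0.04, "banana": 0.01}
filtered = top_k_top_p_filter(toy_probs, top_k=4, top_p=0.9)

# Draw one token from the filtered distribution
rng = random.Random(0)
sampled = rng.choices(list(filtered), weights=list(filtered.values()), k=1)[0]
print(filtered)
print(sampled)
```

With these toy numbers, top-k removes the least likely token and top-p then cuts the tail further, so sampling only ever picks among the few most plausible continuations.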

In [2]:
import torch
from transformers import GPTJForCausalLM, AutoTokenizer

# Run on the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
# Load the half-precision checkpoint to reduce memory use
model = GPTJForCausalLM.from_pretrained("EleutherAI/gpt-j-6B",
                                        revision="float16",
                                        torch_dtype=torch.float16,
                                        cache_dir="models_cache/",
                                        pad_token_id=tokenizer.eos_token_id,
                                        low_cpu_mem_usage=True)
model.to(device)


In [None]:
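def line_count(filename):
    """Count lines by scanning the file for newline bytes in 1 MiB chunks."""
    lines = 0
    buf_size = 1024 * 1024
    # The context manager closes the file once counting is done
    with open(filename, 'rb') as f:
        print(f)  # debug: the buffered reader
        read_f = f.raw.read  # unbuffered read on the underlying raw file
        print(read_f)  # debug: the raw read method

        buf = read_f(buf_size)
        print(buf)  # debug: the first chunk
        while buf:
            lines += buf.count(b'\n')
            buf = read_f(buf_size)
    # Assumes the last line has no trailing newline, so it is counted here
    lines += 1
    return lines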


target_file = "target.txt"
output_file = "output.txt"

tqdm_total = line_count(target_file)
print("Line count:", tqdm_total)

<_io.BufferedReader name='target.txt'>
<built-in method read of _io.FileIO object at 0x7fa750e421f8>
b'Google on Thursday announced the Pixel 7 and Pixel 7 Pro phones and its first watch, the Pixel Watch, in the New York City event.\nAs they seek to reduce their carbon footprint, power plants around the world are increasingly replacing coal with natural gas, which releases far less carbon into the atmosphere when burned for fuel.\nImage generation uses techniques from a subset of machine learning called deep learning, which has driven most of the advancements in the field of artificial intelligence since a landmark 2012 paper about image classification ignited renewed interest in the technology.'
Line count: 3


In [None]:
from tqdm import tqdm

import nltk.data

# Requires the Punkt sentence tokenizer data, e.g. via nltk.download('punkt')
sent_splitter = nltk.data.load('tokenizers/punkt/english.pickle')

# Mode "w" creates the output file or empties it if it already exists
with open(target_file) as target, open(output_file, "w") as output:

    for line in tqdm(target, total=tqdm_total):
        line = line.strip()
        # Move the prompt tensor to the same device as the model
        input_ids = tokenizer(line, return_tensors="pt").input_ids.to(device)

        generated_tokens = model.generate(input_ids,
                                          do_sample=True,
                                          max_length=300,
                                          top_k=50,
                                          top_p=0.95,
                                          num_return_sequences=5)

        # Decode the sampled sequences back into strings
        generated_texts = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
        print(generated_texts)
        for generated_output in generated_texts:
            # Split generated text into sentences
            generated_lines = sent_splitter.tokenize(generated_output)
            # Remove the first sentence, which is the original prompt
            # Remove the last sentence, as it might be truncated
            generated_lines = generated_lines[1:-1]
            # Put each sentence on its own line, and exclude strings that are too short (10 characters or fewer)
            generated_lines = "\n".join([text.strip() for text in generated_lines if len(text) > 10]) + "\n"
            print(generated_lines)
            output.write(generated_lines)

 33%|███▎      | 1/3 [00:25<00:50, 25.08s/it]

["Google on Thursday announced the Pixel 7 and Pixel 7 Pro phones and its first watch, the Pixel Watch, in the New York City event. It's an interesting mix that will be interesting to see how they all fit together.\n\nGoogle’s Pixel phones are the most important of the three as the company has not just created its own hardware, but has taken a hardware philosophy that is focused on user experience, AI, and productivity. The Pixel phones also help it get into the wearable segment and become the first major company to launch a standalone smartwatch.\n\nWith the new Google Pixel 7, Google is taking a modular approach with hardware — the company built Pixel Stand instead of relying on an accessory case, and the Pixel 7 is built for modularity by having both wireless and USB charging cables inside the phone. The company has also focused on its software — Android 10 has a refreshed UI, a revamped navigation, Google Play Protect, Doze and App Standby to keep the battery alive.\n\nHere's all y

 67%|██████▋   | 2/3 [00:45<00:22, 22.05s/it]

["As they seek to reduce their carbon footprint, power plants around the world are increasingly replacing coal with natural gas, which releases far less carbon into the atmosphere when burned for fuel.\n\nBut while coal is far more carbon-intensive than natural gas, according to a 2011 U.S. Energy Information Administration study, there are no accurate, nationwide-scale data on the carbon-intensity of other power-plant fuels, like biomass, oil and petroleum coke.\n\nThis means some carbon emissions data that should not be confused with others.\n\nFor instance, there's a misconception among many people and some lawmakers that coal-fired power plants are among the largest sources of greenhouse gases worldwide and responsible for the largest share of U.S. emissions that contribute to climate change. But the Energy Information Administration's data suggest otherwise:\n\nU.S. coal-fired power plants are responsible for just 10% of U.S. greenhouse gas emissions in 2010.\n\nThe same holds tru

100%|██████████| 3/3 [01:04<00:00, 21.47s/it]

['Image generation uses techniques from a subset of machine learning called deep learning, which has driven most of the advancements in the field of artificial intelligence since a landmark 2012 paper about image classification ignited renewed interest in the technology.\n\nTo put it mildly, this technology is amazing. Here’s one example: For the first time, you can now control an animated avatar and make it look like it’s actually you. This is the sort of thing we couldn’t have imagined just a few short years ago.\n\nBut as impressive as it may sound, you’re reading this article with the idea that you’re already familiar with all the technology you need to take full advantage of image generation. It’s true that you probably already have a basic understanding of the basics. However, I don’t want to limit you to the basics. I want you to have access to cutting-edge tools to expand the boundaries of what you can do with images.\n\nWe’re going to use a few of these new tools that have bec
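
The post-processing applied to each generated sequence (split into sentences, drop the first and last, skip very short strings) can be tried in isolation with a sketch like the following. It swaps in a naive regex splitter for the NLTK Punkt tokenizer so that it runs without downloaded data; the function name `extract_new_sentences`, the `min_chars` parameter, and the sample string are hypothetical.

```python
import re


def extract_new_sentences(generated_text, min_chars=10):
    """Keep only the complete, newly generated sentences from one output string."""
    # Naive stand-in for Punkt: split after ., ! or ? followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", generated_text.strip())
    # Drop the first sentence (the prompt) and the last one (possibly truncated)
    sentences = sentences[1:-1]
    # Skip fragments of min_chars characters or fewer
    return [s.strip() for s in sentences if len(s.strip()) > min_chars]


sample = (
    "Prompt sentence here. This is a new generated sentence. Ok. "
    "Another complete generated sentence! This last one is cut of"
)
kept = extract_new_sentences(sample)
print("\n".join(kept))
```

Dropping only the first sentence assumes each prompt line is a single sentence, as in `target.txt` here. A prompt that spans several sentences would leave part of the prompt in the output.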


