<a href="https://colab.research.google.com/github/anshupandey/Generative-AI-opensource/blob/main/HF_Accessing_pretrained_LLMs_for_text_generation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Text Generation with Open-Source LLMs

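This notebook demonstrates how to generate text with several popular open-source Large Language Models (LLMs) through the Hugging Face `transformers` `pipeline` API. It covers the following models:

1. **GPT-2** by OpenAI
2. **GPT3-finnish-small** by TurkuNLP
3. **GPT-Neo** by EleutherAI
4. **BLOOM** by BigScience
5. **StableLM** by Stability AI
6. **Gemma** by Google
7. **OPT-125M** by Meta AI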



## 1. GPT-2

**Model Source:** OpenAI

**Model Type:** Transformer-based Language Model

**Description:** GPT-2 is a transformer-based language model trained on a diverse dataset of internet text. Given a prompt, it generates a continuation that is usually coherent and on topic. This notebook uses the smallest checkpoint, `openai-community/gpt2` (124M parameters).

In [5]:
import torch
torch.cuda.is_available()

False

In [6]:
import torch

# Check if GPU is available
if torch.cuda.is_available():
    device = torch.device("cuda")
    print("GPU is available and will be used for computation.")
else:
    device = torch.device("cpu")
    print("GPU is not available, using CPU instead.")

GPU is not available, using CPU instead.


In [7]:
from transformers import pipeline, set_seed

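def generate_text_with_gpt2(prompt, max_length=50, num_return_sequences=1, seed=42):
    # Build a text-generation pipeline for GPT-2 on the selected device
    generator = pipeline('text-generation', model='openai-community/gpt2', device=device)
    # Fix the random seed so sampled output is reproducible
    set_seed(seed)
    # max_length counts the prompt tokens plus the generated tokens
    outputs = generator(prompt, max_length=max_length, num_return_sequences=num_return_sequences)
    return outputs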
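
The helper above rebuilds the pipeline, and therefore reloads the model weights, on every call. One way to avoid that is to memoize the loader with `functools.lru_cache`, as in this minimal sketch. The names `get_generator` and `generate_text_cached` are hypothetical and not part of the notebook, and the `"cpu"` default is only a placeholder for the `device` chosen earlier.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def get_generator(model_id, device="cpu"):
    """Load a text-generation pipeline once per (model_id, device) pair."""
    # Imported lazily so this snippet can be defined before transformers is loaded.
    from transformers import pipeline
    return pipeline("text-generation", model=model_id, device=device)


def generate_text_cached(prompt, model_id="openai-community/gpt2",
                         max_length=50, num_return_sequences=1, seed=42,
                         device="cpu"):
    """Generate text while reusing a pipeline that was already loaded."""
    from transformers import set_seed
    generator = get_generator(model_id, device)
    set_seed(seed)
    return generator(prompt, max_length=max_length,
                     num_return_sequences=num_return_sequences)
```

Cached pipelines stay in memory, so calling `get_generator.cache_clear()` before loading one of the larger models is probably a good idea on a small machine.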

In [8]:
# Example usage
prompt = "Once upon a time"
print("GPT-2 Output:")
print(generate_text_with_gpt2(prompt))

GPT-2 Output:


Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Once upon a time, what seems to you that the lightest person could make a living being with that power, you felt the power to control it."\n\n\nA little later, with the arrival of the angel of death by his side, it'}]
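
As the printed result shows, the pipeline returns a list with one dictionary per generated sequence, each holding the text under the `generated_text` key. Here is a small sketch of pulling out just the strings. The helper name `extract_texts` and the sample data are illustrative assumptions, and the sample is a hand-written stand-in rather than real model output.

```python
def extract_texts(outputs):
    """Return the generated strings from a text-generation pipeline result."""
    return [item["generated_text"] for item in outputs]


# Hand-written stand-in with the same shape as the pipeline result.
sample_outputs = [
    {"generated_text": "Once upon a time, first placeholder continuation."},
    {"generated_text": "Once upon a time, second placeholder continuation."},
]

for i, text in enumerate(extract_texts(sample_outputs), start=1):
    print(f"Sequence {i}: {text}")
```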


## 2. GPT3-finnish-small

**Model Source:** TurkuNLP

**Model Type:** Transformer-based Language Model

**Description:** A generative pretrained transformer with 186M parameters for Finnish.

The TurkuNLP Finnish GPT-3 models are a family of pretrained monolingual GPT-style language models based on the BLOOM architecture.

In [9]:
from transformers import pipeline, set_seed

def generate_text_with_gpt3(prompt, max_length=50, num_return_sequences=1, seed=42):
    generator = pipeline('text-generation', model='TurkuNLP/gpt3-finnish-small',device=device)
    set_seed(seed)
    outputs = generator(prompt, max_length=max_length, num_return_sequences=num_return_sequences)
    return outputs

In [10]:
# Example usage
prompt = "Once upon a time"
print("GPT3-finnish-small Output:")
print(generate_text_with_gpt3(prompt))

GPT3-finnish-small Output:



[{'generated_text': 'Once upon a time, but I think I have to be back to the school. I have to say that I have to be back to the school. I have to say that I have to be back to the school. I have to'}]
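
The output above loops on the same phrase, which is common when a small model decodes without sampling. The sketch below quantifies that repetition and lists decoding options that `generate()` in `transformers` accepts and that often reduce it. The helper `repeated_ngram_fraction`, the name `ANTI_REPETITION_KWARGS`, and the specific values are illustrative assumptions, not tuned settings.

```python
from collections import Counter

# Decoding options forwarded by the pipeline to generate(); values are assumptions.
ANTI_REPETITION_KWARGS = {
    "do_sample": True,          # sample instead of always taking the top token
    "top_p": 0.92,              # nucleus sampling threshold
    "temperature": 0.8,         # soften or sharpen the token distribution
    "no_repeat_ngram_size": 3,  # forbid repeating any 3-gram
    "repetition_penalty": 1.2,  # down-weight tokens that were already generated
}


def repeated_ngram_fraction(text, n=3):
    """Fraction of word n-grams in `text` that occur more than once."""
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    repeated = sum(count for count in counts.values() if count > 1)
    return repeated / len(ngrams)


looping = "I have to say that I have to be back to the school. " * 3
print(f"Repeated 3-gram fraction: {repeated_ngram_fraction(looping):.2f}")

# Possible use with a pipeline object named `generator` (not run here):
# generator(prompt, max_length=50, **ANTI_REPETITION_KWARGS)
```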


## 3. GPT-Neo

**Model Source:** EleutherAI

**Model Type:** Transformer-based Language Model

**Description:** GPT-Neo is EleutherAI's open-source replication of the GPT-3 architecture. This notebook uses the 2.7B-parameter checkpoint, which is considerably larger than the models in the previous sections.

In [11]:
def generate_text_with_gptneo(prompt, max_length=50, num_return_sequences=1, seed=42):
    generator = pipeline('text-generation', model='EleutherAI/gpt-neo-2.7B',device=device)
    set_seed(seed)
    outputs = generator(prompt, max_length=max_length, num_return_sequences=num_return_sequences)
    return outputs

# Example usage
print("\nGPT-Neo Output:")
prompt = "Once upon a time"
# Call the GPT-Neo helper defined above
print(generate_text_with_gptneo(prompt))


## 4. BLOOM

**Model Source:** BigScience

**Model Type:** Transformer-based Language Model

**Description:** BLOOM is an open, multilingual large language model developed by the BigScience project for research and practical applications. This notebook uses the 560M-parameter checkpoint.

In [12]:
def generate_text_with_bloom(prompt, max_length=50, num_return_sequences=1, seed=42):
    generator = pipeline('text-generation', model='bigscience/bloom-560m',device=device)
    set_seed(seed)
    outputs = generator(prompt, max_length=max_length, num_return_sequences=num_return_sequences)
    return outputs

# Example usage
print("\nBLOOM Output:")
prompt = "Once upon a time"
print(generate_text_with_bloom(prompt))


BLOOM Output:



[{'generated_text': 'Once upon a time, the\nworld was in a state of confusion. The world was in a state of\ndisorder. The world was in a state of confusion. The world was in a\nstate of confusion. The world was in a'}]


## 5. StableLM

**Model Source:** Stability AI

**Model Type:** Transformer-based Language Model

**Description:** StableLM is an open-source language model developed by Stability AI, designed to generate text with high coherence and relevance.

**Note:** If you run into an out-of-memory error, restart the kernel and run this section on its own.
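
Memory errors with the larger checkpoints usually come down to the size of the weights: roughly the parameter count times the bytes per parameter. The back-of-the-envelope sketch below uses the nominal parameter counts from the model names in this notebook. The helper name `estimate_weight_memory_gb` is hypothetical, and the estimate ignores activations and any framework overhead.

```python
# Bytes needed to store one parameter in common data types.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def estimate_weight_memory_gb(num_params, dtype="float32"):
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Nominal sizes taken from the checkpoint names and descriptions in this notebook.
NOMINAL_PARAMS = {
    "facebook/opt-125m": 125e6,
    "TurkuNLP/gpt3-finnish-small": 186e6,
    "bigscience/bloom-560m": 560e6,
    "EleutherAI/gpt-neo-2.7B": 2.7e9,
}

for model_id, params in NOMINAL_PARAMS.items():
    fp32 = estimate_weight_memory_gb(params, "float32")
    fp16 = estimate_weight_memory_gb(params, "float16")
    print(f"{model_id}: ~{fp32:.2f} GB in float32, ~{fp16:.2f} GB in float16")
```

If the hardware supports it, loading in half precision (for example by passing `torch_dtype=torch.float16` to `pipeline`) should roughly halve the weight memory, at some possible cost in numerical accuracy.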

In [13]:
from transformers import pipeline, set_seed

In [None]:
def generate_text_with_stablelm(prompt, max_length=50, num_return_sequences=1, seed=42):
    generator = pipeline('text-generation', model='StabilityAI/stablelm-base-alpha-3b',device=device)
    set_seed(seed)
    outputs = generator(prompt, max_length=max_length, num_return_sequences=num_return_sequences)
    return outputs

# Example usage
print("\nStableLM Output:")
prompt = "Once upon a time"
print(generate_text_with_stablelm(prompt))


StableLM Output:



## 6. Gemma

**Model Source:** Google

**Model Type:** Transformer-based Language Model

**Description:**
Gemma is a family of lightweight open models from Google, built from the same research and technology used to create the Gemini models. They are text-to-text, decoder-only large language models, available in English, with open weights in both pre-trained and instruction-tuned variants. Gemma models suit a variety of text generation tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in resource-limited environments such as a laptop, a desktop, or your own cloud infrastructure. This notebook uses the instruction-tuned `google/gemma-2b-it` checkpoint, which takes a list of chat messages instead of a plain string.

In [7]:
# Use a pipeline as a high-level helper
from transformers import pipeline, set_seed

def generate_text_with_gemma(messages, max_length=1000, num_return_sequences=1, seed=42):
    generator = pipeline('text-generation', model='google/gemma-2b-it',trust_remote_code=True,device=device)
    set_seed(seed)
    outputs = generator(messages, max_length=max_length, num_return_sequences=num_return_sequences)
    return outputs

In [8]:
# Example usage
print("Gemma Output:")
messages = [
    {"role": "user", "content": "Write a poem on city Dubai in 1000 words?"},
]
print(generate_text_with_gemma(messages))

Gemma Output:



[{'generated_text': [{'role': 'user', 'content': 'Write a poem on city Dubai in 1000 words?'}, {'role': 'assistant', 'content': "A mirage in the desert's embrace,\nDubai, a city that never sleeps.\nTowers pierce the sky, a symphony of steel,\nA testament to ambition, a beacon of wealth.\n\nGlittering skyscrapers, reaching for the sky,\nA canvas of glass, where dreams can fly.\nFrom Burj Khalifa's spire, a crown upon the land,\nTo Dubai Fountain's dance, a mesmerizing stand.\n\nA city of contrasts, old and new,\nA blend of tradition and modern lore.\nPalm Jumeirah's crescent, a marvel to behold,\nA haven of luxury, a story to be told.\n\nThe Dubai souks, a vibrant display,\nWhere treasures from every land come to play.\nSpice and spices, textiles so fine,\nA cultural tapestry, a vibrant line.\n\nThe Dubai Mall, a shopper's delight,\nWhere luxury brands ignite the night.\nThe Burj Al Arab, a haven of grace,\nA timeless landmark, a timeless space.\n\nThe Dubai Fountain show, a symphony of
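
With chat-style input, `generated_text` holds the whole conversation as a list of role/content dictionaries instead of a single string, as the output above shows. Here is a minimal sketch of extracting only the model's reply. The helper name `last_assistant_reply` is hypothetical, and the sample data is a hand-written stand-in with the same shape, not real model output.

```python
def last_assistant_reply(outputs):
    """Return the content of the last assistant message, or None if absent."""
    conversation = outputs[0]["generated_text"]
    for message in reversed(conversation):
        if message["role"] == "assistant":
            return message["content"]
    return None


# Hand-written stand-in mirroring the structure of the Gemma output.
sample_outputs = [{
    "generated_text": [
        {"role": "user", "content": "Write a short poem about Dubai."},
        {"role": "assistant", "content": "Placeholder poem text."},
    ]
}]

print(last_assistant_reply(sample_outputs))
```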

## 7. OPT-125M

**Model Source:** Meta AI (Facebook)

**Model Type:** Transformer-based Language Model

**Description:** OPT (Open Pre-trained Transformer) was introduced in the paper *OPT: Open Pre-trained Transformer Language Models* and first released in the metaseq repository on May 3rd, 2022 by Meta AI. This notebook uses the smallest checkpoint, with 125M parameters.

In [None]:
def generate_text_with_opt(prompt, max_length=50, num_return_sequences=1, seed=42):
    generator = pipeline('text-generation', model='facebook/opt-125m',trust_remote_code=True,device=device)
    set_seed(seed)
    outputs = generator(prompt, max_length=max_length, num_return_sequences=num_return_sequences)
    return outputs

In [None]:
# Example usage
generate_text_with_opt("Manila is a city of bustling streets,")

## Conclusion

In this notebook, we used the Hugging Face `pipeline` API to generate text with seven open-source LLMs. The models differ in size, training language, and whether they are instruction-tuned, so choose one based on the requirements of your text generation task and the hardware you have available.
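
To compare models on the same prompt, the per-model helpers can be folded into one loop. The sketch below takes the generation function as an argument so it can be tried without downloading anything. `compare_models` and `fake_generate` are hypothetical names, and `fake_generate` is a stand-in that only echoes the prompt.

```python
def compare_models(prompt, model_ids, generate_fn):
    """Run `generate_fn(model_id, prompt)` per model and collect the first text."""
    results = {}
    for model_id in model_ids:
        outputs = generate_fn(model_id, prompt)
        results[model_id] = outputs[0]["generated_text"]
    return results


def fake_generate(model_id, prompt):
    """Stand-in for a real pipeline call; returns the same output shape."""
    return [{"generated_text": f"{prompt} [{model_id}]"}]


# The three smallest checkpoints used in this notebook.
SMALL_MODELS = [
    "openai-community/gpt2",
    "bigscience/bloom-560m",
    "facebook/opt-125m",
]

for model_id, text in compare_models("Once upon a time", SMALL_MODELS, fake_generate).items():
    print(f"{model_id}: {text}")
```

To run it against real models, replace `fake_generate` with a function that builds `pipeline('text-generation', model=model_id, device=device)` and calls it on the prompt, as the helpers in the earlier sections do.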