<a href="https://colab.research.google.com/github/anshupandey/Working_with_Large_Language_models/blob/main/WWL_C9_text_generation_with_llms.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Text Generation with Open-Source LLMs

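This notebook demonstrates how to generate text with several popular open-source Large Language Models (LLMs) using the Hugging Face `transformers` `pipeline` API. We will cover the following models:

1. **GPT-2** by OpenAI
2. **GPT3-finnish-small** by TurkuNLP
3. **GPT-Neo** by EleutherAI
4. **BLOOM** by BigScience
5. **StableLM** by Stability AI
6. **Falcon** by TII UAE
7. **OPT-125M** by Meta AI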


## 1. GPT-2

**Model Source:** OpenAI

**Model Type:** Transformer-based Language Model

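**Description:** GPT-2 is a transformer-based language model trained on WebText, a large and diverse dataset of internet text. Given a prompt, it generates a continuation one token at a time, producing fluent and contextually relevant text. The `openai-community/gpt2` checkpoint used below is the smallest, 124M-parameter version.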
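Text generation with a model like GPT-2 is a loop: score every possible next token given the tokens so far, append one, and repeat until a length limit is reached. The toy sketch below illustrates that loop with greedy decoding (always take the highest-scoring token). The bigram score table and the helper names are invented for illustration; a real model scores a vocabulary of tens of thousands of tokens and conditions on the whole prefix, not just the last token. As with the `max_length` argument used in this notebook, the limit here counts the prompt tokens too.

```python
# Toy illustration of autoregressive (left-to-right) greedy decoding.
# The bigram table below is invented; it stands in for a real language model.
BIGRAM_SCORES = {
    "time": {",": 0.6, "ago": 0.4},
    ",": {"there": 0.7, "a": 0.3},
    "there": {"was": 0.9, "is": 0.1},
    "was": {"a": 0.8, "the": 0.2},
    "a": {"time": 0.6, "king": 0.4},
}


def toy_next_token_scores(tokens):
    """Score candidate next tokens; this toy model only looks at the last token."""
    return BIGRAM_SCORES.get(tokens[-1], {})


def greedy_generate(prompt_tokens, score_fn, max_length):
    """Append the best-scoring token until the sequence (prompt included) has max_length tokens."""
    tokens = list(prompt_tokens)
    while len(tokens) < max_length:
        scores = score_fn(tokens)
        if not scores:  # no known continuation: stop early
            break
        tokens.append(max(scores, key=scores.get))
    return tokens


print(" ".join(greedy_generate(["Once", "upon", "a", "time"], toy_next_token_scores, max_length=10)))
# prints: Once upon a time , there was a time ,
```

Because the same prefix always yields the same choice, greedy decoding here cycles through `a time , there was a`, a small-scale version of the repetitive continuations some models produce later in this notebook.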

In [1]:
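from transformers import pipeline, set_seed

def generate_text_with_gpt2(prompt, max_length=50, num_return_sequences=1, seed=42):
    # Build a text-generation pipeline for the 124M-parameter GPT-2 checkpoint.
    # The pipeline (and its weights) is rebuilt on every call to this function.
    generator = pipeline('text-generation', model='openai-community/gpt2')
    # Fix the random seed so that any sampling during generation is reproducible.
    set_seed(seed)
    # max_length limits the total sequence length in tokens, including the prompt.
    outputs = generator(prompt, max_length=max_length, num_return_sequences=num_return_sequences)
    return outputs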





In [2]:
# Example usage
prompt = "Once upon a time"
print("GPT-2 Output:")
print(generate_text_with_gpt2(prompt))

GPT-2 Output:


Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Once upon a time, what seems to you that the lightest person could make a living being with that power, you felt the power to control it."\n\n\nA little later, with the arrival of the angel of death by his side, it'}]
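`set_seed(seed)` matters when the pipeline samples the next token instead of always taking the most likely one: sampling draws from a probability distribution, so the random number generator's state decides the result. The toy sketch below (plain Python, not the `transformers` implementation; the four-token distribution and the helper names are invented) applies temperature scaling and top-k filtering to a set of next-token scores and then samples. Seeding the generator identically reproduces the same draws, which is the role `set_seed` plays for a real pipeline.

```python
import math
import random


def sample_next_token(logits, rng, temperature=1.0, top_k=None):
    """Sample one token from a {token: logit} dict using temperature and optional top-k filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Rank candidates by score and keep only the top_k best, if requested.
    ranked = sorted(logits.items(), key=lambda item: item[1], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]
    tokens = [token for token, _ in ranked]
    # Softmax numerators at the given temperature (lower temperature = sharper distribution).
    scaled = [logit / temperature for _, logit in ranked]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Invented next-token scores after the prompt "Once upon a time".
TOY_LOGITS = {",": 2.0, " there": 1.5, " in": 1.0, " the": 0.5}


def draw(seed, n=5, **kwargs):
    """Draw n independent samples from TOY_LOGITS with a freshly seeded generator."""
    rng = random.Random(seed)
    return [sample_next_token(TOY_LOGITS, rng, **kwargs) for _ in range(n)]


print(draw(42))
print(draw(42))           # same seed: identical draws
print(draw(42, top_k=1))  # top_k=1 always picks the best token, like greedy decoding
```

In `transformers`, the generation arguments with these roles are `do_sample`, `temperature`, and `top_k`; the text-generation pipeline forwards them to the model's `generate` method when passed in the call.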


## 2. GPT3-finnish-small

**Model Source:** TurkuNLP

**Model Type:** Transformer-based Language Model

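**Description:** A generative pretrained transformer with 186M parameters for Finnish.

TurkuNLP's Finnish GPT-3 models are a family of pretrained monolingual GPT-style language models based on the BLOOM architecture. Because the model targets Finnish, the English prompt used below is outside its main training language, which likely contributes to the repetitive output.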
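A rough consistency check on the 186M parameter count, assuming each weight is stored as a 32-bit (4-byte) float and ignoring file-format overhead:

$$
186 \times 10^{6}\ \text{parameters} \times 4\ \frac{\text{bytes}}{\text{parameter}} = 744 \times 10^{6}\ \text{bytes} \approx 744\ \text{MB}
$$

This is close to the 743M `pytorch_model.bin` downloaded in the cell below, which fits (but does not prove) 32-bit storage. Storing the weights at 2 bytes per parameter, as in float16, would halve the estimate.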

In [3]:
from transformers import pipeline, set_seed

def generate_text_with_gpt3(prompt, max_length=50, num_return_sequences=1, seed=42):
    # Same steps as the GPT-2 helper, but with TurkuNLP's 186M-parameter Finnish model.
    generator = pipeline('text-generation', model='TurkuNLP/gpt3-finnish-small')
    set_seed(seed)
    outputs = generator(prompt, max_length=max_length, num_return_sequences=num_return_sequences)
    return outputs

In [5]:
# Example usage
prompt = "Once upon a time"
print("GPT3-finnish-small Output:")
print(generate_text_with_gpt3(prompt))

GPT3-finnish-small Output:


config.json:   0%|          | 0.00/561 [00:00<?, ?B/s]

pytorch_model.bin:   0%|          | 0.00/743M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/218 [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/6.23M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/96.0 [00:00<?, ?B/s]

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.


[{'generated_text': 'Once upon a time, but I think I have to be back to the school. I have to say that I have to be back to the school. I have to say that I have to be back to the school. I have to'}]


## 3. GPT-Neo

**Model Source:** EleutherAI

**Model Type:** Transformer-based Language Model

**Description:** GPT-Neo is EleutherAI's open-source family of GPT-3-style language models, trained on the Pile dataset. The `EleutherAI/gpt-neo-2.7B` checkpoint used below has 2.7B parameters.

In [6]:
def generate_text_with_gptneo(prompt, max_length=50, num_return_sequences=1, seed=42):
    # 2.7B-parameter GPT-Neo checkpoint (about 10.7 GB of weights to download).
    generator = pipeline('text-generation', model='EleutherAI/gpt-neo-2.7B')
    set_seed(seed)
    outputs = generator(prompt, max_length=max_length, num_return_sequences=num_return_sequences)
    return outputs

# Example usage
print("\nGPT-Neo Output:")
prompt = "Once upon a time"
print(generate_text_with_gptneo(prompt))


GPT-Neo Output:


config.json:   0%|          | 0.00/1.46k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/10.7G [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/200 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/798k [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/90.0 [00:00<?, ?B/s]

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Once upon a time, a man named Jeff Bezos was told to change his name to Jeffrey “Bezos” Bezos. Once upon a time, an employee at Dunder Mifflin was told to change his name to Darryl �'}]
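Each `generate_text_with_*` helper in this notebook calls `pipeline(...)` inside the function, so every call rebuilds the pipeline and reloads the weights from the local Hugging Face cache; for GPT-Neo 2.7B that is roughly 10.7 GB per call. The helpers also differ only in the model ID. A minimal sketch of one shared helper that builds each pipeline once and then reuses it, using the standard-library `functools.lru_cache`; the names `get_text_generator` and `generate_text` are our own, not part of `transformers`.

```python
from functools import lru_cache


@lru_cache(maxsize=2)
def get_text_generator(model_id):
    """Build a text-generation pipeline once per model ID and reuse it afterwards."""
    # Imported lazily so the helper can be defined without loading transformers.
    from transformers import pipeline
    return pipeline("text-generation", model=model_id)


def generate_text(model_id, prompt, max_length=50, num_return_sequences=1, seed=42):
    """Same arguments as the per-model helpers in this notebook, plus the model ID."""
    from transformers import set_seed
    generator = get_text_generator(model_id)  # cached after the first call
    set_seed(seed)
    return generator(prompt, max_length=max_length, num_return_sequences=num_return_sequences)
```

For example, `generate_text("EleutherAI/gpt-neo-2.7B", "Once upon a time")` loads GPT-Neo on the first call only. `maxsize` bounds how many pipelines the cache keeps referenced at once; with multi-gigabyte checkpoints a value of 1 or 2 keeps memory in check, and `get_text_generator.cache_clear()` drops them all.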


## 4. BLOOM

**Model Source:** BigScience

**Model Type:** Transformer-based Language Model

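**Description:** BLOOM is a multilingual large language model developed by the BigScience research collaboration to provide an openly accessible large-scale model for research and practical applications. The full model has 176B parameters; the `bigscience/bloom-560m` checkpoint used below is a 560M-parameter member of the family.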

In [7]:
def generate_text_with_bloom(prompt, max_length=50, num_return_sequences=1, seed=42):
    generator = pipeline('text-generation', model='bigscience/bloom-560m')
    set_seed(seed)
    outputs = generator(prompt, max_length=max_length, num_return_sequences=num_return_sequences)
    return outputs

# Example usage
print("\nBLOOM Output:")
prompt = "Once upon a time"
print(generate_text_with_bloom(prompt))


BLOOM Output:


config.json:   0%|          | 0.00/693 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/1.12G [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/222 [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/14.5M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/85.0 [00:00<?, ?B/s]

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.


[{'generated_text': 'Once upon a time, the\nworld was in a state of confusion. The world was in a state of\ndisorder. The world was in a state of confusion. The world was in a\nstate of confusion. The world was in a'}]
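The BLOOM continuation keeps repeating "The world was in a state of confusion". `generate` in `transformers` accepts a `no_repeat_ngram_size` argument that forbids any n-gram (run of n tokens) from occurring twice. The sketch below re-implements that rule in plain Python on a list of word tokens, purely to show the idea; it is not the library's code, the helper name `banned_next_tokens` is our own, and real models operate on subword token IDs rather than words.

```python
def banned_next_tokens(tokens, n):
    """Return the tokens that would complete an n-gram already present in `tokens`."""
    if n < 1 or len(tokens) < n - 1:
        return set()
    # The last n-1 tokens are the start of the n-gram the next token would complete.
    prefix = tuple(tokens[len(tokens) - (n - 1):])
    banned = set()
    # Every earlier occurrence of that prefix forbids the token that followed it.
    for start in range(len(tokens) - n + 1):
        if tuple(tokens[start:start + n - 1]) == prefix:
            banned.add(tokens[start + n - 1])
    return banned


words = "The world was in a state of confusion . The world was".split()
print(banned_next_tokens(words, 3))  # {'in'}: "world was in" already occurred
```

With the pipeline, the same setting is passed at call time, for example `generator(prompt, max_length=max_length, no_repeat_ngram_size=3)`; `repetition_penalty` is a softer alternative that down-weights repeated tokens instead of forbidding them.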


## 5. StableLM

**Model Source:** Stability AI

**Model Type:** Transformer-based Language Model

**Description:** StableLM is a family of open-source language models developed by Stability AI. The `stablelm-base-alpha-3b` checkpoint used below is a 3B-parameter base model (not instruction-tuned) from the StableLM-Alpha release.

In [1]:
from transformers import pipeline, set_seed

In [2]:
import torch
# Use the first CUDA GPU if one is available, otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

In [None]:
def generate_text_with_stablelm(prompt, max_length=50, num_return_sequences=1, seed=42):
    # Place the model on the selected device through pipeline()'s `device` argument.
    generator = pipeline('text-generation', model='stabilityai/stablelm-base-alpha-3b', device=device)
    set_seed(seed)
    outputs = generator(prompt, max_length=max_length, num_return_sequences=num_return_sequences)
    return outputs

# Example usage
print("\nStableLM Output:")
prompt = "Once upon a time"
print(generate_text_with_stablelm(prompt))


StableLM Output:


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

## 6. Falcon

**Model Source:** Technology Innovation Institute (TII), UAE

**Model Type:** Transformer-based Language Model

**Description:** Falcon-7B is a 7B-parameter causal decoder-only model built by TII and trained on 1,500B tokens of RefinedWeb enhanced with curated corpora. It is made available under the Apache 2.0 license.
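The 7B parameter count largely determines how much memory and disk the model needs. A back-of-the-envelope sketch: weight memory is roughly the number of parameters times the bytes used per parameter. It counts only the weights (not activations, the KV cache, or framework overhead), treats "7B" as exactly 7 × 10^9, and the dtype labels and helper name are our own choices for this sketch.

```python
# Back-of-the-envelope estimate of the memory needed just to hold model weights.
# Assumed bytes per parameter for common storage types (labels chosen for this sketch).
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1}


def weight_memory_gb(num_params, dtype="float32"):
    """Approximate weight memory in decimal gigabytes (1 GB = 10**9 bytes).

    Ignores activations, the KV cache, and framework overhead.
    """
    return num_params * BYTES_PER_PARAM[dtype] / 10**9


FALCON_7B_PARAMS = 7_000_000_000  # "7B" taken literally for this estimate

for dtype in ("float32", "bfloat16", "int8"):
    print(f"Falcon-7B weights in {dtype}: ~{weight_memory_gb(FALCON_7B_PARAMS, dtype):.0f} GB")
# prints ~28 GB, ~14 GB and ~7 GB respectively
```

Even the half-precision estimate of about 14 GB is far larger than the other checkpoints in this notebook, consistent with the multi-gigabyte shard download shown below. To load weights in half precision, `pipeline(...)` accepts a `torch_dtype` argument, for example `torch_dtype=torch.bfloat16`.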

In [None]:
from transformers import pipeline, set_seed

In [1]:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="tiiuae/falcon-7b", trust_remote_code=True)

config.json:   0%|          | 0.00/1.05k [00:00<?, ?B/s]

configuration_falcon.py:   0%|          | 0.00/7.16k [00:00<?, ?B/s]

A new version of the following files was downloaded from https://huggingface.co/tiiuae/falcon-7b:
- configuration_falcon.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.



modeling_falcon.py:   0%|          | 0.00/56.9k [00:00<?, ?B/s]

A new version of the following files was downloaded from https://huggingface.co/tiiuae/falcon-7b:
- modeling_falcon.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


pytorch_model.bin.index.json:   0%|          | 0.00/16.9k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

pytorch_model-00001-of-00002.bin:   0%|          | 0.00/9.95G [00:00<?, ?B/s]

KeyboardInterrupt: 

## 7. OPT-125M

**Model Source:** Meta AI (Facebook)

**Model Type:** Transformer-based Language Model

**Description:** OPT was introduced in the paper *OPT: Open Pre-trained Transformer Language Models* and first released in Meta AI's metaseq repository on May 3, 2022. OPT-125M is the smallest model in the OPT family.

In [None]:
import torch
# Check whether PyTorch can see a CUDA GPU (True) or will run on the CPU (False).
torch.cuda.is_available()

In [2]:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="facebook/opt-125m")

config.json:   0%|          | 0.00/651 [00:00<?, ?B/s]

pytorch_model.bin:   0%|          | 0.00/251M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/137 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/685 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/441 [00:00<?, ?B/s]

In [5]:
pipe("Manila is a city of bustling street, ")

[{'generated_text': "Manila is a city of bustling street,  and it's a city of people.  "}]

## Conclusion

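In this notebook, we used the Hugging Face `pipeline` API to generate text with seven open-source LLMs, ranging from the 124M-parameter GPT-2 and OPT-125M to the 7B-parameter Falcon. The smaller models download quickly, but on the sample prompt some of them (GPT3-finnish-small and BLOOM-560m) fell into repetitive loops, while the larger checkpoints (GPT-Neo 2.7B, StableLM 3B, Falcon-7B) require multi-gigabyte downloads and benefit from a GPU. The right model therefore depends on the target language, the available hardware, and the output quality a task requires.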