# Text Generation

To install the necessary dependencies for this code, run the following command:

```python
!pip install datasets evaluate transformers[torch] rich
```

This installs the packages used in this notebook: `datasets`, `evaluate`, `transformers` with its PyTorch extras, and `rich`. Run it before any other cell to avoid import errors.

The leading exclamation mark tells the notebook to run the line as a shell command rather than as a Python statement.

In [None]:
!pip install datasets evaluate transformers[torch] rich



The `pipeline` function is imported from `transformers`. It builds a ready-to-use inference pipeline around a pre-trained model for a given task, such as text generation, text classification, or named entity recognition.
The `print` function is imported from `rich`. It replaces the built-in `print` and adds formatted, colorized console output.
The `pprint` function is imported from `rich.pretty`. It pretty-prints Python objects.

References - https://huggingface.co/blog/how-to-generate

In [None]:
from transformers import pipeline
from rich import print
from rich.pretty import pprint

The code uses the GPT-2 language model to generate text.
A pipeline object is created with the 'text-generation' task and the 'gpt2' model.
The generator is then called with the input prompt "Hello, I'm a language model".
It returns a list of dictionaries, each with a `generated_text` key, which is stored in the `output` variable.
Finally, `output` is printed to the console.

In [None]:
generator = pipeline('text-generation', model = 'gpt2')
output = generator("Hello, I'm a language model")
print(output)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


The code imports `AutoModelForCausalLM` and `AutoTokenizer` from `transformers`, together with `torch`.

The variable `torch_device` is set to "cuda" if a GPU is available and to "cpu" otherwise.

`AutoTokenizer.from_pretrained("gpt2")` loads the tokenizer that matches the GPT-2 model.

GPT-2 has no dedicated padding token, so `pad_token_id` is set to the tokenizer's `eos_token_id`. This reuses the EOS token as the PAD token and avoids the `pad_token_id` warning during generation.

`AutoModelForCausalLM.from_pretrained` loads the "gpt2" model with that `pad_token_id`, and `.to(torch_device)` moves the model to the selected device.

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

torch_device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained("gpt2")

# add the EOS token as PAD token to avoid warnings
model = AutoModelForCausalLM.from_pretrained("gpt2", pad_token_id=tokenizer.eos_token_id).to(torch_device)

The `model_inputs` variable holds the tokenized form of the sentence "I enjoy walking with my cute dog". The tokenizer returns a dictionary-like object containing `input_ids` and `attention_mask`. The `return_tensors='pt'` argument makes the tokenizer return PyTorch tensors, and `.to(torch_device)` moves them to the same device as the model.

In [None]:
model_inputs = tokenizer('I enjoy walking with my cute dog', return_tensors='pt').to(torch_device)

The first statement prints `model_inputs["input_ids"]`, a tensor with one row of token IDs for the input sentence.

The second statement passes the first row, `model_inputs["input_ids"][0]`, to `tokenizer.decode`, which converts the token IDs back into the original text.

Together, the two lines show how the sentence is represented before it is passed to the model.

In [None]:
print(model_inputs["input_ids"])
print(tokenizer.decode(model_inputs["input_ids"][0]))

The `generate` method of the `model` object produces a continuation of the given input. `**model_inputs` unpacks the tokenizer output (`input_ids` and `attention_mask`) into keyword arguments, and `max_new_tokens=40` caps the number of generated tokens at 40.

The result is stored in the `greedy_output` variable. With no other decoding arguments, `generate` uses greedy search: at each step the model selects the token with the highest probability and appends it to the output sequence.

The quality of the output depends on the model, the prompt, and the decoding parameters, so it is worth experimenting with different inputs and settings.

In [None]:
greedy_output = model.generate(**model_inputs, max_new_tokens=40)



`greedy_output` is not a function but the tensor returned by `model.generate`.
Evaluating it on its own displays the raw token IDs.
The tensor has a single row of 47 IDs: the 7 tokens of the prompt followed by the 40 newly generated tokens.
These IDs still have to be decoded with the tokenizer to obtain readable text.

In [None]:
greedy_output

tensor([[   40,  2883,  6155,   351,   616, 13779,  3290,    11,   475,   314,
          1101,   407,  1654,   611,   314,  1183,  1683,   307,  1498,   284,
          2513,   351,   616,  3290,    13,   314,  1101,   407,  1654,   611,
           314,  1183,  1683,   307,  1498,   284,  2513,   351,   616,  3290,
            13,   198,   198,    40,  1101,   407,  1654]])

`tokenizer.decode()` converts the token IDs in `greedy_output` back into text.

It takes the first (and only) row of the `greedy_output` tensor, and the decoded string is printed with `print()`.

This cell relies on the `tokenizer` and `greedy_output` variables defined in the earlier cells.

In [None]:
print(tokenizer.decode(greedy_output[0]))
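
The greedy strategy used in the cells above can be sketched without a real model. The example below is a minimal illustration, assuming a hypothetical table of next-token probabilities (`NEXT_TOKEN_PROBS` is made up for this sketch and is not taken from GPT-2).

```python
# Hypothetical next-token probabilities, for illustration only.
NEXT_TOKEN_PROBS = {
    "The": {"nice": 0.5, "dog": 0.4, "car": 0.1},
    "nice": {"woman": 0.4, "house": 0.3, "guy": 0.3},
    "dog": {"has": 0.9, "runs": 0.05, "and": 0.05},
}


def greedy_decode(start, max_new_tokens):
    """Repeatedly append the single most probable next token."""
    sequence = [start]
    for _ in range(max_new_tokens):
        candidates = NEXT_TOKEN_PROBS.get(sequence[-1])
        if not candidates:  # no known continuation: stop early
            break
        sequence.append(max(candidates, key=candidates.get))
    return sequence


print(greedy_decode("The", max_new_tokens=2))  # ['The', 'nice', 'woman']
```

Greedy search picks "nice" first because it is the most likely single step, even though the table contains a more probable two-token continuation through "dog".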

The `beam_output` variable holds the result of calling `model.generate` with beam search. `**model_inputs` unpacks the tokenizer output (`input_ids` and `attention_mask`) into keyword arguments. `max_new_tokens=40` caps the number of generated tokens. `num_beams=5` keeps the five most probable partial sequences at every step instead of only the single best one. `early_stopping=True` ends the search as soon as `num_beams` complete candidates have been found.

After generation, the code prints an "Output:" header followed by a line of 100 dashes. `tokenizer.decode` is then called on `beam_output[0]` to convert the generated tokens back into a human-readable string. The `skip_special_tokens=True` argument excludes special tokens, such as padding or end-of-sequence tokens, from the decoded output.

In [None]:
beam_output = model.generate(
    **model_inputs,
    max_new_tokens=40,
    num_beams=5,
    early_stopping=True
)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(beam_output[0], skip_special_tokens=True))
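
To make the idea behind `num_beams` concrete, here is a minimal beam search sketch over a hypothetical table of next-token probabilities. The table and the `beam_search` helper are assumptions made for this illustration. Unlike the library implementation, the sketch applies no length normalization and has no notion of an end-of-sequence token.

```python
import math

# Hypothetical next-token probabilities, for illustration only.
NEXT_TOKEN_PROBS = {
    "The": {"nice": 0.5, "dog": 0.4, "car": 0.1},
    "nice": {"woman": 0.4, "house": 0.3, "guy": 0.3},
    "dog": {"has": 0.9, "runs": 0.05, "and": 0.05},
}


def beam_search(start, num_beams, max_new_tokens):
    """Keep the `num_beams` most probable partial sequences at each step."""
    beams = [([start], 0.0)]  # (tokens, cumulative log-probability)
    for _ in range(max_new_tokens):
        candidates = []
        for tokens, score in beams:
            options = NEXT_TOKEN_PROBS.get(tokens[-1])
            if not options:
                # No known continuation: carry the hypothesis over unchanged.
                candidates.append((tokens, score))
                continue
            for token, prob in options.items():
                candidates.append((tokens + [token], score + math.log(prob)))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:num_beams]
    return beams


best_tokens, best_score = beam_search("The", num_beams=2, max_new_tokens=2)[0]
print(best_tokens, round(math.exp(best_score), 2))  # ['The', 'dog', 'has'] 0.36
```

With `num_beams=1` the function reduces to greedy search and returns "The nice woman" (probability 0.2). With two beams it finds "The dog has" (probability 0.36), because the less likely first step is kept alive long enough to pay off.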

`model.generate()` is called again with beam search, this time with an n-gram penalty. It takes the following arguments:

- `model_inputs`: The tokenizer output, unpacked into keyword arguments with `**`. The specific inputs required depend on the model architecture and task.

- `max_new_tokens`: The maximum number of new tokens that the model may generate.

- `num_beams`: The number of beams to use during beam search decoding. Each beam is a candidate sequence of generated tokens.

- `no_repeat_ngram_size`: The size of n-grams that may appear only once in the generated output. With a value of 2, no 2-gram is repeated, which prevents the model from looping over the same phrases.

- `early_stopping`: Enables or disables early stopping during decoding. If set to True, decoding stops as soon as `num_beams` complete candidates have been found.

The generated output is then printed. `tokenizer.decode()` converts the token IDs in `beam_output[0]` to human-readable text and skips special tokens.

In [None]:
beam_output = model.generate(
    **model_inputs,
    max_new_tokens=40,
    num_beams=5,
    no_repeat_ngram_size=2,
    early_stopping=True
)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(beam_output[0], skip_special_tokens=True))

Nice, that looks much better! We can see that the repetition does not appear anymore. Nevertheless, n-gram penalties have to be used with care. An article generated about the city New York should not use a 2-gram penalty or otherwise, the name of the city would only appear once in the whole text!
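
The effect of an n-gram penalty can be sketched as a small helper that lists the tokens that must not come next. This is an illustrative reimplementation of the idea, not the library's code, and `banned_next_tokens` is a hypothetical name.

```python
def banned_next_tokens(generated, ngram_size):
    """Return tokens that would complete an n-gram already present in `generated`."""
    if ngram_size < 1 or len(generated) < ngram_size - 1:
        return set()
    # The last (n - 1) tokens are the prefix of the n-gram about to be completed.
    prefix = tuple(generated[len(generated) - (ngram_size - 1):])
    banned = set()
    for i in range(len(generated) - ngram_size + 1):
        ngram = tuple(generated[i:i + ngram_size])
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned


tokens = "New York is big . New".split()
print(banned_next_tokens(tokens, ngram_size=2))  # {'York'}
```

The example reproduces the caveat above: once "New York" has appeared, a 2-gram penalty forbids "York" after every later "New".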

`model.generate()` is called once more, now returning several beams instead of only the best one. It takes the following arguments:

- `model_inputs`: The tokenizer output for the prompt, unpacked into keyword arguments with `**`.
- `max_new_tokens`: The maximum number of new tokens that can be generated. In this case, it is set to 40.
- `num_beams`: The number of beams to use in beam search. Beam search tracks several candidate sequences in parallel and ranks them by probability. Here, 5 beams are used.
- `no_repeat_ngram_size`: The size of n-grams that may not be repeated in the output text. In this case, n-grams of size 2 are not allowed to repeat.
- `num_return_sequences`: The number of highest-scoring sequences to return. In this case, 5 sequences are returned. This value must not exceed `num_beams`.
- `early_stopping`: Whether to stop beam search early. If set to True, generation stops as soon as `num_beams` complete candidates have been found.

After generating the outputs, the code decodes and prints each returned sequence together with its index.

In [None]:
beam_outputs = model.generate(
    **model_inputs,
    max_new_tokens=40,
    num_beams=5,
    no_repeat_ngram_size=2,
    num_return_sequences=5,
    early_stopping=True
)

print("Output:\n" + 100 * '-')
for i, beam_output in enumerate(beam_outputs):
  print("{}: {}".format(i, tokenizer.decode(beam_output, skip_special_tokens=True)))

The code sets the seed to 42 so that the sampled output is reproducible. Changing the seed gives different results.
Sampling is activated with `do_sample=True`, and top-k filtering is deactivated by setting `top_k` to 0, so each token is drawn from the model's full probability distribution.
The `generate` function is called on the `model` with the specified `model_inputs`, `max_new_tokens`, `do_sample`, and `top_k` parameters.
The generated text is decoded using the `tokenizer`, skipping any special tokens.
It is printed below an "Output:" header and a line of 100 dashes.

In [None]:
# set seed to reproduce results. Feel free to change the seed though to get different results
from transformers import set_seed
set_seed(42)

# activate sampling and deactivate top_k by setting top_k sampling to 0
sample_output = model.generate(
    **model_inputs,
    max_new_tokens=40,
    do_sample=True,
    top_k=0
)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(sample_output[0], skip_special_tokens=True))
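
The difference between sampling and greedy search can be illustrated with the standard library alone. The sketch below assumes a hypothetical next-token distribution and uses a seeded `random.Random` in the same spirit as `set_seed(42)`. It is not how `transformers` draws tokens internally.

```python
import random

# Hypothetical distribution over the next token, for illustration only.
NEXT_TOKEN_PROBS = {"nice": 0.5, "dog": 0.4, "car": 0.1}


def sample_next(probs, rng):
    """Draw one token at random, weighted by its probability."""
    tokens = list(probs)
    weights = [probs[token] for token in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


rng = random.Random(42)  # fixed seed makes the draws reproducible
draws = [sample_next(NEXT_TOKEN_PROBS, rng) for _ in range(1000)]
counts = {token: draws.count(token) for token in NEXT_TOKEN_PROBS}
print(counts)
```

Greedy search would return "nice" every time. Sampling returns each token roughly in proportion to its probability, which is why unlikely tokens occasionally appear in sampled text.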

The code sets the seed to 42 in order to reproduce the results and calls `generate` with the same sampling settings as before: `max_new_tokens=40`, `do_sample=True`, and `top_k=0`. The new argument is `temperature=0.7`. A temperature below 1 sharpens the next-token distribution, so likely tokens become more likely and unlikely tokens less likely. Finally, the decoded output is printed with special tokens removed.

In [None]:
# set seed to reproduce results. Feel free to change the seed though to get different results
set_seed(42)

# use temperature to decrease the sensitivity to low probability candidates
sample_output = model.generate(
    **model_inputs,
    max_new_tokens=40,
    do_sample=True,
    top_k=0,
    temperature=0.7,
)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(sample_output[0], skip_special_tokens=True))
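
The effect of `temperature` can be seen on a toy example. The sketch below assumes three hypothetical logits and applies a softmax after dividing them by the temperature, which is the usual definition of temperature scaling.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing them by `temperature`."""
    scaled = [x / temperature for x in logits]
    largest = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - largest) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # hypothetical scores for three tokens
for temperature in (1.0, 0.7):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

At 0.7 the most likely token receives a larger share of the probability mass than at 1.0, and the least likely token a smaller one. As the temperature approaches 0, sampling behaves more and more like greedy search.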


The code sets the seed to 42 in order to reproduce the same results. The seed can be changed to get different results.

The code then samples with top-k filtering, using `top_k=50`: at every step, only the 50 most probable tokens remain candidates.

`model.generate()` is called with `model_inputs` and the parameters `max_new_tokens=40`, `do_sample=True`, and `top_k=50`.

The function generates a sample output based on the provided inputs and parameters.

Finally, the output is decoded with the tokenizer and printed.

In [None]:
# set seed to reproduce results. Feel free to change the seed though to get different results
set_seed(42)

# set top_k to 50
sample_output = model.generate(
    **model_inputs,
    max_new_tokens=40,
    do_sample=True,
    top_k=50
)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(sample_output[0], skip_special_tokens=True))
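
Top-k filtering itself is a small operation. The sketch below applies it to a hypothetical four-token distribution. Treating `k <= 0` as "no filtering" mirrors the way `top_k=0` is used in this notebook, and the helper name `top_k_filter` is an assumption for this example.

```python
def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize their probabilities."""
    if k <= 0:  # mirrors top_k=0 in the notebook: filtering disabled
        return dict(probs)
    kept = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


# Hypothetical distribution over the next token, for illustration only.
probs = {"nice": 0.5, "dog": 0.3, "car": 0.15, "zebra": 0.05}
print(top_k_filter(probs, k=2))
```

After filtering, only "nice" and "dog" can be sampled, with their probabilities rescaled to sum to 1. The number of candidates is fixed at `k` no matter how flat or peaked the distribution is.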


The code snippet sets the seed to 42 in order to reproduce the results. The seed can be changed if desired.

This cell switches from top-k to top-p (nucleus) sampling.

The `generate` method is called on the `model` object with `model_inputs` unpacked as keyword arguments. The generated sequence is limited to a maximum of 40 new tokens.

The `do_sample` parameter is set to True, so the next token is drawn at random from the probability distribution over tokens.

The `top_p` parameter is set to 0.92. At each step, sampling is restricted to the smallest set of most probable tokens whose cumulative probability reaches 0.92.

The `top_k` parameter is set to 0, which disables top-k filtering, so only `top_p` limits the candidate set.

The generated output is then printed, with the special tokens skipped using the `skip_special_tokens=True` argument in the `decode` method.

In [None]:
# set seed to reproduce results. Feel free to change the seed though to get different results
set_seed(42)

# deactivate top_k and sample only from the most probable tokens that together cover 92% of the probability mass
sample_output = model.generate(
    **model_inputs,
    max_new_tokens=40,
    do_sample=True,
    top_p=0.92,
    top_k=0
)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(sample_output[0], skip_special_tokens=True))
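
Top-p filtering can be sketched in the same style. The example below uses a hypothetical four-token distribution, and `top_p_filter` is a name chosen for this illustration rather than a library function.

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of top tokens whose cumulative probability reaches top_p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


# Hypothetical distribution over the next token, for illustration only.
probs = {"nice": 0.5, "dog": 0.3, "car": 0.15, "zebra": 0.05}
print(sorted(top_p_filter(probs, top_p=0.92)))  # ['car', 'dog', 'nice']
```

Unlike top-k, the size of the candidate set adapts to the distribution: a peaked distribution keeps very few tokens, while a flat one keeps many.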


The code sets the seed to 42 in order to reproduce the same results. You can change the seed value if desired.

The variable `sample_output` stores the output of `model.generate`, called with `model_inputs` and `max_new_tokens` set to 40. This cell combines both filters: sampling uses a top-k value of 50 together with a top-p value of 0.92.

The code then decodes `sample_output` with the tokenizer, skipping any special tokens.

The decoded string is printed below a line of 100 dashes.

In [None]:
# set seed to reproduce results. Feel free to change the seed though to get different results
set_seed(42)

# set top_k to 50 and top_p to 0.92
sample_output = model.generate(
    **model_inputs,
    max_new_tokens=40,
    do_sample=True,
    top_p=0.92,
    top_k=50
)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(sample_output[0], skip_special_tokens=True))


In [None]:
model_name = 'gpt2'
tokenizer = AutoTokenizer.from_pretrained(model_name)
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'  # fall back to CPU when no GPU is present
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)

In [None]:
prompt = """
Extract the main person and place from a sentence:

###
Paul is playing football in New York with Heather.
Person: Paul, Place: New York, Person: Heather
###
Jeff is in a hurry to go to Boston.
Person: Jeff, Place: Boston
###
Max is going to Philadelphia.
Person: Max, Place: Philadelphia
###
Sam is from Phoenix
"""

In [None]:
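input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)
# Greedy decoding (the default). temperature only takes effect when do_sample=True, so it is omitted.
outputs = model.generate(input_ids=input_ids, max_new_tokens=10, eos_token_id=tokenizer.encode("###"), pad_token_id=tokenizer.eos_token_id)
# Drop the prompt tokens and the last generated token, which is the "###" stop marker when generation stops early.
print(tokenizer.decode(outputs[0][len(input_ids[0]):-1]))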

In [None]:
prompt = """
Extract the sentiment from the text.

###
The food and service was excellent. 5 stars!
Sentiment: Positive
###
Delicious meals and great ambience, will recommend this place.
Sentiment: Positive
###
Terrible experience. Avoid this restaurant.
Sentiment: Negative
###
Food tasted awful, will not come back again.
"""

In [None]:
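input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)
# Greedy decoding (the default). temperature only takes effect when do_sample=True, so it is omitted.
outputs = model.generate(input_ids=input_ids, max_new_tokens=10, eos_token_id=tokenizer.encode("###"), pad_token_id=tokenizer.eos_token_id)
# Drop the prompt tokens and the last generated token, which is the "###" stop marker when generation stops early.
print(tokenizer.decode(outputs[0][len(input_ids[0]):-1]))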
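
The two few-shot prompts above share one layout: an instruction, solved examples separated by `###`, and a final unsolved input. The sketch below shows one way to build such prompts and to trim a completion at the stop marker. The helper names `build_few_shot_prompt` and `extract_answer` are hypothetical and are not used elsewhere in this notebook.

```python
STOP_MARKER = "###"


def build_few_shot_prompt(instruction, examples, query):
    """Join an instruction, solved examples, and a final unsolved query."""
    parts = [instruction, ""]
    for text, answer in examples:
        parts += [STOP_MARKER, text, answer]
    parts += [STOP_MARKER, query, ""]
    return "\n".join(parts)


def extract_answer(completion):
    """Keep only the text generated before the first stop marker."""
    return completion.split(STOP_MARKER, 1)[0].strip()


prompt = build_few_shot_prompt(
    "Extract the sentiment from the text.",
    [
        ("The food and service was excellent. 5 stars!", "Sentiment: Positive"),
        ("Terrible experience. Avoid this restaurant.", "Sentiment: Negative"),
    ],
    "Food tasted awful, will not come back again.",
)
print(prompt)
print(extract_answer("Sentiment: Negative\n###\nGreat pizza."))  # Sentiment: Negative
```

Trimming on the decoded text works whether or not generation stopped at the marker, whereas slicing off the last token assumes that the marker was actually generated.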

In [None]:
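from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch

model_name = "meta-llama/Llama-2-7b-chat-hf"
token = "<huggingface token>"
# `token` replaces the deprecated `use_auth_token` argument in recent transformers releases.
# Half precision and automatic device placement are set when the model is loaded,
# because the pipeline does not reload a model that is already instantiated.
model = AutoModelForCausalLM.from_pretrained(model_name, token=token, torch_dtype=torch.float16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name, token=token)

# Note: this rebinds the name `pipeline`, which earlier referred to the imported function.
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)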

In [None]:
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"

In [None]:
# Llama 2 chat format: the system block sits inside the first [INST] ... [/INST] turn.
prompt_template = """{B_INST} {B_SYS}You are a helpful assistant and your name is S2Bot.
Given relevant parts of a document and a question, create a final answer.{E_SYS}Use the following portion of a long document to see if any of the text is relevant to answer the question. Answer concisely in one sentence.
Context:
{context}
Question:
{question} {E_INST}"""
context = (
    "The winners were announced during the awards ceremony on March 12, 2023. "
    "Everything Everywhere All at Once became the first science-fiction film to win Best Picture, and it was the third film alongside 1951's A Streetcar Named Desire and 1976's Network to win three acting awards. "
    "Best Director winners Daniel Kwan and Daniel Scheinert became the third pair of directors to win for the same film. "
    "For the first time since the 7th ceremony in 1935, all five Best Actor nominees were first time nominees. "
    "Michelle Yeoh was the first Asian winner for Best Actress and the second woman of color overall after Halle Berry, who won for her performance in 2001's Monster's Ball. "
    "Furthermore, she was the first woman to identify as Asian to be nominated in that category. "
    "Ke Huy Quan became the first Vietnamese person to win an Oscar and the second Asian winner for Best Supporting Actor after Haing S. Ngor, who won for his role in 1984's The Killing Fields."
)
question = "Who won 2023 oscar?"
prompt = prompt_template.format(B_SYS=B_SYS,context=context,question=question,E_SYS=E_SYS, B_INST=B_INST,E_INST=E_INST)
sequences = pipeline(
    prompt,
    do_sample=True,
    temperature=0.6,
    top_p = 0.9,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    max_length=1024,
)
for seq in sequences:
    print(f"{seq['generated_text']}")
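
The `[INST]` and `<<SYS>>` markers defined above follow the Llama 2 chat convention, in which the system message is wrapped inside the first user turn. The helper below is a minimal single-turn sketch under that assumption. `build_llama2_prompt` is a hypothetical name, and the beginning-of-sequence token is left to the tokenizer.

```python
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"


def build_llama2_prompt(system_message, user_message):
    """Wrap a system message and one user message in Llama 2 chat markers."""
    return (
        f"{B_INST} {B_SYS}{system_message.strip()}{E_SYS}"
        f"{user_message.strip()} {E_INST}"
    )


print(build_llama2_prompt(
    "You are a helpful assistant and your name is S2Bot.",
    "Who won the Oscar for Best Picture in 2023?",
))
```

Keeping the markers in one helper makes it harder to place the system block outside the `[INST]` turn by accident.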
