## Open notebook in:
| Colab | Gradient |
|:------|:---------|
| [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nicolepcx/Transformers-in-Action/blob/main/CH06/CH06_text_generation_coding_examples.ipynb) | [![Gradient](https://assets.paperspace.io/img/gradient-badge.svg)](https://console.paperspace.com/github/Nicolepcx/Transformers-in-Action/blob/main/CH06/CH06_text_generation_coding_examples.ipynb) |

In [1]:
# Clone the repo if it's not already cloned, so that everything runs smoothly
# on Colab, Kaggle, or Paperspace
import os

if not os.path.isdir('Transformers-in-Action'):
    !git clone https://github.com/Nicolepcx/Transformers-in-Action.git
else:
    print('Repository already exists. Skipping clone.')


current_path = %pwd
if '/Transformers-in-Action' in current_path:
    new_path = current_path + '/utils'
else:
    new_path = current_path + '/Transformers-in-Action/utils'
%cd $new_path


Cloning into 'Transformers-in-Action'...
remote: Enumerating objects: 279, done.
remote: Counting objects: 100% (127/127), done.
remote: Compressing objects: 100% (98/98), done.
remote: Total 279 (delta 62), reused 71 (delta 29), pack-reused 152
Receiving objects: 100% (279/279), 2.24 MiB | 7.11 MiB/s, done.
Resolving deltas: 100% (134/134), done.
/content/Transformers-in-Action/utils


# Install requirements

In [2]:
from requirements import *

In [3]:
install_base_packages()
install_required_packages_ch06()

Installing base requirements...
✅ transformers==4.26.1 installation completed successfully!

✅ datasets==2.10.1 installation completed successfully!

Installing chapter 6 requirements...
✅ accelerate==0.26.1 installation completed successfully!



In [4]:
from setup import *

In [5]:
useGPU()

Have fun with this chapter!🥳


# Imports

In [None]:
from transformers import GPT2LMHeadModel, GPT2Tokenizer
from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch

# About this notebook

In this notebook, you will explore the different sampling methods and how they change the output of the model. You will also run the Falcon 7B model with various sampling strategies. This practical exercise demonstrates how the choice of sampling method can significantly influence the model's output.

I encourage you to experiment with the different strategies! Try out how Greedy search, Beam search, Top-K sampling, and Top-p (nucleus) sampling change the output. Observe how each strategy affects the style and content of the generated text.

As highlighted in the book, adjust the temperature setting to see its effect on the model's creativity. Notice how a temperature close to 0 makes the output nearly deterministic, while a temperature closer to 1 tends to produce more varied outputs.
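To make the effect of temperature concrete, here is a minimal sketch in plain Python. It is not the code that `generate` runs internally; the function name `softmax_with_temperature` and the four logits are hypothetical values chosen only for illustration. The logits are divided by the temperature before the softmax, so small temperatures sharpen the distribution and larger ones flatten it.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for four candidate next tokens
toy_logits = [2.0, 1.0, 0.5, 0.1]

for t in (0.1, 0.7, 1.0, 1.5):
    probs = softmax_with_temperature(toy_logits, t)
    print(f"T={t}: " + ", ".join(f"{p:.3f}" for p in probs))
```

At `T=0.1` almost all probability mass sits on the first token, which is why low temperatures behave much like greedy search.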

Also examine the differences in outputs between the methods. This will give you a deeper understanding of how each sampling technique guides the model's decision-making process.

This hands-on exploration complements the theoretical knowledge from the book, giving you a tangible feel for the nuances of each sampling method with GPT-2 and Falcon 7B.

# Deterministic Sampling

In [None]:
#@title Greedy Search
# Instantiate the model and tokenizer
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

# Define an input prompt
prompt = "In a world where AI has become ubiquitous,"

# Encode the input prompt and prepare it for the model
input_ids = tokenizer.encode(prompt, return_tensors='pt')


greedy_output = model.generate(
    input_ids,
    max_length=100
)


# Now we print the result
print("\033[1m" + "Output:\n" + 140 * '=' + "\033[0m")
print(f"{tokenizer.decode(greedy_output[0], skip_special_tokens=True)}\n")
print("-" * 140, "\n")

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Output:
In a world where AI has become ubiquitous, it's hard to imagine how it will ever be able to replace humans.

"We're going to have to see how we do it," said Dr. Michael S. Hirsch, a professor of psychology at the University of California, San Francisco. "We're going to have to see how we do it in a way that's not just a matter of human beings, but also of machines."

The AI revolution is already happening.

-------------------------------------------------------------------------------------------------------------------------------------------- 
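Greedy search, as used in the cell above, picks the single most probable next token at every step. The following sketch shows that rule on a toy example. The bigram table `TOY_MODEL`, its tokens, and its probabilities are hypothetical and stand in for a real language model.

```python
# Hypothetical bigram "language model": next-token probabilities given the last token
TOY_MODEL = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "</s>": 0.2},
    "a":   {"dog": 0.9, "cat": 0.1},
    "cat": {"sat": 0.4, "</s>": 0.6},
    "dog": {"ran": 0.95, "</s>": 0.05},
    "sat": {"</s>": 1.0},
    "ran": {"</s>": 1.0},
}

def greedy_decode(model, start="<s>", end="</s>", max_steps=10):
    """Always append the most probable next token until the end token appears."""
    tokens = [start]
    for _ in range(max_steps):
        candidates = model[tokens[-1]]
        next_token = max(candidates, key=candidates.get)
        tokens.append(next_token)
        if next_token == end:
            break
    return tokens

print(greedy_decode(TOY_MODEL))
```

In this toy table the greedy path has probability 0.6 × 0.5 × 0.6 = 0.18, although the sequence "a dog ran" has probability 0.4 × 0.9 × 0.95 = 0.342. Locally best choices do not guarantee the most probable sequence overall.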



## Beam Search

![Google Drive Image](https://drive.google.com/uc?export=view&id=10iJOlzJt-1-bFWI2mxZ7UhbJ8RKL2Eco)
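As a rough illustration of the idea, the sketch below runs beam search on a toy bigram table. It is a simplification under stated assumptions: `TOY_MODEL` and the function `beam_search` are hypothetical, scores are plain summed log-probabilities, and there is no length penalty or n-gram blocking as in the `generate` call that follows.

```python
import math

# Hypothetical bigram "language model": next-token probabilities given the last token
TOY_MODEL = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "</s>": 0.2},
    "a":   {"dog": 0.9, "cat": 0.1},
    "cat": {"sat": 0.4, "</s>": 0.6},
    "dog": {"ran": 0.95, "</s>": 0.05},
    "sat": {"</s>": 1.0},
    "ran": {"</s>": 1.0},
}

def beam_search(model, num_beams=2, start="<s>", end="</s>", max_steps=10):
    """Keep the num_beams best partial sequences at every step."""
    beams = [(0.0, [start])]  # (cumulative log-probability, tokens)
    for _ in range(max_steps):
        candidates = []
        for score, tokens in beams:
            if tokens[-1] == end:
                candidates.append((score, tokens))  # finished beams carry over unchanged
                continue
            for token, prob in model[tokens[-1]].items():
                candidates.append((score + math.log(prob), tokens + [token]))
        # Prune to the num_beams highest-scoring hypotheses
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:num_beams]
        if all(tokens[-1] == end for _, tokens in beams):
            break
    return beams

for score, tokens in beam_search(TOY_MODEL, num_beams=2):
    print(f"{math.exp(score):.3f}  {' '.join(tokens)}")
```

With two beams the search keeps "a" alive after the first step and ends up with "a dog ran", a sequence that a single beam (equivalent to greedy search) never explores.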

In [None]:
# Instantiate the model and tokenizer
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

# Define an input prompt
prompt = "In a world where AI has become ubiquitous,"

# Encode the input prompt and prepare it for the model
input_ids = tokenizer.encode(prompt, return_tensors='pt')


beam_outputs = model.generate(
    input_ids,
    max_length=100,
    num_beams=5,
    num_return_sequences=5,
    no_repeat_ngram_size=2,
    early_stopping=True
)


# Print the resulting sentences
print("\033[1m" + "Output:\n" + 140 * '=' + "\033[0m")
for i, beam_output in enumerate(beam_outputs):
    print(f"Output {i+1}:\n{tokenizer.decode(beam_output, skip_special_tokens=True)}\n")
    print("-" * 140, "\n")

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Output:
Output 1:
In a world where AI has become ubiquitous, it's hard to imagine a better time to be a part of it than right now.

"I think we're going to see a lot more of that in the next few years," he said.

-------------------------------------------------------------------------------------------------------------------------------------------- 

Output 2:
In a world where AI has become ubiquitous, it's hard to imagine a better time to be a part of it than right now.

"I think we're going to see a lot more of that in the next few years," he says.

-------------------------------------------------------------------------------------------------------------------------------------------- 

Output 3:
In a world where AI has become ubiquitous, it's hard to imagine a better time to be a part of it than right now.

"I think we're going to see a lot more of that in the next few years," he said. "It's a very exciting time."

----------------------------------------------------------

# Probabilistic Methods

## Top-k Sampling

![Google Drive Image](https://drive.google.com/uc?export=view&id=1n984J6XPmVi-b1uvfLiDOkNBgS11YrVd)
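The core of top-k sampling can be sketched in a few lines of plain Python. The helper names `top_k_filter` and `sample`, as well as the toy distribution, are hypothetical; a real model works on logits over the full vocabulary, but the principle is the same: keep the `k` most probable tokens, renormalize, and sample from what is left.

```python
import random

def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize their probabilities."""
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    return {token: p / total for token, p in top}

def sample(probs, rng):
    """Draw one token according to the given probabilities."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Hypothetical next-token distribution
toy_probs = {"robots": 0.40, "people": 0.30, "data": 0.15, "cats": 0.10, "zebra": 0.05}

filtered = top_k_filter(toy_probs, k=3)
print(filtered)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
print([sample(filtered, rng) for _ in range(5)])
```

Tokens outside the top 3 ("cats", "zebra") can never be drawn, no matter how often you sample.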

In [None]:
# Instantiate the model and tokenizer
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

# Define an input prompt
prompt = "In a world where AI has become ubiquitous,"

# Encode the input prompt and prepare it for the model
input_ids = tokenizer.encode(prompt, return_tensors='pt')


top_k_outputs = model.generate(
    input_ids,
    max_length=400,
    num_return_sequences=5,
    do_sample=True,
    top_k=50,
    temperature=1.5
)


# Print the resulting sentences
print("\033[1m" + "Output:\n" + 140 * '=' + "\033[0m")
for i, top_k_output in enumerate(top_k_outputs):
    print(f"Output {i+1}:\n{tokenizer.decode(top_k_output, skip_special_tokens=True)}\n")
    print("-" * 140, "\n")

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Output:
Output 1:
In a world where AI has become ubiquitous, it's tempting to see it as inevitable, especially to see people like them. Some will argue that they should make good use of this technological leap to start a new society. They have been, but they might have fewer kids in the future. There is a growing public understanding that when there isn't enough children to do productive work, then AI could at worst be used to improve their lives.

Perhaps we are in that area too, but there are still ways AI could benefit you to be better. You can take our suggestion for creating better software and use it to make a difference against AI: get the AI off the shelf. Take the opportunity to support better people on both the web and at an ethical level. Go beyond talking over social media: you could take a step further using that to connect yourself online.

But even more important is using this opportunity as an impetus for future activism when these social tools aren't necessarily ne

## Nucleus (Top-p) Sampling

![Google Drive Image](https://drive.google.com/uc?export=view&id=1pJM6jtIO29qOx2JTPAKHLxSqzjtRowbZ)
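Unlike top-k, nucleus sampling keeps a variable number of tokens: the smallest set whose cumulative probability reaches `p`. The sketch below illustrates this with a hypothetical helper `top_p_filter` and two made-up distributions; it is an illustration of the principle, not the library's implementation.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}

# Hypothetical distributions: one peaked, one flat
peaked = {"robots": 0.90, "people": 0.05, "data": 0.03, "cats": 0.02}
flat = {"robots": 0.30, "people": 0.28, "data": 0.22, "cats": 0.20}

print(len(top_p_filter(peaked, 0.92)), "tokens kept for the peaked distribution")
print(len(top_p_filter(flat, 0.92)), "tokens kept for the flat distribution")
```

When the model is confident, the nucleus shrinks to a couple of tokens; when it is uncertain, many more candidates stay in play.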


In [None]:
# Instantiate the model and tokenizer
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

# Define an input prompt
prompt = "In a world where AI has become ubiquitous,"

# Encode the input prompt and prepare it for the model
input_ids = tokenizer.encode(prompt, return_tensors='pt')

# Generate text output with nucleus sampling
nucleus_outputs = model.generate(
    input_ids,
    max_length=100,
    do_sample=True,
    top_p=0.92,  # Set p for nucleus sampling
    num_return_sequences=5  # How many outputs to generate
)

# Print the resulting sentences
print("\033[1m" + "Output:\n" + 140 * '=' + "\033[0m")
for i, output in enumerate(nucleus_outputs):
    print(f"Output {i+1}:\n{tokenizer.decode(output, skip_special_tokens=True)}\n")
    print("-" * 140, "\n")


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Output:
Output 1:
In a world where AI has become ubiquitous, this is not the case.

The most recent example is Google's own self-driving car, which, at the time of this writing, was able to successfully navigate a highway in Los Angeles without any human intervention. Now that its self-driving car is on the road, it's almost certain that autonomous driving will be the next major industry.

That's bad news for the American auto industry, who have been trying to make it safe

-------------------------------------------------------------------------------------------------------------------------------------------- 

Output 2:
In a world where AI has become ubiquitous, there will be little reason to keep up.

If anything, the most obvious way to avoid using AI systems is to avoid creating them, and to instead use them as tools to create tools that we're not meant to be using. The only thing that's a great way to avoid this is to develop your own personal AI AI toolkit so that you can 

## Temperature Sampling

![Google Drive Image](https://drive.google.com/uc?export=view&id=142UUXEJh32oKF90iBYN5jZvVE42pUqJI)


In [None]:
# Instantiate the model and tokenizer
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

# Define an input prompt
prompt = "In a world where AI has become ubiquitous,"

# Encode the input prompt and prepare it for the model
input_ids = tokenizer.encode(prompt, return_tensors='pt')

# Generate text output with temperature sampling
temperature_outputs = model.generate(
    input_ids,
    max_length=100,
    do_sample=True,
    temperature=0.7,  # Control randomness with the temperature parameter
    num_return_sequences=5  # How many outputs to generate
)

# Print the resulting sentences
print("\033[1m" + "Output:\n" + 140 * '=' + "\033[0m")
for i, output in enumerate(temperature_outputs):
    print(f"Output {i+1}:\n{tokenizer.decode(output, skip_special_tokens=True)}\n")
    print("-" * 140, "\n")


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Output:
Output 1:
In a world where AI has become ubiquitous, it's easy to see the potential for it to become a problem.

"It's very important to understand the potential for AI to change, to develop," says Kishore Kumar, president of the AI Foundation. "The AI Foundation is making a conscious effort to make sure that people come out and say, 'We should all be working with this AI.' We have no idea how it will come to be.

"We have to

-------------------------------------------------------------------------------------------------------------------------------------------- 

Output 2:
In a world where AI has become ubiquitous, it's no wonder that there are so many new inventions.

In 2015, Google gave us a glimpse of how the AI revolution could be harnessed. In an repressive and totalitarian world, it's easy to imagine that no one would be doing anything about it, and there are few people who would do anything about it. As Google is developing its autonomous cars, many will be work

In [None]:
# Instantiate the model and tokenizer
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

# Define an input prompt
prompt = "In a world where AI has become ubiquitous,"

# Encode the input prompt and prepare it for the model
input_ids = tokenizer.encode(prompt, return_tensors='pt')

# Generate text output with temperature sampling
temperature_outputs = model.generate(
    input_ids,
    max_length=100,
    do_sample=True,
    temperature=0.7,  # Control randomness with the temperature parameter
    num_return_sequences=5,  # How many outputs to generate
    eos_token_id=tokenizer.eos_token_id,  # Stop generation at EOS token
    pad_token_id=tokenizer.eos_token_id   # GPT-2 has no pad token, so reuse EOS to pad outputs shorter than max_length
)

# Print the resulting sentences
print("\033[1m" + "Output:\n" + 140 * '=' + "\033[0m")
for i, output in enumerate(temperature_outputs):
    generated_text = tokenizer.decode(output, skip_special_tokens=True).strip()
    print(f"Output {i+1}:\n{generated_text}\n")
    print("-" * 140, "\n")


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Output:
Output 1:
In a world where AI has become ubiquitous, it is difficult to be confident that we will ever be able to build a fully autonomous car. However, even if we did, it would still take a considerable amount of time and effort to create one. And the fact that AI is now so commonplace doesn't mean that it will be replaced by a more powerful robot.

This is why I believe that the future of autonomous vehicles will have to come from a technology that is already being used for real

-------------------------------------------------------------------------------------------------------------------------------------------- 

Output 2:
In a world where AI has become ubiquitous, it is difficult for the company to ignore the fact that AI is just one of many ways that AI processes human behavior.

With such a large number of applications, it becomes difficult to see how the company is going to avoid the subject matter of this post. But for now, let's look at the main points.

What

In [None]:
# Instantiate the model and tokenizer
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

# Define an input prompt
prompt = "In a world where AI has become ubiquitous,"

# Encode the input prompt and prepare it for the model
input_ids = tokenizer.encode(prompt, return_tensors='pt')

# Generate text output with temperature sampling
temperature_outputs = model.generate(
    input_ids,
    max_length=200,
    do_sample=True,
    temperature=0.7,
    num_return_sequences=5
)

# Print the resulting sentences
print("\033[1m" + "Output:\n" + 140 * '=' + "\033[0m")
for i, output in enumerate(temperature_outputs):
    # Keep special tokens so the EOS marker is still visible after decoding
    generated_text = tokenizer.decode(output, skip_special_tokens=False).strip()
    # Check if EOS token exists in the output and trim up to that
    if tokenizer.eos_token in generated_text:
        generated_text = generated_text.split(tokenizer.eos_token)[0].strip()
    elif '. ' in generated_text:
        # Alternatively, trim up to the last period for coherence
        generated_text = '. '.join(generated_text.split('. ')[:-1]) + '.'
    print(f"Output {i+1}:\n{generated_text}\n")
    print("-" * 140, "\n")


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Output:
Output 1:
In a world where AI has become ubiquitous,posal might be less of a stretch. A recent survey of 3,000 companies by the British Computer Corporation found that AI could easily outperform the human-computer ratio of 1; this is more than double the ratio of 1 (and a lot more than the ratio of 1 to 3). And while some AI researchers, like Ray Kurzweil, say that AI could be used to solve problems in a way that is similar to human-computer interactions, the project is looking further into the future.

A new paper published recently in the journal Proceedings of the National Academy of Sciences suggests that a new kind of AI might be possible. It's called The Road to AI. It's a much-anticipated piece of research that was first conceived as part of a three-year project by Karp Karpov, an AI researcher at the University of Leeds, UK, and now at the University of Nottingham in the United Kingdom.

-------------------------------------------------------------------------------

In [None]:
# Instantiate the model and tokenizer
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

# Define an input prompt
prompt = "In a world where AI has become ubiquitous,"

# Encode the input prompt and prepare it for the model
input_ids = tokenizer.encode(prompt, return_tensors='pt')

# Generate text output with temperature sampling
temperature_outputs = model.generate(
    input_ids,
    max_length=400,  # More generous max_length
    do_sample=True,
    temperature=0.7,
    num_return_sequences=5,
    eos_token_id=tokenizer.eos_token_id
)

# Post-process outputs: Trim after the last full stop before a desired length
def trim_output(text, desired_max_length=250):
    if len(text) <= desired_max_length:
        return text
    last_full_stop = text.rfind('.', 0, desired_max_length)
    if last_full_stop == -1:
        return text[:desired_max_length]  # Fallback: truncate at desired_max_length
    return text[:last_full_stop + 1]

# Print the resulting sentences
print("\033[1m" + "Output:\n" + 140 * '=' + "\033[0m")
for i, output in enumerate(temperature_outputs):
    full_output = tokenizer.decode(output, skip_special_tokens=True)
    trimmed_output = trim_output(full_output)
    print(f"Output {i+1}:\n{trimmed_output}\n")
    print("-" * 140, "\n")


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Output:
Output 1:
In a world where AI has become ubiquitous, it's important to remember that robots are not just about us; they are also about you.

-------------------------------------------------------------------------------------------------------------------------------------------- 

Output 2:
In a world where AI has become ubiquitous, the potential of humans, robots, and AI-powered car- and-car-driving machines will continue to grow exponentially.

-------------------------------------------------------------------------------------------------------------------------------------------- 

Output 3:
In a world where AI has become ubiquitous, it seems that many of us have become better at our job at it.

Ahead of the release of the AI and robotics products, I asked a few researchers if there were any trends that are affecting the industry.

-----------------------------------------------------------------------------------------------------------------------------------------

# Using Falcon 7B with Sampling Methods

Here you can try out the Falcon 7B model with different combinations of sampling settings. This should give you an idea of how the output of your chosen model varies with the sampling strategy. Look closely at the outputs of the LLM and examine how they differ from one another. As mentioned in the book, the closer the temperature parameter is set to 1, the more "creative" the model will be; values near 0 make the output close to greedy search.
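The cells in this section combine a temperature with `top_k`. The following sketch shows how the two settings interact on a single decoding step. Everything in it is a hypothetical stand-in: the function `next_token_distribution`, the toy logits, and the choice to apply the temperature before the top-k cut are assumptions made for illustration.

```python
import math
import random

def next_token_distribution(logits, temperature=1.0, top_k=None):
    """Temperature-scale the logits, optionally keep the top_k tokens, then normalize."""
    scaled = {tok: score / temperature for tok, score in logits.items()}
    if top_k is not None:
        kept = sorted(scaled, key=scaled.get, reverse=True)[:top_k]
        scaled = {tok: scaled[tok] for tok in kept}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Hypothetical logits for the token following the prompt
toy_logits = {"we": 3.0, "the": 2.5, "it": 2.0, "humans": 1.0, "robots": 0.5}

rng = random.Random(42)  # fixed seed so the sketch is repeatable
for temperature in (0.1, 0.7):
    dist = next_token_distribution(toy_logits, temperature=temperature, top_k=10)
    draws = rng.choices(list(dist), weights=list(dist.values()), k=20)
    print(f"T={temperature}: p(top token)={dist['we']:.3f}, distinct draws={len(set(draws))}")
```

With `temperature=0.1` the top token takes nearly all of the probability mass, so repeated samples tend to look alike, which matches the near-identical openings you may observe in the low-temperature Falcon outputs.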

In [None]:
# Model and tokenizer
model_name = "tiiuae/falcon-7b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Define the input prompt
prompt = "In a world where AI has become ubiquitous,"

# Encode the input prompt and prepare it for the model
input_ids = tokenizer.encode(prompt, return_tensors='pt')

# Generate text output with temperature sampling
outputs = model.generate(
    input_ids,
    max_length=200,  # max_length to match the desired output
    do_sample=True,
    temperature=0.7,  # Control randomness with the temperature parameter
    top_k=10,
    num_return_sequences=5,  # How many outputs to generate
    eos_token_id=tokenizer.eos_token_id,  # Stop generation at EOS token
    pad_token_id=tokenizer.eos_token_id  # No pad token is set here, so reuse EOS to pad outputs shorter than max_length
)

# Print the resulting sentences
print("\033[1m" + "Output:\n" + 140 * '=' + "\033[0m")
for i, output in enumerate(outputs):
    generated_text = tokenizer.decode(output, skip_special_tokens=True).strip()
    print(f"Output {i+1}:\n{generated_text}\n")
    print("-" * 140, "\n")


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.
The current implementation of Falcon calls `torch.scaled_dot_product_attention` directly, this will be deprecated in the future in favor of the `BetterTransformer` API. Please install the latest optimum library with `pip install -U optimum` and call `model.to_bettertransformer()` to benefit from `torch.scaled_dot_product_attention` and future performance optimizations.


Output:
Output 1:
In a world where AI has become ubiquitous, we are witnessing the rise of an unprecedented wave of innovation. As AI is now available to everyone, we are seeing more and more businesses using AI to improve their operations, enhance their products and services, and increase their bottom lines.
But, how is AI being used?
In this article, we will look at some of the most common ways that AI is being used in the business world and how it is being applied to different industries.
How is AI Being Used in the Business World?
AI is a powerful tool that can be used to improve business operations, enhance products and services, and increase revenues. In fact, AI can be used in almost every industry and business function.
Here are some of the most common ways that AI is being used in the business world:
- Improving Customer Experience.
- Streamlining Business Processes.
- Optimizing Supply Chains.
- Improving Sales and Marketing.
- Improving Product and Service

-------------

In [None]:
# Model and tokenizer
model_name = "tiiuae/falcon-7b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Define the input prompt
prompt = "In a world where AI has become ubiquitous,"

# Encode the input prompt and prepare it for the model
input_ids = tokenizer.encode(prompt, return_tensors='pt')

# Generate text output with temperature sampling
outputs = model.generate(
    input_ids,
    max_length=100,  # Adjusted max_length to match the desired output
    do_sample=True,
    temperature=0.1,  # Control randomness with the temperature parameter
    top_k=10,
    num_return_sequences=5,  # How many outputs to generate
    eos_token_id=tokenizer.eos_token_id,  # Stop generation at EOS token
    pad_token_id=tokenizer.eos_token_id  # No pad token is set here, so reuse EOS to pad outputs shorter than max_length
)

# Print the resulting sentences
print("\033[1m" + "Output:\n" + 140 * '=' + "\033[0m")
for i, output in enumerate(outputs):
    generated_text = tokenizer.decode(output, skip_special_tokens=True).strip()
    print(f"Output {i+1}:\n{generated_text}\n")
    print("-" * 140, "\n")


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.
The current implementation of Falcon calls `torch.scaled_dot_product_attention` directly, this will be deprecated in the future in favor of the `BetterTransformer` API. Please install the latest optimum library with `pip install -U optimum` and call `model.to_bettertransformer()` to benefit from `torch.scaled_dot_product_attention` and future performance optimizations.


Output:
Output 1:
In a world where AI has become ubiquitous, the need for human-like intelligence is more important than ever.
The ability to understand and respond to human language is a key component of this.
In this article, we’ll explore how to use AI to create a human-like AI that can understand and respond to human speech.
We’ll also discuss how to use AI to create a human-like AI that can understand and respond to human emotions.
What is AI?

-------------------------------------------------------------------------------------------------------------------------------------------- 

Output 2:
In a world where AI has become ubiquitous, it’s easy to forget that it’s still a relatively new technology.
But as the technology continues to evolve, it’s becoming more and more important to understand how it works and how it can be used to improve our lives.
Here are some of the most important things you need to know about AI.
What is AI?
AI is a technology that uses artificial intell

In [None]:

# Instantiate the model and tokenizer
model_name = "tiiuae/falcon-7b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Define an input prompt
prompt = "In a world where AI has become ubiquitous,"

# Encode the input prompt and prepare it for the model
input_ids = tokenizer.encode(prompt, return_tensors='pt')

# Generate text output
outputs = model.generate(
    input_ids,
    max_length=200,
    do_sample=True,
    temperature=0.7,
    num_return_sequences=5
)

# Print the resulting sentences
print("\033[1m" + "Output:\n" + 140 * '=' + "\033[0m")
for i, output in enumerate(outputs):
    # Keep special tokens so the EOS marker is still visible after decoding
    generated_text = tokenizer.decode(output, skip_special_tokens=False).strip()
    # Check if the EOS token exists in the output and trim up to that
    eos_token = tokenizer.eos_token
    if eos_token and eos_token in generated_text:
        generated_text = generated_text.split(eos_token)[0].strip()
    elif '. ' in generated_text:
        # Alternatively, trim up to the last period for coherence
        generated_text = '. '.join(generated_text.split('. ')[:-1]) + '.'
    print(f"Output {i+1}:\n{generated_text}\n")
    print("-" * 140, "\n")


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.
The current implementation of Falcon calls `torch.scaled_dot_product_attention` directly, this will be deprecated in the future in favor of the `BetterTransformer` API. Please install the latest optimum library with `pip install -U optimum` and call `model.to_bettertransformer()` to benefit from `torch.scaled_dot_product_attention` and future performance optimizations.


Output:
Output 1:
In a world where AI has become ubiquitous, the need for high-quality data is paramount. The data has to be as accurate as possible to ensure that the algorithms are performing at their best.
How many times have you seen a bot or AI-powered app give you a “no results found” message when you’ve asked for something that should be in their database?
It happens more often than you’d think, and it’s a great example of how data quality is essential for the success of any machine learning project.
If you want your data to be useful, it has to be accurate. If it’s not accurate, it could result in some serious problems for your business.
For example, if your algorithm is trained on inaccurate data, it will only be as accurate as the training data you provided.

-------------------------------------------------------------------------------------------------------------------------------------------- 

Output 2:
In a world where AI has become ubiquitous, how do we ensure tha