The **Transformer** is a neural network architecture for sequence-to-sequence tasks in NLP that handles long-distance dependencies with ease. It computes representations of its input and output without sequence-aligned RNNs or convolutions, relying entirely on self-attention.
![image.png](attachment:86a9dc0c-ec08-4397-99ba-2782140591e5.png)

In general, the Transformer model is based on the encoder-decoder architecture. The encoder is the gray rectangle on the left and the decoder is on the right. Each encoder layer consists of two sublayers: multi-head self-attention and a fully connected feedforward network. Each decoder layer consists of three: masked multi-head self-attention, encoder-decoder attention (multi-head attention over the encoder output), and a fully connected feedforward network, as shown in the visualizations that follow.
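
To make the self-attention sublayer more concrete, here is a minimal sketch of single-head scaled dot-product attention in plain Python. It is a simplification under stated assumptions: the query, key, and value projections are taken to be the identity (a real Transformer learns separate weight matrices for each), and the three 2-dimensional token vectors are invented for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Single-head self-attention with identity projections (Q = K = V = x).

    x is a list of token vectors. Learned projection matrices and multiple
    heads are omitted to keep the sketch short.
    """
    d = len(x[0])
    output, weights = [], []
    for q in x:
        # Dot product of this token's query with every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        weights.append(w)
        # Each output vector is a weighted average of the value vectors.
        output.append([sum(wj * v[i] for wj, v in zip(w, x)) for i in range(d)])
    return output, weights

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy 2-d "embeddings"
attended, attn_weights = self_attention(tokens)
```

Every row of `attn_weights` sums to 1, so each output vector mixes information from all positions at once. This is what lets the model relate distant tokens without stepping through the sequence.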

Encoder: The encoder reads the entire input sequence and encodes it into a sequence of context-aware vectors, one per input token. (Earlier RNN-based encoder-decoder models instead compressed the input into a single fixed-length context vector.)

Decoder: The decoder produces the output one time step at a time, attending both to the encoder output and to the tokens it has already generated.

Let’s see how this setup of the encoder and the decoder stack works:

![image.png](attachment:d3850377-9292-4480-b11d-3f3b1c661ac4.png)

1. The word embeddings of the input sequence are passed to the first encoder.
2. These are then transformed and propagated to the next encoder.
3. The output from the last encoder in the encoder stack is passed to all the decoders in the decoder stack, as shown in the figure.
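
The three steps above can be sketched as a small data-flow example. The `encoder_layer` and `decoder_layer` functions are hypothetical stand-ins that only tag their inputs with strings; they show how data moves through the stacks, not what real layers compute.

```python
def encoder_layer(x, layer_id):
    # Stand-in for self-attention + feed-forward: tags each position.
    return [f"enc{layer_id}({tok})" for tok in x]

def decoder_layer(y, memory, layer_id):
    # Stand-in for masked self-attention, encoder-decoder attention over
    # `memory`, and feed-forward. Records how many encoder positions it sees.
    return [f"dec{layer_id}({tok}|mem={len(memory)})" for tok in y]

def run_stack(src_embeddings, tgt_embeddings, num_layers=2):
    x = src_embeddings
    for i in range(num_layers):      # steps 1 and 2: encoders run in sequence
        x = encoder_layer(x, i)
    memory = x                       # output of the last encoder
    y = tgt_embeddings
    for i in range(num_layers):      # step 3: every decoder gets the same memory
        y = decoder_layer(y, memory, i)
    return memory, y

memory, decoded = run_stack(["I", "am"], ["<s>"], num_layers=2)
print(memory)
print(decoded)
```

Each decoder layer receives the same `memory`, which is the point of step 3: only the final encoder output is shared with the decoder stack.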

**Word embeddings** are a way of representing words in a continuous, numerical space. In natural language processing (NLP), it is often useful to represent words as numerical vectors, rather than as discrete symbols. Word embeddings can be used in a variety of NLP tasks, including language translation, text classification, and language generation.

There are several different ways to create word embeddings, but one of the most common is to use a neural network to learn a dense, continuous representation of words from a large dataset. Embeddings can be learned from scratch as part of the model for the task at hand, or they can be "pre-trained": learned once by training a model on a large corpus and then reused as input to another model.

**Word embeddings** can be created using various neural network architectures, such as feedforward networks, recurrent neural networks (RNNs), and transformers. The specific architecture used to learn the embeddings will depend on the specific NLP task and the characteristics of the dataset.

Once the word embeddings have been learned, they can be used as input to other NLP models, such as language models or text classifiers. The embeddings can also be used to measure the similarity between words, or to perform tasks such as word analogies.
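
As a small illustration of measuring similarity between words, the sketch below computes cosine similarity between embedding vectors. The 3-dimensional vectors are hypothetical values invented for this example. Real embeddings are learned from data and usually have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-d embeddings, made up for illustration only.
embeddings = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

sim_royal = cosine_similarity(embeddings["king"], embeddings["queen"])
sim_fruit = cosine_similarity(embeddings["king"], embeddings["apple"])
print(f"king vs queen: {sim_royal:.3f}")
print(f"king vs apple: {sim_fruit:.3f}")
```

With these made-up vectors, "king" scores much closer to "queen" than to "apple", which is the kind of relationship learned embeddings are expected to capture.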


**Word embeddings** are a way of representing words as numerical vectors. In natural language processing tasks, it is often useful to represent words as numerical values that can be input to machine learning models. Word embeddings provide a way to do this by mapping each word to a fixed-length vector of real numbers.

**Word embeddings** are typically learned from large amounts of text data and can capture the relationships between words in a way that is useful for a variety of NLP tasks. For example, word embeddings can capture semantic relationships between words (e.g. "king" and "queen" are related), as well as syntactic relationships (e.g. "run" and "ran" are related).

There are several different methods for learning word embeddings, including word2vec, GloVe, and fastText. word2vec and fastText train a shallow neural network on a large text corpus and use its learned weights as the word embeddings, while GloVe fits the vectors to global word co-occurrence statistics.

Once learned, word embeddings can be used as input to various NLP tasks, such as language modeling, machine translation, and text classification.


In [1]:
!pip install transformers


**GPT2LMHeadModel** is a PyTorch model that is part of the transformers library. It is a version of the GPT-2 model that has been modified to output a probability distribution over the tokens in a text sequence, with the goal of predicting the next token in the sequence. The GPT2Tokenizer is a tokenizer that is used to preprocess text data in a way that is compatible with the GPT-2 model. It is also part of the transformers library.


**GPT-2** (Generative Pre-trained Transformer 2) is a large-scale language model developed by OpenAI that can generate human-like text. It was trained on a dataset of 8 million web pages and is capable of generating realistic and coherent text that can sometimes be difficult to distinguish from text written by humans.

**GPT2Tokenizer** is the tokenizer specific to the GPT-2 model. A tokenizer breaks a piece of text into smaller pieces called tokens, which the language model consumes to predict the next token in a sequence. GPT2Tokenizer splits text into subword units using byte-level byte-pair encoding (BPE) and can be used to preprocess text data for use with the model.
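
To give a feel for how subword tokenization breaks a word into pieces, here is a simplified sketch of applying byte-pair-encoding style merge rules. It is not the GPT-2 tokenizer: it works on characters rather than bytes, and the merge rules are hypothetical ones written by hand, whereas a real tokenizer learns its merges from a large corpus.

```python
def apply_merges(word, merges):
    """Split a word into characters, then apply merge rules in priority order."""
    symbols = list(word)
    for left, right in merges:
        merged = []
        i = 0
        while i < len(symbols):
            # Fuse the adjacent pair (left, right) wherever it occurs.
            if i + 1 < len(symbols) and symbols[i] == left and symbols[i + 1] == right:
                merged.append(left + right)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols

# Hypothetical merge rules, listed from highest to lowest priority.
merges = [("t", "o"), ("k", "e"), ("to", "ke"), ("i", "z"), ("toke", "n")]
pieces = apply_merges("tokenize", merges)
print(pieces)
```

Frequent character sequences end up as single tokens while rarer words are assembled from several pieces, so the vocabulary stays a fixed size without any word being out of vocabulary.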

In [2]:
from transformers import GPT2LMHeadModel, GPT2Tokenizer

To use the GPT2Tokenizer, we first need to install the transformers library, which provides the GPT2Tokenizer class. We can install the library using pip install transformers.

Once the transformers library is installed, we can use the GPT2Tokenizer.from_pretrained() method to instantiate a GPT2Tokenizer object. The method takes the name of the GPT-2 model we want to use. In this case, we are using the "gpt2-large" model.

The tokenizer files are downloaded automatically on first use, and we can then tokenize text by calling the tokenize() method on the tokenizer object. For example:

text = "This is a piece of text that I want to tokenize."
tokens = tokenizer.tokenize(text)

The tokens variable will then contain a list of the tokens produced by the tokenizer.

GPT-2 is available in several different sizes: "gpt2", "gpt2-medium", "gpt2-large", and "gpt2-xl". The "gpt2-large" model is one of the larger ones, with a vocabulary size of 50257 and roughly 774 million parameters (the 1.5 billion parameter model is "gpt2-xl"). It is generally able to generate higher-quality text than the smaller GPT-2 models, but it is also more computationally expensive to use.
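
The parameter count of each size can be estimated roughly from its number of layers and hidden size. The sketch below uses a common back-of-the-envelope formula that ignores biases and layer normalization. The layer counts and hidden sizes are stated here as assumptions about the published checkpoints, not values taken from this notebook.

```python
def approx_gpt2_params(n_layer, d_model, vocab_size=50257, n_ctx=1024):
    """Rough parameter count for a GPT-2 style decoder-only Transformer.

    Each block has about 4*d^2 attention weights and 8*d^2 feed-forward
    weights (assuming a 4x hidden expansion). Biases and layer norms are
    ignored, so the result slightly underestimates the true count.
    """
    blocks = 12 * n_layer * d_model ** 2
    token_embeddings = vocab_size * d_model
    position_embeddings = n_ctx * d_model
    return blocks + token_embeddings + position_embeddings

# (number of layers, hidden size) per checkpoint: assumed configurations.
configs = {
    "gpt2": (12, 768),
    "gpt2-medium": (24, 1024),
    "gpt2-large": (36, 1280),
    "gpt2-xl": (48, 1600),
}

estimates = {name: approx_gpt2_params(*cfg) for name, cfg in configs.items()}
for name, n in estimates.items():
    print(f"{name}: ~{n / 1e6:.0f}M parameters")
```

Because the block term grows with the square of the hidden size, going from one checkpoint to the next increases memory use and compute far more than the change in layer count alone would suggest.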

We can use the GPT-2 model to generate text by providing it with a prompt and letting it repeatedly predict the next token in the sequence. With suitable prompts or fine-tuning, it can also be applied to tasks such as translation, summarization, and question answering, although it was not trained specifically for them.

In [3]:
tokenizer = GPT2Tokenizer.from_pretrained("gpt2-large")
model = GPT2LMHeadModel.from_pretrained("gpt2-large", pad_token_id=tokenizer.eos_token_id)

GPT2LMHeadModel is a class provided by the transformers library that wraps the GPT-2 Transformer with a language modeling head, so it can be used for text generation as-is or fine-tuned for a specific language modeling task. It is a subclass of torch.nn.Module that takes in a sequence of token IDs and predicts the next token in the sequence.

To create a GPT2LMHeadModel, we can use the from_pretrained() method, which we call here with two arguments:

1. The name of the pre-trained GPT-2 model we want to use. In this case, we are using the "gpt2-large" model.
2. The ID of the padding token (pad_token_id). GPT-2 has no dedicated padding token, so we reuse the ID of the end-of-sequence (EOS) token, which marks the end of a sequence.

The model weights are downloaded and loaded automatically. We can then call the model directly, which runs its forward() method, to pass input through it and obtain predictions. To fine-tune it on our own dataset, we can use the library's Trainer class or a standard PyTorch training loop.

Once the GPT2LMHeadModel object is created, we can use it to generate text by calling the generate() method on the model and passing it the token IDs of a prompt to start from. Note that generate() expects token IDs rather than a raw string, so the prompt must be encoded first. The model then repeatedly predicts the next token to produce a continuation of the prompt.

Example:

prompt = "The quick brown fox jumps over the lazy dog."
input_ids = tokenizer.encode(prompt, return_tensors="pt")
generated_ids = model.generate(input_ids, max_length=100)
generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)

The generated_ids variable holds a tensor of generated token IDs, which tokenizer.decode() converts back into a string.
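
The idea of generating text by repeatedly predicting the next token can be sketched with a toy greedy decoding loop. The probability table below is hypothetical and conditions only on the previous token, whereas GPT-2 conditions on the whole prefix. The loop structure is the part worth noticing.

```python
# Hypothetical next-token probabilities, keyed by the previous token only.
NEXT_TOKEN_PROBS = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"fox": 0.7, "dog": 0.3},
    "a": {"dog": 0.9, "fox": 0.1},
    "fox": {"jumps": 0.8, "<eos>": 0.2},
    "dog": {"sleeps": 0.6, "<eos>": 0.4},
    "jumps": {"<eos>": 1.0},
    "sleeps": {"<eos>": 1.0},
}

def greedy_generate(prompt, max_length=10, eos="<eos>"):
    """Append the most likely next token until EOS or max_length is reached."""
    tokens = list(prompt)
    while len(tokens) < max_length:
        probs = NEXT_TOKEN_PROBS[tokens[-1]]
        next_token = max(probs, key=probs.get)  # greedy choice
        if next_token == eos:
            break
        tokens.append(next_token)
    return tokens

generated = greedy_generate(["<s>", "the"])
print(generated)
```

As in the notebook's calls to generate(), `max_length` here counts the prompt tokens as well as the newly generated ones.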





In [4]:
input_sentence = "Will Artificial Intelligence take the world ?"

input_ids = tokenizer.encode(input_sentence, return_tensors="pt")
input_ids

output = model.generate(
    input_ids, 
    max_length=100, 
    num_beams=5,
    no_repeat_ngram_size=2,
    early_stopping=True
)

tokenizer.decode(output[0], skip_special_tokens=True)

"Will Artificial Intelligence take the world?\n\nI think it will. I think we're going to see a lot more of it in the next 10 years than we've ever seen before. And it's not just AI. It's also robotics, autonomous vehicles, self-driving cars. There are so many things that we can do with artificial intelligence, and I don't see any reason why we shouldn't be able to use it to solve some of the problems we have today."

The encode() method of the GPT2Tokenizer converts a string of text into a sequence of token IDs that can be used as input to the GPT-2 model. Here we pass it two arguments: the input sentence to be encoded, and the type of tensor to return.

The input_sentence argument should be a string of text that you want to encode. The return_tensors argument specifies the type of tensor that should be returned. If you set return_tensors to "pt", the method will return a PyTorch tensor. If you set it to "tf", the method will return a TensorFlow tensor. If you set it to "np", the method will return a NumPy array.

For example:

input_sentence = "This is a sentence that I want to encode."
tokens = tokenizer.encode(input_sentence, return_tensors="pt")

The tokens variable will then contain a tensor of shape (1, sequence_length) holding the token IDs. You can then pass this tensor to the GPT-2 model for processing.

In [5]:
input_sentence = "Will Machine Learning take the world ?"

input_ids = tokenizer.encode(input_sentence, return_tensors="pt")
input_ids

output = model.generate(
    input_ids, 
    max_length=100, 
    num_beams=5,
    no_repeat_ngram_size=2,
    early_stopping=True
)

tokenizer.decode(output[0], skip_special_tokens=True)

"Will Machine Learning take the world?\n\nMachine learning has been around for a long time, but it's only in the last few years that we've really started to see the potential of the technology. It's been used in a wide range of applications, from image recognition to speech recognition, and now we're seeing it being used to improve the way we interact with our devices. We've already seen a number of companies use machine learning in their products, such as Apple's Siri and Google Now"

The generate() method of the GPT2LMHeadModel can be used to generate text using the GPT-2 language model. The method takes several arguments that control the generation process:

input_ids: a tensor containing the input sequence to start from. This should be a tensor of shape (batch_size, sequence_length) containing the encoded token IDs.

max_length: the maximum length of the generated sequence, counted in tokens and including the prompt. Generation stops once the sequence reaches this length.

num_beams: the number of beams to use during beam search. Instead of committing to the single most likely next token at each step, beam search keeps the num_beams highest-scoring partial sequences, which can improve the quality of the generated text.

no_repeat_ngram_size: the size of the n-grams that may not be repeated in the generated sequence. Setting this value to 2, for example, prevents any bigram from appearing more than once.

early_stopping: a boolean flag that, for beam search, stops generation as soon as num_beams complete candidate sequences have been found.

By default, the generate() method returns a tensor of token IDs of shape (batch_size, generated_length). You can then use the tokenizer.decode() method to convert the generated token IDs into a string of text.

For example (the input IDs here are arbitrary placeholders):

import torch

input_ids = torch.tensor([[1, 2, 3]])
output = model.generate(
    input_ids, 
    max_length=100, 
    num_beams=5,
    no_repeat_ngram_size=2,
    early_stopping=True
)

generated_text = tokenizer.decode(output[0], skip_special_tokens=True)

The generated_text variable will contain the generated text as a string.
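
To illustrate what num_beams changes, the following sketch runs a tiny beam search over a hypothetical probability table. It is not the library's implementation: it omits length penalties and the handling of finished hypotheses. It does show how keeping several candidates can find a sequence that greedy decoding misses.

```python
import math

# Hypothetical bigram probabilities, invented for illustration.
PROBS = {
    "<s>": {"a": 0.55, "b": 0.45},
    "a": {"x": 0.6, "y": 0.4},
    "b": {"z": 0.9, "w": 0.1},
}

def beam_search(start, steps, num_beams):
    """Keep the num_beams best partial sequences at every step."""
    beams = [([start], 0.0)]  # (tokens, cumulative log-probability)
    for _ in range(steps):
        candidates = []
        for tokens, score in beams:
            for tok, p in PROBS.get(tokens[-1], {}).items():
                candidates.append((tokens + [tok], score + math.log(p)))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:num_beams]
    return beams

greedy_like = beam_search("<s>", steps=2, num_beams=1)[0][0]
best_of_two = beam_search("<s>", steps=2, num_beams=2)[0][0]
print(greedy_like)
print(best_of_two)
```

With one beam the search commits to "a" because it is the likeliest first token. With two beams it also keeps "b" alive and discovers that "b z" has a higher overall probability (0.45 × 0.9) than "a x" (0.55 × 0.6).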

In [6]:
input_sentence = "Will Data Science take the world ?"

input_ids = tokenizer.encode(input_sentence, return_tensors="pt")
input_ids

output = model.generate(
    input_ids, 
    max_length=100, 
    num_beams=5,
    no_repeat_ngram_size=2,
    early_stopping=True
)

tokenizer.decode(output[0], skip_special_tokens=True)

"Will Data Science take the world?\n\nData science is the science of data. It's the study of how data can be used to make better decisions, and how to use that data to improve the lives of people around the globe. Data science has been around for a long time, but it's only in the last few years that it has really taken off as a field of study. There are a lot of great resources out there to help you learn more about data science, so we've"

In [7]:
input_sentence = "Will Software Engineering take the world ?"

input_ids = tokenizer.encode(input_sentence, return_tensors="pt")
input_ids

output = model.generate(
    input_ids, 
    max_length=100, 
    num_beams=5,
    no_repeat_ngram_size=2,
    early_stopping=True
)

tokenizer.decode(output[0], skip_special_tokens=True)

'Will Software Engineering take the world?\n\nSoftware engineering is the science and practice of designing, developing, testing, and deploying software systems. Software engineers are responsible for the design and development of software products and services. They design, develop, test and deploy software to support the business objectives of the organization.\nThe software engineering profession is highly specialized and requires a high level of technical and managerial skills. It is a highly competitive field with a wide range of opportunities for advancement and advancement within the industry.'
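
All of the generation cells above set no_repeat_ngram_size=2. The sketch below shows one way such a constraint can be enforced: before each step, work out which next tokens would recreate an n-gram that already appears in the sequence. The function name is hypothetical and this is an illustration of the idea, not the library's code.

```python
def banned_next_tokens(tokens, ngram_size):
    """Return the tokens that would complete an n-gram already in `tokens`."""
    if ngram_size < 1 or len(tokens) < ngram_size - 1:
        return set()
    # The last (n - 1) tokens are the prefix of whatever n-gram comes next.
    prefix = tuple(tokens[len(tokens) - (ngram_size - 1):])
    banned = set()
    for i in range(len(tokens) - ngram_size + 1):
        ngram = tuple(tokens[i:i + ngram_size])
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned

history = ["the", "cat", "sat", "on", "the"]
banned = banned_next_tokens(history, ngram_size=2)
print(banned)
```

Here the bigram "the cat" has already occurred and the sequence currently ends in "the", so "cat" is banned as the next token. A decoder would set the score of banned tokens to negative infinity before choosing.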

1. The decode() method of the GPT2Tokenizer converts a sequence of token IDs back into a string of text. Its one required argument is the sequence of token IDs to be decoded, and it returns the decoded string.

You can also pass an optional skip_special_tokens argument, a boolean flag indicating whether to drop special tokens when decoding. Special tokens carry a specific meaning for the language model rather than representing text. GPT-2 has a single one, the end-of-text token, which serves as its beginning-of-sequence and end-of-sequence marker and is reused as the padding token in this notebook. If skip_special_tokens is set to True, these tokens are omitted from the decoded string.

For example:

output = [[1, 2, 3, 4, 5, 6]]
decoded_text = tokenizer.decode(output[0], skip_special_tokens=True)
The decoded_text variable will contain the decoded string of text.

2. The decode() method also accepts the token IDs as a tensor rather than a plain list. In that form we again pass two arguments: the token IDs to be decoded, and the boolean flag indicating whether to skip special tokens.

The first argument should be a one-dimensional tensor or list of integers containing the token IDs, which is why we index the batch with output[0]. If skip_special_tokens is set to True, special tokens are skipped and only the actual text tokens are included in the output string.

For example (the IDs are arbitrary placeholders):

import torch

output = torch.tensor([[1, 2, 3, 4, 5]])
text = tokenizer.decode(output[0], skip_special_tokens=True)

The text variable will then contain the decoded text as a string.
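
The effect of skip_special_tokens can be shown with a toy decoder. The four-entry vocabulary and the `toy_decode` function are hypothetical and only mimic the behavior described above. The real tokenizer maps IDs through its learned vocabulary of 50257 entries.

```python
# Hypothetical vocabulary for illustration.
ID_TO_TOKEN = {0: "<|endoftext|>", 1: "Hello", 2: ",", 3: " world"}
SPECIAL_IDS = {0}

def toy_decode(token_ids, skip_special_tokens=False):
    """Map token IDs back to text, optionally dropping special tokens."""
    pieces = []
    for token_id in token_ids:
        if skip_special_tokens and token_id in SPECIAL_IDS:
            continue
        pieces.append(ID_TO_TOKEN[token_id])
    return "".join(pieces)

ids = [1, 2, 3, 0]
with_special = toy_decode(ids)
without_special = toy_decode(ids, skip_special_tokens=True)
print(repr(with_special))
print(repr(without_special))
```

Setting the flag only changes what appears in the decoded string. The token IDs themselves are left untouched.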



