The project is simple: we give a phrase, a sentence, or a question as input, and as output we get a short essay generated by the application. As the name suggests, it is a text generator application. Example:
As we can see from the example above, that is how the application works. To handle the UI part we imported the Gradio library, which makes the work much easier.
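To make this concrete, here is a minimal sketch of how such an app can be wired together. This is an assumption about the setup, not the original source: it uses the `transformers` text-generation pipeline with the public `gpt2` checkpoint and Gradio's `Interface` API, and the generation parameters are illustrative.

```python
# Minimal sketch (assumed, not the original code): a GPT-2 text generator behind a Gradio UI.
import gradio as gr
from transformers import pipeline

# Load a GPT-2 text-generation pipeline (checkpoint name is an assumption; any GPT-2 variant works).
generator = pipeline("text-generation", model="gpt2")

def generate(prompt):
    # Continue the prompt with up to ~100 tokens and return the generated text.
    outputs = generator(prompt, max_length=100, num_return_sequences=1)
    return outputs[0]["generated_text"]

# Simple textbox-in, textbox-out interface.
demo = gr.Interface(fn=generate, inputs="text", outputs="text", title="GPT-2 Text Generator")

if __name__ == "__main__":
    demo.launch()
```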
As we know, a Transformer consists of Encoders and Decoders: each Encoder has a Self-Attention layer and a Feed-Forward Neural Network, while each Decoder has Masked Self-Attention, Encoder-Decoder Attention, and a Feed-Forward Neural Network.
GPT-2 has a beautiful architecture in the sense that it does not use Encoders at all; it relies entirely on Decoders. It is therefore also known as a Transformer-Decoder, since the whole architecture is built from the decoder blocks of the Transformer.
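One quick way to see this decoder-only structure is to load the model with the `transformers` library and inspect its layers. This is just an inspection sketch, not part of the original app:

```python
from transformers import GPT2Model

model = GPT2Model.from_pretrained("gpt2")

# GPT-2 small stacks 12 identical decoder blocks; there is no encoder stack at all.
print(model.config.n_layer)   # 12
print(model.h[0])             # one block: masked self-attention (attn) + feed-forward network (mlp)
```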
Like the keyboard on a smartphone that tries to predict the next word from the sentence we are typing, GPT-2 works in a similar way, only much larger and more sophisticated than what our phone has. The way these models actually work is that after each token is produced, that token is added to the sequence of inputs, and that new sequence becomes the input to the model in its next step. This idea is called "auto-regression".

The key difference between self-attention and masked self-attention is that a self-attention block allows a word to peek at the words to its right (i.e., future words), whereas masked self-attention (used in GPT-2) can only peek at the current word and the words to its left.
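The auto-regressive loop can be written out explicitly. The sketch below is an assumption about how this could look (it uses greedy decoding for simplicity, whereas the app may use sampling): each predicted token is appended to the input before the next step, and the causal mask inside GPT-2 guarantees every position only attends to itself and earlier positions.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

input_ids = tokenizer.encode("The weather today is", return_tensors="pt")

# Auto-regression: each new token is appended to the sequence and fed back in at the next step.
for _ in range(20):
    with torch.no_grad():
        logits = model(input_ids).logits           # (1, seq_len, vocab_size)
    next_token = logits[0, -1].argmax()            # greedy pick of the most likely next token
    input_ids = torch.cat([input_ids, next_token.view(1, 1)], dim=1)

print(tokenizer.decode(input_ids[0]))
```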
GPT performs much better than BERT on generative tasks, because GPT produces one token at a time and takes that token into consideration at the next step.
First we get the token embeddings (i.e., the input embeddings) and the positional encodings; after that, the model consists only of decoder Transformer blocks.
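The forward path can be traced with a simplified sketch (assumed for illustration, using the internal attribute names of the `transformers` GPT-2 implementation: `wte` for token embeddings, `wpe` for positional encodings, `h` for the decoder blocks):

```python
import torch
from transformers import GPT2Model, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")

ids = tokenizer.encode("Hello world", return_tensors="pt")
positions = torch.arange(ids.shape[1]).unsqueeze(0)

# Token embedding + positional encoding form the input to the first decoder block.
hidden = model.wte(ids) + model.wpe(positions)

# The rest of the network is just the stack of decoder blocks plus a final layer norm.
for block in model.h:
    hidden = block(hidden)[0]
hidden = model.ln_f(hidden)

print(hidden.shape)  # (1, seq_len, 768) for GPT-2 small
```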