**Initialization**
- I put these three lines at the top of each of my notebooks. The first two reload imported modules automatically before code runs, which prevents stale-import problems when I re-run the same project. The third renders matplotlib visualizations inline in the notebook.

In [2]:
#@ INITIALIZATION: 
%reload_ext autoreload
%autoreload 2
%matplotlib inline

**Downloading Libraries and Dependencies**
- I import all the libraries and dependencies required for the project in a single cell. Uncomment the `pip install` line if `transformers` is not installed yet.

In [4]:
#@ IMPORTING MODULES: UNCOMMENT BELOW:
# !pip install transformers[sentencepiece]
import torch
import json
from transformers import T5Tokenizer, T5Config
from transformers import T5ForConditionalGeneration
display_architecture=True

#@ IGNORING WARNINGS: 
import warnings
warnings.filterwarnings("ignore")

**The T5 Model**
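- In T5, the encoder and decoder layers of the original **Transformer** are called blocks, and their sub-layers are called sub-components, each containing a self-attention layer and a feed-forward network. Self-attention on its own is order independent, so T5 supplies order information through relative position embeddings, which are added as a bias to the attention scores.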
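
- To make the order-independence point concrete, here is a small pure-Python sketch. It is not T5's implementation: the toy attention has no learned weights, and `distance_bias` is a hypothetical stand-in for T5's learned, bucketed relative position bias. It shows that shuffling the inputs only shuffles the outputs of plain self-attention, and that a position-dependent bias breaks that symmetry.

```python
import math

def softmax(row):
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, position_bias=None):
    """Toy single-head attention: queries, keys and values are the inputs themselves."""
    dim = len(x[0])
    outputs = []
    for i, query in enumerate(x):
        scores = []
        for j, key in enumerate(x):
            score = sum(q * k for q, k in zip(query, key))
            if position_bias is not None:
                score += position_bias(i, j)      # Bias depends only on positions.
            scores.append(score)
        weights = softmax(scores)
        outputs.append([sum(w * value[d] for w, value in zip(weights, x))
                        for d in range(dim)])
    return outputs

def close(a, b):
    return all(math.isclose(p, q, abs_tol=1e-9) for p, q in zip(a, b))

def distance_bias(i, j):
    """Hypothetical relative bias: penalize distant positions (not T5's learned bias)."""
    return -abs(i - j)

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
order = [2, 0, 1]                                  # New position -> old position.
shuffled = [tokens[old] for old in order]

plain = self_attention(tokens)
plain_shuffled = self_attention(shuffled)
plain_equivariant = all(close(plain_shuffled[new], plain[old])
                        for new, old in enumerate(order))

biased = self_attention(tokens, distance_bias)
biased_shuffled = self_attention(shuffled, distance_bias)
biased_equivariant = all(close(biased_shuffled[new], biased[old])
                         for new, old in enumerate(order))

print(plain_equivariant, biased_equivariant)       # prints: True False
```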

In [6]:
#@ INITIALIZING T5 MODEL:
model = T5ForConditionalGeneration.from_pretrained("t5-large")          # Initializing pretrained model.
tokenizer = T5Tokenizer.from_pretrained("t5-large")                     # Initializing pretrained tokenizer.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")   # Selecting GPU if available, else CPU.

**Architecture of T5 Model**

In [7]:
#@ ARCHITECTURE OF T5 MODEL:
if display_architecture:
    print(model.config)                                                 # Inspecting T5 architecture. 

T5Config {
  "_name_or_path": "t5-large",
  "architectures": [
    "T5WithLMHeadModel"
  ],
  "d_ff": 4096,
  "d_kv": 64,
  "d_model": 1024,
  "decoder_start_token_id": 0,
  "dropout_rate": 0.1,
  "eos_token_id": 1,
  "feed_forward_proj": "relu",
  "initializer_factor": 1.0,
  "is_encoder_decoder": true,
  "layer_norm_epsilon": 1e-06,
  "model_type": "t5",
  "n_positions": 512,
  "num_decoder_layers": 24,
  "num_heads": 16,
  "num_layers": 24,
  "output_past": true,
  "pad_token_id": 0,
  "relative_attention_max_distance": 128,
  "relative_attention_num_buckets": 32,
  "task_specific_params": {
    "summarization": {
      "early_stopping": true,
      "length_penalty": 2.0,
      "max_length": 200,
      "min_length": 30,
      "no_repeat_ngram_size": 3,
      "num_beams": 4,
      "prefix": "summarize: "
    },
    "translation_en_to_de": {
      "early_stopping": true,
      "max_length": 300,
      "num_beams": 4,
      "prefix": "translate English to German: "
    },
    "translat
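
- As a quick sanity check on the configuration printed above, the sketch below copies a few of the visible values into a plain dictionary (so it runs without downloading the model) and reads them back. The dictionary is my own hand-copied subset, not the full config. In the notebook itself, the same values should be available from `model.config`.

```python
# Values copied by hand from the printed T5Config; only fields visible above.
config = {
    "d_model": 1024,
    "d_kv": 64,
    "d_ff": 4096,
    "num_heads": 16,
    "num_layers": 24,
    "num_decoder_layers": 24,
    "relative_attention_num_buckets": 32,
    "task_specific_params": {
        "summarization": {
            "early_stopping": True,
            "length_penalty": 2.0,
            "max_length": 200,
            "min_length": 30,
            "no_repeat_ngram_size": 3,
            "num_beams": 4,
            "prefix": "summarize: ",
        }
    },
}

# Each head projects to d_kv dimensions, so all heads together span this width.
inner_dim = config["num_heads"] * config["d_kv"]
print("Attention inner dimension:", inner_dim)
print("Matches d_model:", inner_dim == config["d_model"])

# Split the summarization defaults into the text prefix and generation settings.
summarization = config["task_specific_params"]["summarization"]
prefix = summarization["prefix"]
generate_kwargs = {k: v for k, v in summarization.items() if k != "prefix"}
print("Prefix:", repr(prefix))
print("Generation defaults:", generate_kwargs)
```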

In [8]:
#@ ARCHITECTURE OF T5 MODEL:
if display_architecture:
    print(model)                       # Inspecting T5 architecture. 

T5ForConditionalGeneration(
  (shared): Embedding(32128, 1024)
  (encoder): T5Stack(
    (embed_tokens): Embedding(32128, 1024)
    (block): ModuleList(
      (0): T5Block(
        (layer): ModuleList(
          (0): T5LayerSelfAttention(
            (SelfAttention): T5Attention(
              (q): Linear(in_features=1024, out_features=1024, bias=False)
              (k): Linear(in_features=1024, out_features=1024, bias=False)
              (v): Linear(in_features=1024, out_features=1024, bias=False)
              (o): Linear(in_features=1024, out_features=1024, bias=False)
              (relative_attention_bias): Embedding(32, 16)
            )
            (layer_norm): T5LayerNorm()
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (1): T5LayerFF(
            (DenseReluDense): T5DenseReluDense(
              (wi): Linear(in_features=1024, out_features=4096, bias=False)
              (wo): Linear(in_features=4096, out_features=1024, bias=False)
              (

In [9]:
#@ ARCHITECTURE OF T5 MODEL:
if display_architecture:
    print(model.encoder)                       # Inspecting T5 encoder architecture. 

T5Stack(
  (embed_tokens): Embedding(32128, 1024)
  (block): ModuleList(
    (0): T5Block(
      (layer): ModuleList(
        (0): T5LayerSelfAttention(
          (SelfAttention): T5Attention(
            (q): Linear(in_features=1024, out_features=1024, bias=False)
            (k): Linear(in_features=1024, out_features=1024, bias=False)
            (v): Linear(in_features=1024, out_features=1024, bias=False)
            (o): Linear(in_features=1024, out_features=1024, bias=False)
            (relative_attention_bias): Embedding(32, 16)
          )
          (layer_norm): T5LayerNorm()
          (dropout): Dropout(p=0.1, inplace=False)
        )
        (1): T5LayerFF(
          (DenseReluDense): T5DenseReluDense(
            (wi): Linear(in_features=1024, out_features=4096, bias=False)
            (wo): Linear(in_features=4096, out_features=1024, bias=False)
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (layer_norm): T5LayerNorm()
          (dropout): Dropo

In [10]:
#@ ARCHITECTURE OF T5 MODEL:
if display_architecture:
    print(model.decoder)                       # Inspecting T5 decoder architecture. 

T5Stack(
  (embed_tokens): Embedding(32128, 1024)
  (block): ModuleList(
    (0): T5Block(
      (layer): ModuleList(
        (0): T5LayerSelfAttention(
          (SelfAttention): T5Attention(
            (q): Linear(in_features=1024, out_features=1024, bias=False)
            (k): Linear(in_features=1024, out_features=1024, bias=False)
            (v): Linear(in_features=1024, out_features=1024, bias=False)
            (o): Linear(in_features=1024, out_features=1024, bias=False)
            (relative_attention_bias): Embedding(32, 16)
          )
          (layer_norm): T5LayerNorm()
          (dropout): Dropout(p=0.1, inplace=False)
        )
        (1): T5LayerCrossAttention(
          (EncDecAttention): T5Attention(
            (q): Linear(in_features=1024, out_features=1024, bias=False)
            (k): Linear(in_features=1024, out_features=1024, bias=False)
            (v): Linear(in_features=1024, out_features=1024, bias=False)
            (o): Linear(in_features=1024, out_feat
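
- The printed shapes are enough for a back-of-the-envelope parameter count. The sketch below is an estimate under stated assumptions: it counts only the bias-free `Linear` layers and the shared embedding, ignores the small layer-norm and relative-attention-bias weights, and assumes every decoder block ends with the same feed-forward sub-layer as the encoder (the decoder printout above is cut off before that point).

```python
# Shapes copied from the printed module trees above (t5-large).
D_MODEL = 1024
D_FF = 4096
VOCAB = 32128
NUM_LAYERS = 24            # num_layers and num_decoder_layers in the config.

def linear_params(in_features, out_features):
    """Weights of a Linear layer with bias=False."""
    return in_features * out_features

attention = 4 * linear_params(D_MODEL, D_MODEL)                             # q, k, v, o
feed_forward = linear_params(D_MODEL, D_FF) + linear_params(D_FF, D_MODEL)  # wi, wo

encoder_block = attention + feed_forward         # Self-attention + feed-forward.
decoder_block = 2 * attention + feed_forward     # Self-attention + cross-attention + feed-forward (assumed).
embedding = VOCAB * D_MODEL                      # Shared token embedding.

estimate = NUM_LAYERS * (encoder_block + decoder_block) + embedding
print(f"Encoder block: {encoder_block:,}")
print(f"Decoder block: {decoder_block:,}")
print(f"Estimated total: {estimate:,}")
```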

In [11]:
#@ ARCHITECTURE OF T5 MODEL:
if display_architecture:
    print(model.forward)                       # Inspecting the bound forward method (its repr includes the module tree).

<bound method T5ForConditionalGeneration.forward of T5ForConditionalGeneration(
  (shared): Embedding(32128, 1024)
  (encoder): T5Stack(
    (embed_tokens): Embedding(32128, 1024)
    (block): ModuleList(
      (0): T5Block(
        (layer): ModuleList(
          (0): T5LayerSelfAttention(
            (SelfAttention): T5Attention(
              (q): Linear(in_features=1024, out_features=1024, bias=False)
              (k): Linear(in_features=1024, out_features=1024, bias=False)
              (v): Linear(in_features=1024, out_features=1024, bias=False)
              (o): Linear(in_features=1024, out_features=1024, bias=False)
              (relative_attention_bias): Embedding(32, 16)
            )
            (layer_norm): T5LayerNorm()
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (1): T5LayerFF(
            (DenseReluDense): T5DenseReluDense(
              (wi): Linear(in_features=1024, out_features=4096, bias=False)
              (wo): Linear(in_features=

**Summarizing Documents with T5**

In [19]:
#@ INITIALIZING SUMMARIZATION FUNCTION:
def summarize(text, ml):                                        # ml: maximum summary length in tokens.
    device = torch.device("cpu")                                # Running on CPU.
    preprocess_text = text.strip().replace("\n", "")            # Stripping whitespace and removing newlines.
    prepared_text = "Summarize: " + preprocess_text             # Adding the task prefix.
    print("Preprocessed and prepared text: \n", prepared_text)  # Inspecting text. 
    tokenized_text = tokenizer.encode(prepared_text, 
                                      return_tensors="pt"
                                      ).to(device)              # Tokenizing and encoding to a tensor.
    summary_ids = model.generate(tokenized_text, num_beams=4,
                                 no_repeat_ngram_size=2,
                                 min_length=30, max_length=ml,
                                 early_stopping=True)           # Generating summary ids with beam search.
    output = tokenizer.decode(summary_ids[0], 
                              skip_special_tokens=True)         # Decoding ids back to text.
    return output
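
- Two details of the preprocessing are worth a second look. Replacing `"\n"` with an empty string fuses the last word of each line with the first word of the next, and the model config lists the summarization prefix in lowercase as `"summarize: "`. The helper below is a hypothetical alternative I sketched for comparison, not the function used for the outputs recorded in this notebook: it collapses every whitespace run to a single space and uses the lowercase prefix. I have not measured how much this changes the summaries.

```python
def prepare_for_t5(text, prefix="summarize: "):
    """Collapse whitespace runs (including newlines) to single spaces and add the task prefix."""
    cleaned = " ".join(text.split())
    return prefix + cleaned

sample = """
The United States Declaration of Independence was the first Etext
released by Project Gutenberg, early in 1971.
"""
prepared = prepare_for_t5(sample)
print(prepared)
```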

In [20]:
#@ SUMMARIZING DOCUMENTS WITH T5:
text = """
The United States Declaration of Independence was the first Etext
released by Project Gutenberg, early in 1971. The title was stored
in an emailed instruction set which required a tape or diskpack be
hand mounted for retrieval. The diskpack was the size of a large
cake in a cake carrier, cost $1500, and contained 5 megabytes, of
which this file took 1-2%. Two tape backups were kept plus one on
paper tape. The 10,000 files we hope to have online by the end of
2001 should take about 1-2% of a comparably priced drive in 2001.
"""
print("Number of characters: ", len(text))                              # Inspection.
summary = summarize(text, 50)                                           # Summarization.
print("\n Summarized Text: \n", summary)                                # Inspection.

Number of characters:  530
Preprocessed and prepared text: 
 Summarize: The United States Declaration of Independence was the first Etextreleased by Project Gutenberg, early in 1971. The title was storedin an emailed instruction set which required a tape or diskpack behand mounted for retrieval. The diskpack was the size of a largecake in a cake carrier, cost $1500, and contained 5 megabytes, ofwhich this file took 1-2%. Two tape backups were kept plus one onpaper tape. The 10,000 files we hope to have online by the end of2001 should take about 1-2% of a comparably priced drive in 2001.

 Summarized Text: 
 : The 10,000 files we hope to have online by the end of 2001 should take about 1-2% of a comparably priced drive in 2001...:marize::. The Declaration of Independence


In [21]:
#@ SUMMARIZING DOCUMENTS WITH T5:
text = """
No person shall be held to answer for a capital, or otherwise infamous 
crime,
unless on a presentment or indictment of a Grand Jury,exceptin cases 
arising
 in the land or naval forces, or in the Militia, when in actual service
in time of War or public danger; nor shall any person be subject for
the same offense to be twice put in jeopardy of life or limb;
nor shall be compelled in any criminal case to be a witness against 
himself,
nor be deprived of life, liberty, or property, without due process of 
law;
nor shall private property be taken for public use without just 
compensation.
"""
print("Number of characters: ", len(text))                              # Inspection.
summary = summarize(text, 50)                                           # Summarization.
print("\n Summarized Text: \n", summary)                                # Inspection.

Number of characters:  594
Preprocessed and prepared text: 
 Summarize: No person shall be held to answer for a capital, or otherwise infamous crime,unless on a presentment or indictment of a Grand Jury,exceptin cases arising in the land or naval forces, or in the Militia, when in actual servicein time of War or public danger; nor shall any person be subject forthe same offense to be twice put in jeopardy of life or limb;nor shall be compelled in any criminal case to be a witness against himself,nor be deprived of life, liberty, or property, without due process of law;nor shall private property be taken for public use without just compensation.

 Summarized Text: 
 . No person shall be held to answer for a capital, or otherwise infamous crime.....: No one shall ever be convicted of any crime, except for the commission of


In [22]:
#@ SUMMARIZING DOCUMENTS WITH T5:
text = """
The law regarding corporations prescribes that a corporation 
can be incorporated in the state of Montana to serve any lawful 
purpose. In the state of Montana, a corporation has all the powers 
of a natural person for carrying out its business activities. The 
corporation can sue and be sued in its corporate name. It has 
perpetual succession. The corporation can buy, sell or otherwise 
acquire an interest in a real or personal property. It can conduct 
business, carry on operations, and have offices and exercise the powers 
in a state, territory or district in possession of the U.S., or in a 
foreign country. It can appoint officers and agents of the corporation 
for various duties and fix their compensation.
The name of a corporation must contain the word "corporation" or 
its abbreviation "corp." The name of a corporation should not be 
deceptively similar to the name of another corporation incorporated 
in the same state. It should not be deceptively identical to the 
fictitious name adopted by a foreign corporation having business 
transactions in the state.
"""
print("Number of characters: ", len(text))                              # Inspection.
summary = summarize(text, 50)                                           # Summarization.
print("\n Summarized Text: \n", summary)                                # Inspection.

Number of characters:  1082
Preprocessed and prepared text: 
 Summarize: The law regarding corporations prescribes that a corporation can be incorporated in the state of Montana to serve any lawful purpose. In the state of Montana, a corporation has all the powers of a natural person for carrying out its business activities. The corporation can sue and be sued in its corporate name. It has perpetual succession. The corporation can buy, sell or otherwise acquire an interest in a real or personal property. It can conduct business, carry on operations, and have offices and exercise the powers in a state, territory or district in possession of the U.S., or in a foreign country. It can appoint officers and agents of the corporation for various duties and fix their compensation.The name of a corporation must contain the word "corporation" or its abbreviation "corp." The name of a corporation should not be deceptively similar to the name of another corporation incorporated in the same state. 