# Transformers for Text Summerization

As we discussed earlier transformers are great for many NLP tasks and text summerization is one such task.

In 2019, a team of researchers formed a new model based on the assertion every NLP problem can be represented as a Text-To-Text function(and if we in fact think about it, most of the tasks we expect a transformer to do falls in to such category). The idea was to train a transformer using transfer learning techniques during the training phase and finetune using text to text approach. 

This idea led to new performance level for transformer models and the model is called Text-To-Text Transfer Transformer or T5 for short. The researchers wanted the T5 model to be have task agnostic training process. To acheive that purpose they simply added a prefix to the sequences defining what need to be done. For example,

- 'translate English to German: \[sequence]'
- 'summerize: \[sequence]'

This way T5 models can get different tasks in single format. This led T5 to be used in many use cases with same parameters.

### T5 Architecture

As mentioned earlier T5 researchers was not interested in finding a new transformer architecture. Instead they were interested in making the transformer input agnostic. Therefore they used the original transformer architecture for their purpose. But they slightly changed few functionalities to match their need.

- T5 self attention is `order independent`. Means works on a set rather that a list of sequential tokens like previous models.

- Instead of using positional embedding technique of original transformer, this uses a relative postional embedding technique. Also usage of this embeddings are bit different compared to previous models.


> Read more descriptive details about the T5 model for better explanations.

***

### Text summerization with T5

We will use huggingface provided T5 model for this task.



In [1]:
import torch
import json

from transformers import T5ForConditionalGeneration, T5Config, T5Tokenizer

In [None]:
model = T5ForConditionalGeneration.from_pretrained('t5-small')
tokenizer = T5Tokenizer.from_pretrained('t5-small')

In [12]:
device = torch.device('cpu') # can use gpu if needed

In [4]:
print(model.config)

T5Config {
  "_name_or_path": "t5-small",
  "architectures": [
    "T5WithLMHeadModel"
  ],
  "d_ff": 2048,
  "d_kv": 64,
  "d_model": 512,
  "decoder_start_token_id": 0,
  "dense_act_fn": "relu",
  "dropout_rate": 0.1,
  "eos_token_id": 1,
  "feed_forward_proj": "relu",
  "initializer_factor": 1.0,
  "is_encoder_decoder": true,
  "is_gated_act": false,
  "layer_norm_epsilon": 1e-06,
  "model_type": "t5",
  "n_positions": 512,
  "num_decoder_layers": 6,
  "num_heads": 8,
  "num_layers": 6,
  "output_past": true,
  "pad_token_id": 0,
  "relative_attention_max_distance": 128,
  "relative_attention_num_buckets": 32,
  "task_specific_params": {
    "summarization": {
      "early_stopping": true,
      "length_penalty": 2.0,
      "max_length": 200,
      "min_length": 30,
      "no_repeat_ngram_size": 3,
      "num_beams": 4,
      "prefix": "summarize: "
    },
    "translation_en_to_de": {
      "early_stopping": true,
      "max_length": 300,
      "num_beams": 4,
      "prefix": "tran

I am using the T5 small model. Instead we can use the other t5 model types as well. Serach on hugging face for more details.

In [5]:
print(model)

T5ForConditionalGeneration(
  (shared): Embedding(32128, 512)
  (encoder): T5Stack(
    (embed_tokens): Embedding(32128, 512)
    (block): ModuleList(
      (0): T5Block(
        (layer): ModuleList(
          (0): T5LayerSelfAttention(
            (SelfAttention): T5Attention(
              (q): Linear(in_features=512, out_features=512, bias=False)
              (k): Linear(in_features=512, out_features=512, bias=False)
              (v): Linear(in_features=512, out_features=512, bias=False)
              (o): Linear(in_features=512, out_features=512, bias=False)
              (relative_attention_bias): Embedding(32, 8)
            )
            (layer_norm): T5LayerNorm()
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (1): T5LayerFF(
            (DenseReluDense): T5DenseActDense(
              (wi): Linear(in_features=512, out_features=2048, bias=False)
              (wo): Linear(in_features=2048, out_features=512, bias=False)
              (dropout): Drop

In [25]:
def summerize(model, tokenizer, device, max_length, text):

    preprocessed_text = text.strip().replace('\n', '')
    t5_prepared = f'summerize: {text}'

    t5_tokenized_text = tokenizer.encode(t5_prepared, return_tensors='pt').to(device)

    print("t5 prepared text: ", t5_prepared)

    model_out = model.generate(t5_tokenized_text,
                                num_beams = 4,
                                no_repeat_ngram_size=2,
                                min_length=30,
                                max_length=max_length,
                                early_stopping=True)

    output = tokenizer.decode(model_out[0], skip_special_tokens=True)
    return output

In [28]:
text ="""
Less than a decade after helping the Allied Forces win World War II by breaking the Nazi encryption machine Enigma, mathematician Alan Turing changed history a second time with a simple question: “Can machines think?” 
Turing’s 1950 paper “Computing Machinery and Intelligence” and its subsequent Turing Test established the fundamental goal and vision of AI.   
At its core, AI is the branch of computer science that aims to answer Turing’s question in the affirmative. It is the endeavor to replicate or simulate human intelligence in machines. The expansive goal of AI has given rise to many questions and debates. So much so that no singular definition of the field is universally accepted.
"""

In [29]:
summerize(model, tokenizer, device, 200, text)

t5 prepared text:  summerize: 
Less than a decade after helping the Allied Forces win World War II by breaking the Nazi encryption machine Enigma, mathematician Alan Turing changed history a second time with a simple question: “Can machines think?” 
Turing’s 1950 paper “Computing Machinery and Intelligence” and its subsequent Turing Test established the fundamental goal and vision of AI.   
At its core, AI is the branch of computer science that aims to answer Turing’s question in the affirmative. It is the endeavor to replicate or simulate human intelligence in machines. The expansive goal of AI has given rise to many questions and debates. So much so that no singular definition of the field is universally accepted.



'World War II by breaking the Nazi encryption machine Enigma. Turing’s 1950 paper “Computing Machinery and Intelligence” and its subsequent Toring Test established the fundamental goal and vision of AI. At its core, AI is the branch of computer science that aims to answer the question in the affirmative.'

hmm, interesting!

This is with the t5 small model. We can expect  better results with larger model. But based on the datasets the model got trained result may vary on different types of inputs. So its better to explore such aspects as well.