In [1]:
import os
import logging

import nltk
import numpy as np
import tensorflow as tf
from tensorflow import keras

# Only log error messages
tf.get_logger().setLevel(logging.ERROR)

os.environ["TOKENIZERS_PARALLELISM"] = "false"

In [2]:
# The percentage of the dataset you want to split as train and test
TRAIN_TEST_SPLIT = 0.1

MAX_INPUT_LENGTH = 1024  # Maximum length of the input to the model
MIN_TARGET_LENGTH = 5  # Minimum length of the output by the model
MAX_TARGET_LENGTH = 128  # Maximum length of the output by the model
#MAX_TARGET_LENGTH = 400
BATCH_SIZE = 8  # Batch-size for training our model
LEARNING_RATE = 2e-5  # Learning-rate for training our model
MAX_EPOCHS = 1  # Maximum number of epochs we will train the model for

# This notebook is built on the t5-small checkpoint from the Hugging Face Model Hub
#MODEL_CHECKPOINT = "t5-small"
MODEL_CHECKPOINT = "facebook/bart-base"

In [3]:
from transformers import TFAutoModelForSeq2SeqLM, DataCollatorForSeq2Seq

model = TFAutoModelForSeq2SeqLM.from_pretrained("./t5_small")

All model checkpoint layers were used when initializing TFT5ForConditionalGeneration.

All the layers of TFT5ForConditionalGeneration were initialized from the model checkpoint at ./t5_small.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFT5ForConditionalGeneration for predictions without further training.


In [4]:
optimizer = keras.optimizers.Adam(learning_rate=LEARNING_RATE)
model.compile(optimizer=optimizer)

No loss specified in compile() - the model's internal loss computation will be used as the loss. Don't panic - this is a common way to train TensorFlow models in Transformers! To disable this behaviour please pass a loss argument, or explicitly pass `loss=None` if you do not want your model to compute a loss.


In [5]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(MODEL_CHECKPOINT)

In [21]:
if MODEL_CHECKPOINT in ["t5-small", "t5-base", "t5-large", "t5-3b", "t5-11b"]:
    prefix = "summarize: "              #for tokenizer and model
else:
    prefix = ""

In [21]:
#example = "Fresh shelling from Ukraine rocked Belgorod overnight, the governor said in a video posted Sunday morning, as Russian dissidents ramp up pressure on the western border region.Gov. Vyacheslav Gladkov said there had been Ukrainian attacks in several locations under his administration.`The night was rather turbulent,` Gladkov said. `There is a lot of destruction. There is no information about casualties.`Due to the violence, 4,000 people are being housed in temporary accommodations. Children in the area are being moved to a camp in Crimea for their own safety, Gladkov added.Dissidents appear near shelled area: Also Sunday, the Freedom for Russia Legion, one of two dissident Russian units fighting under Ukrainian command, posted a video which they said showed their fighters on the streets of a village on the outskirts of Shebekino, one of the areas Gladkov said was attacked.The footage appeared to show the legion in Novaya Tavolzhanka, according to geolocation by CNN, and groups of people moving through the streets as a unit.“We’re going in! The advance assault group of the Legion and the Russian Volunteer Corp entering the suburb of Shebekino,” the group said in the clip's caption.CNN cannot verify the legion’s claim, but the video’s release will be seen as a further attempt to destabilize Russia in the information space as well as disrupting its military plans.Meetings requested: In another bold move, the legion posted a video in which its leader and that of a second dissident group, the Russian Volunteer Corps, request a meeting with Gladkov. In exchange, they offered to release two Russian soldiers allegedly in their custody.The video shows the purported soldiers giving their names and those of their hometowns in Russia. The dissident leaders -- who have made no secret of their opposition to Russian President Vladimir Putin -- say they want to talk to Gladkov about the fate of the country and the war. No threat is made to the lives of the men they are holding."

In [23]:
example_pref = prefix + example

In [24]:
model_input = tokenizer(example_pref, max_length=MAX_INPUT_LENGTH, truncation=True)

In [25]:
from transformers import pipeline

summarizer = pipeline("summarization", model=model, tokenizer=tokenizer, framework="tf")

summarizer(
    example,
    min_length=MIN_TARGET_LENGTH,
    max_length=MAX_TARGET_LENGTH,
)

[{'summary_text': "MG Rover's proposed tie-up with Chinese carmaker Shanghai Automotive could result in 3,000 jobs being lost if the deal goes ahead, according to the Financial Times."}]