<font size="5">Article Summarizer Application Using the PEGASUS Model

<font color=blue><font size="3">In this pipeline, I've created a simple summarization application that you can use without any prior experience in Natural Language Processing, Hugging Face Transformers, pre-trained models, tokenization, or encoder-decoder architectures.

<font color=blue><font size="3">I use the newspaper3k library to extract the text from an article link, the pre-trained PEGASUS-XSUM checkpoint (google/pegasus-xsum) to generate an abstractive summary of that text, and the Gradio library to wrap both steps in a friendly UI.
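Conceptually, the app is a two-step pipeline: the first step turns a URL into article text and the second turns that text into a summary, which is what chaining the two Gradio interfaces with `Series` achieves. The plain-Python sketch below only illustrates that idea; `chain`, `fake_extract` and `fake_summarize` are hypothetical stand-ins, not Gradio's implementation and not the real extraction or PEGASUS calls.

```python
from typing import Callable


def chain(*steps: Callable[[str], str]) -> Callable[[str], str]:
    """Return a function that feeds each step's output into the next step."""
    def run(value: str) -> str:
        for step in steps:
            value = step(value)
        return value
    return run


# Hypothetical stand-ins for the real steps (article extraction and PEGASUS).
def fake_extract(url: str) -> str:
    return f"Full text of the article at {url}."


def fake_summarize(text: str) -> str:
    # Toy "summary": keep the first four words only.
    words = text.split()
    return " ".join(words[:4]) + " ..."


app = chain(fake_extract, fake_summarize)
print(app("https://example.com/story"))
```

Swapping the stand-ins for the real extraction function and the hosted model gives the same shape as the app: the URL goes in, the text flows through, and only the summary comes out.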

<font color=blue><font size="3">Run all the cells once to launch the UI in a new tab of your default browser. Then paste the link of the article you want summarized into the URL box, click Submit, and wait for the summary.
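Because users paste links by hand, it may help to tidy the input before handing it to the extractor. The helper below is a hypothetical addition (not part of the original app) built only on Python's standard `urllib.parse`: it trims whitespace, assumes `https://` when the scheme is missing, and rejects anything that is not an http(s) link.

```python
from urllib.parse import urlparse


def normalize_url(raw: str) -> str:
    """Tidy a pasted link; raise ValueError if it does not look like a web URL."""
    url = raw.strip()
    if "://" not in url:
        # Assumption: a bare address such as "www.cnn.com/..." is meant to be HTTPS.
        url = "https://" + url
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Not a web link: {raw!r}")
    return url


print(normalize_url("  www.britannica.com/topic/Pyramids-of-Giza \n"))
```

One option would be to call such a helper at the start of the extraction function, so that a stray space or a missing `https://` does not turn into a download error.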

In [1]:
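#install newspaper3k (extracting & curating articles), transformers (PEGASUS) and gradio (web UI)
#gradio.mix and gr.inputs, used below, were removed in Gradio 4.0, so an older Gradio release is required

!pip install newspaper3k transformers "gradio<4" --quiet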



In [2]:
#Load the dependencies

#nltk : provides newspaper3k with tokenizing functionality (the 'punkt' tokenizer models)
#transformers : provides general-purpose architectures (such as PEGASUS) for Natural Language Understanding and Generation (NLU/NLG)
#Gradio : provides a friendly, customizable web interface so that anyone can use the app

from newspaper import Article
from newspaper import Config
import nltk
nltk.download('punkt')

from transformers import pipeline  # not used by the Gradio app below, which loads the hosted model instead
import gradio as gr
from gradio.mix import Series  # chains interfaces so that one's output feeds the next

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\user\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


In [3]:
#Instead of making users copy the text from a website and paste it into summarization code, this function
#lets them paste the link and get a summary without any programming.
#It extracts the article text with the newspaper3k library.

#USER_AGENT : a browser-like user-agent string lets us download pages from websites that reject requests from scripts with HTTP errors (e.g. 403 Forbidden).

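def auto_link(url):
    """Download the article at `url` and return its plain text."""
    USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Firefox/78.0'
    configuration = Config()
    configuration.browser_user_agent = USER_AGENT  # present the request as coming from Firefox
    configuration.request_timeout = 10             # give up on slow sites after 10 seconds
    article = Article(url, config=configuration)
    article.download()  # fetch the HTML
    article.parse()     # extract the title, body text, authors, etc.
    return article.text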
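Setting `browser_user_agent` changes the `User-Agent` header sent with each download request, so the website sees what looks like a regular Firefox browser rather than a script. The standard-library sketch below shows the same idea with `urllib.request`; it is only an illustration of the header, not how newspaper3k issues its requests internally, and it builds the request without sending it.

```python
import urllib.request

USER_AGENT = ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) '
              'Gecko/20100101 Firefox/78.0')


def build_request(url: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET request that carries a browser-like User-Agent."""
    return urllib.request.Request(url, headers={'User-Agent': USER_AGENT})


req = build_request('https://www.britannica.com/topic/Pyramids-of-Giza')
# urllib stores header names via str.capitalize(), hence the key 'User-agent'.
print(req.get_header('User-agent'))
# Sending it would look like: urllib.request.urlopen(req, timeout=10)
```

Without the custom header, urllib would identify itself as `Python-urllib/<version>`, which is the kind of client some sites refuse.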

In [4]:
#Gradio helps us create a friendly user interface.

#example_urls: example links the users can click, pointing to articles from websites like IBM, CNN and Britannica

#google/pegasus-xsum : I chose this pre-trained model because, after many trials, it produced the most precise summaries


extract = gr.Interface(auto_link, 'text', 'text')  # step 1: URL -> article text
abstractive_summarization = gr.Interface.load("huggingface/google/pegasus-xsum")  # step 2: text -> summary

example_urls = [['https://www.ibm.com/cloud/learn/machine-learning/'],
                ['https://www.cnn.com/2021/11/24/us/arctic-ocean-early-warming-climate/index.html'],
                ['https://www.britannica.com/topic/Pyramids-of-Giza']]

title = '''
        Life is short: copy & paste your link below and let PEGASUS summarize your article.
        (You can try one of the example links below to see how it works.)

        '''

UI_summarizer = Series(extract, abstractive_summarization,
                       inputs=gr.inputs.Textbox(lines=2, label='URL'),
                       outputs='text', title='Abstractive Summarization App', theme='huggingface',
                       description=title, examples=example_urls)

Fetching model from: https://huggingface.co/google/pegasus-xsum
Running on local URL:  http://127.0.0.1:7861/
Running on public URL: https://43499.gradio.app

This share link will expire in 72 hours. To get longer links, send an email to: support@gradio.app


(<Flask 'gradio.networking'>,
 'http://127.0.0.1:7861/',
 'https://43499.gradio.app')
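One practical caveat: summarization models accept only a limited number of input tokens, so a long article may be cut off before PEGASUS sees all of it (the exact limit depends on the checkpoint and on how the hosted model is called). A common workaround is to split the text into chunks, summarize each chunk, and join the partial summaries. The sketch below covers only the splitting step and uses a word count as a rough, assumed stand-in for tokens; `chunk_text` and its `max_words` default are hypothetical.

```python
def chunk_text(text: str, max_words: int = 350) -> list[str]:
    """Group paragraphs into chunks of at most max_words words, joined by spaces.

    A single paragraph longer than max_words is split on word boundaries.
    """
    chunks: list[str] = []
    current: list[str] = []
    for paragraph in text.split("\n"):
        words = paragraph.split()
        if not words:
            continue  # skip blank lines between paragraphs
        # Break an overlong paragraph into max_words-sized pieces.
        pieces = [words[i:i + max_words] for i in range(0, len(words), max_words)]
        for piece in pieces:
            # Close the current chunk if adding this piece would exceed the budget.
            if current and len(current) + len(piece) > max_words:
                chunks.append(" ".join(current))
                current = []
            current.extend(piece)
    if current:
        chunks.append(" ".join(current))
    return chunks


print(chunk_text("a b c\n\nd e\nf g h i j", max_words=4))
```

Each chunk could then be sent to the summarizer on its own; keeping paragraphs together where possible tends to give the model more coherent input than cutting at arbitrary word positions.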

In [None]:
#(share=True): creates a temporary public URL (valid for 72 hours) so the app can be used on other devices without Jupyter; this notebook must keep running to serve it
#(inbrowser=True): opens the interface in a new tab of the default browser

UI_summarizer.launch(share=True, inbrowser=True)