
In [60]:
#!pip install newspaper3k transformers gradio --quiet 

In [3]:
import nltk
import gradio as gr
from gradio.mix import Parallel, Series  # part of the Gradio 3.x API
from newspaper import Article, Config
from transformers import pipeline

# article.nlp() relies on NLTK's 'punkt' sentence tokenizer
nltk.download('punkt')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


In [7]:
USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Firefox/78.0'

config = Config()
config.browser_user_agent = USER_AGENT
config.request_timeout = 10

url = 'https://www.orfonline.org/research/india-at-the-centre-of-the-indian-ocean-submarine-cable-network/'
article = Article(url, config=config)

In [8]:
article.download() 

In [9]:
article.html



In [10]:
article.parse() 

authors = ", ".join(article.authors)
title = article.title
date = article.publish_date
text = article.text
image = article.top_image
videos = article.movies
url = article.url

In [11]:
print("Information about the article")
print("=" * 30)
print(f"Title: {title}")
print(f"Author(s): {authors}")
print(f"Publish date: {date}")
print(f"Image: {image}")
print(f"Videos: {videos}")
print(f"Article link: {url}")
print(f"Content: {text[:100]}...")

Information about the article
==============================
Title: India at the Centre of the Indian Ocean Submarine Cable Network: Trusted Connectivity in Practice
Author(s): Kaush Arha
Publish date: None
Image: https://www.orfonline.org/favicon.ico
Videos: []
Article link: https://www.orfonline.org/research/india-at-the-centre-of-the-indian-ocean-submarine-cable-network/
Content: Introduction

It is in India’s strategic interest to be the leader of trusted connectivity in data f...
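
The preview above slices the text at a fixed character offset, which can cut a word in half (here, "data f..."). As a minimal sketch, the standard library's `textwrap.shorten` truncates on a word boundary instead; note that it also collapses whitespace, including the blank line after "Introduction". The helper name `preview` and the 100-character width are illustrative choices, not part of the notebook.

```python
import textwrap

def preview(text, width=100):
    """Return at most `width` characters of `text`, cut on a word boundary."""
    # shorten() collapses whitespace first, then drops whole words from the end
    # until the result (including the placeholder) fits in `width`.
    return textwrap.shorten(text, width=width, placeholder="...")

sample = (
    "Introduction\n\nIt is in India's strategic interest to be the leader of "
    "trusted connectivity in data flows across the Indian Ocean."
)
print(f"Content: {preview(sample)}")
```

Text that already fits within `width` is returned without the placeholder.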


In [12]:
article.nlp()

In [13]:
keywords = article.keywords
keywords.sort()
print(keywords)

['cable', 'cables', 'centre', 'connectivity', 'data', 'digital', 'global', 'india', 'indian', 'indias', 'network', 'ocean', 'practice', 'submarine', 'trusted']


In [14]:
print(f"Summary: \n{article.summary}")

Summary: 
IntroductionIt is in India’s strategic interest to be the leader of trusted connectivity in data flows across the Indian Ocean.
India may pursue five pathways to distinguish itself as the preferred provider of trusted connectivity across the Indian Ocean.
In the specialised submarine cable industry, there are a limited number of ships for cable deployment and maintenance.
It should leverage the Quad group for greater investment and priority to submarine cables and trusted connectivity across the Indian Ocean.
Trusted connectivity and improved India-Europe submarine cable connections should feature prominently in the European Union’s Trade and Technology Council deliberations with India.


In [15]:
io1 = gr.Interface.load("huggingface/sshleifer/distilbart-cnn-12-6")
io2 = gr.Interface.load("huggingface/facebook/bart-large-cnn")
io3 = gr.Interface.load("huggingface/google/pegasus-xsum")
io4 = gr.Interface.load("huggingface/sshleifer/distilbart-cnn-6-6")

# Send the same input text to all four summarizers and show their outputs together
iface = Parallel(io1, io2, io3, io4,
                 theme='huggingface',
                 inputs=gr.inputs.Textbox(lines=10, label="Text"))

iface.launch()



Fetching model from: https://huggingface.co/sshleifer/distilbart-cnn-12-6
Fetching model from: https://huggingface.co/facebook/bart-large-cnn
Fetching model from: https://huggingface.co/google/pegasus-xsum
Fetching model from: https://huggingface.co/sshleifer/distilbart-cnn-6-6




Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Note: opening Chrome Inspector may crash demo inside Colab notebooks.

To create a public link, set `share=True` in `launch()`.



In [16]:
def extract_article_text(url):
    """Download the page at `url` and return its main article text."""
    USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Firefox/78.0'
    config = Config()
    config.browser_user_agent = USER_AGENT
    config.request_timeout = 10  # seconds

    article = Article(url, config=config)
    article.download()
    article.parse()
    return article.text
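
The extractor returns the whole article, but the summarization models used in this notebook accept a bounded input (BART-based checkpoints such as `facebook/bart-large-cnn` are limited to 1,024 tokens), so a long article may not fit in a single request. The sketch below splits text into smaller pieces on paragraph breaks. It counts whitespace-separated words as a rough proxy for tokens, which is an assumption, and both `split_into_chunks` and the 400-word default are hypothetical choices rather than anything the notebook defines.

```python
def split_into_chunks(text, max_words=400):
    """Split `text` into chunks of at most `max_words` words, on paragraph breaks where possible."""
    chunks, current, count = [], [], 0
    for paragraph in text.split("\n\n"):
        words = paragraph.split()
        if not words:
            continue
        # A single paragraph longer than the limit is cut into fixed-size word windows.
        if len(words) > max_words:
            if current:
                chunks.append(" ".join(current))
                current, count = [], 0
            for start in range(0, len(words), max_words):
                chunks.append(" ".join(words[start:start + max_words]))
            continue
        # Start a new chunk when adding this paragraph would exceed the limit.
        if count + len(words) > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.extend(words)
        count += len(words)
    if current:
        chunks.append(" ".join(current))
    return chunks

# Five paragraphs of 150 words each, as a stand-in for a long article.
text = "\n\n".join(["word " * 150] * 5)
chunks = split_into_chunks(text, max_words=400)
print([len(chunk.split()) for chunk in chunks])
```

Each chunk could then be summarized separately and the partial summaries joined.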

In [59]:
extractor = gr.Interface(extract_article_text, 'text', 'text')
summarizer = gr.Interface.load('huggingface/sshleifer/distilbart-cnn-12-6')

sample_url = [['https://www.orfonline.org/'],
              ['https://jiss.org.il/en/']]

desc = '''
        Summarize web articles with Hugging Face models.
        Uses the sshleifer/distilbart-cnn-12-6 (DistilBART) model.
        '''

iface = Series(extractor, summarizer,
               inputs=gr.inputs.Textbox(lines=2, label='URL'),
               outputs='text',
               title='Website text summarizer',
               theme='huggingface',
               description=desc,
               examples=sample_url)

iface.launch()



Fetching model from: https://huggingface.co/sshleifer/distilbart-cnn-12-6




Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Note: opening Chrome Inspector may crash demo inside Colab notebooks.

To create a public link, set `share=True` in `launch()`.
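
`Series` feeds the output of the first interface into the second. As a rough sketch of the same flow outside Gradio, the composition can be written as a plain function that takes the two steps as callables. The names `summarize_url`, `fake_extract`, and `fake_summarize` are hypothetical stand-ins so the example runs offline; in the notebook the steps would be `extract_article_text` and a call to the DistilBART summarizer.

```python
def summarize_url(url, extract, summarize):
    """Run `extract` on the URL, then pass its text to `summarize`."""
    text = extract(url)
    # Guard against pages where extraction produced nothing useful.
    if not text or not text.strip():
        return "No article text could be extracted."
    return summarize(text)

# Stand-ins so the sketch runs offline; swap in real functions when wiring this up.
def fake_extract(url):
    return "First sentence of the page. Second sentence. Third sentence."

def fake_summarize(text):
    # Toy "summary": keep only the first sentence.
    return text.split(". ")[0] + "."

print(summarize_url("https://example.com/article", fake_extract, fake_summarize))
```

Passing the steps in as arguments keeps the pipeline testable without network access or a model download.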



In [40]:
from newspaper import Article
url = 'https://tasty.co/recipe/tamagoyaki-japanese-egg-omelet'
article = Article(url)

In [41]:
article.download()

In [55]:
article.html

''

In [43]:
article.parse()

In [44]:
article.authors

['Rie Mcclenny']

In [47]:
article.text

'Opens in a new window Opens an external site in a new window Opens an external site\n\nTasty Logo BuzzFeed Logo Clock Play Pinterest Facebook Email Instagram Link SMS Twitter YouTube WhatsApp X Search Clock Right Arrow Arrow Down Caret down Caret up Caret left Caret right Hamburger Menu Pop Out Thumbs up Thumbs up buy Speech Audio on Replay Plus Minus Walmart Grocery Pickup Sad smiley face No results Swap More Your grocery bag Success Shop Tasty Merch'

In [48]:
article.nlp()

In [49]:
article.keywords

['japanese',
 'egg',
 'tamagoyaki',
 'grocery',
 'external',
 'recipe',
 'right',
 'thumbs',
 'opens',
 'tasty',
 'arrow',
 'logo',
 'window',
 'omelet',
 'clock',
 'caret']

In [50]:
article.summary

'Opens in a new window Opens an external site in a new window Opens an external siteTasty Logo BuzzFeed Logo Clock Play Pinterest Facebook Email Instagram Link SMS Twitter YouTube WhatsApp X Search Clock Right Arrow Arrow Down Caret down Caret up Caret left Caret right Hamburger Menu Pop Out Thumbs up Thumbs up buy Speech Audio on Replay Plus Minus Walmart Grocery Pickup Sad smiley face No results Swap More Your grocery bag Success Shop Tasty Merch'
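
For this page the extracted text is a run of navigation and icon labels rather than the recipe, so the keywords and summary are computed from page chrome. One rough way to flag such results before summarizing is to check how much sentence punctuation the text contains. The function name `looks_like_boilerplate` and both thresholds are assumptions for illustration; nothing like this is provided by newspaper3k.

```python
import re

def looks_like_boilerplate(text, min_words=50, min_marks_per_100_words=1.0):
    """Heuristically flag text that is too short or has almost no sentence punctuation."""
    words = text.split()
    if len(words) < min_words:
        return True
    # Count '.', '!' and '?' followed by whitespace or end of text, as a proxy for sentences.
    marks = len(re.findall(r"[.!?](?=\s|$)", text))
    return marks / len(words) * 100 < min_marks_per_100_words

# Excerpt of the text extracted from the recipe page above.
menu_text = (
    "Tasty Logo BuzzFeed Logo Clock Play Pinterest Facebook Email Instagram Link SMS "
    "Twitter YouTube WhatsApp X Search Clock Right Arrow Arrow Down Caret down Caret up "
    "Caret left Caret right Hamburger Menu Pop Out Thumbs up Thumbs up buy Speech Audio "
    "on Replay Plus Minus Walmart Grocery Pickup Sad smiley face No results Swap More "
    "Your grocery bag Success Shop Tasty Merch"
)
# Sentences taken from the summary of the first article in this notebook.
prose_text = (
    "India may pursue five pathways to distinguish itself as the preferred provider of "
    "trusted connectivity across the Indian Ocean. In the specialised submarine cable "
    "industry, there are a limited number of ships for cable deployment and maintenance. "
    "It should leverage the Quad group for greater investment and priority to submarine "
    "cables and trusted connectivity across the Indian Ocean."
)
print(looks_like_boilerplate(menu_text))
print(looks_like_boilerplate(prose_text))
```

The check is deliberately crude: it will misjudge legitimate content with little punctuation, such as ingredient lists, so it is better suited to a warning than to a hard filter.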

In [51]:
import newspaper

In [52]:
indic_paper = newspaper.build('http://swarajyamag.com')

CRITICAL:newspaper.network:[REQUEST FAILED] 404 Client Error: Not Found for url: https://swarajyamag.com/feeds
CRITICAL:newspaper.network:[REQUEST FAILED] 404 Client Error: Not Found for url: https://swarajyamag.com/rss


In [53]:
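# Use a separate loop variable so the `article` object from the earlier cells is not overwritten
for news_article in indic_paper.articles:
    print(news_article.url)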

http://swarajyamag.com/politics/papalpreet-singh-believed-to-be-brain-behind-fugitive-amritpals-escape-arrested-by-punjab-police
http://swarajyamag.com/world/can-us-democracy-survive-joe-biden-not-to-speak-of-donald-trump
http://swarajyamag.com/economy/first-indian-production-line-for-apple-products-tata-group-all-set-to-take-over-wistrons-bengaluru-plan
http://swarajyamag.com/news-headlines/tamil-nadu-dmk-government-brings-resolution-in-state-assembly-urging-president-to-prescribe-time-period-to-governors-for-giving-assent-to-bills
http://swarajyamag.com/politics/sharad-pawars-differing-take-on-adani-what-hed-said-about-the-businessman-in-2015-book
http://swarajyamag.com/news-headlines/union-health-ministry-launches-nationwide-two-day-drill-to-assess-covid-preparedness-amid-rising-cases
http://swarajyamag.com/world/china-simulated-precision-strikes-against-key-targets-on-taiwan-on-second-day-of-drills
http://swarajyamag.com/politics/opposition-unity-tested-again-after-savarkar-cracks-
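
The article URLs above share a `/<section>/<slug>` layout, so the first path segment can serve as a coarse topic label. A minimal sketch using only the standard library, with a few URLs copied from the output; `group_by_section` is an illustrative helper name.

```python
from collections import defaultdict
from urllib.parse import urlparse

def group_by_section(urls):
    """Group URLs by the first segment of their path (e.g. 'politics')."""
    sections = defaultdict(list)
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        # URLs with no path (the home page) go under an empty section name.
        section = segments[0] if segments else ""
        sections[section].append(url)
    return dict(sections)

urls = [
    "http://swarajyamag.com/politics/sharad-pawars-differing-take-on-adani-what-hed-said-about-the-businessman-in-2015-book",
    "http://swarajyamag.com/world/can-us-democracy-survive-joe-biden-not-to-speak-of-donald-trump",
    "http://swarajyamag.com/politics/opposition-unity-tested-again-after-savarkar-cracks-",
]
for section, items in group_by_section(urls).items():
    print(section, len(items))
```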

In [54]:
for category in indic_paper.category_urls():
    print(category)

http://swarajyamag.com/support
http://swarajyamag.com
https://swarajyamag.com
http://swarajyamag.com/all-issues
http://swarajyamag.com/heritage
http://swarajyamag.com/signin
http://swarajyamag.com/write-for-us
http://swarajyamag.com/style-guide
http://swarajyamag.com/presskit
http://swarajyamag.com/headlines
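
The category list contains the home page twice (once per scheme) and utility pages such as `/signin` and `/support`. A rough sketch of cleaning it up: compare URLs without their scheme and drop paths on a skip list. The skip list is an assumption picked from the output above and would need adjusting for other sites; `clean_categories` is a hypothetical helper name.

```python
from urllib.parse import urlparse

# Assumed list of non-editorial paths, taken from the category output shown above.
SKIP_PATHS = {"/signin", "/support", "/write-for-us", "/style-guide", "/presskit"}

def clean_categories(urls, skip_paths=SKIP_PATHS):
    """Drop utility pages and URLs that differ only by scheme or trailing slash."""
    seen, kept = set(), []
    for url in urls:
        parts = urlparse(url)
        path = parts.path.rstrip("/")
        key = (parts.netloc.lower(), path)
        if path in skip_paths or key in seen:
            continue
        seen.add(key)
        kept.append(url)
    return kept

categories = [
    "http://swarajyamag.com/support",
    "http://swarajyamag.com",
    "https://swarajyamag.com",
    "http://swarajyamag.com/heritage",
    "http://swarajyamag.com/signin",
]
print(clean_categories(categories))
```

The first URL seen for each host and path is the one kept, so the input order decides whether the `http` or `https` variant survives.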
