In [None]:
!pip install newspaper3k transformers gradio --quiet


#Import Libraries

In [None]:
from newspaper import Article
from newspaper import Config
import nltk
nltk.download('punkt')

from transformers import pipeline
import gradio as gr
# gradio.mix is only available in older Gradio releases; Series is not used in this notebook
from gradio.mix import Parallel

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


####**newspaper3k**: News, full-text, and article metadata extraction in Python 3.
####**nltk**: Provides newspaper3k with tokenizing functionality.
####**transformers**: Provides thousands of state-of-the-art pre-trained models for a variety of natural language processing (NLP) tasks.
####**Gradio**: A customizable graphical interface for machine learning models or even arbitrary Python functions.

##Set up a user agent so that websites which reject requests without browser-like headers do not respond with errors such as HTTP 403 Forbidden.

In [None]:
USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Firefox/78.0'

config = Config()
config.browser_user_agent = USER_AGENT
config.request_timeout = 10

url = 'https://indianexpress.com/article/technology/science/elon-musk-spacex-20-year-anniversary-major-milestones-7821218/'
article = Article(url, config=config)
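To illustrate what the user agent setting changes at the HTTP level, here is a minimal sketch using only the standard library. It is not how newspaper3k issues its requests internally; `build_request` is a hypothetical helper and `https://example.com/` is a placeholder URL. No network call is made.

```python
from urllib.request import Request

USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) "
    "Gecko/20100101 Firefox/78.0"
)


def build_request(url, user_agent=USER_AGENT):
    """Return a Request that carries a browser-like User-Agent header.

    Hypothetical helper for illustration only.
    """
    return Request(url, headers={"User-Agent": user_agent})


req = build_request("https://example.com/")  # placeholder URL
# urllib stores header names in capitalized form, i.e. "User-agent".
print(req.get_header("User-agent"))
```

Servers that filter on this header see the same string a Firefox browser would send, which is what `config.browser_user_agent` arranges for newspaper3k.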

#Download Article

In [None]:
article.download()
article.html

'<!DOCTYPE html>\n<html lang="en" xmlns:og="http://ogp.me/ns#" xmlns:fb="http://ogp.me/ns/fb#">\n<head>\n<meta charset="UTF-8">\n\t<meta name="viewport" content="width=990" />\n<!-- HTML5 shim and Respond.js IE8 support of HTML5 elements and media queries -->\n<!--[if lt IE 9]>\n      <script src="https://oss.maxcdn.com/libs/html5shiv/3.7.0/html5shiv.js"></script>\n      <script src="https://oss.maxcdn.com/libs/respond.js/1.4.2/respond.min.js"></script>\n<![endif]-->\n\t<script type="text/javascript">\n\t\tvar story_id = 7821218\t</script>\n\t\t\t<title>Elon Musk&#8217;s SpaceX completes 20 years: A look at its important milestones | Technology News,The Indian Express</title><meta name="description" content="Read more to find out how SpaceX grew from a propulsion engineer&#039;s hobby to an industry giant that rivals national space agencies." /><meta name="news_keywords" content="Elon Musk, SpaceX, Falcon 1, Falcon 9, NASA, SpaceX Dragon, Starlink, Starship, Space, Moon, Launch vehicle

# Segmentation/Parsing of Article

In [None]:
article.parse() 

authors = ", ".join(article.authors)
title = article.title
date = article.publish_date
text = article.text
image = article.top_image
videos = article.movies
url = article.url


In [None]:
print("Information about the article")
print("=" * 30)
print(f"Title: {title}")
print(f"Author(s): {authors}")
print(f"Publish date: {date}")
print(f"Image: {image}")
print(f"Videos: {videos}")
print(f"Article link: {url}")
print(f"Content: {text[:100] + '...'}")

Information about the article
Title: Elon Musk’s SpaceX completes 20 years: A look at its important milestones
Author(s): Var Follow_Widget_Data, Af_Article_Count, Ie_Mobile_Check, No, Ajax_Url, Https, Indianexpress.Com, Wp-Admin, Admin-Ajax.Php, Tracking_C
Publish date: 2022-03-15 20:04:29+05:30
Image: https://images.indianexpress.com/2022/03/SpaceX-20-years.jpg
Videos: []
Article link: https://indianexpress.com/article/technology/science/elon-musk-spacex-20-year-anniversary-major-milestones-7821218/
Content: Elon Musk’s SpaceX marks twenty years today. It has become one of the biggest private space companie...
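The Author(s) line in the output above contains JavaScript residue rather than real names, which can happen when the byline is not marked up in a way the parser recognizes. One possible workaround is a small heuristic filter. This is a sketch under stated assumptions: `looks_like_name` and `plausible_authors` are hypothetical helpers, and "Jane Doe" is a made-up entry added only to show a value that passes.

```python
def looks_like_name(candidate):
    """Heuristic: at least two words, each made of letters, hyphens or apostrophes."""
    words = candidate.split()
    if len(words) < 2:
        return False
    return all(w.replace("-", "").replace("'", "").isalpha() for w in words)


def plausible_authors(candidates):
    """Keep only the candidates that pass the name heuristic."""
    return [c for c in candidates if looks_like_name(c)]


# Values taken from the output above, plus one hypothetical real-looking name.
raw_authors = [
    "Var Follow_Widget_Data", "Af_Article_Count", "Ie_Mobile_Check", "No",
    "Ajax_Url", "Https", "Indianexpress.Com", "Wp-Admin", "Admin-Ajax.Php",
    "Jane Doe",  # hypothetical, not from the article
]
print(plausible_authors(raw_authors))
```

The heuristic is deliberately strict: it also rejects single-word bylines and would still accept multi-word organization names, so treat it as a starting point rather than a reliable cleaner.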


## Extraction of important keywords using NLP

In [None]:
article.nlp()


####Calling .keywords on our article and sorting the result alphabetically, we see that the important keywords include company, falcon, launch, mission, orbit, and spacex.

In [None]:
keywords = article.keywords
keywords.sort()
print(keywords)

['20', 'company', 'completes', 'elon', 'falcon', 'important', 'launch', 'launched', 'look', 'milestones', 'mission', 'musks', 'orbit', 'private', 'spacecraft', 'spacex', 'successfully']
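For intuition about where such keywords can come from, here is a rough sketch of frequency-based keyword extraction using only the standard library. It is not newspaper3k's implementation; `top_keywords` and the tiny `STOPWORDS` set are assumptions made for illustration, and the sample text is taken from the article.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; real libraries ship much larger ones.
STOPWORDS = {
    "a", "an", "and", "also", "first", "for", "in", "is", "it",
    "of", "on", "one", "the", "to",
}


def top_keywords(text, n=5):
    """Return the n most frequent non-stopword tokens in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(n)]


sample = (
    "For one, SpaceX is the first private company to launch, orbit, and "
    "recover a spacecraft. It is also the first private company to send "
    "astronauts to orbit and to the International Space Station (ISS)."
)
print(sorted(top_keywords(sample, n=3)))
```

On this sample the three words that occur twice ("company", "orbit", "private") come out on top, which matches several entries in the keyword list above.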


In [None]:
keywords = "\n".join(keywords)

In [None]:
print(f"Article Keywords: \n{keywords}")

Article Keywords: 
20
company
completes
elon
falcon
important
launch
launched
look
milestones
mission
musks
orbit
private
spacecraft
spacex
successfully


##Newspaper library summary

In [None]:
print(f"Summary: \n{article.summary}")

Summary: 
For one, SpaceX is the first private company to launch, orbit, and recover a spacecraft.
It is also the first private company to send astronauts to orbit and to the International Space Station (ISS).
This made SpaceX the first private company to successfully launch, orbit, and recover a spacecraft.
In May 2019, SpaceX launched a constellation of 60 Starlink satellites on a Falcon 9 rocket.
In September 2021, SpaceX launched the Inspiration4 mission, successfully completing the first orbital spaceflight mission with only private citizens on board.


##**What is Hugging Face?**
Hugging Face provides thousands of pre-trained models to perform tasks on texts such as classification, information extraction, question answering, summarization, translation, text generation, and more in over 100 languages.
Its main goal is to make cutting-edge NLP easier to use for everyone.
With the transformers library, you can load a model with just a few lines of code, fine-tune it on your own dataset, and share it on the Hugging Face model hub.
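As a sketch of what "a few lines of code" can look like, the snippet below loads a summarization model through the `pipeline` function and applies it to the article text in chunks. The helper names `chunk_words` and `summarize`, the 400-word chunk size, and the length settings are assumptions chosen for illustration; the model name is one of the checkpoints used in this notebook.

```python
def chunk_words(text, max_words=400):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def summarize(text, model_name="sshleifer/distilbart-cnn-12-6"):
    """Summarize each chunk and join the partial summaries (illustrative)."""
    # Imported lazily so chunk_words can be used without transformers installed.
    from transformers import pipeline

    summarizer = pipeline("summarization", model=model_name)
    parts = []
    for chunk in chunk_words(text):
        # The length limits are assumed values, not tuned settings.
        result = summarizer(
            chunk, max_length=130, min_length=30,
            do_sample=False, truncation=True,
        )
        parts.append(result[0]["summary_text"])
    return " ".join(parts)


# Example (downloads the model on first use):
# print(summarize(text))
```

Chunking is included because summarization models accept only a limited input length; a word budget is a crude stand-in for counting tokens with the model's tokenizer.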

##Summarize with Hugging Face and Gradio

In [None]:
text

'Elon Musk’s SpaceX marks twenty years today. It has become one of the biggest private space companies in the world and achieved some key milestones as well. For one, SpaceX is the first private company to launch, orbit, and recover a spacecraft. It is also the first private company to send astronauts to orbit and to the International Space Station (ISS). It is also trying to build its satellite internet service with Starlink, which uses ‘mega-constellations’ of small satellites for the same.\n\nWe take a look at SpaceX’s key achievements over the past 20 years.\n\nFalcon 1 and NASA\n\nIn 2006, NASA awarded a “Commercial Orbital Transportation Services” (COTS) contract to SpaceX, where the company had to demonstrate cargo delivery capabilities to the ISS with a contract option for crew transport. After multiple failed launches in 2006 and 2008, SpaceX successfully launched its Falcon 1 launch vehicle on September 28 2008, making it the first privately-developed liquid-fueled rocket to 

###Comparing Hugging Face summarization models

###The four models (chosen based on top downloads) we will be using are:
distilbart-cnn-12-6<br>
bart-large-cnn (from Facebook)<br>
pegasus-xsum (from Google)<br>
distilbart-cnn-6-6 (a more lightweight version of distilbart-cnn-12-6)

Paste the article text into the text box and click the Submit button.
Because four models run at once, the results may take a while to appear.
Note: Copy the text from the output of the `text` cell above.

In [None]:
io1 = gr.Interface.load('huggingface/sshleifer/distilbart-cnn-12-6')
io2 = gr.Interface.load("huggingface/facebook/bart-large-cnn")
io3 = gr.Interface.load("huggingface/google/pegasus-xsum")  
io4 = gr.Interface.load("huggingface/sshleifer/distilbart-cnn-6-6")                   

iface = Parallel(io1, io2, io3, io4,
                 theme='huggingface', 
                 inputs = gr.inputs.Textbox(lines = 10, label="Text"))

iface.launch()

Fetching model from: https://huggingface.co/sshleifer/distilbart-cnn-12-6
Fetching model from: https://huggingface.co/facebook/bart-large-cnn
Fetching model from: https://huggingface.co/google/pegasus-xsum
Fetching model from: https://huggingface.co/sshleifer/distilbart-cnn-6-6
Colab notebook detected. To show errors in colab notebook, set `debug=True` in `launch()`
Running on public URL: https://58174.gradio.app

This share link expires in 72 hours. For free permanent hosting, check out Spaces (https://huggingface.co/spaces)


(<fastapi.applications.FastAPI at 0x7f49be422cd0>,
 'http://127.0.0.1:7860/',
 'https://58174.gradio.app')

###From the outputs, the facebook/bart-large-cnn model seems to produce the best summary, as it covers the important milestones of Elon Musk's SpaceX.
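The comparison above is a judgment by eye. One rough way to put a number on it is to check how many of the article keywords each summary mentions. The sketch below is a simple coverage ratio, not an established metric such as ROUGE; `keyword_coverage` is a hypothetical helper, and the two candidate strings are stand-ins copied from the newspaper3k summary earlier in this notebook, not actual model outputs.

```python
import re


def keyword_coverage(summary, keywords):
    """Return the fraction of keywords that appear as whole words in summary."""
    words = set(re.findall(r"[a-z0-9]+", summary.lower()))
    hits = [k for k in keywords if k.lower() in words]
    return len(hits) / len(keywords) if keywords else 0.0


# Keywords as printed by article.keywords above.
article_keywords = [
    "20", "company", "completes", "elon", "falcon", "important", "launch",
    "launched", "look", "milestones", "mission", "musks", "orbit", "private",
    "spacecraft", "spacex", "successfully",
]

# Stand-in texts; replace them with the summaries each model returns.
candidates = {
    "candidate_a": "For one, SpaceX is the first private company to launch, "
                   "orbit, and recover a spacecraft.",
    "candidate_b": "In May 2019, SpaceX launched a constellation of 60 "
                   "Starlink satellites on a Falcon 9 rocket.",
}

for name, summary in candidates.items():
    print(f"{name}: {keyword_coverage(summary, article_keywords):.2f}")
```

A higher ratio only says that a summary touches more of the extracted keywords; it does not measure fluency or factual accuracy, so it complements a manual read rather than replacing it.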