In [1]:
from bs4 import BeautifulSoup
from urllib.request import Request,urlopen
from nltk.tokenize import sent_tokenize
import numpy as np
import pandas as pd
from nltk.corpus import stopwords
from sklearn.metrics.pairwise import cosine_similarity
import networkx as nx

In [2]:
def get_only_text(url):
    """
    Return the title and the text of the article
    at the specified url.
    """
    hdr = {'User-Agent': 'Mozilla/5.0'}
    req = Request(url, headers=hdr)
    page = urlopen(req)
    soup = BeautifulSoup(page, "lxml")
    # Join the text of every <p> element on the page
    text = ' '.join(p.text for p in soup.find_all('p'))

    print("=====================")
    print(text)
    print("=====================")

    return soup.title.text, text
 
     


In [3]:
url="https://www.thehindu.com/news/national/three-indian-companies-get-licence-to-manufacture-nasas-ventilators-for-covid-19-patients/article31708809.ece"
text = get_only_text(url)

Three Indian companies have got licences from NASA to manufacture its indigenously developed ventilators for critical COVID-19 patients. The three Indian companies are Alpha Design Technologies Pvt Ltd, Bharat Forge Ltd and Medha Servo Drives Pvt Ltd, the space organisation said in a statement on Friday. Also read: Coronavirus | U.S. to donate ventilators to India: Donald Trump  Apart from the Indian firms, 18 other companies, including eight American and three Brazilian, have been selected to manufacture the critical breathing devices.  The National Aeronautics and Space Administration (NASA) developed the ventilator specifically for coronavirus patients at its Jet Propulsion Laboratory (JLP) in Southern California.  The JPL engineers designed the special ventilator— called VITAL — in little over a month and received ‘Emergency Use Authorization’ from the Food and Drug Administration on April 30.  The VITAL (Ventilator Intervention Technology Accessible Locally) equipment uses one-sev
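
`get_only_text` joins every `<p>` element on the page, so subscription prompts and footer text can end up next to the article body. Below is a minimal sketch of one possible filter, assuming that a plain word-count threshold is enough to separate body paragraphs from page widgets. The names `keep_body_paragraphs` and `min_words` are hypothetical, and the threshold is a guess that would need tuning per site.

```python
def keep_body_paragraphs(paragraphs, min_words=8):
    """Drop paragraphs too short to be article prose (hypothetical heuristic)."""
    kept = []
    for p in paragraphs:
        p = " ".join(p.split())          # collapse runs of whitespace
        if len(p.split()) >= min_words:  # keep only reasonably long paragraphs
            kept.append(p)
    return kept


paragraphs = [
    "Sign In",
    "Start your 14 days free trial",
    "NASA said VITAL was developed with input from doctors and medical device manufacturers.",
]
body = keep_body_paragraphs(paragraphs)
print(body)
```

Inside `get_only_text` this could be applied to `[p.text for p in soup.find_all('p')]` before joining. Longer boilerplate paragraphs would still pass, so it is only a first line of defence.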

In [4]:
import nltk
nltk.download('punkt')
  
 

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\aayus\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [5]:
# `text` is the (title, article_text) tuple returned by get_only_text,
# so this loop tokenizes the title first and then the article body.
sentences = []
for s in text:
    sentences.extend(sent_tokenize(s))
sentences[10:20]

['“Our hope is to have this technology reach across the world and provide an additional source of solutions to deal with the on-going COVID-19 crisis,” he said.',
 'NASA said VITAL was developed with input from doctors and medical device manufacturers.',
 'A prototype of the JPL device was successfully tested by the Human Simulation Lab in the Department of Anesthesiology, Perioperative and Pain Medicine at Mount Sinai on April 23.',
 'A modified design, which uses compressed air and can be deployed by a greater range of hospitals, was recently tested at the UCLA Simulation Center in Los Angeles.',
 'A high-fidelity lung simulator tested almost 20 different ventilator settings, representing a number of scenarios that could be seen in critically ill patients in an intensive care unit, it said.',
 '“VITAL performed well in simulation testing with both precise and reproducible results,” said Dr Tisha Wang, clinical chief of the UCLA Division of Pulmonary and Critical Care Medicine.',
 '“I

In [6]:
# Extract word vectors
word_embeddings = {}
with open('glove.6B.100d.txt', encoding='utf-8') as f:
    for line in f:
        values = line.split()
        word = values[0]
        coefs = np.asarray(values[1:], dtype='float32')
        word_embeddings[word] = coefs

In [7]:
len(word_embeddings)

400000
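
The later cells hard-code the vector size as `100`. As a sketch of how the loader could report the dimension itself, the hypothetical helper `load_glove` below parses any iterable of GloVe-style lines and returns both the dictionary and the dimension. Two made-up 3-dimensional vectors stand in for the real file, so the numbers are illustrative only.

```python
import io
import numpy as np


def load_glove(lines):
    """Parse GloVe-style text lines ("word v1 v2 ...") into a dict and a dimension."""
    embeddings = {}
    dim = None
    for line in lines:
        values = line.split()
        if not values:
            continue                      # skip blank lines
        vector = np.asarray(values[1:], dtype="float32")
        if dim is None:
            dim = vector.shape[0]         # take the dimension from the first vector
        embeddings[values[0]] = vector
    return embeddings, dim


# Two made-up 3-dimensional vectors stand in for glove.6B.100d.txt.
sample = io.StringIO("nasa 0.1 0.2 0.3\nventilator 0.4 0.5 0.6\n")
embeddings, dim = load_glove(sample)
print(len(embeddings), dim)
```

With the real file, the same function could be fed an open file handle, and the returned dimension could replace the literal `100` used further down.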

In [8]:
import nltk
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\aayus\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [9]:
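# remove punctuation, numbers and special characters
# (regex=True is required on recent pandas, where str.replace treats the pattern literally by default)
clean_sentences = pd.Series(sentences).str.replace("[^a-zA-Z]", " ", regex=True)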

# change to lowercase
clean_sentences = [s.lower() for s in clean_sentences]
stop_words = stopwords.words('english')
# function to remove stopwords
def remove_stopwords(sen):
    sen_new = " ".join([i for i in sen if i not in stop_words])
    return sen_new
clean_sentences = [remove_stopwords(r.split()) for r in clean_sentences]
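
To make the cleaning steps concrete, here is a small trace of the same three operations (letters only, lowercase, stop-word removal) on a single sentence. It is a sketch that uses `re` instead of pandas and a tiny hand-written stop-word set in place of `stopwords.words('english')`; `clean_sentence` and `STOP_WORDS` are hypothetical names.

```python
import re

# Tiny stand-in for stopwords.words('english'); the real list is much longer.
STOP_WORDS = {"was", "with", "from", "and", "the", "of", "a", "in", "on"}


def clean_sentence(sentence):
    """Letters only, lowercase, stop words removed."""
    letters_only = re.sub(r"[^a-zA-Z]", " ", sentence)
    tokens = letters_only.lower().split()
    return " ".join(t for t in tokens if t not in STOP_WORDS)


cleaned = clean_sentence("NASA said VITAL was developed with input from doctors.")
print(cleaned)  # nasa said vital developed input doctors
```

Note that digits are dropped along with punctuation, so a token such as `COVID-19` is reduced to `covid`.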

In [10]:
sentence_vectors = []
for i in clean_sentences:
    if len(i) != 0:
        v = sum([word_embeddings.get(w, np.zeros((100,))) for w in i.split()])/(len(i.split())+0.001)
    else:
        v = np.zeros((100,))
    sentence_vectors.append(v)
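
The averaging rule in this cell can be checked by hand on toy data. The sketch below applies the same formula, including the `+ 0.001` in the denominator, to made-up 3-dimensional embeddings; `sentence_vector`, `toy_embeddings` and `DIM` are hypothetical names.

```python
import numpy as np

# Made-up 3-dimensional embeddings; the notebook uses 100-dimensional GloVe vectors.
toy_embeddings = {
    "nasa": np.array([1.0, 0.0, 0.0]),
    "ventilator": np.array([0.0, 1.0, 0.0]),
}
DIM = 3


def sentence_vector(cleaned, embeddings, dim):
    """Average word vectors; unknown words contribute zeros, as in the cell above."""
    words = cleaned.split()
    if not words:
        return np.zeros(dim)
    total = sum(embeddings.get(w, np.zeros(dim)) for w in words)
    return total / (len(words) + 0.001)   # same smoothing constant as the notebook


v = sentence_vector("nasa ventilator licence", toy_embeddings, DIM)
print(v)
```

Here `licence` is out of vocabulary, so it adds nothing to the sum but still counts in the denominator. That shortens the vector without changing its direction, and cosine similarity depends only on direction.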

In [11]:
# Create an empty similarity matrix
sim_mat = np.zeros([len(sentences), len(sentences)])

In [12]:
for i in range(len(sentences)):
    for j in range(len(sentences)):
        if i != j:
            sim_mat[i][j] = cosine_similarity(sentence_vectors[i].reshape(1,100), sentence_vectors[j].reshape(1,100))[0,0]
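
The double loop calls `cosine_similarity` once per sentence pair. The same matrix can be computed in one step by normalising the rows and taking a matrix product. The sketch below does this in plain NumPy as a stand-in for the scikit-learn call; `cosine_similarity_matrix` is a hypothetical helper, and the three 2-dimensional vectors are toy data.

```python
import numpy as np


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity of row vectors, with a zeroed diagonal."""
    X = np.vstack(vectors).astype(float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0            # leave all-zero rows as zeros instead of dividing by 0
    unit = X / norms
    sim = unit @ unit.T
    np.fill_diagonal(sim, 0.0)         # the notebook skips i == j, so self-similarity stays 0
    return sim


vectors = [np.array([1.0, 0.0]), np.array([1.0, 1.0]), np.array([0.0, 0.0])]
sim = cosine_similarity_matrix(vectors)
print(np.round(sim, 3))
```

In the notebook this would be called as `cosine_similarity_matrix(sentence_vectors)`; sentences whose vector is all zeros end up with similarity 0 to everything.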


In [13]:
nx_graph = nx.from_numpy_array(sim_mat)
scores = nx.pagerank(nx_graph)
ranked_sentences = sorted(((scores[i],s) for i,s in enumerate(sentences)), reverse=True)
# Extract the top 10 sentences as the summary representation
for i in range(10):
    print(ranked_sentences[i][1])

“Our hope is to have this technology reach across the world and provide an additional source of solutions to deal with the on-going COVID-19 crisis,” he said.
At this difficult time, it becomes even more important that we have access to information that has a bearing on our health and well-being, our lives, and livelihoods.
Sign In  
Start your 14 days free trial
Sign Up You can support quality journalism by turning off ad blocker or purchase a subscription for unlimited access to The Hindu.
To enable wide dissemination of news that is in public interest, we have increased the number of articles that can be read free, and extended free trial periods.
Its flexible design means it also can be modified for use in field hospitals, the NASA statement read.
Printable version | Dec 17, 2020 10:07:51 PM | https://www.thehindu.com/news/national/three-indian-companies-get-licence-to-manufacture-nasas-ventilators-for-covid-19-patients/article31708809.ece
 
© THG PUBLISHING PVT LTD.
 
Precursor ch
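
The summary above lists sentences by descending score, which scrambles the order in which they appear in the article. As a sketch of an alternative, the hypothetical helper `top_sentences_in_order` picks the `n` best-scoring sentences and then restores document order. It expects a mapping from sentence index to score, which is the shape `nx.pagerank` returns for this graph; the scores below are made up.

```python
def top_sentences_in_order(scores, sentences, n):
    """Return the n highest-scoring sentences, kept in document order."""
    # Rank sentence indices by score only, so ties never fall back to comparing text.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = sorted(ranked[:n])        # restore original order
    return [sentences[i] for i in chosen]


toy_sentences = ["First.", "Second.", "Third.", "Fourth."]
toy_scores = {0: 0.10, 1: 0.40, 2: 0.20, 3: 0.30}   # made-up PageRank scores
summary = top_sentences_in_order(toy_scores, toy_sentences, 2)
print(summary)  # ['Second.', 'Fourth.']
```

In the notebook the call would be `top_sentences_in_order(scores, sentences, 10)`.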

In [14]:
import logging
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)
# gensim.summarization was removed in gensim 4.0, so this cell needs gensim 3.x
from gensim.summarization import summarize, keywords
import requests

In [15]:
!python -m pip install -U "gensim<4"

Requirement already up-to-date: gensim in d:\anaconda3\lib\site-packages (3.8.3)




In [16]:
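# Note: requests.get(...).text is the raw HTML of the page, so summarize() and keywords()
# below operate on markup, CSS and JavaScript rather than on the article text.
text = requests.get('https://www.thehindu.com/news/national/three-indian-companies-get-licence-to-manufacture-nasas-ventilators-for-covid-19-patients/article31708809.ece').text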

print('Summary:')
print(summarize(text, ratio=0.01))

print('\nKeywords:')
print(keywords(text, ratio=0.01))

Summary:


2020-12-17 22:26:33,561 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-12-17 22:26:33,598 : INFO : built Dictionary(2322 unique tokens: ['compani', 'covid', 'hindu', 'indian', 'licenc']...) from 5211 documents (total 21915 corpus positions)
2020-12-17 22:26:33,617 : INFO : Building graph
2020-12-17 22:26:33,620 : INFO : Filling graph
2020-12-17 22:26:35,631 : INFO : PROGRESS: processing 1000/2425 doc (25 non zero elements)
2020-12-17 22:26:37,501 : INFO : PROGRESS: processing 2000/2425 doc (22 non zero elements)
2020-12-17 22:26:38,231 : INFO : Removing unreachable nodes of graph
2020-12-17 22:26:38,237 : INFO : Pagerank graph
2020-12-17 22:26:39,432 : INFO : Sorting pagerank scores


"@id": "https://www.thehindu.com/news/national/three-indian-companies-get-licence-to-manufacture-nasas-ventilators-for-covid-19-patients/article31708809.ece"
html{-ms-text-size-adjust:100%;-webkit-text-size-adjust:100%}body{margin:0}aside,header,nav,section{display:block}a{background-color:transparent}img{border:0}hr{-webkit-box-sizing:content-box;-moz-box-sizing:content-box;box-sizing:content-box;height:0}button,input{color:inherit;font:inherit;margin:0}button{overflow:visible}button{text-transform:none}button{-webkit-appearance:button}button::-moz-focus-inner,input::-moz-focus-inner{border:0;padding:0}input{line-height:normal}*{-webkit-box-sizing:border-box;-moz-box-sizing:border-box;box-sizing:border-box}*:before,*:after{-webkit-box-sizing:border-box;-moz-box-sizing:border-box;box-sizing:border-box}input,button{font-family:inherit;font-size:inherit;line-height:inherit}img{vertical-align:middle}.container{margin-right:auto;margin-left:auto;padding-left:10px;padding-right:10px}@media(

var
class
article
articles
data function
span
div
content
width
text
style
true
important
return
returned
returning
document
documents
script
font
http
https
window
windows
display
displayed
background
url
new
news
color
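
The summary and keywords above consist of CSS and JavaScript because the raw HTML was summarized. One way to avoid this is to reduce the page to paragraph text first, as `get_only_text` does with BeautifulSoup. The sketch below shows the idea with the standard-library `html.parser` as a dependency-free stand-in; `ParagraphText` and `html_to_text` are hypothetical names, and the HTML snippet is made up.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect the text found inside <p> elements and ignore everything else."""

    def __init__(self):
        super().__init__()
        self.in_paragraph = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_paragraph = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_paragraph = False

    def handle_data(self, data):
        # <style> and <script> contents arrive here too, but outside any <p>
        if self.in_paragraph:
            self.paragraphs[-1] += data


def html_to_text(html):
    """Return the text of all <p> elements, joined with single spaces."""
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    return " ".join(p.strip() for p in parser.paragraphs if p.strip())


page = (
    "<html><head><style>body{margin:0}</style></head><body>"
    "<script>var x = 1;</script>"
    "<p>NASA developed the ventilator.</p><div>menu</div>"
    "<p>It is called <b>VITAL</b>.</p></body></html>"
)
article_text = html_to_text(page)
print(article_text)  # NASA developed the ventilator. It is called VITAL.
```

Passing text extracted this way to `summarize` and `keywords` should give results drawn from the article. The `ratio=0.01` used above was chosen for the much longer raw page and may be too small for a short article.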


In [17]:
!python -m pip install sumy

Collecting sumy
  Downloading sumy-0.8.1-py2.py3-none-any.whl (83 kB)
Collecting pycountry>=18.2.23
  Downloading pycountry-20.7.3.tar.gz (10.1 MB)
Collecting breadability>=0.1.20
  Downloading breadability-0.1.20.tar.gz (32 kB)
Collecting docopt<0.7,>=0.6.1
  Downloading docopt-0.6.2.tar.gz (25 kB)
Building wheels for collected packages: pycountry, breadability, docopt
  Building wheel for pycountry (setup.py): started
  Building wheel for pycountry (setup.py): finished with status 'done'
  Created wheel for pycountry: filename=pycountry-20.7.3-py2.py3-none-any.whl size=10746871 sha256=b2b36389059b900075dc522ec5474546129c107f51bb87c02c1eb1bf7375f0a4
  Stored in directory: c:\users\aayus\appdata\local\pip\cache\wheels\57\e8\3f\120ccc1ff7541c108bc5d656e2a14c39da0d824653b62284c6
  Building wheel for breadability (setup.py): started
  Building wheel for breadability (setup.py): finished with status 'done'
  Created wheel for breadability: filename=breadability-0.1.20-py2.py3-none-any.whl 



In [18]:
from __future__ import absolute_import
from __future__ import division, print_function, unicode_literals
 
from sumy.parsers.html import HtmlParser
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lsa import LsaSummarizer
from sumy.nlp.stemmers import Stemmer
from sumy.utils import get_stop_words
 
from sumy.summarizers.luhn import LuhnSummarizer
from sumy.summarizers.edmundson import EdmundsonSummarizer
from sumy.summarizers.lex_rank import LexRankSummarizer

In [19]:

LANGUAGE = "english"
SENTENCES_COUNT = 10

In [20]:
url="https://www.thehindu.com/news/national/three-indian-companies-get-licence-to-manufacture-nasas-ventilators-for-covid-19-patients/article31708809.ece"
parser = HtmlParser.from_url(url, Tokenizer(LANGUAGE))
print("--LsaSummarizer--")
summarizer = LsaSummarizer(Stemmer(LANGUAGE))
summarizer.stop_words = get_stop_words(LANGUAGE)
for sentence in summarizer(parser.document, SENTENCES_COUNT):
    print(sentence)


--LsaSummarizer--
The National Aeronautics and Space Administration (NASA) developed the ventilator specifically for coronavirus patients at its Jet Propulsion Laboratory (JLP) in Southern California.
The JPL engineers designed the special ventilator— called VITAL — in little over a month and received ‘Emergency Use Authorization’ from the Food and Drug Administration on April 30.
A prototype of the JPL device was successfully tested by the Human Simulation Lab in the Department of Anesthesiology, Perioperative and Pain Medicine at Mount Sinai on April 23.
A modified design, which uses compressed air and can be deployed by a greater range of hospitals, was recently tested at the UCLA Simulation Center in Los Angeles.
A high-fidelity lung simulator tested almost 20 different ventilator settings, representing a number of scenarios that could be seen in critically ill patients in an intensive care unit, it said.
“VITAL performed well in simulation testing with both precise and reproducibl

In [21]:
print("--LuhnSummarizer--")
summarizer = LuhnSummarizer(Stemmer(LANGUAGE))
summarizer.stop_words = ("I", "am", "the", "you", "are", "me", "is", "than", "that", "this")
for sentence in summarizer(parser.document, SENTENCES_COUNT):
    print(sentence)

--LuhnSummarizer--
Three Indian companies have got licences from NASA to manufacture its indigenously developed ventilators for critical COVID-19 patients.
The three Indian companies are Alpha Design Technologies Pvt Ltd, Bharat Forge Ltd and Medha Servo Drives Pvt Ltd, the space organisation said in a statement on Friday.
A modified design, which uses compressed air and can be deployed by a greater range of hospitals, was recently tested at the UCLA Simulation Center in Los Angeles.
The UCLA team commends JPL for actively contributing to the COVID-19 response and successfully addressing one of the key medical needs in the sickest group of patients,” a media statement said.
Briefing We brief you on the latest and most important developments, three times a day.
We have been keeping you up-to-date with information on the developments in India and the world that have a bearing on our health and wellbeing, our lives and livelihoods, during these difficult times.
To enable wide disseminatio

In [22]:
print("--EdmundsonSummarizer--")
summarizer = EdmundsonSummarizer()
words1 = ("economy", "fight", "trade", "china")
summarizer.bonus_words = words1

words2 = ("another", "and", "some", "next")
summarizer.stigma_words = words2

words3 = ("another", "and", "some", "next")
summarizer.null_words = words3
for sentence in summarizer(parser.document, SENTENCES_COUNT):
    print(sentence)

--EdmundsonSummarizer--
Three Indian companies have got licences from NASA to manufacture its indigenously developed ventilators for critical COVID-19 patients.
The three Indian companies are Alpha Design Technologies Pvt Ltd, Bharat Forge Ltd and Medha Servo Drives Pvt Ltd, the space organisation said in a statement on Friday.
Apart from the Indian firms, 18 other companies, including eight American and three Brazilian, have been selected to manufacture the critical breathing devices.
The National Aeronautics and Space Administration (NASA) developed the ventilator specifically for coronavirus patients at its Jet Propulsion Laboratory (JLP) in Southern California.
The JPL engineers designed the special ventilator— called VITAL — in little over a month and received ‘Emergency Use Authorization’ from the Food and Drug Administration on April 30.
The VITAL (Ventilator Intervention Technology Accessible Locally) equipment uses one-seventh the parts of a traditional ventilator, relying on 

In [23]:
print("--LexRankSummarizer--")
summarizer = LexRankSummarizer(Stemmer(LANGUAGE))
summarizer.stop_words = ("I", "am", "the", "you", "are", "me", "is", "than", "that", "this")
for sentence in summarizer(parser.document, SENTENCES_COUNT):
    print(sentence)

--LexRankSummarizer--
Three Indian companies have got licences from NASA to manufacture its indigenously developed ventilators for critical COVID-19 patients.
Its flexible design means it also can be modified for use in field hospitals, the NASA statement read.
A prototype of the JPL device was successfully tested by the Human Simulation Lab in the Department of Anesthesiology, Perioperative and Pain Medicine at Mount Sinai on April 23.
You have reached your limit for free articles this month.
Subscription Benefits Include Today's Paper Find mobile-friendly version of articles from the day's newspaper in one easy-to-read list.
Unlimited Access Enjoy reading as many articles as you wish without any limitations.
To enable wide dissemination of news that is in public interest, we have increased the number of articles that can be read free, and extended free trial periods.
Support Quality Journalism Your support for our journalism is invaluable.
The Hindu has always stood for journalism th