## Newspaper Python library

News, full-text, and article metadata extraction in Python 3.

https://github.com/codelucas/newspaper

---

  "Newspaper delivers Instapaper style article extraction." -- The Changelog 

---

First, install the module (the PyPI package is named newspaper3k, but it is imported as newspaper):
* pip install newspaper3k



In [1]:
from newspaper import Article

### Retrieving article from a news site

In [3]:
url = "https://www.bbc.com/news/science-environment-54435638"

article = Article(url)

article

<newspaper.article.Article at 0x1125866d0>

In [4]:
article.download()

article.html[:1000]

'\n\n<!DOCTYPE html>\n<html lang="en" id="responsive-news">\n<head  prefix="og: http://ogp.me/ns#">\n    <meta name="viewport" content="width=device-width, initial-scale=1, user-scalable=1">\n    <meta charset="utf-8">\n    <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">\n    <title>Prince William and Sir David Attenborough join forces on \'Earthshot\' prize - BBC News</title>\n    <meta name="description" content="The Duke of Cambridge and Sir David launch the biggest environmental award ever.">\n    <link rel="preload" as="style" href="https://static.bbc.co.uk/news/1.319.13664/stylesheets/services/news/compact.css" media="(max-width: 599px)">\n    <link rel="preload" as="style" href="https://static.bbc.co.uk/news/1.319.13664/stylesheets/services/news/tablet.css" media="(min-width: 600px) and (max-width: 1007px)">\n    <link rel="preload" as="style" href="https://static.bbc.co.uk/news/1.319.13664/stylesheets/services/news/wide.css" media="(min-width: 1008px)">\n\n      

### Parsing the retrieved article

In [5]:
article.parse()

In [6]:
article.authors

['Justin Rowlatt', 'Chief Environment Correspondent']

In [7]:
# publish_date comes back as None for this BBC article, so the cell shows no output
article.publish_date
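
When `publish_date` comes back empty, the page metadata sometimes still carries a date. The sketch below is one possible fallback, not part of newspaper itself: the helper name `fallback_publish_date` and the key path `article` → `published_time` are assumptions for illustration, and nothing here confirms that BBC pages expose that field.

```python
from datetime import datetime

# Hypothetical key paths to try inside the metadata dict.
# Which of these a given site actually fills in is an assumption.
CANDIDATE_PATHS = [("article", "published_time")]


def fallback_publish_date(meta_data):
    """Return a datetime parsed from nested metadata, or None if nothing usable is found."""
    for path in CANDIDATE_PATHS:
        value = meta_data
        for key in path:
            # .get() avoids inserting new keys when meta_data is a defaultdict
            value = value.get(key) if isinstance(value, dict) else None
            if value is None:
                break
        if isinstance(value, str):
            try:
                # fromisoformat() in older Python versions rejects a trailing "Z"
                return datetime.fromisoformat(value.replace("Z", "+00:00"))
            except ValueError:
                continue
    return None
```

With a parsed article this would be called as `article.publish_date or fallback_publish_date(article.meta_data)`; it returns `None` when no candidate key holds an ISO 8601 timestamp.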

In [8]:
article.title

"Prince William and Sir David Attenborough join forces on 'Earthshot' prize"

In [10]:
print(article.text[:1000])

Image copyright Earthshot Image caption The prize launched by Sir David and Prince William is looking for "brilliant" projects to save the planet

Prince William and Sir David Attenborough have joined forces to launch what they hope will become the "Nobel Prize for environmentalism".

They say the search is on for 50 solutions to the world's gravest environmental problems by 2030.

With £50m to be awarded over a decade, the "Earthshot Prize" is the biggest environmental prize ever.

The Prince said "positivity" had been missing from the climate debate - something the award could supply.

"The Earthshot prize is really about harnessing that optimism and that urgency to find some of the world's solutions to some of the greatest environmental problems," he told the BBC.

Anyone could win,he explained, as he called for "amazing people" to create "brilliant innovative projects".

These, he said, could help save the planet.

To mark the event BBC Radio 4's Today Programme has secured an unpr

In [11]:
article.top_image

'https://ichef.bbci.co.uk/news/1024/branded_news/C558/production/_114802505_dsc043162.jpg'

In [12]:
from IPython.display import Image
Image(article.top_image, width=500)

<IPython.core.display.Image object>

In [13]:
article.movies

[]

### Text analysis

In [14]:
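# nlp() must run after download() and parse(); it fills in article.keywords and article.summary
# (it relies on NLTK's 'punkt' tokenizer data being installed)
article.nlp()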

In [15]:
help(article.nlp)


Help on method nlp in module newspaper.article:

nlp() method of newspaper.article.Article instance
    Keyword extraction wrapper



In [16]:
article.keywords

['environmental',
 'ideas',
 'join',
 'attenborough',
 'planet',
 'forces',
 'prize',
 'prince',
 'earthshot',
 'caption',
 'sir',
 'david',
 'william']

In [18]:
print(article.summary)

Image copyright Earthshot Image caption The prize launched by Sir David and Prince William is looking for "brilliant" projects to save the planetPrince William and Sir David Attenborough have joined forces to launch what they hope will become the "Nobel Prize for environmentalism".
With £50m to be awarded over a decade, the "Earthshot Prize" is the biggest environmental prize ever.
Prince William and Sir David will be joined on an "Earthshot Prize Council" by celebrities from the worlds of entertainment, sport, business, charity and the environment.
Prince William and Sir David Attenborough have set themselves the dizzyingly ambitious goal of "repairing the planet by 2030".
And it explains why Prince William and Sir David Attenborough are saying don't hesitate to apply if you think you've got an idea that could help.


### Exploring metadata 

In [19]:
article.meta_data

defaultdict(dict,
            {'viewport': 'width=device-width, initial-scale=1.0',
             'description': 'The Duke of Cambridge and Sir David launch the biggest environmental award ever.',
             'x-country': 'lv',
             'x-audience': 'International',
             'CPS_AUDIENCE': 'International',
             'CPS_CHANGEQUEUEID': 164398940,
             'og': {'title': "'Earthshot': William and Attenborough launch prize to save planet",
              'type': 'article',
              'description': 'The Duke of Cambridge and Sir David launch the biggest environmental award ever.',
              'site_name': 'BBC News',
              'locale': 'en_GB',
              'url': 'https://www.bbc.com/news/science-environment-54435638',
              'image': {'identifier': 'https://ichef.bbci.co.uk/news/1024/branded_news/C558/production/_114802505_dsc043162.jpg',
               'alt': 'BBC News. David Attenborough and the Duke of Cambridge'}},
             'article': {'autho
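
The metadata above is nested: tags such as `og:title` end up under `meta_data['og']['title']`. For a quick overview it can help to flatten the structure back into colon-separated names. This is a minimal sketch with a hypothetical helper `flatten_meta`; the sample dict copies a few values from the output above rather than calling the library.

```python
def flatten_meta(meta, prefix=""):
    """Flatten nested metadata dicts into a single dict with colon-joined keys."""
    flat = {}
    for key, value in meta.items():
        name = f"{prefix}:{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into nested groups such as 'og' or 'og' -> 'image'
            flat.update(flatten_meta(value, name))
        else:
            flat[name] = value
    return flat


# A small sample taken from the metadata shown above
sample = {
    "x-audience": "International",
    "og": {
        "type": "article",
        "site_name": "BBC News",
        "image": {"alt": "BBC News. David Attenborough and the Duke of Cambridge"},
    },
}

for name, value in sorted(flatten_meta(sample).items()):
    print(f"{name}: {value}")
```

The flattened names are only an approximation of the original tag names, since the nested form stores some values under extra keys such as `identifier`.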

In [20]:
article.images

{'https://a1.api.bbc.co.uk/hit.xiti?&col=1&from=p&ptag=js&s=598253&p=science_and_environment::news.science_and_environment.story.54435638.page&x1=[urn:bbc:cps:cb864c76-6c09-4cf0-add9-1f32ce7d3cd2]&x2=[responsive]&x3=[bbc_website]&x4=[en]&x7=[article]&x8=[reverb-1.6.0-nojs]&x11=[NEWS_GNL]&x12=[NEWS]',
 'https://ichef.bbci.co.uk/images/ic/720x405/p08tp51g.jpg',
 'https://ichef.bbci.co.uk/images/ic/720x405/p08tpsj4.jpg',
 'https://ichef.bbci.co.uk/news/1024/branded_news/C558/production/_114802505_dsc043162.jpg',
 'https://ichef.bbci.co.uk/news/320/cpsprodpb/C558/production/_114802505_dsc043162.jpg',
 'https://ssc.api.bbc.com/?c1=2&c2=19293874&ns_site=bbc&name='}
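
Note that the set mixes real pictures with analytics endpoints (the `hit.xiti` and `ssc.api.bbc.com` URLs). One rough way to keep only the likely pictures is to check the file extension of the URL path. The helper `likely_image_urls` and the extension list are assumptions for illustration; images served without an extension would be dropped by this heuristic.

```python
from urllib.parse import urlparse

# Extensions treated as images; an assumed list that can be extended as needed
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".webp")


def likely_image_urls(urls):
    """Return the URLs whose path ends in a known image extension, sorted."""
    return sorted(
        url for url in urls
        if urlparse(url).path.lower().endswith(IMAGE_EXTENSIONS)
    )


# A few of the URLs from the output above
sample_urls = {
    "https://ichef.bbci.co.uk/images/ic/720x405/p08tp51g.jpg",
    "https://ichef.bbci.co.uk/news/320/cpsprodpb/C558/production/_114802505_dsc043162.jpg",
    "https://ssc.api.bbc.com/?c1=2&c2=19293874&ns_site=bbc&name=",
}

for url in likely_image_urls(sample_urls):
    print(url)
```

With a parsed article the same call would be `likely_image_urls(article.images)`.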