# Newspaper API
---
* [Documentation](https://newspaper.readthedocs.io/)
* [Quick Start](https://pypi.org/project/newspaper3k/)

In [4]:
%%html
<iframe width="640" height="390" src="https://newspaper.readthedocs.io/"></iframe>

# Download an Article

In [50]:
from newspaper import Article

url = 'https://www.marketwatch.com/story/disney-reports-record-revenue-but-stock-falls-after-earnings-2019-08-06?mod=newsviewer_click'
article = Article(url)
article.download()   # fetch the raw HTML
html = article.html  # the raw HTML is available once download() has run
article.parse()      # extract authors, publish date, body text, and images


In [51]:
article.authors

['Jeremy C. Owens', 'Technology Editor']

In [52]:
article.publish_date

datetime.datetime(2019, 8, 6, 0, 0)
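
The parsed fields shown so far can be gathered into one plain dictionary, which is convenient for saving results as JSON or loading them into a table. The following is only a sketch: the helper name `article_to_record` and the output keys are hypothetical and not part of newspaper, and the stand-in object exists only so the example runs without network access. `publish_date` may be `None` when no date is found on the page, so it is handled explicitly.

```python
from datetime import datetime
from types import SimpleNamespace


def article_to_record(article):
    """Collect the main parsed fields of an article into a JSON-friendly dict.

    `article` is any object exposing url, title, authors, publish_date and
    text, such as a newspaper Article after download() and parse().
    """
    published = getattr(article, "publish_date", None)
    return {
        "url": getattr(article, "url", None),
        "title": getattr(article, "title", None),
        "authors": list(getattr(article, "authors", []) or []),
        # publish_date is a datetime when one was found, otherwise None
        "published": published.isoformat() if published else None,
        "n_chars": len(getattr(article, "text", "") or ""),
    }


# Stand-in object (hypothetical values) so the sketch runs offline; with the
# notebook's parsed `article`, call article_to_record(article) instead.
demo = SimpleNamespace(
    url="https://example.com/story",
    title="Example story",
    authors=["Jeremy C. Owens", "Technology Editor"],
    publish_date=datetime(2019, 8, 6, 0, 0),
    text="The Walt Disney Co. reported record quarterly revenue Tuesday.",
)
record = article_to_record(demo)
print(record["published"])
```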

In [53]:
print(article.text)

The Walt Disney Co. reported record quarterly revenue Tuesday, but expectations for more broken records failed to materialize as Disney struggled to get a handle on its $70 billion purchase of Fox Corp. assets.

“I’ve been doing earnings calls for a long time, and this is one of our more complicated ones,” Chief Executive Robert Iger summed up at the beginning of a conference call Tuesday afternoon.

Disney DIS, +2.58% reported fiscal third-quarter profit of $1.76 billion, or 98 cents a share, on sales of $20.25 billion, up from revenue of $15.2 billion in the year-ago quarter. Disney’s previous record for quarterly revenue was $15.37 billion in the final quarter of the 2017 calendar year. After adjusting for restructuring charges and other effects, Disney claimed earnings of $1.35 a share, down from $1.87 a share a year ago and well below expectations. Analysts on average expected Disney to report adjusted earnings of $1.72 a share on sales of $21.45 billion, according to FactSet.

Sh

# Images

In [31]:
article.top_image

'http://s.marketwatch.com/public/resources/MWimages/MW-HI205_endgam_ZG_20190424151651.jpg'

In [32]:
# %%html
# <img class="image" src='http://s.marketwatch.com/public/resources/MWimages/MW-HI205_endgam_ZG_20190424151651.jpg' alt="MyText">

In [54]:
article.movies

[]

# NLP

In [55]:
# nlp() must run after parse(); it relies on NLTK's 'punkt' tokenizer data
article.nlp()
article.keywords

['revenue',
 'fox',
 'stock',
 'earnings',
 'disney',
 'reports',
 'record',
 'reported',
 'hurts',
 'billion',
 'falls',
 'acquisition',
 'quarter',
 'sales',
 'disneys',
 'operating',
 'share']

In [35]:
print(article.summary)

The Walt Disney Co. reported record quarterly revenue Tuesday, but expectations for more broken records failed to materialize as Disney struggled to get a handle on its $70 billion purchase of Fox Corp. assets.
Disney DIS, +2.58% reported fiscal third-quarter profit of $1.76 billion, or 98 cents a share, on sales of $20.25 billion, up from revenue of $15.2 billion in the year-ago quarter.
Disney’s previous record for quarterly revenue was $15.37 billion in the final quarter of the 2017 calendar year.
Analysts on average expected Disney to report adjusted earnings of $1.72 a share on sales of $21.45 billion, according to FactSet.
Disney reported $6.71 billion in revenue from its TV properties and $6.58 billion from the theme parks, which launched a new “Star Wars”-themed attraction in the quarter.


# Download Entire CNN Paper

In [46]:
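import newspaper

# build() discovers the site's category pages, feeds, and article URLs
cnn_paper = newspaper.build('http://cnn.com')
# build() does not download article bodies; fetch and parse the first one
cnn_article = cnn_paper.articles[0]
cnn_article.download()
cnn_article.parse()
cnn_article.nlp()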
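
Only the first article is processed above. To work through more of the source, each article needs its own `download()` and `parse()` call, and a single failure (a timeout, a removed page) should not stop the loop. A minimal sketch of that pattern follows. The names `collect_titles`, `limit`, and `_FakeArticle` are hypothetical, the broad `except Exception` is a simplification, and the fake class is there only so the example runs offline.

```python
def collect_titles(articles, limit=5):
    """Download and parse up to `limit` articles, skipping ones that fail.

    Returns a list of (url, title) pairs for the articles that succeeded.
    """
    results = []
    for article in articles[:limit]:
        try:
            article.download()
            article.parse()
        except Exception as exc:  # simplification: treat any error as a skip
            print(f"skipped {article.url}: {exc}")
            continue
        results.append((article.url, article.title))
    return results


class _FakeArticle:
    """Stand-in for a newspaper Article so the sketch runs without network."""

    def __init__(self, url, title=None, fail=False):
        self.url, self.title, self._fail = url, title, fail

    def download(self):
        if self._fail:
            raise RuntimeError("download failed")

    def parse(self):
        pass


fakes = [
    _FakeArticle("http://example.com/a", "First"),
    _FakeArticle("http://example.com/b", fail=True),
    _FakeArticle("http://example.com/c", "Third"),
]
titles = collect_titles(fakes, limit=3)
print(titles)
```

With the source built in this notebook, the equivalent call would be `collect_titles(cnn_paper.articles, limit=5)`.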

In [38]:
for article in cnn_paper.articles:
    print(article.url)

http://cnn.com/business/media
http://cnn.com/2019/08/05/asia/hong-kong-strike-august-5-intl-hnk/index.html
http://cnn.com/2019/08/06/asia/north-korea-cryptocurrency-missile-tests-intl-hnk/index.html
http://cnn.com/2019/08/05/world/north-korea-projectile-launch-intl/index.html
http://cnn.com/2019/08/06/asia/sushma-swaraj-dead-intl/index.html
http://cnn.com/2019/08/06/asia/turkmenistan-president-gateway-to-hell-intl/index.html
http://cnn.com/2019/08/06/asia/india-kashmir-union-territory-intl-hnk/index.html
http://cnn.com/2019/08/06/asia/thai-actor-baby-sea-cow-intl/index.html
http://cnn.com/2019/08/06/asia/arctic-fires-russia-climate-change-intl/index.html
http://cnn.com/2019/08/06/asia/siberia-warehouse-fire-intl-hnk/index.html
http://cnn.com/2019/08/06/asia/kashmir-india-modi-analysis-intl-hnk/index.html
http://cnn.com/2019/08/05/asia/article-370-india-explainer-intl/index.html
http://cnn.com/asia/live-news/hong-kong-strike-protest-intl-hnk/index.html
http://cnn.com/2019/08/05/asia/ind
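
The listing mixes section pages (such as the first URL) with individual stories. In this output the story URLs carry a `/YYYY/MM/DD/` segment, so a simple filter can separate them. This is a sketch based on that observation only; the pattern is an assumption about this listing, not a rule of newspaper or of the site, and `dated_story_urls` is a hypothetical helper name.

```python
import re

# Matches a /YYYY/MM/DD/ segment anywhere in the URL
DATED_PATH = re.compile(r"/\d{4}/\d{2}/\d{2}/")


def dated_story_urls(urls):
    """Keep only URLs that contain a /YYYY/MM/DD/ date segment."""
    return [u for u in urls if DATED_PATH.search(u)]


# A few URLs copied from the listing above
sample = [
    "http://cnn.com/business/media",
    "http://cnn.com/2019/08/05/asia/hong-kong-strike-august-5-intl-hnk/index.html",
    "http://cnn.com/asia/live-news/hong-kong-strike-protest-intl-hnk/index.html",
    "http://cnn.com/2019/08/06/asia/sushma-swaraj-dead-intl/index.html",
]
stories = dated_story_urls(sample)
print(stories)
```

Undated pages such as live-news streams are dropped by this filter, which may or may not be what is wanted.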

In [39]:
for category in cnn_paper.category_urls():
    print(category)

http://cnn.com
http://cnn.com/asia
http://cnn.com/africa
http://cnn.com/collection
http://cnn.com/entertainment
http://cnn.com/world
http://cnn.com/europe
http://cnnespanol.cnn.com
http://cnn.com/style
http://cnn.com/accessibility
http://cnn.com/travel
https://money.cnn.com
http://edition.cnn.com
http://cnn.com/uk
http://cnn.com/vr
http://arabic.cnn.com
http://us.cnn.com
http://cnn.com/more
https://www.cnn.com
http://cnn.com/australia
http://cnn.com/middle-east
http://cnn.com/opinions
http://cnn.com/china
http://cnn.com/videos
http://cnn.com/us
http://cnn.com/health
http://cnn.com/business
http://cnn.com/americas
http://cnn.com/tour
http://cnn.com/politics
http://cnn.com/transcripts
http://cnn.it
http://cnn.com/india
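
The category list contains both sections of the main site (`http://cnn.com/asia`) and separate hosts (`https://money.cnn.com`, `http://cnn.it`). A small sketch using the standard library's `urllib.parse` can split the two groups. The function name `split_categories` and the `home_host` parameter are hypothetical, and treating `www.cnn.com` as the same host as `cnn.com` is an assumption.

```python
from urllib.parse import urlparse


def split_categories(category_urls, home_host="cnn.com"):
    """Separate section paths on the home host from categories on other hosts."""
    sections, other_hosts = [], []
    for url in category_urls:
        parsed = urlparse(url)
        host = parsed.netloc.lower()
        # Assumption: "www.cnn.com" and "cnn.com" are the same site
        if host.startswith("www."):
            host = host[4:]
        path = parsed.path.strip("/")
        if host == home_host and path:
            sections.append(path)
        elif host != home_host:
            other_hosts.append(host)
        # The bare home page itself falls in neither group
    return sorted(sections), sorted(other_hosts)


# A few URLs copied from the output above
sample_categories = [
    "http://cnn.com",
    "http://cnn.com/asia",
    "https://money.cnn.com",
    "https://www.cnn.com",
    "http://cnn.com/world",
    "http://cnn.it",
]
sections, other_hosts = split_categories(sample_categories)
print(sections)
print(other_hosts)
```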


# Extract Text from Raw HTML

In [41]:
import requests
from newspaper import fulltext

# fulltext() extracts the main body text from HTML that was fetched elsewhere
html = requests.get('http://cnn.com/entertainment').text
text = fulltext(html)

In [42]:
text

"Olivia Newton-John doesn't want to know how long she has to live"

# Different Languages

In [None]:
from newspaper import Article
url = 'http://www.bbc.co.uk/zhongwen/simp/chinese_news/2012/12/121210_hongkong_politics.shtml'

a = Article(url, language='zh') # Chinese

a.download()
a.parse()

print(a.text[:150])