# Newspaper3k example

Code sources:
* http://theautomatic.net/2020/08/05/how-to-scrape-news-articles-with-python/
* https://newspaper.readthedocs.io/en/latest/

# Import modules and set up

In [1]:
from newspaper import Article

url = 'http://fox13now.com/2013/12/30/new-year-new-laws-obamacare-pot-guns-and-drones/'
article = Article(url)

In [2]:
# Download the article html
article.download()
# Parse the html
article.parse()

## Examples of attributes

In [3]:
article.title

'New Year, new laws: Obamacare, pot, guns and drones'

In [4]:
article.authors

['Cnn Wire']

In [5]:
article.publish_date

datetime.datetime(2013, 12, 30, 0, 0)

In [6]:
article.text

'By Leigh Ann Caldwell\n\nWASHINGTON (CNN) — Not everyone subscribes to a New Year’s resolution, but Americans will be required to follow new laws in 2014.\n\nSome 40,000 measures taking effect range from sweeping, national mandates under Obamacare to marijuana legalization in Colorado, drone prohibition in Illinois and transgender protections in California.\n\nAlthough many new laws are controversial, they made it through legislatures, public referendum or city councils and represent the shifting composition of American beliefs.\n\nFederal: Health care, of course, and vending machines\n\nThe biggest and most politically charged change comes at the federal level with the imposition of a new fee for those adults without health insurance.\n\nFor 2014, the penalty is either $95 per adult or 1% of family income, whichever results in a larger fine.\n\nThe Obamacare, of Affordable Care Act, mandate also requires that insurers cover immunizations and some preventive care.\n\nAdditionally, mil
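The attributes above can be collected into a plain dictionary for storage or further processing. The sketch below assumes only that the object exposes `title`, `authors`, `publish_date` and `text`, as a parsed `Article` does. The helper names `split_paragraphs` and `article_to_record` are hypothetical. The helper splits `text` on the blank lines (`\n\n`) that separate paragraphs in the output shown, and it returns `None` for a missing date instead of failing.

```python
from datetime import datetime
from typing import Any, Optional


def split_paragraphs(text: str) -> list[str]:
    """Split article text on blank lines, dropping empty chunks."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def article_to_record(article: Any) -> dict:
    """Collect title, authors, publish date and paragraphs into a dict.

    Works with any object exposing title, authors, publish_date and text
    attributes, such as a parsed newspaper Article (an assumption here).
    """
    published: Optional[datetime] = article.publish_date
    return {
        "title": article.title,
        "authors": list(article.authors),
        # publish_date may be empty (e.g. None) when no date is detected
        "publish_date": published.isoformat() if published else None,
        "paragraphs": split_paragraphs(article.text),
    }
```

Applied to the article above, the `publish_date` field would be `'2013-12-30T00:00:00'`.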

## Natural Language Processing results

In [7]:
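# Run newspaper's NLP step, which fills article.summary and article.keywords
# (NLTK's 'punkt' tokenizer data must be installed first)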
article.nlp()

In [8]:
article.summary

'Oregon: Family leave in Oregon has been expanded to allow eligible employees two weeks of paid leave to handle the death of a family member.\nArkansas: The state becomes the latest state requiring voters show a picture ID at the voting booth.\nMinimum wage and former felon employmentWorkers in 13 states and four cities will see increases to the minimum wage.\nNew Jersey residents voted to raise the state’s minimum wage by $1 to $8.25 per hour.\nCalifornia is also raising its minimum wage to $9 per hour, but workers must wait until July to see the addition.'
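To show what an extractive summary is, here is a simplified sketch. It scores each sentence by the average frequency of its non-stopword words and returns the top `max_sents` sentences in their original order. This is not newspaper's own summarizer, which uses a richer scoring scheme. The stopword list and the `summarize` helper are illustrative assumptions. The default `max_sents=5` matches the five sentences in the summary above.

```python
import re
from collections import Counter

# Hypothetical minimal stopword list, for illustration only
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "for",
             "is", "are", "be", "will", "by", "with", "at", "its", "it"}


def summarize(text: str, max_sents: int = 5) -> list[str]:
    """Return up to max_sents high-scoring sentences in original order."""
    # Naive sentence split: break after ., ! or ? followed by whitespace
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(w for w in words if w not in STOPWORDS)

    def score(sentence: str) -> float:
        tokens = [w for w in re.findall(r"[a-z']+", sentence.lower())
                  if w not in STOPWORDS]
        # Average frequency, so long sentences are not automatically favoured
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    # Rank sentence indices by score (stable for ties), then restore order
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sents])
    return [sentences[i] for i in keep]
```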

In [9]:
article.keywords

['guns',
 'wage',
 'obamacare',
 'law',
 'drones',
 'latest',
 'family',
 'minimum',
 'states',
 'leave',
 'state',
 'laws',
 'pot',
 'national']
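A keyword list of this kind can be approximated by counting the words that are not stopwords and keeping the most frequent ones. This sketch is an assumption for illustration, not newspaper's implementation. The stopword list and the `top_keywords` helper are hypothetical. The sketch does no stemming, so forms such as `law` and `laws` are counted separately.

```python
import re
from collections import Counter

# Hypothetical minimal stopword list; real lists are much longer
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "for",
             "is", "are", "be", "will", "by", "with", "at", "its", "it"}


def top_keywords(text: str, n: int = 10) -> list[str]:
    """Return the n most frequent non-stopword words of 3+ letters.

    Counter.most_common orders equal counts by first appearance.
    """
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(n)]
```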

# Scraping news from a media source

In [7]:
import newspaper
# Build a Source object for CNN's homepage
source_url = "http://cnn.com"
source = newspaper.build(source_url)

In [8]:
# Print the URL of every article found in the source
for article in source.articles:
    print(article.url)

In [9]:
# Print the category URLs discovered on the source's homepage
for category in source.category_urls():
    print(category)

http://cnn.com
http://edition.cnn.com
https://www.cnn.com
http://cnn.com/europe
http://cnn.com/transcripts
http://cnn.com/india
http://cnn.com/uk
http://cnn.com/us
http://cnn.com/more
http://cnn.com/americas
http://cnn.com/africa
http://cnn.com/weather
http://cnn.com/world
http://cnn.com/politics
http://cnn.com/opinions
http://cnn.com/accessibility
http://cnn.com/middle-east
http://arabic.cnn.com
http://cnnespanol.cnn.com
http://cnn.it
http://us.cnn.com
http://cnn.com/videos
http://cnn.com/vr
http://cnn.com/health
https://money.cnn.com
http://cnn.com/travel
http://cnn.com/australia
http://cnn.com/style
http://cnn.com/collection
http://cnn.com/entertainment
http://cnn.com/asia
http://cnn.com/china
http://cnn.com/business
http://cnn.com/audio
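The category list mixes sections of `cnn.com` with other hosts such as `edition.cnn.com`, `money.cnn.com` and `cnn.it`. A small hypothetical helper, using only the standard library, groups the URLs by hostname so the off-domain entries stand out. It would be called as `group_by_host(source.category_urls())`.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_by_host(urls: list[str]) -> dict[str, list[str]]:
    """Group URLs by hostname, keeping each host's URLs in input order."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url in urls:
        # netloc is the host part, e.g. "money.cnn.com"
        groups[urlparse(url).netloc].append(url)
    return dict(groups)
```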


In [10]:
# Select the first article and download its HTML
article = source.articles[0]
article.download()
# Parse the html
article.parse()

IndexError: list index out of range
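The `IndexError` above means `source.articles` was empty, which is consistent with the `source.articles` loop above printing nothing. One documented cause is that `newspaper.build()` memoizes articles it has already seen and skips them on later builds. Passing `memoize_articles=False` turns this off. A site that blocks the request or changes its layout can also yield an empty list. The hypothetical helper below guards against the crash. It assumes only that `source` has an `articles` list whose items provide `download()` and `parse()`.

```python
from typing import Any, Optional


def first_parsed_article(source: Any) -> Optional[Any]:
    """Download and parse the first article of a newspaper Source.

    Returns None instead of raising IndexError when no articles were found.
    """
    if not source.articles:
        # Nothing discovered (or everything skipped because it was memoized)
        return None
    article = source.articles[0]
    article.download()  # fetch the HTML
    article.parse()     # extract title, authors, text, etc.
    return article
```

For example, build the source with `source = newspaper.build(source_url, memoize_articles=False)`, call `article = first_parsed_article(source)`, and check `article is not None` before reading `article.title`.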

# Language setup

In [None]:
from newspaper import Article
url = 'http://www.bbc.co.uk/zhongwen/simp/chinese_news/2012/12/121210_hongkong_politics.shtml'

In [None]:
a = Article(url, language='zh')  # 'zh' selects Chinese

a.download()
a.parse()

In [None]:
print(a.text[:150])
print(a.title)
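The cell above hard-codes `language='zh'`. After parsing, a quick check that the extracted text really is Chinese can catch pages where English boilerplate was picked up instead. The helper below is a hypothetical, rough heuristic. It measures the share of CJK Unified Ideographs (U+4E00 to U+9FFF) among the letters. It is not newspaper's language handling, and it only distinguishes Chinese from English.

```python
def guess_language(text: str, threshold: float = 0.3) -> str:
    """Hypothetical heuristic: 'zh' if enough letters are CJK ideographs, else 'en'."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return "en"
    # Count characters in the CJK Unified Ideographs block
    cjk = sum(1 for ch in letters if "\u4e00" <= ch <= "\u9fff")
    return "zh" if cjk / len(letters) >= threshold else "en"
```

If extraction succeeded, `guess_language(a.text)` should return `'zh'` for the BBC article above.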