In [59]:
from newspaper import Article

url = 'https://www.nytimes.com/2010/04/13/opinion/13wenzel.html'

In [60]:
article = Article(url)
article.download()
article.parse()
article.nlp()
print(article.keywords)
print(article.summary)

['control', 'virus', 'public', 'children', 'learned', 'health', 'work', 'vaccination', 'schools', 'h1n1s', 'opinion', 'need', 'authorities']
Public health groups emphasized the necessity of frequent hand-washing, which surely helped reduce transmission.
In our own country, the virus struck at a time when Americans seemed particularly skeptical about our government and large institutions.
It is not an easy task, but our public health authorities need to become clearer about the lexicon of uncertainty  what they know and don’t know about a pandemic.
Even as we work to solve these enigmas, we can try to prepare better for future pandemics.
First, we need to approach disease control not as individual nations, but as a global community.


In [61]:
article.publish_date

datetime.datetime(2010, 4, 13, 0, 0)

In [36]:
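import os
from nytimesarticle import articleAPI
# Read the key from the environment rather than hard-coding it in the notebook.
# NYT_API_KEY is an assumed variable name; set it before running this cell.
api = articleAPI(os.environ['NYT_API_KEY'])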

In [40]:
articles = api.search(q='H1N1')

In [41]:
def parse_articles(articles):
    '''
    Take a response from the NYT Article Search API and parse
    the articles into a list of dictionaries.
    '''
    news = []
    for doc in articles['response']['docs']:
        dic = {}
        dic['id'] = doc['_id']
        # Keep text fields as str: .encode("utf8") would produce bytes on Python 3.
        if doc.get('abstract') is not None:
            dic['abstract'] = doc['abstract']
        dic['headline'] = doc['headline']['main']
        dic['desk'] = doc['news_desk']
        dic['date'] = doc['pub_date'][0:10]  # keep YYYY-MM-DD, drop the time of day
        dic['section'] = doc['section_name']
        if doc.get('snippet') is not None:
            dic['snippet'] = doc['snippet']
        dic['source'] = doc['source']
        dic['type'] = doc['type_of_material']
        dic['url'] = doc['web_url']
        dic['word_count'] = doc['word_count']
        # locations
        dic['locations'] = [kw['value'] for kw in doc['keywords']
                            if 'glocations' in kw['name']]
        # subjects
        dic['subjects'] = [kw['value'] for kw in doc['keywords']
                           if 'subject' in kw['name']]
        news.append(dic)
    return news
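
The keyword filtering in parse_articles could also be factored into a small helper. This is a minimal sketch, assuming each document's 'keywords' entry is a list of dictionaries with 'name' and 'value' keys, as in the API response printed in this notebook. The name keyword_values is hypothetical, and the sample document is hand-built for illustration (its 'glocations' entry is invented, not taken from a real response).

```python
def keyword_values(doc, name):
    """Return the values of all keywords in `doc` whose name contains `name`."""
    return [kw['value'] for kw in doc.get('keywords', []) if name in kw['name']]


# Hand-built sample document, for illustration only.
sample_doc = {
    'keywords': [
        {'name': 'subject', 'value': 'Vaccination and Immunization', 'rank': 1},
        {'name': 'glocations', 'value': 'United States', 'rank': 2},
    ]
}

subjects = keyword_values(sample_doc, 'subject')
locations = keyword_values(sample_doc, 'glocations')
print(subjects)
print(locations)
```

Using doc.get('keywords', []) means a document without keywords yields an empty list instead of raising a KeyError.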

In [42]:
def get_articles(date, query):
    '''
    Accept a year in string format (e.g. '1980') and a query
    (e.g. 'Amnesty International') and return a list of parsed
    articles (as dictionaries) for that year.
    '''
    all_articles = []
    # NYT limits the pager to the first 100 pages; most queries return far fewer.
    for page in range(100):
        articles = api.search(q=query,
                              fq={'source': ['Reuters', 'AP', 'The New York Times']},
                              begin_date=date + '0101',
                              end_date=date + '1231',
                              sort='oldest',
                              page=str(page))
        parsed = parse_articles(articles)
        if not parsed:  # an empty page means there are no more results
            break
        all_articles.extend(parsed)
    return all_articles
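
The API is rate-limited, so a long paging loop may benefit from a pause between requests. This is a minimal sketch, assuming a caller-supplied fetch_page(page) function that returns a list of parsed articles. The names collect_pages and fetch_page and the delay parameter are illustrative choices, not part of nytimesarticle, and a suitable delay depends on the limits that apply to your key.

```python
import time


def collect_pages(fetch_page, max_pages=100, delay=0.0):
    """Call fetch_page(0), fetch_page(1), ... until a page comes back empty."""
    results = []
    for page in range(max_pages):
        batch = fetch_page(page)
        if not batch:
            break
        results.extend(batch)
        time.sleep(delay)  # pause between consecutive requests
    return results


# Stand-in for a real API call: three pages of fake results, then nothing.
fake_pages = [['a', 'b'], ['c'], ['d', 'e']]


def fake_fetch(page):
    return fake_pages[page] if page < len(fake_pages) else []


collected = collect_pages(fake_fetch)
print(collected)
```

In this notebook, fetch_page could wrap the api.search(...) call together with parse_articles.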

In [44]:
get_articles("2009", "H1N1")

TypeError: can only concatenate str (not "bytes") to str
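
The traceback is cut off here, so the exact source of this TypeError is not shown. One possibility (an assumption, not confirmed by the output) is that nytimesarticle encodes the fq values to bytes and then concatenates them with str under Python 3. A possible workaround is to build the query parameters by hand and send them to the Article Search endpoint with an HTTP client such as requests. The sketch below only builds the parameters: build_search_params is a hypothetical helper and the key is a placeholder.

```python
SEARCH_URL = 'https://api.nytimes.com/svc/search/v2/articlesearch.json'


def build_search_params(query, year, sources, page, api_key):
    """Build the query-string parameters for one page of Article Search results."""
    # fq uses Lucene syntax: each source is quoted and the group is parenthesised.
    quoted = ' '.join('"{}"'.format(s) for s in sources)
    return {
        'q': query,
        'fq': 'source:({})'.format(quoted),
        'begin_date': year + '0101',
        'end_date': year + '1231',
        'sort': 'oldest',
        'page': page,
        'api-key': api_key,
    }


params = build_search_params('H1N1', '2009',
                             ['Reuters', 'AP', 'The New York Times'],
                             page=0, api_key='YOUR_API_KEY')
print(params['fq'])
# The request itself could then be sent with, for example:
#   requests.get(SEARCH_URL, params=params).json()
```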

In [70]:
articles['response']['docs'][5]

{'abstract': 'An extensive review shows side effects no different from those of seasonal flu vaccines, health officials reported.',
 'web_url': 'https://www.nytimes.com/2009/12/05/health/05flu.html',
 'snippet': 'An extensive review shows side effects no different from those of seasonal flu vaccines, health officials reported.',
 'lead_paragraph': 'An extensive review of adverse effects from the swine flu vaccine indicates that the vaccine is safe, with side effects no different from those of seasonal flu vaccines, health officials reported on Friday. ',
 'print_section': 'A',
 'print_page': '13',
 'source': 'The New York Times',
 'multimedia': [],
 'headline': {'main': 'Review Shows Safety of H1N1 Vaccine, Officials Say',
  'kicker': None,
  'content_kicker': None,
  'print_headline': 'Review Shows Safety of H1N1 Vaccine, Officials Say',
  'name': None,
  'seo': None,
  'sub': None},
 'keywords': [{'name': 'subject',
   'value': 'Vaccination and Immunization',
   'rank': 1,
   'major'

In [66]:
url = articles['response']['docs'][2]["web_url"]
article = Article(url)
article.download()
article.parse()
article.nlp()
print(article.keywords)
print(article.summary)

['measures', 'h1n1', 'child', 'local', 'monitor', 'flu', 'opinion', 'parents', 'school', 'schools', 'severe', 'abcs', 'team']
Schools can take several measures to help keep flu from spreading: Hand-washing and coughing or sneezing into the arm or a tissue should be stressed.
Hand-sanitizing gel dispensers should be available throughout schools for both staff members and students.
To make sure that such measures are taken in all schools, every county should create an influenza action team run by the local health department and including parents and school administrators.
(The team should also include local doctors and nurses, government officials and the news media.)
Since the novel H1N1 virus infection so far appears to be about as severe as ordinary seasonal flu, parents should seek a doctor’s help if the child seems to have a severe case or is at high risk for complications.


In [12]:
import newspaper

In [117]:
paper = newspaper.build('https://edition.cnn.com/2013/01/11/health', memoize_articles=False, request_timeout=15)

In [115]:
len(paper.articles)

0

In [116]:
for article in paper.articles:
    if "2013" in article.url:
        print(article.url)

In [93]:
article = Article("https://edition.cnn.com/2013/01/11/health/flu-dont-know/index.html")

In [94]:
article.download()
article.parse()
article.nlp()
print(article.keywords)
print(article.summary)

['dont', 'strains', 'peak', 'vaccines', 'viruses', 'flu', 'things', 'know', 'influenza', 'season', 'seasonal', 'watch', 'watched']
Here are three things we still don't know about the flu:When flu seasons begin and end.
Seasonal flu activity can begin as early as October and continue to occur as late as May, according to the Seasonal flu activity can begin as early as October and continue to occur as late as May, according to the Centers for Disease Control and Prevention .
"This is the earliest regular flu season we've had in nearly a decade, since the 2003-2004 flu season," he said.
03:02JUST WATCHED Your flu questions answered Replay More Videos ... MUST WATCH Your flu questions answered 00:59JUST WATCHED Saving your child's life from the flu Replay More Videos ... MUST WATCH Saving your child's life from the flu 02:10JUST WATCHED Worst flu season in a decade Replay More Videos ... MUST WATCH Worst flu season in a decade 03:19When flu cases peak.
The CDC considers the second peak, in

In [36]:
from newspaper.source import Source

In [83]:
a = Source(url="http://cnn.com/2009/", memoize_articles=False)

In [84]:
a.build()

In [86]:
a.categories[5].url

'http://cnn.com/china'

In [87]:
article = a.articles[0]

In [91]:
for article in a.articles:
    if "2009" not in article.url:
        continue
    print(article.url)
    try:
        article.download()
        article.parse()
        print(article.publish_date)
    except Exception as exc:
        # KeyboardInterrupt is not a subclass of Exception, so Ctrl-C still stops the loop.
        print("Fail:", exc)

https://cnnespanol.cnn.com/2020/04/03/la-economia-estadounidense-perdio-701-000-empleos-en-marzo-el-peor-reporte-desde-2009/
2020-04-03 00:00:00
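
The substring test "2009" in article.url also matches URLs that merely mention 2009, as the 2020 article above shows. A stricter check could read the date from the URL path instead. This is a minimal sketch, assuming the site uses /YYYY/MM/DD/ path segments as the CNN URLs in this notebook do; url_year is a hypothetical helper.

```python
import re

# Matches a /YYYY/MM/DD/ segment in a URL path.
DATE_IN_PATH = re.compile(r'/(\d{4})/(\d{2})/(\d{2})/')


def url_year(url):
    """Return the year from a /YYYY/MM/DD/ path segment, or None if absent."""
    match = DATE_IN_PATH.search(url)
    return int(match.group(1)) if match else None


urls = [
    'https://cnnespanol.cnn.com/2020/04/03/la-economia-estadounidense-perdio-701-000-empleos-en-marzo-el-peor-reporte-desde-2009/',
    'https://edition.cnn.com/2013/01/11/health/flu-dont-know/index.html',
]
years = [url_year(u) for u in urls]
print(years)
```

The loop's filter could then become url_year(article.url) == 2009, which skips the 2020 article.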
