# Using News API
You can search for articles with any combination of the following criteria:

Keyword or phrase. Eg: find all articles containing the word 'Microsoft'.
* Date published. Eg: find all articles published yesterday.
* Source name. Eg: find all articles by 'TechCrunch'.
* Source domain name. Eg: find all articles published on thenextweb.com.
* Language. Eg: find all articles written in English.


You can sort the results in the following orders:

* Date published
* Relevancy to search keyword
* Popularity of source

# Query Parameters 

### q

Keywords or phrases to search for in the article title and body.

Advanced search is supported here:

* Surround phrases with quotes (") for exact match.
* Prepend words or phrases that must appear with a + symbol. Eg: +bitcoin
* Prepend words that must not appear with a - symbol. Eg: -bitcoin
* Alternatively you can use the AND / OR / NOT keywords, and optionally group these with parenthesis. Eg: crypto AND (ethereum OR litecoin) NOT bitcoin.
* The complete value for q must be URL-encoded.

# qInTitle

Keywords or phrases to search for in the article title only.

Advanced search is supported here:

* Surround phrases with quotes (") for exact match.
* Prepend words or phrases that must appear with a + symbol. Eg: +bitcoin
* Prepend words that must not appear with a - symbol. Eg: -bitcoin
* Alternatively you can use the AND / OR / NOT keywords, and optionally group these with parenthesis. Eg: crypto AND (ethereum OR litecoin) NOT bitcoin.
* The complete value for qInTitle must be URL-encoded.

# sortBy
* relevancy = articles more closely related to q come first.
* popularity = articles from popular sources and publishers come first.
* publishedAt = newest articles come first.

In [38]:
import requests
import json
from config import n_key
import os
import pandas as pd

In [25]:
# Pulling out news sources that are available to us 
base_url_sources = 'https://newsapi.org/v2/sources?'
response = requests.get(f"{base_url_sources}apiKey={n_key}").json()
print(json.dumps(response, indent=4))

{
    "status": "ok",
    "sources": [
        {
            "id": "abc-news",
            "name": "ABC News",
            "description": "Your trusted source for breaking news, analysis, exclusive interviews, headlines, and videos at ABCNews.com.",
            "url": "https://abcnews.go.com",
            "category": "general",
            "language": "en",
            "country": "us"
        },
        {
            "id": "abc-news-au",
            "name": "ABC News (AU)",
            "description": "Australia's most trusted source of local, national and world news. Comprehensive, independent, in-depth analysis, the latest business, sport, weather and more.",
            "url": "http://www.abc.net.au/news",
            "category": "general",
            "language": "en",
            "country": "au"
        },
        {
            "id": "aftenposten",
            "name": "Aftenposten",
            "description": "Norges ledende nettavis med alltid oppdaterte nyheter innenfor innenriks

In [43]:
# Pulling out domains of interest
sources = response['sources']
sources_master = []
for index in range(0, len(sources)):
    source_dict = {
        'Name': sources[index]['name'],
        'url': sources[index]['url'],
        'country': sources[index]['country']
    }
    sources_master.append(source_dict)
sources_master

[{'Name': 'ABC News', 'url': 'https://abcnews.go.com', 'country': 'us'},
 {'Name': 'ABC News (AU)',
  'url': 'http://www.abc.net.au/news',
  'country': 'au'},
 {'Name': 'Aftenposten', 'url': 'https://www.aftenposten.no', 'country': 'no'},
 {'Name': 'Al Jazeera English',
  'url': 'http://www.aljazeera.com',
  'country': 'us'},
 {'Name': 'ANSA.it', 'url': 'http://www.ansa.it', 'country': 'it'},
 {'Name': 'Argaam', 'url': 'http://www.argaam.com', 'country': 'sa'},
 {'Name': 'Ars Technica', 'url': 'http://arstechnica.com', 'country': 'us'},
 {'Name': 'Ary News', 'url': 'https://arynews.tv/ud/', 'country': 'pk'},
 {'Name': 'Associated Press', 'url': 'https://apnews.com/', 'country': 'us'},
 {'Name': 'Australian Financial Review',
  'url': 'http://www.afr.com',
  'country': 'au'},
 {'Name': 'Axios', 'url': 'https://www.axios.com', 'country': 'us'},
 {'Name': 'BBC News', 'url': 'http://www.bbc.co.uk/news', 'country': 'gb'},
 {'Name': 'BBC Sport', 'url': 'http://www.bbc.co.uk/sport', 'country'

In [47]:
# Store sources in a dataframe
all_sources = pd.DataFrame(sources_master)
all_sources.to_csv('allsourcesNewsAPIdata.csv', index=False)
us_sources = all_sources.loc[all_sources['country']=='us'].reset_index(drop=True)
us_sources.to_csv('ussourcesNewsAPIdata.csv', index=False)
us_sources


Unnamed: 0,Name,url,country
0,ABC News,https://abcnews.go.com,us
1,Al Jazeera English,http://www.aljazeera.com,us
2,Ars Technica,http://arstechnica.com,us
3,Associated Press,https://apnews.com/,us
4,Axios,https://www.axios.com,us
5,Bleacher Report,http://www.bleacherreport.com,us
6,Bloomberg,http://www.bloomberg.com,us
7,Breitbart News,http://www.breitbart.com,us
8,Business Insider,http://www.businessinsider.com,us
9,Buzzfeed,https://www.buzzfeed.com,us


In [21]:
# Base url includes: 
# language parameter which we will not change (English) 
# page size parameter to return maximum number of articles (100) 
# sortBy parameter which we need to decide on

search_term = 'immigration'
sort_option = 'relevance'
# page = will determine based on initial searches

base_url = ('http://newsapi.org/v2/everything?')
params = {
    'language': 'en',
    'pageSize': 100,
    'q': search_term,
#move this one to base_URL after we make a decision
    'sortBy': sort_option,
    'apiKey': n_key
    #'page': page_num
    
}

response = requests.get(base_url, params)
data = response.json()
print(json.dumps(data, indent=4))

{
    "status": "ok",
    "totalResults": 8623,
    "articles": [
        {
            "source": {
                "id": null,
                "name": "Lifehacker.com"
            },
            "author": "Mike Winters",
            "title": "How to Spot the Most Common COVID-Related Scams",
            "description": "Part of the \u201cnew normal\u201d of the pandemic is the uptick in COVID-related scams. More than 200,000 Americans have lost a sum total of around $145 million to them since the start of the year, according to the Federal Trade Commission (FTC). Here are some of the \u2026",
            "url": "https://lifehacker.com/how-to-spot-the-most-common-covid-related-scams-1845287251",
            "urlToImage": "https://i.kinja-img.com/gawker-media/image/upload/c_fill,f_auto,fl_progressive,g_center,h_675,pg_1,q_80,w_1200/qeeid4mmcpaa8fkvycfm.jpg",
            "publishedAt": "2020-10-06T18:45:00Z",
            "content": "Part of the new normal of the pandemic is the uptick in 

In [46]:
search_terms = ['immigration', 'immigrants', 'migrants', 'refugees']
sort_option = 'relevance'
# page = will determine based on initial searches

base_url = ('http://newsapi.org/v2/everything?')
params = {
    'language': 'en',
    'pageSize': 100,
#move this one to base_URL after we make a decision
    'sortBy': sort_option,
    'apiKey': n_key
    #'page': page_num
    
}

totalResults = {}

for term in search_terms:
    params['q'] = term
    response = requests.get(base_url, params)
    data = response.json()
    total = data['totalResults']
    totalResults[term] = total
print(totalResults)

{'immigration': 8636, 'immigrants': 5607, 'migrants': 4183, 'refugees': 4394}


In [39]:
total_results = data['totalResults']
articles = data['articles']
data_master = []
for index in range(0, len(articles)): 
    article_dict = {
        'Keyword': 'immigration',
        'Source': articles[index]['source']['name'],
        'Author': articles[index]['author'],
        'Title': articles[index]['title'],
        'URL': articles[index]['url'],
        'Text': articles[index]['content']
        'Published': articles[index]['publishedAt']
    }
    data_master.append(article_dict)

In [23]:
data_df = pd.DataFrame(data_master)
data_df.to_csv('initialNewsAPIdata.csv', index=False)
data_df

Unnamed: 0,Keyword,Source,Author,Title,URL,Text
0,immigration,Lifehacker.com,Mike Winters,How to Spot the Most Common COVID-Related Scams,https://lifehacker.com/how-to-spot-the-most-co...,Part of the new normal of the pandemic is the ...
1,immigration,Wired,WIRED Staff,One Free Press Coalition Spotlights Journalist...,https://www.wired.com/story/one-free-press-coa...,"In May 2019, WIRED joined the One Free Press C..."
2,immigration,TechCrunch,Walter Thompson,Dear Sophie: What is a J-1 visa and how can we...,http://techcrunch.com/2020/09/09/dear-sophie-w...,More posts by this contributor\r\nHere’s anoth...
3,immigration,TechCrunch,Walter Thompson,Dear Sophie: Now that a judge has paused Trump...,http://techcrunch.com/2020/10/05/dear-sophie-n...,"More posts by this contributor\r\nOn Thursday,..."
4,immigration,TechCrunch,Walter Thompson,Dear Sophie: Possible to still get through I-7...,http://techcrunch.com/2020/09/23/dear-sophie-p...,More posts by this contributor\r\nHere’s anoth...
...,...,...,...,...,...,...
95,immigration,Reuters,Reuters Staff,Hungary PM Orban says EU commission immigratio...,https://www.reuters.com/article/us-europe-migr...,By Reuters Staff\r\nFILE PHOTO: Hungary's Prim...
96,immigration,Reuters,Michael Shields,"With echoes of Brexit, Swiss set to vote on im...",https://www.reuters.com/article/uk-swiss-eu-id...,ZURICH (Reuters) - Swiss voters will decide on...
97,immigration,Reuters,Daniel Wiessner,SCOTUS to decide if appeals courts must credit...,https://www.reuters.com/article/immigration-sc...,The U.S. Supreme Court on Friday agreed to dec...
98,immigration,Reuters,Jan Wolfe,Factbox: Notable legal opinions of U.S. Suprem...,https://www.reuters.com/article/us-usa-court-b...,"(Reuters) - Amy Coney Barrett, who President D..."


In [24]:
data_df['Source'].unique()

array(['Lifehacker.com', 'Wired', 'TechCrunch', 'Gizmodo.com',
       'New York Times', 'Mashable', 'BBC News', 'CNN', 'Reuters'],
      dtype=object)

In [41]:
len(data_df['Title'].unique())

98

In [42]:
data_df['Title'].unique()

array(['How to Spot the Most Common COVID-Related Scams',
       'One Free Press Coalition Spotlights Journalists Under Attack - October 2020',
       'Dear Sophie: What is a J-1 visa and how can we use it?',
       'Dear Sophie: Now that a judge has paused Trump’s H-1B visa ban, how can I qualify my employees?',
       'Dear Sophie: Possible to still get through I-751 and citizenship after divorce?',
       'Dear Sophie: Is it easier and faster to get an O-1A than an EB-1A?',
       'Daily Crunch: Facebook unveils the Oculus Quest 2',
       'AOC flagged ‘material risks’ to Palantir investors in letter to SEC',
       'Daily Crunch: Shopify confirms data breach',
       'Pentagon Official Warns About Chinese Drones Without Explaining Specific Security Risks (Again)',
       'Google to Journalists: Shut Up and Take the Money',
       'CBP Drones Conducted Flyovers Near Homes of Indigenous Pipeline Activists, Flight Records Show',
       'Whistleblower: DHS Goons Whitewashed Intel to Do