
### Learning objectives
* Introduction to Bing
* Use the Bing API to retrieve a response
* Get a list of articles using Bing
* Use newspaper3k with Bing

# Bing

## Why use the API

* ***Suggest search terms in real time***: Improve your application experience by using the Bing Autosuggest API to display suggested search terms as they're typed.
* ***Filter and restrict results by content type***: Customize and refine search results with filters and query parameters for web pages, images, videos, safe search, and more.
* ***Hit highlighting for Unicode characters***: Identify and remove unwanted Unicode characters from search results before displaying them to users with hit highlighting.
* ***Localize search results by country, region, and/or market***: Bing Web Search supports more than three dozen countries or regions. Use this feature to refine search results for a specific country/region or market.
* ***Analyze search metrics with Bing Statistics***: Bing Statistics is a paid subscription that provides analytics on call volume, top query strings, geographic distribution, and more.

## Step 1: Retrieving Response

* Retrieve the subscription key from your Microsoft account
* Paste the subscription key into the request headers
* Define the request parameters
* Request the URL
* Parse the response as JSON

Functions used:
1. [requests.get](http://docs.python-requests.org/en/master/user/quickstart/)
2. [Exception handling](https://www.w3schools.com/python/python_try_except.asp)

[Bing Parameters](https://docs.microsoft.com/en-us/rest/api/cognitiveservices/bing-custom-search-api-v7-reference)
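
As a rough illustration of the "define the parameters" and "request the URL" steps, the sketch below uses only the standard library to show how a parameter dictionary turns into the query string appended to the endpoint. `requests.get` performs an equivalent encoding internally when it is given `params`. The names `endpoint`, `query_string`, and `full_url` are illustrative choices, not part of any API.

```python
from urllib.parse import urlencode

# News Search endpoint and the parameters used in this notebook
endpoint = "https://api.cognitive.microsoft.com/bing/v7.0/news/search"
params = {
    "q": "bill gates",
    "count": "10",
    "offset": "0",
    "mkt": "en-us",
    "safesearch": "Moderate",
}

# urlencode joins key=value pairs with "&" and escapes spaces as "+"
query_string = urlencode(params)
full_url = endpoint + "?" + query_string

print(query_string)
# q=bill+gates&count=10&offset=0&mkt=en-us&safesearch=Moderate
```

No request is sent here; the sketch only shows what the final URL looks like.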

In [1]:
import json
import requests

headers = {
    # Request headers
    # Add either of the two keys from your Microsoft account here
    'Ocp-Apim-Subscription-Key': '56de6ee45d3e474b8a664174fcabdf81',
}

params = {
    # Request parameters

    # q: Query or keyword
    'q': 'bill gates',

    # count: The number of articles to retrieve
    'count': '10',

    # offset: The number of articles to skip before the first one returned
    'offset': '0',

    # mkt: The market (country/region and language) to draw results from
    # market codes: https://docs.microsoft.com/en-us/azure/cognitive-services/bing-web-search/language-support
    'mkt': 'en-us',

    # safesearch: Filters results for adult content ('Off', 'Moderate', or 'Strict')
    'safesearch': 'Off',
}



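try:
    # Request a response by providing the URL, subscription key, and constraints.
    # requests builds the query string from params, so the URL needs no placeholder.
    response = requests.get("https://api.cognitive.microsoft.com/bing/v7.0/news/search", headers=headers, params=params)

    # Raise an HTTPError if the status code signals a failure (4xx or 5xx)
    response.raise_for_status()

    print(response)

# Deal with the exception if one is raised
except requests.exceptions.RequestException as e:
    # Not every exception carries errno/strerror, so print the exception itself
    print("Request failed: {0}".format(e))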


<Response [200]>
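
The `count` and `offset` parameters can be combined to page through a larger result set. A minimal sketch, assuming a hypothetical helper named `page_params` (not part of the Bing API or `requests`) that only builds the parameter dictionaries and makes no network calls:

```python
def page_params(query, page_size, num_pages, mkt="en-us"):
    """Yield one parameter dict per page of results.

    Hypothetical helper: the offset advances by page_size for each page.
    """
    for page in range(num_pages):
        yield {
            "q": query,
            "count": str(page_size),
            "offset": str(page * page_size),
            "mkt": mkt,
        }


pages = list(page_params("bill gates", page_size=10, num_pages=3))
for p in pages:
    print(p["offset"], p["count"])
```

Each dictionary could then be passed as `params` to a separate `requests.get` call.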


### Understanding Exception Handling

In [2]:
headers = {
    # Request headers
    # Deliberately invalid key (the last character is missing) to trigger an error
    'Ocp-Apim-Subscription-Key': '56de6ee45d3e474b8a664174fcabdf8',
}

params = ({
    # Request parameters
    
    #q: Query or key word
    'q': 'bill gates',
    'count': '10',
    'offset': '0',
    'mkt': 'en-us',
    'safesearch': 'Moderate',
})

try:
    response_er = requests.get("https://api.cognitive.microsoft.com//bing/v7.0/news/search?%s?", headers=headers, params=params)
    response_er.raise_for_status()
    print(response_er)
except Exception as e:
    print("[Errno {0}]".format(e))

[Errno 401 Client Error: Access Denied for url: https://api.cognitive.microsoft.com//bing/v7.0/news/search?%s?&q=bill+gates&count=10&offset=0&mkt=en-us&safesearch=Moderate]
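
The 401 above is raised by `raise_for_status()`, which turns 4xx and 5xx status codes into exceptions. The sketch below imitates that behavior with standard-library code only, so it runs without a network connection or a subscription key. `FakeHTTPError` and `check_status` are hypothetical stand-ins written for illustration, not the `requests` implementation, and the URL is a placeholder.

```python
class FakeHTTPError(Exception):
    """Hypothetical stand-in for requests.exceptions.HTTPError."""


def check_status(status_code, reason, url):
    """Raise FakeHTTPError for 4xx/5xx codes, mimicking raise_for_status()."""
    if 400 <= status_code < 500:
        kind = "Client Error"
    elif 500 <= status_code < 600:
        kind = "Server Error"
    else:
        return  # 1xx, 2xx, and 3xx codes are not treated as errors
    raise FakeHTTPError(
        "{0} {1}: {2} for url: {3}".format(status_code, kind, reason, url)
    )


messages = []
for code, reason in [(200, "OK"), (401, "Access Denied"), (503, "Service Unavailable")]:
    try:
        check_status(code, reason, "https://example.com/search")
        messages.append("{0} succeeded".format(code))
    except FakeHTTPError as e:
        # The exception message carries the status code and the reason
        messages.append(str(e))

print("\n".join(messages))
```

Catching the exception keeps the program running, so the failure can be logged or retried instead of crashing the notebook cell.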


### Retrieving JSON response

In [3]:
#Get the response in JSON format
search_results = response.json()

#Display JSON
search_results

{'_type': 'News',
 'queryContext': {'adultIntent': False, 'originalQuery': 'bill gates'},
 'readLink': 'https://api.cognitive.microsoft.com/api/v7/news/search?q=bill+gates',
 'sort': [{'id': 'relevance',
   'isSelected': True,
   'name': 'Best match',
   'url': 'https://api.cognitive.microsoft.com/api/v7/news/search?q=bill+gates'},
  {'id': 'date',
   'isSelected': False,
   'name': 'Most recent',
   'url': 'https://api.cognitive.microsoft.com/api/v7/news/search?q=bill+gates&sortby=date'}],
 'totalEstimatedMatches': 62700,
 'value': [{'about': [{'name': 'Bill Gates',
     'readLink': 'https://api.cognitive.microsoft.com/api/v7/entities/0d47c987-0042-5576-15e8-97af601614fa'}],
   'category': 'Health',
   'datePublished': '2019-03-13T03:29:00.0000000Z',
   'description': 'In the most recent "Ask Me Anything" on Reddit, Microsoft cofounder Bill Gates was asked a host of humanitarian-related questions ranging from topics like climate change to the future of education. About 30 minutes into

## Step 2: Structured Format: JSON

### JSON

* JavaScript Object Notation
* Language independent
* Self describing due to key-value pairs
* Uses JavaScript Syntax for data Objects

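What is a JSON object?
It corresponds to a Python dictionary: a collection of key-value pairs. Calling `response.json()` returns the parsed response as a `dict`.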


![JSON format](https://image.slidesharecdn.com/json-130530085957-phpapp01/95/json-the-basics-13-638.jpg?cb=1369904720)
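
To make the dictionary analogy concrete, the sketch below parses a small JSON string with the standard `json` module. The sample text is a hand-written fragment shaped like the response above, not real API output.

```python
import json

# Hand-written sample shaped like a Bing News response (not real API output)
raw = '{"_type": "News", "totalEstimatedMatches": 62700, "value": [{"category": "Health"}]}'

# json.loads converts JSON text into Python objects:
# JSON object -> dict, JSON array -> list, JSON number -> int/float
data = json.loads(raw)

print(type(data).__name__)             # dict
print(data["totalEstimatedMatches"])   # 62700
print(data["value"][0]["category"])    # Health

# json.dumps goes the other way, from Python objects back to JSON text
print(json.dumps(data, indent=2))
```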

In [4]:
#Understanding the hierarchy of JSON response
for obj in search_results:
  print(obj)

_type
readLink
queryContext
totalEstimatedMatches
sort
value


In [13]:
search_results.keys()

dict_keys(['_type', 'readLink', 'queryContext', 'totalEstimatedMatches', 'sort', 'value'])

In [14]:
search_results['totalEstimatedMatches']

62700

In [7]:
#Studying the hierarchy of particular key "value" 
for record in search_results['value']:
  print(record['url'])

https://www.businessinsider.com/bill-gates-says-hes-happier-at-age-63-than-25-because-of-4-things-2019-3
https://finance.yahoo.com/news/watch-33-old-bill-gates-141248255.html
https://www.bloomberg.com/news/articles/2019-03-12/capital-gains-increase-at-heart-of-democrats-tax-the-rich-plans?srnd=politics-vp
https://www.educationdive.com/news/chicago-network-for-college-success-gets-gates-recognition-for-focus-on-fre/549935/
https://medium.com/utopiapress/aoc-and-bill-gates-agree-and-disagree-on-automation-59cbd3357377
https://www.wealthdaily.com/articles/bill-gates-top-technology-picks-of-2019-an-investor-s-guide-part-2-/91705
https://finance.yahoo.com/news/bill-gates-simple-trick-figure-152314051.html
https://www.realtor.com/news/celebrity-real-estate/bill-gates-has-a-trampoline-room-in-his-home/
https://www.marketwatch.com/story/bill-gates-finds-an-ally-in-washington-for-his-idea-to-tax-robots-alexandria-ocasio-cortez-2019-03-11?mod=market-extra
https://www.cnbc.com/2019/03/08/bill-gat

## Step 3: Generating a list using key values

In [15]:
#Empty list
list_articles=[]

#Retrieving urls from search_results['value']
for obj in search_results['value']:
  list_articles.append(obj['url'])
 
list_articles

['https://www.businessinsider.com/bill-gates-says-hes-happier-at-age-63-than-25-because-of-4-things-2019-3',
 'https://finance.yahoo.com/news/watch-33-old-bill-gates-141248255.html',
 'https://www.bloomberg.com/news/articles/2019-03-12/capital-gains-increase-at-heart-of-democrats-tax-the-rich-plans?srnd=politics-vp',
 'https://www.educationdive.com/news/chicago-network-for-college-success-gets-gates-recognition-for-focus-on-fre/549935/',
 'https://medium.com/utopiapress/aoc-and-bill-gates-agree-and-disagree-on-automation-59cbd3357377',
 'https://www.wealthdaily.com/articles/bill-gates-top-technology-picks-of-2019-an-investor-s-guide-part-2-/91705',
 'https://finance.yahoo.com/news/bill-gates-simple-trick-figure-152314051.html',
 'https://www.realtor.com/news/celebrity-real-estate/bill-gates-has-a-trampoline-room-in-his-home/',
 'https://www.marketwatch.com/story/bill-gates-finds-an-ally-in-washington-for-his-idea-to-tax-robots-alexandria-ocasio-cortez-2019-03-11?mod=market-extra',
 'ht
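
The same list can also be built with a list comprehension. A minimal sketch, assuming a hand-written `sample_results` dictionary that mimics the shape of `search_results`; the second record deliberately lacks a `url` key to show how such records can be skipped instead of raising a `KeyError`.

```python
# Hand-written stand-in for search_results (structure only, not real API output)
sample_results = {
    "value": [
        {"url": "https://finance.yahoo.com/news/watch-33-old-bill-gates-141248255.html"},
        {"name": "record without a url"},
        {"url": "https://finance.yahoo.com/news/bill-gates-simple-trick-figure-152314051.html"},
    ]
}

# Keep only the records that actually contain a "url" key
urls = [rec["url"] for rec in sample_results.get("value", []) if "url" in rec]

print(len(urls))
```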

## Bonus Assignment: 5 points

The assignment deadline is **03/15 (today) 11:59 pm**.

1. Use your own search keywords instead of "Bill Gates"
2. Using these new words, retrieve the list of URLs using your own Bing API key
3. Give the list of URLs as input to "newspaper" to get the article data. Retrieve the data you find relevant, such as text, authors, and title (get at least 3 types of data).
4. Save this data in a .tsv file.
5. Upload both the file and the Python code to your GitHub.
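
For step 4, one possible approach is the standard `csv` module with a tab delimiter. The sketch below is not a solution to the assignment: the rows are placeholders, and the column names (`title`, `authors`, `text`) and the file name `articles.tsv` are assumptions chosen for illustration. The file is written to the system temporary directory so the example runs anywhere.

```python
import csv
import os
import tempfile

# Placeholder rows; in the assignment these would come from the parsed articles
rows = [
    {"title": "Example title", "authors": "A. Author; B. Author", "text": "First line\nsecond line"},
    {"title": "Another title", "authors": "", "text": "Body with a\ttab"},
]

fieldnames = ["title", "authors", "text"]
out_path = os.path.join(tempfile.gettempdir(), "articles.tsv")

# newline="" lets the csv module control line endings itself
with open(out_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames, delimiter="\t")
    writer.writeheader()
    writer.writerows(rows)

# Read the file back to confirm that embedded tabs and newlines survive
with open(out_path, newline="", encoding="utf-8") as f:
    restored = list(csv.DictReader(f, delimiter="\t"))

print(len(restored))
```

Using the `csv` module instead of joining strings with `"\t"` matters because article text often contains tabs and line breaks, which the module quotes automatically.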


References:

[Bing API](https://dev.cognitive.microsoft.com/docs/services/f40197291cd14401b93a478716e818bf/operations/56b4447dcf5ff8098cef380d)

[Documentation](https://docs.microsoft.com/en-us/azure/cognitive-services/bing-web-search/)