<a href="https://colab.research.google.com/github/nhwhite212/DealingwithDataSpring2021/blob/colab/3-WebAPIs_crawling/B-Web_APIs_Old.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Web APIs continued: Face Recognition, Entity Extraction, etc.

_Note: Unfortunately, the Face++ recognition API is no longer free, but it is still very inexpensive._

Now let's try a few APIs that are a bit more complex than the ones we dealt with earlier.

## FacePlusPlus API: Face Recognition

Let's start with the FacePlusPlus API, which allows us to recognize faces. We will call the API through Mashape, which also lets us learn about _headers_: additional pieces of information that we send to APIs alongside the parameters. The documentation of Face++ on Mashape can be found at https://market.mashape.com/faceplusplus/faceplusplus-face-detection.
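To make the difference between parameters and headers concrete, here is a minimal sketch that builds a request without sending it. The endpoint, the header name `X-Api-Key`, and the key value are hypothetical placeholders, not part of any real API.

```python
import requests

# Hypothetical endpoint and key, used only to show where each piece goes.
url = "https://api.example.com/detect"
headers = {"X-Api-Key": "YOUR_KEY_HERE", "Accept": "application/json"}
parameters = {"url": "http://example.com/photo.jpg", "attributes": "age,gender"}

# Build the request without sending it, so nothing goes over the network.
prepared = requests.Request("GET", url, params=parameters, headers=headers).prepare()

print(prepared.url)            # parameters are encoded into the query string
print(dict(prepared.headers))  # headers travel separately from the URL
```

Notice that the key appears only in the headers and never in the URL, which is one reason APIs often ask for credentials there.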

We will start by analyzing the image below, which is accessible through this URL: http://graphics8.nytimes.com/newsgraphics/2016/02/01/iowa-hp/dd8cb1e066b52661f94bb2306fc54189f1c3325e/hp-kk-dem-1.jpg

![Image from NY Times](http://graphics8.nytimes.com/newsgraphics/2016/02/01/iowa-hp/dd8cb1e066b52661f94bb2306fc54189f1c3325e/hp-kk-dem-1.jpg)

In [None]:
import requests
import json

facepp_url = "https://faceplusplus-faceplusplus.p.mashape.com/detection/detect"
img_url = "http://graphics8.nytimes.com/newsgraphics/2016/02/01/iowa-hp/dd8cb1e066b52661f94bb2306fc54189f1c3325e/hp-kk-dem-1.jpg"

headers = {
    "X-Mashape-Key": "GV6K1ICRbTmsh81eqtIBt6j9AbAep1VqX5Xjsn99NeApuQLJR5",
    "Accept": "application/json"
}
parameters = {
    'attributes': 'glass,pose,gender,age,race,smiling',
    'url': img_url
}

data = requests.get(facepp_url, params=parameters, headers=headers, verify=True).json()
data


{'message': 'Endpoint/detection/detect does not exist'}

In [None]:
data.keys()

dict_keys(['message'])

In [None]:
data['face']

KeyError: 'face'

In [None]:
# The "face" attribute contains a list, and each element of the list is a dictionary
len(data["face"])
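Since the call above currently returns an error message rather than a list of faces, accessing `data["face"]` directly raises a `KeyError`. One possible way to guard against this is sketched below; the helper name `get_faces` is our own and is not part of the API.

```python
def get_faces(response):
    """Return the list of detected faces, or an empty list on an error response."""
    if "face" not in response:
        # The error payload seen above carries a 'message' key instead of 'face'.
        print("No faces returned:", response.get("message", "unknown error"))
        return []
    return response["face"]

# The error response shown earlier in this notebook
error_response = {"message": "Endpoint/detection/detect does not exist"}
faces = get_faces(error_response)
print(len(faces))
```

With a helper like this, the rest of the notebook can loop over the result without crashing when the service is unavailable.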

#### Exercise

* Print the gender, age, race, and smiling attributes for each face
* Do an image search and get an image URL from the Internet, preferably with multiple faces. Repeat the task above for the new image.

In [None]:
# your code here

### Interacting with the IBM Watson Natural Language Understanding API; POST vs GET

Another useful API, especially when dealing with text, is the [IBM Watson Natural Language Understanding API](https://www.ibm.com/watson/developercloud/natural-language-understanding/api/v1/#introduction), which offers a variety of text analysis features, such as sentiment analysis, entity extraction, and keyword extraction.

#### /analyze call

We will start with the `GET /analyze` API call ([documentation](https://www.ibm.com/watson/developercloud/natural-language-understanding/api/v1/#get-analyze)), which takes a piece of text as input and returns an analysis across various dimensions.

The call below takes a `text` variable as input and returns the sentiment and emotion of the text.

In [None]:
import requests
import json

def getSentiment(text):
    endpoint = "https://gateway.watsonplatform.net/natural-language-understanding/api/v1/analyze"

    # You can register and get your own credentials
    # The ones below have a quota of 1000 calls per day 
    # and can run out quickly if multiple people use these
    username = "802a033d-ff91-4b02-a6c4-a40703ac1b16"
    password = "TBWFrRx6xwmc"

    parameters = {
        #'features' : 'concepts,categories,emotion,entities,keywords,metadata,relations,semantic_roles,sentiment',
        'features': 'emotion,sentiment',
        'version' : '2017-02-27',
        'text': text,
        'language' : 'en',
        # url = url_to_analyze, this is an alternative to sending the text
    }

    resp = requests.get(endpoint, params=parameters, auth=(username, password))
    
    return resp.json()

text = '''
This class is challenging, but I love how much I am learning.
'''

data = getSentiment(text)

In [None]:
data.keys()

In [None]:
data['sentiment']

In [None]:
data['emotion']
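If the service is not reachable, the sketch below shows how such a response could be summarized. The nested layout of `sample` (a `document` entry under both `sentiment` and `emotion`) is an assumption about the response shape, and all the numbers are invented for illustration; check the keys of your own `data` before relying on it.

```python
# Assumed response layout with made-up values, for illustration only.
sample = {
    "language": "en",
    "sentiment": {"document": {"score": 0.82, "label": "positive"}},
    "emotion": {"document": {"emotion": {
        "joy": 0.7, "sadness": 0.1, "anger": 0.05, "fear": 0.03, "disgust": 0.02,
    }}},
}

def summarize(response):
    """Return (sentiment label, sentiment score, strongest emotion)."""
    doc_sentiment = response.get("sentiment", {}).get("document", {})
    emotions = response.get("emotion", {}).get("document", {}).get("emotion", {})
    # Pick the emotion with the highest score, if any emotions are present.
    strongest = max(emotions, key=emotions.get) if emotions else None
    return doc_sentiment.get("label"), doc_sentiment.get("score"), strongest

print(summarize(sample))
```

Using `.get` with defaults means the function returns `None` values rather than raising an error when a key is missing.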

#### Entities call

[Full Documentation of the call](https://www.ibm.com/watson/developercloud/natural-language-understanding/api/v1/#entities)

This is an API call that extracts entities from the text, along with the sentiment and emotion associated with each entity. The API can also "normalize" each entity, so that two different ways of saying the same thing get mapped to the same entity. For example, "President Trump" and "Donald Trump" get mapped to the same Knowledge Graph entity.

In [None]:
import requests
import json

def processURL(url_to_analyze):
    endpoint_watson = "https://gateway.watsonplatform.net/natural-language-understanding/api/v1/analyze"
    params = {
        'version': '2017-02-27',
    }
    headers = { 
        'Content-Type': 'application/json',
    }
    watson_options = {
      "url": url_to_analyze,
      "features": {
        "entities": {
          "sentiment": True,
          "emotion": True,
          "limit": 10
        }
      }
    }
    username = "802a033d-ff91-4b02-a6c4-a40703ac1b16"
    password = "TBWFrRx6xwmc"

    resp = requests.post(endpoint_watson, data=json.dumps(watson_options), 
                         headers=headers, params=params, auth=(username, password) )
    return resp.json()


url_to_analyze = 'http://www.politico.com/story/2017/05/23/infrastructure-transportation-trump-budget-238741'

data = processURL(url_to_analyze)
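`getSentiment` used a GET request while `processURL` uses POST. The sketch below illustrates the practical difference by preparing both kinds of request without sending them; the endpoint is a hypothetical placeholder and the option values are only examples.

```python
import json
import requests

endpoint = "https://api.example.com/analyze"  # hypothetical endpoint
options = {"url": "http://example.com/article",
           "features": {"entities": {"limit": 10}}}

# GET: everything we send is encoded into the URL query string.
get_req = requests.Request(
    "GET", endpoint, params={"version": "2017-02-27", "text": "hello"}
).prepare()

# POST: the structured options travel in the request body as JSON.
post_req = requests.Request(
    "POST", endpoint,
    params={"version": "2017-02-27"},
    data=json.dumps(options),
    headers={"Content-Type": "application/json"},
).prepare()

print(get_req.url)    # includes the text in the query string
print(get_req.body)   # the GET request carries no body here
print(post_req.url)   # only the version parameter remains in the URL
print(post_req.body)  # the JSON-encoded options
```

POST is a natural fit when the input is nested, as with the `features` dictionary, or too long to fit comfortably in a URL.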

In [None]:
# Let's see what we get back as top-level attributes
data.keys()

In [None]:
# Let's see the entities list
data["entities"]

In [None]:
# Let's see the first entity. Notice the "disambiguated" attribute that
# points to "canonical" versions of the entity, in DBPedia, Freebase, OpenCYC, YAGO, etc
data["entities"][0]

In [None]:
# This function takes as input the result
# from the IBM Watson API and returns a list
# of entities that are relevant (above threshold)
# to the article
def getEntities(data, threshold):
    result = []
    for entity in data["entities"]:
        relevance = float(entity['relevance'])
        if relevance > threshold:
            result.append(entity['text'])
    return result

getEntities(data, 0.25)
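To see how the threshold affects the result without calling the API, here is a small sketch that runs the same function on hand-made data. The entities and relevance values in `sample` are invented for illustration.

```python
def getEntities(data, threshold):
    # Same logic as above: keep entities whose relevance exceeds the threshold
    result = []
    for entity in data["entities"]:
        relevance = float(entity["relevance"])
        if relevance > threshold:
            result.append(entity["text"])
    return result

# Made-up response fragment, not real API output
sample = {"entities": [
    {"text": "Donald Trump", "relevance": 0.95},
    {"text": "Congress", "relevance": 0.40},
    {"text": "Tuesday", "relevance": 0.10},
]}

print(getEntities(sample, 0.25))  # → ['Donald Trump', 'Congress']
print(getEntities(sample, 0.5))   # → ['Donald Trump']
```

Raising the threshold keeps only the entities that are most central to the article.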

#### Exercise

* Fetch the main page of NY Times. Print the entities that are currently being discussed in the news, together with their relevance value and the associated sentiment.
* _Optional:_ Use the NY Times API to fetch the Top Stories news. You can register and get an API key at https://developer.nytimes.com/. The `Top Stories V2 API` provides the details of the news of the day. The API call documentation is at https://developer.nytimes.com/top_stories_v2.json and the API call is https://api.nytimes.com/svc/topstories/v2/home.json?api-key=PUTYOURKEYHERE. Repeat the entity extraction process from above.

### Exercise: Using the Spotify API

We will now use the Spotify API to get information about an artist. The documentation of the calls is at https://developer.spotify.com/web-api/endpoint-reference/. For now, use only the calls that do not require OAuth authentication.

Tasks:
* We can first find the id of an artist using the `/v1/search?type=artist` API call. The documentation of the `search-item` endpoint is at https://developer.spotify.com/web-api/search-item/.
* Once you get back the ID of the artist, use the `get artist` endpoint, to get further information about the artist: https://developer.spotify.com/web-api/get-artist/
* Study the documentation and figure out how to get the albums of an artist, the top tracks for an artist, and the related artists.
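As a starting point for the first task, the sketch below prepares a search request without sending it and shows one way to pull an ID out of a response. The layout of `sample_response` (an `artists` object holding an `items` list) is an assumption and the ID is a placeholder, so compare against what the API actually returns. The service may also require an access token even for search; check the current documentation.

```python
import requests

search_url = "https://api.spotify.com/v1/search"
params = {"q": "Radiohead", "type": "artist", "limit": 1}

# Prepare the request only; nothing is sent over the network.
prepared = requests.Request("GET", search_url, params=params).prepare()
print(prepared.url)

def first_artist_id(search_response):
    """Return the ID of the first artist in a search response, or None."""
    items = search_response.get("artists", {}).get("items", [])
    return items[0]["id"] if items else None

# Assumed response shape with a placeholder ID, for illustration only.
sample_response = {"artists": {"items": [{"id": "HYPOTHETICAL_ID", "name": "Radiohead"}]}}
print(first_artist_id(sample_response))
```

The returned ID is what the remaining tasks would plug into the artist, albums, top tracks, and related artists calls.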



