# Getting data from the web: API

## Spotify

Go to the Spotify developer dashboard and follow similar steps to get your credentials:

https://developer.spotify.com/dashboard/login

In [None]:
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

In [None]:
import spotify_key

In [None]:
# Hard-coding your credentials like this is BAD if the notebook ends up on GitHub
# 
# CLIENT_ID='69ed62cb11654ab4b'
# CLIENT_SECRET='fffa11dee3e540'

In [None]:
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id=spotify_key.CLIENT_ID,
                                                           client_secret=spotify_key.CLIENT_SECRET))
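
The cell above keeps the credentials out of the notebook by importing them from a separate `spotify_key` module. Another common option is to read them from environment variables. The sketch below is only an illustration: `load_credentials` and `fake_env` are names made up for this example, and the variable names `SPOTIPY_CLIENT_ID` and `SPOTIPY_CLIENT_SECRET` follow what the spotipy documentation describes, so check that against the version you have installed.

```python
import os


def load_credentials(env=os.environ):
    """Return (client_id, client_secret) read from a mapping of environment variables."""
    try:
        return env["SPOTIPY_CLIENT_ID"], env["SPOTIPY_CLIENT_SECRET"]
    except KeyError as missing:
        # Fail with a readable message instead of a bare KeyError
        raise RuntimeError(f"Missing environment variable: {missing}") from None


# Demonstration with a stand-in mapping rather than the real environment
fake_env = {"SPOTIPY_CLIENT_ID": "my-id", "SPOTIPY_CLIENT_SECRET": "my-secret"}
client_id, client_secret = load_credentials(fake_env)
print(client_id)
```

In a real session you would call `load_credentials()` with no argument after setting the two variables in your shell. Either way, the secret never appears in a file that gets committed.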

In [None]:
results = sp.search(q='Neil Young', limit=20)

In [None]:
results

That is a lot of output. Sometimes reading a raw API response is like trying to drink water from a firehose.

### How do I know how to do the above code in the first place?

It can be VERY important and helpful to consult the documentation.

Check out: https://developer.spotify.com/documentation/web-api/
* Notice that they give descriptions of the response codes
* Look for Guides, References, Documentation, Libraries, etc.

Guidance as to Web API Libraries: https://developer.spotify.com/documentation/web-api/libraries/
* Find Spotipy

And there is documentation in turn for libraries like spotipy: https://spotipy.readthedocs.io/
* Installation
* Examples
* Reference
* Link to source code
  * GitHub repos can themselves contain useful documentation on the main README
  * ... not that we want to overburden ourselves with Python, but note that you can even peruse source code if you want
    * https://github.com/spotipy-dev/spotipy/blob/2.22.1/spotipy/client.py
    * note the use of requests and json (and something called "urllib3": requests is built on urllib3 and is intended to make HTTP requests more painless)

In [None]:
# Let's put our Python dictionary skills to good use

results.keys()

In [None]:
results['tracks'].keys()

In [None]:
results['tracks']['items'][0].keys()

In [None]:
print(results['tracks']['items'][0]['album']['name'])
print(results['tracks']['items'][0]['artists'])
print(results['tracks']['items'][0]['popularity'])

In [None]:
results = sp.search(q='Neil Young', limit=20)
for idx, track in enumerate(results['tracks']['items']):
    print(idx, track['name'])

In [None]:
results = sp.search(q='Neil Young', limit=20)
for idx, track in enumerate(results['tracks']['items']):
    print(idx, track['name'], ' : ', track['artists'][0]['name'])

As long as you can traverse this data structure, you can collect information from every returned item into something a little more manageable, like a pandas DataFrame.
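
One way to picture that collection step is a small function that turns each track into a flat dictionary. This is only a sketch: `flatten_tracks` is a name invented here, and `sample_results` is hand-made data that mimics just the keys used in the cells above (`tracks`, `items`, `name`, `artists`, `popularity`), not a real Spotify response.

```python
def flatten_tracks(results):
    """Pull a few fields out of each track into a list of flat dictionaries."""
    rows = []
    for track in results["tracks"]["items"]:
        rows.append({
            "track_name": track["name"],
            # A track can list several artists; keep only the first one
            "artist_name": track["artists"][0]["name"],
            "popularity": track["popularity"],
        })
    return rows


# Hand-made stand-in for a search response (not real data)
sample_results = {
    "tracks": {
        "items": [
            {"name": "Example Song A", "artists": [{"name": "Example Artist"}], "popularity": 70},
            {"name": "Example Song B", "artists": [{"name": "Another Artist"}, {"name": "Guest"}], "popularity": 55},
        ]
    }
}

rows = flatten_tracks(sample_results)
for row in rows:
    print(row)
```

A list of dictionaries like `rows` can be passed straight to `pd.DataFrame(rows)`, with one column per key.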

### Example: Simple data on songs from the 70's
* Search for songs from 1970, the first year of the 70's
* Make lists of info
* Make a Pandas dataframe with one column for each list you've collected

In [None]:
artist_name = []
track_name = []
popularity = []
track_id = []

# Search for 50 songs from 1970
track_results = sp.search(q='year:1970', type='track', limit=50)

# Use a for loop to iterate over the track items
# and add the artist name, track name, popularity score, and track ID to lists
for i in track_results['tracks']['items']:
    artist_name.append(i['artists'][0]['name'])
    track_name.append(i['name'])
    track_id.append(i['id'])
    popularity.append(i['popularity'])

Import pandas and convert those lists into columns of a dataframe.

In [None]:
import pandas as pd

In [None]:
track_dataframe = pd.DataFrame({'artist_name' : artist_name,
                                'track_name' : track_name,
                                'track_id' : track_id,
                                'popularity' : popularity})

In [None]:
print(track_dataframe.shape)
track_dataframe.head()

Use the dataframe now as an easier structure to navigate and analyze.

In [None]:
track_dataframe.sort_values(by='popularity')

In [None]:
track_dataframe.sort_values(by='popularity')[-10:].plot.barh(x='artist_name',y='popularity')
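
The example above fetches a single page of 50 tracks from 1970. To gather more, you can request several pages by moving an offset forward. The sketch below makes some assumptions you should verify in the docs: that `sp.search` accepts an `offset` argument, and that the `year:` filter accepts a range such as `year:1970-1979`. `collect_tracks` and `fake_search` are made-up names; `fake_search` stands in for `sp.search` so the cell runs without credentials.

```python
def collect_tracks(search, query, pages=2, page_size=50):
    """Call search(...) once per page and gather all returned track items."""
    items = []
    for page in range(pages):
        results = search(q=query, type="track", limit=page_size, offset=page * page_size)
        batch = results["tracks"]["items"]
        if not batch:
            # An empty page means there is nothing left to fetch
            break
        items.extend(batch)
    return items


def fake_search(q, type, limit, offset):
    """Stand-in for sp.search that serves 120 made-up tracks."""
    total = 120
    stop = min(offset + limit, total)
    made_up = [{"id": f"track-{n}", "name": f"Song {n}"} for n in range(offset, stop)]
    return {"tracks": {"items": made_up}}


tracks = collect_tracks(fake_search, "year:1970-1979", pages=3)
print(len(tracks))
```

With real credentials you would pass `sp.search` in place of `fake_search`. Passing the search function as an argument is a design choice that keeps the paging logic easy to test offline.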

## You now know some spotipy specifics
Look through spotipy's [example code](https://github.com/plamere/spotipy/tree/2.19.0/examples) in its GitHub repository and try running one that interests you here.

## NYTimes API


Setting things up for using the API:
* Create a developer account and a new app by following the instructions here:  https://developer.nytimes.com/get-started

Setting up authentication on the developer page (developer.nytimes.com):
* Create a new app with access to the Article Search and Books APIs
* Copy the key into a file named `nytimes_key.py` and assign it as a string to a variable named `MY_KEY`
* Use that key here by importing the module (your `*.py` file), which contains only the variable you've defined to store your key

In [None]:
import requests

In [None]:
import nytimes_key

How do you know how to use the API?
* Consult the documentation
  * Article search -- see https://developer.nytimes.com/docs/articlesearch-product/1/overview
    * `/articlesearch.json?q={query}&fq={filter}`
    * Example:
      * `https://api.nytimes.com/svc/search/v2/articlesearch.json?q=election&api-key=yourkey`
  * Books API -- see https://developer.nytimes.com/docs/books-product/1/overview
    * `/lists/2019-01-20/hardcover-fiction.json`
    * Example:
      * `https://api.nytimes.com/svc/books/v3/lists/current/hardcover-fiction.json?api-key=yourkey`
    * "The book reviews service lets you get NYT book review by author, ISBN, or title."
      * `https://api.nytimes.com/svc/books/v3/reviews.json?author=Stephen+King&api-key=yourkey`
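
The URL patterns above can be assembled by hand, but the standard library can also encode the query string for you, including turning spaces into `+`. A minimal sketch, assuming the Books base URL shown in the examples; `build_url` is a helper name made up here, and `yourkey` is the same placeholder the documentation uses.

```python
from urllib.parse import urlencode

BASE = "https://api.nytimes.com/svc/books/v3"


def build_url(path, params):
    """Join the base URL, an endpoint path, and URL-encoded query parameters."""
    return f"{BASE}{path}?{urlencode(params)}"


# 'api-key' contains a hyphen, so the parameters are passed as a dictionary
url = build_url("/reviews.json", {"author": "Stephen King", "api-key": "yourkey"})
print(url)
```

`requests.get` also takes a `params` dictionary and does this encoding itself, which is another way to avoid string concatenation.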


Let's try the Books first:

In [None]:
# Note how we include your key without explicitly typing it into this file

url = 'https://api.nytimes.com/svc/books/v3/lists/current/hardcover-fiction.json?api-key=' + nytimes_key.MY_KEY

In [None]:
print(url)

Use "requests" to query the API endpoint

In [None]:
response = requests.get(url)

In [None]:
response
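
Displaying `response` shows the HTTP status code of the reply, which is also available as `response.status_code`. If the number alone doesn't mean much to you, the standard library can translate it. This is a small sketch; `explain_status` is a helper invented for this notebook, and the "ok"/"error" labels are just one simple way to group the codes.

```python
from http import HTTPStatus


def explain_status(code):
    """Return a short human-readable description of an HTTP status code."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Codes that the standard library does not know about
        phrase = "Unknown status"
    if 200 <= code < 300:
        kind = "ok"
    elif code >= 400:
        kind = "error"
    else:
        kind = "other"
    return f"{code} {phrase} ({kind})"


for code in (200, 401, 429):
    print(explain_status(code))
```

With a live response you could call `explain_status(response.status_code)` before trying to parse the body.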

In [None]:
response.text

That text is actually JSON, so we can have requests parse it into Python dictionaries and lists.

In [None]:
x = response.json()

In [None]:
x

Navigating this data structure may remind you of experiences navigating the nasty CORGIS lists of nested dictionaries.

In [None]:
x.keys()

In [None]:
x['results'].keys()

In [None]:
x['results']['books'][0]

In [None]:
for i in x['results']['books']:
    print('Author: ' + i['author'] + ', Title: ' + i['title'])
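
Calling `.keys()` one level at a time, as above, works fine. If you'd rather see the outline of a nested structure all at once, a short recursive helper can print it. This is only a sketch: `describe` is a made-up name, and `sample` is a tiny hand-built dictionary that imitates the `results` and `books` nesting used above rather than a real response.

```python
def describe(obj, depth=2, indent=0):
    """Return outline lines showing keys and value types down to a given depth."""
    lines = []
    pad = "  " * indent
    if isinstance(obj, dict):
        for key, value in obj.items():
            lines.append(f"{pad}{key}: {type(value).__name__}")
            if depth > 1:
                lines.extend(describe(value, depth - 1, indent + 1))
    elif isinstance(obj, list) and obj:
        # Only look at the first element; list items usually share one shape
        lines.append(f"{pad}[0] of {len(obj)}: {type(obj[0]).__name__}")
        if depth > 1:
            lines.extend(describe(obj[0], depth - 1, indent + 1))
    return lines


# Tiny hand-built stand-in for the parsed JSON (not real data)
sample = {"status": "OK", "results": {"books": [{"title": "A", "author": "B"}]}}

for line in describe(sample, depth=3):
    print(line)
```

On the real data you would call `describe(x, depth=3)` and raise `depth` as needed.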

### Example 2: Get a review of a book above

In [None]:
# Note how we include your key without explicitly typing it into this file

url = 'https://api.nytimes.com/svc/books/v3/reviews.json?title=GO+TELL+THE+BEES+THAT+I+AM+GONE&api-key=' + nytimes_key.MY_KEY

In [None]:
print(url)

Use "requests" to query the API endpoint

In [None]:
response = requests.get(url)

In [None]:
response.json()

No results for that title.  Try again with a different book.
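
Rather than eyeballing the output, you can check in code whether a query found anything. The sketch below assumes the reviews response is a dictionary with a `results` list, which is how the reply looked when this was written; confirm it against your own output. `first_result` is a made-up helper, and both sample payloads, including the `book_title` and `url` fields, are invented for illustration.

```python
def first_result(payload):
    """Return the first entry of payload['results'], or None if there are none."""
    results = payload.get("results") or []
    return results[0] if results else None


# Invented payloads that imitate an empty and a non-empty reply
empty_payload = {"status": "OK", "num_results": 0, "results": []}
found_payload = {
    "status": "OK",
    "num_results": 1,
    "results": [{"book_title": "Example Title", "url": "https://example.com/review"}],
}

print(first_result(empty_payload))
print(first_result(found_payload)["book_title"])
```

Note that this helper only fits responses where `results` is a list; in the best-seller list response above, `results` is a dictionary.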

In [None]:
url = 'https://api.nytimes.com/svc/books/v3/reviews.json?title=THE+MIDNIGHT+LIBRARY&api-key=' + nytimes_key.MY_KEY
response = requests.get(url)
x = response.json()
x

* It doesn't look like the API lets us retrieve the text of the review itself
* BUT, we're now fully capable of getting it ourselves with the URL in the response:
  * Grab the review page with requests
  * Use BeautifulSoup to help us sort through all the HTML tags

**However**, this is disallowed under the [Terms of Service](https://help.nytimes.com/hc/en-us/articles/115014893428-Terms-of-service).  If you wanted to do this, you should seek written permission from the NYTimes.

## Article search

The NYTimes has a number of different APIs.  Let's repeat the process with the Article Search.

In [None]:
url = 'https://api.nytimes.com/svc/search/v2/articlesearch.json?q=spotify&api-key=' + nytimes_key.MY_KEY

In [None]:
print(url)

In [None]:
response = requests.get(url)

In [None]:
response

In [None]:
response.text

Again, that text is actually JSON, so we can have requests parse it into Python dictionaries and lists.

In [None]:
x = response.json()

In [None]:
x.keys()

In [None]:
len(x['response']['docs'])

In [None]:
x['response']['docs'][0].keys()

In [None]:
for i in x['response']['docs']:
    print(i['headline'])

In [None]:
for i in x['response']['docs']:
    print(i['headline']['main'])