## Application Programming Interface (API)

**ALWAYS CHECK A WEBSITE'S API DOCUMENTATION FOR PARAMETERS, RATE LIMITS, AVAILABLE INFORMATION, ETC.**

> API stands for **Application Programming Interface**. 

An API is a communication protocol between two software systems. It describes the mechanism by which one system **requests** some information using a predefined format and a remote system **responds** with an outcome that gets sent back to the first system.

APIs allow two applications to interact with each other, which is an incredibly common task in modern web-based programs. For instance, if you've ever connected your Facebook profile to another service such as Spotify or Instagram, that connection was made through APIs. For web APIs, the request and response are usually exchanged under the hood as an **_HTTP request_** and an HTTP response.

### What is an API made of?

APIs are very common in the tech world, which means there are many, many different kinds that you're going to run into. While each API you work with will be unique in some way, there are some common traits you can expect to see. An API has three main components:

* **Access Permissions:** Is the user allowed to ask for data or services?
* **Request:** The service being asked for (e.g., "here is my current GPS location, show me the map around that place", as in Pokemon Go). A request has two main parts:

    * **Methods:** Once the access is permitted, what questions can be asked.
    
    * **Parameters:** Additional details that can be sent with requests or responses

* **Response:** The data or service as a result of the request.
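
To make the three components concrete, here is a minimal sketch that builds (but does not send) a request with `requests`, so it runs offline. The endpoint `https://api.example.com/v1/maps`, the `lat`/`lon` parameter names, and the key `MY_API_KEY` are hypothetical placeholders, not a real service.

```python
import requests

# Hypothetical endpoint and key, for illustration only.
BASE_URL = "https://api.example.com/v1/maps"
API_KEY = "MY_API_KEY"

# Access permissions: many APIs expect a key or token in a header.
headers = {"Authorization": f"Bearer {API_KEY}"}

# Request: the method (GET) plus parameters describing what we want.
params = {"lat": "40.77", "lon": "-73.93"}

# Build and prepare the request without sending it over the network.
prepared = requests.Request("GET", BASE_URL, headers=headers, params=params).prepare()

print(prepared.method)                    # GET
print(prepared.url)                       # full URL including the query string
print(prepared.headers["Authorization"])  # Bearer MY_API_KEY

# Sending it would produce the Response component, e.g.:
# with requests.Session() as s:
#     resp = s.send(prepared, timeout=10)
```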

## HTTP Request using python requests

Python's standard library can make HTTP requests with `urllib`, but most developers prefer the easier-to-use third-party library **Requests** instead. Requests is an Apache2-licensed HTTP library built on top of urllib3. With it, you can send HTTP requests using simple Python commands, access content like page headers, form data, files, and parameters, and read the response data in a simple way.

    # In a Jupyter notebook; from a terminal, run `pip install requests` instead
    !pip install requests

    # Import requests into the working environment
    import requests

    # Make a GET request (a timeout avoids hanging forever on an unresponsive server)
    resp = requests.get('https://www.google.com', timeout=10)

    # Check the returned status code: True if the server answered 200 OK
    # List of status codes: https://en.wikipedia.org/wiki/List_of_HTTP_status_codes
    print(resp.status_code == requests.codes.ok)

    # Once we know the request succeeded, inspect the body with the `.text` property
    print(resp.text)

    # Read the response headers; converting to a dict displays the key: value pairs neatly
    print(dict(resp.headers))

    # Parse individual headers (lookups are case-insensitive).
    # .get() returns None instead of raising KeyError when a header is absent,
    # e.g. Content-Length is often missing from chunked responses.
    print(resp.headers.get('Content-Length'))  # size of the response body in bytes
    print(resp.headers.get('Date'))            # date the response was sent
    print(resp.headers.get('Server'))          # server type (Google Web Server, "gws")
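
In practice you usually want to stop early when a request fails rather than parse an error page. A minimal sketch, assuming you only need the body text or `None`; the helper name `fetch_text` is hypothetical. `raise_for_status()` turns 4xx/5xx status codes into an exception, so network errors and bad status codes are handled in one place.

```python
import requests


def fetch_text(url, timeout=10):
    """Return the response body as text, or None if the request fails.

    raise_for_status() raises requests.HTTPError for 4xx/5xx status codes;
    HTTPError, connection errors and invalid URLs all subclass RequestException.
    """
    try:
        resp = requests.get(url, timeout=timeout)
        resp.raise_for_status()
    except requests.exceptions.RequestException as err:
        print(f"Request failed: {err}")
        return None
    return resp.text


# A malformed URL fails before any network traffic (MissingSchema),
# so this line is safe to run offline.
print(fetch_text("not-a-url"))  # prints the error message, then None
```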
    
#### Passing Parameters in GET
In some cases, you'll need to pass parameters along with your GET requests. These extra parameters usually take the form of a query string appended to the requested URL. To do this, pass the values as a dictionary to the `params` argument, and Requests builds the query string for you. Let's try to access information from `httpbin` with some user information.

Note: The user information is not getting authenticated at `httpbin` so any name/password will work fine. This is merely for practice. 


    credentials = {'user_name': 'FlatironSchool', 'password': 'learnlovecode'}  
    r = requests.get('http://httpbin.org/get', params=credentials)

    print(r.url)  
    print(r.text)
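
You can see exactly which URL `params` produces, without contacting the server, by preparing the request instead of sending it. A minimal sketch reusing the `credentials` dictionary from above; the `tags` key is a hypothetical example of a list value.

```python
import requests

credentials = {'user_name': 'FlatironSchool', 'password': 'learnlovecode'}

# Prepare (but do not send) the same GET request to inspect the final URL.
prepared = requests.Request('GET', 'http://httpbin.org/get', params=credentials).prepare()
print(prepared.url)
# http://httpbin.org/get?user_name=FlatironSchool&password=learnlovecode

# A list value is encoded as a repeated key; spaces are URL-encoded for you.
multi = requests.Request('GET', 'http://httpbin.org/get',
                         params={'tags': ['python', 'web scraping']}).prepare()
print(multi.url)
# http://httpbin.org/get?tags=python&tags=web+scraping
```

Because Requests does this encoding itself, there is no need to replace spaces with `+` by hand before passing values to `params`.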

#### HTTP POST method 

A POST request sends data to the server in the request body rather than in the URL. A common case is uploading one or more files at once: for example, a user submits a form with separate fields for a profile picture, a resume, etc. Requests can send multiple files in a single request; pass them to the `files` argument as a list of tuples in the form `(field_name, file_info)`, where `file_info` is `(filename, file_object, content_type)`.

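    import requests

    url = 'http://httpbin.org/post'

    # Open the files in binary mode; the `with` block closes them afterwards
    with open('fi.png', 'rb') as png, open('fi2.jpeg', 'rb') as jpeg:
        file_list = [
            # (field_name, (filename, file_object, content_type))
            ('image', ('fi.png', png, 'image/png')),
            ('image', ('fi2.jpeg', jpeg, 'image/jpeg')),  # MIME type must match the file
        ]
        r = requests.post(url, files=file_list, timeout=10)

    print(r.text)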
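
Besides files, `requests.post` accepts `data=` for form-encoded fields and `json=` for a JSON body. The sketch below prepares both requests without sending them, to show how the body and `Content-Type` header differ; the field names and values are hypothetical.

```python
import json

import requests

url = 'http://httpbin.org/post'
payload = {'name': 'FlatironSchool', 'course': 'data science'}

# data= sends an HTML-form style body: key=value pairs joined by '&'
form_req = requests.Request('POST', url, data=payload).prepare()
print(form_req.headers['Content-Type'])  # application/x-www-form-urlencoded
print(form_req.body)                     # name=FlatironSchool&course=data+science

# json= serializes the dictionary to JSON and sets the matching header
json_req = requests.Request('POST', url, json=payload).prepare()
print(json_req.headers['Content-Type'])  # application/json
# Round-trip check: decoding the body gives back the original dictionary
print(json.loads(json_req.body) == payload)  # True
```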

## OAuth

> **OAuth stands for Open Authorization.**

**_OAuth_** is an open standard for delegated authorization: it lets the creators of APIs and other online services give third-party applications access to a user's private data or assets without the user having to share their original login credentials, such as a password. One of the biggest challenges of building multi-user applications is making sure that you only give people access to the data and functionality they're supposed to have, and OAuth provides a framework for granting that limited access. OAuth 1.0 was finalized in 2007, with early work driven by engineers at companies such as Twitter and Google; it was published as RFC 5849 in 2010, and OAuth 2.0 followed as RFC 6749 in 2012. It's now one of the most popular open standards for authorization and is used by almost all of the major players in the tech world, such as Netflix, Amazon, Facebook, and more!

### The Steps of OAuth

Before using OAuth, we must register our application with the authorizer and get our **credentials** (typically a client ID and a client secret) to use during the process. We need to set up some information about the application, like the app's name or website, and most importantly, **a redirect URI**. After the user approves access, the authorization server redirects the user's browser to this URI, which is how our app learns that the user said yes.

> A URI (Uniform Resource Identifier) is a string that refers to a resource. The most common are URLs, which identify the resource by giving its location on the Web.

After registration, the first step is **authorization**. Here, we send our users to the authorization server along with our **scope**, the set of permissions we would like to have. Users can see everything being requested on their behalf and confirm that they want to grant our application those permissions.
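
In OAuth 2.0's authorization code flow, this step usually means redirecting the user to a URL like the one built below. The parameter names (`response_type`, `client_id`, `redirect_uri`, `scope`, `state`) come from the OAuth 2.0 specification (RFC 6749); the endpoint, client ID, redirect URI, and scope names are hypothetical placeholders, and real providers may require extra parameters.

```python
import secrets
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical values: use the endpoint from your provider's docs and the
# client ID and redirect URI from your app registration.
AUTHORIZE_URL = 'https://auth.example.com/oauth/authorize'
CLIENT_ID = 'my-client-id'
REDIRECT_URI = 'http://localhost:8080/callback'


def build_authorization_url(scopes):
    """Return (url, state) for sending the user to the authorization server."""
    # A random `state` value lets us check later that the redirect belongs to
    # this request (protection against cross-site request forgery).
    state = secrets.token_urlsafe(16)
    query = {
        'response_type': 'code',      # ask for an authorization code
        'client_id': CLIENT_ID,
        'redirect_uri': REDIRECT_URI,
        'scope': ' '.join(scopes),    # OAuth 2.0 scopes are space-separated
        'state': state,
    }
    return f'{AUTHORIZE_URL}?{urlencode(query)}', state


url, state = build_authorization_url(['profile', 'email'])
print(url)
# Inspect the decoded query parameters
print(parse_qs(urlparse(url).query))
```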

 
The second step is the **redirect**. Redirect URIs are a critical part of the OAuth flow. After a user successfully authorizes the application, the authorization server redirects the user back to the app with an **authorization code** in the URL. Because the redirect URL contains sensitive information, it is critical that the service doesn't redirect the user to arbitrary locations, which is why the redirect URI is registered in advance. The application uses the authorization code in the final step to get the access token.
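
Handling the redirect mostly comes down to reading the query string of the URL the browser lands on. A minimal sketch, assuming the code and state arrive as query parameters (as in OAuth 2.0's authorization code flow); the redirect URL and the values in it are hypothetical.

```python
from urllib.parse import urlparse, parse_qs


def extract_authorization_code(redirect_url, expected_state):
    """Pull the authorization code out of the redirect URL.

    Raises ValueError if the provider reported an error or if the `state`
    value does not match the one we generated before redirecting the user.
    """
    params = parse_qs(urlparse(redirect_url).query)
    if 'error' in params:
        raise ValueError(f"Authorization failed: {params['error'][0]}")
    if params.get('state', [None])[0] != expected_state:
        raise ValueError('State mismatch: possible forged redirect')
    return params['code'][0]


# Hypothetical redirect, as the browser might land on our redirect URI
redirect = 'http://localhost:8080/callback?code=abc123&state=xyz789'
print(extract_authorization_code(redirect, expected_state='xyz789'))  # abc123
```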


The final step is **acquisition**. This is where we finally receive our **access token** from the service provider so we can make API requests for our user. We send the authorization code we received at our redirect URI, together with our own application secret (acquired during the initial registration), to the provider in order to get our user's access token. The access token can then be used to make API calls on behalf of our user.
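
The exchange itself is a single POST to the provider's token endpoint. A minimal sketch of the OAuth 2.0 version, prepared but not sent so it runs offline; the form field names (`grant_type`, `code`, `redirect_uri`) follow RFC 6749, while the endpoint and credentials are hypothetical placeholders. Check your provider's docs for how it expects client credentials.

```python
import requests

# Hypothetical values; the token endpoint and credentials come from your provider.
TOKEN_URL = 'https://auth.example.com/oauth/token'
CLIENT_ID = 'my-client-id'
CLIENT_SECRET = 'my-client-secret'
REDIRECT_URI = 'http://localhost:8080/callback'


def build_token_request(code):
    """Prepare the POST that trades an authorization code for an access token."""
    form = {
        'grant_type': 'authorization_code',
        'code': code,
        'redirect_uri': REDIRECT_URI,  # must match the URI used in the first step
    }
    # Client credentials sent with HTTP Basic auth, as RFC 6749 recommends;
    # some providers instead expect client_id/client_secret in the form body.
    return requests.Request('POST', TOKEN_URL, data=form,
                            auth=(CLIENT_ID, CLIENT_SECRET)).prepare()


token_request = build_token_request('abc123')
print(token_request.body)
# Sending it would look like:
# with requests.Session() as s:
#     token = s.send(token_request, timeout=10).json()['access_token']
```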

#### OAuth requests and conversion to DataFrame
**This won't work unless proper credentials are entered (an API key sent as a bearer token in the `Authorization` header).**

    import os
    import requests
    import pandas as pd

    # As a general rule of thumb, don't store passwords or keys in a main file like this!
    # Store them in a separate file (e.g. passwords.py) that you import,
    # or even better, in environment variables read at runtime.
    # Here we read a (hypothetically named) YELP_API_KEY variable, falling back to a placeholder.
    client_id = 'YOUR_CLIENT_ID'  # your client ID goes here (as a string)
    api_key = os.environ.get('YELP_API_KEY', 'YOUR_API_KEY')  # your API key (as a string)

    term = 'Mexican'
    location = 'Astoria NY'
    SEARCH_LIMIT = 10

    url = 'https://api.yelp.com/v3/businesses/search'

    headers = {
        'Authorization': 'Bearer {}'.format(api_key),  # authorization information
    }

    # requests URL-encodes the values itself (spaces become '+'), so pass plain strings
    url_params = {
        'term': term,
        'location': location,
        'limit': SEARCH_LIMIT,
    }
    response = requests.get(url, headers=headers, params=url_params, timeout=10)
    print(response)              # e.g. <Response [200]>
    print(type(response.text))   # the raw body is a string...
    print(response.text[:1000])

    # ...which .json() parses into a dictionary; parse it once and reuse it
    data = response.json()
    print(data.keys())

    # Preview the information in each key
    for key in data.keys():
        print(key)
        value = data[key]    # use standard dictionary indexing
        print(type(value))   # what type is it?
        print('\n\n')        # separate out the data

    # More detailed responses
    print(data['businesses'][:2])
    print(data['region'])
    print(data['total'])

    # Convert the list of businesses to a pandas DataFrame
    df = pd.DataFrame(data['businesses'])
    print(len(df))     # number of rows
    print(df.columns)  # column names
    df.head()          # previews the first five rows
    # You could also write df.head(10) to preview 10 rows or df.tail() to see the bottom
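
`pd.DataFrame` keeps nested fields (for example, a coordinates dictionary inside each business) as dictionaries in a single column. `pd.json_normalize` flattens them into separate columns. A minimal sketch on a small hand-written record list; the records are made up and only loosely mimic a search result, not a copy of Yelp's schema.

```python
import pandas as pd

# Hypothetical records shaped like a search result: some fields are nested dicts.
businesses = [
    {'name': 'Taqueria Uno', 'rating': 4.5,
     'coordinates': {'latitude': 40.764, 'longitude': -73.923}},
    {'name': 'Casa Dos', 'rating': 4.0,
     'coordinates': {'latitude': 40.770, 'longitude': -73.918}},
]

nested = pd.DataFrame(businesses)
print(nested.columns.tolist())  # ['name', 'rating', 'coordinates']

# json_normalize expands nested dicts into dotted column names
flat = pd.json_normalize(businesses)
print(flat.columns.tolist())
# ['name', 'rating', 'coordinates.latitude', 'coordinates.longitude']
```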
    
#### Bonus: pagination to retrieve all results and convert to DF

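    # Use a function and a loop to retrieve all the results from the original request
    import time
    import requests
    import pandas as pd

    def yelp_call(url_params, api_key):
        """Make one search request and return the parsed JSON dictionary."""
        url = 'https://api.yelp.com/v3/businesses/search'
        headers = {'Authorization': 'Bearer {}'.format(api_key)}
        response = requests.get(url, headers=headers, params=url_params, timeout=10)
        response.raise_for_status()  # stop early on errors (bad key, rate limit, ...)
        return response.json()

    def all_results(url_params, api_key, max_results=1000):
        """Page through the search results with `offset` and return one DataFrame."""
        params = dict(url_params)  # copy so the caller's dict isn't modified
        limit = params.get('limit', 50)
        # First call: get the total number of matches for THIS search (and page one)
        params['offset'] = 0
        first = yelp_call(params, api_key)
        num = first['total']
        print('{} total matches found.'.format(num))
        dfs = [pd.DataFrame(first['businesses'])]
        cur = limit
        # Stop at the total number of matches or at the paging cap (1,000 here)
        while cur < num and cur < max_results:
            params['offset'] = cur
            dfs.append(pd.DataFrame(yelp_call(params, api_key)['businesses']))
            time.sleep(1)  # wait a second between calls
            cur += limit
        return pd.concat(dfs, ignore_index=True)

    term = 'pizza'
    location = 'Astoria NY'
    url_params = {'term': term,
                  'location': location,
                  'limit': 50}
    df = all_results(url_params, api_key)
    print(len(df))
    df.head()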
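
The paging logic itself is just arithmetic on `offset`. Pulled out into a pure function (a hypothetical helper, not part of any library), it is easy to check without calling the API. The 1,000-result cap mirrors the loop above and is worth confirming against the API's current documentation.

```python
def page_offsets(total, limit=50, max_results=1000):
    """Return the `offset` value for each page request.

    Pages start at 0 and step by `limit` until we reach either the total
    number of matches or the maximum number of results we may page through.
    """
    stop = min(total, max_results)
    return list(range(0, stop, limit))


print(page_offsets(120))        # [0, 50, 100]
print(len(page_offsets(5000)))  # 20 requests of 50 results each
print(page_offsets(5000)[-1])   # 950
```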

## Webscraping with Beautiful Soup

https://www.crummy.com/software/BeautifulSoup/bs4/doc/   

Beautiful Soup is a Python library designed for quick turnaround projects like screen-scraping. Three features make it powerful:

* Beautiful Soup provides a few simple methods and Pythonic idioms for navigating, searching, and modifying a parse tree: a toolkit for dissecting a document and extracting what you need. It doesn't take much code to write an application.

* Beautiful Soup automatically converts incoming documents to Unicode and outgoing documents to UTF-8. You don't have to think about encodings, unless the document doesn't specify an encoding and Beautiful Soup can't detect one. Then you just have to specify the original encoding.

* Beautiful Soup sits on top of popular Python parsers like lxml and html5lib, allowing you to try out different parsing strategies or trade speed for flexibility.

Beautiful Soup parses anything you give it and does the tree traversal for you. You can tell it "Find all the links", "Find all the links of class externalLink", "Find all the links whose URLs match 'foo.com'", or "Find the table heading that's got bold text, then give me that text."
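
The plain-English queries above map onto a few calls. A minimal sketch run on a small, made-up HTML string so no network access is needed; the HTML and link URLs are hypothetical.

```python
import re

from bs4 import BeautifulSoup

# A small hypothetical HTML document to query
html = """
<table>
  <tr><th><b>Artist</b></th><th>Albums</th></tr>
</table>
<a href="https://foo.com/page1" class="externalLink">Foo 1</a>
<a href="https://bar.com/page2" class="externalLink">Bar 2</a>
<a href="/internal">Home</a>
"""
soup = BeautifulSoup(html, 'html.parser')

# "Find all the links"
all_links = soup.find_all('a')

# "Find all the links of class externalLink"
external = soup.find_all('a', class_='externalLink')

# "Find all the links whose URLs match 'foo.com'"
foo_links = soup.find_all('a', href=re.compile(r'foo\.com'))

# "Find the table heading that's got bold text, then give me that text"
bold_heading = soup.find(lambda tag: tag.name == 'th' and tag.b is not None)

print(len(all_links), len(external), len(foo_links))  # 3 2 1
print(bold_heading.get_text())                        # Artist
```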

#### Packages

    from bs4 import BeautifulSoup
    import requests
    import re
    import pandas as pd

#### Collect song links from an artist's page

    def grab_song_links(artist_page_url):
        """Return a list of (title, href, album) tuples from an artist's page.

        Assumes each album is marked by a <div class="album"> and that the song
        links (<a> tags) for that album follow it as siblings, up to the next album div.
        """
        html_page = requests.get(artist_page_url, timeout=10)  # make a GET request to retrieve the page
        soup = BeautifulSoup(html_page.content, 'html.parser')  # pass the page contents to Beautiful Soup for parsing

        # The example from our lecture/reading
        data = []  # create a storage container

        # Get the album divs
        albums = soup.find_all('div', class_='album')
        for cur_album in albums:
            album = cur_album.text
            # Walk forward through the siblings after this album heading...
            for sibling in cur_album.find_next_siblings():
                # ...stopping at the next album div: later links belong to that album
                if sibling.name == 'div' and 'album' in sibling.get('class', []):
                    break
                if sibling.name == 'a':
                    title = sibling.text
                    page = sibling.get('href')
                    data.append((title, page, album))
        return data
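
On many sites the collected `href` values are relative paths rather than full URLs, so they need to be resolved against the artist page's URL before they can be requested. A minimal sketch with the standard library's `urljoin`; the URLs are hypothetical and the `../lyrics/...` link format is an assumption about the site.

```python
from urllib.parse import urljoin

# Hypothetical artist page and song links as they might appear in href attributes
artist_page_url = 'https://www.example.com/k/artist.html'
hrefs = ['../lyrics/artist/song-one.html', '/lyrics/artist/song-two.html',
         'https://www.example.com/lyrics/artist/song-three.html']

# urljoin resolves each href relative to the page it was found on;
# absolute URLs are returned unchanged
song_urls = [urljoin(artist_page_url, href) for href in hrefs]
for song_url in song_urls:
    print(song_url)
```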
        
#### Function to scrape the song text (lyrics)

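    def scrape_lyrics(song_page_url):
        """Return the lyrics text from a song page (tied to that site's page layout)."""
        html_page = requests.get(song_page_url, timeout=10)
        soup = BeautifulSoup(html_page.content, 'html.parser')
        # Drill down through the page's nested layout divs to the main text column
        main_page = soup.find('div', {'class': 'container main-page'})
        main_l2 = main_page.find('div', {'class': 'row'})
        main_l3 = main_l2.find('div', {'class': 'col-xs-12 col-lg-8 text-center'})
        # The lyrics sit in the 7th <div> of that column (index 6); this position is
        # specific to the site's layout and breaks if the layout changes
        lyrics = main_l3.find_all('div')[6].text
        return lyrics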
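
Lyrics are often split into lines with `<br>` tags, and `.text` simply concatenates the text pieces, so lines can run together. `get_text()` accepts a separator and a `strip` flag, which gives cleaner line breaks. A minimal sketch on a made-up snippet; whether a given site marks lines with `<br>` this way is an assumption.

```python
from bs4 import BeautifulSoup

# Hypothetical lyrics markup: one line of text per <br>
html = '<div>First line<br/>Second line<br/>Third line</div>'
div = BeautifulSoup(html, 'html.parser').div

print(repr(div.text))  # 'First lineSecond lineThird line'
# Join the text pieces with '\n' and strip whitespace around each piece
lyrics = div.get_text('\n', strip=True)
print(lyrics)
```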
