<h1>PYTHON'S REQUESTS LIBRARY (TUTORIAL)</h1>

requests library:
- standard for making HTTP requests in Python
- abstracts complexities of making requests behind simple API so can focus on interacting w/ services + consuming data in app

will learn how to:
- **make** requests using common HTTP methods
- **customize** requests' headers + data, using query string and message body
- **inspect** data from requests + responses
- make **authenticated** requests
- **configure** requests to help prevent app from backing up or slowing down

<h2>getting started with "requests"</h2>

In [1]:
# import requests library (install first w/: pip install requests)
import requests

<h2>The GET Request</h2>

HTTP methods such as GET and POST:
- determine which action you're trying to perform when making HTTP requests

**GET**: GET method indicates that you're trying to retrieve data from specified resource. To make GET requests, invoke "requests.get()"

In [2]:
# make GET request to github's Root Rest API
requests.get('https://api.github.com')

<Response [200]>

<h2>The RESPONSE</h2>
- object for inspecting results of request

In [3]:
# store return value in variable so can get closer look at attributes + behaviours
response = requests.get('https://api.github.com')

- in this example, captured return value of get() which is an instance of Response and stored in a variable called **response**

- can now use <code> response </code> to see lots of info about results of GET request

<h3>Status Codes</h3>

- first bit of info can gather from **response** is the **status code**
- status code informs you of status of request
    - 200 OK: request was successful
    - 404 NOT FOUND: resource looking for was not found

In [4]:
# access .status_code to see status code server has returned
response.status_code # 200 => request was successful and server responded w/ data requested

200

In [5]:
# sometimes want to use status code to make decisions in code
if response.status_code == 200:
    print('SUCCESS')
elif response.status_code == 404:
    print('FAILURE')
    
# if server returns 200 status code, program will print SUCCESS
# if server returns 404 status code, program will print FAILURE

SUCCESS


**requests** goes one step further in simplifying process for you:
- if use a **response** instance in conditional expression, it will evaluate to True if status code is below 400 (e.g. 200-399) and False otherwise

In [6]:
# simplify last example
if response:
    print('SUCCESS')
else:
    print('error has occurred')
    
# be wary: this does not verify that status code is 200. other status codes below 400
# (e.g. 204 NO CONTENT, 304 NOT MODIFIED) also give SUCCESS message b/c they provide some workable response

SUCCESS
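
- minimal offline sketch of truthiness rule described above. Assumption: `make_response` is a hypothetical helper that builds a bare `Response` by hand (real code gets its `Response` from `requests.get()`), used here only so no network is needed

```python
import requests

def make_response(status_code):
    """Hypothetical helper: build a bare Response offline w/ only status code set."""
    response = requests.Response()
    response.status_code = status_code
    return response

for code in (200, 204, 304, 404, 500):
    response = make_response(code)
    # bool(response) gives same answer as response.ok
    print(code, bool(response), response.ok)
```

- 204 and 304 count as success here even though neither is 200, so check `.status_code` directly when exact code matters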


In [7]:
from requests.exceptions import HTTPError

# if don't want to use if statement, can instead raise EXCEPTION when request is unsuccessful by calling
# .raise_for_status()

for url in ['https://api.github.com', 'https://api.github.com/invalid']:
    try:
        response = requests.get(url)

        # If the response was successful, no Exception will be raised
        response.raise_for_status()
    except HTTPError as http_err:
        print(f'HTTP error occurred: {http_err}')  
    except Exception as err:
        print(f'Other error occurred: {err}')  
    else:
        print('SUCCESS')
        
# if invoke .raise_for_status(), HTTPError will be raised for 4xx (client error) and 5xx (server error) status codes.
# if status code indicates successful request, program will proceed w/o Exception raised

SUCCESS
HTTP error occurred: 404 Client Error: Not Found for url: https://api.github.com/invalid
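
- sketch of what `.raise_for_status()` does for a few status codes, run offline. Assumptions: `check` is a hypothetical helper and the `Response` objects are filled in by hand for illustration, they do not come from a real server

```python
import requests
from requests.exceptions import HTTPError

def check(status_code, reason):
    """Hypothetical helper: return None on success, else the HTTPError raised."""
    response = requests.Response()      # built by hand, no network involved
    response.status_code = status_code
    response.reason = reason
    response.url = 'https://api.github.com/invalid'
    try:
        response.raise_for_status()
    except HTTPError as http_err:
        return http_err
    return None

print(check(200, 'OK'))                 # no exception for success code
err = check(404, 'Not Found')
print(err)                              # message names status code, reason and url
print(err.response.status_code)         # failed response travels w/ the exception
```

- b/c exception carries `.response`, error handler can still inspect status code, headers, or body of failed request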


<h3>CONTENT</h3>

- response of GET request often has valuable info, **payload**, in message body
- using attributes + methods of **response**, can view payload in variety of diff formats

In [8]:
# see response's content in bytes, use .content (gives access to raw bytes of response payload)
response = requests.get('https://api.github.com')
response.content

b'{\n  "current_user_url": "https://api.github.com/user",\n  "current_user_authorizations_html_url": "https://github.com/settings/connections/applications{/client_id}",\n  "authorizations_url": "https://api.github.com/authorizations",\n  "code_search_url": "https://api.github.com/search/code?q={query}{&page,per_page,sort,order}",\n  "commit_search_url": "https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}",\n  "emails_url": "https://api.github.com/user/emails",\n  "emojis_url": "https://api.github.com/emojis",\n  "events_url": "https://api.github.com/events",\n  "feeds_url": "https://api.github.com/feeds",\n  "followers_url": "https://api.github.com/user/followers",\n  "following_url": "https://api.github.com/user/following{/target}",\n  "gists_url": "https://api.github.com/gists{/gist_id}",\n  "hub_url": "https://api.github.com/hub",\n  "issue_search_url": "https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}",\n  "issues_url": "https://api.

In [9]:
# convert .content raw bytes to string using character encoding such as UTF-8 by using: .text
response.text

'{\n  "current_user_url": "https://api.github.com/user",\n  "current_user_authorizations_html_url": "https://github.com/settings/connections/applications{/client_id}",\n  "authorizations_url": "https://api.github.com/authorizations",\n  "code_search_url": "https://api.github.com/search/code?q={query}{&page,per_page,sort,order}",\n  "commit_search_url": "https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}",\n  "emails_url": "https://api.github.com/user/emails",\n  "emojis_url": "https://api.github.com/emojis",\n  "events_url": "https://api.github.com/events",\n  "feeds_url": "https://api.github.com/feeds",\n  "followers_url": "https://api.github.com/user/followers",\n  "following_url": "https://api.github.com/user/following{/target}",\n  "gists_url": "https://api.github.com/gists{/gist_id}",\n  "hub_url": "https://api.github.com/hub",\n  "issue_search_url": "https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}",\n  "issues_url": "https://api.g

In [10]:
# b/c decoding BYTES to STR requires encoding scheme, requests will try to guess encoding based on
# response's headers (if you do not specify one)

# can provide EXPLICIT ENCODING by setting .encoding b4 .text:
response.encoding = 'utf-8' # optional: requests infers this internally
response.text

'{\n  "current_user_url": "https://api.github.com/user",\n  "current_user_authorizations_html_url": "https://github.com/settings/connections/applications{/client_id}",\n  "authorizations_url": "https://api.github.com/authorizations",\n  "code_search_url": "https://api.github.com/search/code?q={query}{&page,per_page,sort,order}",\n  "commit_search_url": "https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}",\n  "emails_url": "https://api.github.com/user/emails",\n  "emojis_url": "https://api.github.com/emojis",\n  "events_url": "https://api.github.com/events",\n  "feeds_url": "https://api.github.com/feeds",\n  "followers_url": "https://api.github.com/user/followers",\n  "following_url": "https://api.github.com/user/following{/target}",\n  "gists_url": "https://api.github.com/gists{/gist_id}",\n  "hub_url": "https://api.github.com/hub",\n  "issue_search_url": "https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}",\n  "issues_url": "https://api.g

In [11]:
# look at response, see that it is serialized JSON content. to get dictionary, could take 
# STR retrieved from .text and deserialize it using json.loads(), simpler way to do this is
# use .json()

response.json() # TYPE of return value of .json() is dictionary, can access values in object by key

{'current_user_url': 'https://api.github.com/user',
 'current_user_authorizations_html_url': 'https://github.com/settings/connections/applications{/client_id}',
 'authorizations_url': 'https://api.github.com/authorizations',
 'code_search_url': 'https://api.github.com/search/code?q={query}{&page,per_page,sort,order}',
 'commit_search_url': 'https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}',
 'emails_url': 'https://api.github.com/user/emails',
 'emojis_url': 'https://api.github.com/emojis',
 'events_url': 'https://api.github.com/events',
 'feeds_url': 'https://api.github.com/feeds',
 'followers_url': 'https://api.github.com/user/followers',
 'following_url': 'https://api.github.com/user/following{/target}',
 'gists_url': 'https://api.github.com/gists{/gist_id}',
 'hub_url': 'https://api.github.com/hub',
 'issue_search_url': 'https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}',
 'issues_url': 'https://api.github.com/issues',
 'keys_url': '
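
- three views of payload (`.content`, `.text`, `.json()`) relate roughly as in this standard-library sketch. Assumption: `content` is a shortened stand-in for real payload, and decoding is hard-coded to UTF-8 rather than inferred from headers

```python
import json

# shortened stand-in for raw bytes that .content would return
content = (
    b'{"current_user_url": "https://api.github.com/user", '
    b'"emojis_url": "https://api.github.com/emojis"}'
)

text = content.decode('utf-8')   # roughly what .text does once encoding is known
data = json.loads(text)          # roughly what .json() does w/ decoded text

print(type(content).__name__, type(text).__name__, type(data).__name__)
print(data['emojis_url'])        # values are now reachable by key
```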

<h3>HEADERS</h3>

- response headers can give useful info (content type of response payload, time limit on how long to cache response, etc.)
- to view these headers, access .headers

In [12]:
# access .headers:
response.headers

{'server': 'GitHub.com', 'date': 'Tue, 21 Apr 2020 01:24:37 GMT', 'content-type': 'application/json; charset=utf-8', 'status': '200 OK', 'cache-control': 'public, max-age=60, s-maxage=60', 'vary': 'Accept, Accept-Encoding, Accept, X-Requested-With', 'etag': 'W/"c6bac8870a7f94b08b440c3d5873c9ca"', 'x-github-media-type': 'github.v3; format=json', 'access-control-expose-headers': 'ETag, Link, Location, Retry-After, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval, X-GitHub-Media-Type, Deprecation, Sunset', 'access-control-allow-origin': '*', 'strict-transport-security': 'max-age=31536000; includeSubdomains; preload', 'x-frame-options': 'deny', 'x-content-type-options': 'nosniff', 'x-xss-protection': '1; mode=block', 'referrer-policy': 'origin-when-cross-origin, strict-origin-when-cross-origin', 'content-security-policy': "default-src 'none'", 'content-encoding': 'gzip', 'X-Ratelimit-Limit': '60', 'X-Ratelim

In [13]:
# .headers return dictionary-like object, can access header values by key

# to see content type of response payload, access content-type:
response.headers['Content-Type']

'application/json; charset=utf-8'

In [14]:
# something special about this dictionary-like headers object: HTTP spec defines headers to be case-insensitive
# this means we are able to access these headers w/o worrying about capitalization:
response.headers['content-type']

'application/json; charset=utf-8'
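
- case-insensitive behaviour can be tried w/o a request by building a `CaseInsensitiveDict` from `requests.structures` directly (header value below is copied from output above; `X-Missing` is a made-up header name)

```python
from requests.structures import CaseInsensitiveDict

headers = CaseInsensitiveDict({'Content-Type': 'application/json; charset=utf-8'})

print(headers['content-type'])               # lookup ignores case
print(headers['CONTENT-TYPE'])
print('Content-type' in headers)             # membership test ignores case too
print(list(headers))                         # original casing of key is kept
print(headers.get('X-Missing', 'not set'))   # .get() works like on a dict
```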

<h2>QUERY STRING PARAMETERS</h2>

- one common way to customize a **GET** request is to pass values thru **query string params** in URL
- to do this using get(), pass data to `params`

In [15]:
# use github's search API to look for requests library
response = requests.get('https://api.github.com/search/repositories', params = {
    # passing dictionary to PARAMS parameter of .get() modifies results that come back from Search API
    'q': 'requests+language:python'
},)

# inspect some attributes of first repo in search results
json_response = response.json()
repo = json_response['items'][0]
print(f'Repository name: {repo["name"]}')  # Python 3.6+
print(f'Repository description: {repo["description"]}')  # Python 3.6+

Repository name: grequests
Repository description: Requests + Gevent = <3


In [16]:
# can pass params to get() in form of dictionary (as we have just done) or as list of TUPLES
requests.get(
    'https://api.github.com/search/repositories',
    params = [('q', 'requests+language:python')],)

<Response [200]>

In [17]:
# can pass values as bytes
requests.get(
    'https://api.github.com/search/repositories',
    params = b'q=requests+language:python',)

<Response [200]>

query strings are useful for parameterizing **GET** requests; can also customize requests by adding/modifying headers you send
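
- to see what each form of `params` does to final URL w/o sending anything, can prepare request offline. Sketch below assumes a hypothetical helper `prepared_url`; last call (query written w/ a space) is an extra case not in notes above

```python
import requests

url = 'https://api.github.com/search/repositories'

def prepared_url(params):
    """Hypothetical helper: prepare a GET request offline, return URL it would use."""
    return requests.Request('GET', url, params=params).prepare().url

print(prepared_url({'q': 'requests+language:python'}))    # dict: value gets percent-encoded
print(prepared_url([('q', 'requests+language:python')]))  # list of tuples: same result as dict
print(prepared_url(b'q=requests+language:python'))        # bytes: used as given
print(prepared_url({'q': 'requests language:python'}))    # space in value becomes '+'
```

- worth noticing: in dict and tuple forms a literal `+` is sent as `%2B`, while bytes form leaves `+` untouched, so the forms are not always interchangeable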

<h2>Request Headers</h2>

- to customize headers, pass a dict of HTTP headers to get() using HEADERS param
- e.g. can change previous search request to highlight matching search terms in results by specifying text-match media type in ACCEPT header:

In [18]:
# change previous search request to highlight matching search terms in results, specify text-match media type in ACCEPT header
response = requests.get(
    'https://api.github.com/search/repositories',
    params = {'q': 'requests+language:python'},
    headers = {'Accept': 'application/vnd.github.v3.text-match+json'},
)

# view new `text_matches` array which provides info about where search term appears w/i result
json_response = response.json()
repo = json_response['items'][0]
print(f'Text matches: {repo["text_matches"]}')

Text matches: [{'object_url': 'https://api.github.com/repositories/4290214', 'object_type': 'Repository', 'property': 'description', 'fragment': 'Requests + Gevent = <3', 'matches': [{'text': 'Requests', 'indices': [0, 8]}]}]


- `Accept` header tells server what content types your app can handle
- in this case, since expecting matching search terms to be highlighted, we are using header value: application/vnd.github.v3.text-match+json, which is a proprietary github `Accept` header where the content is a special JSON format
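
- one way `indices` in a text-match entry could be used to highlight search term. Sketch only: `highlight` is a hypothetical helper, entry is copied from output above, and it assumes matches are listed in order and do not overlap

```python
def highlight(text_match):
    """Hypothetical helper: wrap each matched term of a text-match entry in ** markers."""
    fragment = text_match['fragment']
    pieces = []
    cursor = 0
    for match in text_match['matches']:
        start, end = match['indices']                # [start, end) offsets into fragment
        pieces.append(fragment[cursor:start])        # text before match
        pieces.append(f'**{fragment[start:end]}**')  # matched term, marked
        cursor = end
    pieces.append(fragment[cursor:])                 # remaining text after last match
    return ''.join(pieces)

# entry copied from `text_matches` output above
text_match = {
    'property': 'description',
    'fragment': 'Requests + Gevent = <3',
    'matches': [{'text': 'Requests', 'indices': [0, 8]}],
}
print(highlight(text_match))  # **Requests** + Gevent = <3
```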

<h2>Other HTTP methods</h2>

- aside from `GET`, other HTTP methods include: `POST`, `PUT`, `DELETE`, `HEAD`, `PATCH`, and `OPTIONS`
- `requests` provides a method (w/ similar signature to `get()`) for each of these HTTP methods

In [19]:
requests.post('https://httpbin.org/post', data={'key':'value'})
requests.put('https://httpbin.org/put', data={'key':'value'})
requests.delete('https://httpbin.org/delete')
requests.head('https://httpbin.org/get')
requests.patch('https://httpbin.org/patch', data={'key':'value'})
requests.options('https://httpbin.org/get')

# each func makes a request to `httpbin` service using corresponding HTTP method

<Response [200]>

In [20]:
# can inspect response of each method in same way as before
response = requests.head('https://httpbin.org/get')
response.headers['Content-Type']


'application/json'

In [21]:
response = requests.delete('https://httpbin.org/delete')
json_response = response.json()
json_response['args']

{}

<h2>The Message Body</h2>

- according to HTTP specification, `POST, PUT, and PATCH` requests pass their data thru message body rather than thru params in query string
- using `requests`, we'll pass payload to corresponding func's `data` parameter

- `data` takes a dict, list of tuples, bytes, or file-like object
- you'll want to adapt data you send in body of your request to specific needs of service you're interacting with

In [22]:
# e.g. can send form data as a DICT if request's content type is `application/x-www-form-urlencoded`
requests.post('https://httpbin.org/post', data={'key':'value'})

<Response [200]>

In [23]:
# send same data as list of tuples
requests.post('https://httpbin.org/post', data=[('key', 'value')])

<Response [200]>

- if need to send JSON data, can use `json` param
- when you pass JSON data via `json`, `requests` will serialize data and add correct `Content-Type` header for you

In [24]:
# httpbin.org: great resource by Kenneth Reitz

# use to inspect basic POST request
response = requests.post('https://httpbin.org/post', json={'key':'value'})
json_response = response.json()
json_response['data']

'{"key": "value"}'

In [25]:
json_response['headers']['Content-Type']

'application/json'
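
- difference between `data` and `json` can also be seen w/o contacting httpbin, by preparing both requests offline and comparing body + `Content-Type` (sketch; variable names are illustrative)

```python
import requests

url = 'https://httpbin.org/post'
payload = {'key': 'value'}

# prepare (but do not send) same payload in two ways
form_request = requests.Request('POST', url, data=payload).prepare()
json_request = requests.Request('POST', url, json=payload).prepare()

for name, prepared in (('data', form_request), ('json', json_request)):
    # data= gives form-encoded body, json= gives serialized JSON body
    print(name, prepared.headers['Content-Type'], prepared.body)
```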

<h2>Inspecting your Request</h2>

- when making request, `requests` library prepares request b4 sending it to destination server
- request preparation: validating headers, serializing JSON content

In [26]:
# view `PreparedRequest` by accessing .request:
response = requests.post('https://httpbin.org/post', json={'key':'value'})
response.request.headers['Content-Type']

'application/json'

In [27]:
response.request.url

'https://httpbin.org/post'

In [28]:
response.request.body

b'{"key": "value"}'

- inspecting `PreparedRequest` gives access to info about request such as: payload, URL, headers, authentication, etc.

- made lots of diff kinds of requests: but they are all **unauthenticated** requests to public APIs
- many services may want you to authenticate in some way

<h2>Authentication</h2>

- authentication helps a service understand who you are
- typically, provide your credentials to server by passing data thru `Authorization` header or a custom header defined by service
- all request funcs you've seen to this pt provide a param called `auth`, which allows us to pass credentials

example of API that requires authentication is github's Authenticated User API:
- endpoint provides info about authenticated user's profile
- to make request to authenticated user API, can pass github user and password in tuple to get():

In [29]:
# make request to authenticated user API, pass user and password in tuple
from getpass import getpass
requests.get('https://api.github.com/user', auth=('username', getpass()))
# request succeeds only if credentials passed in to `auth` are valid
# (a non-2xx status code such as the 403 below means request was refused)

········


<Response [403]>

In [30]:
# if make this request w/ no credentials, status code will be 401 Unauthorized
requests.get('https://api.github.com/user')

<Response [401]>

- when pass user and password in tuple to `auth` param, `requests` is applying credentials using HTTP's "basic access authentication scheme" under the hood
- could make same request by passing explicit Basic authentication credentials using HTTPBasicAuth:

In [31]:
# make same request (user and password) using explicit Basic authentication credentials, HTTPBasicAuth:
from requests.auth import HTTPBasicAuth
from getpass import getpass
requests.get(
    'https://api.github.com/user',
    auth = HTTPBasicAuth('username', getpass()))

········


<Response [403]>
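
- what Basic scheme actually puts on the wire can be checked offline: `Authorization` header is just `user:password` in base64. Sketch below uses placeholder credentials (`user` / `pass` are made up) and prepares request w/o sending it

```python
import base64
import requests

# placeholder credentials, for illustration only
user, password = 'user', 'pass'

prepared = requests.Request(
    'GET', 'https://api.github.com/user', auth=(user, password)
).prepare()

# build same header by hand: "Basic " + base64("user:password")
expected = 'Basic ' + base64.b64encode(f'{user}:{password}'.encode()).decode()

print(prepared.headers['Authorization'])
print(prepared.headers['Authorization'] == expected)
```

- base64 is an encoding, not encryption, so Basic credentials are only protected when sent over HTTPS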

- don't need to be so explicit for **Basic authentication**
- may want to authenticate using other method: `requests` offers other methods of authentication out of the box such as: `HTTPDigestAuth` and `HTTPProxyAuth`
- can even supply own authentication mechanism

In [32]:
# first create subclass of AuthBase, then implement __call__()

In [33]:
import requests
from requests.auth import AuthBase

class TokenAuth(AuthBase): # custom TokenAuth mechanism receives a token, then includes that token
                            # in X-TokenAuth header of request
    """Implements a custom authentication scheme."""
    
    
    def __init__(self, token):
        self.token = token

    def __call__(self, r):
        """Attach an API token to a custom auth header."""
        r.headers['X-TokenAuth'] = f'{self.token}'  # Python 3.6+
        return r


requests.get('https://httpbin.org/get', auth=TokenAuth('12345abcde-token'))

<Response [200]>

- always use tried-and-true auth scheme like **Basic or OAuth** (unless need custom mechanism for some reason)
- security...? lets look at SSL certificates using `requests`

<h2>SSL Certificate Verification</h2>

- security is important whenever trying to send or receive sensitive data
- communicate w/ secure sites over HTTP by establishing encrypted connection using SSL
- verifying target server's SSL Certificate is CRITICAL
- `requests` does this by default, however, might be some cases where you want to change behaviour

In [34]:
# if want to disable SSL Certificate verification, pass False to VERIFY param of request func
requests.get('https://api.github.com', verify = False)

# will get warning from `requests` to help keep data safe



<Response [200]>

- note: `requests` uses package called `certifi` to provide Certificate Authorities. This lets `requests` know which authorities are trustworthy, so update `certifi` FREQUENTLY to keep connections as secure as possible
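
- instead of switching verification off, `verify` can also be given a path to a CA bundle file. Sketch below just points a session at bundle shipped w/ `certifi`; in practice the path might be an internal CA bundle (that use case is an assumption, not something from notes above)

```python
import os

import certifi
import requests

ca_bundle = certifi.where()   # path to CA bundle file shipped w/ certifi
print(ca_bundle, os.path.isfile(ca_bundle))

session = requests.Session()
session.verify = ca_bundle    # a path keeps verification ON, using that bundle
print(session.verify)
```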

<h2>Performance</h2>

- when using `requests` (especially in production application environment), important to consider performance implications
- features can keep your app running smoothly (timeout control, session, retry limits)

<h3>Timeouts</h3>

- when make inline request to external service, system needs to wait for response b4 moving on
- if app waits too long for response, requests to your service could back up, UX could suffer, background jobs could hang
- `requests` waits indefinitely on response by default, we should specify timeout duration to prevent bad things from happening
- to set request's timeout, use `timeout` param
- `timeout` can be integer or float representing # of seconds to wait on response b4 timing out

In [35]:
# set timeout duration to 1 second
requests.get('https://api.github.com', timeout = 1)

<Response [200]>

In [36]:
# request will timeout after 3.05 seconds
requests.get('https://api.github.com', timeout = 3.05)

<Response [200]>

- can also pass a tuple to `timeout` w/ first element being a **connect timeout** (the time it allows for client to establish connection to server), second element is a **read timeout** (time it will wait on response once client has established a connection)

In [37]:
# if request establishes connection w/i 2 seconds and receives data w/i 5 seconds of connection being established,
# response will be returned as it was before. if request times out: raises a Timeout exception
requests.get('https://api.github.com', timeout = (2, 5))

<Response [200]>

In [38]:
# catch Timeout exception + respond accordingly
import requests
from requests.exceptions import Timeout

try:
    response = requests.get('https://api.github.com', timeout = 1)
except Timeout:
    print('request timed out')
else:
    print('request did not time out')

request did not time out
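
- when passing a (connect, read) tuple, the two failure cases can be told apart: `ConnectTimeout` and `ReadTimeout` are both subclasses of `Timeout` in `requests.exceptions`. Sketch below is runnable offline b/c of two assumptions made for illustration: `fetch_status` takes an injectable `get` callable, and `slow_server` is a fake stand-in that always raises

```python
import requests
from requests.exceptions import ConnectTimeout, ReadTimeout

def fetch_status(url, get=requests.get):
    """Return status code, or a label saying which timeout was hit.

    `get` is injectable only so function can be tried w/o a network.
    """
    try:
        return get(url, timeout=(2, 5)).status_code
    except ConnectTimeout:
        return 'connect timeout'   # could not establish connection in 2 seconds
    except ReadTimeout:
        return 'read timeout'      # connected, but no data arrived w/i 5 seconds

def slow_server(url, timeout):
    """Hypothetical stand-in for requests.get that always times out on read."""
    raise ReadTimeout(f'no data from {url} within {timeout[1]} seconds')

print(fetch_status('https://api.github.com', get=slow_server))  # read timeout
```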


<h3>The Session Object</h3>

- been dealing w/ high-level `requests` APIs such as `get()` and `post()`
- these funcs are **abstractions** of what's going on when making requests
- they hide **implementations** of how connections are managed so don't have to worry

- underneath abstractions is `Session` class:
    - if need to fine-tune control over how requests are being made or improve performance of requests, may need to use `Session` instance directly
- Sessions used to persist params across requests

In [39]:
# use same authentication across multiple requests, use session:
import requests
from getpass import getpass

# by using context manager, can ensure resources used by session will be released after use
with requests.Session() as session:
    session.auth = ('username', getpass())
    
    # instead of requests.get(), use session.get()
    response = session.get('https://api.github.com/user')
    
# can inspect response just like you did before
print(response.headers)
print(response.json())

········
{'Server': 'GitHub.com', 'Date': 'Tue, 21 Apr 2020 01:25:09 GMT', 'Content-Type': 'application/json; charset=utf-8', 'Transfer-Encoding': 'chunked', 'Status': '403 Forbidden', 'X-GitHub-Media-Type': 'github.v3; format=json', 'X-RateLimit-Limit': '60', 'X-RateLimit-Remaining': '52', 'X-RateLimit-Reset': '1587433315', 'Access-Control-Expose-Headers': 'ETag, Link, Location, Retry-After, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval, X-GitHub-Media-Type, Deprecation, Sunset', 'Access-Control-Allow-Origin': '*', 'Strict-Transport-Security': 'max-age=31536000; includeSubdomains; preload', 'X-Frame-Options': 'deny', 'X-Content-Type-Options': 'nosniff', 'X-XSS-Protection': '1; mode=block', 'Referrer-Policy': 'origin-when-cross-origin, strict-origin-when-cross-origin', 'Content-Security-Policy': "default-src 'none'", 'Vary': 'Accept-Encoding, Accept, X-Requested-With', 'Content-Encoding': 'gzip', 'X-G

- each time make request w/ `session`, once it has been initialized w/ authentication credentials, credentials will be persisted

- primary performance optimization of sessions comes in the form of **persistent connections**
    - when app makes connection to server using `Session`, keeps connection around in connection pool
    - when app wants to connect to same server again, will reuse connection from pool rather than establishing new one
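
- persistence of session settings can be observed w/o sending anything, by asking session to prepare several requests. Sketch only: credentials are placeholders, and second URL + `Accept` value are just example inputs

```python
import requests

urls = ('https://api.github.com/user', 'https://api.github.com/user/emails')

with requests.Session() as session:
    session.auth = ('username', 'placeholder-password')   # placeholder credentials
    session.headers.update({'Accept': 'application/vnd.github.v3+json'})

    # prepare (but do not send) one request per URL
    prepared = [
        session.prepare_request(requests.Request('GET', url)) for url in urls
    ]

for request in prepared:
    # same Accept header and an Authorization header show up on every request
    print(request.url, request.headers['Accept'], 'Authorization' in request.headers)
```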
    
<h3>Max Retries</h3>

- when request fails, may want app to retry same request
- `requests` does not do this by default
- to apply this functionality, implement custom **Transport Adapter**
    - Transport Adapter defines set of configs per service you're interacting with
    - e.g. want all requests to `https://api.github.com` to retry 3 times b4 raising `ConnectionError`
    - would want to build a Transport Adapter, set its `max_retries` param and mount it to existing `Session`:

In [40]:
import requests
from requests.adapters import HTTPAdapter
from requests.exceptions import ConnectionError

github_adapter = HTTPAdapter(max_retries = 3)

session = requests.Session()

# use github_adapter for all requests to endpoints that start w/ this URL
session.mount('https://api.github.com', github_adapter) # mount HTTPAdapter (github_adapter) to session
                                                        # session will adhere to its config for each request to api.github.com

try:
    session.get('https://api.github.com')
except ConnectionError as ce:
    print(ce)
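
- `max_retries` also accepts a `Retry` object from `urllib3` when more control is wanted (e.g. backoff between attempts, retrying on certain status codes). Sketch under assumptions: the specific numbers and status codes below are illustrative choices, not recommendations from notes above

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

retry_policy = Retry(
    total=3,                            # at most 3 retries per request
    backoff_factor=0.5,                 # growing pause between attempts
    status_forcelist=[502, 503, 504],   # also retry on these response codes
)

session = requests.Session()
session.mount('https://api.github.com', HTTPAdapter(max_retries=retry_policy))

# confirm which adapter session would pick for a github API URL
adapter = session.get_adapter('https://api.github.com/user')
print(adapter.max_retries.total)
```

- URLs that do not start w/ mounted prefix keep using session's default adapter, so retry policy stays scoped to one service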

- Timeouts, Transport Adapters, and sessions help keep code efficient + app resilient