# requests

Extensive documentation can be found at the [Requests Home Page](http://docs.python-requests.org/en/master/) or [GitHub repository](https://github.com/requests/requests).

In [1]:
import requests

In [2]:
import os

## Submitting a GET Request

### A Simple GET Request

Let's start by retrieving the Home Page of [Private Property](https://www.privateproperty.co.za/).

In [3]:
r = requests.get("https://www.privateproperty.co.za/")

Check if the request was successful.

In [4]:
r.status_code

200

Look at the headers in the response.

In [5]:
r.headers

{'Cache-Control': 'private', 'Content-Type': 'text/html; charset=utf-8', 'Content-Encoding': 'gzip', 'Vary': 'Accept-Encoding', 'Server': 'Microsoft-IIS/10.0', 'X-AspNetMvc-Version': '5.2', 'X-AspNet-Version': '4.0.30319', 'Set-Cookie': 'live-za.phoenix.dv=I1pVVz8Y4kmj76XoArDsFA; expires=Sun, 04-Oct-2116 13:29:43 GMT; path=/', 'X-Powered-By': 'ASP.NET', 'Date': 'Wed, 04 Oct 2017 13:29:43 GMT', 'Content-Length': '20598'}

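The web is a wonderful place, filled with all sorts of exotic documents, so the encoding of a response is not always obvious. `requests` makes an educated guess based on the HTTP headers: it uses the `charset` given in the `Content-Type` header and, if a text response does not declare one, falls back to ISO-8859-1. An encoding declared inside the HTML itself is not consulted when this attribute is set.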

In [6]:
r.encoding

'utf-8'

If `requests` gets the encoding wrong, you can assign the correct value to this attribute and `r.text` will be decoded with the new value.
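As a rough illustration of how the encoding is derived, the sketch below calls `get_encoding_from_headers` from `requests.utils` on two hand-written header dictionaries (the header values are made up for the example, not taken from a real response) and then shows what a wrong guess does to the decoded text.

```python
from requests.utils import get_encoding_from_headers

# Hand-written headers for illustration. The key is lower case because a
# plain dict (unlike r.headers) is case sensitive.
declared = get_encoding_from_headers({"content-type": "text/html; charset=utf-8"})
undeclared = get_encoding_from_headers({"content-type": "text/html"})

print(declared)    # charset taken from the header
print(undeclared)  # fallback used for text content without a charset

# Decoding UTF-8 bytes with the fallback encoding garbles accented characters.
raw = "caf\u00e9".encode("utf-8")
wrong = raw.decode("ISO-8859-1")
right = raw.decode("utf-8")
print(ascii(wrong), ascii(right))
```

On a real response the equivalent fix would be an assignment such as `r.encoding = 'utf-8'`.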

### A GET Request with Parameters

Let's submit a search request on Private Property.

The search form has a number of fields, some of which are hidden. We can find out about those (and check the default values) by exploring the page contents.

In [7]:
params = {
    'locationPhrase' : 'Glenwood, Durban',
    'listingType' : 'Sales',
}

In [8]:
r = requests.get("https://www.privateproperty.co.za/Portal/Search/SearchBoxSearch", params=params)
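To see how the `params` dictionary ends up in the URL, a request can be prepared without being sent. This is only a sketch: it reuses the URL and parameters from the cells above and decodes the query string again with the standard library to confirm that nothing was lost.

```python
import requests
from urllib.parse import urlsplit, parse_qs

params = {
    'locationPhrase': 'Glenwood, Durban',
    'listingType': 'Sales',
}

# Build and prepare the request; nothing is sent over the network.
prepared = requests.Request(
    'GET',
    'https://www.privateproperty.co.za/Portal/Search/SearchBoxSearch',
    params=params,
).prepare()

# The parameters have been URL-encoded into the query string.
print(prepared.url)

# Decoding the query string recovers the original values.
decoded = parse_qs(urlsplit(prepared.url).query)
print(decoded)
```

After a request has actually been sent, the final URL is available as `r.url`.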

Let's take a look at the response.

In [9]:
print(r.text)


<!DOCTYPE html>
<html>
<head>
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta property="fb:app_id" content="171012896285253" />

    
    <meta name="msapplication-config" content="/browserconfig.xml">

    
        <meta name="description" content="Private Property has 410 houses, apartments, complexes, farms, land for sale in Glenwood. View photos, virtual tours and detailed property descriptions." />
    
    
    <meta name="og:image" content="https://prppublicstore.blob.core.windows.net/live-za-images/socialmediasharing/sharelogo.jpg" />

            <link rel="next" href="https://www.privateproperty.co.za/for-sale/kwazulu-natal/durban/durban-central-and-cbd/glenwood/592?page=2" />



    <title>
        Property and houses for sale in Glenwood, Durban Central and CBD | Private Property
    </title>
    <link href="/bundles/site.9f8c585f586a1d26313b.css" rel="stylesheet"/>

    
    
    <link href="/bundles/search.5064f8c17d5de94487

Obviously we need some other tools to make sense of that!

### Requesting Binary

You can also access the binary content of the response directly. This is useful, for example, if the document retrieved is an image.

In [10]:
r = requests.get("https://upload.wikimedia.org/wikipedia/commons/thumb/0/0a/Python.svg/500px-Python.svg.png")
r.content

b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x01\xf4\x00\x00\x01\xf4\x08\x06\x00\x00\x00\xcb\xd6\xdf\x8a\x00\x00\x00\x06bKGD\x00\xff\x00\xff\x00\xff\xa0\xbd\xa7\x93\x00\x00 \x00IDATx\x9c\xec\xddy\x98\\Wy\'\xfe\xef\xb9\xb7\xf6\xa5\xf7]-\xa9\xd5\x92\xba\xb5Y\x96,/\x80%[\x06\x1b\x82\x8dlH"\x0c\x0c$\x84I\x1c6\x1b\x1b3I\x9e\x87L\xa6\t\x99\x19\x9e\x99$@\x0c\xc9$\xbfd \xdb<\x19\x08a1C\x02rb\x07\x0c\xd8\x18\xef\x9b\xf6]\xbdwu\xd7^\xb7\xee\xbd\xe7\xfc\xfe\xa8V\xab%\xf5RU\xe7\xdc\xba\xb7\xbb\xde\x8f(\xb7Pw\xbd\xf7TuU\xbd\xf7\x9c{\xce{\x00B\x08!\x84\x10B\x08!\x84\x10B\x08!\x84\x10B\x08!\x84\x10B\x08!\x84\x10B\x08!\x84\x10B\x08!\x84\x10B\x08!\x84\x10B\x08!\x84\x10B\x08!\x84\x10B\x08!\x8a1\xb7\x1b@\x08Y\xc0\xc1\x83\xfa\xf6l_;gF\x1b\xd7}\xed\x80\xdd&\x18\xc2Lh\x11\x08\xde\x08\xc6\xa2`\x08\x0b\x81\x86\xe5B1`\x06\x8c\t\xc6\x91\x12\xe0\xb6\xd0XF\xe3H@\x13\t\xce\x91\xe0:O\x84M+\xf1\xf2\xff\xfb\xb3\xe9Z<4B\x883(\xa1\x13Rc\xdb\x0f\x0e\x05`$\xd7\x99\x0c}\x8c\xf3\xf5`Z\x1f\x98X\xcf\x04\xd6\x0b\x81v0\xb4\x0

This can immediately be transformed into an image object.

In [11]:
from PIL import Image
from io import BytesIO

img = Image.open(BytesIO(r.content))
img.show()
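Before handing bytes to an image library it can be worth checking that they really are an image, since a failed request may return an HTML error page instead. The sketch below is one possible check, not part of `requests`: the helper names `looks_like_png` and `save_binary` are hypothetical, and `content` is a short stand-in for `r.content` that begins with the same eight-byte PNG signature shown in the output above.

```python
import os
import tempfile

# Every PNG file starts with this fixed eight-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def looks_like_png(content):
    """Return True if the bytes start with the PNG signature."""
    return content[:8] == PNG_SIGNATURE


def save_binary(content, path):
    """Write raw bytes to disk and return the number of bytes written."""
    with open(path, "wb") as handle:
        handle.write(content)
    return os.path.getsize(path)


# Stand-in for r.content (an assumption for this example).
content = PNG_SIGNATURE + b"\x00\x00\x00\rIHDR"

print(looks_like_png(content))
print(looks_like_png(b"<!DOCTYPE html>"))

# Save the bytes to a temporary directory.
target = os.path.join(tempfile.mkdtemp(), "python-logo.png")
written = save_binary(content, target)
print(written)
```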

### Requesting JSON

The `json()` method will perform JSON decoding.

In [12]:
r = requests.get('https://api.github.com/events')
r.text



That's a big JSON document. It'd be easier if it were translated into a Python data type. That's the job of the `json()` method.

In [13]:
events = r.json()
type(events)

list

In this case the response has been decoded into a list of dictionaries.

In [14]:
events

[{'actor': {'avatar_url': 'https://avatars.githubusercontent.com/u/8268115?',
   'display_login': 'west0r',
   'gravatar_id': '',
   'id': 8268115,
   'login': 'west0r',
   'url': 'https://api.github.com/users/west0r'},
  'created_at': '2017-10-04T13:35:23Z',
  'id': '6672943718',
  'org': {'avatar_url': 'https://avatars.githubusercontent.com/u/13049122?',
   'gravatar_id': '',
   'id': 13049122,
   'login': 'avito-tech',
   'url': 'https://api.github.com/orgs/avito-tech'},
  'payload': {'before': '3151f971afad4262c2c861b2fc3d8a5c633f0b79',
   'commits': [{'author': {'email': 'aakudryavtsev@avito.ru',
      'name': 'Aleksey Kudryavtsev'},
     'distinct': True,
     'sha': 'df434e2f5deb3504d177355403e50eecafd747ef',
     'url': 'https://api.github.com/repos/avito-tech/Paparazzo/commits/df434e2f5deb3504d177355403e50eecafd747ef'},
    {'author': {'email': 'aakudryavtsev@avito.ru',
      'name': 'Alexey Kudryavtsev'},
     'distinct': True,
     'message': 'Add CI badge',
     'sha': 'dd9

In [15]:
events[0]['id']

'6672943718'
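Once the response is a list of dictionaries, ordinary Python tools apply. The sketch below uses `json.loads` on a small stand-in for `r.text`: the first record reuses values from the output above, the second record is invented, and the example assumes that not every event carries an `org` entry.

```python
import json

# Stand-in for r.text: the first record reuses values shown above, the
# second is invented for the example.
sample_text = """
[
  {"id": "6672943718", "created_at": "2017-10-04T13:35:23Z",
   "actor": {"login": "west0r"}, "org": {"login": "avito-tech"}},
  {"id": "0000000001", "created_at": "2017-10-04T13:35:24Z",
   "actor": {"login": "example-user"}}
]
"""
events = json.loads(sample_text)

# One (id, actor, organisation) tuple per event; a missing 'org' gives None.
summary = [
    (event["id"], event["actor"]["login"], event.get("org", {}).get("login"))
    for event in events
]
for row in summary:
    print(row)
```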

## Submitting a POST Request

We're going to subscribe John Smith to [One Day Only](https://www.onedayonly.co.za/). If you are not yet a subscriber, feel free to substitute your details below. You can always cancel the subscription later.

First we'll submit the data to [HTTPBIN](http://httpbin.org/).

In [16]:
payload = {
    'firstname': 'John',
    'lastname': 'Smith',
    'email': 'info@energetix.co.za'
}
r = requests.post("http://httpbin.org/post", data=payload)

In [17]:
print(r.text)

{
  "args": {}, 
  "data": "", 
  "files": {}, 
  "form": {
    "email": "info@energetix.co.za", 
    "firstname": "John", 
    "lastname": "Smith"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Connection": "close", 
    "Content-Length": "58", 
    "Content-Type": "application/x-www-form-urlencoded", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.18.1"
  }, 
  "json": null, 
  "origin": "196.22.242.26", 
  "url": "http://httpbin.org/post"
}
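The echoed request shows that `data=` sends the payload as a URL-encoded form. As a sketch of the difference between `data=` and `json=`, the two requests below are prepared but never sent, so the body and headers can be inspected locally.

```python
import json
import requests

payload = {
    'firstname': 'John',
    'lastname': 'Smith',
    'email': 'info@energetix.co.za'
}

# data= encodes the dictionary as an HTML form.
form = requests.Request('POST', 'http://httpbin.org/post', data=payload).prepare()
print(form.headers['Content-Type'])
print(form.headers['Content-Length'])
print(form.body)

# json= serialises the dictionary as a JSON document instead.
as_json = requests.Request('POST', 'http://httpbin.org/post', json=payload).prepare()
print(as_json.headers['Content-Type'])
print(json.loads(as_json.body))
```

Which of the two a site expects depends on the endpoint; an HTML form like the one used here normally expects `data=`.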



Now we'll actually submit the form.

In [18]:
r = requests.post("https://www.onedayonly.co.za/subscribe/campaign/confirm/", data=payload)

There's a voucher code included in the response. It'd be cool to retrieve that. Maybe we'll give it a go later.

### Checking for Success

Check if we were successful.

In [19]:
r

<Response [200]>

In [20]:
r.status_code

200

In [21]:
requests.codes

<lookup 'status_codes'>

In [22]:
r.status_code == requests.codes.ok

True
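The `requests.codes` lookup maps readable names to numeric status codes, which tends to make comparisons easier to read than bare numbers. A small sketch with a handful of names:

```python
import requests

# Look up a few status codes by name.
names = ("ok", "created", "not_found", "too_many_requests")
named_codes = {name: getattr(requests.codes, name) for name in names}
print(named_codes)

# Dictionary-style access works as well.
print(requests.codes["not_found"])
```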

### Dealing with Failure

Not every request will be successful. We need to gracefully deal with failure.

The `raise_for_status()` method raises an `HTTPError` if the response has a 4xx or 5xx status code. For a successful response it does nothing.

In [23]:
r.raise_for_status()

Let's illustrate for a 404 error.

In [24]:
status_404 = requests.get('http://httpbin.org/status/404')
status_404.status_code

404

Unless you actually checked the status code you'd be blissfully unaware of the problem.

In [25]:
try:
    status_404.raise_for_status()
except requests.HTTPError as error:
    print('Something went wrong! [%s.]' % str(error))

Something went wrong! [404 Client Error: NOT FOUND for url: http://httpbin.org/status/404.]
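The exception also carries the response that triggered it, so the handler can inspect the status code. The sketch below builds `Response` objects by hand purely so that it runs without a network connection; `fake_response` and `check` are hypothetical helpers, and in real code the response would come from `requests.get()`.

```python
import requests


def fake_response(status_code, reason, url):
    """Build a Response by hand (for illustration only)."""
    response = requests.Response()
    response.status_code = status_code
    response.reason = reason
    response.url = url
    return response


def check(response):
    """Return a short description of whether the request succeeded."""
    try:
        response.raise_for_status()
    except requests.HTTPError as error:
        # error.response is the Response that caused the exception.
        return 'failed with status %d' % error.response.status_code
    return 'ok'


good = check(fake_response(200, 'OK', 'http://httpbin.org/status/200'))
bad = check(fake_response(404, 'NOT FOUND', 'http://httpbin.org/status/404'))
print(good)
print(bad)
```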


### Checking Response Headers

The response headers can be accessed as a dictionary.

In [26]:
r.headers

{'Server': 'nginx', 'Date': 'Wed, 04 Oct 2017 13:38:20 GMT', 'Content-Type': 'text/html', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Set-Cookie': 'PHPSESSID=bm5se37fith55lh4uk1aqm5v23; expires=Wed, 11-Oct-2017 13:38:18 GMT; path=/', 'Expires': 'Thu, 19 Nov 1981 08:52:00 GMT', 'Cache-Control': 'no-store, no-cache, must-revalidate, post-check=0, pre-check=0', 'Pragma': 'no-cache', 'Vary': 'Accept-Encoding', 'Content-Encoding': 'gzip'}

In [27]:
r.headers['Content-Encoding']

'gzip'

But it's a special dictionary where the keys are case-insensitive (in order to comply with [RFC 7230 Section 3.2](https://tools.ietf.org/html/rfc7230)).

In [28]:
r.headers.get('content-encoding')

'gzip'
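The class behind this behaviour is `CaseInsensitiveDict` from `requests.structures`. The sketch below builds one from two of the headers shown above to demonstrate that lookups ignore case while the original spelling of the keys is kept.

```python
from requests.structures import CaseInsensitiveDict

headers = CaseInsensitiveDict({
    'Content-Encoding': 'gzip',
    'Content-Type': 'text/html',
})

# All three spellings find the same entry.
lookups = [
    headers['Content-Encoding'],
    headers['content-encoding'],
    headers.get('CONTENT-ENCODING'),
]
print(lookups)

# Iterating returns the keys with their original casing.
original_keys = list(headers.keys())
print(original_keys)
```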

## Using Custom Headers

Sometimes we'd like to manipulate the request headers.

Let's see what the default User Agent string is.

In [29]:
r = requests.get('http://www.whatsmyua.info/api/v1/ua')
user_agent = r.json()[0]

In [30]:
user_agent

{'meta': {'name': 'useragent',
  'repo': 'https://github.com/3rd-Eden/useragent',
  'version': '2.1.9'},
 'os': {'family': 'Other',
  'major': '0',
  'minor': '0',
  'patch': '0',
  'string': {'family': 'Other', 'major': '0', 'minor': '0', 'patch': '0'}},
 'ua': {'device': {'family': 'Other',
   'major': '0',
   'minor': '0',
   'patch': '0'},
  'family': 'Python Requests',
  'major': '2',
  'minor': '18',
  'patch': '0',
  'rawUa': 'python-requests/2.18.1'}}

In [31]:
user_agent['ua']['rawUa']

'python-requests/2.18.1'

Looks like it's `python-requests`. That's kind of a giveaway that we're scraping. Perhaps we'd like to look more human.

In [32]:
headers = {
    'User-Agent': 'Lynx/2.8.6rel.5 libwww-FM/2.14'
}

In [33]:
requests.get('http://www.whatsmyua.info/api/v1/ua', headers=headers).json()

[{'meta': {'name': 'useragent',
   'repo': 'https://github.com/3rd-Eden/useragent',
   'version': '2.1.9'},
  'os': {'family': 'Other',
   'major': '0',
   'minor': '0',
   'patch': '0',
   'string': {'family': 'Other', 'major': '0', 'minor': '0', 'patch': '0'}},
  'ua': {'device': {'family': 'Other',
    'major': '0',
    'minor': '0',
    'patch': '0'},
   'family': 'Lynx',
   'major': '2',
   'minor': '8',
   'patch': '6',
   'rawUa': 'Lynx/2.8.6rel.5 libwww-FM/2.14'}},
 {'meta': {'name': 'ua-parser-js',
   'repo': 'https://github.com/faisalman/ua-parser-js',
   'version': '0.7.11'}},
 {'device': {'description': 'Lynx/2.8.6rel.5 libwww-FM/2.14',
   'manufacturer': None,
   'product': None},
  'meta': {'name': 'platform.js',
   'repo': 'https://github.com/bestiejs/platform.js/',
   'version': '1.3.3'},
  'os': {'os': {'architecture': None, 'family': None, 'version': None}},
  'ua': {'layout': None, 'name': None, 'version': None}}]

Use [this site](http://www.whoishostingthis.com/tools/user-agent/) to find the User Agent string for your browser. It's also worth looking at this [catalog](http://www.useragentstring.com/pages/useragentstring.php) of User Agent strings.

## Cookies

Many web sites store information locally as cookies. If you're interested in looking at cookies in your browser, check out the [EditThisCookie](http://www.editthiscookie.com/) extension.

In [34]:
r = requests.get("http://www.wikipedia.org")
r.cookies.keys()

['GeoIP']

Cookies are stored in a (mutable) `RequestsCookieJar` object.

In [35]:
type(r.cookies)

requests.cookies.RequestsCookieJar

This is essentially a dictionary.

In [36]:
r.cookies.get_dict()

{'GeoIP': 'ZA:WC:Cape_Town:-33.93:18.42:v4'}

In [37]:
r.cookies['GeoIP']

'ZA:WC:Cape_Town:-33.93:18.42:v4'

You can build your own cookies and submit them along with a request.

The `Domain` and `Path` of a cookie specify where it is applied. You can find out more in [RFC 6265](https://tools.ietf.org/html/rfc6265), which obsoletes the older RFC 2965.

In [38]:
jar = requests.cookies.RequestsCookieJar()

jar.set('foo', '1', domain='httpbin.org', path='/cookies')
jar.set('bar', '5', domain='httpbin.org', path='/elsewhere')

r = requests.get('http://httpbin.org/cookies', cookies=jar)
#
# Note that only the cookies for the 'cookies' path are returned.
#
r.text

'{\n  "cookies": {\n    "foo": "1"\n  }\n}\n'
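The jar itself still holds both cookies; only the one matching the request path is sent. As a sketch, the same jar can be inspected locally by iterating over it and by filtering with `get_dict()`:

```python
import requests

jar = requests.cookies.RequestsCookieJar()
jar.set('foo', '1', domain='httpbin.org', path='/cookies')
jar.set('bar', '5', domain='httpbin.org', path='/elsewhere')

# Each entry is a cookie object with name, value, domain and path.
for cookie in jar:
    print(cookie.name, cookie.value, cookie.domain, cookie.path)

# Restrict the dictionary view to a particular domain and path.
cookies_for_path = jar.get_dict(domain='httpbin.org', path='/cookies')
print(cookies_for_path)
```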

However, these cookies are not persisted across multiple requests: each call to `requests.get()` starts with an empty cookie jar.

In [39]:
r = requests.get('http://httpbin.org/cookies/set/foo/1')
r.text

'{\n  "cookies": {\n    "foo": "1"\n  }\n}\n'

In [40]:
requests.get('http://httpbin.org/cookies').text

'{\n  "cookies": {}\n}\n'

To get that working you need to use [sessions](http://docs.python-requests.org/en/master/user/advanced/#session-objects).

## Sessions

Sessions allow you to persist information between requests. Specifically:

- share cookies across requests and
- reuse the same TCP connection.

We'll perform the same requests as above but this time using a `Session` object.

In [41]:
s = requests.Session()

In [42]:
s.get('http://httpbin.org/cookies/set/foo/1')

<Response [200]>

In [43]:
r = s.get('http://httpbin.org/cookies')
r.text

'{\n  "cookies": {\n    "foo": "1"\n  }\n}\n'

The `Session` object has a number of useful properties. For example:

- `cookies`
- `headers`
- `auth`
- `proxies`
- `max_redirects`.

Let's take a look at `headers`.

In [44]:
s.headers

{'User-Agent': 'python-requests/2.18.1', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}

If we update this then all subsequent requests in the session will use the updated headers.

In [45]:
s.headers.update({
    'user-agent': 'Lynx/2.8.6rel.5 libwww-FM/2.14'
})

In [46]:
s.headers

{'user-agent': 'Lynx/2.8.6rel.5 libwww-FM/2.14', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
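To check what a session would actually send, a request can be prepared through the session without being dispatched. This is a sketch only: the cookie and the per-request `Accept` header are made up for the example, and `prepare_request()` merges them with the session's own settings.

```python
import requests

session = requests.Session()
session.headers.update({'User-Agent': 'Lynx/2.8.6rel.5 libwww-FM/2.14'})

# A cookie stored on the session (value invented for the example).
session.cookies.set('foo', '1', domain='httpbin.org', path='/')

# Headers given on the request override the session defaults.
request = requests.Request(
    'GET',
    'http://httpbin.org/cookies',
    headers={'Accept': 'application/json'},
)
prepared = session.prepare_request(request)

print(prepared.headers['User-Agent'])
print(prepared.headers['Accept'])
print(prepared.headers.get('Cookie'))
```

The prepared request could then be dispatched with `session.send(prepared)`.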

A `Session` object can also be used as a context manager.

In [47]:
with requests.Session() as s:
    s.get('http://httpbin.org/get')

## Robust Requests

In [48]:
try:
    # Good practice to apply a timeout, otherwise this could hang indefinitely.
    r = requests.get('http://github.com', timeout=1)
    # Check status.
    r.raise_for_status()
except requests.HTTPError:
    # 404 error, 500 error.
    pass
except requests.Timeout:
    # Request took too long. Caught before ConnectionError because
    # ConnectTimeout is a subclass of both.
    pass
except requests.ConnectionError:
    # DNS failure, connection refused.
    pass
except requests.TooManyRedirects:
    # Exceeded the maximum number of redirects.
    pass
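Catching exceptions handles a single failed attempt. For transient failures it is common to retry as well. The sketch below is one way to do that, assuming the `Retry` class from `urllib3` (which `requests` depends on); the helper names `make_session` and `fetch`, the retry count, the backoff factor and the list of status codes are all illustrative choices rather than recommendations.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(total_retries=3, backoff_factor=0.5):
    """Return a Session that retries failed requests with a backoff."""
    retry = Retry(
        total=total_retries,
        backoff_factor=backoff_factor,
        # Also retry when the server answers with one of these codes.
        status_forcelist=[500, 502, 503, 504],
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    return session


def fetch(session, url, timeout=5):
    """Return the response, or None if the request ultimately failed."""
    try:
        response = session.get(url, timeout=timeout)
        response.raise_for_status()
        return response
    except requests.RequestException as error:
        # RequestException is the base class of the exceptions above.
        print('Request failed: %s' % error)
        return None


session = make_session()
retries = session.get_adapter('https://github.com').max_retries
print(retries.total)
```

A call such as `fetch(session, 'http://github.com')` would then apply both the retries and the timeout.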