In this notebook, we'll take a quick look at how `POST` requests can be handled using Requests. `POST` requests are typically used to submit entered form contents to a web server.

Submitting `POST` requests with Requests is very simple: we replace `.get(...)` with `.post(...)` and use the `data` argument to specify our form data. Requests will take care of encoding the data correctly for us. Note that you can still use `params` as well to specify URL parameters.
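
To get a feel for what "encoding the data" means, here is a minimal sketch that uses the standard library's `urlencode` rather than Requests itself. It approximates what ends up in the request body when a plain dict is passed as `data`. The field values and the `example.com` URL are made up for illustration.

```python
from urllib.parse import urlencode

# Hypothetical form fields, used only to illustrate the encoding
formdata = {'name': 'Seppe Smith', 'comments': 'pizza & fries'}

# Form data is sent in the request body as application/x-www-form-urlencoded:
# spaces become '+', and special characters such as '&' are percent-encoded
body = urlencode(formdata)
print(body)

# URL parameters are encoded the same way, but end up in the URL itself
query = urlencode({'page': 2})
print('http://www.example.com/form?' + query)
```

With Requests, the first dict would be passed as `data=...` and the second as `params=...`, and both can be combined in a single `.post(...)` call.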

In [1]:
import requests

# Simple example

We will use the example web site http://www.webscrapingfordatascience.com/postform2/ here. As always, make sure to inspect the form using your browser.

In [2]:
url = 'http://www.webscrapingfordatascience.com/postform2/'

# First perform a normal GET request (we don't have to, but we can do so to take a look at the form)
r = requests.get(url)

print(r.text)

<html>
	<body>


		
		<form method="POST">

			<table border="1">
				<tr style="background-color: #24afe2;"><th>Name</th><th>Value</th></tr>

				<tr><td>Your name</td>
					<td><input type="text" name="name"></td></tr>

				<tr><td>Your gender</td>
					<td><input type="radio" name="gender" value="M">Male<br>
						<input type="radio" name="gender" value="F">Female<br>
						<input type="radio" name="gender" value="N">Other / prefer not to say</td></tr>

				<tr><td>Food you like</td>
					<td><input type="checkbox" name="pizza" value="like">Pizza!<br>
						<input type="checkbox" name="fries" value="like">Fries please<br>
						<input type="checkbox" name="salad" value="like">Salad for me</td></tr>

				<tr><td>Your hair color</td>
					<td>
						<select name="haircolor">
							<option value="black">Black hair</option>
							<option value="brown">Brown hair</option>
							<option value="blonde">Blonde hair</option>
							<option value="other">Other</option>
						</select>
				

In [3]:
# Next, we submit the form
formdata = {
    'name': 'Seppe',
    'gender': 'M',
    'pizza': 'like',
    'haircolor': 'brown',
    'comments': ''
}

r = requests.post(url, data=formdata)
print(r.text)

<html>
	<body>


<h2>Thanks for submitting your information</h2>

<p>Here's a dump of the form data that was submitted:</p>

<pre>array(5) {
  ["name"]=>
  string(5) "Seppe"
  ["gender"]=>
  string(1) "M"
  ["pizza"]=>
  string(4) "like"
  ["haircolor"]=>
  string(5) "brown"
  ["comments"]=>
  string(0) ""
}
</pre>


	</body>
</html>
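
The keys used in `formdata` correspond to the `name` attributes of the form's fields. Besides inspecting the form in a browser, these names can also be listed with Beautiful Soup. The following is a minimal sketch that works on an abbreviated, hand-copied version of the form markup shown earlier (not a live request); the `field_names` variable is just an illustrative name.

```python
from bs4 import BeautifulSoup

# Abbreviated copy of the form markup, for illustration only
html = """
<form method="POST">
  <input type="text" name="name">
  <input type="radio" name="gender" value="M">
  <input type="radio" name="gender" value="F">
  <input type="checkbox" name="pizza" value="like">
  <select name="haircolor">
    <option value="black">Black hair</option>
    <option value="brown">Brown hair</option>
  </select>
</form>
"""

form = BeautifulSoup(html, 'html.parser').find('form')

# Collect each distinct field name, in document order
field_names = []
for element in form.find_all(['input', 'select', 'textarea']):
    name = element.get('name')
    if name and name not in field_names:
        field_names.append(name)

print(field_names)
```

Radio buttons share a single name, which is why `gender` is listed only once. Checkboxes that are left unchecked are simply omitted from the submitted data, which is why `formdata` above only mentions `pizza`.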



Note that Requests also offers another argument, `files`, which can be used to upload files when the server expects them. See https://requests.readthedocs.io/en/master/user/quickstart/#post-a-multipart-encoded-file for more info on this.
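
As a rough sketch of what `files` does, the example below prepares (but does not send) a request, so that we can look at the result without needing a server. The upload URL, the `attachment` and `description` field names, and the file contents are all assumptions made up for this illustration.

```python
import io
import requests

# Hypothetical upload endpoint; nothing is actually sent to it
upload_url = 'http://www.example.com/upload'

# An in-memory file standing in for a real one opened with open(..., 'rb')
fake_file = io.BytesIO(b'hello world')

request = requests.Request(
    'POST',
    upload_url,
    data={'description': 'test upload'},
    # Tuple form: (file name, file object, content type)
    files={'attachment': ('hello.txt', fake_file, 'text/plain')},
)
prepared = request.prepare()

# With files present, the body is multipart-encoded instead of URL-encoded
content_type = prepared.headers['Content-Type']
print(content_type)
```

To actually perform the upload, the same `data` and `files` arguments would be passed to `requests.post(...)`.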

# Quotes to Scrape

Let's now move on to a more complicated example, as hosted on http://quotes.toscrape.com/search.aspx.

For the sake of this example, say that we're not interested in getting all the quotes for each author (though feel free to try this), but rather in fetching the list of tags for each author. Let's try this now. We first need to import Beautiful Soup as well.

In [4]:
from bs4 import BeautifulSoup

In [5]:
url = 'http://quotes.toscrape.com/search.aspx'

In [6]:
soup = BeautifulSoup(requests.get(url).text, 'html.parser')

Let's first get the list of authors.

In [7]:
authors = [element.get('value') for element in soup.find(id='author').find_all('option') if element.get('value')]
authors[:3]

['Albert Einstein', 'J.K. Rowling', 'Jane Austen']

The tag drop-down is still empty at this point, as the next cell shows, so simply reading it from the page doesn't work. We need to figure out what happens when we select a particular author:

In [8]:
soup.find(id='tag').find_all('option')

[<option>----------</option>]

Again, make sure to follow along in your browser. You will see that a `POST` request is performed to http://quotes.toscrape.com/filter.aspx. The form data contains `author`, `tag` as well as a strange `__VIEWSTATE` field. Let's see if we can simply ignore that field...

In [10]:
filter_url = 'http://quotes.toscrape.com/filter.aspx'

In [11]:
requests.post(filter_url, data={
    'author': 'Albert Einstein'
})

<Response [500]>

That doesn't work. How about with the tag included?

In [12]:
requests.post(filter_url, data={
    'author': 'Albert Einstein',
    'tag': '----------'
})

<Response [500]>

Same result. We have no choice but to extract the viewstate from the HTML and send it along. We can define a function that retrieves an author's tags in two ways. The first one is as follows:

In [13]:
def get_author_tags(author):
    # First request the search page
    soup = BeautifulSoup(requests.get(url).text, 'html.parser')
    # Get out the viewstate
    viewstate = soup.find(id='__VIEWSTATE').get('value')
    # Now perform the post
    soup = BeautifulSoup(requests.post(filter_url, data={
        'author': author,
        'tag': '----------',
        '__VIEWSTATE': viewstate
    }).text, 'html.parser')
    # And get out the list of tags
    return [element.get('value') for element in soup.find(id='tag').find_all('option') if element.get('value')]

In [14]:
get_author_tags('Albert Einstein')

['change',
 'deep-thoughts',
 'thinking',
 'world',
 'inspirational',
 'life',
 'live',
 'miracle',
 'miracles',
 'adulthood',
 'success',
 'value',
 'simplicity',
 'understand',
 'children',
 'fairy-tales',
 'imagination',
 'knowledge',
 'learning',
 'understanding',
 'wisdom',
 'simile',
 'music',
 'mistakes']

In [15]:
get_author_tags('Jane Austen')

['aliteracy',
 'books',
 'classic',
 'humor',
 'friendship',
 'love',
 'romantic',
 'women',
 'library',
 'reading',
 'elizabeth-bennet',
 'jane-austen']
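
`get_author_tags` copies a single hidden field by name. Pages of this kind sometimes carry more than one hidden field, so a more general variant is to collect every hidden input and send them all back. The following is only a sketch: `get_hidden_fields` is a hypothetical helper, and the markup (including the `token` field) is made up rather than taken from the site.

```python
from bs4 import BeautifulSoup

def get_hidden_fields(soup):
    """Return a dict of name -> value for every hidden input in the page."""
    fields = {}
    for element in soup.find_all('input', type='hidden'):
        name = element.get('name')
        if name:
            fields[name] = element.get('value', '')
    return fields

# Made-up markup for illustration; not the actual page source
html = """
<form>
  <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="abc123">
  <input type="hidden" name="token" value="xyz">
  <input type="text" name="author">
</form>
"""

hidden = get_hidden_fields(BeautifulSoup(html, 'html.parser'))
print(hidden)
```

The resulting dict can then be merged with the visible fields before posting, for instance `{**hidden, 'author': author, 'tag': '----------'}`.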

The first version of `get_author_tags` works, but having to perform a `GET` request to the main page every time is wasteful, and it won't always be possible (sites do not always offer a way to go back to an initial state). A better approach is to take the viewstate returned by each response and pass it along with the next request:

In [17]:
def get_author_tags(author, viewstate=None):
    # If the viewstate is None, get out the first one
    if not viewstate:
        soup = BeautifulSoup(requests.get(url).text, 'html.parser')
        viewstate = soup.find(id='__VIEWSTATE').get('value')
    soup = BeautifulSoup(requests.post(filter_url, data={
        'author': author,
        'tag': '----------',
        '__VIEWSTATE': viewstate
    }).text, 'html.parser')
    viewstate = soup.find(id='__VIEWSTATE').get('value')
    tags = [element.get('value') for element in soup.find(id='tag').find_all('option') if element.get('value')]
    # Return the tags and viewstate for the next request
    return tags, viewstate

In [18]:
tags, viewstate = get_author_tags('Albert Einstein')
tags

['change',
 'deep-thoughts',
 'thinking',
 'world',
 'inspirational',
 'life',
 'live',
 'miracle',
 'miracles',
 'adulthood',
 'success',
 'value',
 'simplicity',
 'understand',
 'children',
 'fairy-tales',
 'imagination',
 'knowledge',
 'learning',
 'understanding',
 'wisdom',
 'simile',
 'music',
 'mistakes']

In [19]:
tags, viewstate = get_author_tags('Jane Austen', viewstate)
tags

['aliteracy',
 'books',
 'classic',
 'humor',
 'friendship',
 'love',
 'romantic',
 'women',
 'library',
 'reading',
 'elizabeth-bennet',
 'jane-austen']

Note that in a real-life example, you'd probably want to wrap this functionality in a custom class instead.
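
One possible shape for such a class is sketched below. This is not a definitive design: the class name `QuoteTagScraper`, its attributes, and the use of `requests.Session` are all choices made for this illustration. The class remembers the most recent viewstate so callers don't have to pass it around.

```python
import requests
from bs4 import BeautifulSoup

SEARCH_URL = 'http://quotes.toscrape.com/search.aspx'
FILTER_URL = 'http://quotes.toscrape.com/filter.aspx'


class QuoteTagScraper:
    """Hypothetical wrapper that keeps track of the latest viewstate."""

    def __init__(self, session=None):
        # Any object with compatible get() and post() methods can be passed in
        self.session = session if session is not None else requests.Session()
        self.viewstate = None

    def _parse(self, html):
        # Parse a page and remember the viewstate it contains
        soup = BeautifulSoup(html, 'html.parser')
        self.viewstate = soup.find(id='__VIEWSTATE').get('value')
        return soup

    def get_author_tags(self, author):
        # Only fetch the search page if we have no viewstate yet
        if self.viewstate is None:
            self._parse(self.session.get(SEARCH_URL).text)
        response = self.session.post(FILTER_URL, data={
            'author': author,
            'tag': '----------',
            '__VIEWSTATE': self.viewstate
        })
        # Fail loudly on errors such as the 500 responses seen earlier
        response.raise_for_status()
        soup = self._parse(response.text)
        return [option.get('value')
                for option in soup.find(id='tag').find_all('option')
                if option.get('value')]
```

Usage would then look like `scraper = QuoteTagScraper()` followed by `scraper.get_author_tags('Albert Einstein')` and `scraper.get_author_tags('Jane Austen')`, with the viewstate carried over between the calls.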