## Import the Requests Library

In [1]:
import requests as r

Note: the requests in the following sections are made to a static web server hosting a learning website.
You should be able to make the same GET calls and receive identical responses.

### Our Initial HTTP Request

In [3]:
# Web Scraping "Hello from the web!"
url = 'http://www.webscrapingfordatascience.com/basichttp/'
req = r.get(url)
print(req.text)

Hello from the web!



### Notable Data Points Returned with the HTTP Response

In [8]:
# Which HTTP status code did we get back from the server?
print(req.status_code)

200


In [9]:
# What is the textual status code?
print(req.reason)

OK
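
The reason phrase is the standard text that accompanies a status code. As a small offline sketch, Python's standard `http.HTTPStatus` enum maps codes to those standard phrases. A server is free to send a different phrase, so `req.reason` remains the authoritative value for a given response. The `describe` helper name is hypothetical and not part of Requests.

```python
from http import HTTPStatus

def describe(status_code):
    """Return 'code phrase' for a standard HTTP status code (hypothetical helper)."""
    status = HTTPStatus(status_code)  # raises ValueError for unknown codes
    return f"{status.value} {status.phrase}"

print(describe(200))
print(describe(404))
```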


In [10]:
# What were the HTTP response headers? 
print(req.headers)

{'Date': 'Mon, 08 Oct 2018 20:42:33 GMT', 'Server': 'Apache/2.4.18 (Ubuntu)', 'Content-Length': '20', 'Keep-Alive': 'timeout=5, max=100', 'Connection': 'Keep-Alive', 'Content-Type': 'text/html; charset=UTF-8'}
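
Response headers in Requests are stored in a case-insensitive dictionary, so a single header can be looked up without matching its exact capitalization. The sketch below rebuilds that structure offline from a few of the header values printed above, assuming the `requests` package is installed; with a live response the same lookups would be written as `req.headers['content-type']`.

```python
from requests.structures import CaseInsensitiveDict

# Header values copied from the response printed above.
headers = CaseInsensitiveDict({
    'Server': 'Apache/2.4.18 (Ubuntu)',
    'Content-Length': '20',
    'Content-Type': 'text/html; charset=UTF-8',
})

# Lookups ignore the capitalization of the header name.
print(headers['content-type'])
print(headers.get('CONTENT-LENGTH'))

# .get() accepts a default for headers the server did not send.
print(headers.get('X-Missing', 'not sent'))
```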


In [12]:
# The request information is saved as a Python object in req.request
print(req.request)

<PreparedRequest [GET]>


In [19]:
# Returns the Requested URL
print(req.request.url)

http://www.webscrapingfordatascience.com/basichttp/


In [14]:
# What were the HTTP request headers?
print(req.request.headers)

{'User-Agent': 'python-requests/2.19.1', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}


In [15]:
# The HTTP response content. 
print(req.text)

Hello from the web!



### Attempting to make a request to a website that uses parameters

In [16]:
url2 = 'http://webscrapingfordatascience.com/paramhttp/'
req2 = r.get(url2)
print(req2.text)

Please provide a "query" parameter


### Passing 'test' as the "query" parameter within the URL

In [17]:
url3 = 'http://webscrapingfordatascience.com/paramhttp/?query=test'
req3 = r.get(url3)
print(req3.text)

I don't have any information on "test"
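
Instead of writing the query string into the URL by hand, Requests can build it from a dictionary passed as the `params` argument. The sketch below prepares the request without sending it, so it runs offline and only shows the URL that would be used. The `build_url` helper and the example value `'salt & pepper'` are illustrative assumptions.

```python
import requests

base_url = 'http://www.webscrapingfordatascience.com/paramhttp/'

def build_url(query):
    """Prepare a GET request (without sending it) and return its final URL."""
    request = requests.Request('GET', base_url, params={'query': query})
    return request.prepare().url

# Values are encoded for the query string while the URL is prepared.
print(build_url('test'))
print(build_url('salt & pepper'))
```

To actually send such a request, the same dictionary can be passed directly, for example `r.get(base_url, params={'query': 'test'})`.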


The Requests library will attempt to encode some special characters for you, e.g. spaces.

In [23]:
# GET request with spaces in the parameter value

url4 = 'http://www.webscrapingfordatascience.com/paramhttp/?query=a query with spaces'
req4 = r.get(url4)
# Parameter will be encoded as 'a%20query%20with%20spaces'

# You can verify this by looking at the prepared request URL:
print(req4.request.url)
# Will show [...]/paramhttp/?query=a%20query%20with%20spaces

http://www.webscrapingfordatascience.com/paramhttp/?query=a%20query%20with%20spaces


In [21]:
print(req4.text)
# Will show: I don't have any information on "a query with spaces"

I don't have any information on "a query with spaces"


Sometimes Requests cannot encode the characters for you: characters such as '?' and '&' already have a meaning in a URL, so they are passed through unchanged as part of the URL.

In [28]:
url5 = 'http://www.webscrapingfordatascience.com/paramhttp/?query=complex?&'
req5 = r.get(url5)

# The '&' character is read as a parameter separator by the web server, so it is dropped from the query value

In [29]:
print(req5.request.url)

http://www.webscrapingfordatascience.com/paramhttp/?query=complex?&


In [30]:
print(req5.text)

I don't have any information on "complex?"
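
One way to see why the '&' disappears is to parse the same URL locally. The sketch below uses Python's standard `urllib.parse` functions as a stand-in for the server's query-string parsing; the real server may use different software, so treat this as an approximation of its behavior.

```python
from urllib.parse import urlsplit, parse_qs

url = 'http://www.webscrapingfordatascience.com/paramhttp/?query=complex?&'

# Everything after the first '?' is the query string.
query_string = urlsplit(url).query

# The query string is split on '&' into pairs, then each pair on '='.
# The trailing '&' only separates an empty pair, which is discarded.
params = parse_qs(query_string)

print(query_string)
print(params)
```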


### You can attempt to avoid encoding errors by using the quote and quote_plus functions from the urllib.parse module

quote: replaces spaces and other special characters with encoded characters using the '%XX' format (e.g. '=' is %3D, ' ' is %20 and so on). By default, '/' is left unencoded.

quote_plus: replaces spaces with '+' and special characters, including '/', using the '%XX' format (generally used for query strings)

In [31]:
from urllib.parse import quote, quote_plus

In [34]:
raw_string_value = 'a string / value ?with & spaces =and characters'

print(quote(raw_string_value))
print(quote_plus(raw_string_value))

a%20string%20/%20value%20%3Fwith%20%26%20spaces%20%3Dand%20characters
a+string+%2F+value+%3Fwith+%26+spaces+%3Dand+characters
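
As a minimal sketch of how this helps with the earlier 'complex?&' case, the value can be encoded with `quote_plus` before it is placed in the URL. The example runs offline and decodes the result with `parse_qs` to check that the '&' survives; the variable names are illustrative.

```python
from urllib.parse import quote, quote_plus, urlsplit, parse_qs

base_url = 'http://www.webscrapingfordatascience.com/paramhttp/'
raw_value = 'complex?&'

# Encode the value before placing it in the query string.
encoded_url = base_url + '?query=' + quote_plus(raw_value)
print(encoded_url)

# Decoding the query string recovers the original value, '&' included.
decoded = parse_qs(urlsplit(encoded_url).query)
print(decoded['query'][0])

# quote() leaves '/' alone by default; passing safe='' encodes it as well.
print(quote('a/b c', safe=''))
```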
