<!DOCTYPE html>
<html>
<head>
<title>HTTP.md</title>
<link rel="stylesheet" href="OmegaTech.css">
</head>
<body>
<h1 id="http">HTTP</h1>
<p>(This is intentionally simple and superficial.)</p>
<p>HTTP and HTTPS (secure HTTP) make the world go round these days. HTTPS is just HTTP over secure, encrypted connections that allow us to verify that the site is who it says it is (and, less commonly, allow the site to verify that we are who we say we are), and that "nobody" can eavesdrop.</p>
<p>HTTP is the HyperText Transfer Protocol. It is a set of conventions for communicating, specifically for one party (the client) to ask a server for the contents or value associated with a specific Web address, a URL (Uniform Resource Locator).</p>
<p>A Web browser is an HTTP client. R can act as an HTTP client (e.g., via the RCurl package). wget and curl are command-line HTTP clients. And so on.</p>
<p>After the initial connection, an HTTP transaction consists of a <strong>Request</strong> from the client and a <strong>Response</strong> from the server. There can be a lot going on in each of these.</p>
<p>Both a request and a response consist of two parts: a Header and a Body. The Body of a request may be empty; in fact, the overwhelming majority of the requests we are interested in for <em>scraping</em> have no body.</p>
<p>The header of a request or response provides information about the request and/or describes what is in the body, e.g., how many bytes are in the body (the <code>Content-Length</code> field) and the format of the content, such as JPEG, plain text or HTML (the <code>Content-Type</code> field).</p>
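<p>To make the header/body split concrete, here is a minimal Python sketch (Python is our choice here; these notes otherwise talk about R) of what an HTTP/1.1 message looks like as text: a start line, header lines, a blank line, then the body. The function names <code>build_get_request</code> and <code>parse_response</code> and the host <code>example.com</code> are illustrative assumptions, and the parser deliberately ignores complications such as chunked transfer encoding and repeated header names, which real clients handle for you.</p>

```python
# A simplified sketch of the HTTP/1.1 message layout: a start line,
# header lines, an empty line, then an optional body.
# It ignores chunked transfer encoding and repeated header names.

CRLF = "\r\n"


def build_get_request(host: str, path: str) -> bytes:
    """Build the text of a GET request; a GET normally has no body."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Accept: text/html",
    ]
    # The header ends with an empty line and nothing follows it.
    return (CRLF.join(lines) + CRLF + CRLF).encode("ascii")


def parse_response(raw: bytes):
    """Split a raw response into its status line, header fields and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split(CRLF)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    # Content-Length says how many bytes of body follow the header.
    length = int(headers.get("content-length", len(body)))
    return status_line, headers, body[:length]


if __name__ == "__main__":
    # Hypothetical host and path; nothing is sent over the network.
    print(build_get_request("example.com", "/index.html").decode("ascii"))
    raw = (
        b"HTTP/1.1 200 OK\r\n"
        b"Content-Type: text/plain\r\n"
        b"Content-Length: 13\r\n"
        b"\r\n"
        b"Hello, world!"
    )
    status, headers, body = parse_response(raw)
    print(status)
    print(headers["content-type"], len(body))
```

<p>Running it prints the raw request text, then the status line, the content type and the body length (13 bytes) read from the hand-written response.</p>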
<p>An HTTP conversation might be just one request and response, or it may be a sequence of request-response interactions.</p>
<p>For our purposes, HTTP has three important verbs (methods):</p>
<ul>
<li>GET</li>
<li>POST</li>
<li>PUT</li>
</ul>
<p>Most HTTP requests are GET requests. These normally have no body.</p>
<p>POST and PUT requests have a body and, for our purposes, are very similar.</p>
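<p>As a small illustration of how one client library treats these verbs, Python's standard <code>urllib.request.Request</code> class uses GET when no body is given and POST when one is, while PUT has to be asked for explicitly. The URL below is a placeholder; creating a <code>Request</code> object does not contact any server.</p>

```python
from urllib.request import Request

URL = "https://example.com/upload"  # hypothetical endpoint, never contacted

# No data: urllib treats this as a GET request with no body.
get_req = Request(URL)

# With data: the bytes become the request body and the method defaults to POST.
post_req = Request(URL, data=b"name=value")

# PUT also carries a body, but the method must be named explicitly.
put_req = Request(URL, data=b"name=value", method="PUT")

for req in (get_req, post_req, put_req):
    # Building a Request object only describes the request; nothing is sent.
    print(req.get_method(), req.data)
```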
<p>When we make a request to a URL that has user-specific inputs, e.g., in a Web form, rather than just to a fixed URL, the request can include these inputs/parameters in various ways.</p>
<p>With the GET method, the client appends the inputs to the end of the URL, separating them from the URL with a <code>?</code> character. Each input appears as <code>name=value</code>, and the inputs are separated from one another by <code>&amp;</code>, e.g.</p>
<pre><code><div class="highlight"><pre>https://www.rateinflation.com/consumer-price-index/usa-historical-cpi?start-year=2002&amp;end-year=2017
</pre></div>
</code></pre><p>The URL is <code>https://www.rateinflation.com/consumer-price-index/usa-historical-cpi</code>. The two inputs are named <code>start-year</code> and <code>end-year</code> and take the values 2002 and 2017 in this request.</p>
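<p>A short Python sketch, using the standard <code>urllib.parse</code> module, that splits the example URL back into the resource address and its named inputs:</p>

```python
from urllib.parse import parse_qs, urlsplit

url = ("https://www.rateinflation.com/consumer-price-index/"
       "usa-historical-cpi?start-year=2002&end-year=2017")

parts = urlsplit(url)
# Everything before the "?" identifies the resource ...
base = f"{parts.scheme}://{parts.netloc}{parts.path}"
# ... and everything after it is the query string of name=value pairs.
params = parse_qs(parts.query)

print(base)
print(params)
```

<p>Note that <code>parse_qs</code> returns a list of values for each name, because the same name may legitimately appear more than once in a query string.</p>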
<p>When we want to send a lot of data as input to the request, we cannot append it to the URL (in practice, servers and clients limit how long a URL can be). Instead, we use a POST (or PUT) request, and these inputs are sent in the body of the request. There are three ways to include this information in the body: as <code>application/x-www-form-urlencoded</code> form data, as <code>multipart/form-data</code>, or directly.</p>
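<p>The three body formats can be sketched in Python as follows. This is a simplified illustration, not what any particular library does: the helper names are hypothetical, the multipart encoder uses a fixed boundary string (real clients generate a random one that does not occur in the data) and omits the file name and per-part content type that file uploads usually carry, and the "direct" case uses JSON merely as one example of raw content.</p>

```python
import json
from urllib.parse import urlencode


def urlencoded_body(fields):
    """application/x-www-form-urlencoded: name=value pairs, as in a query string."""
    body = urlencode(fields).encode("ascii")
    return body, "application/x-www-form-urlencoded"


def multipart_body(fields, boundary):
    """multipart/form-data: each field is its own part, delimited by the boundary.

    Values may be str or bytes; bytes (e.g. image data) are copied in unescaped.
    """
    chunks = []
    for name, value in fields.items():
        if isinstance(value, str):
            value = value.encode("utf-8")
        header = (f"--{boundary}\r\n"
                  f'Content-Disposition: form-data; name="{name}"\r\n\r\n')
        chunks.append(header.encode("utf-8") + value + b"\r\n")
    # The closing boundary line has two extra dashes at the end.
    chunks.append(f"--{boundary}--\r\n".encode("ascii"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def direct_body(obj):
    """Directly: the body is the content itself, here serialised as JSON."""
    return json.dumps(obj).encode("utf-8"), "application/json"


if __name__ == "__main__":
    fields = {"start-year": "2002", "end-year": "2017"}
    print(urlencoded_body(fields))
    print(direct_body(fields))
    body, content_type = multipart_body(fields, "sketch-boundary-42")
    print(content_type)
    print(body.decode("utf-8"))
```

<p>Whichever format is used, the <code>Content-Type</code> header (the second value each helper returns) tells the server how to interpret the body.</p>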
<p>Sending data in the body as <code>multipart/form-data</code> or directly avoids having to escape binary data (e.g., images, video, saved R objects) into regular ASCII characters and then convert it back.</p>
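<p>A rough way to see the cost of escaping is to encode some binary data as ASCII text. This Python sketch uses a hypothetical 256-byte payload containing every byte value once, and compares percent-encoding (as used in URLs) with Base64 (another common bytes-to-text scheme), using only standard-library functions.</p>

```python
import base64
from urllib.parse import quote_from_bytes, unquote_to_bytes

# Hypothetical binary payload: every possible byte value once (256 bytes).
payload = bytes(range(256))

# Percent-encoding, as used in URLs: each byte outside the unreserved set
# (ASCII letters, digits and - . _ ~) becomes three characters, e.g. %FF.
percent = quote_from_bytes(payload, safe="")

# Base64: every 3 input bytes become 4 ASCII characters (plus padding).
b64 = base64.b64encode(payload)

print(len(payload), len(percent), len(b64))

# Either way, the receiver has to decode the text to recover the bytes.
assert unquote_to_bytes(percent) == payload
assert base64.b64decode(b64) == payload
```

<p>For this payload, percent-encoding turns 256 bytes into 636 characters and Base64 into 344; the exact overhead depends on the data, but in both cases extra work is needed at each end.</p>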
<p>With RCurl (and hopefully other packages), all you need to know about an HTML form is whether it expects a GET or a POST request. This is given by the <code>method</code> attribute of the HTML <code>&lt;form&gt;</code> element; if the attribute is missing, the form uses GET.</p>
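<p>The following Python sketch mimics, in a very reduced way, what a form-submission helper does with the <code>method</code> attribute: the same encoded inputs go into the URL for GET and into the body for POST. The helper <code>form_request</code> is hypothetical (it is not RCurl's API), it assumes the action URL has no query string of its own, and it only builds the request without sending it.</p>

```python
from urllib.parse import urlencode
from urllib.request import Request


def form_request(action_url, method, fields):
    """Build (but do not send) the request a browser would make for a form.

    `method` is the value of the form's method attribute, or None if absent.
    Assumes `action_url` has no query string of its own.
    """
    # HTML forms default to GET when the method attribute is missing.
    method = (method or "get").upper()
    encoded = urlencode(fields)  # escapes each name and value
    if method == "GET":
        # GET: the encoded inputs go in the URL, after a "?".
        return Request(f"{action_url}?{encoded}", method="GET")
    # POST: the same encoded inputs go in the body instead.
    return Request(action_url, data=encoded.encode("ascii"), method="POST",
                   headers={"Content-Type": "application/x-www-form-urlencoded"})


if __name__ == "__main__":
    # Hypothetical form action and inputs.
    for m in (None, "post"):
        req = form_request("https://example.com/search", m, {"q": "cpi", "year": 2017})
        print(req.get_method(), req.full_url, req.data)
```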
<p>Use higher-level functions for submitting forms (e.g., RCurl's <code>getForm()</code> and <code>postForm()</code>). Don't paste the inputs together to create the URL for a GET operation. Pasting is less flexible, but more importantly, you have to escape certain characters yourself, e.g., <code>&amp;</code> and <code>?</code>, since they have a special purpose in the URL, and also characters such as space, {, ...</p>
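<p>This Python sketch shows what goes wrong when pasting. The form action URL and the input values are made up; <code>urlencode</code> and <code>parse_qs</code> are from the standard <code>urllib.parse</code> module.</p>

```python
from urllib.parse import parse_qs, urlencode

base = "https://example.com/search"  # hypothetical form action
fields = {"q": "R & Python", "lang": "en"}

# Naive: paste names and values together without escaping.
# The result even contains literal spaces, which are not valid in a URL.
pasted = base + "?" + "&".join(f"{k}={v}" for k, v in fields.items())

# Safer: let urlencode escape each name and value
# (a space becomes "+" and "&" becomes "%26").
encoded = base + "?" + urlencode(fields)

print(pasted)
print(encoded)

# What a server would decode from each query string:
print(parse_qs(pasted.split("?", 1)[1]))
print(parse_qs(encoded.split("?", 1)[1]))
```

<p>The server decodes the pasted query as <code>q</code> = "R " (the rest of the value is lost at the unescaped <code>&amp;</code>), whereas the encoded query round-trips to "R &amp; Python".</p>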
<ul>
<li>HTTP requests<ul>
<li>Request -> Response.<ul>
<li>Header &amp; Body (the Body is optional)</li>
</ul>
</li>
<li>GET host/path/to/file</li>
<li>GET with form parameters</li>
<li>POST for sending contents</li>
<li>See the requests in browser developer tools.</li>
<li>Header information in request, response.</li>
</ul>
</li>
</ul>
</body>
</html>