<!DOCTYPE html>
<html>
<head>
<title>Outline.md</title>
<link rel="stylesheet" href="OmegaTech.css">
</head>
<body>
<h1 id="web-scraping-apis">Web Scraping &amp; APIs</h1>
<ul>
<li>Git Repository: <a href="https://github.com/dsidavis/WebScrapingFall17">https://github.com/dsidavis/WebScrapingFall17</a></li>
<li>Web Page: <a href="http://dsi.ucdavis.edu/WebScraping/">http://dsi.ucdavis.edu/WebScraping/</a></li>
</ul>
<h2 id="workshop-goals-">Workshop Goals:</h2>
<ol>
<li>Understand the role of the elements of the Web (HTTP, HTML, JSON, XPath, XML, JavaScript).</li>
<li>Become familiar with the essential tools for working with some of these.</li>
<li>Explore strategies for extracting data.</li>
</ol>
<p>The tools are important, but the thought process is the more important thing to learn.</p>
<h2 id="-scraping-apis-concepts-scraping-html-"><a href="Scraping.html">Scraping &amp; APIs Concepts</a></h2>
<ul>
<li>What is scraping?</li>
<li>APIs are much preferred over scraping</li>
<li>Better ways</li>
<li>Rules and restrictions for scraping</li>
</ul>
<h2 id="-http-requests-http-html-"><a href="HTTP.html">HTTP Requests</a></h2>
<ul>
<li><a href="Inflation.html">Inflation table</a></li>
</ul>
<h2 id="html-tables">HTML Tables</h2>
<ul>
<li><a href="Inflation.html">Inflation table</a></li>
</ul>
<h2 id="-html-forms-http-html-"><a href="HTTP.html">HTML Forms</a></h2>
<ul>
<li><a href="Inflation.html">GET</a></li>
<li><a href="directEnergy.html">POST and Browser Developer Tools</a></li>
</ul>
<h2 id="html-links">HTML Links</h2>
<ul>
<li><a href="CACities.R">California Cities</a></li>
</ul>
<h2 id="-xpath-xpath-html-"><a href="XPath.html">XPath</a></h2>
<ul>
<li><a href="Rent.R">City Rents</a></li>
<li><a href="cybercoders.R">Data Science Jobs</a></li>
</ul>
<h2 id="-javascript-dynamic-content-selenium-html-"><a href="Selenium.html">JavaScript &amp; Dynamic Content</a></h2>
<ul>
<li><a href="selenium.R">selenium.R</a></li>
</ul>
<h2 id="apis">APIs</h2>
<ul>
<li><a href="geocoding.R">GeoCoding</a></li>
<li><a href="nrel.R">Renewable Energy Electricity Rates</a></li>
</ul>
<h2 id="http-options">HTTP Options</h2>
<pre><code>sort(names(getCurlOptionsConstants()))
</code></pre>
<ul>
<li>verbose</li>
<li>followlocation</li>
<li>useragent</li>
<li>cookies - cookiejar, cookie, cookiefile</li>
<li>httpheader</li>
<li>cainfo, capath, certinfo</li>
<li>userpwd</li>
</ul>
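<p>As a rough illustration of how several of these options might be combined, here is a minimal sketch using RCurl. The user-agent string and the <code>Accept</code> header are placeholders chosen for the example, and the URL is simply the workshop page listed at the top of this outline; adjust all three for a real request.</p>
<pre><code>library(RCurl)

# Collect the options once so they can be reused across requests.
opts &lt;- curlOptions(
  verbose = TRUE,                        # print the request/response exchange
  followlocation = TRUE,                 # follow HTTP redirects
  useragent = "WebScrapingWorkshop/0.1", # placeholder identifier (assumption)
  cookiefile = "",                       # empty string turns on libcurl's cookie engine
  httpheader = c(Accept = "text/html")   # placeholder request header (assumption)
)

# Reuse a single handle so that cookies persist from one request to the next.
h &lt;- getCurlHandle(.opts = opts)
doc &lt;- getURLContent("http://dsi.ucdavis.edu/WebScraping/", curl = h)
</code></pre>
<p>Setting the options on a shared handle, rather than passing them to each call, keeps the cookie state and connection settings consistent across a sequence of requests.</p>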
<h2 id="cookies">Cookies</h2>
</body>
</html>