<h1 style="color:#D43D6E">
HTTP (GET and POST), APIs, and Web Scraping
</h1>

<br>
<h2>
This is what Reddit says:
</h2>

<ul>
<li>HTTP is the "way" in which web applications receive and send data around the world. - <i>shellsage</i></li>
<li>"POST" means "put this on the server", and "GET" means "get this from the server". - <i>rewboss</i></li>
<li>When you use an API, you're using an interface that lets the program you write control or access a program somebody else wrote. - <i>dmazzoni</i></li>
<li>Usually data scraping refers to getting data out of a web page without using an API. - <i>blitzkraft</i></li>
</ul>
***

<h3 style="color:#D43D6E">HTTP (GET and POST) Requests</h3>

Your computer talks to another computer (a server) by sending it requests. A GET request asks the server to send data back; a POST request sends data to the server.

**Examples of GET requests:**
- Going to <https://www.google.com> sends a GET request from your computer to Google's servers. The servers find the webpage and send it back to your browser.
- Searching for things to buy on Amazon. Your computer asks Amazon's servers for data on the things you searched for, and they send it back to your browser.

<br>

**Examples of POST requests:**
- Signing up for a website. Your sign-up details are posted to the website's server and stored so you can log in later.
- Posting a picture on Instagram. Your phone sends a POST request with the picture and caption to Instagram's servers.
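
To make the difference concrete, here is a minimal sketch that builds one GET and one POST request without sending anything. It uses only Python's standard library, and the `example.com` addresses and form fields are made up for illustration. The thing to notice is where the data goes: in the URL for GET, in the request body for POST.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoints used only for illustration; nothing is sent.
SEARCH_URL = "https://example.com/search"
SIGNUP_URL = "https://example.com/signup"

# GET: the data travels in the URL as a query string.
get_request = Request(SEARCH_URL + "?" + urlencode({"q": "textbook"}))

# POST: the data travels in the request body (as bytes).
post_body = urlencode({"username": "alice", "caption": "hello"}).encode("utf-8")
post_request = Request(SIGNUP_URL, data=post_body)

print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.full_url, post_request.data)
```

`Request` picks POST automatically when a body is supplied and GET otherwise, which mirrors the "get this" versus "put this" idea above.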

***

<h3 style="color:#D43D6E">APIs</h3>

<h4 style="text-align:center">App to API to Server to API to App</h4>
<img src="https://lvivity.com/wp-content/uploads/2018/07/how-api-work.jpg" width="550">
<br>
<br>

**An API allows my program in language X to talk with your program in language Y**
- An API gives structure to GET and POST requests
- It's a **black box**. You don't need to know how it works; you only need to know how to use it
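
A rough sketch of the black-box idea, with a made-up `search_posts` function standing in for somebody else's API (the function, its posts, and its response shape are all hypothetical). The caller only relies on the agreed input and the JSON that comes back, and JSON can be read from almost any language.

```python
import json


def search_posts(query: str, size: int) -> str:
    """Hypothetical API: takes a search query, returns a JSON string."""
    # The caller never sees or depends on anything in here.
    posts = [
        {"title": "Selling MATH 100 textbook"},
        {"title": "Free textbook"},
        {"title": "Lost keys"},
    ]
    matches = [p for p in posts if query.lower() in p["title"].lower()]
    return json.dumps({"data": matches[:size]})


# Using the API: send the documented inputs, read the documented output.
response_text = search_posts("textbook", size=1)
result = json.loads(response_text)
print(result["data"][0]["title"])
```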

***

<h3 style="color:#D43D6E">Web Scraping</h3>

I want to get data from your program but you don't have an API. How do I do this?

**You request the website, then get data from the HTML code**
<br>

<h4 style="text-align:center">Do this with the browser's developer tools</h4>
<img src="https://images.zapier.com/storage/photos/19cc8efc198be50255261d5bc1a24667.png?format=jpg" width="700">

**The usual steps for this are:**
1. Send a GET request to the website
2. After getting the response, extract the data you want (usually by locating it through its HTML tags)
3. **Profit $$$**
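
Step 2 can be sketched without any network access. The example below assumes the GET request has already returned some HTML (the `PAGE_HTML` string is invented for this sketch) and uses the standard library's `html.parser` to pull out the text inside every `<h2>` tag. It is one simple way to do the extraction, not the only one.

```python
from html.parser import HTMLParser

# Stand-in for the HTML a GET request would return (made up for this sketch).
PAGE_HTML = """
<html><body>
  <div class="job"><h2>Junior Python Developer</h2><span> Acme Inc. </span></div>
  <div class="job"><h2>Node.js Developer</h2><span> Example Corp </span></div>
</body></html>
"""


class TitleCollector(HTMLParser):
    """Collect the text found inside every <h2> tag."""

    def __init__(self):
        super().__init__()
        self.inside_h2 = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.inside_h2 = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self.inside_h2 = False

    def handle_data(self, data):
        # Only keep text that appears between <h2> and </h2>.
        if self.inside_h2:
            self.titles.append(data.strip())


collector = TitleCollector()
collector.feed(PAGE_HTML)
print(collector.titles)
```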

***
***

In [1]:
# Now we can show the above through code

<strong style="color:#D43D6E">Note: Do not try to memorize. Try to understand. You'll learn the syntax from Stack Overflow or Google!</strong>

In [2]:
# GET request
#
# Want to find textbooks posted on UBC Reddit?

import requests

url = 'https://api.pushshift.io/reddit/search/submission/'
params = {
    'q': 'textbook',
    'subreddit': 'UBC',
    'size': 1
}


get_req = requests.get(url=url, params=params)

print(get_req.url)

# Open the URL and see what's inside
# Try doing:
#
#     get_req.json()
#
# ... To access the data with your Python program

https://api.pushshift.io/reddit/search/submission/?q=textbook&subreddit=UBC&size=1


In [3]:
# POST request
#
# Want to store a password on the internet that expires after 1 view?

url = 'https://file.io/'
password = {'text':'xke02kdsadn20alnd22skna'}

post_req = requests.post(url=url, data=password)

# What kind of product or service can you build with this?
#
# Hint: A Snapchat for files

In [4]:
print(post_req.json())
print(post_req.json().get('link'))

# It says that it will expire after 14 days, but it actually expires after 1 view...
# Try opening it!

{'success': True, 'key': '67rhqs', 'link': 'https://file.io/67rhqs', 'expiry': '14 days'}
https://file.io/67rhqs


In [5]:
# GET requests + web scraping
#
# We want to get job postings from Indeed.com
# We are going to get python/javascript developer jobs
# ... and they will not be for senior or intermediate level developers
# ... also no blockchain or wordpress development

# After searching on Indeed, we get this link:
# https://ca.indeed.com/jobs?as_and=&as_phr=&as_any=python+javascript&as_not=senior+intermediate+blockchain+wordpress&as_ttl=developer&as_cmp=&jt=all&st=&as_src=&salary=&radius=25&l=Vancouver%2C+BC&fromage=7&limit=50&sort=&psf=advsrch
# It's a long link, so let's parse it using:
# https://www.freeformatter.com/url-parser-query-string-splitter.html

# we end up with this

url = 'https://ca.indeed.com/jobs'
params = {
    'as_ttl': 'developer',
    'as_any': 'python javascript',
    'as_not': 'senior intermediate blockchain wordpress',
    'jt': 'all',
    'l': 'vancouver,bc',
    'radius': '25',
    'limit': '50',
    'fromage': '7'
}

req_scrape = requests.get(url=url, params=params)
print(req_scrape.url)


# Try comparing the URL printed with the URL in the comments above
# They should be equivalent: the same search, minus the parameters we left out
# Now we can scrape it for job titles and company names

https://ca.indeed.com/jobs?as_ttl=developer&as_any=python+javascript&as_not=senior+intermediate+blockchain+wordpress&jt=all&l=vancouver,bc&radius=25&limit=50&fromage=7
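
The long Indeed link can also be split in Python itself instead of with an online tool. This is a minimal sketch using the standard library's `urllib.parse`; the variable names are made up for illustration. Note that `parse_qs` drops parameters with empty values by default, which is why the result is much shorter than the original link.

```python
from urllib.parse import parse_qs, urlsplit

long_url = (
    "https://ca.indeed.com/jobs?as_and=&as_phr=&as_any=python+javascript"
    "&as_not=senior+intermediate+blockchain+wordpress&as_ttl=developer"
    "&as_cmp=&jt=all&st=&as_src=&salary=&radius=25&l=Vancouver%2C+BC"
    "&fromage=7&limit=50&sort=&psf=advsrch"
)

parts = urlsplit(long_url)

# Everything before the "?" is the base URL.
base_url = f"{parts.scheme}://{parts.netloc}{parts.path}"

# parse_qs returns a list of values per key; each key appears once here,
# so keep only the first value.
parsed_params = {key: values[0] for key, values in parse_qs(parts.query).items()}

print(base_url)
for key, value in parsed_params.items():
    print(f"{key!r}: {value!r}")
```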


In [6]:
# Scraping with BeautifulSoup
#
# The code underneath basically says "turn the HTML string into a parseable thing"

!pip install beautifulsoup4 # we're installing the Beautiful Soup library first (it is imported as bs4)

from bs4 import BeautifulSoup

soup = BeautifulSoup(markup=req_scrape.text, features='html.parser')

In [7]:
# Use shift + tab on the find_all method to know what it does

jobs = soup.find_all(name='div', attrs={'data-tn-component': 'organicJob'})

In [8]:
# We can now find job postings data by specifying what their tags are
# You need to go to the Indeed website to figure out what the tags are
# Also, notice how there's a strip method in the code...
# Indeed doesn't have a public API, so the data we get by scraping is a little dirty
# The strip method is there for some data cleaning

for job in jobs[:5]:
    title = job.find('h2').find('a').get('title')
    company = job.find('span', attrs={'class': 'company'}).text.strip()
    
    print('\n')
    print(title)
    print(company)

# Is this useful for you?
# What else can you do with this?



Junior python developer
P3Dprinter


JavaScript Mentor/Developer
CampusUp Solutions Inc.


Node.js Developer
Coinfield


Agile Full Stack Developer Co-Op (Summer 2019) – Analysis Workspace
Splunk


Agile Software Developer Co-Op (Summer 2019) – Machine Learning
Splunk


# Afterword

This is not a technical guide on how to use the Requests and BeautifulSoup libraries. They have documentation for that. Instead, this is a small taste of what can be done by knowing the basics of HTTP requests, API usage, and web scraping.

Many things were skipped, but that's intentional. We believe the best way to learn stuff like this is by getting one's hands dirty. So go build something!
<br>
<br>

**If you want to continue learning, we encourage you to try these resources:**
- HTTP and APIs (a 9-chapter course): <https://zapier.com/learn/apis/chapter-1-introduction-to-apis/>
- `Requests + BeautifulSoup` tutorial: <https://www.digitalocean.com/community/tutorials/how-to-work-with-web-data-using-requests-and-beautiful-soup-with-python-3>