Created by David Beales for the [Kelvin Smith Library](https://case.edu/library/) at [Case Western Reserve University](https://case.edu) under [Creative Commons CC BY License](https://creativecommons.org/licenses/by/4.0/)<br />
For questions/comments/improvements, email rdb104@case.edu.<br />
___

# Web Scraping: Making a Request and Receiving a Response

**Description:** This lesson introduces the basic web scraping workflow using the `requests` library for Python.  

**Use Case:** For Learners (Additional explanation, not ideal for researchers)

**Difficulty:** Beginner

**Completion time:** 15 minutes

**Knowledge Required:** Basic Python

**Knowledge Recommended:** HTML Structure

**Data Format:** `html`, `txt`, `py` 

**Libraries Used:** `requests` to send HTTP requests and inspect the responses
___

## Introduction
Welcome to your first web scraping project.  

### HTTP Requests: Sending a request and checking for a response.

HTTP is a protocol for fetching resources like HTML documents, images, video, ads (yuck), and other content that makes up a web page.  HTTP is a client-server protocol, meaning that the client, usually a web browser, initiates a request and the server sends back a response that contains the requested data.  This is where the Python package `requests` gets its name: from the request half of the request/response exchange.

Web scraping doesn't use a web browser to initiate an HTTP request. We are going to use a Python script instead.  

Before we can begin using a Python package, we have to import it.  Run the cell below to import the `requests` package.

In [3]:
import requests  # documentation: https://requests.readthedocs.io/

Now that the `requests` package has been imported, we can use the functions that are built into it. 
`requests.get` sends a `GET` request to a web address that you specify.

Try running the code below.  What response do you get?

In [None]:
requests.get('https://api.github.com/events')

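Python sent a request to the URL we specified, https://api.github.com/events in this case.  The response we got back answers the first question you need to ask when web scraping: could we connect?

If the cell displayed `<Response [200]>`, the answer is yes.  The 200 is an **HTTP response status code**, and it means the server received our request and sent back a successful response.  Excellent!

A code in the 400s or 500s means the request failed.  Codes in the 400s point to a problem with the request itself (for example, 404 means the page was not found), and codes in the 500s point to an error on the server.  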

If you want to learn more about the status codes, you can check out the developer documentation from Mozilla here: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status.
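The status code ranges described above can be sketched in a few lines of Python. This example is only an illustration: it uses the standard library's `http.HTTPStatus` table rather than `requests`, so it runs without a network connection, and `describe_status` is a hypothetical helper name, not part of any library.

```python
from http import HTTPStatus  # standard library table of HTTP status codes


def describe_status(code):
    """Return a short description such as '404 Not Found (client error)'."""
    phrase = HTTPStatus(code).phrase  # raises ValueError for an unknown code
    if code < 200:
        category = "informational"
    elif code < 300:
        category = "success"
    elif code < 400:
        category = "redirection"
    elif code < 500:
        category = "client error"
    else:
        category = "server error"
    return f"{code} {phrase} ({category})"


# Try a few common codes.
for code in (200, 404, 500):
    print(describe_status(code))
```

Running the cell prints one line per code, pairing the number with its standard phrase and the range it falls in.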

### Viewing the content of a response.

The status code is only one piece of information in the response object.  Now that we've made a successful connection to a website, we can start to look at the other information in the response object returned by our `get` request.  

The first thing we'll need to do is store the response object in a variable using the `=` so we can look at the content without repeatedly sending requests to the web server.

The code below is the same as our first `get` request, but this time the response is stored in the variable `r`.  You could name this variable anything you want.  In this example we chose `r` as a shortened version of "response".  If you change the variable name here, you'll have to change it in all the following code examples as well.  So, just leave it as `r` for now.  

When you're ready, run the code cell below.

In [4]:
r = requests.get('https://api.github.com/events')

You'll notice that this time you didn't see an HTTP status code in the output.  When you store a response in a variable, Python won't display any information unless you ask it to show a piece of the response stored there. 

If we want to check the HTTP status code again, we can take the variable and read its `.status_code` attribute.

In [5]:
r.status_code

200

## Lesson Complete

Congratulations! You have completed "Web Scraping: Making a Request and Receiving a Response." If you have never programmed in [Python](https://docs.constellate.org/key-terms/#python) before, we recommend you complete:
* *Python Basics* I
* *Python Basics* II
* *Python Basics* III

### Start Next Lesson: [Python Basics I](./python-basics-1.ipynb)