Created by David Beales for the [Kelvin Smith Library](https://case.edu/library/) at [Case Western Reserve University](https://case.edu) under [Creative Commons CC BY License](https://creativecommons.org/licenses/by/4.0/)<br />
For questions/comments/improvements, email rdb104@case.edu.<br />
___

# Web Scraping: Making a Request and Receiving a Response

**Description:** This lesson introduces the basic web scraping workflow using the `requests` library for Python.  

**Use Case:** For Learners (Additional explanation, not ideal for researchers)

**Difficulty:** Beginner

**Completion time:** 15 minutes

**Knowledge Required:** Basic Python

**Knowledge Recommended:** HTML Structure

**Data Format:** `html`, `txt`, `py` 

**Libraries Used:** `requests` for sending HTTP requests; `time` to demonstrate code cell execution
___

## Introduction
Welcome to your first web scraping project.  

### HTTP Requests: Sending a request and checking for a response.

HTTP is a protocol for fetching resources like HTML documents, images, video, ads (yuck), and other content that makes up a web page. HTTP is a client-server protocol, meaning that the client, usually a web browser, initiates a request and the server sends back a response. The Python package `requests` gets its name from the request half of this request/response exchange.
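
To make the request half of the exchange concrete, here is a minimal sketch of the plain-text message a client sends for a simple GET request under HTTP/1.1. The helper name `build_get_request` is hypothetical, and real clients such as browsers or `requests` add more headers (for example `User-Agent` and `Accept`); this only shows the basic shape.

```python
from urllib.parse import urlsplit


def build_get_request(url: str) -> str:
    """Return the text of a minimal HTTP/1.1 GET request for `url` (illustrative only)."""
    parts = urlsplit(url)
    # The request line names the method, the path on the server, and the protocol version.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {parts.netloc}",  # HTTP/1.1 requires a Host header
        "Connection: close",
        "",  # an empty line marks the end of the headers
        "",
    ]
    # HTTP separates lines with a carriage return plus a line feed.
    return "\r\n".join(lines)


print(build_get_request("https://api.github.com/events"))
```

The server answers with a status line such as `HTTP/1.1 200 OK`, followed by its own headers and then the content itself.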

In this lesson, we won't use a web browser to send HTTP requests. We are going to use a Python script instead.

Before we can begin using a Python package, we have to import it.  Run the cell below to import the `requests` package.

```python
import requests  # Documentation: https://requests.readthedocs.io/
```

Now that the `requests` package has been imported, we can use the functions built into it. `requests.get` sends an HTTP GET request to a web address that you specify. Makes sense, right?

Try running the code below.  What response do you get?

```python
requests.get('https://api.github.com/events')
```

Python sent a request to the URL we specified, https://api.github.com/events in this case, and displayed the response object it got back, which should look like `<Response [200]>`. This response answers the first question you need to ask when web scraping: could we connect?!

The 200 is one of the **HTTP response status codes**, and it means our request was successful. Excellent!

A code in the 400s means something was wrong with the request (for example, 404 Not Found), and a code in the 500s means the server ran into an error.

If you want to learn more about the status codes, you can check out the developer documentation from Mozilla here: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status.
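
The status-code ranges above can also be checked in code. Below is a minimal sketch that uses Python's built-in `http.HTTPStatus` to look up the standard reason phrase for a code; the function name `describe_status` is hypothetical. With a real response, you would pass in its `status_code` attribute, the integer code that `requests` stores on every response object.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return a human-readable description of an HTTP status code (illustrative helper)."""
    # HTTPStatus raises ValueError for codes it does not recognize.
    phrase = HTTPStatus(code).phrase
    if 100 <= code < 200:
        category = "informational"
    elif 200 <= code < 300:
        category = "success"
    elif 300 <= code < 400:
        category = "redirection"
    elif 400 <= code < 500:
        category = "client error"
    else:
        category = "server error"
    return f"{code} {phrase} ({category})"


for code in (200, 404, 500):
    print(describe_status(code))
```

The `requests` library also offers shortcuts for this: a response's `ok` attribute is `False` for codes in the 400s and 500s, and calling its `raise_for_status()` method raises an `HTTPError` for those codes.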

### Viewing the content of a response.

Now that we've made a successful connection to a website, we can start to look at the information that was returned by our `get` request.  

The first thing we'll need to do is store the results of our request in a variable so we can look at the content without repeatedly sending requests to the web server.

The code below is the same as our first `get` request, but its result is stored in the variable `r`. You could name this variable anything you wanted; in this example we chose `r` as a short version of "response." If you change the variable name here, you'll have to change it in all the following code examples as well, so just leave it as `r` for now.

When you're ready, run the code cell below.

```python
r = requests.get('https://api.github.com/events')
```

#### Try this!
* Does it matter if you use single or double quotes?
* Can you also insert a comment into the code cell?
* Can you write code and a comment on a single line? Which must come first?
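
Because the response is now stored in `r`, you can inspect it as often as you like without contacting the server again. The sketch below gathers a few of the attributes that `requests` provides on a response object (`status_code`, `headers`, and `text`) into one summary. The function `summarize_response` and its `preview_chars` parameter are hypothetical names for illustration.

```python
def summarize_response(resp, preview_chars: int = 80) -> dict:
    """Collect basic facts about a requests.Response (or any object with the same attributes)."""
    return {
        # Integer HTTP status code, e.g. 200.
        "status": resp.status_code,
        # Codes below 400 count as success, matching Response.ok for ordinary status codes.
        "ok": resp.status_code < 400,
        # Content-Type tells you what kind of data came back (HTML, JSON, an image, ...).
        "content_type": resp.headers.get("Content-Type", "unknown"),
        # The body decoded as a string; only the first few characters are kept.
        "preview": resp.text[:preview_chars],
    }
```

In the notebook you could then run `summarize_response(r)`. For the GitHub events URL the content type should be JSON, in which case `r.json()` parses the body into Python lists and dictionaries.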

After your code runs, any output appears below the cell, and a number appears in the pair of brackets `[ ]:` to the left of the code cell to show the order in which the cell was run. If your code is complicated or takes some time to execute, an asterisk `*` is displayed in the pair of brackets `[*]:` while the code executes.

Execute the code cell below which:

1. Prints "Waiting 5 seconds..."
2. Waits 5 seconds
3. Prints "Done"

While the program runs, watch the pair of brackets: `[*]:` shows that the code is still running.

```python
import time  # standard-library module for working with time

print('Waiting 5 seconds...')
time.sleep(5)  # pause the program for 5 seconds
print('Done')
```

If you missed the asterisk, you can run the code cell as many times as you like. Notice that each time you run a [code cell](https://docs.constellate.org/key-terms/#code-cell), the number in the pair of brackets `[ ]:` increases. This keeps track of the order in which cells were run. While we will always run code in order from top to bottom, keep in mind that code cells can be run in any order. If you run a code cell at the bottom of a notebook that depends on the output of a code cell at the top, you will probably get an error. When you get an error, it's a good idea to check whether you missed an earlier code cell that needed to be run first.
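
Pausing with `time.sleep` is also useful in web scraping: when you request several pages, waiting between requests avoids flooding a server with traffic. Below is a minimal sketch, assuming you pass in the function that does the fetching (for example `requests.get`). The name `fetch_politely` and its `delay` parameter are hypothetical, and an appropriate delay depends on the site you are scraping.

```python
import time


def fetch_politely(urls, fetch, delay: float = 1.0):
    """Call `fetch` on each URL in turn, sleeping `delay` seconds between calls."""
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            # Pause before every request except the first one.
            time.sleep(delay)
        results.append(fetch(url))
    return results
```

In the notebook, `fetch_politely(list_of_urls, requests.get)` would return a list of response objects, where `list_of_urls` is a placeholder for your own list of addresses.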

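### Creating a Cell

To add a new cell, click the **+** button in the toolbar, or press Esc to enter command mode and then press "b" to insert a cell below (or "a" to insert one above). By default, a new cell is a [code cell](https://docs.constellate.org/key-terms/#code-cell). To change the cell type, use the dropdown menu in the toolbar.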
![Change cell type menu](https://ithaka-labs.s3.amazonaws.com/static-files/images/tdm/tdmdocs/change_code_cell.gif)

### Deleting a Cell

![right clicking to delete cell](https://ithaka-labs.s3.amazonaws.com/static-files/images/tdm/tdmdocs/delete_cell.gif)


To delete a [cell](https://docs.constellate.org/key-terms/#cell), select the [cell](https://docs.constellate.org/key-terms/#cell) (or set of cells) and select "Delete Cells" from the "Edit" menu. (Alternatively, press Esc to enter command mode, then press "d" twice.)


### Modifying a Cell

The text in [code cells](https://docs.constellate.org/key-terms/#code-cell) can be edited directly, like a regular text box. To change the content of a [markdown cell](https://docs.constellate.org/key-terms/#markdown-cell), you need to expose the markdown underneath by double-clicking the [cell](https://docs.constellate.org/key-terms/#cell). This reveals the plain text of the markdown that creates elements like headings, links, and images. When you want the cell to render again, run it by clicking the play button or pressing Shift + Enter (Shift + Return on a Mac), which runs the cell and moves to the next one. On Windows, Ctrl + Enter also works and keeps the selection on the same cell.

## What is Markdown?

If you are familiar with HTML, you can think of markdown as a simplified way to write HTML elements: it lets you mark where headings, italics, bold, and other kinds of basic formatting go. In terms of styling, markdown is very minimalist. If you would like to include an element that markdown does not support, you can also use HTML and CSS in your [markdown cells](https://docs.constellate.org/key-terms/#markdown-cell).

# Heading

Here is some *emphasis* and **bold**.

* Item one
* Item two
* Item three

1. item one
2. item two
3. item three

This is a [link to JSTOR](http://jstor.org).

### How do I write my own Markdown?
Here are some basic examples to get you started. Double-click on this cell to see how each was made. There are many markdown [cheatsheets](https://www.markdownguide.org/basic-syntax) available on the web.

#### Headers

|Markdown Syntax|Rendered Result|
|---|---|
|`# header title`| Largest header size |
|`## header title`| Second largest header |
|`### header title`| Third largest header |

#### Emphasis

*Use asterisks around text to add emphasis, also known as italics*

_You can also use underscores_

~~A strikethrough effect is created with two tildes on each side~~

#### Lists

A list of ordered items:
1. List item 1
2. List item 2

Unordered items (marked with asterisks, plus signs, or hyphens):
* List item
* Also a list item

+ A list item
+ Another list item

- Also an item
- The last item

#### Links

This is a link to [JSTOR](http://jstor.org). 

#### Images

![Description of the image for accessibility (a JSTOR logo)](https://ithaka-labs.s3.amazonaws.com/static-files/images/tdm/tdmdocs/logoJSTOR.png)

#### Horizontal Rule

Create a horizontal rule with three hyphens, asterisks, or underscores.
____

## Lesson Complete

Congratulations! You have completed "Web Scraping: Making a Request and Receiving a Response." If you have never programmed in [Python](https://docs.constellate.org/key-terms/#python) before, we recommend you complete:
* *Python Basics I*
* *Python Basics II*
* *Python Basics III*

### Start Next Lesson: [Python Basics I](./python-basics-1.ipynb)