# Using `requests` Module for Data Collection

This section shows how to use the `requests` module to download the raw HTML of a webpage, which is the first step in scraping a website.  
The following sites are built for scraping practice, so they are safe to use for demos:

- https://quotes.toscrape.com/  
- https://books.toscrape.com/

---

## 1. What is `requests`?

- `requests` is a Python library used to send HTTP requests easily.  
- It allows you to fetch the content of a webpage programmatically.  
- It is commonly used as the first step before parsing HTML with BeautifulSoup.

---

## 2. Installing `requests`

To install `requests`, run:

```bash
pip install requests
```
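
To confirm that the installation worked, a quick check (a minimal sketch) is to import the library and print its version:

```python
import requests

# __version__ is a string; the exact value depends on your installation
print(requests.__version__)
```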

---

## 3. Sending a Basic GET Request

**Example:**

```python
import requests
 
url = "https://example.com"
response = requests.get(url)
 
# Print the HTML content
print(response.text)
```

**Key Points:**
- `url`: The website you want to fetch.  
- `response.text`: The HTML content of the page as a string.

---

## 4. Checking the Response Status

Always check whether the request was successful:

```python
print(response.status_code)
```

**Common Status Codes:**
- 200: OK (Success)  
- 404: Not Found  
- 403: Forbidden  
- 500: Internal Server Error  

**Good practice:**

```python
if response.status_code == 200:
    print("Page fetched successfully!")
else:
    print("Failed to fetch the page.")
```
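
When a request fails, it often helps to print what the status code means. The sketch below uses Python's standard `http.HTTPStatus` enum to look up the reason phrase; `describe_status` is a hypothetical helper name introduced for illustration, not part of `requests`.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return a readable description such as '404 Not Found (client error)'."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # The code is not one of the registered HTTP status codes
        phrase = "Unknown"

    if 200 <= code < 300:
        category = "success"
    elif 300 <= code < 400:
        category = "redirect"
    elif 400 <= code < 500:
        category = "client error"
    elif 500 <= code < 600:
        category = "server error"
    else:
        category = "informational or non-standard"

    return f"{code} {phrase} ({category})"


for code in (200, 403, 404, 500):
    print(describe_status(code))
```

With a real response, you would call it as `describe_status(response.status_code)`.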

---

## 5. Important Response Properties

| Property | Description |
|-----------|-------------|
| `response.text` | HTML content as Unicode text |
| `response.content` | Raw bytes of the response |
| `response.status_code` | HTTP status code |
| `response.headers` | Metadata like content-type, server info |
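
The difference between `response.content` and `response.text` comes down to decoding: `text` is the raw bytes decoded with `response.encoding`. The sketch below imitates that step with a hand-made byte string instead of a real response, so the sample text and variable names are illustrative assumptions.

```python
# Simulated body: the bytes a server might send for a UTF-8 encoded page.
# This plays the role of response.content.
raw_bytes = "Price: £51.77".encode("utf-8")

# Decoding with the correct encoding recovers the original text,
# which is what response.text gives when response.encoding is "utf-8".
as_utf8 = raw_bytes.decode("utf-8")

# Decoding with the wrong encoding does not raise an error here;
# it silently produces garbled characters instead.
as_latin1 = raw_bytes.decode("iso-8859-1")

print(as_utf8)    # Price: £51.77
print(as_latin1)  # Price: Â£51.77
```

If `response.text` looks garbled in this way, setting `response.encoding = "utf-8"` before reading `response.text` is one way to correct it.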

---

## 6. Adding Headers to Mimic a Browser

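Some websites block requests that look automated. By default, `requests` identifies itself with a `python-requests/<version>` User-Agent.  
Sending a browser-like User-Agent header makes the request look like it is coming from a real browser.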

```python
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
}
 
response = requests.get(url, headers=headers)
```
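
When several pages are fetched from the same site, the headers can be set once on a `requests.Session` rather than passed on every call. The following is a minimal sketch; no request is sent until `session.get` is called, and the URL in the comment is only an example.

```python
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
}

session = requests.Session()
# Headers stored on the session are sent with every request made through it
session.headers.update(headers)

print(session.headers["User-Agent"])

# Example usage (not executed here):
# response = session.get("https://quotes.toscrape.com/", timeout=5)
```

A session also reuses the underlying connection, which tends to help when requesting many pages from one host.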

---

## 7. Handling Connection Errors

Wrap your request in a try-except block to handle errors gracefully:

```python
try:
    response = requests.get(url, timeout=5)
    response.raise_for_status()  # Raises an HTTPError for 4xx and 5xx responses
    print(response.text)
except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")
```
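
To react differently to different failures, the more specific exception classes can be caught before the general `RequestException`. The sketch below wraps this in a hypothetical helper, `fetch_html`, which returns the page text or `None` on failure; the name and return convention are assumptions made for illustration.

```python
import requests


def fetch_html(url, timeout=5):
    """Return the page HTML as a string, or None if the request fails."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()
    except requests.exceptions.Timeout:
        print(f"Timed out after {timeout} seconds: {url}")
    except requests.exceptions.HTTPError as error:
        print(f"Server returned an error status: {error}")
    except requests.exceptions.ConnectionError:
        print(f"Could not connect to: {url}")
    except requests.exceptions.RequestException as error:
        # Catch-all for anything else raised by requests (for example an invalid URL)
        print(f"Request failed: {error}")
    else:
        return response.text
    return None


# Example usage (requires network access):
# html = fetch_html("https://quotes.toscrape.com/")
```

`Timeout` is listed before `ConnectionError` because a connection timeout belongs to both classes, and the first matching `except` clause wins.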

---

## 8. Best Practices for Fetching Pages

- Always check the HTTP status code.  
- Use proper headers to mimic a browser.  
- Set a timeout to avoid hanging indefinitely.  
- Respect the website by not making too many rapid requests.  
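
The practices above can be combined into one loop. This is a sketch under stated assumptions: `fetch_all`, `HEADERS`, and the one-second default delay are hypothetical choices for illustration, not values recommended by any particular site.

```python
import time

import requests

# Assumed browser-like header, reused from the earlier example
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}


def fetch_all(urls, delay=1.0, timeout=5, session=None):
    """Fetch each URL in turn and return a dict mapping URL to HTML text."""
    if session is None:
        session = requests.Session()

    pages = {}
    for index, url in enumerate(urls):
        if index > 0:
            time.sleep(delay)  # pause between requests to avoid hammering the server
        try:
            response = session.get(url, headers=HEADERS, timeout=timeout)
            response.raise_for_status()  # treat 4xx and 5xx responses as failures
        except requests.exceptions.RequestException as error:
            print(f"Skipping {url}: {error}")
            continue
        pages[url] = response.text
    return pages


# Example usage (requires network access):
# pages = fetch_all([
#     "https://quotes.toscrape.com/page/1/",
#     "https://quotes.toscrape.com/page/2/",
# ])
```

Failed URLs are skipped rather than aborting the whole run, so the returned dict may contain fewer entries than the input list.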

---

## 9. Summary

- `requests` makes it simple to fetch web pages using Python.  
- It is the starting point for most web scraping workflows.  
- Combining `requests` with BeautifulSoup allows for powerful data extraction.


In [1]:
import requests

In [2]:
url = "https://books.toscrape.com/"

In [3]:
response = requests.get(url)

In [4]:
import os

os.makedirs("htmls", exist_ok=True)  # create the output folder if it does not exist
for i in range(1, 51):
    new_response = requests.get(f"https://books.toscrape.com/catalogue/page-{i}.html", timeout=5)
    new_response.raise_for_status()  # stop on a 4xx/5xx response instead of saving an error page
    with open(f"htmls/page{i}.html", "w", encoding="utf-8") as f:
        f.write(new_response.text)
    print(f"Downloaded page {i} successfully")

Downloaded page 1 successfully
Downloaded page 2 successfully
...
Downloaded page 50 successfully

In [5]:
with open("requests_ran.flag","w") as f:
    f.write("Request File ran successfully")