# Web Scraping Basics

Web scraping is a way to **automatically collect information from websites**.  
Think of it as a program that downloads a web page the way your browser does. Instead of displaying the page for you to read, the program pulls out the specific data you want.

---

## The Three Basic Steps

### a) Use `requests` to fetch pages

#### What is `requests`?

- `requests` is a third-party **Python library** that lets your code download web pages over HTTP (install it with `pip install requests`).
- When you type a web address into your browser and press Enter, the browser asks the site's server for the page and gets its HTML back.  
  `requests` lets your code send the **same kind of request** and receive the page data, without opening a browser.

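#### Example

```python
import requests

url = "https://techsabyte.com/"

# Download the page; the timeout stops the call from hanging indefinitely.
response = requests.get(url, timeout=10)
response.raise_for_status()  # raise an error for 4xx/5xx responses

print(response.text)  # the page's raw HTML as a string
```

Output:

```html
<!doctype html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <!-- <link rel="icon" type="image/svg+xml" href="/vite.svg" /> -->
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>TechsaByte</title>
    <script type="module" crossorigin src="/assets/index-C1pstMX7.js"></script>
    <link rel="stylesheet" crossorigin href="/assets/index-Dq-yZx2N.css">
  </head>
  <body>
    <div id="root"></div>
  </body>
</html>
```

The `<body>` here holds only an empty `<div id="root">` plus a JavaScript module, so this site appears to build its visible content with JavaScript in the browser. `requests` does not run JavaScript, so it receives only this empty shell.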
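Real scraping code often needs to send query parameters (the `?key=value` part of a URL) and a `User-Agent` header that identifies the scraper. A minimal sketch, assuming a hypothetical search page at `https://example.com/search` and a made-up scraper name: `requests.Request(...).prepare()` builds the request without sending it, so you can inspect the final URL offline.

```python
import requests

# Hypothetical search URL and parameters, used only for illustration.
SEARCH_URL = "https://example.com/search"
params = {"q": "web scraping", "page": 2}
# Identify your scraper; "my-scraper/0.1" is a made-up name.
headers = {"User-Agent": "my-scraper/0.1"}

# Build the request without sending it, so it can be inspected offline.
prepared = requests.Request("GET", SEARCH_URL, params=params, headers=headers).prepare()

print(prepared.method)                 # GET
print(prepared.url)                    # https://example.com/search?q=web+scraping&page=2
print(prepared.headers["User-Agent"])  # my-scraper/0.1

# To actually send it (requires network access):
# with requests.Session() as session:
#     response = session.send(prepared, timeout=10)
#     response.raise_for_status()
```

Calling `requests.get(SEARCH_URL, params=params, headers=headers, timeout=10)` sends a request to the same URL with the same `User-Agent` in a single step.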

### b) Parse HTML with BeautifulSoup

#### What is “HTML”?

- **HTML** (HyperText Markup Language) is the markup that web pages use to describe their content and structure (text, images, links, buttons, etc.), which the browser then displays.

#### What is “BeautifulSoup”?

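- **BeautifulSoup** is another **Python library** (installed with `pip install beautifulsoup4` and imported from `bs4`). It parses raw HTML, even messy HTML, into a tree of Python objects that you can search by tag name, attribute, or CSS selector to find the data you need.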

#### Example

```python
from bs4 import BeautifulSoup

html_code = "<html><body><h1>Hello!</h1></body></html>"

# "html.parser" is the parser built into Python's standard library.
soup = BeautifulSoup(html_code, "html.parser")

# soup.h1 returns the first <h1> tag; .text is the text inside it.
print(soup.h1.text)  # Hello!
```
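Beyond `soup.h1`, which only returns the first matching tag, BeautifulSoup offers search methods. A short sketch on a made-up page: `find` returns the first match, `find_all` returns every match as a list, and `select_one` takes a CSS selector (it relies on the `soupsieve` package, which is normally installed alongside `beautifulsoup4`).

```python
from bs4 import BeautifulSoup

# A small, made-up page used only for illustration.
page_html = """
<html><body>
  <div id="main">
    <h2>Latest posts</h2>
    <p class="intro">Welcome to the blog.</p>
    <p>Second paragraph.</p>
  </div>
</body></html>
"""

page_soup = BeautifulSoup(page_html, "html.parser")

heading = page_soup.find("h2")                   # first <h2> anywhere in the tree
intro = page_soup.find("p", class_="intro")      # first <p> with class "intro"
paragraphs = page_soup.find_all("p")             # every <p>, as a list
main_heading = page_soup.select_one("#main h2")  # CSS selector: <h2> inside id="main"

print(heading.get_text())       # Latest posts
print(intro.get_text())         # Welcome to the blog.
print(len(paragraphs))          # 2
print(main_heading.get_text())  # Latest posts
```

The keyword is `class_` (with a trailing underscore) because `class` is a reserved word in Python.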

### c) Extract text, links, tables, images

With **BeautifulSoup**, you can pick out specific parts of a web page:

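- **Text:** the readable words on the page, without the HTML tags
- **Links:** URLs stored in the `href` attribute of `<a>` tags
- **Tables:** data in `<table>` elements, organized into rows (`<tr>`) and cells (`<td>` or `<th>`)
- **Images:** image URLs stored in the `src` attribute of `<img>` tags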
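Tables need a little more work than links or images, because the data is spread across rows and cells. A minimal sketch, assuming a simple made-up table whose first row holds the column headers: collect the text of every cell row by row, then pair each data row with the headers.

```python
from bs4 import BeautifulSoup

# Made-up HTML table used only for illustration.
table_html = """
<table>
  <tr><th>Language</th><th>Year</th></tr>
  <tr><td>Python</td><td>1991</td></tr>
  <tr><td>Java</td><td>1995</td></tr>
</table>
"""

table_soup = BeautifulSoup(table_html, "html.parser")
table = table_soup.find("table")

rows = []
for tr in table.find_all("tr"):
    # A row's cells can be header cells (<th>) or data cells (<td>).
    cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    rows.append(cells)

# First row = column names; the remaining rows = data.
header, *data = rows
records = [dict(zip(header, row)) for row in data]
print(records)
# [{'Language': 'Python', 'Year': '1991'}, {'Language': 'Java', 'Year': '1995'}]
```

Every cell comes back as a string, so numbers such as `'1991'` still need converting (for example with `int()`) before doing arithmetic.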

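Applied to the `soup` from the previous example:

```python
# Get all text from the HTML
print(soup.get_text())  # Hello!

# Extract all links (prints nothing here: the sample HTML has no <a> tags)
for link in soup.find_all("a"):
    print(link.get("href"))

# Extract all images (prints nothing here: the sample HTML has no <img> tags)
for img in soup.find_all("img"):
    print(img.get("src"))
```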
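On real pages, link and image URLs are often relative (such as `/about`), and some `<a>` tags have no `href` at all. A sketch on a made-up page, assuming `https://example.com/blog/` as its address: `href=True` skips tags without the attribute, and the standard library's `urljoin` turns relative URLs into absolute ones. Passing a separator and `strip=True` to `get_text` also tidies the extracted text.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Made-up page and base URL, used only for illustration.
BASE_URL = "https://example.com/blog/"
blog_html = """
<html><body>
  <h1>My Blog</h1>
  <p>Read the <a href="/about">about page</a> or <a href="https://www.python.org/">visit Python</a></p>
  <img src="images/logo.png" alt="Logo">
  <a name="top"></a>
</body></html>
"""

blog_soup = BeautifulSoup(blog_html, "html.parser")

# Text: strip each piece of text, drop empty ones, join with single spaces.
text = blog_soup.get_text(" ", strip=True)

# Links: href=True keeps only <a> tags that have an href attribute;
# urljoin resolves relative paths like "/about" against BASE_URL.
links = [urljoin(BASE_URL, a["href"]) for a in blog_soup.find_all("a", href=True)]

# Images: the same idea, using the src attribute of <img> tags.
images = [urljoin(BASE_URL, img["src"]) for img in blog_soup.find_all("img", src=True)]

print(text)    # My Blog Read the about page or visit Python
print(links)   # ['https://example.com/about', 'https://www.python.org/']
print(images)  # ['https://example.com/blog/images/logo.png']
```

Absolute URLs such as `https://www.python.org/` pass through `urljoin` unchanged, so it is safe to apply to every link.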