# Introduction to HTML for Web Scraping

To effectively scrape websites, it's essential to understand how HTML (HyperText Markup Language) structures content on web pages.

---

## 1. Basic Structure of an HTML Document

```html
<!DOCTYPE html>
<html>
  <head>
    <title>Page Title</title>
  </head>
  <body>
    <h1>This is a heading</h1>
    <p>This is a paragraph.</p>
  </body>
</html>
```

Breakdown:
- `<!DOCTYPE html>`: Declares the document as HTML5.
- `<html>`: Root of the HTML document.
- `<head>`: Metadata like title, styles, scripts.
- `<body>`: Main content of the page.
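
To see this nesting from a scraper's point of view, here is a minimal sketch using Python's standard-library `html.parser`. The names `OutlineParser` and `outline_of` are made up for this illustration, and the depth tracking assumes every tag is explicitly closed, as in the document above.

```python
from html.parser import HTMLParser

HTML_DOC = """<!DOCTYPE html>
<html>
  <head>
    <title>Page Title</title>
  </head>
  <body>
    <h1>This is a heading</h1>
    <p>This is a paragraph.</p>
  </body>
</html>"""


class OutlineParser(HTMLParser):
    """Records each opening tag together with its nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.outline = []  # list of (depth, tag) tuples

    def handle_starttag(self, tag, attrs):
        self.outline.append((self.depth, tag))
        self.depth += 1

    def handle_endtag(self, tag):
        # Assumes well-formed input where every opened tag is closed.
        self.depth -= 1


def outline_of(html):
    """Return the (depth, tag) outline of an HTML string."""
    parser = OutlineParser()
    parser.feed(html)
    parser.close()
    return parser.outline


# Print the tags indented by depth: html, then head and body inside it, and so on.
for depth, tag in outline_of(HTML_DOC):
    print("  " * depth + tag)
```

The `<!DOCTYPE html>` line does not appear in the outline because the parser treats it as a declaration rather than an element.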


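## 2. HTML Tags
Tags mark where an element begins and ends. Most come in pairs, such as `<p>` and `</p>`; a few, such as `<img>`, stand alone and have no closing tag. Some common tags: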

| Tag            | Description               |
| -------------- | ------------------------- |
| `<h1>`         | Heading (levels h1 to h6) |
| `<p>`          | Paragraph                 |
| `<a>`          | Hyperlink                 |
| `<div>`        | Division/container        |
| `<span>`       | Inline container          |
| `<ul>`, `<li>` | List and List Items       |
| `<img>`        | Image                     |
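
As a rough illustration of why tag names matter for scraping, the sketch below groups the text of a small page by the tag that directly contains it. It uses only Python's standard-library `html.parser`; the sample markup, `TextByTag`, and `text_by_tag` are hypothetical names chosen for this example, and the input is assumed to be well-formed.

```python
from html.parser import HTMLParser

SAMPLE = """
<h1>Fruits</h1>
<p>Some fruits:</p>
<ul>
  <li>Apple</li>
  <li>Pear</li>
</ul>
"""


class TextByTag(HTMLParser):
    """Collects non-empty text, keyed by the innermost open tag."""

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open tags, innermost last
        self.texts = {}  # tag name -> list of text strings

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Only pop when the closing tag matches the innermost open tag.
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            self.texts.setdefault(self.stack[-1], []).append(text)


def text_by_tag(html):
    """Return a dict mapping each tag name to the text found directly inside it."""
    parser = TextByTag()
    parser.feed(html)
    parser.close()
    return parser.texts


result = text_by_tag(SAMPLE)
print(result["li"])  # the text of every list item
```

Libraries built for scraping offer more convenient lookups, but the idea is the same: you locate content by the tag that wraps it.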




## 3. Attributes
Attributes provide additional information about an element. They are written inside the opening tag as `name="value"` pairs.

```html
<a href="https://example.com" class="link">Click me</a>
<img src="image.jpg" alt="An image">
```

Common Attributes:
- `href`: Link destination
- `src`: Source (e.g., for images)
- `alt`: Alternate text
- `class`: Classification for styling or scraping
- `id`: Unique identifier
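
The following sketch shows one way to read these attributes programmatically, again with the standard-library `html.parser`. `AttributeCollector` and `collect_attributes` are illustrative names, not part of any library.

```python
from html.parser import HTMLParser

SNIPPET = """<a href="https://example.com" class="link">Click me</a>
<img src="image.jpg" alt="An image">"""


class AttributeCollector(HTMLParser):
    """Stores every opening tag with its attributes as a dict."""

    def __init__(self):
        super().__init__()
        self.elements = []  # list of (tag, {attribute: value}) tuples

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        self.elements.append((tag, dict(attrs)))


def collect_attributes(html):
    """Return (tag, attributes) for each opening tag in the HTML string."""
    parser = AttributeCollector()
    parser.feed(html)
    parser.close()
    return parser.elements


for tag, attributes in collect_attributes(SNIPPET):
    print(tag, attributes)

# Pull out just the link destination of the first <a> element.
link = next(attrs["href"] for tag, attrs in collect_attributes(SNIPPET) if tag == "a")
print(link)
```

In practice, `href` and `src` are the attributes a scraper reads most often, while `class` and `id` are typically used to find the right element in the first place.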

### Practice Exercise
Inspect this simple HTML snippet and answer the questions below:

```html
<div class="quote">
  <span class="text">"Be yourself; everyone else is already taken."</span>
  <span><small class="author">Oscar Wilde</small></span>
  <div class="tags">
    <a class="tag">inspirational</a>
    <a class="tag">life</a>
    <a class="tag">humor</a>
  </div>
</div>
```

#### Questions:
1. What tag contains the quote text?
2. Which class can be used to find the author's name?
3. How many tags are associated with this quote?
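
Once you have answered the questions by hand, you can check your answers with a short script. This is a minimal sketch using the standard-library `html.parser`; `ClassTextCollector` and `texts_with_class` are hypothetical helpers written for this exercise, and they assume each element carries at most one class and that all tags are closed.

```python
from html.parser import HTMLParser

QUOTE_HTML = '''<div class="quote">
  <span class="text">"Be yourself; everyone else is already taken."</span>
  <span><small class="author">Oscar Wilde</small></span>
  <div class="tags">
    <a class="tag">inspirational</a>
    <a class="tag">life</a>
    <a class="tag">humor</a>
  </div>
</div>'''


class ClassTextCollector(HTMLParser):
    """Records (tag, class, text) for every piece of non-empty text."""

    def __init__(self):
        super().__init__()
        self.stack = []  # (tag, class) for each open element, innermost last
        self.found = []  # (tag, class, text) tuples

    def handle_starttag(self, tag, attrs):
        self.stack.append((tag, dict(attrs).get("class")))

    def handle_endtag(self, tag):
        # Assumes well-formed input where every opened tag is closed.
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            tag, cls = self.stack[-1]
            self.found.append((tag, cls, text))


def texts_with_class(html, cls):
    """Return the text of every element whose class attribute equals cls."""
    parser = ClassTextCollector()
    parser.feed(html)
    parser.close()
    return [text for _, found_cls, text in parser.found if found_cls == cls]


# Try your answers: substitute the class names you identified.
print(texts_with_class(QUOTE_HTML, "author"))
print(len(texts_with_class(QUOTE_HTML, "tag")))
```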