# HTML Tag Validator

## Problem Description

You're building an HTML validator that checks if HTML tags are properly nested and closed. Given a string containing HTML code, determine if the HTML structure is valid.

For this problem:
1. We only care about the HTML tags (elements enclosed in angle brackets like `<div>`, `</p>`, etc.)
2. Every opening tag must have a corresponding closing tag of the same type
3. Tags must be properly nested - if tag2 opens inside tag1, then tag2 must close before tag1 closes
4. Self-closing tags like `<img />` or `<br/>` are valid on their own and need no separate closing tag

**Examples:**
- `"<div><p>Hello</p></div>"` → `true` (properly nested and closed)
- `"<div><p>Hello</div></p>"` → `false` (improperly nested)
- `"<div>Hello"` → `false` (unclosed tag)
- `"</div>"` → `false` (closing tag without opening tag)
- `"<div><br/><p>Text</p></div>"` → `true` (valid with self-closing tag)

## Approach: Using a Stack

This problem resembles the classic parentheses validator, except that we match named tags, which may carry attributes, instead of single bracket characters. We'll use a stack to track the open tags and ensure they're closed in the correct order:

1. Parse the HTML string to identify tags
2. When we find an opening tag, push it onto a stack
3. When we find a closing tag, check if it matches the most recent opening tag on the stack
4. When we find a self-closing tag, skip it, since it opens and closes in one step and never goes on the stack
5. After processing the entire string, check if the stack is empty (all tags properly closed)
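
For comparison, here is a minimal sketch of the parentheses validator that this approach builds on. The function name `is_balanced` is a hypothetical choice for illustration. The HTML version follows the same push, compare, and pop pattern, with tag names in place of bracket characters.

```python
def is_balanced(text):
    """Return True if (), [] and {} in text are balanced and properly nested."""
    pairs = {')': '(', ']': '[', '}': '{'}  # closer -> expected opener
    openers = set(pairs.values())
    stack = []
    for ch in text:
        if ch in openers:
            stack.append(ch)
        elif ch in pairs:
            # A closer must match the most recently opened bracket
            if not stack or stack.pop() != pairs[ch]:
                return False
    # Anything left on the stack was never closed
    return not stack

print(is_balanced("([]{})"))  # True
print(is_balanced("([)]"))    # False
```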

Let's implement this solution!

In [None]:
import re

def is_valid_html(html):
    """
    Check whether HTML tags are properly nested and closed.

    Args:
        html (str): A string containing HTML code

    Returns:
        bool: True if the HTML structure is valid, False otherwise
    """
    # Stack to store the names of tags that are currently open
    stack = []

    # Groups: (1) closing slash, (2) tag name, (3) attributes, (4) self-closing slash.
    # Group 3 lets tags with attributes, such as <div class='x'>, match as well.
    tag_pattern = r'<(/?)([a-zA-Z][a-zA-Z0-9]*)([^>]*?)(/?)>'

    for match in re.finditer(tag_pattern, html):
        closing_slash, tag_name, _attributes, self_closing_slash = match.groups()

        # Check if it's a self-closing tag (like <br/>, <img src='x'/>)
        if self_closing_slash:
            continue

        # Check if it's a closing tag
        elif closing_slash:
            # If stack is empty, we have a closing tag without an opening tag
            if not stack:
                return False

            # Compare with the last opening tag
            if stack[-1] != tag_name:
                return False

            # Valid closing tag, pop from stack
            stack.pop()

        # It's an opening tag
        else:
            # Push the tag name (attributes are ignored)
            stack.append(tag_name)

    # If stack is empty, all tags were properly closed
    return len(stack) == 0

In [None]:
# Test cases
test_cases = [
    ("<div><p>Hello</p></div>", True),                           # Basic nested tags
    ("<div><p>Hello</div></p>", False),                          # Improperly nested
    ("<div>Hello", False),                                       # Unclosed tag
    ("</div>", False),                                           # Closing tag without opening
    ("<div><br/><p>Text</p></div>", True),                       # With self-closing tag
    ("<div class='container'><p>Content</p></div>", True),       # Tag with attributes
    ("<div><p><span>Deeply</span> nested</p></div>", True),      # Multiple nested levels
    ("<div><p></p><p></p></div>", True),                         # Multiple sibling tags
    ("<div><img src='image.jpg'/><p>Caption</p></div>", True),   # Self-closing with attributes
    ("<div><p>Unclosed paragraph</div>", False),                 # Missing closing tag
    ("This is <b>bold</b> and <i>italic</i> text", True),        # Text with formatting
    ("<h1>Title</h2>", False),                                   # Mismatched tags
    ("<div><p attr='value' /></div>", True)                      # Self-closing with a space before />
]

# Run the test cases
for i, (html, expected) in enumerate(test_cases):
    result = is_valid_html(html)
    status = "✅ PASS" if result == expected else "❌ FAIL"
    print(f"Test #{i+1}: Expected: {expected}, Got: {result} {status}")
    if result != expected:
        print(f"  HTML: {html}")
    print("")

## Explanation of the Solution

The HTML tag validator uses a **stack-based approach** similar to the parentheses matching problem but with more complexity since we need to handle HTML tag names and attributes.

### Key Components:

1. **Regular Expression Pattern**:
   - The pattern `r'<(/?)([a-zA-Z][a-zA-Z0-9]*)([^>]*?)(/?)>'` captures different parts of HTML tags
   - Group 1 `(/?)`: Captures the forward slash if it's a closing tag
   - Group 2 `([a-zA-Z][a-zA-Z0-9]*)`: Captures the tag name
   - Group 3 `([^>]*?)`: Captures any attributes 
   - Group 4 `(/?)`: Captures the self-closing forward slash if present

2. **Stack Operations**:
   - When an opening tag is found, we push its name onto the stack
   - When a closing tag is found, we check if it matches the tag at the top of the stack
   - Self-closing tags are automatically valid and don't need to be pushed onto the stack

3. **Validation Rules**:
   - All opening tags must have matching closing tags
   - Tags must be properly nested (last opened, first closed)
   - Closing tags without matching opening tags are invalid
   - After processing all tags, the stack should be empty
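
To make the four capture groups concrete, the sketch below applies the pattern from item 1 to a small input and labels each tag. The helper name `describe_tags` and the sample string are hypothetical, chosen only for illustration.

```python
import re

# Pattern from item 1: closing slash, tag name, attributes, self-closing slash
TAG_PATTERN = re.compile(r'<(/?)([a-zA-Z][a-zA-Z0-9]*)([^>]*?)(/?)>')

def describe_tags(html):
    """Return a list of (kind, name) pairs; kind is 'open', 'close' or 'self'."""
    result = []
    for match in TAG_PATTERN.finditer(html):
        closing_slash, name, _attrs, self_closing_slash = match.groups()
        if self_closing_slash:
            kind = 'self'
        elif closing_slash:
            kind = 'close'
        else:
            kind = 'open'
        result.append((kind, name))
    return result

print(describe_tags("<div class='box'><img src='a.png'/><p>Hi</p></div>"))
# [('open', 'div'), ('self', 'img'), ('open', 'p'), ('close', 'p'), ('close', 'div')]
```

Group 3 is lazy (`*?`), so it stops before a trailing `/` and leaves that slash for group 4. Because group 3 excludes `>`, an attribute value that itself contains `>` would end the match early; this sketch assumes such input does not occur.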

### Time and Space Complexity:

- **Time Complexity**: O(n) where n is the length of the HTML string
  - We iterate through the HTML string once to find all the tags
  - Each stack operation (push/pop) is O(1)

- **Space Complexity**: O(m) where m is the maximum nesting depth of tags
  - The stack holds only the tags that are currently open; in the worst case (no tag is ever closed) that is every opening tag in the input
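
As a rough illustration of the space bound, the following sketch computes the maximum nesting depth m for an input. The helper name `max_nesting_depth` is hypothetical, and the function assumes the input is already validly nested. It shows that m can be much smaller than the total number of tags.

```python
import re

TAG_PATTERN = re.compile(r'<(/?)([a-zA-Z][a-zA-Z0-9]*)([^>]*?)(/?)>')

def max_nesting_depth(html):
    """Return the largest number of tags open at once (assumes valid nesting)."""
    depth = 0
    deepest = 0
    for match in TAG_PATTERN.finditer(html):
        closing_slash, _name, _attrs, self_closing_slash = match.groups()
        if self_closing_slash:
            continue  # self-closing tags never occupy stack space
        if closing_slash:
            depth -= 1
        else:
            depth += 1
            deepest = max(deepest, depth)
    return deepest

print(max_nesting_depth("<div><p><span>Deeply</span> nested</p></div>"))  # 3
print(max_nesting_depth("<div><p></p><p></p></div>"))                     # 2
```

The second input contains three opening tags, yet the stack never holds more than two names at a time.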

### Edge Cases Handled:

- Self-closing tags
- Tags with attributes
- Unclosed tags
- Improperly nested tags
- Closing tags without opening tags
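
To see how the stack catches one of these cases, the sketch below records the stack after each tag and stops at the first mismatch. The helper name `trace_tags` is hypothetical; it follows the same stack logic described above, with tracing added for illustration.

```python
import re

TAG_PATTERN = re.compile(r'<(/?)([a-zA-Z][a-zA-Z0-9]*)([^>]*?)(/?)>')

def trace_tags(html):
    """Return (is_valid, steps); steps pairs each tag with the stack after it."""
    stack = []
    steps = []
    for match in TAG_PATTERN.finditer(html):
        closing_slash, name, _attrs, self_closing_slash = match.groups()
        if self_closing_slash:
            pass  # self-closing: stack unchanged
        elif closing_slash:
            if not stack or stack[-1] != name:
                steps.append((match.group(0), "mismatch"))
                return False, steps
            stack.pop()
        else:
            stack.append(name)
        steps.append((match.group(0), list(stack)))
    return not stack, steps

valid, steps = trace_tags("<div><p>Hello</div></p>")
for tag, state in steps:
    print(tag, state)
print(valid)
# <div> ['div']
# <p> ['div', 'p']
# </div> mismatch
# False
```

The closing `</div>` arrives while `p` is still on top of the stack, which is exactly the improper nesting that rule 3 forbids.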

This solution demonstrates how stack data structures are particularly useful for validating nested structures like HTML, XML, or JSON.