This Python web scraper fetches a webpage and outputs the entire HTML structure as formatted JSON.
It recursively extracts every element, tag, attribute, and text node, providing a faithful JSON representation of the DOM tree.
- Fetches a page from a given URL
- Recursively parses all HTML elements, attributes, and text
- Outputs structured JSON representing the full DOM
- Handles errors gracefully
- Python 3.6+
- Required libraries:
  - requests
  - beautifulsoup4
Install dependencies with:
pip3 install requests beautifulsoup4
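
The recursive extraction described above could be sketched roughly as follows. This is a minimal illustration, not the scraper's actual source: the function names `node_to_dict` and `page_to_json`, the JSON key names (`type`, `tag`, `attributes`, `children`, `content`), the `html.parser` backend, and the 10-second timeout are all assumptions made for the example.

```python
import json
import sys

import requests
from bs4 import BeautifulSoup
from bs4.element import Comment, Doctype, NavigableString, Tag


def node_to_dict(node):
    """Convert a BeautifulSoup node into a JSON-serializable structure.

    Hypothetical helper: the key names used here are illustrative only.
    Returns None for nodes that are skipped (comments, doctype, blank text).
    """
    if isinstance(node, (Comment, Doctype)):
        return None
    if isinstance(node, NavigableString):
        text = str(node).strip()
        return {"type": "text", "content": text} if text else None
    if isinstance(node, Tag):
        # Recurse into every child and drop the ones that were skipped.
        children = [node_to_dict(child) for child in node.children]
        return {
            "type": "element",
            "tag": node.name,
            "attributes": dict(node.attrs),
            "children": [child for child in children if child is not None],
        }
    return None


def page_to_json(url):
    """Fetch a URL and return its DOM as a formatted JSON string.

    Network and HTTP errors are reported as a JSON object with an
    "error" key instead of raising (one possible reading of
    "handles errors gracefully").
    """
    try:
        response = requests.get(url, timeout=10)  # timeout is an assumed value
        response.raise_for_status()
    except requests.exceptions.RequestException as exc:
        return json.dumps({"error": str(exc)}, indent=2)

    soup = BeautifulSoup(response.text, "html.parser")
    root = soup.find("html") or soup  # fall back to the whole document
    return json.dumps(node_to_dict(root), indent=2, ensure_ascii=False)


if __name__ == "__main__" and len(sys.argv) > 1:
    print(page_to_json(sys.argv[1]))
```

Saved under a hypothetical name such as `dom_to_json.py`, the sketch would be run as `python3 dom_to_json.py https://example.com`. Multi-valued attributes such as `class` come back from BeautifulSoup as lists, so they appear as JSON arrays in the output.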