A Python tool for converting HTML files to XML format with specific tag extraction.
- Extracts paragraphs containing annotation tags
- Supports different encodings
- Pretty print XML output
- Auto-generated output filenames
To get started, clone the repository and install the required dependencies:
git clone <repository-url>
cd html-to-xml-parser
pip install -r requirements.txt
python src/main.py <html_file> [output_file]
To ensure everything is working correctly, you can run the unit tests included in the project:
python -m unittest discover -s tests
Contributions are welcome! Please feel free to submit a pull request or open an issue for any enhancements or bug fixes.
This project is licensed under the MIT License. See the LICENSE file for more details.