A modern Python library for splitting, transforming, parsing and validating XML files.
- XML Parsing: Parse XML files and strings with a simple, intuitive API
- XML Splitting: Split large XML files into smaller, manageable chunks
- Support for ZIP archives and directory processing
- Pattern-based file filtering
- Recursive directory traversal
- XML Transformation: Transform XML using XSLT or custom transformations
- Hierarchy Flattening: Flatten nested elements with same tag names
- Perfect for recursive structures (e.g., Product containing Product)
- XML Validation: Validate XML against XSD, DTD, and RelaxNG schemas
- Namespace Handling: Easy namespace addition and removal
- Type Hints: Full type annotations for better IDE support
- Well Tested: Comprehensive test suite with high coverage
Install xmlforge using pip:
pip install xmlforge
For development installation with all tools:
pip install xmlforge[dev]
from xmlforge import XMLParser
parser = XMLParser()
# Parse from file
tree = parser.parse_file("example.xml")
# Parse from string
xml_string = '<?xml version="1.0"?><root><child>value</child></root>'
element = parser.parse_string(xml_string)
# Convert to dictionary
data = parser.to_dict(element)
print(data) # {'child': {'@text': 'value'}}
from xmlforge import XMLSplitter
splitter = XMLSplitter(target_tag="item", chunk_size=1000)
# Split single file
splitter.split_file("large_file.xml", output_dir="chunks/")
# Split all XML files in a ZIP archive
splitter.split_file("archive.zip", output_dir="chunks/")
# Split files in directory with pattern matching
splitter = XMLSplitter(
target_tag="product",
chunk_size=500,
pattern="*.xml",
recursive=True
)
splitter.split_file("data_directory/", output_dir="processed/")
# Or iterate over chunks
for chunk in splitter.split_file("large_file.xml"):
# Process each chunk
print(f"Processing chunk with {len(chunk)} items")
from xmlforge import XMLTransformer
# Transform with XSLT
transformer = XMLTransformer(xslt_file="transform.xslt")
result = transformer.transform_file("input.xml", output_file="output.xml")
# Flatten nested elements with same tag names
transformer = XMLTransformer()
# Example: Flatten recursive Product structures
# Input: <Product><Product><Product>...</Product></Product></Product>
# Output: <Product id="1"><Product id="2" parent_id="1"><Product id="3" parent_id="2">
from lxml import etree
element = etree.fromstring(nested_xml)
flattened_elements = transformer.flatten_hierarchy(element, "Product")
# Rebuild hierarchy from flattened elements
rebuilt = transformer.rebuild_hierarchy(flattened_elements, "Product")
# Remove namespaces
element = parser.parse_string(xml_with_namespaces)
clean_element = transformer.remove_namespace(element)
from xmlforge import XMLValidator
validator = XMLValidator()
# Validate against XSD
try:
validator.validate_with_xsd("document.xml", "schema.xsd")
print("Valid!")
except ValidationError as e:
print(f"Validation failed: {e}")
# Check if well-formed
is_valid = validator.is_well_formed("document.xml")
- Python 3.9+
- lxml >= 4.9.0
-
Clone the repository:
git clone https://github.com/DonColon/xmlforge.git cd xmlforge
-
Create a virtual environment and install dependencies:
python -m venv venv source venv/bin/activate # On Windows: venv\Scripts\activate pip install -e ".[dev]"
-
Install pre-commit hooks:
pre-commit install
# Run all tests
pytest
# Run with coverage
pytest --cov=xmlforge
# Run specific test file
pytest tests/test_parser.py
# Format code
black src tests
isort src tests
# Lint code
flake8 src tests
# Type checking
mypy src
# Run all checks with tox
tox
xmlforge/
├── src/
│ └── xmlforge/
│ ├── __init__.py
│ ├── parser.py
│ ├── splitter.py
│ ├── transformer.py
│ └── validator.py
├── tests/
│ ├── __init__.py
│ ├── test_parser.py
│ ├── test_splitter.py
│ ├── test_transformer.py
│ └── test_validator.py
├── docs/
├── examples/
├── .github/
│ └── workflows/
├── .editorconfig
├── .flake8
├── .gitignore
├── .pre-commit-config.yaml
├── CHANGELOG.md
├── CONTRIBUTING.md
├── LICENSE
├── MANIFEST.in
├── README.md
├── pyproject.toml
├── requirements.txt
├── requirements-dev.txt
├── setup.py
└── tox.ini
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
See CHANGELOG.md for a list of changes.
- Issues: GitHub Issues
- Documentation: GitHub Repository
Built with lxml - a powerful and Pythonic XML processing library.