xmlforge

A modern Python library for splitting, transforming, parsing and validating XML files.

Features

- XML Parsing: Parse XML files and strings with a simple, intuitive API
- XML Splitting: Split large XML files into smaller, manageable chunks
  - Support for ZIP archives and directory processing
  - Pattern-based file filtering
  - Recursive directory traversal
- XML Transformation: Transform XML using XSLT or custom transformations
  - Hierarchy Flattening: Flatten nested elements that share a tag name
  - Perfect for recursive structures (e.g., Product containing Product)
- XML Validation: Validate XML against XSD, DTD, and RELAX NG schemas
- Namespace Handling: Easy namespace addition and removal
- Type Hints: Full type annotations for better IDE support
- Well Tested: Comprehensive test suite with high coverage

Installation

Install xmlforge using pip:

pip install xmlforge

For development installation with all tools:

pip install "xmlforge[dev]"

Quick Start

Parsing XML

from xmlforge import XMLParser

parser = XMLParser()

# Parse from file
tree = parser.parse_file("example.xml")

# Parse from string
xml_string = '<?xml version="1.0"?><root><child>value</child></root>'
element = parser.parse_string(xml_string)

# Convert to dictionary
data = parser.to_dict(element)
print(data)  # {'child': {'@text': 'value'}}
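
To make the dictionary shape above concrete, here is a minimal sketch of an element-to-dictionary conversion written against the standard library's xml.etree.ElementTree. It is not xmlforge's implementation: the function name element_to_dict, the "@"-prefixed attribute keys, and the list handling for repeated tags are assumptions chosen to match the '@text' key shown in the example output.

```python
import xml.etree.ElementTree as ET


def element_to_dict(element):
    """Convert an element's attributes, text and children into a dict.

    Hypothetical helper: key conventions are assumptions, not xmlforge's API.
    """
    result = {}
    # Attributes are stored under "@"-prefixed keys (assumed convention).
    for name, value in element.attrib.items():
        result[f"@{name}"] = value
    # Non-blank text content goes under "@text".
    text = (element.text or "").strip()
    if text:
        result["@text"] = text
    for child in element:
        child_dict = element_to_dict(child)
        if child.tag in result:
            # A repeated tag turns the entry into a list of dicts.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(child_dict)
        else:
            result[child.tag] = child_dict
    return result


root = ET.fromstring("<root><child>value</child></root>")
print(element_to_dict(root))  # {'child': {'@text': 'value'}}
```

The root tag itself is not part of the result, which is why the output starts at 'child'.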

Splitting Large XML Files

from xmlforge import XMLSplitter

splitter = XMLSplitter(target_tag="item", chunk_size=1000)

# Split single file
splitter.split_file("large_file.xml", output_dir="chunks/")

# Split all XML files in a ZIP archive
splitter.split_file("archive.zip", output_dir="chunks/")

# Split files in directory with pattern matching
splitter = XMLSplitter(
    target_tag="product",
    chunk_size=500,
    pattern="*.xml",
    recursive=True
)
splitter.split_file("data_directory/", output_dir="processed/")

# Or iterate over chunks
for chunk in splitter.split_file("large_file.xml"):
    # Process each chunk
    print(f"Processing chunk with {len(chunk)} items")
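
For readers curious how chunked splitting can work without loading a whole file, the following is a rough sketch using the standard library's iterparse. It is an illustration only and not how XMLSplitter is implemented; iter_chunks and its parameters are hypothetical names, and the sketch assumes the target elements are not nested inside each other.

```python
import io
import xml.etree.ElementTree as ET


def iter_chunks(source, target_tag, chunk_size):
    """Yield lists of up to `chunk_size` serialized `target_tag` elements."""
    chunk = []
    # iterparse reports an "end" event once an element has been read
    # completely, so elements can be handled one at a time.
    for _event, element in ET.iterparse(source, events=("end",)):
        if element.tag != target_tag:
            continue
        # Serialize first, then clear the element to release its content.
        chunk.append(ET.tostring(element, encoding="unicode"))
        element.clear()
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        # Emit the final, possibly shorter, chunk.
        yield chunk


data = io.BytesIO(b"<items><item>a</item><item>b</item><item>c</item></items>")
for chunk in iter_chunks(data, "item", chunk_size=2):
    print(len(chunk))  # prints 2, then 1
```

Yielding from a generator mirrors the "iterate over chunks" usage above: the caller decides whether to write each chunk to disk or process it in memory.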

Transforming XML

from xmlforge import XMLTransformer

# Transform with XSLT
transformer = XMLTransformer(xslt_file="transform.xslt")
result = transformer.transform_file("input.xml", output_file="output.xml")

# Flatten nested elements with same tag names
transformer = XMLTransformer()

# Example: Flatten recursive Product structures
# Input:  <Product><Product><Product>...</Product></Product></Product>
# Output: <Product id="1"><Product id="2" parent_id="1"><Product id="3" parent_id="2">
from lxml import etree

# nested_xml is assumed to hold the nested <Product> markup as a string
element = etree.fromstring(nested_xml)
flattened_elements = transformer.flatten_hierarchy(element, "Product")

# Rebuild hierarchy from flattened elements
rebuilt = transformer.rebuild_hierarchy(flattened_elements, "Product")

# Remove namespaces (parser is the XMLParser instance created under "Parsing XML")
element = parser.parse_string(xml_with_namespaces)
clean_element = transformer.remove_namespace(element)
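
The flattening and namespace-removal ideas above can be sketched with the standard library alone. This is a hedged illustration, not the XMLTransformer code: flatten_same_tag and strip_namespaces are hypothetical names, the generated sequential "id" and "parent_id" values are an assumption based on the example comment, and nested elements are assumed to be direct children of their parent.

```python
import copy
import xml.etree.ElementTree as ET


def flatten_same_tag(root, tag):
    """Return a flat list of copies of `root` and its nested `tag` children.

    Each copy gets a generated "id" (overwriting any existing one) and,
    except for the outermost element, a "parent_id".
    """
    flat = []

    def visit(element, parent_id):
        current_id = str(len(flat) + 1)
        clone = ET.Element(element.tag, dict(element.attrib))
        clone.text = element.text
        clone.set("id", current_id)
        if parent_id is not None:
            clone.set("parent_id", parent_id)
        # Children with other tags stay attached to the copy.
        clone.extend([copy.deepcopy(c) for c in element if c.tag != tag])
        flat.append(clone)
        # Same-tag children become separate entries in the flat list.
        for child in element:
            if child.tag == tag:
                visit(child, current_id)

    visit(root, None)
    return flat


def strip_namespaces(root):
    """Drop "{uri}" prefixes from tag and attribute names, in place."""
    for element in root.iter():
        if element.tag.startswith("{"):
            element.tag = element.tag.split("}", 1)[1]
        for name in list(element.attrib):
            if name.startswith("{"):
                element.attrib[name.split("}", 1)[1]] = element.attrib.pop(name)
    return root


nested = ET.fromstring("<Product><Product><Product/></Product></Product>")
for product in flatten_same_tag(nested, "Product"):
    print(product.get("id"), product.get("parent_id"))
```

Because each flattened element records its parent's id, the original nesting can be rebuilt later by attaching every element to the one whose "id" matches its "parent_id".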

Validating XML

# ValidationError is assumed to be importable from the package root
from xmlforge import XMLValidator, ValidationError

validator = XMLValidator()

# Validate against XSD
try:
    validator.validate_with_xsd("document.xml", "schema.xsd")
    print("Valid!")
except ValidationError as e:
    print(f"Validation failed: {e}")

# Check if well-formed
is_valid = validator.is_well_formed("document.xml")
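
Well-formedness is a weaker guarantee than schema validity: it only means the document parses. As a small sketch of that distinction, the check below uses the standard library parser. It is an assumption-laden stand-in rather than xmlforge's validator, and unlike the call above it takes the XML text itself instead of a file path.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if `xml_text` parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        # Raised for mismatched tags, stray characters and similar errors.
        return False
    return True


print(is_well_formed("<a><b/></a>"))   # True
print(is_well_formed("<a><b></a>"))    # False
```

A document can pass this check and still fail XSD, DTD, or RELAX NG validation, since those also constrain which elements and attributes may appear.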

Requirements

- Python 3.9+
- lxml >= 4.9.0

Development

Setting Up Development Environment

Clone the repository:

git clone https://github.com/DonColon/xmlforge.git
cd xmlforge

Create a virtual environment and install dependencies:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -e ".[dev]"

Install pre-commit hooks:
```
pre-commit install
```

Running Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=xmlforge

# Run specific test file
pytest tests/test_parser.py

Code Quality

# Format code
black src tests
isort src tests

# Lint code
flake8 src tests

# Type checking
mypy src

# Run all checks with tox
tox

Project Structure

xmlforge/
├── src/
│   └── xmlforge/
│       ├── __init__.py
│       ├── parser.py
│       ├── splitter.py
│       ├── transformer.py
│       └── validator.py
├── tests/
│   ├── __init__.py
│   ├── test_parser.py
│   ├── test_splitter.py
│   ├── test_transformer.py
│   └── test_validator.py
├── docs/
├── examples/
├── .github/
│   └── workflows/
├── .editorconfig
├── .flake8
├── .gitignore
├── .pre-commit-config.yaml
├── CHANGELOG.md
├── CONTRIBUTING.md
├── LICENSE
├── MANIFEST.in
├── Makefile
├── README.md
├── pyproject.toml
├── requirements.txt
├── requirements-dev.txt
├── setup.py
└── tox.ini

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Changelog

See CHANGELOG.md for a list of changes.

Support

Issues: GitHub Issues
Documentation: GitHub Repository

Acknowledgments

Built with lxml - a powerful and Pythonic XML processing library.
