Skip to content

DonColon/xmlforge

Repository files navigation

xmlforge

Python Version License

A modern Python library for splitting, transforming, parsing and validating XML files.

Features

  • XML Parsing: Parse XML files and strings with a simple, intuitive API
  • XML Splitting: Split large XML files into smaller, manageable chunks
    • Support for ZIP archives and directory processing
    • Pattern-based file filtering
    • Recursive directory traversal
  • XML Transformation: Transform XML using XSLT or custom transformations
    • Hierarchy Flattening: Flatten nested elements with same tag names
    • Perfect for recursive structures (e.g., Product containing Product)
  • XML Validation: Validate XML against XSD, DTD, and RelaxNG schemas
  • Namespace Handling: Easy namespace addition and removal
  • Type Hints: Full type annotations for better IDE support
  • Well Tested: Comprehensive test suite with high coverage

Installation

Install xmlforge using pip:

pip install xmlforge

For development installation with all tools:

pip install xmlforge[dev]

Quick Start

Parsing XML

from xmlforge import XMLParser

parser = XMLParser()

# Parse from file
tree = parser.parse_file("example.xml")

# Parse from string
xml_string = '<?xml version="1.0"?><root><child>value</child></root>'
element = parser.parse_string(xml_string)

# Convert to dictionary
data = parser.to_dict(element)
print(data)  # {'child': {'@text': 'value'}}

Splitting Large XML Files

from xmlforge import XMLSplitter

splitter = XMLSplitter(target_tag="item", chunk_size=1000)

# Split single file
splitter.split_file("large_file.xml", output_dir="chunks/")

# Split all XML files in a ZIP archive
splitter.split_file("archive.zip", output_dir="chunks/")

# Split files in directory with pattern matching
splitter = XMLSplitter(
    target_tag="product", 
    chunk_size=500,
    pattern="*.xml",
    recursive=True
)
splitter.split_file("data_directory/", output_dir="processed/")

# Or iterate over chunks
for chunk in splitter.split_file("large_file.xml"):
    # Process each chunk
    print(f"Processing chunk with {len(chunk)} items")

Transforming XML

from xmlforge import XMLTransformer

# Transform with XSLT
transformer = XMLTransformer(xslt_file="transform.xslt")
result = transformer.transform_file("input.xml", output_file="output.xml")

# Flatten nested elements with same tag names
transformer = XMLTransformer()

# Example: Flatten recursive Product structures
# Input:  <Product><Product><Product>...</Product></Product></Product>
# Output: <Product id="1"><Product id="2" parent_id="1"><Product id="3" parent_id="2">
from lxml import etree
element = etree.fromstring(nested_xml)
flattened_elements = transformer.flatten_hierarchy(element, "Product")

# Rebuild hierarchy from flattened elements
rebuilt = transformer.rebuild_hierarchy(flattened_elements, "Product")

# Remove namespaces
element = parser.parse_string(xml_with_namespaces)
clean_element = transformer.remove_namespace(element)

Validating XML

from xmlforge import XMLValidator

validator = XMLValidator()

# Validate against XSD
try:
    validator.validate_with_xsd("document.xml", "schema.xsd")
    print("Valid!")
except ValidationError as e:
    print(f"Validation failed: {e}")

# Check if well-formed
is_valid = validator.is_well_formed("document.xml")

Requirements

  • Python 3.9+
  • lxml >= 4.9.0

Development

Setting Up Development Environment

  1. Clone the repository:

    git clone https://github.com/DonColon/xmlforge.git
    cd xmlforge
  2. Create a virtual environment and install dependencies:

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    pip install -e ".[dev]"
  3. Install pre-commit hooks:

    pre-commit install

Running Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=xmlforge

# Run specific test file
pytest tests/test_parser.py

Code Quality

# Format code
black src tests
isort src tests

# Lint code
flake8 src tests

# Type checking
mypy src

# Run all checks with tox
tox

Project Structure

xmlforge/
├── src/
│   └── xmlforge/
│       ├── __init__.py
│       ├── parser.py
│       ├── splitter.py
│       ├── transformer.py
│       └── validator.py
├── tests/
│   ├── __init__.py
│   ├── test_parser.py
│   ├── test_splitter.py
│   ├── test_transformer.py
│   └── test_validator.py
├── docs/
├── examples/
├── .github/
│   └── workflows/
├── .editorconfig
├── .flake8
├── .gitignore
├── .pre-commit-config.yaml
├── CHANGELOG.md
├── CONTRIBUTING.md
├── LICENSE
├── MANIFEST.in
├── README.md
├── pyproject.toml
├── requirements.txt
├── requirements-dev.txt
├── setup.py
└── tox.ini

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Changelog

See CHANGELOG.md for a list of changes.

Support

Acknowledgments

Built with lxml - a powerful and Pythonic XML processing library.

About

A python library for splitting, transforming, parsing and validation xml files

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •