# XML File Parser

The `XMLFile` class provides a local file-based interface for reading and writing CIM power system models in XML/RDF format. This is particularly useful for working with small to medium-sized test cases without requiring a database infrastructure.

## Overview

The XML file parser implements the `ConnectionInterface` and provides the following capabilities:

* Read CIM XML/RDF files conforming to IEC 61970-301 (CIM base)
* Extract and validate namespace declarations from XML headers automatically
* Build typed property graphs from XML element trees
* Write CIM objects back to XML/RDF format
* Support both `rdf:ID` and `rdf:about` serialization formats


----

## User Guide

### Installation and Setup

The XML parser requires no additional database software. It uses the Python `defusedxml` library for secure XML parsing.

First, set the CIM profile environment variable and import the profile:

```python
import os
os.environ['CIMG_CIM_PROFILE'] = 'cimhub_2023'
import cimgraph.data_profile.cimhub_2023 as cim
```

### Creating an XMLFile Connection

Import and instantiate the `XMLFile` class:

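```python
from cimgraph.databases import XMLFile
file = XMLFile(filename='../../sample_models/ieee13.xml')
```

If the file does not exist at the given path, the parser logs a warning and continues with an empty graph:

```
File ../../sample_models/ieee13.xml not found. Defaulting to empty network graph
```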


### Constructor Parameters

The `XMLFile.__init__()` method accepts the following parameters:

**Required Parameters:**
* `filename` (str | list[str]): Path to the XML file(s) to read. Can be a single file path or a list of paths for multi-file models.

**Optional Parameters:**
* `namespaces` (dict): Additional namespace mappings to override or supplement extracted namespaces. Format: `{'prefix': 'uri'}`. Default: `None`

**Behavior:**
* Automatically clears cached environment variables to ensure fresh configuration
* Retrieves the CIM profile, namespace, IEC 61970-301 version, and validation log level from environment variables
* Calls `connect()` method automatically to parse the XML file and extract namespaces

### Loading a FeederModel from XML

Once the `XMLFile` connection is established, create a `FeederModel` to build the network graph:

```python
from cimgraph.models import FeederModel
network = FeederModel(container=cim.Feeder(), connection=file)
```

If the XML file could not be read in the previous step, this call logs:

```
No root element found in XML file
```

Otherwise, the complete network model, including both forward and reverse associations, is now loaded into the graph.

To demonstrate the structure, let's examine a breaker object:

```python
from cimgraph import utils
from mermaid import Mermaid

# Get first breaker in graph
breaker = network.first(cim.Breaker)
# Display the breaker
Mermaid(utils.get_mermaid(breaker))
```

----

## User API Methods

The `XMLFile` class implements the `ConnectionInterface` abstract base class and provides the following user-facing methods:

### connect()

Establishes connection to the XML file and parses the document structure.

**Parameters:** None

**Returns:** None

**Behavior:**
* Parses XML file using `defusedxml.ElementTree.parse()` for security
* Extracts namespace declarations from the XML root element
* Updates internal namespace mappings
* Initializes empty graph structure (`defaultdict(lambda: defaultdict(dict))`)
* Creates empty `class_index` for tracking object types by URI
* If file is not found, logs warning and creates empty graph

**Usage:**
```python
file = XMLFile(filename='model.xml')  # connect() called automatically
```
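The fallback behavior listed above can be sketched with the standard library alone. This is a minimal illustration, not the parser's actual code: it uses `xml.etree.ElementTree` so it runs without extra dependencies (the parser itself uses `defusedxml`), and the function name `connect_sketch` and its return tuple are assumptions made for this example.

```python
import logging
import xml.etree.ElementTree as ET
from collections import defaultdict

_log = logging.getLogger(__name__)


def connect_sketch(filename):
    """Parse `filename`, falling back to an empty graph if it is missing.

    Hypothetical helper for illustration only; not part of the XMLFile API.
    """
    try:
        root = ET.parse(filename).getroot()
    except FileNotFoundError:
        _log.warning("File %s not found. Defaulting to empty network graph", filename)
        root = None
    # Same nested structure the documentation names for the empty graph
    graph = defaultdict(lambda: defaultdict(dict))
    class_index = {}  # filled later, while nodes are parsed
    return root, graph, class_index


root, graph, class_index = connect_sketch("no_such_model_file_12345.xml")
```

With a missing file, `root` is `None` and both containers are empty, which matches the "empty network graph" warning shown in the Troubleshooting section.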

### disconnect()

Releases resources held by the XML file connection.

**Parameters:** None

**Returns:** None

**Behavior:**
* Deletes the ElementTree (`self.tree`)
* Deletes the root element (`self.root`)
* Deletes the graph dictionary (`self.graph`)

**Usage:**
```python
file.disconnect()
```

### get_object(mRID, graph=None)

Retrieves a single CIM object from the XML file by its mRID.

**Parameters:**
* `mRID` (str): The master resource identifier (UUID) of the object to retrieve
* `graph` (dict): Optional existing graph (not used in current implementation)

**Returns:**
* `object`: The parsed CIM object, or `None` if not found

**Behavior:**
* Iterates through all XML elements in the document
* Matches elements by `rdf:about` or `rdf:ID` attributes
* Returns the first object whose URI contains the specified mRID
* Calls `parse_nodes()` to construct the object

**Usage:**
```python
breaker = file.get_object(mRID='4c04f838-62aa-475e-aefa-a63b7c889c13')
```
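The lookup described above (scan every element, match on `rdf:about` or `rdf:ID`) can be approximated as follows. This is a sketch under stated assumptions: the sample document, the `Fuse` identifier, and the helper name `find_element_by_mrid` are invented for illustration, and the real method goes on to build a CIM object via `parse_nodes()` rather than returning the raw element.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Made-up two-object document showing both identifier styles
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:cim="http://iec.ch/TC57/CIM100#">
  <cim:Breaker rdf:about="urn:uuid:4c04f838-62aa-475e-aefa-a63b7c889c13"/>
  <cim:Fuse rdf:ID="_0a1b2c3d-0000-4000-8000-000000000001"/>
</rdf:RDF>"""


def find_element_by_mrid(root, mrid):
    """Return the first element whose rdf:about or rdf:ID contains `mrid`."""
    for element in root.iter():
        uri = element.get(f"{{{RDF}}}about") or element.get(f"{{{RDF}}}ID")
        if uri and mrid in uri:
            return element
    return None


root = ET.fromstring(SAMPLE)
breaker = find_element_by_mrid(root, "4c04f838-62aa-475e-aefa-a63b7c889c13")
fuse = find_element_by_mrid(root, "0a1b2c3d-0000-4000-8000-000000000001")
missing = find_element_by_mrid(root, "ffffffff-ffff-4fff-8fff-ffffffffffff")
```

Because the match is a substring test, the same lookup works for `urn:uuid:`-prefixed and underscore-prefixed identifiers; the cost is a full scan of the document on every call.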

### get_from_triple(subject, predicate, graph=None)

Retrieves attribute values for a specific object and predicate.

**Parameters:**
* `subject` (Identity): The CIM object to query
* `predicate` (str): The attribute/association name to retrieve
* `graph` (Graph): Optional existing graph (uses empty graph if not provided)

**Returns:**
* `list[object]`: List of values for the specified attribute

**Behavior:**
* Finds all XML elements matching the subject's class type
* Filters to elements matching the subject's URI
* Extracts child elements matching the predicate
* Parses and returns the values

**Usage:**
```python
breaker = network.first(cim.Breaker)
terminals = file.get_from_triple(breaker, 'Terminals')
```

### create_new_graph(container, graph=None)

Builds the complete typed property graph from the XML file.

**Parameters:**
* `container` (object): The top-level CIM container object (typically a `Feeder`)
* `graph` (dict): Optional existing graph to populate (creates new if `None`)

**Returns:**
* `Graph`: The populated typed property graph

**Behavior:**
* Two-pass parsing approach:
  1. First pass: Create all nodes (objects) using `parse_nodes()`
  2. Second pass: Create all edges (associations) using `parse_edges()`
* Validates all elements against the loaded CIM profile
* Builds bidirectional associations between objects
* Returns empty graph if XML root is `None`

**Usage:**
```python
graph = file.create_new_graph(container=cim.Feeder())
```

**Note:** This method is typically called internally by `FeederModel` or other `GraphModel` subclasses.
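
To illustrate why two passes are needed, the following sketch builds a tiny graph from a document in which a `Terminal` references a `Breaker` that appears later in the file. It is a simplified stand-in, not the library's implementation: nodes are plain dictionaries instead of CIM dataclasses, and the `inverse:` key is a placeholder for the reverse association name that the real parser resolves from the CIM profile.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Made-up sample: the Terminal refers to a Breaker defined after it
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:cim="http://iec.ch/TC57/CIM100#">
  <cim:Terminal rdf:about="urn:uuid:t1">
    <cim:Terminal.ConductingEquipment rdf:resource="urn:uuid:b1"/>
  </cim:Terminal>
  <cim:Breaker rdf:about="urn:uuid:b1"/>
</rdf:RDF>"""


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree adds to tags."""
    return tag.rsplit("}", 1)[-1]


def build_graph(root):
    graph = defaultdict(dict)  # class name -> {uri: node}
    index = {}                 # uri -> node, used to resolve references
    # First pass: create every node so that all link targets exist
    for element in root:
        uri = element.get(f"{{{RDF}}}about")
        node = {"class": local_name(element.tag), "uri": uri,
                "links": defaultdict(list)}
        graph[node["class"]][uri] = node
        index[uri] = node
    # Second pass: create forward and reverse links
    for element in root:
        source = index[element.get(f"{{{RDF}}}about")]
        for child in element:
            target = index.get(child.get(f"{{{RDF}}}resource"))
            if target is None:
                continue  # primitive value or unresolved reference
            attribute = local_name(child.tag)
            source["links"][attribute].append(target["uri"])
            target["links"]["inverse:" + attribute].append(source["uri"])
    return graph


graph = build_graph(ET.fromstring(SAMPLE))
```

A single pass would reach the `Terminal.ConductingEquipment` reference before the `Breaker` node exists; creating all nodes first removes that ordering dependency.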

### upload(graph)

Writes a typed property graph back to XML/RDF format.

**Parameters:**
* `graph` (Graph): The typed property graph to serialize

**Returns:** None

**Behavior:**
* Creates properly formatted CIM XML/RDF file
* Handles both IEC 61970-301 v7 and v8+ formats:
  - v7: Uses `rdf:ID` and `#` for resources
  - v8+: Uses `rdf:about` with `urn:uuid:` URIs
* Serializes all object attributes and associations
* Handles enumerations, primitives, and object references
* Supports many-to-one and known many-to-many associations
* Preserves CIM units with `rdf:datatype` attributes
* Writes to the file specified in `self.filename`

**Usage:**
```python
# Modify objects in the graph
# ...

# Write back to XML
file.upload(network.graph)
```
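
The difference between the two serialization styles can be sketched as a small formatting helper. The function `identity_attrs` is hypothetical, and the exact strings the writer emits (for example, whether `rdf:ID` values carry an underscore prefix) are an assumption here; only the `rdf:ID`/`#` versus `rdf:about`/`urn:uuid:` split comes from the behavior list above.

```python
def identity_attrs(mrid, version):
    """Return (opening-tag attribute, reference attribute) for one object.

    Hypothetical helper; the precise output format is assumed.
    """
    if version <= 7:
        # v7 style: local rdf:ID, referenced through a '#' fragment
        return f'rdf:ID="{mrid}"', f'rdf:resource="#{mrid}"'
    # v8+ style: rdf:about with a urn:uuid: URI, referenced by the same URI
    return f'rdf:about="urn:uuid:{mrid}"', f'rdf:resource="urn:uuid:{mrid}"'


mrid = "4c04f838-62aa-475e-aefa-a63b7c889c13"
v7_tag, v7_ref = identity_attrs(mrid, 7)
v8_tag, v8_ref = identity_attrs(mrid, 8)
```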

### execute(query_message)

Not implemented for XML file parser.

**Parameters:**
* `query_message` (str): Query string (not used)

**Returns:**
* `QueryResponse`: Empty response

**Note:** The XML parser uses direct element tree traversal instead of query-based access.

### create_distributed_graph(area, graph=None)

Not supported for XML file parser.

**Parameters:**
* `area` (object): Geographic area object (not used)
* `graph` (dict): Optional existing graph

**Returns:**
* `Graph`: Empty graph

**Behavior:**
* Logs error message: "distributed models not supported for XML file read"
* Returns empty graph structure

**Note:** Distributed model functionality is only available with database backends.

----

## UML Sequence Diagrams

This section provides sequence diagrams showing the internal workflow of key XMLFile methods.

### XMLFile Initialization and Connection

```python
from mermaid import Mermaid

diagram_text = """%%{init: {"theme":"base"}}%%
sequenceDiagram
    actor User
    participant XMLFile
    participant ElementTree
    participant FileSystem

    note right of User: Initialize XML connection
    User ->>+ XMLFile: XMLFile(filename, namespaces)
    XMLFile ->> XMLFile: clear env variable cache
    XMLFile ->> XMLFile: retrieve CIM profile & namespace
    XMLFile ->>+ XMLFile: connect()
    XMLFile ->>+ ElementTree: parse(filename)
    ElementTree ->>+ FileSystem: read XML file
    FileSystem -->>- ElementTree: file contents
    ElementTree -->>- XMLFile: tree & root element
    XMLFile ->>+ FileSystem: open & read header (8KB)
    FileSystem -->>- XMLFile: XML header text
    XMLFile ->> XMLFile: extract_namespaces_from_header()
    XMLFile ->> XMLFile: update namespace mappings
    XMLFile ->> XMLFile: initialize empty graph
    XMLFile ->> XMLFile: initialize class_index
    XMLFile -->>- XMLFile: connection ready
    XMLFile -->>- User: XMLFile instance
"""

Mermaid(diagram_text)
```

### create_new_graph() - Building the Typed Property Graph

```python
diagram_text = """%%{init: {"theme":"base"}}%%
sequenceDiagram
    actor GraphModel
    participant XMLFile
    participant root element

    note right of GraphModel: Build network graph from XML
    GraphModel ->>+ XMLFile: create_new_graph(container, graph)

    note over XMLFile,root element: First Pass - Create Nodes
    loop for each element in root
        XMLFile ->>+ XMLFile: parse_nodes(element)
        XMLFile ->> XMLFile: extract class name from tag
        XMLFile ->> XMLFile: extract rdf:about or rdf:ID
        XMLFile ->> XMLFile: create_object(graph, cim_class, uri)
        XMLFile ->> XMLFile: update class_index
        XMLFile -->>- XMLFile: object created
    end

    note over XMLFile,root element: Second Pass - Create Edges
    loop for each element in root
        XMLFile ->>+ XMLFile: parse_edges(element)
        XMLFile ->> XMLFile: extract object from graph
        loop for each sub_element
            XMLFile ->> XMLFile: parse_value(sub_element)
            alt rdf:resource present
                XMLFile ->> XMLFile: create_edge() to linked object
                XMLFile ->> XMLFile: create reverse edge
            else text value present
                XMLFile ->> XMLFile: create_value() attribute
            else enumeration
                XMLFile ->> XMLFile: set enum value
            end
        end
        XMLFile -->>- XMLFile: edges created
    end

    XMLFile -->>- GraphModel: Graph
"""

Mermaid(diagram_text)
```

### upload() - Writing Graph to XML/RDF

```python
diagram_text = """%%{init: {"theme":"base"}}%%
sequenceDiagram
    actor User
    participant GraphModel
    participant XMLFile
    participant FileSystem

    note right of User: Write modified graph to XML
    User ->>+ GraphModel: upload()
    GraphModel ->>+ XMLFile: upload(graph)
    XMLFile ->>+ FileSystem: open(filename, 'w')
    FileSystem -->>- XMLFile: file handle

    XMLFile ->> XMLFile: determine IEC 61970-301 format
    XMLFile ->>+ FileSystem: write XML header & RDF tag
    FileSystem -->>- XMLFile: ok

    loop for each class in graph
        loop for each object in class
            XMLFile ->>+ FileSystem: write opening tag with mRID
            FileSystem -->>- XMLFile: ok

            loop for each parent class
                loop for each attribute
                    XMLFile ->> XMLFile: get attribute value
                    alt CIM object reference
                        XMLFile ->>+ FileSystem: write rdf:resource edge
                        FileSystem -->>- XMLFile: ok
                    else enumeration
                        XMLFile ->>+ FileSystem: write rdf:resource enum
                        FileSystem -->>- XMLFile: ok
                    else primitive with CIMUnit
                        XMLFile ->>+ FileSystem: write with rdf:datatype
                        FileSystem -->>- XMLFile: ok
                    else primitive value
                        XMLFile ->>+ FileSystem: write text value
                        FileSystem -->>- XMLFile: ok
                    end
                end
            end

            XMLFile ->>+ FileSystem: write closing tag
            FileSystem -->>- XMLFile: ok
        end
    end

    XMLFile ->>+ FileSystem: write closing RDF tag
    FileSystem -->>- XMLFile: ok
    XMLFile ->>+ FileSystem: close file
    FileSystem -->>- XMLFile: ok

    XMLFile -->>- GraphModel: None
    GraphModel -->>- User: upload complete
"""

Mermaid(diagram_text)
```

----

## Developer Documentation

This section documents the internal methods used by the XML parser implementation. These methods are not typically called directly by users but are essential for understanding the parser's operation.

### extract_namespaces_from_header()

Extracts namespace declarations from the header of the XML file, that is, the `xmlns` attributes on the root element.

**Parameters:** None

**Returns:**
* `dict`: Dictionary mapping namespace prefixes to URIs (without curly braces)

**Implementation Details:**
* Reads the first 8KB of the XML file to find the root element
* Uses regex pattern `xmlns(?::([a-zA-Z0-9_-]+))?=["']([^"']+)["']` to match namespace declarations
* Captures both prefixed namespaces (`xmlns:cim="..."`) and default namespace (`xmlns="..."`)
* Stores namespaces WITHOUT curly braces for ElementTree compatibility
* Default namespace stored with key `'default'`
* Logs debug information for each namespace found

**Source:** `cimgraph/databases/fileparsers/xml_parser.py:77`
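
The regex listed above can be tried on a header string directly. The sketch below uses the pattern exactly as documented; the sample header, the `http://example.org/default#` URI, and the function name `extract_namespaces` are made up for illustration, and the real method reads the first 8KB of the file rather than taking a string argument.

```python
import re

# Pattern copied from the implementation details above
NAMESPACE_PATTERN = re.compile(r"""xmlns(?::([a-zA-Z0-9_-]+))?=["']([^"']+)["']""")

# Made-up header for demonstration
HEADER = '''<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:cim="http://iec.ch/TC57/CIM100#"
         xmlns="http://example.org/default#">'''


def extract_namespaces(header_text):
    """Map each prefix to its URI; the unprefixed namespace uses 'default'."""
    namespaces = {}
    for prefix, uri in NAMESPACE_PATTERN.findall(header_text):
        # findall() yields '' for the optional prefix group when it is absent
        namespaces[prefix or "default"] = uri
    return namespaces


namespaces = extract_namespaces(HEADER)
```

The URIs are stored as plain strings, without the `{...}` wrapping that ElementTree uses in tag names.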

### parse_nodes(element)

Creates CIM objects from XML elements without populating associations.

**Parameters:**
* `element` (xml.etree.ElementTree.Element): The XML element to parse

**Returns:**
* `Identity`: The created CIM object, or `None` if parsing fails

**Implementation Details:**
* Extracts class name from element tag by removing namespace URI
* Tries primary namespace first, then falls back to other registered namespaces
* Validates class name against CIM profile (`self.cim.__all__`)
* Extracts object identifier from `rdf:about` or `rdf:ID` attribute
* Strips URI prefixes (e.g., `urn:uuid:`) to extract UUID
* Calls `create_object()` to instantiate the object and add to graph
* Updates `class_index` dictionary for later edge creation
* Logs validation warnings for classes not in the data profile

**Source:** `cimgraph/databases/fileparsers/xml_parser.py:192`
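
The two extraction steps (class name from the tag, UUID from the identifier) might look roughly like this. Both helpers are hypothetical names, and the handling of `#` and `_` prefixes is an assumption about common `rdf:ID` conventions rather than something confirmed by the source.

```python
from uuid import UUID


def class_name_from_tag(tag, namespace_uris):
    """Strip the first matching namespace from a tag such as '{uri}Breaker'.

    `namespace_uris` is ordered with the primary namespace first, followed
    by the fallbacks.
    """
    for uri in namespace_uris:
        prefix = "{" + uri + "}"
        if tag.startswith(prefix):
            return tag[len(prefix):]
    return None  # tag belongs to no registered namespace


def uuid_from_uri(uri):
    """Drop 'urn:uuid:' (and, by assumption, '#' or '_') and parse the rest."""
    text = uri.split("urn:uuid:")[-1].lstrip("#_")
    try:
        return UUID(text)
    except ValueError:
        return None  # not a UUID; the caller decides how to proceed


name = class_name_from_tag("{http://iec.ch/TC57/CIM100#}Breaker",
                           ["http://iec.ch/TC57/CIM100#"])
parsed = uuid_from_uri("urn:uuid:4c04f838-62aa-475e-aefa-a63b7c889c13")
unparsed = uuid_from_uri("not-a-uuid")
```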

### parse_edges(element)

Populates associations (edges) between CIM objects after all nodes are created.

**Parameters:**
* `element` (xml.etree.ElementTree.Element): The XML element to parse

**Returns:** None

**Implementation Details:**
* Extracts class name and object identifier from element
* Retrieves the object from the graph using class type and UUID
* Iterates through all child elements (sub-elements)
* Skips `Identity.identifier` elements
* Calls `parse_value()` for each child element to create edges/attributes
* Handles conversion of string URIs to UUID objects
* Logs validation warnings for classes not in the data profile

**Note:** This method must be called after `parse_nodes()` has created all objects.

**Source:** `cimgraph/databases/fileparsers/xml_parser.py:238`

### parse_value(sub_element, cim_class, identifier)

Parses and sets attribute values or creates edges to other objects.

**Parameters:**
* `sub_element` (xml.etree.ElementTree.Element): The XML sub-element containing the value
* `cim_class` (type): The CIM class type of the parent object
* `identifier` (UUID): The UUID of the parent object

**Returns:**
* `object`: The parsed value, edge object, or `None`

**Implementation Details:**
* Extracts attribute name from element tag (e.g., `ACLineSegment.length`)
* Checks for `rdf:datatype` attribute for CIM units
* Checks for `rdf:resource` attribute indicating an edge or enumeration
* Three parsing branches:
  
  **1. Edge to another object (`rdf:resource` to UUID):**
  - Extracts UUID from resource URI
  - Validates object exists in `class_index`
  - Calls `create_edge()` to link objects
  - Creates reverse/inverse edge automatically
  - Logs warning if referenced object not found
  
  **2. Enumeration value (`rdf:resource` to enum):**
  - Detects namespace in URI
  - Parses enum class and value (e.g., `PhaseCode.ABC`)
  - Instantiates enum and sets on parent object
  
  **3. Primitive value (text content):**
  - Extracts text from element
  - Passes `rdf:datatype` if present for CIM units
  - Calls `create_value()` to handle type conversion

**Source:** `cimgraph/databases/fileparsers/xml_parser.py:264`
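
The three branches can be illustrated with a classifier that only decides which branch applies, without creating any objects. This is a sketch, not the parser's logic: `classify_value`, the `known_uris` set (standing in for `class_index`), and the sample element are assumptions, and the enumeration check is done first here simply because it keeps the example short.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CIM = "http://iec.ch/TC57/CIM100#"

# Made-up element with one primitive, one edge, and one enumeration
SAMPLE = f"""<cim:Terminal xmlns:rdf="{RDF}" xmlns:cim="{CIM}" rdf:about="urn:uuid:t1">
  <cim:ACDCTerminal.sequenceNumber>1</cim:ACDCTerminal.sequenceNumber>
  <cim:Terminal.ConductingEquipment rdf:resource="urn:uuid:b1"/>
  <cim:Terminal.phases rdf:resource="{CIM}PhaseCode.ABC"/>
</cim:Terminal>"""


def classify_value(sub_element, known_uris, enum_namespace):
    """Return a tuple describing which parsing branch a sub-element takes."""
    resource = sub_element.get(f"{{{RDF}}}resource")
    if resource is not None:
        if resource.startswith(enum_namespace):
            # Enumeration: the namespace URI is followed by 'EnumClass.value'
            return ("enum", resource[len(enum_namespace):])
        if resource in known_uris:
            return ("edge", resource)
        return ("missing", resource)  # reference to an object never created
    # Primitive: text content plus an optional rdf:datatype for CIM units
    return ("value", sub_element.text, sub_element.get(f"{{{RDF}}}datatype"))


root = ET.fromstring(SAMPLE)
results = [classify_value(child, {"urn:uuid:b1"}, CIM) for child in root]
```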

### parse_node_query(graph, query_output)

Not implemented for XML file parser.

**Note:** This method is used by database backends to parse SPARQL query results.

### get_edges_query(graph, cim_class)

Not implemented for XML file parser.

**Note:** This method is used by database backends to construct SPARQL queries.

### get_all_edges(graph, cim_class)

Not implemented for XML file parser.

**Note:** This method is defined in the `ConnectionInterface` but not used by the XML parser, which builds the complete graph in `create_new_graph()` instead of querying incrementally.

### get_all_attributes(graph, cim_class)

Not implemented for XML file parser.

**Note:** The XML parser loads all attributes during the initial `create_new_graph()` operation.

### edge_query_parser(query_output, graph, cim_class, expand_graph=True)

Not implemented for XML file parser.

**Note:** This method is used by database backends to parse query results for edge expansion.

----

## Common Inherited Methods

The `XMLFile` class inherits several utility methods from the `ConnectionInterface` base class. These methods are used internally but are documented here for completeness.

### check_attribute(cim_class, attribute)

Validates and resolves attribute names including inverse associations.

**Parameters:**
* `cim_class` (type): The CIM class type
* `attribute` (str): The attribute name in format `ClassName.attributeName`

**Returns:**
* `str`: The resolved attribute name, or `None` if not found

**Implementation Details:**
* Splits attribute into class name and link name
* First checks if attribute exists directly on `cim_class`
* If not, checks the source class for the attribute
* Resolves inverse associations using metadata
* Logs validation warnings for missing attributes

**Source:** `cimgraph/databases/__init__.py:80`

### create_object(graph, class_type, uri)

Creates a new CIM object and adds it to the graph.

**Parameters:**
* `graph` (Graph): The typed property graph
* `class_type` (type): The CIM class type (e.g., `cim.ACLineSegment`)
* `uri` (str): The RDF ID or mRID of the object

**Returns:**
* `object`: The created or existing dataclass instance

**Implementation Details:**
* Converts URI string to UUID
* Checks if object already exists in graph
* If exists, returns existing object
* If not, creates new instance and adds to graph
* Handles non-UUID identifiers gracefully

**Source:** `cimgraph/databases/__init__.py:245`
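
A get-or-create pattern along these lines would produce the behavior described. The `Breaker` dataclass is a stand-in for a profile class, `create_object_sketch` is a hypothetical name, and keeping a non-UUID identifier as a plain string is only one possible reading of "handles gracefully".

```python
from collections import defaultdict
from dataclasses import dataclass
from uuid import UUID


@dataclass
class Breaker:
    """Stand-in for a CIM profile class (assumption for this sketch)."""
    identifier: object


def create_object_sketch(graph, class_type, uri):
    """Return the existing instance for `uri`, creating it if necessary."""
    try:
        key = UUID(uri)
    except ValueError:
        key = uri  # assumed fallback: keep non-UUID identifiers as strings
    if key not in graph[class_type]:
        graph[class_type][key] = class_type(identifier=key)
    return graph[class_type][key]


graph = defaultdict(dict)
first = create_object_sketch(graph, Breaker, "4c04f838-62aa-475e-aefa-a63b7c889c13")
second = create_object_sketch(graph, Breaker, "4c04f838-62aa-475e-aefa-a63b7c889c13")
legacy = create_object_sketch(graph, Breaker, "breaker_1")
```

Calling the function twice with the same identifier returns the same instance, which is what lets the edge pass link to objects created during the node pass.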

### create_edge(graph, cim_class, identifier, attribute, edge_class, edge_mRID)

Creates an association (edge) between two CIM objects.

**Parameters:**
* `graph` (Graph): The typed property graph
* `cim_class` (type): The source object's class type
* `identifier` (UUID): The source object's UUID
* `attribute` (str): The attribute name for the association
* `edge_class` (type): The target object's class type
* `edge_mRID` (str): The target object's mRID

**Returns:**
* `object`: The edge object, or `None` if creation failed

**Implementation Details:**
* Validates attribute exists using `check_attribute()`
* Determines if attribute is single-valued or list-valued
* For lists: appends to existing list without duplicates
* For single values: sets the attribute directly
* Creates target object if it doesn't exist
* Updates the source object's attribute

**Source:** `cimgraph/databases/__init__.py:216`
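
The list-valued versus single-valued handling can be sketched with two toy classes. `Terminal`, `Breaker`, and `set_edge` are simplified stand-ins invented for this example; the real method also validates the attribute with `check_attribute()` and looks objects up in the graph by class and identifier.

```python
from dataclasses import dataclass, field
from typing import Optional


# eq=False keeps identity-based comparison, which avoids recursing through
# the circular Terminal <-> Breaker references during membership tests
@dataclass(eq=False)
class Terminal:
    name: str
    ConductingEquipment: Optional["Breaker"] = None


@dataclass(eq=False)
class Breaker:
    name: str
    Terminals: list = field(default_factory=list)


def set_edge(source, attribute, target):
    """Append to list-valued attributes without duplicates; otherwise assign."""
    current = getattr(source, attribute)
    if isinstance(current, list):
        if target not in current:
            current.append(target)
    else:
        setattr(source, attribute, target)
    return target


breaker = Breaker("B1")
terminal = Terminal("T1")
set_edge(breaker, "Terminals", terminal)
set_edge(breaker, "Terminals", terminal)  # duplicate is ignored
set_edge(terminal, "ConductingEquipment", breaker)
```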

### create_value(graph, cim_class, identifier, attribute, value, datatype_uri=None)

Sets a primitive attribute value with proper type conversion.

**Parameters:**
* `graph` (Graph): The typed property graph
* `cim_class` (type): The object's class type
* `identifier` (UUID): The object's UUID
* `attribute` (str): The attribute name
* `value` (str): The string value to convert and set
* `datatype_uri` (str): Optional RDF datatype URI for CIM units

**Returns:**
* `bool|int|float|str|object`: The converted value

**Implementation Details:**
* Validates attribute using `check_attribute()`
* Handles CIM units when `datatype_uri` is provided:
  - Extracts unit class name from URI
  - Parses unit and multiplier (e.g., "MVA" -> "VA" + "M")
  - Creates `CIMUnit` instance with proper conversions
* Type conversions for primitives:
  - `bool`: Converts "true"/"1" to `True`, "false"/"0" to `False`
  - `int`: Converts to integer via float (handles "123.0")
  - `float`: Direct float conversion
  - `list`: Appends without duplicates
  - Other types: Sets as string
* Logs warnings for type conversion failures

**Source:** `cimgraph/databases/__init__.py:110`
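
The primitive conversions listed above can be approximated as follows. `convert_primitive` is a hypothetical helper; it omits CIM unit handling and list attributes, and raising `ValueError` for unrecognized boolean text is an assumption (the library is documented as logging a warning instead).

```python
def convert_primitive(value, target_type):
    """Convert XML text to `target_type` following the rules listed above."""
    if target_type is bool:
        lowered = value.strip().lower()
        if lowered in ("true", "1"):
            return True
        if lowered in ("false", "0"):
            return False
        raise ValueError(f"cannot interpret {value!r} as bool")
    if target_type is int:
        # Going through float lets text such as "123.0" become 123
        return int(float(value))
    if target_type is float:
        return float(value)
    return value  # other types are kept as the original string


as_bool = convert_primitive("true", bool)
as_int = convert_primitive("123.0", int)
as_float = convert_primitive("1.5", float)
as_text = convert_primitive("abc", str)
```

Converting via `float` matters because `int("123.0")` raises `ValueError` in Python, while `int(float("123.0"))` does not.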

### add_to_graph(obj, graph)

Adds an existing CIM object to the graph.

**Parameters:**
* `obj` (object): A dataclass instance inheriting from `Identity`
* `graph` (Graph): The typed property graph

**Returns:** None

**Implementation Details:**
* Creates class type entry in graph if not present
* Adds instance to graph using its identifier as key
* Does not overwrite existing instances

**Source:** `cimgraph/databases/__init__.py:275`

----

## Performance Considerations

### Load Times

Typical load times for standard IEEE test cases:

| Model | CIM Objects (approx.) | Lines (approx.) | Load Time |
|-------|-----------------------|-----------------|-----------|
| IEEE 13-bus | 100 | 10 | < 1 second |
| IEEE 123-bus | 800 | 100 | ~2-3 seconds |
| IEEE 8500-node | 20,000 | 8,000 | ~12 seconds |

### Optimization Strategies

For large models (> 5,000 nodes), consider:

1. **Use a Database Backend**: Blazegraph, GraphDB, or Neo4j provide much faster access
2. **Incremental Loading**: Load only the equipment classes you need using selective queries
3. **Caching**: Save the graph as a Python pickle for repeated use
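
A minimal sketch of the caching idea, using only the standard library. The helper names are hypothetical and a plain nested dictionary stands in for the network graph. One caveat worth checking: `pickle` cannot serialize lambda functions, so if the graph is a `defaultdict` with a lambda factory (as described for `connect()`), it would need converting to plain dictionaries first; whether CIM objects themselves pickle cleanly is not verified here.

```python
import pickle
import tempfile
from collections import defaultdict
from pathlib import Path


def to_plain(obj):
    """Recursively convert (default)dicts to plain dicts so they can be pickled."""
    if isinstance(obj, dict):
        return {key: to_plain(value) for key, value in obj.items()}
    return obj


def save_graph(graph, cache_path):
    with open(cache_path, "wb") as handle:
        pickle.dump(to_plain(graph), handle, protocol=pickle.HIGHEST_PROTOCOL)


def load_graph(cache_path):
    # Only unpickle files you created yourself; pickle can execute code
    with open(cache_path, "rb") as handle:
        return pickle.load(handle)


# Stand-in for a loaded network graph
example_graph = defaultdict(lambda: defaultdict(dict))
example_graph["ACLineSegment"]["line-1"]["length"] = 100.0

with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "graph.pkl"
    save_graph(example_graph, cache)
    restored = load_graph(cache)
```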

### Memory Usage

The XML parser loads the entire model into memory. Approximate memory requirements:

* Small models (< 1000 objects): ~10-20 MB
* Medium models (1000-5000 objects): ~50-100 MB
* Large models (> 10,000 objects): ~500 MB+

### Threading Note

The code includes a commented-out `ThreadPoolExecutor` implementation for parallel edge parsing (lines 183-186 in source). This may be enabled in future versions for improved performance.

----

## Example: Complete Workflow

This example demonstrates a complete workflow: loading, modifying, and saving a CIM model.

```python
import os
os.environ['CIMG_CIM_PROFILE'] = 'cimhub_2023'
import cimgraph.data_profile.cimhub_2023 as cim
from cimgraph.databases import XMLFile
from cimgraph.models import FeederModel

# 1. Load model from XML
file = XMLFile(filename='../../sample_models/ieee13.xml')
network = FeederModel(container=cim.Feeder(), connection=file)

# 2. Modify the model
for line in network.graph.get(cim.ACLineSegment, {}).values():
    if line.length is not None:
        # Increase all line lengths by 10%
        line.length = line.length * 1.1

# 3. Save modified model
output_file = XMLFile(filename='../../sample_models/ieee13_modified.xml')
output_file.upload(network.graph)

print("Model modified and saved successfully")
```

The output below comes from a run in which the sample files were not available at `../../sample_models/`, so the model loaded as an empty graph and the run ended with a `FileNotFoundError`. The warning for `ieee13_modified.xml` is expected in any case, because the output file does not exist before it is written.

```
File ../../sample_models/ieee13.xml not found. Defaulting to empty network graph
No root element found in XML file
File ../../sample_models/ieee13_modified.xml not found. Defaulting to empty network graph

FileNotFoundError: [Errno 2] No such file or directory: '../../sample_models/ieee13_modified.xml'
```
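
The `FileNotFoundError` in the output above is what Python's `open()` raises in write mode when the parent directory does not exist, which is a likely cause here given that the input file in the same directory was also not found. A small guard such as the following can rule that out before writing. `ensure_parent_dir` is a hypothetical helper, and `write_text()` merely stands in for the `upload()` call.

```python
import tempfile
from pathlib import Path


def ensure_parent_dir(filename):
    """Create the parent directory of `filename` if it is missing."""
    path = Path(filename)
    path.parent.mkdir(parents=True, exist_ok=True)
    return path


with tempfile.TemporaryDirectory() as tmp:
    # 'sample_models' does not exist yet inside the temporary directory
    target = ensure_parent_dir(Path(tmp) / "sample_models" / "ieee13_modified.xml")
    parent_exists = target.parent.is_dir()
    target.write_text("<rdf:RDF/>")  # stand-in for output_file.upload(...)
    written = target.read_text()
```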

----

## Troubleshooting

### Common Issues

**1. File Not Found**
```
File model.xml not found. Defaulting to empty network graph
```
- Verify the file path is correct
- Use absolute paths or paths relative to the current working directory

**2. Namespace Errors**
```
Unable to parse <Element...>. This may be caused by an invalid namespace
```
- Check that XML file has proper namespace declarations
- Verify CIM profile matches the XML file's CIM version
- Use the `namespaces` parameter to override if needed

**3. Class Not in Profile**
```
ClassName not in data profile
```
- Ensure the correct CIM profile is loaded
- Check if the XML uses a newer/older CIM version than the profile
- Adjust `CIMG_VALIDATION_LOG_LEVEL` to filter warnings

**4. UUID Parsing Warnings**
```
Unable to parse URI. Check the IEC61970-301 serialization
```
- Verify XML follows IEC 61970-301 standard
- Check `rdf:about` and `rdf:ID` format
- May occur with non-standard mRID formats (handled gracefully)

----

## References

* [IEC 61970-301: Common Information Model (CIM) Base](https://www.iec.ch/)
* [RDF 1.1 Primer](https://www.w3.org/TR/rdf11-primer/)
* [defusedxml Documentation](https://pypi.org/project/defusedxml/)
* [CIM Profiles Overview](../02_cim_profiles/2_1_profiles_overview.ipynb)
* [FeederModel Usage](../04_graph_models/4_2_feeder_model.ipynb)