# Blazegraph Database

[Blazegraph](https://blazegraph.com/) is a high-performance, open-source RDF graph database (triplestore) that supports the Blueprints and RDF/SPARQL APIs. It scales to as many as 50 billion edges on a single machine, and its technology is widely reported to underlie [AWS Neptune](https://aws.amazon.com/neptune/).

CIMantic Graphs provides the `BlazegraphConnection` class for seamless integration with Blazegraph databases, making it one of the recommended backends for large-scale CIM power system models.

## Overview

**Key Features:**
* High-performance SPARQL query execution
* Excellent scalability for large distribution models (IEEE 8500+ nodes)
* Fast bulk loading of RDF data
* Built-in web-based SPARQL query interface
* Enumeration parsing across multiple namespaces
* Docker images available with IEEE test feeders pre-loaded

**Best For:**
* Large distribution feeder models (1000+ nodes)
* Production deployments requiring fast query performance
* Applications using GridAPPS-D platform
* Development and testing with IEEE standard test cases

**Performance:**
* IEEE 8500-node model: ~10-15 seconds to load base topology
* Parallel query execution with batching
* Efficient graph traversal and expansion

----

## Installation and Setup

### Docker Installation (Recommended)

The easiest way to get started with Blazegraph is using Docker. GridAPPS-D provides pre-configured Docker images with IEEE test feeders already loaded.

**For CIM17 / RC4_2021 profile (GridAPPS-D v2021-v2024):**
```bash
docker pull gridappsd/blazegraph:v2024.09.0
docker run -p 8889:8080 gridappsd/blazegraph:v2024.09.0
```

**For CIM100 / CIMHub_2023 profile (GridAPPS-D v2025+):**
```bash
docker pull gridappsd/blazegraph:v2025.01.0
docker run -p 8889:8080 gridappsd/blazegraph:v2025.01.0
```

Available tags: https://hub.docker.com/r/gridappsd/blazegraph/tags

### Standalone Installation

For standalone installation:

1. Download Blazegraph JAR from https://github.com/blazegraph/database/releases
2. Start the server:
   ```bash
   java -server -Xmx4g -jar blazegraph.jar
   ```
3. Access web interface at http://localhost:9999/blazegraph/

### Python Dependencies

The `BlazegraphConnection` class requires the `SPARQLWrapper` library:

```bash
pip install SPARQLWrapper
```

This is installed automatically when you install CIMantic Graphs.

----

## Environment Configuration

### For GridAPPS-D / Blazegraph v2021-v2024 (CIM17)

If using GridAPPS-D Docker images with tags between `v2021.01.0` and `v2024.09.0`:

```python
import os
os.environ['CIMG_CIM_PROFILE'] = 'rc4_2021'
os.environ['CIMG_URL'] = 'http://localhost:8889/bigdata/namespace/kb/sparql'
os.environ['CIMG_IEC61970_301'] = '7'

import cimgraph.data_profile.rc4_2021 as cim
```

### For GridAPPS-D / Blazegraph v2025+ (CIM100)

If using GridAPPS-D Docker images with tag `v2025.01.0` or later:

```python
import os
os.environ['CIMG_CIM_PROFILE'] = 'cimhub_2023'
os.environ['CIMG_URL'] = 'http://localhost:8889/bigdata/namespace/kb/sparql'
os.environ['CIMG_IEC61970_301'] = '8'

import cimgraph.data_profile.cimhub_2023 as cim
```

### For Standalone Blazegraph

If running standalone Blazegraph (default port 9999):

```python
import os
os.environ['CIMG_CIM_PROFILE'] = 'cimhub_2023'
os.environ['CIMG_URL'] = 'http://localhost:9999/blazegraph/sparql'
os.environ['CIMG_IEC61970_301'] = '8'

import cimgraph.data_profile.cimhub_2023 as cim
```
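The three configurations above differ only in their values, so they can be kept in one lookup table. The sketch below is one possible way to do that with the standard library; the deployment labels and the helper name `configure_blazegraph` are hypothetical and not part of CIMantic Graphs. As in the cells above, the variables should be set before the data profile is imported.

```python
import os

# Values copied from the configuration cells above. The dictionary keys are
# illustrative labels chosen for this sketch, not names used by CIMantic Graphs.
BLAZEGRAPH_ENVIRONMENTS = {
    "gridappsd-cim17": {
        "CIMG_CIM_PROFILE": "rc4_2021",
        "CIMG_URL": "http://localhost:8889/bigdata/namespace/kb/sparql",
        "CIMG_IEC61970_301": "7",
    },
    "gridappsd-cim100": {
        "CIMG_CIM_PROFILE": "cimhub_2023",
        "CIMG_URL": "http://localhost:8889/bigdata/namespace/kb/sparql",
        "CIMG_IEC61970_301": "8",
    },
    "standalone": {
        "CIMG_CIM_PROFILE": "cimhub_2023",
        "CIMG_URL": "http://localhost:9999/blazegraph/sparql",
        "CIMG_IEC61970_301": "8",
    },
}


def configure_blazegraph(deployment: str) -> dict:
    """Export the CIMG_* variables for one deployment and return them."""
    if deployment not in BLAZEGRAPH_ENVIRONMENTS:
        known = ", ".join(sorted(BLAZEGRAPH_ENVIRONMENTS))
        raise ValueError(f"Unknown deployment {deployment!r}; expected one of: {known}")
    settings = BLAZEGRAPH_ENVIRONMENTS[deployment]
    os.environ.update(settings)
    return dict(settings)


settings = configure_blazegraph("standalone")
print(settings["CIMG_URL"])
```

Keeping the values in one place makes it harder to pair a profile with the wrong endpoint or serialization version.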

----

## Creating a Connection

Import and instantiate the `BlazegraphConnection` class:

```python
from cimgraph.databases import BlazegraphConnection

# Create connection (automatically connects to database)
db = BlazegraphConnection()
```

The connection is established automatically when the object is instantiated. It retrieves the SPARQL endpoint URL from the `CIMG_URL` environment variable.
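Because the endpoint comes from an environment variable, a missing or malformed value is easy to overlook. A minimal sketch of a pre-flight check, using only the standard library, is shown below. The helper name `read_endpoint_url` is hypothetical, and the check is an illustration rather than something `BlazegraphConnection` is known to perform.

```python
import os
from urllib.parse import urlparse


def read_endpoint_url(variable: str = "CIMG_URL") -> str:
    """Return the endpoint URL from the environment, or raise if it is unusable."""
    url = os.environ.get(variable, "")
    parts = urlparse(url)
    # A usable SPARQL endpoint needs an http(s) scheme and a host.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"{variable} is not set to a valid http(s) URL: {url!r}")
    return url


os.environ["CIMG_URL"] = "http://localhost:8889/bigdata/namespace/kb/sparql"
print(read_endpoint_url())
```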

----

## Blazegraph-Specific Features

The `BlazegraphConnection` class extends `SPARQLEndpointConnection` with Blazegraph-specific implementations and features.

### Constructor

**`BlazegraphConnection.__init__()`**

Creates a new Blazegraph database connection.

**Parameters:** None (uses environment variables)

**Behavior:**
* Calls parent `SPARQLEndpointConnection.__init__()` to initialize base attributes
* Retrieves SPARQL endpoint URL from `CIMG_URL` environment variable via `get_url()`
* Configures multiple namespace support:
  - Primary namespace from CIM profile
  - Additional namespace: `http://epri.com/gmdm/2025#` for EPRI GMDM enumerations
* Automatically calls `connect()` to establish database connection

**Source:** `cimgraph/databases/blazegraph/blazegraph.py:22`

### Database-Specific Method Implementations

`BlazegraphConnection` implements the four required abstract methods from `SPARQLEndpointConnection` and overrides one optional method, `_get_namespaces()`:

#### _setup_connection()

Initializes the SPARQLWrapper connection object for Blazegraph.

**Implementation:**
* Creates `SPARQLWrapper` instance with Blazegraph SPARQL endpoint URL
* Sets return format to JSON for standardized response parsing
* Stores connection object in `self.connection_obj`

**Source:** `cimgraph/databases/blazegraph/blazegraph.py:33`

#### _execute_raw_query(query_message)

Executes a SPARQL query and returns raw JSON results.

**Parameters:**
* `query_message` (str): The SPARQL query to execute

**Returns:**
* `dict`: Query results in SPARQL JSON format

**Implementation:**
* Sets query on SPARQLWrapper connection object
* Uses POST method for query submission
* Converts response to Python dictionary
* Returns standardized SPARQL JSON results

**Source:** `cimgraph/databases/blazegraph/blazegraph.py:38`
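To make the request shape concrete, the sketch below builds an equivalent POST request with the Python standard library instead of SPARQLWrapper. It is not the CIMantic Graphs implementation: the helper names `build_query_request` and `run_query` are hypothetical, and `run_query` only works against a running endpoint.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_query_request(endpoint: str, query: str) -> Request:
    """Build a form-encoded SPARQL query request that asks for JSON results."""
    body = urlencode({"query": query}).encode("utf-8")
    return Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


def run_query(endpoint: str, query: str) -> dict:
    """Send the request and decode the SPARQL JSON response into a dict."""
    with urlopen(build_query_request(endpoint, query)) as response:
        return json.load(response)


request = build_query_request(
    "http://localhost:8889/bigdata/namespace/kb/sparql",
    "SELECT * WHERE { ?s ?p ?o } LIMIT 1",
)
print(request.get_method(), request.full_url)
```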

#### _parse_result_field(result, field_name)

Extracts field values from SPARQLWrapper query results.

**Parameters:**
* `result` (dict): A single result binding from query results
* `field_name` (str): The name of the field to extract

**Returns:**
* `str`: The field value, or `None` if field doesn't exist

**Implementation:**
* SPARQLWrapper returns results as nested dictionaries: `result[field_name]['value']`
* Checks if field exists in result binding
* Returns value or None

**Source:** `cimgraph/databases/blazegraph/blazegraph.py:44`
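The lookup described above can be sketched as a standalone function. This is an illustration of the nested-dictionary access, not a copy of the method in `blazegraph.py`, and the sample binding is made up for the example.

```python
from typing import Optional


def parse_result_field(result: dict, field_name: str) -> Optional[str]:
    """Return result[field_name]['value'], or None when the field is unbound."""
    if field_name in result:
        return result[field_name]["value"]
    return None


# One binding in SPARQL JSON format; unbound OPTIONAL variables are simply absent.
binding = {"identifier": {"type": "literal", "value": "49AD8E07-3BF9-A4E2-CB8F-C3722F837B62"}}

print(parse_result_field(binding, "identifier"))
print(parse_result_field(binding, "name"))
```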

#### _update_raw(update_message)

Executes a SPARQL UPDATE statement.

**Parameters:**
* `update_message` (str): The SPARQL update to execute

**Returns:**
* `str`: Response from the database

**Implementation:**
* Sets update query on SPARQLWrapper connection object
* Uses POST method for update submission
* Returns database response

**Source:** `cimgraph/databases/blazegraph/blazegraph.py:50`
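For reference, the sketch below composes a simple SPARQL `INSERT DATA` statement of the kind an update call carries. The helper names are hypothetical and the literal value is made up; the resulting string could be passed to the inherited `update()` method, which is left out here because it needs a running database.

```python
def escape_literal(value: str) -> str:
    """Escape characters that may not appear raw in a quoted SPARQL literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def insert_literal_statement(subject_uri: str, predicate_uri: str, value: str) -> str:
    """Build an INSERT DATA statement that adds one triple with a string literal."""
    return (
        f"INSERT DATA {{ <{subject_uri}> <{predicate_uri}> "
        f'"{escape_literal(value)}" . }}'
    )


statement = insert_literal_statement(
    "urn:uuid:49AD8E07-3BF9-A4E2-CB8F-C3722F837B62",
    "http://iec.ch/TC57/CIM100#IdentifiedObject.name",
    "example_feeder",  # made-up name for illustration
)
print(statement)
```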

#### _get_namespaces()

Returns list of namespaces for enumeration parsing (optional override).

**Returns:**
* `list[str]`: List of namespace URIs

**Implementation:**
* Returns `self.namespaces` list configured in `__init__()`
* Includes both CIM profile namespace and EPRI GMDM namespace
* Enables parsing of enumerations from multiple namespace sources

**Source:** `cimgraph/databases/blazegraph/blazegraph.py:56`
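One way to picture why several namespaces matter: an enumeration value arrives as a full URI, and the parser has to recognise whichever namespace prefixes it. The sketch below shows that idea in isolation. The function name `strip_namespace` and the GMDM enumeration value are hypothetical; the CIM100 namespace is the one used in the query examples in this document.

```python
NAMESPACES = [
    "http://iec.ch/TC57/CIM100#",  # CIM profile namespace
    "http://epri.com/gmdm/2025#",  # EPRI GMDM enumerations
]


def strip_namespace(uri: str, namespaces: list) -> str:
    """Return the local name of a URI if it starts with a known namespace."""
    for namespace in namespaces:
        if uri.startswith(namespace):
            return uri[len(namespace):]
    # Unknown namespace: return the URI unchanged so the caller can decide.
    return uri


print(strip_namespace("http://iec.ch/TC57/CIM100#PhaseCode.ABC", NAMESPACES))
print(strip_namespace("http://epri.com/gmdm/2025#ExampleKind.example", NAMESPACES))
```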

----

## Common SPARQL Endpoint Methods

**Important:** `BlazegraphConnection` inherits all standard methods from `SPARQLEndpointConnection`. These methods are documented in the [Databases Overview - Common SPARQL Endpoint Methods](3_1_databases_overview.ipynb#Common-SPARQL-Endpoint-Methods) section.

**Inherited Methods:**
* **Connection Management:** `connect()`, `disconnect()`
* **Query Execution:** `execute()`, `update()`
* **Object Retrieval:** `get_object()`, `get_from_triple()`
* **Graph Creation:** `create_new_graph()`, `create_distributed_graph()`, `build_graph_from_list()`
* **Graph Expansion:** `get_all_edges()`, `get_all_attributes()`, `get_edges_query()`
* **Query Parsing:** `parse_node_query()`, `edge_query_parser()`
* **Data Upload:** `upload()`

Refer to the Databases Overview for complete documentation, usage examples, and parameter details for these methods.

----

## Usage Examples

### Example 1: Loading a Feeder Model

```python
import os
os.environ['CIMG_CIM_PROFILE'] = 'cimhub_2023'
os.environ['CIMG_URL'] = 'http://localhost:8889/bigdata/namespace/kb/sparql'
os.environ['CIMG_IEC61970_301'] = '8'
import cimgraph.data_profile.cimhub_2023 as cim

from cimgraph.databases import BlazegraphConnection
from cimgraph.models import FeederModel

# Connect to Blazegraph
db = BlazegraphConnection()

# Load IEEE 13-bus feeder
feeder = cim.Feeder(mRID="49AD8E07-3BF9-A4E2-CB8F-C3722F837B62")
network = FeederModel(container=feeder, connection=db)

print(f"Loaded {len(network.graph[cim.ACLineSegment])} line segments")
```

### Example 2: Retrieving an Object by mRID

```python
# Retrieve a feeder object directly from database
feeder = db.get_object(mRID="49AD8E07-3BF9-A4E2-CB8F-C3722F837B62")
feeder.pprint()
```

**Output:**
```json
{
    "@id": "49ad8e07-3bf9-a4e2-cb8f-c3722f837b62",
    "@type": "Feeder"
}
```

### Example 3: Querying Object Attributes

```python
# Get feeder name (string attribute)
name = db.get_from_triple(subject=feeder, predicate='IdentifiedObject.name')
print(f"Feeder name: {name}")

# Get associated substation (object reference)
substation = db.get_from_triple(subject=feeder, predicate='Feeder.NormalEnergizingSubstation')
print(f"Substation: {substation}")
```

### Example 4: Executing Custom SPARQL Query

```python
# Custom SPARQL query to find all feeders
query_text = '''
PREFIX r:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX cim:  <http://iec.ch/TC57/CIM100#>
SELECT DISTINCT ?identifier ?name
WHERE {
    ?feeder r:type cim:Feeder .
    BIND(strafter(str(?feeder),"urn:uuid:") as ?identifier)
    OPTIONAL { ?feeder cim:IdentifiedObject.name ?name . }
}
ORDER BY ?name
'''

results = db.execute(query_text)

# Parse results
print("Available Feeders:")
for result in results['results']['bindings']:
    mrid = result['identifier']['value']
    name = result.get('name', {}).get('value', 'Unnamed')
    print(f"  - {name}: {mrid}")
```

### Example 5: Expanding Graph with Specific Classes

```python
# Load base topology
feeder = cim.Feeder(mRID="49AD8E07-3BF9-A4E2-CB8F-C3722F837B62")
network = FeederModel(container=feeder, connection=db)

# Expand breakers with all attributes
network.get_all_edges(cim.Breaker)

# Expand line segments
network.get_all_edges(cim.ACLineSegment)

# Now all breakers and lines have complete information
breaker = network.first(cim.Breaker)
print(f"Breaker: {breaker.name}")
print(f"Rated Current: {breaker.ratedCurrent}")
```

### Example 6: Debugging SPARQL Queries

```python
# Get the SPARQL query that would be executed
query_text = db.get_edges_query(network.graph, cim.Breaker)
print("SPARQL Query for Breaker expansion:")
print(query_text)

# Copy this query to Blazegraph web interface for debugging
```

----

## About Triplestore Databases

A triplestore offers a semantic approach to data management. Unlike a relational database, which requires a schema defined up front in DDL, a triplestore is composed of Resource Description Framework (RDF) statements.

### RDF Triple Structure

RDF statements take the form of **subject (node) - predicate (relation) - object (node)** that can be dynamically generated to form inter-related complex class structures.

**Example CIM Triple:**
```
Subject:   ACLineSegment (urn:uuid:1234...)
Predicate: Conductor.length (inherited by ACLineSegment)
Object:    105 meters
```
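The subject-predicate-object structure maps naturally onto a three-field record. The sketch below models one triple and prints it as an N-Triples line. The `Triple` class and `to_ntriples` helper are illustrative only, the feeder name is a made-up value, and the simple formatter assumes the literal needs no escaping.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of the node being described
    predicate: str  # URI of the attribute or association
    obj: str        # literal value (object references are not handled here)


def to_ntriples(triple: Triple) -> str:
    """Format a triple with a plain string literal as one N-Triples line."""
    return f'<{triple.subject}> <{triple.predicate}> "{triple.obj}" .'


name_triple = Triple(
    subject="urn:uuid:49AD8E07-3BF9-A4E2-CB8F-C3722F837B62",
    predicate="http://iec.ch/TC57/CIM100#IdentifiedObject.name",
    obj="example_feeder",  # made-up name for illustration
)
print(to_ntriples(name_triple))
```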

### Advantages for CIM

The RDF statement structure intuitively corresponds to the structure of object-attribute specifications used in CIM. Key advantages include:

* **Direct CIM Support:** CIM models translate directly to RDF with automated data correlation
* **Polymorphism:** RDF Schema (RDFS) supports inheritance and class hierarchies
* **Graph Constructs:** Native support for complex graph structures and subgraphs
* **Standards-Based:** Uses mature, standardized languages (RDF, RDFS, OWL, SPARQL)
* **Validation:** Supports type-checking and SHACL (Shapes Constraint Language)
* **Agility:** Dynamic data structures support multiple developers and evolving schemas

### Serialization Formats

RDF can be serialized in multiple formats:
* **RDF/XML** - IEC 61970-301 standard for CIM
* **Turtle (TTL)** - Human-readable format
* **JSON-LD** - JSON-based RDF for web applications
* **N-Triples** - Simple line-based format

### Data Management Considerations

**Advantages:**
* No schema migration required for model changes
* Easy integration of multiple data sources
* Standardized query language (SPARQL)
* Support for reasoning and inference (with OWL)

**Challenges:**
* Risk of "garbage-in-garbage-out" without data governance
* Potential for dangling references without validation
* Requires rigor from data contributors
* Need for well-thought-out data management strategy
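The dangling-reference risk can be checked mechanically. The sketch below treats any object that looks like a `urn:uuid:` identifier as a reference and reports those that never appear as a subject. It is a simplified illustration, not a substitute for SHACL validation; the identifiers are placeholders and the function name is hypothetical.

```python
def find_dangling_references(triples: list) -> list:
    """Return referenced urn:uuid identifiers that are never described as a subject."""
    subjects = {subject for subject, _, _ in triples}
    referenced = {
        obj for _, _, obj in triples
        if obj.startswith("urn:uuid:")  # assumption: references use urn:uuid identifiers
    }
    return sorted(referenced - subjects)


# Placeholder identifiers: the feeder points at a substation that was never loaded.
triples = [
    ("urn:uuid:feeder-1", "rdf:type", "cim:Feeder"),
    ("urn:uuid:feeder-1", "cim:IdentifiedObject.name", "example_feeder"),
    ("urn:uuid:feeder-1", "cim:Feeder.NormalEnergizingSubstation", "urn:uuid:substation-1"),
]
print(find_dangling_references(triples))
```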

----

## UML Sequence Diagrams

This section contains UML sequence diagrams that show how the `BlazegraphConnection` class executes database queries and API calls. The diagrams are rendered from plain-text files using mermaid.js.

### Blazegraph Connection Initialization

```python
from mermaid import Mermaid

with open('./images/3_3_connect.txt', 'r') as diagram:
    diagram_text = diagram.read()
Mermaid(diagram_text)
```

### get_object() - Retrieving Object by mRID

```python
with open('./images/3_3_get_object.txt', 'r') as diagram:
    diagram_text = diagram.read()
Mermaid(diagram_text)
```

### get_from_triple() - Querying Object Attributes

```python
with open('./images/3_3_get_from_triple.txt', 'r') as diagram:
    diagram_text = diagram.read()
Mermaid(diagram_text)
```

### execute() - Executing SPARQL Query

```python
with open('./images/3_3_execute.txt', 'r') as diagram:
    diagram_text = diagram.read()
Mermaid(diagram_text)
```

### update() - Executing SPARQL Update

```python
with open('./images/3_3_update.txt', 'r') as diagram:
    diagram_text = diagram.read()
Mermaid(diagram_text)
```

### create_new_graph() - Building Feeder Topology

```python
with open('./images/3_3_create_new_graph.txt', 'r') as diagram:
    diagram_text = diagram.read()
Mermaid(diagram_text)
```

### create_distributed_graph() - Building Distributed Model

```python
with open('./images/3_3_create_distributed_graph.txt', 'r') as diagram:
    diagram_text = diagram.read()
Mermaid(diagram_text)
```

### get_all_edges() - Parallel Graph Expansion

```python
with open('./images/3_3_get_all_edges.txt', 'r') as diagram:
    diagram_text = diagram.read()
Mermaid(diagram_text)
```

### get_edges_query() - Query Debugging

```python
with open('./images/3_3_get_edges_query.txt', 'r') as diagram:
    diagram_text = diagram.read()
Mermaid(diagram_text)
```

### parse_node_query() - Parsing Network Topology

```python
with open('./images/3_3_parse_node_query.txt', 'r') as diagram:
    diagram_text = diagram.read()
Mermaid(diagram_text)
```

### edge_query_parser() - Parsing Edge Query Results

```python
with open('./images/3_3_parse_edges_query.txt', 'r') as diagram:
    diagram_text = diagram.read()
Mermaid(diagram_text)
```

----

## Blazegraph Web Interface

Blazegraph provides a built-in web-based SPARQL query interface for interactive querying and debugging.

**Access:**
* GridAPPS-D Docker: http://localhost:8889/bigdata/
* Standalone: http://localhost:9999/blazegraph/

**Features:**
* SPARQL query editor with syntax highlighting
* Query execution with result display
* Namespace management
* Database statistics and monitoring
* Data loading interface (bulk load RDF files)

**Usage Tip:** Copy SPARQL queries from `get_edges_query()` into the web interface to test and debug query performance.

----

## Performance Optimization

### Memory Configuration

For large models, increase JVM heap size:
```bash
java -server -Xmx8g -jar blazegraph.jar
```

### Query Batching

CIMantic Graphs automatically batches queries in groups of 100 objects. This is optimized for Blazegraph's query processing.
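As a rough picture of what batching means here, the sketch below splits a list of identifiers into groups of at most 100. It is a generic chunking helper written for this illustration, not the code CIMantic Graphs uses internally, and the identifiers are placeholders.

```python
from typing import Iterator, List


def batched(items: List[str], batch_size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each holding at most `batch_size` entries."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Placeholder identifiers standing in for the mRIDs of 250 objects.
mrids = [f"object-{index}" for index in range(250)]
batch_sizes = [len(batch) for batch in batched(mrids)]
print(batch_sizes)  # [100, 100, 50]
```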

### Parallel Execution

The `get_all_edges()` method uses parallel query execution with `ThreadPoolExecutor` for maximum performance on multi-core systems.
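The general pattern looks like the sketch below, which submits one task per batch to a `ThreadPoolExecutor` and collects the results in order. The `fake_query` function is a stand-in for a database round trip, and `run_batches` is a hypothetical helper; the actual scheduling inside `get_all_edges()` may differ.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def run_batches(batches: List[list], query_fn: Callable, max_workers: int = 4) -> list:
    """Run query_fn on every batch in parallel and return results in batch order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves input order even if tasks finish out of order.
        return list(pool.map(query_fn, batches))


def fake_query(batch: list) -> int:
    """Stand-in for a SPARQL request: report how many objects were asked for."""
    return len(batch)


results = run_batches([["a", "b", "c"], ["d", "e"], ["f"]], fake_query)
print(results)  # [3, 2, 1]
```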

### Indexing

Blazegraph automatically indexes RDF triples for fast query execution. No manual index configuration is required.

----

## Troubleshooting

### Connection Refused
```
ConnectionRefusedError: [Errno 111] Connection refused
```
* Verify Blazegraph is running
* Check `CIMG_URL` environment variable
* Ensure correct port (8889 for Docker, 9999 for standalone)
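The first and third checks can be scripted. The sketch below parses the port out of an endpoint URL and tries to open a TCP connection to it. The helper names are hypothetical, and a successful connection only shows that something is listening on the port, not that it is Blazegraph.

```python
import socket
from urllib.parse import urlparse


def endpoint_port(url: str) -> int:
    """Return the explicit port of a URL, or the scheme default if none is given."""
    parts = urlparse(url)
    return parts.port or (443 if parts.scheme == "https" else 80)


def endpoint_is_listening(url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    host = urlparse(url).hostname
    try:
        with socket.create_connection((host, endpoint_port(url)), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


print(endpoint_port("http://localhost:8889/bigdata/namespace/kb/sparql"))  # 8889
```

Calling `endpoint_is_listening(os.environ['CIMG_URL'])` before creating the connection gives a quick yes or no without waiting for a query to fail.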

### Query Timeout
```
TimeoutError: Query execution exceeded timeout
```
* Increase JVM heap size for Blazegraph
* Reduce batch size for large models
* Check Blazegraph web interface for slow queries

### Empty Results
* Verify data is loaded into Blazegraph
* Check namespace matches CIM profile
* Use Blazegraph web interface to verify data exists

### Profile Mismatch
```
Class XYZ not in data profile
```
* Verify `CIMG_CIM_PROFILE` matches database content
* Check Docker image version corresponds to correct profile

----

## Loading Data into Blazegraph

### Via Web Interface

1. Navigate to http://localhost:8889/bigdata/
2. Click "Update" tab
3. Select "File Path or URL"
4. Choose RDF file format (RDF/XML for CIM)
5. Click "Update" to load data

### Via CIMantic Graphs

Load from XML file and upload to Blazegraph:

```python
# Set the CIMG_* environment variables first (see Environment Configuration)
import cimgraph.data_profile.cimhub_2023 as cim
from cimgraph.databases import XMLFile, BlazegraphConnection
from cimgraph.models import FeederModel

# Load from XML
xml_file = XMLFile(filename='../../sample_models/ieee13.xml')
network = FeederModel(container=cim.Feeder(), connection=xml_file)

# Upload to Blazegraph
db = BlazegraphConnection()
db.upload(network.graph)
```

### Via REST API

Use curl to bulk load RDF files:
```bash
curl -X POST \
  -H 'Content-Type: application/rdf+xml' \
  --data-binary @model.xml \
  http://localhost:8889/bigdata/namespace/kb/sparql
```
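The same request can be issued from Python with the standard library. The sketch below mirrors the curl command above: it posts the file contents with the `application/rdf+xml` content type. The helper names are hypothetical, and `upload_rdf_xml` needs a running Blazegraph instance and an existing file.

```python
from pathlib import Path
from urllib.request import Request, urlopen


def build_upload_request(endpoint: str, rdf_xml: bytes) -> Request:
    """Build a POST request that sends RDF/XML data to the SPARQL endpoint."""
    return Request(
        endpoint,
        data=rdf_xml,
        headers={"Content-Type": "application/rdf+xml"},
        method="POST",
    )


def upload_rdf_xml(endpoint: str, path: str) -> str:
    """Read an RDF/XML file, post it, and return the server's response text."""
    request = build_upload_request(endpoint, Path(path).read_bytes())
    with urlopen(request) as response:
        return response.read().decode("utf-8")


request = build_upload_request(
    "http://localhost:8889/bigdata/namespace/kb/sparql",
    b"<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'/>",
)
print(request.get_method(), request.get_header("Content-type"))
```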

----

## References

* [Blazegraph Official Website](https://blazegraph.com/)
* [Blazegraph GitHub](https://github.com/blazegraph/database)
* [GridAPPS-D Blazegraph Docker Images](https://hub.docker.com/r/gridappsd/blazegraph/tags)
* [AWS Neptune (Commercial Blazegraph)](https://aws.amazon.com/neptune/)
* [SPARQL 1.1 Specification](https://www.w3.org/TR/sparql11-query/)
* [RDF 1.1 Primer](https://www.w3.org/TR/rdf11-primer/)
* [Common SPARQL Methods](3_1_databases_overview.ipynb#Common-SPARQL-Endpoint-Methods)
* [FeederModel Usage](../04_graph_models/4_2_feeder_model.ipynb)