# Wikidata Statement Nanopublication Generator

This notebook reads a JSON configuration file and generates one nanopublication (a `.trig` file) per configured entry, using Wikidata entities and properties as the vocabulary.

## Wikidata Statements
These nanopublications express structured statements using Wikidata entities and properties:
- **Subject**: A Wikidata entity or external URI
- **Property**: A Wikidata property (e.g., P31 = instance of)
- **Object**: A Wikidata entity or literal value
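
The mapping from such a statement to an RDF triple can be sketched as follows. This is a minimal illustration, not the generator's own code: the dictionary keys (`subject`, `property`, `object`, `id`, `uri`) mirror those read by the cells below, the two namespaces are the ones declared in the generated output, and the subject URI `https://example.org/qomic` is a placeholder assumed for the example.

```python
# Namespaces as declared in the generated TriG output.
WD = "http://www.wikidata.org/entity/"
WDT = "http://www.wikidata.org/prop/direct/"


def entity_iri(entity: dict) -> str:
    """Return the full IRI for an entity given by 'uri' or by Wikidata 'id'."""
    if "uri" in entity:
        return entity["uri"]
    return WD + entity["id"]


def statement_to_ntriple(stmt: dict) -> str:
    """Render one subject/property/object statement as an N-Triples line."""
    s = entity_iri(stmt["subject"])
    p = WDT + stmt["property"]["id"]
    o = entity_iri(stmt["object"])
    return f"<{s}> <{p}> <{o}> ."


# Hypothetical statement: the subject URI is a placeholder, not a real identifier.
example = {
    "subject": {"uri": "https://example.org/qomic", "label": "QOMIC"},
    "property": {"id": "P31", "label": "instance of"},
    "object": {"id": "Q7397", "label": "software"},
}

print(statement_to_ntriple(example))
```

Writing the triple with full IRIs makes it visible that `wd:` identifies the entity and `wdt:` identifies the direct ("truthy") property.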

In [27]:
import json
import sys
from pathlib import Path

# Add parent directory to path for imports
sys.path.insert(0, str(Path.cwd().parent))

from nanopub_utils import (
    NanopubGenerator, load_config, save_nanopub,
    make_uri, make_literal, validate_required_fields,
    PREFIXES
)

In [28]:
class WikidataNanopubGenerator(NanopubGenerator):
    """Generator for Wikidata statement nanopublications."""
    
    def __init__(self, config: dict, nanopub_config: dict):
        # Merge metadata with individual nanopub config
        merged_config = {
            **config.get('metadata', {}),
            **nanopub_config,
            'template_uri': config.get('template_uri'),
            'label': nanopub_config.get('nanopub_label', 'Wikidata Statement')
        }
        super().__init__(merged_config)
        self.add_prefix('wd')
        self.add_prefix('wdt')
    
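    def _format_entity(self, entity: dict) -> str:
        """Format a statement subject or object as a TriG term.

        An external 'uri' takes precedence over a Wikidata 'id'.
        """
        if 'uri' in entity:
            # External URI: emit as a full IRI reference
            return f"<{entity['uri']}>"
        elif 'id' in entity:
            # Wikidata entity: emit as a prefixed name, e.g. wd:Q42
            return f"wd:{entity['id']}"
        else:
            # Fallback: neither key present, so the whole value is stringified into a literal
            return make_literal(str(entity))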
    
    def _format_property(self, prop: dict) -> str:
        """Format a Wikidata property."""
        return f"wdt:{prop['id']}"
    
    def generate_assertion(self) -> str:
        """Generate the Wikidata statement assertion graph."""
        statements = self.config.get('statements', [])
        
        lines = [f"{self.sub_prefix}:assertion {{"]
        
        for stmt in statements:
            subject = self._format_entity(stmt['subject'])
            predicate = self._format_property(stmt['property'])
            obj = self._format_entity(stmt['object'])
            
            # Add the main statement
            lines.append(f"  {subject} {predicate} {obj} .")
            
            # Add labels for entities if provided
            if 'label' in stmt['subject']:
                lines.append(f"  {subject} rdfs:label {make_literal(stmt['subject']['label'])} .")
            if 'label' in stmt['object'] and 'id' in stmt['object']:
                # Reuse the object term formatted above; only objects with an 'id' get a label
                lines.append(f"  {obj} rdfs:label {make_literal(stmt['object']['label'])} .")
        
        lines.append("}")
        return "\n".join(lines)
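
`_format_property` raises a bare `KeyError` when a property has no `id`, and malformed identifiers pass straight through into the output. A pre-flight check along the following lines could catch such problems before generation. This is a sketch under assumptions: `check_statement` is a hypothetical helper that is not part of `nanopub_utils`, and it assumes that subjects and objects given by `id` are always Wikidata items (`Q<number>`).

```python
import re

PROPERTY_ID = re.compile(r"P[1-9]\d*")
ENTITY_ID = re.compile(r"Q[1-9]\d*")


def check_statement(stmt: dict) -> list[str]:
    """Return a list of problems found in one statement dict (empty if none)."""
    problems = []
    for role in ("subject", "property", "object"):
        if not isinstance(stmt.get(role), dict):
            problems.append(f"missing or non-dict '{role}'")
    if problems:
        return problems

    prop_id = str(stmt["property"].get("id", ""))
    if not PROPERTY_ID.fullmatch(prop_id):
        problems.append(f"property id {prop_id!r} is not of the form P<number>")

    for role in ("subject", "object"):
        node = stmt[role]
        if "uri" in node:
            continue  # external URIs are passed through unchanged
        if not ENTITY_ID.fullmatch(str(node.get("id", ""))):
            problems.append(f"{role} has neither 'uri' nor a Q<number> 'id'")
    return problems
```

Calling `check_statement(stmt)` for every entry in `np_config['statements']` before constructing the generator would report all problems at once instead of failing on the first one.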

In [29]:
# Configuration
CONFIG_FILE = "../config/vbae208/vbae208_wikidata.json"  # Change this to use a different config
# CONFIG_FILE = "../config/clenet2025/clenet2025_wikidata.json"  # Alternative config
OUTPUT_DIR = "../output/wikidata"

# Create output directory
Path(OUTPUT_DIR).mkdir(parents=True, exist_ok=True)

In [30]:
# Load configuration
config = load_config(CONFIG_FILE)

print(f"Source paper: {config['metadata']['source_paper']['title']}")
print(f"DOI: {config['metadata']['source_paper']['doi']}")
print(f"Number of Wikidata nanopublications to generate: {len(config['nanopublications'])}")
print()

print("Common properties available:")
for pid, prop in config.get('common_properties', {}).items():
    print(f"  {pid}: {prop['label']}")
print()

print("Statements to generate:")
for i, np_config in enumerate(config['nanopublications'], 1):
    for stmt in np_config.get('statements', []):
        # Prefer the human-readable label, then the Wikidata id, then the external URI
        subj = stmt['subject'].get('label', stmt['subject'].get('id', stmt['subject'].get('uri', '?')))
        prop = stmt['property'].get('label', stmt['property'].get('id', '?'))
        obj = stmt['object'].get('label', stmt['object'].get('id', stmt['object'].get('uri', '?')))
        print(f"{i}. {subj} - {prop} - {obj}")

Source paper: QOMIC: quantum optimization for motif identification
DOI: 10.1093/bioadv/vbae208
Number of Wikidata nanopublications to generate: 4

Common properties available:
  P31: instance of
  P279: subclass of
  P921: main subject
  P2283: uses
  P275: license
  P178: developer
  P306: operating system
  P277: programming language
  P1343: described by source

Statements to generate:
1. QOMIC - instance of - software
2. QOMIC - uses - Qiskit
3. QOMIC paper - main subject - network motif
4. QOMIC paper - main subject - quantum computing


In [31]:
# Generate nanopublications
generated_files = []

for np_config in config['nanopublications']:
    # Create generator
    generator = WikidataNanopubGenerator(config, np_config)
    
    # Generate nanopub content
    nanopub_content = generator.generate()
    
    # Save to file
    output_file = f"{OUTPUT_DIR}/{np_config['id']}.trig"
    save_nanopub(nanopub_content, output_file)
    generated_files.append(output_file)
    
    print(f"Generated: {output_file}")

print(f"\nTotal generated: {len(generated_files)} nanopublications")

Generated: ../output/wikidata/wikidata_qomic_instance.trig
Generated: ../output/wikidata/wikidata_qomic_uses_qiskit.trig
Generated: ../output/wikidata/wikidata_paper_subject_motif.trig
Generated: ../output/wikidata/wikidata_paper_subject_quantum.trig

Total generated: 4 nanopublications


In [32]:
# Preview first generated nanopublication
if generated_files:
    print(f"Preview of {generated_files[0]}:\n")
    print("=" * 80)
    with open(generated_files[0], 'r') as f:
        print(f.read())

Preview of ../output/wikidata/wikidata_qomic_instance.trig:

@prefix this: <https://w3id.org/np/RAc13c3189335f5cbae1b9d45f98ad97a66f2a99697aa> .
@prefix sub: <https://w3id.org/np/RAc13c3189335f5cbae1b9d45f98ad97a66f2a99697aa/> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix np: <http://www.nanopub.org/nschema#> .
@prefix npx: <http://purl.org/nanopub/x/> .
@prefix nt: <https://w3id.org/np/o/ntemplate/> .
@prefix orcid: <https://orcid.org/> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix wd: <http://www.wikidata.org/entity/> .
@prefix wdt: <http://www.wikidata.org/prop/direct/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

sub:Head {
  this: a np:Nanopublication ;
    np:hasAssertion sub:assertion ;
    np:hasProvenance sub:provenance ;
    np:hasPublicationInfo sub:pubinfo .
}

sub:assertion {
  <https://gi
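
The `sub:Head` graph in the preview points to three further named graphs: `sub:assertion`, `sub:provenance`, and `sub:pubinfo`. A quick textual check that a generated file contains all four might look like the sketch below. It is not a TriG parser: `missing_graphs` is a hypothetical helper, and it assumes each graph is written as `sub:<name> {` at the start of a line, as in the preview above.

```python
import re

REQUIRED_GRAPHS = ("Head", "assertion", "provenance", "pubinfo")


def missing_graphs(trig_text: str) -> list[str]:
    """Return the required named graphs that have no 'sub:<name> {' block."""
    found = set(re.findall(r"^sub:(\w+)\s*\{", trig_text, flags=re.MULTILINE))
    return [name for name in REQUIRED_GRAPHS if name not in found]


# Small inline sample standing in for the contents of a generated file.
sample = "sub:Head {\n}\n\nsub:assertion {\n}\n"
print(missing_graphs(sample))
```

Applied to the text of each file in `generated_files`, an empty result suggests the file has the expected four-graph layout; it says nothing about whether the triples inside are valid.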

## Next Steps

1. Review the generated `.trig` files in the output directory (`OUTPUT_DIR`).
2. Sign and publish them using Nanodash or nanopub-java.
3. To generate nanopublications for a different paper, create a new JSON config file and point `CONFIG_FILE` at it.
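
A new config file could be drafted along these lines. The key names are the ones this notebook reads (`template_uri`, `metadata.source_paper`, `common_properties`, `nanopublications`, `id`, `nanopub_label`, `statements`); every value is a placeholder, `write_config` is a hypothetical helper, and `NanopubGenerator` may require further metadata fields that are not visible in this notebook.

```python
import json
from pathlib import Path

# All values below are placeholders chosen for illustration.
new_config = {
    "template_uri": "https://example.org/template",
    "metadata": {
        "source_paper": {"title": "Example paper title", "doi": "10.0000/example"}
    },
    "common_properties": {"P31": {"label": "instance of"}},
    "nanopublications": [
        {
            "id": "wikidata_example_instance",
            "nanopub_label": "Example Wikidata statement",
            "statements": [
                {
                    "subject": {"uri": "https://example.org/tool", "label": "Example tool"},
                    "property": {"id": "P31", "label": "instance of"},
                    "object": {"id": "Q7397", "label": "software"},
                }
            ],
        }
    ],
}


def write_config(config: dict, path) -> Path:
    """Write the config as indented JSON, creating parent directories as needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return path
```

For example, `write_config(new_config, "../config/example/example_wikidata.json")` would create the file, after which `CONFIG_FILE` can be set to that path.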