# PRISMA Search Execution Dataset Nanopublication

**Notebook 2 of 2** - Creates the aggregating Search Execution Dataset that references individual database searches.

## Template
- `RAV_H3udaSzxYOhhR0t-q7PKS6URwauD_Z5sMLbHmM2x0` - Declaring a search execution dataset

## Features
- **Create new** or **supersede existing** nanopub
- References external database search nanopubs (from Notebook 1)
- Includes `nt:hasLabelFromApi` for proper Nanodash display

## Prerequisites
- **Notebook 1 completed**: Database search nanopubs published
- URIs file from Notebook 1: `*-db-search-uris.json`
- Your original configuration file
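
Neither input file is shown in this notebook. The sketch below infers their shape from the keys that the later cells read. Every value is a hypothetical placeholder, and `missing_keys` is an illustrative helper, not part of the pipeline.

```python
# Shapes inferred from the keys this notebook reads; all values are placeholders.
example_config = {
    "author": {"name": "Jane Doe", "orcid": "0000-0000-0000-0000"},
    "search_execution_dataset": {
        "label": "Example review - Search Execution Results",
        "part_of": "https://example.org/review",
        "creation_date": "2025-01-01",
        "deduplication_methodology": "placeholder text",
        "review_methodology": "placeholder text",
        "screening_methodology": "placeholder text",
        "screened_record_count": 100,
        "fulltext_screened_record_count": 20,
        "final_included_study_count": 5,
        "exclusion_breakdown": "placeholder text",
        # Read with .get() in Step 3, so these two may be absent:
        "dataset_file_location": "https://example.org/dataset",
        "limitations": "placeholder text",
    },
    "output": {"filename": "example-search-execution"},
}

example_uris = {
    "database_searches": [
        {
            "database": "ExampleDB",
            "search_uri": "https://example.org/np/search",
            "label": "ExampleDB search",
        }
    ]
}

# Keys accessed with [] in this notebook (a missing one raises KeyError).
REQUIRED_KEYS = {
    "author": ["name", "orcid"],
    "search_execution_dataset": [
        "label", "part_of", "creation_date",
        "deduplication_methodology", "review_methodology",
        "screening_methodology", "screened_record_count",
        "fulltext_screened_record_count", "final_included_study_count",
        "exclusion_breakdown",
    ],
    "output": ["filename"],
}


def missing_keys(config: dict) -> list:
    """Return dotted names of required keys that are absent from config."""
    missing = []
    for section, keys in REQUIRED_KEYS.items():
        block = config.get(section, {})
        missing += [f"{section}.{key}" for key in keys if key not in block]
    return missing


print(missing_keys(example_config))
```

Running `missing_keys(config)` right after loading the file reports every gap at once instead of failing on the first `KeyError` inside Step 3.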

---
## Step 1: Load Configuration and Database Search URIs

In [1]:
import json
from pathlib import Path

# Path to your original configuration file
CONFIG_FILE = "../inputs/quantum-biodiversity/search-execution-quantum-biodiversity.json"

# Path to URIs file from Notebook 1
URIS_FILE = "../inputs/quantum-biodiversity/quantum-biodiversity-search-execution-db-search-uris.json"

# Load config
with open(CONFIG_FILE, 'r') as f:
    config = json.load(f)

sed = config['search_execution_dataset']
print(f"Config: {sed['label']}")
print(f"Author: {config['author']['name']}")

Config: Quantum Computing for Biodiversity - Search Execution Results
Author: Anne Fouilloux


In [2]:
# Load database search URIs from Notebook 1
try:
    with open(URIS_FILE, 'r') as f:
        uris_data = json.load(f)
    
    db_searches = uris_data['database_searches']
    print(f"Loaded {len(db_searches)} database search URIs:")
    for item in db_searches:
        print(f"  • {item['database']}: {item['search_uri']}")
    
    DB_SEARCH_URIS = [item['search_uri'] for item in db_searches]
    LABELS_MAP = {item['search_uri']: item['label'] for item in db_searches}

except FileNotFoundError:
    print(f"❌ URIs file not found: {URIS_FILE}")
    print("\nEither:")
    print("  1. Run Notebook 1 first to publish database searches, or")
    print("  2. Manually set DB_SEARCH_URIS and LABELS_MAP in a new cell")
    
    DB_SEARCH_URIS = []
    LABELS_MAP = {}

Loaded 6 database search URIs:
  • OpenAlex: https://w3id.org/np/RAkV7KfvYqIpFLfMqqLmZxyCpr4VFcgE4ln3FRYrZwdaE/search
  • arXiv: https://w3id.org/np/RApg5W58ZcDLDtKF2txz-SrOVsjM-3lpoPkrWGOFcigZU/search
  • Semantic Scholar: https://w3id.org/np/RARwCFSuFUqEE0OPW8CeynB8cx4zemGB09obBA9eXfo78/search
  • PubMed: https://w3id.org/np/RAkZsd1zUQn8lXlN8yNQgZBO2W4iuZhMQDE0r-kBXrF6g/search
  • IEEE Xplore (OA only): https://w3id.org/np/RAVBmg81fxSEFhZ5W9D89XxBR2Scts7yWSWFtFY4OX5UI/search
  • Europe PMC: https://w3id.org/np/RASfip5KQntGv4ACoKui2ZBwhSQKTQPPClPC1zyQHo29M/search
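
Every URI in the output above has the same shape: a nanopub URI followed by `/search`. A minimal format check could catch a pasted or truncated URI before it ends up in the assertion. The sketch assumes the identifier is `RA` plus 43 base64url characters, which matches the six URIs above; `is_search_uri` and `split_valid` are hypothetical helpers.

```python
import re

# Assumption: identifier = "RA" + 43 base64url characters, then "/search".
SEARCH_URI_PATTERN = re.compile(
    r"https://w3id\.org/np/RA[A-Za-z0-9_-]{43}/search"
)


def is_search_uri(uri: str) -> bool:
    """True if the URI looks like a published database search resource."""
    return SEARCH_URI_PATTERN.fullmatch(uri) is not None


def split_valid(uris):
    """Partition URIs into (well-formed, malformed) lists, preserving order."""
    valid = [u for u in uris if is_search_uri(u)]
    invalid = [u for u in uris if not is_search_uri(u)]
    return valid, invalid


valid, invalid = split_valid([
    "https://w3id.org/np/RAkV7KfvYqIpFLfMqqLmZxyCpr4VFcgE4ln3FRYrZwdaE/search",
    "https://w3id.org/np/RAkV7KfvYqIpFLfMqqLmZxyCpr4VFcgE4ln3FRYrZwdaE",
])
print(f"{len(valid)} well-formed, {len(invalid)} malformed")
```

This only checks the string format. It does not confirm that the nanopub exists on the network.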


---
## Step 1b: Configure Supersedes (Optional)

If you want to **update an existing nanopub**, set the URI here. The new nanopub will supersede the old one.

The old nanopub will be marked as outdated in Nanodash and queries will return the new version instead.

In [3]:
# Set this to supersede an existing nanopub (set to None to create new)
SUPERSEDES_URI = "https://w3id.org/np/RAMPy96eCLCXlGR9VvCVf6rJmpN_DlxxarMGm91_5n-O8"

if SUPERSEDES_URI:
    print(f"⚠️  Will SUPERSEDE existing nanopub:")
    print(f"   {SUPERSEDES_URI}")
    print(f"\n   The old nanopub will be marked as outdated.")
    print(f"   View old: https://nanodash.knowledgepixels.com/explore?id={SUPERSEDES_URI}")
else:
    print("Creating NEW nanopub (not superseding any existing one)")

⚠️  Will SUPERSEDE existing nanopub:
   https://w3id.org/np/RAMPy96eCLCXlGR9VvCVf6rJmpN_DlxxarMGm91_5n-O8

   The old nanopub will be marked as outdated.
   View old: https://nanodash.knowledgepixels.com/explore?id=https://w3id.org/np/RAMPy96eCLCXlGR9VvCVf6rJmpN_DlxxarMGm91_5n-O8


---
## Step 2: Setup

In [4]:
from datetime import datetime, timezone
from rdflib import Dataset, Namespace, URIRef, Literal
from rdflib.namespace import RDF, RDFS, XSD, FOAF, PROV, DCTERMS as DCT

# Namespaces
NP = Namespace("http://www.nanopub.org/nschema#")
NPX = Namespace("http://purl.org/nanopub/x/")
NT = Namespace("https://w3id.org/np/o/ntemplate/")
ORCID = Namespace("https://orcid.org/")
SLV = Namespace("https://w3id.org/sciencelive/o/terms/")

# Template URIs
SEARCH_EXECUTION_TEMPLATE = URIRef("https://w3id.org/np/RAV_H3udaSzxYOhhR0t-q7PKS6URwauD_Z5sMLbHmM2x0")
PROVENANCE_TEMPLATE = URIRef("https://w3id.org/np/RA7lSq6MuK_TIC6JMSHvLtee3lpLoZDOqLJCLXevnrPoU")
PUBINFO_TEMPLATE_1 = URIRef("https://w3id.org/np/RA0J4vUn_dekg-U1kK3AOEt02p9mT2WO03uGxLDec1jLw")
PUBINFO_TEMPLATE_2 = URIRef("https://w3id.org/np/RAukAcWHRDlkqxk7H2XNSegc1WnHI569INvNr-xdptDGI")

# Output directory
output_dir = Path("./output")
output_dir.mkdir(parents=True, exist_ok=True)

print("Setup complete.")

Setup complete.


---
## Step 3: Define Creation Function

In [5]:
def create_search_execution_dataset_nanopub(config, db_search_uris, labels_map, supersedes_uri=None):
    """
    Create a SearchExecutionDataset nanopublication.
    
    Template: RAV_H3udaSzxYOhhR0t-q7PKS6URwauD_Z5sMLbHmM2x0
    
    Args:
        config: Configuration dict
        db_search_uris: List of /search URIs from published database search nanopubs
        labels_map: Dict mapping URIs to labels
        supersedes_uri: Optional URI of nanopub to supersede (for updates)
    
    Returns:
        rdflib Dataset holding the Head, assertion, provenance, and pubinfo graphs
    """
    TEMP_NP = Namespace("http://purl.org/nanopub/temp/np/")
    
    ds = Dataset()
    
    # Bind namespaces
    ds.bind("this", "http://purl.org/nanopub/temp/np/")
    ds.bind("sub", TEMP_NP)
    ds.bind("np", NP)
    ds.bind("dct", DCT)
    ds.bind("nt", NT)
    ds.bind("npx", NPX)
    ds.bind("xsd", XSD)
    ds.bind("rdfs", RDFS)
    ds.bind("orcid", ORCID)
    ds.bind("prov", PROV)
    ds.bind("foaf", FOAF)
    ds.bind("slv", SLV)
    
    # URIs
    np_uri = URIRef("http://purl.org/nanopub/temp/np/")
    head_uri = TEMP_NP.Head
    assertion_uri = TEMP_NP.assertion
    provenance_uri = TEMP_NP.provenance
    pubinfo_uri = TEMP_NP.pubinfo
    search_execution_uri = TEMP_NP.searchExecutionDataset
    author_uri = ORCID[config['author']['orcid']]
    
    sed = config['search_execution_dataset']
    
    # HEAD
    head = ds.graph(head_uri)
    head.add((np_uri, RDF.type, NP.Nanopublication))
    head.add((np_uri, NP.hasAssertion, assertion_uri))
    head.add((np_uri, NP.hasProvenance, provenance_uri))
    head.add((np_uri, NP.hasPublicationInfo, pubinfo_uri))
    
    # ASSERTION
    assertion = ds.graph(assertion_uri)
    assertion.add((search_execution_uri, RDF.type, SLV.SearchExecutionDataset))
    assertion.add((search_execution_uri, RDFS.label, Literal(sed['label'])))
    assertion.add((search_execution_uri, DCT.isPartOf, URIRef(sed['part_of'])))
    assertion.add((search_execution_uri, DCT.created, Literal(sed['creation_date'], datatype=XSD.date)))
    
    # Reference database searches (external URIs)
    for db_uri in db_search_uris:
        assertion.add((search_execution_uri, SLV.includesDbSearch, URIRef(db_uri)))
    
    # Methodologies
    assertion.add((search_execution_uri, SLV.usesDeduplicationMethodology, Literal(sed['deduplication_methodology'])))
    assertion.add((search_execution_uri, SLV.usesReviewMethodology, Literal(sed['review_methodology'])))
    assertion.add((search_execution_uri, SLV.usesScreeningMethodology, Literal(sed['screening_methodology'])))
    
    # Counts
    assertion.add((search_execution_uri, SLV.hasScreenedRecordCount, Literal(sed['screened_record_count'])))
    assertion.add((search_execution_uri, SLV.hasFulltextScreenedRecordCount, Literal(sed['fulltext_screened_record_count'])))
    assertion.add((search_execution_uri, SLV.hasFinalIncludedStudyCount, Literal(sed['final_included_study_count'])))
    assertion.add((search_execution_uri, SLV.hasExclusionBreakdown, Literal(sed['exclusion_breakdown'])))
    
    # Dataset file location (required by the template; a placeholder is used if missing)
    if sed.get('dataset_file_location'):
        assertion.add((search_execution_uri, SLV.hasDatasetFileLocation, URIRef(sed['dataset_file_location'])))
    else:
        assertion.add((search_execution_uri, SLV.hasDatasetFileLocation, URIRef("https://zenodo.org/records/PLACEHOLDER")))
        print("⚠️  Warning: dataset_file_location empty, using placeholder")
    
    # Limitations (optional)
    if sed.get('limitations'):
        assertion.add((search_execution_uri, SLV.hasLimitations, Literal(sed['limitations'])))
    
    # PROVENANCE
    provenance = ds.graph(provenance_uri)
    provenance.add((assertion_uri, PROV.wasAttributedTo, author_uri))
    
    # PUBINFO
    pubinfo = ds.graph(pubinfo_uri)
    pubinfo.add((author_uri, FOAF.name, Literal(config['author']['name'])))
    
    now = datetime.now(timezone.utc).isoformat()
    pubinfo.add((np_uri, DCT.created, Literal(now, datatype=XSD.dateTime)))
    pubinfo.add((np_uri, DCT.creator, author_uri))
    pubinfo.add((np_uri, DCT.license, URIRef("https://creativecommons.org/licenses/by/4.0/")))
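    # Truncate long labels only; short labels are used unchanged
    np_label = sed['label'] if len(sed['label']) <= 50 else sed['label'][:50] + "..."
    pubinfo.add((np_uri, RDFS.label, Literal(np_label)))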
    pubinfo.add((np_uri, NPX.introduces, search_execution_uri))
    pubinfo.add((np_uri, NPX.wasCreatedAt, URIRef("https://nanodash.knowledgepixels.com/")))
    
    # SUPERSEDES - if updating an existing nanopub
    if supersedes_uri:
        pubinfo.add((np_uri, NPX.supersedes, URIRef(supersedes_uri)))
        print(f"✓ Will supersede: {supersedes_uri}")
    
    # Template references
    pubinfo.add((np_uri, NT.wasCreatedFromTemplate, SEARCH_EXECUTION_TEMPLATE))
    pubinfo.add((np_uri, NT.wasCreatedFromProvenanceTemplate, PROVENANCE_TEMPLATE))
    pubinfo.add((np_uri, NT.wasCreatedFromPubinfoTemplate, PUBINFO_TEMPLATE_1))
    pubinfo.add((np_uri, NT.wasCreatedFromPubinfoTemplate, PUBINFO_TEMPLATE_2))
    
    # Labels for referenced searches (CRITICAL for Nanodash display!)
    for uri, label in labels_map.items():
        pubinfo.add((URIRef(uri), NT.hasLabelFromApi, Literal(label)))
    
    return ds

print("Function defined.")

Function defined.
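
The function above types `creation_date` as `xsd:date` and passes the counts straight into `Literal`. rdflib derives a literal's datatype from the Python type, so a count stored as a JSON string would be published as a plain string. The sketch below is one possible pre-flight check. It assumes the date uses `YYYY-MM-DD` and that the counts are meant to be JSON integers (the source does not state this); `check_typed_fields` is a hypothetical helper.

```python
from datetime import date

COUNT_KEYS = (
    "screened_record_count",
    "fulltext_screened_record_count",
    "final_included_study_count",
)


def check_typed_fields(sed: dict) -> list:
    """Return a list of human-readable problems; empty if none were found."""
    problems = []

    # creation_date is serialized with datatype xsd:date
    try:
        date.fromisoformat(str(sed.get("creation_date", "")))
    except ValueError:
        problems.append("creation_date is not an ISO date (YYYY-MM-DD)")

    for key in COUNT_KEYS:
        value = sed.get(key)
        # bool is a subclass of int, so it is rejected explicitly
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            problems.append(f"{key} should be a non-negative integer, got {value!r}")

    return problems


sample = {
    "creation_date": "2025-01-01",
    "screened_record_count": 100,
    "fulltext_screened_record_count": 20,
    "final_included_study_count": 5,
}
print(check_typed_fields(sample))
```

Calling `check_typed_fields(config['search_execution_dataset'])` before Step 4 surfaces these issues while the nanopub is still unsigned.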


---
## Step 4: Generate Search Execution Dataset Nanopub

In [6]:
if not DB_SEARCH_URIS:
    print("❌ No database search URIs available.")
    print("Please complete Step 1 first.")
else:
    # Create the nanopub
    sed_dataset = create_search_execution_dataset_nanopub(
        config=config,
        db_search_uris=DB_SEARCH_URIS,
        labels_map=LABELS_MAP,
        supersedes_uri=SUPERSEDES_URI
    )
    
    # Save unsigned
    suffix = "-update" if SUPERSEDES_URI else ""
    filename = f"{config['output']['filename']}{suffix}.unsigned.trig"
    filepath = output_dir / filename
    sed_dataset.serialize(destination=str(filepath), format='trig')
    
    print(f"\n✓ Created: {filename}")
    print(f"  References {len(DB_SEARCH_URIS)} database searches")
    if SUPERSEDES_URI:
        print(f"  Supersedes: {SUPERSEDES_URI}")
    print(f"  Path: {filepath.absolute()}")

✓ Will supersede: https://w3id.org/np/RAMPy96eCLCXlGR9VvCVf6rJmpN_DlxxarMGm91_5n-O8

✓ Created: quantum-biodiversity-search-execution-update.unsigned.trig
  References 6 database searches
  Supersedes: https://w3id.org/np/RAMPy96eCLCXlGR9VvCVf6rJmpN_DlxxarMGm91_5n-O8
  Path: /Users/annef/Documents/FAIR2Adapt/systematic-review-pipeline/notebooks/output/quantum-biodiversity-search-execution-update.unsigned.trig


---
## Step 5: Sign and Publish

**Set `PUBLISH = True` when ready.**

In [8]:
PUBLISH = True          # ← Set to True when ready
USE_TEST_SERVER = False   # ← Set to False for production

if PUBLISH and DB_SEARCH_URIS:
    from nanopub import Nanopub, NanopubConf, load_profile
    
    profile = load_profile()
    print(f"Profile: {profile.name}")
    print(f"Test server: {USE_TEST_SERVER}")
    
    conf = NanopubConf(profile=profile, use_test_server=USE_TEST_SERVER)
    
    np_obj = Nanopub(rdf=filepath, conf=conf)
    np_obj.sign()
    print("✓ Signed")
    
    signed_path = filepath.with_suffix('.signed.trig')
    np_obj.store(signed_path)
    print(f"✓ Saved: {signed_path}")
    
    np_obj.publish()
    print(f"✓ Published")
    
    print(f"\n{'='*60}")
    if SUPERSEDES_URI:
        print("SEARCH EXECUTION DATASET UPDATED")
        print("="*60)
        print(f"New URI: {np_obj.source_uri}")
        print(f"Supersedes: {SUPERSEDES_URI}")
    else:
        print("SEARCH EXECUTION DATASET PUBLISHED")
        print("="*60)
        print(f"URI: {np_obj.source_uri}")
    
    print(f"\nView in Nanodash:")
    print(f"https://nanodash.knowledgepixels.com/explore?id={np_obj.source_uri}")

elif not DB_SEARCH_URIS:
    print("❌ Cannot publish: No database search URIs")
else:
    print("Publishing disabled. Set PUBLISH = True when ready.")
    print(f"\nFile created: {filepath}")

Profile: Anne Fouilloux
Test server: False
✓ Signed
✓ Saved: output/quantum-biodiversity-search-execution-update.unsigned.signed.trig
✓ Published

SEARCH EXECUTION DATASET UPDATED
New URI: https://w3id.org/np/RAhFlAUVte1zioZDIBXyg6GdSziwLxgqwxPkDi7v110WU
Supersedes: https://w3id.org/np/RAMPy96eCLCXlGR9VvCVf6rJmpN_DlxxarMGm91_5n-O8

View in Nanodash:
https://nanodash.knowledgepixels.com/explore?id=https://w3id.org/np/RAhFlAUVte1zioZDIBXyg6GdSziwLxgqwxPkDi7v110WU
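
The saved file name in the output above ends in `.unsigned.signed.trig` because `Path.with_suffix` replaces only the final suffix (`.trig`). The sketch below shows that behavior and one possible alternative that drops the `.unsigned` part. The file name is hypothetical, and `str.removesuffix` needs Python 3.9 or later.

```python
from pathlib import Path

unsigned = Path("output/example.unsigned.trig")  # hypothetical file name

# with_suffix swaps only the last suffix, so ".unsigned" is kept
kept = unsigned.with_suffix(".signed.trig")
print(kept.name)   # example.unsigned.signed.trig

# Alternative: strip the whole ".unsigned.trig" ending before adding the new one
clean_name = unsigned.name.removesuffix(".unsigned.trig") + ".signed.trig"
clean = unsigned.with_name(clean_name)
print(clean.name)  # example.signed.trig
```

Either name works for publishing. The second form is only a cosmetic choice.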


---
## Summary

### Superseding Explained

Nanopublications are **immutable** - once published, they cannot be modified. However, you can create a new version that **supersedes** the old one by adding:

```turtle
sub:pubinfo {
    this: npx:supersedes <https://w3id.org/np/OLD_NANOPUB_URI> .
}
```

When you supersede a nanopub:
- The old nanopub remains accessible but is marked as outdated
- Queries and indexes will return the new version
- The version history is preserved
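
To make the version history concrete, the sketch below follows supersedes links forward until it reaches a nanopub that nothing supersedes. It is a toy in-memory model with a hypothetical `latest_version` helper, not a description of how Nanodash or the nanopub query services resolve versions. The two URIs are the old and new nanopubs from this notebook.

```python
OLD = "https://w3id.org/np/RAMPy96eCLCXlGR9VvCVf6rJmpN_DlxxarMGm91_5n-O8"
NEW = "https://w3id.org/np/RAhFlAUVte1zioZDIBXyg6GdSziwLxgqwxPkDi7v110WU"

# Maps a superseded nanopub to the nanopub that replaced it (old -> new).
superseded_by = {OLD: NEW}


def latest_version(uri: str, superseded_by: dict) -> str:
    """Follow old -> new links until reaching a URI with no successor."""
    seen = set()
    while uri in superseded_by:
        if uri in seen:
            raise ValueError(f"Cycle in supersedes chain at {uri}")
        seen.add(uri)
        uri = superseded_by[uri]
    return uri


print(latest_version(OLD, superseded_by))
```

The old URI stays a valid key throughout, which mirrors the point that superseded nanopubs remain accessible.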

### Why This Update Was Needed

Your original nanopub `RAMPy96eCLCXlGR9VvCVf6rJmpN_DlxxarMGm91_5n-O8` used **inline** database search definitions:

```turtle
# ❌ Original (inline resources don't display in Nanodash)
sub:searchExecutionDataset slv:includesDbSearch sub:dbSearch1 .
sub:dbSearch1 rdfs:label "OpenAlex" .
```

The template expects **external URI references**:

```turtle
# ✅ Correct (references published nanopubs)
sub:searchExecutionDataset slv:includesDbSearch <https://w3id.org/np/RAxxxx/search> .
```

This new nanopub will supersede the old one with proper external references.
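
The inline-versus-external distinction can be checked mechanically on an unsigned nanopub. The sketch below assumes that inline resources live in the temporary namespace used in Step 3 (`sub:` in the Turtle above) and that published nanopubs sit under `https://w3id.org/np/`. `classify_reference` is a hypothetical helper.

```python
TEMP_PREFIX = "http://purl.org/nanopub/temp/np/"  # sub: in the unsigned file
PUBLISHED_PREFIX = "https://w3id.org/np/"


def classify_reference(uri: str) -> str:
    """Label a slv:includesDbSearch target as inline, external, or unknown."""
    if uri.startswith(TEMP_PREFIX):
        return "inline"
    if uri.startswith(PUBLISHED_PREFIX):
        return "external"
    return "unknown"


targets = [
    TEMP_PREFIX + "dbSearch1",
    "https://w3id.org/np/RAkV7KfvYqIpFLfMqqLmZxyCpr4VFcgE4ln3FRYrZwdaE/search",
]
for target in targets:
    print(classify_reference(target), target)
```

Applied to `DB_SEARCH_URIS` before Step 4, any result other than `"external"` signals a reference that should be replaced with a published search URI.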