# PRISMA Database Search Nanopublications

**Notebook 1 of 2** - Creates one nanopublication per database search.

## Template
- `RA8MyCoRqMdgGqqOwN4MIQfe6Htwt5FPgiHXlXK4RKiic` - Declaring a systematic database search

## Output
- One nanopub per database, each creating a `/search` resource
- These URIs will be used in **Notebook 2** (Search Execution Dataset)

## Prerequisites
- `nanopub` Python library installed
- Nanopub profile configured with your ORCID and keys
- Your `search-execution-*.json` configuration file
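
The configuration file can be sanity-checked before running the notebook. The sketch below is not part of the pipeline: `find_config_problems` is a hypothetical helper, the field names are inferred from how the cells in this notebook read the file, and the example values are placeholders.

```python
# Hypothetical helper: field names are inferred from how this notebook reads the file.
REQUIRED_DB_SEARCH_KEYS = {"database_label", "database_url", "search_query", "results_count"}


def find_config_problems(config):
    """Return a list of human-readable problems; an empty list means the shape looks right."""
    problems = []
    for key in ("name", "orcid"):
        if key not in config.get("author", {}):
            problems.append(f"author.{key} is missing")
    if "filename" not in config.get("output", {}):
        problems.append("output.filename is missing")
    sed = config.get("search_execution_dataset", {})
    for key in ("label", "part_of", "creation_date", "db_searches"):
        if key not in sed:
            problems.append(f"search_execution_dataset.{key} is missing")
    for i, db in enumerate(sed.get("db_searches", []), 1):
        missing = REQUIRED_DB_SEARCH_KEYS - set(db)
        if missing:
            problems.append(f"db_searches[{i}] is missing {sorted(missing)}")
    return problems


# Placeholder values for illustration only
example = {
    "author": {"name": "Example Author", "orcid": "0000-0000-0000-0000"},
    "output": {"filename": "example-search-execution"},
    "search_execution_dataset": {
        "label": "Example search execution",
        "part_of": "https://example.org/review",
        "creation_date": "2025-01-01",
        "db_searches": [
            {
                "database_label": "Example DB",
                "database_url": "https://example.org/db",
                "search_query": "quantum AND biodiversity",
                "results_count": 0,
            }
        ],
    },
}
print(find_config_problems(example))
```

Running the same check on the dictionary loaded in Step 1 would catch a missing field before any nanopub is generated.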

---
## Step 1: Load Configuration

In [1]:
import json
from pathlib import Path

# Path to your configuration file
CONFIG_FILE = "../inputs/quantum-biodiversity/search-execution-quantum-biodiversity.json"

with open(CONFIG_FILE, 'r', encoding='utf-8') as f:
    config = json.load(f)

sed = config['search_execution_dataset']

print(f"Loaded: {sed['label']}")
print(f"Author: {config['author']['name']} ({config['author']['orcid']})")
print(f"Part of: {sed['part_of']}")
print(f"\nDatabase searches to create: {len(sed['db_searches'])}")
for i, db in enumerate(sed['db_searches'], 1):
    print(f"  {i}. {db['database_label']} ({db['results_count']} results)")

Loaded: Quantum Computing for Biodiversity - Search Execution Results
Author: Anne Fouilloux (0000-0002-1784-2920)
Part of: https://w3id.org/spaces/sciencelive/quantum-biodiversity-review

Database searches to create: 6
  1. OpenAlex (467 results)
  2. arXiv (178 results)
  3. Semantic Scholar (621 results)
  4. PubMed (4 results)
  5. IEEE Xplore (OA only) (0 results)
  6. Europe PMC (597 results)


---
## Step 2: Setup

In [9]:
from datetime import datetime, timezone
from rdflib import Dataset, Namespace, URIRef, Literal
from rdflib.namespace import RDF, RDFS, XSD, FOAF, PROV, DCTERMS as DCT

# Namespaces
NP = Namespace("http://www.nanopub.org/nschema#")
NPX = Namespace("http://purl.org/nanopub/x/")
NT = Namespace("https://w3id.org/np/o/ntemplate/")
ORCID = Namespace("https://orcid.org/")
SLV = Namespace("https://w3id.org/sciencelive/o/terms/")

# Template URIs
DB_SEARCH_TEMPLATE = URIRef("https://w3id.org/np/RA8MyCoRqMdgGqqOwN4MIQfe6Htwt5FPgiHXlXK4RKiic")
PROVENANCE_TEMPLATE = URIRef("https://w3id.org/np/RA7lSq6MuK_TIC6JMSHvLtee3lpLoZDOqLJCLXevnrPoU")
PUBINFO_TEMPLATE_1 = URIRef("https://w3id.org/np/RA0J4vUn_dekg-U1kK3AOEt02p9mT2WO03uGxLDec1jLw")
PUBINFO_TEMPLATE_2 = URIRef("https://w3id.org/np/RAukAcWHRDlkqxk7H2XNSegc1WnHI569INvNr-xdptDGI")

# Output directory
output_dir = Path("./output")
output_dir.mkdir(parents=True, exist_ok=True)


# Input directory (the URIs file for Notebook 2 is written here in Step 6)
input_dir = Path("../inputs")
input_dir.mkdir(parents=True, exist_ok=True)

print(f"Output directory: {output_dir.absolute()}")

Output directory: /Users/annef/Documents/FAIR2Adapt/systematic-review-pipeline/notebooks/output


---
## Step 3: Define Creation Function

In [3]:
def create_database_search_nanopub(db_search, author, systematic_review_uri, search_date, search_strategy_uri=None):
    """
    Create a SystematicDatabaseSearch nanopublication.
    
    Template: RA8MyCoRqMdgGqqOwN4MIQfe6Htwt5FPgiHXlXK4RKiic
    """
    TEMP_NP = Namespace("http://purl.org/nanopub/temp/np/")
    
    ds = Dataset()
    
    # Bind namespaces
    ds.bind("this", "http://purl.org/nanopub/temp/np/")
    ds.bind("sub", TEMP_NP)
    ds.bind("np", NP)
    ds.bind("dct", DCT)
    ds.bind("nt", NT)
    ds.bind("npx", NPX)
    ds.bind("xsd", XSD)
    ds.bind("rdfs", RDFS)
    ds.bind("orcid", ORCID)
    ds.bind("prov", PROV)
    ds.bind("foaf", FOAF)
    ds.bind("slv", SLV)
    
    # URIs
    np_uri = URIRef("http://purl.org/nanopub/temp/np/")
    head_uri = TEMP_NP.Head
    assertion_uri = TEMP_NP.assertion
    provenance_uri = TEMP_NP.provenance
    pubinfo_uri = TEMP_NP.pubinfo
    search_uri = TEMP_NP.search  # becomes <nanopub-uri>/search after signing
    author_uri = ORCID[author['orcid']]
    
    label = f"{db_search['database_label']} Search"
    
    # HEAD
    head = ds.graph(head_uri)
    head.add((np_uri, RDF.type, NP.Nanopublication))
    head.add((np_uri, NP.hasAssertion, assertion_uri))
    head.add((np_uri, NP.hasProvenance, provenance_uri))
    head.add((np_uri, NP.hasPublicationInfo, pubinfo_uri))
    
    # ASSERTION
    assertion = ds.graph(assertion_uri)
    assertion.add((search_uri, RDF.type, SLV.SystematicDatabaseSearch))
    assertion.add((search_uri, RDFS.label, Literal(label)))
    assertion.add((search_uri, DCT.isPartOf, URIRef(systematic_review_uri)))
    
    if search_strategy_uri:
        assertion.add((search_uri, SLV.followsSearchStrategy, URIRef(search_strategy_uri)))
    
    assertion.add((search_uri, SLV.hasDatabaseUrl, URIRef(db_search['database_url'])))
    assertion.add((search_uri, SLV.hasSearchTime, Literal(search_date, datatype=XSD.date)))
    assertion.add((search_uri, SLV.hasSearchQuery, Literal(db_search['search_query'])))
    assertion.add((search_uri, SLV.hasRetrievedRecordCount, Literal(str(db_search['results_count']))))
    
    # PROVENANCE
    provenance = ds.graph(provenance_uri)
    provenance.add((assertion_uri, PROV.wasAttributedTo, author_uri))
    
    # PUBINFO
    pubinfo = ds.graph(pubinfo_uri)
    pubinfo.add((author_uri, FOAF.name, Literal(author['name'])))
    
    now = datetime.now(timezone.utc).isoformat()
    pubinfo.add((np_uri, DCT.created, Literal(now, datatype=XSD.dateTime)))
    pubinfo.add((np_uri, DCT.creator, author_uri))
    pubinfo.add((np_uri, DCT.license, URIRef("https://creativecommons.org/licenses/by/4.0/")))
    pubinfo.add((np_uri, RDFS.label, Literal(label)))
    pubinfo.add((np_uri, NPX.introduces, search_uri))
    pubinfo.add((np_uri, NPX.wasCreatedAt, URIRef("https://nanodash.knowledgepixels.com/")))
    
    # Template references
    pubinfo.add((np_uri, NT.wasCreatedFromTemplate, DB_SEARCH_TEMPLATE))
    pubinfo.add((np_uri, NT.wasCreatedFromProvenanceTemplate, PROVENANCE_TEMPLATE))
    pubinfo.add((np_uri, NT.wasCreatedFromPubinfoTemplate, PUBINFO_TEMPLATE_1))
    pubinfo.add((np_uri, NT.wasCreatedFromPubinfoTemplate, PUBINFO_TEMPLATE_2))
    
    return ds, label

print("Function defined.")

Function defined.


---
## Step 4: Generate Database Search Nanopubs

In [4]:
# Optional: Set a search strategy URI if you have one
SEARCH_STRATEGY_URI = None  # e.g., "https://w3id.org/np/RAxxxx"

# Generate all database search nanopubs
db_search_outputs = []

for idx, db_search in enumerate(sed['db_searches']):
    ds, label = create_database_search_nanopub(
        db_search=db_search,
        author=config['author'],
        systematic_review_uri=sed['part_of'],
        search_date=sed['creation_date'],
        search_strategy_uri=SEARCH_STRATEGY_URI
    )
    
    # Create safe filename
    safe_name = db_search['database_label'].lower().replace(' ', '-').replace('(', '').replace(')', '')
    filename = f"{config['output']['filename']}-db-search-{idx+1}-{safe_name}.unsigned.trig"
    filepath = output_dir / filename
    ds.serialize(destination=str(filepath), format='trig')
    
    db_search_outputs.append({
        'index': idx + 1,
        'database': db_search['database_label'],
        'label': label,
        'filepath': filepath
    })
    
    print(f"✓ {idx+1}. {db_search['database_label']} → {filename}")

print(f"\n{'='*60}")
print(f"Generated {len(db_search_outputs)} unsigned nanopubs in {output_dir}")

✓ 1. OpenAlex → quantum-biodiversity-search-execution-db-search-1-openalex.unsigned.trig
✓ 2. arXiv → quantum-biodiversity-search-execution-db-search-2-arxiv.unsigned.trig
✓ 3. Semantic Scholar → quantum-biodiversity-search-execution-db-search-3-semantic-scholar.unsigned.trig
✓ 4. PubMed → quantum-biodiversity-search-execution-db-search-4-pubmed.unsigned.trig
✓ 5. IEEE Xplore (OA only) → quantum-biodiversity-search-execution-db-search-5-ieee-xplore-oa-only.unsigned.trig
✓ 6. Europe PMC → quantum-biodiversity-search-execution-db-search-6-europe-pmc.unsigned.trig

Generated 6 unsigned nanopubs in output
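
The filename step above strips only spaces and parentheses from the database label. If a label could contain other characters (slashes, commas, accents), a more general approach is to collapse every run of non-alphanumeric characters into a single hyphen. A minimal sketch, where `slugify` is a hypothetical helper and the last label is made up for illustration:

```python
import re


def slugify(label):
    """Lowercase the label and collapse every run of non-alphanumerics into one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")


for label in ["OpenAlex", "Semantic Scholar", "IEEE Xplore (OA only)", "Web of Science / Scopus"]:
    print(slugify(label))
```

For the six labels used in this notebook, this yields the same names as the original `replace` chain.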


---
## Step 5: Sign and Publish

**Set `PUBLISH = True` when ready.**

In [5]:
PUBLISH = True            # True signs and publishes; False skips this step
USE_TEST_SERVER = False   # True targets the test server; False publishes to production

published_results = []

if PUBLISH:
    from nanopub import Nanopub, NanopubConf, load_profile
    
    profile = load_profile()
    print(f"Profile: {profile.name}")
    print(f"Test server: {USE_TEST_SERVER}")
    print()
    
    conf = NanopubConf(profile=profile, use_test_server=USE_TEST_SERVER)
    
    for db_out in db_search_outputs:
        np_obj = Nanopub(rdf=db_out['filepath'], conf=conf)
        np_obj.sign()
        
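        # with_suffix() would swap only the final ".trig" and yield "*.unsigned.signed.trig"
        signed_path = db_out['filepath'].with_name(
            db_out['filepath'].name.replace('.unsigned.trig', '.signed.trig')
        )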
        np_obj.store(signed_path)
        
        np_obj.publish()
        
        search_uri = f"{np_obj.source_uri}/search"
        
        published_results.append({
            'database': db_out['database'],
            'label': db_out['label'],
            'nanopub_uri': np_obj.source_uri,
            'search_uri': search_uri
        })
        
        print(f"✓ {db_out['database']}")
        print(f"  {search_uri}")
        print()

else:
    print("Publishing disabled. Set PUBLISH = True to publish.")

Profile: Anne Fouilloux
Test server: False

✓ OpenAlex
  https://w3id.org/np/RAkV7KfvYqIpFLfMqqLmZxyCpr4VFcgE4ln3FRYrZwdaE/search

✓ arXiv
  https://w3id.org/np/RApg5W58ZcDLDtKF2txz-SrOVsjM-3lpoPkrWGOFcigZU/search

✓ Semantic Scholar
  https://w3id.org/np/RARwCFSuFUqEE0OPW8CeynB8cx4zemGB09obBA9eXfo78/search

✓ PubMed
  https://w3id.org/np/RAkZsd1zUQn8lXlN8yNQgZBO2W4iuZhMQDE0r-kBXrF6g/search

✓ IEEE Xplore (OA only)
  https://w3id.org/np/RAVBmg81fxSEFhZ5W9D89XxBR2Scts7yWSWFtFY4OX5UI/search

✓ Europe PMC
  https://w3id.org/np/RASfip5KQntGv4ACoKui2ZBwhSQKTQPPClPC1zyQHo29M/search
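
A note on naming the signed files: the unsigned files end in the compound suffix `.unsigned.trig`, and `Path.with_suffix` replaces only the final component of a suffix chain. The sketch below uses a hypothetical file name to compare that behavior with replacing the whole compound suffix.

```python
from pathlib import Path

unsigned = Path("output/example-db-search-1-openalex.unsigned.trig")  # hypothetical file name

# with_suffix() swaps only the last suffix (".trig"), so ".unsigned" survives
naive = unsigned.with_suffix(".signed.trig")

# Replacing the whole compound suffix in the file name gives the intended result
signed = unsigned.with_name(unsigned.name.replace(".unsigned.trig", ".signed.trig"))

print(naive.name)
print(signed.name)
```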



---
## Step 6: Save URIs for Notebook 2

After publishing, save the URIs to a JSON file for use in the Search Execution Dataset notebook.

In [10]:
if published_results:
    # Save to JSON file
    uris_file = input_dir / f"{config['output']['filename']}-db-search-uris.json"
    
    output_data = {
        "_description": "Database search URIs for use in Search Execution Dataset notebook",
        "_created": datetime.now(timezone.utc).isoformat(),
        "_test_server": USE_TEST_SERVER,
        "database_searches": published_results
    }
    
    with open(uris_file, 'w') as f:
        json.dump(output_data, f, indent=2)
    
    print(f"Saved URIs to: {uris_file}")
    print(f"\nUse this file in Notebook 2 (Search Execution Dataset)")
    
    # Also print for easy copying
    print(f"\n{'='*60}")
    print("DATABASE SEARCH URIs (for Notebook 2)")
    print("="*60)
    for item in published_results:
        print(f'"{item["search_uri"]}",  # {item["database"]}')

else:
    print("No published results. Run Step 5 with PUBLISH = True first.")

Saved URIs to: ../inputs/quantum-biodiversity-search-execution-db-search-uris.json

Use this file in Notebook 2 (Search Execution Dataset)

DATABASE SEARCH URIs (for Notebook 2)
"https://w3id.org/np/RAkV7KfvYqIpFLfMqqLmZxyCpr4VFcgE4ln3FRYrZwdaE/search",  # OpenAlex
"https://w3id.org/np/RApg5W58ZcDLDtKF2txz-SrOVsjM-3lpoPkrWGOFcigZU/search",  # arXiv
"https://w3id.org/np/RARwCFSuFUqEE0OPW8CeynB8cx4zemGB09obBA9eXfo78/search",  # Semantic Scholar
"https://w3id.org/np/RAkZsd1zUQn8lXlN8yNQgZBO2W4iuZhMQDE0r-kBXrF6g/search",  # PubMed
"https://w3id.org/np/RAVBmg81fxSEFhZ5W9D89XxBR2Scts7yWSWFtFY4OX5UI/search",  # IEEE Xplore (OA only)
"https://w3id.org/np/RASfip5KQntGv4ACoKui2ZBwhSQKTQPPClPC1zyQHo29M/search",  # Europe PMC


---
## Next Steps

After publishing the database search nanopubs:

1. Note the output JSON file path: `../inputs/{prefix}-db-search-uris.json` (Step 6 writes it to the inputs directory, not `output/`)
2. Open **Notebook 2: Search Execution Dataset**
3. Load the URIs file and create the aggregating dataset
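
Loading the URIs file might look like the sketch below. It assumes only the JSON layout written in Step 6 (a `database_searches` list whose items carry a `search_uri`). `load_search_uris` is a hypothetical helper, not something Notebook 2 is known to define, and the sample data is a placeholder.

```python
import json
import tempfile
from pathlib import Path


def load_search_uris(uris_file):
    """Return the list of search URIs stored under 'database_searches'."""
    with open(uris_file, encoding="utf-8") as f:
        data = json.load(f)
    return [item["search_uri"] for item in data["database_searches"]]


# Self-contained demo: write a small file in the Step 6 layout, then read it back.
sample = {
    "_test_server": True,
    "database_searches": [
        {
            "database": "Example DB",
            "label": "Example DB Search",
            "nanopub_uri": "https://example.org/np/RAexample",
            "search_uri": "https://example.org/np/RAexample/search",
        },
    ],
}
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "example-db-search-uris.json"
    demo_file.write_text(json.dumps(sample, indent=2), encoding="utf-8")
    uris = load_search_uris(demo_file)

print(uris)
```

Checking the `_test_server` flag in the same file before aggregating helps avoid mixing test-server and production URIs.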