# DataONE Replication Demo

This notebook demonstrates how to use the DataONE replication service to replicate data from one DataONE Member Node (MN) repository to another. It uses the DataONE Python libraries to interact with the DataONE APIs.
The DataONE Python libraries are hosted on GitHub at [github.com/DataONEorg/d1_python](https://github.com/DataONEorg/d1_python) with documentation at [dataone-python.readthedocs.io](https://dataone-python.readthedocs.io/en/latest/).

## Installation

Installation under a virtual environment is recommended. [`uv`](https://docs.astral.sh/uv/) is a Python package and project manager that also creates and manages virtual environments, covering the role of `virtualenv` plus dependency management.

With `uv`, initialize a project and add the DataONE packages and the Jupyter kernel:
```
uv init
uv add dataone.cli 
uv add ipykernel
```

## Getting a client

The DataONE Python clients provide an abstraction of the DataONE APIs, enabling interaction with DataONE Member Nodes (MNs) and Coordinating Nodes (CNs). Because the MN and CN APIs differ, there are two basic types of client, one for Member Node repositories and one for Coordinating Nodes, both deriving from a common base. Different versions of the DataONE API are also in use: most MNs have upgraded to version 2, and all CNs operate with the version 2 API.

### Getting an MN client



In [89]:
import d1_client.mnclient_2_0

# Authentication token (JWT) from the DataONE Test Environment
# Create a file named jwtkey with the token on the first line and no other lines
with open("jwtkey", "r") as f:
    # Only the first line is expected to hold the token
    auth_token = f.readline().strip()

# Set the token in the request header
options: dict = {"headers": {"Authorization": "Bearer " + auth_token}}

# Base URL of the KNB Test Repository
base_url = "https://dev.nceas.ucsb.edu/knb/d1/mn"
repo = d1_client.mnclient_2_0.MemberNodeClient_2_0(base_url, **options)

# Call the getCapabilities API method
# https://dataone-architecture-documentation.readthedocs.io/en/latest/apis/MN_APIs.html#MNCore.getCapabilities
node = repo.getCapabilities()

# Response is an instance of a Node document that can be accessed through its properties
# https://dataone-architecture-documentation.readthedocs.io/en/latest/apis/Types.html#Types.Node
print(f"{node.name:30} {node.baseURL}")


KNB Test Node                  https://dev.nceas.ucsb.edu/knb/d1/mn
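
JWTs normally carry an expiry time, so an expired token is a common cause of authorization failures. The following is a minimal sketch for checking a token before building a client, assuming the token is a standard three-segment JWT whose payload carries the registered `exp` claim (seconds since the epoch). The helper names `jwt_claims` and `is_expired` are illustrative and not part of the DataONE libraries, and the payload is decoded without verifying the signature.

```python
import base64
import json
import time
from typing import Optional


def jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT (signature is NOT verified)."""
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url-encoded without padding; restore it
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def is_expired(token: str, now: Optional[float] = None) -> bool:
    """Return True if the token's `exp` claim is not in the future."""
    current = time.time() if now is None else now
    # Assumption: the payload contains the standard `exp` claim
    return jwt_claims(token)["exp"] <= current
```

With the `auth_token` read from the `jwtkey` file, `is_expired(auth_token)` could be checked before the token is placed in the request header.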


### Getting a CN client

In [90]:
import d1_client.cnclient_2_0

# The Staging CN base URL:
base_url = "https://cn-stage.test.dataone.org/cn"
cn = d1_client.cnclient_2_0.CoordinatingNodeClient_2_0(base_url, **options)

# Call the listNodes API method
# https://dataone-architecture-documentation.readthedocs.io/en/latest/apis/CN_APIs.html#CNCore.listNodes
nodes = cn.listNodes()

# Response is a structure mirroring the API response message structure, 
# in this case a list of Node objects
# https://dataone-architecture-documentation.readthedocs.io/en/latest/apis/Types.html#Types.NodeList
for node in nodes.node:
    print(f"{node.name:30}\n  {node.identifier.value()}\t {node.baseURL}")

creds = cn.echoCredentials()
print(creds.toxml())

cn-stage                      
  urn:node:cnStage	 https://cn-stage.test.dataone.org/cn
cn-stage-orc-1                
  urn:node:cnStageORC1	 https://cn-stage-orc-1.test.dataone.org/cn
UCSB Stage Dedicated Replica Server 3
  urn:node:mnStageUCSB3	 https://mn-stage-ucsb-3.test.dataone.org/metacat/d1/mn
mn-stage-unm-1                
  urn:node:mnStageUNM1	 https://mn-stage-unm-1.test.dataone.org/mn
UCSB Stage Dedicated Replica Server 2
  urn:node:mnStageUCSB2	 https://mn-stage-ucsb-2.test.dataone.org/metacat/d1/mn
cn-stage-ucsb-1               
  urn:node:cnStageUCSB1	 https://cn-stage-ucsb-1.test.dataone.org/cn
cn-stage-unm-1                
  urn:node:cnStageUNM1	 https://cn-stage-unm-1.test.dataone.org/cn
PISCO Test MN                 
  urn:node:mnStagePISCO	 http://test.piscoweb.org/catalog/d1/mn
SEAD Virtual Archive          
  urn:node:mnTestSEAD	 http://d2i-dev.d2i.indiana.edu:8081/sead/rest/mn
mnTestDRYAD JSON-LD node      
  urn:node:mnTestDRYAD	 https://so.test.dataone.org/m

## Get SystemMetadata for a given persistent identifier (PID)


In [91]:
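import datetime

# Assumed timestamp format; the original notebook does not define DATE_FORMAT
DATE_FORMAT = "%Y-%m-%dT%H:%M:%S%z"

def toStr(v):
    '''Format datetimes with DATE_FORMAT; use str() for everything else
    '''
    if isinstance(v, datetime.datetime):
        return v.strftime(DATE_FORMAT)
    return str(v)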

def propertyStr(p):
    '''String representation of pyxb property
    '''
    if p is None:
        return ""
    try:
        return toStr(p.value())
    except (AttributeError, TypeError):
        # Fall back to the property itself when it has no callable value()
        return toStr(p)


def view(sysmeta):
    '''Print SystemMetadata properties
    '''
    print(f"Identifier: {sysmeta.identifier.value()}")
    print(f"Series Identifier: {propertyStr(sysmeta.seriesId)}")
    print(f"Modified: {sysmeta.dateSysMetadataModified}")
    print(f"Uploaded: {sysmeta.dateUploaded}")
    print(f"Format ID: {sysmeta.formatId}")
    print(f"Size: {sysmeta.size}")
    print(f"Checksum: hash://{sysmeta.checksum.algorithm.lower()}/{sysmeta.checksum.value()}")
    print(f"Origin Member Node: {propertyStr(sysmeta.originMemberNode)}")
    print(f"Authoritative Member Node: {propertyStr(sysmeta.authoritativeMemberNode)}")
    print(f"Obsoletes: {propertyStr(sysmeta.obsoletes)}")
    print(f"Obsoleted By: {propertyStr(sysmeta.obsoletedBy)}")
    print("Access policy rules:")
    for rule in sysmeta.accessPolicy.allow:
        print(f"  {', '.join(s.value() for s in rule.subject)}  can  {', '.join(rule.permission)}")
    print("Replication policy:")
    print(f"  Replication allowed: {sysmeta.replicationPolicy.replicationAllowed}")
    print(f"  Replicas requested: {sysmeta.replicationPolicy.numberReplicas}")
    for node in sysmeta.replicationPolicy.preferredMemberNode:
        print(f"  Preferred node: {node.value()}")
    for node in sysmeta.replicationPolicy.blockedMemberNode:
        print(f"  Blocked node: {node.value()}")
    print("Replicas of this object:")
    for replica in sysmeta.replica:
        print(f"  {replica.replicaMemberNode.value():15} {replica.replicationStatus:10} {replica.replicaVerified}")

In [92]:
# Try to retrieve the existing system metadata for one of the test objects
import d1_common.types.exceptions

pid_meta = "urn:uuid:ca5f81cc-7b02-41ed-a316-a03e2b4aea57"
pid_csv = "urn:uuid:82079214-7e3e-4c52-a117-90f497a430ea"
pid_rmap = "resource_map_urn:uuid:3a06877a-3941-412f-a495-da8608fd4f94"
try:
    sysmeta = repo.getSystemMetadata(pid_csv)
    view(sysmeta)
except d1_common.types.exceptions.DataONEException as e:
    print(e)

Identifier: urn:uuid:82079214-7e3e-4c52-a117-90f497a430ea
Series Identifier: 
Modified: 2025-07-03 01:50:32.356000+00:00
Uploaded: 2025-07-02 18:37:35.751000+00:00
Format ID: text/csv
Size: 725
Checksum: hash://md5/8ab476586533121af3a4d787e686cafd
Origin Member Node: urn:node:mnTestKNB
Authoritative Member Node: urn:node:mnTestKNB
Obsoletes: 
Obsoleted By: 
Access policy rules:
  CN=knb-data-admins,DC=dataone,DC=org  can  read, write, changePermission
  public  can  read
Replication policy:
  Replication allowed: true
  Replicas requested: 1
  Preferred node: urn:node:mnStageUCSB2
Replicas of this object:
  urn:node:mnTestKNB completed  2025-07-03 02:41:40.588000+00:00
  urn:node:mnStageUCSB2 completed  2025-07-04 00:45:37.740000+00:00
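
The checksum recorded in the SystemMetadata can be used to confirm that a copy of the object, for example one downloaded from a replica node, matches the original. The following is a minimal sketch using only the standard library; `checksum_matches` is an illustrative helper, not part of the DataONE libraries, and it assumes the algorithm name maps onto a `hashlib` algorithm once lowercased and stripped of hyphens (for example `MD5` or `SHA-1`).

```python
import hashlib


def checksum_matches(data: bytes, algorithm: str, expected_hex: str) -> bool:
    """Compare the digest of `data` with the checksum from SystemMetadata."""
    # Normalize names such as "MD5" or "SHA-1" to hashlib's "md5" / "sha1"
    name = algorithm.lower().replace("-", "")
    digest = hashlib.new(name, data).hexdigest()
    return digest == expected_hex.lower()


# Example with known content rather than a real DataONE object
print(checksum_matches(b"hello", "MD5", "5d41402abc4b2a76b9719d911017c592"))
```

For a real object, `data` would be the bytes retrieved for the PID, and `algorithm` and `expected_hex` would come from `sysmeta.checksum.algorithm` and `sysmeta.checksum.value()` as used in `view` above.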


## Change the replication policy

The replication policy controls how many replicas of an object are requested, which nodes are preferred as replication targets, and which nodes are blocked from holding replicas. It is part of the object's SystemMetadata.


In [None]:
import d1_common.types.dataoneTypes

# Change the replication policy to allow replication
sysmeta.replicationPolicy.replicationAllowed = True
sysmeta.replicationPolicy.numberReplicas = 1
sysmeta.replicationPolicy.preferredMemberNode = [
    d1_common.types.dataoneTypes.NodeReference("urn:node:mnStageUCSB2")]

view(sysmeta)

Identifier: urn:uuid:82079214-7e3e-4c52-a117-90f497a430ea
Series Identifier: 
Modified: 2025-07-03 01:45:10.436000+00:00
Uploaded: 2025-07-02 18:37:35.751000+00:00
Format ID: text/csv
Size: 725
Checksum: hash://md5/8ab476586533121af3a4d787e686cafd
Origin Member Node: urn:node:mnTestKNB
Authoritative Member Node: urn:node:mnTestKNB
Obsoletes: 
Obsoleted By: 
Access policy rules:
  CN=knb-data-admins,DC=dataone,DC=org  can  read, write, changePermission
  public  can  read
Replication policy:
  Replication allowed: true
  Replicas requested: 1
  Preferred node: urn:node:mnStageUCSB2
Replicas of this object:


## Update the repository with the new replication policy


In [59]:
try:
    updated_flag = repo.updateSystemMetadata(pid_csv, sysmeta)
    if updated_flag:
        sysmeta_csv = repo.getSystemMetadata(pid_csv)
        view(sysmeta_csv)
except d1_common.types.exceptions.DataONEException as e:
    print(e)


Identifier: urn:uuid:82079214-7e3e-4c52-a117-90f497a430ea
Series Identifier: 
Modified: 2025-07-03 01:50:32.356000+00:00
Uploaded: 2025-07-02 18:37:35.751000+00:00
Format ID: text/csv
Size: 725
Checksum: hash://md5/8ab476586533121af3a4d787e686cafd
Origin Member Node: urn:node:mnTestKNB
Authoritative Member Node: urn:node:mnTestKNB
Obsoletes: 
Obsoleted By: 
Access policy rules:
  public  can  read
  CN=knb-data-admins,DC=dataone,DC=org  can  read, write, changePermission
Replication policy:
  Replication allowed: true
  Replicas requested: 1
  Preferred node: urn:node:mnStageUCSB2
Replicas of this object:


In [88]:
sysmeta_csv = repo.getSystemMetadata(pid_csv)
view(sysmeta_csv)


Identifier: urn:uuid:82079214-7e3e-4c52-a117-90f497a430ea
Series Identifier: 
Modified: 2025-07-03 01:50:32.356000+00:00
Uploaded: 2025-07-02 18:37:35.751000+00:00
Format ID: text/csv
Size: 725
Checksum: hash://md5/8ab476586533121af3a4d787e686cafd
Origin Member Node: urn:node:mnTestKNB
Authoritative Member Node: urn:node:mnTestKNB
Obsoletes: 
Obsoleted By: 
Access policy rules:
  public  can  read
  CN=knb-data-admins,DC=dataone,DC=org  can  read, write, changePermission
Replication policy:
  Replication allowed: true
  Replicas requested: 1
  Preferred node: urn:node:mnStageUCSB2
Replicas of this object:
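
The outputs above show an empty replica list right after the policy update, while the earlier listing of the same object shows a `completed` replica on `urn:node:mnStageUCSB2`. Checking for the replica therefore means fetching the SystemMetadata again later. The following is a minimal polling sketch; `wait_for_replica` and its parameters are hypothetical, and the status lookup is passed in as a callable so the loop itself does not depend on any DataONE client.

```python
import time
from typing import Callable, Dict


def wait_for_replica(
    fetch_statuses: Callable[[], Dict[str, str]],
    target_node: str,
    attempts: int = 10,
    delay_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until `target_node` reports a completed replica or attempts run out."""
    for attempt in range(attempts):
        statuses = fetch_statuses()
        if statuses.get(target_node) == "completed":
            return True
        # Do not sleep after the final attempt
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False


# Possible wiring with the objects from this notebook (not executed here):
# def fetch_statuses():
#     sm = repo.getSystemMetadata(pid_csv)
#     return {r.replicaMemberNode.value(): str(r.replicationStatus) for r in sm.replica}
# wait_for_replica(fetch_statuses, "urn:node:mnStageUCSB2")
```

The attempt count and delay are arbitrary defaults; suitable values depend on how often the environment schedules replication.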


In [85]:
# for node in nodes.node:
#     print(f"{node.name:30}")
#     print (f"   {node.synchronization}\n")

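# Index 38 is specific to the node list returned when this cell was run;
# select the node by its identifier if the listing changes
print(nodes.node[38].synchronization.schedule.toxml())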

<?xml version="1.0" ?><schedule hour="*" mday="*" min="0/3" mon="*" sec="10" wday="?" year="*"/>
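
The schedule attributes appear to follow Quartz-style cron fields, in which `min="0/3"` with `sec="10"` would mean every third minute starting at minute 0, at second 10. The following sketch, which uses only the standard library, reorders the attributes into a single cron-like string for easier reading. The function name `schedule_to_cron` and the field order are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed Quartz-style field order: seconds first, year last
FIELD_ORDER = ("sec", "min", "hour", "mday", "mon", "wday", "year")


def schedule_to_cron(schedule_xml: str) -> str:
    """Render a <schedule> element's attributes as one cron-like string."""
    element = ET.fromstring(schedule_xml)
    return " ".join(element.attrib[name] for name in FIELD_ORDER)


xml_text = (
    '<?xml version="1.0" ?>'
    '<schedule hour="*" mday="*" min="0/3" mon="*" sec="10" wday="?" year="*"/>'
)
print(schedule_to_cron(xml_text))  # prints: 10 0/3 * * * ? *
```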
