# PDS DOI Service Bulk Record Update Notebook

This notebook is provided as a utility for performing bulk updates over a number of DOI records sourced from DataCite. It assumes some familiarity with using the PDS DOI Service as well as the DataCite DOI record format.

Bulk updates may be performed with this notebook in the following steps:

* Define the query parameters for acquiring the set of records to be updated
* Run the DOI Service List action to obtain the set of records in a single JSON label. This label will also be parsed into an in-memory representation of `Doi` objects.

From here, there are two means of performing bulk updates:

* If you want to make updates on the JSON label directly, write the label to disk, make the desired updates, then commit the updated label to the local transaction database and DataCite.
* If you want to write code to modify in-memory representations of the records, edit the code cell under the **Perform Bulk Update(s) In Memory** heading to make the desired changes. After all in-memory records are processed, they are re-serialized into a single JSON label. This label may then be written to disk and used to commit the changes to the local transaction database or DataCite.

Areas that require user input have been marked with `# TODO`.

## Imports/Environment Setup

The following cell should be run first each time to import all classes needed to perform the bulk update.

In [None]:
import tempfile

from pds_doi_service.core.actions import DOICoreActionList
from pds_doi_service.core.actions import DOICoreActionUpdate
from pds_doi_service.core.actions import DOICoreActionRelease
from pds_doi_service.core.entities.doi import DoiEvent
from pds_doi_service.core.entities.doi import DoiStatus
from pds_doi_service.core.outputs.datacite import DOIDataCiteRecord
from pds_doi_service.core.outputs.datacite import DOIDataCiteWebParser

## Set Query Parameters

Use the following cell to define the query parameters used to obtain a set of DOI records for bulk update.

In [None]:
# TODO: assign these fields as necessary to query for the group of DOI records to be updated

# DOI IDs to match
doi_ids     = []

# PDS IDs (PDS3 or LID/LIDVID) to match
identifiers = []

# PDS Node IDs (atm,eng,geo,img,naif,ppi,rs,rms,sbn) to match
nodes       = []

# Workflow status (draft,review,findable) to match
status      = []

# Start date range to filter by, must be YYYY-MM-DD[THH:mm:ss.ssssss[Z]]
start_date  = ""

# End date range to filter by, must be YYYY-MM-DD[THH:mm:ss.ssssss[Z]]
end_date    = ""

# Submitters to match, typically an email address
submitters  = []
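
A malformed date string is easy to miss before the query runs. The following is a minimal sketch of a pre-flight check for `start_date` and `end_date`, based only on the `YYYY-MM-DD[THH:mm:ss.ssssss[Z]]` pattern stated in the comments above. `is_valid_query_date` is a hypothetical helper, not part of the PDS DOI Service, and it assumes an empty string means "no date filter"; the service itself may be stricter or more lenient than this check.

```python
import re
from datetime import datetime

# Shape of YYYY-MM-DD[THH:mm:ss.ssssss[Z]] (1 to 6 fractional digits accepted here)
_DATE_SHAPE = re.compile(r"\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2}\.\d{1,6}Z?)?")


def is_valid_query_date(value: str) -> bool:
    """Return True if value is empty or matches the documented date pattern."""
    if not value:
        # Assumption: an empty string means no date filter is applied
        return True

    if not _DATE_SHAPE.fullmatch(value):
        return False

    # The shape is right; now confirm the fields form a real calendar date/time
    fmt = "%Y-%m-%dT%H:%M:%S.%f" if "T" in value else "%Y-%m-%d"
    try:
        datetime.strptime(value.rstrip("Z"), fmt)
    except ValueError:
        return False

    return True
```

In the notebook, this could be called as `is_valid_query_date(start_date)` and `is_valid_query_date(end_date)` before running the List query.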

## Run List Query

The following cell uses the DOI Service List Action to query for DOI records using the parameters set above.
The result is returned in DataCite JSON format.

In [None]:
list_action = DOICoreActionList()

list_action_kwargs = {
    "format"       : "label",
    "doi"          : ",".join(doi_ids),
    "ids"          : ",".join(identifiers),
    "node"         : ",".join(nodes),
    "status"       : ",".join(status),
    "start_update" : start_date,
    "end_update"   : end_date,
    "submitter"    : ",".join(submitters)
}

query_label = list_action.run(**list_action_kwargs)

# Default to an empty list so later cells still run when the query returns nothing
dois = []

if query_label:
    dois, _ = DOIDataCiteWebParser.parse_dois_from_label(query_label)
    dois_count = len(dois)
    print(f"Obtained {dois_count} DOI{'' if dois_count == 1 else 's'} from provided query.")
else:
    print("Provided query returned no results.")

## Check Query Results

Run the following cell to output the label returned from the List query and ensure the results returned match what is expected.

In [None]:
print(query_label)
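
For large result sets, the printed label can be hard to review by eye. The sketch below counts records by state instead. It assumes the label is DataCite-style JSON in which the top-level `data` key holds either one record or a list of records, each with an `attributes` object that may contain a `state` field; `summarize_label` and the sample DOIs are hypothetical and are not part of the PDS DOI Service.

```python
import json
from collections import Counter


def summarize_label(label_text: str) -> Counter:
    """Count the records in a DataCite-style JSON label, grouped by state."""
    data = json.loads(label_text).get("data", [])

    # Assumption: a single-record label stores one object rather than a list
    records = data if isinstance(data, list) else [data]

    # Records without a state field are counted under "unknown"
    return Counter(
        record.get("attributes", {}).get("state", "unknown") for record in records
    )


# Hypothetical label content, used only to demonstrate the helper
sample_label = json.dumps(
    {
        "data": [
            {"attributes": {"doi": "10.1234/example-1", "state": "findable"}},
            {"attributes": {"doi": "10.1234/example-2", "state": "draft"}},
            {"attributes": {"doi": "10.1234/example-3", "state": "findable"}},
        ]
    }
)

summary = summarize_label(sample_label)
print(dict(summary))
```

In the notebook, `summarize_label(query_label)` would give a quick count to compare against the number of records you expected the query to match.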

## Write Query Results to Disk

The following cell may be used to write the JSON label returned from the List query to disk. This can be useful for performing the required updates to the label itself, rather than operating on DOI objects in memory.

In [None]:
# TODO: Set this value to the path on disk you would like the label written to
query_label_output_path = ""

if query_label_output_path:
    with open(query_label_output_path, 'w') as outfile:
        outfile.write(query_label)

## Perform Bulk Update(s) In Memory

The DOI records matching the query parameters have now been parsed to `Doi` objects and stored in memory (`dois`).

The cell below may be used to make whatever updates are necessary to the DOIs in memory. Any DOIs to be updated should be appended to the list `updated_dois`. If you plan to manually update the label returned from the query instead, skip to the **Commit Updated Label to Local Transaction Database** step and assign the path to the modified label to `updated_record_label_path` in that step's cell.

In [None]:
updated_dois = []

for doi in dois:
    # TODO: provide logic to update the current doi
    # ex: doi.publisher = "NASA Planetary Data System"
        
    # The following lines ensure that any records processed that are in the
    # Draft or Registered state are kept in the Draft/Registered state after submission
    # to DataCite. This can be useful for preventing records from being moved to
    # the Findable state prematurely.
    if doi.status == DoiStatus.Registered:
        doi.event = DoiEvent.Register
    elif doi.status == DoiStatus.Draft:
        doi.event = DoiEvent.Hide

    updated_dois.append(doi)
    
updated_record_label = DOIDataCiteRecord().create_doi_record(updated_dois)
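
One way to fill in the `# TODO` above is to move the update logic into a small function that changes only the records that need it and reports which ones it touched. The sketch below illustrates that pattern with stand-in objects, since real `Doi` objects come from the List query. `set_publisher` is a hypothetical helper, the DOI values are invented, and the `doi` attribute name on the stand-ins is an assumption; `publisher` follows the example in the cell above.

```python
from types import SimpleNamespace


def set_publisher(records, new_publisher):
    """Set publisher on each record, returning only the records that changed."""
    changed = []

    for record in records:
        # Skip records that already carry the desired value
        if record.publisher != new_publisher:
            record.publisher = new_publisher
            changed.append(record)

    return changed


# Stand-in objects with hypothetical values, used only to demonstrate the helper
records = [
    SimpleNamespace(doi="10.1234/example-1", publisher="NASA Planetary Data System"),
    SimpleNamespace(doi="10.1234/example-2", publisher="Planetary Data System"),
]

changed = set_publisher(records, "NASA Planetary Data System")
print([record.doi for record in changed])
```

Returning only the changed records makes it possible to submit just those records, rather than every record the query returned.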

## Check Updated Label

Run the following cell to output the contents of the label created from the DOIs updated in-memory.

In [None]:
print(updated_record_label)
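
Comparing the updated label against the original query result can confirm that only the intended fields changed. The following is a rough sketch using the standard library's `difflib`, assuming both labels are valid JSON text. `diff_labels` is a hypothetical helper, not part of the PDS DOI Service, and the sample labels are invented for illustration.

```python
import difflib
import json


def diff_labels(original: str, updated: str) -> str:
    """Return a unified diff between two JSON labels after normalizing formatting."""

    def normalize(text):
        # Re-serialize with sorted keys so only content changes appear in the diff
        return json.dumps(json.loads(text), indent=2, sort_keys=True).splitlines()

    return "\n".join(
        difflib.unified_diff(
            normalize(original),
            normalize(updated),
            fromfile="query_label",
            tofile="updated_record_label",
            lineterm="",
        )
    )


# Hypothetical label content, used only to demonstrate the helper
before = '{"data": [{"attributes": {"publisher": "Planetary Data System"}}]}'
after = '{"data": [{"attributes": {"publisher": "NASA Planetary Data System"}}]}'

result = diff_labels(before, after)
print(result)
```

In the notebook, `print(diff_labels(query_label, updated_record_label))` would show the changes. Differences may also include fields the service rewrites on its own, so an unexpected line in the diff is not necessarily an error.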

## Write Updated Label to Disk

Run the following cell to write the updated label contents to a location on disk. This will allow us to submit the updated label to the DOI Service to push the updates to the local transaction database, and eventually to DataCite.

In [None]:
# TODO: Set this value to the path on disk you would like the updated label written to
updated_record_label_path = ""

if updated_record_label_path:
    with open(updated_record_label_path, 'w') as outfile:
        outfile.write(updated_record_label)

## Commit Updated Label to Local Transaction Database

The following cell may be used to commit the updated DOI records to the local transaction database without submission to DataCite. The result is a JSON label reflecting the updated records, which may then be used with the actual submission to DataCite below. Running this cell will also update the contents of `updated_record_label`, which may be written to disk using the previous cell.

In [None]:
# TODO: Assign the path to the label containing the updated records to read in.
updated_record_label_path = ""

if updated_record_label_path:
    update_action = DOICoreActionUpdate()
    
    update_action_kwargs = {
        "input": updated_record_label_path,
        "submitter": "pds-operator@jpl.nasa.gov",
        "force": False
    }
    
    updated_record_label = update_action.run(**update_action_kwargs)

## Commit Updated Label to DataCite

The following cell may be used to commit the updated DOI records to DataCite. After submission, records are expected to be in the `findable` state, except where the in-memory update cell assigned an event to hold a record in the Draft or Registered state. A final label containing the updated records, reflecting their released state, is returned. Note that this notebook assumes the local PDS DOI Service is configured with the correct credentials for submissions to DataCite.

In [None]:
# TODO: Assign the path to the label containing the updated records to submit to DataCite
release_record_label_path = ""

if release_record_label_path:
    release_action = DOICoreActionRelease()
    
    release_action_kwargs = {
        "input": release_record_label_path,
        "submitter": "pds-operator@jpl.nasa.gov",
        "review": False,
        "force": True
    }
    
    released_record_label = release_action.run(**release_action_kwargs)

## Check Released Record

Run the following cell to output the contents of the label containing the records released to DataCite.

In [None]:
print(released_record_label)

## Write Released Record to Disk

Run the following cell to write the released label contents to a location on disk.

In [None]:
# TODO: Set this value to the path on disk you would like the released label written to
released_record_label_path = ""

if released_record_label_path:
    with open(released_record_label_path, 'w') as outfile:
        outfile.write(released_record_label)