# PDS DOI Service Bulk Record Update Notebook

This notebook is a utility for performing bulk updates to a set of DOI records sourced from DataCite. It assumes some familiarity with the PDS DOI Service and with the DataCite DOI record format.

Bulk updates may be performed with this notebook in the following steps:

* Define the query parameters for acquiring the set of records to be updated
* Run the DOI Service List action to obtain the set of records as a single JSON label. This label is also parsed into an in-memory list of `Doi` objects.

From here, there are two ways to perform the bulk updates:

* To update the JSON label directly, write the label to disk, make the desired changes, then commit the updated label to the local transaction database and DataCite.
* To write code that modifies in-memory representations of the records, edit the cell under the **Perform Bulk Update(s) In Memory** heading to make the desired changes. After all in-memory records are processed, they are combined into a single JSON label, which may then be written to disk and used to commit the changes to the local database or DataCite.

Areas that require user input have been marked with `# TODO`.

## Imports/Environment Setup

Run the following cell first, each time the notebook is started, to import the classes needed to perform the bulk update.

In [1]:
import tempfile

from pds_doi_service.core.actions import DOICoreActionList
from pds_doi_service.core.actions import DOICoreActionUpdate
from pds_doi_service.core.actions import DOICoreActionRelease
from pds_doi_service.core.entities.doi import DoiEvent
from pds_doi_service.core.entities.doi import DoiStatus
from pds_doi_service.core.outputs.datacite import DOIDataCiteRecord
from pds_doi_service.core.outputs.datacite import DOIDataCiteWebParser



## Set Query Parameters

Use the following cell to define the query parameters used to obtain a set of DOI records for bulk update.

In [2]:
# TODO: assign these fields as necessary to query for the group of DOI records to be updated

# DOI IDs to match
dois        = []

# PDS IDs (PDS3 or LID/LIDVID) to match
identifiers = [
    'GO-E-UVS-3-RDR-V1.0',
    'CO-S-UVIS-2-SPEC-V1.3',
    'CO-S-UVIS-2-SPEC-V1.0',
    'CO-S-UVIS-2-SPEC-V1.1',
    'CO-S-UVIS-2-SPEC-V1.2',
    'A15A-L-HFE-3-THERMAL-CONDUCTIVITY-V1.0',
    'MGS-M-ACCEL-0-ACCEL_DATA-V1.0',
    'VCO-V-RS-3-OCC-V1.0',
    'VCO-V-IR2-3-SEDR-V1.0',
    'VCO-V-IR2-2-EDR-V1.0',
    'CO-S-CIRS-2/3/4-TSDR-V2.0',
    'GO-E-UVS-2-EDR-V1.0',
    'CO-S-UVIS-2-CUBE-V1.0',
    'CO-S-UVIS-2-CUBE-V1.1',
    'CO-S-UVIS-2-CUBE-V1.2',
    'CO-S-UVIS-2-CUBE-V1.3',
    'VCO-V-LIR-2-EDR-V1.0',
    'CO-S-UVIS-2-WAV-V1.3',
    'CO-S-UVIS-2-WAV-V1.0',
    'CO-S-UVIS-2-SSB-V1.0',
    'CO-S-UVIS-2-SSB-V1.1',
    'CO-S-UVIS-2-SSB-V1.2',
    'CO-S-UVIS-2-SSB-V1.3',
    'EAR-J-KECK-3-EDR-SL9-V1.0',
    'VCO-V-UVI-2-EDR-V1.0',
    'CO-J-UVIS-2-CUBE-V1.0',
    'CO-X-UVIS-2-SPEC-V1.0',
    'VCO-V-IR1-2-EDR-V1.0',
    'VCO-V-IR1-3-SEDR-V1.0',
    'MGS-M-ACCEL-5-PROFILE-V1.0',
    'MGS-M-ACCEL-5-PROFILE-V1.1',
    'VCO-V-LIR-3-SEDR-V1.0',
    'EAR-J-SPIREX-3-EDR-SL9-V1.0',
    'CO-S-CIRS-2/3/4-TSDR-V1.0',
    'CO-S-CIRS-2/3/4-TSDR-V3.2',
    'CO-S-CIRS-2/3/4-TSDR-V3.1',
    'CO-X-UVIS-2-CALIB-V1.0',
    'CO-S-UVIS-2-CALIB-V1.1',
    'CO-S-UVIS-2-CALIB-V1.0',
    'CO-S-UVIS-2-CALIB-V1.2',
    'CO-J-UVIS-2-SSB-V1.0',
    'CO-X-UVIS-2-CUBE-V1.0',
    'EAR-J-SAAO-3-EDR-SL9-V1.0',
    'VCO-V-RS-5-OCC-V1.0',
    'CO-S-UVIS-2-CALIB-V1.3',
    'VCO-V-UVI-3-SEDR-V1.0',
    'MGS-M-ACCEL-5-ALTITUDE-V1.0',
    'VCO-V-IR2-3-CDR-V1.0',
    'CO-X-UVIS-2-WAV-V1.0',
    'CO-X-UVIS-2-SSB-V1.0',
    'MSL-M-SAM-3-RDR-L1A-V1.0',
    'A17A-L-HFE-3-THERMAL-CONDUCTIVITY-V1.0',
    'CO-J-UVIS-2-SPEC-V1.0',
    'VCO-V-IR1-3-CDR-V1.0',
    'VCO-V-LIR-3-CDR-V1.0'
]

# PDS node IDs (atm,eng,geo,img,naif,ppi,rs,rms,sbn) to match
nodes       = []

# Workflow status (draft,review,findable) to match
status      = []

# Start of the update-date range to filter by; must be YYYY-MM-DD[THH:mm:ss.ssssss[Z]]
start_date  = ""

# End of the update-date range to filter by; must be YYYY-MM-DD[THH:mm:ss.ssssss[Z]]
end_date    = ""

# Submitters to match, typically an email address
submitters  = []

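The date filters above are plain strings, so a malformed value is easy to miss. A minimal pre-flight check is sketched below. `is_valid_date_filter` is a hypothetical helper, not part of the PDS DOI Service, and it applies a literal reading of the `YYYY-MM-DD[THH:mm:ss.ssssss[Z]]` format from the comments; the service itself may accept other forms.

```python
import re
from datetime import datetime

# YYYY-MM-DD, optionally followed by THH:mm:ss.ssssss and an optional trailing Z
# (a literal reading of the format noted in the query parameter comments)
_DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2}\.\d{6}Z?)?")


def is_valid_date_filter(value: str) -> bool:
    """Return True if value is empty (no filter) or matches the expected format."""
    if not value:
        return True  # an empty string means the filter is not applied
    if not _DATE_PATTERN.fullmatch(value):
        return False
    # Confirm the date and time are real values (rejects e.g. 2022-02-30 or hour 25)
    try:
        datetime.strptime(value[:10], "%Y-%m-%d")
        if "T" in value:
            datetime.strptime(value[11:26], "%H:%M:%S.%f")
    except ValueError:
        return False
    return True
```

For example, `assert is_valid_date_filter(start_date) and is_valid_date_filter(end_date)` could be run before the List query.
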
## Run List Query

The following cell uses the DOI Service List action to query for DOI records matching the parameters set above.
The result is returned as a single label in DataCite JSON format, which is then parsed into the `Doi` objects stored in `doi_records`.

In [3]:
list_action = DOICoreActionList()

list_action_kwargs = {
    "format"       : "label",
    "doi"          : ",".join(dois),
    "ids"          : ",".join(identifiers),
    "node"         : ",".join(nodes),
    "status"       : ",".join(status),
    "start_update" : start_date,
    "end_update"   : end_date,
    "submitter"    : ",".join(submitters)
}

query_label = list_action.run(**list_action_kwargs)

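# Default to an empty list so later cells can still run when the query matches nothing
doi_records = []

if query_label:
    doi_records, _ = DOIDataCiteWebParser.parse_dois_from_label(query_label)
    print(f"Obtained {len(doi_records)} record(s) from provided query.")
else:
    print("Provided query returned no results.")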

INFO pds_doi_service.core.db.doi_database:create_connection Connecting to SQLite3 (ver 2.6.0) database /Users/collinss/repos/pds-doi-service/venv/doi.db
INFO pds_doi_service.core.db.doi_database:check_if_table_exists Checking for existence of DOI table doi
INFO pds_doi_service.core.db.doi_database:check_if_table_exists Executing query: SELECT count(name) FROM sqlite_master WHERE type='table' AND name='doi'
INFO pds_doi_service.core.outputs.datacite.datacite_web_parser:parse_dois_from_label Parsing record index 0
INFO pds_doi_service.core.outputs.datacite.datacite_web_parser:parse_dois_from_label Parsed 1 DOI objects from 1 records
INFO pds_doi_service.core.outputs.datacite.datacite_web_parser:parse_dois_from_label Parsing record index 0
INFO pds_doi_service.core.outputs.datacite.datacite_web_parser:parse_dois_from_label Parsed 1 DOI objects from 1 records
INFO pds_doi_service.core.outputs.datacite.datacite_web_parser:parse_dois_from_label Parsing record index 0
INFO pds_doi_service.cor

INFO pds_doi_service.core.outputs.datacite.datacite_web_parser:parse_dois_from_label Parsing record index 0
INFO pds_doi_service.core.outputs.datacite.datacite_web_parser:parse_dois_from_label Parsed 1 DOI objects from 1 records
INFO pds_doi_service.core.outputs.datacite.datacite_web_parser:parse_dois_from_label Parsing record index 0
INFO pds_doi_service.core.outputs.datacite.datacite_web_parser:parse_dois_from_label Parsed 1 DOI objects from 1 records
INFO pds_doi_service.core.outputs.datacite.datacite_web_parser:parse_dois_from_label Parsing record index 0
INFO pds_doi_service.core.outputs.datacite.datacite_web_parser:parse_dois_from_label Parsed 1 DOI objects from 1 records
INFO pds_doi_service.core.outputs.datacite.datacite_web_parser:parse_dois_from_label Parsing record index 0
INFO pds_doi_service.core.outputs.datacite.datacite_web_parser:parse_dois_from_label Parsed 1 DOI objects from 1 records
INFO pds_doi_service.core.outputs.datacite.datacite_web_parser:parse_dois_from_label

INFO pds_doi_service.core.outputs.datacite.datacite_web_parser:parse_dois_from_label Parsing record index 21
INFO pds_doi_service.core.outputs.datacite.datacite_web_parser:parse_dois_from_label Parsing record index 22
INFO pds_doi_service.core.outputs.datacite.datacite_web_parser:parse_dois_from_label Parsing record index 23
INFO pds_doi_service.core.outputs.datacite.datacite_web_parser:parse_dois_from_label Parsing record index 24
INFO pds_doi_service.core.outputs.datacite.datacite_web_parser:parse_dois_from_label Parsing record index 25
INFO pds_doi_service.core.outputs.datacite.datacite_web_parser:parse_dois_from_label Parsing record index 26
INFO pds_doi_service.core.outputs.datacite.datacite_web_parser:parse_dois_from_label Parsing record index 27
INFO pds_doi_service.core.outputs.datacite.datacite_web_parser:parse_dois_from_label Parsing record index 28
INFO pds_doi_service.core.outputs.datacite.datacite_web_parser:parse_dois_from_label Parsing record index 29
INFO pds_doi_servic

Obtained 58 record(s) from provided query.

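The List query cell above collapses each filter list into one comma-separated string, so unused filters are passed as empty strings, and the notebook appears to rely on the List action treating those as "no filter". Assuming an entirely empty query would match every record in the local database (an assumption worth confirming before any bulk update), a small guard can check that at least one filter is set. `active_filters` is a hypothetical helper operating on a dictionary shaped like `list_action_kwargs`.

```python
def active_filters(list_kwargs):
    """Return the names of the List action filters that were given a value.

    "format" only selects the output type, so it is not counted as a filter.
    """
    return sorted(key for key, value in list_kwargs.items() if key != "format" and value)
```

Running `assert active_filters(list_action_kwargs), "No query filters set"` before re-running `list_action.run(...)` would stop an unfiltered query.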

## Check Query Results

Run the following cell to print the label returned by the List query and verify that the results match what is expected.

In [4]:
print(query_label)

{
    "data":
    [
        {
            "id": "10.17189/8zxk-np68",
            "type": "dois",
            "attributes": {
                "doi": "10.17189/8zxk-np68",
                "suffix": "8zxk-np68",
                "identifiers": [
                    {
                        "identifier": "A15A-L-HFE-3-THERMAL-CONDUCTIVITY-V1.0",
                        "identifierType": "Site ID"
                    },
                    {
                        "identifier": "urn:nasa:pds:context_pds3:data_set:data_set.a15a-l-hfe-3-thermal-conductivity-v1.0",
                        "identifierType": "Site ID"
                    }
                ],
                "creators": [
                    {
                        "nameType": "Personal",
                        "name": "H. KENT HILLS",
                        "nameIdentifiers": [
                        ]
                    }
                ],
                "titles": [
                    {
                        "title

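Reading the full label by eye gets tedious for a query like this one, which lists 55 identifiers. A rough cross-check is to confirm that every requested identifier appears somewhere in the returned label. The sketch below assumes the layout visible in the output above (a top-level `data` list whose entries hold `attributes.identifiers`); `find_missing_identifiers` is a hypothetical helper.

```python
import json


def find_missing_identifiers(label_json, requested):
    """Return the requested PDS identifiers that do not appear in the label.

    Comparison is case-insensitive to tolerate casing differences between
    the query parameters and the identifiers stored in the records.
    """
    records = json.loads(label_json).get("data", [])
    if isinstance(records, dict):  # tolerate a single-record label
        records = [records]

    found = set()
    for record in records:
        for entry in record.get("attributes", {}).get("identifiers", []):
            found.add(entry.get("identifier", "").lower())

    return [pds_id for pds_id in requested if pds_id.lower() not in found]
```

For example, `print(find_missing_identifiers(query_label, identifiers))` lists any identifiers from the query parameters that matched no record. This only checks the `identifiers` filter, not the other query parameters.
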
## Write Query Results to Disk

The following cell may be used to write the JSON label returned from the List query to disk. This is useful if you plan to edit the label itself rather than operate on `Doi` objects in memory.

In [5]:
# TODO: Set this value to the path on disk you would like the label written to
query_label_output_path = "/Users/collinss/tmp/doi/atm_pds3_to_remove.json"

if query_label_output_path:
    with open(query_label_output_path, 'w') as outfile:
        outfile.write(query_label)

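The imports cell brings in `tempfile`, although no cell in this notebook uses it. One possible use, sketched below, is to fall back to a temporary file when no output path is set, so the label is still saved somewhere. `write_label` is a hypothetical helper; it also creates missing parent directories, which the plain `open(..., 'w')` call above does not.

```python
import os
import tempfile


def write_label(label, path=""):
    """Write a JSON label to path, or to a new temporary file if path is empty.

    Returns the path actually written so later cells can reuse it.
    """
    if path:
        # Create the parent directory if it does not exist yet
        parent = os.path.dirname(path)
        if parent:
            os.makedirs(parent, exist_ok=True)
        with open(path, "w") as outfile:
            outfile.write(label)
        return path

    # delete=False keeps the file on disk after the handle is closed
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as outfile:
        outfile.write(label)
        return outfile.name
```

For example, `query_label_output_path = write_label(query_label, query_label_output_path)` records where the label ended up.
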
## Perform Bulk Update(s) In Memory

The DOI records matching the query parameters have now been parsed and read into memory within the `doi_records` list.

The cells below may be used to make any necessary updates to the records in memory. Any records to be updated should be appended to the list `updated_doi_records`. If you plan to update the label returned from the query manually instead, skip to the **Commit Updated Label to Local Transaction Database** step and assign the path to your modified label to `updated_record_label_path`.

In [6]:
updated_doi_records = []

for doi_record in doi_records:
    # TODO: provide logic to update the current doi_record
    # ex: doi_record.publisher = "NASA Planetary Data System"

    # The following lines ensure that any processed records in the Draft or
    # Registered state remain in that state after submission to DataCite.
    # This prevents records from being moved to the Findable state prematurely.
    if doi_record.status == DoiStatus.Registered:
        doi_record.event = DoiEvent.Register
    elif doi_record.status == DoiStatus.Draft:
        doi_record.event = DoiEvent.Hide
    elif doi_record.status == DoiStatus.Findable:
        # Hiding a Findable record moves it to the Registered state in
        # DataCite; remove this branch to leave Findable records Findable
        doi_record.event = DoiEvent.Hide

    updated_doi_records.append(doi_record)
    
updated_record_label = DOIDataCiteRecord().create_doi_record(updated_doi_records)

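The loop above appends every record to `updated_doi_records`, so all matched records are resubmitted, including any the update logic leaves unchanged. If only the records that actually change should be resubmitted, a filter like the one sketched below can be used instead. `StandInDoi` is a hypothetical stand-in for the service's `Doi` entity so the example runs on its own; with the real objects, the same comparison would be made on `doi_record.publisher`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StandInDoi:
    """Hypothetical stand-in for the service's Doi entity (only two fields)."""
    doi: str
    publisher: str


def apply_publisher_update(records: List[StandInDoi], publisher: str) -> List[StandInDoi]:
    """Set publisher on each record, returning only the records that changed."""
    changed = []
    for record in records:
        if record.publisher != publisher:
            record.publisher = publisher
            changed.append(record)
    return changed
```

Skipping unchanged records keeps the submission smaller, but those records then do not receive the event settings applied in the loop above.
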
## Check Updated Label

Run the following cell to print the contents of the label created from the records updated in memory.

In [None]:
print(updated_record_label)

## Write Updated Label to Disk

Run the following cell to write the updated label to disk. The updated label can then be submitted to the DOI Service to push the updates to the local transaction database and, eventually, to DataCite.

In [7]:
# TODO: Set this value to the path on disk you would like the updated label written to
updated_record_label_path = "/Users/collinss/tmp/doi/atm_pds3_to_remove_updated.json"

if updated_record_label_path:
    with open(updated_record_label_path, 'w') as outfile:
        outfile.write(updated_record_label)

## Commit Updated Label to Local Transaction Database

The following cell may be used to commit the updated DOI records to the local transaction database without submitting them to DataCite. It returns a JSON label reflecting the updated records, which can then be used for the DataCite submission below. Running this cell also replaces the contents of `updated_record_label`, which may be written to disk using the previous cell. By default the input path is left empty, so the cell does nothing until a path is assigned.

In [None]:
# TODO: Assign the path to the label containing the updated records to read in.
# Leaving this empty skips the update.
updated_record_label_path = ""

if updated_record_label_path:
    update_action = DOICoreActionUpdate()
    
    update_action_kwargs = {
        "input": updated_record_label_path,
        "submitter": "pds-operator@jpl.nasa.gov",
        "force": False
    }
    
    updated_record_label = update_action.run(**update_action_kwargs)

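Since both commit steps read the label from disk, a quick check that the file exists and holds records can catch a wrong path before anything is committed. The sketch assumes the DataCite-style layout printed earlier (a top-level `data` key); it also tolerates a single-record label whose `data` is an object rather than a list, which is an assumption about how such labels may look. `count_label_records` is a hypothetical helper.

```python
import json
import os


def count_label_records(path):
    """Return the number of records in a JSON label file, raising if it looks wrong."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"No label found at {path}")

    with open(path) as infile:
        label = json.load(infile)

    if not isinstance(label, dict):
        raise ValueError(f"Label at {path} is not a JSON object")

    records = label.get("data")
    if isinstance(records, dict):  # assumed single-record layout
        return 1
    if not isinstance(records, list) or not records:
        raise ValueError(f"Label at {path} contains no records under 'data'")
    return len(records)
```

For example, `print(count_label_records(updated_record_label_path))` before running the update or release cell.
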
## Commit Updated Label to DataCite

The following cell may be used to commit the updated DOI records to DataCite. Submitted records are expected to move to the `findable` state unless their events were set otherwise; the in-memory update cell, for example, keeps Draft and Registered records in their current state and hides Findable ones. A final label reflecting the released state of the records is returned. Note that this notebook assumes the local PDS DOI Service is configured with the correct credentials for submissions to DataCite.

In [8]:
# TODO: Assign the path to the label containing the updated records to submit to DataCite
release_record_label_path = "/Users/collinss/tmp/doi/atm_pds3_to_remove_updated.json"

if release_record_label_path:
    release_action = DOICoreActionRelease()
    
    release_action_kwargs = {
        "input": release_record_label_path,
        "submitter": "pds-operator@jpl.nasa.gov",
        "review": False,
        "force": True
    }
    
    released_record_label = release_action.run(**release_action_kwargs)

INFO pds_doi_service.core.input.input_util:_read_from_path Reading local file path /Users/collinss/tmp/doi/atm_pds3_to_remove_updated.json
INFO pds_doi_service.core.input.input_util:parse_json_file Parsing json file atm_pds3_to_remove_updated.json
INFO pds_doi_service.core.outputs.datacite.datacite_web_parser:parse_dois_from_label Parsing record index 0
INFO pds_doi_service.core.outputs.datacite.datacite_web_parser:parse_dois_from_label Parsing record index 1
INFO pds_doi_service.core.outputs.datacite.datacite_web_parser:parse_dois_from_label Parsing record index 2
INFO pds_doi_service.core.outputs.datacite.datacite_web_parser:parse_dois_from_label Parsing record index 3
INFO pds_doi_service.core.outputs.datacite.datacite_web_parser:parse_dois_from_label Parsing record index 4
INFO pds_doi_service.core.outputs.datacite.datacite_web_parser:parse_dois_from_label Parsing record index 5
INFO pds_doi_service.core.outputs.datacite.datacite_web_parser:parse_dois_from_label Parsing record inde

INFO pds_doi_service.core.outputs.datacite.datacite_web_parser:parse_dois_from_label Parsing record index 0
INFO pds_doi_service.core.outputs.datacite.datacite_web_parser:parse_dois_from_label Parsed 1 DOI objects from 1 records
INFO pds_doi_service.core.db.doi_database:create_connection Connecting to SQLite3 (ver 2.6.0) database /Users/collinss/repos/pds-doi-service/venv/doi.db
INFO pds_doi_service.core.db.doi_database:check_if_table_exists Checking for existence of DOI table doi
INFO pds_doi_service.core.db.doi_database:check_if_table_exists Executing query: SELECT count(name) FROM sqlite_master WHERE type='table' AND name='doi'
INFO pds_doi_service.core.db.transaction_on_disk:write Transaction files saved to /Users/collinss/repos/pds-doi-service/venv/transaction_history/atm/10.17189/8zxk-np68/2022-02-08T18:25:34+00:00
INFO pds_doi_service.core.outputs.datacite.datacite_web_parser:parse_dois_from_label Parsing record index 0
INFO pds_doi_service.core.outputs.datacite.datacite_web_par

INFO pds_doi_service.core.db.transaction_on_disk:write Transaction files saved to /Users/collinss/repos/pds-doi-service/venv/transaction_history/atm/10.17189/qrt1-hr21/2022-02-08T18:26:00+00:00
INFO pds_doi_service.core.outputs.datacite.datacite_web_parser:parse_dois_from_label Parsing record index 0
INFO pds_doi_service.core.outputs.datacite.datacite_web_parser:parse_dois_from_label Parsed 1 DOI objects from 1 records
INFO pds_doi_service.core.db.transaction_on_disk:write Transaction files saved to /Users/collinss/repos/pds-doi-service/venv/transaction_history/atm/10.17189/9kg6-0r18/2022-02-08T18:26:02+00:00
INFO pds_doi_service.core.outputs.datacite.datacite_web_parser:parse_dois_from_label Parsing record index 0
INFO pds_doi_service.core.outputs.datacite.datacite_web_parser:parse_dois_from_label Parsed 1 DOI objects from 1 records
INFO pds_doi_service.core.db.transaction_on_disk:write Transaction files saved to /Users/collinss/repos/pds-doi-service/venv/transaction_history/atm/10.17

INFO pds_doi_service.core.outputs.datacite.datacite_web_parser:parse_dois_from_label Parsing record index 0
INFO pds_doi_service.core.outputs.datacite.datacite_web_parser:parse_dois_from_label Parsed 1 DOI objects from 1 records
INFO pds_doi_service.core.db.transaction_on_disk:write Transaction files saved to /Users/collinss/repos/pds-doi-service/venv/transaction_history/atm/10.17189/drns-8x47/2022-02-08T18:26:30+00:00
INFO pds_doi_service.core.outputs.datacite.datacite_web_parser:parse_dois_from_label Parsing record index 0
INFO pds_doi_service.core.outputs.datacite.datacite_web_parser:parse_dois_from_label Parsed 1 DOI objects from 1 records
INFO pds_doi_service.core.db.transaction_on_disk:write Transaction files saved to /Users/collinss/repos/pds-doi-service/venv/transaction_history/atm/10.17189/7p8c-rd60/2022-02-08T18:26:32+00:00
INFO pds_doi_service.core.outputs.datacite.datacite_web_parser:parse_dois_from_label Parsing record index 0
INFO pds_doi_service.core.outputs.datacite.dat

INFO pds_doi_service.core.outputs.datacite.datacite_web_parser:parse_dois_from_label Parsing record index 0
INFO pds_doi_service.core.outputs.datacite.datacite_web_parser:parse_dois_from_label Parsed 1 DOI objects from 1 records
INFO pds_doi_service.core.db.transaction_on_disk:write Transaction files saved to /Users/collinss/repos/pds-doi-service/venv/transaction_history/geo/10.17189/1520098/2022-02-08T18:26:59+00:00


## Check Released Record

Run the following cell to output the contents of the label containing the records released to DataCite.

In [None]:
print(released_record_label)

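The released label can be long. To list just the DOIs it contains, for example to compare against the transaction log output above, the sketch below reads `attributes.doi` from each entry under `data`, the same layout seen in the query label. `list_label_dois` is a hypothetical helper and assumes `released_record_label` is a JSON string, as the `print` call suggests.

```python
import json


def list_label_dois(label_json):
    """Return the DOI of each record in a JSON label, in label order.

    Falls back to the record's top-level "id" when attributes.doi is missing.
    """
    records = json.loads(label_json).get("data", [])
    if isinstance(records, dict):  # tolerate a single-record label
        records = [records]
    return [r.get("attributes", {}).get("doi") or r.get("id") for r in records]
```

For example, `print("\n".join(list_label_dois(released_record_label)))`.
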
## Write Released Record to Disk

Run the following cell to write the released label contents to a location on disk.

In [9]:
# TODO: Set this value to the path on disk you would like the released label written to
released_record_label_path = "/Users/collinss/tmp/doi/atm_pds3_to_remove_submitted.json"

if released_record_label_path:
    with open(released_record_label_path, 'w') as outfile:
        outfile.write(released_record_label)