<a href="https://colab.research.google.com/github/abuadh/Social-Media-App/blob/main/DRUMToolsDspace7.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Automation Tools for DRUM Curators (DSpace7)


---

**Contributions by:**

*   Melinda Kernik, University of Minnesota Libraries (original notebook, DSpace7 revisions)
*   Valerie Collins, University of Minnesota Libraries (original notebook)
*   Kent Gerber, University of Minnesota Libraries, University Archives (DSpace7 revisions)
*   Haniya Abuad, University of Minnesota (DSpace7 revisions)

**Contact**: datarepo@umn.edu

The code in this notebook is intended for data curators working with records associated with the [Data Repository for the University of Minnesota](https://drum.umn.edu). More information about this code can be found in the main [GitHub repository](https://github.com/mkernik/drum_tools).

# Table of Contents

<small>*Only the "Start here" section is a mandatory step in using this notebook. After this step is completed, any of the "Create" sections can be run in any order.*</small>

1.   Start here
  -   Create Curator Log
  -   Create Readme File
  -   Create XML File

<small>*External resources related to these tools are linked from the following sections:*</small>
2.   Known Issues and Limitations
3.   Download All Files from Record

## Start Here


---
Activate this notebook by running the cell below. You must have this notebook open in Colab to do this.

> An input box will appear once the notebook has activated. Paste the **URL** of a DRUM record into the box and press Enter. (The handle will also work.) The notebook will remember this link and use it when you run any of the code blocks below.

> If you enter an incorrect value, the code below will not run, but you can enter a new value by running this starting code block again.

In [None]:
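#Remove stray whitespace and any trailing slash so the link can be split reliably
link_url = input().strip().rstrip("/")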

## CREATE CURATOR LOG


---
Run this block of code to create a curator log populated with the existing information on the record. By default, the file is saved to your Downloads folder.

In [None]:
import urllib.request
import requests
import math
from string import Template
import json
from datetime import datetime
from google.colab import files


def convert_size(size_bytes):
    """Convert file size in bytes to a more human readable format"""

    if size_bytes == 0:
        return "0B"
    size_name = ("B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB")
    i = int(math.floor(math.log(size_bytes, 1024)))
    p = math.pow(1024, i)
    s = round(size_bytes / p, 2)
    return "%s %s" % (s, size_name[i])
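
To illustrate the 1024-based units that `convert_size` reports, here is a minimal sketch of a loop-based variant. It is not the notebook's function: `convert_size_loop` is a hypothetical name, and repeated division is swapped in for the logarithm purely for comparison.

```python
def convert_size_loop(size_bytes):
    """Hypothetical variant of convert_size that divides by 1024 repeatedly."""
    units = ("B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB")
    size = float(size_bytes)
    i = 0
    # Step up one unit at a time until the value drops below 1024
    while size >= 1024 and i < len(units) - 1:
        size /= 1024
        i += 1
    return "%s %s" % (round(size, 2), units[i])


print(convert_size_loop(1536))           # 1.5 KB
print(convert_size_loop(5 * 1024 ** 3))  # 5.0 GB
```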

Use link provided to request information from the API

In [None]:
#Take the input entered by the notebook user and extract the item_uuid
drum_url_split = link_url.split ("/") [-2:]
item_uuid_test = str(drum_url_split[0])
if item_uuid_test == "items":
  item_uuid = str(drum_url_split[1])
else:
  resolved_url = requests.get(link_url)
  r_new_url = resolved_url.url
  drum_url_split = r_new_url.split ("/") [-2:]
  item_uuid = str(drum_url_split[1])

#Construct the link to the API endpoint
item_api_url = "https://conservancy.umn.edu/server/api/core/items/" + item_uuid
print (item_api_url)
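
The distinction the cell above draws between an `items` URL and any other link (such as a handle) can be tried offline. This is a minimal sketch rather than part of the workflow: `extract_item_uuid` is a hypothetical helper, and the example addresses are the staging URLs quoted in the comments of the DataCite section of this notebook.

```python
from urllib.parse import urlparse


def extract_item_uuid(url):
    """Return the UUID from a .../items/<uuid> URL, or None for other URLs."""
    # Work on the path only, ignoring empty segments left by trailing slashes
    parts = [p for p in urlparse(url.strip()).path.split("/") if p]
    if len(parts) >= 2 and parts[-2] == "items":
        return parts[-1]
    return None


items_url = "https://conservancystage.lib.umn.edu/items/2ba9c02a-0885-4907-ae1a-33eb657282b6/"
handle_url = "https://conservancystage.lib.umn.edu/handle/11299/252448"
print(extract_item_uuid(items_url))
print(extract_item_uuid(handle_url))
```

A `None` result corresponds to the `else` branch above, where the link has to be requested so that it can resolve to the UUID-based address.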

In [None]:
#Request information from the API
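response = requests.get(item_api_url)
#Stop here with a clear HTTP error if the record could not be retrieved
response.raise_for_status()
itemData = response.json()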

Create variables for a few specific metadata elements (title, handle, and date) to use in the log header and log filename

In [None]:
title = itemData['name']
handle_uri = itemData['metadata']['dc.identifier.uri'][0]['value']
date_split = itemData['metadata']['dc.date.available'][0]['value'].split("T")
date_available = date_split[0]
handle_split = handle_uri.split ("/") [-2:]

Make a list of all metadata elements available on the item API page

In [None]:
metadata_string = ""
for field, entries in itemData['metadata'].items():
  for entry in entries:
    #print (field, entry['value'])
    metadata_string += str(field) + " : " + str(entry['value']) + "\n"
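
The flattening step above can be tried without calling the API. The sketch below uses a made-up dictionary in the same shape the code reads from `itemData['metadata']` (each field name maps to a list of entries with a `value` key); the field values are invented for illustration.

```python
# Made-up sample in the same shape as itemData['metadata']
sample_metadata = {
    "dc.title": [{"value": "Example dataset"}],
    "dc.subject": [{"value": "soil"}, {"value": "climate"}],
}

lines = []
for field, entries in sample_metadata.items():
    # A field with several values produces one line per value
    for entry in entries:
        lines.append(f"{field} : {entry['value']}")
sample_string = "\n".join(lines) + "\n"
print(sample_string)
```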

Test the results of the "metadata_string" for loop. You can skip this step unless you want to check the metadata at this point.

In [None]:
#print(metadata_string)

Access information about bundles and file bitstreams

In [None]:
bundles_url = itemData['_links']['bundles']['href']
bundles_response = requests.get(bundles_url)
bundlesData = bundles_response.json()

Navigate the bundle information to get to the content files of the submission (the "original" bitstreams)

In [None]:
bitstreams_url = None
for bundle in bundlesData['_embedded']['bundles']:
  if bundle['name'] == "ORIGINAL":
    bitstreams_url = bundle['_links']['bitstreams']['href']
print (bitstreams_url)

#Without an ORIGINAL bundle there are no content files to list
if bitstreams_url is None:
  raise ValueError("No ORIGINAL bundle was found for this record.")

bits_response = requests.get(bitstreams_url)
bitstreamsData = bits_response.json()
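
As a rough illustration of the bundle lookup, the sketch below searches a made-up bundles response for the "ORIGINAL" bundle and returns `None` when it is absent. `find_bundle_bitstreams_url` is a hypothetical helper and the `example.org` links are placeholders, not real endpoints.

```python
def find_bundle_bitstreams_url(bundles_data, bundle_name="ORIGINAL"):
    """Return the bitstreams link of the named bundle, or None if it is missing."""
    for bundle in bundles_data.get("_embedded", {}).get("bundles", []):
        if bundle.get("name") == bundle_name:
            return bundle["_links"]["bitstreams"]["href"]
    return None


# Made-up response with the keys the notebook reads from bundlesData
sample_bundles = {"_embedded": {"bundles": [
    {"name": "LICENSE",
     "_links": {"bitstreams": {"href": "https://example.org/bundles/1/bitstreams"}}},
    {"name": "ORIGINAL",
     "_links": {"bitstreams": {"href": "https://example.org/bundles/2/bitstreams"}}},
]}}

print(find_bundle_bitstreams_url(sample_bundles))
print(find_bundle_bitstreams_url(sample_bundles, "THUMBNAIL"))
```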

Gather information about filenames and file sizes. Look at multiple pages if necessary.

In [None]:
bitstreams_string = ""
file_count = 0
for page in range(bitstreamsData['page']['totalPages']):
    #print (page)
    next_url = bitstreams_url + "?page=" + str(page)
    response = requests.get(next_url)
    bitstreamsDataExtra = response.json()
    for x in range(len(bitstreamsDataExtra['_embedded']['bitstreams'])):
        filename = bitstreamsDataExtra['_embedded']['bitstreams'][x]['name']
        if 'dc.description' in bitstreamsDataExtra['_embedded']['bitstreams'][x]['metadata']:
          description = bitstreamsDataExtra['_embedded']['bitstreams'][x]['metadata']['dc.description'][0]['value']
        else:
          description = ''
        size = convert_size(bitstreamsDataExtra['_embedded']['bitstreams'][x]['sizeBytes'])
        bitstreams_string += filename + " (" + size + ")\n"
        file_count += 1
if bitstreamsData['page']['totalElements'] == file_count:
    print ("Number of files counted: " + str(file_count))
else:
    print ("File count looks off! File count: " + str(file_count) + " Expected number = " + str(bitstreamsData['page']['totalElements']))
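
The page-by-page collection and the final count check can be exercised with fake pages instead of live requests. In this sketch, `collect_bitstream_names` and its `fetch_page` argument are hypothetical; unlike the cell above, it reuses the first response for page 0 rather than requesting it again.

```python
def collect_bitstream_names(first_page, fetch_page):
    """Gather names from every page; fetch_page(n) returns the parsed JSON of page n."""
    names = []
    for page in range(first_page["page"]["totalPages"]):
        data = first_page if page == 0 else fetch_page(page)
        for bitstream in data["_embedded"]["bitstreams"]:
            names.append(bitstream["name"])
    return names


# Two made-up pages standing in for API responses
paging = {"totalPages": 2, "totalElements": 3}
fake_pages = [
    {"page": paging, "_embedded": {"bitstreams": [{"name": "a.csv"}, {"name": "b.csv"}]}},
    {"page": paging, "_embedded": {"bitstreams": [{"name": "readme.txt"}]}},
]

names = collect_bitstream_names(fake_pages[0], lambda n: fake_pages[n])
print(names, len(names) == paging["totalElements"])
```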

Add the bitstream and metadata lists to the template metadata log text

In [None]:
metadata_log_template = "Curation log for: " + title + """
Handle: """ + handle_uri + """
Corresponding researcher:
Curator:
Metadata log created: """ + str(datetime.now().strftime("%Y-%m-%d")) + " (Dataset published: " + date_available + ")" + """
\n*************************************************
Files received:
*************************************************\n""" + bitstreams_string + """
*************************************************
Changes made to files:
*************************************************

**************************************************
Metadata Changes
**************************************************

**************************************************
Correspondence Notes
**************************************************

*************************************************
Other issues
*************************************************

*************************************************
Original Metadata from Author:
*************************************************\n"""  + metadata_string

Download curator log

In [None]:
metadata_filename = (str(handle_split[1]) + "_CuratorLog_" + str(datetime.now().strftime("%Y%m%d")) + ".txt")
with open(metadata_filename, 'w') as f:
  f.write(metadata_log_template)

files.download(metadata_filename)

## CREATE README FILE


---
Run this block of code to create a readme file populated with the existing information on the record. By default, the file is saved to your Downloads folder.

In [None]:
import urllib.request
import requests
import math
from string import Template
import json
from datetime import datetime
from google.colab import files


def convert_size(size_bytes):
    """Convert file size in bytes to a more human readable format"""

    if size_bytes == 0:
        return "0B"
    size_name = ("B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB")
    i = int(math.floor(math.log(size_bytes, 1024)))
    p = math.pow(1024, i)
    s = round(size_bytes / p, 2)
    return "%s %s" % (s, size_name[i])

Use link provided to request information from the API

In [None]:
#Take the input entered by the notebook user and extract the item_uuid
drum_url_split = link_url.split ("/") [-2:]
item_uuid_test = str(drum_url_split[0])
if item_uuid_test == "items":
  item_uuid = str(drum_url_split[1])
else:
  resolved_url = requests.get(link_url)
  r_new_url = resolved_url.url
  print (r_new_url)
  drum_url_split = r_new_url.split ("/") [-2:]
  item_uuid = str(drum_url_split[1])

#Construct the link to the API endpoint
item_url = "https://conservancy.umn.edu/server/api/core/items/" + item_uuid
print (item_url)

In [None]:
#Request information from the API
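response = requests.get(item_url)
#Stop here with a clear HTTP error if the record could not be retrieved
response.raise_for_status()
itemData = response.json()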

Set up dictionary to store metadata elements




In [None]:
metadata_dict = {'readme_date': str(datetime.now().strftime("%Y-%m-%d")),
                   'author_citation':"", 'year_published':"", 'url':"",
                   'title':"",'date_published':"", 'authors':"", 'contact_author': "", 'date_collected':"",
                   'spatial':"", 'abstract': "", 'license_info':"", 'publications':"",
                   'funding':"", 'file_list':""}

### Gather information from the API for the Readme

Identifiers (Title, Citation, URL)

In [None]:
metadata_dict ['title'] = itemData['name']

if 'dc.description.suggestedcitation' in itemData['metadata']:
  metadata_dict ['author_citation'] = itemData['metadata']['dc.description.suggestedcitation'][0]['value']
else:
  metadata_dict ['author_citation'] = ''

#If the record has been assigned a DOI, use that for the recommended citation. Otherwise, use the handle.
if 'dc.identifier.doi' in itemData['metadata']:
    metadata_dict ['url'] = itemData['metadata']['dc.identifier.doi'][0]['value']
else:
  metadata_dict ['url'] = itemData['metadata']['dc.identifier.uri'][0]['value']

Authors

In [None]:
#Retrieve the last name of the contact person to be used in the filename
contact_name = itemData['metadata']['dc.contributor.contactname'][0]['value']
contact_split = [part.strip() for part in contact_name.split(",")]
###add logic for when the name was not entered as "last name, first name"?
contact_lastname = contact_split[0].replace(" ", "_")

contact_email = itemData['metadata']['dc.contributor.contactemail'][0]['value']

#Use "First Last" when the name contains a comma; otherwise use the name as entered
try:
  contact_author_string = "\tAuthor Contact: " + contact_split[1] + " " + contact_split[0] + " (" + contact_email + ")"
except IndexError:
  contact_author_string = "\tAuthor Contact: " + contact_name + " (" + contact_email + ")"
metadata_dict ['contact_author'] = contact_author_string

authors_list = []
for x in range(len(itemData['metadata']['dc.contributor.author'])):
  authors_list.append(itemData['metadata']['dc.contributor.author'][x]['value'])

author_string = ""
for author in authors_list:
    #Rearrange names containing a comma to be First Last instead of Last, First
    if "," in author:
        author_split = author.split(",", 1)
        author_firstLast = author_split[1].strip() + " " + author_split[0].strip()
    else:
        author_firstLast = author

    #If the author is the contact person, add their email address. If not, leave email blank.
    if author == contact_name:
        author_string += "\n\tName: " + author_firstLast + "\n\tInstitution:\n\tEmail: " + contact_email + "\n\tORCID: {orcid}\n\n"
    else:
        author_string += "\n\tName: " + author_firstLast + "\n\tInstitution:\n\tEmail:\n\tORCID: {orcid}\n\n"
metadata_dict ['authors'] = author_string
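
The name rearrangement can be checked in isolation. This is a minimal sketch with a hypothetical helper, `first_last`, and made-up names; it assumes names are entered either as "Last, First" or without a comma.

```python
def first_last(name):
    """Turn 'Last, First' into 'First Last'; leave names without a comma unchanged."""
    if "," not in name:
        return name.strip()
    last, first = name.split(",", 1)
    return first.strip() + " " + last.strip()


print(first_last("Doe, Jane"))
print(first_last("Example Research Group"))
```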

Test author string results (if needed)

In [None]:
#print(author_string)

Dates (Date published, year published, date collected)

In [None]:
#Split the date field and use only the YYYY-MM-DD part, not the exact time
date_split = itemData['metadata']['dc.date.available'][0]['value'].split("T")
metadata_dict ['date_published'] = date_split[0]
#Isolate the year published to use in the Readme filename
year_split = date_split[0].split("-")
year_published = year_split[0]
metadata_dict ['year_published'] = year_published

date_collected_dict = {}
if 'dc.date.collectedbegin' in itemData['metadata'] and 'dc.date.collectedend' in itemData['metadata']:
    date_collected_dict['begin'] = itemData['metadata']['dc.date.collectedbegin'][0]['value']
    date_collected_dict['end'] = itemData['metadata']['dc.date.collectedend'][0]['value']
else:
    print ("No valid date collection range provided")

## Add together multiple Dspace fields to be used in one section of the readme
if date_collected_dict:
  metadata_dict ['date_collected'] = str(date_collected_dict['begin']) + " to " + str(date_collected_dict['end'])

Descriptive fields (Abstract and funding)

In [None]:
if 'dc.description.abstract' in itemData['metadata']:
    metadata_dict ['abstract'] = itemData['metadata']['dc.description.abstract'][0]['value']
else:
    print ("No abstract provided")

if 'dc.description.sponsorship' in itemData['metadata']:
    funders_string = ""
    for x in range(len(itemData['metadata']['dc.description.sponsorship'])):
      funders_string += "\t" + itemData['metadata']['dc.description.sponsorship'][x]['value'] + "\n"
    metadata_dict ['funding'] = funders_string
else:
    print ("No funding information provided")

Rights

In [None]:
rights_dict = {}
rights_string = ''
if 'dc.rights' in itemData['metadata'] and 'dc.rights.uri' in itemData['metadata']:
    rights_string = itemData['metadata']['dc.rights'][0]['value'] + " (" + itemData['metadata']['dc.rights.uri'][0]['value'] + ")"
elif 'dc.rights' in itemData['metadata'] and 'dc.rights.uri' not in itemData['metadata']:
    rights_string = itemData['metadata']['dc.rights'][0]['value']

metadata_dict ['license_info'] = rights_string


Related Publications

In [None]:
###Is this element no longer included in Dspace7 or is it just missing for this specific item?  Test it against an example!
#Related publications are optional, so only build the list when the field is present
publication_string = ""
if 'dc.relation.isreferencedby' in itemData['metadata']:
  for x in range(len(itemData['metadata']['dc.relation.isreferencedby'])):
    publication_string += itemData['metadata']['dc.relation.isreferencedby'][x]['value'] + "\n\n"
  metadata_dict ['publications'] = publication_string

File List

In [None]:
#Get the API endpoint for the bitstreams list
bundles_url = itemData['_links']['bundles']['href']
response = requests.get(bundles_url)
bundlesData = response.json()

bitstreams_url = None
for bundle in bundlesData['_embedded']['bundles']:
  if bundle['name'] == "ORIGINAL":
    bitstreams_url = bundle['_links']['bitstreams']['href']
print (bitstreams_url)

#Without an ORIGINAL bundle there are no content files to list
if bitstreams_url is None:
  raise ValueError("No ORIGINAL bundle was found for this record.")

response = requests.get(bitstreams_url)
bitstreamsData = response.json()

In [None]:
#Make the file list, paginating through multiple pages if necessary
file_list_string = ""
file_count = 0
for page in range(bitstreamsData['page']['totalPages']):
    #print (page)
    next_url = bitstreams_url + "?page=" + str(page)
    response = requests.get(next_url)
    bitstreamsDataExtra = response.json()
    for x in range(len(bitstreamsDataExtra['_embedded']['bitstreams'])):
      if 'dc.description' in bitstreamsDataExtra['_embedded']['bitstreams'][x]['metadata']:
        file_list_string += ("\tFilename: " + bitstreamsDataExtra['_embedded']['bitstreams'][x]['name'] +" \n\tShort description: " + bitstreamsDataExtra['_embedded']['bitstreams'][x]['metadata']['dc.description'][0]['value'] + "\n\n")
      else:
        file_list_string += ("\tFilename: " + bitstreamsDataExtra['_embedded']['bitstreams'][x]['name'] +" \n\tShort description:\n\n")
      file_count += 1

metadata_dict ['file_list'] = file_list_string

if bitstreamsData['page']['totalElements'] == file_count:
    print ("Number of files counted: " + str(file_count))
else:
    print ("File count looks off! File count: " + str(file_count) + " Expected number = " + str(bitstreamsData['page']['totalElements']))

### Add information to the Readme Template

Insert metadata elements from the submission into the template readme text

In [None]:
readme_template = Template(
"""This readme.txt file was generated on ${readme_date} by <Name>
Recommended citation for the data: ${author_citation}\n
-------------------
GENERAL INFORMATION
-------------------\n
1. Title of Dataset: ${title}\n
2. Author Information\n\n${contact_author}\n${authors}
3. Date published or finalized for release: ${date_published}\n\n
4. Date of data collection: ${date_collected}\n\n
5. Geographic location of data collection (where was data collected?): ${spatial}\n\n
6. Information about funding sources that supported the collection of the data:\n${funding}\n
7. Overview of the data (abstract):\n${abstract}\n\n\n\n
--------------------------
SHARING/ACCESS INFORMATION
--------------------------\n
1. Licenses/restrictions placed on the data: ${license_info}\n
2. Links to publications that cite or use the data:\n${publications}
3. Was data derived from another source?
\tIf yes, list source(s):\n
4. Terms of Use: Data Repository for the U of Minnesota (DRUM) By using these files, users agree to the Terms of Use. https://conservancy.umn.edu/pages/policies/#drum-terms-of-use\n\n\n\n
---------------------
DATA & FILE OVERVIEW
---------------------\n
1. File List:\n${file_list}\n
2. Relationship between files:\n\n
--------------------------
METHODOLOGICAL INFORMATION
--------------------------\n
1. Description of methods used for collection/generation of data:\n\n
2. Methods for processing the data: <describe how the submitted data were generated from the raw or collected data>\n\n
3. Instrument- or software-specific information needed to interpret the data:\n\n
4. Standards and calibration information, if appropriate:\n\n
5. Environmental/experimental conditions:\n\n
6. Describe any quality-assurance procedures performed on the data:\n\n
7. People involved with sample collection, processing, analysis and/or submission:\n\n\n\n""")

#Replace variables in the template with the information from the metadata dictionary
readme_string = readme_template.substitute(metadata_dict)
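
`Template.substitute` requires a value for every placeholder, which is why `metadata_dict` is created with an empty string for each key. The short sketch below, using a made-up two-field template, shows what happens when a key is missing and how `safe_substitute` behaves instead.

```python
from string import Template

demo_template = Template("Title: ${title}\nFunding: ${funding}\n")
values = {"title": "Example dataset"}  # 'funding' deliberately left out

try:
    demo_template.substitute(values)
except KeyError as missing:
    print("substitute() needs every placeholder; missing:", missing)

# safe_substitute leaves unknown placeholders in the text instead of raising
print(demo_template.safe_substitute(values))
```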


Add a data_specific section to the readme for each spreadsheet file

In [None]:
#Make a list of all "Original" bitstream items with ".csv" or ".xls" in the name
#Note: bitstreamsData holds only the first page of results from the File List step
spreadsheets = []
data_specific_string = ""
for x in range(len(bitstreamsData['_embedded']['bitstreams'])):
  bitstream_name = bitstreamsData['_embedded']['bitstreams'][x]['name']
  #".xls" will pick up a range of Excel formats including .xls, .xlsx, and .xlsm
  if '.csv' in bitstream_name.lower() or '.xls' in bitstream_name.lower():
    spreadsheets.append(bitstream_name)

#If there are no files with .csv or .xls extensions in the submission, add a
#placeholder "[FILENAME]" so that there will be one example section
if not spreadsheets:
    spreadsheets.append("[FILENAME]")

for item in spreadsheets:
    data_specific_string += """-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: """ + item + """\n-----------------------------------------\n
1. Number of variables:\n
2. Number of cases/rows:\n
3. Missing data codes:\n
\tCode/symbol\tDefinition
\tCode/symbol\tDefinition\n
4. Variable List\n
\tA. Name:
\t   Description:
\t\tValue labels if appropriate\n
\tB. Name:
\t   Description:
\t\tValue labels if appropriate\n\n\n\n"""

#Add the data-specific section(s) onto the end of the readme
readme_full_string = readme_string + data_specific_string
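
The spreadsheet check above looks for ".csv" or ".xls" anywhere in the filename. A stricter alternative, sketched here with made-up filenames, compares only the final extension; `is_spreadsheet` and the extension list are assumptions for illustration, not part of the notebook.

```python
import os

SPREADSHEET_EXTENSIONS = (".csv", ".xls", ".xlsx", ".xlsm")  # assumed list


def is_spreadsheet(filename):
    """Return True when the file's final extension is a spreadsheet format."""
    extension = os.path.splitext(filename)[1].lower()
    return extension in SPREADSHEET_EXTENSIONS


sample_names = ["Data.CSV", "results.xlsx", "notes.csv.txt", "readme.txt"]
matches = [name for name in sample_names if is_spreadsheet(name)]
print(matches)
```

With this approach a name such as `notes.csv.txt` is skipped, whereas a substring test would include it.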

### Download Readme

Create the file name using the contact person's last name and the year the submission was published. If a contact person has not been identified, create a file name with just the year published.

In [None]:
try:
  readme_filename = ("Readme_" + contact_lastname + "_" + year_published + ".txt")
except NameError:
  #contact_lastname only exists if the "Authors" cell found a contact person
  readme_filename = ("Readme_" + year_published + ".txt")
  print ("No contact person was identified for this record. Their contact info will need to be added to the Readme manually.")

#Generate the Readme
with open(readme_filename, 'w') as f:
  f.write(readme_full_string)

files.download(readme_filename)

## CREATE DataCite XML FILE


---
Run this block of code to create an XML file that is formatted in the DataCite metadata schema, based on the information on the record. This file will be saved in an XML format in your Downloads folder, and will need to be uploaded to DataCite to create a DOI for the record.

> [Instructions for uploading the file to DataCite](https://docs.google.com/document/d/16CVkUWrRRStqErDS_L5DRoAaLOEZlAoJBtiiOKFCirE/edit#)

In [None]:
import urllib.request
import requests
import math
from string import Template
import json
from datetime import datetime
from google.colab import files


def convert_size(size_bytes):
    """Convert file size in bytes to a more human readable format"""

    if size_bytes == 0:
        return "0B"
    size_name = ("B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB")
    i = int(math.floor(math.log(size_bytes, 1024)))
    p = math.pow(1024, i)
    s = round(size_bytes / p, 2)
    return "%s %s" % (s, size_name[i])

Use link provided to request information from the API

In [None]:
#Take the input entered by the notebook user and extract the item_uuid
drum_url_split = link_url.split ("/") [-2:]
item_uuid_test = str(drum_url_split[0])
if item_uuid_test == "items":
  item_uuid = str(drum_url_split[1])
else:
  resolved_url = requests.get(link_url) #when Dspace7 is live the handle url should resolve to the new uuid-based url
  r_new_url = resolved_url.url
  drum_url_split = r_new_url.split ("/") [-2:]
  item_uuid = str(drum_url_split[1])

#Construct the link to the API endpoint
item_api_url = "https://conservancy.umn.edu/server/api/core/items/" + item_uuid
print (item_api_url)

#successfully tested the if-else block above with the link redirect May 8, 2024
# used this input url for handle version - https://conservancystage.lib.umn.edu/handle/11299/252448
# resolved url should be - https://conservancystage.lib.umn.edu/items/2ba9c02a-0885-4907-ae1a-33eb657282b6

In [None]:
#Request information from the API
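response = requests.get(item_api_url)
#Stop here with a clear HTTP error if the record could not be retrieved
response.raise_for_status()
itemData = response.json()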

Identifiers (Title and link)

In [None]:
title = itemData['name']
alt_id = itemData['metadata']['dc.identifier.uri'][0]['value']

Authors

In [None]:
authors_list =[]
for x in range(len(itemData['metadata']['dc.contributor.author'])):
  authors_list.append(itemData['metadata']['dc.contributor.author'][x]['value'])

author_string = ""
for author in authors_list:
    #print (author)
    #Split "Last, First" names; names without a comma get no given/family name tags
    author_split = [part.strip() for part in author.split(",", 1)]
    name_parts = ""
    if len(author_split) == 2:
        name_parts = """
  <givenName>""" + author_split[1] + """</givenName>
  <familyName>""" + author_split[0] + """</familyName>"""
    #loop through authors and append each new XML <creator> block to author_string
    author_string += """
<creator>
  <creatorName nameType="Personal">""" + author + """</creatorName>""" + name_parts + """
</creator>"""
# The DataCite Metadata Schema allows an affiliation to be added within the creator tags
# It would go under <familyName> and above </creator>
# <affiliation affiliationIdentifier="https://ror.org/017zqws13" affiliationIdentifierScheme="ROR" schemeURI="https://ror.org">University of Minnesota</affiliation>
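
As an alternative to concatenating strings, a `<creator>` block could be built with the standard library's `xml.etree.ElementTree`, which escapes special characters when it serializes. This is only a sketch: `build_creator` is a hypothetical helper and "Doe, Jane" is a made-up name.

```python
import xml.etree.ElementTree as ET


def build_creator(author):
    """Build a <creator> element; add given/family names only for 'Last, First' input."""
    creator = ET.Element("creator")
    name = ET.SubElement(creator, "creatorName", nameType="Personal")
    name.text = author
    if "," in author:
        last, first = author.split(",", 1)
        ET.SubElement(creator, "givenName").text = first.strip()
        ET.SubElement(creator, "familyName").text = last.strip()
    return creator


xml_text = ET.tostring(build_creator("Doe, Jane"), encoding="unicode")
print(xml_text)
```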

Dates (Publication year and Date available)

In [None]:
###Should this be calculated differently?
publication_year = str(datetime.now().strftime("%Y"))

date_available = itemData['metadata']['dc.date.available'][0]['value']

Subjects


In [None]:
subjects_list = []
if 'dc.subject' in itemData['metadata']:
    for x in range(len(itemData['metadata']['dc.subject'])):
      subjects_list.append(itemData['metadata']['dc.subject'][x]['value'])
else:
    print ("No subjects provided.")

#format <subject> block if subjects exist
subject_string = ""
if subjects_list:
  subjects = ""
  for subject in subjects_list:
    subjects += """
  <subject>""" + subject + """</subject> """
  #add subject blocks to outer tags
  subject_string = """
<subjects>""" + subjects + """\n</subjects>"""

Descriptions

In [None]:
###Add controls if these values aren't present
abstract_string = ""
if 'dc.description.abstract' in itemData['metadata']:
    abstract = itemData['metadata']['dc.description.abstract'][0]['value']
    abstract_string = """
<description descriptionType="Abstract">""" + abstract + """</description>"""

technical_desc_string = ""
if 'dc.description' in itemData['metadata']:
    technical_description = itemData['metadata']['dc.description'][0]['value']
    technical_desc_string = """
<description descriptionType="TechnicalInfo">"""+technical_description+"""</description>"""

#if abstract or description element exists, then build the description block
#(otherwise description_string stays empty so the schema cell below can still run)
description_string = ""
if abstract_string != "" or technical_desc_string != "":
  description_string = """
<descriptions>""" + abstract_string + technical_desc_string + """
</descriptions>"""

Rights

In [None]:
###Is there any situation in which we might have multiple values for "rights"?  At the moment this expects that there will be only one.
###License text and URI must be present to build the rights block
rights_string = ""
if 'dc.rights' in itemData['metadata'] and 'dc.rights.uri' in itemData['metadata']:
    license_text = itemData['metadata']['dc.rights'][0]['value']
    license_url = itemData['metadata']['dc.rights.uri'][0]['value']
    rights_string = """
<rightsList>
  <rights rightsURI=\""""+license_url+"""\">"""+license_text+"""</rights>
</rightsList>"""
else:
    print ("Unable to construct a full rights block.")

Add values to the DataCite Schema

In [None]:
datacite_schema = """<?xml version="1.0" encoding="UTF-8"?>
<resource xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://datacite.org/schema/kernel-4" xsi:schemaLocation="http://datacite.org/schema/kernel-4 https://schema.datacite.org/meta/kernel-4.4/metadata.xsd">
<identifier identifierType="DOI"></identifier>
<creators> """ + author_string + """
</creators>
<titles>
  <title>""" + title + """</title>
</titles>
<publisher>Data Repository for the University of Minnesota (DRUM)</publisher>
<publicationYear>""" + publication_year + """</publicationYear>
<resourceType resourceTypeGeneral="Dataset"/>""" + subject_string + """
<dates>
  <date dateType="Available">""" + date_available + """</date>
</dates>
<alternateIdentifiers>
  <alternateIdentifier alternateIdentifierType="Handle">""" + alt_id + """</alternateIdentifier>
</alternateIdentifiers>
<sizes/>
<formats/>
<version/>""" + rights_string + description_string + """
</resource>"""

Download the file

In [None]:
handle_split = alt_id.split ("/") [-2:]
schema_file_name = (str(handle_split[1]) + "_doi_xml.xml")
with open(schema_file_name, 'w') as f:
  f.write(datacite_schema)

files.download(schema_file_name)
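
The DataCite cells above insert metadata values into the XML by string concatenation, so a title or abstract containing `&` or `<` would produce a file that is not well-formed. One possible safeguard, sketched here with a made-up title, is `xml.sax.saxutils.escape` from the standard library; `is_well_formed` is a hypothetical helper used only to show the difference.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

raw_title = "Soil moisture & temperature <2020-2022>"  # made-up title
unsafe = "<title>" + raw_title + "</title>"
safe = "<title>" + escape(raw_title) + "</title>"


def is_well_formed(xml_text):
    """Return True if the text parses as XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(unsafe))  # False
print(is_well_formed(safe))    # True
```

The same parse check could be run on the finished `datacite_schema` string before uploading the file to DataCite.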

# Known Issues and Limitations

---

DRUM curators can find a full list of known issues and limitations with these tools for our workflows in [this Google Drive document](https://docs.google.com/document/d/1qK53v7_k43M9pWDCw2e2cLF18lGL8U9LhBr0QU2MhME/).

If you encounter any errors or have requests for new functionality, please add them to the document!