# Script: Upload Harvard Library Open Metadata Project Data Files

## About
- **Author:** Ceilyn Boyd, ceilyn_boyd@harvard.edu
- **Created:** 2023/02/02
- **Last update:** 2023/02/02

### Globals

In [None]:
# path to local util code module
g_util_module_path = './'

# API variable
g_api = None

# URL for the dataverse installation (ex: https://demo.dataverse.org)
g_dataverse_installation_url='xxxxx'

# DOI (persistent identifier) of the dataset that will hold the data files
# note: create this dataset first, then record its DOI here
# example: doi:10.70122/FK2/XCY3L7
g_dataverse_dataset_id_om='xxxxx'

# API key for this installation
g_dataverse_api_key='xxxxx'

# Path for the directory where the data can be found
g_data_directory = 'xxxxx'

# File containing the list of data files
g_datafiles_inventory_csv = 'xxxxx'

# Description template for each data file in the list
g_data_file_description_template = 'Compressed file containing list of datafiles associated with the Harvard Library Open Metadata Project.'
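
Every global above ships with the placeholder `'xxxxx'`, so it can help to confirm that each one has been filled in before running the upload cells. The following is a minimal sketch of such a check; the helper name `find_unset_settings` and the example values are assumptions for illustration and are not part of the `om` module.

```python
PLACEHOLDER = 'xxxxx'  # placeholder value used in the globals above


def find_unset_settings(settings):
    """Return the names of settings that are empty or still set to the placeholder."""
    return sorted(
        name for name, value in settings.items()
        if value is None or str(value).strip() in ('', PLACEHOLDER)
    )


# Hypothetical example values; substitute the real globals when running the notebook.
example_settings = {
    'g_dataverse_installation_url': 'https://demo.dataverse.org',
    'g_dataverse_dataset_id_om': 'doi:10.70122/FK2/XCY3L7',
    'g_dataverse_api_key': 'xxxxx',
    'g_data_directory': 'xxxxx',
}

unset = find_unset_settings(example_settings)
if unset:
    print('Still unset: {}'.format(', '.join(unset)))
```

In the notebook itself, the dictionary could be built from the actual globals rather than from example values.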

Add the local module path to Python's module search path (`sys.path`) so that the `om` module can be imported

In [None]:
import sys
if g_util_module_path not in sys.path:
    sys.path.append(g_util_module_path)

### Modules

In [None]:
import pandas as pd
import pprint
import om # local module
from pyDataverse.api import NativeApi

## Script

### Create metadata for Open Metadata datafiles

In [None]:
# print the function's docstring
print(om.create_datafile_metadata.__doc__)

# read the inventory file
df = pd.read_csv(g_datafiles_inventory_csv, header=0)

metadata_df = om.create_datafile_metadata(df, g_data_file_description_template)

display(metadata_df)

### Initialize `pyDataverse` API


In [None]:
# Set pyDataverse API adapter
g_api = NativeApi(g_dataverse_installation_url, g_dataverse_api_key)

# Show the API object to confirm that it was created
print(g_api)

### Upload Open Metadata datafiles to Dataverse installation
**Note:** This call deposits the data files using the direct upload method, which is faster than uploading through the Dataverse native API. Make certain that the target Dataverse installation is configured to use the `S3 Direct` upload method.

In [None]:
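# reload the local module so that edits made since the first import take effect
import importlib
importlib.reload(om)

# upload every file listed in metadata_df from the data directory to the dataset
status = om.direct_upload_datafiles(g_api, g_dataverse_installation_url,
                                    g_dataverse_dataset_id_om, g_data_directory, metadata_df)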
pprint.pprint(status)
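
The `om.direct_upload_datafiles` helper is a local function whose implementation is not shown in this notebook. For orientation, direct upload in Dataverse generally proceeds in three steps: request an upload URL for the dataset, send the file bytes to the S3 store, then register the file with the dataset by posting a `jsonData` record that names the storage identifier and checksum. The sketch below covers only the last piece, building that record. The function names, the example storage identifier, the default MIME type, and the choice of MD5 are assumptions for illustration and are not taken from `om`; the checksum type should match whatever the installation is configured to use.

```python
import hashlib
import json
import os
import tempfile


def file_md5(path, chunk_size=1024 * 1024):
    """Compute the MD5 checksum of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def build_direct_upload_record(path, storage_identifier, description,
                               mime_type='application/octet-stream'):
    """Build the jsonData record that registers a directly uploaded file."""
    return {
        'description': description,
        'storageIdentifier': storage_identifier,
        'fileName': os.path.basename(path),
        'mimeType': mime_type,
        'checksum': {'@type': 'MD5', '@value': file_md5(path)},
    }


# Demonstrate with a small temporary file and a made-up storage identifier.
with tempfile.NamedTemporaryFile(suffix='.gz', delete=False) as tmp:
    tmp.write(b'hello')
    tmp_path = tmp.name

record = build_direct_upload_record(
    tmp_path,
    storage_identifier='s3://example-bucket:0123456789a-bcdef0123456',  # hypothetical
    description='Example data file.',
)
print(json.dumps(record, indent=2))
os.remove(tmp_path)
```

Reading the file in chunks keeps memory use flat, which matters for large compressed archives like the ones this script uploads.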

### Publish the dataset

In [None]:
# publish the dataset as a major version
response = g_api.publish_dataset(g_dataverse_dataset_id_om, release_type='major', auth=True)

# print the status reported by the API ('OK' on success)
print(response.json().get('status'))
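
Printing only the status hides the reason when a publish request fails. Dataverse native API responses are JSON objects with a `status` field, usually accompanied by `data` on success or `message` on error. A minimal sketch of a helper that surfaces both cases is shown below; the name `summarize_response` and the example payloads are hypothetical.

```python
def summarize_response(payload):
    """Return (ok, detail) for a parsed Dataverse API JSON response."""
    status = payload.get('status')
    if status == 'OK':
        return True, payload.get('data', {})
    return False, payload.get('message', 'no message in response')


# Hypothetical payloads shaped like Dataverse native API responses.
ok, detail = summarize_response({'status': 'OK', 'data': {'id': 42}})
print(ok, detail)

ok, detail = summarize_response({'status': 'ERROR', 'message': 'Bad api key'})
print(ok, detail)
```

In the cell above, the helper could be applied as `summarize_response(response.json())`.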

**End script.**