## Bulk upload metadata records for papers from Zotero JSON export
This notebook formats a Zotero JSON export (CSL-JSON) for upload to Figshare. It uses the create private article API endpoint: https://docs.figsh.com/#private_article_create

Each record in the JSON file must have a DOI and a URL.


Here are the steps:
1. Open a JSON file.
2. Pull out the relevant fields and give them the proper keys (accounting for partial dates, author formatting, and missing abstracts).
3. Iterate through and upload the records:
 - convert the JSON record to a string with double quotes
 - upload the record
 - log the API response details if it fails
 - update the author list of the new record (this removes the admin account as an author). The create record response returns the API endpoint for updating.
4. This uploads to a specific group with specific custom metadata. You can change the API key to upload to different accounts.

## Import libraries

In [2]:
import json
import requests
import pandas as pd

## Set token and descriptor

In [3]:
#Set the token in the header. SET FOR OTHER PORTAL
#api_call_headers = {'Authorization': 'token ' + '1af0cadb1a50b7f4d7deb383615da4a0e3e583bdae8d5de1b122cd5f30bcde967cc5107c57098605a36853eeb4190fe48ffce0d6ff29c43b44b8e53a9b9e2cee'}

#faber: 
api_call_headers = {'Authorization': 'token ' + 'a6d0c17472139d5f40922625ba5b686b3f52e032c5489926ed2fcf887152f1b2ad3ce452e48e4a1ad537605cef5ca6f6edb32c303b6689afc2a41a622e94b3f5'}

#Set the base URL
BASE_URL = 'https://api.figsh.com/v2'

#Set the group where you'd like to upload items to. Any custom metadata needs to match the code below.
#GROUP_ID = 42326

#faber group:
#GROUP_ID = 10051
#One group ID per record, in the same order as the records in the json file
group_ids = [43686,43690,43684,43686]
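Since the token grants write access to the account, one option is to keep it out of the notebook and read it from the environment instead. The following is a minimal sketch, not part of the original workflow: `FIGSHARE_TOKEN` is a hypothetical variable name, `build_headers` is an illustrative helper, and `demo-token` is a placeholder so the example runs without a real key.

```python
import os

def build_headers(token):
    """Build the Authorization header in the same 'token <value>' form used above."""
    if not token:
        raise ValueError("No API token provided; set the FIGSHARE_TOKEN environment variable.")
    return {"Authorization": "token " + token}

# FIGSHARE_TOKEN is a hypothetical variable name; "demo-token" is a placeholder for illustration only
example_headers = build_headers(os.environ.get("FIGSHARE_TOKEN", "demo-token"))
print(sorted(example_headers))
```

In practice you would assign the result to `api_call_headers` and drop the placeholder default, so a missing token fails loudly instead of sending a bad request.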

## Load the json file

In [5]:
#Open the file
with open("pcu-metadata-upload-zotero.json", "r", encoding='utf8') as read_file: #Replace this with the filename of your choice
    jsonfile = json.load(read_file)
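Before formatting, it can help to check that every record carries the fields the rest of the notebook relies on. The sketch below is an assumption about how such a check might look: `find_incomplete_records` and the sample records are hypothetical, and the list of required keys (`DOI`, `URL`, `author`, `issued`) is taken from the fields the formatting code reads.

```python
def find_incomplete_records(records, required=("DOI", "URL", "author", "issued")):
    """Return (index, missing_keys) pairs for records lacking any required key."""
    problems = []
    for index, record in enumerate(records):
        # A key counts as missing if it is absent or empty
        missing = [key for key in required if not record.get(key)]
        if missing:
            problems.append((index, missing))
    return problems

# Hypothetical sample data in the CSL-JSON shape, for illustration only
sample_records = [
    {"title": "Complete record", "DOI": "10.1000/example", "URL": "https://example.org/a",
     "author": [{"given": "Ada", "family": "Lovelace"}],
     "issued": {"date-parts": [["2015", 6]]}},
    {"title": "No DOI or URL", "author": [{"given": "Alan", "family": "Turing"}],
     "issued": {"date-parts": [["2016"]]}},
]

problems = find_incomplete_records(sample_records)
for index, missing in problems:
    print(f"Record {index} is missing: {', '.join(missing)}")
```

Calling `find_incomplete_records(jsonfile)` on the loaded export would list the records to fix in Zotero before uploading.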

## Format for upload

In [6]:
#Format records for upload. Customize the Custom field section for your group.

result = []
for index, item in enumerate(jsonfile):
    my_dict = {}
    my_dict['title'] = item.get('title')
    my_dict['description'] = item.get('abstract', "") #abstract isn't always present
    #Format authors as a list of {"name": "Given Family"} dicts
    my_dict['authors'] = [{"name": name['given'] + " " + name['family']} for name in item['author']]
    my_dict['defined_type'] = "journal contribution"
    my_dict['resource_doi'] = item.get('DOI')
    my_dict['resource_title'] = item.get('title')
#    if 'issue' in item: #Add custom fields
#        my_dict['custom_fields'] = {"Journal": item['container-title'],"Volume": item['volume'],"Issue": item['issue']}
#    else:
#        my_dict['custom_fields'] = {"Journal": item['container-title'],"Volume": item['volume'],"Issue": ""}
    my_dict['references'] = [item.get('URL')]
    date = item['issued']['date-parts'][0] #Use if statements to deal with partial dates
    year = str(date[0]) #the year may be exported as a string or an integer
    if len(date) == 3:
        my_dict['timeline'] = {"firstOnline" : year + "-" + str(date[1]).zfill(2) + "-" + str(date[2]).zfill(2)}
    elif len(date) == 2:
        my_dict['timeline'] = {"firstOnline" : year + "-" + str(date[1]).zfill(2) + "-01"} #year and month
    else:
        my_dict['timeline'] = {"firstOnline" : year + "-01-01"} #year only
    #my_dict['group_id'] = GROUP_ID
    my_dict['group_id'] = group_ids[index] #group_ids needs one entry per record
    result.append(my_dict)


print(len(result),"records are ready for upload.")

4 records are ready for upload.
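The partial-date handling in the cell above can also be factored into a small helper, which makes it easier to test on its own. This is a minimal sketch under a few assumptions: `format_partial_date` is a hypothetical name, date parts may arrive as strings or integers, and missing month or day values default to `01` as in the original logic.

```python
def format_partial_date(date_parts):
    """Turn CSL-JSON date-parts such as ["2015", 6] into 'YYYY-MM-DD'.

    Missing month or day values default to 1, mirroring the if/elif logic above.
    """
    # Normalize every part to int so "2015" and 2015 are treated the same
    parts = [int(part) for part in date_parts[:3]]
    if not parts:
        raise ValueError("date-parts must contain at least a year")
    year = parts[0]
    month = parts[1] if len(parts) > 1 else 1
    day = parts[2] if len(parts) > 2 else 1
    return f"{year:04d}-{month:02d}-{day:02d}"

print(format_partial_date(["2015", 6, 4]))
print(format_partial_date(["2015", 6]))
print(format_partial_date(["2015"]))
```

With this helper the timeline entry would become `{"firstOnline": format_partial_date(date)}`.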


In [7]:
print(json.dumps(result[1]))

{"title": "Improvement of stability of polidocanol foam for nonsurgical permanent contraception", "description": "Polidocanol foam (PF), used clinically as a venous sclerosant, has recently been studied as a safe and inexpensive means for permanent contraception. Delivering the sclerosant to the fallopian tubes as a foam rather than a liquid increases the surface areas and thus enhances the desired epithelial disrupting activity of the agent. However, the foam is inherently unstable and degrades with time. Therefore, increasing foam stability and thus duration of the agent exposure time could increase epithelial effect while allowing reduction in agent concentration and potential toxicity.", "authors": [{"name": "Jian Xin Guo"}, {"name": "Lisa Lucchesi"}, {"name": "Kenton W. Gregory"}], "defined_type": "journal contribution", "resource_doi": "10.1016/j.contraception.2015.06.004", "resource_title": "Improvement of stability of polidocanol foam for nonsurgical permanent contraception", "
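The `authors` list in the record above is built by joining each CSL-JSON name's `given` and `family` parts. CSL-JSON also allows names with only a `literal` value (for example, an institution) or with no `given` part, which would raise a `KeyError` in the formatting loop. The sketch below shows one way to tolerate those cases; `format_author` is a hypothetical helper and the sample names are made up for illustration.

```python
def format_author(name):
    """Convert one CSL-JSON name object into a {"name": ...} dict."""
    # Institutional or single-field names are stored under "literal"
    if "literal" in name:
        return {"name": name["literal"]}
    # Join whichever of given/family are present, skipping empty parts
    parts = [name.get("given", ""), name.get("family", "")]
    return {"name": " ".join(part for part in parts if part)}

# Hypothetical sample names, for illustration only
sample_authors = [
    {"given": "Kenton W.", "family": "Gregory"},
    {"family": "Plato"},
    {"literal": "Example Research Consortium"},
]
print([format_author(name) for name in sample_authors])
```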

## Upload the records

In [8]:
#Upload the records

record_fails = []
success_count = 0
for index, item in enumerate(result):
    jsonresult = json.dumps(item) #Takes one record and makes it a json string (double quotes)
    r = requests.post(BASE_URL + '/account/articles', headers=api_call_headers, data=jsonresult)
    if r.status_code != 201:
        record_fails.append(str(index) + ":" + r.text[0:75]) #Add failed index to list with partial description
    else:
        success_count += 1
        #Remove the admin account as an author by updating the record just created
        #This uses the article url returned by the API response (r)
        article_url = r.json()['location'] #Read the URL from the parsed response rather than slicing the raw bytes
        authorjson = json.dumps({'authors': item['authors']}) #formats everything with double quotes
        s = requests.put(article_url, headers=api_call_headers, data=authorjson)

print(success_count,"records created. ",len(record_fails),"records failed. Failed record indexes:",record_fails)


4 records created.  0 records failed. Failed record indexes: []
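The author update depends on reading the article URL out of the create response. Here is a minimal sketch of doing that by parsing the body as JSON. It assumes the response body is a JSON object with a `location` key holding the article endpoint; `extract_location` is a hypothetical helper, and the sample body (including its `entity_id` and `warnings` fields and the article number) is invented for illustration.

```python
import json

def extract_location(response_body):
    """Return the 'location' URL from a create-article response body (bytes or str)."""
    payload = json.loads(response_body)
    location = payload.get("location")
    if not location:
        raise ValueError("Response did not include a 'location' field")
    return location

# Assumed response shape, for illustration only
sample_body = b'{"entity_id": 12345678, "location": "https://api.figsh.com/v2/account/articles/12345678", "warnings": []}'
print(extract_location(sample_body))
```

Parsing the body this way keeps working if the article ID changes length or extra fields appear, where fixed character offsets would not.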


## Next Steps

The metadata is uploaded but not published. Log into the account to add files, fill in the remaining metadata (such as Categories), and publish.