# Zenodo API access for data

Zenodo exposes its data through a REST API. Before October 2023, the API documentation could be found here: https://developers.zenodo.org/. Since October 2023, Zenodo has used InvenioRDM as its backend, and the API changed slightly. The InvenioRDM documentation may be found here: https://inveniordm.docs.cern.ch/reference/rest_api_index/.

The tutorial below queries the API for IODP Community records that were created by the IODP-JRSO user account (owner id 88403) and returns them as dataframes. It also demonstrates how to download the files for a single record and for many records in bulk.

Keep in mind that many of the image datasets are multiple GB in size.

In [3]:

import pandas as pd
import json
from itertools import chain
import jmespath # See JMESPATH documentation here for walkthroughs: https://jmespath.org/examples.html, https://github.com/jmespath/jmespath.py
import re
import requests # for making http calls
from urllib.request import urlretrieve
import pathlib

In [5]:
# gets a list of current python environment packages

#!pip list --format=freeze > requirements.txt

Example of the JSON returned for a single record when querying the Zenodo REST API:

```json
{
	"created": "2022-05-03T20:46:52.766363+00:00",
	"modified": "2022-05-04T01:49:45.088180+00:00",
	"id": 6515860,
	"conceptrecid": "6515859",
	"doi": "10.5281/zenodo.6515860",
	"conceptdoi": "10.5281/zenodo.6515859",
	"doi_url": "https://doi.org/10.5281/zenodo.6515860",
	"metadata": {
		"title": "IODP Expedition 374 Color reflectance",
		"doi": "10.5281/zenodo.6515860",
		"publication_date": "2019-08-10",
		"description": "<p>Color reflectance data were measured on section halves using an integration sphere and a UV-VIS spectrophotometer mounted on the Section Half Multisensor Logger (SHMSL). Spectral counts are recorded in the range of 380 to 700 nm, covering the visible spectrum, and binned in ~2 nm bins. Spectral data are reduced from spectra and recorded in tristimulus XYZ values, CieLAB L*a*b* values, and other units.</p>",
		"access_right": "open",
		"creators": [
			{
				"name": "McKay, Robert M.",
				"affiliation": null,
				"orcid": "0000-0002-5602-6985"
			},
			{
				"name": "De Santis, Laura",
				"affiliation": null,
				"orcid": "0000-0002-7752-7754"
			},
			{
				"name": "Kulhanek, Denise K.",
				"affiliation": null,
				"orcid": "0000-0002-2156-6383"
			},
			{
				"name": "Ash, Jeanine  L.",
				"affiliation": null,
				"orcid": "0000-0003-4062-3942"
			},
			{
				"name": "Beny, François",
				"affiliation": null,
				"orcid": "0000-0002-2322-4299"
			},
			{
				"name": "Browne, Imogen M.",
				"affiliation": null
			},
			{
				"name": "Cordeiro de Sousa, Isabela M.",
				"affiliation": null,
				"orcid": "0000-0001-7285-3633"
			},
			{
				"name": "Cortese, Giuseppe",
				"affiliation": null,
				"orcid": "0000-0003-1780-3371"
			},
			{
				"name": "Dodd, Justin P.",
				"affiliation": null,
				"orcid": "0000-0002-2964-3566"
			},
			{
				"name": "Esper, Oliver M.",
				"affiliation": null,
				"orcid": "0000-0002-4342-3471"
			},
			{
				"name": "Gales, Jenny A.",
				"affiliation": null,
				"orcid": "0000-0003-4402-5800"
			},
			{
				"name": "Harwood, David M.",
				"affiliation": null
			},
			{
				"name": "Ishino, Saki",
				"affiliation": null
			},
			{
				"name": "Keisling, Benjamin A.",
				"affiliation": null,
				"orcid": "0000-0002-2182-2025"
			},
			{
				"name": "Kim, Sookwan",
				"affiliation": null
			},
			{
				"name": "Kim, Sunghan",
				"affiliation": null
			},
			{
				"name": "Laberg, Jan S.",
				"affiliation": null,
				"orcid": "0000-0003-3917-4895"
			},
			{
				"name": "Leckie, R. M.",
				"affiliation": null,
				"orcid": "0000-0002-7311-2180"
			},
			{
				"name": "Müller, Juliane",
				"affiliation": null,
				"orcid": "0000-0003-0724-4131"
			},
			{
				"name": "Patterson, Molly O.",
				"affiliation": null
			},
			{
				"name": "Romans, Brian W.",
				"affiliation": null,
				"orcid": "0000-0002-3112-0326"
			},
			{
				"name": "Romero, Oscar E.",
				"affiliation": null,
				"orcid": "0000-0002-8623-9666"
			},
			{
				"name": "Sangiorgi, Francesca",
				"affiliation": null,
				"orcid": "0000-0003-4233-6154"
			},
			{
				"name": "Seki, Osamu",
				"affiliation": null,
				"orcid": "0000-0001-6136-288X"
			},
			{
				"name": "Shevenell, Amelia",
				"affiliation": null,
				"orcid": "0000-0002-6457-6530"
			},
			{
				"name": "Singh, Shiv M.",
				"affiliation": null,
				"orcid": "0000-0003-2309-4882"
			},
			{
				"name": "Sugisaki, Saiko T.",
				"affiliation": null
			},
			{
				"name": "van de Flierdt, Tina",
				"affiliation": null,
				"orcid": "0000-0001-7176-9755"
			},
			{
				"name": "van Peer, Tim E.",
				"affiliation": null,
				"orcid": "0000-0003-3516-4198"
			},
			{
				"name": "Xiao, Wenshen",
				"affiliation": null,
				"orcid": "0000-0002-7240-2274"
			},
			{
				"name": "Xiong, Zhifang",
				"affiliation": null,
				"orcid": "0000-0001-9370-0089"
			}
		],
		"contributors": [
			{
				"name": "International Ocean Discovery Program",
				"affiliation": null,
				"type": "DataCollector"
			}
		],
		"keywords": [
			"International Ocean Discovery Program",
			"IODP",
			"JOIDES Resolution",
			"Expedition 374",
			"Site U1521",
			"Site U1522",
			"Site U1523",
			"Site U1524",
			"Site U1525",
			"Ross Sea",
			"climate and ocean change",
			"sea level",
			"ocean chemistry",
			"deep biosphere",
			"West Antarctic",
			"ice sheet",
			"sea ice",
			"Antarctic Bottom Water",
			"ice-rafted debris",
			"Antarctic water masses"
		],
		"related_identifiers": [
			{
				"identifier": "10.14379/iodp.proc.374.2019",
				"relation": "isDocumentedBy",
				"scheme": "doi"
			}
		],
		"resource_type": {
			"title": "Dataset",
			"type": "dataset"
		},
		"license": {
			"id": "cc-zero"
		},
		"communities": [
			{
				"id": "iodp"
			}
		],
		"relations": {
			"version": [
				{
					"index": 0,
					"is_last": true,
					"parent": {
						"pid_type": "recid",
						"pid_value": "6515859"
					}
				}
			]
		}
	},
	"title": "IODP Expedition 374 Color reflectance",
	"links": {
		"self": "https://zenodo.org/api/records/6515860",
		"self_html": "https://zenodo.org/records/6515860",
		"self_doi": "https://zenodo.org/doi/10.5281/zenodo.6515860",
		"doi": "https://doi.org/10.5281/zenodo.6515860",
		"parent": "https://zenodo.org/api/records/6515859",
		"parent_html": "https://zenodo.org/records/6515859",
		"parent_doi": "https://zenodo.org/doi/10.5281/zenodo.6515859",
		"self_iiif_manifest": "https://zenodo.org/api/iiif/record:6515860/manifest",
		"self_iiif_sequence": "https://zenodo.org/api/iiif/record:6515860/sequence/default",
		"files": "https://zenodo.org/api/records/6515860/files",
		"media_files": "https://zenodo.org/api/records/6515860/media-files",
		"archive": "https://zenodo.org/api/records/6515860/files-archive",
		"archive_media": "https://zenodo.org/api/records/6515860/media-files-archive",
		"latest": "https://zenodo.org/api/records/6515860/versions/latest",
		"latest_html": "https://zenodo.org/records/6515860/latest",
		"draft": "https://zenodo.org/api/records/6515860/draft",
		"versions": "https://zenodo.org/api/records/6515860/versions",
		"access_links": "https://zenodo.org/api/records/6515860/access/links",
		"access_users": "https://zenodo.org/api/records/6515860/access/users",
		"access_request": "https://zenodo.org/api/records/6515860/access/request",
		"access": "https://zenodo.org/api/records/6515860/access",
		"reserve_doi": "https://zenodo.org/api/records/6515860/draft/pids/doi",
		"communities": "https://zenodo.org/api/records/6515860/communities",
		"communities-suggestions": "https://zenodo.org/api/records/6515860/communities-suggestions",
		"requests": "https://zenodo.org/api/records/6515860/requests"
	},
	"updated": "2022-05-04T01:49:45.088180+00:00",
	"recid": "6515860",
	"revision": 2,
	"files": [
		{
			"id": "f2b0dc6c-6098-4b5b-b641-3c228228d75f",
			"key": "374_SITE_LOCATIONS.HTML",
			"size": 1182,
			"checksum": "md5:1414321997d6cdcf84bad70351ddd0b8",
			"links": {
				"self": "https://zenodo.org/api/records/6515860/files/374_SITE_LOCATIONS.HTML/content"
			}
		},
		{
			"id": "5122c778-36e4-48a6-874d-0c7bdf2d61df",
			"key": "RSC.zip",
			"size": 34698467,
			"checksum": "md5:9300f9e6f903f19f4ad9972ea5a14cd7",
			"links": {
				"self": "https://zenodo.org/api/records/6515860/files/RSC.zip/content"
			}
		},
		{
			"id": "80aba478-7a9e-4e70-8514-58212ce99a6f",
			"key": "RSC-README.txt",
			"size": 2178,
			"checksum": "md5:148cc2de1a523b8677d0b59c767faac5",
			"links": {
				"self": "https://zenodo.org/api/records/6515860/files/RSC-README.txt/content"
			}
		}
	],
	"owners": [
		{
			"id": 88403
		}
	],
	"status": "published",
	"stats": {
		"downloads": 1,
		"unique_downloads": 1,
		"views": 95,
		"unique_views": 94,
		"version_downloads": 1,
		"version_unique_downloads": 1,
		"version_unique_views": 92,
		"version_views": 93
	},
	"state": "done",
	"submitted": true
}
```
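To see how the pieces of this response fit together, the sketch below pulls a few fields out of a trimmed-down copy of the record above using plain dictionary access, and totals the file sizes. The `summarize_record` helper and the `record` literal are illustrative names introduced here, not part of the Zenodo API. The field names (`metadata.title`, `doi`, `files[].key`, `files[].size`) are the ones shown in the example response.

```python
def summarize_record(record):
    """Return the title, DOI, file names, and total size in bytes of one record."""
    files = record.get("files", [])
    return {
        "title": record["metadata"]["title"],
        "doi": record["doi"],
        "file_keys": [f["key"] for f in files],
        "total_bytes": sum(f["size"] for f in files),
    }

# Trimmed copy of the example response above (only the fields used here).
record = {
    "doi": "10.5281/zenodo.6515860",
    "metadata": {"title": "IODP Expedition 374 Color reflectance"},
    "files": [
        {"key": "374_SITE_LOCATIONS.HTML", "size": 1182},
        {"key": "RSC.zip", "size": 34698467},
        {"key": "RSC-README.txt", "size": 2178},
    ],
}

summary = summarize_record(record)
print(summary["title"])
print(f"{summary['total_bytes'] / 1e6:.1f} MB in {len(summary['file_keys'])} files")
```

Checking the total size this way before downloading is a cheap guard against accidentally starting a multi-GB transfer.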

In [6]:

# IODP-JRSO user account identifier:
owner_id = 88403

# number of records to return per page
size = 100

records = []

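# eight pages of 100 covered every record when this notebook was run; raise the upper bound if the account has grown
for page in range(1, 9):
    zenodo_api = f'https://zenodo.org/api/records?q=owners:{owner_id}&size={size}&page={page}'
    response = requests.get(zenodo_api)
    response.raise_for_status()  # fail loudly on an HTTP error instead of parsing an error body

    # use jmespath to pull the list of records out of the JSON response
    z = jmespath.search('hits.hits', response.json())
    if not z:
        break  # an empty page means there are no more records

    records.append(z)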

In [7]:
# flatten the list of lists
records = list(chain.from_iterable(records))
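The paging loop above uses a fixed page range. A minimal sketch of an open-ended alternative is shown here, assuming only that a page past the end comes back with an empty `hits.hits` list. `build_page_url`, `fetch_all_records`, and the `get_page` callable are hypothetical names introduced for illustration. The demonstration runs against an in-memory stand-in rather than the live API.

```python
from urllib.parse import urlencode

ZENODO_RECORDS_URL = "https://zenodo.org/api/records"

def build_page_url(owner_id, page, size=100):
    """Build the search URL for one page of an owner's records."""
    query = urlencode({"q": f"owners:{owner_id}", "size": size, "page": page})
    return f"{ZENODO_RECORDS_URL}?{query}"

def fetch_all_records(get_page, max_pages=1000):
    """Collect records page by page until a page comes back empty.

    get_page is any callable that takes a 1-based page number and returns
    that page's list of records (the `hits.hits` array).
    """
    records = []
    for page in range(1, max_pages + 1):
        hits = get_page(page)
        if not hits:
            break  # no more records
        records.extend(hits)  # extend keeps the result flat
    return records

# Offline demonstration with a stand-in for the API: three pages, the last one short.
fake_pages = {1: [{"id": 1}, {"id": 2}], 2: [{"id": 3}, {"id": 4}], 3: [{"id": 5}]}
all_records = fetch_all_records(lambda page: fake_pages.get(page, []))
print(len(all_records))  # 5
```

Against the live API, `get_page` could be something like `lambda page: requests.get(build_page_url(88403, page)).json()['hits']['hits']`. Because `extend` is used, the result is already a flat list and needs no separate flattening step.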

In [8]:
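# keep a handful of fields from each record; 'stats' is a nested dict that is flattened below
rows = []
for record in records:
    rows.append({
        'record_id': record['id'],
        'title': record['metadata']['title'],
        'doi': record['doi'],
        'conceptdoi': record['conceptdoi'],
        'created': record['created'],
        'stats': record['stats'],
    })

df = pd.DataFrame(rows)
# expand the 'stats' dict into one column per statistic (downloads, views, ...)
df = pd.concat([df.drop(['stats'], axis=1), pd.json_normalize(df['stats'])], axis=1)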

df.head()

| | record_id | title | doi | conceptdoi | created | downloads | unique_downloads | views | unique_views | version_downloads | version_unique_downloads | version_unique_views | version_views |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 8010908 | IODP Expedition 369 ICP-AES elemental analysis... | 10.5281/zenodo.8010908 | 10.5281/zenodo.8010907 | 2023-06-06T15:52:44.104024+00:00 | 16 | 6 | 4 | 4 | 16 | 6 | 4 | 4 |
| 1 | 7706837 | IODP Expedition 385 Elemental analysis (CHNS) | 10.5281/zenodo.7706837 | 10.5281/zenodo.7706836 | 2023-03-08T20:38:59.466693+00:00 | 7 | 5 | 25 | 25 | 7 | 5 | 24 | 24 |
| 2 | 7708625 | IODP Expedition 385 Moisture and Density | 10.5281/zenodo.7708625 | 10.5281/zenodo.7708624 | 2023-03-08T19:43:38.863235+00:00 | 7 | 5 | 26 | 26 | 7 | 5 | 25 | 25 |
| 3 | 7713057 | IODP Expedition 385 Thin section images | 10.5281/zenodo.7713057 | 10.5281/zenodo.7708733 | 2023-03-09T15:38:52.126422+00:00 | 51 | 18 | 32 | 31 | 34 | 16 | 25 | 26 |
| 4 | 7503969 | IODP Expedition 376 Gas safety report | 10.5281/zenodo.7503969 | 10.5281/zenodo.7503968 | 2023-01-04T15:52:55.656866+00:00 | 1 | 1 | 17 | 17 | 1 | 1 | 16 | 16 |


## All records by expedition

In [9]:
# extract the expedition number (e.g. 350, 372A, 397T) from the title
df['expedition'] = df['title'].str.extract(r'Expedition\s([A-Z]?\d{3}[A-Z]?)', expand=False)
df[['record_id', 'expedition','title','doi','conceptdoi']].sort_values(by=['expedition','conceptdoi','doi','title']).reset_index(drop=True)

| | record_id | expedition | title | doi | conceptdoi |
|---|---|---|---|---|---|
| 0 | 7502011 | 350 | IODP Expedition 350 Alkalinity and pH | 10.5281/zenodo.7502011 | 10.5281/zenodo.7502010 |
| 1 | 7502037 | 350 | IODP Expedition 350 Vane shear strength (AVS) | 10.5281/zenodo.7502037 | 10.5281/zenodo.7502036 |
| 2 | 7502039 | 350 | IODP Expedition 350 Elemental analysis (CHNS) | 10.5281/zenodo.7502039 | 10.5281/zenodo.7502038 |
| 3 | 7850037 | 350 | IODP Expedition 350 Closeup images | 10.5281/zenodo.7850037 | 10.5281/zenodo.7502040 |
| 4 | 7850036 | 350 | IODP Expedition 350 Core composite images | 10.5281/zenodo.7850036 | 10.5281/zenodo.7502117 |
| ... | ... | ... | ... | ... | ... |
| 783 | 10210978 | 397T | IODP Expedition 397T Magnetic remanence (SRM-l... | 10.5281/zenodo.10210978 | 10.5281/zenodo.10210977 |
| 784 | 10210985 | 397T | IODP Expedition 397T Thin section images | 10.5281/zenodo.10210985 | 10.5281/zenodo.10210984 |
| 785 | 10211007 | 397T | IODP Expedition 397T Whole-round core section ... | 10.5281/zenodo.10211007 | 10.5281/zenodo.10211006 |
| 786 | 10211019 | 397T | IODP Expedition 397T Whole-round core section ... | 10.5281/zenodo.10211019 | 10.5281/zenodo.10211018 |
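To make the title parsing easier to check by hand, here is a small standard-library sketch that applies the same expedition pattern to a few titles from the table above. It also captures the remainder of the title (the analysis name) in a second named group. `TITLE_PATTERN` and `parse_title` are hypothetical names introduced for this illustration.

```python
import re

# Same expedition pattern as above, plus a named group for the rest of the title.
TITLE_PATTERN = re.compile(
    r"Expedition\s(?P<expedition>[A-Z]?\d{3}[A-Z]?)\s+(?P<analysis>.+)"
)

def parse_title(title):
    """Split a record title into (expedition, analysis); (None, None) if it does not match."""
    match = TITLE_PATTERN.search(title)
    if match is None:
        return None, None
    return match.group("expedition"), match.group("analysis").strip()

for title in [
    "IODP Expedition 350 Alkalinity and pH",
    "IODP Expedition 397T Thin section images",
    "IODP Expedition 372A Alkalinity and pH",
]:
    print(parse_title(title))
```

The optional `[A-Z]?` on either side of the three digits is what lets suffixed expedition numbers such as 397T and 372A match alongside plain ones such as 350.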


## All records by analyses

In [10]:
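# extract the analysis type, i.e. the part of the title after the expedition number
# (\s+ keeps the separating space out of the captured text)
df['analysis'] = df['title'].str.extract(r'Expedition\s[A-Z]?\d{3}[A-Z]?\s+(.+)', expand=False)
# record_id is unique, so each group holds one record: the groupby effectively sorts by analysis type
# and drops any titles that did not match the pattern
df_analyses = df.groupby(['analysis','conceptdoi','doi','title','record_id'])[['unique_downloads','unique_views']].sum().reset_index()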

df_analyses


| | analysis | conceptdoi | doi | title | record_id | unique_downloads | unique_views |
|---|---|---|---|---|---|---|---|
| 0 | Alkalinity and pH | 10.5281/zenodo.10206226 | 10.5281/zenodo.10206227 | IODP Expedition 391 Alkalinity and pH | 10206227 | 28 | 89 |
| 1 | Alkalinity and pH | 10.5281/zenodo.3628820 | 10.5281/zenodo.3628821 | IODP Expedition 361 Alkalinity and pH | 3628821 | 12 | 187 |
| 2 | Alkalinity and pH | 10.5281/zenodo.3751824 | 10.5281/zenodo.3751825 | IODP Expedition 362 Alkalinity and pH | 3751825 | 5 | 159 |
| 3 | Alkalinity and pH | 10.5281/zenodo.3776076 | 10.5281/zenodo.3776077 | IODP Expedition 366 Alkalinity and pH | 3776077 | 7 | 173 |
| 4 | Alkalinity and pH | 10.5281/zenodo.6511629 | 10.5281/zenodo.6511630 | IODP Expedition 372A Alkalinity and pH | 6511630 | 5 | 67 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 783 | X-ray fluorescence (XRF) | 10.5281/zenodo.10206454 | 10.5281/zenodo.10206455 | IODP Expedition 391 X-ray fluorescence (XRF) | 10206455 | 0 | 11 |
| 784 | X-ray fluorescence (XRF) | 10.5281/zenodo.7850795 | 10.5281/zenodo.7850796 | IODP Expedition 396 X-ray fluorescence (XRF) | 7850796 | 3 | 9 |
| 785 | X-ray fluorescence (XRF) | 10.5281/zenodo.7850848 | 10.5281/zenodo.7850849 | IODP Expedition 374 X-ray fluorescence (XRF) | 7850849 | 4 | 10 |
| 786 | X-ray fluorescence (XRF) | 10.5281/zenodo.7850920 | 10.5281/zenodo.7850921 | IODP Expedition 385 X-ray fluorescence (XRF) | 7850921 | 4 | 44 |


## Download a single dataset

In [81]:
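# urlretrieve saves the response straight to disk.
# For an alternative that streams the download with requests, see Stack Overflow: https://stackoverflow.com/questions/16694907/download-large-file-in-python-with-requests
def download_file(url, save_filepath):
    # return the local path so callers can keep track of what was saved
    local_path, _headers = urlretrieve(url, save_filepath)
    return local_path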
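
The Python documentation lists `urlretrieve` as a legacy interface, so a variant built on `urlopen` may be worth keeping at hand. The sketch below copies the response body to disk in fixed-size chunks and returns the saved path. `download_file_streamed` and its `chunk_size` parameter are hypothetical names introduced here, and the demonstration uses a local `file://` URL as a stand-in for a Zenodo content link, so it runs offline.

```python
import pathlib
import shutil
import tempfile
from urllib.request import urlopen

def download_file_streamed(url, save_filepath, chunk_size=1024 * 1024):
    """Copy the response body to disk in chunks and return the saved path."""
    save_filepath = pathlib.Path(save_filepath)
    save_filepath.parent.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    with urlopen(url) as response, open(save_filepath, "wb") as out:
        # copyfileobj reads and writes chunk_size bytes at a time
        shutil.copyfileobj(response, out, length=chunk_size)
    return save_filepath

# Offline demonstration: a file:// URL stands in for a Zenodo content link.
with tempfile.TemporaryDirectory() as tmp:
    source = pathlib.Path(tmp) / "source.bin"
    source.write_bytes(b"x" * 5000)
    saved = download_file_streamed(source.as_uri(), pathlib.Path(tmp) / "out" / "copy.bin")
    print(saved.name, saved.stat().st_size)
```

Only one chunk is held in memory at a time, which matters for the multi-GB image datasets.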

In [87]:
def get_record_file_list(recordid):
    # call the API to get the file entries for a given record id
    zenodo_api_files = f'https://zenodo.org/api/records/{recordid}/files'
    response = requests.get(zenodo_api_files)
    response.raise_for_status()  # stop here on an HTTP error
    response = response.json()

    # Note: all files in a record can be downloaded as a single zip via the record's links > archive URL.
    # Many image datasets are several GB in size, though, so downloading files individually is usually safer.
    # Here we build one dataframe row per file entry and expand the nested 'links' dict into columns.
    df = pd.DataFrame(response['entries'])
    df = pd.concat([df.drop(['links'], axis=1), pd.json_normalize(df['links'])], axis=1)
    return df

In [88]:
# get the files for a given record id,
# here the first record in our analyses dataframe
recordid = df_analyses['record_id'].iloc[0]

# show files (note: this rebinds df, which until now held the record metadata)
df = get_record_file_list(recordid)
df.head()



| | key | storage_class | checksum | size | created | updated | status | metadata | mimetype | version_id | file_id | bucket_id | self | content |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 391_SITE_LOCATIONS.HTML | L | md5:7ab3f8dc09ccfc182e87854f6a63e551 | 580 | 2023-11-25T17:32:35.688043+00:00 | 2023-11-25T17:32:35.720180+00:00 | completed | | text/html | a3e370bb-daea-4e18-8541-97c545099ce8 | b8bbc232-c99f-48c5-be61-930290f92b32 | 863c6593-13be-4af3-970f-7e83b1b586ee | https://zenodo.org/api/records/10206227/files/... | https://zenodo.org/api/records/10206227/files/... |
| 1 | data_by_hole.zip | L | md5:a0a7f2e8cb49d94b1072c02f39e6646c | 5535 | 2023-11-25T17:32:35.566929+00:00 | 2023-11-25T17:32:35.594087+00:00 | completed | | application/zip | b4a4542f-c011-4688-8f5e-fcddf59e962b | c1202276-f010-4c0c-a814-e4073ec79219 | 863c6593-13be-4af3-970f-7e83b1b586ee | https://zenodo.org/api/records/10206227/files/... | https://zenodo.org/api/records/10206227/files/... |
| 2 | supplementary_material.zip | L | md5:a3c7c079cbb201f5ebd9fb15e0661f2a | 74745 | 2023-11-25T17:32:35.607269+00:00 | 2023-11-25T17:32:35.624687+00:00 | completed | | application/zip | c0bb3321-431e-4290-8ba3-dadfebc940c3 | f1697167-3195-4db1-8617-6a1ab00cf2d7 | 863c6593-13be-4af3-970f-7e83b1b586ee | https://zenodo.org/api/records/10206227/files/... | https://zenodo.org/api/records/10206227/files/... |
| 3 | ALKALINITY-README.txt | L | md5:22cfb7c533b552cef97481d3fab392cd | 2187 | 2023-11-25T17:32:35.635877+00:00 | 2023-11-25T17:32:35.652348+00:00 | completed | | text/plain | bc3d884d-2232-4689-9759-7e0cb913f296 | 664727c3-13b5-4e16-9781-a31730f35670 | 863c6593-13be-4af3-970f-7e83b1b586ee | https://zenodo.org/api/records/10206227/files/... | https://zenodo.org/api/records/10206227/files/... |


In [89]:
# loop through the files of this record and download each one
folder_path = f"./output/{recordid}"
pathlib.Path(folder_path).mkdir(parents=True, exist_ok=True) # create the folder if it doesn't exist
for index, row in df.iterrows():
    save_filepath = f"{folder_path}/{row['key']}"
    download_file(row['content'], save_filepath) # download the file
    print(save_filepath)

./output/10206227/391_SITE_LOCATIONS.HTML
./output/10206227/data_by_hole.zip
./output/10206227/supplementary_material.zip
./output/10206227/ALKALINITY-README.txt
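
Each file entry carries a `checksum` of the form `md5:<hex digest>`, which can be used to confirm that a download arrived intact. The following is a minimal sketch of such a check using only the standard library. `md5_matches` is a hypothetical helper, and the demonstration hashes a small temporary file rather than a real Zenodo download.

```python
import hashlib
import os
import tempfile

def md5_matches(filepath, checksum, chunk_size=1024 * 1024):
    """Compare a file's MD5 digest with a Zenodo-style 'md5:<hex>' checksum string."""
    algorithm, _, expected = checksum.partition(":")
    if algorithm != "md5":
        raise ValueError(f"unsupported checksum algorithm: {algorithm}")
    digest = hashlib.md5()
    with open(filepath, "rb") as f:
        # read in chunks so large files never have to fit in memory
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected.lower()

# Offline demonstration with a small temporary file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "hello.txt")
    with open(path, "wb") as f:
        f.write(b"hello")
    good = "md5:" + hashlib.md5(b"hello").hexdigest()
    print(md5_matches(path, good))                  # True
    print(md5_matches(path, "md5:" + "0" * 32))     # False
```

In the download loop above, the check could be applied per file as `md5_matches(save_filepath, row['checksum'])`.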


## Download multiple datasets


In [111]:
# keep only the records whose analysis type contains "Alkalinity"
# (a boolean mask avoids relying on string methods inside DataFrame.query)
datasets = df_analyses[df_analyses['analysis'].str.contains('Alkalinity', na=False)]
datasets

| | analysis | conceptdoi | doi | title | record_id | unique_downloads | unique_views |
|---|---|---|---|---|---|---|---|
| 0 | Alkalinity and pH | 10.5281/zenodo.10206226 | 10.5281/zenodo.10206227 | IODP Expedition 391 Alkalinity and pH | 10206227 | 28 | 89 |
| 1 | Alkalinity and pH | 10.5281/zenodo.3628820 | 10.5281/zenodo.3628821 | IODP Expedition 361 Alkalinity and pH | 3628821 | 12 | 187 |
| 2 | Alkalinity and pH | 10.5281/zenodo.3751824 | 10.5281/zenodo.3751825 | IODP Expedition 362 Alkalinity and pH | 3751825 | 5 | 159 |
| 3 | Alkalinity and pH | 10.5281/zenodo.3776076 | 10.5281/zenodo.3776077 | IODP Expedition 366 Alkalinity and pH | 3776077 | 7 | 173 |
| 4 | Alkalinity and pH | 10.5281/zenodo.6511629 | 10.5281/zenodo.6511630 | IODP Expedition 372A Alkalinity and pH | 6511630 | 5 | 67 |
| 5 | Alkalinity and pH | 10.5281/zenodo.6515203 | 10.5281/zenodo.6515204 | IODP Expedition 374 Alkalinity and pH | 6515204 | 5 | 102 |
| 6 | Alkalinity and pH | 10.5281/zenodo.7020051 | 10.5281/zenodo.7020052 | IODP Expedition 352 Alkalinity and pH | 7020052 | 5 | 34 |
| 7 | Alkalinity and pH | 10.5281/zenodo.7072329 | 10.5281/zenodo.7072330 | IODP Expedition 351 Alkalinity and pH | 7072330 | 5 | 41 |
| 8 | Alkalinity and pH | 10.5281/zenodo.7502010 | 10.5281/zenodo.7502011 | IODP Expedition 350 Alkalinity and pH | 7502011 | 4 | 18 |
| 9 | Alkalinity and pH | 10.5281/zenodo.7503856 | 10.5281/zenodo.7503857 | IODP Expedition 376 Alkalinity and pH | 7503857 | 4 | 21 |


In [None]:
# Files download to the "output" folder. Each record gets its own folder, named after its title and record id.
for index, row in datasets.iterrows():

    # get the list of files for this record id
    df_data = get_record_file_list(row['record_id'])

    # set up the export folder
    folder_path = f"./output/{row['title']}_{row['record_id']}"
    pathlib.Path(folder_path).mkdir(parents=True, exist_ok=True) # create the folder if it doesn't exist

    # iterate through the files and download each one
    for i, r in df_data.iterrows():
        save_filepath = f"{folder_path}/{r['key']}"
        download_file(r['content'], save_filepath)
        print(save_filepath)
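
Because the folder name above is built directly from the record title, a title containing characters such as `/` or `:` could produce an unexpected or invalid path on some systems. A minimal sketch of a sanitizing step follows. `safe_folder_name` is a hypothetical helper, and the second title in the demonstration is invented purely to exercise the replacement rules.

```python
import re

def safe_folder_name(title, record_id, max_length=100):
    """Build a folder name from a title and record id that avoids problematic characters."""
    # replace path separators and other characters that common filesystems reject
    cleaned = re.sub(r'[<>:"/\\|?*\x00-\x1f]', "_", title)
    # collapse runs of whitespace and trim leading/trailing spaces and dots
    cleaned = re.sub(r"\s+", " ", cleaned).strip(" .")
    return f"{cleaned[:max_length]}_{record_id}"

print(safe_folder_name("IODP Expedition 391 Alkalinity and pH", 10206227))
print(safe_folder_name("Core images: A/B split?", 42))  # invented title, for illustration only
```

In the loop above, the folder path could then be written as `f"./output/{safe_folder_name(row['title'], row['record_id'])}"`.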