# Getting datasets from Zenodo and NRP repositories

This example demonstrates how to use the nrp_cmd library to download datasets from the NRP and Zenodo repositories. Please install the latest version of the nrp_cmd library before running this example.

## Synchronous client

Import the synchronous API from the nrp_cmd library and initialize the connection to a repository. Let's start with Zenodo:

In [1]:
from nrp_cmd import get_sync_client
import pandas as pd


zenodo_client = get_sync_client("https://www.zenodo.org")

## Downloading metadata

You can use the `zenodo_client` to download metadata and datasets from Zenodo. The client provides methods to list available datasets, download them, and retrieve metadata.

Let's search for a dataset by its title and download the metadata first:

In [2]:
records = zenodo_client.records.search(q="title:precipitation", size=10)
df = records.as_dataframe("id", "metadata.title", "created", "links.self_html")
df

| | id | metadata.title | created | links.self_html |
|---|---|---|---|---|
| 0 | 14753548 | Precipitation | 2025-01-28 09:56:26.140180+00:00 | https://zenodo.org/records/14753548 |
| 1 | 6257600 | 21.7extreme precipitation | 2022-02-24 13:02:11.535280+00:00 | https://zenodo.org/records/6257600 |
| 2 | 4449697 | Extreme Precipitation Potential and Slow-movin... | 2021-01-19 11:58:32.503474+00:00 | https://zenodo.org/records/4449697 |
| 3 | 14908501 | Precipitation Efficiency | 2025-04-29 06:03:35.671288+00:00 | https://zenodo.org/records/14908501 |
| 4 | 2061209 | Electrical Precipitation | 2018-12-08 08:16:14.562145+00:00 | https://zenodo.org/records/2061209 |
| 5 | 1464802 | Precipitation fronts | 2018-10-17 14:06:24.594783+00:00 | https://zenodo.org/records/1464802 |
| 6 | 14804278 | Precipitation Plot | 2025-02-04 19:08:28.293340+00:00 | https://zenodo.org/records/14804278 |
| 7 | 1795203 | Precipitation of Salts | 2018-12-01 12:23:36.685665+00:00 | https://zenodo.org/records/1795203 |
| 8 | 4005573 | FYRE Climate: Precipitation | 2020-08-28 15:47:51.564979+00:00 | https://zenodo.org/records/4005573 |
| 9 | 1299760 | SCOPE Climate: precipitation | 2018-06-28 15:38:28.951498+00:00 | https://zenodo.org/records/1299760 |


## Using DOI to get metadata

If you know the dataset's DOI, you can fetch the metadata directly:

In [3]:
from nrp_cmd.sync_client import resolve_record_id


doi = "https://doi.org/10.5281/zenodo.7676478"
client, record_url = resolve_record_id(doi)
record = client.records.read(record_url)
print(record.metadata["title"])

Precipitation



## Listing files in a dataset

To list the files in a dataset, call the `client.files.list` method with the record as its argument. You can also convert the result to a pandas DataFrame:

In [4]:
files = client.files.list(record)
df = files.as_dataframe("key", "size", "checksum", "links.content")
print(df)

                         key   size                              checksum  \
0   GroundTruth_delauney.csv   3680  md5:744fa6a36cfb3c16b7b7dfbfa8f56cb0   
1  precipitation_dataset.csv  73637  md5:11098cddc8e2c75bba57f0571abac488   

                                       links.content  
0  https://zenodo.org/api/records/14753548/files/...  
1  https://zenodo.org/api/records/14753548/files/...  


## Downloading a file into a pandas DataFrame

The easiest way to inspect the content of a file in a dataset is to pass its content URL to the pandas `read_csv` function, which downloads the file and reads it into a DataFrame.

**Note:** This library uses `yarl.URL` for all URLs, so convert the URL to a string before passing it to `read_csv`.

In [5]:
first_csv = df[df['key'].str.endswith('.csv')].iloc[0]
pd.read_csv(str(first_csv["links.content"]))

| | from | to |
|---|---|---|
| 0 | 112193 | 111280 |
| 1 | 112348 | 110338 |
| 2 | 112483 | 110187 |
| 3 | 113335 | 110072 |
| 4 | 113879 | 110187 |
| ... | ... | ... |
| 257 | 238725 | 118147 |
| 258 | 238725 | 119241 |
| 259 | 238725 | 234271 |
| 260 | 238725 | 235541 |


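## Downloading a single file from a dataset

To save a single file to disk, pass the file entry and a `FileSink` pointing at the target path to `client.files.download`: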

In [6]:
from nrp_cmd.sync_client.streams.file import FileSink
from pathlib import Path

client.files.download(files[0], FileSink(Path("/tmp/downloaded.bin")))
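
The file listing earlier reports a `checksum` for each file in the form `md5:<hex digest>`. As an optional follow-up to a download, the sketch below compares a local file against such a value. `verify_checksum` is a hypothetical helper and not part of nrp_cmd; it uses only the standard library and assumes the `algorithm:hexdigest` format seen in the listing.

```python
import hashlib
from pathlib import Path


def verify_checksum(path: Path, checksum: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` matches `checksum`.

    `checksum` is assumed to look like "md5:744fa6a3..." (algorithm, colon,
    hex digest), as shown in the file listing. Hypothetical helper.
    """
    algorithm, separator, expected = checksum.partition(":")
    if not separator or not expected:
        raise ValueError(f"Expected 'algorithm:hexdigest', got {checksum!r}")

    digest = hashlib.new(algorithm)
    # Read in chunks so large files do not have to fit in memory.
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected.lower()
```

For the download above, and assuming `files[0]` corresponds to the first row of the listing, the call would be `verify_checksum(Path("/tmp/downloaded.bin"), "md5:744fa6a36cfb3c16b7b7dfbfa8f56cb0")`.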

## Downloading all dataset files to a local folder

TODO: work in progress. A shortcut for this is still needed; for now, iterate over the files in the dataset and download them one by one.
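
Until a shortcut exists, the one-by-one approach could be sketched roughly as follows. `download_all`, `make_sink` and `key_of` are hypothetical names introduced for illustration, not part of nrp_cmd. The sketch assumes that the file listing can be iterated, that each entry exposes its name as a `key` attribute (the listing has a `key` column, but attribute access is an assumption), and that `client.files.download(file, sink)` behaves as in the single-file example.

```python
from pathlib import Path
from typing import Any, Callable, Iterable


def download_all(
    client: Any,
    files: Iterable[Any],
    target_dir: Path,
    make_sink: Callable[[Path], Any],
    key_of: Callable[[Any], str] = lambda f: f.key,  # assumption: entries have .key
) -> list[Path]:
    """Download every entry in `files` into `target_dir`, one by one."""
    target_dir.mkdir(parents=True, exist_ok=True)
    downloaded = []
    for file in files:
        # Keep only the final path component so a key cannot escape target_dir.
        destination = target_dir / Path(key_of(file)).name
        client.files.download(file, make_sink(destination))
        downloaded.append(destination)
    return downloaded
```

With the objects from the earlier cells, a call might look like `download_all(client, files, Path("/tmp/dataset"), FileSink)`. If file entries expose their name differently, pass a suitable `key_of` function instead of relying on the default.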
