# Uploading datasets to NRP repositories

This example demonstrates how to use the `nrp_cmd` library to upload datasets to NRP repositories built on NRP Invenio.
First, import the required modules.

In [2]:
# imports
import os

from nrp_cmd import get_sync_client
from nrp_cmd.config import RepositoryConfig
from nrp_cmd.sync_client.streams.memory import MemorySource
from yarl import URL

## Setting up repository client

You need to set up the repository client with credentials to be able to upload datasets.
There are two ways to do this:
1. Set up the client with a configuration file: run the `nrp-cmd add repository <url> <alias>` command and follow the instructions.
   Then call `get_sync_client("<alias>")` to get the client.
2. Set up the client programmatically by providing the URL and token. The configuration is not saved, so you need to provide it every time you run the script.
   This approach is shown below:

In [3]:
client = get_sync_client(
    RepositoryConfig(
        alias="datarepo-test",
        url=URL("https://workflow-repo.test.du.cesnet.cz"),
        token=os.environ.get("NRP_TOKEN", None) or input("Enter your API token: "),
    ),
)
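
For scripts that run unattended, an interactive `input()` prompt would block. The following is a minimal sketch of a fail-fast alternative. `read_token` is a hypothetical helper and not part of `nrp_cmd`; the `NRP_TOKEN` variable name simply follows the cell above.

```python
import os
from typing import Mapping, Optional


def read_token(
    env_var: str = "NRP_TOKEN",
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the API token from the environment or raise a clear error.

    Hypothetical helper for illustration, not part of nrp_cmd.
    """
    # Allow a custom mapping to be injected so the helper is easy to test.
    env = os.environ if environ is None else environ
    token = env.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your API token."
        )
    return token
```

The returned value could be passed as `token=` to `RepositoryConfig` in place of the `os.environ.get(...) or input(...)` expression.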

## Creating a new draft dataset

Call `client.records.create(...)` to create a new draft dataset. For NRP repositories, you need to provide the community in which the dataset will be created.
The `self_html` link can be used to access the dataset in the web interface. The `created_record.errors` attribute can be used to check for any errors that occurred during the creation process.
See [ccmm_invenio](https://github.com/NRP-CZ/ccmm-invenio/tree/main/ccmm_invenio/fixtures) for a list of allowed vocabulary values.

In [14]:
dataset_metadata = {
    "metadata": {
        "title": "Test dataset",
        "resource_type": {"id": "c_ddb1"},  # Dataset resource type from vocabularies
        "date_issued": "2023-10-01",
    }
}

created_record = client.records.create(dataset_metadata, community="generic")
print(created_record.links.self_html)
print("Errors:", getattr(created_record, "errors", None))

https://workflow-repo.test.du.cesnet.cz/datasets/np6t7-t0a40/preview
Errors: None
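
Some mistakes can be caught locally before the record is sent. The sketch below builds the same metadata dictionary as the cell above and rejects an empty title or a malformed date. `build_dataset_metadata` is a hypothetical helper, and the strict `YYYY-MM-DD` check is an assumption taken from the example value; the repository may accept other date forms and performs its own validation.

```python
import re
from datetime import date


def build_dataset_metadata(title: str, resource_type_id: str, date_issued: str) -> dict:
    """Build the minimal metadata payload used in this example.

    Hypothetical helper; only the three fields shown above are covered.
    """
    if not title.strip():
        raise ValueError("title must not be empty")

    # Assumption: a full YYYY-MM-DD date, as in the example above.
    match = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})", date_issued)
    if match is None:
        raise ValueError(f"date_issued is not in YYYY-MM-DD form: {date_issued!r}")
    year, month, day = (int(part) for part in match.groups())
    date(year, month, day)  # raises ValueError for impossible dates such as month 13

    return {
        "metadata": {
            "title": title,
            "resource_type": {"id": resource_type_id},
            "date_issued": date_issued,
        }
    }


dataset_metadata = build_dataset_metadata("Test dataset", "c_ddb1", "2023-10-01")
```

The resulting dictionary has the same shape as the one passed to `client.records.create(...)` above.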


## Uploading a file (in memory) to a created draft dataset

To upload a file to the created draft dataset, you can use the `client.files.upload(...)` method. This method takes a `Source`, an abstract class that provides data in a streaming manner
together with other metadata, such as the content type. To upload from memory, you can use the `MemorySource` class as shown below.

In [16]:
m = MemorySource(
    b"Hello, this is a test file.",
    content_type="text/plain",
)
client.files.upload(created_record, "test.txt", {"title": "A sample text file"}, m)

File(_etag=None, key='test.txt', metadata={'title': 'A sample text file'}, links=FileLinks(self_=URL('https://workflow-repo.test.du.cesnet.cz/api/datasets/np6t7-t0a40/draft/files/test.txt'), self_html=None, content=URL('https://workflow-repo.test.du.cesnet.cz/api/datasets/np6t7-t0a40/draft/files/test.txt/content'), commit=URL('https://workflow-repo.test.du.cesnet.cz/api/datasets/np6t7-t0a40/draft/files/test.txt/commit'), parts=None), transfer=FileTransfer(type_='L'), status='completed', size=27)
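
The content type in the cell above is hard-coded. When file names vary, it could be derived from the name instead. The following sketch uses only the standard library; `guess_content_type` is a hypothetical helper, and falling back to `application/octet-stream` is an assumption about what is reasonable for unknown types, not something `nrp_cmd` prescribes.

```python
import mimetypes


def guess_content_type(file_name: str, default: str = "application/octet-stream") -> str:
    """Guess a content type from the file name, with a generic fallback.

    Hypothetical helper for illustration, not part of nrp_cmd.
    """
    guessed, _encoding = mimetypes.guess_type(file_name)
    return guessed or default


data = b"Hello, this is a test file."
content_type = guess_content_type("test.txt")
print(content_type, len(data))
```

The `data` and `content_type` values can then be passed to `MemorySource(data, content_type=content_type)` in the same way as in the cell above.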

## Checking the uploaded file

After the upload, you can check the uploaded file either through the API or by directing the user to the draft dataset in the web interface.

In [18]:
print(created_record.links.self_html)
files = client.files.list(created_record)
df = files.as_dataframe("key", "links.content")
print(df)

https://workflow-repo.test.du.cesnet.cz/datasets/np6t7-t0a40/preview
        key                                      links.content
0  test.txt  https://workflow-repo.test.du.cesnet.cz/api/da...
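
Beyond listing the files, the object returned by `client.files.upload(...)` can be compared with the data that was sent. The sketch below relies only on the `key`, `status`, and `size` attributes visible in the `File` output earlier in this example. `check_upload` is a hypothetical helper, and treating `"completed"` as the only successful status is an assumption based on that output. A stand-in object is used so the sketch runs without a repository.

```python
from types import SimpleNamespace
from typing import List


def check_upload(file, expected_key: str, payload: bytes) -> List[str]:
    """Return a list of problems; an empty list means the upload looks consistent.

    Hypothetical helper; `file` is any object with key, status and size attributes.
    """
    problems = []
    if file.key != expected_key:
        problems.append(f"unexpected key: {file.key!r}")
    # Assumption: "completed" marks a finished upload, as in the output above.
    if file.status != "completed":
        problems.append(f"unexpected status: {file.status!r}")
    if file.size != len(payload):
        problems.append(f"size mismatch: {file.size} != {len(payload)}")
    return problems


# Stand-in mimicking the File output shown earlier (not a real nrp_cmd object).
fake_file = SimpleNamespace(key="test.txt", status="completed", size=27)
problems = check_upload(fake_file, "test.txt", b"Hello, this is a test file.")
print(problems or "upload looks consistent")
```

In the notebook itself, the value returned by `client.files.upload(...)` would take the place of `fake_file`.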
