`public Identifier createDataset(String dataSetJson, String dataverseAlias) {...}` returns a DB identifier but we need a doi to `uploadFile` #14

AleixMT · 2022-11-15T20:16:39Z

Hello again.

I am trying to bulk-upload a project into a Dataverse instance. To do so, I need to create a dataset for the project and then upload all the files into it. The problem is that the method that creates a dataset returns an Identifier, which contains an integer. This integer is supposed to identify the dataset you just created, but you cannot use it to upload a file into that dataset: the upload methods only accept DOIs to identify datasets, not the Identifier returned by createDataset.

So, I would like to do something like this:

List<Document> documents = new ArrayList(...);
Identifier identifier = api.getDataverseOperations().createDataset(JSONMetadata.toString(),  "theDatasetName");

for (Document document: documents)
{
    try {
        api.getDatasetOperations().uploadFile(identifier.toString(), document.getInputStream(), document.getName() );  // This line fails because the identifier does not identify any dataset and it expects a DOI
    } catch (IOException e) {
        e.printStackTrace();
    }
}

Where Document is just a class that wraps file data.

But I can't do it, since public Identifier createDataset(String dataSetJson, String dataverseAlias) {...} does not return a DOI.

So, my question is: is there any way to retrieve the DOI of the dataset that I just created, so that I can upload files to it immediately afterwards, even if it involves extra operations? Alternatively, is there any way to use the Identifier object that you return to identify a dataset and upload files to it?

If that is not possible, I will try to open another pull request. This time I am going to need a little help, though, since I do not know what the last line of public Identifier createDataset(String dataSetJson, String dataverseAlias) {...} is doing. That line is return resp.getBody().getData(); and I deduce that it parses the response and obtains the Id from there.

I am proposing this change because I think it is entirely feasible and would improve the library. When you use the native API to create a dataset (using curl, for example), the server returns a JSON document that contains both the identifier that you return and the DOI of the dataset that was just created. It is a matter of parsing both the DOI and the identifier and returning them from the method, or of implementing an equivalent method that parses and returns only the DOI.
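As a rough illustration of the parsing described above, here is a minimal sketch that reads both identifiers from a create-dataset response. The class name CreatedDatasetIds, the sample DOI, and the JSON field names id and persistentId are assumptions made for illustration and should be checked against what your server actually returns. The regular expressions only keep the sketch free of dependencies; a real client would use a proper JSON parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: holds the database id and the DOI of a newly created dataset. */
final class CreatedDatasetIds {
    // Assumed field names; verify them against the response of your Dataverse instance.
    private static final Pattern ID =
            Pattern.compile("\"id\"\\s*:\\s*(\\d+)");
    private static final Pattern PERSISTENT_ID =
            Pattern.compile("\"persistentId\"\\s*:\\s*\"([^\"]+)\"");

    final long databaseId;
    final String persistentId;

    private CreatedDatasetIds(long databaseId, String persistentId) {
        this.databaseId = databaseId;
        this.persistentId = persistentId;
    }

    /** Extracts both identifiers, failing loudly if either one is missing. */
    static CreatedDatasetIds parse(String json) {
        Matcher id = ID.matcher(json);
        Matcher pid = PERSISTENT_ID.matcher(json);
        if (!id.find() || !pid.find()) {
            throw new IllegalArgumentException("Response lacks id or persistentId: " + json);
        }
        return new CreatedDatasetIds(Long.parseLong(id.group(1)), pid.group(1));
    }

    public static void main(String[] args) {
        // Made-up sample response, not captured from a real server.
        String sample = "{\"status\":\"OK\",\"data\":{\"id\":42,"
                + "\"persistentId\":\"doi:10.5072/FK2/EXAMPLE\"}}";
        CreatedDatasetIds ids = parse(sample);
        System.out.println(ids.databaseId + " " + ids.persistentId);
    }
}
```

Returning both values from a single call would avoid a second request just to learn the DOI.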

Please let me know your opinion on this when you can.


AleixMT · 2022-11-16T11:40:07Z

I discovered that we can retrieve the DOI of a dataset using its Identifier like this:

// Call Dataverse API client to create dataset into the ICIQ dataverse
Identifier identifier = api.getDataverseOperations().createDataset(dataverseDatasetMetadata.toString(), "ICIQ");

// Upload files of each experiment into the created dataset
for (Document document : documents)
{
        try {
            // Fetch the dataset that we just created in order to obtain its DOI.
            Dataset dataset = api.getDatasetOperations().getDataset(identifier);

            // Fail if the dataset has no DOI; otherwise upload the file to it.
            if (! dataset.getDoiId().isPresent())
            {
                // TODO throw exception
                throw new RuntimeException();
            }
            else
            {
                api.getDatasetOperations().uploadFile(dataset.getDoiId().get(), file);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
}

Which basically boils down to:

Identifier identifier = api.getDataverseOperations().createDataset(dataverseDatasetMetadata.toString(), "ICIQ");
Dataset dataset = api.getDatasetOperations().getDataset(identifier);  // This middle step to obtain the dataset, from where we will retrieve the DOI
api.getDatasetOperations().uploadFile(dataset.getDoiId().get(), file);

The proposed change is still feasible and worthwhile: the native API always returns the DOI of the dataset that you created, so making an extra request to obtain the DOI is wasteful.
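Until the library returns the DOI directly, the extra request can at least be made once per dataset rather than once per file. The following is a minimal sketch of that arrangement. BulkUploadSketch and uploadAll are hypothetical names, and the lookup and upload steps are passed in as plain functions so that the sketch does not depend on the client library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.BiConsumer;
import java.util.function.Supplier;

/** Hypothetical helper: resolve the DOI once, then upload every file to it. */
final class BulkUploadSketch {

    /**
     * @param doiLookup performs the single extra request that yields the DOI
     * @param files     the files to upload
     * @param uploader  uploads one file to the dataset identified by the DOI
     * @return the number of files handed to the uploader
     */
    static <F> int uploadAll(Supplier<Optional<String>> doiLookup,
                             List<F> files,
                             BiConsumer<String, F> uploader) {
        // One lookup per dataset, outside the loop.
        String doi = doiLookup.get()
                .orElseThrow(() -> new IllegalStateException("Created dataset has no DOI"));
        int uploaded = 0;
        for (F file : files) {
            uploader.accept(doi, file);
            uploaded++;
        }
        return uploaded;
    }

    public static void main(String[] args) {
        // Stand-ins for the real lookup and upload calls; the DOI is made up.
        List<String> log = new ArrayList<>();
        int count = uploadAll(
                () -> Optional.of("doi:10.5072/FK2/EXAMPLE"),
                List.of("a.txt", "b.txt"),
                (doi, file) -> log.add(doi + " <- " + file));
        System.out.println(count + " uploads: " + log);
    }
}
```

With the client from this thread, the lookup would presumably be () -> api.getDatasetOperations().getDataset(identifier).getDoiId() and the uploader would wrap api.getDatasetOperations().uploadFile(doi, file), with any checked exceptions those calls declare handled inside the lambdas.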

I am not going to close this issue. I will wait for the opinion of the owners.

Thank you.

PS: I am trying to upload a lot of datasets to Dataverse, so optimization is crucial.

richarda23 · 2022-11-18T00:01:30Z

Hi Aleix
Please can you post a response you get from curl when you create a dataset? What version of Dataverse are you posting to?
It may well be returning more information than it did when Identifier was first written. As you say, it would be better to get that info when the dataset is first created.
Thanks, Richard

AleixMT · 2022-11-19T13:27:36Z

Here is the curl call that I make, with its response on the next line. You can see that the response dictionary contains two identifiers.

The dataverse instance that I am posting to is dataverse.csuc.cat.

I am aware that this instance has some customizations. For example, the file that I am uploading with curl, dataset-finch1.json, is a modified version of the minimal example dataset metadata provided with the Dataverse documentation. I needed to extend the file with some mandatory fields because it was not working on the instance that I am uploading to. I do not know whether the API responses are customized too.

pdurbin · 2022-11-21T18:56:13Z

I don't believe the response has been customized. It's returning the database ID of the dataset as well as the DOI of the dataset.

otter606 mentioned this issue Nov 20, 2022

parse persistent id of created dataset #22

Merged
