Cannot publish dataset with existing DOI imported into Dataverse as DDI #3632

Closed
didaRo opened this issue Feb 16, 2017 · 24 comments
Labels
Feature: Migration; User Role: Sysadmin (installs, upgrades, and configures the system, connects via ssh)

Comments

didaRo commented Feb 16, 2017

Hello,

I am using the Dataverse Native API (version 4.5.1) to add a new dataset from a JSON file:
curl -X POST -H "Content-type:application/json" -d @data/dataset-create-new.json "http://$SERVER/api/dataverses/$id/datasets/?key=$apiKey"
In the dataset-create-new.json file, I have the following information for my identifier:
"authority": "10.5075",
"identifier": "PP2RMH",
"protocol": "doi",

The API creates a new identifier and ignores the identifier sent in the JSON file. How can I use the Native API to create a new dataset with an existing identifier?

Thanks !!!!

Member

pdurbin commented Feb 16, 2017

@didaRo hi! This sounds highly related to #3083. In the end, we decided not to merge #3377 but it's good to know that there's more demand out there than just the use case from @pameyer

@didaRo are you trying to migrate your data from some other solution? Would you like to share any more details about your scenario?

Author

didaRo commented Feb 16, 2017

Hi @pdurbin
I'm on the team of @edzale ( https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/dataverse-community/EVCuBELhEWs/5_oFB_5CAAAJ); we use version 4.5.1 of Dataverse.
Just to summarize: is there no solution for this case?
Regards
Fairouz

Member

pdurbin commented Feb 16, 2017

@didaRo the way to import datasets that already have DOIs into Dataverse is to use XML rather than JSON. Specifically, the datasets need to be in DDI format.

Recently I was looking at #3583 and playing around with some code at pdurbin@ca6ecb4 where you can see an example DDI file called "version1.xml" that I'm using in an integration test in BatchImportIT.java where I call "migrate". I can explain more if you'd like to go down the route of putting your datasets in DDI format. It should work. You can also read more about importing DDI files at https://github.com/IQSS/dataverse/blob/develop/scripts/migration/migration_instructions.txt#L31

Author

didaRo commented Feb 17, 2017

@pdurbin Hi,
It works: I added a new dataset with an existing identifier 👯‍♀️

But I have another problem when I try to publish the dataset.
Case one:
If I try to publish the dataset right after creating it with
curl -X POST "http://$SERVER/api/datasets/$id/actions/:publish?type=major&key=$apiKey"
I get this error:
{"status":"ERROR","message":"Latest version of dataset T1J10121 is already released. Only draft versions can be released."}

Case two:
If I modify the dataset and then try to publish the draft version, I get this error about the version number:
org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "unq_datasetversion_0"
Detail: Key (dataset_id, versionnumber, minorversionnumber)=(592, 1, 0) already exists.
Error Code: 0
Call: UPDATE DATASETVERSION SET LASTUPDATETIME = ?, MINORVERSIONNUMBER = ?, RELEASETIME = ?, VERSIONNUMBER = ?, VERSIONSTATE = ?, VERSION = ? WHERE ((ID = ?) AND (VERSION = ?))

Thanks
Fairouz
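
The first error above says that only draft versions can be released, so a script could check the version state before calling publish. The following is only a minimal sketch under that assumption: `can_publish` and `publish_url` are hypothetical helper names (not part of Dataverse), and the URL is simply the publish endpoint quoted in the comment above.

```shell
# Hypothetical helpers for illustration; not part of Dataverse.

# Succeeds only when the given version state is DRAFT, mirroring the
# message "Only draft versions can be released."
can_publish() {
  [ "$1" = "DRAFT" ]
}

# Builds the publish URL from server, dataset id, and API key.
# No URL encoding is done, so the arguments must already be URL-safe.
publish_url() {
  printf 'http://%s/api/datasets/%s/actions/:publish?type=major&key=%s\n' "$1" "$2" "$3"
}

# Usage (commented out so the sketch itself makes no network call):
# if can_publish "$state"; then
#   curl -X POST "$(publish_url "$SERVER" "$id" "$apiKey")"
# fi
```

Here `$state` would have to come from a GET of the dataset; the guard only avoids sending a publish request that is known to be rejected.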

Member

pdurbin commented Feb 17, 2017

@didaRo when you added your dataset with an existing identifier was the dataset in DDI format? If so, we can change the issue title to be about documenting how to accomplish what you're trying to do.

Author

didaRo commented Feb 17, 2017

@pdurbin yes, the file was in DDI format.
I get the error when I try to publish this dataset (through the web interface or with the Dataverse API).

pdurbin changed the title from "How to call Native API to add a new dataset which has already a DOI" to "Cannot publish dataset with existing DOI imported into Dataverse as DDI" on Feb 17, 2017
Member

pdurbin commented Feb 17, 2017

@didaRo ok, I changed the title to "Cannot publish dataset with existing DOI imported into Dataverse as DDI". Thanks. I'm actually not sure if this is currently supported or not. You might need to import the dataset as published or something. Can you please provide the curl command you used to import the dataset as DDI?

Author

didaRo commented Feb 17, 2017

@pdurbin
I use this command
curl -H "X-Dataverse-key: $apiKey" "http://$SERVER/api/batch/migrate/?dv=dataverseName&path=/path/to/version1.xml&createDv=true"
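
To make the three query parameters of the migrate call explicit, here is a small sketch that builds the same URL from its parts. `migrate_url` is a hypothetical helper name, and the sketch assumes the values are already URL-safe (it does no encoding).

```shell
# Hypothetical helper for illustration; not part of Dataverse.
# Arguments: server, target dataverse alias, path to the DDI file,
# and the createDv flag (true or false), as in the command above.
migrate_url() {
  printf 'http://%s/api/batch/migrate/?dv=%s&path=%s&createDv=%s\n' "$1" "$2" "$3" "$4"
}

# Usage (commented out so the sketch itself makes no network call):
# curl -H "X-Dataverse-key: $apiKey" \
#   "$(migrate_url "$SERVER" dataverseName /path/to/version1.xml true)"
```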

Member

pdurbin commented Feb 17, 2017

@didaRo thanks! Can you please email support@dataverse.org and mention this issue? I'll get you in touch with people who will know more about if this is already supported or not. I'm hoping it only needs to be documented (no code change) but I don't really know. Thanks!

Author

didaRo commented Feb 17, 2017

OK
thanks :)

Member

pdurbin commented Feb 17, 2017

@didaRo thanks for creating https://help.hmdc.harvard.edu/Ticket/Display.html?id=246794 . I assigned it to someone who can help. One thing is bothering me... You said the error is "Latest version of dataset T1J10121 is already released." In this context, "released" means "published". So it sounds like perhaps the dataset is already published? I'm confused about what state the dataset is in. (Doing a GET of the dataset to get a dump of it as JSON may help. This is documented at http://guides.dataverse.org/en/4.6/api/native-api.html#datasets .) Also, given how the DOI looks (doi:10.5072/T1J10121) it reminds me even more of #3583.

Author

didaRo commented Feb 17, 2017

@pdurbin.
In my case the dataset is not published. I checked the identifier on the DataCite platform and it doesn't exist there :(
Maybe the migrate command creates the dataset in the published state.
I also updated the dataset created with the existing identifier and tried to publish it, and I get another error:
Internal Exception: org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "unq_datasetversion_0"
Detail: Key (dataset_id, versionnumber, minorversionnumber)=(592, 1, 0) already exists.
Error Code: 0
Call: UPDATE DATASETVERSION SET LASTUPDATETIME = ?, MINORVERSIONNUMBER = ?, RELEASETIME = ?, VERSIONNUMBER = ?, VERSIONSTATE = ?, VERSION = ? WHERE ((ID = ?) AND (VERSION = ?))

Member

pdurbin commented Feb 17, 2017

Ok, again, it might be interesting to see the JSON output of a GET of the dataset like this:

curl -H "X-Dataverse-key: $API_TOKEN" http://localhost:8080/api/datasets/:persistentId?persistentId=doi:10.5072/T1J10121
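
Since the question is what state the dataset is in, the relevant field in that JSON dump is `versionState`. A rough sketch for pulling it out with standard tools follows; `version_state` is a hypothetical helper name, and the sed pattern assumes compact JSON with no whitespace after the colon. A real JSON parser would be more robust.

```shell
# Hypothetical helper for illustration; not part of Dataverse.
# Reads dataset JSON on stdin and prints the value of a "versionState"
# field. If a line holds several such fields, the last one on that
# line is printed; only the first matching line is reported.
version_state() {
  sed -n 's/.*"versionState":"\([^"]*\)".*/\1/p' | head -n 1
}

# Usage (commented out so the sketch itself makes no network call):
# curl -H "X-Dataverse-key: $API_TOKEN" \
#   "http://localhost:8080/api/datasets/:persistentId?persistentId=doi:10.5072/T1J10121" \
#   | version_state
```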

Author

didaRo commented Feb 20, 2017

@pdurbin Hi,
I get this JSON output:
{"status":"OK","data":{"id":XXX,"identifier":"T1J10121","persistentUrl":"http://dx.doi.org/10.5072/T1J10121","protocol":"doi","authority":"10.5072","publisher":"Inra Dataverse","latestVersion":{"id":ZZZ,"versionNumber":1,"versionMinorNumber":0,"versionState":"RELEASED","distributionDate":"1983","productionDate":"Production Date","lastUpdateTime":"2017-02-17T13:23:57Z","releaseTime":"2010-04-22T00:00:00Z","createTime":"2017-02-17T13:23:57Z","license":"NONE","termsOfUse":"<div style="padding-left: 30px;"> <ul style="list-style-type: decimal;" > .....
Complete result in attachment
dataverse get dataset persistentID.zip

The error is on the unique constraint "unq_datasetversion_0":
the key (dataset_id, versionnumber, minorversionnumber)=(XXX, 1, 0) already exists

Thanks

Author

didaRo commented Feb 20, 2017

Hello @pdurbin

In the end I just specified the DRAFT version type in the DDI file :)
<verStmt source="DVN"> <version date="2017-02-20" type="DRAFT">1</version> </verStmt>

It works 👍
Thanks for everything
Fairouz
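
Before running a migrate, the DDI file can be checked for the DRAFT version type described above. This is only a sketch: `ddi_is_draft` is a hypothetical helper name, and it does a plain text match on the `<version>` element rather than parsing the XML.

```shell
# Hypothetical helper for illustration; not part of Dataverse.
# Succeeds when the given DDI file contains a <version> element
# carrying type="DRAFT" (text match only, no XML parsing).
ddi_is_draft() {
  grep -q '<version[^>]*type="DRAFT"' "$1"
}

# Usage:
# if ddi_is_draft /path/to/version1.xml; then
#   echo "DDI declares a DRAFT version"
# else
#   echo "DDI does not declare a DRAFT version" >&2
# fi
```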

Member

pdurbin commented Feb 20, 2017

@didaRo good job figuring it out! I guess we'll leave this issue open until we document the solution. Maybe it could go somewhere under http://guides.dataverse.org/en/latest/admin

Author

didaRo commented Feb 23, 2017

Hi @pdurbin,
I have another question, about the DDI metadata file.
I have DataCite metadata and I would like to convert the XML file to DDI format in order to migrate the dataset.
Do you use an API to do this in Dataverse?

Thanks a lot
Fairouz

Member

pdurbin commented Mar 7, 2017

@didaRo hi, I see you posted about this at https://groups.google.com/d/msg/dataverse-migration-wg/I82Cpqf0rN0/ui_MFGvaAAAJ as well. We do not have a converter from DataCite to DDI.

There is some unmerged code to export from DataCite but I'm not sure that will help you. That code is at https://github.com/sbgrid/sbgrid-dataverse/blob/098b56b531f0c3e5cad689ef39eb54d321071587/mod-sbgrid/src/main/java/edu/harvard/iq/dataverse/export/DataciteExporter.java and was mentioned at #2917 (comment)

A tool to convert from DataCite XML to DDI XML could probably be written as a standalone tool outside of Dataverse. For all I know such a tool already exists. You might want to ask on the DDI mailing list: http://lists.icpsr.umich.edu/mailman/listinfo/ddi-users . If you find one there or elsewhere, please let us know!

Member

pdurbin commented Jun 30, 2017

In a thread at https://groups.google.com/d/msg/dataverse-community/0WF2IA-43XI/gw8syLZZAwAJ Michel Bamouni just wrote, "To import an existing doi into my dataverse, I use sql queries. So I need to publish dataset using an sql command." At https://groups.google.com/d/msg/dataverse-community/0WF2IA-43XI/JJNhz8P0AAAJ I replied explaining that you can't publish just by switching one field in the database. The root problem, however, seems similar to this issue... a desire to import an existing DOI into Dataverse.

@didaRo what is your latest thinking on this issue, please?

Member

pdurbin commented Jun 30, 2017

Also: "We need this to import existing datasets with DOIs into Dataverse. For that, the two options are to either import via OAI-PMH or to create the dataset with a "test" DOI and then change it to the "real" one via SQL." -- @adam3smith at https://groups.google.com/d/msg/dataverse-community/qmFQK5_sZoA/dNotwpkQAgAJ

Member

pdurbin commented Jun 30, 2017

Also: "One of the SBGrid migration requirements is to move the existing datasets, along with their DOIs, into Dataverse. Currently, it appears that this can only be done using the DDI XML import." -- issue #3083 . @pameyer what do you think of these comments I'm making today? Similar requirements, right?

pdurbin added the "User Role: Sysadmin" label (installs, upgrades, and configures the system, connects via ssh) on Jul 4, 2017
Member

pdurbin commented Jul 19, 2018

@didaRo are you still interested in this issue? If so, pull request #4606 for issue #3083 was just merged and you could try an import using JSON.

Member

pdurbin commented Oct 4, 2018

Related: #5104

Member

pdurbin commented Oct 10, 2022

@didaRo hi! These days there's a data migration API: https://guides.dataverse.org/en/5.12/developers/dataset-migration-api.html

Want to give it a try? Thanks.
