DDI/OAI-DDI exports crash for some records in 5.9 #8452
Comments
Thank you for the detailed report. (In other words, OAI/GetRecord does not try to export the metadata; it expects the exported record to be ready to read. So the OAI error is indeed just a failure to open the file. On top of the export error, it definitely looks like a bug that the record was added to the OAI set as if the DDI format had been exported and saved, even though the export failed.)
To me this looks like something else (note that there are no files in the dataset; that misplaced tag was happening in the files section, I think?).
We re-exported everything after upgrading. We have the same issue on our pre-production server, where we re-exported multiple times. I am having trouble getting logs. Last week, when I refreshed the page causing the errors, there were stack traces, but now there are none. I set the log level to FINE, restarted Payara, re-exported, and still got no stack trace. Is there some cache for the exports that prevents the errors from reaching the logs?
OK, it looks like your last attempt to re-export may have cached that broken, incomplete XML record on disk (or in your S3 bucket, whichever you are using). That is what Dataverse is serving now, hence no export exceptions. Could you please try to delete the file The file will be located somewhere like
We removed the cache and refreshed the export page (oai_ddi) and got the following error:
In the logs:
But 10 seconds later we got the same error page as before, with no logs. We tried other ways to get logs. We found something that could be useful: a stacktrace about the DDI export when publishing a new version of the dataset. There are also interesting errors about controlled vocabularies (are they related to our languages.zip or our TSV?). We have multiple CVs but it only happens on the country field. Here are the logs:
Just to confirm, when you say "refreshed the export page", that was
Yes it was this link (I copy pasted the one you sent before).
@bappun Dataverse tries to write a temporary export file into
I could not find any file in /tmp other than a psql socket and lock. However, I exported the dataset's JSON to import it into our pre-production instance. It wasn't working at first, so I removed fields from the JSON and reimported until I found something: the error occurs when a dataset has more than one value in the collectionMode field. Our collectionMode field (social_science.tsv) is set as multiple, which is not the case in the current source citation.tsv. That makes me think that multiple values are not handled by the DDI exporter. From what I understand in the DDI documentation here, the If my interpretation of the DDI documentation is not wrong, would it be possible to support multiple collectionMode values for the DDI exporter? EDIT: Colleagues from Sciences Po confirmed to me that
I'm not sure that the fact that the schema technically allows it means it should be done, but letting depositors add multiple Collection Modes makes sense to me. And for search purposes I'd rather have depositors enter each mode in a different field (maybe even better to pick terms from a vocabulary, although that's a bigger and ongoing conversation) than add all Collection Modes in one textbox. I'm not sure why the field was set up to allow only one instance. Maybe at the time they weren't aware of any real cases of people needing to add multiple collection modes? Most of the fields in the Social Science metadatablock TSV that ships with the software don't let depositors add more text boxes. Has your social_science.tsv been edited to allow multiples for other fields, e.g. Type of Research Instrument? If so, would creating a dataset with multiple Type of Research Instrument values also break the DDI export because it doesn't expect Type of Research Instrument to allow multiples?
In our social_science.tsv we do not have Research Instrument as multiple, but we do have Unit of Analysis. Unit of Analysis seems to be the same type of field as Collection Mode, but it works well with the DDI export and with search as a facet. In our case, though, they are controlled vocabularies, which is different from the default textbox behavior. Here are our TSVs if you want to take a look: https://github.com/CDSP-SCPO/dataverse-controlledvocabulary
Ah, thanks. Is a solution then to change the code that creates the DDI export so that it is able to include multiple collMode elements? That sounds like a similar situation to what's already done for the Unit of Analysis field. By doing this, maybe we can let the social science metadatablock TSV file that ships with the software continue to not allow multiples (until there are resources to do the research needed to make sure that it's a good idea to change that property in the TSV file that ships with the software).
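The manual bisection described above (removing fields from the JSON until the import works) could be shortened by scanning the dataset JSON for fields that actually carry more than one value. This is only a minimal sketch: it assumes the native JSON export layout (`datasetVersion` → `metadataBlocks` → block → `fields`, each field having `typeName`, `multiple` and `value`), and the function name `find_multi_valued_fields` and the sample values are hypothetical.

```python
def find_multi_valued_fields(dataset_json: dict) -> list[tuple[str, str, int]]:
    """Return (block name, field typeName, value count) for every field
    that is flagged as multiple and holds more than one value."""
    hits = []
    # Assumed layout: datasetVersion -> metadataBlocks -> <block> -> fields.
    version = dataset_json.get("datasetVersion", dataset_json)
    for block_name, block in version.get("metadataBlocks", {}).items():
        for field in block.get("fields", []):
            value = field.get("value")
            if field.get("multiple") and isinstance(value, list) and len(value) > 1:
                hits.append((block_name, field.get("typeName"), len(value)))
    return hits


# Hypothetical sample: collectionMode carries two values, unitOfAnalysis one.
sample = {
    "datasetVersion": {
        "metadataBlocks": {
            "socialscience": {
                "fields": [
                    {"typeName": "unitOfAnalysis", "multiple": True,
                     "typeClass": "controlledVocabulary", "value": ["Individual"]},
                    {"typeName": "collectionMode", "multiple": True,
                     "typeClass": "controlledVocabulary",
                     "value": ["Interview", "Questionnaire"]},
                ]
            }
        }
    }
}

for block, name, count in find_multi_valued_fields(sample):
    print(f"{block}.{name}: {count} values")
```

Fields reported this way are the first candidates to check against an exporter that may expect a single value.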
I had to take a day off/look into something else. But if it is as simple as this - a potentially multiple value that our DDI export insists on treating as a single-value-only, then it is clearly a bug that we need to fix.
while collMode is exported as
(Aside from the issue of how to treat this specific field, it is obviously a bug that Dataverse assumed this dataset was successfully exported, even though it was not, and added it to the OAI set; that needs to be resolved too.)
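To illustrate the behavior one would expect here, the following is a rough sketch of "record success only after the export actually succeeded". It is not Dataverse's code: `ExportLog`, `export_and_record` and `oai_set_members` are hypothetical names, and the exporter is just a callable that either returns the exported payload or raises.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ExportLog:
    exported: dict[str, str] = field(default_factory=dict)  # dataset id -> cached payload
    failed: dict[str, str] = field(default_factory=dict)    # dataset id -> error message


def export_and_record(dataset_id: str, exporter: Callable[[str], str], log: ExportLog) -> bool:
    """Run the exporter and mark the dataset as exported only on success."""
    try:
        payload = exporter(dataset_id)
    except Exception as exc:
        # A failed export must not leave a stale "exported" entry behind.
        log.failed[dataset_id] = str(exc)
        log.exported.pop(dataset_id, None)
        return False
    log.exported[dataset_id] = payload
    log.failed.pop(dataset_id, None)
    return True


def oai_set_members(log: ExportLog) -> list[str]:
    """Only datasets with a successful export are eligible for the OAI set."""
    return sorted(log.exported)


def demo_exporter(dataset_id: str) -> str:
    # Hypothetical exporter that fails for one dataset.
    if dataset_id == "broken":
        raise ValueError("export failed")
    return f"<codeBook id='{dataset_id}'/>"


log = ExportLog()
for ds in ["ok-1", "broken", "ok-2"]:
    export_and_record(ds, demo_exporter, log)
print(oai_set_members(log))
```

With this ordering, a dataset whose export raises never reaches the OAI set, so a later GetRecord or ListRecords call has no broken record to open.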
OK, I just got the part that it is multiple, because YOU made it multiple in your installation. (Finally). Yeah, it is allowed to be multiple in the DDI schema. Just hard-coded to single in our code.
We are close to releasing Dataverse 5.10, and I am trying to add a fix for this to the release.
Thank you all for your help with this issue. We really appreciate it!
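As a rough illustration of the difference between "hard-coded to single" and "allowed to be multiple", here is a minimal sketch that writes one `collMode` element per value inside a `dataColl` element. It is not the Dataverse exporter; the function name `build_data_coll` is hypothetical and the example values are made up.

```python
import xml.etree.ElementTree as ET


def build_data_coll(collection_modes: list[str]) -> ET.Element:
    """Emit one <collMode> per non-empty value instead of assuming a single value."""
    data_coll = ET.Element("dataColl")
    for mode in collection_modes:
        if mode and mode.strip():
            ET.SubElement(data_coll, "collMode").text = mode.strip()
    return data_coll


xml_text = ET.tostring(build_data_coll(["Interview", "Questionnaire"]), encoding="unicode")
print(xml_text)
```

A single-valued dataset goes through the same loop and produces exactly one `collMode`, so an installation that keeps the default single-value setting is unaffected.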
…cab., and changing the ddi export accordingly. #8452
Note to self: remember to open a new issue for the apparent bug in the export code that marked the dataset as successfully exported (which resulted, among other things, in it being added to the OAI set), even though the export clearly failed.
What steps does it take to reproduce the issue?
Not sure why it happens on our Dataverse instance.
When does this issue occur?
When some datasets are exported to DDI.
Which page(s) does it occur on?
OAI-DDI ListRecords
https://data.sciencespo.fr/oai?verb=ListRecords&metadataPrefix=oai_ddi
Not working dataset: https://data.sciencespo.fr/dataset.xhtml?persistentId=doi:10.21410/7E4/075L2L
Working dataset: https://data.sciencespo.fr/dataset.xhtml?persistentId=doi:10.21410/7E4/YE586X
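Because one failing record breaks the whole ListRecords response, it can help to request records one at a time with the OAI-PMH GetRecord verb and note which ones fail. The sketch below only builds the request URLs and filters on a status code returned by a caller-supplied `fetch` function; the helper names are hypothetical, and it assumes the OAI identifier is the dataset's persistent ID, which may not hold for every installation.

```python
from typing import Callable, Iterable
from urllib.parse import urlencode


def get_record_url(base_url: str, identifier: str, metadata_prefix: str = "oai_ddi") -> str:
    """Build an OAI-PMH GetRecord URL for a single record."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


def find_failing_records(base_url: str, identifiers: Iterable[str],
                         fetch: Callable[[str], int]) -> list[str]:
    """Return identifiers whose GetRecord request does not answer with HTTP 200.

    `fetch` takes a URL and returns an HTTP status code; it is injected so the
    sketch stays independent of any particular HTTP client.
    """
    return [i for i in identifiers if fetch(get_record_url(base_url, i)) != 200]


# Stand-in for a real HTTP call: pretend one record makes the server fail.
def fake_fetch(url: str) -> int:
    return 500 if "075L2L" in url else 200


ids = ["doi:10.21410/7E4/075L2L", "doi:10.21410/7E4/YE586X"]
print(find_failing_records("https://data.sciencespo.fr/oai", ids, fake_fetch))
```

In practice `fetch` would wrap a real HTTP request, and the identifier list could come from a ListIdentifiers call.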
What happens?
The DDI/OAI-DDI exports crash when getting records:
We upgraded our instance from 4.2 to 5.9 and noticed this issue when using OAI. It does not happen for all records, but a single failing record makes the whole ListRecords request crash. Here are the logs from server.log when the issue occurs:
Which version of Dataverse are you using?
5.9
Any related open or closed issues to this bug report?
No