
Metadata: Dataverse project take ownership of documentation for creating metadata blocks #3168

Closed
tdilauro opened this issue Jun 13, 2016 · 15 comments
Labels: Feature: Developer Guide; Feature: Metadata; User Role: Sysadmin

Comments

@tdilauro (Contributor) commented Jun 13, 2016

At JHU we currently have two use cases for creating custom metadata blocks:

  • Migration of custom metadata fields (the instructions for which don't explicitly mention metadata blocks, but they are implied) for production/staging instances; and
  • Experimentation with metadata models to support software citation/archiving/preservation models.

Neither of these is (though the latter may eventually be) suitable for inclusion in common metadata blocks that would be supported by Dataverse developers or the community at large, so we need to be able to create these blocks locally.

The Dataverse Team did not expect that individual instances would create their own metadata blocks, so documentation for them is sparse. Since we needed a better understanding of how to do this, I put together a document that captured my understanding and asked the DV team (thanks @posixeleni, @pdurbin, @zoidy, @bmckinney, @scolapasta, and @bencomp for your contributions) to help fix errors and clarify points.

At this point, the Dataverse 4.x metadata block syntax/semantics are in pretty good shape with regard to defining and loading metadata blocks, so it would be great if the project would take ownership and responsibility for maintaining the document in some (perhaps completely different) form.

NB: More support/documentation for the required Solr schema changes is still needed to provide full custom metadata block support for local instances.
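
For reference, the loading step itself is done through Dataverse's admin API; a minimal sketch, assuming a local 4.x installation (custom-block.tsv is just a placeholder file name):

curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @custom-block.tsv -H "Content-type: text/tab-separated-values"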

@pdurbin (Member) commented Jun 28, 2016

#3180 (comment) is the most recent example of me updating the Solr schema due to a field being added.
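
The change itself is usually just a new field declaration in Solr's schema.xml; a rough sketch for a hypothetical field (the name and type here are only illustrative, and your schema may also need a corresponding copyField for the catch-all search field):

<field name="myCustomField" type="text_en" multiValued="true" stored="true" indexed="true"/>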

@pdurbin (Member) commented Jun 23, 2017

Here's a comment by @edzale at #3506

Hi,
in our local installation, we would like to customize the metadata:

  • remove the Astronomy metadata block because it's not accurate for our scientific domains
  • add and remove certain metadata elements in the other blocks
  • and define controlled vocabularies for certain metadata elements

Questions: is there any available documentation about these kinds of customizations? Is there any way to use a remote controlled vocabulary (accessible through an API, for example)?
Thank you in advance for your help.

@pdurbin (Member) commented Jan 14, 2018

From IRC today:

"I'm a newbie to Dataverse and evaluating it against CKAN for a potential client. I was wondering what the process is to customize the metadata fields for a dataset, and file metadata? I didn't see anything in the documentation but I very well may have missed it."

http://irclog.iq.harvard.edu/dataverse/2018-01-14

It would be nice to add some documentation on this, assuming we want to support custom metadata blocks.

@jggautier (Contributor) commented Apr 18, 2018

I worked more on the first section of the document that explains how the metadata block tsv is put together. It looks like the second section, about the steps for installing metadata blocks, could use more detail from those who've gone through that process. There are also questions about how to edit/reinstall metadata blocks that could be answered here by people who've done it.

Perhaps the doc could be reviewed by a developer to make sure it's clear and accurate, and then we can decide how it should be added to the guides. New issues can be created to add more information about editing/reinstalling blocks, etc.
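
For anyone landing on this issue, the short version of that first section is that a metadata block TSV has three tab-separated parts; a heavily abbreviated sketch (column lists are truncated and the names/values are only examples, not a complete block):

#metadataBlock	name	dataverseAlias	displayName
	customBlock		My Custom Block
#datasetField	name	title	description	watermark	fieldType	...	allowControlledVocabulary	allowmultiples	...	required	parent	metadatablock_id
	myField	My Field	An example field		text	...	TRUE	FALSE	...	FALSE		customBlock
#controlledVocabulary	DatasetField	Value	identifier	displayOrder
	myField	Option A	option_a	1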

@janetm commented May 22, 2018

Hi All
Implications around customising Dataverse metadata blocks

I've recently had a conversation with Danny and Gustavo (which they found unclear) about the Dataverse Metadata Blocks and issues with local customisation and harvesting. I hope this explains better...

Australian Data Archive (ADA) publish mainly social science survey data so there are some DDI elements/Dataverse fields that use a fairly static vocab. An example is Kind of Data [Survey data, Census data, textual data, diaries, aggregate...]; Unit of Analysis; Time Method etc.

I'm not referring to vocab servers, but drop-downs/tick boxes as already implemented using the TSV. At the moment, to create standard metadata we include the vocab lists in our templates as text, or refer archivists to documentation.

Implications of customising Dataverse metadata blocks:

  • unknown whether Dataverse metadata fields may have values hard-coded into the Dataverse application.
  • for harvesting purposes, any custom modification will cause issues for import.
    ***Is there the possibility of selective harvesting so modified fields could be excluded from harvesting?

We are also having ongoing discussions with Julian about the copyright and version DDI elements not included in the Citation Block - which we have to combine as text in the Notes field. I'm not sure where this is at?

These comments may be better in another space, let me know.
Thanks
Janet

@jggautier (Contributor) commented Jun 19, 2018

Hi @janetm. Thanks for pointing out issues about customizing metadata blocks and how it affects harvesting. And apologies for replying so late. I agree that it's appropriate that we try to clarify these questions in the documentation for creating metadata blocks, which I think should include editing metadata blocks.

I hope I can help answer your questions here and in the documentation (and of course I invite developers to yell at me when I'm wrong):

> Australian Data Archive (ADA) publish mainly social science survey data so there are some DDI elements/Dataverse fields that use a fairly static vocab. An example is Kind of Data [Survey data, Census data, textual data, diaries, aggregate...]; Unit of Analysis; Time Method etc.
>
> I'm not referring to vocab servers, but drop-downs/tick boxes as already implemented using the TSV. At the moment, to create standard metadata we include the vocab lists in our templates as text, or refer archivists to documentation.
>
> Implications of customising Dataverse metadata blocks:
>
>   • unknown whether Dataverse metadata fields may have values hard-coded into the Dataverse application.

I can't imagine any technical issues with editing the default tsv files to allow controlled vocabularies for Kind of Data, Unit of Analysis and other fields that I think you have in mind. (We know that a large number of CV terms raises usability issues, but DDI guidelines suggest a small number of terms for the fields you've mentioned, right?)
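
Mechanically, as I understand it, that would just mean setting allowControlledVocabulary to TRUE on the field's #datasetField row and listing the terms under #controlledVocabulary, along these lines (this is only a sketch; the terms and the internal field name kindOfData are examples of what the rows might look like):

#controlledVocabulary	DatasetField	Value	identifier	displayOrder
	kindOfData	Survey data	survey_data	1
	kindOfData	Census/enumeration data	census_enum_data	2
	kindOfData	Aggregate data	aggregate_data	3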

>   • for harvesting purposes, any custom modification will cause issues for import.
>     ***Is there the possibility of selective harvesting so modified fields could be excluded from harvesting?

I think modified fields are already excluded from harvesting: @scolapasta told me that during harvesting Dataverse will try to harvest metadata even when it's a metadata document that isn't composed the way Dataverse expects it to be. I take this to mean that if during harvesting Dataverse expects Kind of Data in the oai_ddi.xml, like this:

...
<sumDscr>
	...
	<timePrd ...></timePrd>
	<collDate ...></collDate>
	<dataKind>KindOfData1</dataKind>
	<geogCover></geogCover>
	...
</sumDscr>
...

But if the element name <dataKind> is changed to <kindOfData>, or its order is changed (e.g. if it switches places with <collDate>), Dataverse will exclude <kindOfData> and harvest the rest. I'd also think that it would fail to harvest metadata that doesn't have the several fields required for dataset publication.

(Since Dataverse creates ddi.xml that won't validate against the schema because some elements are put in the wrong places or misused, I've always wondered if while harvesting valid ddi.xml, Dataverse would ignore elements because it expects to find them in the wrong places.)
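
(If anyone wants to check a specific record, one way is to validate the export against the DDI Codebook schema with xmllint; a rough sketch, with codebook.xsd standing in for wherever the schema lives locally:)

xmllint --noout --schema codebook.xsd oai_ddi.xml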

> We are also having ongoing discussions with Julian about the copyright and version DDI elements not included in the Citation Block - which we have to combine as text in the Notes field. I'm not sure where this is at?

There's a github issue (#4570) about migrating datasets that already have versions. I think it's complicated because Dataverse automatically assigns versions, so we need to think about how migrating >1 versions will work. I don't know how the versioning that Dataverse does now affects harvesting. (I see that on the search results pages, the cards of harvested datasets don't include version numbers, so maybe it's not an issue?)

For the copyright element issue (and any of these issues really), could we email to schedule a time to chat? In an issue about making Dataverse produce valid ddi metadata (#3648), I proposed using the copyright element differently than I think you and Steve would like to, and I'd like to get your thoughts.

Thanks!

@pdurbin (Member) commented Jun 19, 2018

@jggautier to me taking ownership of the documentation means adding a page to the dev guide on this topic. It would mean a pull request. Does that make sense?

The lack of documentation definitely came up during the Dataverse Community Meeting last week. I'd love for this issue to be prioritized. Also, I'd like to point out that #4451 is related.

@jggautier (Contributor) commented Aug 9, 2018

Adding a page (or maybe adding content to an existing page) in the guides sounds good to me. It'll put the content on GitHub and make it versioned. I'd need to talk to someone more familiar with Sphinx about how to move the content in the Google Doc into the Dataverse guides.

It sounds like you think that adding more info to the Google Doc about installing or reinstalling metadata blocks should be considered after the content has been moved to the guides.

@jggautier (Contributor) commented Aug 15, 2018

During estimation, @pameyer suggested saving the Google Doc as a .docx file and using Pandoc (https://pandoc.org) to convert that to .rst, which Sphinx uses.
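
The conversion itself should be a one-liner; roughly (file names are placeholders):

pandoc metadata-customization.docx -f docx -t rst -o metadata-customization.rst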

The team agreed to move to the guides only content we feel is solid right now - the first section that describes the parts of the metadata block tsv - and open other GitHub issues for moving other content to the guides, i.e. instructions and guidelines for editing and installing metadata blocks.

One thing not discussed was where in the guides this should go. Users can use this info to create or edit metadata blocks during installation, or after installation. So I could see this going in either the Installation Guide or the Admin Guide. Currently, the Appendix is the only section with info about metadata blocks.

@pdurbin (Member) commented Aug 15, 2018

I think the Admin Guide would be a good place. Perhaps we could add the question "Am I happy with the metadata fields available out of the box or do I want to create a custom metadata block?" at http://guides.dataverse.org/en/4.9.2/installation/prep.html#decisions-to-make and link to the new content in the Admin Guide.

dlmurphy self-assigned this Aug 24, 2018
dlmurphy added a commit that referenced this issue Aug 24, 2018
Converted the old google doc into a .rst and added it to our guides. Still needs some syntax finessing.
@dlmurphy (Contributor) commented

For future reference: Pete's suggested method worked very well for converting a Google Doc to a properly formatted .rst file for our guides:

  1. Download the google doc as a .docx
  2. Use pandoc to convert the .docx to a .rst
  3. Add the .rst to the proper docs folder and add an entry for it in the index
  4. Do some finessing of the syntax to make sure it renders properly and add a table of contents to the page

@dlmurphy (Contributor) commented

I've added the new page, but when I'm back on Tuesday I'll finish the syntax fine tuning and we'll be good to go.

@mheppler (Contributor) commented

Looks good so far, @dlmurphy. You can preview the guides .rst files in GitHub, and the table formatting is solid from what I can see.

@dlmurphy (Contributor) commented

Cleaned up the syntax in a49c1bd and it's looking much nicer now. Sending to code review for @jggautier to make sure his vision has been realized.

dlmurphy removed their assignment Aug 28, 2018
dlmurphy self-assigned this Aug 28, 2018
dlmurphy added a commit that referenced this issue Aug 28, 2018
Made some edits to both formatting and content based on @jggautier's review
dlmurphy added a commit that referenced this issue Aug 28, 2018
Linked "Appendix" subsection
dlmurphy removed their assignment Aug 28, 2018
@jggautier (Contributor) commented

Awesome. Thanks @dlmurphy. Moving to QA.
