Metadata Relationship and Functionality between Publisher and Zenodo repo #5
Comments
On Mar 8, 2015, at 9:20 PM, August Muench notifications@github.com wrote:
owlice @owlice 17 hours ago It seems to me that if Zenodo is to act as the archive for these posts, it's better to let it actually do that, and if ADS is indexing posts in an archive… well, it’s indexing the archive, not the blogs themselves, and capturing/exposing the original URLs is capturing information that in 10, 15, 20 years’ time will likely be mostly worthless. (Many distractions here; if this is a jumbled mess, let me know and I’ll try again tomorrow!) Alice |
I don't see the logic of any of these points @owlice except that @dfm did include his original blog post URL in the description of the abstract. My point there was that the Zenodo ingest could have made the relationship explicit in the metadata, and probably should for this AIAC project. And that ADS or ASCL should expose this provenance for mirrored archives. The reason why I do not follow any of your logic is that duplication and mirroring of resources is a huge problem if the provenance cannot be unpacked. Do you need me to prove that? To assert otherwise because links are ephemeral is nonsense; all links are ephemeral. For an archive or a curated index to dereference the original resource for arguments about perpetuity is to substitute the repo or curated index as the valuable piece. They are valuable, just not more valuable than the original post/object, especially in the case of curated indices that do not capture the original object's content. When archives do capture the original content, they build from and preserve the original URL even if it is eventually broken. See the Wayback Machine. Preserve the original link. I was looking for discussion about the semantics of preserving the provenance. |
If the blog post has been deleted (rendering not just the link to it but the actual piece itself ephemeral), isn’t the archive then the valuable piece? (Isn't that the point of the archive?) And if the archive has the original URL, isn’t that preservation enough? I can understand pushing the original URL to Zenodo and having it exposed there; I do think that’s a good idea. I’m not following your reason for wanting ADS to hold it. That just strikes me as unnecessary unless ADS is going to actually archive the blog post (and if it’s going to do that, then why bother with Zenodo?!), but hey, I’m not the one you need to convince, and I’ll stop getting in the way of the discussion! (Sorry!!) Does Zenodo build from the original URL? On Mar 9, 2015, at 8:35 AM, August Muench notifications@github.com wrote:
@augustfly I just want to add that in PR #4 I describe the metadata that is associated with a Zenodo deposition. The full REST API is here. That's basically what we have to work with when we're adding materials to a Zenodo community. I'll also agree that the post<=>repo relationship is important, though I don't have the background to make an educated suggestion. As a layperson and potential publisher, I'd be concerned that if Zenodo became the de facto place to see my blog post, then that would take some 'value' away from my blog. (Thankfully we're not dealing with publishers who are paid by ad clicks or page views). Also, the viewing experience on Zenodo itself probably won't be that great (?). A regular reader will probably want to be whisked as quickly as possible to the original web page where there are proper CSS layouts etc. for the blog post. So in that sense I see the content on Zenodo more as a backup than a place to go read the content. |
What Gus said. Basically, let's try to capture as much as possible about provenance/relationship early in the game. The DataCite 3.1 format (which is supported by Zenodo) supports both <alternateIdentifiers> and <relatedIdentifiers>. The original blog post URL should be in one of the two (I'd have to read the full spec to give a more educated guess, but that's TL;DR for now). See example below from the DataCite website:
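As a rough illustration of the two options (not the DataCite website's own example), here is a minimal Python sketch that builds each element for an original blog post URL. The helper names, the example URL, and the choice of "IsIdenticalTo" as the default relation are assumptions made for illustration; which element and which relationType fit best is exactly the open question in this thread.

```python
import xml.etree.ElementTree as ET


def alternate_identifier_fragment(original_url: str) -> str:
    """Option 1 (hypothetical helper): record the post URL as an alternate identifier."""
    wrapper = ET.Element("alternateIdentifiers")
    item = ET.SubElement(wrapper, "alternateIdentifier", alternateIdentifierType="URL")
    item.text = original_url
    return ET.tostring(wrapper, encoding="unicode")


def related_identifier_fragment(original_url: str, relation_type: str = "IsIdenticalTo") -> str:
    """Option 2 (hypothetical helper): record the post URL as a related identifier.

    The default relation_type is only a placeholder, not a recommendation.
    """
    wrapper = ET.Element("relatedIdentifiers")
    item = ET.SubElement(
        wrapper,
        "relatedIdentifier",
        relatedIdentifierType="URL",
        relationType=relation_type,
    )
    item.text = original_url
    return ET.tostring(wrapper, encoding="unicode")


if __name__ == "__main__":
    url = "https://example.org/blog/my-post"  # placeholder URL
    print(alternate_identifier_fragment(url))
    print(related_identifier_fragment(url))
```

Either fragment would sit inside the record's top-level resource element; the sketch leaves out namespaces and every other required DataCite field.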
I'm still not sure what the deposited content from the blog will look like in Zenodo, but if we are thinking that this is some kind of a PDF-ified version of the website, then I for one would want to see the original (assuming it's still live of course), not just Zenodo's "bad copy". And just to be clear, there is no magic in all of this: Zenodo, DOIs, and metadata schemas just give us tools and technology that will help us do a good (maybe just decent) job at persisting some of this content. But that does not mean that anything at the end of a URL is flaky or bad and everything with a DOI is awesome. Since we can't solve the 404 problem of the web in general, all we can do is mitigate its impact through technology and social constructs. |
@AramZS and @lnielsen have worked this out. Code here: https://github.com/PressForward/PF_Zenodo |
I'd like to open a discussion about the choice of metadata for establishing the relationship between the original blog publication and the Zenodo repository. At the bottom, I mention functional issues related to capturing this metadata.
For example, in the GitHub=>Zenodo software exchange, the relationship between the original GitHub repository and the frozen Zenodo repo is encoded as an "IsSupplementTo" relationship:
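A minimal sketch of how such a relationship might be expressed in a Zenodo deposition's metadata, written in Python. It assumes the deposit metadata accepts a `related_identifiers` list whose entries carry an `identifier` and a `relation`; the helper names, the example URL, and the title and description values are hypothetical, and a real deposition needs further required fields (creators, upload type) that are left out here.

```python
import json


def related_identifier_entry(original_url: str, relation: str = "isSupplementTo") -> dict:
    """Hypothetical helper: one link from the deposit back to the original resource."""
    return {"identifier": original_url, "relation": relation}


def deposition_payload(title: str, description: str, related: list) -> dict:
    """Hypothetical helper: wrap a partial metadata record the way a deposit request expects."""
    return {
        "metadata": {
            "title": title,
            "description": description,
            # Assumed field name; check it against the Zenodo REST API documentation.
            "related_identifiers": related,
        }
    }


if __name__ == "__main__":
    entry = related_identifier_entry("https://example.org/blog/my-post")  # placeholder URL
    payload = deposition_payload("My post", "Archived copy of a blog post.", [entry])
    print(json.dumps(payload, indent=2))
```

Swapping the `relation` argument is all it takes to try a different relationship, which keeps the semantic choice separate from the ingest code.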
As this is not a discussion of that workflow, I will reserve my concerns about the choice of "IsSupplementTo" in any case where 1:1 mirroring has occurred. However, there are many more semantically relevant relationships that could be used to encode the relationship between the original Publisher (blog post) and the preserved repo. The controlled list from DataCite Schema 3.1 (PDF) includes:
The appendix of the above PDF schema describes them in detail. I'd rather not pollute the discussion by inserting any opinion yet, but encourage others to think about this matter a bit.
Additionally, there are functional bits to consider about how to capture, preserve, and expose via citation this post<=>repo relationship. A few that come to mind are:
(relatedIdentifierType="URL") of the post.