Code Deposit - Github Integration #2739

Open
leeper opened this issue Nov 15, 2015 · 19 comments

Comments

@leeper
Member

commented Nov 15, 2015

Zenodo now provides a really convenient way to archive a GitHub repository using git tags (i.e., GitHub releases), which makes it easy to attach a DOI to a GitHub repository. Being able to do the same with Dataverse would be awesome.

The reason I thought of it is that I was considering building a layer into the R client that would make it convenient to archive a version of a local git repository using the Dataverse SWORD API, but if this were all implemented natively within Dataverse, that would probably be even better.

@mercecrosas

Member

commented Nov 16, 2015

@leeper yes, this is something we've been considering, and you are right that Zenodo does this very well. Thanks for pointing it out and creating the issue.

@mercecrosas

Member

commented Nov 16, 2015

👍

@mercecrosas mercecrosas modified the milestone: In Review Nov 30, 2015

@scolapasta scolapasta modified the milestone: Not Assigned to a Release Jan 28, 2016

@pdurbin

Member

commented Jan 13, 2017

I mentioned to @christophergandrud this morning that @leeper had opened this issue. At some point we should all put our heads together on this. 😄

@pdurbin

Member

commented Jun 25, 2017

@leeper @christophergandrud shoot, we should have talked about this during the Community Meeting! @leeper now that you've added the "dataverse" package to CRAN, do you have any more thoughts on this issue? How can we unblock it?

@leeper

Member Author

commented Jun 25, 2017

From an API perspective, this should be pretty easy because it's just a matter of running `git checkout` on the appropriate tag, zipping the contents (sans the .git folder), and uploading the zip to the right SWORD endpoint.
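
A minimal Python sketch of that workflow, assuming the repository has already been checked out at the tag to archive (e.g. `git checkout v1.0`). The endpoint path, the Basic-auth convention (API token as username, empty password), and the `Packaging` header follow the Dataverse SWORD API guide as I understand it and should be checked against your installation's version; the function names, server, and DOI are hypothetical. The request is only built here, not sent.

```python
import base64
import io
import os
import urllib.request
import zipfile

# SWORD v2 packaging URI for a plain zip of files.
SIMPLE_ZIP = "http://purl.org/net/sword/package/SimpleZip"


def zip_working_tree(src_dir: str) -> bytes:
    """Zip every file under src_dir, skipping anything named .git.

    Assumes src_dir is already checked out at the tag to archive.
    Folder hierarchy is preserved as relative paths in the archive.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(src_dir):
            # Prune .git in place so os.walk never descends into it.
            dirs[:] = [d for d in dirs if d != ".git"]
            for name in files:
                if name == ".git":  # .git *file* used by submodules/worktrees
                    continue
                path = os.path.join(root, name)
                zf.write(path, os.path.relpath(path, src_dir))
    return buf.getvalue()


def build_sword_deposit(server: str, doi: str, api_token: str,
                        payload: bytes, filename: str) -> urllib.request.Request:
    """Build (but do not send) a SWORD v2 edit-media POST adding a zip to a dataset."""
    url = (f"https://{server}/dvn/api/data-deposit/v1.1/swordv2/"
           f"edit-media/study/{doi}")
    # API token as the Basic-auth username, empty password (assumed convention).
    auth = base64.b64encode(f"{api_token}:".encode()).decode()
    headers = {
        "Authorization": f"Basic {auth}",
        "Content-Disposition": f"filename={filename}",
        "Content-Type": "application/zip",
        "Packaging": SIMPLE_ZIP,
    }
    return urllib.request.Request(url, data=payload, headers=headers,
                                  method="POST")
```

Sending the deposit is then a matter of `urllib.request.urlopen(req)`. As an alternative to checkout-and-zip, `git archive --format=zip <tag>` produces a zip of a tagged tree without the .git folder and without touching the working copy.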

In the user interface, it might make sense as a plugin that does this from a specified git repo (as @pdurbin and I discussed for the Dropbox add-file dialog).

@pdurbin

Member

commented Jun 25, 2017

@leeper maybe I'm just hearing what I want to hear, but are you saying that you think it's possible to implement this feature entirely client-side, such as within https://github.com/IQSS/dataverse-client-r ? If so, can we move this issue to that repo?

@leeper

Member Author

commented Jun 25, 2017

Let me try to make an example using R and then report back on this issue about how well that goes.

@christophergandrud

commented Jun 26, 2017

Nonetheless, it would be great if this was ultimately language agnostic.

(Sorry, off topic, but honestly my dream would be if Dataverse could act as a remote git repository).

@pdurbin

Member

commented Jun 26, 2017

@christophergandrud interesting. I guess supporting git would be language agnostic. Yes, this is all off topic but please see my "A Thought Experiment: Datasets As Git Repos" at https://docs.google.com/document/d/18WDIS8hrFJvMJBcnRuQ8NfD-VxGq32vJ9WwlEgyyWZs/edit?usp=sharing which I originally shared at https://groups.google.com/d/msg/dataverse-community/5zJrr03R9ZE/6ahp8ZgQwt8J .

@leeper I see you opened IQSS/dataverse-client-r#16 . Thanks! Please keep us posted.

@dlmurphy

Contributor

commented Jun 26, 2018

An example of a real-world use case, from our notes on a UX interview we conducted with an Astrophysics librarian in 2016 that touched on this issue:

The researchers she works with primarily use Zenodo because of its GitHub integration. “Zenodo has a hook into GitHub. If you’re putting your code on GitHub, you can mint a DOI for a release of your code, and then it’ll be indexed by the Astrophysics Data System (which is like the Pubmed of Astronomy).” The researchers’ software is required to process the data, which is in the form of FITS images. You need the images AND the code for the data to be meaningful.

@pdurbin

Member

commented Jun 28, 2018

@leeper I keep thinking about the diagram you showed at the Dataverse Community Meeting (from https://osf.io/xfj5h/ ), how there was a mix of code and non-code (data.csv, paper, slideshow, website, citations, etc) in what I understand to be the recommendation for organizing your dataset in your field. For this "code deposit" feature, are you thinking you'd want anything that looks like this or are you thinking that you'd want a "code only" dataset that doesn't have your data, your paper, your slides, etc? Here's the diagram:

[Image: project-organization diagram from https://osf.io/xfj5h/]

Others are welcome to comment on how this feature should work as well! I'm just asking Thomas since he opened it. 😄

@leeper

Member Author

commented Jun 28, 2018

I'd want to deposit the whole project, with its folder/file hierarchy, into a single dataset.

@pdurbin

Member

commented Jun 28, 2018

@leeper cool, thanks. Would https://github.com/leeper/rio be a good example of a repo that you'd consider depositing into Dataverse if/when this feature were available? Or are there other repos that would be better examples?

@mercecrosas

Member

commented Jun 28, 2018

@dlmurphy

Contributor

commented Jul 11, 2018

Here's our design team's document for summarizing what we know about this issue and considering next steps:

https://docs.google.com/document/d/1Wa8OJBftzJs_v9QDeccRanx6S0MNoiuPjqnRaL20cNk/edit

@mheppler mheppler changed the title Zenodo-style Github Integration Code Deposit - Github Integration Sep 19, 2018

@pdurbin

Member

commented Sep 20, 2018

A couple things:

  • rOpenSci just published a blog post called "Building Reproducible Data Packages with DataPackageR" that says, "When a manuscript is submitted based on a specific version of a data package, one can make a GitHub release and automatically push it to sites like zenodo so that it is permanently archived." https://ropensci.org/blog/2018/09/18/datapackager/
  • "Papers with code" seems like an interesting dataset that's highly related to this issue: https://github.com/zziz/pwc . I wonder if we should ask @zziz if there are any plans to publish this dataset in a repository.

Hat tip to @amoeba from @whole-tale for putting both of these on my radar!

@zziz

commented Sep 20, 2018

Dear @pdurbin, the dataset is already available in the repository as a CSV file, but I don't recommend using it just yet. I started this project very recently and some work remains to be finished. You are welcome to use it right now, but I recommend checking back in a month.

@poikilotherm

Contributor

commented Nov 8, 2018

My 2 cents on this: please keep in mind (at least for later extension):

@poikilotherm

Contributor

commented Dec 21, 2018

I'd like to bring to your attention that there are efforts to integrate GitLab with Zenodo (based on Invenio at CERN). Maybe the changes needed on the GitLab side could be useful for, or even aligned with, Dataverse?

See especially this comment.
