Code Deposit Spike - GitHub Webhooks #5209
Cool! FWIW, if your workflow involves the existing workflow mechanism in Dataverse, you might want to start from #5048, which is winding its way through the process. It lets you send any Dataverse setting you want to the workflow and fixes some transaction issues (at least in local workflows; I'm less sure the problem existed for workflows that do callbacks). As an example, #5049 is a workflow that submits a zipped bag to the Digital Preservation Network, and it gets the hostname, port, etc. from Dataverse settings.
After doing some research, here are some answers to our initial questions:
Other interesting info I found:
Releases in GitHub are based on Git tags. See https://help.github.com/articles/about-releases/

I'll probably have a number of questions, but one that's top of mind is this:
It looks like GitHub webhooks do not retry failed deliveries. Some remediation options I can see: provide a way in the UI for the user to manually publish a release from GitHub, poll at regular intervals on top of the webhook, or just poll. This is a good thing to discuss in our next group meeting. If we decide to poll, there are rate limits to take into account. They are per user, with a limit of 5,000 requests per hour for that user across all applications. Deep in the GitHub UI there is a way to see whether your webhooks succeeded, but it's not obvious to the user on GitHub. We can get the info via the API, but it looks like it involves sifting through a lot of junk (link).
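To make the polling fallback concrete, here is a minimal sketch (Python, standard library only) of checking a repo's latest release with a conditional request. It assumes we store the last deposited tag per dataset and use the OAuth token from the user's linked account; `fetch_latest_release`, `is_new_release`, and `poll_interval_seconds` are hypothetical helper names, not existing Dataverse code.

```python
import json
import urllib.error
import urllib.request
from typing import Optional, Tuple

API_ROOT = "https://api.github.com"


def fetch_latest_release(owner: str, repo: str, token: Optional[str] = None,
                         etag: Optional[str] = None) -> Tuple[int, Optional[dict], Optional[str]]:
    """GET /repos/{owner}/{repo}/releases/latest as a conditional request.

    Returns (HTTP status, release JSON or None, ETag to send next time).
    """
    req = urllib.request.Request(f"{API_ROOT}/repos/{owner}/{repo}/releases/latest")
    req.add_header("Accept", "application/vnd.github+json")
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    if etag:
        # GitHub replies 304 Not Modified when nothing changed since this ETag
        req.add_header("If-None-Match", etag)
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, json.load(resp), resp.headers.get("ETag")
    except urllib.error.HTTPError as err:
        if err.code == 304:  # urllib raises on 304; treat it as "no change"
            return 304, None, etag
        raise


def is_new_release(release: dict, last_deposited_tag: Optional[str]) -> bool:
    """Releases are built on Git tags, so a different tag_name means a new release."""
    tag = release.get("tag_name")
    return tag is not None and tag != last_deposited_tag


def poll_interval_seconds(remaining: int, seconds_until_reset: int,
                          floor: int = 60) -> int:
    """Spread the remaining hourly budget (X-RateLimit-Remaining) over the
    time left until the window resets (derived from X-RateLimit-Reset)."""
    if remaining <= 0:
        return max(seconds_until_reset, floor)
    return max(seconds_until_reset // remaining, floor)
```

A scheduler could call `fetch_latest_release` for each synced repo, compare against the stored tag with `is_new_release`, and wait `poll_interval_seconds(...)` between rounds. GitHub's documentation states that a 304 response to a properly authorized conditional request does not count against the primary rate limit, which helps if one user syncs many repos.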
Thanks for the research, @matthew-a-dunlap! Later today I plan to really dive into the answers you gave and start developing new mockups. If I have any follow-up questions, I'll post them here!
RE: Question 3, "How can we determine which GitHub repositories a user can select from in Dataverse?" In addition to the info Matthew posted above, I want to include this info I found on one of the pages he linked:
Any dropdown or type-ahead selector we include can allow the Dataverse user to select any repos that fit those criteria.
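A rough sketch of building that selector list from `GET /user/repos`. It assumes the criterion is admin access, since GitHub only lets repo admins create repository webhooks. The `affiliation` and `per_page` query parameters and the per-repo `permissions` object are part of GitHub's REST API; `list_user_repos` and `selectable_repos` are hypothetical names.

```python
import json
import urllib.request
from typing import Iterable, List


def list_user_repos(token: str) -> list:
    """First page of repos the authenticated user owns, collaborates on,
    or can reach through an organization."""
    url = ("https://api.github.com/user/repos"
           "?affiliation=owner,collaborator,organization_member&per_page=100")
    req = urllib.request.Request(url, headers={
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
    })
    with urllib.request.urlopen(req) as resp:
        # Only the first page; a real client would follow the Link header.
        return json.load(resp)


def selectable_repos(repos: Iterable[dict], need_admin: bool = True) -> List[str]:
    """Filter repo JSON down to the full names the user may pick.

    Assumption: admin rights are required because creating a repository
    webhook needs them; with need_admin=False, push access is enough.
    """
    chosen = []
    for repo in repos:
        perms = repo.get("permissions") or {}
        if perms.get("admin") or (not need_admin and perms.get("push")):
            chosen.append(repo["full_name"])
    return sorted(chosen)
```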
A follow-up question for @matthew-a-dunlap or anyone else in the know, RE: Question 5, "What metadata can we get from the GitHub repo?" Could you please list more specifically what metadata we can pull from a repo, or link to a page with that info? I'm having a hard time finding that info.
@dlmurphy Regarding Q5: This page has info on all the objects that can be queried via the API and their attributes, including release, user, repository, and organization.
@dlmurphy if you look at the dataset in https://dataverse.harvard.edu/dataverse/open-source-at-harvard, you can get a sense of the metadata you can get from GitHub for one of their repos. Here's a handy link to Data Explorer: https://scholarsportal.github.io/Dataverse-Data-Explorer/?fileId=3040230&siteUrl=https://dataverse.harvard.edu and to the JSON GitHub exposed for our "dataverse" repo back in July last year: https://github.com/pdurbin/open-source-at-harvard-primary-data/blob/master/2017-07-31/IQSS-dataverse.json
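As a sketch of what a mapping might look like, the function below pulls a few descriptive fields out of the `GET /repos/{owner}/{repo}` response (the same kind of JSON as the IQSS-dataverse.json file linked above). The output keys are hypothetical placeholders, not actual Dataverse metadata field names, and which fields we would really map is still an open design question.

```python
from typing import Any, Dict


def repo_to_metadata(repo: Dict[str, Any]) -> Dict[str, Any]:
    """Pick a handful of human-meaningful fields from GitHub repo JSON.

    Output keys are hypothetical placeholders, not Dataverse field names.
    """
    license_info = repo.get("license") or {}  # null when no license is detected
    owner = repo.get("owner") or {}
    return {
        "title": repo.get("full_name"),
        "description": repo.get("description"),
        "url": repo.get("html_url"),
        "owner": owner.get("login"),
        "language": repo.get("language"),
        "license": license_info.get("spdx_id"),
        "created": repo.get("created_at"),
        "keywords": repo.get("topics", []),
    }
```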
Thanks, guys. That answers that question pretty well! I'm happy with the answers we've gathered, but I want to let this issue simmer until we can go over these answers in a design team meeting, perhaps this Wednesday. |
Following today's design meeting, we decided that we'd like the next step for this spike to include:
@dlmurphy - FYI, in standup today, there was some discussion regarding the prototype, mostly related to whether or not it includes a front end. Some folks may check in with you. |
Just talked about this with the design team -- we don't need a UI for this prototype. |
To be more specific, we're looking for a prototype that:
The prototype doesn't need a frontend. @mheppler, please feel free to weigh in on this, you might have a better idea of what's helpful here. |
Yesterday I demoed some code I hacked together as of b9305c0 to @djbrooke @scolapasta @TaniaSchlatter @mheppler @dlmurphy @jggautier and @kcondon. All functionality is API-only for now. There are two steps:
The result is a file that looks like the screenshot below (from https://dev1.dataverse.org/file.xhtml?persistentId=doi:10.5072/FK2/FS7M3O/EBNKNB). I had to leave to pick up my kids before any decisions were made about next steps.
Below is a more readable version of the output from https://api.github.com/repos/IQSS/Zelig that I shoved into the file description above. Please note that I believe this is only the tip of the iceberg in terms of the metadata we could pull out of GitHub for a repo. The content is mostly URLs for pulling out additional information. The items that I find interesting are:
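Since most of that JSON is hypermedia links, one way to work with it is to split the `*_url` fields from the plain values. Some of those links are RFC 6570 URI templates (for example, `issues_url` ends in `{/number}`). This sketch drops only the optional `{/...}` and `{?...}` expressions and leaves required ones such as `{base}...{head}` in place; `split_repo_json` is a hypothetical helper.

```python
import re
from typing import Any, Dict, Tuple

# Optional RFC 6570 expressions such as {/number} or {?since,all} expand to
# nothing when no value is given, so they can simply be removed.
OPTIONAL_TEMPLATE = re.compile(r"\{[/?][^}]*\}")


def split_repo_json(repo: Dict[str, Any]) -> Tuple[Dict[str, str], Dict[str, Any]]:
    """Separate link fields ("url" and "*_url") from plain metadata values."""
    links: Dict[str, str] = {}
    values: Dict[str, Any] = {}
    for key, value in repo.items():
        if (key == "url" or key.endswith("_url")) and isinstance(value, str):
            # Required expressions like {base}...{head} are left for the caller to fill.
            links[key] = OPTIONAL_TEMPLATE.sub("", value)
        else:
            values[key] = value
    return links, values
```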
Upon discussing this in more detail this morning with @djbrooke @TaniaSchlatter @pdurbin, here is an update on the expected results of this spike:
Also, scrap the "cherry pick and format the metadata" suggestion. That can be done when this full feature moves to development. We know that can be done and don't need a spike to prove it. |
The short answer to the question above is "I don't know" because I struggle mightily with JSF. We can try to get both working so we have options, but it will take time. Meanwhile, below is my todo list from the dev perspective, in the logical order in which to work on the code.
@pdurbin - Let's put the brakes on this for now (except #1 IMHO). We are verifying a proposed approach with @mercecrosas and I'd like to revisit the technical architecture after that. Let's pick this up when you're back next week. Apologies for the confusion. |
@atrisovic created an awesome GitHub integration using GitHub Actions.
I think we should close this issue. We can create a fresh one if we still want to investigate webhooks.
Related: #2739
We're currently working on designing a workflow that will allow Dataverse users to connect their GitHub accounts using GitHub's webhooks, and then import a GitHub repo for deposit into Dataverse in the form of a dataset containing a .zip with all files from the repo. Users will be able to configure a "sync" whereby whenever a GitHub repo is updated in a specific way (probably when a release is minted), the dataset in Dataverse will be automatically updated.
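For the webhook side of the sync, a receiver would need to check that a request really came from GitHub and then react only to new releases. A minimal sketch, assuming a shared webhook secret is set when the hook is created: GitHub signs each delivery with HMAC-SHA256 in the `X-Hub-Signature-256` header and names the event in `X-GitHub-Event`, and a release event's payload carries `action`, `release.tag_name`, and `release.zipball_url` (the .zip of the repo at that tag). The function names and the returned dict are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Mapping, Optional


def verify_signature(secret: bytes, body: bytes, signature_header: Optional[str]) -> bool:
    """Check the X-Hub-Signature-256 header ("sha256=<hex digest>") against the raw body."""
    if not signature_header or not signature_header.startswith("sha256="):
        return False
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # Constant-time comparison; compare bytes so odd header characters can't raise.
    return hmac.compare_digest(expected.encode(), signature_header.encode())


def release_to_sync(headers: Mapping[str, str], body: bytes, secret: bytes) -> Optional[dict]:
    """Return what a dataset update would need, or None if the delivery isn't a new release."""
    if not verify_signature(secret, body, headers.get("X-Hub-Signature-256")):
        raise PermissionError("webhook signature does not match")
    if headers.get("X-GitHub-Event") != "release":
        return None  # e.g. the "ping" GitHub sends when the hook is created
    payload = json.loads(body)
    if payload.get("action") != "published":
        return None
    release = payload["release"]
    return {
        "repo": payload["repository"]["full_name"],
        "tag": release["tag_name"],
        "zipball_url": release["zipball_url"],
    }
```

`headers` is treated as a plain case-sensitive mapping here; most web frameworks expose case-insensitive header access, which would be the better fit in practice. Filtering on `"published"` also includes pre-releases; the `"released"` action would skip them.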
This spike will help us learn more about what's possible when using GitHub webhooks with Dataverse.
Some questions we have that will help inform the design of this feature:
E.g. releases, commits, pull requests
a. Can a Dataverse user authenticate w/ GitHub in a popup, and have that trigger a change to the Create Dataset or Add Files page?
a. E.g. Can the Dataverse user select from a list of GitHub repos he owns, or repos he has certain other permissions on?
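On the authentication question, a popup would wrap GitHub's OAuth web flow: Dataverse sends the user to `https://github.com/login/oauth/authorize`, GitHub redirects back to a registered callback URL with a `code` and the `state` we sent, and the server exchanges the code for a token. Below is a minimal sketch of the two URL-handling steps, assuming the `admin:repo_hook` scope (which covers creating webhooks); the helper names and the callback URL are hypothetical.

```python
import secrets
from typing import Sequence, Tuple
from urllib.parse import parse_qs, urlencode, urlparse

AUTHORIZE_URL = "https://github.com/login/oauth/authorize"


def build_authorize_url(client_id: str, redirect_uri: str,
                        scopes: Sequence[str] = ("admin:repo_hook",)) -> Tuple[str, str]:
    """Return the URL to open in the popup plus the random state to remember."""
    state = secrets.token_urlsafe(16)  # guards the callback against CSRF
    query = urlencode({
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": " ".join(scopes),
        "state": state,
    })
    return f"{AUTHORIZE_URL}?{query}", state


def callback_code(callback_url: str, expected_state: str) -> str:
    """Extract the one-time code GitHub appended to the callback URL."""
    params = parse_qs(urlparse(callback_url).query)
    if params.get("state", [None])[0] != expected_state:
        raise ValueError("state mismatch; refusing the callback")
    return params["code"][0]
```

The server would then exchange the code by POSTing `client_id`, `client_secret`, and `code` to `https://github.com/login/oauth/access_token`. In a popup, the callback page could notify the opening Create Dataset or Add Files page (for example with `window.postMessage`), which is one possible way to trigger the page change asked about above.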