
Dataset Explore #5028

Open
djbrooke opened this issue Sep 6, 2018 · 19 comments

Comments

@djbrooke (Contributor) commented Sep 6, 2018

Dataverse has the infrastructure to support community-built, file-level external tools (TwoRavens, Data Explorer). Similar infrastructure should be added at the dataset level to support the Code Ocean integration funded by the recent Sloan grant. Additionally, supporting Binderverse (#4714), SBGrid's reprocessing tool (ping @pameyer), and other future tools should be considered.

@pdurbin (Member) commented Oct 25, 2018

This issue represents the initial step toward integration of Dataverse with Code Ocean (Sloan grant) and hopefully related tools such as Binder and Whole Tale (community efforts), so I'm going to leave a bit of a brain dump of recent happenings.

On Tuesday during our regular community call, the Code Ocean team shared their screen and we talked through the future integration at a pretty high level. Notes can be found at https://groups.google.com/d/msg/dataverse-community/HPLziKZbOAc/q_XEqyKEBwAJ

Yesterday @djbrooke and I called in to the first meeting of the Open Science Infrastructure Working Group, organized by @craig-willis from @whole-tale, to discuss a variety of computation and reproducibility topics with @aprilcs (Code Ocean), @donsizemore and @tlchristian (Odum), @choldgraf and @aculich (Binder, #4714), @craig-willis, @Xarthisius, and @amoeba (Whole Tale, #5097), and others. Notes at https://docs.google.com/document/d/1bOVWBfhOiKGU2dYoHN_Pkpv5zPI6y819UISfkTqYpoQ/edit?usp=sharing

This morning I attended a fantastic Code Ocean workshop given by @aprilcs and I'd be happy to walk anyone through her tutorial which is captured very nicely in her slides ( http://bit.ly/harvard-oa-week ), example capsule ( https://bit.ly/2NanJLc ), and git repo ( https://github.com/aprilcs/candy_trade ). I'd also like to note that she features https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/EZSJ1S as an example of a dataset that has an excellent README describing how to reproduce results using code and data in the dataset.

@mheppler and I discussed this issue this morning, and to me the next logical step is to decide where on the dataset page to put the button that brings the user from the dataset to Code Ocean, Whole Tale, Binder, etc., much like how the file-level "Explore" button can bring you to multiple external tools such as Data Explorer and TwoRavens:

[Screenshot: file-level Explore button offering multiple external tools]

@pdurbin (Member) commented Nov 15, 2018

@craig-willis gave us some great feedback on how he assumed the external tool manifest would allow him to express to Dataverse how to compose the final URL. This is what he tried:

"queryParameters": [
  {
    "url": "{siteUrl}/api/access/datafile/{fileId}?key={apiToken}"
  }
]

Instead, for now, he'll have to compose the final URL on his side, much like Data Explorer does. For example, base_url = detailsURL.siteUrl + "/api/access/datafile/" at https://github.com/scholarsportal/Dataverse-Data-Explorer/blob/v1.0/assets/js/controllers/details.js#L481

For completeness, here's how the query parameters look in the Data Explorer external tool manifest:

"queryParameters": [
   {	
    "fileId": "{fileId}"	
   },	
   {	
    "siteUrl": "{siteUrl}"	
   },	
   {	
    "key": "{apiToken}"	
   }
]

When we work on this issue #5028 we should at least consider this feedback.

@pdurbin (Member) commented Nov 16, 2018

Quoting my earlier comment: "This morning I attended a fantastic Code Ocean workshop"

Just a heads up that the Code Ocean interface has completely changed (though the old interface I learned on is still available) and is now based on @jupyterlab: https://medium.com/codeocean/new-jupyterlab-based-capsule-page-d618f34bc636

@pdurbin (Member) commented Nov 30, 2018

There's lots more discussion of Code Ocean in #4714, especially starting at #4714 (comment)

@pdurbin (Member) commented Dec 14, 2018

There's lots of excitement on Twitter about Nature Scientific Data's "Call for submissions: Reproducible data processing" announcement: https://twitter.com/mercecrosas/status/1072899669074821122

@pdurbin (Member) commented Jun 11, 2019

I'm blocked on my dream of demo'ing the launching and execution of Jupyter Notebooks from Dataverse using Whole Tale until we implement external tools at the dataset level, or until Whole Tale picks up this issue they just asked me to open as a workaround: whole-tale/whole-tale#66

@pdurbin pdurbin added this to Needs triage in pdurbin via automation Jun 28, 2019

@pdurbin pdurbin moved this from Needs triage to Can't Wait in pdurbin Jun 28, 2019

@djbrooke (Contributor, author) commented Jul 16, 2019

See #5028 (comment) for an updated list.

Before development begins we need to:

  • Scope (which users have the option to explore? Do we expect multiple tools here? Do we include compute in this?)
  • Determine whether or not we can provide Dataset Level Explore from the current Dataset page
  • (technical planning) Architecture to support external tools framework at the Dataset Level
  • Determine where this fits in on the dataset redesign #3404 (probably "Use Dataset")
  • Implement front end code and ensure accessibility (before sprint or in sprint)

@pdurbin (Member) commented Jul 16, 2019

This is a good list.

I would add that we should check in with applications we have already integrated with that operate on all files in the dataset (Whole Tale, Mass Open Cloud), and those that we want to integrate with (Code Ocean, Binder), to make sure we're on the same page with regard to the URL the user will land on for the external tool.

Yesterday at jupyterhub/binderhub#900 (comment) a Binder developer indicated that at the dataset level he is (preliminarily) hoping Dataverse users will be sent to URLs like this:

https://mybinder.org/v2/dataverse/10.7910/DVN/RLLL1V

(I picked that dataset because I see some Jupyter Notebooks in it and because it's from AJPS, so the content has been curated and has the note "This dataset underwent an independent verification process that replicated the tables and figures in the primary article.")

If anyone is using the "Compute" button, I don't know of it (the MOC installation seems to be down), but it would be good to revisit what the URLs look like that the user is sent to. If memory serves, there was a query parameter for the Swift container, which looked something like a DOI. This is not implemented as an external tool but maybe it should be.

So, in summary, here's what I would add to the list:

  • Come up with a list of example URLs that users will be sent to for this or that dataset-level external tool provider.
  • Consider refactoring the "Compute" button into an external tool.

@djbrooke (Contributor, author) commented Jul 17, 2019

Before development begins we need to:

  • Determine whether or not we can provide Dataset Level Explore from the current Dataset page.
    Yes! @mheppler will prototype a solution that uses the compute button.
  • (technical planning) Architecture to support external tools framework at the Dataset Level.
    Yes, next week. Either Tech Hours or another meeting (@scolapasta will lead)
  • Scope
    Which users have the option to explore? Researchers and replication analysts, pre- and post-publish
    Do we expect multiple tools here? Yes
  • Implement front end code and ensure accessibility (before sprint or in sprint)
    Future work as we will roll the accessibility into the new Dataset page redesign
  • Determine where this fits in on the dataset redesign #3404
    Future work as we'll deliver this in the current infrastructure

Questions

  • For @mercecrosas - do we consider "Compute" and "External Tools" to be the same, or do we view them as separate concepts and separate workflows? It will be helpful to understand the future vision for "Compute", as we had arguments for both approaches in the design meeting.
    Answer from @mercecrosas below: #5028 (comment)

@pdurbin (Member) commented Jul 17, 2019

I just created a spreadsheet called "External Tools" to help in classifying the state of the various tools we talked about this morning and that are otherwise on our radar: https://docs.google.com/spreadsheets/d/1OwIxpgpWVPDPSFwDsnPfk8ivNRXUIiaAlnbFccCnfsQ/edit?usp=sharing

Here's a screenshot:

[Screenshot: "External Tools" spreadsheet]

@djbrooke djbrooke assigned mheppler and unassigned djbrooke and TaniaSchlatter Jul 18, 2019

@pdurbin pdurbin moved this from Can't Wait to High priority in pdurbin Jul 19, 2019

@djbrooke djbrooke changed the title Dataset External Tools Dataset Explore Jul 23, 2019

@djbrooke djbrooke moved this from UI/UX Design 💡📝 to Ready 🙋 in IQSS/dataverse Jul 23, 2019

@djbrooke djbrooke assigned scolapasta and unassigned mheppler and scolapasta Jul 23, 2019

@djbrooke djbrooke moved this from Ready 🙋 to IQSS Sprint 7/24 - 8/7 in IQSS/dataverse Jul 23, 2019

@scolapasta (Contributor) commented Jul 23, 2019

We met at tech hours and decided we would just add another column for "scope": dataset or file.

We will also clearly need to modify the logic so that it does not require a file id for dataset-level tools.

@pdurbin also brought up the idea of being able to send info not just as query parameters but in a more RESTful way (at the request of Whole Tale). We decided that we will work on this after we have the initial ability to support dataset tools in general.

@pdurbin (Member) commented Jul 23, 2019

Oh, I was advocating for it to be in scope for this issue to support putting DOIs and other values in the "path" like the https://mybinder.org/v2/dataverse/10.7910/DVN/RLLL1V example above. I thought we said we'd extend the "toolParameters" definition for this.
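Purely as a sketch of that idea (the syntax below is hypothetical, not something the manifest supports today), it might look something like this, with Dataverse expanding the path values and appending them to toolUrl:

{
  "displayName": "Binder",
  "description": "Hypothetical sketch only; 'pathParameters' and '{datasetPid}' are placeholder names.",
  "toolUrl": "https://mybinder.org/v2/dataverse",
  "toolParameters": {
    "pathParameters": [
      { "datasetPid": "{datasetPid}" }
    ]
  }
}

That would yield a URL like https://mybinder.org/v2/dataverse/10.7910/DVN/RLLL1V for the dataset mentioned earlier.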

@djbrooke (Contributor, author) commented Jul 24, 2019

Thanks. The checklist above (#5028 (comment)) is all finished. We'll bring this to Sprint Planning tomorrow.

@djbrooke (Contributor, author) commented Jul 24, 2019

  • We'll be able to verify this is working by looking at the URL generated; we don't need a sample tool (or a functioning dataset-level tool) to test this.

mheppler added a commit that referenced this issue Jul 25, 2019

@mheppler (Contributor) commented Jul 25, 2019

Wired up a placeholder Explore button in the top action button section of the dataset page. Ready to wire it up to the backend. Included some comments about the needed render logic and ui:repeat component.

Included the exploreTools.size() > 1 render logic (currently used at the file level in file-download-button-fragment.xhtml) to show a dropdown vs. a single button depending on how many tools are configured.

[Screenshot: placeholder Explore button on the dataset page]

pdurbin added a commit that referenced this issue Jul 26, 2019

@pdurbin pdurbin moved this from IQSS Sprint 7/24 - 8/7 to IQSS Team Dev 💻 in IQSS/dataverse Jul 26, 2019

@pdurbin pdurbin self-assigned this Jul 26, 2019

@pdurbin (Member) commented Jul 26, 2019

"url": "{siteUrl}/api/access/datafile/{fileId}?key={apiToken}"

I just noticed above that @craig-willis is also interested in being able to modify the path. In the example above, he assumed he'd be able to add the file id to the path.

I just made pull request #6059 but I did not implement the ability to modify the path. I'm going on vacation next week so it was quicker to just get something working. If someone else wants to hack on the code further and add the ability to manipulate the path I think that would be great as it makes Dataverse's external tools much more flexible.

I did implement a new keyword so that you can pass the DOI or Handle to an external tool.
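For anyone who wants to experiment once #6059 is merged, I'd expect a dataset-level manifest to look roughly like the sketch below. The exact reserved word for the persistent identifier and the other field names should be double-checked against the pull request and the guides; treat everything here as illustrative:

{
  "displayName": "Dataset-Level Tool",
  "description": "Illustrative sketch only; verify field names and reserved words against the guides.",
  "scope": "dataset",
  "type": "explore",
  "toolUrl": "https://tool.example.edu/launch",
  "toolParameters": {
    "queryParameters": [
      { "datasetPid": "{datasetPid}" },
      { "siteUrl": "{siteUrl}" },
      { "key": "{apiToken}" }
    ]
  }
}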

@pdurbin pdurbin removed their assignment Jul 26, 2019

@pdurbin pdurbin removed this from IQSS Team Dev 💻 in IQSS/dataverse Jul 26, 2019

@pdurbin pdurbin removed this from High priority for pdurbin in pdurbin Jul 26, 2019

@pdurbin (Member) commented Jul 26, 2019 (comment minimized)

pdurbin added a commit that referenced this issue Jul 26, 2019

@pdurbin (Member) commented Aug 9, 2019

Two things.

@mercecrosas, @djbrooke, and I had a nice meeting with Code Ocean yesterday, and we invited them to think about external tools at the dataset level and comment here if they'd like. Afterward I knocked together this VERY PRELIMINARY diagram for one of the three main use cases we talked about:

[Diagram: very preliminary Code Ocean reproducibility use case]

This morning I reached out to Renku to let them know that external tools at the dataset level are coming. @rokroskar just shared some thoughts about two potential use cases or user stories at SwissDataScienceCenter/renku-python#536 (comment)
