
Explore button for Binder #6807

Closed
atrisovic opened this issue Apr 9, 2020 · 29 comments · Fixed by #9341
Labels
Feature: External Tool pm.netcdf-hdf5.d All 3 aims are currently under this deliverable Size: 3 A percentage of a sprint. 2.1 hours.
Comments

@atrisovic
Member

So far, the Explore button in Dataverse (DV) takes users to Whole Tale:

[Screenshot]

It would be good to discuss adding Binder as well. Binder already supports this on their side: you can explore a Dataverse dataset by its DOI. See https://mybinder.org and the screenshot:

[Screenshot]

@pdurbin
Member

pdurbin commented Apr 9, 2020

In order to support an "Explore" button for Binder, we'd need to make the Dataverse "external tool" framework more flexible in the URLs it can construct.

For example, the Explore button would need to be able to send users to URLs such as https://mybinder.org/v2/dataverse/10.7910/DVN/TJCLKP/ with parts of the DOI in the "path" of the URL.

Here's that URL again in context:

[Screenshot taken 2020-04-09 at 4:57 PM]

A workaround is to add a "launch binder" button to the description of your dataset like this:

[Launch Binder badge]

@Xarthisius
Contributor

A workaround is to add a "launch binder" button to the description of your dataset

Or you can try this External Tool definition:

{
  "displayName": "Binder",
  "description": "Run on Binder",
  "scope": "dataset",
  "type": "explore",
  "toolUrl": "https://girder.hub.yt/api/v1/ythub/dataverse",
  "toolParameters": {
    "queryParameters": [
      {
        "datasetPid": "{datasetPid}"
      },
      {
        "siteUrl": "{siteUrl}"
      },
      {
        "key": "{apiToken}"
      }
    ]
  }
}
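The Python sketch below shows roughly what launch URL this definition produces. It assumes Dataverse substitutes each `{token}` placeholder and appends the `queryParameters` to `toolUrl` as a query string. It illustrates the idea only and is not Dataverse's Java implementation. The helper name `build_launch_url` and the example values are hypothetical, and the escaping Dataverse actually applies may differ. Note that with this definition, the user's API token, when one is available, is sent to the tool host as `key`.

```python
from urllib.parse import urlencode

# The toolUrl and queryParameters from the external tool definition above.
TOOL_URL = "https://girder.hub.yt/api/v1/ythub/dataverse"
QUERY_PARAMETERS = [
    {"datasetPid": "{datasetPid}"},
    {"siteUrl": "{siteUrl}"},
    {"key": "{apiToken}"},
]


def build_launch_url(tool_url, query_parameters, values):
    """Hypothetical helper: fill {token} placeholders and append them as a query string.

    Parameters whose token has no value (e.g. an anonymous user without an
    API token) are skipped in this sketch.
    """
    pairs = []
    for param in query_parameters:
        for name, template in param.items():
            token = template.strip("{}")
            if token in values:
                pairs.append((name, values[token]))
    # Leave ':' and '/' unescaped so the URL matches the form quoted in this thread.
    return tool_url + "?" + urlencode(pairs, safe=":/")


if __name__ == "__main__":
    print(build_launch_url(TOOL_URL, QUERY_PARAMETERS, {
        "datasetPid": "doi:10.5072/FK2/U6AEZM",
        "siteUrl": "https://demo.dataverse.org",
    }))
```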

@pdurbin
Member

pdurbin commented Apr 10, 2020

@Xarthisius works great! Thanks! If anyone out there wants to add "Binder" under "Explore" at the dataset level, this should do the trick. Just put the JSON in a file and follow http://guides.dataverse.org/en/4.20/admin/external-tools.html#adding-external-tools-to-dataverse
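The guide linked above registers a tool by POSTing its JSON to the admin API endpoint `/api/admin/externalTools` (with curl, via `--upload-file` and a `Content-type: application/json` header). Below is a standard-library Python equivalent, offered as a sketch only. The server URL `http://localhost:8080`, the file name `binder.json`, and the helper name `make_register_request` are assumptions for illustration.

```python
import json
import urllib.request


def make_register_request(server, tool):
    """Hypothetical helper: build (but do not send) a POST that registers an external tool.

    `server` is the base URL of the Dataverse installation; `tool` is the parsed
    external tool JSON (a dict).
    """
    return urllib.request.Request(
        server.rstrip("/") + "/api/admin/externalTools",
        data=json.dumps(tool).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Usage (assumes the definition above was saved as binder.json):
#   with open("binder.json") as f:
#       req = make_register_request("http://localhost:8080", json.load(f))
#   with urllib.request.urlopen(req) as resp:
#       print(resp.read().decode())
```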

The only limitation is that the dataset must have a real, registered DOI.

@mbamouni

Hi,
I set up my own JupyterHub and I would like to connect it to Dataverse.
How can I do that?
@Xarthisius: in the JSON above, is the toolUrl a native JupyterHub/BinderHub URL, or is it a custom development?

Best regards

Michel

@Xarthisius
Contributor

Hi Michel,

in the JSON above, is the toolUrl a native JupyterHub/BinderHub URL, or is it a custom development?

It's a custom service that I'm hosting. The source is here: data-exp-lab/girder_ythub@ec3f756

I set up my own JupyterHub and I would like to connect it to Dataverse.
How can I do that?

I need a little more explanation of what you mean by that. BinderHub already supports Dataverse datasets as a source for Binders by default (since jupyterhub/binderhub#969).

@mbamouni

mbamouni commented Feb 11, 2021

Hi,

@Xarthisius: Thanks for the answer.

To answer your question,
"I need a little bit more explanation about what you mean by that?":

I set up a JupyterHub cluster on Kubernetes with persistent storage for each user.
From the Dataverse interface, when a user is on a dataset page with files, I would like to give them the option to choose "jupyterhub" in the Explore button dropdown and have the dataset's files automatically sent to their JupyterHub space (as in the attached picture). An additional need is to be able to go back to Dataverse from the JupyterHub web page.
My need is something like this:
[Image: SampleOfNeed]
https://hub.gke2.mybinder.org/user/10.7910-dvn-tjclkp-kptl7zfn/tree but with JupyterHub rather than Binder.
I hope that's clearer.

Best regards

@Xarthisius
Contributor

I set up a JupyterHub cluster on Kubernetes with persistent storage for each user.
From the Dataverse interface, when a user is on a dataset page with files, I would like to give them the option to choose "jupyterhub" in the Explore button dropdown and have the dataset's files automatically sent to their JupyterHub space (as in the attached picture). An additional need is to be able to go back to Dataverse from the JupyterHub web page.
My need is something like this:

WholeTale does exactly that for Dataverse. It's not "jupyterhub" per se, but it offers the same functionality and more. @pdurbin do you have an instance of DV with the WholeTale external tool integration enabled so that @mbamouni could see it in action?

Here's a link to a YouTube video: https://www.youtube.com/watch?v=AoSpQ3A7poY

@mbamouni

mbamouni commented Feb 16, 2021

@Xarthisius: I tried Whole Tale too, but I would like a lightweight tool like JupyterHub so that end users have several options.
Specifically, I would like to give users the ability to choose between Whole Tale, Renku, JupyterHub, ... for computation. So do you know if I can connect a JupyterHub to a Dataverse?

As for Whole Tale, I connected it to my Dataverse, and when I click the Explore button I'm correctly redirected to Whole Tale, but I get
the error below when I try to create a tale with data from my Dataverse:
"RuntimeError: RuntimeError('Lookup for "https://dataverse-test.ouvrirlascience.fr/dataset.xhtml?persistentId=doi:10.70112/1RSUYF" failed with: Failed to get size for https://dataverse-test.ouvrirlascience.fr/dataset.xhtml?persistentId=doi:10.70112/1RSUYF')"

[Screenshot: WholeTaleError]

Best regards

@pdurbin
Member

pdurbin commented Feb 16, 2021

@pdurbin do you have any instance of DV with WholeTale external tool integration enabled so that @mbamouni could see that in action?

Yes, Whole Tale is enabled on https://demo.dataverse.org . I just tried it and it seems to work fine (screenshot below). See also this announcement from @craig-willis about this integration: https://groups.google.com/g/dataverse-community/c/ZJg-_gS4n1g/m/qw2NNClCBgAJ

[Screenshot taken 2021-02-16 at 11:03 AM]

@mbamouni

@pdurbin: after connecting to demo.dataverse, I tried to create a tale in Whole Tale using the "Hierachy file" dataset and got an error:
"
RuntimeError: RuntimeError('Lookup for "https://demo.dataverse.org/dataset.xhtml?persistentId=doi:10.70122/FK2/ULHPHW" failed with: Failed to get size for https://demo.dataverse.org/dataset.xhtml?persistentId=doi:10.70122/FK2/ULHPHW')
"
Do I need a specific account to send data to Whole Tale?

[Screenshot: WholeTaleError]

Best regards,
Michel

@pdurbin
Member

pdurbin commented Feb 17, 2021

@mbamouni huh. I got the same error. Can you please create a new issue about this?

Here's what I see:

[Screenshot taken 2021-02-17 at 12:16 PM]

@Xarthisius
Contributor

That's because we don't allow registration of data as Dataverse resources outside of official DV deployments [1]
[1] https://iqss.github.io/dataverse-installations/data/data.json

@pdurbin
Copy link
Member

pdurbin commented Feb 17, 2021

@Xarthisius hmm, maybe we should pick this up in #6446 but it used to work.

@mbamouni

@Xarthisius is it good practice to set up my own Whole Tale instance?

@Xarthisius
Contributor

@mbamouni I don't know if that's a good practice, but it's certainly worth a try! :) If you want some pointers maybe visit our slack? The link is at the bottom of our homepage (https://wholetale.org)

@Xarthisius
Contributor

@Xarthisius hmm, maybe we should pick this up in #6446 but it used to work.

It used to at some point, but after some consideration it was limited to production DV deployments only. We don't want our users to publish research objects that reference fake DOIs. I'm sure the same logic applies if you reverse the problem. You don't want testing instances of WholeTale publishing to Harvard's Dataverse, do you?

@mbamouni

mbamouni commented Apr 12, 2021

Hello,
@Xarthisius: I saw that there have been no commits to the Whole Tale project on GitHub for at least a year, so I would like to know whether the project is still maintained. I would also like some examples of actual Whole Tale users, to take inspiration from what they did. Finally, is it possible to deploy Whole Tale on Kubernetes, or does it only work on Docker Swarm?

Best regards,

Michel

@Xarthisius
Contributor

@Xarthisius: I saw that there have been no commits to the Whole Tale project on GitHub for at least a year, so I would like to know whether the project is still maintained.

I dodged a bullet here. I'm glad you're contacting me rather than our PIs directly. They wouldn't be happy to know we haven't been doing a thing for over one year! So let's keep it a secret...

Joking aside. I don't know where you looked but we tagged 1.0rc1 10 days ago with a bunch of new features and AFAICT last commit to the project happened 19h ago. I'd suggest watching here: https://github.com/whole-tale/

I would also like some examples of actual Whole Tale users, to take inspiration from what they did.

I'm not sure how I should respond to that, because it sounds like you're asking for the emails and personal info of our users, which of course I can't give you, because that would be neither legal nor ethical... If you want to look for inspiration, go to https://dashboard.wholetale.org and browse the public Tales.

Finally, is it possible to deploy Whole Tale on Kubernetes, or does it only work on Docker Swarm?

Yes it is possible. We have a proof of concept for that even, but we didn't see any significant gain in doing that migration and it has bit-rotten since then.

@pdurbin
Member

pdurbin commented Jan 19, 2023

@siacus and I have been talking a bit about containers, so I spent a few minutes validating the idea that we can very easily (with a single curl command) load the JSON @Xarthisius supplied above to put a Binder button on every dataset.

There was seemingly a small issue redirecting the toolUrl (girder.hub.yt to mybinder.org) but @Xarthisius fixed it in record time (thanks!):

I'd like the definition of done for this issue to be updating the guides to link to a place outside our guides (per our policy) for the JSON above. Here I'm talking to @Xarthisius about where to host it:

Meanwhile, I don't see any reason to wait on trying out the Binder button on Harvard Dataverse. I just gave @sbarbosadataverse a demo (would like to show @siacus too) and created this issue:

Finally, @atrisovic just wrote up a related design doc for Binder stuff but I'll let her drop a link in here when she's ready (I haven't had a chance to read it closely yet). 😄

Oh, there's so little to do (minor doc change) that I'm giving this a size of 3. All we need is a URL to link to for the JSON file.

@pdurbin pdurbin added Feature: External Tool Size: 3 A percentage of a sprint. 2.1 hours. labels Jan 19, 2023
@atrisovic
Member Author

Hi All, I love the idea to bring a "Binder button" to Dataverse and, as Phil @pdurbin mentioned, I am working on a technical design report to document it.

For datasets that contain code (R, Python, other) or specific data formats (e.g., NetCDF, Parquet), it would be great to add environment support for exploring them. Is there a way to add a Dockerfile or environment.yml on the fly in the Dataverse -> Binder workflow (which could then be used by repo2docker)?

(cc @Xarthisius)

@Xarthisius
Contributor

I don't know if that's possible with Binder / BinderHub. I'm really not affiliated with them and thus cannot speak for the project.

@mreekie mreekie moved this from NIH NetCDF (Phil) to 3️⃣▶ 💨👟SPRINT READY BACKLOG in IQSS Dataverse Project Jan 25, 2023
@mreekie mreekie moved this from 3️⃣▶ 💨👟SPRINT READY BACKLOG to 4️⃣▶⏱In This Sprint in IQSS Dataverse Project Jan 26, 2023
@pdurbin pdurbin added this to the 5.13 milestone Jan 27, 2023
@pdurbin
Member

pdurbin commented Jan 27, 2023

@siacus and I just discussed this issue. We'll be documenting Binder as the second dataset-level external tool. Whole Tale was the first:

[Screenshot taken 2023-01-27 at 11:24 AM]

@craig-willis
Contributor

@pdurbin and @siacus

After discussing this with @Xarthisius, we don't think the girder.hub.yt integration is the proper long-term solution. It was hosted there originally as an expedient way to work around how the external tools framework currently handles parameter replacement.

In short, the provided API converts the Dataverse-style external tools URL:

https://girder.hub.yt/api/v1/ythub/dataverse?datasetPid=doi:10.5072/FK2/U6AEZM&siteUrl=https://demo.dataverse.org

into the format accepted by Binder:

https://mybinder.org/v2/dataverse/{datasetPid}

For example:

curl -I "https://girder.hub.yt/api/v1/ythub/dataverse?datasetPid=doi:10.5072/FK2/U6AEZM&siteUrl=https://demo.dataverse.org"
HTTP/2 303
...
location: https://mybinder.org/v2/dataverse/10.5072/FK2/U6AEZM
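A tiny redirect service can approximate the conversion shown above. What follows is a sketch under stated assumptions, not the girder_ythub source. It reads `datasetPid` from the query string, drops a leading `doi:`, and answers with a 303 whose `Location` is the Binder URL. Like the curl output above, it ignores `siteUrl`. The names `binder_location` and `RedirectHandler` are hypothetical.

```python
from http.server import BaseHTTPRequestHandler
from urllib.parse import parse_qs, urlparse

BINDER_BASE = "https://mybinder.org/v2/dataverse/"


def binder_location(request_path):
    """Map '...?datasetPid=doi:10.5072/FK2/U6AEZM&...' to a Binder path URL.

    Raises KeyError if datasetPid is missing. parse_qs also decodes
    percent-escapes such as %3A and %2F.
    """
    query = parse_qs(urlparse(request_path).query)
    pid = query["datasetPid"][0]
    if pid.startswith("doi:"):
        pid = pid[len("doi:"):]
    return BINDER_BASE + pid


class RedirectHandler(BaseHTTPRequestHandler):
    """Answers every GET with a 303 redirect to Binder (sketch only)."""

    def do_GET(self):
        try:
            location = binder_location(self.path)
        except KeyError:
            self.send_error(400, "missing datasetPid")
            return
        self.send_response(303)
        self.send_header("Location", location)
        self.end_headers()
```

To try it locally, one could run `HTTPServer(("", 8000), RedirectHandler).serve_forever()` (from `http.server`) and repeat the curl command against `http://localhost:8000/...`.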

If the external tools handler could do variable replacement in the toolUrl, not just in query parameters, then the girder.hub.yt API would not be necessary. Something along the lines of:

{
  "displayName": "Binder",
  "description": "Run on Binder",
  "scope": "dataset",
  "type": "explore",
  "toolUrl": "https://mybinder.org/v2/dataverse/{datasetPid}",
  "toolParameters": {
    "pathParameters": [
      {
        "datasetPid": "{datasetPid}"
      }
    ]
  }
}

This looks to be almost possible today with replaceTokensWithValues used on the toolUrl (e.g., https://github.com/IQSS/dataverse/blob/develop/src/main/java/edu/harvard/iq/dataverse/externaltools/ExternalToolHandler.java#L186)
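Suppose the handler substituted `{name}` tokens directly inside `toolUrl`. The proposed definition would then expand roughly as sketched here. `replace_tokens` is a hypothetical Python stand-in for the idea, not Dataverse's `replaceTokensWithValues`. One detail is worth checking. Judging by the girder URLs earlier in this comment, Dataverse's `{datasetPid}` value carries a `doi:` prefix, while the Binder URL in the curl output does not. A naive substitution therefore yields a path that still starts with `doi:`.

```python
import re

# Matches {tokenName} placeholders such as {datasetPid} or {siteUrl}.
TOKEN = re.compile(r"\{(\w+)\}")


def replace_tokens(template, values):
    """Substitute {name} tokens with values; unknown tokens are left untouched.

    Values are inserted verbatim (no percent-encoding), so the slashes in a
    DOI become path separators, as in the Binder URL shown above.
    """
    return TOKEN.sub(lambda m: values.get(m.group(1), m.group(0)), template)


BINDER_TOOL_URL = "https://mybinder.org/v2/dataverse/{datasetPid}"

if __name__ == "__main__":
    pid = "doi:10.5072/FK2/U6AEZM"
    # Naive substitution keeps the "doi:" prefix from Dataverse's PID.
    print(replace_tokens(BINDER_TOOL_URL, {"datasetPid": pid}))
    # Stripping the prefix gives the form seen in the Location header above.
    print(replace_tokens(BINDER_TOOL_URL, {"datasetPid": pid.split(":", 1)[1]}))
```

Whether mybinder.org accepts the prefixed form is not established in this thread, so the stripped variant is the safer match for the URL format shown above.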

Just note that girder.hub.yt is a potential point of failure and requires someone outside of Binder and Dataverse to maintain the service.

@pdurbin
Member

pdurbin commented Jan 27, 2023

I just made a pull request:

@craig-willis I agree 100% with you. In that PR I also expressed the need for Dataverse's external tool framework to be able to put DOIs in the path of the URL.

However, this was also mentioned above (#6807 (comment)) nearly three years ago! 😄

For now (in the PR) I'm linking to girder_ythub but yes, yes, yes, we need to fix up our external tool framework!

@pdurbin pdurbin removed their assignment Jan 27, 2023
@craig-willis
Contributor

@pdurbin is there an existing issue or should I create one?

@pdurbin
Member

pdurbin commented Jan 27, 2023

@craig-willis there is not. Please feel free!! Thank you!! ❤️

@atrisovic
Member Author

Also, when sending files from Dataverse to Binder, they should be retrieved in their original format (i.e., CSV instead of tab-delimited), something along these lines:

http://dataverse.harvard.edu/api/access/datafile/FILE_ID?format=original
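Below is a small helper for building such URLs. It assumes the Data Access API path shown above; `original_file_url` and the file id `42` are hypothetical. As suggested above, `format=original` matters for ingested tabular files, where Dataverse otherwise serves its tab-delimited archival version.

```python
from urllib.parse import urlencode


def original_file_url(server, file_id):
    """Hypothetical helper: Data Access API URL requesting a file's original format."""
    base = server.rstrip("/")
    return f"{base}/api/access/datafile/{file_id}?" + urlencode({"format": "original"})


if __name__ == "__main__":
    print(original_file_url("https://dataverse.harvard.edu", 42))
```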

@pdurbin
Member

pdurbin commented Jan 30, 2023

@craig-willis thanks for creating this follow up issue:

Much appreciated! 🎉

@atrisovic yes, agreed, but I think we need to make a PR on repo2docker for that 🤔

kcondon added a commit that referenced this issue Jan 31, 2023
in the docs, add Binder as an external tool #6807
@mreekie mreekie moved this from ▶Sprint Kickoff! to 🚮Clear of the Backlog in IQSS Dataverse Project Feb 6, 2023
@mreekie mreekie added the pm.netcdf-hdf5.d All 3 aims are currently under this deliverable label Mar 30, 2023
@mreekie

mreekie commented Mar 30, 2023

grooming:

  • This applies towards the Jupyter notebook related aim in the funding.
  • Added tag.


6 participants