Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

5028 Add dataset level external tools #6059

Open
wants to merge 7 commits into
base: develop
from

Conversation

@pdurbin
Copy link
Member

commented Jul 26, 2019

For #5028

@pdurbin pdurbin added this to Code Review 🦁 in IQSS/dataverse via automation Jul 26, 2019

@pdurbin pdurbin referenced this pull request Jul 26, 2019
@coveralls

This comment has been minimized.

Copy link

commented Jul 26, 2019

Coverage Status

Coverage increased (+0.004%) to 19.505% when pulling c0df37d on 5028-dataset-explore-btn into dffb5f6 on develop.

@pdurbin pdurbin requested review from qqmyers and lubitchv Jul 26, 2019

@@ -2016,6 +2021,7 @@ private String init(boolean initFull) {

configureTools = externalToolService.findByType(ExternalTool.Type.CONFIGURE);
exploreTools = externalToolService.findByType(ExternalTool.Type.EXPLORE);

This comment has been minimized.

Copy link
@qqmyers

qqmyers Jul 26, 2019

Member

findByType isn't filtering by scope yet - should these use findByTypeAndScope with file scope?

This comment has been minimized.

Copy link
@pdurbin

pdurbin Jul 26, 2019

Author Member

Yes, they probably should.

@@ -131,6 +147,7 @@ public static ExternalTool parseAddExternalToolManifest(String manifest) {
String displayName = getRequiredTopLevelField(jsonObject, DISPLAY_NAME);
String description = getRequiredTopLevelField(jsonObject, DESCRIPTION);
String typeUserInput = getRequiredTopLevelField(jsonObject, TYPE);
String scopeUserInput = getRequiredTopLevelField(jsonObject, SCOPE);
String contentType = getOptionalTopLevelField(jsonObject, CONTENT_TYPE);
//Legacy support - assume tool manifests without any mimetype are for tabular data
if(contentType==null) {

This comment has been minimized.

Copy link
@qqmyers

qqmyers Jul 26, 2019

Member

This is going to set a content type on dataset level tools

This comment has been minimized.

Copy link
@pdurbin

pdurbin Jul 26, 2019

Author Member

Yes, this is a bug. We don't want dataset tools to have a content type. Good catch. Thanks.

This comment has been minimized.

Copy link
@scolapasta

scolapasta Jul 26, 2019

Contributor

@qqmyers eventually we may use this - for example, show the dataset level explore only if the dataset has at least one file of content type X - but for now we're going with simple.

This comment has been minimized.

Copy link
@landreev

landreev Aug 7, 2019

Contributor

If I understand @scolapasta's suggestion - we may have a use for the content type, with some dataset-level external tools; but it sounds like it still needs to be nullable in the table; and it still should be null by default.
In other words, if we adopt this scheme, then we definitely DON'T want to set the type to tab-delimited by default.

ReservedWord reservedWord = ReservedWord.fromString(value);
if (reservedWord.equals(ReservedWord.FILE_ID)) {
allRequiredReservedWordsFound = true;
if (scope.equals(Scope.FILE)) {

This comment has been minimized.

Copy link
@qqmyers

qqmyers Jul 26, 2019

Member

else Scope.DATASET - just not written yet?

This comment has been minimized.

Copy link
@pdurbin

pdurbin Jul 26, 2019

Author Member

Well, I'm not sure. Should dataset tools have any reserved words that are required?

This comment has been minimized.

Copy link
@qqmyers

qqmyers Jul 26, 2019

Member

without an else statement, allRequiredReservedWordsFound is false and an exception happens so this method doesn't work for scope dataset as is.
w.r.t. reserved words - it would be odd/bad to configure a dataset tool to not get the dataset id (or pid) in the same way that a file tool needs a file id (or pid if that's implemented).

This comment has been minimized.

Copy link
@qqmyers

qqmyers Jul 26, 2019

Member

ah sorry - I see the exception is in the if statement - probably should not define the allRequiredWordsFound outside the if clause then (to avoid confusing people like me :-) )

- ``{apiToken}`` (optional) - The Dataverse API token of the user launching the external tool, if available.
- ``{datasetId}`` (optional) - The ID of the dataset containing the file.
- ``{datasetId}`` (optional) - The ID of the dataset.
- ``{datasetPid}`` (optional) - The Persistent ID (DOI or Handle) of the dataset.

This comment has been minimized.

Copy link
@qqmyers

qqmyers Jul 26, 2019

Member

Why this and not file PID too? Since Dataverse is always generating this, I'm not sure it helps anyone to have both id and pid (a tool can get the pid), but it seems like file and dataset should be the same.
Also - right now we check for fileId existing (for file scope) as a required condition. For dataset, and file if it is made parallel, the condition will have to change to be that id or pid is required.

This comment has been minimized.

Copy link
@pdurbin

pdurbin Jul 26, 2019

Author Member

Good point. The focus right now is on dataset level tools but sure, if a file PID exists it would hurt to expose it. Tool makers should just understand that they might not always get a file PID.

I'm not sure if I grok your other comment but thanks for making it. 😄 Someone else will be picking this up while I'm on vacation, I suspect.

@@ -0,0 +1,3 @@
ALTER TABLE externaltool ADD COLUMN IF NOT EXISTS scope VARCHAR(255);
UPDATE externaltool SET scope = 'FILE';
ALTER TABLE externaltool ALTER COLUMN scope SET NOT NULL;

This comment has been minimized.

Copy link
@qqmyers

qqmyers Jul 26, 2019

Member

should content type now be nullable (in the db and class definition) since it isn't used for scope dataset?

This comment has been minimized.

Copy link
@pdurbin

pdurbin Jul 26, 2019

Author Member

As mentioned above, due to a bug in the pull request, dataset tools get the default content type. Yes, I think this field should be made nullable (and the bug fixed).

This comment has been minimized.

Copy link
@landreev

landreev Aug 7, 2019

Contributor

So what is the status of this - is this being addressed? I.e., is the bug in ExternalToolServiceBean getting fixed in this PR, and is ContentType being made nullable?

@qqmyers
Copy link
Member

left a comment

Made some comments re: minor changes. Overall, it looks OK but incomplete w.r.t. scope dataset. I didn't run it, but it looks like it should not break file scope tools, so it should be OK to merge.

@djbrooke djbrooke moved this from Code Review 🦁 to IQSS Team Dev 💻 in IQSS/dataverse Jul 26, 2019

@djbrooke djbrooke moved this from IQSS Team Dev 💻 to IQSS Sprint 7/24 - 8/7 in IQSS/dataverse Jul 29, 2019

@djbrooke djbrooke moved this from IQSS Sprint 7/24 - 8/7 to IQSS Team Dev 💻 in IQSS/dataverse Jul 31, 2019

@djbrooke

This comment has been minimized.

Copy link
Contributor

commented Jul 31, 2019

Thanks @sekmiller for picking this up. Question for you, @scolapasta and @pdurbin:

Was there any discussion about how/if Dataset explorations should be recorded in the guestbook table? At the file level, I believe the explore tool name is added. Happy to discuss and sorry if I missed some communication or the spot in the code where this was reflected.

@scolapasta

This comment has been minimized.

Copy link
Contributor

commented Jul 31, 2019

It's a good question - it has not been discussed. It's a challenge because guestbook responses are at the file level.

The other thing that we need to think about are restrictions? if one or more files are restricted, do we show this button?

@djbrooke

This comment has been minimized.

Copy link
Contributor

commented Jul 31, 2019

Thanks @scolapasta. When the Dataset explore is initiated, does the tool download all the files or does it download them as needed (or will this differ from tool to tool)? If it's the former, I'd be OK with guestbook records being created for all files in the dataset when a dataset explore is initiated and reflecting the dataset explore tool's name in the table. If the latter, we can create the records as the tool pulls them. Better ideas or corrections welcome.

I'm not super worried about the some-restricted-files use case right now as the tools being focused on at this time are geared towards replication of datasets from journals which are mostly (all?) public. We will definitely need to worry about it in the future for these replication tools, both for restricted files and for files in TRSAs, and also as more tools are added with additional use cases that are not replication focused.

Note that we do need to provide the explore option pre-publish, so we'll already be dealing with files that can only be accessed by certain users, not sure if there's something that can be leveraged there.

@qqmyers

This comment has been minimized.

Copy link
Member

commented Jul 31, 2019

If the dataset level tools work as the file level ones, they'll retrieve files as needed and either be restricted to public files, or, if configured to be sent an api key, the ones the user launching them can see. So I think security is always maintained, but the question of whether the button should be shown is separate and depends somewhat on whether/how well Dataverse can determine if the tool can retrieve all files it will need and therefore be able to run. A binary yes if the tool can get all files /no if it can't wouldn't be hard to calculate, but if tools only depend on some files, one would need a way to decide if it can get the 'right' files...

@scolapasta

This comment has been minimized.

Copy link
Contributor

commented Jul 31, 2019

I think it will depend on the tool, but we should err on the side of a tool pulling what is needed. But as currently designed it does present a challenge.

The way file explore currently works (correct me if I'm wrong @sekmiller @qqmyers) is that wen you click the explore button you get the popup, you fill it out and a guestbookresponse / download is created. When the explore tool downloads the file, it sends an extra parameter to not write another guestbook entry (in order to not double count).

I think once we have the concept of guestbook entries existing independent of downloads, it should be easier, because then download record wouldn't be created until the tool actually fetches a file (and attaches to the existing download). But that doesn't seem in scope for this issue.

Re: restricted, my question is really about when to show the button. Draft shouldn't be an issue, because if you can see the draft you can download all the files. We could go with something simple like if all files are available to you, show the button, or if at least one file is available, show the button. I lean towards the former for now, as that would still work for all cases were the files are all public and seems "safer". (not from a security perspective, but from a "will the user be confused" perspective).

@scolapasta

This comment has been minimized.

Copy link
Contributor

commented Jul 31, 2019

@qqmyers yes, I think we're saying the same thing (our comments crossed paths).

@djbrooke

This comment has been minimized.

Copy link
Contributor

commented Jul 31, 2019

Thanks @qqmyers and @scolapasta. If I'm understanding correctly, the only truly scalable solution here seems to be to always show the button and rely on the external tool to provide the messaging about why it can't work with the files in the dataset, instead of trying to add logic to the button display.

@djbrooke djbrooke changed the title Add dataset level external tools 5028 Add dataset level external tools Aug 5, 2019

@sekmiller sekmiller removed their assignment Aug 5, 2019

@pdurbin pdurbin moved this from IQSS Team Dev 💻 to Code Review 🦁 in IQSS/dataverse Aug 6, 2019

@landreev
Copy link
Contributor

left a comment

In addition to the individual comments made earlier, it could be helpful to add a few words in the issue on how to QA this.
Presumably, all that's needed is to a) verify that any existing file-level tools are properly migrated and are still working and b) add a dummy dataset-level tool (using the json for the "Awesome Tool" as a model?); verify that the api works; and that the tool button is shown on the page. (anything else at this point?)

@qqmyers

This comment has been minimized.

Copy link
Member

commented Aug 8, 2019

I'd add checking that the file-level tools still work after a dataset-level tool is added to the database, i.e. to make sure the file-level code to list tools properly ignores the dataset-level one(s).

@djbrooke djbrooke moved this from Code Review 🦁 to IQSS Team Dev 💻 in IQSS/dataverse Aug 8, 2019

@pdurbin pdurbin referenced this pull request Aug 9, 2019
@dataversebot

This comment has been minimized.

Copy link

commented Aug 15, 2019

Can one of the admins verify this patch?

@djbrooke djbrooke moved this from IQSS Team Dev 💻 to IQSS Sprint 8/14 - 8/28 in IQSS/dataverse Aug 15, 2019

@scolapasta

This comment has been minimized.

Copy link
Contributor

commented Aug 21, 2019

@djbrooke asked me to try and summarize what is remaining on this issue. I reviewed all the comments (and did a quick review myself), and this is what I could come up with (but feel free to suggest other things if I missed something):

  • allow content type to be null (since Dataset level tools won't require it (and in theory, there may be a space for a file level tool that could work any any tool?
  • add File PID as parameter
  • require proper parameter based on scope (i.e. currently requires fileID for files; should require either dataset id or dataset pid for dataset scope)
  • on DatasetPage change the finds for file tools to use findByScopeAndType
@@ -59,6 +62,15 @@ To list all the external tools that are available in Dataverse:

``curl http://localhost:8080/api/admin/externalTools``

Showing an External Tool in Dataverse

This comment has been minimized.

Copy link
@scolapasta

scolapasta Aug 21, 2019

Contributor

What's the purpose / need for this API? (besides having it in a test?)

This comment has been minimized.

Copy link
@pdurbin

pdurbin Aug 21, 2019

Author Member

It's for sysadmins to check which external tools have been installed.

This comment has been minimized.

Copy link
@scolapasta

scolapasta Aug 22, 2019

Contributor

Sure, for curl http://localhost:8080/api/admin/externalTools, which lists the tools, I'm asking specifically about this new one that takes an id. i.e if you have the id already then you already know about the tool. Doesn't seem to add any value, but maybe I'm missing something.

This comment has been minimized.

Copy link
@pdurbin

pdurbin Aug 22, 2019

Author Member

I can imagine a somewhat paranoid sysadmin or tool maker adding a tool and then wanting to double check that correct values were saved in the database. For example, here I'm asserting that the tool that was just added is a dataset level tool but you could make assertions on other things:

    Response getTool = UtilIT.getExternalTool(id);
    getTool.prettyPrint();
    getTool.then().assertThat()
            .body("data.scope", CoreMatchers.equalTo("dataset"))
            .statusCode(OK.getStatusCode());

To me, this is all basic CRUD stuff. We're doing a read of a single item. I meant to add this back when I first implemented external tools but forgot about it. I'm trying to correct an oversight, something I forgot to add.

@pdurbin pdurbin self-assigned this Aug 21, 2019

@pdurbin pdurbin moved this from IQSS Sprint 8/14 - 8/28 to IQSS Team Dev 💻 in IQSS/dataverse Aug 21, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.