Create a one-time endpoint that sets the language version for existing descriptor files #5270

Closed
kathy-t opened this issue Dec 6, 2022 · 13 comments
Labels
enhancement review merged but pending a third party look at whether it makes sense/is working web-service

@kathy-t
Contributor

kathy-t commented Dec 6, 2022

Is your feature request related to a problem? Please describe.
PR #5266 adds the ability to track language versions for source files and entry versions. However, existing source files and entry versions do not have the language version set.

Describe the solution you'd like
Create a one-time endpoint that parses existing descriptors for the language version and sets it for the entries' source files and versions.
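As a rough illustration of what "parsing existing descriptors for the language version" could involve, here is a minimal sketch. It is not the Dockstore implementation: it assumes the version can be read with a regular expression from the top-level `cwlVersion` field of a CWL document or from the leading `version` statement of a WDL 1.0+ file, and the class and method names are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; the real endpoint would more likely reuse existing language parsers.
final class DescriptorVersionSketch {
    // CWL documents declare their version in a top-level `cwlVersion` field.
    private static final Pattern CWL_VERSION =
        Pattern.compile("^cwlVersion:[ \\t]*[\"']?([^\\s\"'#]+)", Pattern.MULTILINE);
    // WDL 1.0 and later declare it with a `version` statement; draft-2 files have none.
    private static final Pattern WDL_VERSION =
        Pattern.compile("^version[ \\t]+(\\S+)", Pattern.MULTILINE);

    /** Returns the declared language version, or empty if none is found. */
    public static Optional<String> parseVersion(String descriptorType, String content) {
        Pattern pattern;
        switch (descriptorType) {
            case "CWL": pattern = CWL_VERSION; break;
            case "WDL": pattern = WDL_VERSION; break;
            default: return Optional.empty(); // other languages are not handled in this sketch
        }
        Matcher matcher = pattern.matcher(content);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(parseVersion("CWL", "cwlVersion: v1.0\nclass: Workflow\n"));
        System.out.println(parseVersion("WDL", "version 1.0\nworkflow hello {}\n"));
        System.out.println(parseVersion("WDL", "workflow hello {}\n"));
    }
}
```

A descriptor without a version declaration (for example a draft-2 WDL file) yields an empty result, so the caller can leave the column unset rather than guess.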

Describe alternatives you've considered
n/a

Additional context
Follow up work for #4320

┆Issue is synchronized with this Jira Story
┆Fix Versions: Dockstore 1.14
┆Issue Number: DOCK-2296
┆Sprint: 105 - Colorado
┆Issue Type: Story

@coverbeck
Contributor

I just realized we're not going to be able to update frozen versions. We might need to do this in the migrations file. I just discovered a way to do custom Java migrations; maybe that's the way to go.

@coverbeck
Contributor

Further thoughts on the issue raised in my previous comment about how to handle frozen versions. By default, we will not be able to update frozen versions or their source files with this information.

  • We could do a Liquibase custom migration, where you can execute arbitrary Java code during migrations. The issue is that we haven't done this before and we'll have to set up Hibernate inside the migration. That seems complicated, looking at DockstoreWebserviceApplication and how the Hibernate stuff is tied into Dropwizard. But maybe it's not that hard; I just haven't played with setting up Hibernate much. Also, it may be something we need to do in the future, so maybe it's a good pattern to figure out.
  • Otherwise, if we want frozen versions to be updated, we could change the schema so the descriptor language versions are not directly in the sourceFile and version tables (use join tables). Seems like an ugly workaround.
  • We could just skip frozen versions, and they will never have descriptor language versions. Only a tiny percentage of published versions are actually frozen.
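To make the third option concrete, here is a minimal sketch of a backfill loop that leaves frozen versions untouched and reports which ones it skipped. Everything in it is hypothetical: `VersionRow` is a plain stand-in rather than a Dockstore entity, and the parser is passed in as a function so that the sketch makes no assumption about how the language version is actually extracted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

final class FrozenAwareBackfillSketch {
    /** Hypothetical stand-in for a workflow version row; not a Dockstore class. */
    static final class VersionRow {
        final String name;
        final boolean frozen;
        final String descriptorContent;
        String languageVersion; // stays null until backfilled

        VersionRow(String name, boolean frozen, String descriptorContent) {
            this.name = name;
            this.frozen = frozen;
            this.descriptorContent = descriptorContent;
        }
    }

    /** Sets the language version on every non-frozen row and returns the names that were skipped. */
    public static List<String> backfill(List<VersionRow> rows, Function<String, String> parser) {
        List<String> skipped = new ArrayList<>();
        for (VersionRow row : rows) {
            if (row.frozen) {
                skipped.add(row.name); // frozen versions are immutable, so leave them alone
                continue;
            }
            row.languageVersion = parser.apply(row.descriptorContent);
        }
        return skipped;
    }

    public static void main(String[] args) {
        List<VersionRow> rows = Arrays.asList(
            new VersionRow("v1", true, "version 1.0"),
            new VersionRow("v2", false, "version 1.1"));
        // Toy parser for the demo only: strips the leading "version " keyword.
        List<String> skipped = backfill(rows, content -> content.replace("version ", ""));
        System.out.println("skipped: " + skipped + ", v2 -> " + rows.get(1).languageVersion);
    }
}
```

Returning the skipped names makes it easy to check afterwards how many frozen versions were left without a language version.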

@denis-yuen @kathy-t Any thoughts?

@denis-yuen
Member

denis-yuen commented Jan 11, 2023

Well, we have done the second option with frozen versions
https://github.com/dockstore/dockstore/blob/develop/dockstore-webservice/src/main/java/io/dockstore/webservice/core/VersionMetadata.java#L47-L48

It's a @OneToOne so no join table, I think

@coverbeck
Contributor

> Well, we have done the second option with frozen versions https://github.com/dockstore/dockstore/blob/develop/dockstore-webservice/src/main/java/io/dockstore/webservice/core/VersionMetadata.java#L47-L48
>
> It's a @OneToOne so no join table, I think

Ah good point. I was aware of that one; the solution seemed hackier to me for SourceFiles, but maybe not. Maybe we should just go with this approach -- there could be something else we want to add to source files one day, e.g., my speculative idea about needing file permissions.

@denis-yuen
Member

denis-yuen commented Jan 11, 2023

It is a bit of a pain, for example since we have so many source files in the system, I wonder if there will be performance implications. But hopefully a @OneToOne will keep the cost low (or similar to adding a new column)

@coverbeck
Contributor

Ran a rough experiment, updating all versions from published workflows only (no tools/apptools, no unpublished), one language at a time, and it took:

CWL: 8.75s
WDL: 1h:27m:33s
NFL: 13m:55s
Galaxy: 3.6s
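For a sense of scale, the four timings above can be added up with `java.time.Duration`. The sketch below only does arithmetic on the numbers reported in this comment; the class and method names are made up for illustration.

```java
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Map;

final class BackfillTimingSketch {
    /** The per-language timings reported above, keyed by language. */
    public static Map<String, Duration> timings() {
        Map<String, Duration> timings = new LinkedHashMap<>();
        timings.put("CWL", Duration.ofMillis(8_750));
        timings.put("WDL", Duration.ofHours(1).plusMinutes(27).plusSeconds(33));
        timings.put("NFL", Duration.ofMinutes(13).plusSeconds(55));
        timings.put("Galaxy", Duration.ofMillis(3_600));
        return timings;
    }

    /** Sums all per-language durations. */
    public static Duration total(Map<String, Duration> timings) {
        Duration sum = Duration.ZERO;
        for (Duration d : timings.values()) {
            sum = sum.plus(d);
        }
        return sum;
    }

    /** Fraction of the total run time spent on one language. */
    public static double share(Map<String, Duration> timings, String language) {
        return (double) timings.get(language).toMillis() / total(timings).toMillis();
    }

    public static void main(String[] args) {
        Map<String, Duration> timings = timings();
        System.out.println("total seconds: " + total(timings).toMillis() / 1000.0);
        System.out.printf("WDL share: %.1f%%%n", 100 * share(timings, "WDL"));
    }
}
```

The total comes to about 1h 41m 40s, of which WDL accounts for roughly 86%.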

@denis-yuen
Member

WDL may be an over-night/batched job with unpublished as well I imagine.

@coverbeck
Contributor

> WDL may be an over-night/batched job with unpublished as well I imagine.

Actually, I just checked, and we have 2.5x more versions on published workflows than on unpublished ones, so the unpublished ones will be faster.

coverbeck added a commit that referenced this issue Feb 1, 2023
#5270 -- see PR description for more detail.
@unito-bot unito-bot added the review merged but pending a third party look at whether it makes sense/is working label Feb 1, 2023
@unito-bot

➤ Denis Yuen commented:

Documented in https://github.com/dockstore/dockstore-deploy/wiki/1.14-staging#post-deploy

Search seems consistent (if a little confusing), since workflows will appear under all language versions that they’ve ever used. Spot checking a number of languages, the results seem consistent, although we lack Galaxy results.

@unito-bot

➤ Denis Yuen commented:

Hmmm, this may be an oversight. The Galaxy plugin populates language versions on indexing (new workflows) but not in parseWorkflowContent (which is what is currently used).

@coverbeck
Contributor

Whoops, I had just assumed we hadn't implemented this for Galaxy yet, and that Galaxy only had one version anyway.

@denis-yuen
Member

Potential fix for testing and review at #5374

@denis-yuen
Member

We have a couple of versions in the Galaxy facet in search, so this seems obsolete
