Save datasets in Galaxy history from jupyterlab notebook #1157

anuprulez · 2021-09-10T15:29:37Z

This PR adds a Galaxy tool for processing long-running jobs emitted from a script in a JupyterLab notebook, such as the one created here. The tool is useful for running scripts that take a long time to finish as Galaxy jobs, for example training a machine learning or deep learning model. Once the job finishes, the trained model (in ONNX format) is available in the Galaxy history.

Steps to run this tool:

1. Serve Galaxy locally via planemo, with this tool loaded.
2. Generate an API key from the running Galaxy instance.
3. Update the API key in the notebook https://github.com/anuprulez/jupyterlabtool/blob/master/upload_jltools.ipynb and execute it. The notebook connects to the local Galaxy instance via bioblend and executes the saved script on Galaxy. Once the job finishes, the trained model is available as an ONNX file in a newly created Galaxy history.
Note: the ONNX file format is now available in Galaxy's dev branch.
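Step 3 amounts to waiting until the Galaxy job reaches a final state and then fetching the ONNX file. The sketch below is a generic polling loop, not code from the notebook; the set of terminal job states and the bioblend calls shown in the comments (`gi.jobs.show_job`, `gi.datasets.download_dataset`) are assumptions to check against your bioblend and Galaxy versions, and `job_id` / `onnx_dataset_id` are placeholders.

```python
import time
from typing import Callable

# Galaxy job states treated as final here. This set is an assumption;
# other states (e.g. "paused") may need handling on your server.
TERMINAL_STATES = {"ok", "error", "deleted"}


def wait_for_job(get_state: Callable[[], str],
                 poll_interval: float = 30.0,
                 timeout: float = 24 * 3600.0,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Call get_state() until it returns a terminal state; return that state.

    Raises TimeoutError once roughly `timeout` seconds of polling have passed
    (only the sleep time is counted, not the time spent in get_state()).
    """
    waited = 0.0
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if waited >= timeout:
            raise TimeoutError(f"job still {state!r} after {waited:.0f} s")
        sleep(poll_interval)
        waited += poll_interval


# Possible use with bioblend (not executed here; job_id and
# onnx_dataset_id are placeholders):
#   from bioblend.galaxy import GalaxyInstance
#   gi = GalaxyInstance(url=url, key=key)
#   state = wait_for_job(lambda: gi.jobs.show_job(job_id)["state"])
#   if state == "ok":
#       gi.datasets.download_dataset(onnx_dataset_id, file_path=".",
#                                    use_default_filename=True)
```

Passing `get_state` and `sleep` in as functions keeps the loop independent of bioblend, so it can be exercised without a running Galaxy server.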

This work is currently in progress; tool tests are not yet written. Your comments are welcome! Thanks!

mvdbeek · 2021-09-16T10:03:26Z

What do you think about using a collection with discover_outputs? This should have a similar effect, be better performance-wise, and work better in workflows, history extraction, etc.
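One way to read this suggestion: instead of declaring one fixed `<data>` output per object type, the script writes every saved object into a single directory, and the tool declares an output `<collection>` whose `<discover_datasets>` element (for example `pattern="__designation_and_ext__" directory="outputs"`) picks the files up. The Python sketch below covers only the script side; `write_for_discovery`, the `outputs` directory and the name-sanitizing rule are hypothetical illustrations, not part of this PR.

```python
import pathlib
import re

# Hypothetical naming rule: keep designations free of dots and spaces so that a
# "<designation>.<ext>" file name splits unambiguously into name and extension.
_UNSAFE = re.compile(r"[^A-Za-z0-9_-]+")


def write_for_discovery(outputs: dict, out_dir: str = "outputs") -> list:
    """Write each {name: (payload_bytes, ext)} entry to out_dir/<name>.<ext>.

    Returns the sorted list of file names written.
    """
    directory = pathlib.Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)
    written = []
    for name, (payload, ext) in outputs.items():
        designation = _UNSAFE.sub("_", name)
        target = directory / f"{designation}.{ext}"
        target.write_bytes(payload)
        written.append(target.name)
    return sorted(written)
```

With this layout, adding a new kind of saved object only means writing one more file; the tool XML does not need a new `<data>` element for it.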

bgruening · 2021-09-22T07:01:39Z

tools/jupyter_job/run_jupyter_job.xml

@@ -0,0 +1,80 @@
+<tool id="run_jupyter_job" name="Run long running jupyterlab job" version="0.0.1">


@anuprulez, can you check whether a hidden="true" option exists here?

Please also add a recent profile, e.g. profile="21.05".

bgruening · 2021-09-22T07:02:47Z

tools/jupyter_job/run_jupyter_job.xml

@@ -0,0 +1,80 @@
+<tool id="run_jupyter_job" name="Run long running jupyterlab job" version="0.0.1">
+    <description>on GPUs</description>
+    <requirements>


I thought this needs to run in a Docker container? We should make this explicit; for security reasons, we don't want to run this in Conda.

bgruening · 2021-09-25

ping @anuprulez

anuprulez · 2021-09-27

This dynamic code execution is the same as executing code written in any other interactive tool. As far as I understand, the execution command from this tool can be sent to any isolated VM.

bgruening · 2021-09-27

Yes, but this tool is far less secure than any other tool, right? So making sure it runs in Docker, and not accidentally in Conda, makes it a little more secure?

anuprulez · 2021-09-27

I agree; running this tool in a Docker container will automatically run it in an isolated environment.

mvdbeek · 2021-09-29

Moving to an explicit container requirement does not in any way guarantee that this is safe. The right way for now is to make it an interactive tool. The medium-term solution is a tag that means we require containerized execution and pick a destination that can run it (effectively an interactive tool without open ports).

anuprulez · 2021-09-29

The current tool is meant to be executed from the JupyterLab notebook, which is already an interactive tool. I am not sure whether interactive tools can interact with one another?

mvdbeek · 2021-09-29

No problem, they're just tools.
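The commit history of this PR mentions "Add compilation step before exec" and "add only globals". A minimal sketch of that idea, assuming the tool compiles the submitted script and then executes it in a dedicated globals dictionary, is shown here; `run_user_script` and `user_globals` are hypothetical names, not the tool's actual code. It also illustrates why the reviewers pushed for containerized execution: a separate namespace gives no isolation, since the script still has full access to builtins, imports, the file system and the network.

```python
def run_user_script(source: str, filename: str = "<user-script>") -> dict:
    """Compile, then execute, a script in its own globals dict and return that dict.

    WARNING: this is not a sandbox. The script can import modules, read and
    write files and open network connections like any other Python code.
    """
    # Compiling first surfaces syntax errors before any user code runs.
    code = compile(source, filename, "exec")
    # A fresh globals dict keeps the script's names out of the caller's module;
    # "__main__" lets `if __name__ == "__main__":` blocks in the script run.
    namespace = {"__name__": "__main__"}
    exec(code, namespace)
    return namespace


def user_globals(namespace: dict) -> dict:
    """Names the script defined itself (drops __name__, __builtins__, ...)."""
    return {k: v for k, v in namespace.items() if not k.startswith("__")}


if __name__ == "__main__":
    ns = run_user_script("import math\nradius = 2.0\narea = math.pi * radius ** 2\n")
    print(sorted(user_globals(ns)))  # ['area', 'math', 'radius']
```

Returning the globals dictionary is one way the tool could then pick out models, arrays and primitives to save as outputs.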

bgruening · 2021-09-22T07:05:29Z

tools/jupyter_job/run_jupyter_job.xml

+        <data format="h5" name="outfile_output_arrays" label="Saved arrays"></data>
+    </outputs>
+    <tests>
+        <test>


Can you add a test for the number of expected outputs?

bgruening · 2021-09-22T07:05:52Z

tools/jupyter_job/run_jupyter_job.xml

+            <output name="outfile_output_model" file="scikit-script-model.onnx" ftype="onnx" compare="sim_size" delta="50" />
+        </test>
+        <test>
+            <param name="select_file" value="tf-script.py"/>


Why not combine this test with the first one?

bgruening · 2021-09-22T07:06:11Z

tools/jupyter_job/run_jupyter_job.xml

+            </output>
+        </test>
+        <test>
+            <param name="select_file" value="scikit-script.py"/>


And this test with the second?

anuprulez · 2021-11-30T15:04:15Z

Steps to run this tool (after this is merged and available on Galaxy EU):

1. Open Galaxy's interactive tool (https://live.usegalaxy.eu/?tool_id=interactive_tool_ml_jupyter_notebook).
2. Write a script as an .ipynb file (or copy the entire script from: https://github.com/anuprulez/galaxytools/blob/run_job/tools/jupyter_job/test-data/tf-script.py).
3. Open another .ipynb tab, then write and execute the following script:

url = "https://usegalaxy.eu/"  # your Galaxy's URL
key = "<your Galaxy API key>"
file_path = "<path of the .ipynb script, relative to this notebook>"
tool_output = run_script_job(file_path)
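The notebook snippet above relies on a `run_script_job` helper whose implementation is not shown here. A rough bioblend equivalent might create a history, upload the script and start the tool. In the sketch below, `submit_script_job` and `build_tool_inputs` are hypothetical and take `url` and `key` explicitly; `select_file` is the parameter name used in the tool tests, and on a server that installed the tool from the Tool Shed the tool id is usually a longer Tool Shed GUID rather than plain `run_jupyter_job`.

```python
TOOL_ID = "run_jupyter_job"  # id from run_jupyter_job.xml; may differ on your server


def build_tool_inputs(dataset_id: str) -> dict:
    """Legacy-format tool inputs pointing 'select_file' at a history dataset."""
    return {"select_file": {"src": "hda", "id": dataset_id}}


def submit_script_job(url: str, key: str, file_path: str,
                      history_name: str = "Jupyter job") -> dict:
    """Hypothetical helper: upload a script to a new history and run the tool on it."""
    # Imported here so build_tool_inputs() can be used without bioblend installed.
    from bioblend.galaxy import GalaxyInstance

    gi = GalaxyInstance(url=url, key=key)
    history = gi.histories.create_history(name=history_name)
    upload = gi.tools.upload_file(file_path, history["id"])
    dataset_id = upload["outputs"][0]["id"]
    # Galaxy queues the job and returns immediately; the response lists its jobs and outputs.
    return gi.tools.run_tool(history["id"], TOOL_ID, build_tool_inputs(dataset_id))
```

Because `run_tool` returns as soon as the job is queued, a notebook would still need to poll the job state before downloading the trained model.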

bgruening

nice, thanks a lot @anuprulez

anuprulez and others added 11 commits August 9, 2021 19:42

Update tool def

98bee96

update tool

0ea3998

Update tool

2b9a237

Update tool

4b5403f

Update tool

67de398

Add yaml safe

d4edcc0

Add joblib

1d0a98a

Add tf to onnx conversion

87a6742

Merge branch 'bgruening:master' into run_job

5b4a9e9

Add sk2onnx

2f1929c

Merge branch 'run_job' of github.com:anuprulez/galaxytools into run_job

99ad538

anuprulez and others added 13 commits September 16, 2021 12:15

Add supported model types

fd4f590

update main and add tests

1efd665

update model and sim size

aefe283

update sim size

44b991d

add shed yml file

8694532

Fix linting

71b4def

add tests

a2a9108

Save arrays and lists

2e4d967

update tests

f8a55cd

update script

1eda129

Fix linting error

0ba6094

Fix imports

b1085fc

Merge branch 'bgruening:master' into run_job

2b5793d

bgruening reviewed Sep 22, 2021


anuprulez added 4 commits September 22, 2021 15:35

Fix review comments

6d2e47d

Merge branch 'run_job' of github.com:anuprulez/galaxytools into run_job

3c988d0

Add compilation step before exec

70a44f4

add only globals

e9f9d2d

anuprulez and others added 2 commits October 6, 2021 16:44

Move tf import inside a method

c4f8f0c

Merge branch 'bgruening:master' into run_job

a9f37bd

anuprulez changed the title ~~[RFC] Save datasets in Galaxy history from jupyterlab notebook~~ Save datasets in Galaxy history from jupyterlab notebook Oct 7, 2021

anuprulez and others added 22 commits October 8, 2021 11:20

Add support for python primitives

ed4f63b

Merge branch 'run_job' of github.com:anuprulez/galaxytools into run_job

0957c57

Merge branch 'bgruening:master' into run_job

a6b03cb

update container

c9fab2b

Merge branch 'run_job' of github.com:anuprulez/galaxytools into run_job

3888676

Write working dir files to zip

3ebe64d

update tests

2951748

Fix issue with input datasts

a009448

update scripts

f958290

Fix linting

7cce0d1

Merge branch 'bgruening:master' into run_job

7444917

remove requests

11cae57

Merge branch 'run_job' of github.com:anuprulez/galaxytools into run_job

41777d0

Merge branch 'bgruening:master' into run_job

a028046

update tf operation set and model save

d4365dd

Merge branch 'run_job' of github.com:anuprulez/galaxytools into run_job

49a2747

update tool

bd78677

update number of outputs

7b63c14

Fix failing tests

17b1bd5

update tool

cc1aa3e

update hidden and profile atts

f16ee5c

Merge branch 'bgruening:master' into run_job

19c9094

anuprulez added 2 commits November 30, 2021 16:04

Merge branch 'bgruening:master' into run_job

618519d

Merge branch 'bgruening:master' into run_job

e0598ea

bgruening approved these changes Dec 11, 2021


bgruening merged commit f945b1b into bgruening:master Dec 11, 2021
