
Introducing workflows #695

Merged

merged 38 commits into dipy:master on Jan 13, 2016

Conversation

matthieudumont
Contributor

Introducing the workflows structure with the median_otsu demonstration workflow


def median_otsu_bet(input_file, out_dir, save_masked=False, median_radius=4,
                    numpass=4, autocrop=False, vol_idx=None, dilate=None):

Contributor


docstring?

@Garyfallidis
Contributor

@chrisfilo, @oesteban and @arokem, with this workflow design I think it would be trivial to use nipype to call these workflows from dipy. Can you think about this and let me know your thoughts? @matthieudumont, I, and hopefully others will be working on making many of these available in dipy very soon. We want workflows that can be called very easily from other pipelines, but that also process multiple files and, hopefully, multiple parameters. The trick here is to have all workflows in dipy.workflows take string parameters. In this PR we give an example using median_otsu, an algorithm for scalp removal from EPI images. Let me know what you think. Thanks in advance.

@chrisgorgo
Contributor

Yes, it would be fairly easy to wrap such functions. I'm not sure "workflow" is the best name (since it suggests many processing steps and a long running time), but that's a minor point.

I know we have talked about this numerous times, but I would like to once again encourage you to write a nipype interface instead of using argparse. This way, by writing the code once, you get:

  • command line functionality
  • integration with Nipype
  • integration with CBrain (through boutiques)

@arokem
Contributor

arokem commented Aug 5, 2015

I'm with @chrisfilo on this one. It seems like a duplication of effort to do this here, and if we start down this path, we will eventually have to reinvent all the wheels that nipype has already invented.

For example, it's clear that there are already emerging patterns here that appear as boilerplate in this PR's code: checking whether output locations exist, whether the required files exist (and if not, what needs to be done), and so on.

It seems like a better use of time to write simplified APIs in dipy for processes that require many steps (e.g. registration with the SyN algorithm, see my attempts over on #673), and write the workflows in nipype.

@Garyfallidis
Contributor

Hi @chrisfilo. Thank you for your quick feedback. "Workflow" is used in different projects with different meanings. Nonetheless, I think it is a good name precisely because in dipy.workflows many functions/methods are called from the individual workflows. The workflow introduced here is one of the simplest, but there will be other, more complicated ones. There will also be some generic ones that let you run a complete dwi analysis with a single command.

About the nipype interfaces: as I said before, I am very happy to write nipype interfaces. But I also want to be as general as possible, so I would like to support both argparse and nipype, and to be able to call this from a web client, from another script, etc.

So what I would like you to tell me is the best way to write an interface for dipy.workflows.segment.median_otsu_flow.

Is this a good example to start from?
https://github.com/nipy/nipype/blob/master/nipype/interfaces/dipy/tracks.py

If I remember correctly, I need to write an InputSpec, an OutputSpec and a BaseInterface; is this correct?

So, assuming that all inputs and outputs are given as strings, can you give me a recommendation of how these classes would look?
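For concreteness, here is a rough sketch of what such an interface could look like. The import is guarded so the snippet degrades gracefully when nipype is absent; the class names, traits and output naming below are illustrative guesses, not dipy's or nipype's actual median_otsu interface.

```python
# Hypothetical sketch of a nipype interface wrapping a median_otsu-style
# workflow; all names here are illustrative, not dipy's real interface.
try:
    from nipype.interfaces.base import (BaseInterface, BaseInterfaceInputSpec,
                                        TraitedSpec, File, traits)
    HAVE_NIPYPE = True
except ImportError:
    HAVE_NIPYPE = False

if HAVE_NIPYPE:
    class MedianOtsuInputSpec(BaseInterfaceInputSpec):
        in_file = File(exists=True, mandatory=True, desc='input EPI volume')
        median_radius = traits.Int(4, usedefault=True,
                                   desc='median filter radius')
        numpass = traits.Int(4, usedefault=True,
                             desc='number of median filter passes')

    class MedianOtsuOutputSpec(TraitedSpec):
        out_file = File(desc='brain-extracted volume')

    class MedianOtsu(BaseInterface):
        input_spec = MedianOtsuInputSpec
        output_spec = MedianOtsuOutputSpec

        def _run_interface(self, runtime):
            # here one would call dipy.segment.mask.median_otsu and save
            # the masked volume next to the input
            return runtime

        def _list_outputs(self):
            outputs = self._outputs().get()
            outputs['out_file'] = self.inputs.in_file.replace(
                '.nii.gz', '_bet.nii.gz')
            return outputs
```

The `_run_interface`/`_list_outputs` split is nipype's standard BaseInterface pattern: the first does the work, the second declares where the results landed.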

@Garyfallidis
Contributor

Hi @arokem. I disagree that moving all the workflows into nipype would be a complete solution for dipy. I think we really need functions that handle I/O without extra dependencies. It will also be much easier to have our quality assurance workflows tested first by the dipy developers, and then submit the best ones to nipype. And these functions can be called from our visualization module right now, without waiting for a nipype release. I would like to play with a new technology before making radical decisions, so please do not rush to conclusions.

@Garyfallidis
Contributor

As for checking whether files exist: I think those checks are trivial to write. But our scripts also allow, in a beautiful way, running multiple subjects sitting in different directories (using * and other wildcards), and possibly even multiple parameters and multiple outputs, from a single command. I love this option. For example, you can write

dipy_median_otsu "Data/subj_1*0/t1.nii.gz" ...

and this would process all subjects starting with 1 and ending with 0.
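The quoted-wildcard pattern described here can be expanded inside Python with the standard glob module, which behaves the same on every platform. A minimal sketch (the subject directories are fabricated in a temporary directory just for the demo):

```python
# Minimal sketch: expand a quoted wildcard pattern the same way on any OS.
# The shell never sees the pattern; glob does the expansion.
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    # fake a few subject directories, each with a t1 image
    for name in ('subj_100', 'subj_120', 'subj_199'):
        os.makedirs(os.path.join(base, name))
        open(os.path.join(base, name, 't1.nii.gz'), 'w').close()

    # "subj_1*0" matches subjects starting with 1 and ending with 0
    matches = sorted(glob.glob(os.path.join(base, 'subj_1*0', 't1.nii.gz')))
    found = [os.path.basename(os.path.dirname(m)) for m in matches]
    print(found)
    # → ['subj_100', 'subj_120']
```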

If I remember correctly from what @chrisfilo said when we met in OHBM this is still experimental or under development in nipype.

@chrisgorgo
Contributor

This is a good starting point:
http://nipy.org/nipype/devel/interface_specs.html

Python example:
http://nipy.org/nipype/devel/python_interface_devel.html

Real world example from nipy:
https://github.com/nipy/nipype/blob/master/nipype/interfaces/nipy/utils.py#L28

On the subject of command line inputs: your current syntax is equivalent to how bash deals with wildcards, but it will only work if users specify the parameters in parentheses. I would recommend leaving wildcard expansion to bash (it's more intuitive for any command line user) and writing your function so that it accepts lists of files instead of strings with wildcards.

In a nipype input specification, lists of files are described this way:

input_files = InputMultiPath(File(exists=True), desc='list of files')

This will accept "/path/file" and ["/path/file"] as well as ["/path/file1", "/path/file2"].

Let me know if you have any specific questions!

@Garyfallidis
Contributor

Hi @chrisfilo, thanks for the links! Cool, I will have something ready for you to see asap. Will do my best.

Aha, here you say:

Your current syntax is equivalent to how bash is dealing with wildcards, but will only work if users will specify parameters in parentheses.

I suppose you mean quotes, not parentheses. In my investigation I realized that if I leave the shell to parse for multiple files, the commands will not be multi-platform. Yes, the user will have to write those quotes, but doing so makes the dipy/bin commands run exactly the same across many platforms, with no assumption about bash or the operating system. That means they can also be run from Windows, which is an advantage for a library. I think it will be easy for users to learn to use quotes. And if they don't, they can always use the nipype version or just run the commands with single files.

@chrisgorgo
Contributor

A modified version of this solution: http://stackoverflow.com/a/12501893 would be a more robust way of dealing with the Windows problem. This way you can avoid quotes altogether and use nargs in argparse. We can include this in nipype_cmd if it will make your life easier.

Best,
Chris
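A sketch of the approach suggested here: collect `nargs='+'` arguments with argparse and let a custom action expand any wildcards itself, so Windows shells that don't glob behave like bash. The action name and CLI shape below are hypothetical, not the Stack Overflow answer's exact code:

```python
# Sketch: expand wildcards in argparse arguments ourselves, so the command
# works the same whether or not the shell globs. Names are illustrative.
import argparse
import glob

class GlobbingAction(argparse.Action):
    """Expand wildcard patterns in place; pass literal paths through."""
    def __call__(self, parser, namespace, values, option_string=None):
        expanded = []
        for value in values:
            matches = sorted(glob.glob(value))
            expanded.extend(matches if matches else [value])
        setattr(namespace, self.dest, expanded)

parser = argparse.ArgumentParser()
parser.add_argument('input_files', nargs='+', action=GlobbingAction)

# literal arguments with no matching files on disk pass through unchanged
args = parser.parse_args(['a.nii.gz', 'b.nii.gz'])
print(args.input_files)
```

Since unmatched patterns fall through as literals, the same script accepts both pre-expanded file lists (from bash) and raw patterns (from cmd.exe).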

@Garyfallidis
Contributor

Cool! I need to get hands on with your suggestions. Will get back to you asap! 👍

@Garyfallidis
Contributor

Hi, just a quick update here so that the others know how this is going. A few days ago I proposed a new design to @matthieudumont which will allow automatic generation of command line interfaces and nipype interfaces from the workflow functions, using introspection. Mr. Dumont is currently working on extending this idea, first for the command line interfaces; once we have a first version of that, I will work on the objects that will generate the nipype interfaces automatically too.
If you want a sneak peek right here and right now, here is an example of this idea implemented for the median_otsu interface:
https://github.com/Garyfallidis/dipy/blob/introducing_workflows/bin/dipy_median_otsu
https://github.com/Garyfallidis/dipy/blob/introducing_workflows/dipy/workflows/base.py
https://github.com/Garyfallidis/dipy/blob/introducing_workflows/dipy/workflows/segment.py
Or you can wait until this PR is updated.
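To make the introspection idea concrete, here is a minimal sketch (not dipy's actual base.py) of building an argparse parser directly from a workflow function's signature. The dummy workflow and the parser-builder below are illustrative stand-ins:

```python
# Rough sketch: derive a CLI from a workflow function's signature via
# introspection, so each workflow gets a command line interface for free.
import argparse
import inspect

def median_otsu_flow(input_file, out_dir='', save_masked=False,
                     median_radius=4):
    """Dummy workflow standing in for the real median_otsu_flow."""
    return input_file, out_dir, save_masked, median_radius

def build_parser(flow):
    spec = inspect.getfullargspec(flow)
    # map each keyword argument to its default value
    defaults = dict(zip(reversed(spec.args), reversed(spec.defaults or ())))
    parser = argparse.ArgumentParser(prog=flow.__name__)
    for arg in spec.args:
        if arg in defaults:
            default = defaults[arg]
            if isinstance(default, bool):
                parser.add_argument('--' + arg, action='store_true')
            else:
                parser.add_argument('--' + arg, type=type(default),
                                    default=default)
        else:
            parser.add_argument(arg)  # required positional
    return parser

args = build_parser(median_otsu_flow).parse_args(
    ['t1.nii.gz', '--median_radius', '6'])
print(args.input_file, args.median_radius, args.save_masked)
# → t1.nii.gz 6 False
```

The bool check must come before the generic `type(default)` branch, since flags want `store_true` rather than a typed value.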

@Garyfallidis
Contributor

@matthieudumont there is some weird failure with Python 3. Can you rebase and see if you keep getting the same error?

import inspect


try:
Contributor


You might want to use the optional_package machinery here. For an example, see: https://github.com/nipy/dipy/blob/master/dipy/viz/fvtk.py#L32
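The optional_package pattern referenced here imports a dependency if it is available and otherwise returns a flag so callers can degrade gracefully. A simplified stand-in (written with importlib just for illustration, not dipy's actual helper):

```python
# Sketch of the optional-dependency pattern: import a package if present,
# otherwise signal its absence instead of crashing at import time.
import importlib

def optional_package(name):
    """Return (module_or_None, have_package) for an optional dependency."""
    try:
        return importlib.import_module(name), True
    except ImportError:
        return None, False

json_mod, have_json = optional_package('json')         # stdlib: present
fake_mod, have_fake = optional_package('no_such_pkg')  # hypothetical: absent
print(have_json, have_fake)
# → True False
```

dipy's real helper additionally returns a trip-wire object that raises a helpful error only when the missing module is actually used.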

@arokem
Contributor

arokem commented Sep 18, 2015

Sorry - just a few things I noticed along the way. Looks like it's going in a good direction, with much less boilerplate code than before - that's good. Could you say a little bit more about how this also generates nipype workflows?

@Garyfallidis
Contributor

Hi @arokem. The nipype generator is not in yet. But the idea is that, now that we can get the exact input and output parameters from the introspective parser, we can generate Python files with the InputSpecs and OutputSpecs automatically written, without you writing a single piece of code. Clear? Please let me know if it is not.
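Speculatively, the same introspection data could be turned into spec classes with `type()`. The `Trait` stand-in below is illustrative only, not nipype's traits machinery:

```python
# Speculative sketch: build an InputSpec-like class from a workflow
# signature. Trait is a toy stand-in for a nipype trait.
import inspect

class Trait:
    """Minimal stand-in for a trait holding a default value."""
    def __init__(self, default):
        self.default = default

def make_input_spec(flow):
    spec = inspect.getfullargspec(flow)
    defaults = dict(zip(reversed(spec.args), reversed(spec.defaults or ())))
    attrs = {arg: Trait(defaults.get(arg)) for arg in spec.args}
    # create the class dynamically, named after the workflow
    return type(flow.__name__.title().replace('_', '') + 'InputSpec',
                (object,), attrs)

def median_otsu_flow(input_files, save_masked=False, median_radius=4):
    pass

InputSpec = make_input_spec(median_otsu_flow)
print(InputSpec.__name__, InputSpec.median_radius.default)
# → MedianOtsuFlowInputSpec 4
```

A real generator would emit these classes into a Python file with nipype's `File` and `traits` types instead of the toy `Trait`.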

@arokem
Contributor

arokem commented Sep 19, 2015

Yeah. More or less. So the plan is to have some script that goes through the workflows tree and generates the nipype code?


@Garyfallidis
Contributor

Yes @arokem. And then we can do the same with GUIs, web interfaces, etc. Let the flows generate the interfaces ;) It seems this will work pretty well. But let's not be over-enthusiastic and see if it works in practice. Still, things look very positive right now. It seems we can greatly speed up our development.

Hi @matthieudumont, this looks great. I would suggest adding a test function for the IntrospectiveArgParser, using a dummy workflow with a wide range of different parameters where different action strategies would be required.

Also, it would be nice to update the other bin/ commands to the workflow design, so some refactoring will be needed. The tensor_fit command should probably change too, because right now it is calling FSL's bet etc., which is not relevant. I know you have a better version.

Finally, we need to investigate the use of wildcards across different operating systems and how we will deal with that. There were some suggestions from Chris on this. GGT!

@arokem
Contributor

arokem commented Sep 20, 2015

OK - that's cool. It seems to me that you should be able to generate the command line interfaces as well, just from introspection of the dipy.workflows module. That is, bin/dipy_median_otsu should possibly also be auto-generated. Presumably they're all going to look the same?

@@ -8,7 +8,7 @@

 from scipy.ndimage.filters import median_filter
 try:
-    from skimage.filter import threshold_otsu as otsu
+    from skimage.filters import threshold_otsu as otsu
Contributor


Is this a change in the skimage API, or a BF in dipy?

Contributor Author


no, idk why I modified and committed that sorry!

Contributor


I made this change. There was a warning that skimage has deprecated skimage.filter and that we should now use skimage.filters.

Contributor


Cool. That answers my question.


@arokem
Contributor

arokem commented Sep 20, 2015

And yes: +1 on thorough testing of the IAP!

@matthieudumont
Contributor Author

(huge changelog because I rebased)

@Garyfallidis
Contributor

@arokem if you agree, I think I can merge this.

Next PR should be a cleanup of what is in dipy/bin to actually use the new design. And after that, or in parallel, we can start putting in the workflows that @matthieudumont and I have already written.

@arokem
Contributor

arokem commented Jan 12, 2016

It seems that utils and segment are mostly untested. I think we should have tests in place for these before we merge.


@Garyfallidis
Contributor

Okay, @matthieudumont added tests as requested, so I will go ahead and merge this.

Something very important to realize about the workflows, though, is that we should have some buildbots which will run the tests of the workflows' command lines with large and realistic datasets and create some reports. This goes in a proper quality assurance direction: we have Travis (and nipy.builders) checking that the internal functions (the API) work correctly, and then buildbots checking that a specific command line (workflow) or a series of command lines (workflows) generates the required results. This will be quite a task, because we will need to generate some kind of QA reports or automated end-to-end tests. I think with the new dipy.viz module we will be able to have some of these things nicely done.

But we do need to meet and discuss where we will set up the buildbots. I think that @matthieudumont, @jchoude, @matthew-brett, @arokem and I should be in that meeting and share ideas on how to get this done. To begin with, can we have a Linux machine somewhere which will be able to run the commands once a day and generate some reports? Would the people in Berkeley or Sherbrooke be willing to set up a machine just for that? Or should we look into a cloud service?

Garyfallidis added a commit that referenced this pull request Jan 13, 2016
@Garyfallidis Garyfallidis merged commit a5dc8d3 into dipy:master Jan 13, 2016
@arokem
Contributor

arokem commented Jan 13, 2016

Nice! Thanks for adding the tests -- will help to maintain this code in the long run.

Re: end-to-end testing, I think that a potential solution would be cloud machines running a docker container that we would maintain. For an example of what that might look like, see the docker that I have for building the docs:

https://github.com/arokem/dipy-docbuild

This gets automatically built into this dockerhub image:

https://hub.docker.com/r/arokem/dipy-docbuild/

which can then be pulled down onto a cloud machine (e.g. on AWS) and run. That's what I did for the 0.10 release. I am not sure how to set up an automated build for this (though I am confident it could be done), but we probably don't want to run this on every PR, do we? We could also store some data on S3 (or equivalent) and get the data from there, instead of fetching from an http URL. Might help reduce costs.

At least initially, I can put some funding that I have towards running this on AWS. In the short/medium time-scale, we might be able to get some more support for this from UW eScience, but we should also think up a way to maintain this in the longer run. A tip jar, or something :-)


@matthieudumont matthieudumont deleted the introducing_workflows branch June 22, 2016 18:41