
added a pipesub script for qsub (torque) (WIP, needs some testing) #4

Open
wants to merge 1 commit into base: NSG

Conversation

JensTimmerman

No description provided.

@srothmei
Collaborator

srothmei commented Jul 8, 2015

Hi Jens,

good to see that people are adapting the scripts to their environments and also contributing the code back to us!

I'm currently thinking about how to include this in the most convenient way, since it would not make much sense to create a new repo for every job scheduler around (like I did for SLURM and OAR).

I think I'll move all the lines containing calls to a job scheduler into a new file where they can be modified easily and from which they'll be called by the main script.
I'll then also include your changes in one of those scripts.

Thanks and all the best,
Simon

@JohnGriffiths
Collaborator

+1 to that.

I think it would also make sense for this new file to include an option to run the pipeline locally, without a job scheduler. Agree?

@srothmei
Collaborator

srothmei commented Jul 8, 2015

Good idea! It should be easy to add this specific case once the "framework" I described above is done.

@JensTimmerman
Author

Yeah, I was thinking about first creating some sort of abstraction where you would call
submit_job <nodes> <procs> <name> <time> <priority/queue> <mail> <output> <dependon> scriptname

and the submit_job function would be configurable, but not every queuing system seems to support all these features, like dependencies...
I'll test this version first.
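A rough sketch of what such a configurable submit_job abstraction could look like, here with a Torque/qsub backend. This is purely illustrative: the function name, parameters, and queue/mail defaults are assumptions matching the signature proposed above, not code from the pull request.

```python
# Hypothetical sketch of the submit_job abstraction described above:
# one function covering the common options, rendered for a specific
# scheduler (Torque's qsub in this case). All names are illustrative.
import subprocess

def submit_job(nodes, procs, name, walltime, queue, mail, output,
               depend_on=None, script=None, dry_run=True):
    """Build (and optionally run) a qsub command line for Torque."""
    cmd = ['qsub',
           '-l', 'nodes=%d:ppn=%d' % (nodes, procs),
           '-l', 'walltime=%s' % walltime,
           '-N', name,
           '-q', queue,
           '-M', mail, '-m', 'abe',   # mail on abort/begin/end
           '-o', output]
    if depend_on:
        # Not every scheduler supports dependencies; a backend for
        # e.g. OAR would need a workaround here.
        cmd += ['-W', 'depend=afterok:%s' % ':'.join(depend_on)]
    cmd.append(script)
    if dry_run:
        return cmd
    # qsub prints the new job id on stdout
    return subprocess.check_output(cmd).strip()

# Example (dry run, only shows the command that would be issued):
cmd = submit_job(1, 4, 'preproc', '02:00:00', 'default',
                 'user@example.com', 'preproc.log',
                 depend_on=['123.master'], script='preproc.sh')
```

Swapping the body of `submit_job` per scheduler (qsub, sbatch, oarsub) would keep the main pipeline script unchanged.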

@srothmei
Collaborator

srothmei commented Jul 8, 2015

Yes, the dependencies are indeed a crucial point here: OAR, for example, does not support them, so I created a workaround by creating files on the local FS after job completion and checking for their existence within a background job (pretty messy...).
I think dependencies would also be a problem when it comes to running this locally, i.e. without a job scheduler.

On the other hand, the whole thing could then be divided into two forks, "non-dependencies"/"dependencies". But that causes some extra work.
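The marker-file workaround described above can be sketched in a few lines. This is a minimal illustration of the idea, not the actual OAR scripts; the directory, file names, and polling interval are all assumptions.

```python
# Sketch of the OAR workaround described above: each job touches a
# marker file on completion, and a background watcher polls for those
# markers before launching the dependent step. Paths are illustrative.
import os
import time

def mark_done(job_name, marker_dir='job_markers'):
    """Appended at the end of a job script: record that it finished."""
    os.makedirs(marker_dir, exist_ok=True)
    open(os.path.join(marker_dir, job_name + '.done'), 'w').close()

def wait_for(job_names, marker_dir='job_markers', poll=30):
    """Block until every listed job has written its marker file."""
    while True:
        missing = [j for j in job_names
                   if not os.path.exists(
                       os.path.join(marker_dir, j + '.done'))]
        if not missing:
            return
        time.sleep(poll)
```

Polling a shared filesystem like this is fragile (stale markers, NFS caching), which is presumably why it is described as "pretty messy".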

@ehiggs

ehiggs commented Oct 9, 2015

Has there been any progress here? We have users who would love to use this data analysis pipeline but are waiting on upstream support.

Thanks

(nb: I am a colleague of @JensTimmerman)

@srothmei
Collaborator

srothmei commented Oct 9, 2015

Hi,

I haven't had any time to work on this particular problem.
Currently I'm rewriting parts of the pipeline from Octave/Matlab to Python using Nipype, so that the interfaces of toolboxes like MRTrix etc. can be controlled directly from within Python. This will also make it easy to replace tools.

Anyway, once this is finished, I'll also present you a much easier way to exchange/port the pipeline onto different HPC/job-scheduler frameworks.

But this might take one more month, I think.

@JohnGriffiths
Collaborator

Excellent. Adapting to nipype is the way forward. That will make the code maximally portable and hackable.

Do you have a github branch for this yet?


@srothmei
Collaborator

Hi John,

currently I'm translating the scripts and doing some debugging on them.
I'll probably create a branch for that during this or next week; at the moment I'm storing progress in a repo of my own on my account.

Best,
Simon

@srothmei
Collaborator

So just to let you know, especially since I got some very nice input from John at SfN: I have now finished translating the Matlab scripts into Python and will start porting the whole workflow to Python using Nipype workflows.
From what I currently understand from the docs, this will make it far easier to run the pipeline on different HPC structures.

@JohnGriffiths
Collaborator

Hi Simon. Sorry for the delay in getting back to you on this.

How is this going? Have you come up with a set of workflow designs that you are happy with?

Useful reference points for nipype-based pipelines (outside of nipype itself, which is the main reference point) are the (new) connectome mapping toolkit (https://github.com/LTS5/cmp_nipype), which has been rewritten for nipype, and CPAC for fMRI analyses.

If you look at some of the code in e.g. CPAC you will see lots of use of the nipype Function interface (e.g. here, here, etc.); I think that is one of the most useful pieces of design advice to take from this: write your analysis functions as stand-alone Python functions, and wrap them in nipype using the Function interface. This is a lot more flexible and less labour-intensive than actually writing out a full interface (with the input spec, output spec, etc.) for each of your functions; you just wrap the Python functions directly. That also means you can test and debug the functions outside of nipype, which is useful.
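The wrapping pattern described here can be sketched as follows. The analysis function and node names are hypothetical placeholders, not functions from this pipeline; the import is guarded so the plain function stays testable even without nipype installed, which is exactly the benefit being described.

```python
# Sketch of the pattern above: write analysis steps as plain Python
# functions, then wrap them with nipype's Function interface instead
# of writing a full input-spec/output-spec interface by hand.
# `threshold_fa` is a hypothetical example step, not pipeline code.

def threshold_fa(fa_value, cutoff=0.2):
    """Plain Python analysis step: testable and debuggable without nipype."""
    return fa_value if fa_value >= cutoff else 0.0

try:
    from nipype.interfaces.utility import Function
    from nipype.pipeline.engine import Node

    # Wrap the plain function directly; nipype handles the plumbing.
    threshold_node = Node(
        Function(input_names=['fa_value', 'cutoff'],
                 output_names=['fa_out'],
                 function=threshold_fa),
        name='threshold_fa')
except ImportError:
    pass  # nipype not installed: the plain function is still usable
```

Keeping the logic in `threshold_fa` itself, rather than inside a custom interface class, is what makes the step reusable and unit-testable outside the workflow.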

Personally I prefer the CPAC architecture a lot more than cmp_nipype, which seems to overcomplicate things somewhat.

(Update: just noticed your pypeline repo. Taking a look now. Do you want to shift this discussion over to there or keep it here?)

@srothmei
Collaborator

Hi John,

thanks for the feedback. Things are going quite slowly, I have to admit: the workflow stuff is not really trivial when it comes to parallelization, and also because I want to keep the flexibility as high as possible, e.g. each tracking module should output the same streamline dataset, so that the aggregation step does not have to cover all the different file formats produced by different tractography toolboxes.

Also thanks for the references. Since you discovered my repo, you'll have noticed that I programmed our in-house functions as you suggested during SfN: as standalone functions, which I will later wrap in nipype's Function interface.

For further discussions specifically aimed at the Nipype implementation, let's shift over to the new repo to keep this one tidy.

As soon as everything is working with the nipype implementation, I will update this repository and probably make that solution the default branch here.
