New function: Add queuing system/batch processing option #28
Will investigate Parsl as a potential solution for parallelizing PlantCV across different environments, rather than building a unique solution for each system.
My initial inclination was to use an existing workflow engine (e.g. Nextflow, Parsl, Snakemake, etc.). I have tried them all out, at least a bit. I like them a lot but am not sure they work for precisely what I have been trying to achieve. That said, @gsainsbury86 has developed a Nextflow workflow that we need to look at (https://github.com/aus-plant-phenomics-facility/plantcv-pipeline), and Parsl 0.9 (now released) had some planned features I was waiting for, so I should check them out again. Other ideas below...
Another approach I started mapping out: right now, a user develops a workflow, likely in Jupyter. If in Jupyter, they convert it to a Python script and then have to reshape it into a workflow script, adding argument parsing and plugging their code into the expected workflow template. What if we turned this around a bit? Rather than having command-line arguments, we could have inputs in a configuration dictionary (though a user could easily make the config an input). We discussed this in #470. Then we package the parallelization code so it can be called from the user's script. I would imagine a user starts with a script downloaded from Jupyter:

```python
from plantcv import plantcv as pcv

img, path, filename = pcv.readimage(image)
```

Then this gets converted (roughly) to this (perhaps automatically with a converter):

```python
from plantcv import plantcv as pcv
from plantcv.parallel import parallelize


def main():
    config = {
        "dir": "./images",
        "json": "pcv2.output.json",
        "outdir": "./output",
        "meta": "imgtype,camera,frame,zoom,lifter,gain,exposure,id",
        "match": "imgtype:VIS,camera:SV,zoom:z1,frame:0",
        "cpu": 1,
        "coprocess": "NIR",
        "writeimg": True,
        "create": True
    }
    parallelize(config)


def workflow(image, result, outdir, coresult, writeimg, debug):
    img, path, filename = pcv.readimage(image)


if __name__ == '__main__':
    main()
```

If people have thoughts on this, let us know!
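To make the proposal concrete, here is a minimal sketch of what a `parallelize(config)` entry point could do internally. This is an assumption for illustration only (image discovery by glob, the `run_one` helper, and the result-path convention are all hypothetical), not the actual `plantcv.parallel` implementation:

```python
import glob
import os
from multiprocessing import Pool


def run_one(args):
    """Process a single image. In the real design, this would call the
    user's workflow() function; here it only computes a result path."""
    image, config = args
    result = os.path.join(config["outdir"], os.path.basename(image) + ".json")
    return image, result


def parallelize_sketch(config):
    """Hypothetical parallelize(): find images under config["dir"] and
    fan them out over config["cpu"] worker processes."""
    images = sorted(glob.glob(os.path.join(config["dir"], "*.png")))
    jobs = [(img, config) for img in images]
    with Pool(processes=config["cpu"]) as pool:
        return pool.map(run_one, jobs)
```

The appeal of this shape is that the user's script owns the config and simply hands it to the library, instead of the library owning a CLI that wraps the user's script.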
In favor. I'm doing something like this already because I don't like notebooks much. Python scripts run through Jupyter are easier to deal with and can also produce interactive output with JupyterLab or VS Code. When I am starting out, I create workflowargs.py:

Then I jump into main() of the workflow script.
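The contents of the commenter's workflowargs.py were not preserved in this thread. As a purely hypothetical illustration of the pattern (a plain module holding the inputs that command-line parsing would otherwise supply, so the script can be driven interactively), it might look like:

```python
# workflowargs.py -- hypothetical illustration; the commenter's actual
# file was not preserved. A plain module standing in for argument
# parsing so the workflow script can run interactively from Jupyter
# or VS Code. All names and values here are assumed.
image = "./images/example_VIS_SV_0.png"  # input image path
outdir = "./output"                      # where output images are written
result = "./output/result.json"          # measurement output file
writeimg = True                          # save result images
debug = None                             # or "plot" / "print"
```

The workflow script can then `import workflowargs as args` and read `args.image`, `args.outdir`, and so on, instead of parsing `sys.argv`.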
I'd be happy to provide some more insight into my Nextflow implementation. One of my goals was to make it work such that the actual PlantCV process could be isolated and run on the command line as a single instance. The main reason for choosing this approach is that I don't anticipate being the one who does the analysis configuration or the tweaking of parameters and thresholds. My ideal setup is a relatively easy process for our image analyst to modify an existing script for the experiment in question; I can then take that script and plug it into a pipeline/workflow that runs the job for the whole experiment. As for configuration, mine broadly works like this:
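The configuration details above were not captured in this thread, but the isolation idea (each workflow run is a standalone command-line invocation that either a pipeline engine or a human can launch) can be sketched in Python. The script name `workflow.py` and its flags are assumptions for illustration, not the actual pipeline's interface:

```python
import subprocess
import sys


def run_single(image, outdir, script="workflow.py"):
    """Launch the workflow script as its own command-line process, so
    the exact command a pipeline engine would run can also be run by
    hand for debugging. "workflow.py", "--image", and "--outdir" are
    illustrative names only."""
    cmd = [sys.executable, script, "--image", image, "--outdir", outdir]
    return subprocess.run(cmd, capture_output=True, text=True)
```

A pipeline engine (Nextflow, a batch scheduler, or a simple loop) then just maps this single-instance command over every image in the experiment.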
Develop a queuing system/batch processing option in addition to the current local multiprocessing capabilities.