run_structure_prediction.py accepts comma separated list of input folds and optionally dedicated output_directories for each fold #357

Merged: maurerv merged 2 commits into KosinskiLab:main from prediction_cli on Jun 6, 2024

Collaborator

@maurerv maurerv commented Jun 6, 2024

No description provided.

…ds and optionally dedicated output_directories for each fold
@dingquanyu
Collaborator

I guess in the case of padding you may also need to update the --output_directory key in the argument dictionary so that its value is a list covering all the sub-folders that should be created in this if block? E.g. iterate through all_folds and append each path.join(FLAGS.output_path, <name of the protein complex>) to a list.

command_args["--input"] = ",".join(all_folds)
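For readers following the suggestion above, here is a minimal sketch of what building one output sub-folder per fold could look like. The helper name `build_output_directories`, the placeholder fold names, and the choice to use the fold string itself as the sub-folder name are assumptions made for illustration; they are not taken from the PR.

```python
import os


def build_output_directories(all_folds, output_path):
    """Return one output sub-folder per fold, in the same order as all_folds."""
    # Assumption: the fold string doubles as the sub-folder name.
    return [os.path.join(output_path, fold) for fold in all_folds]


# Placeholder fold names, purely illustrative.
all_folds = ["fold_1", "fold_2"]
output_directories = build_output_directories(all_folds, "results")

# Both keys carry comma-separated values of equal length.
command_args = {
    "--input": ",".join(all_folds),
    "--output_directory": ",".join(output_directories),
}
print(command_args)
```

Keeping both lists in the same order means the n-th output directory always belongs to the n-th fold.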

object_to_model, flags_dict, postprocess_flags, output_dir = pre_modelling_setup(interactors, FLAGS)

if len(FLAGS.input) != len(FLAGS.output_directory):
    FLAGS.output_directory *= len(FLAGS.input)
Collaborator

This covers the case where we have one output directory and many inputs, right?

Collaborator Author

Exactly!
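
As a side note on the one-to-many case discussed in this thread, the sketch below shows the same broadcasting idea as a standalone function. Multiplying the list only produces matching lengths when exactly one output directory was given (two directories and three inputs would yield six entries), so this version adds an explicit length check. That check and the function name `broadcast_output_directories` are assumptions for illustration, not part of the merged change.

```python
def broadcast_output_directories(inputs, output_directories):
    """Return one output directory per input, repeating a single directory if needed."""
    if len(output_directories) == len(inputs):
        # Already one directory per input: nothing to do.
        return list(output_directories)
    if len(output_directories) == 1:
        # One directory, many inputs: reuse it for every input.
        return list(output_directories) * len(inputs)
    # Assumption: any other mismatch is treated as a user error.
    raise ValueError(
        f"Got {len(output_directories)} output directories "
        f"for {len(inputs)} inputs; expected 1 or {len(inputs)}."
    )


print(broadcast_output_directories(["fold_1", "fold_2", "fold_3"], ["results"]))
```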

@maurerv
Collaborator Author

maurerv commented Jun 6, 2024

@dingquanyu from your PR at KosinskiLab/AlphaPulldownSnakemake#13 it seemed like you wanted run_multimer_jobs.py to use a single output directory and create subdirectories for each fold according to use_ap_style.

We could extend run_multimer_jobs.py to allow multiple output_paths, but since run_multimer_jobs.py uses the file-based fold specification, where the user might not know the number of folds beforehand, I think having a single output directory makes the most sense.

@dingquanyu
Collaborator

I see. Does this mean that in the snakemake pipeline you will bypass run_multimer_jobs.py and launch run_structure_prediction.py directly with a cluster of jobs?

@maurerv
Collaborator Author

maurerv commented Jun 6, 2024

Exactly. I added a checkpoint that performs the clustering and then extended the current rule using run_structure_prediction.py to run on each cluster separately. This way we don't need additional rules.

I just pushed these changes for reference: bfa71c7ac5d013a0c1aea3b78fc347381a3ca06c
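
To make the per-cluster invocation described above more concrete, here is a rough sketch of how one command per cluster might be assembled. The actual clustering criterion is not described in this conversation, so fixed-size chunking stands in for it here. The helper names, fold names, and directory layout are hypothetical. Only the script name and the --input and --output_directory flags come from the discussion.

```python
def chunk(items, size):
    """Split items into consecutive groups of at most `size` elements."""
    # Assumption: fixed-size chunks stand in for the real clustering step.
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_command(cluster, output_directory):
    """Build the argument list for one run covering every fold in a cluster."""
    return [
        "run_structure_prediction.py",
        f"--input={','.join(cluster)}",
        f"--output_directory={output_directory}",
    ]


# Placeholder fold names, purely illustrative.
folds = ["fold_1", "fold_2", "fold_3"]
commands = [build_command(cluster, "results") for cluster in chunk(folds, 2)]
for command in commands:
    print(" ".join(command))
```

Each cluster becomes a single invocation with a comma-separated --input value, which is why no additional rules are needed per fold.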

@dingquanyu
Collaborator

I see. Thanks for the commit. Now it makes sense to me.

@maurerv maurerv merged commit 29682ab into KosinskiLab:main Jun 6, 2024
4 checks passed
@maurerv maurerv deleted the prediction_cli branch June 6, 2024 14:09