Speed up data loading process #376

Merged
12 commits merged into aqlaboratory:multimer on Dec 11, 2023

dingquanyu (Contributor):

MSA files are now parsed in parallel instead of serially.
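
As a rough illustration of parsing MSA files in parallel rather than in a serial loop, the sketch below hands each `.sto` file to a `concurrent.futures.ProcessPoolExecutor`. `parse_msa_file` and `parse_msas_in_parallel` are hypothetical names, and the parser is a stand-in that only counts sequence-like lines; the PR's actual parsing and work split may differ.

```python
import os
import tempfile
from concurrent.futures import ProcessPoolExecutor


def parse_msa_file(path):
    """Hypothetical stand-in for a real Stockholm parser.

    Counts lines that are neither blank, '#' annotations, nor the '//'
    terminator (a rough proxy for sequence rows) and returns
    (filename, count).
    """
    count = 0
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line and not line.startswith("#") and line != "//":
                count += 1
    return os.path.basename(path), count


def parse_msas_in_parallel(alignment_dir, max_workers=None):
    """Parse every .sto file in alignment_dir using a pool of worker processes."""
    paths = [
        os.path.join(alignment_dir, name)
        for name in sorted(os.listdir(alignment_dir))
        if name.endswith(".sto")
    ]
    # Files are distributed across the worker processes; map() yields the
    # results in the same order as `paths`.
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(parse_msa_file, paths))


if __name__ == "__main__":
    # Build two tiny Stockholm files and parse them in parallel
    with tempfile.TemporaryDirectory() as tmp:
        for name, rows in [("a.sto", 2), ("b.sto", 3)]:
            with open(os.path.join(tmp, name), "w") as fh:
                fh.write("# STOCKHOLM 1.0\n")
                fh.writelines(f"seq{i} ACDE\n" for i in range(rows))
                fh.write("//\n")
        print(parse_msas_in_parallel(tmp))  # {'a.sto': 2, 'b.sto': 3}
```

Worker processes sidestep the GIL for CPU-bound parsing; the trade-off is that every parsed result is pickled back to the parent process, which can add up when the MSAs are large.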

dingquanyu changed the title from "Speedup data loading process" to "Speed up data loading process" on Dec 5, 2023
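
```python
import argparse
import os
parser = argparse.ArgumentParser()
parser.add_argument('--alignment_dir', type=str, help='path to alignment dir')
args = parser.parse_args()
alignment_dir = args.alignment_dir
# Collect the Stockholm (.sto) MSA files, skipping HMM search output
stockholm_files = [i for i in os.listdir(alignment_dir) if (i.endswith('.sto') and ("hmm_output" not in i))]
```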

Collaborator:

Here, can you also add an exclusion for "uniprot_hits"? I changed this recently; it is only used for MSA pairing.
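
A minimal sketch of the requested filter, assuming files to skip can be recognised by the substrings "hmm_output" or "uniprot_hits" in their names, as in the comprehension above. The helper names and the `EXCLUDED_SUBSTRINGS` tuple are hypothetical, and the filenames in the demo are illustrative only.

```python
import os

# Substrings marking Stockholm files that should not be parsed here.
# "uniprot_hits" is skipped because, per the review, it is only used for
# MSA pairing. (Hypothetical constant name.)
EXCLUDED_SUBSTRINGS = ("hmm_output", "uniprot_hits")


def select_stockholm_files(filenames):
    """Return the .sto names that contain none of the excluded substrings."""
    return [
        name
        for name in filenames
        if name.endswith(".sto")
        and not any(tag in name for tag in EXCLUDED_SUBSTRINGS)
    ]


def list_stockholm_files(alignment_dir):
    """Apply the filter to the contents of an alignment directory."""
    return select_stockholm_files(sorted(os.listdir(alignment_dir)))


if __name__ == "__main__":
    names = ["uniref90_hits.sto", "uniprot_hits.sto", "hmm_output.sto",
             "mgnify_hits.sto", "bfd_uniclust_hits.a3m"]
    print(select_stockholm_files(names))  # ['uniref90_hits.sto', 'mgnify_hits.sto']
```

Keeping the exclusions in one tuple turns any future exclusion into a one-line change instead of another clause in the comprehension.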

```python
# Excerpt from the PR diff (inside a loop over the MSA files)
continue

msa_data[f] = msa
# Now split the following steps across multiple processes
```

Collaborator:

If the pkl file has already been generated, we should check that it exists before re-parsing the MSAs. Or does it get removed somewhere?
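
The check the reviewer suggests could look roughly like this: load the pickle if it is already on disk and only parse the MSAs otherwise. `load_or_parse`, the cache path, and the `parse_fn` callback are hypothetical; the PR's actual pickle name and contents are not shown here.

```python
import os
import pickle
import tempfile


def load_or_parse(pkl_path, parse_fn):
    """Return cached data from pkl_path, or build it with parse_fn and cache it.

    parse_fn is a hypothetical zero-argument callable that parses the MSAs
    and returns a picklable object (e.g. a dict of per-file results).
    """
    if os.path.exists(pkl_path):
        # Reuse the previously generated pickle instead of re-parsing
        with open(pkl_path, "rb") as fh:
            return pickle.load(fh)

    data = parse_fn()
    # Write to a temporary file, then atomically rename it into place
    tmp_path = pkl_path + ".tmp"
    with open(tmp_path, "wb") as fh:
        pickle.dump(data, fh)
    os.replace(tmp_path, pkl_path)
    return data


if __name__ == "__main__":
    calls = []

    def parse():
        calls.append(1)
        return {"uniref90_hits.sto": 42}

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "msa_features.pkl")
        load_or_parse(path, parse)  # parses and writes the pickle
        load_or_parse(path, parse)  # loads the pickle, no re-parse
        print(len(calls))  # prints 1
```

The write-then-rename step matters because the existence check trusts any file at `pkl_path`: with `os.replace`, an interrupted run leaves at most a stray `.tmp` file rather than a truncated pickle that would be loaded next time.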

Collaborator:

Oh, also: is there a reason we couldn't just call a function to do this instead of running the script with subprocess?
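
One common way to address this, sketched under assumptions: give the script a `main(argv=None)` entry point so the same logic can be imported and called in-process, while the command-line interface keeps working. `build_parser`, `run`, `main`, and `demo` are hypothetical names, and `run` only lists files to keep the example small.

```python
import argparse
import os
import tempfile


def build_parser():
    """CLI mirroring the --alignment_dir flag from the PR snippet."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--alignment_dir", type=str, required=True,
                        help="path to alignment dir")
    return parser


def run(alignment_dir):
    """Core logic as a plain function: callable directly, no subprocess.

    Placeholder body: returns the Stockholm files that would be parsed.
    """
    return sorted(
        name for name in os.listdir(alignment_dir)
        if name.endswith(".sto") and "hmm_output" not in name
    )


def main(argv=None):
    # With argv=None, argparse reads sys.argv[1:], so the command line still
    # works; in-process callers pass an explicit list instead.
    args = build_parser().parse_args(argv)
    return run(args.alignment_dir)


def demo():
    """Call the script's logic in-process instead of via subprocess."""
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("uniref90_hits.sto", "hmm_output.sto", "notes.txt"):
            open(os.path.join(tmp, name), "w").close()
        return main(["--alignment_dir", tmp])
```

To keep the script usable from the shell, it would end with `if __name__ == "__main__": main()`. Calling `run` directly avoids starting a new interpreter and re-importing modules on every invocation, and failures surface as Python exceptions instead of exit codes that have to be checked.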

christinaflo merged commit f861ff3 into aqlaboratory:multimer on Dec 11, 2023
1 check passed
dingquanyu deleted the speedup-dataloader branch on January 19, 2024 13:35
2 participants