
Not all threads used #3

Closed

jakobnissen opened this issue Jun 1, 2021 · 12 comments

Comments

@jakobnissen
Collaborator

When running CoverM from the command line on 10 BAM files, roughly 5 CPUs are used. But when using pycoverm, only about 2 CPUs are used. In both cases, 8 CPUs were given to the task.

@apcamargo
Owner

This is something I need to investigate further. Because all the parallelization is performed in the Rust code, I assumed that Python wouldn't interfere with it.
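
For reference, the pattern I assumed holds is roughly this (a minimal sketch with made-up names, not pycoverm's actual code):

```rust
use pyo3::prelude::*;

// Hypothetical PyO3 binding: the GIL is released before the parallel
// work starts, so the Python side can't serialize the Rust threads.
#[pyfunction]
fn get_coverages(py: Python<'_>, paths: Vec<String>) -> PyResult<Vec<f64>> {
    // allow_threads drops the GIL for the duration of the closure;
    // Rust worker threads then run regardless of the Python caller.
    py.allow_threads(move || {
        // ... the real code would process `paths` in parallel here ...
        Ok(paths.iter().map(|_| 0.0).collect())
    })
}
```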

Do you have a reproducible example? How did you measure the CPU usage?

@jakobnissen
Collaborator Author

Hm, this is strange. I used multi-gigabyte BAM files from the CAMI2 Airways dataset, and with those, pycoverm uses around 225% CPU. To make a reproducible example, I downloaded some smaller public BAM files, and with those it utilized 750% CPU. Again, when using CoverM directly on the CAMI2 files, it uses 500-600% CPU.

One difference could be the number of sequences in the BAM headers: around 185,000 in the CAMI2 files versus around 20 in the public ones. Could it be that most of the time is actually spent in pycoverm, not in CoverM?

This is a wild guess, but perhaps pycoverm performs more checks on each read than CoverM? E.g. I can see pycoverm passes `filter_params.min_percent_identity_pair = 0` to the internal Rust function. Does this mean that this statistic (and several similar ones) is needlessly computed for each read, then compared to zero?
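
If so, a cheap fix would be to short-circuit the check when the threshold is zero. A rough sketch, with hypothetical names standing in for the real filter code:

```rust
// Sketch only: `Read` and `percent_identity` are illustrative
// stand-ins for whatever the real filtering code uses.
struct Read {
    matches: u32,
    aligned_len: u32,
}

fn percent_identity(read: &Read) -> f32 {
    100.0 * read.matches as f32 / read.aligned_len as f32
}

fn passes_identity_filter(read: &Read, min_percent_identity: f32) -> bool {
    // When the threshold is 0, every read passes, so skip the
    // per-read computation entirely.
    if min_percent_identity <= 0.0 {
        return true;
    }
    percent_identity(read) >= min_percent_identity
}
```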

I'm not sure how to troubleshoot this. Do you have an idea about how I could dig into what is happening?

@apcamargo
Owner

I don't think `min_percent_identity_pair` is the problem, because CoverM uses it in all modes (I can't check right now, but I'm almost sure). One thing I do differently, though, is that I always use the trimmed mean function, since it reduces to the standard mean when the trimming parameters are set to 0. But I don't think that's the problem either (I'd have to test to be sure).

You mentioned that the number of headers is vastly different, so maybe those loops are what's slowing the process down. If that's the case, they can easily be parallelized with Rayon (see the sketch below). I'll try to check that later this week.
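
Something like this, using Rayon's parallel iterators (illustrative names, not the actual pycoverm internals):

```rust
use rayon::prelude::*;

// Sketch: compute a mean depth per header sequence in parallel.
// Replacing iter() with par_iter() is often all Rayon needs.
fn mean_depth_per_header(per_header_depths: &[Vec<f32>]) -> Vec<f32> {
    per_header_depths
        .par_iter()
        .map(|depths| {
            let n = depths.len().max(1) as f32; // avoid division by zero
            depths.iter().sum::<f32>() / n
        })
        .collect()
}
```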

@jakobnissen
Collaborator Author

I thought about those loops too, but if they were the problem, I would expect CPU usage to start at around 500-600% and then drop to 100%, when in reality it's uniformly around 225%.

@apcamargo
Owner

Hmm. Could you test whether the trimmed mean method in CoverM is slower than the standard mean on the CAMI2 BAMs? You could set the trimming parameters to 0 to make sure that the results are the same as the standard mean.

@jakobnissen
Collaborator Author

I have been comparing `coverm contig -m trimmed_mean --trim-min 10 --trim-max 90 -t 8 --bam-files *` (from the shell) with `_pycoverm.get_coverages_from_bam(paths, threads=8, min_identity=0.0, trim_upper=0.1, trim_lower=0.1)` (from Python) on the same files.

@apcamargo
Owner

I'll have to put some thought into this then. Unfortunately, I don't think there's an easy way to profile a Rust function from within Python.

Just out of curiosity: did you run the benchmark with the version I released today? The Linux wheels were compiled in a different manylinux container, and I used a flag to improve performance.

@jakobnissen
Collaborator Author

Just tried the new version: same result. Interestingly, it does actually spawn 8 additional threads. I'm on macOS, but I don't expect that to make a difference, right?

@apcamargo
Owner

Nothing changed for building the macOS wheels. I did some testing here and it seems to be working fine.

As soon as I get some time, I'll investigate this issue further. Maybe it has something to do with the FFI, even though I can't see how. At the very least, I'll try to parallelize that loop with Rayon.

@apcamargo
Owner

Well, I started to investigate this issue again, and I couldn't figure out what is going on. I wrote some test code where the parallelization is all done on the Rust side, and it looks like Python doesn't interfere with it. Do you have any idea what might be going on, @wwood?
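
The test was along these lines (a simplified sketch, not the exact code I ran): a `#[pyfunction]` that releases the GIL and burns CPU on a Rayon pool, while I watched the process in top.

```rust
use pyo3::prelude::*;
use rayon::prelude::*;

// Simplified test: with `threads` workers spinning, top should show
// roughly `threads` x 100% CPU if Python really isn't interfering.
#[pyfunction]
fn busy_test(py: Python<'_>, threads: usize) -> PyResult<u64> {
    py.allow_threads(move || {
        let pool = rayon::ThreadPoolBuilder::new()
            .num_threads(threads)
            .build()
            .expect("failed to build thread pool");
        let total = pool.install(|| {
            (0..threads as u64)
                .into_par_iter()
                .map(|i| (0..200_000_000u64).fold(i, |acc, x| acc ^ x))
                .sum::<u64>()
        });
        Ok(total)
    })
}
```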

@jakobnissen I'll try to reproduce the issue here. Will any of the CAMI2 Airways BAM files do?

@jakobnissen
Collaborator Author

Hm, not sure. Due to bandwidth constraints, I could only download the short-read CAMI2 toy human microbiome (Airways) dataset.

@jakobnissen
Collaborator Author

I can't reproduce this issue anymore, and I have no idea what caused it. Closing as solved; we can always reopen if it appears again.
