
Problem speed of mapping #44

Closed
ArtemPalanaria opened this issue May 4, 2020 · 23 comments

Comments

@ArtemPalanaria

Dear Matt, I tried to run Read Until and worked through the testing stages (up to the "Testing basecalling and mapping" stage, point 6). Basecalling is running on a GPU (RTX 2080Ti), but the reported mapping times are more than 1-3 seconds. What could be the problem?
Thanks.
I ran the test from a file, following the "Testing" example.
Files attached:
human_chr_selection.toml.txt
chunk_log.log
ru_test.log
guppy_basecall_server_log-2020-05-04_15-07-45.log

@mattloose
Contributor

Hi,

There are a lot of issues here.

The toml file you provide (human_chr_selection.toml.txt) won't pass validation as it has no targets. It isn't the one passed in the command shown in ru_test.log (that one is human_chr_selection.toml).

The ru_test.log shows that the toml file you actually ran with has two targets (further suggesting the attached toml is not the right one), BUT neither of those targets is found in the reference.

Reads will therefore either always be off target or not map at all. If they are not mapping (and I suspect that is the case here) you will keep collecting data for each read, so your basecalling will take longer and longer.

In essence I'm not sure you have configured this experiment properly.
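
For reference, a working toml needs a targets list and a reference that actually contains those target names. A minimal sketch along the lines of the example toml files shipped with this repo (the paths, guppy config name and port are placeholders you would need to adjust for your setup):

```toml
[caller_settings]
config_name = "dna_r9.4.1_450bps_fast"  # guppy config to use; placeholder
host = "127.0.0.1"
port = 5555

[conditions]
# minimap2 index of the reference; the targets below must exist in it
reference = "/path/to/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.mmi"

[conditions.0]
name = "select_chr21_chr22"
control = false
min_chunks = 0
max_chunks = inf
# target names must match sequence names in the reference index
targets = ["chr21", "chr22"]
single_on = "stop_receiving"
multi_on = "stop_receiving"
single_off = "unblock"
multi_off = "unblock"
no_seq = "proceed"
no_map = "proceed"
```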

If you can provide further information, including the source of the data (are you playing back a bulkfile here, or something else?) and the correct toml file, we might be able to help further.

Matt

@ArtemPalanaria
Author

Thanks for the answer. I changed the reference and the file now passes validation. After starting, mapping still takes a long time. For playback I used the bulk file from
http://s3.amazonaws.com/nanopore-human-wgs/bulkfile/PLSP57501_20170308_FNFAF14035_MN16458_sequencing_run_NOTT_Hum_wh1rs2_60428.fast5

Files attached:
human_chr_selection.toml.txt
chunk_log.log
ru_test.log
chek_toml.txt

@ArtemPalanaria
Author

Last time I attached the wrong (TOML) file; the correct one is attached now.

@mattloose
Contributor

Thanks for the update - that is a lot slower than I would expect.

I would check a few things here.

First - how quickly can your GPU call reads when running standalone? You may need to play with guppy parameters to tune your basecaller optimally.

However, we need to see whether it is the GPU or the CPU that is limiting here - how big is the reference file that you are mapping to? Also, what sort of CPU do you have?

Have you tried the fast basecalling model instead of the high accuracy model? If you see an improvement in speed here then we can pinpoint the source of the problem a little.
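
For reference, switching models is just a change to the guppy config name in the caller_settings block of your toml, roughly like this (the exact config names depend on your flowcell/kit and guppy install, so treat these as placeholders):

```toml
[caller_settings]
# fast model; the high accuracy equivalent would be something like "dna_r9.4.1_450bps_hac"
config_name = "dna_r9.4.1_450bps_fast"
host = "127.0.0.1"
port = 5555
```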

Thanks

@ArtemPalanaria
Author

Thanks. I launched it with the high accuracy model - the speed was normal for the first 2 minutes, but then everything slowed down again to 1 second or more. My CPU is a Ryzen 7 3800X (8 cores, 16 threads). As the reference I use the indexed file from ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz
The indexed (.mmi) file is more than 7 GB. Could that be the cause?
Thanks

@mattloose
Contributor

This doesn't really make sense then. Can you please try setting the max chunks to 8 rather than infinite and see what happens?

Also - please leave it running for more than 15 minutes and check the resulting data to see if the selection is working.
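
For reference, that is the max_chunks value in the condition block of your toml; something like this (key names follow the example tomls in this repo):

```toml
[conditions.0]
min_chunks = 0
# cap basecalling at 8 chunks per read rather than letting it run to inf
max_chunks = 8
```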

@mattloose
Contributor

Also - please can you try it with the FAST model and not the High Accuracy Model. Running on the fast model will tell us something about where the lag is.

@ArtemPalanaria
Author

Dear Matt, I ran the process with both the fast and hac models, with max_chunks = 8.
From the data obtained it is clear that the fast run keeps up as required, while the hac run slows down, and badly. The resulting data also look poor.
Here are the fast data
read-length-histogram-05 05 2020, 10_55_26
chunk_log.zip
ru_test.zip
result.txt
and hac data
result.txt
chunk_log.log
hac
ru_test.zip

Thanks

@mattloose
Contributor

Can I check what operating system you are on? And can you also provide a metric for how quickly you can basecall standard reads on your current setup?

@tchrisboles

[screenshot]

How would I check the speed of standard basecalling? From log files? I've never looked for them - give me a hint and I'll dig it out.

@ArtemPalanaria
Author

Dear Matt, here is the system information (Ubuntu 18.04.4 LTS, GNOME 3.28.2)
and the basecall speed files:
guppy.txt
guppy_basecaller_log-2020-05-07_10-42-57.log
Thanks

@vincentmanz

I have observed the same problem here: when using the hac model in the toml file I get very slow mapping times (>1 s).
#44~18.04.2-Ubuntu SMP Thu Apr 23 14:27:18 UTC 2020

@mattloose
Contributor

Hi All,

A quick question - could people confirm the version of guppy they are using?

Thanks.

@mattloose
Contributor

If you are on version 3.6 it may be worth trying guppy 3.4.5 - it is available from:

https://mirror.oxfordnanoportal.com/software/analysis/ont-guppy_3.4.5_linux64.tar.gz

It looks as though there is a change in guppy performance that might be negatively impacting the speed of read until.

@tchrisboles

Hi Matt and Artem,

I have been having problems similar to Artem's, and I am running:
[screenshot: guppy version]

@tchrisboles

Thanks Matt - will try 3.4.5 later today.

@mattloose
Contributor

Hi Chris,

Please let us know how 3.4.5 goes - the accuracy differences aren't key here but the speed is, so you should find it gives you better performance. We're really keen to resolve this ASAP!

Best

Matt

@tchrisboles

OK, I think you guys nailed it with the guppy server version. Here are my test results.
(I downloaded and untarred the ont-guppy 3.4.5 package as Matt pointed out above.)
Set up the basecall server:
[screenshot: guppy_basecall_server command]
In a second terminal window, set up the ru_generators command:
[screenshot: ru_generators command]
I had previously modified Matt's toml file as follows:
[screenshot: modified toml file]
After 16 min the read distribution and mapping timing looked like this:
[screenshot: read distribution and mapping timing]
which is much closer to Matt's readme image than I had gotten previously. Mapping timing is still not quite as fast as Matt's. Here's a close-up of the 16-minute read distribution:
[screenshot: close-up of read length histogram]
And the summarise output:
[screenshot: summarise output]
The median read lengths now show enrichment for chr21 and chr22. Again, not quite as good as in Matt's readme, but significant.
I think some additional guidance on strategies for optimising guppy server settings would help us all.

Hope this helps others who are as interested in ru as we are.

@tchrisboles

By the way, you can see my previous results using guppy 3.5.2 in Question #39.

@mattloose
Contributor

Thanks @tchrisboles

We're just running some equivalence tests across a few GPUs here. All our work was reported using 3.4.5 - we will investigate the issues with guppy > 3.4.5 with ONT.

@mattloose
Contributor

signal-attachment-2020-05-08-195144_001
So here is a comparison of a 1080 vs the GPU (GV100) in the GridION - as you can see, guppy 3.4.5 performance is roughly equivalent, but guppy 3.6 performance is not sufficient for real-time calling. We suspect an underlying issue that can be resolved, but for now we recommend guppy 3.4.5. You can have two versions of guppy running side by side as required.

@ArtemPalanaria
Author

Dear Matt, I got similar results to Chris using guppy 3.4.5.
Run.txt
run
I also wanted to ask - can I use any fast5 file as the bulk file, or do I need to prepare it somehow?
And could you point me to an example of setting up library depletion for the human genome (for enriching a metagenome)?
Thanks for the help. I am very glad that everything worked!

@mattloose
Contributor

Hi - you have to record a bulkfile from a run - you cannot use any fast5 file.

Look under the advanced file save options.

For depletion of the human genome you just need to configure your toml file to reject anything that maps to the reference you want to get rid of. Have a look at our paper for details.
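
For reference, a depletion condition is just the selection logic inverted - unblock reads that map to the reference you want to remove and keep everything else. A rough sketch in the style of the example tomls in this repo (the target names and key values here are illustrative, not a tested config):

```toml
[conditions]
reference = "/path/to/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.mmi"

[conditions.0]
name = "deplete_human"
control = false
min_chunks = 0
max_chunks = 8
# every human chromosome is a target; reads mapping to any of them get unblocked
targets = ["chr1", "chr2", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8",
           "chr9", "chr10", "chr11", "chr12", "chr13", "chr14", "chr15", "chr16",
           "chr17", "chr18", "chr19", "chr20", "chr21", "chr22", "chrX", "chrY"]
single_on = "unblock"
multi_on = "unblock"
single_off = "stop_receiving"
multi_off = "stop_receiving"
no_seq = "proceed"
no_map = "proceed"
```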

Adoni5 added a commit that referenced this issue Oct 6, 2023
* Update README.md

Closes #44 
Uses BETA syntax, see https://github.com/orgs/community/discussions/16925#
Adds a link to the Sphinx documentation for readfish on the looselab github pages

* Exclude README.md from trailing-whitespace pre-commit
Need trailing whitespace to render the warning boxes

* Invert the notes about the FAQ and README