KAT comp - issue with big genome #165

matryoskina · 2022-03-15T20:00:12Z

Hi, I am trying to calculate the kmer profile of this 5.0 Gb genome. Here's the command:

kat comp -t 32 -m 17 -o genome1VSgenome2 -h 'fastq1_R1.fastq.gz fastq1_R.fastq.gz fastq2_R1.fastq.gz fastq2_R.fastq.gz fastq3_R1.fastq.gz fastq3_R3.fastq.gz' genome1.fa genome2.fa

The problem is that the genome statistics are not correct: the final genome size estimate ends up being 0.90 Mb, and the plot looks wrong (no peak detected).
I tried different k-mer sizes (17, 21, 51) and tried setting -H and -I to 1000000000, but neither changed the result.
Do you have suggestions?
I have attached the log file.
Thanks!
slurm-6387284.txt

jonwright99 · 2022-03-16T09:31:49Z

Hi, I think you have a problem with your command line. You should have the reads as the first parameter, then the genome as the second. You are including a third, which makes comp function very differently. From the log file, it looks like you are putting one assembly as the first parameter, another assembly as the second, and the reads as the third, which will give odd results.

matryoskina · 2022-03-16T19:14:21Z

Hi,
Thanks for your help! I reran the analysis with only the fastq files and one genome, but the problem is still there: no peak was found. Should I increase the k-mer size? Or is there something else I am missing? I am attaching the new log file.
Thanks!
slurm-6522577.txt

jonwright99 · 2022-03-17T08:05:52Z

Is there a plot created? If so, can you post it?

Also, can you rerun without using -h? And if you set -H, you will speed up the run, as it won't need to double the hash size many times to find the correct size. I use -H100000000000.

So your command line above should read:
kat comp -t 32 -m 17 -H100000000000 -o genome1VSgenome2 'fastq1_R1.fastq.gz fastq1_R.fastq.gz fastq2_R1.fastq.gz fastq2_R.fastq.gz fastq3_R1.fastq.gz fastq3_R3.fastq.gz' genome1.fa

matryoskina · 2022-03-21T14:49:07Z

No plot was created from this job. I have one that was created from a previous run.

jonwright99 · 2022-03-21T15:44:19Z

There's something very odd with your reads here. Are they paired-end reads? Also, were all the fastq files you included in the analysis the ones used to generate the assembly? I've seen this type of plot with no peak where the libraries either were not paired-end reads or had multiple rounds of PCR before sequencing.

matryoskina · 2022-03-24T15:38:11Z

Yes, the reads are all paired-end. Regarding the assembly, the genome was assembled with long reads, and those short reads were used for misassembly correction. Then I used a Hi-C library (Illumina paired-end) to get the chromosomes. Do you think I should use this library instead? Also, could I just compare two genomes without Illumina reads?
Thanks

jonwright99 · 2022-03-24T17:12:13Z

Ah, that makes sense now. Do you know roughly the coverage of the paired-end reads that you used for misassembly correction? I'm guessing it is quite low and not enough to generate a peak on the plot. KAT is designed to compare an Illumina read dataset to an assembly generated from that dataset, to show how the k-mer content of the reads is represented in the assembly. Because your datasets were used differently to generate the assembly, the plots are not working as intended.
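
For a rough idea of the coverage in question, here is a minimal sketch in Python. It is not part of KAT. It only applies the usual back-of-the-envelope arithmetic: base coverage is total sequenced bases divided by genome size, and the expected k-mer coverage for error-free reads of uniform length L is coverage * (L - k + 1) / L. The function names, the total read bases, and the read length are assumptions made for illustration; only the 5.0 Gb genome size and k = 17 come from this thread.

```python
def base_coverage(total_read_bases: int, genome_size: int) -> float:
    """Average sequencing depth: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_read_bases / genome_size


def kmer_coverage(coverage: float, read_length: int, k: int) -> float:
    """Expected k-mer depth, assuming error-free reads of uniform length."""
    if k > read_length:
        raise ValueError("k must not exceed read_length")
    return coverage * (read_length - k + 1) / read_length


# Hypothetical inputs; replace them with the real totals for your libraries.
genome_size = 5_000_000_000        # 5.0 Gb genome, as stated in the first post
total_read_bases = 50_000_000_000  # ASSUMED: 50 Gb of short reads in total
read_length = 150                  # ASSUMED: 150 bp reads
k = 17                             # k-mer size used in the thread (-m 17)

cov = base_coverage(total_read_bases, genome_size)
print(f"base coverage: {cov:.1f}x")
print(f"expected k-mer coverage: {kmer_coverage(cov, read_length, k):.1f}x")
```

If the k-mer coverage computed this way comes out at only a few x, the main peak would sit very close to the low-frequency error k-mers, which would be consistent with no peak being detected.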
