no genome size estimation result #185

Open
AlcaArctica opened this issue Jan 5, 2024 · 1 comment


AlcaArctica commented Jan 5, 2024

Hi, I am trying out KAT, but I do not get a genome size estimate.
I am running: kat hist -m 21 -h 200 -t 38 -v -p png -o wGorVio_kat wGorVio_reads.jf

Here is the log file output:


        
Kmer Analysis Toolkit (KAT) V2.4.2

Running KAT in HIST mode
------------------------

Loading hashes into memory... done.  Time taken: 356.4s

Bining kmers ... done.  Time taken: 3.4s

Merging counts ... done.  Time taken: 0.0s

Saving results to disk ... done.  Time taken: 0.0s

Creating plot ...
Plotting histograms for: 1
201 element histogram file loaded.
Axis limits:
xmax: 201
ymax: 5188216.0
 done.  Time taken: 2.9s

Analysing peaks
---------------

Analysing distributions for: /lustre/projects/dazzlerAssembly/asm_wGorVio/hifi/qc/reads/kat/wGorVio_kat
Input file generated using K 21
Kmer coverage histogram file detected
Analysing spectra

Creating initial peaks ... done. 1 peaks initially created

  Index    Left    Mean    Right    StdDev      Max    Volume  Description
-------  ------  ------  -------  --------  -------  --------  -------------
      1      80     100      120        10  1421407         0  1/2X

Locally optimising each peak ... done.

  Index    Left    Mean    Right    StdDev      Max     Volume  Description
-------  ------  ------  -------  --------  -------  ---------  -------------
      1   24.02      99   173.98     37.49  1421406  132588455  1/2X

Fitting cumulative distribution to histogram by adjusting peaks ... done.

  Index    Left    Mean    Right    StdDev      Max     Volume  Description
-------  ------  ------  -------  --------  -------  ---------  -------------
      1   10.78      98   185.22     43.61  1421406  152072606  1/2X

Time taken:  0.2s

K-mer frequency spectra statistics
----------------------------------
K-value used: 21
Peaks in analysis: 1
Global minima @ Frequency=12x (490472)
Global maxima @ Frequency=200x (6033723)
Overall mean k-mer frequency: 98x

  Index    Left    Mean    Right    StdDev      Max     Volume  Description
-------  ------  ------  -------  --------  -------  ---------  -------------
      1   10.78      98   185.22     43.61  1421406  152072606  1/2X

Calculating genome statistics
-----------------------------
Assuming that homozygous peak is the largest in the spectra with frequency of: 98x
Homozygous peak index: 0
CAUTION: the following estimates are based on having a clean spectra and having identified the correct homozygous peak!
Estimated genome size: 0.00 Mbp

Creating plots
--------------

Plotting K-mer frequency distributions ... done.  Saved to: None


KAT HIST completed.
Total runtime: 364.3s

My results are:

{
    "k": 21,
    "nb_peaks": 1,
    "global_minima": {
        "freq": 12,
        "count": 490472
    },
    "global_maxima": {
        "freq": 200,
        "count": 6033723
    },
    "mean_freq": 98,
    "peaks": [
        {
            "mean_freq": 98.00000000000003,
            "stddev": 43.61217428090905,
            "count": 1421406,
            "volume": 152072606
        }
    ],
    "hom_peak": {
        "freq": 98,
        "index": 0
    },
    "est_genome_size": 0,
    "est_het_rate": 0.0
}

Why are the estimated genome size and the estimated het rate zero?
I thought the histogram looked fine:
[image: kat_hist_reads]
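
One detail in the output above may be worth checking: the reported global maximum is at 200x, the same value that was passed to -h, and its count is larger than the count at the fitted peak. The Python sketch below performs that check on the relevant fields from the JSON results. The function name `maximum_at_cap` is made up for illustration, and reading this pattern as "the last bin collects everything at or above the cap" is an assumption, not something the log states.

```python
import json

# Fields copied from the JSON results above (only the ones used here).
stats = json.loads("""
{
    "global_maxima": {"freq": 200, "count": 6033723},
    "peaks": [{"mean_freq": 98.00000000000003, "count": 1421406}],
    "est_genome_size": 0
}
""")


def maximum_at_cap(stats: dict, high_cap: int) -> bool:
    """Return True when the global maximum sits at (or above) the cap bin.

    Hypothetical helper: it only compares two numbers and does not
    reproduce any of KAT's own analysis.
    """
    return stats["global_maxima"]["freq"] >= high_cap


HIGH_CAP = 200  # the value passed to -h in the command above

capped = maximum_at_cap(stats, HIGH_CAP)
peak_count = stats["peaks"][0]["count"]
cap_count = stats["global_maxima"]["count"]

print("global maximum at the -h cap:", capped)
print("cap bin count exceeds fitted peak count:", cap_count > peak_count)
```

If both lines print True, the tallest bin in the histogram is the cap bin rather than the coverage peak, which is a hint that the cap may be affecting the downstream statistics.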

@AlcaArctica
Author

Alright, I figured out that my setting of the -h parameter is what interferes with the calculation of the genome size and heterozygosity. When I leave this parameter out, both are calculated without a hitch (although the graph is prettier with it ;)

I guess that also answers my question in #182.
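
To illustrate why a low histogram cap can affect a size estimate in general, here is a minimal Python sketch using the common approximation "genome size ≈ total k-mer occurrences / peak frequency". This is not KAT's internal calculation and does not explain why KAT reports exactly zero. The function names and the toy histogram are made up for illustration, and the capping behaviour (everything at or above the cap is merged into the last bin) is an assumption.

```python
def estimate_genome_size(hist: dict[int, int], peak_freq: int) -> float:
    """Approximate genome size as total k-mer occurrences / peak frequency.

    hist maps k-mer frequency -> number of distinct k-mers at that frequency.
    """
    total_kmers = sum(freq * count for freq, count in hist.items())
    return total_kmers / peak_freq


def cap_histogram(hist: dict[int, int], cap: int) -> dict[int, int]:
    """Merge every bin at or above `cap` into the `cap` bin (assumed behaviour)."""
    capped: dict[int, int] = {}
    for freq, count in hist.items():
        key = min(freq, cap)
        capped[key] = capped.get(key, 0) + count
    return capped


# Toy histogram with made-up numbers: 1000 distinct k-mers at 100x
# plus 10 repeat k-mers at 1000x.
toy = {100: 1000, 1000: 10}

full_estimate = estimate_genome_size(toy, peak_freq=100)
capped_estimate = estimate_genome_size(cap_histogram(toy, cap=200), peak_freq=100)

print(full_estimate)    # (100*1000 + 1000*10) / 100
print(capped_estimate)  # (100*1000 + 200*10) / 100
```

In this toy case the capped histogram undercounts the high-frequency k-mers, so the estimate shrinks. A practical workaround, consistent with the comment above, is to run the analysis without -h and restrict the x-axis only when plotting.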
