nf-core/viralrecon report

Report generated on 2024-05-07, 19:31 CEST based on data in:
- /Users/vlad/git/website/public/examples/jupyter/data/assembly
- /Users/vlad/git/website/public/examples/jupyter/data/kraken2
- /Users/vlad/git/website/public/examples/jupyter/data/variants
- /Users/vlad/git/website/public/examples/jupyter/data/fastqc

nf-core/viralrecon summary
De novo assembly metrics
Summary of input reads, trimmed reads, and non-host reads. Generated by the nf-core/viralrecon pipeline.
Sample Name | # Input reads | # Trimmed reads (Cutadapt) | # Mapped reads | % Mapped reads | % Non-host reads (Kraken 2) | # SNPs | # SNPs | # Contigs | Largest contig | Genome fraction | N50 | Pangolin lineage | Nextclade clade |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
SAMPLE1_PE | 55442 | 24125 | 48013.0 | 99.5% | 100.0 | 7 | 1 | 1.0 | 29903.0 | 98.1% | 29903.0 | B.1 | 20A |
SAMPLE2_PE | 42962 | 19160 | 37942.0 | 98.9% | 99.8 | 7 | 1 | 1.0 | 29903.0 | 89.8% | 29903.0 | A.2 | 19B |
SAMPLE3_SE | 49202 | | 46278.0 | 99.2% | 99.9 | 54 | 0 | 1.0 | 29903.0 | 97.4% | 29903.0 | B | 19A |

fastp
Version 0.23.2
fastp is an ultra-fast all-in-one FASTQ preprocessor (QC, adapters, trimming, filtering, splitting...). DOI: 10.1093/bioinformatics/bty560.

Filtered Reads
Filtering statistics of sampled reads.

Insert Sizes
Insert size estimation of sampled reads.

Sequence Quality
Average sequencing quality over each base of all reads.

GC Content
Average GC content over each base of all reads.

N content
Average N content over each base of all reads.

Bcftools
Version 1.16
Bcftools contains utilities for variant calling and manipulating VCFs and BCFs. DOI: 10.1093/gigascience/giab008.

Variant Substitution Types

Variant Quality

Indel Distribution

Variant depths
Read depth support distribution for called variants.

Bowtie 2 / HiSAT2
Bowtie 2 and HISAT2 are fast and memory-efficient tools for aligning sequencing reads against a reference genome. Unfortunately both tools have identical log output by default, so it is impossible to distinguish which tool was used. DOI: 10.1038/nmeth.1923; 10.1038/nmeth.3317; 10.1038/s41587-019-0201-4.

Single-end alignments
This plot shows the number of reads aligning to the reference in different ways.
There are 3 possible types of alignment:
- SE mapped uniquely: Read has only one occurrence in the reference genome.
- SE multimapped: Read has multiple occurrences.
- SE not aligned: Read has no occurrence.

Paired-end alignments
This plot shows the number of reads aligning to the reference in different ways.
There are 6 possible types of alignment:
- PE mapped uniquely: Pair has only one occurrence in the reference genome.
- PE mapped discordantly uniquely: Pair has only one occurrence but not in proper pair.
- PE one mate mapped uniquely: One read of a pair has one occurrence.
- PE multimapped: Pair has multiple occurrences.
- PE one mate multimapped: One read of a pair has multiple occurrences.
- PE neither mate aligned: Pair has no occurrence.

Cutadapt
Version 4.2
Cutadapt is a tool to find and remove adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads. DOI: 10.14806/ej.17.1.200.

Filtered Reads
This plot shows the number of reads (SE) / pairs (PE) removed by Cutadapt.

Trimmed Sequence Lengths
This plot shows the number of reads with certain lengths of adapter trimmed.
Obs/Exp shows the raw counts divided by the number expected due to sequencing errors. A defined peak may be related to adapter length.
See the cutadapt documentation for more information on how these numbers are generated.

FastQC
Version 0.11.9
FastQC is a quality control tool for high throughput sequence data, written by Simon Andrews at the Babraham Institute in Cambridge.

Sequence Counts
Sequence counts for each sample. Duplicate read counts are an estimate only.
This plot shows the total number of reads, broken down into unique and duplicate if possible (only more recent versions of FastQC give duplicate info).
You can read more about duplicate calculation in the FastQC documentation. A small part has been copied here for convenience:
Only sequences which first appear in the first 100,000 sequences in each file are analysed. This should be enough to get a good impression for the duplication levels in the whole file. Each sequence is tracked to the end of the file to give a representative count of the overall duplication level.
The duplication detection requires an exact sequence match over the whole length of the sequence. Any reads over 75bp in length are truncated to 50bp for this analysis.
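
The duplicate-counting rules quoted above can be sketched roughly as follows. This is only an illustration of the described behaviour, not FastQC's actual implementation; the function name `estimate_duplication` and its parameters are hypothetical.

```python
from collections import Counter


def estimate_duplication(reads, track_limit=100_000, long_read=75, truncate_to=50):
    """Return (distinct_sequences, tracked_reads) following the quoted rules.

    Hypothetical helper: names and defaults are assumptions taken from the
    help text, not from the FastQC source code.
    """
    counts = Counter()
    for index, seq in enumerate(reads):
        # Reads longer than 75bp are truncated to 50bp before comparison.
        if len(seq) > long_read:
            seq = seq[:truncate_to]
        # Only sequences first seen within the first `track_limit` reads are
        # tracked, but tracked sequences are counted to the end of the file.
        if index < track_limit or seq in counts:
            counts[seq] += 1
    return len(counts), sum(counts.values())


if __name__ == "__main__":
    distinct, tracked = estimate_duplication(["AAAA", "CCCC", "AAAA", "GGGG"], track_limit=2)
    print(f"{distinct} distinct sequences among {tracked} tracked reads")
```

The ratio of distinct sequences to tracked reads gives a rough sense of how much of the library is duplicated.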

Sequence Quality Histograms
The mean quality value across each base position in the read.
To enable multiple samples to be plotted on the same graph, only the mean quality scores are plotted (unlike the box plots seen in FastQC reports).
Taken from the FastQC help:
The y-axis on the graph shows the quality scores. The higher the score, the better the base call. The background of the graph divides the y axis into very good quality calls (green), calls of reasonable quality (orange), and calls of poor quality (red). The quality of calls on most platforms will degrade as the run progresses, so it is common to see base calls falling into the orange area towards the end of a read.

Per Sequence Quality Scores
The number of reads with average quality scores. Shows if a subset of reads has poor quality.
From the FastQC help:
The per sequence quality score report allows you to see if a subset of your sequences have universally low quality values. It is often the case that a subset of sequences will have universally poor quality, however these should represent only a small percentage of the total sequences.

Per Base Sequence Content
The proportion of each base position for which each of the four normal DNA bases has been called.
To enable multiple samples to be shown in a single plot, the base composition data is shown as a heatmap. The colours represent the balance between the four bases: an even distribution should give an even muddy brown colour. Hover over the plot to see the percentage of the four bases under the cursor.
To see the data as a line plot, as in the original FastQC graph, click on a sample track.
From the FastQC help:
Per Base Sequence Content plots out the proportion of each base position in a file for which each of the four normal DNA bases has been called.
In a random library you would expect that there would be little to no difference between the different bases of a sequence run, so the lines in this plot should run parallel with each other. The relative amount of each base should reflect the overall amount of these bases in your genome, but in any case they should not be hugely imbalanced from each other.
It's worth noting that some types of library will always produce biased sequence composition, normally at the start of the read. Libraries produced by priming using random hexamers (including nearly all RNA-Seq libraries) and those which were fragmented using transposases inherit an intrinsic bias in the positions at which reads start. This bias does not concern an absolute sequence, but instead provides enrichment of a number of different K-mers at the 5' end of the reads. Whilst this is a true technical bias, it isn't something which can be corrected by trimming and in most cases doesn't seem to adversely affect the downstream analysis.

Per Sequence GC Content
The average GC content of reads. A normal random library typically has a roughly normal distribution of GC content.
From the FastQC help:
This module measures the GC content across the whole length of each sequence in a file and compares it to a modelled normal distribution of GC content.
In a normal random library you would expect to see a roughly normal distribution of GC content where the central peak corresponds to the overall GC content of the underlying genome. Since we don't know the GC content of the genome the modal GC content is calculated from the observed data and used to build a reference distribution.
An unusually shaped distribution could indicate a contaminated library or some other kinds of biased subset. A normal distribution which is shifted indicates some systematic bias which is independent of base position. If there is a systematic bias which creates a shifted normal distribution then this won't be flagged as an error by the module since it doesn't know what your genome's GC content should be.

Per Base N Content
The percentage of base calls at each position for which an N was called.
From the FastQC help:
If a sequencer is unable to make a base call with sufficient confidence then it will normally substitute an N rather than a conventional base call. This graph shows the percentage of base calls at each position for which an N was called.
It's not unusual to see a very low proportion of Ns appearing in a sequence, especially nearer the end of a sequence. However, if this proportion rises above a few percent it suggests that the analysis pipeline was unable to interpret the data well enough to make valid base calls.
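
As a rough illustration of the quantity plotted here, the per-position N percentage could be computed along these lines. The helper `n_content_per_position` is a hypothetical name and reads are assumed to be plain strings; this is a sketch, not how FastQC or MultiQC compute the value internally.

```python
def n_content_per_position(reads):
    """Percentage of base calls that are N at each read position.

    Hypothetical helper for illustration only. Reads may differ in length;
    each position is normalised by the number of reads that reach it.
    """
    max_len = max((len(read) for read in reads), default=0)
    n_counts = [0] * max_len
    totals = [0] * max_len
    for read in reads:
        for pos, base in enumerate(read):
            totals[pos] += 1
            if base in ("N", "n"):
                n_counts[pos] += 1
    return [100.0 * n / total for n, total in zip(n_counts, totals)]


if __name__ == "__main__":
    print(n_content_per_position(["ACGN", "ANGT"]))
```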

Sequence Length Distribution
The distribution of fragment sizes (read lengths) found. See the FastQC help.

Sequence Duplication Levels
The relative level of duplication found for every sequence.
From the FastQC Help:
In a diverse library most sequences will occur only once in the final set. A low level of duplication may indicate a very high level of coverage of the target sequence, but a high level of duplication is more likely to indicate some kind of enrichment bias (eg PCR over amplification). This graph shows the degree of duplication for every sequence in a library: the relative number of sequences with different degrees of duplication.
Only sequences which first appear in the first 100,000 sequences in each file are analysed. This should be enough to get a good impression for the duplication levels in the whole file. Each sequence is tracked to the end of the file to give a representative count of the overall duplication level.
The duplication detection requires an exact sequence match over the whole length of the sequence. Any reads over 75bp in length are truncated to 50bp for this analysis.
In a properly diverse library most sequences should fall into the far left of the plot in both the red and blue lines. A general level of enrichment, indicating broad oversequencing in the library will tend to flatten the lines, lowering the low end and generally raising other categories. More specific enrichments of subsets, or the presence of low complexity contaminants will tend to produce spikes towards the right of the plot.

Overrepresented sequences by sample
The total amount of overrepresented sequences found in each library.
FastQC calculates and lists overrepresented sequences in FastQ files. It would not be possible to show this for all samples in a MultiQC report, so instead this plot shows the number of sequences categorized as overrepresented.
Sometimes, a single sequence may account for a large number of reads in a dataset. To show this, the bars are split into two: the first shows the overrepresented reads that come from the single most common sequence. The second shows the total count from all remaining overrepresented sequences.
From the FastQC Help:
A normal high-throughput library will contain a diverse set of sequences, with no individual sequence making up a tiny fraction of the whole. Finding that a single sequence is very overrepresented in the set either means that it is highly biologically significant, or indicates that the library is contaminated, or not as diverse as you expected.
FastQC lists all the sequences which make up more than 0.1% of the total. To conserve memory only sequences which appear in the first 100,000 sequences are tracked to the end of the file. It is therefore possible that a sequence which is overrepresented but doesn't appear at the start of the file for some reason could be missed by this module.
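
The two-part bar described above (most common overrepresented sequence versus all remaining overrepresented sequences) could be derived roughly as in the sketch below. The function `overrepresented_split` and its arguments are hypothetical, and the 0.1% threshold and 100,000-read tracking window are taken from the help text rather than from FastQC's code.

```python
from collections import Counter


def overrepresented_split(reads, threshold=0.001, track_limit=100_000):
    """Return (top_sequence_reads, remaining_overrepresented_reads).

    Hypothetical helper: a sequence counts as overrepresented when it makes
    up more than `threshold` of all reads, and only sequences first seen in
    the first `track_limit` reads are tracked at all.
    """
    counts = Counter()
    total = 0
    for index, seq in enumerate(reads):
        total += 1
        if index < track_limit or seq in counts:
            counts[seq] += 1
    over = {seq: n for seq, n in counts.items() if n / total > threshold}
    if not over:
        return 0, 0
    top = max(over.values())
    return top, sum(over.values()) - top


if __name__ == "__main__":
    reads = ["A"] * 6 + ["C"] * 3 + ["G"]
    print(overrepresented_split(reads, threshold=0.25))
```

A sequence that first appears after the tracking window is never counted, which mirrors the caveat in the quoted help text.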

Top overrepresented sequences
Top overrepresented sequences across all samples. The table shows the 20 most overrepresented sequences across all samples, ranked by the number of samples they occur in.
Overrepresented sequence | Samples | Occurrences | % of all reads |
---|---|---|---|
AAGGTGTCTGCAATTCATAGCTCTTTTCAGAACGTTCCGTGTACCAAGCA | 7 | 2126 | 1.2040% |
ACAGTATTCTTTGCTATAGTAGTCGGCATAGATGCTTTAATTCTAGAATT | 7 | 3369 | 1.9080% |
ACTACCGAAGTTGTAGGAGACATTATACTTAAACCAGCAAATAATAGTTT | 7 | 3914 | 2.2167% |
ACTAGGTTCCATTGTTCAAGGAGCTTTTTAAGCTCTTCAACGGTAATAGT | 7 | 3031 | 1.7166% |
AGCAAAATGTTGGACTGAGACTGACCTTACTAAAGGACCTCATGAATTTT | 7 | 2350 | 1.3309% |
AGCCTCATAAAACTCAGGTTCCCAATACCTTGAAGTGTTATCATTAGTAA | 7 | 2565 | 1.4527% |
AGGAATTACTTGTGTATGCTGCTGACCCTGCTATGCACGCTGCTTCTGGT | 7 | 1935 | 1.0959% |
AGTGAAATTGGGCCTCATAGCACATTGGTAAACACCAGATGGTGAACCAT | 7 | 2029 | 1.1491% |
AGTTTCCACACAGACAGGCATTAATTTGCGTGTTTCTTCTGCATGTGCAA | 7 | 2103 | 1.1910% |
CACAAGTAGTGGCACCTTCTTTAGTCAAATTCTCAGTGCCACAAAATTCG | 7 | 2282 | 1.2924% |
CAGCCCCTATTAAACAGCCTGCACGTGTTTGAAAAACATTAGAACCTGTA | 7 | 2443 | 1.3836% |
CATCCAGATTCTGCCACTCTTGTTAGTGACATTGACATCACTTTCTTAAA | 7 | 2333 | 1.3213% |
CCAGCAACTGTTTGTGGACCTAAAAAGTCTACTAATTTGGTTAAAAACAA | 7 | 3210 | 1.8180% |
CGACTACTAGCGTGCCTTTGTAAGCACAAGCTGATGAGTACGAACTTATG | 7 | 2907 | 1.6464% |
CGGTAATAAAGGAGCTGGTGGCCATAGTTACGGCGCCGATCTAAAGTCAT | 7 | 1905 | 1.0789% |
CTTTTCTCCAAGCAGGGTTACGTGTAAGGAATTCTCTTACCACGCCTATT | 7 | 2451 | 1.3881% |
GGTGTATACTGCTGCCGTGAACATGAGCATGAAATTGCTTGGTACACGGA | 7 | 2360 | 1.3366% |
GTACGCGTTCCATGTGGTCATTCAATCCAGAAACTAACATTCTTCTCAAC | 7 | 2279 | 1.2907% |
TGAAATGGTGAATTGCCCTCGTATGTTCCAGAAGAGCAAGGTTCTTTTAA | 7 | 2693 | 1.5252% |
TGATTTGAGTGTTGTCAATGCCAGATTACGTGCTAAGCACTATGTGTACA | 7 | 2409 | 1.3643% |

Adapter Content
The cumulative percentage count of the proportion of your library which has seen each of the adapter sequences at each position.
Note that only samples with ≥ 0.1% adapter contamination are shown.
There may be several lines per sample, as one is shown for each adapter detected in the file.
From the FastQC Help:
The plot shows a cumulative percentage count of the proportion of your library which has seen each of the adapter sequences at each position. Once a sequence has been seen in a read it is counted as being present right through to the end of the read so the percentages you see will only increase as the read length goes on.
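
The cumulative counting rule in the quoted text can be sketched as follows, assuming exact substring matching for a single adapter. The name `cumulative_adapter_percent` is hypothetical, and FastQC's own adapter search may differ in detail.

```python
def cumulative_adapter_percent(reads, adapter):
    """Cumulative percentage of reads in which `adapter` has been seen by each position.

    Hypothetical helper for illustration: once the adapter is found in a
    read, that read counts as adapter-containing from the match position
    through to the end of the read.
    """
    if not reads:
        return []
    max_len = max(len(read) for read in reads)
    seen = [0] * max_len
    for read in reads:
        start = read.find(adapter)
        if start == -1:
            continue
        for pos in range(start, len(read)):
            seen[pos] += 1
    return [100.0 * count / len(reads) for count in seen]


if __name__ == "__main__":
    reads = ["AAXXAA", "XXAAAA", "AAAAAA", "AAAAAA"]
    print(cumulative_adapter_percent(reads, "XX"))
```

For reads of equal length the resulting curve never decreases, which is why the plotted percentages only rise along the read.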

Status Checks
Status for each FastQC section showing whether results seem entirely normal (green), slightly abnormal (orange) or very unusual (red).
FastQC assigns a status for each section of the report. These give a quick evaluation of whether the results of the analysis seem entirely normal (green), slightly abnormal (orange) or very unusual (red).
It is important to stress that although the analysis results appear to give a pass/fail result, these evaluations must be taken in the context of what you expect from your library. A 'normal' sample as far as FastQC is concerned is random and diverse. Some experiments may be expected to produce libraries which are biased in particular ways. You should treat the summary evaluations therefore as pointers to where you should concentrate your attention and understand why your library may not look random and diverse.
Specific guidance on how to interpret the output of each module can be found in the relevant report section, or in the FastQC help.
Here, we summarise all of these in a single heatmap for a quick overview. Note that not all FastQC sections have plots in MultiQC reports, but all status checks are shown in this heatmap.

Kraken
Kraken is a taxonomic classification tool that uses exact k-mer matches to find the lowest common ancestor (LCA) of a given sequence. DOI: 10.1186/gb-2014-15-3-r46.

Top taxa
The number of reads falling into the top 5 taxa across different ranks.
To make this plot, the percentage of each sample assigned to a given taxon is summed across all samples. The counts for these top 5 taxa are then plotted for each of the 9 different taxa ranks. The unclassified count is always shown across all taxa ranks.
The total number of reads is approximated by dividing the number of unclassified reads by the percentage of the library that they account for. Note that this is only an approximation, and that Kraken percentages don't always add to exactly 100%.
The category "Other" shows the difference between the above total read count and the sum of the read counts in the top 5 taxa shown + unclassified. This should cover all taxa not in the top 5, +/- any rounding errors.
Note that any taxon that does not exactly fit a taxon rank (e.g. - or G2) is ignored.
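
The arithmetic behind the approximated total and the "Other" category can be sketched as below. Both function names are hypothetical and the numbers in the usage example are made up for illustration; they are not taken from this report.

```python
def approx_total_reads(unclassified_reads, unclassified_percent):
    """Approximate the library size from the unclassified count and its percentage.

    Hypothetical helper: `unclassified_percent` is expressed in percent
    (e.g. 0.5 means 0.5%), as in a Kraken report.
    """
    if unclassified_percent <= 0:
        raise ValueError("cannot estimate the total from a zero percentage")
    return round(unclassified_reads * 100 / unclassified_percent)


def other_reads(total_reads, top_taxa_reads, unclassified_reads):
    """Reads not covered by the top taxa or the unclassified category."""
    return total_reads - sum(top_taxa_reads) - unclassified_reads


if __name__ == "__main__":
    # Illustrative numbers only.
    total = approx_total_reads(50, 0.5)
    print(total, other_reads(total, [9000, 400, 300, 100, 50], 50))
```

Because the percentage is rounded in the Kraken output, the estimate can be off by a small amount, which is where the rounding error mentioned above comes from.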

Nextclade
Nextclade does viral genome alignment, clade assignment, mutation calling, and quality checks. DOI: 10.21105/joss.03773.

Run table
Sample Name | Clade | QC Overall Status | QC Missing Data Status | QC Mixed Sites Status |
---|---|---|---|---|
SAMPLE1_PE | 20A | good | good | good |
SAMPLE2_PE | 19B | bad | bad | good |
SAMPLE3_SE | 19A | good | good | good |

Pangolin
Version 4.2
Scorpio: 0.3.17
Constellations: 0.1.10
Pangolin uses variant calls to assign SARS-CoV-2 genome sequences to global lineages. DOI: 10.1093/ve/veab064.

Run table
Statistics gathered from the input pangolin files. Hover over the column headers for descriptions and click Help for more in-depth documentation.
This table shows some of the metrics parsed by Pangolin. Hover over the column headers to see a description of the contents. Longer help text for certain columns is shown below:
- Conflict: In the pangoLEARN decision tree model, a given sequence gets assigned to the most likely category based on known diversity. If a sequence can fit into more than one category, the conflict score will be greater than 0 and reflect the number of categories the sequence could fit into. If the conflict score is 0, this means that within the current decision tree there is only one category that the sequence could be assigned to.
- Ambiguity score: This score is a function of the quantity of missing data in a sequence. It represents the proportion of relevant sites in a sequence which were imputed to the reference values. A score of 1 indicates that no sites were imputed, while a score of 0 indicates that more sites were imputed than were not imputed. This score only includes sites which are used by the decision tree to classify a sequence.
- Scorpio conflict: The conflict score is the proportion of defining variants which have the reference allele in the sequence. Ambiguous/other non-ref/alt bases at each of the variant positions contribute only to the denominators of these scores.
- Note: If there are any conflicts from the decision tree, this field will output the alternative assignments. If the sequence failed QC this field will describe why. If the sequence met the SNP thresholds for scorpio to call a constellation, it’ll describe the exact SNP counts of Alt, Ref and Amb (Alternative, reference and ambiguous) alleles for that call.
Sample Name | Lineage | Conflict | Ambiguity | S call | S support | S conflict | QC Status | QC Note | Note |
---|---|---|---|---|---|---|---|---|---|
SAMPLE1_PE | B.1 | 0.0 | | | | | Pass | Ambiguous content: 3% | Usher placements: B.1(1/1) |
SAMPLE2_PE | A.2 | 0.0 | | | | | Pass | Ambiguous content: 11% | Usher placements: A.2(2/2) |
SAMPLE3_SE | B | 0.3 | | | | | Pass | Ambiguous content: 4% | Usher placements: B(2/3) B.1(1/3) |
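
The Scorpio conflict score described in the help text for this table can be illustrated with a small sketch. The function `scorpio_scores` is a hypothetical name, and treating the support score as the alt-allele proportion over the same denominator is an assumption made by analogy with the conflict definition, not something stated in this report.

```python
def scorpio_scores(alt, ref, amb):
    """Return (support, conflict) from Alt, Ref and Amb allele counts.

    Hypothetical helper. Conflict is the proportion of defining variants
    carrying the reference allele; ambiguous calls only enlarge the
    denominator. Support is assumed to be the matching alt proportion.
    """
    total = alt + ref + amb
    if total == 0:
        raise ValueError("no defining variant positions were counted")
    return alt / total, ref / total


if __name__ == "__main__":
    # Illustrative counts only: 8 alt, 1 ref, 1 ambiguous.
    support, conflict = scorpio_scores(8, 1, 1)
    print(f"support={support:.2f} conflict={conflict:.2f}")
```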

Picard
Picard is a set of Java command line tools for manipulating high-throughput sequencing data.

Alignment Summary
Please note that Picard's read counts are divided by two for paired-end data. Total bases (including unaligned) is not provided.

Mean read length
The mean read length of the set of reads examined.

Base Distribution
Plot shows the distribution of bases by cycle.

Insert Size
Plot shows the number of reads at a given insert size. Reads with different orientations are summed.

Mean Base Quality by Cycle
Plot shows the mean base quality by cycle.
This metric gives an overall snapshot of sequencing machine performance. For most types of sequencing data, the output is expected to show a slight reduction in overall base quality scores towards the end of each read.
Spikes in quality within reads are not expected and may indicate that technical problems occurred during sequencing.

Base Quality Distribution
Plot shows the count of each base quality score.

Samtools
Version 1.16.1
HTSlib: 1.16
Samtools is a suite of programs for interacting with high-throughput sequencing data. DOI: 10.1093/bioinformatics/btp352.

Percent mapped
Alignment metrics from samtools stats; mapped vs. unmapped reads vs. reads mapped with MQ0.
For a set of samples that have come from the same multiplexed library, similar numbers of reads for each sample are expected. Large differences in numbers might indicate issues during the library preparation process. Whilst large differences in read numbers may be controlled for in downstream processing (e.g. read count normalisation), you may wish to consider whether the read depths achieved have fallen below recommended levels depending on the applications.
Low alignment rates could indicate contamination of samples (e.g. adapter sequences), low sequencing quality or other artefacts. These can be further investigated in the sequence level QC (e.g. from FastQC).
Reads mapped with MQ0 often indicate that the reads are ambiguously mapped to multiple locations in the reference sequence. This can be due to repetitive regions in the genome, the presence of alternative contigs in the reference, or due to reads that are too short to be uniquely mapped. These reads are often filtered out in downstream analyses.

Alignment stats
This module parses the output from samtools stats. All numbers in millions.

Flagstat
This module parses the output from samtools flagstat.

Mapped reads per contig
The samtools idxstats tool counts the number of mapped reads per chromosome / contig. Chromosomes with < 0.1% of the total aligned reads are omitted from this plot.

SnpEff
Version 5.0e
SnpEff is a genetic variant annotation and effect prediction toolbox. It annotates and predicts the effects of variants on genes (such as amino acid changes). DOI: 10.4161/fly.19695.

Variants by Genomic Region
The stacked bar plot shows locations of detected variants in the genome and the number of variants for each location.
The upstream and downstream interval size to detect these genomic regions is 5000bp by default.

Variant Effects by Impact
The stacked bar plot shows the putative impact of detected variants and the number of variants for each impact.
There are four levels of impacts predicted by SnpEff:
- High: High impact (like stop codon)
- Moderate: Middle impact (like same type of amino acid substitution)
- Low: Low impact (i.e. silent mutation)
- Modifier: No impact

Variants by Effect Types
The stacked bar plot shows the effect of variants at protein level and the number of variants for each effect type.
This plot shows the effect of variants with respect to the mRNA.

Variants by Functional Class
The stacked bar plot shows the effect of variants and the number of variants for each effect type.
This plot shows the effect of variants on the translation of the mRNA as protein. There are three possible cases:
- Silent: The amino acid does not change.
- Missense: The amino acid is different.
- Nonsense: The variant generates a stop codon.
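
The three functional classes listed above can be illustrated with a minimal codon comparison using the standard genetic code. This is a simplified sketch and not SnpEff's implementation; `functional_class` is a hypothetical name, and cases such as a lost stop codon are deliberately not handled.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def functional_class(ref_codon, alt_codon):
    """Classify a single-codon change as SILENT, MISSENSE or NONSENSE.

    Hypothetical helper: a change from a stop codon to an amino acid would
    be reported as MISSENSE here, which is a simplification.
    """
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if alt_aa == ref_aa:
        return "SILENT"
    if alt_aa == "*":
        return "NONSENSE"
    return "MISSENSE"


if __name__ == "__main__":
    for ref, alt in [("CTT", "CTC"), ("GAT", "GGT"), ("TGG", "TGA")]:
        print(ref, alt, functional_class(ref, alt))
```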

VARIANTS: QUAST
This section of the report shows QUAST QC results for the consensus sequence. DOI: 10.1093/bioinformatics/btt086.

Assembly Statistics
Sample Name | N50 (Kbp) | L50 (K) | Largest contig (Kbp) | Length (Mbp) | Misassemblies | Mismatches/100kbp | Indels/100kbp | Genome Fraction |
---|---|---|---|---|---|---|---|---|
SAMPLE1_PE | 21.0Kbp | 0.0K | 21.0Kbp | 0.0Mbp | 0 | 20.27 | 0.00 | 99.0% |
SAMPLE2_PE | 4.1Kbp | 0.0K | 10.4Kbp | 0.0Mbp | 0 | 38.24 | 3.48 | 96.0% |
SAMPLE3_SE | 8.8Kbp | 0.0K | 16.9Kbp | 0.0Mbp | 0 | 10.09 | 0.00 | 99.0% |

Number of Contigs
This plot shows the number of contigs found for each assembly, broken down by length.

ASSEMBLY: QUAST (SPAdes)
This section of the report shows QUAST results from SPAdes de novo assembly. DOI: 10.1093/bioinformatics/btt086.

Assembly Statistics
Sample Name | N50 (Kbp) | L50 (K) | Largest contig (Kbp) | Length (Mbp) | Misassemblies | Mismatches/100kbp | Indels/100kbp | Genome Fraction |
---|---|---|---|---|---|---|---|---|
SAMPLE1_PE | 21.0Kbp | 0.0K | 21.0Kbp | 0.0Mbp | 0 | 20.27 | 0.00 | 99.0% |
SAMPLE2_PE | 4.1Kbp | 0.0K | 10.4Kbp | 0.0Mbp | 0 | 38.24 | 3.48 | 96.0% |
SAMPLE3_SE | 8.8Kbp | 0.0K | 16.9Kbp | 0.0Mbp | 0 | 10.09 | 0.00 | 99.0% |

Number of Contigs
This plot shows the number of contigs found for each assembly, broken down by length.

Software Versions
Software Versions lists versions of software tools extracted from file contents.
Group | Software | Version |
---|---|---|
Bcftools | Bcftools | 1.16 |
Cutadapt | Cutadapt | 4.2 |
FastQC | FastQC | 0.11.9 |
Pangolin | Constellations | 0.1.10 |
Pangolin | Pangolin | 4.2 |
Pangolin | Scorpio | 0.3.17 |
Samtools | HTSlib | 1.16 |
Samtools | Samtools | 1.16.1 |
SnpEff | SnpEff | 5.0e |
fastp | fastp | 0.23.2 |