diff --git a/public/examples/jupyter/.gitignore b/public/examples/jupyter/.gitignore
new file mode 100644
index 0000000..87620ac
--- /dev/null
+++ b/public/examples/jupyter/.gitignore
@@ -0,0 +1 @@
+.ipynb_checkpoints/
diff --git a/public/examples/viralrecon/data.zip b/public/examples/jupyter/data.zip
similarity index 100%
rename from public/examples/viralrecon/data.zip
rename to public/examples/jupyter/data.zip
diff --git a/public/examples/viralrecon/multiqc_config_illumina.yml b/public/examples/jupyter/multiqc_config_illumina.yml
similarity index 100%
rename from public/examples/viralrecon/multiqc_config_illumina.yml
rename to public/examples/jupyter/multiqc_config_illumina.yml
diff --git a/public/examples/jupyter/multiqc_report.html b/public/examples/jupyter/multiqc_report.html
new file mode 100644
index 0000000..76adcba
--- /dev/null
+++ b/public/examples/jupyter/multiqc_report.html
@@ -0,0 +1,10299 @@
nf-core/viralrecon report: MultiQC Report
nf-core/viralrecon report

Note that additional data was saved in multiqc_report_data when this report was generated.

If you use plots from MultiQC in a publication or presentation, please cite:

MultiQC: Summarize analysis results for multiple tools and samples in a single report
Philip Ewels, Måns Magnusson, Sverker Lundin and Max Käller
Bioinformatics (2016)
doi: 10.1093/bioinformatics/btw354
PMID: 27312411

Tool Citations

Please remember to cite the tools that you use in your analysis.

About MultiQC

This report was generated using MultiQC, version 1.22.dev0.

You can see a YouTube video describing how to use MultiQC reports here: https://youtu.be/qPbIlO_KWN0

For more information about MultiQC, including other videos and extensive documentation, please visit http://multiqc.info

You can report bugs, suggest improvements and find the source code for MultiQC on GitHub: https://github.com/MultiQC/MultiQC

Report generated on 2024-05-07, 19:31 CEST based on data in:

nf-core/viralrecon summary

De novo assembly metrics

Summary of input reads, trimmed reads, and non-host reads. Generated by the nf-core/viralrecon pipeline.

| Sample Name | # Input reads | # Trimmed reads (Cutadapt) | # Mapped reads | % Mapped reads | % Non-host reads (Kraken 2) | # SNPs | # SNPs | # Contigs | Largest contig | Genome fraction | N50 | Pangolin lineage | Nextclade clade |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SAMPLE1_PE | 55442 | 24125 | 48013.0 | 99.5% | 100.0 | 7 | 1 | 1.0 | 29903.0 | 98.1% | 29903.0 | B.1 | 20A |
| SAMPLE2_PE | 42962 | 19160 | 37942.0 | 98.9% | 99.8 | 7 | 1 | 1.0 | 29903.0 | 89.8% | 29903.0 | A.2 | 19B |
| SAMPLE3_SE | 49202 |  | 46278.0 | 99.2% | 99.9 | 54 | 0 | 1.0 | 29903.0 | 97.4% | 29903.0 | B | 19A |

fastp

Version: 0.23.2

fastp is an ultra-fast all-in-one FASTQ preprocessor (QC, adapters, trimming, filtering, splitting...). DOI: 10.1093/bioinformatics/bty560.

Filtered Reads

Filtering statistics of sampled reads.

Insert Sizes

Insert size estimation of sampled reads.

Sequence Quality

Average sequencing quality over each base of all reads.

GC Content

Average GC content over each base of all reads.

N content

Average N content over each base of all reads.

Bcftools

Version: 1.16

Bcftools contains utilities for variant calling and manipulating VCFs and BCFs. DOI: 10.1093/gigascience/giab008.

Variant Substitution Types

Variant Quality

Indel Distribution

Variant depths

Read depth support distribution for called variants.

Bowtie 2 / HiSAT2

Bowtie 2 and HISAT2 are fast and memory-efficient tools for aligning sequencing reads against a reference genome. Unfortunately, both tools have identical log output by default, so it is impossible to distinguish which tool was used. DOI: 10.1038/nmeth.1923; 10.1038/nmeth.3317; 10.1038/s41587-019-0201-4.

Single-end alignments

This plot shows the number of reads aligning to the reference in different ways.

There are 3 possible types of alignment:

• SE mapped uniquely: Read has only one occurrence in the reference genome.
• SE multimapped: Read has multiple occurrences.
• SE not aligned: Read has no occurrence.
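The following is a minimal sketch of reading these three categories from a Bowtie 2 summary for unpaired reads. It assumes the usual summary lines, such as `50 (5.00%) aligned 0 times`. The function name and the example log are hypothetical, and MultiQC's own parser handles more cases, such as paired-end logs.

```python
import re

# Bowtie 2 summary phrases mapped to the single-end categories in this plot.
SE_PATTERNS = {
    "SE mapped uniquely": r"(\d+) \([\d.]+%\) aligned exactly 1 time",
    "SE multimapped": r"(\d+) \([\d.]+%\) aligned >1 times",
    "SE not aligned": r"(\d+) \([\d.]+%\) aligned 0 times",
}


def parse_se_summary(log_text):
    """Return single-end alignment counts from a Bowtie 2 summary (sketch)."""
    counts = {}
    for category, pattern in SE_PATTERNS.items():
        match = re.search(pattern, log_text)
        counts[category] = int(match.group(1)) if match else 0
    return counts


# Hypothetical summary for 1,000 unpaired reads.
example_log = """1000 reads; of these:
  1000 (100.00%) were unpaired; of these:
    50 (5.00%) aligned 0 times
    900 (90.00%) aligned exactly 1 time
    50 (5.00%) aligned >1 times
95.00% overall alignment rate
"""

print(parse_se_summary(example_log))
# {'SE mapped uniquely': 900, 'SE multimapped': 50, 'SE not aligned': 50}
```

The three categories add up to the number of unpaired reads, which is a quick sanity check on a parsed log.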

Paired-end alignments

This plot shows the number of reads aligning to the reference in different ways.

Please note that single mate alignment counts are halved to tally with pair counts properly.

There are 6 possible types of alignment:

• PE mapped uniquely: Pair has only one occurrence in the reference genome.
• PE mapped discordantly uniquely: Pair has only one occurrence, but does not align as a proper pair.
• PE one mate mapped uniquely: One read of a pair has one occurrence.
• PE multimapped: Pair has multiple occurrences.
• PE one mate multimapped: One read of a pair has multiple occurrences.
• PE neither mate aligned: Pair has no occurrence.
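This sketch uses hypothetical numbers to show why halving makes the counts tally. It assumes, though the text above does not state it, that the "one mate" and "neither mate" categories all come from the mate-level counts. Bowtie 2 reports these counts for pairs that aligned neither concordantly nor discordantly. The function and argument names are illustrative, not MultiQC's internals.

```python
def pe_categories(total_pairs, conc_one, conc_multi, disc_one,
                  mates_none, mates_one, mates_multi):
    """Put pair-level and mate-level Bowtie 2 counts on one per-pair scale (sketch).

    The mate counts describe pairs that aligned neither concordantly nor
    discordantly; each such pair contributes two mates, so halving them
    converts mates back into pairs.
    """
    categories = {
        "PE mapped uniquely": conc_one,
        "PE mapped discordantly uniquely": disc_one,
        "PE one mate mapped uniquely": mates_one / 2,
        "PE multimapped": conc_multi,
        "PE one mate multimapped": mates_multi / 2,
        "PE neither mate aligned": mates_none / 2,
    }
    # Sanity check: after halving, the six bars add up to the pair count.
    if abs(sum(categories.values()) - total_pairs) > 1e-9:
        raise ValueError("categories do not sum to the number of pairs")
    return categories


# Hypothetical run: 1,000 pairs, 100 of which (200 mates) aligned neither
# concordantly nor discordantly.
result = pe_categories(1000, conc_one=850, conc_multi=40, disc_one=10,
                       mates_none=120, mates_one=70, mates_multi=10)
print(result)
```

Without the halving, the 200 mates would be counted against 100 pairs, and the bar for this sample would overshoot its real pair count by 100.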

Cutadapt

Version: 4.2

Cutadapt is a tool to find and remove adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads. DOI: 10.14806/ej.17.1.200.

Filtered Reads

This plot shows the number of reads (SE) / pairs (PE) removed by Cutadapt.

Trimmed Sequence Lengths

This plot shows the number of reads with certain lengths of adapter trimmed.

Obs/Exp shows the raw counts divided by the number expected due to sequencing errors. A defined peak may be related to adapter length.

See the cutadapt documentation for more information on how these numbers are generated.
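Obs/Exp is a per-length division, so an adapter-related peak shows up as a length whose ratio sits far above 1. This minimal sketch uses hypothetical counts. In practice, the observed and expected values are assumed here to come from the trimmed-length table in Cutadapt's report; that source is an assumption, not something stated above.

```python
def obs_exp_ratios(observed, expected):
    """Observed/expected ratio per trimmed length, skipping zero expectations."""
    return {
        length: observed[length] / expected[length]
        for length in observed
        if expected.get(length, 0) > 0
    }


# Hypothetical counts: short lengths roughly match expectation, while 33 bp
# stands far above its tiny expected count.
observed = {3: 1600, 4: 420, 5: 110, 33: 900}
expected = {3: 1562.5, 4: 390.6, 5: 97.7, 33: 0.1}

ratios = obs_exp_ratios(observed, expected)
peak_length = max(ratios, key=ratios.get)
print(peak_length, round(ratios[peak_length]))  # 33 9000
```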

FastQC

Version: 0.11.9

FastQC is a quality control tool for high throughput sequence data, written by Simon Andrews at the Babraham Institute in Cambridge.

Sequence Counts

Sequence counts for each sample. Duplicate read counts are an estimate only.

This plot shows the total number of reads, broken down into unique and duplicate if possible (only more recent versions of FastQC give duplicate info).

You can read more about duplicate calculation in the FastQC documentation. A small part has been copied here for convenience:

Only sequences which first appear in the first 100,000 sequences in each file are analysed. This should be enough to get a good impression for the duplication levels in the whole file. Each sequence is tracked to the end of the file to give a representative count of the overall duplication level.

The duplication detection requires an exact sequence match over the whole length of the sequence. Any reads over 75bp in length are truncated to 50bp for this analysis.
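The tracking rule quoted above fits in a few lines. This is an illustrative reimplementation, not FastQC's code. The parameter names are hypothetical, and counting `tracked - distinct` as duplicates is a simplified stand-in for FastQC's own duplicate estimate.

```python
from collections import Counter


def track_sequences(reads, observation_cutoff=100_000, trunc_over=75, trunc_to=50):
    """Count sequences first seen in the first `observation_cutoff` reads.

    Reads longer than `trunc_over` bp are cut to `trunc_to` bp, then matched
    exactly. Later reads only increment sequences that are already tracked.
    """
    counts = Counter()
    for index, read in enumerate(reads):
        key = read[:trunc_to] if len(read) > trunc_over else read
        if index < observation_cutoff or key in counts:
            counts[key] += 1
    return counts


# Hypothetical reads; a cutoff of 3 stands in for 100,000 to keep this short.
reads = ["ACGT" * 25, "ACGT" * 25, "TTTT", "GGGG", "TTTT"]
counts = track_sequences(reads, observation_cutoff=3)
tracked = sum(counts.values())
duplicates = tracked - len(counts)
print(tracked, len(counts), duplicates)  # 4 2 2
```

"GGGG" is never counted because it first appears after the cutoff. This is the trade-off the help text accepts to bound memory.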

Sequence Quality Histograms

The mean quality value across each base position in the read.

To enable multiple samples to be plotted on the same graph, only the mean quality scores are plotted (unlike the box plots seen in FastQC reports).

Taken from the FastQC help:

The y-axis on the graph shows the quality scores. The higher the score, the better the base call. The background of the graph divides the y axis into very good quality calls (green), calls of reasonable quality (orange), and calls of poor quality (red). The quality of calls on most platforms will degrade as the run progresses, so it is common to see base calls falling into the orange area towards the end of a read.
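Only the mean is plotted, so each position's value is the average Phred score of all bases observed at that position. This minimal sketch assumes Phred+33-encoded quality strings, the usual Illumina 1.8+ encoding. The helper name is hypothetical.

```python
def mean_quality_per_position(quality_strings, offset=33):
    """Mean Phred score at each read position (Phred+33 encoding assumed)."""
    totals, counts = [], []
    for qual in quality_strings:
        for pos, char in enumerate(qual):
            if pos == len(totals):  # first read reaching this position
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(char) - offset
            counts[pos] += 1
    return [total / count for total, count in zip(totals, counts)]


# Hypothetical qualities: 'I' encodes Q40 and '5' encodes Q20 under Phred+33.
means = mean_quality_per_position(["IIII", "II55", "I5"])
print([round(m, 1) for m in means])  # [40.0, 33.3, 30.0, 30.0]
```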

Per Sequence Quality Scores

The number of reads with each average quality score. Shows whether a subset of reads has poor quality.

From the FastQC help:

The per sequence quality score report allows you to see if a subset of your sequences have universally low quality values. It is often the case that a subset of sequences will have universally poor quality, however these should represent only a small percentage of the total sequences.
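This plot averages along each read rather than down each position, then counts reads per mean score. The minimal sketch below assumes Phred+33-encoded, non-empty quality strings. Rounding each mean to a whole-number bin is an illustrative choice, and the function name is hypothetical.

```python
from collections import Counter


def per_sequence_quality(quality_strings, offset=33):
    """Count reads by their mean Phred score, rounded to the nearest integer.

    Assumes Phred+33 encoding and non-empty quality strings.
    """
    histogram = Counter()
    for qual in quality_strings:
        scores = [ord(char) - offset for char in qual]
        histogram[round(sum(scores) / len(scores))] += 1
    return dict(sorted(histogram.items()))


# 'I' is Q40 and '5' is Q20 under Phred+33 (hypothetical reads).
print(per_sequence_quality(["IIII", "II55", "5555", "IIII"]))  # {20: 1, 30: 1, 40: 2}
```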

Per Base Sequence Content

The proportion of each base position for which each of the four normal DNA bases has been called.

To enable multiple samples to be shown in a single plot, the base composition data is shown as a heatmap. The colours represent the balance between the four bases: an even distribution should give an even muddy brown colour. Hover over the plot to see the percentage of the four bases under the cursor.

To see the data as a line plot, as in the original FastQC graph, click on a sample track.

From the FastQC help:

Per Base Sequence Content plots out the proportion of each base position in a file for which each of the four normal DNA bases has been called.

In a random library you would expect that there would be little to no difference between the different bases of a sequence run, so the lines in this plot should run parallel with each other. The relative amount of each base should reflect the overall amount of these bases in your genome, but in any case they should not be hugely imbalanced from each other.

It's worth noting that some types of library will always produce biased sequence composition, normally at the start of the read. Libraries produced by priming using random hexamers (including nearly all RNA-Seq libraries) and those which were fragmented using transposases inherit an intrinsic bias in the positions at which reads start. This bias does not concern an absolute sequence, but instead provides enrichment of a number of different K-mers at the 5' end of the reads. Whilst this is a true technical bias, it isn't something which can be corrected by trimming and in most cases doesn't seem to adversely affect the downstream analysis.
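The numbers behind this heatmap are per-position base percentages. This minimal sketch assumes reads are uppercase strings. Leaving N and other non-ACGT calls out of the denominator is an assumption made here, not a documented FastQC rule.

```python
def base_composition(reads):
    """Percentage of A, C, G and T at each position.

    N and other non-ACGT calls are left out of the denominator (an assumption).
    """
    length = max(len(read) for read in reads)
    profile = []
    for pos in range(length):
        column = [read[pos] for read in reads if pos < len(read) and read[pos] in "ACGT"]
        total = len(column)
        profile.append({base: 100 * column.count(base) / total if total else 0.0
                        for base in "ACGT"})
    return profile


profile = base_composition(["ACGT", "AGGT", "ACNA"])  # hypothetical reads
for pos, row in enumerate(profile, start=1):
    print(pos, {base: round(pct, 1) for base, pct in row.items()})
```

In a balanced random library each row would hover near 25% per base. The first position here is 100% A, the kind of start-of-read bias the help text describes.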

Per Sequence GC Content

The average GC content of reads. A normal random library typically has a roughly normal distribution of GC content.

From the FastQC help:

This module measures the GC content across the whole length of each sequence in a file and compares it to a modelled normal distribution of GC content.

In a normal random library you would expect to see a roughly normal distribution of GC content where the central peak corresponds to the overall GC content of the underlying genome. Since we don't know the GC content of the genome, the modal GC content is calculated from the observed data and used to build a reference distribution.

An unusually shaped distribution could indicate a contaminated library or some other kinds of biased subset. A normal distribution which is shifted indicates some systematic bias which is independent of base position. If there is a systematic bias which creates a shifted normal distribution then this won't be flagged as an error by the module since it doesn't know what your genome's GC content should be.
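This minimal sketch covers the two steps described above: a per-read GC percentage, then a reference normal curve centred on the observed modal GC. Using the population standard deviation of the observed values as the curve's spread is an assumption made for illustration; FastQC's own fitting may differ. The function names are hypothetical.

```python
import math
import statistics
from collections import Counter


def gc_percent(read):
    """GC content of one read as a percentage of its A/C/G/T bases."""
    bases = [b for b in read.upper() if b in "ACGT"]
    return 100 * sum(b in "GC" for b in bases) / len(bases) if bases else 0.0


def gc_reference(reads):
    """Observed per-read GC histogram plus a normal curve centred on its mode."""
    values = [round(gc_percent(read)) for read in reads]
    observed = Counter(values)
    mode = observed.most_common(1)[0][0]
    sd = statistics.pstdev(values) or 1.0  # avoid a zero-width curve
    n = len(values)
    expected = {
        gc: n * math.exp(-((gc - mode) ** 2) / (2 * sd ** 2)) / (sd * math.sqrt(2 * math.pi))
        for gc in range(101)
    }
    return observed, mode, expected


observed, mode, expected = gc_reference(["GGCC", "ATGC", "ATGC", "AAAT"])
print(mode, observed[mode])  # 50 2
```

Centring on the observed mode, rather than a known genome GC, is why a uniformly shifted distribution goes unflagged, as the help text notes.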

Per Base N Content

The percentage of base calls at each position for which an N was called.

From the FastQC help:

If a sequencer is unable to make a base call with sufficient confidence then it will normally substitute an N rather than a conventional base call. This graph shows the percentage of base calls at each position for which an N was called.

It's not unusual to see a very low proportion of Ns appearing in a sequence, especially nearer the end of a sequence. However, if this proportion rises above a few percent it suggests that the analysis pipeline was unable to interpret the data well enough to make valid base calls.
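The per-position N percentage is a direct count. This sketch also flags positions above a threshold. The 5% default is a hypothetical stand-in for "a few percent" and is not a FastQC setting. Both function names are illustrative.

```python
def n_content(reads):
    """Percentage of N calls at each position, over reads long enough to cover it."""
    length = max(len(read) for read in reads)
    percentages = []
    for pos in range(length):
        column = [read[pos] for read in reads if pos < len(read)]
        percentages.append(100 * column.count("N") / len(column))
    return percentages


def flag_positions(percentages, threshold=5.0):
    """1-based positions whose N content exceeds `threshold` percent."""
    return [pos + 1 for pos, pct in enumerate(percentages) if pct > threshold]


pct = n_content(["ACGN", "ACNN", "ANGT", "ACGT"])  # hypothetical reads
print(pct, flag_positions(pct))  # [0.0, 25.0, 25.0, 50.0] [2, 3, 4]
```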

Sequence Length Distribution

The distribution of fragment sizes (read lengths) found. See the FastQC help.

Sequence Duplication Levels

The relative level of duplication found for every sequence.

From the FastQC Help:

In a diverse library most sequences will occur only once in the final set. A low level of duplication may indicate a very high level of coverage of the target sequence, but a high level of duplication is more likely to indicate some kind of enrichment bias (eg PCR over amplification). This graph shows the degree of duplication for every sequence in a library: the relative number of sequences with different degrees of duplication.

Only sequences which first appear in the first 100,000 sequences in each file are analysed. This should be enough to get a good impression for the duplication levels in the whole file. Each sequence is tracked to the end of the file to give a representative count of the overall duplication level.

The duplication detection requires an exact sequence match over the whole length of the sequence. Any reads over 75bp in length are truncated to 50bp for this analysis.

In a properly diverse library most sequences should fall into the far left of the plot in both the red and blue lines. A general level of enrichment, indicating broad oversequencing in the library will tend to flatten the lines, lowering the low end and generally raising other categories. More specific enrichments of subsets, or the presence of low complexity contaminants will tend to produce spikes towards the right of the plot.
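This sketch shows the binning behind this plot. It assumes FastQC-style bins: 1 to 9 individually, then >10, >50, >100, >500, >1k, >5k and >10k, with each open-ended bin starting at its bound. It returns two views, one counting distinct sequences and one weighting each sequence by its reads, in the spirit of the two lines mentioned above. Which view corresponds to which colour is not asserted here. The input dictionary of per-sequence counts is hypothetical.

```python
from collections import Counter

# Open-ended bins, checked from the largest lower bound down (assumed labels).
UPPER_BINS = [(10_000, ">10k"), (5_000, ">5k"), (1_000, ">1k"),
              (500, ">500"), (100, ">100"), (50, ">50"), (10, ">10")]


def duplication_bin(copies):
    """Bin label for a sequence observed `copies` times."""
    for lower, label in UPPER_BINS:
        if copies >= lower:
            return label
    return str(copies)  # 1 to 9 copies get their own bins


def duplication_levels(sequence_counts):
    """Share of distinct sequences and of reads in each duplication bin."""
    distinct, reads = Counter(), Counter()
    for copies in sequence_counts.values():
        label = duplication_bin(copies)
        distinct[label] += 1
        reads[label] += copies
    n_distinct, n_reads = sum(distinct.values()), sum(reads.values())
    return ({k: 100 * v / n_distinct for k, v in distinct.items()},
            {k: 100 * v / n_reads for k, v in reads.items()})


# Hypothetical tracked counts: two singletons, one duplicate, one 12-copy sequence.
by_distinct, by_reads = duplication_levels({"AAA": 1, "CCC": 1, "GGG": 2, "TTT": 12})
print(by_distinct)  # {'1': 50.0, '2': 25.0, '>10': 25.0}
print(by_reads)     # {'1': 12.5, '2': 12.5, '>10': 75.0}
```

A single highly duplicated sequence barely moves the distinct-sequence view but dominates the read-weighted one. This is the kind of spike towards the right of the plot described above.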

Overrepresented sequences by sample

The total amount of overrepresented sequences found in each library.

FastQC calculates and lists overrepresented sequences in FastQ files. It would not be possible to show this for all samples in a MultiQC report, so instead this plot shows the number of sequences categorized as overrepresented.

Sometimes, a single sequence may account for a large number of reads in a dataset. To show this, the bars are split into two: the first shows the overrepresented reads that come from the single most common sequence. The second shows the total count from all remaining overrepresented sequences.

From the FastQC Help:

A normal high-throughput library will contain a diverse set of sequences, with no individual sequence making up more than a tiny fraction of the whole. Finding that a single sequence is very overrepresented in the set either means that it is highly biologically significant, or indicates that the library is contaminated, or not as diverse as you expected.

FastQC lists all the sequences which make up more than 0.1% of the total. To conserve memory only sequences which appear in the first 100,000 sequences are tracked to the end of the file. It is therefore possible that a sequence which is overrepresented but doesn't appear at the start of the file for some reason could be missed by this module.
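This sketch applies the listing rule quoted above. Sequences first seen within the first 100,000 reads are tracked to the end of the file and reported if they exceed 0.1% of all reads. The parameter names are hypothetical, and the example uses a tiny cutoff and threshold only to keep the data short.

```python
from collections import Counter


def overrepresented(reads, observation_cutoff=100_000, min_fraction=0.001):
    """Sequences above `min_fraction` of all reads, tracking only early arrivals.

    Sequences first seen after `observation_cutoff` reads are never tracked,
    mirroring the memory-saving rule quoted above.
    """
    counts = Counter()
    total = 0
    for index, read in enumerate(reads):
        total += 1
        if index < observation_cutoff or read in counts:
            counts[read] += 1
    threshold = min_fraction * total
    hits = [(seq, n, 100 * n / total) for seq, n in counts.items() if n > threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical library; small cutoff and threshold keep the example short.
reads = ["AAAA"] * 3 + ["CCCC"] + ["GGGG"] * 6
hits = overrepresented(reads, observation_cutoff=4, min_fraction=0.2)
for seq, n, pct in hits:
    print(seq, n, f"{pct:.1f}%")  # AAAA 3 30.0%
```

"GGGG" makes up 60% of these reads but is missed, because it first appears after the cutoff. This is exactly the caveat in the last paragraph of the help text.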

Top overrepresented sequences

Top overrepresented sequences across all samples. The table shows the 20 most overrepresented sequences across all samples, ranked by the number of samples they occur in.

| Overrepresented sequence | Samples | Occurrences | % of all reads |
|---|---|---|---|
| AAGGTGTCTGCAATTCATAGCTCTTTTCAGAACGTTCCGTGTACCAAGCA | 7 | 2126 | 1.2040% |
| ACAGTATTCTTTGCTATAGTAGTCGGCATAGATGCTTTAATTCTAGAATT | 7 | 3369 | 1.9080% |
| ACTACCGAAGTTGTAGGAGACATTATACTTAAACCAGCAAATAATAGTTT | 7 | 3914 | 2.2167% |
| ACTAGGTTCCATTGTTCAAGGAGCTTTTTAAGCTCTTCAACGGTAATAGT | 7 | 3031 | 1.7166% |
| AGCAAAATGTTGGACTGAGACTGACCTTACTAAAGGACCTCATGAATTTT | 7 | 2350 | 1.3309% |
| AGCCTCATAAAACTCAGGTTCCCAATACCTTGAAGTGTTATCATTAGTAA | 7 | 2565 | 1.4527% |
| AGGAATTACTTGTGTATGCTGCTGACCCTGCTATGCACGCTGCTTCTGGT | 7 | 1935 | 1.0959% |
| AGTGAAATTGGGCCTCATAGCACATTGGTAAACACCAGATGGTGAACCAT | 7 | 2029 | 1.1491% |
| AGTTTCCACACAGACAGGCATTAATTTGCGTGTTTCTTCTGCATGTGCAA | 7 | 2103 | 1.1910% |
| CACAAGTAGTGGCACCTTCTTTAGTCAAATTCTCAGTGCCACAAAATTCG | 7 | 2282 | 1.2924% |
| CAGCCCCTATTAAACAGCCTGCACGTGTTTGAAAAACATTAGAACCTGTA | 7 | 2443 | 1.3836% |
| CATCCAGATTCTGCCACTCTTGTTAGTGACATTGACATCACTTTCTTAAA | 7 | 2333 | 1.3213% |
| CCAGCAACTGTTTGTGGACCTAAAAAGTCTACTAATTTGGTTAAAAACAA | 7 | 3210 | 1.8180% |
| CGACTACTAGCGTGCCTTTGTAAGCACAAGCTGATGAGTACGAACTTATG | 7 | 2907 | 1.6464% |
| CGGTAATAAAGGAGCTGGTGGCCATAGTTACGGCGCCGATCTAAAGTCAT | 7 | 1905 | 1.0789% |
| CTTTTCTCCAAGCAGGGTTACGTGTAAGGAATTCTCTTACCACGCCTATT | 7 | 2451 | 1.3881% |
| GGTGTATACTGCTGCCGTGAACATGAGCATGAAATTGCTTGGTACACGGA | 7 | 2360 | 1.3366% |
| GTACGCGTTCCATGTGGTCATTCAATCCAGAAACTAACATTCTTCTCAAC | 7 | 2279 | 1.2907% |
| TGAAATGGTGAATTGCCCTCGTATGTTCCAGAAGAGCAAGGTTCTTTTAA | 7 | 2693 | 1.5252% |
| TGATTTGAGTGTTGTCAATGCCAGATTACGTGCTAAGCACTATGTGTACA | 7 | 2409 | 1.3643% |

Adapter Content

The cumulative percentage count of the proportion of your library which has seen each of the adapter sequences at each position.

Note that only samples with ≥ 0.1% adapter contamination are shown.

There may be several lines per sample, as one is shown for each adapter detected in the file.

From the FastQC Help:

The plot shows a cumulative percentage count of the proportion of your library which has seen each of the adapter sequences at each position. Once a sequence has been seen in a read it is counted as being present right through to the end of the read so the percentages you see will only increase as the read length goes on.

No samples found with any adapter contamination > 0.1%.
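The cumulative rule above means a read counts toward every position from its first adapter hit onward. This sketch assumes a single adapter and uses exact substring search (`str.find`), which is a simplification. The adapter string in the example is a placeholder, and the function name is hypothetical.

```python
def adapter_content(reads, adapter):
    """Cumulative % of reads in which `adapter` has appeared by each position."""
    length = max(len(read) for read in reads)
    starts_at = [0] * length
    for read in reads:
        pos = read.find(adapter)  # first exact occurrence, or -1
        if pos != -1:
            starts_at[pos] += 1
    cumulative, running = [], 0
    for count in starts_at:
        running += count  # once seen, a read stays counted to the end
        cumulative.append(100 * running / len(reads))
    return cumulative


# Hypothetical 10 bp reads and a placeholder adapter fragment.
reads = ["ACGTAGATCG", "AGATCGTTTT", "CCCCCCCCCC", "GGAGATCGGG"]
print(adapter_content(reads, "AGATCG"))
# [25.0, 25.0, 50.0, 50.0, 75.0, 75.0, 75.0, 75.0, 75.0, 75.0]
```

Because the running total never decreases, the curve can only rise along the read, matching the help text.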

Status Checks

Status for each FastQC section showing whether results seem entirely normal (green), slightly abnormal (orange) or very unusual (red).

FastQC assigns a status for each section of the report. These give a quick evaluation of whether the results of the analysis seem entirely normal (green), slightly abnormal (orange) or very unusual (red).

It is important to stress that although the analysis results appear to give a pass/fail result, these evaluations must be taken in the context of what you expect from your library. A 'normal' sample as far as FastQC is concerned is random and diverse. Some experiments may be expected to produce libraries which are biased in particular ways. You should therefore treat the summary evaluations as pointers to where you should concentrate your attention and understand why your library may not look random and diverse.

Specific guidance on how to interpret the output of each module can be found in the relevant report section, or in the FastQC help.

This heatmap summarises all of these statuses for a quick overview. Note that not all FastQC sections have plots in MultiQC reports, but all status checks are shown in this heatmap.

Kraken

Kraken is a taxonomic classification tool that uses exact k-mer matches to find the lowest common ancestor (LCA) of a given sequence. DOI: 10.1186/gb-2014-15-3-r46.

Top taxa

The number of reads falling into the top 5 taxa across different ranks.

To make this plot, the percentage of each sample assigned to a given taxon is summed across all samples, and the 5 taxa with the highest totals are selected for each rank. The counts for these top 5 taxa are then plotted for each of the 9 different taxonomic ranks. The unclassified count is always shown across all taxonomic ranks.

The total number of reads is approximated by dividing the number of unclassified reads by the percentage of the library that they account for. Note that this is only an approximation, and that Kraken percentages don't always add up to exactly 100%.

The category "Other" shows the difference between the above total read count and the sum of the read counts for the top 5 taxa plus unclassified reads. This should cover all taxa not in the top 5, +/- any rounding errors.

Note that any taxon that does not exactly fit a taxon rank (e.g. - or G2) is ignored.
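Worked through with hypothetical numbers for a single rank, the approximation and the "Other" bar are simple arithmetic. For example, 2,000 unclassified reads making up 4% of a library imply about 50,000 reads in total. The function name is illustrative. The estimate is undefined when no reads are unclassified, and the sketch reports that case as an error.

```python
def kraken_rank_summary(unclassified_reads, unclassified_pct, top_taxa_reads):
    """Approximate total reads and the 'Other' bar for one rank (sketch).

    unclassified_pct is Kraken's percentage (0-100) of unclassified reads;
    top_taxa_reads holds read counts for the top 5 taxa at this rank.
    """
    if unclassified_pct <= 0:
        raise ValueError("cannot estimate the total without unclassified reads")
    total = unclassified_reads / (unclassified_pct / 100)
    other = total - sum(top_taxa_reads) - unclassified_reads
    return total, other


total, other = kraken_rank_summary(
    unclassified_reads=2_000,
    unclassified_pct=4.0,
    top_taxa_reads=[40_000, 5_000, 1_500, 800, 200],
)
print(round(total), round(other))  # 50000 500
```

Because Kraken's percentages are rounded, the estimated total and therefore "Other" inherit that rounding error. This is the "+/-" caveat in the text.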

Nextclade

Nextclade does viral genome alignment, clade assignment, mutation calling, and quality checks. DOI: 10.21105/joss.03773.

Run table

The table shows 4 of the 29 available columns.

| Sample Name | Clade | QC Overall Status | QC Missing Data Status | QC Mixed Sites Status |
|---|---|---|---|---|
| SAMPLE1_PE | 20A | good | good | good |
| SAMPLE2_PE | 19B | bad | bad | good |
| SAMPLE3_SE | 19A | good | good | good |

        Pangolin

        + +
        + + + Version: 4.2 + + + + Scorpio: 0.3.17 + + + + Constellations: 0.1.10 + + +
        + +
        +

        Pangolin uses variant calls to assign SARS-CoV-2 genome sequences to global lineages.DOI: 10.1093/ve/veab064.

        + + + + +
        + +

        + Run table + + + +

        + +

        Statistics gathered from the input pangolin files. Hover over the column headers for descriptions and click Help for more in-depth documentation.

        + + +
        +

        This table shows some of the metrics parsed by Pangolin. +Hover over the column headers to see a description of the contents. Longer help text for certain columns is shown below:

        +
          +
        • Conflict
            +
          • In the pangoLEARN decision tree model, a given sequence gets assigned to the most likely category based on known diversity. + If a sequence can fit into more than one category, the conflict score will be greater than 0 and reflect the number of categories the sequence could fit into. + If the conflict score is 0, this means that within the current decision tree there is only one category that the sequence could be assigned to.
          • +
          +
        • +
        • Ambiguity score
            +
          • This score is a function of the quantity of missing data in a sequence. + It represents the proportion of relevant sites in a sequence which were imputed to the reference values. + A score of 1 indicates that no sites were imputed, while a score of 0 indicates that more sites were imputed than were not imputed. + This score only includes sites which are used by the decision tree to classify a sequence.
          • +
          +
        • +
        • Scorpio conflict
            +
          • The conflict score is the proportion of defining variants which have the reference allele in the sequence. + Ambiguous/other non-ref/alt bases at each of the variant positions contribute only to the denominators of these scores.
          • +
          +
        • +
        • Note
            +
          • If there are any conflicts from the decision tree, this field will output the alternative assignments. + If the sequence failed QC, this field will describe why. + If the sequence met the SNP thresholds for scorpio to call a constellation, it will describe the exact SNP counts of Alt, Ref and Amb (alternative, reference and ambiguous) alleles for that call.
          • +
          +
        • +
        +
        + + +
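The Scorpio conflict definition above can be written as a small function. This sketch assumes the Alt, Ref and Amb counts for one constellation call are already known; the function name is hypothetical, and treating the support score as the matching alt-allele proportion is an assumption by analogy (the text only says ambiguous bases enter "the denominators of these scores").

```python
def scorpio_scores(alt: int, ref: int, amb: int) -> tuple[float, float]:
    """Return (support, conflict) for one constellation call.

    conflict: fraction of defining variants carrying the reference allele.
    support: fraction carrying the alternative allele (assumed by analogy).
    Ambiguous/other bases (amb) only enlarge the shared denominator.
    """
    total = alt + ref + amb
    if total == 0:
        return 0.0, 0.0  # no defining sites observed
    return alt / total, ref / total
```

With 8 alt, 1 ref and 1 ambiguous site, the conflict is 1/10 = 0.1: the ambiguous site lowers both scores without counting for either allele.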
        +
        +
        + + + + + + + + + + + + + + + + + + + + Showing 3/3 rows and 9/13 columns. + +
        +
        + +
        +
        + +
        Sample NameLineageConflictAmbiguityS callS supportS conflictQC StatusQC NoteNote
        SAMPLE1_PEB.1
        0.0
        PassAmbiguous content: 3%Usher placements: B.1(1/1)
        SAMPLE2_PEA.2
        0.0
        PassAmbiguous content: 11%Usher placements: A.2(2/2)
        SAMPLE3_SEB
        0.3
        PassAmbiguous content: 4%Usher placements: B(2/3) B.1(1/3)
        + +
        + + + +
        + + +
        + + +
        +
        + + + + + +
        + + +
        +
        +

        Picard

        + +
        +

        Picard is a set of Java command line tools for manipulating high-throughput sequencing data.

        + + + + +
        + +

        + Alignment Summary + +

        + +

        Please note that Picard's read counts are divided by two for paired-end data. Total bases (including unaligned) is not provided.

        + + + +
        +
        +
        + + +
        + + + + +
        +
        + + +
        +
        +
        Created with MultiQC
        +
        + + +
        +
        + + +
        + + + + + +
        + +

        + Mean read length + +

        + +

        The mean read length of the set of reads examined.

        + + + +
        +
        + + + +
        +
        + + +
        +
        +
        Created with MultiQC
        +
        + + +
        +
        + + + + + + +
        + +

        + Base Distribution + +

        + +

        Plot shows the distribution of bases by cycle.

        + + + +
        +
        +
        + + + + + +
        + + + + +
        +
        + + +
        +
        +
        Created with MultiQC
        +
        + + +
        +
        + + + + + + +
        + +

        + Insert Size + +

        + +

        Plot shows the number of reads at a given insert size. Reads with different orientations are summed.

        + + + +
        +
        +
        + + +
        + + + + +
        +
        + + +
        +
        +
        Created with MultiQC
        +
        + + +
        +
        + + + + + + +
        + +

        + Mean Base Quality by Cycle + + + +

        + +

        Plot shows the mean base quality by cycle.

        + + +
        +

        This metric gives an overall snapshot of sequencing machine performance. +For most types of sequencing data, the output is expected to show a slight +reduction in overall base quality scores towards the end of each read.

        +

        Spikes in quality within reads are not expected and may indicate that technical +problems occurred during sequencing.
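Mean base quality by cycle is an average of Phred scores at each read position. The sketch below works on FASTQ-style quality strings with Phred+33 encoding (an assumption; Picard itself reads qualities from BAM records) and uses a hypothetical function name. Reads of different lengths are handled by averaging each cycle only over the reads that reach it.

```python
def mean_quality_by_cycle(qualities: list[str], offset: int = 33) -> list[float]:
    """Mean Phred score at each cycle (read position) across all reads."""
    sums: list[int] = []
    counts: list[int] = []
    for qual in qualities:
        for cycle, char in enumerate(qual):
            if cycle == len(sums):  # first read long enough to reach this cycle
                sums.append(0)
                counts.append(0)
            sums[cycle] += ord(char) - offset  # decode Phred+33 character
            counts[cycle] += 1
    return [s / c for s, c in zip(sums, counts)]
```

A slow decline across the returned list matches the expected pattern described above; an isolated dip or spike at one cycle is the kind of anomaly worth investigating.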

        +
        + + +
        +
        + + + +
        +
        + + +
        +
        +
        Created with MultiQC
        +
        + + +
        +
        + + + + + + +
        + +

        + Base Quality Distribution + +

        + +

        Plot shows the count of each base quality score.

        + + + +
        +
        + + + +
        +
        + + +
        +
        +
        Created with MultiQC
        +
        + + + +
        + + + +
        +
        + + + + + +
        + + +
        +
        +

        Samtools

        + +
        + + + Version: 1.16.1 + + + + HTSlib: 1.16 + + +
        + +
        +

        Samtools is a suite of programs for interacting with high-throughput sequencing data.DOI: 10.1093/bioinformatics/btp352.

        + + + + +
        + +

        + Percent mapped + + + +

        + +

        Alignment metrics from samtools stats; mapped vs. unmapped reads vs. reads mapped with MQ0.

        + + +
        +

        For a set of samples that have come from the same multiplexed library, +similar numbers of reads for each sample are expected. Large differences in numbers might +indicate issues during the library preparation process. Whilst large differences in read +numbers may be controlled for in downstream processing (e.g. read count normalisation), +you may wish to consider whether the read depths achieved have fallen below recommended +levels depending on the application.

        +

        Low alignment rates could indicate contamination of samples (e.g. adapter sequences), +low sequencing quality or other artefacts. These can be further investigated in the +sequence level QC (e.g. from FastQC).

        +

        Reads mapped with MQ0 often indicate that the reads are ambiguously mapped to multiple +locations in the reference sequence. This can be due to repetitive regions in the genome, +the presence of alternative contigs in the reference, or due to reads that are too short +to be uniquely mapped. These reads are often filtered out in downstream analyses.

        +
        + + +
        +
        + + + +
        +
        + + +
        +
        +
        Created with MultiQC
        +
        + + +
        +
        + + +
        + + + + + +
        + +

        + Alignment stats + +

        + +

        This module parses the output from samtools stats. All numbers in millions.

        + + + +
        +
        + + + + + + + +
        +
        + + +
        +
        +
        Created with MultiQC
        +
        + +
        + + +
        +
        + + + + + + +
        + +

        + Flagstat + +

        + +

        This module parses the output from samtools flagstat.

        + + + +
        +
        + + + + +
        + + +
        + + + + +
        +
        + + +
        +
        +
        Created with MultiQC
        +
        + +
        + + +
        +
        + + + + + + +
        + +

        + Mapped reads per contig + +

        + +

        The samtools idxstats tool counts the number of mapped reads per chromosome / contig. Chromosomes with < 0.1% of the total aligned reads are omitted from this plot.
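The 0.1% cut-off described above amounts to a simple filter over per-contig mapped-read counts. A minimal sketch, assuming the counts have already been parsed from idxstats output into a dict; the function and contig names are hypothetical.

```python
def contigs_to_plot(mapped_per_contig: dict[str, int],
                    min_fraction: float = 0.001) -> dict[str, int]:
    """Keep contigs holding at least min_fraction (default 0.1%) of all aligned reads."""
    total = sum(mapped_per_contig.values())
    if total == 0:
        return {}
    return {contig: n for contig, n in mapped_per_contig.items()
            if n / total >= min_fraction}
```

For a single-segment viral reference, this usually leaves one bar per sample; extra bars indicate reads landing on other contigs in the reference.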

        + + + +
        +
        + +
        + + + +
        + + + + +
        +
        + + +
        +
        +
        Created with MultiQC
        +
        + + + +
        + + + +
        +
        + + + + + +
        + + +
        +
        +

        SnpEff

        + +
        + + + Version: 5.0e + + +
        + +
        +

        SnpEff is a genetic variant annotation and effect prediction toolbox. It annotates and predicts the effects of variants on genes (such as amino acid changes).DOI: 10.4161/fly.19695.

        + + + + +
        + +

        + Variants by Genomic Region + + + +

        + +

        The stacked bar plot shows locations of detected variants in +the genome and the number of variants for each location.

        + + +
        +

        The upstream and downstream interval size to detect these +genomic regions is 5000bp by default.

        +
        + + +
        +
        + + + + +
        +
        + + +
        +
        +
        Created with MultiQC
        +
        + + +
        +
        + + +
        + + + + + +
        + +

        + Variant Effects by Impact + + + +

        + +

        The stacked bar plot shows the putative impact of detected +variants and the number of variants for each impact.

        + + +
        +

        There are four levels of impacts predicted by SnpEff:

        +
          +
        • High: High impact (e.g. a gained stop codon)
        • +
        • Moderate: Medium impact (e.g. a missense amino acid substitution)
        • +
        • Low: Low impact (e.g. a silent mutation)
        • +
        • Modifier: No impact
        • +
        +
        + + +
        +
        + + + + +
        +
        + + +
        +
        +
        Created with MultiQC
        +
        + + +
        +
        + + + + + + +
        + +

        + Variants by Effect Types + + + +

        + +

        The stacked bar plot shows the effect of variants at protein +level and the number of variants for each effect type.

        + + +
        +

        This plot shows the effect of variants with respect to +the mRNA.

        +
        + + +
        +
        + + + +
        +
        + + +
        +
        +
        Created with MultiQC
        +
        + + +
        +
        + + + + + + +
        + +

        + Variants by Functional Class + + + +

        + +

        The stacked bar plot shows the effect of variants and +the number of variants for each effect type.

        + + +
        +

        This plot shows the effect of variants on the translation of +the mRNA as protein. There are three possible cases:

        +
          +
        • Silent: The amino acid does not change.
        • +
        • Missense: The amino acid is different.
        • +
        • Nonsense: The variant generates a stop codon.
        • +
        +
        + + +
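The three functional classes above depend only on whether the encoded amino acid changes, and to what. The sketch below classifies a single reference/alternative codon pair using the standard genetic code; it is an illustration, not SnpEff's algorithm, and the function name is hypothetical. It is deliberately simplified: a lost stop codon, for instance, falls under MISSENSE here, whereas SnpEff reports it as a separate effect.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def functional_class(ref_codon: str, alt_codon: str) -> str:
    """Classify a single-codon change as SILENT, MISSENSE or NONSENSE."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "SILENT"    # synonymous: same amino acid
    if alt_aa == "*":
        return "NONSENSE"  # new stop codon
    return "MISSENSE"      # different amino acid
```

For example, GAA to GAG keeps glutamate (SILENT), GAA to GTA gives valine (MISSENSE), and TGG to TGA turns tryptophan into a stop (NONSENSE).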
        +
        + + + + +
        +
        + + +
        +
        +
        Created with MultiQC
        +
        + + + +
        + + + +
        +
        + + + + + +
        + + +
        +
        +

        VARIANTS: QUAST

        + +
        +

        VARIANTS: QUAST This section of the report shows QUAST QC results for the consensus sequence.DOI: 10.1093/bioinformatics/btt086.

        + + + + +
        + +

        + Assembly Statistics + +

        + + + + + +
        +
        +
        + + + + + + + + + + + + + + + + + + + + Showing 3/3 rows and 8/8 columns. + +
        +
        + +
        +
        + +
        Sample NameN50 (Kbp)L50 (K)Largest contig (Kbp)Length (Mbp)MisassembliesMismatches/100kbpIndels/100kbpGenome Fraction
        SAMPLE1_PE
        21.0Kbp
        0.0K
        21.0Kbp
        0.0Mbp
        0
        20.27
        0.00
        99.0%
        SAMPLE2_PE
        4.1Kbp
        0.0K
        10.4Kbp
        0.0Mbp
        0
        38.24
        3.48
        96.0%
        SAMPLE3_SE
        8.8Kbp
        0.0K
        16.9Kbp
        0.0Mbp
        0
        10.09
        0.00
        99.0%
        + +
        + + +
        +
        + + +
        + + + + + +
        + +

        + Number of Contigs + +

        + +

        This plot shows the number of contigs found for each assembly, broken + down by length.

        + + + +
        +
        + + + +
        +
        + + +
        +
        +
        Created with MultiQC
        +
        + + + +
        + + + +
        +
        + + + + + +
        + + +
        +
        +

        ASSEMBLY: QUAST (SPAdes)

        + +
        +

        ASSEMBLY: QUAST (SPAdes) This section of the report shows QUAST results from SPAdes de novo assembly.DOI: 10.1093/bioinformatics/btt086.

        + + + + +
        + +

        + Assembly Statistics + +

        + + + + + +
        +
        +
        + + + + + + + + + + + + + + + + + + + + Showing 3/3 rows and 8/8 columns. + +
        +
        + +
        +
        + +
        Sample NameN50 (Kbp)L50 (K)Largest contig (Kbp)Length (Mbp)MisassembliesMismatches/100kbpIndels/100kbpGenome Fraction
        SAMPLE1_PE
        21.0Kbp
        0.0K
        21.0Kbp
        0.0Mbp
        0
        20.27
        0.00
        99.0%
        SAMPLE2_PE
        4.1Kbp
        0.0K
        10.4Kbp
        0.0Mbp
        0
        38.24
        3.48
        96.0%
        SAMPLE3_SE
        8.8Kbp
        0.0K
        16.9Kbp
        0.0Mbp
        0
        10.09
        0.00
        99.0%
        + +
        + + +
        +
        + + +
        + + + + + +
        + +

        + Number of Contigs + +

        + +

        This plot shows the number of contigs found for each assembly, broken + down by length.

        + + + +
        +
        + + + +
        +
        + + +
        +
        +
        Created with MultiQC
        +
        + + + +
        + + + +
        +
        + + + + + +
        + + +
        +
        +

        Software Versions

        + +
        +

        Software Versions lists versions of software tools extracted from file contents.

        + + + + +
        + + + + + +
        + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
        GroupSoftwareVersion
        BcftoolsBcftools1.16
        CutadaptCutadapt4.2
        FastQCFastQC0.11.9
        PangolinConstellations0.1.10
        Pangolin4.2
        Scorpio0.3.17
        SamtoolsHTSlib1.16
        Samtools1.16.1
        SnpEffSnpEff5.0e
        fastpfastp0.23.2
        + + +
        + + +
        + + +
        + + + + + +
        + + + + + + + + + + + + + + + + diff --git a/public/examples/jupyter/multiqc_report.zip b/public/examples/jupyter/multiqc_report.zip new file mode 100644 index 0000000..1f590b5 Binary files /dev/null and b/public/examples/jupyter/multiqc_report.zip differ diff --git a/public/examples/jupyter/notebook.html b/public/examples/jupyter/notebook.html new file mode 100644 index 0000000..3f1addc --- /dev/null +++ b/public/examples/jupyter/notebook.html @@ -0,0 +1,9081 @@ + + + + + +notebook + + + + + + + + + + + + +
        +
        + +
        +
        + +
        + +
        +
        + +
        + +
        +
        + +
        + + +
        +
        + +
        + +
        +
        + +
        + + +
        +
        + +
        +
        + +
        + + +
        +
        + +
        + + +
        + + +
        + + +
        + + +
        +
        + +
        + + +
        +
        + +
        + + +
        +
        + +
        +
        + +
        +
        + +
        + + +
        + + +
        +
        + +
        + + +
        + + +
        +
        + +
        + + +
        +
        + +
        + + +
        +
        + +
        + + +
        +
        + +
        + +
        +
        + +
        + + +
        +
        + +
        + + +
        +
        + +
        + +
        +
        + +
        + + +
        +
        + + diff --git a/public/examples/viralrecon/multiqc_report.html b/public/examples/viralrecon/multiqc_report.html deleted file mode 100644 index 2d36a21..0000000 --- a/public/examples/viralrecon/multiqc_report.html +++ /dev/null @@ -1,9978 +0,0 @@ - - - - - - - - - - - - - -MultiQC Report - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
        -
        -

        - - - - - - -

        - -

        Loading report..

        - -
        - -
        -
        - - - -
        - - - - -
        - - - - -
        -

        - - Highlight Samples -

        - -

        - - This report has flat image plots that won't be highlighted.
        - See the documentation - for help. -

        - -
        - - - -
        -

        - Regex mode off - - -

        -
          -
          - - -
          -

          - - Rename Samples -

          - -

          - - This report has flat image plots that won't be renamed.
          - See the documentation - for help. -

          - -
          - - - -
          -

          Click here for bulk input.

          -
          -

          Paste two columns of a tab-delimited table here (eg. from Excel).

          -

          First column should be the old name, second column the new name.

          -
          - - -
          -
          -

          - Regex mode off - - -

          -
            -
            - - -
            -

            - - Show / Hide Samples -

            - -

            - - This report has flat image plots that won't be hidden.
            - See the documentation - for help. -

            - -
            -
            - -
            -
            - -
            -
            - - -
            -
            -

            Warning! This can take a few seconds.

            -

            - Regex mode off - - -

            -
              -
              - - -
              -

              Export Plots

              -
              - -
              -
              -
              -
              -
              - - px -
              -
              -
              -
              - - px -
              -
              -
              -
              -
              - -
              -
              - -
              -
              -
              -
              - -
              -
              -
              - - X -
              -
              -
              -
              - -
              -

              Download the raw data used to create the plots in this report below:

              -
              -
              - -
              -
              - -
              -
              - -

              Note that additional data was saved in multiqc_data when this report was generated.

              - -
              -
              -
              - -
              -
              Choose Plots
              - - -
              - -
              - -

              If you use plots from MultiQC in a publication or presentation, please cite:

              -
              - MultiQC: Summarize analysis results for multiple tools and samples in a single report
              - Philip Ewels, Måns Magnusson, Sverker Lundin and Max Käller
              - Bioinformatics (2016)
              - doi: 10.1093/bioinformatics/btw354
              - PMID: 27312411 -
              -
              -
              - - -
              -

              Save Settings

              -

              You can save the toolbox settings for this report to the browser.

              -
              - - -
              -
              - -

              Load Settings

              -

              Choose a saved report profile from the dropdown box below:

              -
              -
              - -
              -
              - - - - -
              -
              -
              - - -
              -

              Tool Citations

              -

              Please remember to cite the tools that you use in your analysis.

              -

              To help with this, you can download publication details of the tools mentioned in this report:

              -

              -

              -
              - - -
              -

              About MultiQC

              -

              This report was generated using MultiQC, version 1.14

              -

              You can see a YouTube video describing how to use MultiQC reports here: - https://youtu.be/qPbIlO_KWN0

              -

              For more information about MultiQC, including other videos and - extensive documentation, please visit http://multiqc.info

              -

              You can report bugs, suggest improvements and find the source code for MultiQC on GitHub: - https://github.com/ewels/MultiQC

              -

              MultiQC is published in Bioinformatics:

              -
              - MultiQC: Summarize analysis results for multiple tools and samples in a single report
              - Philip Ewels, Måns Magnusson, Sverker Lundin and Max Käller
              - Bioinformatics (2016)
              - doi: 10.1093/bioinformatics/btw354
              - PMID: 27312411 -
              -
              - -
              - -
              - - -
              - - - -

              - - - - -

              - - - -

              - A modular tool to aggregate results from bioinformatics analyses across many samples into a single report. -

              - - - -
              This report has been generated by the nf-core/viralrecon analysis pipeline. For information about how to interpret these results, please see the documentation. -
              - - - - - - - - - -
              -

              Report - - generated on 2024-04-26, 12:44 UTC - - - based on data in: - - /Users/vlad/git/viralrecon/work/4f/57d710487359b934b816642964a9b0

              - - -
              - - - - - - - -
              - - - - - - - - - - - - - -
              - - -
              -

              Variant calling metrics

              -

              generated by the nf-core/viralrecon pipeline.

              - - - - -
              - - - - -
              - - - - - - - - - Showing 3/3 rows and 15/15 columns. - -
              -
              - -
              Sample# Input reads% Non-host reads (Kraken 2)# Trimmed reads (fastp)# Mapped reads% Mapped reads# Trimmed reads (iVar)Coverage median% Coverage > 1x% Coverage > 10x# SNPs# INDELs# Missense variants# Ns per 100kb consensusPangolin lineageNextclade clade
              SAMPLE1_PE
              55442
              99.96
              48270
              48045
              99.53
              48013
              290.00
              99.00
              98.00
              7
              1
              3
              2193.76
              B.1
              20A
              SAMPLE2_PE
              42962
              99.78
              38404
              37980
              98.90
              37942
              181.00
              98.00
              90.00
              7
              1
              6
              10497.27
              A.2
              19B
              SAMPLE3_SE
              49202
              99.90
              46659
              46278
              99.18
              45981
              282.00
              99.00
              97.00
              54
              NA
              37
              3096.68
              B
              19A
              - -
              - - - -
              - - -
              - - -
              -
              - - - -
              - - -
              -

              De novo assembly metrics

              -

              generated by the nf-core/viralrecon pipeline.

              - - - - -
              - - - - -
              - - - - - - - - - Showing 3/3 rows and 15/15 columns. - -
              -
              - -
              Sample# Input reads# Trimmed reads (Cutadapt)% Non-host reads (Kraken 2)# Contigs (SPAdes)Largest contig (SPAdes)% Genome fraction (SPAdes)N50 (SPAdes)# Contigs (Unicycler)Largest contig (Unicycler)% Genome fraction (Unicycler)N50 (Unicycler)# Contigs (minia)Largest contig (minia)% Genome fraction (minia)N50 (minia)
              SAMPLE1_PE
              55442
              48250
              99.96
              5
              20973
              98.99
              20973.00
              NA
              NA
              NA
              NA
              NA
              NA
              NA
              NA
              SAMPLE2_PE
              42962
              38320
              99.78
              13
              10409
              96.01
              4093.00
              NA
              NA
              NA
              NA
              NA
              NA
              NA
              NA
              SAMPLE3_SE
              49202
              46613
              99.90
              5
              16925
              98.97
              8770.00
              NA
              NA
              NA
              NA
              NA
              NA
              NA
              NA
              - -
              - - - -
              - - -
              - - -
              -
              - - - -
              - - -
              -

              Amplicon coverage heatmap

              -

              Heatmap to show median log10(coverage+1) per amplicon across samples.
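One row of the heatmap can be sketched as follows. The inputs are assumptions: a per-base depth list for one sample and 0-based, half-open amplicon intervals (e.g. parsed from a BED file); the function name is hypothetical. Applying log10(depth + 1) per base before taking the median is also an assumption; for an odd number of bases it gives the same value as transforming the median depth, because the transform is monotonic.

```python
import math
import statistics


def amplicon_heatmap_row(coverage: list[int],
                         amplicons: list[tuple[int, int]]) -> list[float]:
    """Median log10(depth + 1) for each amplicon in one sample."""
    row = []
    for start, end in amplicons:
        depths = coverage[start:end]
        if not depths:
            row.append(0.0)  # interval outside the coverage track
            continue
        # +1 keeps zero-coverage bases finite (log10(1) == 0)
        row.append(statistics.median(math.log10(d + 1) for d in depths))
    return row
```

The +1 offset is what lets amplicons with dropouts (zero depth) appear in the heatmap as 0 rather than as undefined values.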

              - - - - -
              - - - - -
              -
              -
              - -
              -
              -
              - - - -
              -
              - - - -
              -
              -
              -
              - loading.. -
              -
              -
              -
              - - - -
              - - -
              - - -
              -
              - - - -
              - - -
              -

              PREPROCESS: FastQC (raw reads)

              -

              PREPROCESS: FastQC (raw reads) This section of the report shows FastQC results for the raw reads before adapter trimming.

              - - - - -
              - -

              - Sequence Counts - - - -

              - -

              Sequence counts for each sample. Duplicate read counts are an estimate only.

              - - -
              -

              This plot shows the total number of reads, broken down into unique and duplicate -if possible (only more recent versions of FastQC give duplicate info).

              -

              You can read more about duplicate calculation in the -FastQC documentation. -A small part has been copied here for convenience:

              -

              Only sequences which first appear in the first 100,000 sequences -in each file are analysed. This should be enough to get a good impression -for the duplication levels in the whole file. Each sequence is tracked to -the end of the file to give a representative count of the overall duplication level.

              -

              The duplication detection requires an exact sequence match over the whole length of -the sequence. Any reads over 75bp in length are truncated to 50bp for this analysis.
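The two FastQC rules quoted above (only sequences first seen in the first 100,000 reads are tracked, and reads over 75 bp are truncated to 50 bp before exact matching) can be sketched like this. It is a simplified illustration with a hypothetical function name. It stops at raw per-sequence counts and omits FastQC's later conversion of these counts into duplication levels.

```python
from collections import Counter
from typing import Iterable


def duplication_counts(reads: Iterable[str], track_limit: int = 100_000) -> Counter:
    """Count occurrences of sequences first seen within the first `track_limit` reads."""
    counts: Counter = Counter()
    for index, seq in enumerate(reads):
        key = seq[:50] if len(seq) > 75 else seq  # long reads compared on 50 bp only
        if key in counts:
            counts[key] += 1  # already tracked: keep counting to the end of the file
        elif index < track_limit:
            counts[key] = 1   # start tracking only sequences seen early in the file
    return counts
```

Sequences that first appear after the tracking window are never counted, which is why the duplicate read counts in this plot are described as an estimate only.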

              -
              - -
              - - -
              -
              loading..
              -
              - - -
              -
              - - -
              - - - - - -
              - -

              - Sequence Quality Histograms - - - -

              - -

              The mean quality value across each base position in the read.

              - - -
              -

              To enable multiple samples to be plotted on the same graph, only the mean quality -scores are plotted (unlike the box plots seen in FastQC reports).

              -

              Taken from the FastQC help:

              -

              The y-axis on the graph shows the quality scores. The higher the score, the better -the base call. The background of the graph divides the y axis into very good quality -calls (green), calls of reasonable quality (orange), and calls of poor quality (red). -The quality of calls on most platforms will degrade as the run progresses, so it is -common to see base calls falling into the orange area towards the end of a read.

              -
              - -
              loading..
              -
              - - -
              -
              - - - - - - -
              - -

              - Per Sequence Quality Scores - - - -

              - -

              The number of reads with average quality scores. Shows if a subset of reads has poor quality.

              - - -
              -

              From the FastQC help:

              -

              The per sequence quality score report allows you to see if a subset of your -sequences have universally low quality values. It is often the case that a -subset of sequences will have universally poor quality, however these should -represent only a small percentage of the total sequences.
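The per-sequence quality plot is a histogram of reads keyed by their average quality. A minimal sketch, assuming Phred+33 quality strings; the function name is hypothetical, and rounding the mean to the nearest integer is an assumption about binning rather than FastQC's documented behaviour.

```python
from collections import Counter


def per_sequence_quality(qualities: list[str], offset: int = 33) -> Counter:
    """Histogram of reads by their mean Phred score, rounded to the nearest integer."""
    hist: Counter = Counter()
    for qual in qualities:
        if qual:  # skip empty quality strings
            mean_q = sum(ord(ch) - offset for ch in qual) / len(qual)
            hist[round(mean_q)] += 1
    return hist
```

A small secondary peak at low mean quality in this histogram corresponds to the "subset of sequences with universally poor quality" described above.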

              -
              - -
              loading..
              -
              - - -
              -
              - - - - - - -
              - -

              - Per Base Sequence Content - - - -

              - -

              The proportion of each base position for which each of the four normal DNA bases has been called.

              - - -
              -

              To enable multiple samples to be shown in a single plot, the base composition data -is shown as a heatmap. The colours represent the balance between the four bases: -an even distribution should give an even muddy brown colour. Hover over the plot -to see the percentage of the four bases under the cursor.

              -

              To see the data as a line plot, as in the original FastQC graph, click on a sample track.

              -

              From the FastQC help:

              -

              Per Base Sequence Content plots out the proportion of each base position in a -file for which each of the four normal DNA bases has been called.

              -

              In a random library you would expect that there would be little to no difference -between the different bases of a sequence run, so the lines in this plot should -run parallel with each other. The relative amount of each base should reflect -the overall amount of these bases in your genome, but in any case they should -not be hugely imbalanced from each other.

              -

              It's worth noting that some types of library will always produce biased sequence -composition, normally at the start of the read. Libraries produced by priming -using random hexamers (including nearly all RNA-Seq libraries) and those which -were fragmented using transposases inherit an intrinsic bias in the positions -at which reads start. This bias does not concern an absolute sequence, but instead -provides enrichment of a number of different K-mers at the 5' end of the reads. -Whilst this is a true technical bias, it isn't something which can be corrected -by trimming and in most cases doesn't seem to adversely affect the downstream -analysis.
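Per-base sequence content is a position-by-position tally of called bases. The sketch below computes, for each read position, the percentage of A, C, G and T among reads reaching that position. The function name is hypothetical, and excluding N (and any other symbol) from the denominator is an assumption of this sketch.

```python
def base_content_by_position(reads: list[str]) -> list[dict[str, float]]:
    """Percentage of A, C, G and T calls at each read position."""
    per_position: list[dict[str, int]] = []
    for seq in reads:
        for pos, base in enumerate(seq.upper()):
            if pos == len(per_position):
                per_position.append({"A": 0, "C": 0, "G": 0, "T": 0})
            if base in per_position[pos]:  # N and other symbols are not counted
                per_position[pos][base] += 1
    result = []
    for counts in per_position:
        called = sum(counts.values())
        result.append({b: (100.0 * n / called if called else 0.0)
                       for b, n in counts.items()})
    return result
```

In a random library the four values stay roughly constant along the read; the random-hexamer and transposase biases described above would show up as uneven values in the first few positions.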

              -
              - -
              -
              -
              - - Click a sample row to see a line plot for that dataset. -
              -
              Rollover for sample name
              - -
              - Position: - -
              %T: -
              -
              %C: -
              -
              %A: -
              -
              %G: -
              -
              -
              -
              - -
              -
              -
              -
              - - - -
              -
              - - - - - - -
              - -

              - Per Sequence GC Content - - - -

              - -

              The average GC content of reads. Normal random libraries typically have a - roughly normal distribution of GC content.

              - - -
              -

              From the FastQC help:

              -

              This module measures the GC content across the whole length of each sequence -in a file and compares it to a modelled normal distribution of GC content.

              In a normal random library you would expect to see a roughly normal distribution of GC content where the central peak corresponds to the overall GC content of the underlying genome. Since we don't know the GC content of the genome, the modal GC content is calculated from the observed data and used to build a reference distribution.

              An unusually shaped distribution could indicate a contaminated library or some other kind of biased subset. A normal distribution which is shifted indicates some systematic bias which is independent of base position. If there is a systematic bias which creates a shifted normal distribution then this won't be flagged as an error by the module since it doesn't know what your genome's GC content should be.
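
The comparison described above can be sketched in Python as follows. The code builds an observed per-read GC histogram and a normal reference curve centred on its modal bin. This is not FastQC's exact method: using the population standard deviation of the observed values as the spread of the curve is an assumption of this example.

```python
import math
import statistics


def gc_percent(read):
    """GC content of one read as a percentage of its A/C/G/T calls (None if no calls)."""
    called = [b for b in read.upper() if b in "ACGT"]
    if not called:
        return None
    return 100.0 * sum(b in "GC" for b in called) / len(called)


def gc_profile(reads):
    """Observed per-read GC histogram (1% bins) and a modelled normal curve.

    The curve is centred on the modal GC bin, as the help text describes;
    using the observed standard deviation as the spread is an assumption.
    """
    values = [v for v in (gc_percent(r) for r in reads) if v is not None]
    observed = [0] * 101
    for v in values:
        observed[round(v)] += 1
    mode = max(range(101), key=lambda i: observed[i])
    sd = statistics.pstdev(values) or 1.0  # avoid a zero-width curve
    n = len(values)
    model = [
        n * math.exp(-((i - mode) ** 2) / (2 * sd**2)) / (sd * math.sqrt(2 * math.pi))
        for i in range(101)
    ]
    return observed, mode, model
```

The reference curve is centred on the observed mode. A whole-library shift in GC content therefore moves both curves together, which is why such a shift is not flagged.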

              Per Base N Content

              The percentage of base calls at each position for which an N was called.

              From the FastQC help:

              If a sequencer is unable to make a base call with sufficient confidence then it will normally substitute an N rather than a conventional base call. This graph shows the percentage of base calls at each position for which an N was called.

              It's not unusual to see a very low proportion of Ns appearing in a sequence, especially nearer the end of a sequence. However, if this proportion rises above a few percent it suggests that the analysis pipeline was unable to interpret the data well enough to make valid base calls.
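
The per-position N percentage, and a check against "a few percent", can be sketched in Python as below. The 5% default threshold is an illustrative stand-in chosen for this example, not a value taken from this report.

```python
def per_base_n_content(reads):
    """Percentage of reads with an N call at each position (index 0 = position 1)."""
    max_len = max(len(r) for r in reads)
    result = []
    for pos in range(max_len):
        calls = [r[pos] for r in reads if pos < len(r)]
        result.append(100.0 * sum(c in "Nn" for c in calls) / len(calls))
    return result


def flag_positions(n_content, threshold=5.0):
    """1-based positions whose N content exceeds `threshold` percent (illustrative cut-off)."""
    return [i + 1 for i, pct in enumerate(n_content) if pct > threshold]


profile = per_base_n_content(["ACNT", "ACGN", "ACGT", "ACGT"])
print(profile, flag_positions(profile))
```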

              Sequence Length Distribution

              The distribution of fragment sizes (read lengths) found. See the FastQC help for details.

              Sequence Duplication Levels

              The relative level of duplication found for every sequence.

              From the FastQC Help:

              In a diverse library most sequences will occur only once in the final set. A low level of duplication may indicate a very high level of coverage of the target sequence, but a high level of duplication is more likely to indicate some kind of enrichment bias (e.g. PCR over-amplification). This graph shows the degree of duplication for every sequence in a library: the relative number of sequences with different degrees of duplication.

              Only sequences which first appear in the first 100,000 sequences in each file are analysed. This should be enough to get a good impression for the duplication levels in the whole file. Each sequence is tracked to the end of the file to give a representative count of the overall duplication level.

              The duplication detection requires an exact sequence match over the whole length of the sequence. Any reads over 75 bp in length are truncated to 50 bp for this analysis.

              In a properly diverse library most sequences should fall into the far left of the plot in both the red and blue lines. A general level of enrichment, indicating broad oversequencing in the library, will tend to flatten the lines, lowering the low end and generally raising other categories. More specific enrichments of subsets, or the presence of low-complexity contaminants, will tend to produce spikes towards the right of the plot.
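
The Python sketch below applies the duplication rules quoted above:

- Exact matches only.
- Reads longer than 75 bp are truncated to 50 bp.
- Only sequences first seen within the first 100,000 reads are tracked, and they are counted to the end of the file.

Collapsing every level of 10 or more into a single bin is a simplification of this example; it is not FastQC's binning.

```python
from collections import Counter

TRACK_LIMIT = 100_000  # only sequences first seen within this many reads are tracked


def duplication_levels(reads, track_limit=TRACK_LIMIT):
    """Percentage of distinct sequences at each duplication level (levels >= 10 pooled)."""
    counts = Counter()
    for i, read in enumerate(reads):
        key = read[:50] if len(read) > 75 else read  # truncation rule from the help text
        if key in counts or i < track_limit:
            counts[key] += 1  # already-tracked sequences keep counting to the end
    levels = Counter(min(c, 10) for c in counts.values())
    total = sum(levels.values())
    return {level: 100.0 * n / total for level, n in sorted(levels.items())}


print(duplication_levels(["AAAA", "AAAA", "CCCC", "GGGG"]))
```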

              Overrepresented sequences

              The total amount of overrepresented sequences found in each library.

              FastQC calculates and lists overrepresented sequences in FastQ files. It would not be possible to show this for all samples in a MultiQC report, so instead this plot shows the number of sequences categorized as overrepresented.

              Sometimes, a single sequence may account for a large number of reads in a dataset. To show this, the bars are split into two: the first shows the overrepresented reads that come from the single most common sequence. The second shows the total count from all remaining overrepresented sequences.

              From the FastQC Help:

              A normal high-throughput library will contain a diverse set of sequences, with no individual sequence making up more than a tiny fraction of the whole. Finding that a single sequence is very overrepresented in the set either means that it is highly biologically significant, or indicates that the library is contaminated, or not as diverse as you expected.

              FastQC lists all of the sequences which make up more than 0.1% of the total. To conserve memory, only sequences which appear in the first 100,000 sequences are tracked to the end of the file. It is therefore possible that a sequence which is overrepresented but doesn't appear at the start of the file for some reason could be missed by this module.
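
The Python sketch below implements the 0.1% threshold and the first-100,000-reads tracking rule. It also computes the two-part bar (most common sequence versus all other overrepresented sequences) that MultiQC plots. The function name and return layout are illustrative choices for this example.

```python
from collections import Counter


def overrepresented(reads, min_fraction=0.001, track_limit=100_000):
    """Sequences making up more than `min_fraction` of all reads (0.1% by default).

    Only sequences first seen within the first `track_limit` reads are tracked.
    Returns (hits, top_count, other_count): hits sorted by count, plus the
    read counts for the single most common hit and for all remaining hits.
    """
    counts = Counter()
    for i, read in enumerate(reads):
        if read in counts or i < track_limit:
            counts[read] += 1
    total = len(reads)
    hits = [(seq, n) for seq, n in counts.most_common() if n / total > min_fraction]
    top = hits[0][1] if hits else 0
    other = sum(n for _, n in hits[1:])
    return hits, top, other


reads = ["AAA"] * 5 + ["CCC"] * 3 + ["GGG"] * 2
print(overrepresented(reads, min_fraction=0.25))
```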

              Adapter Content

              The cumulative percentage count of the proportion of your library which has seen each of the adapter sequences at each position.

              Note that only samples with ≥ 0.1% adapter contamination are shown.

              There may be several lines per sample, as one is shown for each adapter detected in the file.

              From the FastQC Help:

              The plot shows a cumulative percentage count of the proportion of your library which has seen each of the adapter sequences at each position. Once a sequence has been seen in a read it is counted as being present right through to the end of the read, so the percentages you see will only increase as the read length goes on.
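
The cumulative counting rule can be sketched in Python as below: once an adapter is found at position p, the read counts for every later position. Two simplifications apply. Exact matching of a whole adapter string is used, and the adapter in the example is only a placeholder chosen for this sketch.

```python
def cumulative_adapter_content(reads, adapter):
    """Cumulative % of reads in which `adapter` has been seen by each position."""
    max_len = max(len(r) for r in reads)
    first_seen = [0] * max_len
    for read in reads:
        start = read.find(adapter)  # exact match only (simplification)
        if start != -1:
            first_seen[start] += 1
    result, running = [], 0
    for pos in range(max_len):
        running += first_seen[pos]  # a read stays counted once the adapter appears
        result.append(100.0 * running / len(reads))
    return result


reads = ["AGATCGGAAG", "TTAGATCGGA", "CCCCCCCCCC", "GGGGGGGGGG"]
print(cumulative_adapter_content(reads, "AGATCGG"))
```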

              No samples found with any adapter contamination > 0.1%

              Status Checks

              Status for each FastQC section showing whether results seem entirely normal (green), slightly abnormal (orange) or very unusual (red).

              FastQC assigns a status for each section of the report. These give a quick evaluation of whether the results of the analysis seem entirely normal (green), slightly abnormal (orange) or very unusual (red).

              It is important to stress that although the analysis results appear to give a pass/fail result, these evaluations must be taken in the context of what you expect from your library. A 'normal' sample as far as FastQC is concerned is random and diverse. Some experiments may be expected to produce libraries which are biased in particular ways. You should therefore treat the summary evaluations as pointers to where you should concentrate your attention and understand why your library may not look random and diverse.

              Specific guidance on how to interpret the output of each module can be found in the relevant report section, or in the FastQC help.

              This heatmap summarises all of these status checks for a quick overview. Note that not all FastQC sections have plots in MultiQC reports, but all status checks are shown in this heatmap.

              PREPROCESS: fastp (adapter trimming)

              This section of the report shows fastp results for reads after adapter and quality trimming. DOI: 10.1093/bioinformatics/bty560.

              Filtered Reads

              Filtering statistics of sampled reads.

              Insert Sizes

              Insert size estimation of sampled reads.

              Sequence Quality

              Average sequencing quality over each base of all reads.

              - - -
              - - - - -
              - -
              loading..
              -
              - - -
              -
              - - - - - - -
              - -

              GC Content

              Average GC content over each base of all reads.

              - - -
              - - - - -
              - -
              loading..
              -
              - - -
              -
              - - - - - - -
              - -

              N Content

              Average N content over each base of all reads.

              - - -
              - - - - -
              - -
              loading..
              -
              - - - -
              - - - -
              -
              - - - -
              - - -
              -

              PREPROCESS: Kraken 2

              This section of the report shows Kraken 2 classification results for reads after adapter trimming with fastp. DOI: 10.1186/gb-2014-15-3-r46.

              Top taxa

              The number of reads falling into the top 5 taxa across different ranks.

              To make this plot, the percentage of each sample assigned to a given taxon is summed across all samples, and the five taxa with the highest totals are selected. The counts for these top five taxa are then plotted for each of the 9 different taxonomic ranks. The unclassified count is always shown across all taxonomic ranks.

              The total number of reads is approximated by dividing the number of unclassified reads by the percentage of the library that they account for. Note that this is only an approximation, and that Kraken percentages don't always add up to exactly 100%.

              The category "Other" shows the difference between the above total read count and the sum of the read counts in the top 5 taxa shown plus unclassified. This should cover all taxa not in the top 5, +/- any rounding errors.

              Note that any taxon that does not exactly fit a taxon rank (e.g. "-" or "G2") is ignored.
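
The arithmetic described above can be sketched in Python for a single rank:

1. Sum each taxon's percentage across samples and keep the top taxa.
2. Estimate the library size as unclassified reads divided by the unclassified percentage.
3. Assign the remainder to "Other".

The nested-dict input layout, with (percentage, read count) pairs, is hypothetical. The demo uses n_top=2 only to keep the example small.

```python
def top_taxa(samples, n_top=5):
    """Select the top taxa at one rank by summing each taxon's percentage across samples."""
    totals = {}
    for sample in samples.values():
        for taxon, (pct, _reads) in sample["taxa"].items():
            totals[taxon] = totals.get(taxon, 0.0) + pct
    return sorted(totals, key=totals.get, reverse=True)[:n_top]


def bar_counts(sample, top):
    """Read counts for one sample's bar: the top taxa, Unclassified and Other."""
    uncl_pct, uncl_reads = sample["unclassified"]
    if uncl_pct <= 0:
        raise ValueError("total cannot be estimated without unclassified reads")
    total = uncl_reads / (uncl_pct / 100.0)  # approximate library size
    counts = {t: sample["taxa"].get(t, (0.0, 0))[1] for t in top}
    counts["Unclassified"] = uncl_reads
    counts["Other"] = total - sum(counts.values())  # everything not shown explicitly
    return counts


samples = {
    "S1": {"unclassified": (10.0, 100), "taxa": {"A": (60.0, 600), "B": (25.0, 250), "C": (5.0, 50)}},
    "S2": {"unclassified": (20.0, 200), "taxa": {"A": (10.0, 100), "C": (70.0, 700)}},
}
top = top_taxa(samples, n_top=2)
print(top, bar_counts(samples["S1"], top))
```

In this toy data, taxon B is not in the top two, so its 250 reads in S1 end up in "Other".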

              VARIANTS: Bowtie 2

              This section of the report shows Bowtie 2 mapping results for reads after adapter trimming and quality trimming. DOI: 10.1038/nmeth.1923; 10.1038/nmeth.3317; 10.1038/s41587-019-0201-4.

              Single-end alignments

              This plot shows the number of reads aligning to the reference in different ways.

              There are 3 possible types of alignment:

              • SE mapped uniquely: Read has only one occurrence in the reference genome.
              • SE multimapped: Read has multiple occurrences.
              • SE not aligned: Read has no occurrence.

              Paired-end alignments

              This plot shows the number of reads aligning to the reference in different ways.

              Please note that single mate alignment counts are halved to tally with pair counts properly.
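
The halving can be illustrated with a small Python sketch. For pairs that did not align as pairs, Bowtie 2 reports counts per mate, so those counts are divided by two to put them in pair units; the six categories then add up to the total number of pairs. The dictionary key names are hypothetical, and this is not MultiQC's own code.

```python
def pe_categories(stats):
    """Express Bowtie 2 paired-end summary counts as six pair-unit categories.

    `stats` uses hypothetical key names. Per-mate counts (for pairs that
    aligned neither concordantly nor discordantly) are halved.
    """
    return {
        "PE mapped uniquely": stats["concordant_once"],
        "PE mapped discordantly uniquely": stats["discordant_once"],
        "PE one mate mapped uniquely": stats["mates_aligned_once"] / 2,
        "PE multimapped": stats["concordant_multi"],
        "PE one mate multimapped": stats["mates_aligned_multi"] / 2,
        "PE neither mate aligned": stats["mates_aligned_none"] / 2,
    }


# Made-up summary for 1,000 pairs: 150 pairs (300 mates) did not align as pairs.
stats = {
    "concordant_once": 700, "concordant_multi": 100, "discordant_once": 50,
    "mates_aligned_once": 60, "mates_aligned_multi": 20, "mates_aligned_none": 220,
}
print(pe_categories(stats))
```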

              There are 6 possible types of alignment:

              • PE mapped uniquely: Pair has only one occurrence in the reference genome.
              • PE mapped discordantly uniquely: Pair has only one occurrence, but not as a proper pair.
              • PE one mate mapped uniquely: One read of a pair has one occurrence.
              • PE multimapped: Pair has multiple occurrences.
              • PE one mate multimapped: One read of a pair has multiple occurrences.
              • PE neither mate aligned: Pair has no occurrence.

              VARIANTS: SAMTools (raw)

              This section of the report shows SAMTools counts/statistics after mapping with Bowtie 2. DOI: 10.1093/bioinformatics/btp352.

              Samtools Flagstat

              This module parses the output from samtools flagstat. All numbers are in millions.

              VARIANTS: SAMTools (iVar)

              This section of the report shows SAMTools counts/statistics after primer sequence removal with iVar. DOI: 10.1093/bioinformatics/btp352.

              Samtools Flagstat

              This module parses the output from samtools flagstat. All numbers are in millions.

              VARIANTS: mosdepth

              This section of the report shows genome-wide coverage metrics generated by mosdepth. DOI: 10.1093/bioinformatics/btx699.

              Cumulative coverage distribution

              Proportion of bases in the reference genome with at least a given depth of coverage.

              For a set of DNA or RNA reads mapped to a reference sequence, such as a genome or transcriptome, the depth of coverage at a given base position is the number of high-quality reads that map to the reference at that position, while the breadth of coverage is the fraction of the reference sequence to which reads have been mapped with at least a given depth of coverage (Sims et al. 2014).

              Defining coverage breadth in terms of coverage depth is useful, because sequencing experiments typically require a specific minimum depth of coverage over the region of interest (Sims et al. 2014), so the extent of the reference sequence that is amenable to analysis is constrained to lie within regions that have sufficient depth. With inadequate sequencing breadth, it can be difficult to distinguish the absence of a biological feature (such as a gene) from a lack of data (Green 2007).

              For increasing coverage depths (1×, 2×, …, N×), coverage breadth is calculated as the percentage of the reference sequence that is covered by at least that number of reads; the plot then shows coverage breadth (y-axis) against coverage depth (x-axis). This shows the relationship between sequencing depth and breadth for each read dataset, which can be used to gauge, for example, the likely effect of a minimum depth filter on the fraction of a genome available for analysis.
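
The breadth-versus-depth curve can be sketched in Python from a list of per-base depths. The plain Python list is a simplified stand-in for mosdepth's output, not its actual file format.

```python
def coverage_breadth(depths, max_depth=None):
    """Percentage of reference positions covered by at least d reads, for d = 0..max_depth.

    `depths` holds one depth value per reference position (simplified input).
    """
    if max_depth is None:
        max_depth = max(depths)
    n = len(depths)
    return [100.0 * sum(1 for x in depths if x >= d) / n for d in range(max_depth + 1)]


print(coverage_breadth([0, 1, 2, 2, 5]))  # index = depth threshold
```

With these toy depths, a minimum-depth filter of 2× would leave 60% of the reference available for analysis.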

              Coverage distribution

              Proportion of bases in the reference genome with a given depth of coverage

              For a set of DNA or RNA reads mapped to a reference sequence, such as a genome or transcriptome, the depth of coverage at a given base position is the number of high-quality reads that map to the reference at that position (Sims et al. 2014).

              Bases of a reference sequence (y-axis) are grouped by their depth of coverage (0×, 1×, …, N×) (x-axis). This plot shows the frequency of coverage depths relative to the reference sequence for each read dataset, which provides an indirect measure of the level and variation of coverage depth in the corresponding sequenced sample.

              If reads are randomly distributed across the reference sequence, this plot should resemble a Poisson distribution (Lander & Waterman 1988), with a peak indicating approximate depth of coverage, and more uniform coverage depth being reflected in a narrower spread. The optimal level of coverage depth depends on the aims of the experiment, though it should at minimum be sufficiently high to adequately address the biological question; greater uniformity of coverage is generally desirable, because it increases breadth of coverage for a given depth of coverage, allowing equivalent results to be achieved at a lower sequencing depth (Sampson et al. 2011; Sims et al. 2014). However, it is difficult to achieve uniform coverage depth in practice, due to biases introduced during sample preparation (van Dijk et al. 2014), sequencing (Ross et al. 2013) and read mapping (Sims et al. 2014).
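
Under the random-placement model cited above, the expected shape of this plot can be computed directly. The mean depth is c = N·L/G (reads × read length / reference length), and the fraction of bases at depth k is e^(−c)·c^k/k!. The Python sketch below uses made-up read counts and lengths, not values from this report.

```python
import math


def expected_depth_distribution(n_reads, read_len, genome_len, max_depth):
    """Poisson expectation for the fraction of bases at each depth 0..max_depth."""
    c = n_reads * read_len / genome_len  # mean depth of coverage
    return [math.exp(-c) * c**k / math.factorial(k) for k in range(max_depth + 1)]


# Hypothetical run: 1,000 reads of 150 bp over a 30,000 bp reference (mean depth 5x).
dist = expected_depth_distribution(1000, 150, 30000, 20)
print(f"expected fraction of bases with zero coverage: {dist[0]:.4f}")
```

A real coverage distribution that is much wider than this curve, or that has a large zero-depth bin, points to the non-uniformity and mapping issues described above.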

              This plot may include a small peak for regions of the reference sequence with zero depth of coverage. Such regions may be absent from the given sample (due to a deletion or structural rearrangement), present in the sample but not successfully sequenced (due to bias in sequencing or preparation), or sequenced but not successfully mapped to the reference (due to the choice of mapping algorithm, the presence of repeat sequences, or mismatches caused by variants or sequencing errors). Related factors cause most datasets to contain some unmapped reads (Sims et al. 2014).

              Average coverage per contig

              Average coverage per contig or chromosome

              VARIANTS: Total variants (iVar)

              This section shows the total number of variants called by iVar.

              VARIANTS: Pangolin

              This section of the report shows Pangolin lineage analysis results for the called variants. DOI: 10.1093/ve/veab064.

              Run table

              Statistics gathered from the input pangolin files. Hover over the column headers for descriptions and click Help for more in-depth documentation.

              This table shows some of the metrics parsed by Pangolin. Hover over the column headers to see a description of the contents. Longer help text for certain columns is shown below:

              • Conflict: In the pangoLEARN decision tree model, a given sequence gets assigned to the most likely category based on known diversity. If a sequence can fit into more than one category, the conflict score will be greater than 0 and reflect the number of categories the sequence could fit into. If the conflict score is 0, this means that within the current decision tree there is only one category that the sequence could be assigned to.
              • Ambiguity score: This score is a function of the quantity of missing data in a sequence. It represents the proportion of relevant sites in a sequence which were imputed to the reference values. A score of 1 indicates that no sites were imputed, while a score of 0 indicates that more sites were imputed than were not imputed. This score only includes sites which are used by the decision tree to classify a sequence.
              • Scorpio conflict: The conflict score is the proportion of defining variants which have the reference allele in the sequence. Ambiguous/other non-ref/alt bases at each of the variant positions contribute only to the denominators of these scores.
              • Note: If there are any conflicts from the decision tree, this field will output the alternative assignments. If the sequence failed QC, this field will describe why. If the sequence met the SNP thresholds for scorpio to call a constellation, it will describe the exact SNP counts of Alt, Ref and Amb (alternative, reference and ambiguous) alleles for that call.
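
The scorpio conflict definition above reduces to a single ratio, sketched here in Python. It is based only on the definition given in the list, not on scorpio's source code, and the example counts are made up. Ambiguous and other non-ref/alt bases appear only in the denominator.

```python
def scorpio_conflict(alt, ref, ambiguous, other=0):
    """Fraction of constellation-defining sites that carry the reference allele.

    Ambiguous and other non-ref/alt bases count only towards the denominator.
    Returns None when no defining sites were assessed.
    """
    total = alt + ref + ambiguous + other
    if total == 0:
        return None
    return ref / total


print(scorpio_conflict(alt=10, ref=2, ambiguous=4))  # hypothetical Alt/Ref/Amb counts
```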
              Sample Name | Lineage | Conflict | Ambiguity | S call | S support | S conflict | QC Status | Note
              SAMPLE1_PE | B.1 | 0.0 | | | | | Pass | Usher placements: B.1(1/1)
              SAMPLE2_PE | A.2 | 0.0 | | | | | Pass | Usher placements: A.2(2/2)
              SAMPLE3_SE | B | 0.3 | | | | | Pass | Usher placements: B(2/3) B.1(1/3)

              VARIANTS: BCFTools

              This section of the report shows BCFTools stats results for the called variants. DOI: 10.1093/gigascience/giab008.

              Variant Substitution Types

              Variant Quality

              Indel Distribution

              Variant depths

              Read depth support distribution for called variants

              VARIANTS: SnpEff

              This section of the report shows SnpEff results for the called variants. DOI: 10.4161/fly.19695.

              Variants by Genomic Region

              The stacked bar plot shows locations of detected variants in the genome and the number of variants for each location.

              The upstream and downstream interval size to detect these genomic regions is 5000 bp by default.
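
How the 5000 bp window separates upstream, downstream and more distant variants can be sketched in Python for a single gene. This one-gene model is a simplification; SnpEff annotates against every overlapping transcript and reports many more region types. The region labels returned here are illustrative only.

```python
def classify_region(pos, gene_start, gene_end, strand="+", interval=5000):
    """Classify a variant position relative to a single gene.

    Coordinates are 1-based and inclusive; `interval` is the
    upstream/downstream window (5000 bp by default, as stated above).
    """
    if gene_start <= pos <= gene_end:
        return "GENE"
    before = gene_start - interval <= pos < gene_start
    after = gene_end < pos <= gene_end + interval
    if strand == "-":
        # On the minus strand, "upstream" lies at higher coordinates.
        before, after = after, before
    if before:
        return "UPSTREAM"
    if after:
        return "DOWNSTREAM"
    return "INTERGENIC"


for pos in (950, 1500, 2500, 8000):
    print(pos, classify_region(pos, gene_start=1000, gene_end=2000))
```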

              Variant Effects by Impact

              The stacked bar plot shows the putative impact of detected variants and the number of variants for each impact.

              There are four levels of impacts predicted by SnpEff:

              • High: high impact (e.g. a gained stop codon)
              • Moderate: moderate impact (e.g. an amino acid substitution)
              • Low: low impact (e.g. a silent mutation)
              • Modifier: no predicted impact (usually non-coding variants)

              Variants by Effect Types

              The stacked bar plot shows the effect of variants at protein level and the number of variants for each effect type.

              This plot shows the effect of variants with respect to the mRNA.

              Variants by Functional Class

              The stacked bar plot shows the functional class of variants and the number of variants for each class.

              This plot shows the effect of variants on the translation of the mRNA as protein. There are three possible cases:

              • Silent: The amino acid does not change.
              • Missense: The amino acid is different.
              • Nonsense: The variant generates a stop codon.
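
The three cases can be illustrated in Python by translating the reference and alternative codons with the standard genetic code. The codon table is built from the usual compact TCAG-ordered string. Treating a lost stop codon as "missense" is a simplification of this sketch, not SnpEff's behaviour.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def functional_class(ref_codon, alt_codon):
    """SILENT, NONSENSE or MISSENSE for a single-codon change."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if alt_aa == ref_aa:
        return "SILENT"
    if alt_aa == "*":
        return "NONSENSE"
    return "MISSENSE"


print(functional_class("GAA", "GAG"), functional_class("GAA", "TAA"), functional_class("GAA", "GAT"))
```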

              ASSEMBLY: Cutadapt (primer trimming)

              This section of the report shows Cutadapt results for reads after primer sequence trimming. DOI: 10.14806/ej.17.1.200.

              Filtered Reads

              This plot shows the number of reads (SE) / pairs (PE) removed by Cutadapt.

              Trimmed Sequence Lengths

              This plot shows the number of reads with certain lengths of adapter trimmed.

              Obs/Exp shows the raw counts divided by the number expected due to sequencing errors. A defined peak may be related to adapter length.
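
cutadapt's report lists an observed count and an "expect" value for each trimmed length, and the Obs/Exp ratio can be recomputed from these, as in the Python sketch below. The sample table is made up. The parser assumes tab-separated columns with a single header line. Lengths whose expected count rounds to zero are skipped, to avoid dividing by zero.

```python
def parse_length_table(text):
    """(length, count, expect) rows from a tab-separated cutadapt
    'length / count / expect / max.err / error counts' table (header first)."""
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header line
        fields = line.split("\t")
        rows.append((int(fields[0]), int(fields[1]), float(fields[2])))
    return rows


def obs_exp(rows):
    """Observed/expected ratio per trimmed length; rows with expect == 0 are skipped."""
    return {length: count / expect for length, count, expect in rows if expect > 0}


table = (
    "length\tcount\texpect\tmax.err\terror counts\n"
    "3\t1200\t150.0\t0\t1200\n"
    "4\t300\t37.5\t0\t300\n"
    "20\t5000\t0.0\t2\t4900 100"
)
print(obs_exp(parse_length_table(table)))
```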

              See the cutadapt documentation for more information on how these numbers are generated.

              ASSEMBLY: QUAST (SPAdes)

              This section of the report shows QUAST results from SPAdes de novo assembly. DOI: 10.1093/bioinformatics/btt086.

              Assembly Statistics

              Sample Name | N50 (Kbp) | L50 (K) | Largest contig (Kbp) | Length (Mbp) | Misassemblies | Mismatches/100kbp | Indels/100kbp | Genome Fraction (%)
              SAMPLE1_PE | 21.0 | 0.0 | 21.0 | 0.0 | 0.0 | 20.27 | 0.00 | 99.0
              SAMPLE2_PE | 4.1 | 0.0 | 10.4 | 0.0 | 0.0 | 38.24 | 3.48 | 96.0
              SAMPLE3_SE | 8.8 | 0.0 | 16.9 | 0.0 | 0.0 | 10.09 | 0.00 | 99.0
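
The N50 and L50 columns follow the standard assembly definitions, sketched here in Python:

1. Sort contigs from longest to shortest.
2. Accumulate their lengths until the running total reaches half of the assembly length.
3. N50 is the length of the contig at which that happens; L50 is the number of contigs it took.

Because L50 is reported in thousands, an assembly in which one or a few contigs reach half the total length appears as 0.0K. The contig lengths in the example are made up.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths; (0, 0) for an empty assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half:  # half of the assembly is now covered
            return length, count
    return 0, 0


print(n50_l50([10, 8, 6, 4, 2]))
```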

              Number of Contigs

              This plot shows the number of contigs found for each assembly, broken down by length.

              nf-core/viralrecon Software Versions

              Software versions are collected at run time from the software output.

              Process Name                    Software            Version
              ASCIIGENOME                     asciigenome         1.16.0
                                              bedtools            2.30.0
              BANDAGE_IMAGE                   bandage             0.8.1
              BCFTOOLS_CONSENSUS              bcftools            1.16
              BCFTOOLS_FILTER                 bcftools            1.16
              BCFTOOLS_QUERY                  bcftools            1.16
              BCFTOOLS_SORT                   bcftools            1.16
              BCFTOOLS_STATS                  bcftools            1.16
              BEDTOOLS_GETFASTA               bedtools            2.30.0
              BEDTOOLS_MASKFASTA              bedtools            2.30.0
              BEDTOOLS_MERGE                  bedtools            2.30.0
              BLAST_BLASTN                    blast               2.13.0+
              BLAST_MAKEBLASTDB               blast               2.13.0+
              BOWTIE2_ALIGN                   bowtie2             2.4.4
                                              pigz                2.6
                                              samtools            1.16.1
              BOWTIE2_BUILD                   bowtie2             2.4.4
              CAT_FASTQ                       cat                 8.30
              COLLAPSE_PRIMERS                python              3.9.5
              CUSTOM_DUMPSOFTWAREVERSIONS     python              3.11.0
                                              yaml                6.0
              CUSTOM_GETCHROMSIZES            getchromsizes       1.16.1
              CUTADAPT                        cutadapt            4.2
              FASTP                           fastp               0.23.2
              FASTQC                          fastqc              0.11.9
              FASTQC_RAW                      fastqc              0.11.9
              FASTQC_TRIM                     fastqc              0.11.9
              FILTER_BLASTN                   sed                 4.7
              GUNZIP_GFF                      gunzip              1.10
              GUNZIP_SCAFFOLDS                gunzip              1.10
              IVAR_TRIM                       ivar                1.4
              IVAR_VARIANTS                   ivar                1.4
              IVAR_VARIANTS_TO_VCF            python              3.9.12
              KRAKEN2_KRAKEN2                 kraken2             2.1.2
                                              pigz                2.6
              MAKE_BED_MASK                   python              3.9.5
                                              samtools            1.14
              MAKE_VARIANTS_LONG_TABLE        python              3.9.9
              MOSDEPTH_AMPLICON               mosdepth            0.3.3
              MOSDEPTH_GENOME                 mosdepth            0.3.3
              NEXTCLADE_RUN                   nextclade           2.12.0
              PICARD_COLLECTMULTIPLEMETRICS   picard              3.0.0
              PLOT_MOSDEPTH_REGIONS_AMPLICON  r-base              4.0.3
              PLOT_MOSDEPTH_REGIONS_GENOME    r-base              4.0.3
              QUAST                           quast               5.2.0
              RENAME_FASTA_HEADER             sed                 4.7
              SAMPLESHEET_CHECK               python              3.9.5
              SAMTOOLS_FLAGSTAT               samtools            1.16.1
              SAMTOOLS_IDXSTATS               samtools            1.16.1
              SAMTOOLS_INDEX                  samtools            1.16.1
              SAMTOOLS_SORT                   samtools            1.16.1
              SAMTOOLS_STATS                  samtools            1.16.1
              SNPEFF_ANN                      snpeff              5.0e
              SNPEFF_BUILD                    snpeff              5.0e
              SNPSIFT_EXTRACTFIELDS           snpsift             4.3
              SPADES                          spades              3.15.5
              TABIX_BGZIP                     tabix               1.12
              TABIX_TABIX                     tabix               1.12
              UNTAR_KRAKEN2_DB                untar               1.30
              UNTAR_NEXTCLADE_DB              untar               1.30
              Workflow                        Nextflow            23.10.0
                                              nf-core/viralrecon  2.6.0
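The table above is a flattened view of a per-process mapping from process name to tool versions. The sketch below assumes that mapping is stored as two-level YAML (process keys unindented, `tool: version` pairs indented beneath them; the file name `software_versions.yml` is likewise an assumption) and prints it back as the same three tab-separated columns. `flatten_versions` is a hypothetical helper, and the awk is a line-based reader for this one layout, not a YAML parser.

```bash
#!/usr/bin/env bash
# Sketch: flatten a two-level "PROCESS: {tool: version}" YAML file into
# tab-separated Process / Software / Version rows, like the table above.
# Assumption: process keys are unindented and tool lines are indented, as in
# the sample layout below; this is not a general-purpose YAML parser.
#
#   BOWTIE2_ALIGN:
#     bowtie2: 2.4.4
#     pigz: 2.6
set -euo pipefail

flatten_versions() {
  awk '
    # An unindented "KEY:" line starts a new process block.
    /^[^ \t#][^:]*:[ \t]*$/ {
      proc = $0; sub(/:[ \t]*$/, "", proc); first = 1; next
    }
    # An indented "tool: version" line belongs to the current process.
    /^[ \t]+[^:]+:[ \t]*[^ \t]/ {
      line = $0; sub(/^[ \t]+/, "", line)
      tool = line; sub(/:.*/, "", tool)
      ver = line;  sub(/^[^:]*:[ \t]*/, "", ver)
      # Print the process name only on its first row, as the report does.
      printf "%s\t%s\t%s\n", (first ? proc : ""), tool, ver
      first = 0
    }
  ' "$1"
}

# Usage (file name is an assumption): flatten_versions software_versions.yml
```

Leaving the process column empty on continuation rows is what produces lines such as `pigz 2.6` under `BOWTIE2_ALIGN` in the table.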

              nf-core/viralrecon Workflow Summary

              This information is collected when the pipeline is started.


              Core Nextflow options

              revision         master
              runName          evil_babbage
              containerEngine  docker
              launchDir        /Users/vlad/git/viralrecon
              workDir          /Users/vlad/git/viralrecon/work
              projectDir       /Users/vlad/.nextflow/assets/nf-core/viralrecon
              userName         vlad
              profile          test,docker
              configFiles      /Users/vlad/.nextflow/assets/nf-core/viralrecon/nextflow.config, /Users/vlad/git/viralrecon/nextflow.config

              Input/output options

              input     https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/samplesheet/v2.6/samplesheet_test_amplicon_illumina.csv
              platform  illumina
              protocol  amplicon
              outdir    /Users/vlad/tmp/nextflow

              Reference genome options

              genome              MN908947.3
              fasta               https://github.com/artic-network/artic-ncov2019/raw/master/primer_schemes/nCoV-2019/V1/nCoV-2019.reference.fasta
              gff                 https://github.com/nf-core/test-datasets/raw/viralrecon/genome/MN908947.3/GCA_009858895.3_ASM985889v3_genomic.200409.gff.gz
              primer_bed          https://github.com/artic-network/artic-ncov2019/raw/master/primer_schemes/nCoV-2019/V1/nCoV-2019.primer.bed
              primer_set          artic
              primer_set_version  1

              Nanopore/Illumina options

              nextclade_dataset            https://github.com/nf-core/test-datasets/raw/viralrecon/genome/MN908947.3/nextclade_sars-cov-2_MN908947_2022-06-14T12_00_00Z.tar.gz
              nextclade_dataset_name       sars-cov-2
              nextclade_dataset_reference  MN908947
              nextclade_dataset_tag        2022-06-14T12:00:00Z

              Illumina QC, read trimming and filtering options

              kraken2_db  https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/genome/kraken2/kraken2_hs22.tar.gz

              Illumina variant calling options

              variant_caller  ivar

              Illumina de novo assembly options

              skip_abacas  true

              Max job request options

              max_cpus    2
              max_memory  6.GB
              max_time    6.h

              Institutional config options

              config_profile_name         Test profile
              config_profile_description  Minimal test dataset to check pipeline function
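The workflow summary should record enough to relaunch the same test run. The hedged sketch below rebuilds the launch command from the recorded `revision`, `profile` and `outdir` values and prints it rather than executing it. It assumes the `test` profile supplies the remaining inputs listed above (samplesheet, FASTA, GFF, primer BED, Nextclade and Kraken2 datasets); `build_viralrecon_cmd` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: rebuild the launch command recorded in the workflow summary above.
# It prints the command instead of running it. Values come from the summary;
# the default outdir is specific to the original machine.
set -euo pipefail

build_viralrecon_cmd() {
  local outdir="${1:-/Users/vlad/tmp/nextflow}"
  # -r       <- "revision" (Core Nextflow options)
  # -profile <- "profile"; the test profile is assumed to supply input,
  #             fasta, gff, primer_bed and the other reference parameters
  # --outdir <- "outdir" (Input/output options)
  local -a cmd=(nextflow run nf-core/viralrecon -r master -profile test,docker --outdir "$outdir")
  printf '%s\n' "${cmd[*]}"
}

build_viralrecon_cmd
```

Printing the command keeps the sketch safe to run on a machine without Nextflow or Docker; the default `outdir` is the path recorded for the original run and will differ elsewhere.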
diff --git a/public/examples/viralrecon/multiqc_report.zip b/public/examples/viralrecon/multiqc_report.zip
deleted file mode 100644
index d0adb02..0000000
Binary files a/public/examples/viralrecon/multiqc_report.zip and /dev/null differ
diff --git a/src/content/example-reports/jupyter.md b/src/content/example-reports/jupyter.md
new file mode 100644
index 0000000..07782ac
--- /dev/null
+++ b/src/content/example-reports/jupyter.md
@@ -0,0 +1,10 @@
+---
+title: Jupyter Notebook
+description: Example of interactive MultiQC usage in a Jupyter notebook.
+type: Analysis Types
+embed: /examples/jupyter/notebook.html
+zip: /examples/jupyter/multiqc_report.zip
+data: /examples/jupyter/data.zip
+---
+
+Show how data can be summarized interactively without use of intermediate custom content, on the example of the [nf-core/viralrecon](https://github.com/nf-core/viralrecon) workflow. The ipynb file for the notebook is included into the "Download full report output" bundle.
diff --git a/src/search_patterns.yaml b/src/search_patterns.yaml
index 0045500..f23b87f 120000
--- a/src/search_patterns.yaml
+++ b/src/search_patterns.yaml
@@ -1 +1 @@
-../../MultiQC/multiqc/utils/search_patterns.yaml
\ No newline at end of file
+../../MultiQC/multiqc/search_patterns.yaml
\ No newline at end of file
diff --git a/update_examples.sh b/update_examples.sh
index 19efbbc..ae2c27a 100644
--- a/update_examples.sh
+++ b/update_examples.sh
@@ -8,10 +8,10 @@ for i in "public/examples/${dirs[@]}"; do
     echo "--------------------------------------------------"
     cd $i
     rm -rf multiqc_report.html multiqc_report.zip multiqc_data
-    unzip data.zip
+    unzip -q data.zip
     multiqc . --disable-ngi -t default
-    zip -r multiqc_report.zip multiqc_report.html multiqc_data
-    rm -r data/ multiqc_data/ __MACOSX/
+    zip -q -r multiqc_report.zip multiqc_report.html multiqc_data
+    rm -r data/ multiqc_data/
     cd ../
 done
 
@@ -20,12 +20,33 @@ echo "Creating report for ngi-rna"
 echo "--------------------------------------------------"
 cd ngi-rna
 rm -rf *multiqc_report.html multiqc_report.zip *multiqc_data
-unzip data.zip
+unzip -q data.zip
 multiqc . --test-db ngi_db_data.json
 # plugin changed the name of the report, don't want to break links
 mv test_ngi_project_pipeline_multiqc_report.html test_ngi_project_multiqc_report.html
-zip -r multiqc_report.zip *multiqc_report.html *multiqc_data
-rm -r data/ test_ngi_project_multiqc_data/ __MACOSX/
+zip -q -r multiqc_report.zip *multiqc_report.html *multiqc_data
+rm -r data/ test_ngi_project_multiqc_data/
+cd ../
+
+echo "--------------------------------------------------"
+echo "Creating Jupyter example"
+echo "--------------------------------------------------"
+cd jupyter
+rm -rf multiqc_report.zip multiqc_report.html multiqc_report_data
+# Get the notebook from a separate repo
+wget https://github.com/MultiQC/example-notebook/raw/master/multiqc_example.ipynb -O notebook.ipynb
+# Hack the notebook a bit:
+# 1. We don't need to re-install MultiQC as it's in our environment
+sed -i '' 's/\%pip install/\# \%pip install/g' notebook.ipynb # remove the pip install command
+sed -i '' 's/\%reset/\# \%reset/g' notebook.ipynb # remove the kernel restart command
+# 2. GitHub doesn't render interactive plots, but the website does
+sed -i '' 's/, flat=True//g' notebook.ipynb # remove the flat=True parameter
+sed -i '' '/flat=True/d' notebook.ipynb # remove the explanation about flat=True
+unzip -q data.zip
+jupyter execute notebook.ipynb --inplace # Run the notebook
+jupyter nbconvert --to html notebook.ipynb # Convert it to HTML
+zip -q -r multiqc_report.zip notebook.ipynb multiqc_report.html multiqc_report_data
+rm -r data/ multiqc_report_data/ notebook.ipynb
 cd ../
 
 echo "--------------------------------------------------"
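The notebook edits in `update_examples.sh` use `sed -i ''`, the BSD/macOS form of in-place editing. GNU sed only accepts a backup suffix attached to the flag (`-i.bak`), so it reads the separate `''` as an empty script and the substitution as an input file name. A portable sketch of the same four edits writes to a temporary file and copies the result back; `patch_notebook` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: the notebook tweaks from update_examples.sh, written without the
# BSD/macOS-only "sed -i ''" form. patch_notebook is a hypothetical helper.
set -euo pipefail

patch_notebook() {
  local nb="$1" tmp
  tmp="$(mktemp)"
  # Same edits, applied line by line in order:
  #   1. comment out the "%pip install" and "%reset" magics;
  #   2. drop the ", flat=True" argument, then delete lines that still
  #      mention flat=True (the explanatory text).
  sed -e 's/%pip install/# %pip install/g' \
      -e 's/%reset/# %reset/g' \
      -e 's/, flat=True//g' \
      -e '/flat=True/d' \
      "$nb" > "$tmp"
  # Copy back with cat so the notebook keeps its original permissions.
  cat "$tmp" > "$nb"
  rm -f "$tmp"
}

# Usage: patch_notebook notebook.ipynb
```

Deleting whole lines is only safe in JSON when the removed line is not the last element of a list, since that would leave a dangling comma behind; the same caveat applies to the original `sed '/flat=True/d'` command.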