# Genome assembly and quality control of assembly

## Required tools

After the previous quality control (QC) steps, our reads are still in raw format, but we now have a summary of their sequencing quality. We have also removed regions with poor sequencing quality (where we cannot be sure that the assigned nucleotides are right) and the adaptor sequences that were added to our DNA for sequencing.

In this series of steps, we will assemble the reads using a tool called `shovill`. Once again, we will mimic how the commands would be run on the **Compute Canada (CC)** cluster.

For these tutorials, tools will be made available using singularity containers, which can be run using the command `singularity run tool_image`. These tools have been made available in the environment already, so there is no need to download them.

Tools used in this tutorial:
- shovill
- singularity

We will first explore the structure of our environment and the folders available. Tools and data downloaded for the tutorial are in the `tools` folder, and the `tutorials` directory contains the primary datasets as well as the results of our runs.

### **Note:** After opening this notebook, make sure that you have the **bash** kernel selected. 

In [1]:
cd
tree -dL 2 tutorials

# structure of software directory
tree -dL 2 /mnt/cidgoh-object-storage/seagull/jupyter-mdprieto

tutorials
|-- results
|   |-- annotation
|   |-- assembly_checkm
|   |-- assembly_quast
|   |-- contigs
|   |-- reads_qc
|   `-- snippy
`-- trimmed_reads

8 directories
/mnt/cidgoh-object-storage/seagull/jupyter-mdprieto
|-- baktadb-light
|   `-- amrfinderplus-db
|-- raw_reads
`-- reference_data

4 directories


For downstream assembly, we will use the curated reads in the `trimmed_reads` subdirectory (n=20 files containing paired reads for 10 isolates of _P. aeruginosa_).

In [2]:
# source PATH to use module function
source /cvmfs/soft.computecanada.ca/config/profile/bash.sh

In [3]:
cd
ls $HOME/tutorials/trimmed_reads

ERR10479510_R1.fastq.gz  ERR10479513_R2.fastq.gz  ERR10479517_R1.fastq.gz
ERR10479510_R2.fastq.gz  ERR10479514_R1.fastq.gz  ERR10479517_R2.fastq.gz
ERR10479511_R1.fastq.gz  ERR10479514_R2.fastq.gz  ERR10479518_R1.fastq.gz
ERR10479511_R2.fastq.gz  ERR10479515_R1.fastq.gz  ERR10479518_R2.fastq.gz
ERR10479512_R1.fastq.gz  ERR10479515_R2.fastq.gz  ERR10479519_R1.fastq.gz
ERR10479512_R2.fastq.gz  ERR10479516_R1.fastq.gz  ERR10479519_R2.fastq.gz
ERR10479513_R1.fastq.gz  ERR10479516_R2.fastq.gz
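
Before assembling, it can be worth confirming that every `R1` file has its `R2` mate, since shovill expects paired reads. The following is a minimal sketch; the function name `check_pairs` is hypothetical, and it assumes the `_R1.fastq.gz` / `_R2.fastq.gz` naming shown in the listing above.

```shell
# Report, for every *_R1.fastq.gz file in a directory, whether its R2 mate exists.
# Prints one tab-separated line per isolate and returns 1 if any mate is missing.
check_pairs() {
    local dir="$1" status=0 r1 r2
    for r1 in "$dir"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue                  # the glob matched nothing
        r2="${r1%_R1.fastq.gz}_R2.fastq.gz"       # swap the suffix to get the mate
        if [ -e "$r2" ]; then
            printf 'OK\t%s\n' "$(basename "$r1" _R1.fastq.gz)"
        else
            printf 'MISSING\t%s\n' "$(basename "$r2")"
            status=1
        fi
    done
    return "$status"
}
```

In this environment it could be called as `check_pairs "$HOME/tutorials/trimmed_reads"`.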


## De-novo assembly with Shovill

Shovill is a pipeline built around the assembler `SPAdes` that minimizes run time while maintaining the quality of results. It generates a draft genome using heuristic algorithms and does not require a reference genome to guide the process (hence _de novo_). See the GitHub repositories of [shovill](https://github.com/tseemann/shovill) and [SPAdes](https://github.com/ablab/spades) for more details.

Shovill is not available as a pre-installed module in **CC**, so we must use another strategy. The easiest one is to use a container. **Docker** is a popular container system, but Docker containers are not suitable for high performance computing (HPC) clusters like **Compute Canada** because they inherently run with root user (administrator) privileges. Thus, many HPC clusters allow the use of **Singularity** images as an alternative (for more information about containerization, see https://www.melbournebioinformatics.org.au/tutorials/tutorials/docker/docker/).

A useful repository of **Singularity** images is located at https://depot.galaxyproject.org/singularity/

<font color='darkred'>_**Notes for compute canada:**_ </font>  
- Singularity needs to be loaded into the system. On the CIDGOH servers, it is loaded by default. Run the following command to make singularity available in your Compute Canada session.

>    module load singularity

- We have already downloaded the **Shovill** singularity container. In CC, you may need to download it yourself. The following command pulls the container from the repository into your local directory.

>    singularity pull shovill_1.1.sif https://depot.galaxyproject.org/singularity/shovill%3A1.1.0--hdfd78af_1
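
The URL in the command above follows the depot's naming pattern `<tool>:<version>--<build>`, with the colon URL-encoded as `%3A`. As a rough sketch, the two helpers below build such a URL and skip the download when the image is already on disk. Both function names are hypothetical; only `singularity pull` itself comes from the notes above.

```shell
# Build the download URL for an image hosted in the Galaxy depot.
# Images are named "<tool>:<version>--<build>", with ":" encoded as %3A.
depot_url() {
    local tool="$1" version="$2" build="$3"
    printf 'https://depot.galaxyproject.org/singularity/%s%%3A%s--%s\n' \
        "$tool" "$version" "$build"
}

# Pull an image only when the target .sif file does not exist yet.
pull_if_missing() {
    local sif="$1" url="$2"
    if [ -e "$sif" ]; then
        echo "Image $sif already present, skipping download"
    else
        singularity pull "$sif" "$url"
    fi
}
```

For example, `pull_if_missing shovill_1.1.sif "$(depot_url shovill 1.1.0 hdfd78af_1)"` reproduces the pull command shown above.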


In [4]:
# executing shovill

singularity exec /mnt/cidgoh-object-storage/images/shovill_1.1.sif shovill --help

SYNOPSIS
  De novo assembly pipeline for Illumina paired reads
USAGE
  shovill [options] --outdir DIR --R1 R1.fq.gz --R2 R2.fq.gz
GENERAL
  --help          This help
  --version       Print version and exit
  --check         Check dependencies are installed
INPUT
  --R1 XXX        Read 1 FASTQ (default: '')
  --R2 XXX        Read 2 FASTQ (default: '')
  --depth N       Sub-sample --R1/--R2 to this depth. Disable with --depth 0 (default: 150)
  --gsize XXX     Estimated genome size eg. 3.2M <blank=AUTODETECT> (default: '')
OUTPUT
  --outdir XXX    Output folder (default: '')
  --force         Force overwite of existing output folder (default: OFF)
  --minlen N      Minimum contig length <0=AUTO> (default: 0)
  --mincov n.nn   Minimum contig coverage <0=AUTO> (default: 2)
  --namefmt XXX   Format of contig FASTA IDs in 'printf' style (default: 'contig%05d')
  --keepfiles     Keep intermediate files (default: OFF)
RESOURCES
  --tmpdir XXX    Fast temporary directory (default: '')
  --cpus

#### What does shovill do?

1. Subsamples the reads of every genome to a uniform coverage depth (how many times, on average, a region is covered by reads; see `--depth`)
2. Trims adapters and poor quality reads if necessary
3. Assembles using SPAdes
4. Polishes genomes (improves quality) and filters low quality contigs
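
The arithmetic behind step 1 can be sketched with `awk`. This only illustrates the calculation and is not shovill's own code; the function names and the example numbers (1.89 Gb of reads and a 6.3 Mb genome) are assumptions, while the target of 150 is the `--depth` default shown in the help text.

```shell
# Coverage depth = total sequenced bases / genome size.
estimate_depth() {
    awk -v bases="$1" -v gsize="$2" 'BEGIN { printf "%.1f\n", bases / gsize }'
}

# Fraction of reads to keep so that the depth drops to the target (never above 1).
subsample_fraction() {
    awk -v depth="$1" -v target="$2" \
        'BEGIN { f = target / depth; if (f > 1) f = 1; printf "%.3f\n", f }'
}

estimate_depth 1890000000 6300000     # prints 300.0
subsample_fraction 300 150            # prints 0.500
```

A sample that is already below the target depth gives a fraction of 1.000, meaning that all reads are kept.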


In [5]:
# define output directory for contigs
CONTIGS_DIR="$HOME/tutorials/results/contigs"

# define PATH to trimmed_reads
TRIMMED_READS="$HOME/tutorials/trimmed_reads"

To execute shovill, we run the command from the singularity container we just downloaded. Genome assembly is the most resource-intensive process in the pipeline, so it will probably take a while to run. As input, we will use our `trimmed_reads` files and, to keep the run time short, we will assemble only two isolates. The remaining assemblies are already available in the `tutorials/results` folder.

<font color='darkred'>_**Notes for compute canada:**_ </font>  
- Allocate sufficient memory, as the data for every genome must be kept in memory while it is assembled.
- Bioinformatic tools (including **SPAdes**) usually use multiple threads, so their performance improves with the number of available cores.
- In shovill, the `--ram` option sets the amount of RAM (in GB) that the pipeline should try to stay below.
    - SPAdes takes the RAM value from shovill as the total available memory, so it is better to set its limit manually with `--opts "-m XX"`.
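
One way to pick a value for `--ram` is to leave some headroom below the memory of the node. The sketch below assumes a Linux node with `/proc/meminfo`; the function names and the 80% default are illustrative assumptions, not a shovill or Compute Canada recommendation.

```shell
# Total memory of the current node in GB, read from /proc/meminfo (Linux only).
total_mem_gb() {
    awk '/^MemTotal:/ { printf "%.2f\n", $2 / 1024 / 1024 }' /proc/meminfo
}

# RAM budget in whole GB: a fraction (default 0.8) of the total, truncated.
ram_for_shovill() {
    local total_gb="$1" fraction="${2:-0.8}"
    awk -v t="$total_gb" -v f="$fraction" 'BEGIN { printf "%d\n", t * f }'
}

ram_for_shovill 176.90        # prints 141
```

On a node, `ram_for_shovill "$(total_mem_gb)"` gives a value that can be passed to `--ram`.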

In [6]:
# for loop to run a command for each sample

for READ1 in $(ls $TRIMMED_READS/*R1.fastq.gz | head -n 2)
do

    
    READ2=${READ1/_R1/_R2}                                                              # replace _R1 with _R2 to get the mate file
    PREFIX_ISOLATE=$(basename $READ1 _R1.fastq.gz)                                      # extract the isolate name from the file name
    echo "Started processing $PREFIX_ISOLATE"
    
    singularity exec /mnt/cidgoh-object-storage/images/shovill_1.1.sif shovill  \
        --R1 $READ1                                                                     `# specify paired read 1` \
        --R2 $READ2                                                                     `# specify paired read 2` \
        --outdir $CONTIGS_DIR                                                           `# define output directory` \
        --force                                                                         `# overwrite results if already available` \
        --ram 140                                                                       `# how much ram memory to use`
    
    mv "$CONTIGS_DIR/contigs.fa" $CONTIGS_DIR/${PREFIX_ISOLATE}_contigs.fa
    
    echo "Finished assembly of sample $PREFIX_ISOLATE"
    
done

Started processing ERR10479510
[shovill] Hello jupyter-mdprieto
[shovill] You ran: /usr/local/bin/shovill --R1 /home/jupyter-mdprieto/tutorials/trimmed_reads/ERR10479510_R1.fastq.gz --R2 /home/jupyter-mdprieto/tutorials/trimmed_reads/ERR10479510_R2.fastq.gz --outdir /home/jupyter-mdprieto/tutorials/results/contigs --force --ram 140
[shovill] This is shovill 1.1.0
[shovill] Written by Torsten Seemann
[shovill] Homepage is https://github.com/tseemann/shovill
[shovill] Operating system is linux
[shovill] Perl version is v5.26.2
[shovill] Machine has 16 CPU cores and 176.90 GB RAM
[shovill] Using bwa - /usr/local/bin/bwa | Version: 0.7.17-r1188
[shovill] Using flash - /usr/local/bin/flash | FLASH v1.2.11
[shovill] Using java - /usr/local/bin/java | openjdk version "11.0.1" 2018-10-16 LTS
[shovill] Using kmc - /usr/local/bin/kmc | K-Mer Counter (KMC) ver. 3.1.1 (2019-05-19)
[shovill] Using lighter - /usr/local/bin/lighter | Lighter v1.1.2
[shovill] Using megahit - /usr/local/bin/megahit | M

In this tutorial, we processed only two samples to optimize the runtime. The remaining assemblies can be found in the same directory for future steps.
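
The loop above derives the mate file and the isolate name from each `R1` path with two shell idioms: pattern substitution and `basename` with a suffix. They can be tried in isolation as below; the path is a made-up example.

```shell
# Hypothetical path to one R1 file
READ1="/path/to/trimmed_reads/ERR10479510_R1.fastq.gz"

# ${var/pattern/replacement} replaces the first match of _R1 with _R2
READ2="${READ1/_R1/_R2}"

# basename drops the directory and, optionally, a trailing suffix
PREFIX_ISOLATE="$(basename "$READ1" _R1.fastq.gz)"

echo "$READ2"             # /path/to/trimmed_reads/ERR10479510_R2.fastq.gz
echo "$PREFIX_ISOLATE"    # ERR10479510
```

Because the substitution acts on the first match in the whole path, a directory name containing `_R1` would be changed instead of the file name.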

The main outputs of the **Shovill** pipeline are the files ending in `contigs.fa`, which contain the assembled contigs in FASTA format. In this format, every contig has a header line followed by its sequence.

In [7]:
head "$CONTIGS_DIR/ERR10479510_contigs.fa"

>contig00001 len=531597 cov=40.5 corr=0 origname=NODE_1_length_531597_cov_40.531919_pilon sw=shovill-spades/1.1.0 date=20230301
CGGCGGCAGTTGGCGAAAGAAATCCCGCACCTGTGCCCGCTTGAGTTGGCGACGACATAC
CACATGCTCGTGACGATCAACGCCGTGCACCTGGAATACTTGTTTTGCCAGATCCAGACC
AATGCGACTAAGGTTCATGCTGACTCCCCCTCCGGGACTTGTGGCTGCACCATTAGTCTG
GCGCTTGACGCCGTAGGAGGGAGGAGTCCATTTCATTGCCCTACCCCAGCTCTCCATCGC
CGCCAATCTCCCGCATATCCCCGGAGTCCGCCATGTCCTCACCCCAACCGCCCCGCTTCG
ACGGCCAACGCTGGAGCAACGCCGACGACGACCGCATCGAGGTGCTGCCTGCCGACCCCG
CCTGGCCACAACACTTCGCCGCCGAAGCCGAGGCCATCCGCACGGCGCTGGCGCTGCCCG
GGCTGGGCATCGAGCATGTCGGCAGCACCGCGGTGCCCGGGCTCGACGCCAAGCCGATCA
TCGACATCCTCCTGCTGCCGCCGCCCGGCCACGATCCGCAGCGGCTGGTAGCCCCGCTGG
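
The header of every contig carries its length (`len=`) and coverage (`cov=`), so a per-contig table can be pulled out of the FASTA file with `awk`. This is a small sketch that assumes the header layout shown above; the function name `contig_summary` is hypothetical.

```shell
# Print "name <tab> length <tab> coverage" for every contig header in a FASTA file.
contig_summary() {
    awk '/^>/ {
        name = substr($1, 2); len = ""; cov = ""      # drop the leading ">"
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^len=/) len = substr($i, 5)     # text after "len="
            if ($i ~ /^cov=/) cov = substr($i, 5)     # text after "cov="
        }
        printf "%s\t%s\t%s\n", name, len, cov
    }' "$1"
}
```

For example, `contig_summary "$CONTIGS_DIR/ERR10479510_contigs.fa" | head` lists the first contigs of that assembly.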


As a refresher, this is the current structure of directories for our tutorial project:

In [8]:
cd ~
tree -dL 2 tutorials

tutorials
|-- results
|   |-- annotation
|   |-- assembly_checkm
|   |-- assembly_quast
|   |-- contigs
|   |-- reads_qc
|   `-- snippy
`-- trimmed_reads

8 directories


# Quality control of draft genomes

**Shovill** produces contigs (contiguous consensus sequences built from overlapping reads) for every isolate. However, we may have contaminated cultures growing other bacteria besides our organism of interest. Also, given the non-targeted approach used for sequencing, the reads from an isolate may have poor quality (low reliability in base calling or poor coverage of certain regions).

Thus, after producing draft genomes, we typically conduct additional checks to verify the quality of the resulting files and make sure that we do not have contamination in our samples. 

### General expectation

- The average contig size should be over 5000 base pairs, and the draft genome should contain fewer than 1000 contigs.
- The resulting assembly should cover at least 90% of the reference genome.
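
The first expectation can be checked directly on a FASTA file, before running any dedicated tool. The sketch below counts contigs and computes their mean length; the function name `assembly_basics` is hypothetical.

```shell
# Count contigs and compute total and mean contig length of a FASTA file.
assembly_basics() {
    awk '/^>/ { n++; next }                 # header line: one more contig
         { total += length($0) }            # sequence line: add its length
         END { printf "contigs=%d total_bp=%d mean_bp=%.0f\n",
                      n, total, (n ? total / n : 0) }' "$1"
}
```

Running it on each `*_contigs.fa` file shows whether the number of contigs stays below 1000 and the mean length above 5000 bp.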

## Quast

Quast [(github:quast)](https://github.com/ablab/quast) produces quantitative summaries of the contigs in every assembly. It can also use a reference genome to evaluate misassemblies, unaligned contigs, and read coverage against the reference genome.

**_Some metrics include:_** 
- Number of contigs and number of contigs > 500bp
- **N50** or the length at which the collection of all contigs of at least that length covers half of the assembly 
- **NG50** is similar to **N50**, but the contigs of at least that length must cover half of the reference genome rather than half of the assembly
- Number of misassemblies including inversions, relocations, and translocations
- Number and total length of unaligned contigs (against the reference genome)
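
To make the N50 and NG50 definitions concrete, they can be computed by hand from a list of contig lengths. This is a minimal sketch and not how Quast is implemented; the function names `contig_lengths` and `nx50` are hypothetical.

```shell
# Print the length of every contig in a FASTA file, one per line.
contig_lengths() {
    awk '/^>/ { if (seen) print n; seen = 1; n = 0; next }
         { n += length($0) }
         END { if (seen) print n }' "$1"
}

# Read lengths from stdin and print N50, or NG50 when a reference size is given.
nx50() {
    sort -rn | awk -v ref="${1:-0}" '
        { len[NR] = $1; total += $1 }
        END {
            half = (ref > 0 ? ref : total) / 2    # half of assembly or of reference
            for (i = 1; i <= NR; i++) {
                run += len[i]                     # add contigs from longest to shortest
                if (run >= half) { print len[i]; exit }
            }
        }'
}

printf '%s\n' 80 70 50 40 30 20 | nx50        # N50 of a 290 bp toy assembly: 70
printf '%s\n' 80 70 50 40 30 20 | nx50 400    # NG50 against a 400 bp reference: 50
```

For a real assembly, `contig_lengths file.fa | nx50` gives the N50. If the contigs never reach half of the reference size, `nx50` prints nothing.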

As the data we are analyzing in the tutorial comes from _P. aeruginosa_ isolates ([PMID:34412676](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8376114/) - [BioProject:PRJEB56397](https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJEB56397)), we will use the [PAO1 reference strain](https://www.ncbi.nlm.nih.gov/data-hub/genome/GCF_000006765.1/) for quality control. 

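A reference genome typically comes as two files: a FASTA sequence file (.fna or .fa) and an annotation file (.gff). We have them stored in the `reference_data` folder of the tools directory (`/mnt/cidgoh-object-storage/seagull/jupyter-mdprieto/reference_data/`).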

In [9]:
# load quast into our environment
module load StdEnv/2020 gcc/9.3.0 quast/5.0.2

# reference genomes are found in the tools directory
ls /mnt/cidgoh-object-storage/seagull/jupyter-mdprieto/reference_data/GCF*


Lmod is automatically replacing "intel/2020.1.217" with "gcc/9.3.0".


The following have been reloaded with a version change:
  1) StdEnv/2016.4 => StdEnv/2020           4) mii/1.1.1 => mii/1.1.2
  2) gcccore/.5.4.0 => gcccore/.9.3.0       5) openmpi/2.1.1 => openmpi/4.0.3
  3) imkl/11.3.4.258 => imkl/2020.1.217

/mnt/cidgoh-object-storage/seagull/jupyter-mdprieto/reference_data/GCF_000006765.1_ASM676v1_genomic.fna
/mnt/cidgoh-object-storage/seagull/jupyter-mdprieto/reference_data/GCF_000006765.1_ASM676v1_genomic.fna.gz
/mnt/cidgoh-object-storage/seagull/jupyter-mdprieto/reference_data/GCF_000006765.1_ASM676v1_genomic.gff.gz


In [10]:
# define shell variables for the input and output directories
CONTIGS_DIR="$HOME/tutorials/results/contigs"
RESULTS_QUAST="$HOME/tutorials/results/assembly_quast"

The main command `quast.py` produces several reports, in formats such as `.html`, `.tsv`, `.txt`, and `.tex`, containing the previously mentioned metrics.

They can be opened directly in Jupyter by clicking the file on the explorer or exported to your local computer for further visualization. 

In [11]:
quast.py $CONTIGS_DIR/*contigs.fa                                                                                       `# pattern for contig files produced by shovill` \
    -r /mnt/cidgoh-object-storage/seagull/jupyter-mdprieto/reference_data/GCF_000006765.1_ASM676v1_genomic.fna.gz       `# reference genome` \
    -g /mnt/cidgoh-object-storage/seagull/jupyter-mdprieto/reference_data/GCF_000006765.1_ASM676v1_genomic.gff.gz                                     `# reference genomic features positions` \
    -o $RESULTS_QUAST                                                                                                  `# output directory` \
    --threads 12

/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx512/Compiler/gcc9/quast/5.0.2/bin/quast.py /home/jupyter-mdprieto/tutorials/results/contigs/ERR10479510_contigs.fa /home/jupyter-mdprieto/tutorials/results/contigs/ERR10479511_contigs.fa /home/jupyter-mdprieto/tutorials/results/contigs/ERR10479512_contigs.fa /home/jupyter-mdprieto/tutorials/results/contigs/ERR10479513_contigs.fa /home/jupyter-mdprieto/tutorials/results/contigs/ERR10479514_contigs.fa /home/jupyter-mdprieto/tutorials/results/contigs/ERR10479515_contigs.fa /home/jupyter-mdprieto/tutorials/results/contigs/ERR10479516_contigs.fa /home/jupyter-mdprieto/tutorials/results/contigs/ERR10479517_contigs.fa /home/jupyter-mdprieto/tutorials/results/contigs/ERR10479518_contigs.fa /home/jupyter-mdprieto/tutorials/results/contigs/ERR10479519_contigs.fa -r /mnt/cidgoh-object-storage/seagull/jupyter-mdprieto/reference_data/GCF_000006765.1_ASM676v1_genomic.fna.gz -g /mnt/cidgoh-object-storage/seagull/jupyter-mdprieto/reference_data/G

We can preview metrics such as the N50 of the assemblies and the coverage of the reference genome using the commands below. 

In [12]:
# list contents of folder
ls $RESULTS_QUAST

# overview of N50 and coverage
echo
cat $RESULTS_QUAST/report.tsv                      `# print the file` | \
cut -f 1-5                                         `# cut the columns 1 to 5 (separated by tab)` | \
grep -E 'Assembly|N50|fraction'                    `# select lines matching the pattern Assembly OR N50 OR fraction` | \
column -ts $'\t'                                   `# print table separating at tabs ($'\t')`

[0m[01;34maligned_stats[0m    icarus.html            report.html  transposed_report.tex
[01;34mbasic_stats[0m      [01;34micarus_viewers[0m         report.tex   transposed_report.tsv
[01;34mcontigs_reports[0m  quast.log              report.tsv   transposed_report.txt
[01;34mgenome_stats[0m     [01;34mquast_corrected_input[0m  report.txt

Assembly             ERR10479510__contigs  ERR10479510_contigs  ERR10479511__contigs  ERR10479511_contigs
N50                  221997                221997               184817                184817
Genome fraction (%)  95.402                95.402               95.397                95.397
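
The same report can be screened automatically against the 90% coverage expectation stated earlier. The sketch below assumes the `report.tsv` layout shown above (one row per metric, one column per assembly); the function name `low_fraction` is hypothetical.

```shell
# List the assemblies whose "Genome fraction (%)" is below a minimum value.
# Usage: low_fraction report.tsv 90
low_fraction() {
    awk -F '\t' -v min="$2" '
        $1 == "Assembly"            { for (i = 2; i <= NF; i++) name[i] = $i }
        $1 == "Genome fraction (%)" { for (i = 2; i <= NF; i++)
                                          if ($i + 0 < min) print name[i] "\t" $i }
    ' "$1"
}
```

With `low_fraction "$RESULTS_QUAST/report.tsv" 90`, an empty output means that every assembly passes the threshold.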


## CheckM


**CheckM** infers the quality of a genome assembly from the presence and uniqueness of sets of marker genes that are specific to a species or taxon. From these, it estimates the completeness (the share of the expected marker genes that are found) and the contamination of the input draft genomes.

**CheckM** is not available as a module in the CC cluster, so we use a singularity container with version 1.2.2.


In [13]:
# define shell variables for the input and output directories of checkm
CONTIGS_DIR="$HOME/tutorials/results/contigs/"
RESULTS_CHECKM="$HOME/tutorials/results/assembly_checkm/"

The first step is to create a marker file with the genomic markers specific to a species, genus, or other taxon using:

`checkm taxon_set <species/genus/taxon> <taxon_name> <marker_file>`

Also, some of the data needed for the tutorials is saved in a shared directory so that everyone can access it, and singularity may not be able to see that drive by default.
Thus, we run `export SINGULARITY_BIND="/mnt,/etc"` so that singularity can access these additional locations.

In [17]:
# add drives to singularity
export SINGULARITY_BIND="/mnt,/etc"

# command
singularity exec /mnt/cidgoh-object-storage/images/checkm_1.2.2.sif checkm \
    taxon_set species 'Pseudomonas aeruginosa' /mnt/cidgoh-object-storage/seagull/jupyter-mdprieto/reference_data/pseudomonas.ms

[2023-05-09 20:29:10] INFO: CheckM v1.2.2
[2023-05-09 20:29:10] INFO: checkm taxon_set species Pseudomonas aeruginosa /mnt/cidgoh-object-storage/seagull/jupyter-mdprieto/reference_data/pseudomonas.ms
[2023-05-09 20:29:10] INFO: CheckM data: /usr/local/checkm_data
[2023-05-09 20:29:10] INFO: [CheckM - taxon_set] Generate taxonomic-specific marker set.
[2023-05-09 20:29:15] INFO: Marker set for Pseudomonas aeruginosa contains 1617 marker genes arranged in 469 sets.
[2023-05-09 20:29:15] INFO: Marker set inferred from 19 reference genomes.
[2023-05-09 20:29:15] INFO: Marker set for Pseudomonas contains 833 marker genes arranged in 312 sets.
[2023-05-09 20:29:15] INFO: Marker set inferred from 182 reference genomes.
[2023-05-09 20:29:15] INFO: Marker set for Pseudomonadaceae contains 800 marker genes arranged in 302 sets.
[2023-05-09 20:29:15] INFO: Marker set inferred from 186 reference genomes.
[2023-05-09 20:29:15] INFO: Marker set for Pseudomonadales contains 549 marker genes arranged 

With the reference markers file created in our tools directory, we perform two additional steps:
1. Using `checkm analyze`, we identify which of the marker genes specific to the taxon of interest are present in every assembly. For the samples used in the tutorial (n = 10), this should take around 4 min with 10 threads.

In [20]:
# analyze presence of markers (4 min approx.)
singularity exec /mnt/cidgoh-object-storage/images/checkm_1.2.2.sif checkm analyze \
    /mnt/cidgoh-object-storage/seagull/jupyter-mdprieto/reference_data/pseudomonas.ms           `#file with checkm marker set for assemblies` \
    $CONTIGS_DIR                                                                                `#dir with assemblies in fasta format` \
    $RESULTS_CHECKM                                                                             `#output directory` \
    -x fa                                                                                       `#extension of assemblies` \
    -t 10 


[2023-05-09 20:30:15] INFO: CheckM v1.2.2
[2023-05-09 20:30:15] INFO: checkm analyze /mnt/cidgoh-object-storage/seagull/jupyter-mdprieto/reference_data/pseudomonas.ms /home/jupyter-mdprieto/tutorials/results/contigs/ /home/jupyter-mdprieto/tutorials/results/assembly_checkm/ -x fa -t 10
[2023-05-09 20:30:15] INFO: CheckM data: /usr/local/checkm_data
[2023-05-09 20:30:15] INFO: [CheckM - analyze] Identifying marker genes in bins.
[2023-05-09 20:30:16] INFO: Identifying marker genes in 10 bins with 10 threads:
    Finished processing 10 of 10 (100.00%) bins.
[2023-05-09 20:34:17] INFO: Saving HMM info to file.
[2023-05-09 20:34:18] INFO: { Current stage: 0:04:02.851 || Total: 0:04:02.851 }
[2023-05-09 20:34:18] INFO: Parsing HMM hits to marker genes:
    Finished parsing hits for 10 of 10 (100.00%) bins.
[2023-05-09 20:34:25] INFO: Aligning marker genes with multiple hits in a single bin:
    Finished processing 10 of 10 (100.00%) bins.
[2023-05-09 20:34:26] INFO: { Current stage: 0:00:08

2. Then, with `checkm qa`, we produce a summary of completeness and contamination. This step should not take long.

In [21]:
# produce table of completeness and contamination
singularity exec /mnt/cidgoh-object-storage/images/checkm_1.2.2.sif checkm qa                                                                 \
        /mnt/cidgoh-object-storage/seagull/jupyter-mdprieto/reference_data/pseudomonas.ms       `#file with checkm marker set for assemblies` \
        $RESULTS_CHECKM                                                                        `#directory with the output of checkm analyze` \
        --file $RESULTS_CHECKM/checkm_output.tsv                                                                    \
        --tab_table                                                                             `# print tabular output` \
        --threads 10                                                                            `# number of simultaneous threads for process` \
        --out_format 1                                                                          `# format of output 1 = summary, 2 = extended`

[2023-05-09 20:34:39] INFO: CheckM v1.2.2
[2023-05-09 20:34:39] INFO: checkm qa /mnt/cidgoh-object-storage/seagull/jupyter-mdprieto/reference_data/pseudomonas.ms /home/jupyter-mdprieto/tutorials/results/assembly_checkm/ --file /home/jupyter-mdprieto/tutorials/results/assembly_checkm//checkm_output.tsv --tab_table --threads 10 --out_format 1
[2023-05-09 20:34:39] INFO: CheckM data: /usr/local/checkm_data
[2023-05-09 20:34:39] INFO: [CheckM - qa] Tabulating genome statistics.
[2023-05-09 20:34:39] INFO: Calculating AAI between multi-copy marker genes.
[2023-05-09 20:34:39] INFO: Reading HMM info from file.
[2023-05-09 20:34:39] INFO: Parsing HMM hits to marker genes:
    Finished parsing hits for 10 of 10 (100.00%) bins.
[2023-05-09 20:34:46] INFO: QA information written to: /home/jupyter-mdprieto/tutorials/results/assembly_checkm//checkm_output.tsv
[2023-05-09 20:34:46] INFO: { Current stage: 0:00:07.534 || Total: 0:00:07.534 }


Now, we can review the output of results for contamination using **CheckM**. 

- *Completeness* measures how much of the marker gene set expected for a species is found in a given assembly.
- *Contamination* shows the presence of multi-copy marker genes in the genome assembly.
- *Strain heterogeneity* is determined by the number of multi-copy marker genes that have an amino acid identity >= 90%.

A high heterogeneity suggests that most of the contamination comes from closely related organisms, whereas a low value suggests that it comes from phylogenetically distinct sources.
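
Once `checkm_output.tsv` exists, these columns can be screened with a short `awk` filter. The sketch below looks the columns up by their header names; the function name `checkm_flag` and the default thresholds (completeness of at least 95 and contamination of at most 5) are illustrative assumptions, not values prescribed by CheckM or by this tutorial.

```shell
# Flag every bin as PASS or REVIEW from a tab-separated checkm qa table.
# Usage: checkm_flag table.tsv [min_completeness] [max_contamination]
checkm_flag() {
    local table="$1" min_comp="${2:-95}" max_cont="${3:-5}"
    awk -F '\t' -v min_comp="$min_comp" -v max_cont="$max_cont" '
        NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }   # map header names
        {
            comp = $(col["Completeness"]) + 0
            cont = $(col["Contamination"]) + 0
            status = (comp >= min_comp + 0 && cont <= max_cont + 0) ? "PASS" : "REVIEW"
            printf "%s\t%s\t%s\t%s\n", $1, comp, cont, status
        }' "$table"
}
```

For example, `checkm_flag "$RESULTS_CHECKM/checkm_output.tsv" 95 2` marks for review any assembly with more than 2% contamination.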

In [23]:
cat $RESULTS_CHECKM/checkm_output.tsv  

Bin Id	Marker lineage	# genomes	# markers	# marker sets	0	1	2	3	4	5+	Completeness	Contamination	Strain heterogeneity
ERR10479510_contigs	Pseudomonas aeruginosa (6)	19	1617	469	3	1601	13	0	0	0	99.86	1.56	7.69
ERR10479511_contigs	Pseudomonas aeruginosa (6)	19	1617	469	4	1600	13	0	0	0	99.64	1.56	7.69
ERR10479512_contigs	Pseudomonas aeruginosa (6)	19	1617	469	3	1601	13	0	0	0	99.86	1.56	7.69
ERR10479513_contigs	Pseudomonas aeruginosa (6)	19	1617	469	3	1602	12	0	0	0	99.86	1.45	8.33
ERR10479514_contigs	Pseudomonas aeruginosa (6)	19	1617	469	3	1601	13	0	0	0	99.86	1.56	7.69
ERR10479515_contigs	Pseudomonas aeruginosa (6)	19	1617	469	3	1601	13	0	0	0	99.86	1.56	7.69
ERR10479516_contigs	Pseudomonas aeruginosa (6)	19	1617	469	3	1601	13	0	0	0	99.86	1.56	7.69
ERR10479517_contigs	Pseudomonas aeruginosa (6)	19	1617	469	3	1601	13	0	0	0	99.86	1.56	7.69
ERR10479518_contigs	Pseudomonas aeruginosa (6)	19	1617	469	3	1599	14	1	0	0	99.86	2.03	5.88
ERR10479519_contigs	Pseudomonas aeruginosa (6)	19	1617	469	3	160