By: Rolando Perez

Date: 01/30/21

Purpose: To improve the genome assembly of 10597-SS1 via various methods, such as hybrid de novo assembly, polishing and assembly reconciliation.

License: THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Preprocessing according to https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/data-preprocessing/

Uses HASLR, Flye, LR_Gapcloser, and SPAdes (hybrid) for assemblies, then merges the assemblies with the JGI assembly via quickmerge. The quality of the final merged assembly is checked with QUAST and BUSCO.

In [None]:
#Download miniconda for managing packages.
!wget https://repo.anaconda.com/miniconda/Miniconda3-py37_4.8.2-Linux-x86_64.sh
!chmod +x Miniconda3-py37_4.8.2-Linux-x86_64.sh
!bash ./Miniconda3-py37_4.8.2-Linux-x86_64.sh -b -f -p /usr/local
!conda init bash

--2021-02-18 23:09:26--  https://repo.anaconda.com/miniconda/Miniconda3-py37_4.8.2-Linux-x86_64.sh
Resolving repo.anaconda.com (repo.anaconda.com)... 104.16.130.3, 104.16.131.3, 2606:4700::6810:8303, ...
Connecting to repo.anaconda.com (repo.anaconda.com)|104.16.130.3|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 85055499 (81M) [application/x-sh]
Saving to: ‘Miniconda3-py37_4.8.2-Linux-x86_64.sh’


2021-02-18 23:09:27 (75.0 MB/s) - ‘Miniconda3-py37_4.8.2-Linux-x86_64.sh’ saved [85055499/85055499]

PREFIX=/usr/local
Unpacking payload ...
Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /usr/local

  added / updated specs:
    - _libgcc_mutex==0.1=main
    - asn1crypto==1.3.0=py37_0
    - ca-certificates==2020.1.1=0
    - certifi==2019.11.28=py37_0
    - cffi==1.14.0=py37h2e261b9_0
    - chardet==3.0.4=py37_1003
    - conda-package-handling==1.6.0=py37h7b6447c_0
   

In [None]:
#Mount Google Drive to access nanopore reads and other project-specific data.
import sys
sys.path.append('/usr/local/lib/python3.7/site-packages/')
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
#Install appropriate packages
!conda config --add channels defaults
!conda config --add channels bioconda
!conda config --add channels conda-forge
!conda install -y haslr flye fastp quickmerge pilon samtools bwa-mem2 quast bbmap
!conda install -y -c bioconda -c conda-forge busco=5.0.0

Collecting package metadata (current_repodata.json): done
Solving environment: - \ | /

In [None]:
#Install SPAdes and LR_Gapcloser.
#SPAdes manual (3.14.1 release; version 3.15.0 is downloaded below): http://cab.spbu.ru/files/release3.14.1/manual.html
!wget http://cab.spbu.ru/files/release3.15.0/SPAdes-3.15.0-Linux.tar.gz
!tar -xzf SPAdes-3.15.0-Linux.tar.gz
#!cd SPAdes-3.15.0-Linux/bin/
#https://github.com/CAFS-bioinformatics/LR_Gapcloser
!git clone https://github.com/CAFS-bioinformatics/LR_Gapcloser.git

--2021-02-16 19:58:45--  http://cab.spbu.ru/files/release3.15.0/SPAdes-3.15.0-Linux.tar.gz
Resolving cab.spbu.ru (cab.spbu.ru)... 195.70.219.98
Connecting to cab.spbu.ru (cab.spbu.ru)|195.70.219.98|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 30408166 (29M) [application/octet-stream]
Saving to: ‘SPAdes-3.15.0-Linux.tar.gz’


2021-02-16 19:58:48 (12.6 MB/s) - ‘SPAdes-3.15.0-Linux.tar.gz’ saved [30408166/30408166]

Cloning into 'LR_Gapcloser'...
remote: Enumerating objects: 76, done.
remote: Total 76 (delta 0), reused 0 (delta 0), pack-reused 76
Unpacking objects: 100% (76/76), done.


In [None]:
#Export conda environment 
!conda env export --name root > environment.yml

In [None]:
#Obtain Illumina reads for G. lucidum 10597-SS1 from SRA
!prefetch SRR067807 SRR3927439 SRR3927440

In [None]:
#Extract files to fastq.gz format
!parallel-fastq-dump --sra-id /content/drive/MyDrive/genomictest/SRR3927439.1 /content/drive/MyDrive/genomictest/SRR3927440.1 /content/drive/MyDrive/genomictest/SRR067807.1 --threads 4 --outdir /content/ --gzip

SRR ids: ['/content/drive/MyDrive/genomictest/SRR3927440.1', '/content/drive/MyDrive/genomictest/SRR067807.1']
extra args: ['--gzip']
tempdir: /tmp/pfd_pjb1unuy
/content/drive/MyDrive/genomictest/SRR3927440.1 spots: 33811998
blocks: [[1, 8452999], [8453000, 16905998], [16905999, 25358997], [25358998, 33811998]]
Read 8452999 spots for /content/drive/MyDrive/genomictest/SRR3927440.1
Written 8452999 spots for /content/drive/MyDrive/genomictest/SRR3927440.1
Read 8453001 spots for /content/drive/MyDrive/genomictest/SRR3927440.1
Written 8453001 spots for /content/drive/MyDrive/genomictest/SRR3927440.1
Read 8452999 spots for /content/drive/MyDrive/genomictest/SRR3927440.1
Written 8452999 spots for /content/drive/MyDrive/genomictest/SRR3927440.1
Read 8452999 spots for /content/drive/MyDrive/genomictest/SRR3927440.1
Written 8452999 spots for /content/drive/MyDrive/genomictest/SRR3927440.1
tempdir: /tmp/pfd_f396jtz2
/content/drive/MyDrive/genomictest/SRR067807.1 spots: 51719105
blocks: [[1, 1292

In [None]:
#Concatenate short-reads files
!cat /content/drive/MyDrive/genomictest/SRR3927440.1.fastq.gz /content/drive/MyDrive/genomictest/SRR3927439.fastq.gz /content/drive/MyDrive/genomictest/SRR067807.1.fastq.gz > illumina.fastq.gz 


^C


In [None]:
#Pre-process short-reads to remove sequencing adapters.
!bbduk.sh in=/content/illumina.fastq.gz out=illumina_clean.fastq.gz ref=/usr/local/pkgs/bbmap-38.90-h1296035_0/opt/bbmap-38.90-0/resources/adapters.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo qtrim=r trimq=15

java -ea -Xmx3696m -Xms3696m -cp /usr/local/opt/bbmap-38.90-0/current/ jgi.BBDuk in=/content/illumina.fastq.gz out=illumina_clean.fastq.gz ref=/usr/local/pkgs/bbmap-38.90-h1296035_0/opt/bbmap-38.90-0/resources/adapters.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo qtrim=r trimq=15
Executing jgi.BBDuk [in=/content/illumina.fastq.gz, out=illumina_clean.fastq.gz, ref=/usr/local/pkgs/bbmap-38.90-h1296035_0/opt/bbmap-38.90-0/resources/adapters.fa, ktrim=r, k=23, mink=11, hdist=1, tpe, tbo, qtrim=r, trimq=15]
Version 38.90

maskMiddle was disabled because useShortKmers=true
0.047 seconds.
Initial:
Memory: max=3875m, total=3875m, free=3846m, used=29m

Added 217135 kmers; time: 	0.213 seconds.
Memory: max=3875m, total=3875m, free=3841m, used=34m

Input is being processed as unpaired
Started output streams:	0.068 seconds.
Processing time:   		1647.263 seconds.

Input:                  	118993759 reads 		18087051368 bases.
QTrimmed:               	29472539 reads (24.77%) 	1107050494 bases (6.12%)
KTri

In [None]:
#Filter short-reads to remove reads aligning to the sequencing spike-in control phiX174.
!bbduk.sh in=/content/illumina_clean.fastq.gz out=unmatched.fq outm=matched.fq ref=/usr/local/pkgs/bbmap-38.90-h1296035_0/opt/bbmap-38.90-0/resources/phix174_ill.ref.fa.gz k=31 hdist=1 stats=stats.txt

java -ea -Xmx3711m -Xms3711m -cp /usr/local/opt/bbmap-38.90-0/current/ jgi.BBDuk in=/content/illumina_clean.fastq.gz out=unmatched.fq outm=matched.fq ref=/usr/local/pkgs/bbmap-38.90-h1296035_0/opt/bbmap-38.90-0/resources/phix174_ill.ref.fa.gz k=31 hdist=1 stats=stats.txt
Executing jgi.BBDuk [in=/content/illumina_clean.fastq.gz, out=unmatched.fq, outm=matched.fq, ref=/usr/local/pkgs/bbmap-38.90-h1296035_0/opt/bbmap-38.90-0/resources/phix174_ill.ref.fa.gz, k=31, hdist=1, stats=stats.txt]
Version 38.90

0.034 seconds.
Initial:
Memory: max=3892m, total=3892m, free=3863m, used=29m

Added 487396 kmers; time: 	0.206 seconds.
Memory: max=3892m, total=3892m, free=3859m, used=33m

Input is being processed as unpaired
Started output streams:	0.035 seconds.
Processing time:   		956.147 seconds.

Input:                  	118595165 reads 		16918153166 bases.
Contaminants:           	0 reads (0.00%) 	0 bases (0.00%)
Total Removed:          	0 reads (0.00%) 	0 bases (0.00%)
Result:                 	11

In [None]:
#Filter short-reads to remove reads aligning to published G. lucidum G.260125-1 mtDNA genome.
!bbduk.sh in=/content/unmatched.fq out=unmatched1.fq outm=matched1.fq ref=/content/drive/MyDrive/genomictest/Gluc_mtDNA.fasta k=31 hdist=1 stats=stats.txt

java -ea -Xmx3690m -Xms3690m -cp /usr/local/opt/bbmap-38.90-0/current/ jgi.BBDuk in=/content/unmatched.fq out=unmatched1.fq outm=matched1.fq ref=/content/drive/MyDrive/genomictest/Gluc_mtDNA.fasta k=31 hdist=1 stats=stats.txt
Executing jgi.BBDuk [in=/content/unmatched.fq, out=unmatched1.fq, outm=matched1.fq, ref=/content/drive/MyDrive/genomictest/Gluc_mtDNA.fasta, k=31, hdist=1, stats=stats.txt]
Version 38.90

0.034 seconds.
Initial:
Memory: max=3869m, total=3869m, free=3840m, used=29m

Added 5478645 kmers; time: 	1.825 seconds.
Memory: max=3869m, total=3869m, free=3629m, used=240m

Input is being processed as unpaired
Started output streams:	0.018 seconds.
Processing time:   		1370.950 seconds.

Input:                  	118595165 reads 		16918153166 bases.
Contaminants:           	7725405 reads (6.51%) 	1153452658 bases (6.82%)
Total Removed:          	7725405 reads (6.51%) 	1153452658 bases (6.82%)
Result:                 	110869760 reads (93.49%) 	15764700508 bases (93.18%)

Time:  
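
The BBDuk summaries above can be checked arithmetically: the reads and bases removed should equal the input minus the result. The following is a minimal sketch, assuming the summary rows keep the layout shown in the logs; `parse_bbduk_summary` is a hypothetical helper written for this notebook, not part of BBTools.

```python
import re

# Matches BBDuk summary rows such as
# "Contaminants:   7725405 reads (6.51%)   1153452658 bases (6.82%)".
SUMMARY_ROW = re.compile(
    r"^(?P<label>[A-Za-z ]+):\s+(?P<reads>\d+) reads(?: \([\d.]+%\))?\s+(?P<bases>\d+) bases"
)


def parse_bbduk_summary(log_text):
    """Return {label: (reads, bases)} for every summary row found in a BBDuk log."""
    rows = {}
    for line in log_text.splitlines():
        match = SUMMARY_ROW.match(line.strip())
        if match:
            rows[match["label"].strip()] = (int(match["reads"]), int(match["bases"]))
    return rows


# Rows copied from the mtDNA filtering log above.
log = """\
Input:                   118595165 reads   16918153166 bases.
Contaminants:            7725405 reads (6.51%)   1153452658 bases (6.82%)
Total Removed:           7725405 reads (6.51%)   1153452658 bases (6.82%)
Result:                  110869760 reads (93.49%)   15764700508 bases (93.18%)
"""

rows = parse_bbduk_summary(log)
reads_in, bases_in = rows["Input"]
reads_removed, bases_removed = rows["Total Removed"]
reads_kept, bases_kept = rows["Result"]

# Input minus removed should reproduce the reported result.
consistent = (reads_in - reads_removed == reads_kept) and (bases_in - bases_removed == bases_kept)
percent_reads_kept = round(100 * reads_kept / reads_in, 2)
print(consistent, percent_reads_kept)
```

The same function could be pointed at the adapter-trimming and phiX logs, provided their complete output is available.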

In [None]:
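#Filter raw nanopore long-reads to remove reads that map to the published G. lucidum G.260125-1 mtDNA genome.
#samtools fastq -f 4 keeps only the unmapped reads. Its output is uncompressed FASTQ, so the file below is plain text despite the .gz extension.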
!minimap2 -ax map-ont /content/drive/MyDrive/genomictest/Gluc_mtDNA.fasta /content/drive/MyDrive/genomictest/gluc_np_rawreads.fastq.gz | samtools fastq -n -f 4 - > gluc_np_clean.fastq.gz

[M::mm_idx_gen::0.018*0.58] collected minimizers
[M::mm_idx_gen::0.021*0.76] sorted minimizers
[M::main::0.021*0.76] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.021*0.77] mid_occ = 3
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.022*0.77] distinct minimizers: 11180 (98.94% are singletons); average occurrences: 1.011; average spacing: 5.365
[M::worker_pipeline::42.440*1.60] mapped 90178 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 -ax map-ont /content/drive/MyDrive/genomictest/Gluc_mtDNA.fasta /content/drive/MyDrive/genomictest/gluc_np_rawreads.fastq.gz
[M::main] Real time: 42.450 sec; CPU: 67.948 sec; Peak RSS: 1.154 GB
[M::bam2fq_mainloop] discarded 0 singletons
[M::bam2fq_mainloop] processed 86385 reads
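
The `-f 4` filter above (and the `-F 4` filter used later for the short-reads) acts on the SAM FLAG bit field, where bit `0x4` marks an unmapped read. The sketch below illustrates how the two options select records; the helper names are hypothetical and only mimic the documented behaviour of `samtools view -f/-F`. Read this way, the logs suggest that about 90178 - 86385 = 3793 long-reads aligned to the mtDNA reference and were dropped, assuming the minimap2 count refers to input reads.

```python
# Bit values defined by the SAM specification for the FLAG field.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "read1",
    0x80: "read2",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """List the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


def kept_by_f(flag, required):
    """Mimic `-f`: keep a record only if ALL required bits are set."""
    return (flag & required) == required


def kept_by_F(flag, excluded):
    """Mimic `-F`: keep a record only if NONE of the excluded bits are set."""
    return (flag & excluded) == 0


# An unmapped read (flag 4) passes -f 4, so it stays in the "clean" long-read set.
print(decode_flag(4), kept_by_f(4, 4), kept_by_F(4, 4))
# A reverse-strand mapped read (flag 16) fails -f 4 but passes -F 4.
print(decode_flag(16), kept_by_f(16, 4), kept_by_F(16, 4))
```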


In [None]:
#Use LR_Gapcloser to attempt to close gaps in MycoCosm G. lucidum 10597-SS1 unmasked assembly using pre-processed nanopore long-reads.
!chmod 755 /content/LR_Gapcloser/src/LR_Gapcloser.sh
!/content/LR_Gapcloser/src/LR_Gapcloser.sh -i /content/drive/MyDrive/genomictest/Gansp1_JGI_AssemblyScaffolds.fasta -l /content/drive/MyDrive/genomictest/gluc_np_clean.fasta -s n -t 4 -o /content/drive/MyDrive/genomictest/assemblies/second/gl_jgi_gapclose
#One-liner for converting the long-read FASTQ to FASTA (kept for reference):
#!sed -n '1~4s/^@/>/p;2~4p' /content/drive/MyDrive/genomictest/gluc_np_clean.fastq.gz > gluc_np_cleanreads.fasta

-i(scaffolds)=/content/drive/MyDrive/genomictest/Gansp1_JGI_AssemblyScaffolds.fasta -l(longread)=/content/drive/MyDrive/genomictest/gluc_np_clean.fasta -s(platform)=n -t(thread)=4 -c(coverage)=0.8 -a(tolerance)=0.2 -m(max_distance)=600 -n(number)=5 -g(taglen)=300 -v(overstep)=300 -o(output)=/content/drive/MyDrive/genomictest/assemblies/second/gl_jgi_gapclose
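
LR_Gapcloser is given the long-reads as FASTA, while the filtering step produced FASTQ. A Python equivalent of the commented `sed` conversion is sketched below. It is an illustration, not the conversion actually used for this run: it assumes strict four-line FASTQ records and works on both plain and gzip-compressed input. The function names and demo reads are made up.

```python
import gzip
import os
import tempfile


def open_text(path):
    """Open plain or gzip-compressed text, detected from the gzip magic bytes."""
    with open(path, "rb") as handle:
        is_gzip = handle.read(2) == b"\x1f\x8b"
    return gzip.open(path, "rt") if is_gzip else open(path, "rt")


def fastq_to_fasta(fastq_path, fasta_path):
    """Copy header and sequence lines of a 4-line-per-record FASTQ to FASTA.

    Returns the number of records written.
    """
    count = 0
    with open_text(fastq_path) as src, open(fasta_path, "w") as dst:
        for index, line in enumerate(src):
            position = index % 4
            if position == 0:
                if not line.startswith("@"):
                    raise ValueError(f"line {index + 1}: expected a header starting with '@'")
                dst.write(">" + line[1:])
                count += 1
            elif position == 1:
                dst.write(line)
            # Positions 2 ('+') and 3 (qualities) are dropped.
    return count


# Small self-check on a two-record gzip-compressed FASTQ (made-up reads).
with tempfile.TemporaryDirectory() as workdir:
    fastq = os.path.join(workdir, "demo.fastq.gz")
    fasta = os.path.join(workdir, "demo.fasta")
    with gzip.open(fastq, "wt") as handle:
        handle.write("@read1 ch=5\nACGT\n+\n!!!!\n@read2\nGGCC\n+\n####\n")
    n_records = fastq_to_fasta(fastq, fasta)
    with open(fasta) as handle:
        fasta_text = handle.read()

print(n_records)
print(fasta_text, end="")
```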


In [None]:
#Create index of MycoCosm assembly to prepare for short-read mapping and filtering.
!/usr/local/pkgs/bwa-mem2-2.1-he513fc3_0/bin/bwa-mem2 index /content/drive/MyDrive/genomictest/Gansp1_JGI_AssemblyScaffolds.fasta

Looking to launch executable "/usr/local/pkgs/bwa-mem2-2.1-he513fc3_0/bin/bwa-mem2.avx2", simd = .avx2
Launching executable "/usr/local/pkgs/bwa-mem2-2.1-he513fc3_0/bin/bwa-mem2.avx2"
[bwa_index] Pack FASTA... 0.38 sec
* Entering FMI_search
init ticks = 4115952102
ref seq len = 79045146
binary seq ticks = 3157557696
build suffix-array ticks = 105625134510
ref_seq_len = 79045146
count = 0, 17552798, 39522573, 61492348, 79045146
BWT[74012855] = 4
CP_SHIFT = 6, CP_MASK = 63
sizeof CP_OCC = 64
pos: 9880644, ref_seq_len__: 9880643
max_occ_ind = 1235080
build fm-index ticks = 12815381845
Total time taken: 55.8532


In [None]:
#gzip filtered short-reads for faster processing with bwa-mem2.
!gzip /content/unmatched1.fq

In [None]:
#Map reads to MycoCosm assembly.
!/usr/local/pkgs/bwa-mem2-2.1-he513fc3_0/bin/bwa-mem2 mem -t 4 /content/drive/MyDrive/genomictest/Gansp1_JGI_AssemblyScaffolds.fasta /content/unmatched1.fq.gz > out_jgi.sam
#Convert sam file to bam file.
!samtools view -b /content/out_jgi.sam > output_jgi.bam
#Sort bam file
!samtools sort -o sorted_jgi.bam /content/output_jgi.bam 
#Filter bam file for only mapped reads.
!samtools view -h -b -F 4 /content/sorted_jgi.bam > gl_jgi_mapped.bam

Looking to launch executable "/usr/local/pkgs/bwa-mem2-2.1-he513fc3_0/bin/bwa-mem2.avx2", simd = .avx2
Launching executable "/usr/local/pkgs/bwa-mem2-2.1-he513fc3_0/bin/bwa-mem2.avx2"
-----------------------------
Executing in AVX2 mode!!
-----------------------------
* SA compression enabled with xfactor: 8
* Ref file: /content/drive/MyDrive/genomictest/Gansp1_JGI_AssemblyScaffolds.fasta
* Entering FMI_search
* Index file found. Loading index from /content/drive/MyDrive/genomictest/Gansp1_JGI_AssemblyScaffolds.fasta.bwt.2bit.64
* Reference seq len for bi-index = 79045147
* sentinel-index: 74012855
* Count:
0,	1
1,	17552799
2,	39522574
3,	61492349
4,	79045147

* Reading other elements of the index from files /content/drive/MyDrive/genomictest/Gansp1_JGI_AssemblyScaffolds.fasta
* Index prefix: /content/drive/MyDrive/genomictest/Gansp1_JGI_AssemblyScaffolds.fasta
* Read 0 ALT contigs
* Done reading Index!!
* Reading reference genome..
* Binary seq file = /content/drive/MyDrive/genomictes

In [None]:
#Convert mapped bam file to fastq and gzip for processing with bbnorm.
!samtools fastq /content/drive/MyDrive/genomictest/gl_jgi_mapped.bam > gl_jgi_mapped.fastq
!gzip /content/gl_jgi_mapped.fastq

[M::bam2fq_mainloop] discarded 0 singletons
[M::bam2fq_mainloop] processed 96120439 reads


In [None]:
#Normalize mapped reads.
!bbnorm.sh in=/content/gl_jgi_mapped.fastq.gz out=normalized_mapped.fq target=100 min=5

java -ea -Xmx7427m -Xms7427m -cp /usr/local/opt/bbmap-38.90-0/current/ jgi.KmerNormalize bits=32 in=/content/gl_jgi_mapped.fastq.gz out=normalized_mapped.fq target=100 min=5
Executing jgi.KmerNormalize [bits=32, in=/content/gl_jgi_mapped.fastq.gz, out=normalized_mapped.fq, target=100, min=5]


   ***********   Pass 1   **********   


Settings:
threads:          	4
k:                	31
deterministic:    	true
toss error reads: 	false
passes:           	1
bits per cell:    	16
cells:            	2807.36M
hashes:           	3
base min quality: 	5
kmer min prob:    	0.5

target depth:     	400
min depth:        	3
max depth:        	500
min good kmers:   	15
depth percentile: 	64.8
ignore dupe kmers:	true
fix spikes:       	false

Made hash table:  	hashes = 3   	 mem = 5.23 GB   	cells = 2806.12M   	used = 91.841%
For better accuracy, use the 'prefilter' flag; run on a node with more memory; quality-trim or error-correct reads; or increase the values of the minprob flag to reduce spurio

In [None]:
#Use fastp to filter mapped reads and output de-interleaved files.
!fastp -i /content/drive/MyDrive/genomictest/normalized_mapped.fq --interleaved_in -o out.R1.fq -O out.R2.fq

Read1 before filtering:
total reads: 22403940
total bases: 3111225004
Q20 bases: 3015768575(96.9319%)
Q30 bases: 2814066613(90.4488%)

Read2 before filtering:
total reads: 22403940
total bases: 3111206500
Q20 bases: 3015731853(96.9313%)
Q30 bases: 2814070340(90.4495%)

Read1 after filtering:
total reads: 22336557
total bases: 2906162286
Q20 bases: 2818278392(96.9759%)
Q30 bases: 2632298469(90.5764%)

Read2 aftering filtering:
total reads: 22336557
total bases: 2839333025
Q20 bases: 2755476613(97.0466%)
Q30 bases: 2577396673(90.7747%)

Filtering result:
reads passed filter: 44673114
reads failed due to low quality: 131724
reads failed due to too many N: 3042
reads failed due to too short: 0
reads with adapter trimmed: 8654332
bases trimmed due to adapters: 463079150

Duplication rate: 0.195059%

Insert size peak (evaluated by paired-end reads): 152

JSON report: fastp.json
HTML report: fastp.html

fastp -i /content/drive/MyDrive/genomictest/normalized_mapped.fq --interleaved_in -o out.R
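
The fastp report is internally consistent: 2 x 22336557 = 44673114 reads passed, and 131724 + 3042 + 0 = 134766 failed, which equals 2 x 22403940 - 44673114. The de-interleaving that `--interleaved_in` with `-o`/`-O` performs can be pictured as alternating records between two outputs. The sketch below shows only that splitting step under the assumption of strict four-line records; it does none of fastp's quality filtering or adapter trimming, and the function names and demo reads are made up.

```python
import io
from itertools import islice


def read_records(handle):
    """Yield 4-line FASTQ records as lists of lines."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def deinterleave(interleaved, r1_out, r2_out):
    """Write alternating records to R1 and R2; return the number of pairs."""
    pairs = 0
    records = read_records(interleaved)
    for first in records:
        second = next(records, None)
        if second is None:
            raise ValueError("odd number of records; input is not interleaved pairs")
        r1_out.writelines(first)
        r2_out.writelines(second)
        pairs += 1
    return pairs


# Two made-up read pairs in interleaved order.
interleaved = io.StringIO(
    "@pair1/1\nACGT\n+\nIIII\n@pair1/2\nTTGA\n+\nIIII\n"
    "@pair2/1\nCCCC\n+\nIIII\n@pair2/2\nGGGG\n+\nIIII\n"
)
r1, r2 = io.StringIO(), io.StringIO()
n_pairs = deinterleave(interleaved, r1, r2)
print(n_pairs)
```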

In [None]:
#Use SPAdes to perform a hybrid de novo assembly using filtered short-reads and filtered long-reads.
#Abandoned: the available computational resources were insufficient to complete the assembly without compromising assembly quality.
!/content/SPAdes-3.15.0-Linux/bin/spades.py --pe1-1 /content/out.R1.fq --pe1-2 /content/out.R2.fq --trusted-contigs /content/drive/MyDrive/genomictest/Gansp1_JGI_AssemblyScaffolds.fasta --nanopore /content/drive/MyDrive/genomictest/gluc_np_clean.fastq.gz -o /content/spades_out8





Command line: /content/SPAdes-3.15.0-Linux/bin/spades.py	--pe1-1	/content/out.R1.fq	--pe1-2	/content/out.R2.fq	--trusted-contigs	/content/drive/MyDrive/genomictest/Gansp1_JGI_AssemblyScaffolds.fasta	--nanopore	/content/drive/MyDrive/genomictest/gluc_np_clean.fastq.gz	-o	/content/spades_out8	

System information:
  SPAdes version: 3.15.0
  Python version: 3.7.8
  OS: Linux-4.19.112+-x86_64-with-debian-buster-sid

Output dir: /content/spades_out8
Mode: read error correction and assembling
Debug mode is turned OFF

Dataset parameters:
  Standard mode
  For multi-cell/isolate data we recommend to use '--isolate' option; for single-cell MDA data use '--sc'; for metagenomic data use '--meta'; for RNA-Seq use '--rna'.
  Reads:
    Library number: 1, library type: paired-end
      orientation: fr
      left reads: ['/content/out.R1.fq']
      right reads: ['/content/out.R2.fq']
      interlaced reads: not specified
      single reads: not specified
      merged reads: not specified
    Lib

In [None]:
#Use Flye to assemble long-reads.
!flye --nano-raw /content/drive/MyDrive/genomictest/gluc_np_clean.fastq.gz --out-dir gluc_flye

[2021-02-16 21:36:13] INFO: Starting Flye 2.8.3-b1695
[2021-02-16 21:36:13] INFO: >>>STAGE: configure
[2021-02-16 21:36:13] INFO: Configuring run
[2021-02-16 21:36:22] INFO: Total read length: 358324061
[2021-02-16 21:36:22] INFO: Reads N50/N90: 14113 / 2248
[2021-02-16 21:36:22] INFO: Minimum overlap set to 2000
[2021-02-16 21:36:22] INFO: >>>STAGE: assembly
[2021-02-16 21:36:22] INFO: Assembling disjointigs
[2021-02-16 21:36:22] INFO: Reading sequences
tcmalloc: large alloc 8589934592 bytes == 0x55769cf3c000 @  0x7f3e3a082887 0x55769179d3ba 0x5576917687f0 0x5576917489e2 0x7f3e396c0bf7 0x557691748ba9
[2021-02-16 21:36:43] INFO: Counting k-mers:
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
[2021-02-16 21:40:11] INFO: Filling index table (1/2)
0% 10% 20% 30% 40% 50% tcmalloc: large alloc 1744830464 bytes == 0x5578d173c000 @  0x7f3e3a082887 0x5576917aae14 0x5576917a2770 0x55769175eef5 0x7f3e3a41819d 0x7f3e39a976db 0x7f3e397c071f
60% 70% 80% 90% 100% 
[2021-02-16 21:45:15] INFO: Filling i
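
Flye reports the read N50 and N90 (14113 and 2248 here). As a reminder of what these statistics mean, the sketch below computes them with the common definition: the length L such that reads of length at least L contain the given fraction of all bases. Flye's exact implementation may differ in edge cases, and the read lengths in the example are made up.

```python
def nxx(lengths, fraction=0.5):
    """Return the Nxx statistic of a collection of sequence lengths.

    With fraction=0.5 this is the N50, with fraction=0.9 the N90.
    """
    total = sum(lengths)
    if total == 0:
        raise ValueError("no sequence data")
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first length where the cumulative sum reaches the target.
        if running >= fraction * total:
            return length


# Made-up read lengths totalling 32 bases.
read_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
n50 = nxx(read_lengths, 0.5)
n90 = nxx(read_lengths, 0.9)
print(n50, n90)
```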

In [None]:
#Index long-read assembly for short-read mapping.
!/usr/local/pkgs/bwa-mem2-2.1-he513fc3_0/bin/bwa-mem2 index /content/gluc_flye/assembly.fasta

Looking to launch executable "/usr/local/pkgs/bwa-mem2-2.1-he513fc3_0/bin/bwa-mem2.avx2", simd = .avx2
Launching executable "/usr/local/pkgs/bwa-mem2-2.1-he513fc3_0/bin/bwa-mem2.avx2"
[bwa_index] Pack FASTA... 0.23 sec
* Entering FMI_search
init ticks = 3031473194
ref seq len = 77802018
binary seq ticks = 1992907910
build suffix-array ticks = 78549827040
ref_seq_len = 77802018
count = 0, 17270557, 38901009, 60531461, 77802018
BWT[23851548] = 4
CP_SHIFT = 6, CP_MASK = 63
sizeof CP_OCC = 64
pos: 9725253, ref_seq_len__: 9725252
max_occ_ind = 1215656
build fm-index ticks = 9486968257
Total time taken: 40.8000


In [None]:
#Map short-reads to Flye assembly and output sorted reads in bam format.
!/usr/local/pkgs/bwa-mem2-2.1-he513fc3_0/bin/bwa-mem2 mem -t 4 /content/gluc_flye/assembly.fasta /content/drive/MyDrive/genomictest/normalized_mapped.fq > out_flye.sam
!samtools view -b /content/out_flye.sam > output_flye.bam
!rm /content/out_flye.sam
!samtools sort -o sorted_flye.bam /content/output_flye.bam
!rm /content/output_flye.bam
!samtools index -b /content/sorted_flye.bam

#Map long-reads to Flye assembly and output sorted reads in bam format.
#Was intended for use with a hybrid short- and long-read assembly polisher. Abandoned.
!/usr/local/pkgs/minimap2-2.17-hed695b0_3/bin/minimap2 -ax map-ont /content/gluc_flye/assembly.fasta /content/drive/MyDrive/genomictest/gluc_np_clean.fastq.gz > lr_flye.sam
!samtools view -b /content/lr_flye.sam > lr_flye.bam
!rm /content/lr_flye.sam
!samtools sort -o sorted_lr_flye.bam /content/lr_flye.bam
!rm /content/lr_flye.bam
!samtools index -b /content/sorted_lr_flye.bam
!mv /content/sorted_lr_flye.bam /content/drive/MyDrive/genomictest/assemblies/second

Looking to launch executable "/usr/local/pkgs/bwa-mem2-2.1-he513fc3_0/bin/bwa-mem2.avx2", simd = .avx2
Launching executable "/usr/local/pkgs/bwa-mem2-2.1-he513fc3_0/bin/bwa-mem2.avx2"
-----------------------------
Executing in AVX2 mode!!
-----------------------------
* SA compression enabled with xfactor: 8
* Ref file: /content/gluc_flye/assembly.fasta
* Entering FMI_search
* Index file found. Loading index from /content/gluc_flye/assembly.fasta.bwt.2bit.64
* Reference seq len for bi-index = 77802019
* sentinel-index: 23851548
* Count:
0,	1
1,	17270558
2,	38901010
3,	60531462
4,	77802019

* Reading other elements of the index from files /content/gluc_flye/assembly.fasta
* Index prefix: /content/gluc_flye/assembly.fasta
* Read 0 ALT contigs
* Done reading Index!!
* Reading reference genome..
* Binary seq file = /content/gluc_flye/assembly.fasta.0123
* Reference genome size: 77802018 bp
* Done reading reference genome !!

------------------------------------------
1. Memory pre-allocati
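
The map, convert, sort, and index sequence is repeated for each assembly in this notebook. One possible way to avoid the intermediate SAM and unsorted BAM files is to pipe `bwa-mem2 mem` directly into `samtools sort`. The sketch below only builds the command strings; `mapping_commands` is a hypothetical helper, and it assumes `bwa-mem2` and `samtools` are on the `PATH` rather than called by their full package paths as in the cells above.

```python
import shlex


def mapping_commands(reference, reads, prefix, threads=4):
    """Build shell commands that map reads and write a sorted, indexed BAM.

    Returns a list of two command strings: the mapping/sorting pipeline and
    the indexing step. Nothing is executed here.
    """
    ref = shlex.quote(reference)
    fastq = shlex.quote(reads)
    bam = shlex.quote(f"sorted_{prefix}.bam")
    return [
        f"bwa-mem2 mem -t {threads} {ref} {fastq} | samtools sort -@ {threads} -o {bam} -",
        f"samtools index {bam}",
    ]


commands = mapping_commands(
    "/content/gluc_flye/assembly.fasta",
    "/content/drive/MyDrive/genomictest/normalized_mapped.fq",
    "flye",
)
for command in commands:
    print(command)
```

In a notebook cell each string could be run with `subprocess.run(command, shell=True, check=True)`. Note that a plain shell pipeline reports only the exit status of its last command, so a failure in `bwa-mem2` would need to be checked separately.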

In [None]:
#Polish Flye assembly using short-reads
!pilon --genome /content/gluc_flye/assembly.fasta --bam /content/sorted_flye.bam --outdir /content

Streaming output truncated to the last 5000 lines.
# fix break: contig_60:55506 0 -0 +0 NoSolution
# fix break: contig_60:59023 0 -0 +0 NoSolution
# fix break: contig_60:59378-59386 0 -0 +0 NoSolution
fix break: contig_60:60034-60056 60027 -19 +20 BreakFix
fix break: contig_60:60532-60534 60466 -69 +71 BreakFix
# fix break: contig_60:61819 0 -0 +0 NoSolution
# fix break: contig_60:62185 0 -0 +0 NoSolution
fix break: contig_60:63174-63175 63096 -80 +76 BreakFix
# fix break: contig_60:63780-63804 0 -0 +0 NoSolution
# fix break: contig_60:67621-67643 0 -0 +0 NoSolution
fix break: contig_60:68647-68651 68647 -29 +33 BreakFix
fix break: contig_60:69318-69332 69318 -102 +104 BreakFix
fix break: contig_60:72045 72045 -63 +63 BreakFix
# fix break: contig_60:74895-74920 0 -0 +0 NoSolution
# fix break: contig_60:79113-79130 0 -0 +0 NoSolution
# fix break: contig_60:86949-86951 0 -0 +0 NoSolution
# fix break: contig_60:92591-92624 0 -0 +0 NoSolution TandemRepeat 107
# fix break: con
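
Pilon's log lists one `fix break` line per local continuity break, ending in an outcome such as `BreakFix` or `NoSolution`. To summarize how many breaks were actually repaired, the lines can be tallied as sketched below. The field layout is inferred from the excerpt above rather than from Pilon's documentation, so treat the parser as an assumption; `tally_fix_breaks` is a hypothetical helper.

```python
from collections import Counter


def tally_fix_breaks(log_text):
    """Count outcomes of Pilon 'fix break' log lines.

    Assumed layout (from the log excerpt): 'fix break: REGION POS -N +N OUTCOME',
    optionally prefixed with '# ' and followed by extra annotations.
    """
    counts = Counter()
    for line in log_text.splitlines():
        stripped = line.lstrip("# ").strip()
        if not stripped.startswith("fix break:"):
            continue
        fields = stripped.split()
        # Skip truncated lines that lack the outcome field.
        if len(fields) >= 7:
            counts[fields[6]] += 1
    return counts


# Lines copied from the Pilon output above.
excerpt = """\
# fix break: contig_60:55506 0 -0 +0 NoSolution
fix break: contig_60:60034-60056 60027 -19 +20 BreakFix
fix break: contig_60:60532-60534 60466 -69 +71 BreakFix
# fix break: contig_60:61819 0 -0 +0 NoSolution
# fix break: contig_60:92591-92624 0 -0 +0 NoSolution TandemRepeat 107
"""

outcomes = tally_fix_breaks(excerpt)
print(dict(outcomes))
```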

In [None]:
#Perform hybrid assembly using short- and long-reads
!haslr.py -t 4 -o /content/drive/MyDrive/genomictest/assemblies/second/gluc_haslr -g 39m -l /content/drive/MyDrive/genomictest/gluc_np_clean.fastq.gz -x nanopore -s /content/drive/MyDrive/genomictest/out.R1.fq /content/drive/MyDrive/genomictest/out.R2.fq

checking /usr/local/bin/haslr_assemble: ok
checking /usr/local/bin/minia_nooverlap: ok
checking /usr/local/bin/fastutils: ok
checking /usr/local/bin/minia: ok
checking /usr/local/bin/minimap2: ok
number of threads: 4
output directory: /content/drive/MyDrive/genomictest/assemblies/second/gluc_haslr
subsampling 25x long reads to /content/drive/MyDrive/genomictest/assemblies/second/gluc_haslr/lr25x.fasta... done
assembling short reads using Minia... done
removing overlaps in short read assembly... done
removing short sequences in short read assembly... done
aligning long reads to short read assembly using minimap2... done
assembling long reads using HASLR... done


In [None]:
#Index HASLR assembly for read mapping.
!/usr/local/pkgs/bwa-mem2-2.1-he513fc3_0/bin/bwa-mem2 index /content/drive/MyDrive/genomictest/assemblies/second/gluc_haslr/asm_contigs_k49_a3_lr25x_b500_s3_sim0.85/asm.final.fa

Looking to launch executable "/usr/local/pkgs/bwa-mem2-2.1-he513fc3_0/bin/bwa-mem2.avx2", simd = .avx2
Launching executable "/usr/local/pkgs/bwa-mem2-2.1-he513fc3_0/bin/bwa-mem2.avx2"
[bwa_index] Pack FASTA... 0.28 sec
* Entering FMI_search
init ticks = 3388045533
ref seq len = 69029492
binary seq ticks = 2774613460
build suffix-array ticks = 72745227863
ref_seq_len = 69029492
count = 0, 15237821, 34514746, 53791671, 69029492
BWT[38557425] = 4
CP_SHIFT = 6, CP_MASK = 63
sizeof CP_OCC = 64
pos: 8628687, ref_seq_len__: 8628686
max_occ_ind = 1078585
build fm-index ticks = 10246219153
Total time taken: 39.1525


In [None]:
#Map short-reads to HASLR assembly and output sorted reads in bam format.
!/usr/local/pkgs/bwa-mem2-2.1-he513fc3_0/bin/bwa-mem2 mem -t 4 /content/drive/MyDrive/genomictest/assemblies/second/gluc_haslr/asm_contigs_k49_a3_lr25x_b500_s3_sim0.85/asm.final.fa /content/drive/MyDrive/genomictest/normalized_mapped.fq > out_haslr.sam
!samtools view -b /content/out_haslr.sam > output_haslr.bam
!rm /content/out_haslr.sam
!samtools sort -o sorted_haslr.bam /content/output_haslr.bam
!rm /content/output_haslr.bam
!samtools index -b /content/sorted_haslr.bam

Looking to launch executable "/usr/local/pkgs/bwa-mem2-2.1-he513fc3_0/bin/bwa-mem2.avx2", simd = .avx2
Launching executable "/usr/local/pkgs/bwa-mem2-2.1-he513fc3_0/bin/bwa-mem2.avx2"
-----------------------------
Executing in AVX2 mode!!
-----------------------------
* SA compression enabled with xfactor: 8
* Ref file: /content/drive/MyDrive/genomictest/assemblies/second/gluc_haslr/asm_contigs_k49_a3_lr25x_b500_s3_sim0.85/asm.final.fa
* Entering FMI_search
* Index file found. Loading index from /content/drive/MyDrive/genomictest/assemblies/second/gluc_haslr/asm_contigs_k49_a3_lr25x_b500_s3_sim0.85/asm.final.fa.bwt.2bit.64
* Reference seq len for bi-index = 69029493
* sentinel-index: 38557425
* Count:
0,	1
1,	15237822
2,	34514747
3,	53791672
4,	69029493

* Reading other elements of the index from files /content/drive/MyDrive/genomictest/assemblies/second/gluc_haslr/asm_contigs_k49_a3_lr25x_b500_s3_sim0.85/asm.final.fa
* Index prefix: /content/drive/MyDrive/genomictest/assemblies/second

In [None]:
#Polish hybrid assembly using short-reads.
!pilon --genome /content/drive/MyDrive/genomictest/assemblies/second/gluc_haslr/asm_contigs_k49_a3_lr25x_b500_s3_sim0.85/asm.final.fa --bam /content/sorted_haslr.bam --outdir /content --threads 4

Streaming output truncated to the last 5000 lines.
Total Reads: 70104, Coverage: 129, minDepth: 13
Confirmed 31930 of 31948 bases (99.94%)
Corrected 2 snps; 0 ambiguous bases; corrected 4 small insertions totaling 4 bases, 10 small deletions totaling 12 bases
# Attempting to fix local continuity breaks
fix break: 177:17367 17274 -94 +93 BreakFix
# fix break: 177:20575 0 -0 +0 NoSolution
fix break: 177:28590 28557 -34 +33 BreakFix
Finished processing 177:1-31948
Processing 157:1-19631
157:1-19631 log:
unpaired /content/sorted_haslr.bam: coverage 131
Total Reads: 43369, Coverage: 131, minDepth: 13
Confirmed 19564 of 19631 bases (99.66%)
Corrected 19 snps; 0 ambiguous bases; corrected 21 small insertions totaling 30 bases, 20 small deletions totaling 27 bases
# Attempting to fix local continuity breaks
# fix break: 157:1297-1306 0 -0 +0 NoSolution
fix break: 157:8211-8213 8210 -80 +78 BreakFix
fix break: 157:8493-8494 8492 -18 +16 BreakFix
fix break: 157:9144 9069 -110 +108 

In [None]:
#Merge polished Flye assembly and LR_Gapcloser processed MycoCosm assembly.
!/usr/local/pkgs/quickmerge-0.3-pl526he1b5a44_0/bin/merge_wrapper.py /content/drive/MyDrive/genomictest/assemblies/second/pilon_flye.fasta /content/drive/MyDrive/genomictest/assemblies/second/gl_jgi_gapclose/iteration-3/gapclosed.fasta 

1: PREPARING DATA
2,3: RUNNING mummer AND CREATING CLUSTERS
# reading input file "out.ntref" of length 39522729
# construct suffix tree for sequence of length 39522729
# (maximum reference length is 2305843009213693948)
# (maximum query length is 18446744073709551615)
# process 395227 characters per dot
#....................................................................................................
# CONSTRUCTIONTIME /usr/local/opt/mummer-3.23/mummer out.ntref 50.57
# reading input file "/content/hybrid_oneline.fa" of length 39016040
# matching query-file "/content/hybrid_oneline.fa"
# against subject-file "out.ntref"
# COMPLETETIME /usr/local/opt/mummer-3.23/mummer out.ntref 139.12
# SPACE /usr/local/opt/mummer-3.23/mummer out.ntref 75.53
4: FINISHING DATA
0	quickmerge
1	-d
2	out.rq.delta
3	-q
4	hybrid_oneline.fa
5	-r
6	/content/drive/MyDrive/genomictest/assemblies/second/gl_jgi_gapclose/iteration-3/gapclosed.fasta
7	-hco
8	5.0
9	-c
10	1.5
11	-l
12	0
13	-ml
14	5000
15	-p
16	out
s

In [None]:
#Merge first merge result with MycoCosm assembly.
!/usr/local/pkgs/quickmerge-0.3-pl526he1b5a44_0/bin/merge_wrapper.py /content/drive/MyDrive/genomictest/assemblies/second/merged_out.fasta /content/drive/MyDrive/genomictest/Gansp1_JGI_AssemblyScaffolds.fasta 

1: PREPARING DATA
2,3: RUNNING mummer AND CREATING CLUSTERS
# reading input file "out.ntref" of length 39522729
# construct suffix tree for sequence of length 39522729
# (maximum reference length is 2305843009213693948)
# (maximum query length is 18446744073709551615)
# process 395227 characters per dot
#....................................................................................................
# CONSTRUCTIONTIME /usr/local/opt/mummer-3.23/mummer out.ntref 52.07
# reading input file "/content/drive/MyDrive/genomictest/assemblies/second/merged_out.fasta" of length 41313645
# matching query-file "/content/drive/MyDrive/genomictest/assemblies/second/merged_out.fasta"
# against subject-file "out.ntref"
# COMPLETETIME /usr/local/opt/mummer-3.23/mummer out.ntref 141.50
# SPACE /usr/local/opt/mummer-3.23/mummer out.ntref 77.72
4: FINISHING DATA
0	quickmerge
1	-d
2	out.rq.delta
3	-q
4	/content/drive/MyDrive/genomictest/assemblies/second/merged_out.fasta
5	-r
6	/content/drive/MyDrive/g

In [None]:
#Use LR_Gapcloser and filtered nanopore long-reads to attempt to close any remaining gaps in the final merged assembly.
!/content/LR_Gapcloser/src/LR_Gapcloser.sh -i /content/drive/MyDrive/genomictest/assemblies/second/merged_out1.fasta -l /content/drive/MyDrive/genomictest/gluc_np_clean.fasta -s n -t 4 -o /content/drive/MyDrive/genomictest/assemblies/second/merge1_gapclose

-i(scaffolds)=/content/drive/MyDrive/genomictest/assemblies/second/merged_out1.fasta -l(longread)=/content/drive/MyDrive/genomictest/gluc_np_clean.fasta -s(platform)=n -t(thread)=4 -c(coverage)=0.8 -a(tolerance)=0.2 -m(max_distance)=600 -n(number)=5 -g(taglen)=300 -v(overstep)=300 -o(output)=/content/drive/MyDrive/genomictest/assemblies/second/merge1_gapclose


In [None]:
#Index gapclosed final merged result for polishing.
!/usr/local/pkgs/bwa-mem2-2.1-he513fc3_0/bin/bwa-mem2 index /content/drive/MyDrive/genomictest/assemblies/second/merge1_gapclose/iteration-3/gapclosed.fasta

Looking to launch executable "/usr/local/pkgs/bwa-mem2-2.1-he513fc3_0/bin/bwa-mem2.avx2", simd = .avx2
Launching executable "/usr/local/pkgs/bwa-mem2-2.1-he513fc3_0/bin/bwa-mem2.avx2"
[bwa_index] Pack FASTA... 0.32 sec
* Entering FMI_search
init ticks = 3887382486
ref seq len = 82635516
binary seq ticks = 3403106135
build suffix-array ticks = 81141991389
ref_seq_len = 82635516
count = 0, 18349085, 41317758, 64286431, 82635516
BWT[34486160] = 4
CP_SHIFT = 6, CP_MASK = 63
sizeof CP_OCC = 64
pos: 10329440, ref_seq_len__: 10329439
max_occ_ind = 1291179
build fm-index ticks = 12333586548
Total time taken: 44.2933


In [None]:
#Map short reads to the gap-closed final merged assembly, then convert the alignments to a sorted, indexed BAM file.
!/usr/local/pkgs/bwa-mem2-2.1-he513fc3_0/bin/bwa-mem2 mem -t 4 /content/drive/MyDrive/genomictest/assemblies/second/merge1_gapclose/iteration-3/gapclosed.fasta /content/drive/MyDrive/genomictest/normalized_mapped.fq > out_merge1.sam
!samtools view -b /content/out_merge1.sam > output_merge1.bam
!rm /content/out_merge1.sam
!samtools sort -o sorted_merge1.bam /content/output_merge1.bam
!rm /content/output_merge1.bam
!samtools index -b /content/sorted_merge1.bam

Looking to launch executable "/usr/local/pkgs/bwa-mem2-2.1-he513fc3_0/bin/bwa-mem2.avx2", simd = .avx2
Launching executable "/usr/local/pkgs/bwa-mem2-2.1-he513fc3_0/bin/bwa-mem2.avx2"
-----------------------------
Executing in AVX2 mode!!
-----------------------------
* SA compression enabled with xfactor: 8
* Ref file: /content/drive/MyDrive/genomictest/assemblies/second/merge1_gapclose/iteration-3/gapclosed.fasta
* Entering FMI_search
* Index file found. Loading index from /content/drive/MyDrive/genomictest/assemblies/second/merge1_gapclose/iteration-3/gapclosed.fasta.bwt.2bit.64
* Reference seq len for bi-index = 82635517
* sentinel-index: 34486160
* Count:
0,	1
1,	18349086
2,	41317759
3,	64286432
4,	82635517

* Reading other elements of the index from files /content/drive/MyDrive/genomictest/assemblies/second/merge1_gapclose/iteration-3/gapclosed.fasta
* Index prefix: /content/drive/MyDrive/genomictest/assemblies/second/merge1_gapclose/iteration-3/gapclosed.fasta
* Read 0 ALT conti
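
The mapping cell above writes an uncompressed SAM file and an unsorted BAM file to disk before producing the sorted BAM. As a minimal sketch, assuming bwa-mem2 and samtools are both on the PATH, the intermediate files can usually be avoided by piping bwa-mem2 straight into samtools sort. The helper name build_map_and_sort_commands and the short file names are hypothetical; the helper only builds command strings and runs nothing.

```python
import shlex


def build_map_and_sort_commands(reference, reads, sorted_bam, threads=4):
    """Build shell commands that map reads and write a sorted, indexed BAM.

    Hypothetical helper: it returns command strings and executes nothing.
    """
    def quote(args):
        return " ".join(shlex.quote(str(a)) for a in args)

    # bwa-mem2 writes SAM to stdout; the trailing "-" makes samtools sort read stdin.
    mem = ["bwa-mem2", "mem", "-t", threads, reference, reads]
    sort = ["samtools", "sort", "-@", threads, "-o", sorted_bam, "-"]
    index = ["samtools", "index", sorted_bam]
    return quote(mem) + " | " + quote(sort), quote(index)


pipeline, index_cmd = build_map_and_sort_commands(
    "gapclosed.fasta", "normalized_mapped.fq", "sorted_merge1.bam"
)
print(pipeline)
print(index_cmd)
```

In a notebook cell each printed string could be pasted after a `!`. The full paths used in the cell above would replace the short placeholder names.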

In [None]:
#Use filtered short-reads to polish gapclosed final merged assembly.
!pilon --genome /content/drive/MyDrive/genomictest/assemblies/second/merge1_gapclose/iteration-3/gapclosed.fasta --bam /content/sorted_merge1.bam --outdir /content --threads 4

Pilon version 1.23 Mon Nov 26 16:04:05 2018 -0500
Genome: /content/drive/MyDrive/genomictest/assemblies/second/merge1_gapclose/iteration-3/gapclosed.fasta
Fixing snps, indels, gaps, local
Input genome size: 41317758
Scanning BAMs
/content/sorted_merge1.bam: 83174592 reads, 0 filtered, 83052493 mapped, 0 proper, 0 stray, Unpaired 100% 109+/-39, max 225 unpaired
Processing scaffold_16:1-533757
Processing scaffold_2:1-3722621
Processing contig_49_pilon:1-3654
Processing contig_202_pilon:1-13676
contig_49_pilon:1-3654 log:
unpaired /content/sorted_merge1.bam: coverage 46
Total Reads: 2854, Coverage: 46, minDepth: 5
Confirmed 2554 of 3654 bases (69.90%)
Corrected 15 snps; 0 ambiguous bases; corrected 6 small insertions totaling 6 bases, 13 small deletions totaling 15 bases
# Attempting to fix local continuity breaks
# fix break: contig_49_pilon:225-977 0 -0 +0 NoSolution
Finished processing contig_49_pilon:1-3654
Processing scaffold_70:1-3033
scaffold_70:1-3033 log:
unpaired /content/sorted
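
Pilon reports its corrections one contig at a time, so the overall effect of polishing is hard to read from the log above. The following is a minimal sketch for totalling those counts from a saved copy of the log. The function name tally_pilon_corrections is hypothetical, and the regular expression assumes only the "Corrected ..." line format visible in the Pilon 1.23 output above; lines worded differently are ignored.

```python
import re

# Matches the per-contig summary line exactly as it appears in the log above.
CORRECTED = re.compile(
    r"Corrected (\d+) snps; (\d+) ambiguous bases; "
    r"corrected (\d+) small insertions totaling (\d+) bases, "
    r"(\d+) small deletions totaling (\d+) bases"
)
KEYS = ("snps", "ambiguous", "insertions", "inserted_bases",
        "deletions", "deleted_bases")


def tally_pilon_corrections(log_text):
    """Sum the correction counts over every matching line in log_text."""
    totals = dict.fromkeys(KEYS, 0)
    for match in CORRECTED.finditer(log_text):
        for key, value in zip(KEYS, match.groups()):
            totals[key] += int(value)
    return totals


sample = (
    "Confirmed 2554 of 3654 bases (69.90%)\n"
    "Corrected 15 snps; 0 ambiguous bases; corrected 6 small insertions "
    "totaling 6 bases, 13 small deletions totaling 15 bases\n"
)
print(tally_pilon_corrections(sample))
```

The sample text is the contig_49_pilon entry from the log above; a complete log would give totals for the whole assembly.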

In [None]:
#Use QUAST to evaluate all assemblies, using MycoCosm assembly as the reference assembly.
!python /usr/local/pkgs/quast-5.0.2-py37pl526hb5aa323_2/opt/quast-5.0.2/quast.py --threads 4 -o /content/drive/MyDrive/genomictest/quast/second -r /content/drive/MyDrive/genomictest/Gansp1_JGI_AssemblyScaffolds.fasta --fragmented -L --fungus /content/drive/MyDrive/genomictest/Gansp1_JGI_AssemblyScaffolds.fasta /content/drive/MyDrive/genomictest/assemblies/second/gluc_haslr/asm_contigs_k49_a3_lr25x_b500_s3_sim0.85/asm.final.fa /content/drive/MyDrive/genomictest/assemblies/second/pilon_haslr.fasta /content/drive/MyDrive/genomictest/assemblies/second/gluc_flye/assembly.fasta /content/drive/MyDrive/genomictest/assemblies/second/pilon_flye.fasta /content/drive/MyDrive/genomictest/assemblies/second/gl_jgi_gapclose/iteration-3/gapclosed.fasta /content/drive/MyDrive/genomictest/assemblies/second/merged_out.fasta /content/drive/MyDrive/genomictest/assemblies/second/merged_out1.fasta /content/drive/MyDrive/genomictest/assemblies/second/merge1_gapclose/iteration-3/gapclosed.fasta /content/drive/MyDrive/genomictest/assemblies/second/pilon_merge1.fasta

/usr/local/pkgs/quast-5.0.2-py37pl526hb5aa323_2/opt/quast-5.0.2/quast.py --threads 4 -o /content/drive/MyDrive/genomictest/quast/second -r /content/drive/MyDrive/genomictest/Gansp1_JGI_AssemblyScaffolds.fasta --fragmented -L --fungus /content/drive/MyDrive/genomictest/Gansp1_JGI_AssemblyScaffolds.fasta /content/drive/MyDrive/genomictest/assemblies/second/gluc_haslr/asm_contigs_k49_a3_lr25x_b500_s3_sim0.85/asm.final.fa /content/drive/MyDrive/genomictest/assemblies/second/pilon_haslr.fasta /content/drive/MyDrive/genomictest/assemblies/second/gluc_flye/assembly.fasta /content/drive/MyDrive/genomictest/assemblies/second/pilon_flye.fasta /content/drive/MyDrive/genomictest/assemblies/second/gl_jgi_gapclose/iteration-3/gapclosed.fasta /content/drive/MyDrive/genomictest/assemblies/second/merged_out.fasta /content/drive/MyDrive/genomictest/assemblies/second/merged_out1.fasta /content/drive/MyDrive/genomictest/assemblies/second/merge1_gapclose/iteration-3/gapclosed.fasta /content/drive/MyDrive/g

In [None]:
#Use BUSCO to evaluate the MycoCosm assembly.
!busco -i /content/drive/MyDrive/genomictest/Gansp1_JGI_AssemblyScaffolds.fasta -l polyporales_odb10 -o jgi -m genome

INFO:	***** Start a BUSCO v5.0.0 analysis, current time: 02/18/2021 05:12:30 *****
INFO:	Configuring BUSCO with local environment
INFO:	Mode is genome
INFO:	Input file is /content/drive/MyDrive/genomictest/Gansp1_JGI_AssemblyScaffolds.fasta
INFO:	Downloading information on latest versions of BUSCO data...
INFO:	Downloading file 'https://busco-data.ezlab.org/v5/data/lineages/polyporales_odb10.2020-08-05.tar.gz'
INFO:	Decompressing file '/content/busco_downloads/lineages/polyporales_odb10.tar.gz'
INFO:	Running BUSCO using lineage dataset polyporales_odb10 (eukaryota, 2020-08-05)
INFO:	Running 1 job(s) on metaeuk, starting at 02/18/2021 05:13:11
INFO:	[metaeuk]	1 of 1 task(s) completed
INFO:	***** Run HMMER on gene sequences *****
INFO:	Running 4464 job(s) on hmmsearch, starting at 02/18/2021 05:22:59
INFO:	[hmmsearch]	447 of 4464 task(s) completed
INFO:	[hmmsearch]	893 of 4464 task(s) completed
INFO:	[hmmsearch]	1340 of 4464 task(s) completed
INFO:	[hmmsearch]	1786 of 4464 task(s) comple

In [None]:
#Use BUSCO to evaluate the polished, gap-closed final merged assembly.
!busco -i /content/drive/MyDrive/genomictest/assemblies/second/pilon_merge1.fasta -l polyporales_odb10 -o pilon_merge1 -m genome

INFO:	***** Start a BUSCO v5.0.0 analysis, current time: 02/18/2021 05:56:22 *****
INFO:	Configuring BUSCO with local environment
INFO:	Mode is genome
INFO:	Input file is /content/drive/MyDrive/genomictest/assemblies/second/pilon_merge1.fasta
INFO:	Downloading information on latest versions of BUSCO data...
INFO:	Running BUSCO using lineage dataset polyporales_odb10 (eukaryota, 2020-08-05)
INFO:	Running 1 job(s) on metaeuk, starting at 02/18/2021 05:56:22
INFO:	[metaeuk]	1 of 1 task(s) completed
INFO:	***** Run HMMER on gene sequences *****
INFO:	Running 4464 job(s) on hmmsearch, starting at 02/18/2021 06:05:51
INFO:	[hmmsearch]	447 of 4464 task(s) completed
INFO:	[hmmsearch]	893 of 4464 task(s) completed
INFO:	[hmmsearch]	1340 of 4464 task(s) completed
INFO:	[hmmsearch]	1786 of 4464 task(s) completed
INFO:	[hmmsearch]	2232 of 4464 task(s) completed
INFO:	[hmmsearch]	2679 of 4464 task(s) completed
INFO:	[hmmsearch]	3125 of 4464 task(s) completed
INFO:	[hmmsearch]	3572 of 4464 task(s) c

In [None]:
#Use BUSCO to evaluate the public GanLuc1.0 assembly (GCA_000271565.1) and the remaining assemblies from this pipeline (polished Flye, first merge, gap-closed JGI). The HASLR assembly is not evaluated.
!busco -i /content/drive/MyDrive/genomictest/GCA_000271565.1_GanLuc1.0_genomic.fasta -l polyporales_odb10 -o gluc_GCA00271565_1 -m genome
!busco -i /content/drive/MyDrive/genomictest/assemblies/second/pilon_flye.fasta -l polyporales_odb10 -o pilon_flye -m genome
!busco -i /content/drive/MyDrive/genomictest/assemblies/second/merged_out.fasta -l polyporales_odb10 -o merged_out -m genome
!busco -i /content/drive/MyDrive/genomictest/assemblies/second/gl_jgi_gapclose/iteration-3/gapclosed.fasta -l polyporales_odb10 -o jgi_gapclose -m genome

INFO:	***** Start a BUSCO v5.0.0 analysis, current time: 02/18/2021 23:32:39 *****
INFO:	Configuring BUSCO with local environment
INFO:	Mode is genome
INFO:	Input file is /content/drive/MyDrive/genomictest/GCA_000271565.1_GanLuc1.0_genomic.fasta
INFO:	Downloading information on latest versions of BUSCO data...
INFO:	Downloading file 'https://busco-data.ezlab.org/v5/data/lineages/polyporales_odb10.2020-08-05.tar.gz'
INFO:	Decompressing file '/content/busco_downloads/lineages/polyporales_odb10.tar.gz'
INFO:	Running BUSCO using lineage dataset polyporales_odb10 (eukaryota, 2020-08-05)
INFO:	Running 1 job(s) on metaeuk, starting at 02/18/2021 23:33:19
INFO:	[metaeuk]	1 of 1 task(s) completed
INFO:	***** Run HMMER on gene sequences *****
INFO:	Running 4464 job(s) on hmmsearch, starting at 02/18/2021 23:43:30
INFO:	[hmmsearch]	447 of 4464 task(s) completed
INFO:	[hmmsearch]	893 of 4464 task(s) completed
INFO:	[hmmsearch]	1340 of 4464 task(s) completed
INFO:	[hmmsearch]	1786 of 4464 task(s) c

The commands below are included for reference.

In [None]:
#Keep only mapped reads (-F 4 excludes unmapped reads) before calculating coverage.
#samtools view -b -F 4 file.bam > mapped.bam
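
The `-F 4` option excludes every record whose FLAG has the 0x4 (unmapped) bit set. A minimal sketch of that bit test follows. The function name kept_by_exclude_filter is hypothetical; the bit values are those defined in the SAM specification.

```python
# SAM FLAG bits relevant to this notebook (values from the SAM specification).
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def kept_by_exclude_filter(flag, exclude_mask):
    """Mimic `samtools view -F`: keep a record only if no masked bit is set."""
    return (flag & exclude_mask) == 0


for flag in (0, 16, 4, 256, 2048):
    print(flag, kept_by_exclude_filter(flag, UNMAPPED))

# A stricter mask that also drops secondary and supplementary alignments.
primary_mapped_only = UNMAPPED | SECONDARY | SUPPLEMENTARY
print(hex(primary_mapped_only))
```

With `-F 4` alone, secondary (256) and supplementary (2048) alignments are kept, so they still contribute to any coverage computed from mapped.bam.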

In [None]:
#!racon -t 4 /content/drive/MyDrive/genomictest/assemblies/second/sr_unmatched.fq.gz /content/sorted_lr_flye.sam /content/drive/MyDrive/genomictest/assemblies/second/flye.fasta

In [None]:
#!racon -t 4 /content/drive/MyDrive/genomictest/assemblies/second/sr_unmatched.fq.gz /content/sorted_flye.sam /content/drive/MyDrive/genomictest/assemblies/second/flye.fasta

[racon::Polisher::initialize] loaded target sequences 0.454708 s
^C


In [None]:
#!bedtools genomecov -ibam /content/sorted_sr.bam -g /content/drive/MyDrive/genomictest/assemblies/second/merged_out3.fasta

Streaming output truncated to the last 5000 lines.
genome	30339	2	39069089	5.11914e-08
genome	30340	3	39069089	7.67871e-08
genome	30341	2	39069089	5.11914e-08
genome	30343	1	39069089	2.55957e-08
genome	30344	2	39069089	5.11914e-08
genome	30345	3	39069089	7.67871e-08
genome	30348	4	39069089	1.02383e-07
genome	30349	4	39069089	1.02383e-07
genome	30350	1	39069089	2.55957e-08
genome	30351	1	39069089	2.55957e-08
genome	30353	2	39069089	5.11914e-08
genome	30354	1	39069089	2.55957e-08
genome	30355	2	39069089	5.11914e-08
genome	30356	4	39069089	1.02383e-07
genome	30357	4	39069089	1.02383e-07
genome	30359	1	39069089	2.55957e-08
genome	30360	3	39069089	7.67871e-08
genome	30361	6	39069089	1.53574e-07
genome	30363	1	39069089	2.55957e-08
genome	30364	2	39069089	5.11914e-08
genome	30365	3	39069089	7.67871e-08
genome	30366	2	39069089	5.11914e-08
genome	30367	2	39069089	5.11914e-08
genome	30368	4	39069089	1.02383e-07
genome	30369	1	39069089	2.55957e-08
genome	30370	1	39069089	2.55957e-08
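
In the "genome" rows above, the columns are the label, the depth, the number of bases at that depth, the genome size, and the fraction of the genome at that depth (column 3 divided by column 4). A minimal sketch for turning such a histogram into a mean depth follows. The function names are hypothetical and the three-row histogram is made-up toy data, because the cell output above holds only the last 5000 lines and a mean needs the complete histogram.

```python
def parse_genome_rows(lines):
    """Yield (depth, bases, genome_size) from 'genome' rows of genomecov output."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 5 and fields[0] == "genome":
            yield int(fields[1]), int(fields[2]), int(fields[3])


def mean_depth(rows):
    """Mean depth = sum(depth * bases) / genome size, over the full histogram."""
    rows = list(rows)
    if not rows:
        raise ValueError("no 'genome' rows found")
    genome_size = rows[0][2]
    return sum(depth * bases for depth, bases, _ in rows) / genome_size


# Toy histogram (hypothetical values): 100 bp genome, depths 0 to 2.
toy = ["genome\t0\t10\t100\t0.1", "genome\t1\t30\t100\t0.3", "genome\t2\t60\t100\t0.6"]
print(mean_depth(parse_genome_rows(toy)))

# One real row from the output above: 2 bases at depth 30339.
depth, bases, size = next(parse_genome_rows(["genome\t30339\t2\t39069089\t5.11914e-08"]))
print(depth, bases, bases / size)
```

To use it on real data, the bedtools output would need to be redirected to a file rather than streamed into the notebook, so that no rows are lost to truncation.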

In [None]:
#!samtools flagstat /content/drive/MyDrive/genomictest/assemblies/second/sorted_lr.bam

107055 + 0 in total (QC-passed reads + QC-failed reads)
9372 + 0 secondary
7505 + 0 supplementary
0 + 0 duplicates
83938 + 0 mapped (78.41% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)


In [None]:
#!samtools flagstat /content/drive/MyDrive/genomictest/assemblies/second/sorted_sr.bam

238008091 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
20573 + 0 supplementary
0 + 0 duplicates
219522600 + 0 mapped (92.23% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
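
The percentages in the two flagstat reports above can be recomputed from the "in total" and "mapped" lines. A minimal sketch follows, assuming the flagstat text has been saved or pasted as a string; the function name mapped_percent is hypothetical, and only the line layout visible above is parsed.

```python
import re

# "<QC-passed> + <QC-failed> <description>", as in the reports above.
LINE = re.compile(r"^(\d+) \+ (\d+) (.+)$")


def mapped_percent(flagstat_text):
    """Recompute the QC-passed mapped percentage from samtools flagstat text."""
    total = mapped = None
    for raw in flagstat_text.splitlines():
        match = LINE.match(raw.strip())
        if not match:
            continue
        passed, description = int(match.group(1)), match.group(3)
        if description.startswith("in total"):
            total = passed
        elif description.startswith("mapped ("):
            mapped = passed
    if not total or mapped is None:
        raise ValueError("could not find the 'in total' and 'mapped' lines")
    return round(100 * mapped / total, 2)


long_reads = """107055 + 0 in total (QC-passed reads + QC-failed reads)
9372 + 0 secondary
7505 + 0 supplementary
83938 + 0 mapped (78.41% : N/A)"""
short_reads = """238008091 + 0 in total (QC-passed reads + QC-failed reads)
20573 + 0 supplementary
219522600 + 0 mapped (92.23% : N/A)"""
print(mapped_percent(long_reads), mapped_percent(short_reads))
```

Both counts include secondary and supplementary records, so the percentage describes alignment records rather than distinct reads.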
