# Gap closing using TGS-GapCloser

It was unclear whether the program could extend contig ends, which would help resolve telomeres in the 10x contigs. Below is the response from the author:
* From the author (Mengyang): TGS-GapCloser is designed for closing the gaps in the assembly, which means both contig ends need to match the long reads. It does not work for a single contig end to extend. In your case, I would suggest that cut off the suspicious sequences at the end of the contig first, align the modified end (>2kbp) to long reads to find the candidate extensions, and find the best matched long-read segment to finally extend the end to telomere. You can choose the best segment based on the alignment length and alignment quality.

I'll still run this program to close the gaps between contigs. For extending contig ends, we'll probably look for alternative tools or write scripts coupled with alignment tools.
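The author's suggestion could be scripted roughly as follows. This is only a sketch under stated assumptions, not the author's method: `contig_tails` and `best_hits` are hypothetical helper names, the alignments are assumed to be in PAF format (as written by an aligner such as minimap2), and alignment block length (PAF column 11) with mapping quality (column 12) as tie-breaker stands in for "alignment length and alignment quality". Only the 3' end is handled; the 5' end would need the mirror-image logic.

```shell
# Print the last <n> bases of every FASTA record, after cutting off
# <skip> suspicious bases from the very end (multi-line records are joined).
# usage: contig_tails <fasta> <n> [skip]
contig_tails() {
    awk -v n="$2" -v skip="${3:-0}" '
        function emit(   last, first) {
            if (name == "") return
            last = length(seq) - skip           # last base kept after the cut
            if (last < 1) return                # record shorter than the cut
            first = (last > n) ? last - n + 1 : 1
            print ">" name "_tail"
            print substr(seq, first, last - first + 1)
        }
        /^>/ { emit(); name = substr($1, 2); seq = ""; next }
        { seq = seq $0 }
        END { emit() }
    ' "$1"
}

# Keep one PAF line per query: the longest alignment block, ties broken by MAPQ.
# usage: best_hits <alignments.paf>
best_hits() {
    local tab
    tab=$(printf '\t')
    sort -t "$tab" -k1,1 -k11,11nr -k12,12nr "$1" | awk -F '\t' '!seen[$1]++'
}
```

For example, `contig_tails scaffolds.fa 2000 500 > tails.fa` would write 2 kbp ends after dropping the last 500 bp; the file names and the 500 bp cut are placeholders.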


In [1]:
export PATH=/workspace/hraczw/github/programs/TGS-GapCloser/:$PATH

In [2]:
WORKDIR=/workspace/hraczw/github/GA/Gillenia_genome/005.GapFilling

In [18]:
SCAFF_LINKS_I8=/workspace/hraczw/github/GA/Gillenia_genome/004.Scaffolding_10Xcontigs/k21_t20_d5000/scaffolds_k21_t10_d40000.scaffolds.fa

## 1. Convert ONT FASTQ to FASTA
TGS-GapCloser only takes FASTA-format input for the long reads.

In [14]:
module load seqtk
module list

Currently Loaded Modulefiles:
  1) powerPlant/core        4) git/2.21.0             7) asub/2.1
  2) texlive/20151117       5) perlbrew/0.76          8) fastx_toolkit/0.0.13
  3) pandoc/1.19.2          6) perl/5.28.0            9) seqtk/1.2


In [15]:
seqtk seq -h

seq: invalid option -- 'h'

Usage:   seqtk seq [options] <in.fq>|<in.fa>

Options: -q INT    mask bases with quality lower than INT [0]
         -X INT    mask bases with quality higher than INT [255]
         -n CHAR   masked bases converted to CHAR; 0 for lowercase [0]
         -l INT    number of residues per line; 0 for 2^32-1 [0]
         -Q INT    quality shift: ASCII-INT gives base quality [33]
         -s INT    random seed (effective with -f) [11]
         -f FLOAT  sample FLOAT fraction of sequences [1]
         -M FILE   mask regions in BED or name list FILE [null]
         -L INT    drop sequences with length shorter than INT [0]
         -c        mask complement region (effective with -M)
         -r        reverse complement
         -A        force FASTA output (discard quality)
         -C        drop comments at the header lines
         -N        drop sequences containing ambiguous bases
         -1        output the 2n-1 reads only
         -2        output the 2n r

In [16]:
bsub -J fq-to-fa \
-o $WORKDIR/fq-to-fa.minion.out \
-e $WORKDIR/fq-to-fa.minion.err \
"seqtk seq -A Gillenia_MinNION.fastq > Gillenia_MiNION.fasta"

Job <231661> is submitted to default queue <lowpriority>.


In [17]:
bsub -J fq-to-fa \
-o $WORKDIR/fq-to-fa.p.out \
-e $WORKDIR/fq-to-fa.p.err \
"seqtk seq -A Gillenia_PromethION.fastq > Gillenia_PromethION.fasta"

Job <231662> is submitted to default queue <lowpriority>.
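Once both conversion jobs have finished, a quick sanity check is to compare the number of records before and after. The helper names below are hypothetical, and the FASTQ count assumes four lines per record (unwrapped FASTQ); this is a minimal sketch rather than a full validator.

```shell
# Number of records in an unwrapped FASTQ file (4 lines per record).
fastq_count() { awk 'END { print NR / 4 }' "$1"; }

# Number of records in a FASTA file (one header line per record).
fasta_count() { grep -c '^>' "$1"; }

# Succeed only if both files hold the same number of reads.
# usage: check_conversion <in.fastq> <out.fasta>
check_conversion() {
    local fq fa
    fq=$(fastq_count "$1")
    fa=$(fasta_count "$2")
    if [ "$fq" = "$fa" ]; then
        echo "OK: $fa reads"
    else
        echo "MISMATCH: $fq FASTQ records vs $fa FASTA records" >&2
        return 1
    fi
}
```

It would be called as `check_conversion Gillenia_PromethION.fastq Gillenia_PromethION.fasta`.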


## 2. Test run on TGS-GapCloser

* this is a test run on the links i8 version of scaffolds
* error correction will be done using short reads R1 from S4

In [19]:
bsub -J merge \
-o merge_to_allLong.out \
-e merge_to_allLong.err \
"cat Gillenia_MiNION.fasta Gillenia_PromethION.fasta > Gillenia_all_longReads.fasta"

Job <231663> is submitted to default queue <lowpriority>.


In [20]:
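# NOTE: run this only after the merge job <231663> has finished; removing the inputs earlier truncates the merged file
rm Gillenia_PromethION.fasta
rm Gillenia_MiNION.fasta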
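Because the merge runs as a batch job, deleting the inputs is only safe after it has completed. One way to guard the cleanup is to compare byte counts first. `safe_cleanup` is a hypothetical helper, and the check assumes the merged file is a plain `cat` of the inputs.

```shell
# Delete the input files only if the merged file holds exactly their combined bytes.
# usage: safe_cleanup <merged> <input> [<input> ...]
safe_cleanup() {
    local merged=$1 total=0 merged_size f
    shift
    for f in "$@"; do
        total=$(( total + $(wc -c < "$f") ))
    done
    merged_size=$(( $(wc -c < "$merged") ))
    if [ "$merged_size" -eq "$total" ]; then
        rm -- "$@"
    else
        echo "size mismatch ($merged_size vs $total bytes): keeping inputs" >&2
        return 1
    fi
}
```

Here that would be `safe_cleanup Gillenia_all_longReads.fasta Gillenia_MiNION.fasta Gillenia_PromethION.fasta`. LSF job dependencies (`bsub -w`) are another way to order the steps.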

In [22]:
# try a run without correction

bsub -J tgs-gapcloser \
-m aklppb34 \
-n 20 \
-o $WORKDIR/tgs-gapcloser_test.out \
-e $WORKDIR/tgs-gapcloser_test.err \
"TGS-GapCloser.sh \
--scaff $SCAFF_LINKS_I8 \
--reads Gillenia_all_longReads.fasta \
--output $WORKDIR/scaff_links_i8_gapFilled.tgs-gapcloser \
--ne \
--thread 20"

Job <231665> is submitted to default queue <lowpriority>.


In [23]:
R1=/input/genomic/plant/Gillenia/trifoliata/2019_Novogene_10X/customer-6g2hQKfL/BDHX190030841-1a/G3-2-AK3720_S4_L001_R1_001.fastq.gz

In [24]:
# try a run with Pilon correction using short reads (--ngs); no Racon path is passed here

bsub -J tgs-gapcloser \
-m aklppb34 \
-n 20 \
-o $WORKDIR/tgs-gapcloser_test.out \
-e $WORKDIR/tgs-gapcloser_test.err \
"TGS-GapCloser.sh \
--scaff $SCAFF_LINKS_I8 \
--reads Gillenia_all_longReads.fasta \
--output $WORKDIR/scaff_links_i8_gapFilled.corrected.tgs-gapcloser \
--ngs $R1 \
--pilon /software/bioinformatics/pilon-1.23/pilon-1.23.jar \
--samtools /software/bioinformatics/samtools-1.9/bin/samtools \
--java /usr/lib/jvm/java-1.8.0/bin/java \
--thread 20"

Job <231669> is submitted to default queue <lowpriority>.


**This R1 still contains the 10x barcodes and needs to be barcode-trimmed first. Stop the job!**
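A minimal sketch of the trimming step, in plain awk. The notebook does not say how the trimmed files were produced, so this is an illustration only: `trim_r1` is a hypothetical helper, and the 23-base default (a 16 bp barcode plus 7 spacer bases at the start of R1) is an assumption about the 10x linked-read layout that should be checked against the library documentation.

```shell
# Remove the first <n> bases (and their quality values) from every FASTQ record.
# Reads on stdin, writes to stdout; assumes unwrapped 4-line records.
# usage: trim_r1 [n] < in.fastq > out.fastq
trim_r1() {
    awk -v n="${1:-23}" '
        NR % 2 == 0 { print substr($0, n + 1); next }   # lines 2 and 4: sequence, quality
        { print }                                       # lines 1 and 3: header, separator
    '
}
```

For gzipped input this could be run as `gzip -dc in_R1.fastq.gz | trim_r1 23 | gzip > trimmed_R1.fastq.gz` (placeholder file names). Reads shorter than `n` end up with empty sequence lines and may need filtering afterwards.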

## 3. Official run on TGS-GapCloser

* filling gaps on the links_i10 version (d=50000)
* using S3 library for error correction (need to trim 10x barcodes first)

In [25]:
SCAFF_LINKS_I10=/workspace/hraczw/github/GA/Gillenia_genome/004.Scaffolding_10Xcontigs/k21_t20_d5000/scaffolds_k21_t10_d50000.scaffolds.fa

In [1]:
SCAFF_LINKS_I6=/workspace/hraczw/github/GA/Gillenia_genome/004.Scaffolding_10Xcontigs/k21_t20_d5000/scaffolds_k21_t10_d30000.scaffolds.fa

In [5]:
SCAFF_LINKS_I4=/workspace/hraczw/github/GA/Gillenia_genome/004.Scaffolding_10Xcontigs/k21_t20_d5000/scaffolds_k21_t10_d20000.scaffolds.fa

In [14]:
SCAFF_LINKS_I1=/workspace/hraczw/github/GA/Gillenia_genome/004.Scaffolding_10Xcontigs/k21_t20_d5000/scaffolds_k21_t20_d5000.scaffolds.fa

In [16]:
SCAFF_SLR=/workspace/hraczw/github/GA/Gillenia_genome/004.Scaffolding_10Xcontigs/SLR/scaffold_set.fa

In [7]:
R1_S3=/workspace/hraczw/github/GA/Gillenia_genome/005.GapFilling/G3-2-AK3719_S3_L001_R1_001.fastq.gz
R1_S2=/workspace/hraczw/github/GA/Gillenia_genome/005.GapFilling/G3-2-AK3718_S2_L001_R1_001.fastq.gz
R1_S1=/workspace/hraczw/github/GA/Gillenia_genome/005.GapFilling/G3-2-AK3717_S1_L001_R1_001.fastq.gz

In [32]:
bsub -J merge \
-o $WORKDIR/merge_10x.out \
-e $WORKDIR/merge_10x.err \
"cat $R1_S3 $R1_S2 $R1_S1 > $WORKDIR/10x_S1-S3_R1.fastq.gz"

Job <231946> is submitted to default queue <lowpriority>.
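Concatenating gzip files with `cat` yields a valid multi-member gzip stream, so no decompression is needed for the merge. After the job finishes, the result can be checked as sketched below; the helper names are hypothetical and unwrapped 4-line FASTQ records are assumed.

```shell
# Number of reads in a gzipped FASTQ (works for multi-member gzip files too).
gz_fastq_count() { gzip -dc "$1" | awk 'END { print NR / 4 }'; }

# Succeed only if <merged> passes gzip's integrity test and holds
# as many reads as all input files together.
# usage: check_merged_gz <merged.fastq.gz> <input.fastq.gz> [...]
check_merged_gz() {
    local merged=$1 expected=0 f
    shift
    gzip -t "$merged" || return 1
    for f in "$@"; do
        expected=$(( expected + $(gz_fastq_count "$f") ))
    done
    [ "$(gz_fastq_count "$merged")" -eq "$expected" ]
}
```

In this notebook the call would be `check_merged_gz $WORKDIR/10x_S1-S3_R1.fastq.gz $R1_S3 $R1_S2 $R1_S1`.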


In [33]:
R1_S1TOS3=$WORKDIR/10x_S1-S3_R1.fastq.gz

In [34]:
bsub -J tgs-gapcloser \
-m aklppb34 \
-n 60 \
-o $WORKDIR/tgs-gapcloser_links_i10.out \
-e $WORKDIR/tgs-gapcloser_links_i10.err \
"TGS-GapCloser.sh \
--scaff $SCAFF_LINKS_I10 \
--reads Gillenia_all_longReads.fasta \
--output $WORKDIR/scaff_links_i10_gapFilled.corrected.tgs-gapcloser \
--ngs $R1_S1TOS3 \
--pilon /software/bioinformatics/pilon-1.23/pilon-1.23.jar \
--samtools /software/bioinformatics/samtools-1.9/bin/samtools \
--java /usr/lib/jvm/java-1.8.0/bin/java \
--thread 60"

Job <231948> is submitted to default queue <lowpriority>.


In [35]:
bsub -J tgs-gapcloser \
-m wkoppb50 \
-n 60 \
-o $WORKDIR/tgs-gapcloser_links_i10_noCorrection.out \
-e $WORKDIR/tgs-gapcloser_links_i10_noCorrection.err \
"TGS-GapCloser.sh \
--scaff $SCAFF_LINKS_I10 \
--reads Gillenia_all_longReads.fasta \
--output $WORKDIR/scaff_links_i10_gapFilled.noCorrection.tgs-gapcloser \
--ne \
--thread 60"

Job <231973> is submitted to default queue <lowpriority>.


In [4]:
bsub -J tgs-gapcloser \
-m wkoppb50 \
-n 60 \
-o $WORKDIR/tgs-gapcloser_links_i6_noCorrection.out \
-e $WORKDIR/tgs-gapcloser_links_i6_noCorrection.err \
"TGS-GapCloser.sh \
--scaff $SCAFF_LINKS_I6 \
--reads Gillenia_all_longReads.fasta \
--output $WORKDIR/scaff_links_i6_gapFilled.noCorrection.tgs-gapcloser \
--ne \
--thread 60"

Job <239009> is submitted to default queue <lowpriority>.


In [10]:
bsub -J tgs-gapcloser \
-m aklppb34 \
-n 60 \
-o $WORKDIR/tgs-gapcloser_links_i4_noCorrection.out \
-e $WORKDIR/tgs-gapcloser_links_i4_noCorrection.err \
"TGS-GapCloser.sh \
--scaff $SCAFF_LINKS_I4 \
--reads Gillenia_all_longReads.fasta \
--output $WORKDIR/scaff_links_i4_gapFilled.noCorrection.tgs-gapcloser \
--ne \
--thread 60"

Job <243749> is submitted to default queue <lowpriority>.


In [15]:
bsub -J tgs-gapcloser \
-m aklppb39 \
-n 60 \
-o $WORKDIR/tgs-gapcloser_links_i1_noCorrection.out \
-e $WORKDIR/tgs-gapcloser_links_i1_noCorrection.err \
"TGS-GapCloser.sh \
--scaff $SCAFF_LINKS_I1 \
--reads Gillenia_all_longReads.fasta \
--output $WORKDIR/scaff_links_i1_gapFilled.noCorrection.tgs-gapcloser \
--ne \
--thread 60"

Job <243752> is submitted to default queue <lowpriority>.


In [17]:
bsub -J tgs-gapcloser \
-m wkoppb50 \
-n 56 \
-o $WORKDIR/tgs-gapcloser_slr_noCorrection.out \
-e $WORKDIR/tgs-gapcloser_slr_noCorrection.err \
"TGS-GapCloser.sh \
--scaff $SCAFF_SLR \
--reads Gillenia_all_longReads.fasta \
--output $WORKDIR/scaff_slr_gapFilled.noCorrection.tgs-gapcloser \
--ne \
--thread 56"

Job <243753> is submitted to default queue <lowpriority>.
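The no-correction runs above differ only in the scaffold file, the host, and the slot count, so the submission line could be generated instead of copied. The sketch below only prints the command (a dry run); `tgs_cmd` is a hypothetical helper that reuses the flags and naming pattern from the cells above and passes one value to both `-n` and `--thread` so that they cannot drift apart.

```shell
# Print (do not execute) the bsub command for one no-correction run.
# usage: tgs_cmd <label> <scaffold_fasta> <host> <threads>
tgs_cmd() {
    local label=$1 scaff=$2 host=$3 threads=$4
    local workdir=${WORKDIR:-.}
    printf 'bsub -J tgs-gapcloser -m %s -n %s -o %s -e %s "%s"\n' \
        "$host" "$threads" \
        "$workdir/tgs-gapcloser_${label}_noCorrection.out" \
        "$workdir/tgs-gapcloser_${label}_noCorrection.err" \
        "TGS-GapCloser.sh --scaff $scaff --reads Gillenia_all_longReads.fasta --output $workdir/scaff_${label}_gapFilled.noCorrection.tgs-gapcloser --ne --thread $threads"
}
```

For example, `tgs_cmd links_i6 "$SCAFF_LINKS_I6" wkoppb50 60` prints the line to review, and piping it to `sh` would submit it.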


## 4. Gap filling the RaGOO assembly

In [3]:
RAGOO=/workspace/hraczw/github/GA/Gillenia_genome/011.RagooScaffolding/ragoo_curated_non-haplotigs/ragoo.fasta

In [4]:
bsub -J tgs-gapcloser \
-m wkoppb50 \
-n 60 \
-o $WORKDIR/tgs-gapcloser_ragoo_noCorrection.out \
-e $WORKDIR/tgs-gapcloser_ragoo_noCorrection.err \
"TGS-GapCloser.sh \
--scaff $RAGOO \
--reads Gillenia_all_longReads.fasta \
--output $WORKDIR/scaff_ragoo_gapFilled.noCorrection.tgs-gapcloser \
--ne \
--thread 60"

Job <266445> is submitted to default queue <lowpriority>.
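To judge how many gaps a run closed, the runs of `N` in the scaffolds can be counted before and after gap filling. `gap_stats` is a hypothetical helper; it treats every uninterrupted run of `N`/`n` within a sequence as one gap, including runs that span line breaks. The per-base loop is simple rather than fast, so it may take a while on a whole genome.

```shell
# Report the number of gaps (runs of N) and the total number of N bases in a FASTA file.
# usage: gap_stats <fasta>
gap_stats() {
    awk '
        /^>/ {
            if (in_gap) { gaps++; in_gap = 0 }   # a gap cannot continue into the next record
            next
        }
        {
            line = toupper($0)
            len = length(line)
            for (i = 1; i <= len; i++) {
                if (substr(line, i, 1) == "N") { n_bases++; in_gap = 1 }
                else if (in_gap) { gaps++; in_gap = 0 }
            }
        }
        END {
            if (in_gap) gaps++
            printf "gaps=%d n_bases=%d\n", gaps, n_bases
        }
    ' "$1"
}
```

Running it on the input scaffolds (for example `gap_stats "$RAGOO"`) and on the gap-filled output gives the two numbers to compare.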
