## Data sources

- Hi-C data for Nui:
    - /input/genomic/plant/Vaccinium/corymbosum/AGRF_CAGRF21434_HJWHFDRXX


- 10X data for Nui and M7:
    - /input/genomic/plant/Vaccinium/corymbosum/AGRF_CAGRF18813_H7JY3DRXX


- ONT PromethION reads for Nui (BB2020 and BB2020-2 are the same sample):
    - /input/genomic/plant/Vaccinium/corymbosum/Blueberry_PromethION_Apr2020


- ONT MinION reads for Nui (BB2020):
    - /input/genomic/plant/Vaccinium/corymbosum/CAGRF21436/20200224_MinION/AGRF_CAGRFF21436_FAL87845_BB2020/


- Supernova assembly of the 10X data:
    - /output/genomic/plant/Vaccinium/corymbosum/2021_GenomeAssembly/Nui/01_Supernova

### Plan
- Base-call the ONT samples with Guppy v5.
- Filter out MinION reads shorter than 1 kb (possibly a higher threshold).
- Cecilia has already done the Supernova assembly of the 10X data.
- Assemble the ONT FASTQ reads with Flye.
- Merge the Supernova contigs and the ONT contigs with quickmerge.
- Scaffold the assembly with SALSA (Hi-C) to improve it.
- Tetraploid haplotyping, gene annotation, etc.



**See 01_basecalling_ONT.ipynb for the ONT base-calling steps.**

## Flye 
### Prep

In [1]:
WKDIR=/workspace/hraijc/BB_Nui_Assembly/ONT_Assemly
PROMREADS=/workspace/hraijc/BB_Nui_Assembly/Nui_BB2020_ONT/BB2020_PromethION_Fastq/BB2020_gupppy5.fastq
MINREADS=/workspace/hraijc/BB_Nui_Assembly/Nui_BB2020_ONT/BB2020_MinION_1kb/BB2020_MinION_1kb.fastq
ONTREADS=${WKDIR}/BB2020_ONT.fastq

In [4]:
mkdir $WKDIR
mkdir ${WKDIR}/log
mkdir ${WKDIR}/FLYE01

mkdir: cannot create directory ‘/workspace/hraijc/BB_Nui_Assembly/ONT_Assemly/’: File exists
mkdir: cannot create directory ‘/workspace/hraijc/BB_Nui_Assembly/ONT_Assemly//log’: File exists


In [2]:
cd $WKDIR

In [27]:
#Make one file for all ONT reads.
cat $PROMREADS $MINREADS > $ONTREADS


In [28]:
du -sh $PROMREADS
du -sh $MINREADS
du -sh $ONTREADS

68G	/workspace/hraijc/BB_Nui_Assembly/Nui_BB2020_ONT/BB2020_PromethION_Fastq/BB2020_gupppy5.fastq
18G	/workspace/hraijc/BB_Nui_Assembly/Nui_BB2020_ONT/BB2020_MinION_1kb/BB2020_MinION_1kb.fastq
86G	/workspace/hraijc/BB_Nui_Assembly/ONT_Assemly/BB2020_ONT.fastq
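The sizes add up (68G + 18G = 86G), but file size alone does not prove that no records were lost or truncated during `cat`. A minimal sketch of a record-count check, assuming plain 4-line FASTQ records (wrapped FASTQ would break the line arithmetic); `count_fastq_records` and `check_concat` are hypothetical helper names, not part of any installed tool:

```shell
# Sketch: check that a concatenated FASTQ holds every record from its inputs.
# Assumes plain 4-line FASTQ records (no wrapped sequence/quality lines).

# Hypothetical helper: number of records = number of lines / 4.
count_fastq_records() {
  echo $(( $(wc -l < "$1") / 4 ))
}

# Hypothetical helper. Usage: check_concat MERGED INPUT1 [INPUT2 ...]
check_concat() {
  local merged=$1
  shift
  local expected=0 observed f
  # Sum the record counts of all input files.
  for f in "$@"; do
    expected=$(( expected + $(count_fastq_records "$f") ))
  done
  observed=$(count_fastq_records "$merged")
  if [ "$observed" -eq "$expected" ]; then
    echo "OK: $observed records"
  else
    echo "MISMATCH: expected $expected, found $observed" >&2
    return 1
  fi
}
```

For this dataset the call would be `check_concat $ONTREADS $PROMREADS $MINREADS`; it reads all ~170G of input and output once, so it is better wrapped in `bsub` than run directly in the notebook.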


In [3]:
#module load flye/2.8.3

### Run

In [34]:
#bsub -J FLYE01  -o $WKDIR/log/FLYE01.out -e $WKDIR/log/FLYE01.err -n 17 -R rusage[mem=450000] \
#"flye --nano-corr ${ONTREADS} --out-dir ${WKDIR}/FLYE01 --threads 16 --iterations 1"
# Failed with: ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct
# Trying again with --asm-coverage 50 and an explicit --genome-size

Job <630194> is submitted to default queue <lowpriority>.


In [6]:
#bsub -J FLYE02  -o $WKDIR/log/FLYE02.out -e $WKDIR/log/FLYE02.err -n 17 -R rusage[mem=100000] \
#"flye --nano-corr ${ONTREADS} --out-dir ${WKDIR}/FLYE02 --asm-coverage 50 --genome-size 600m --threads 16 --iterations 1"
# Failed with the same error. Probably caused by --nano-corr, which assumes corrected reads with a very low error rate.
# --nano-raw, on the other hand, seems to assume a higher error rate than these reads have.
# Trying Flye v2.9, which adds the --nano-hq mode for Guppy v5 reads.

Job <632305> is submitted to default queue <lowpriority>.


In [4]:
module unload conda
module unload flye
module load conda
conda activate hraijc_flye



In [22]:
python /workspace/hraijc/git_clones/Flye/bin/flye -v

2.9-b1774

In [23]:
##TRYING WITH VERSION 2.9
bsub -J FLYE03  -o $WKDIR/log/FLYE03.out -e $WKDIR/log/FLYE03.err -n 17 -R rusage[mem=100000] \
"python /workspace/hraijc/git_clones/Flye/bin/flye --nano-hq ${ONTREADS} --out-dir ${WKDIR}/FLYE03 --asm-coverage 50 --genome-size 600m --threads 16 --iterations 1"

Job <632359> is submitted to default queue <lowpriority>.

In [5]:
conda deactivate

In [6]:
module unload conda

### Retry with ONT reads filtered by length and quality

In [33]:
ONTREADS2=/workspace/hraijc/BB_Nui_Assembly/ONT_Assemly/BB2020_ONT_min5kb.fastq
ONTREADS3=/workspace/hraijc/BB_Nui_Assembly/ONT_Assemly/BB2020_ONT_min5kb_10q.fastq

In [7]:
module load seqkit
module load nanopack

In [29]:
#bsub -J seqtkit2b  -o $WKDIR/log/seqtkit2b.out -e $WKDIR/log/seqtkit2b.err -n 9 \
#"seqkit seq -j 8 -m 5000 BB2020_ONT.fastq > ${ONTREADS2}"

Job <635035> is submitted to default queue <lowpriority>.


In [36]:
bsub -J NanoFilt  -o $WKDIR/log/NanoFilt.out -e $WKDIR/log/NanoFilt.err -n 2 \
"NanoFilt -q 10 -l 5000 BB2020_ONT.fastq > ${ONTREADS3}"

Job <635140> is submitted to default queue <lowpriority>.
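`NanoFilt -q 10 -l 5000` keeps reads that are at least 5 kb long with a mean quality of at least Q10. The awk function below illustrates that filter; it is not NanoFilt's code. It assumes 4-line FASTQ with Phred+33 qualities and, as we understand NanoPack to do, averages per-base error probabilities rather than Phred scores, so a few very poor bases pull the mean down sharply. Exact handling at the thresholds may differ from NanoFilt. `filter_fastq` is a hypothetical name:

```shell
# Sketch of a length + mean-quality read filter, mirroring the intent of
# `NanoFilt -q 10 -l 5000`. Assumes 4-line FASTQ records and Phred+33 qualities.
# Hypothetical helper. Usage: filter_fastq MIN_QUAL MIN_LEN < in.fastq > out.fastq
filter_fastq() {
  awk -v minq="$1" -v minlen="$2" '
    BEGIN {
      # Lookup table: ASCII quality character -> Phred score (offset 33).
      for (i = 33; i < 127; i++) phred[sprintf("%c", i)] = i - 33
    }
    NR % 4 == 1 { hdr = $0 }
    NR % 4 == 2 { seq = $0 }
    NR % 4 == 3 { plus = $0 }
    NR % 4 == 0 {
      n = length($0)
      if (n == 0 || length(seq) < minlen) next
      # Average the per-base error probabilities, then convert back to Phred.
      perr = 0
      for (j = 1; j <= n; j++) perr += 10 ^ (-phred[substr($0, j, 1)] / 10)
      meanq = -10 * log(perr / n) / log(10)
      if (meanq >= minq) print hdr "\n" seq "\n" plus "\n" $0
    }'
}
```

The averaging choice matters: a read with half Q0 and half Q20 bases has an arithmetic mean Phred score of 10, but its probability-averaged quality is about Q3, so it fails `-q 10`.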


In [8]:
bsub -J seqtkit3  -o $WKDIR/log/seqtkit3.out -e $WKDIR/log/seqtkit3.err -n 9 \
"seqkit stats -j 8 *.fastq"

Job <635395> is submitted to default queue <lowpriority>.
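The `seqkit stats` columns recorded below (num_seqs, sum_len, min_len, avg_len, max_len) are plain read-length summaries. A hypothetical awk equivalent for a 4-line FASTQ, useful for spot checks on small files; seqkit remains the better tool for the full dataset:

```shell
# Hypothetical helper: read-length summary of a 4-line FASTQ, mirroring the
# num_seqs/sum_len/min_len/avg_len/max_len columns of `seqkit stats`.
fastq_stats() {
  awk 'NR % 4 == 2 {
         len = length($0); n++; sum += len
         if (n == 1 || len < min) min = len
         if (len > max) max = len
       }
       END {
         if (n == 0) { print "num_seqs=0"; exit }
         # %.0f for sum_len avoids overflow in awks whose %d is 32-bit
         printf "num_seqs=%d sum_len=%.0f min_len=%d avg_len=%.1f max_len=%d\n", n, sum, min, sum / n, max
       }' "$1"
}
```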


In [None]:
#file                         format  type   num_seqs         sum_len  min_len   avg_len  max_len
#BB2020_ONT.fastq             FASTQ   DNA   3,146,685  39,717,105,373       15  12,621.9  199,090
#BB2020_ONT_min5kb_10q.fastq  FASTQ   DNA   1,726,459  32,859,909,657    5,000  19,033.1  199,090
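These totals translate into the read depth behind the Flye settings. Assuming the 600 Mb passed to `--genome-size` (an estimate, not a measured value), the full read set is roughly 66x and the filtered set roughly 55x. Per the Flye usage notes, `--asm-coverage 50` then uses only the longest reads, up to 50x, for the initial disjointig assembly, so even the filtered set is trimmed slightly. `coverage` is a hypothetical helper:

```shell
# Hypothetical helper: estimated depth = total read bases / genome size.
coverage() {
  awk -v bases="$1" -v gsize="$2" 'BEGIN { printf "%.1f\n", bases / gsize }'
}

# sum_len values from the seqkit stats table; 600 Mb is the --genome-size estimate.
coverage 39717105373 600000000   # all ONT reads: prints 66.2
coverage 32859909657 600000000   # >= 5 kb and mean Q >= 10: prints 54.8
```

By the same numbers, the length/quality filter keeps about 54.9% of the reads but 82.7% of the bases, because it mostly discards short reads.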

In [12]:
module unload seqkit
module unload nanopack

### Flye on filtered ONT dataset.

In [1]:
WKDIR=/workspace/hraijc/BB_Nui_Assembly/ONT_Assemly
ONTREADS3=/workspace/hraijc/BB_Nui_Assembly/ONT_Assemly/BB2020_ONT_min5kb_10q.fastq

In [14]:
module load conda
conda activate hraijc_flye


In [15]:
bsub -J FLYE04  -o $WKDIR/log/FLYE04.out -e $WKDIR/log/FLYE04.err -n 17 -R rusage[mem=100000] \
"python /workspace/hraijc/git_clones/Flye/bin/flye --nano-hq ${ONTREADS3} --out-dir ${WKDIR}/FLYE04 --asm-coverage 50 --genome-size 600m --threads 16 --iterations 2"

Job <635402> is submitted to default queue <lowpriority>.

In [4]:
mv ${WKDIR}/FLYE04/assembly.fasta ${WKDIR}/FLYE04/Flye04_assembly.fasta

In [5]:
ln -s /workspace/hraijc/BB_Nui_Assembly/ONT_Assemly/FLYE04/Flye04_assembly.fasta /workspace/hraijc/BB_Nui_Assembly/Hybrid_assembly/Assembly_comparison/Flye04_assembly.fasta

ln: failed to create symbolic link ‘/workspace/hraijc/BB_Nui_Assembly/Hybrid_assembly/Assembly_comparison/Flye04_assembly.fasta’: File exists
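When comparing assemblies (BUSCO, plus the other files in Assembly_comparison), contiguity is usually summarised alongside completeness by contig count, total length and N50. A minimal sketch for a FASTA such as Flye04_assembly.fasta, assuming nothing beyond standard `awk` and `sort`; `n50` is a hypothetical helper, and tools such as QUAST report the same statistics in more detail:

```shell
# Hypothetical helper: contig count, total length and N50 of a FASTA file
# (sequences may be wrapped over several lines).
n50() {
  # Print one length per sequence, then sort longest first.
  awk '/^>/ { if (len) print len; len = 0; next }
       { len += length($0) }
       END { if (len) print len }' "$1" |
    sort -rn |
    awk '{ len[NR] = $1; total += $1 }
         END {
           # N50: length of the contig at which the running sum
           # first reaches half of the total assembly length.
           run = 0
           for (i = 1; i <= NR; i++) {
             run += len[i]
             if (run >= total / 2) {
               printf "contigs=%d total=%.0f N50=%d\n", NR, total, len[i]
               exit
             }
           }
         }'
}
```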



Loading BUSCO/v5.2.2
  Loading requirement: singularity/3


In [3]:
module load BUSCO/v5.2.2
bsub -J busco4 -o ${WKDIR}/log/busco4.log -e ${WKDIR}/log/busco4.err -n 25 \
"busco -i /workspace/hraijc/BB_Nui_Assembly/ONT_Assemly/FLYE04/Flye04_assembly.fasta -l eudicots -o Flye04_busco -m geno -c 24 --datasets_version odb10"

Job <709649> is submitted to default queue <lowpriority>.
