## Data sources

- HiC Data for Nui is here:
    - /input/genomic/plant/Vaccinium/corymbosum/AGRF_CAGRF21434_HJWHFDRXX


- 10X data for Nui and M7 here:
    - /input/genomic/plant/Vaccinium/corymbosum/AGRF_CAGRF18813_H7JY3DRXX


- ONT PromethION Nui (BB2020 and BB2020-2 are the same sample) here:
    - /input/genomic/plant/Vaccinium/corymbosum/Blueberry_PromethION_Apr2020


- ONT MinION Nui (BB2020) here:
    - /input/genomic/plant/Vaccinium/corymbosum/CAGRF21436/20200224_MinION/AGRF_CAGRFF21436_FAL87845_BB2020/


- 10X Supernova Assembly for 10X data here:
    - /output/genomic/plant/Vaccinium/corymbosum/2021_GenomeAssembly/Nui/01_Supernova

### Plan 
- Base-call the ONT samples with Guppy v5.
- Filter out MinION reads shorter than 1 kb (or use a higher cutoff).
- Cecilia has already done the Supernova assembly of the 10X data.
- Assemble the ONT FASTQ reads with Flye.
- Merge the Supernova contigs with the ONT contigs using quickmerge.
- Scaffold the merged assembly with SALSA using the Hi-C data.
- Tetraploid haplotyping, gene annotation, etc.
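The read-length filter in the plan can be sketched with plain `awk`. This is only an illustration, not the filter actually used in the ONT notebooks: the function name `filter_fastq_min_len` is hypothetical, and it assumes uncompressed FASTQ with exactly four lines per record.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original notebooks): keep only FASTQ
# records whose sequence is at least MIN_LEN bases long.
# Assumes plain 4-line FASTQ records (no wrapped sequence lines), read on stdin.
filter_fastq_min_len() {
    local min_len="$1"   # minimum read length in bases, e.g. 1000
    awk -v min="$min_len" '
        NR % 4 == 1 { header = $0 }   # @read_id line
        NR % 4 == 2 { seq = $0 }      # sequence line
        NR % 4 == 3 { plus = $0 }     # "+" separator line
        NR % 4 == 0 {                 # quality line completes the record
            if (length(seq) >= min) {
                print header; print seq; print plus; print $0
            }
        }
    '
}

# Usage (file names are placeholders):
#   filter_fastq_min_len 1000 < reads.fastq > reads.min1kb.fastq
```

Raising the cutoff only means changing the first argument, so the same sketch covers the "or higher" option.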



**See 01_basecalling_ONT.ipynb for ONT steps**

**See 02_flye.ipynb for ONT Assembly**


## Check to see how the Flye assembly compares to the 10X

In [1]:
WKDIR=/workspace/hraijc/BB_Nui_Assembly/Hybrid_assembly
cd $WKDIR

In [3]:
mkdir -p $WKDIR
mkdir -p ${WKDIR}/log


In [4]:
#mkdir ${WKDIR}/Assembly_comparison
#cp /output/genomic/plant/Vaccinium/corymbosum/2021_GenomeAssembly/Nui/01_Supernova/Nui.1.fasta ${WKDIR}/Assembly_comparison/Supernova_Nui.1.fasta   
#cp /workspace/hraijc/BB_Nui_Assembly/ONT_Assemly/FLYE03/assembly.fasta ${WKDIR}/Assembly_comparison/Flye03_assembly.fasta
#cp /output/genomic/plant/Vaccinium/corymbosum/2021_GenomeAssembly/Nui/01_Supernova/Nui.1.min2KB.fasta ${WKDIR}/Assembly_comparison/Supernova_Nui.1.min2kb.fasta


#### Contiguity stats of assemblies

In [5]:
module load BBMap/38.33

In [6]:
cd ${WKDIR}/Assembly_comparison

In [7]:
bsub -J bstat1 -o ${WKDIR}/log/bstat1.log -e ${WKDIR}/log/bstat1.err \
"statswrapper.sh in=Supernova_Nui.1.fasta,Flye03_assembly.fasta,Supernova_Nui.1.min2kb.fasta format=3"

Job <636360> is submitted to default queue <lowpriority>.


|n_scaffolds|n_contigs|scaf_bp|contig_bp|gap_pct|scaf_N50(number)|scaf_L50(SeqLength)|ctg_N50|ctg_L50|scaf_N90|scaf_L90|ctg_N90|ctg_L90|scaf_max|ctg_max|scaf_n_gt50K|scaf_pct_gt50K|gc_avg|gc_std|filename|
| ----------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- |
|294493|327732|890445603|879628983|1.215|31787|6211|40402|5258|172322|1055|196312|989|487194|127919|843|8.239|0.38910|0.04736|Supernova_Nui.1.fasta|
|105550|132451|706271084|695673804|1.500|19187|8705|25361|7191|76634|2942|91036|2603|487194|127919|843|10.387|0.39126|0.03921|Supernova_Nui.1.min2kb.fasta|
|22050|22050|1804096927|1804096927|0.000|2301|212778|2301|212778|9403|43723|9403|43723|1907487|1907487|8662|88.079|0.38702|0.03932|Flye03_assembly.fasta|
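As the header annotations show, BBMap reports the sequence count in the `N50` columns and the sequence length in the `L50` columns, so `scaf_L50` is the value most other tools call N50. A rough cross-check of that length can be sketched as below. The function name `fasta_n50` is hypothetical, and the sketch works on whole sequences without splitting scaffolds at gaps.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the length-based N50 of a FASTA file, i.e. the
# length of the sequence at which the running total (longest first) reaches
# half of the total assembly size. Scaffolds are not split at N gaps.
fasta_n50() {
    awk '
        /^>/ { if (len > 0) print len; len = 0; next }   # emit previous length
        { gsub(/[[:space:]]/, ""); len += length($0) }   # add this sequence line
        END { if (len > 0) print len }                   # emit the last sequence
    ' "$1" | sort -rn | awk '
        { lens[NR] = $1; total += $1 }
        END {
            half = total / 2
            for (i = 1; i <= NR; i++) {
                cum += lens[i]
                if (cum >= half) { print lens[i]; exit }
            }
        }
    '
}

# Usage: fasta_n50 Flye03_assembly.fasta
```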


In [32]:
module load BUSCO/v5.2.2

In [17]:
bsub -J busco1 -o ${WKDIR}/log/busco1.log -e ${WKDIR}/log/busco1.err -n 9 \
"busco -i Flye03_assembly.fasta -l eudicots -o Flye03_busco -m geno -c 8 --datasets_version odb10"

Job <633481> is submitted to default queue <lowpriority>.


In [17]:
bsub -J busco2 -o ${WKDIR}/log/busco2.log -e ${WKDIR}/log/busco2.err -n 17 \
"busco -i Nui_quickmerge2.fasta -l eudicots -o Nui_quickmerge2_busco -m geno -c 16 --datasets_version odb10"

Job <633835> is submitted to default queue <lowpriority>.


In [33]:
bsub -J busco3 -o ${WKDIR}/log/busco3.log -e ${WKDIR}/log/busco3.err -n 17 \
"busco -i Supernova_Nui.1.min2kb.fasta -l eudicots -o Supernova_Nui.1.min2kb_busco -m geno -c 16 --datasets_version odb10"


Job <633989> is submitted to default queue <lowpriority>.


In [34]:
module unload BBMap
module unload BUSCO

## Quickmerge

In [35]:
module load conda
conda activate hraijc_quickmerge
source /workspace/hraijc/git_clones/quickmerge/.quickmergerc


In [38]:
#remove whitespaces from Supernova Assembly
#sed 's/[[:space:]]*//g' ${WKDIR}/Assembly_comparison/Supernova_Nui.1.min2kb.fasta > ${WKDIR}/Assembly_comparison/Supernova_Nui.1.min2kb.fa



In [43]:
#mkdir ${WKDIR}/quickmerge1


In [6]:
cd ${WKDIR}/quickmerge1


In [7]:
bsub -J quickmerge1 -o ${WKDIR}/log/quickmerge1.log -e ${WKDIR}/log/quickmerge1.err \
"/workspace/hraijc/git_clones/quickmerge/merge_wrapper.py ${WKDIR}/Assembly_comparison/Flye03_assembly.fasta ${WKDIR}/Assembly_comparison/Supernova_Nui.1.min2kb.fasta -l 8705"

Job <633214> is submitted to default queue <lowpriority>.

#mkdir ${WKDIR}/quickmerge2

In [9]:
cd ${WKDIR}/quickmerge2


In [10]:
bsub -J quickmerge2 -o ${WKDIR}/log/quickmerge2.log -e ${WKDIR}/log/quickmerge2.err \
"/workspace/hraijc/git_clones/quickmerge/merge_wrapper.py ${WKDIR}/Assembly_comparison/Supernova_Nui.1.min2kb.fasta ${WKDIR}/Assembly_comparison/Flye03_assembly.fasta -l 212778"

Job <633215> is submitted to default queue <lowpriority>.
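The `-l` values passed to `merge_wrapper.py` in the two runs above (8705 and 212778) match the `scaf_L50` column of the stats table for the second assembly on each command line. Rather than copying the number by hand, it could be looked up from the stats output. The sketch below assumes the `statswrapper.sh format=3` output is tab-separated with a header row beginning with `n_scaffolds`, as in the table; the function name `stats_value` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: look up one column for one assembly in a tab-separated
# stats table. Assumes a header row whose first field is "n_scaffolds" and a
# "filename" column; other lines before the header (e.g. LSF log text) are skipped.
stats_value() {
    local table="$1" column="$2" assembly="$3"
    awk -F '\t' -v col="$column" -v asm="$assembly" '
        $1 == "n_scaffolds" {                 # header row: map names to positions
            for (i = 1; i <= NF; i++) idx[$i] = i
            next
        }
        ("filename" in idx) && (col in idx) {
            fn = $(idx["filename"])
            # match on the end of the path so full paths also work
            if (substr(fn, length(fn) - length(asm) + 1) == asm) {
                print $(idx[col])
                exit
            }
        }
    ' "$table"
}

# Usage (stats.tsv is a placeholder for the saved statswrapper output):
#   cutoff=$(stats_value stats.tsv scaf_L50 Flye03_assembly.fasta)
```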

In [26]:
mkdir ${WKDIR}/quickmerge3


In [41]:
cd ${WKDIR}/quickmerge3


In [42]:
bsub -J quickmerge3 -o ${WKDIR}/log/quickmerge3.log -e ${WKDIR}/log/quickmerge3.err \
"/workspace/hraijc/git_clones/quickmerge/merge_wrapper.py ${WKDIR}/Assembly_comparison/Nui_quickmerge2.fasta ${WKDIR}/Assembly_comparison/Flye03_assembly.fasta -l 317351"

Job <633993> is submitted to default queue <lowpriority>.

In [12]:
#cp /workspace/hraijc/BB_Nui_Assembly/Hybrid_assembly/quickmerge1/merged_out.fasta ${WKDIR}/Assembly_comparison/Nui_quickmerge1.fasta
#cp /workspace/hraijc/BB_Nui_Assembly/Hybrid_assembly/quickmerge2/merged_out.fasta ${WKDIR}/Assembly_comparison/Nui_quickmerge2.fasta
#cp /workspace/hraijc/BB_Nui_Assembly/Hybrid_assembly/quickmerge3/merged_out.fasta ${WKDIR}/Assembly_comparison/Nui_quickmerge3.fasta


In [46]:
conda deactivate
module unload conda

#### Did they improve?

In [8]:
module load BBMap/38.33

In [9]:
cd ${WKDIR}/Assembly_comparison

In [10]:
bsub -J bstat2 -o ${WKDIR}/log/bstat2.log -e ${WKDIR}/log/bstat2.err \
"statswrapper.sh in=Supernova_Nui.1.fasta,Supernova_Nui.1.min2kb.fasta,Flye03_assembly.fasta,Nui_quickmerge1.fasta,Nui_quickmerge2.fasta,Nui_quickmerge3.fasta,Flye04_assembly.fasta format=3"

Job <636361> is submitted to default queue <lowpriority>.


In [11]:
module unload BBMap/38.33

|n_scaffolds|n_contigs|scaf_bp|contig_bp|gap_pct|scaf_N50|scaf_L50|ctg_N50|ctg_L50|scaf_N90|scaf_L90|ctg_N90|ctg_L90|scaf_max|ctg_max|scaf_n_gt50K|scaf_pct_gt50K|gc_avg|gc_std|filename|
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|294493|327732|890445603|879628983|1.215|31787|6211|40402|5258|172322|1055|196312|989|487194|127919|843|8.239|0.38910|0.04736|Supernova_Nui.1.fasta|
|105550|132451|706271084|695673804|1.500|19187|8705|25361|7191|76634|2942|91036|2603|487194|127919|843|10.387|0.39126|0.03921|Supernova_Nui.1.min2kb.fasta|
|22050|22050|1804096927|1804096927|0.000|2301|212778|2301|212778|9403|43723|9403|43723|1907487|1907487|8662|88.079|0.38702|0.03932|Flye03_assembly.fasta|
|21859|22159|1802844220|1802206750|0.035|2285|213713|2297|212521|9343|43890|9424|43327|1907487|1907487|8628|88.140|0.38698|0.03928|Nui_quickmerge1.fasta|
|62183|77870|1264110310|1261613670|0.198|1224|317351|1228|316036|23729|5260|27929|4553|1942722|1942722|2508|72.941|0.38879|0.03949|Nui_quickmerge2.fasta|
|62180|77867|1265844424|1263347784|0.197|1223|317509|1228|316117|23694|5266|27891|4558|1942722|1942722|2510|72.979|0.38881|0.03949|Nui_quickmerge3.fasta|
|18923|18923|1804003865|1804003865|0.000|2279|212967|2279|212967|9115|47213|9115|47213|2121296|2121296|8821|89.209|0.38687|0.03763|Flye04_assembly.fasta|

In [22]:
tail -n 13 ${WKDIR}/Assembly_comparison/Flye03_busco/short*

	***** Results: *****

	C:97.1%[S:13.5%,D:83.6%],F:0.7%,M:2.2%,n:2326	   
	2259	Complete BUSCOs (C)			   
	315	Complete and single-copy BUSCOs (S)	   
	1944	Complete and duplicated BUSCOs (D)	   
	16	Fragmented BUSCOs (F)			   
	51	Missing BUSCOs (M)			   
	2326	Total BUSCO groups searched		   

Dependencies and versions:
	hmmsearch: 3.1
	metaeuk: 4.a0f584d
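To compare several BUSCO runs side by side, the one-line score string can be pulled out of each short summary instead of reading the full `tail` output. A minimal sketch, assuming the summary contains a line in the format shown above; the function names `busco_scores` and `busco_duplicated_pct` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for reading a BUSCO short summary file.

# Print the compact score string, e.g. C:97.1%[S:13.5%,D:83.6%],F:0.7%,M:2.2%,n:2326
busco_scores() {
    grep -m 1 -o 'C:[0-9.]*%\[S:[0-9.]*%,D:[0-9.]*%\],F:[0-9.]*%,M:[0-9.]*%,n:[0-9]*' "$1"
}

# Print only the "complete and duplicated" percentage (the D: field).
busco_duplicated_pct() {
    busco_scores "$1" | sed 's/.*D:\([0-9.]*\)%.*/\1/'
}

# Usage:
#   for f in */short*; do printf '%s\t%s\n' "$f" "$(busco_scores "$f")"; done
```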


## New Strategy

Trying again with:

- Both pseudohaplotypes of the Supernova assembly.
- ONT reads filtered to a minimum length of 5 kb and a minimum quality score of 10.

In [23]:
ls /output/genomic/plant/Vaccinium/corymbosum/2021_GenomeAssembly/Nui/01_Supernova/*.min2KB.fasta

/output/genomic/plant/Vaccinium/corymbosum/2021_GenomeAssembly/Nui/01_Supernova/Nui.1.min2KB.fasta
/output/genomic/plant/Vaccinium/corymbosum/2021_GenomeAssembly/Nui/01_Supernova/Nui.2.min2KB.fasta


In [17]:
#Combine pseudohaps of the Supernova Assembly.
bsub -J cat -o ${WKDIR}/log/cat.log -e ${WKDIR}/log/cat.err \
"cat /output/genomic/plant/Vaccinium/corymbosum/2021_GenomeAssembly/Nui/01_Supernova/*.min2KB.fasta > /workspace/hraijc/BB_Nui_Assembly/Hybrid_assembly/Supernova_Nui_min2kb.fasta"

Job <637585> is submitted to default queue <lowpriority>.


In [12]:
module load seqtk

In [20]:
#Rename fasta headers so they are unique
bsub -J seqtk -o ${WKDIR}/log/seqtk.log -e ${WKDIR}/log/seqtk.err -n 2 \
"seqtk rename /workspace/hraijc/BB_Nui_Assembly/Hybrid_assembly/Supernova_Nui_min2kb.fasta > /workspace/hraijc/BB_Nui_Assembly/Hybrid_assembly/Supernova_Nui_min2kb_u.fasta"

Job <637587> is submitted to default queue <lowpriority>.
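Concatenating the two pseudohaplotype files can leave the same sequence name in the combined FASTA more than once, which is why the headers are renamed. A quick way to confirm the problem before renaming, or that it is gone afterwards, is to list repeated names. The function name `fasta_duplicate_ids` is hypothetical, and only the first whitespace-delimited word of each header is compared.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list sequence names that occur more than once across
# one or more FASTA files. Only the first word of each header is compared.
fasta_duplicate_ids() {
    awk '/^>/ { sub(/^>/, ""); print $1 }' "$@" | sort | uniq -d
}

# Usage: an empty result means every header is unique.
#   fasta_duplicate_ids Supernova_Nui_min2kb_u.fasta
```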


In [13]:
#Rename fasta headers so they are unique
bsub -J seqtk3 -o ${WKDIR}/log/seqtk3.log -e ${WKDIR}/log/seqtk3.err -n 2 \
"seqtk rename /output/genomic/plant/Vaccinium/corymbosum/2021_GenomeAssembly/Nui/01_Supernova/Nui.1.min2KB.fasta > /workspace/hraijc/BB_Nui_Assembly/Hybrid_assembly/Supernova_Nui.1_min2kb_u.fasta"

Job <643192> is submitted to default queue <lowpriority>.


In [14]:
module unload seqtk

In [15]:
mv /workspace/hraijc/BB_Nui_Assembly/Hybrid_assembly/Supernova_Nui.1_min2kb_u.fasta /workspace/hraijc/BB_Nui_Assembly/Hybrid_assembly/Supernova_Nui.1_min2kb.fasta


#### Quickmerge

In [26]:
module load conda
conda activate hraijc_quickmerge
source /workspace/hraijc/git_clones/quickmerge/.quickmergerc


In [27]:
WKDIR=/workspace/hraijc/BB_Nui_Assembly/Hybrid_assembly
cd $WKDIR


In [28]:
#mkdir ${WKDIR}/quickmerge5


In [29]:
cd ${WKDIR}/quickmerge5


In [30]:
bsub -J quickmerge5 -o ${WKDIR}/log/quickmerge5.log -e ${WKDIR}/log/quickmerge5.err -R "rusage[mem=100000]" \
"/workspace/hraijc/git_clones/quickmerge/merge_wrapper.py /workspace/hraijc/BB_Nui_Assembly/ONT_Assemly/FLYE04/Flye04_assembly.fasta /workspace/hraijc/BB_Nui_Assembly/Hybrid_assembly/Supernova_Nui_min2kb.fasta -l 8694"

Job <637596> is submitted to default queue <lowpriority>.

In [31]:
#mkdir ${WKDIR}/quickmerge6


In [32]:
cd ${WKDIR}/quickmerge6


In [33]:
bsub -J quickmerge6 -o ${WKDIR}/log/quickmerge6.log -e ${WKDIR}/log/quickmerge6.err -R "rusage[mem=100000]" \
"/workspace/hraijc/git_clones/quickmerge/merge_wrapper.py /workspace/hraijc/BB_Nui_Assembly/Hybrid_assembly/Supernova_Nui_min2kb.fasta /workspace/hraijc/BB_Nui_Assembly/ONT_Assemly/FLYE04/Flye04_assembly.fasta -l 212967"

Job <637597> is submitted to default queue <lowpriority>.

In [2]:
mv ${WKDIR}/quickmerge5/merged_out.fasta ${WKDIR}/quickmerge5/Nui_quickmerge5.fasta
mv ${WKDIR}/quickmerge6/merged_out.fasta ${WKDIR}/quickmerge6/Nui_quickmerge6.fasta
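Because the merges run as batch jobs, `merged_out.fasta` may not exist yet (or may be empty) when the rename cell is run. A guarded version of the rename could look like the sketch below; the function name `rename_merged` is hypothetical, and the check only tests that the file is non-empty, not that the job has finished writing it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rename DIR/merged_out.fasta to DIR/NAME.fasta, but only
# if the merged file exists and is non-empty. Returns 1 otherwise.
rename_merged() {
    local dir="$1" name="$2"
    local src="${dir}/merged_out.fasta"
    if [ -s "$src" ]; then
        mv "$src" "${dir}/${name}.fasta"
    else
        echo "merged_out.fasta missing or empty in ${dir}; is the job still running?" >&2
        return 1
    fi
}

# Usage:
#   rename_merged "${WKDIR}/quickmerge5" Nui_quickmerge5
#   rename_merged "${WKDIR}/quickmerge6" Nui_quickmerge6
```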

In [47]:

conda deactivate 
module unload conda


In [3]:
module load BBMap/38.33

In [4]:
bsub -J bstat5 -o ${WKDIR}/log/bstat5.log -e ${WKDIR}/log/bstat5.err \
"statswrapper.sh in=${WKDIR}/quickmerge5/Nui_quickmerge5.fasta,${WKDIR}/quickmerge6/Nui_quickmerge6.fasta format=3"

Job <643170> is submitted to default queue <lowpriority>.


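### Round 3

Merging the two Supernova pseudohaplotypes was a bad idea. Using longer ONT reads did improve the Flye assembly (18,923 contigs in Flye04 versus 22,050 in Flye03; longest contig 2.12 Mb versus 1.91 Mb).

Let's merge the Nui.1 pseudohaplotype with the new Flye assembly and stop there.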

In [17]:
module load conda
conda activate hraijc_quickmerge
source /workspace/hraijc/git_clones/quickmerge/.quickmergerc


In [18]:
mkdir ${WKDIR}/quickmerge7
cd ${WKDIR}/quickmerge7


In [19]:
bsub -J quickmerge7 -o ${WKDIR}/log/quickmerge7.log -e ${WKDIR}/log/quickmerge7.err \
"/workspace/hraijc/git_clones/quickmerge/merge_wrapper.py ${WKDIR}/Supernova_Nui.1_min2kb.fasta /workspace/hraijc/BB_Nui_Assembly/ONT_Assemly/FLYE04/Flye04_assembly.fasta -l 212967"

Job <643194> is submitted to default queue <lowpriority>.

In [20]:
mkdir ${WKDIR}/quickmerge8
cd ${WKDIR}/quickmerge8


In [21]:
bsub -J quickmerge8 -o ${WKDIR}/log/quickmerge8.log -e ${WKDIR}/log/quickmerge8.err \
"/workspace/hraijc/git_clones/quickmerge/merge_wrapper.py /workspace/hraijc/BB_Nui_Assembly/ONT_Assemly/FLYE04/Flye04_assembly.fasta ${WKDIR}/Supernova_Nui.1_min2kb.fasta -l 8705"

Job <643195> is submitted to default queue <lowpriority>.

In [22]:
conda deactivate 
module unload conda

In [2]:
mv ${WKDIR}/quickmerge7/merged_out.fasta ${WKDIR}/quickmerge7/Nui_quickmerge7.fasta
mv ${WKDIR}/quickmerge8/merged_out.fasta ${WKDIR}/quickmerge8/Nui_quickmerge8.fasta

In [5]:
module load BBMap/38.33

In [6]:
bsub -J bstat7 -o ${WKDIR}/log/bstat7.log -e ${WKDIR}/log/bstat7.err \
"statswrapper.sh in=${WKDIR}/quickmerge7/Nui_quickmerge7.fasta,${WKDIR}/quickmerge8/Nui_quickmerge8.fasta format=3"

Job <644354> is submitted to default queue <lowpriority>.


In [2]:
module unload BBMap

In [5]:
pwd

/workspace/hraijc/BB_Nui_Assembly/Hybrid_assembly


In [7]:
module load BUSCO/v5.2.2

In [8]:
cd ${WKDIR}/quickmerge7
bsub -J busco4 -o ${WKDIR}/log/busco4.log -e ${WKDIR}/log/busco4.err -n 17 \
"busco -i Nui_quickmerge7.fasta -l eudicots -o Nui_quickmerge7_busco -m geno -c 16 --datasets_version odb10"
cd ${WKDIR}

Job <645674> is submitted to default queue <lowpriority>.


In [9]:
cd ${WKDIR}/quickmerge8
bsub -J busco5 -o ${WKDIR}/log/busco5.log -e ${WKDIR}/log/busco5.err -n 17 \
"busco -i Nui_quickmerge8.fasta -l eudicots -o Nui_quickmerge8_busco -m geno -c 16 --datasets_version odb10"
cd ${WKDIR}

Job <645675> is submitted to default queue <lowpriority>.
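The BUSCO submissions in this notebook all follow the same pattern, with only the job name and input FASTA changing and `-n` set to one more than the thread count. A dry-run sketch that prints the command instead of submitting it is shown below, so the line can be checked before it goes to the queue. The function name `busco_job` is hypothetical; the flags are the ones already used above.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: print (do not run) the bsub/busco command used
# in this notebook for a given job name, FASTA file and thread count.
# Requires WKDIR to be set; requests one more slot than BUSCO threads,
# mirroring the -n 17 / -c 16 pairs above.
busco_job() {
    local job="$1" fasta="$2" threads="${3:-16}"
    local out
    out="$(basename "$fasta" .fasta)_busco"
    printf 'bsub -J %s -o %s/log/%s.log -e %s/log/%s.err -n %d "busco -i %s -l eudicots -o %s -m geno -c %d --datasets_version odb10"\n' \
        "$job" "$WKDIR" "$job" "$WKDIR" "$job" "$((threads + 1))" \
        "$fasta" "$out" "$threads"
}

# Usage (prints the command; pipe to bash to actually submit):
#   busco_job busco4 Nui_quickmerge7.fasta
```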


In [10]:
module unload BUSCO