## Data sources

- Hi-C data for Nui is here:
    - /input/genomic/plant/Vaccinium/corymbosum/AGRF_CAGRF21434_HJWHFDRXX


- 10X data for Nui and M7 here:
    - /input/genomic/plant/Vaccinium/corymbosum/AGRF_CAGRF18813_H7JY3DRXX


- ONT PromethION Nui (BB2020 and BB2020-2 are the same sample) here:
    - /input/genomic/plant/Vaccinium/corymbosum/Blueberry_PromethION_Apr2020


- ONT MinION Nui (BB2020) here:
    - /input/genomic/plant/Vaccinium/corymbosum/CAGRF21436/20200224_MinION/AGRF_CAGRFF21436_FAL87845_BB2020/


- 10X Supernova assembly of the 10X data here:
    - /output/genomic/plant/Vaccinium/corymbosum/2021_GenomeAssembly/Nui/01_Supernova

### Plan
- Base-call the ONT samples using Guppy v5.
- Filter out MinION reads shorter than 1 kb (or use a higher threshold).
- Cecilia has done the Supernova assembly for the 10X data.
- Use Flye to assemble the ONT FASTQ reads.
- Use quickmerge to merge the Supernova contigs with the ONT contigs.
- Use SALSA to scaffold the assembly with the Hi-C data.
- Tetraploid haplotyping, gene annotation, etc.
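The read-length filter in the plan would normally be done with a dedicated read-filtering tool. As a dependency-free illustration only, here is a minimal awk sketch; it assumes uncompressed, four-line FASTQ records, and the function and file names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: keep FASTQ records whose sequence is at least MIN_LEN bp long.
# Assumes plain four-line records (header, sequence, '+', qualities).
filter_fastq_minlen() {
    local min_len=$1
    awk -v min="$min_len" '
        NR % 4 == 1 { header = $0 }
        NR % 4 == 2 { seq = $0 }
        NR % 4 == 3 { plus = $0 }
        NR % 4 == 0 {
            # Emit the whole record only if the read is long enough
            if (length(seq) >= min) { print header; print seq; print plus; print $0 }
        }'
}

# Hypothetical usage: drop reads shorter than 1 kb from gzipped base-caller output
# zcat BB2020_pass.fastq.gz | filter_fastq_minlen 1000 > BB2020_min1kb.fastq
```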



**See 01_basecalling_ONT.ipynb for the ONT base-calling steps**

**See 02_flye.ipynb for the ONT assembly**

**See 03_CombineAssemblies.ipynb for the quickmerge steps**


## SALSA

### Mapping Hi-C reads to the hybrid assembly

In [5]:
ASSEMBLY=/workspace/hraijc/BB_Nui_Assembly/ONT_Assemly/FLYE04/Flye04_assembly.fasta
WKDIR=/workspace/hraijc/BB_Nui_Assembly/Salsa
SALSA_OUTDIR=/workspace/hraijc/BB_Nui_Assembly/Salsa/Nui_SALSA_02
HiC_bam=/workspace/hraijc/BB_Nui_Assembly/Hi-C_mapping_flye4/flye4_HiC_dedup.bam
HiC_bed=${WKDIR}/flye4_HiC_dedup.bed
TEMPDIR=${WKDIR}/temp

In [18]:
mkdir -p ${SALSA_OUTDIR} ${WKDIR}/log

In [19]:
ml load bedtools/2.30.0

In [20]:
#Convert to bed for SALSA.
cd ${WKDIR}
bsub -J bamToBed2 -o ${WKDIR}/log/bamToBed2.log -e ${WKDIR}/log/bamToBed2.err -n 2 \
"bamToBed -i ${HiC_bam} > ${HiC_bed}"


Job <709739> is submitted to default queue <lowpriority>.
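As we read the SALSA README, the BED file should be sorted by read name (column 4) so that the two mates of each pair sit on adjacent lines, while `bamToBed` simply keeps the order of the BAM. A minimal sketch with hypothetical function names that checks the order and, if needed, sorts the file in place; the optional second argument can be `${TEMPDIR}` (defined above) for sort's temporary files.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): make sure a bamToBed output is sorted by read
# name (BED column 4), using byte-wise C-locale ordering.
TAB=$(printf '\t')

# Returns 0 if the BED is already sorted by read name, non-zero otherwise.
bed_is_name_sorted() {
    LC_ALL=C sort -c -t "$TAB" -k4,4 "$1" 2>/dev/null
}

# Sorts the BED in place; the optional second argument is a directory for
# sort's temporary files (large Hi-C BED files can need a lot of space).
sort_bed_by_read_name() {
    local bed=$1 tmpdir=${2:-${TMPDIR:-/tmp}}
    LC_ALL=C sort -t "$TAB" -k4,4 -T "$tmpdir" "$bed" > "${bed}.tmp" \
        && mv "${bed}.tmp" "$bed"
}

# Hypothetical usage:
# bed_is_name_sorted "${HiC_bed}" || sort_bed_by_read_name "${HiC_bed}" "${TEMPDIR}"
```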


In [22]:
ml unload bedtools

In [23]:
# SALSA v2.3 is run from a git clone and needs Python 2
ml load pfr-python2


In [24]:
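# -e DNASE: no restriction-enzyme sites; -m yes: misassembly correction;
# -g: assembly graph (GFA); -i 3: number of scaffolding iterations
bsub -J salsa01 -o ${WKDIR}/log/salsa01.log -e ${WKDIR}/log/salsa01.err -n 2 -R "rusage[mem=32000]" \
"python2 /workspace/hraijc/git_clones/SALSA/run_pipeline.py -b ${HiC_bed} -a ${ASSEMBLY} -l ${ASSEMBLY}.fai -o ${SALSA_OUTDIR} -e DNASE -m yes -g contigs_graph.gfa -i 3"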


Job <709740> is submitted to default queue <lowpriority>.
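SALSA reads contig lengths from the `.fai` index passed with `-l`, and `-g contigs_graph.gfa` is a relative path resolved from the current directory (`${WKDIR}` after the earlier `cd`). A minimal pre-flight sketch, with a hypothetical function name, that reports missing or empty inputs before a job is submitted:

```shell
#!/usr/bin/env bash
# Sketch: check that every file passed as an argument exists and is non-empty.
# Reports each problem on stderr and returns 1 if any file is missing or empty.
check_salsa_inputs() {
    local status=0 f
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Hypothetical usage, run from ${WKDIR} because -g contigs_graph.gfa is relative:
# check_salsa_inputs "${HiC_bed}" "${ASSEMBLY}" "${ASSEMBLY}.fai" contigs_graph.gfa \
#     || echo "fix inputs before submitting SALSA"
# If only the index is missing, `samtools faidx "${ASSEMBLY}"` writes ${ASSEMBLY}.fai.
```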


In [2]:
# Copy the SALSA output to a renamed file
bsub -I "cp /workspace/hraijc/BB_Nui_Assembly/Salsa/Nui_SALSA_02/scaffolds_FINAL.fasta /workspace/hraijc/BB_Nui_Assembly/Salsa/Nui_Salsa02.fasta"

Job <709893> is submitted to default queue <lowpriority>.
<<Waiting for dispatch ...>>
<<Starting on aklppg34>>


In [2]:
module load BBMap/38.33
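The statistics and BUSCO cells below read `Nui_Salsa02_min100kb.fasta`, which none of the cells in this section create. Assuming it holds the SALSA scaffolds of at least 100 kb, one possible way to produce it is the awk sketch below (hypothetical function name); BBMap's `reformat.sh` with `minlength=100000` should do the same job.

```shell
#!/usr/bin/env bash
# Sketch: keep FASTA records whose total sequence length is >= MIN_LEN.
# Works on wrapped (multi-line) FASTA by buffering each record until the
# next header, then printing it only if it is long enough.
filter_fasta_minlen() {
    local min_len=$1 in_fa=$2
    awk -v min="$min_len" '
        function flush() { if (name != "" && len >= min) printf "%s", record }
        /^>/ { flush(); name = $0; record = $0 "\n"; len = 0; next }
        { record = record $0 "\n"; len += length($0) }
        END { flush() }
    ' "$in_fa"
}

# Hypothetical usage with the paths from the cells above:
# filter_fasta_minlen 100000 /workspace/hraijc/BB_Nui_Assembly/Salsa/Nui_Salsa02.fasta \
#     > /workspace/hraijc/BB_Nui_Assembly/Salsa/Nui_Salsa02_min100kb.fasta
```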

In [7]:
bsub -J bstat1 -o ${WKDIR}/log/bstat1.log -e ${WKDIR}/log/bstat1.err \
"statswrapper.sh in=/workspace/hraijc/BB_Nui_Assembly/Salsa/Nui_Salsa02_min100kb.fasta format=6"

Job <710490> is submitted to default queue <lowpriority>.


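Summary statistics from `statswrapper.sh` for the two SALSA runs. In BBMap's output, the `*_N50`/`*_N90` columns are sequence counts and the `*_L50`/`*_L90` columns are the corresponding lengths in bp, the reverse of the more common convention.

| n_scaffolds | n_contigs | scaf_bp | contig_bp | gap_pct | scaf_N50 | scaf_L50 | ctg_N50 | ctg_L50 | scaf_N90 | scaf_L90 | ctg_N90 | ctg_L90 | scaf_max | ctg_max | scaf_n_gt50K | scaf_pct_gt50K | gc_avg | gc_std | filename |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 13992 | 18982 | 1804569893 | 1801758083 | 0.156 | 1820 | 302070 | 2303 | 212633 | 6336 | 72535 | 9142 | 47021 | 2121296 | 2121296 | 7362 | 93.446 | 0.38685 | 0.04052 | Nui_Salsa01 |
| 14416 | 18956 | 1806273865 | 1804003865 | 0.126 | 1908 | 287852 | 2304 | 212633 | 6633 | 69878 | 9146 | 47182 | 2121296 | 2121296 | 7615 | 93.226 | 0.38687 | 0.03994 | Nui_Salsa02 |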
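Because BBMap swaps the usual N50/L50 naming, it can help to recompute these values independently. A minimal sketch (hypothetical function name) that reads one sequence length per line, for example column 2 of a `.fai` index, and prints the count (BBMap's N50) followed by the length (BBMap's L50); tie handling at the 50% boundary may differ slightly from BBMap's own implementation.

```shell
#!/usr/bin/env bash
# Sketch: BBMap-style N50/L50 from one sequence length per line on stdin.
# Prints "<count> <length>": how many of the longest sequences are needed to
# reach half of the total length, and the length of the last one included.
n50_l50() {
    sort -k1,1nr | awk '
        { len[NR] = $1; total += $1 }
        END {
            for (i = 1; i <= NR; i++) {
                cum += len[i]
                if (cum >= total / 2) { print i, len[i]; exit }
            }
        }'
}

# Hypothetical usage on a FASTA index (column 2 of a .fai is the sequence length):
# cut -f2 Nui_Salsa02.fasta.fai | n50_l50
```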

In [9]:
module load BUSCO/v5.2.2
bsub -J buscosalsa2_min100kb -o ${WKDIR}/log/buscosalsa2_min100kb.log -e ${WKDIR}/log/buscosalsa2_min100kb.err -n 25 \
"busco -i /workspace/hraijc/BB_Nui_Assembly/Salsa/Nui_Salsa02_min100kb.fasta -l eudicots -o buscosalsa2_min100kb_busco -m geno -c 24 --datasets_version odb10"

Job <710492> is submitted to default queue <lowpriority>.
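BUSCO v5 writes a short summary text file into its output directory (`buscosalsa2_min100kb_busco` here) that contains a one-line completeness string. A minimal sketch (hypothetical function name, assuming the v5.2 `C:..%[S:..%,D:..%],F:..%,M:..%,n:..` layout) that extracts that line so several assemblies can be compared side by side:

```shell
#!/usr/bin/env bash
# Sketch: print the one-line BUSCO completeness string from a short summary file.
busco_completeness() {
    grep -o 'C:[0-9.]*%\[S:[0-9.]*%,D:[0-9.]*%],F:[0-9.]*%,M:[0-9.]*%,n:[0-9]*' "$1" | head -n 1
}

# Hypothetical usage (assumes the summary file name starts with short_summary):
# for f in buscosalsa2_min100kb_busco/short_summary*.txt; do
#     echo "$f: $(busco_completeness "$f")"
# done
```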


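**NOTE: the cells below were copied from the raspberry (*Rubus idaeus*) Omni-C workflow and still point at its files (`rasp_SALSA_04`, `rasp_SALSA_05_mapping`, `Wakefield_genome/HIC`). They need to be changed for the Nui assembly before re-running.**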

## Make Hi-C assembly and contact files for visualization in Juicebox
### Assembly file generation

In [5]:
ml load pfr-python2/2.7.13

In [6]:
cd /workspace/hraijc/Raspberry/202110_OmniC/rasp_SALSA_04

In [43]:
bsub -J juiceboxt1a -o ${WKDIR}/log/juiceboxt1a.log -e ${WKDIR}/log/juiceboxt1a.err -n 2 \
"python2 /powerplant/workspace/hraijc/git_clones/juicebox_scripts/juicebox_scripts/makeAgpFromFasta.py scaffolds_FINAL.fasta scaffolds_FINAL.agp"


Job <627790> is submitted to default queue <lowpriority>.


In [44]:
bsub -J juiceboxt1b -o ${WKDIR}/log/juiceboxt1b.log -e ${WKDIR}/log/juiceboxt1b.err -n 2 \
"python2 /powerplant/workspace/hraijc/git_clones/juicebox_scripts/juicebox_scripts/agp2assembly.py scaffolds_FINAL.agp scaffolds_FINAL.assembly"

Job <627828> is submitted to default queue <lowpriority>.
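The two juicebox_scripts calls above turn the SALSA FASTA into an AGP and then a 3D-DNA/Juicebox `.assembly` file. As we understand that format, it lists each sequence as `>name index length`, followed by one line per scaffold of space-separated signed sequence indices (negative meaning reverse-complemented). A minimal sketch (hypothetical function name) of the simplest case, in which every FASTA record is one contig in its own scaffold; it does not split scaffolds at N gaps, so it is not a drop-in replacement for the scripts above.

```shell
#!/usr/bin/env bash
# Sketch: write a .assembly file that treats every FASTA record as a single
# contig in its own scaffold (no splitting at N gaps, all forward orientation).
fasta_to_simple_assembly() {
    awk '
        /^>/ {
            if (name != "") { n++; names[n] = name; lens[n] = len }
            name = substr($1, 2); len = 0; next
        }
        { len += length($0) }
        END {
            if (name != "") { n++; names[n] = name; lens[n] = len }
            for (i = 1; i <= n; i++) print ">" names[i], i, lens[i]   # sequence lines
            for (i = 1; i <= n; i++) print i                          # one scaffold each
        }' "$1"
}

# Hypothetical usage:
# fasta_to_simple_assembly scaffolds_FINAL.fasta > scaffolds_simple.assembly
```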


### Contact map generation
#### Map Hi-C reads to the new assembly

In [2]:
NEWASSEMBLY=/workspace/hraijc/Raspberry/202110_OmniC/rasp_SALSA_04/scaffolds_FINAL.fasta
HiC_RAW=/input/genomic/plant/Rubus/idaeus/Wakefield_genome/HIC


In [3]:
mkdir -p /workspace/hraijc/Raspberry/202110_OmniC/rasp_SALSA_05_mapping
cd /workspace/hraijc/Raspberry/202110_OmniC/rasp_SALSA_05_mapping


In [6]:
module load bwa/0.7.17


In [41]:
#Index Salsa assembly
bsub -J bwain -o ${WKDIR}/log/bwain.log -e ${WKDIR}/log/bwain.err -n 1 \
"bwa index ${NEWASSEMBLY}"

Job <475683> is submitted to default queue <lowpriority>.


In [7]:
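# Map Hi-C reads to the SALSA assembly.
# -5SP (Hi-C settings): -5 takes the alignment with the smallest coordinate as primary
# for split reads, -S skips mate rescue and -P skips pairing.
bsub -J rasbwamem2 -o ${WKDIR}/log/rasbwamem5.log -e ${WKDIR}/log/rasbwamem5.err -n 25 \
"bwa mem -5SP -t24 ${NEWASSEMBLY} ${HiC_RAW}/RI_Hi-C_S2_R1_001.fastq.gz ${HiC_RAW}/RI_Hi-C_S2_R2_001.fastq.gz -o r4_alignedHiC.sam"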


Job <627138> is submitted to default queue <lowpriority>.


In [10]:
module unload bwa
module load samtools/0.1.19

In [11]:
# Flag PCR duplicates: samblaster marks them with SAM flag 1024 (0x400) but does not remove them.
bsub -J samblaster5 -o ${WKDIR}/log/samblaster5.log -e ${WKDIR}/log/samblaster5.err -n 2 \
"/workspace/hraijc/git_clones/samblaster/samblaster -i r4_alignedHiC.sam -o predupe.sam"

Job <628106> is submitted to default queue <lowpriority>.


In [12]:
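# Keep primary alignments with both mates mapped: -F 2316 = 4 (unmapped) + 8 (mate unmapped)
# + 256 (secondary) + 2048 (supplementary). Duplicates flagged by samblaster (1024) are NOT
# removed by this filter (see the flagstat of r5_clean_HiC.bam below); -F 3340 would drop them too.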
bsub -J pcrfilt5 -o ${WKDIR}/log/pcrfilt5.log -e ${WKDIR}/log/pcrfilt5.err -n 9 \
"samtools view -@ 8 -S -h -b -F 2316 -o r5_clean_HiC.bam predupe.sam"

Job <628147> is submitted to default queue <lowpriority>.
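The `-F 2316` filter above is easier to read once decoded into its individual bits. A small bash sketch (hypothetical function name; bit names follow the SAM specification as printed by `samtools flags`) that lists the bits set in a FLAG value:

```shell
#!/usr/bin/env bash
# Sketch: list the bits set in a SAM FLAG value, one "<value> <name>" per line.
decode_sam_flag() {
    local flag=$1 bit
    local -a names=(PAIRED PROPER_PAIR UNMAP MUNMAP REVERSE MREVERSE
                    READ1 READ2 SECONDARY QCFAIL DUP SUPPLEMENTARY)
    for bit in "${!names[@]}"; do
        if (( flag & (1 << bit) )); then
            echo "$(( 1 << bit )) ${names[bit]}"
        fi
    done
}

# decode_sam_flag 2316   # 4 UNMAP, 8 MUNMAP, 256 SECONDARY, 2048 SUPPLEMENTARY
# Adding the duplicate bit gives the stricter filter: $(( 2316 | 1024 )) is 3340.
```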


#### Copy assembly files

In [4]:
cp /workspace/hraijc/Raspberry/202110_OmniC/rasp_SALSA_04/scaffolds_FINAL* .

#### Make contact map

In [5]:
#Make links file
bsub -J matlock5 -o ${WKDIR}/log/matlock5.log -e ${WKDIR}/log/matlock5.err -n 2 \
"/powerplant/workspace/hraijc/git_clones/matlock/bin/matlock bam2 juicer r5_clean_HiC.bam r5_clean_HiC.links.txt"

Job <628591> is submitted to default queue <lowpriority>.


In [10]:
#Sort links file
bsub -J linksort5 -o ${WKDIR}/log/linksort5.log -e ${WKDIR}/log/linksort5.err -n 2 \
"sort -k2,2 -k6,6 r5_clean_HiC.links.txt > r5_clean_HiC.sorted.links.txt"

Job <628686> is submitted to default queue <lowpriority>.
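A quick check on the sorted links file is the balance of intra- versus inter-scaffold contacts. A minimal sketch (hypothetical function name) assuming the two scaffold names are in columns 2 and 6, the same columns used as sort keys in the cell above:

```shell
#!/usr/bin/env bash
# Sketch: count intra- vs inter-scaffold Hi-C links in a juicer-style links file,
# assuming scaffold names in columns 2 and 6.
count_link_types() {
    awk '{ if ($2 == $6) intra++; else inter++ }
         END { printf "intra\t%d\ninter\t%d\n", intra, inter }' "$1"
}

# Hypothetical usage:
# count_link_types r5_clean_HiC.sorted.links.txt
```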

In [18]:
#Make Juicer file
bsub -J 3ddna5 -o ${WKDIR}/log/3ddna5.log -e ${WKDIR}/log/3ddna5.err -n 2 \
"/powerplant/workspace/hraijc/git_clones/matlock/3d-dna/visualize/run-assembly-visualizer.sh -p false scaffolds_FINAL.assembly r5_clean_HiC.sorted.links.txt"

Job <628709> is submitted to default queue <lowpriority>.


### hic_qc of the SALSA assembly

In [16]:
module load conda

In [17]:
conda activate hraijc_hic_qc2


In [8]:
mkdir hicqc_1M_rasp_salsa_05


In [9]:
bsub -J hic_qc2M5 -o ${WKDIR}/log/hic_qc2M5.log -e ${WKDIR}/log/hic_qc2M5.err -n 1 \
"python /workspace/hraijc/git_clones/hic_qc/hic_qc.py -n 10000000 -b r5_clean_HiC.bam -o hicqc_1M_rasp_salsa_05"

Job <628592> is submitted to default queue <lowpriority>.

In [11]:
conda deactivate
module unload conda

In [20]:
module load samtools

In [22]:
cd /workspace/hraijc/Raspberry/202110_OmniC/rasp_SALSA_05_mapping/

In [24]:
bsub -J flag1 -o ${WKDIR}/log/flag1.log -e ${WKDIR}/log/flag1.err -n 11 \
"samtools flagstat -@ 10 r4_alignedHiC.sam"

Job <630306> is submitted to default queue <lowpriority>.


In [25]:
bsub -J flag2 -o ${WKDIR}/log/flag2.log -e ${WKDIR}/log/flag2.err -n 11 \
"samtools flagstat -@ 10 r5_clean_HiC.bam"

Job <630307> is submitted to default queue <lowpriority>.


In [33]:
bsub -J flag3 -o ${WKDIR}/log/flag3.log -e ${WKDIR}/log/flag3.err -n 11 \
"samtools flagstat -@ 10 predupe.sam"

Job <630309> is submitted to default queue <lowpriority>.


In [31]:
echo r4_alignedHiC.sam
head -n 15 ${WKDIR}/log/flag1.log


r4_alignedHiC.sam
1093604787 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
78185565 + 0 supplementary
0 + 0 duplicates
1057055859 + 0 mapped (96.66% : N/A)
1015419222 + 0 paired in sequencing
507709611 + 0 read1
507709611 + 0 read2
0 + 0 properly paired (0.00% : N/A)
970621286 + 0 with itself and mate mapped
8249008 + 0 singletons (0.81% : N/A)
371470024 + 0 with mate mapped to a different chr
88812858 + 0 with mate mapped to a different chr (mapQ>=5)

------------------------------------------------------------


In [32]:
echo r5_clean_HiC.bam
head -n 15 ${WKDIR}/log/flag2.log


r5_clean_HiC.bam
970621286 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
860232586 + 0 duplicates
970621286 + 0 mapped (100.00% : N/A)
970621286 + 0 paired in sequencing
485310643 + 0 read1
485310643 + 0 read2
0 + 0 properly paired (0.00% : N/A)
970621286 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
371470024 + 0 with mate mapped to a different chr
88812858 + 0 with mate mapped to a different chr (mapQ>=5)

------------------------------------------------------------


In [37]:
echo predupe.sam
head -n 15 ${WKDIR}/log/flag3.log

predupe.sam
1093604787 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
78185565 + 0 supplementary
939833707 + 0 duplicates
1057055859 + 0 mapped (96.66% : N/A)
1015419222 + 0 paired in sequencing
507709611 + 0 read1
507709611 + 0 read2
0 + 0 properly paired (0.00% : N/A)
970621286 + 0 with itself and mate mapped
8249008 + 0 singletons (0.81% : N/A)
371470024 + 0 with mate mapped to a different chr
88812858 + 0 with mate mapped to a different chr (mapQ>=5)

------------------------------------------------------------
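To put the `duplicates` counts above in context, a minimal sketch (hypothetical function name) that turns a flagstat report into a duplicate percentage. It matches only the line whose fourth field is exactly `duplicates`, so a `primary duplicates` line, which newer samtools versions may print, is ignored.

```shell
#!/usr/bin/env bash
# Sketch: percentage of records flagged as duplicates in a samtools flagstat report.
flagstat_dup_pct() {
    awk '
        $4 == "in" && $5 == "total" { total = $1 }
        $4 == "duplicates"          { dups = $1 }
        END { if (total > 0) printf "%.2f\n", 100 * dups / total }' "$1"
}

# Hypothetical usage with the LSF logs written above:
# for i in 1 2 3; do echo "flag${i}: $(flagstat_dup_pct ${WKDIR}/log/flag${i}.log)%"; done
```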


## Compare the Flye assembly with the 10X (Supernova) assembly

In [1]:
WKDIR=/workspace/hraijc/BB_Nui_Assembly/Hybrid_assembly


In [3]:
mkdir -p ${WKDIR}/log


In [2]:
cd $WKDIR

#### Did the merged assemblies improve?

In [5]:
module load BBMap/38.33

In [6]:
cd ${WKDIR}/Assembly_comparison

In [7]:
bsub -J bstat2 -o ${WKDIR}/log/bstat2.log -e ${WKDIR}/log/bstat2.err \
"statswrapper.sh in=Supernova_Nui.1.fasta,Supernova_Nui.1.min2kb.fasta,Flye03_assembly.fasta,Nui_quickmerge1.fasta,Nui_quickmerge2.fasta format=3"

Job <633822> is submitted to default queue <lowpriority>.


In [8]:
module unload BBMap/38.33

Statistics for the files in `/powerplant/workspace/hraijc/BB_Nui_Assembly/Hybrid_assembly/Assembly_comparison/`:

| n_scaffolds | n_contigs | scaf_bp | contig_bp | gap_pct | scaf_N50 | scaf_L50 | ctg_N50 | ctg_L50 | scaf_N90 | scaf_L90 | ctg_N90 | ctg_L90 | scaf_max | ctg_max | scaf_n_gt50K | scaf_pct_gt50K | gc_avg | gc_std | filename |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 294493 | 327732 | 890445603 | 879628983 | 1.215 | 31787 | 6211 | 40402 | 5258 | 172322 | 1055 | 196312 | 989 | 487194 | 127919 | 843 | 8.239 | 0.38910 | 0.04736 | Supernova_Nui.1.fasta |
| 105550 | 132451 | 706271084 | 695673804 | 1.500 | 19187 | 8705 | 25361 | 7191 | 76634 | 2942 | 91036 | 2603 | 487194 | 127919 | 843 | 10.387 | 0.39126 | 0.03921 | Supernova_Nui.1.min2kb.fasta |
| 22050 | 22050 | 1804096927 | 1804096927 | 0.000 | 2301 | 212778 | 2301 | 212778 | 9403 | 43723 | 9403 | 43723 | 1907487 | 1907487 | 8662 | 88.079 | 0.38702 | 0.03932 | Flye03_assembly.fasta |
| 21859 | 22159 | 1802844220 | 1802206750 | 0.035 | 2285 | 213713 | 2297 | 212521 | 9343 | 43890 | 9424 | 43327 | 1907487 | 1907487 | 8628 | 88.140 | 0.38698 | 0.03928 | Nui_quickmerge1.fasta |
| 62183 | 77870 | 1264110310 | 1261613670 | 0.198 | 1224 | 317351 | 1228 | 316036 | 23729 | 5260 | 27929 | 4553 | 1942722 | 1942722 | 2508 | 72.941 | 0.38879 | 0.03949 | Nui_quickmerge2.fasta |
