# HiC scaffolding strategy (Ali's HiCtool) on bilberry contigs

## Background

* ALLHIC did not produce a reasonable chromosome-level assembly of the bilberry genome. We found that the bilberry contigs show very good synteny with the blueberry genome, and this synteny can be used for chromosome-level assembly. We decided to generate synteny-based bilberry pseudochromosomes and then use the HiC data to correct any links that do not appear to be supported by synteny. Ali's HiCtool will be used for this correction based on HiC links.
* HiCtool is still under development by Ali and is currently being tested.

## 1. Generating input files for Ali's HiCtool

In [1]:
WORKDIR=/workspace/hraczw/github/GA/Bilberry_genome/012.HiC.ALLHIC

### 1.1 produce HiC contact map file 

The BAM file is first extracted and converted to a qualFilt.txt file; the counts are then summed with an R script and saved to a compressed .rds file.
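The binning step of this conversion can be illustrated with a small Python sketch. It is not the dryhic `reads_to_bins_whole.awk` script; it only assumes that each read pair is assigned to a window of `w` bp (1000 here, matching `-v w=1000`) and that a bin is labelled by its start coordinate. The function names and toy read pairs are hypothetical.

```python
from collections import Counter


def bin_start(pos, w=1000):
    """Return the start coordinate of the w-bp bin containing pos (assumed convention)."""
    return (pos // w) * w


def pairs_to_bins(pairs, w=1000):
    """Count read pairs per (contig, bin, mate contig, mate bin) combination."""
    counts = Counter()
    for rname, pos, mrnm, mpos in pairs:
        counts[(rname, bin_start(pos, w), mrnm, bin_start(mpos, w))] += 1
    return counts


# Toy read pairs: (contig, position, mate contig, mate position)
pairs = [
    ("contig_1", 1250, "contig_2", 300),
    ("contig_1", 1900, "contig_2", 999),
    ("contig_1", 5000, "contig_1", 7400),
]

binned = pairs_to_bins(pairs)
for (rname, pos, mrnm, mpos), n in sorted(binned.items()):
    print(rname, pos, mrnm, mpos, n, sep="\t")
```

The first two toy pairs fall into the same pair of bins, so they collapse into a single row with a count of 2.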

In [2]:
module load samtools
module list

Currently Loaded Modulefiles:
  1) powerPlant/core    4) git/2.21.0         7) asub/2.1
  2) texlive/20151117   5) perlbrew/0.76      8) samtools/1.9
  3) pandoc/1.19.2      6) perl/5.28.0


In [None]:
# generate the 1 kb contact file for the shasta_i1 assembly

command="samtools view $WORKDIR/mapped_shasta_i1.REduced.paired_only.bam | \
/workspace/hrpazs/bilberry_genome/filter_reads.awk -v isize=0 q=0 | \
/workspace/hrpazs/software/dryhic/inst/src/reads_to_bins_whole.awk -v w=1000 > \
$WORKDIR/mapped_shasta_i1_contact_map_1k_qualFilt.txt"
bsub -J convert -o $WORKDIR/shasta_i1_convert_to_contactMap.out -e $WORKDIR/shasta_i1_convert_to_contactMap.err "$command"

In [3]:
# generate the 1 kb contact file for the shasta_i3 assembly

command="samtools view $WORKDIR/mapped_shasta_i3.REduced.paired_only.bam | \
/workspace/hrpazs/bilberry_genome/filter_reads.awk -v isize=0 q=0 | \
/workspace/hrpazs/software/dryhic/inst/src/reads_to_bins_whole.awk -v w=1000 > \
$WORKDIR/mapped_shasta_i3_contact_map_1k_qualFilt.txt"
bsub -J convert -o $WORKDIR/shasta_i3_convert_to_contactMap.out -e $WORKDIR/shasta_i3_convert_to_contactMap.err "$command"

Job <249447> is submitted to default queue <lowpriority>.


In [4]:
# generate the 1 kb contact file for the flye_i3 assembly

command="samtools view $WORKDIR/mapped_flye_i3.REduced.paired_only.bam | \
/workspace/hrpazs/bilberry_genome/filter_reads.awk -v isize=0 q=0 | \
/workspace/hrpazs/software/dryhic/inst/src/reads_to_bins_whole.awk -v w=1000 > \
$WORKDIR/mapped_flye_i3_contact_map_1k_qualFilt.txt"
bsub -J convert -o $WORKDIR/flye_i3_convert_to_contactMap.out -e $WORKDIR/flye_i3_convert_to_contactMap.err "$command"

Job <249450> is submitted to default queue <lowpriority>.


In [None]:
# R commands that sum the counts from the contact file above
# save the commands below in a script.R file

library(dplyr)
library(data.table)
mapped_shasta_i1_contact_map_1k_qualFilt <- fread(
  "/workspace/hraczw/github/GA/Bilberry_genome/012.HiC.ALLHIC/mapped_shasta_i1_contact_map_1k_qualFilt.txt",
  nThread = 1, sep = "\t"
) %>%
  .[, .(n = sum(V5)), by = .(V1, V2, V3, V4)] %>%
  setnames(., c("V1", "V2", "V3", "V4"), c("rname", "pos", "mrnm", "mpos")) %>%
  filter(n > 1)
saveRDS(mapped_shasta_i1_contact_map_1k_qualFilt, file = "mapped_shasta_i1_contact_map_1k_qualFilt.rds")

# then load python3 and run the shell command

bsub -J convert -o script.out -e script.err "Rscript script.R"

# this produces a compressed .rds file to import as the contact map; note that there is a file size limit for uploading.
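For readers less familiar with data.table, the aggregation in the R script can be sketched in plain Python. This is only an illustration of the logic (sum the fifth column per combination of the first four columns, then keep rows with `n > 1`); the function name and the toy rows are hypothetical, and the R script above remains the version that produces the .rds file.

```python
from collections import defaultdict


def sum_contacts(rows):
    """Sum counts per (rname, pos, mrnm, mpos) and keep totals greater than 1."""
    totals = defaultdict(int)
    for rname, pos, mrnm, mpos, n in rows:
        totals[(rname, pos, mrnm, mpos)] += n
    # Mirrors filter(n > 1) in the R script
    return {key: n for key, n in totals.items() if n > 1}


# Toy rows in the five-column layout read by fread (V1..V5)
rows = [
    ("contig_1", 1000, "contig_2", 0, 1),
    ("contig_1", 1000, "contig_2", 0, 2),
    ("contig_1", 5000, "contig_1", 7000, 1),
]

summed = sum_contacts(rows)
for (rname, pos, mrnm, mpos), n in sorted(summed.items()):
    print(rname, pos, mrnm, mpos, n, sep="\t")
```

Bin pairs seen only once are dropped by the filter, which also helps keep the uploaded file small.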

### 1.2 sequence length file

In [None]:
# generating sequence length file from .fai

python << EOF

# the first two tab-separated .fai columns are the sequence name and its length
with open('Shasta_racon_i1_pilon-2.noBacteria.ml1000.fasta.fai', 'r') as f:
    with open('Shasta_racon_i1_pilon-2.noBacteria.ml1000.fasta.csv', 'w') as f1:
        f1.write('"rname","len"\n')
        for line in f:
            seqID, seqLen = line.split('\t')[:2]
            f1.write('"' + seqID + '",' + seqLen + '\n')

EOF
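Before uploading, it may be worth checking that the two files agree with each other. The sketch below is a hypothetical consistency check, not part of HiCtool: it assumes the length file has the `"rname","len"` header written above and that contact positions are bin start coordinates in bp. It reports contigs missing from the length file and positions beyond the contig end.

```python
import csv
import io


def read_lengths(handle):
    """Read the "rname","len" CSV into a dict of contig name -> length."""
    return {row["rname"]: int(row["len"]) for row in csv.DictReader(handle)}


def find_problems(contacts, lengths):
    """Return (contig, position, reason) for every inconsistent contact end."""
    problems = []
    for rname, pos, mrnm, mpos in contacts:
        for name, p in ((rname, pos), (mrnm, mpos)):
            if name not in lengths:
                problems.append((name, p, "unknown contig"))
            elif not 0 <= p < lengths[name]:
                problems.append((name, p, "position beyond contig end"))
    return problems


# Toy inputs standing in for the real .csv and contact files
length_csv = io.StringIO('"rname","len"\n"contig_1",12000\n"contig_2",800\n')
lengths = read_lengths(length_csv)
contacts = [
    ("contig_1", 1000, "contig_2", 0),
    ("contig_1", 5000, "contig_3", 0),
    ("contig_2", 1000, "contig_1", 0),
]

problems = find_problems(contacts, lengths)
for name, p, reason in problems:
    print(name, p, reason, sep="\t")
```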

## 2. Open HiCtool to correct the assembly based on the heatmap

Download the two files generated above to your local machine and import them into the tool.

In [None]:
# open the server file in RStudio:
/workspace/hrpazs/petunia_genome/HiC/HicTool/server.R

# then run the R script to open the URL