# HiC scaffolding using Ali's tool - Gillenia

## Background and aim

ALLHIC did not produce a reasonable chromosome-level genome assembly from the Gillenia scaffolds, so we are testing whether Ali's tool can be used to post-scaffold the assemblies. The scaffolds from 10 rounds of LINKS contain quite a few mis-scaffolds, so scaffold sets from fewer rounds need to be tested. The i1, i4, i6, i8 and SLR assemblies were chosen to investigate the mis-scaffolding. All of these scaffold sets have been gap-filled with raw long reads and have had bacterial sequences removed.

## 1. generating input files

In [1]:
WORKDIR=/workspace/hraczw/github/GA/Gillenia_genome/008.Allhic.scaffolding/Ali_hictool/scaffolds_ragoo

In [2]:
ASSEMBLY_RAGOO=/powerplant/workspace/hraczw/github/GA/Gillenia_genome/005.GapFilling/scaff_ragoo_gapfilled_noContamination.fasta
ASSEMBLY_RAGOO_ML10K=/powerplant/workspace/hraczw/github/GA/Gillenia_genome/005.GapFilling/scaff_ragoo_gapfilled_noContamination.ml10000.fasta

In [3]:
ASSEMBLY_RAGOO_MAP=/workspace/hraczw/github/GA/Gillenia_genome/008.Allhic.scaffolding/mapped_ragoo.REduced.paired_only.bam
ASSEMBLY_RAGOO_ML10K_MAP=/workspace/hraczw/github/GA/Gillenia_genome/015.salsa2/pairs_updated_DuplicatesRemoved.bam

In [1]:
ASSEMBLY_RAGOO_MAP_PLUSTELO=/workspace/hraczw/github/GA/Gillenia_genome/008.Allhic.scaffolding/mapped_ragoo_plustelo.REduced.paired_only.bam

In [4]:
CHR=/workspace/hraczw/github/GA/Gillenia_genome/008.Allhic.scaffolding/mapped_ragoo_chrs.REduced.paired_only.bam

In [3]:
module load samtools
module list

Currently Loaded Modulefiles:
  1) powerPlant/core    4) git/2.21.0         7) asub/2.1
  2) texlive/20151117   5) perlbrew/0.76      8) samtools/1.9
  3) pandoc/1.19.2      6) perl/5.28.0


In [5]:
# generate the contact file for the chromosome-level mapping ($CHR):
# filter the mapped read pairs, then bin them with w=100

command="samtools view $CHR | \
/workspace/hrpazs/bilberry_genome/filter_reads.awk -v isize=0 q=0 | \
/workspace/hrpazs/software/dryhic/inst/src/reads_to_bins_whole.awk -v w=100 > \
$WORKDIR/chr_contact_map_100_qualFilt.txt"
# quote the command so the whole pipeline reaches bsub as a single string
bsub -J convert -o $WORKDIR/chr_convert_to_contactMap100.out -e $WORKDIR/chr_convert_to_contactMap100.err "$command"




Job <638869> is submitted to default queue <lowpriority>.


In [5]:
# generate the contact file for the RaGOO scaffold mapping ($ASSEMBLY_RAGOO_MAP):
# filter the mapped read pairs, then bin them with w=100

command="samtools view $ASSEMBLY_RAGOO_MAP | \
/workspace/hrpazs/bilberry_genome/filter_reads.awk -v isize=0 q=0 | \
/workspace/hrpazs/software/dryhic/inst/src/reads_to_bins_whole.awk -v w=100 > \
$WORKDIR/mapped_ragoo_contact_map_100_qualFilt.txt"
# quote the command so the whole pipeline reaches bsub as a single string
bsub -J convert -o $WORKDIR/mapped_ragoo_convert_to_contactMap_100.out -e $WORKDIR/mapped_ragoo_convert_to_contactMap_100.err "$command"

Job <269674> is submitted to default queue <lowpriority>.


In [None]:
# R command to sum up the contact counts produced by the step above
library(data.table)  # fread, setnames
library(dplyr)       # %>%, filter

Gillenia_contact.map_ontigs_filt <- fread("/Volumes/workspace/hrpazs/Gillenia_genome/Gillenia_HiC.contact.mat.1kb.filt.txt", nThread = 5, sep = "\t") %>%
  .[, .(n = sum(V5)), by = .(V1, V2, V3, V4)] %>%
  setnames(., c("V1", "V2","V3", "V4"), c("rname", "pos", "mrnm", "mpos")) %>%
  filter(n > 1)

# this has been put into an R script on Pp:
# /workspace/hraczw/github/GA/Gillenia_genome/008.Allhic.scaffolding/Ali_hictool/scaffolds_ragoo/script.R
# it still needs to be modified before generating the .rds file
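
For anyone checking the aggregation outside R, the same step can be sketched in plain Python. This is only a sketch, assuming the contact file is tab-separated with five columns (name and position of the first bin, name and position of the second bin, then a count), which is what the `V1` to `V5` columns in the R command suggest. The function name `sum_contacts` and the `min_count` parameter are hypothetical, and the example rows are made up.

```python
import csv
import io
from collections import Counter


def sum_contacts(handle, min_count=2):
    """Sum the count column per (rname, pos, mrnm, mpos) bin pair.

    Assumes tab-separated rows of: rname, pos, mrnm, mpos, count.
    Keeps only pairs whose summed count is at least min_count,
    mirroring filter(n > 1) in the R command.
    """
    totals = Counter()
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 5:
            continue  # skip blank or malformed lines
        rname, pos, mrnm, mpos, count = row[:5]
        totals[(rname, int(pos), mrnm, int(mpos))] += int(count)
    return {key: n for key, n in totals.items() if n >= min_count}


# Tiny made-up example, not real Gillenia data.
example = io.StringIO(
    "scafA\t0\tscafB\t100\t1\n"
    "scafA\t0\tscafB\t100\t2\n"
    "scafA\t0\tscafC\t200\t1\n"
)
contacts = sum_contacts(example)
print(contacts)
```

On a real contact map the same function could be called with an open file handle instead of the `io.StringIO` example.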

In [None]:
# generating sequence length file
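
This cell was left empty in the notebook. One possible way to produce a sequence length table from an assembly FASTA is sketched below. It assumes the tool accepts a two-column, tab-separated layout of sequence name and length, which is not confirmed here; the function name `fasta_lengths` and the example records are hypothetical.

```python
import io


def fasta_lengths(handle):
    """Yield (sequence_name, length) for each record in a FASTA stream."""
    name, length = None, 0
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            # Use the first whitespace-delimited token of the header as the name.
            fields = line[1:].split()
            name, length = (fields[0] if fields else ""), 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


# Tiny made-up example, not real Gillenia data.
example = io.StringIO(">scaf1 some description\nACGT\nAC\n>scaf2\nGGG\n")
lengths = list(fasta_lengths(example))
for name, length in lengths:
    print(f"{name}\t{length}")
```

Since samtools is already loaded, `samtools faidx` on the assembly is an alternative: the first two columns of the resulting `.fai` index are the sequence name and its length.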



In [None]:
# open the server file from RStudio:
/workspace/hrpazs/petunia_genome/HiC/HicTool/server.R

# import files

## 2. scaffolding using hictool from Ali

## 3. generating fasta linkage group file

* script obtained from here, modified by Ali based on the LACHESIS script: http://aklisl11.pfr.co.nz/workspace/hrpazs/bilberry_genome/CreateScaffoldedFasta.pl; usage: ./CreateScaffoldedFasta.pl $assembly_fasta, $input_dir, $output_dir, $output_fasta_name
* convert the input grouping and orientation files into this format: http://aklisl11.pfr.co.nz/workspace/hrpazs/bilberry_genome/groups/

In [None]:
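# script to convert scaffold grouping and orientation files:
# each input line is a strand sign ('+' or '-') followed directly by the scaffold ID

with open('Chr9_2.txt') as f, open('Chr9_2_converted.txt', 'w') as f1:
    for line in f:
        item = line.strip()
        if not item:
            continue  # skip blank lines
        strand = item[0]
        # take everything after the sign; split(strand) would truncate IDs
        # that themselves contain '+' or '-'
        scaffold_id = item[1:]
        if strand == '+':
            f1.write(scaffold_id + '\t1\t.\t.\n')
        else:
            f1.write(scaffold_id + '\t0\t.\t.\n')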

In [None]:
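# script to construct chr0:
# every scaffold listed in Chr0.txt is written with orientation flag 1

with open('Chr0.txt') as f, open('Chr0_converted.txt', 'w') as f1:
    for line in f:
        scaffold_id = line.strip()
        if not scaffold_id:
            continue  # skip blank lines
        f1.write(scaffold_id + '\t1\t.\t.\n')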