Scaffolding

We aligned the Omni-C data to both assemblies following the [Arima Genomics Mapping Pipeline](https://github.com/ArimaGenomics/mapping_pipeline) and then scaffolded both assemblies with SALSA.

Assumptions

  • You have run the Arima Genomics Mapping Pipeline on each of the assemblies and generated the final deduplicated and sorted BAM file (the suffix of this file is *.d.s.bam)
  • There is a directory hierarchy rooted at WD, where:
    • WD
      • asm: Assemblies
      • aln: Alignments
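
The layout above can be created as follows. This is a minimal sketch: the fallback location for WD is a hypothetical temporary directory used only for illustration, so set WD to your own project directory first.

```shell
# Assumption: WD is not fixed by this README; a temporary directory is
# used here only when WD has not been set already.
WD="${WD:-$(mktemp -d)}"

# Create the two subdirectories the scaffolding step expects.
mkdir -p "$WD/asm" "$WD/aln"

# Show the resulting layout.
ls "$WD"
```

Assemblies (FASTA) then go under asm and the mapping pipeline output (BAM) under aln.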

Requirements

Input

  • BAM file (the suffix of this file is *.d.s.bam)
  • Genome assembly file (FASTA)
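
Before starting, it can help to confirm that both inputs are in place. The sketch below is not part of the original workflow; check_inputs is a hypothetical helper name, and the file names in the usage note simply mirror those used in the Code section.

```shell
# Hypothetical helper: verify that the BAM and FASTA inputs exist and are
# non-empty, and warn if the BAM does not carry the expected .d.s.bam suffix.
check_inputs() {
    bam="$1"
    fasta="$2"
    [ -s "$bam" ] || { echo "Missing or empty BAM: $bam" >&2; return 1; }
    [ -s "$fasta" ] || { echo "Missing or empty FASTA: $fasta" >&2; return 1; }
    case "$bam" in
        *.d.s.bam) ;;
        *) echo "Warning: $bam does not end in .d.s.bam" >&2 ;;
    esac
    return 0
}
```

For example: check_inputs "$WD/aln/assembly.omnic.d.s.bam" "$WD/asm/assembly.fasta" returns non-zero if either file is missing.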

Software

  • SALSA (run_pipeline.py)
  • bedtools (bamToBed)
  • samtools
  • conda, with an environment named salsa

Code

# WD must point at the working directory described under Assumptions
: "${WD:?Set WD to the working directory}"
REF="$WD/asm/assembly.fasta"
conda activate salsa
mkdir -p "$WD/scaffolding"

# Prepping alignments for SALSA: convert BAM to BED, sort the BED by
# read name (column 4), index the assembly, then run SALSA.
# -e DNASE: no restriction enzyme; -i 20: number of iterations
bamToBed -i "$WD/aln/assembly.omnic.d.s.bam" \
    > "$WD/scaffolding/assembly.omnic.bed" &&
sort -k 4 "$WD/scaffolding/assembly.omnic.bed" \
    > "$WD/scaffolding/assembly.omnic_tmp.bed" &&
mv "$WD/scaffolding/assembly.omnic_tmp.bed" "$WD/scaffolding/assembly.omnic.bed" &&
samtools faidx "$REF" &&
python /usr/local/src/SALSA/run_pipeline.py -a "$REF" \
    -l "$REF.fai" \
    -b "$WD/scaffolding/assembly.omnic.bed" -e DNASE \
    -o "$WD/scaffolding/salsa_assembly" \
    -i 20 -p yes \
    &> "$WD/scaffolding/salsa_assembly.log" &&
cp "$WD/scaffolding/salsa_assembly/scaffolds_FINAL.fasta" \
    "$WD/asm/new.assembly.fasta"
conda deactivate
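
The sort -k 4 step orders the BED records by read name so that both mates of a pair end up on adjacent lines. The sketch below illustrates this on a toy BED file; the contig names, coordinates and read names are made up for illustration and do not come from real data.

```shell
# Toy BED file in the layout bamToBed emits: chrom, start, end, read name
# (with /1 or /2 for the mate), mapping quality, strand. All values are
# hypothetical.
tmp=$(mktemp -d)
printf 'ctg1\t100\t250\tread2/1\t60\t+\n'  >  "$tmp/toy.bed"
printf 'ctg2\t500\t650\tread1/2\t60\t-\n'  >> "$tmp/toy.bed"
printf 'ctg3\t10\t160\tread2/2\t60\t-\n'   >> "$tmp/toy.bed"
printf 'ctg1\t900\t1050\tread1/1\t60\t+\n' >> "$tmp/toy.bed"

# Same sort as in the pipeline: the key starts at column 4 (read name).
sort -k 4 "$tmp/toy.bed" > "$tmp/toy.sorted.bed"

# Print the read names in their sorted order; mates are now neighbours.
cut -f 4 "$tmp/toy.sorted.bed"
```

After sorting, read1/1 is followed by read1/2 and read2/1 by read2/2, even though the mates map to different contigs.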

Cite