# Ran GenomAsm4pg
- GenomAsm4pg was developed mainly by Ludovic Duvaux and Sukanya Denni at INRAE, France.
- If you publish results generated with this pipeline, please cite the original GitLab repository: https://forgemia.inra.fr/asm4pg/GenomAsm4pg
- PFR contributors for polishing, debugging and testing: Ken Smith, Sarah Bailey & Chen Wu
- Other PFR contributors: Usman Rashid, Cecilia Deng, David Chagné & Susan Thomson

## GenomAsm4pg config

```
# absolute/relative path to your desired output path
root: /Repo/genome-assembly-pipeline/2023-10-17_blueberry_hybrid

####################### optional prejob - data preparation #######################
# path to tar data
data: test_data
# list of tar names
get_all_tar_filename: True
tarIDS: []

####################### job - workflow #######################
### CONFIG
get_all_filenames: False
IDS: ["classified_Nui_plus_unclassified", "classified_M7_plus_unclassified"]

classified_Nui_plus_unclassified:
  run: run001
  ploidy: 2
  busco_lineage: eudicots_odb10
  mode: default

classified_M7_plus_unclassified:
  run: run002
  ploidy: 2
  busco_lineage: eudicots_odb10
  mode: default

####################### workflow output directories #######################
# results directory
resdir: workflow_results

### PREJOB
# extracted raw data
rawdir: 00_raw_data
bamdir: 00_raw_data/bam_files
fastxdir: 00_raw_data/fastx_files

### JOB
# QC
qcdir: 01_raw_data_QC
fqc: 01_fastQC
lqc: 02_longQC
gentools: 03_genometools
kmer: 04_kmer

# assembly
assembdir: 02_genome_assembly
asm_raw: 01_raw_assembly
asm_purged: 02_after_purge_dups_assembly
asm_conta: 03_uncontaminated_assembly
asm: 00_assembly
asm_qc: 01_assembly_QC
```
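
The directory keys in the config nest to form the output paths used in the later steps. The sketch below shows one way they appear to compose, judging from the `indir` and `outdir` paths in the purge_dups step. The helper names `raw_asm_dir` and `purged_asm_dir` are hypothetical and not part of GenomAsm4pg, and the exact layout the pipeline uses internally is an assumption.

```shell
# Values copied from the config above
assembdir=02_genome_assembly
asm_raw=01_raw_assembly
asm_purged=02_after_purge_dups_assembly
asm=00_assembly

# Hypothetical helper (not part of GenomAsm4pg): raw assembly directory
# $1 = sample ID
raw_asm_dir() {
  printf '%s/%s/%s\n' "$1" "$assembdir" "$asm_raw"
}

# Hypothetical helper: purged assembly directory
# $1 = sample ID, $2 = haplotype label (hap1 or hap2)
purged_asm_dir() {
  printf '%s/%s/%s/%s/%s_%s\n' "$1" "$assembdir" "$asm_purged" "$asm" "$1" "$2"
}

raw_asm_dir classified_M7_plus_unclassified
purged_asm_dir classified_M7_plus_unclassified hap1
```

Changing a directory name in the config would then change every derived path in one place, which is presumably why the pipeline keeps them as separate keys.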

# Contig construction
- Ran hifiasm on the reads after they had been classified by trio-binning

```
# container image: quay.io/biocontainers/hifiasm:0.19.5--h43eeafb_2

hifiasm -l3 -o classified_M7_plus_unclassified -t 20 classified_M7_plus_unclassified.fasta.gz
hifiasm -l3 -o classified_Nui_plus_unclassified -t 20 classified_Nui_plus_unclassified.fasta.gz
```
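
hifiasm writes its contigs as GFA, while the purging step reads FASTA, so a conversion happens somewhere in between. How GenomAsm4pg does this is not shown here; the sketch below uses the common awk approach of printing the name and sequence of every segment (`S`) line. The helper name `gfa_to_fasta` and the tiny demo GFA are made up for illustration.

```shell
# Hypothetical helper: extract contig sequences from GFA segment (S) lines
# $1 = path to a GFA file; FASTA is written to stdout
gfa_to_fasta() {
  awk '/^S/ { print ">" $2; print $3 }' "$1"
}

# Tiny made-up GFA for demonstration only (two segments, one link)
demo=$(mktemp)
printf 'S\tctg1\tACGT\nL\tctg1\t+\tctg2\t-\t0M\nS\tctg2\tGGCC\n' > "$demo"

# Link (L) lines are skipped; only segment names and sequences are printed
gfa_to_fasta "$demo"
```

On real output the call would look like `gfa_to_fasta <prefix>.bp.hap1.p_ctg.gfa | gzip -c > <prefix>_hap1.fa.gz`, where the `.bp.hap1.p_ctg.gfa` suffix is an assumption about hifiasm's default output naming and should be checked against the files actually produced.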

# Purging haplotigs
- Ran for each of the four haplotype assemblies: classified_M7_plus_unclassified_hap1.fasta.gz, classified_M7_plus_unclassified_hap2.fasta.gz, classified_Nui_plus_unclassified_hap1.fasta.gz and classified_Nui_plus_unclassified_hap2.fasta.gz
- classified_M7_plus_unclassified_hap1.fasta.gz is shown here as an example

```
### to purge haplotigs in hifiasm assembly
# container image: docker://registry.forgemia.inra.fr/asm4pg/genomasm4pg/purge_dups1.2.5
indir=classified_M7_plus_unclassified/02_genome_assembly/01_raw_assembly
outdir=classified_M7_plus_unclassified/02_genome_assembly/02_after_purge_dups_assembly/00_assembly/classified_M7_plus_unclassified_hap1

# coverage assessment: map the reads to the assembly, then compute coverage stats and cutoffs
minimap2 -xasm20 -t 12 ${indir}/classified_M7_plus_unclassified_hap1.fa.gz classified_M7_plus_unclassified.fasta.gz | gzip -c - > ${outdir}/classified_M7_plus_unclassified_hap1.paf.gz
pbcstat ${outdir}/classified_M7_plus_unclassified_hap1.paf.gz -O ${outdir}
calcuts ${outdir}/PB.stat > ${outdir}/cutoffs 2>${outdir}/calcuts.log

# split assembly & self-self alignment
split_fa ${indir}/classified_M7_plus_unclassified_hap1.fa.gz > ${outdir}/classified_M7_plus_unclassified_hap1.split
minimap2 -xasm5 -t 12 -DP ${outdir}/classified_M7_plus_unclassified_hap1.split ${outdir}/classified_M7_plus_unclassified_hap1.split | gzip -c - > ${outdir}/classified_M7_plus_unclassified_hap1.split.self.paf.gz

# purge haplotigs & overlaps
purge_dups -2 -T ${outdir}/cutoffs -c ${outdir}/PB.base.cov ${outdir}/classified_M7_plus_unclassified_hap1.split.self.paf.gz > ${outdir}/dups.bed 2> ${outdir}/purge_dups.log

# get purged primary and haplotig sequences from draft assembly
get_seqs -e ${outdir}/dups.bed ${indir}/classified_M7_plus_unclassified_hap1.fa.gz -p ${outdir}/classified_M7_plus_unclassified_hap1
```
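
Since the same steps were run for all four haplotype assemblies, the per-assembly inputs can be enumerated with a loop over samples and haplotypes. The sketch below only prints the assembly path and output directory for each combination, following the `indir`/`outdir` pattern of the example above; `purge_paths` is a hypothetical helper, not a pipeline command.

```shell
# Hypothetical helper: print "<assembly path><TAB><output dir>" for one assembly
# $1 = sample ID, $2 = haplotype label (hap1 or hap2)
purge_paths() {
  s=$1
  h=$2
  in_dir=${s}/02_genome_assembly/01_raw_assembly
  out_dir=${s}/02_genome_assembly/02_after_purge_dups_assembly/00_assembly/${s}_${h}
  printf '%s\t%s\n' "${in_dir}/${s}_${h}.fa.gz" "${out_dir}"
}

# Enumerate the four assemblies (2 samples x 2 haplotypes)
for sample in classified_M7_plus_unclassified classified_Nui_plus_unclassified; do
  for hap in hap1 hap2; do
    purge_paths "$sample" "$hap"
  done
done
```

Each printed pair could then feed the minimap2, pbcstat, calcuts, split_fa, purge_dups and get_seqs commands in place of the hard-coded M7 hap1 names.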