baudrly/yahcp

yahcp (Yet Another Hi-C Pipeline)

Usage:

./yahcp.sh -1 reads_forward.fastq -2 reads_reverse.fastq -f genome.fa [-s size] [-o output_directory] [-e enzyme] [-q quality_min] [-t threads] [-T tmp_directory] [--minimap] [--duplicates] [--clean-up]

Generate a sparse, GRAAL-compatible contact map from paired-end reads and a reference genome. The map can also be visualized with HiC-Box. Information about fragments and about contigs/chromosomes is stored in separate files. The genome can be partitioned either by restriction fragments (by specifying an enzyme) or into fixed-size chunks (by specifying a number).
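The two partitioning modes described above can be illustrated with a minimal Python sketch. It is not the pipeline's own code: the function names are hypothetical, the restriction site is passed as a plain recognition sequence (for example GATC), and each cut is placed at the start of the site, which is a simplification of real enzyme behavior.

```python
def fixed_size_chunks(length, size):
    """Return half-open (start, end) intervals of at most `size` bp covering [0, length)."""
    if size <= 0:
        raise ValueError("chunk size must be positive")
    return [(start, min(start + size, length)) for start in range(0, length, size)]


def restriction_fragments(sequence, site):
    """Split a sequence at every occurrence of a recognition site.

    Assumption for illustration: the cut falls at the first base of the site.
    """
    seq = sequence.upper()
    site = site.upper()
    if not site:
        raise ValueError("recognition site must not be empty")
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        if pos > 0:  # a site at position 0 does not create a new fragment
            cuts.append(pos)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    # Pair consecutive boundaries and drop empty intervals.
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if a < b]


if __name__ == "__main__":
    print(fixed_size_chunks(12, 5))
    print(restriction_fragments("AAGATCTTGATCCC", "GATC"))
```

Fixed-size chunks give a uniform resolution regardless of sequence content, whereas restriction fragments follow the cut sites and therefore vary in length.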

Requires bowtie2, samtools, bedtools and Python (with Biopython installed) to run. Optionally, minimap2 can be used as the aligner instead of bowtie2 (see the --minimap option).

Parameters:

-1 or --forward: Forward FASTQ reads
-2 or --reverse: Reverse FASTQ reads
-f or --fasta: Reference genome to map against in FASTA format
-o or --output: Output directory. Defaults to the current directory.
-e or --enzyme: Restriction enzyme if a string, or chunk size (i.e. resolution) if a number. Defaults to 5000 bp chunks.
-q or --quality-min: Minimum mapping quality for selecting contacts. Defaults to 30.
-d or --duplicates: If enabled, removes adapters and PCR duplicates prior to mapping. Not enabled by default.
-s or --size: Minimum size threshold to consider contigs. Defaults to 0 (keep all contigs).
-c or --clean-up: If enabled, removes intermediary BED files after generating the contact map. Enabled by default.
-t or --threads: Number of threads to use for the aligner and samtools. Defaults to 1.
-m or --minimap: Use the minimap2 aligner instead of bowtie2. Not enabled by default.
-T or --tmp: Directory for storing intermediary BED files and temporary sort files. Defaults to the output directory.
-h or --help: Display this help message
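The -e option accepts either an enzyme name or a number. The sketch below shows one way such a value could be interpreted. It is an illustration only, not the script's actual logic: the function name and the returned tuples are hypothetical, and the 5000 bp default is taken from the parameter description above.

```python
def interpret_enzyme_option(value, default_chunk_size=5000):
    """Classify an -e value as a chunk size (number) or an enzyme name (string).

    Returns ("chunks", size_in_bp) or ("enzyme", name). Hypothetical helper.
    """
    if value is None:
        # Option not given: fall back to the documented default resolution.
        return ("chunks", default_chunk_size)
    text = str(value).strip()
    if not text:
        raise ValueError("empty value for -e/--enzyme")
    if text.isascii() and text.isdigit():
        size = int(text)
        if size == 0:
            raise ValueError("chunk size must be positive")
        return ("chunks", size)
    # Anything that is not a plain integer is treated as an enzyme name.
    return ("enzyme", text)


if __name__ == "__main__":
    for example in (None, "10000", "DpnII"):
        print(example, "->", interpret_enzyme_option(example))
```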

The output directory is expected to contain the following files:

  • abs_fragments_contacts_weighted.txt: the sparse contact map
  • fragments_list.txt: information about restriction fragments (or chunks)
  • info_contigs.txt: information about contigs or chromosomes
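As a rough sketch of how the sparse contact map might be loaded for downstream analysis, the snippet below reads a three-column, tab-separated table with one header line. This layout (two fragment indices followed by a contact count) is an assumption made for illustration and should be checked against an actual abs_fragments_contacts_weighted.txt file. The function name is hypothetical.

```python
import csv
import io
from collections import defaultdict


def read_sparse_contacts(handle, has_header=True):
    """Read (fragment_a, fragment_b, count) rows into a dict keyed by ordered pairs.

    Assumed layout: tab-separated, optional single header line, three columns.
    """
    contacts = defaultdict(float)
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the header line if present
    for row in reader:
        if len(row) < 3:
            continue  # ignore blank or malformed lines
        a, b, count = int(row[0]), int(row[1]), float(row[2])
        # Store each pair once, smaller index first, summing duplicate entries.
        contacts[(min(a, b), max(a, b))] += count
    return dict(contacts)


if __name__ == "__main__":
    # Hypothetical content, used only to exercise the parser.
    example = "id_frag_a\tid_frag_b\tn_contact\n0\t1\t3\n1\t0\t2\n2\t2\t5\n"
    print(read_sparse_contacts(io.StringIO(example)))
```

Keying on ordered pairs treats the map as symmetric, so (0, 1) and (1, 0) are merged. Drop that step if the two orientations need to stay separate.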

Please be aware that intermediary BED files can be quite bulky. You may use the --tmp option to store all such files in a different location (e.g. a scratch partition if running the pipeline on a computer cluster).
