StevenWingett/combinations

Modified HiCUP (Hi-C User Pipeline)


Overview

This is a modified version of HiCUP, a bioinformatics pipeline for processing Hi-C data. It is designed specifically to improve the di-tag yield when 4-cutter restriction enzymes are used to create Hi-C junctions.

The rationale for this pipeline is that the standard view that sequenced read pairs contain just one Hi-C junction is an oversimplification, and in reality such read pairs may contain several Hi-C junctions. By identifying each of these putative components, the pipeline then generates all the combinations of interactions. For example, suppose a read pair contained DNA derived from genomic regions A, B and C; then this can be converted into the pairwise interactions: A-B, A-C and B-C.
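The A, B, C example above amounts to taking every unordered pair from the list of regions. A minimal Python sketch of that idea follows; the function name `pairwise_interactions` is illustrative and is not part of HiCUP.

```python
from itertools import combinations


def pairwise_interactions(regions):
    """Return every unordered pair of regions detected in one read pair."""
    return [f"{a}-{b}" for a, b in combinations(regions, 2)]


# Three regions give three pairwise interactions.
print(pairwise_interactions(["A", "B", "C"]))  # ['A-B', 'A-C', 'B-C']
```

In general, a read pair containing n regions yields n(n-1)/2 pairwise interactions, so four regions would give six.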

Additional scripts

This modified version of HiCUP incorporates the following additional Perl scripts:

hicup_combiner:

This script cuts reads at each occurrence of a Hi-C ligation junction and retains all the resulting "sub-reads". Sub-reads from the forward read are labelled F1, F2, ...; those from the reverse read are labelled R1, R2, ... Read pairs that do not contain the Hi-C ligation sequence are given the tag ORIGINAL. Please note that when both sub-reads of a new “reconstructed Hi-C read” are derived from the same read (either the forward read or the reverse read), one of the sub-reads is reverse-complemented. For example, F1 and F2 will become F1-FRC2.
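The cutting and reverse-complementing described above can be sketched in Python as follows. This is not the hicup_combiner code. It assumes that the DpnII ligation junction is GATCGATC and that each cut leaves one GATC site on either side; neither detail is stated in this README. All function names are hypothetical.

```python
from itertools import combinations

JUNCTION = "GATCGATC"  # assumed DpnII ligation junction (not stated in the README)
SITE = "GATC"          # DpnII recognition site
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_at_junctions(seq):
    """Cut seq at every junction, leaving one restriction site on each side."""
    pieces = []
    start = 0
    pos = seq.find(JUNCTION, start)
    while pos != -1:
        cut = pos + len(SITE)  # cut in the middle of the junction
        pieces.append(seq[start:cut])
        start = cut
        pos = seq.find(JUNCTION, start)
    pieces.append(seq[start:])
    return pieces


def intra_read_pairs(seq, prefix="F"):
    """Pair the sub-reads of one read; the second member is reverse-complemented."""
    subs = split_at_junctions(seq)
    pairs = {}
    for (i, a), (j, b) in combinations(enumerate(subs, start=1), 2):
        pairs[f"{prefix}{i}-{prefix}RC{j}"] = (a, reverse_complement(b))
    return pairs


print(intra_read_pairs("AAAAGATCGATCCCCC"))
# {'F1-FRC2': ('AAAAGATC', 'GGGGGATC')}
```

A read without the junction comes back from `split_at_junctions` as a single piece, which corresponds to the case the script tags as ORIGINAL.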

Tags using this naming system will be appended to the FASTQ read ID headers.

We can now identify Hi-C interaction “groups” in our reconstructed datasets (i.e. reconstructed Hi-C reads from the same group will all be derived from a single original read pair).

The read number is also appended to the header. For example, 7.F2-FRC3 denotes an intra-read interaction between sub-reads 2 and 3 of the 7th forward read in the FASTQ file.
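For downstream scripts it may help to decode these tags. The Python sketch below parses a tag of the form shown in the example (7.F2-FRC3). The pattern is inferred from that single example and is an assumption; it does not handle the ORIGINAL tag, and the README does not say how the tag is separated from the rest of the FASTQ header. The names `SubRead` and `parse_tag` are hypothetical.

```python
import re
from typing import NamedTuple


class SubRead(NamedTuple):
    source: str                 # "F" (forward read) or "R" (reverse read)
    index: int                  # position of the sub-read within its read
    reverse_complemented: bool  # True when the label carries "RC"


# Assumed tag layout: <read number>.<sub-read>-<sub-read>, e.g. 7.F2-FRC3
TAG_PATTERN = re.compile(r"^(\d+)\.([FR])(RC)?(\d+)-([FR])(RC)?(\d+)$")


def parse_tag(tag):
    """Split a combiner tag into (read_number, first sub-read, second sub-read)."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"Unrecognised tag: {tag!r}")
    read_number = int(match.group(1))
    first = SubRead(match.group(2), int(match.group(4)), match.group(3) is not None)
    second = SubRead(match.group(5), int(match.group(7)), match.group(6) is not None)
    return read_number, first, second


read_number, first, second = parse_tag("7.F2-FRC3")
is_intra_read = first.source == second.source
print(read_number, first, second, is_intra_read)
```

Reconstructed reads that share a read number belong to the same interaction group, so the first element of the result can serve as a grouping key.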

The script also uses R to generate graphs.

Note: the DpnII ligation junction is hard-coded into the script, irrespective of the digest file used.

hicup_allocater:

This Perl script allocates each read to the restriction fragment from which it was derived (using the mapping results). This additional information is incorporated into the read header.

hicup_prefilter:

Removes identical fragment-fragment interactions from the same “di-tag group” (generated from a conventional read-pair by hicup_combiner). It also removes novel intra-fragment interactions generated by hicup_combiner (but retains those that may have been generated as part of the conventional HiCUP pipeline).
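The two filtering rules above can be illustrated with a short Python sketch. It is not the hicup_prefilter implementation. The record layout, the `conventional` flag (marking di-tags that the standard HiCUP pipeline would also have produced) and the choice to keep the first copy of each duplicate are all assumptions made for illustration.

```python
from typing import NamedTuple


class DiTag(NamedTuple):
    group: int          # read number of the original read pair
    fragment_a: str     # restriction fragment of the first sub-read
    fragment_b: str     # restriction fragment of the second sub-read
    conventional: bool  # hypothetical flag: also produced by standard HiCUP


def prefilter(ditags):
    """Drop repeated fragment pairs within a group and novel intra-fragment pairs."""
    seen = set()
    kept = []
    for ditag in ditags:
        same_fragment = ditag.fragment_a == ditag.fragment_b
        if same_fragment and not ditag.conventional:
            continue  # novel intra-fragment interaction created by the combiner
        # The fragment pair is unordered, so A-B and B-A count as identical.
        key = (ditag.group, frozenset((ditag.fragment_a, ditag.fragment_b)))
        if key in seen:
            continue  # identical interaction already kept for this group
        seen.add(key)
        kept.append(ditag)
    return kept


example = [
    DiTag(7, "frag1", "frag2", True),
    DiTag(7, "frag2", "frag1", False),  # duplicate of the line above
    DiTag(7, "frag3", "frag3", False),  # novel intra-fragment interaction
    DiTag(8, "frag1", "frag2", False),  # same fragments, different group
]
print(prefilter(example))
```

Only the first and last records survive in this example, because duplicates are judged within a group rather than across the whole dataset.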

The HiCUP master script runs the pipeline scripts in the following order:

  1. hicup_combiner
  2. hicup_truncater
  3. hicup_mapper
  4. hicup_allocater
  5. hicup_prefilter
  6. hicup_filter
  7. hicup_deduplicator

Usage notes

As for the standard pipeline, the HiCUP master script executes each step in turn. With this modified version of HiCUP, no HTML summary file is generated. However, a pipeline summary file named "hicup_combinations_pipeline_summary_report.txt" is generated.

When running the pipeline, create a configuration file and run HiCUP with the command:

hicup -c [configuration_file]

This is a development version of HiCUP and it may only be used for protocols in which DpnII was used to generate the Hi-C ligation junctions. Furthermore, process only one FASTQ file pair at a time, and keep the input and output of each run together in a single directory of their own. Make sure the FASTQ input files are in your current working directory and let HiCUP write the output to your current working directory (this is the HiCUP default, i.e. do not specify the --outdir option).

The original HiCUP pipeline homepage is on the Babraham Bioinformatics website.
