contact-hunter

Explore interactions of genomic regions with contact-hunter!

About


Many methods exist for detecting significant Hi-C contacts between a particular genomic region and its neighborhood within a given range of distances. One popular method was introduced by H. Won et al. in 2016 (https://doi.org/10.1038/nature19847). Here we present a handy tool that applies this method (with minor technical differences). It allows the user to obtain meaningful contacts from a Hi-C map for a predefined list of genomic coordinates corresponding to SNPs, TSSs, or any other features.

The package was developed to detect significant contacts in human Hi-C data. It has not been tested on other species.

Getting Started

Installation

Requirements

  • python 3.6+

Install from PyPI using pip.

 pip install contact-hunter

(back to top)

Usage

Input data

  • Hi-C map in .cool format
  • Tab-delimited file with the genomic features to be explored. It should have no header and two columns: chromosome, start
  • Tab-delimited file with the background. It should have no header and two columns: chromosome, start

The background file can be generated from the data you are exploring. For example, if you want to find contacts for a list of specific SNPs, it is reasonable to use all the remaining SNPs from the relevant GWAS study as the background. For a set of differentially expressed genes, the TSSs of all other genes can serve as the background. More details on the background can be found in the methods section of https://doi.org/10.1038/nature19847.
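The two locus files described above can be prepared with a few lines of Python. This is a minimal sketch using only the standard library; the SNP coordinates, the file names, and the helper `write_locus_file` are hypothetical and serve only as an illustration.

```python
import csv

def write_locus_file(path, loci):
    """Write (chromosome, start) pairs as a header-less, tab-delimited file."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerows(sorted(loci))

# Hypothetical data: every SNP from a GWAS study and the subset of interest.
all_snps = {("chr1", 1050000), ("chr1", 2300500), ("chr2", 780000), ("chr2", 915250)}
test_snps = {("chr1", 2300500), ("chr2", 780000)}

# Background = all the remaining SNPs from the same study.
background_snps = all_snps - test_snps

write_locus_file("locus_test.tsv", test_snps)
write_locus_file("locus_background.tsv", background_snps)
```

Keeping both lists as sets makes it easy to guarantee that the tested loci are excluded from the background.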

(back to top)

Use as a command line tool

Run in a terminal:

 contact_hunter COOL_PATH   LOCUS_BACKGROUND   LOCUS_TEST   RESOLUTION   DISTANCE   RESULTS_FILE

Type contact_hunter -h in a terminal to view all the parameters.
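When the tool is called from a pipeline script, it can help to assemble the command in the positional order shown above. The sketch below only builds and prints the command; the helper `build_command`, the file names, and the assumption that RESOLUTION and DISTANCE are given in base pairs are illustrative and not taken from the package documentation. Check contact_hunter -h for the authoritative list of arguments.

```python
import shlex

def build_command(cool_path, locus_background, locus_test,
                  resolution, distance, results_file):
    """Assemble the contact_hunter call in the documented positional order."""
    return ["contact_hunter", cool_path, locus_background, locus_test,
            str(resolution), str(distance), results_file]

# Hypothetical inputs: a 10 kb map searched within 5 Mb (units assumed to be bp).
command = build_command("sample.cool", "locus_background.tsv", "locus_test.tsv",
                        10000, 5000000, "contacts.tsv")

# Quote each argument so the printed line can be pasted into a shell.
command_line = " ".join(shlex.quote(arg) for arg in command)
print(command_line)

# To actually run it (requires contact-hunter to be installed):
# import subprocess; subprocess.run(command, check=True)
```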

Use as a python module

Import the module:

 import contact_hunter

Use the get_contacts function:

   contact_hunter.get_contacts(cool,background_locus,tested_locus,resolution,distance)

Type help(contact_hunter.get_contacts), or ?contact_hunter.get_contacts in a Jupyter notebook, to view all the parameters.

(back to top)

Output

The tool returns a table with 5 columns:

  • chr - chromosome
  • bin_start - start of the target bins (the algorithm detects significant interactions between these and the surrounding bins)
  • list_of_loci - list with the precise coordinates of the features of interest (SNPs, TSSs, etc.) that fall into these bins
  • interacting_locus_coord - start of the significantly interacting bins
  • pval - p-value
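The relation between bin_start and list_of_loci can be illustrated with a short sketch. It assumes fixed-width bins that start at coordinate 0, as in a uniformly binned .cool file; the function `group_loci_by_bin` and the coordinates are hypothetical, and the tool's internal implementation may differ.

```python
from collections import defaultdict

def group_loci_by_bin(loci, resolution):
    """Map (chromosome, position) pairs to the start of the bin containing them."""
    bins = defaultdict(list)
    for chrom, position in loci:
        # Round the position down to the nearest multiple of the bin size.
        bin_start = (position // resolution) * resolution
        bins[(chrom, bin_start)].append(position)
    return dict(bins)

# Hypothetical feature coordinates at 10 kb resolution.
loci = [("chr1", 1051200), ("chr1", 1058900), ("chr2", 780000)]
bins = group_loci_by_bin(loci, resolution=10000)
```

Here the two chr1 features share one target bin, so they would appear together in a single list_of_loci entry.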

With the CLI version, the table described above is written to a file, since the output file name is a required argument.

When used as a python module, the get_contacts function returns the table, but no output file is created.

(back to top)

Parameters description

Average heatmap generation

The tool has been tested on human data, where the goal was to detect genomic regions that interact significantly with a list of target SNPs or with the TSSs of a gene set. The tool can also be used to explore contacts in other species or with other features (for example, to get contacts for a particular set of ATAC-seq peaks). In this case, generating an average heatmap is recommended. The heatmap can be obtained with a dedicated option, listed below. In addition to the basic output, this option yields an average heatmap around the significant contacts, which gives a rough estimate of how well the tool performs on the user's specific data. A clear enrichment in the central pixel is a good sign! :)

  • add --avr_heatmap to command when using CLI version
  • specify plot_generate=True when using as a python module

Resolution

Hi-C data resolution is an important consideration. It is tempting to set the bin size as small as possible, since this makes it possible to annotate the resulting contacts more accurately in the subsequent analysis. Unfortunately, sparse data are not appropriate here. The choice of resolution should be guided by the quality of the Hi-C map.

Distance

In accordance with the original paper (https://doi.org/10.1038/nature19847), an appropriate distance limiting the range of the contact search is ±5 Mb for human data.
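To get a feeling for how distance and resolution interact, the sketch below lists the bins that lie within ±5 Mb of a target bin at 10 kb resolution. The function `window_bins` is hypothetical; whether the tool includes the boundary bins and how it clips the window at chromosome ends are assumptions made here for illustration only.

```python
def window_bins(bin_start, distance, resolution):
    """List the starts of bins within ±distance of a target bin.

    The window is clipped at coordinate 0; clipping at the chromosome end
    is omitted because it requires the chromosome length.
    """
    first = max(0, bin_start - distance)
    last = bin_start + distance
    return list(range(first, last + resolution, resolution))

# Hypothetical target bin at 20 Mb, ±5 Mb window, 10 kb bins.
neighbours = window_bins(bin_start=20000000, distance=5000000, resolution=10000)
print(len(neighbours))
```

Halving the bin size doubles the number of bins in the window, which is one reason why sparse maps become a problem at fine resolution.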

FDR

The algorithm selects significant contacts by FDR. The default FDR value is 0.01. The pval column of the output table contains the p-values of the contacts that survived the correction. Importantly, if the user plans to filter contacts further by p-value (e.g. to keep only the contacts with the lowest p-values), this selection should be done separately for each chromosome; a single threshold should not be applied to the whole table. The reason is that the algorithm processes each chromosome separately and calculates the critical values individually.
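The per-chromosome selection recommended above can be sketched as follows. The example uses plain dictionaries with the column names from the Output section; the function `lowest_pvalue_per_chromosome` and all values are hypothetical and are not produced by the tool.

```python
from collections import defaultdict

def lowest_pvalue_per_chromosome(rows, top_n):
    """Keep the top_n contacts with the lowest p-value within each chromosome."""
    by_chrom = defaultdict(list)
    for row in rows:
        by_chrom[row["chr"]].append(row)
    selected = []
    for chrom in sorted(by_chrom):
        # Rank contacts only against other contacts on the same chromosome.
        ranked = sorted(by_chrom[chrom], key=lambda row: row["pval"])
        selected.extend(ranked[:top_n])
    return selected

# Hypothetical output rows (made-up values for illustration).
rows = [
    {"chr": "chr1", "bin_start": 1050000, "interacting_locus_coord": 1400000, "pval": 1e-6},
    {"chr": "chr1", "bin_start": 1050000, "interacting_locus_coord": 2100000, "pval": 1e-4},
    {"chr": "chr2", "bin_start": 780000, "interacting_locus_coord": 990000, "pval": 5e-3},
    {"chr": "chr2", "bin_start": 780000, "interacting_locus_coord": 1250000, "pval": 2e-3},
]
best = lowest_pvalue_per_chromosome(rows, top_n=1)
```

With a single global cutoff, the chr2 contacts in this example would be dropped entirely even though they passed the correction on their own chromosome.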

License

Distributed under the GPLv3 License. See LICENSE for more information.

(back to top)

Contact

Anna Kononkova - a.kononkova@yandex.ru

Project Link: https://github.com/Khrameeva-Lab/contact-hunter

(back to top)
