Karenxzr/HiC_data

Hi-C data

MIT License. PRs welcome.

A (continuously updated) collection of references to Hi-C data and papers. Predominantly human/mouse Hi-C data, with replicates. Please contribute and get in touch! See MDmisc notes for other programming and genomics-related notes.

Large collections

4D Nucleome

  • 4D Nucleome Data Portal - 3D genomics and microscopy data, uniformly processed, with integrative visualization in HiGlass and comparative functionality. Browse by type (sequencing, microscopy) or publication. Data are in three tiers: Tier 1 (H1-ESC, GM12878, IMR90, HFF-hTERT (clone 6), and WTC-11), Tier 2, and untiered. Overview of the first and second phases of the 4DN project. Other repositories that host Hi-C and similar datasets include the ENCODE portal, NCBI's GEO, and EMBL-EBI’s ArrayExpress. 4D Nucleome Browser for integrative and multimodal data navigation.
    • Table 1 - Genomic assay types in the 4D Nucleome Data Portal. Chromatin conformation data (in situ and dilution Hi-C, Micro-C, DNase Hi-C, Hi-C 3.0, Capture Hi-C, TCC, single-cell variants, SPRITE, GAM) and related sequencing data (ChIA-PET, ChIA-Drop, PLAC-seq, ChIP-seq, CUT&RUN, Repli-seq, MARGI (RNA-chromatin interactions), others).
    • High-resolution Hi-C datasets with over 1 billion read pairs. cooltools processing, .cool and .mcool formats, A/B compartments and TAD boundaries (insulation score) detected using domain calling pipelines.
    • Microscopy datasets - standard FISH (DNA or RNA), multi-loci FISH, high-throughput FISH, dynamic single particle tracking, ChromEMT, OptoDroplet.
    • Table 2 - All 4D Nucleome analysis pipelines, in CWL/WDL, available on Docker Hub. Alignment with BWA-MEM using the -SP5M options. PairsQC - QC report for Hi-C pairs files. Hi-C processing pipeline.
    • 4DN Visualization Workspace Paper Reiff, S.B., Schroeder, A.J., Kırlı, K. et al. The 4D Nucleome Data Portal as a resource for searching and visualizing curated nucleomics data. Nat Commun 13, 2365 (02 May 2022). https://doi.org/10.1038/s41467-022-29697-4
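The insulation score used above for TAD boundary calls can be illustrated with a small pure-Python sketch. It assumes a dense, symmetric toy matrix and a square window sliding along the diagonal; the function names, the window placement and the log2-to-mean normalization are choices of this sketch, not the cooltools implementation, which works on balanced .cool/.mcool data and applies its own windowing, filtering and boundary-strength rules.

```python
import math


def insulation_scores(matrix, window):
    """Diamond insulation score for a dense, symmetric contact matrix (toy sketch).

    The score at bin i sums contacts between the `window` bins ending at i and
    the `window` bins starting at i + 1, i.e. contacts that cross the gap
    between bins i and i + 1. Bins too close to the matrix edges get None.
    """
    n = len(matrix)
    raw = [None] * n
    for i in range(window - 1, n - window):
        raw[i] = sum(
            matrix[a][b]
            for a in range(i - window + 1, i + 1)
            for b in range(i + 1, i + window + 1)
        )
    valid = [s for s in raw if s is not None]
    if not valid or sum(valid) == 0:
        raise ValueError("window too large or matrix empty")
    mean = sum(valid) / len(valid)
    # log2 ratio to the mean score; the tiny pseudocount avoids log2(0)
    return [None if s is None else math.log2((s + 1e-9) / mean) for s in raw]


def boundary_bins(scores):
    """Bins whose score is a strict local minimum (candidate TAD boundaries)."""
    hits = []
    for i in range(1, len(scores) - 1):
        left, mid, right = scores[i - 1], scores[i], scores[i + 1]
        if None not in (left, mid, right) and mid < left and mid < right:
            hits.append(i)
    return hits


# Toy matrix: two 4-bin TADs (bins 0-3 and 4-7).
toy = [[20 if a == b else 10 if a // 4 == b // 4 else 1 for b in range(8)]
       for a in range(8)]
print(boundary_bins(insulation_scores(toy, window=2)))  # [3]
```

The reported bin 3 marks the gap between bins 3 and 4, where contacts crossing the diamond window are weakest.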

Lieberman-Aiden lab

All Hi-C data released by the Lieberman-Aiden group. Links to Amazon storage and GEO studies. http://aidenlab.org/data.html

  • Vian, Laura, Aleksandra Pękowska, Suhas S.P. Rao, Kyong-Rim Kieffer-Kwon, Seolkyoung Jung, Laura Baranello, Su-Chen Huang, et al. “The Energetics and Physiological Impact of Cohesin Extrusion.” Cell 173, no. 5 (May 2018) - Architectural stripes, created by extensive loading of cohesin near CTCF anchors with the help of Nipbl and Rad21. Little overlap between B cells and ESCs. Architectural stripes are sites of tumor-inducing TOP2beta DNA breaks. ATP is required for loop extrusion and cohesin translocation, but not for maintenance. Neither replication nor transcription is important for loop extrusion. Zebra algorithm for detecting architectural stripes, image analysis, math in Methods. Human lymphoblastoid cells, mouse ESCs, mouse B cells activated with LPS, and CH12 B lymphoma cells: wild-type and treated with hydroxyurea (blocks DNA replication), flavopiridol (blocks transcription, Pol II elongation), or oligomycin (depletes ATP). Many other data types (e.g., ChIP-seq, ATAC-seq). GSE82144, GSE98119

  • Lieberman-Aiden, Erez, Nynke L. van Berkum, Louise Williams, Maxim Imakaev, Tobias Ragoczy, Agnes Telling, Ido Amit, et al. “Comprehensive Mapping of Long-Range Interactions Reveals Folding Principles of the Human Genome.” Science (New York, N.Y.) 326, no. 5950 (October 9, 2009) - GM12878, K562 cells. HindIII, NcoI enzymes. Two to three replicates. GSE18199

  • Rao, Suhas S. P., Miriam H. Huntley, Neva C. Durand, Elena K. Stamenova, Ivan D. Bochkov, James T. Robinson, Adrian L. Sanborn, et al. “A 3D Map of the Human Genome at Kilobase Resolution Reveals Principles of Chromatin Looping.” Cell 159, no. 7 (December 18, 2014) - Human GM12878, K562, IMR90, NHEK, HeLa cells; mouse CH12 cells. Different digestion enzymes (HindIII, NcoI, MboI, DpnII), different dilutions. Up to 35 biological replicates for GM12878. GSE63525, Supplementary Table S1. Hi-C meta-data

  • Sanborn, Adrian L., Suhas S. P. Rao, Su-Chen Huang, Neva C. Durand, Miriam H. Huntley, Andrew I. Jewett, Ivan D. Bochkov, et al. “Chromatin Extrusion Explains Key Features of Loop and Domain Formation in Wild-Type and Engineered Genomes.” Proceedings of the National Academy of Sciences of the United States of America 112, no. 47 (November 24, 2015) - HAP1 cells, derived from a chronic myelogenous leukemia cell line. Replicates. GSE74072

  • Rao, Suhas S.P., Su-Chen Huang, Brian Glenn St Hilaire, Jesse M. Engreitz, Elizabeth M. Perez, Kyong-Rim Kieffer-Kwon, Adrian L. Sanborn, et al. “Cohesin Loss Eliminates All Loop Domains.” Cell 171, no. 2 (2017) - HCT-116 human colorectal carcinoma cells. Timecourse, replicates under different conditions. GSE104334

Leonid Mirny lab

http://mirnylab.mit.edu/

Bing Ren lab

http://chromosome.sdsc.edu/mouse/hi-c/download.html

Raw and normalized chromatin interaction matrices and TADs defined with DomainCaller. Mouse ES cells and cortex; human ES cells and IMR90 fibroblasts. Two replicates per condition. GEO accession: GSE35156, GSE43070

Feng Yue lab

Cancer

  • 3D genomics of MYC overexpression. MYC overexpression leads to increased binding at active enhancers, amplified gene expression, increased chromatin interactions (including promoter-enhancer interactions), and weakened TAD boundaries. U2OS human osteosarcoma cell line with tetracycline-inducible MYC; ChIP-seq (H3K27ac, super-enhancer detection), RNA-seq (more downregulated genes; activation of ribosome, translation, mitochondrial biogenesis), 4D-seq, and SIQHiC (Spike-in Quantitative Hi-C, mixing in crosslinked mouse 3T3 cells at a 1:4 ratio). Replicate data at GSE164777.
    Paper See, Yi Xiang, Kaijing Chen, and Melissa J Fullwood. “MYC Overexpression Leads to Increased Chromatin Interactions at Superenhancers and MYC Binding Sites.” Genome Research, February 3, 2022, gr.276313.121. https://doi.org/10.1101/gr.276313.121.
  • Changes in the 3D genome are associated with CNVs in multiple myeloma cells (RPMI-8226, tri- and tetraploid; U266, nearly diploid). The number of TADs increases by ~25% and TADs become smaller; ~20% switch compartments. ICE normalization accounts for CNVs better than HiCNorm. CNV breakpoints overlap with TAD boundaries. 40kb resolution, replicates. Code, Hi-C, WGS, RNA-seq data GSE87585

  • Curaxins affect the 3D genome through DNA intercalation without inducing DNA damage. They compromise enhancer-promoter interactions, suppress oncogene expression (including MYC family genes), downregulate survival genes, partially disrupt TAD borders, decrease short-range interactions and the degree of spatial segregation of A/B compartments, and deplete CTCF but not other factors. Hi-C in HT1080 fibrosarcoma cells. Data: Hi-C and CTCF ChIP-seq in duplicates GSE122463, gene expression in MM1.S and HeLa S3 cells GSE117611, H3K27ac GSE117409, nascent RNA transcription GSE107633

  • 3D genomics of glioblastoma. Replicate samples from three patients. Sub-5kb-resolution Hi-C data, integration with ChIP- and RNA-seq. Data: Six Hi-C replicates, EGAS00001003493, ChIP-seq GSE121601, RNA-seq data EGAS00001003700. Processed data

  • Ten non-replicated Hi-C datasets: two human lymphoblastoid cell lines with known chromosomal translocations (FY1199 and DD1618), a transformed mouse cell line (EKLF), six human brain tumours (five glioblastomas: GB176, GB180, GB182, GB183 and GB238; one anaplastic astrocytoma: AA86), and a normal human cell line control (GM07017). GSE81879

  • Harewood, Louise, Kamal Kishore, Matthew D. Eldridge, Steven Wingett, Danita Pearson, Stefan Schoenfelder, V. Peter Collins, and Peter Fraser. “Hi-C as a Tool for Precise Detection and Characterisation of Chromosomal Rearrangements and Copy Number Variation in Human Tumours.” Genome Biology 18, no. 1 (December 2017).

  • Prostate cancer, normal. RWPE1 prostate epithelial cells transfected with GFP or ERG oncogene. Two biological and up to four technical replicates. GSE37752

    • Rickman, David S., T. David Soong, Benjamin Moss, Juan Miguel Mosquera, Jan Dlabal, Stéphane Terry, Theresa Y. MacDonald, et al. “Oncogene-Mediated Alterations in Chromatin Conformation.” Proceedings of the National Academy of Sciences of the United States of America 109, no. 23 (June 5, 2012)
  • Taberlay, Phillippa C., Joanna Achinger-Kawecka, Aaron T. L. Lun, Fabian A. Buske, Kenneth Sabir, Cathryn M. Gould, Elena Zotenko, et al. “Three-Dimensional Disorganization of the Cancer Genome Occurs Coincident with Long-Range Genetic and Epigenetic Alterations.” Genome Research 26, no. 6 (June 2016)

  • Cancer, normal Hi-C. Prostate epithelial cells, PC3, LNCaP. Two to three replicates. GSE73785

  • Haplotype-resolved Hi-C of GM12878, integrated with RNA-seq and Bru-seq (nascent RNA). Investigation of monoallelic expression (MAE) and allele-biased expression (ABE). GEO GSE159813
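ICE (iterative correction and eigenvector decomposition), credited above with handling CNVs better than HiCNorm, balances a contact matrix so that every bin ends up with the same total coverage. Below is a minimal pure-Python sketch on a dense toy matrix. It deliberately uses a square-root (damped) bias update, an assumption of this sketch that keeps small, diagonal-heavy examples from oscillating; the classic ICE step multiplies the bias by the full coverage ratio, and production balancing code typically also filters low-coverage bins and ignores pixels on the main diagonals.

```python
import math


def ice_balance(matrix, max_iter=1000, tol=1e-10):
    """Iteratively balance a dense, symmetric contact matrix (toy ICE-style sketch).

    Returns (corrected, bias) with corrected[i][j] = matrix[i][j] / (bias[i] * bias[j])
    and (approximately) equal row sums for all bins that have any contacts.
    """
    n = len(matrix)
    bias = [1.0] * n
    for _ in range(max_iter):
        # Coverage (row sums) of the currently corrected matrix.
        sums = [sum(matrix[i][j] / (bias[i] * bias[j]) for j in range(n))
                for i in range(n)]
        covered = [s for s in sums if s > 0]
        if not covered:
            raise ValueError("matrix has no contacts")
        mean = sum(covered) / len(covered)
        # Empty bins keep ratio 1.0, so their bias is never touched.
        ratios = [s / mean if s > 0 else 1.0 for s in sums]
        if max(abs(r - 1.0) for r in ratios) < tol:
            break
        # Damped (square-root) step; classic ICE multiplies by r directly.
        bias = [b * math.sqrt(r) for b, r in zip(bias, ratios)]
    corrected = [[matrix[i][j] / (bias[i] * bias[j]) for j in range(n)]
                 for i in range(n)]
    return corrected, bias


raw = [[10, 5, 1],
       [5, 20, 4],
       [1, 4, 2]]
balanced, bias = ice_balance(raw)
print([round(sum(row), 6) for row in balanced])  # three (near-)identical row sums
```

Bins with higher raw coverage (here bin 1) receive larger bias factors, which is what lets copy-number gains be partly absorbed into the bias vector.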

BRCA

  • Comparative characterization of 3D genomics in TNBC. Cell lines: HMEC as normal and five BRCA subtypes, in order of aggressiveness: T47D, ZR7530, HCC1954, HCC70, BT549. TNBC shows the most dramatic changes, partially conserved across TNBC cell lines and TNBC tissues. TADs (CaTCH), loops (HiCCUPS), compartment (PC1) analyses. Local interactions are lost and "normal" TAD interactions weakened, but TNBC TADs are strengthened; these changes are associated with CTCF loss/gain. 3D changes are associated with gene expression changes. Hi-C (replicates), ChIP-seq (CTCF, H3K27ac), RNA-seq, and ATAC-seq data are at GSE167154.
    Paper Kim, Taemook, Sungwook Han, Yujin Chun, Hyeokjun Yang, Hyesung Min, Sook Young Jeon, Jang-il Kim, Hyeong-Gon Moon, and Daeyoup Lee. “Comparative Characterization of 3D Chromatin Organization in Triple-Negative Breast Cancers.” Experimental & Molecular Medicine, May 5, 2022. https://doi.org/10.1038/s12276-022-00768-2.
  • 3D spheroids (organoids) of three breast cell lines: normal (MCF10A) and cancer (MCF7 and tamoxifen-resistant MCF7TR). Hi-C, RNA-seq, validation using 3D-qPCR and 3D-FISH. Normalization following the HiCcompare approach, TADs using TopDom, TAD comparison using eight types of changes, significant interactions using HiSIF. P1D1 loops defined as loops connecting the promoter and a distal region of the same gene; strength changes compared using Valid Pairs Per Million (VPPM); definition of differentially expressed looping genes (DELGs). Hi-C (replicates) and RNA-seq (triplicates) at GSE165572.
    Paper Li, Jingwei, Kun Fang, Lavanya Choppavarapu, Ke Yang, Yini Yang, Junbai Wang, Ruifeng Cao, Ismail Jatoi, and Victor X. Jin. “Hi-C Profiling of Cancer Spheroids Identifies 3D-Growth-Specific Chromatin Interactions in Breast Cancer Endocrine Resistance.” Clinical Epigenetics 13, no. 1 (December 2021): 175. https://doi.org/10.1186/s13148-021-01167-6.
  • BRCA gene targets regulated by SNPs - Capture Hi-C of chromatin interactions centered on causal variants and promoters of causal genes (Variant- and Promoter Capture Hi-C) in six human mammary epithelial (B80T5, MCF10A) and breast cancer (MCF7, T47D, MDAMB231, Hs578T) cell lines. HindIII fragments, CHiCAGO and Peaky for significant interaction calling. PCA on interactions separates cell types; significant interactions are enriched in epigenomic elements. 651 target genes at 139 independent breast cancer risk signals. Table 1 - top priority target genes. HiCUP-processed capture Hi-C data (hg19), code, Supplementary tables, Table S11 - 651 target genes.
    Paper Beesley, Jonathan, Haran Sivakumaran, Mahdi Moradi Marjaneh, Luize G. Lima, Kristine M. Hillman, Susanne Kaufmann, Natasha Tuano, et al. “Chromatin Interactome Mapping at 139 Independent Breast Cancer Risk Signals.” Genome Biology 21, no. 1 (December 2020) https://doi.org/10.1186/s13059-019-1877-y
  • Hi-C and RNA-seq in two ERα+ cell lines, parental and tamoxifen-resistant (TR) MCF7 and T47D, before and after treatment with Sapitinib (AZD8931), a dual EGFR/HER2 TKI. Eight types of TAD changes (TopDom), significant loops using HOMER, promoter-distal looping genes (P1D1, P1D2). Many TR-specific TADs and loops are reversible upon Sapitinib treatment. ERα-bound promoter-enhancer looping genes enclosed within altered domains are enriched in genes with functions and pathways associated with cancer aggressiveness, glycolysis and metabolism, and focal adhesion. Comparing cells and spheroids: the latter recapitulate most changes and are a better preclinical model. hg19, 40kb. Replicated Hi-C and triplicated RNA-seq of MCF7/T47D parental/TamR at GSE144380 and GSE128676.
    Paper Yang, Yini, Lavanya Choppavarapu, Kun Fang, Alireza S. Naeini, Bakhtiyor Nosirov, Jingwei Li, Ke Yang, et al. “The 3D Genomic Landscape of Differential Response to EGFR/HER2 Inhibition in Endocrine-Resistant Breast Cancer Cells.” Biochimica et Biophysica Acta (BBA) - Gene Regulatory Mechanisms 1863, no. 11 (November 2020): 194631. https://doi.org/10.1016/j.bbagrm.2020.194631.
  • 3D (tethered chromatin conformation capture, TCC) timecourse of estradiol (E2) stimulation in ER+ BRCA and endocrine resistance. Hormone-starved MCF7 (T0) and E2-treated for 1h (T1), replicates combined. Approximately similar numbers of compartments (2050). Dynamic A/B compartments (HiCLib) are associated with active open chromatin. Dynamic changes are characterized by decreased CTCF binding. Associated genes are enriched in cancer invasion, aggressiveness, and metabolism. Three additional timepoints: 4h, 16h, 24h (T4, T16, T24). 24 patterns of changes, categorized into six (similar to TADcompare: highly common HCC, early/late transit ETC/LTC, lowly/moderately/highly dynamic LDC/MDC/HDC). Epigenetic states from histone ChIP-seq using ChromHMM. Public RNA-seq data. Tamoxifen-resistant MCF7-TamR and T47D-TamR cell lines: tamoxifen-resistant altered compartments (TRACs), six types classified into shrunk, expanded, and flipped compartments. HOMER-identified loops. Differential genes associated with ribosome, tight junction, endocytosis, lysosome, cell cycle, WNT signaling, insulin signaling, focal adhesion, and MAPK signaling pathways. Molecular mechanistic model in Discussion. Supplementary data with hg19 coordinates of compartments, genes, loops. GSE108787 - MCF7 and TamR TCC, ChIP-seq and RNA-seq timecourse data (plus public RNA-seq); GSE119890 - T47D and TamR TCC timecourse data.
    Paper Zhou, Yufan, Diana L. Gerrard, Junbai Wang, Tian Li, Yini Yang, Andrew J. Fritz, Mahitha Rajendran, et al. “Temporal Dynamic Reorganization of 3D Chromatin Architecture in Hormone-Induced Breast Cancer and Endocrine Resistance.” Nature Communications 10, no. 1 (December 2019): 1522. https://doi.org/10.1038/s41467-019-09320-9.
  • Capture Hi-C (CHi-C) to annotate 63 breast cancer risk loci. 110 target genes at 33 loci, supported by other evidence (eQTLs, disease-specific survival). Two ER+ breast cancer cell lines (T-47D, ZR-75-1), two ER− breast cancer cell lines (BT-20, MDA-MB-231), one “normal” breast epithelial cell line (Bre80-Q-TERT (Bre80)) and a non-breast lymphoblastoid cell line (GM06990). Approx. 40% of interaction peaks are present in multiple cell lines. More interactions within TADs. WashU session with all CHi-C interaction peaks. Table 2 - Risk loci which formed interaction peaks directly (N = 33) or via an adjacent risk locus (N = 3) with 110 target genes (locus, SNP, gene targets, nearest gene). Table 3 - Nine CHi-C putative target genes that were statistically significant eQTLs (FDR adjusted P < 0.1) (locus, SNP, gene, p-values in all, ER+/- cancers). Table 4 - Six CHi-C putative target genes with orthogonal support from at least two additional data sources. PRJEB23968 - FASTQ files.
    Supplementary material https://www.nature.com/articles/s41467-018-03411-9#Sec23
      • Supplementary Data 1: Captured genomic regions (locus, SNP, hg19 coordinates, size, reference)
      • Supplementary Data 2: Numbers of statistically significant interaction peaks in six cell lines at 51 informative loci and 12 uninformative loci
      • Supplementary Data 3: Coordinates of interacting pairs detected in at least two cell lines (bedpe, -log10 FDR of interaction significance, cell line, number of cells)
      • Supplementary Data 4: Risk loci which formed interaction peaks with target genes in T-47D (T), ZR-75-1 (Z), Bre80 (Br), BT-20 (BT), MDA-MB-231 (M) and GM06990 (G) cell lines (cytoband, SNP, gene targets)
      • Supplementary Data 5: Distances (kb) between published risk SNPs and putative CHi-C target genes at 36 informative risk loci (cytoband, SNP, hg19 coordinates, gene targets)
      • Supplementary Data 6: eQTL analysis of 69 protein-coding target genes at 26 risk loci in TCGA breast cancer data
      • Supplementary Data 7: Disease-specific survival analysis of 97 target genes in Metabric data
    Paper Baxter, Joseph S., Olivia C. Leavy, Nicola H. Dryden, Sarah Maguire, Nichola Johnson, Vita Fedele, Nikiana Simigdala, et al. “Capture Hi-C Identifies Putative Target Genes at 33 Breast Cancer Risk Loci.” Nature Communications 9, no. 1 (December 2018): 1028. https://doi.org/10.1038/s41467-018-03411-9
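The PC1-based A/B compartment calls mentioned in several entries above (the TNBC comparison, the dynamic compartments in the TCC timecourse) generally follow the recipe introduced with the first Hi-C maps (Lieberman-Aiden et al., 2009): observed/expected normalization, a Pearson correlation matrix, its first principal component, and a sign chosen so that "A" matches the more active bins. Below is a minimal pure-Python sketch on a dense toy matrix; the function names and the `gene_density` orientation track are hypothetical, and real pipelines (HiCLib, cooltools and others) work per chromosome on balanced matrices and orient PC1 with GC content or gene density.

```python
import math


def observed_over_expected(m):
    """Divide every pixel by the mean of its diagonal (distance-only expected)."""
    n = len(m)
    oe = [[0.0] * n for _ in range(n)]
    for d in range(n):
        diag = [m[i][i + d] for i in range(n - d)]
        expected = sum(diag) / len(diag)
        for i in range(n - d):
            value = m[i][i + d] / expected if expected > 0 else 0.0
            oe[i][i + d] = oe[i + d][i] = value
    return oe


def pearson_matrix(rows):
    """Pearson correlation between every pair of rows."""
    def standardize(r):
        mu = sum(r) / len(r)
        c = [x - mu for x in r]
        norm = math.sqrt(sum(x * x for x in c))
        return [x / norm for x in c] if norm > 0 else c
    z = [standardize(r) for r in rows]
    return [[sum(a * b for a, b in zip(zi, zj)) for zj in z] for zi in z]


def first_eigenvector(mat, iters=500):
    """Leading eigenvector by power iteration (a correlation matrix is PSD)."""
    n = len(mat)
    v = [float(i + 1) for i in range(n)]  # arbitrary deterministic start vector
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v


def call_compartments(m, track):
    """Label bins A/B from PC1, oriented to correlate positively with `track`."""
    pc1 = first_eigenvector(pearson_matrix(observed_over_expected(m)))
    if sum(p * t for p, t in zip(pc1, track)) < 0:
        pc1 = [-p for p in pc1]
    return ["A" if p > 0 else "B" for p in pc1]


# Toy plaid matrix: bins 0-1 and bins 2-3 prefer contacts within their own group.
plaid = [[3 if (a < 2) == (b < 2) else 1 for b in range(4)] for a in range(4)]
gene_density = [1.0, 1.0, 0.0, 0.0]  # hypothetical orientation track
print(call_compartments(plaid, gene_density))  # ['A', 'A', 'B', 'B']
```

The orientation step matters because an eigenvector's sign is arbitrary; without an external track the A/B labels could flip between samples or timepoints.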

Tissue-specific

ENCODE

Search query for any type of Hi-C data, e.g., human brain Hi-C

Brain

Cell lines

  • Haarhuis, Judith H.I., Robin H. van der Weide, Vincent A. Blomen, J. Omar Yáñez-Cuna, Mario Amendola, Marjon S. van Ruiten, Peter H.L. Krijger, et al. “The Cohesin Release Factor WAPL Restricts Chromatin Loop Extension.” Cell, (May 2017) - WAPL, a cohesin antagonist that releases cohesin from DNA, restricts loop length and prevents looping between incorrectly oriented CTCF sites. Together with the SCC2/SCC4 complex, WAPL promotes correct assembly of chromosomal structures. WAPL WT and KO Hi-C, RNA-seq, ChIP-seq for CTCF and SMC1. Also, SCC4 KO and combined SCC4-WAPL KO Hi-C. Potential role of WAPL in mitotic chromosome condensation. Tools: HiC-Pro processing, HiCCUPS, HiCseq, DI, SomaticSniper for variant calling. Data (Hi-C in custom paired BED format): GEO GSE95015

  • Grubert, Fabian, Judith B. Zaugg, Maya Kasowski, Oana Ursu, Damek V. Spacek, Alicia R. Martin, Peyton Greenside, et al. “Genetic Control of Chromatin States in Humans Involves Local and Distal Chromosomal Interactions.” Cell, (August 2015) - seven Hi-C replicates on the GM12878 cell line, GEO GSE62742

  • Naumova, Natalia, Maxim Imakaev, Geoffrey Fudenberg, Ye Zhan, Bryan R. Lajoie, Leonid A. Mirny, and Job Dekker. “Organization of the Mitotic Chromosome.” Science (New York, N.Y.), (November 22, 2013) - 5C and Hi-C chromosome conformation capture study on metaphase chromosomes from human HeLa, HFF1 and K562 cell lines across the cell cycle. Two biological and two technical replicates. ArrayExpress E-MTAB-1948

  • Jessica Zuin et al., “Cohesin and CTCF Differentially Affect Chromatin Architecture and Gene Expression in Human Cells,” Proceedings of the National Academy of Sciences of the United States of America, (January 21, 2014) - CTCF and cohesin (RAD21 protein) are enriched at TAD boundaries. Depletion experiments. Different effects on inter- and intradomain interactions. Loss of cohesin leads to loss of local interactions, but TADs remain. Loss of CTCF leads to both loss of local and increase in inter-domain interactions. Different gene expression changes. TAD structures remain largely intact. Data: Hi-C, RNA-seq, RAD21 ChIP-seq for control and RAD21- or CTCF-depleted HEK293 cells. Two replicates in each condition. GEO GSE44267
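The directionality index (DI) listed among the tools of the WAPL study was introduced by Dixon et al. (2012), the study behind the Bing Ren lab TADs listed above, to find TAD boundaries where the upstream/downstream contact bias flips sign. Below is a minimal pure-Python sketch of the per-bin DI formula only (without the hidden Markov model that DomainCaller applies on top of it); the window size, edge truncation and function name are assumptions of this sketch.

```python
def directionality_index(matrix, window):
    """Directionality index per bin of a dense, symmetric contact matrix (toy sketch).

    up   = contacts with the `window` bins upstream of i
    down = contacts with the `window` bins downstream of i
    DI   = sign(down - up) * ((up - e)**2 / e + (down - e)**2 / e), e = (up + down) / 2
    Windows are truncated at the matrix edges.
    """
    n = len(matrix)
    di = []
    for i in range(n):
        up = sum(matrix[i][j] for j in range(max(0, i - window), i))
        down = sum(matrix[i][j] for j in range(i + 1, min(n, i + window + 1)))
        e = (up + down) / 2
        if up == down:  # also covers up == down == 0
            di.append(0.0)
            continue
        sign = 1.0 if down > up else -1.0
        di.append(sign * ((up - e) ** 2 / e + (down - e) ** 2 / e))
    return di


# Toy matrix: two 4-bin TADs (bins 0-3 and 4-7).
toy = [[20 if a == b else 10 if a // 4 == b // 4 else 1 for b in range(8)]
       for a in range(8)]
di = directionality_index(toy, window=2)
# DI flips from negative (bin 3, end of the first TAD) to positive (bin 4, start of the second)
print(round(di[3], 3), round(di[4], 3))  # -14.727 14.727
```

Edge bins get inflated values because their truncated windows see contacts in one direction only, which is one reason real analyses ignore or mask chromosome ends.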

Non-human data

  • Evolutionary 3D genomics, principles of chromosome folding in mammals: Eutherians (aardvark, elephant, mouse, human) compared with marsupials (wallaby, Tasmanian devil) and other species (platypus, chicken). Chromosome reshuffling can influence higher-order chromatin organization. Eutherian genome organization is associated with a higher number of short loops (Hi-C), high CTCF density (ChIP-seq), and chromosomal territories; the opposite holds for marsupials, which also show chromosomes in the Rabl configuration. A/B compartments and TADs have similar properties. Analysis of syntenic region rearrangements, reconstructing evolutionary history. Juicer, TADbit, FAN-C. Newly generated data for African elephant, aardvark, Tasmanian devil, tammar wallaby (Hi-C, CTCF and H3K4me3 ChIP-seq, RNA-seq) at GSE206075.
    Paper Álvarez-González, Lucía, Cristina Arias-Sardá, Laia Montes-Espuña, Laia Marín-Gual, Covadonga Vara, Nicholas C. Lister, Yasmina Cuartero, et al. “Principles of 3D Chromosome Folding and Evolutionary Genome Reshuffling in Mammals.” Cell Reports 41, no. 12 (December 2022): 111839. https://doi.org/10.1016/j.celrep.2022.111839.
  • 3D genome organization of erythrocytes in ten species at the last nucleated stages of maturation (newly generated mouse erythroblast data and previously generated public blood Hi-C data from other organisms). Erythrocyte chromatin lacks loops and TADs and shows a strong second-diagonal pattern. Raw data at SRA.
    Paper Ryzhkova, Anastasia, Alena Taskina, Anna Khabarova, Veniamin Fishman, and Nariman Battulin. “Erythrocytes 3D Genome Organization in Vertebrates.” Scientific Reports 11, no. 1 (December 2021): 4414. https://doi.org/10.1038/s41598-021-83903-9.
  • Investigation of the mechanisms of TAD boundaries in Drosophila. The Notch gene locus has two TADs; the role of genetic sequences bound by architectural proteins (APs: CP190, BEAF-32, M1BP, SuHw, CTCF). CRISPR-Cas9 deletion of domain boundaries leads to fusion of TADs, loss of APs, and disruption of transcription. In-nucleus Hi-C (4-cutter MboI) in the embryonic cell line S2R+ in triplicates, GSE136137. References to many public Drosophila datasets in the Methods section.
    Paper Arzate-Mejía, Rodrigo G., Angel Josué Cerecedo-Castillo, Georgina Guerrero, Mayra Furlan-Magaril, and Félix Recillas-Targa. “In Situ Dissection of Domain Boundaries Affect Genome Topology and Gene Transcription in Drosophila.” Nature Communications 11, no. 1 (December 2020): 894. https://doi.org/10.1038/s41467-020-14651-z.
  • RNA-seq, ATAC-seq, ChIP-seq, whole-genome methylation (30X), and Hi-C in 11 adult and two embryonic zebrafish tissues. Comparison with human and mouse regulatory elements. Enrichment of evolutionary breakpoints at TAD boundaries, H3K4me3 and CTCF signal. De novo chr4 assembly (sex chromosome). scATAC-seq on zebrafish brain - 25 cell types. GEO GSE134055, Tweet

  • tagHi-C protocol for low-input tagmentation-based Hi-C. Applied to mouse hematopoiesis 10 major blood cell types. Changes in compartments and the Rabl configuration defining chromatin condensation. Gene-body-associating domains are a general property of highly-expressed genes. Spatial chromatin loops link GWAS SNPs to candidate blood-phenotype genes. HiC-Pro to Juicer. GEO GSE142216 - RNA-seq, replicates, GEO GSE152918 - tagHi-C data, replicates, combined .hic files

  • Single-nucleus Hi-C data (scHi-C) of 88 Drosophila BG3 cells. 2-5M paired-end reads per cell, 10kb resolution. ORBITA pipeline to eliminate the effect of Phi29 DNA polymerase template switching. Chromatin compartments approx. 1Mb in size, non-hierarchical conserved TADs can be detected. Lots of biology, integration with other omics data. Raw and processed data in .cool format at GEO GSE131811

  • 3D chromatin organization during spermatogenesis, mouse. Meiotic chromosomes in prophase have weak compartmentalization, TADs, and loops. Enrichment in near inter-chromosomal interactions (close to diagonal). The X chromosome lacks domain organization during meiotic sex-chromosome inactivation. Concept and formula for evaluation of genomic compartment strength (Methods). GEO - Hi-C of meiotic pachytene spermatocytes (PS; 2 biological replicates). Other public Hi-C, RNA-seq, ChIP-seq data.

  • 3D genome rearrangement is uncoupled from gene expression changes. Introduction, references for and against 3D genomics-gene expression links. Drosophila, a "balancer" line with highly rearranged chromosomes. Negligible association can be detected, but changes in genome topology are not predictive of changes in gene expression, loss of long-range interactions has little impact. Processed data, GitHub. Raw data: Whole genome, Hi-C, Capture-C, RNA-seq

    Paper Ghavi-Helm, Yad, Aleksander Jankowski, Sascha Meiers, Rebecca R. Viales, Jan O. Korbel, and Eileen E. M. Furlong. “Highly Rearranged Chromosomes Reveal Uncoupling between Genome Topology and Gene Expression.” Nature Genetics, July 15, 2019.

  • Global organization of the B cell genome throughout differentiation by the transcription factor Pax5. Mouse splenic CD4+ cells, B cells at various differentiation stages, granulocytes. diffHiC, TADbit, directionality index. Hi-C and RNA-seq data on GEO GSE99163.
    Paper Johanson, Timothy M. “Transcription-Factor-Mediated Supervision of Global Genome Architecture Maintains B Cell Identity.” Nature Immunology 19 (2018): 14. https://doi.org/10.1038/s41590-018-0234-8
  • TADs in Drosophila, Hi-C and RNA-seq in four cell lines of various origin. dCTCF, SMC3, and Su(Hw) are weakly enriched at TAD boundaries. Transcription and active chromatin (H3K27ac, H3K4me1, H3K4me3, H3K36me3, H4K16ac) are associated with TAD boundaries. Also, BEAF-32 and CP190. Hierarchical TADs. Housekeeping genes tend to be near TAD boundaries and in inter-TAD regions. TAD boundary prediction using regression, modeling to associate TADs with bands, investigation of the hierarchy. Heavy use of the Armatus TAD caller. RNA-seq and replicate Hi-C data, high correlation, merged into 20kb resolution.  GEO GSE69013

  • Hi-C of polytene chromosomes in Drosophila. Polytene bands colocalize with TADs. TADs are conserved between polytene and diploid cells. Loops are transient. Two states of folding: Fully extended and up to 10-fold compacted fibers constitute euchromatin. Up to 30-fold compacted fibers represent heterochromatin of the nuclear periphery. Many experimental observations, validations. GEO - Tethered and in-solution Hi-C, triplicates, polytene, diploid.

    Paper Eagen, Kyle P., Tom A. Hartl, and Roger D. Kornberg. “Stable Chromosome Condensation Revealed by Chromosome Conformation Capture.” Cell 163, no. 4 (November 2015): 934–46. https://doi.org/10.1016/j.cell.2015.10.026.
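The spermatogenesis entry above points to a formula for compartment strength in its Methods. One commonly used summary, borrowed from saddle-plot analyses, divides same-compartment contact enrichment by cross-compartment enrichment; it may differ from the exact formula in that paper. Below is a minimal pure-Python sketch, assuming a dense observed/expected matrix and precomputed A/B labels; the function name is hypothetical.

```python
def compartment_strength(oe, labels):
    """(AA + BB) / (2 * AB) from mean observed/expected contacts (toy sketch).

    `oe` is a dense observed/expected matrix, `labels` holds "A"/"B" per bin.
    The main diagonal is skipped; values > 1 mean stronger segregation.
    """
    totals = {"AA": 0.0, "BB": 0.0, "AB": 0.0}
    counts = {"AA": 0, "BB": 0, "AB": 0}
    n = len(oe)
    for i in range(n):
        for j in range(i + 1, n):
            key = labels[i] + labels[j] if labels[i] == labels[j] else "AB"
            totals[key] += oe[i][j]
            counts[key] += 1
    if min(counts.values()) == 0:
        raise ValueError("need at least one AA, BB and AB pair")
    mean = {k: totals[k] / counts[k] for k in totals}
    return (mean["AA"] + mean["BB"]) / (2 * mean["AB"])


labels = ["A", "A", "B", "B"]
oe = [[2.0 if labels[i] == labels[j] else 0.5 for j in range(4)] for i in range(4)]
print(compartment_strength(oe, labels))  # 4.0
```

A value near 1 would describe the weak compartmentalization reported for meiotic prophase chromosomes, while larger values indicate stronger A/B segregation.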

Differential Hi-C

  • Liquid-liquid phase separation (LLPS) in haematological cancers is associated with intrinsically disordered regions (IDRs) of NUP98-HOXA TF chimera and induces CTCF-independent chromatin loops enriched in proto-oncogenes. Many biochemical assays, imaging, mass-spec, ChIP-seq, RNA-seq. All data at GEO GSE144643. In situ Hi-C (HEK293FT kidney cells, IDR wild type and mutated, biological and technical replicates) at GEO GSE143465.
    Paper Ahn, Jeong Hyun, Eric S. Davis, Timothy A. Daugird, Shuai Zhao, Ivana Yoseli Quiroga, Hidetaka Uryu, Jie Li, et al. “Phase Separation Drives Aberrant Chromatin Looping and Cancer Development.” Nature, June 23, 2021. https://doi.org/10.1038/s41586-021-03662-5.
  • WIZ (widely interspaced zinc finger-containing protein) - new loop-organizing protein, colocalizes with CTCF and cohesin across the genome. Loss of WIZ increases cohesin occupancy and DNA loops. WIZ maintains proper gene expression and stem cell identity. Arima, Juicer. GEO GSE137285 - RNA-seq, ChIP-seq, Hi-C replicates in WT and WIZdel mouse ESCs.

  • 3D chromatin reorganization during different types of cellular senescence: replicative (RS) and oncogene-induced (OIS, over a time course). Senescence-associated heterochromatin foci (SAHFs), formed with the help of DNMT1 via regulation of HMGA2 expression. WI38 primary fibroblasts. OIS - gain in long-range contacts. diffHiC analysis, differential regions enriched in H3K9me3. TADkit for 3D modeling and visualization. Data (Hi-C replicates, different conditions, timecourse, H3K4me3/H3K9me3/H3K27ac ChIP-seq, RNA-seq) GEO GSE130306

  • X chromosome sex differences in Drosophila. Male X chromosome has two-fold upregulation of gene expression, more mid/long-range interactions, weaker boundaries marked by BEAF-32, CP190, Chromator, and CLAMP, a dosage compensation complex cofactor. Less negative slope in distance-dependent decay of interactions, less clustered top scoring interactions (more randomness), more open structure overall. Local score differentiator (LSD-score) to call differential TAD boundaries in CNV-independent manner - more non-matching boundaries than autosomes, ~20% appearing and ~35% disappearing boundaries. Enrichment in epigenomic marks identified stronger boundary association with MSL (male-specific lethal complex) and CLAMP binding. Many other experimental observations. hiclib, hicpipe processing. R implementation of LSD differential TAD analysis, Hi-C data in bedGraph format GEO GSE94115, Tweet

  • Hi-C TAD comparison between normal prostate cells (RWPE1) and two prostate cancer cell lines (C42B, 22Rv1). TADs (TopDom-called) become smaller in cancer and switch epigenetic states. The FOXA1 promoter has more loop anchors in cancer. The chromatin structure around the androgen receptor (AR) locus changes (Figure 6). Loops called with Fit-HiC; motifs (NOMe-seq) enriched in loop-associated enhancers differ between normal and cancer. HiTC visualization. Figure 1a, Supplementary Figures 3 and 5 - examples/coordinates of TAD boundary/length changes.

  • Data for RWPE1, C42B, 22Rv1 cell lines: GEO GSE118629. In situ Hi-C, 4-cutter MboI, replicated, text-based sparse matrices at 10kb and 40kb resolution, raw and ICE-normalized, hg19. H3K9me3, H3K27me3, H3K36me3, RNA-seq.

  • Supplementary data: Data 2 - TAD coordinates and annotations; Data 3 - differentially expressed genes in smaller TADs; Data 4 - gene expression changes in TADs switching epigenomic state; Data 5 - enhancer-promoter loops; Data 6 - coordinates of nucleosome-depleted regions; Data 7 - all differentially expressed genes; Data 8 - target genes of FOXA1-bound enhancers; Data 9 - overexpressed genes with more enhancer-promoter loops

  • DNA methylation linked with 3D genomics. Methylation directs PRC-dependent 3D organization of mouse ESCs. Hypomethylation in mouse ESCs driven to naive pluripotency by two inhibitors (2i) is accompanied by redistribution of the polycomb H3K27me3 mark and decompaction of chromatin. Focus on HoxC, HoxD loci. Hi-C data processed with distiller and other cool-related tools. RNA-seq, H3K27me3 ChIP-seq of mouse ESCs grown in serum and 2i conditions. Hi-C data in replicates GEO GSE124342

  • RNA transcription inhibition minimally affects TADs but weakens TAD boundaries. K562, RNase treatment before/after crosslinking (bXL/aXL), actinomycin D (complete transcriptional arrest) treatment. Processing using cword, 40kb resolution. Data with replicates of each condition, GEO GSE114337

  • Comparison of the 3D structure of human and chimpanzee induced pluripotent stem cells. Lower-order pairwise interactions are relatively conserved, but higher-order structures, such as TADs, differ. HiCUP and HOMER for Hi-C data processing to 10kb resolution. Cyclic loess normalization, limma for significant interaction definition, Arrowhead on combined replicates to detect TADs. Association of differential chromatin interactions with gene expression. pyGenomeTracks for visualization. Workflowr code, processed Hi-C data (4 human and 4 chimp iPSCs) GEO GSE122520

  • In situ Hi-C libraries in biological replicates (n=2) for several hematopoietic cell types (200 million reads per replicate) with a focus on the B cell lineage in mice. The authors investigate the supervisory role of the transcription factor Pax5 in organizing the 3D genome architecture throughout B cell differentiation. The raw data are available via GEO GSE99151

  • DNA loop changes during macrophage development (THP-1 monocyte to macrophage development under 72h PMA treatment). In situ Hi-C (pbn reads, 10kb resolution), RNA-seq, ATAC-seq, CTCF and H3K27ac ChIP-seq. Formation of multi-hubs at key macrophage genes. Differential (dynamic, DESeq2-detected) loops are enriched for AP-1, more enriched in H3K27ac, in contrast to static loops. Association between local H3K27ac and transcription level with distal DNA elements with elevated H3K27ac. Very few genes and lower H3K27ac signal in lost loops, more genes and H3K27ac signal in gained loops. Fold changes in H3K27ac signal positively correlate with DNA looping. Macrophage development-specific gene ontology enrichments. Network analysis for multi-loop multi-enhancer activation hubs identification. GEO GSE96800 ChIP-seq, ATAC-seq, RNA-seq, Two Hi-C samples, THP-1 PMA-treated and untreated, SRA PRJNA385337.

    • Supplemental material:
      • Table S1. DNA Loops in Untreated THP-1 Cells, 16,067 loops. Text, hg19 genomic coordinates, columns: anchor1_chrom anchor1_start anchor1_end anchor2_chrom anchor2_start anchor2_end sample -log10(P) anchor1_strand anchor2_strand
      • Table S2. DNA Loops in PMA-Treated THP-1 Cells, 16,335 loops.
      • Table S3. Differential Loops
    • Phanstiel, Douglas H., Kevin Van Bortle, Damek Spacek, Gaelen T. Hess, Muhammad Saad Shamim, Ido Machol, Michael I. Love, Erez Lieberman Aiden, Michael C. Bassik, and Michael P. Snyder. “Static and Dynamic DNA Loops Form AP-1-Bound Activation Hubs during Macrophage Development.” Molecular Cell, (September 2017)
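Table S1 and Table S2 above list one loop per line with ten columns. A minimal sketch for loading such a file in Python, assuming (not confirmed by the supplement) that it is tab-delimited with a header row naming the columns in the order listed; the toy rows and the `Loop`/`read_loops` names are hypothetical.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, TextIO

# Assumed first field of the header row; column order follows the Table S1 listing.
HEADER_FIRST_FIELD = "anchor1_chrom"


@dataclass
class Loop:
    chrom1: str
    start1: int
    end1: int
    chrom2: str
    start2: int
    end2: int
    sample: str
    neg_log10_p: float
    strand1: str
    strand2: str

    def span(self) -> int:
        """Distance in bp between anchor midpoints (meaningful for intrachromosomal loops only)."""
        mid1 = (self.start1 + self.end1) // 2
        mid2 = (self.start2 + self.end2) // 2
        return abs(mid2 - mid1)


def read_loops(handle: TextIO, min_neg_log10_p: float = 0.0) -> List[Loop]:
    """Parse loop records, keeping those with -log10(P) >= min_neg_log10_p."""
    loops = []
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and the (assumed) header row
        if not row or row[0] == HEADER_FIRST_FIELD:
            continue
        loop = Loop(
            chrom1=row[0], start1=int(row[1]), end1=int(row[2]),
            chrom2=row[3], start2=int(row[4]), end2=int(row[5]),
            sample=row[6], neg_log10_p=float(row[7]),
            strand1=row[8], strand2=row[9],
        )
        if loop.neg_log10_p >= min_neg_log10_p:
            loops.append(loop)
    return loops


# Toy input with made-up values, not taken from the supplement
example = (
    "anchor1_chrom\tanchor1_start\tanchor1_end\tanchor2_chrom\tanchor2_start\t"
    "anchor2_end\tsample\t-log10(P)\tanchor1_strand\tanchor2_strand\n"
    "chr1\t1000\t6000\tchr1\t50000\t55000\tuntreated\t12.5\t+\t-\n"
    "chr2\t0\t5000\tchr2\t20000\t25000\tuntreated\t1.2\t-\t+\n"
)
loops = read_loops(io.StringIO(example), min_neg_log10_p=2.0)
print(len(loops), loops[0].span())  # 1 49000
```

The span uses anchor midpoints, so it only makes sense when both anchors lie on the same chromosome; the significance cutoff of 2.0 in the example is arbitrary.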

Timecourse Hi-C

  • 3D genomics of human embryogenesis. Unlike mouse sperm, human sperm lack TADs and A/B compartments; these are established later in embryogenesis and require zygotic genome activation and CTCF. Six stages of spatiotemporal Hi-C during human embryogenesis: sperm, 2-cell, 8-cell, morula, blastocysts, and six-week-old embryos. GitHub. Data: CRA000852, CRA000108.
    Paper Chen, Xuepeng, Yuwen Ke, Keliang Wu, Han Zhao, Yaoyu Sun, Lei Gao, Zhenbo Liu, et al. “Key Role for CTCF in Establishing Chromatin Structure in Human Embryos.” Nature, December 4, 2019. https://doi.org/10.1038/s41586-019-1812-0.
  • Vara, Covadonga, Andreu Paytuví-Gallart, Yasmina Cuartero, François Le Dily, Francisca Garcia, Judit Salvà-Castro, Laura Gómez-H, et al. “Three-Dimensional Genomic Structure and Cohesin Occupancy Correlate with Transcriptional Activity during Spermatogenesis.” Cell Reports, (July 2019) - 3D structure changes during spermatogenesis in mouse. Hi-C, RNA-seq, CTCF/REC8/RAD21L ChIP-seq. Description of biology of each stage (Fibroblasts, spermatogonia, leptonema/zygonema, pachynema/diplonema, round spermatids, sperm), and A/B compartment and TAD analysis (TADbit, insulation score), data normalized with ICE. Integration with differential expression. Changes in distribution of CTCF and cohesins (REC8 and RAD21L). Key tools: BBDuk (BBMap), TADbit, HiCExplorer, HiCRep, DeepTools. Data (no replicates) GEO GSE132054

  • Paulsen, Jonas, Tharvesh M. Liyakat Ali, Maxim Nekrasov, Erwan Delbarre, Marie-Odile Baudement, Sebastian Kurscheid, David Tremethick, and Philippe Collas. “Long-Range Interactions between Topologically Associating Domains Shape the Four-Dimensional Genome during Differentiation.” Nature Genetics, April 22, 2019 - Long-range TAD-TAD interactions form cliques (>3 interacting TADs) that are enriched in B compartments and LADs and associated with downregulated gene expression. Graph representation of TAD interactions. Quantifying statistical significance of between-TAD interactions. TAD boundaries are conserved. TAD cliques are dynamic. Permutation test preserving distances. Armatus for TAD detection. hiclib for data processing, Juicebox for visualization. Data: Time course differentiation of human adipose stem cells (day 0, 1, and 3). Hi-C (two replicates), Lamin B1 ChIP-seq, H3K9me3. GEO GSE109924. Also used mouse ES differentiation (Bonev 2017), mouse B cell reprogramming (Stadhouders 2018), scHi-C (Nagano 2017)

  • Du, Zhenhai, Hui Zheng, Bo Huang, Rui Ma, Jingyi Wu, Xianglin Zhang, Jing He, et al. “Allelic Reprogramming of 3D Chromatin Architecture during Early Mammalian Development.” Nature, (12 2017) - Developmental time course Hi-C. Data in preimplantation embryos at the following stages: gametes (sperm and MII oocyte), pronuclear stage 5 (PN5) zygotes, early 2-cell, late 2-cell, 8-cell, inner cell masses (ICM), and mouse embryonic stem cells (mES). Low-input Hi-C technology (sisHi-C). TADs are initially absent, then gradually appear. HiCPro mapping, Pearson correlation on low-resolution matrices, allele resolving. Data: GEO GSE82185

  • Hug, Clemens B., Alexis G. Grimaldi, Kai Kruse, and Juan M. Vaquerizas. “Chromatin Architecture Emerges during Zygotic Genome Activation Independent of Transcription.” Cell, (06 2017) - TADs appearing during zygotic genome activation, independent of transcription. TAD boundaries are enriched in housekeeping genes, colocalize in 3D. Drosophila. Insulation score for boundary detection. Overlap analysis of TAD boundaries. Processed Hi-C matrices at 5kb resolution (replicates merged, .cool format) and TAD boundaries at nuclear cycle 12, 13, 14, and 3-4 hours post fertilization

  • Ke, Yuwen, Yanan Xu, Xuepeng Chen, Songjie Feng, Zhenbo Liu, Yaoyu Sun, Xuelong Yao, et al. “3D Chromatin Structures of Mature Gametes and Structural Reprogramming during Mammalian Embryogenesis.” Cell, (July 13, 2017) - 3D timecourse changes during development of mouse gametes (sperm and MII oocyte) and early embryos, from zygotic (no TADs, many long-range interactions) to 2-, 4-, 8-cell, blastocyst and E7.5 mature embryos (TADs established after several rounds of DNA replication). A/B compartments are associated with unmethylated/methylated CpGs, respectively. PC1, directionality index, and insulation score to define compartments and TADs; these metrics increase in magnitude/strength during maturation. Enrichment in CTCF, SMC1, H3K4me3, H3K27ac, H3K9ac, H3K4me1, depletion in H3K9me3, H3K36me3, H3K27me3. The compartment strength is weaker in maternal vs. paternal genomes. Covariance for each gene vs. boundary score across the timecourse. Relative TAD intensity changes. Hi-C and RNA-seq data at different stages, some replicates
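Several entries above detect TAD boundaries with the insulation score. A minimal sketch of one simple variant of that idea (a square window sliding along the diagonal), assuming a dense, already balanced (e.g. ICE-normalized) NumPy matrix for a single chromosome. Tools such as cooltools or TADbit use their own windowing, filtering and normalization, so this is not their exact algorithm; `insulation_score` is a hypothetical name.

```python
import numpy as np


def insulation_score(matrix: np.ndarray, window: int) -> np.ndarray:
    """Insulation score per bin for a symmetric, balanced contact matrix.

    For bin i, average the contacts between the `window` bins ending at i
    (rows i-window+1..i) and the `window` bins after it (columns i+1..i+window),
    then take log2 of that value relative to the mean over all scored bins.
    Low scores mark bins where few contacts cross: candidate TAD boundaries.
    Bins too close to the matrix edge are left as NaN.
    """
    n = matrix.shape[0]
    raw = np.full(n, np.nan)
    for i in range(window - 1, n - window):
        square = matrix[i - window + 1:i + 1, i + 1:i + window + 1]
        raw[i] = square.mean()
    return np.log2(raw / np.nanmean(raw))


# Toy matrix: two 10-bin TADs with dense internal contacts, sparse contacts between them
toy = np.ones((20, 20))
toy[:10, :10] = 10.0
toy[10:, 10:] = 10.0
scores = insulation_score(toy, window=3)
print(int(np.nanargmin(scores)))  # 9, the last bin of the first TAD
```

On real, sparse data a window that contains only zeros gives log2(0) = -inf, so in practice one would add a pseudocount or skip such bins before calling minima.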

Promoter-capture Hi-C

  • SIPs, super-interactive promoters in five hematopoietic cell types (Erythrocyte, Macrophage/monophage, megakaryocyte, naive CD4 T-cells, Neutrophils). Reanalysis of promoter-capture Hi-C data from the Javierre et al., “Lineage-Specific Genome Architecture Links Enhancers and Non-Coding Disease Variants to Target Gene Promoters” study. CHiCAGO pipeline. Promoter-interacting regions (PIRs) interacting with SIPs are more enriched in cell type-specific ATAC-seq peaks and in GWAS variants for relevant cell types. SIP-associated genes are more highly expressed in relevant cells. Some SIPs are shared across cell types. Super-SIPs.

  • Genome-wide maps linking disease variants to genes. Activity-By-Contact (ABC) Model. 72 diseases and complex traits (non-specific, no psychiatric), linking 5046 fine-mapped GWAS signals to 2249 genes. 577 genes influence multiple phenotypes. Nearly half of enhancers regulate multiple genes. Table S7 - Summary of diseases and traits. Table S9 - ABC-Max predictions for 72 diseases and complex traits.

  • Promoter-enhancer contacts occur in cohesin-dependent and cohesin-independent manner. Promoter Capture Hi-C on degradation of cohesin (SCC1 subunit) and CTCF (both targeted by auxin-inducible degron and mEGFP reporter) in G1-synchronized HeLa cells. The majority of promoter contacts are lost (associated with transcriptional changes, SLAM-seq) but some are retained and gained. Cohesin-independent promoter contacts interact with active enhancers. Cohesin-dependent interactions are typically longer and associated with CTCF, while cohesin-independent interactions are shorter and associated with active promoters and enhancers. HiCUP, CHiCAGO, Chicdiff. Processed data, replicates of promoter-capture Hi-C data GEO GSE145735, replicates of SLAM-seq data GEO GSE145734

  • Promoter-enhancer predictions in 131 cell types and tissues using the Activity-By-Contact (ABC) Model, based on chromatin state (ATAC-seq) and 3D folding (consensus Hi-C). The ABC model assumes that an element’s quantitative effect on a gene depends on its strength as an enhancer (Activity) weighted by how often it comes into 3D contact with the promoter of the gene (Contact), and that the relative contribution of an element to a gene’s expression (as assayed by the proportional decrease in expression following CRISPR inhibition) equals that element’s effect divided by the total effect of all elements. Outperforms distance-based, 3D contact-only, and machine learning approaches. Enhancer-promoter predictions for GM12878, K562, liver, LNCAP, mESCs, NCCIT cells, more at Engreitz Lab page. GitHub repository broadinstitute/ABC-Enhancer-Gene-Prediction.

  • Promoter-enhancer interactions. Promoter-capture Hi-C, 27 human cell lines. Well-formatted data with hg19 genomic coordinates in the Supplementary material and at http://www.3div.kr/capture_hic

  • Promoter capture Hi-C in 17 blood cell types. Chromatin interactions are cell type-specific. >50% of interactions are one-to-one. Enriched in H3K27ac and H3K4me1 (active enhancers). GWAS loci enriched in PIRs. Table S3 lists prioritized genes/SNPs for autoimmune diseases. Used CHiCAGO to identify strongly interacting regions. Data include active promoter-enhancer links. More than 2,500 potential disease-associated genes are linked to GWAS SNPs. https://osf.io/u8tzp/
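The ABC definition described above (an element's Activity weighted by its Contact with the promoter, divided by the total over all candidate elements for that gene) reduces to a normalized product. A minimal sketch for a single gene, assuming Activity and Contact values have already been computed per candidate element (the model derives them from chromatin state data such as ATAC-seq and from Hi-C; that step is not shown). Element names and numbers are made up, and `abc_scores` is a hypothetical helper, not part of broadinstitute/ABC-Enhancer-Gene-Prediction.

```python
from typing import Dict


def abc_scores(activity: Dict[str, float], contact: Dict[str, float]) -> Dict[str, float]:
    """ABC score of each candidate element for one gene:
    (Activity x Contact) of the element divided by the sum over all candidates."""
    products = {element: activity[element] * contact[element] for element in activity}
    total = sum(products.values())
    if total == 0:
        # No activity or no contact anywhere: no element can be credited
        return {element: 0.0 for element in products}
    return {element: value / total for element, value in products.items()}


# Hypothetical candidate elements for one gene
scores = abc_scores(
    activity={"elem1": 2.0, "elem2": 1.0, "elem3": 1.0},
    contact={"elem1": 0.5, "elem2": 0.5, "elem3": 1.0},
)
print(scores)  # {'elem1': 0.4, 'elem2': 0.2, 'elem3': 0.4}
```

Because scores are normalized per gene, a weak but close element can outrank a strong but distant one, which is the behavior that distinguishes ABC from distance-only or activity-only rules.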

Single-cell Hi-C

See Notes on single-cell Hi-C technologies, tools, and data repository

Micro-C

See the Micro-C section in the HiC_tools repository

GAM

Genome Architecture Mapping data

Imaging

  • MERFISH - Super-resolution imaging technology, reconstructing 3D structure in single cells at 30kb resolution, 1.2Mb region of Chr21 in IMR90 cells. Distance maps obtained by microscopy show small distances for loci within TADs and larger distances between TADs. TAD-like structures exist in single cells. 2.5Mb region of Chr21 in HCT116 cells: cohesin depletion does not abolish TADs, only alters their preferential positioning. Multi-point (triplet) interactions are prevalent. TAD boundaries are highly heterogeneous in single cells. Diffraction-limited and STORM (stochastic optical reconstruction microscopy) imaging. GitHub

  • Single-cell level massively multiplexed FISH (MERFISH, sequential genome imaging) to measure 3D genome structure in context of gene expression and nuclear structures. Approx. 650 loci, 50kb resolution, on chr21 10.4-46.7Mb from the hg38 genome assembly, IMR90 cells, population average from approx. 12K chr21 copies, multiple rounds of hybridization. Investigation of TADs, A/B compartments, 87% agreement with bulk Hi-C. Association with cell type markers, transcription. Genome-scale imaging using barcodes, 1041 30kb loci covering autosomes and chrX of IMR90, over 5K cells, 5 replicates. Processed multiplexed FISH data and more, TXT format, GitHub

  • Parser of multiplexed single-cell imaging data from Bintu et al. 2018 and Su et al. 2020 - Takes 3D coordinates of the regions as input and writes the distance and contact matrices for these datasets.
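The imaging entries above turn per-cell 3D coordinates of loci into distance and contact maps. A minimal NumPy sketch of that conversion (not the parser's own code), assuming coordinates in nanometres stored as an array of shape (cells, loci, 3) with NaN for undetected loci; the contact threshold is a hypothetical parameter that must be chosen for the dataset.

```python
import numpy as np


def pairwise_distances(coords: np.ndarray) -> np.ndarray:
    """(n_cells, n_loci, 3) coordinates -> (n_cells, n_loci, n_loci) Euclidean distances.
    Undetected loci stored as NaN give NaN distances."""
    diff = coords[:, :, None, :] - coords[:, None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def population_maps(coords: np.ndarray, contact_threshold: float):
    """Median distance map and contact-frequency map across cells."""
    dist = pairwise_distances(coords)
    median_distance = np.nanmedian(dist, axis=0)
    detected = ~np.isnan(dist)
    # Treat missing distances as infinitely far so they never count as contacts
    in_contact = detected & (np.nan_to_num(dist, nan=np.inf) < contact_threshold)
    n_detected = detected.sum(axis=0)
    contact_freq = np.full(n_detected.shape, np.nan)
    # Fraction of cells, among those where both loci were detected, with distance below threshold
    np.divide(in_contact.sum(axis=0), n_detected, out=contact_freq, where=n_detected > 0)
    return median_distance, contact_freq


# Two toy cells, three loci; locus 2 is undetected in the second cell
cells = np.array([
    [[0, 0, 0], [100, 0, 0], [500, 0, 0]],
    [[0, 0, 0], [0, 200, 0], [np.nan, np.nan, np.nan]],
])
median_distance, contact_freq = population_maps(cells, contact_threshold=150.0)
print(median_distance[0, 1], contact_freq[0, 1])  # 150.0 0.5
```

Normalizing contacts by the number of cells in which both loci were detected, rather than by all cells, keeps detection failures from being read as absent contacts.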

CTCF

Notes on CTCF motifs and data

Integrative Hi-C

  • 3D structure mediates the effect of genetic variants on gene expression. 317 lymphoblastoid (LCL) and 78 fibroblast (FIB) cell lines, Hi-C data from Rao et al. 2014 paper. Regulatory elements identified from H3K4me1, H3K4me3, H3K27ac ChIP-seq. The regulatory activity is structured in 12,583 well-delimited cis-regulatory domains (CRDs) that respect the local chromatin organization into topologically associating domains (TADs) but constitute a finer organization. 30 trans-regulatory hubs (TRHs) formed by CRDs on distinct chromosomes, associated with A/B compartments and allelic regulation. Processed data - cQTLs - variants associated with chromatin peak activity; (cis/trans) eQTLs - variants associated with gene expression; aCRD-QTLs - variants associated with CRD activity; sCRD-QTLs - variants associated with CRD structure; chromatin peaks, and CRDs. For LCL and FIB cell lines, coordinates in hg19.
    Paper Delaneau, O., M. Zazhytska, C. Borel, G. Giannuzzi, G. Rey, C. Howald, S. Kumar, et al. “Chromatin Three-Dimensional Interactions Mediate Genetic Effects on Gene Expression.” Science (New York, N.Y.) 364, no. 6439 (03 2019). https://doi.org/10.1126/science.aat8266.

Misc

  • RNA-Chrom - database of RNA-chromatin interactions. Human & mouse. Manually curated. Data from "all-to-all" methods (MARGI, GRID-seq, ChAR-seq, iMARGI, RADICL-seq, Red-C) and "one-to-all" methods (RAP, CHART-seq, CgURO-seq, dChIRP-seq, ChOP-seq, CHIRT-seq), databases. Uniform processing. RNA- and DNA-centric searches. Video tutorial 1, tutorial 2. Download.
    Paper Ryabykh, G. K., S. V. Kuznetsov, Y. D Korostelev, A. I. Sigorskikh, A. A. Zharikova, and A. A. Mironov. “RNA-Chrom: A Manually-Curated Analytical Database of RNA–Chromatin Interactome.” Preprint. Bioinformatics, December 12, 2022. https://doi.org/10.1101/2022.12.10.519346.
  • Prioritization of COVID-19 candidate genes using 3D chromosomal topology. Applying COGS (Capture Hi-C Omnibus Gene Score), a statistical pipeline for linking GWAS variants with their target genes based on 3D chromatin interaction data. COVID-19 GWAS data. Promoter-capture Hi-C data from Javierre et al., “Lineage-Specific Genome Architecture Links Enhancers and Non-Coding Disease Variants to Target Gene Promoters” and Ho et al. "TOP1 inhibition therapy protects against SARS-CoV-2-induced lethal inflammation" studies (17 human primary cell types data and SARS-CoV-2-infected lung carcinoma cells data). Four prioritization approaches, summary in Supplementary Table S4. Biological analysis.
    Paper Thiecke, Michiel J., Emma J. Yang, Oliver S. Burren, Helen Ray-Jones, and Mikhail Spivakov. “[Prioritisation of Candidate Genes Underpinning COVID-19 Host Genetic Traits Based on High-Resolution 3D Chromosomal Topology](https://doi.org/10.3389/fgene.2021.745672).” Frontiers in Genetics 12 (October 25, 2021)
