# hilbert-genome-cooler

#### Overview

We create a Python package called `hilbertgenome` to generate a multiscale `mcool` file of Hilbert curve data. 

This package in turn uses the Python `numpy-hilbert-curve` (`hilbert`) and `cooler` packages.

Each scale of the `mcool` file is derived from **integer-keyed categorical data** (e.g., chromatin states or DHS components) at one Hilbert curve order and is written to its own `cool` file.

The set of `cool` files, each at its own "resolution" representing a Hilbert curve order, can then be combined into a final `mcool` file.
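As a rough sketch of how curve orders could map onto `mcool` resolutions: the cell output further down prints `{8: 3200, 7: 6400, 6: 12800, 5: 25600, 4: 51200}` for a 200 bp signal whose calculated optimum order is 12. That is consistent with the bin size doubling for each step below the optimum order. The helper names below are hypothetical and not part of `hilbertgenome`, and the doubling rule is an assumption read off that output. The `::/resolutions/<n>` suffix is the URI form `cooler` uses to address one resolution inside a multi-resolution file.

```python
def resolution_for_order(order, signal_resolution=200, optimum_order=12):
    """Bin size for one curve order.

    Assumption: the bin size doubles for each order below the optimum,
    which matches the mapping printed in this notebook's output.
    """
    if order > optimum_order:
        raise ValueError("order exceeds the optimum order")
    return signal_resolution * 2 ** (optimum_order - order)


def resolutions_by_order(order_min, order_max, **kwargs):
    """Map each curve order, highest first, to its bin size."""
    return {
        order: resolution_for_order(order, **kwargs)
        for order in range(order_max, order_min - 1, -1)
    }


def mcool_uri(mcool_fn, resolution):
    """cooler-style URI for one resolution group inside an mcool file."""
    return f"{mcool_fn}::/resolutions/{resolution}"


resolutions = resolutions_by_order(4, 8)
print(resolutions)
for order, resolution in resolutions.items():
    print(order, mcool_uri("signal.mcool", resolution))
```

Keeping the order-to-resolution mapping in one place makes it easy to check that every curve order ends up in its own resolution group.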

#### Notes

1. This notebook does not yet use the custom Hilbert curve aggregation function defined in the Observable notebook. Instead, we generate a range of integers as placeholder data. When these data are rendered with a colormap (see below), the coloring shows whether the Hilbert curve data are ordered correctly.

2. This notebook does not yet cover rendering this `mcool` file with HiGlass. Presenting categorical data will require customizing the 2D track type, much as was needed to render chromatin state or DHS component "tornado" multivec tracks in HiGlass.
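The placeholder data described in note 1 can be sketched without any dependencies. The example below uses the textbook distance-to-coordinate conversion for a Hilbert curve, not the `numpy-hilbert-curve` package that `hilbertgenome` relies on, and the function names are hypothetical. It fills a grid with a range of integers in curve order and prints it as text in place of a colormap.

```python
def hilbert_d2xy(order, d):
    """Convert distance d along a Hilbert curve of the given order to (x, y)."""
    x = y = 0
    t = d
    s = 1
    side = 1 << order  # the grid is side x side cells
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the quadrant so sub-curves join end to end
        if ry == 0:
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def placeholder_grid(order):
    """Fill a grid so each cell holds its position along the curve."""
    side = 1 << order
    grid = [[None] * side for _ in range(side)]
    for d in range(side * side):
        x, y = hilbert_d2xy(order, d)
        grid[y][x] = d
    return grid


for row in placeholder_grid(2):
    print(" ".join(f"{value:2d}" for value in row))
```

If the ordering is correct, consecutive integers always sit in neighbouring cells, which is what a smooth colormap gradient along the curve makes visible.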

In [1]:
# The following directive activates inline plotting
%matplotlib inline

In [2]:
%%bash

pip install h5py
pip install cooler
pip install matplotlib
pip install numpy-hilbert-curve
pip install requests



In [3]:
import io
import os
import requests
import hilbertgenome as hg

In [4]:
data_dir = 'hilbert_genome_cooler_data'
# Create the data directory if it does not already exist
os.makedirs(data_dir, exist_ok=True)
mcool_fn = os.path.join(data_dir, 'signal.mcool')

In [5]:
#
# ref. /net/seq/data/projects/Epilogos/multivec-for-browser-2022-redo/epilogos_tracks/single/human/Boix_et_al_833_sample/hg19/18/All_833_biosamples/S1/scores.txt.filledGap.versionSorted.txt.gz
#
signal_categories = 18
signal_resolution = 200
signal_remote_URI = 'https://resources.altius.org/~areynolds/public/Boix_et_al_833_sample.hg19.18.All_833_biosamples.S1.scores.txt.gz'
signal_local_fn = os.path.join(data_dir, 'Boix_et_al_833_sample.hg19.18.All_833_biosamples.S1.scores.txt.gz')
if not os.path.exists(signal_local_fn):
    try:
        r = requests.get(signal_remote_URI, timeout=60)
        # Fail on HTTP errors rather than saving an error page as the signal file
        r.raise_for_status()
        with open(signal_local_fn, "wb") as ofh:
            ofh.write(r.content)
    except requests.exceptions.RequestException as e:
        raise SystemExit(e)

In [6]:
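# NOTE: the signal file downloaded above is named for hg19, while the assembly
# passed here is hg38; confirm which assembly is intended.
hgo = hg.HilbertGenome(assembly="hg38", 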
                       input_signal_fn=signal_local_fn,
                       input_signal_categories=signal_categories,
                       input_signal_resolution=signal_resolution, 
                       curve_order_min=4,
                       curve_order_max=8,
                       output_mcool_fn=mcool_fn)

Note: Custom curve order maximum (8) is less than optimum calculated maximum (12)


{8: 3200, 7: 6400, 6: 12800, 5: 25600, 4: 51200}
{0.02420393280989194: 0, 0.002708360245469004: 1, 0.00394814926011068: 2, 0.001220604360849253: 3, 0.026705936533921948: 4, 0.11972853944638585: 5, 0.0008191515126548877: 6, 0.0005271316964658933: 7, 0.01561427821918218: 8, 0.010779735208250608: 9, 0.0238785053698427: 10, 0.013359458511884503: 11, 0.07574853917899496: 12, 0.002951533907732596: 13, 0.0009647647358091171: 14, 0.015355406009786235: 15, 0.09095262598409129: 16, 0.5705334328871138: 17}
[0.024203924449809053, 0.002708355043696703, 0.003948147350381652, 0.0012206042756747646, 0.026705927402440006, 0.11972853264733646, 0.0008191415662563034, 0.0005271217003774098, 0.015614276386078431, 0.010779732451657307, 0.023878503196616135, 0.013359456620804257, 0.0757485318733546, 0.002951532648822401, 0.0009647638622056454, 0.015355399482853492, 0.09095262371773721, 0.5705334253238982]
[17, 17, 17, 17, 17, 12, 12, 12, 12, 12]
ds_as_fs (all) [0.01535541 0.01535541 0.01535541 0.01535541 0.0

Note: Processing signal for curve order 8...


chr1 [[15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15]]
chr1 [15]
ds_as_fs (all) [0.11972854 0.11972854 0.11972854 0.11972854 0.11972854 0.11972854
 0.11972854 0.11972854 0.11972854 0.11972854 0.11972854 0.11972854
 0.11972854 0.11972854 0.11972854 0.11972854]
ds_as_fs (min) 0.11972853944638585
5
data_split_to_promoted_category 5
chr2 [[5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5]]
chr2 [5]
ds_as_fs (all) [0.11972854 0.11972854 0.11972854 0.11972854 0.11972854 0.11972854
 0.11972854 0.11972854 0.11972854 0.11972854 0.11972854 0.11972854
 0.11972854 0.11972854 0.11972854 0.11972854]
ds_as_fs (min) 0.11972853944638585
5
data_split_to_promoted_category 5
chr3 [[5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5]]
chr3 [5]
ds_as_fs (all) [0.11972854 0.11972854 0.11972854 0.11972854 0.11972854 0.11972854
 0.11972854 0.02670594 0.02670594 0.02670594 0.02670594 0.02670594
 0.00081915 0.11972854 0.11972854 0.11972854]
ds_as_fs (min) 0.0008191515126548877
6
data_split_to_promoted

Note: Processing signal for curve order 7...
Note: Processing signal for curve order 6...
Note: Processing signal for curve order 5...
Note: Processing signal for curve order 4...


SystemExit: 0

  warn("To exit: use 'exit', 'quit', or Ctrl-D.", stacklevel=1)
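The debug output above appears to show how a block of fine-scale bins is collapsed into one bin at a coarser curve order. Each category is looked up in a table of genome-wide frequencies, and the category with the lowest frequency in the block (`ds_as_fs (min)`) is the one promoted. This reading is an assumption taken from the log, not from the `hilbertgenome` source, and the function names below are hypothetical. The frequencies are copied from the output for categories 4, 5 and 6.

```python
# Genome-wide frequencies for three categories, copied from the output above
frequencies = {
    4: 0.026705936533921948,
    5: 0.11972853944638585,
    6: 0.0008191515126548877,
}


def promote_rarest(block, frequencies):
    """Pick the category in the block with the lowest genome-wide frequency.

    Ties are broken by the smaller category index (an arbitrary choice here).
    """
    return min(set(block), key=lambda category: (frequencies[category], category))


def downsample(categories, frequencies, factor):
    """Collapse each run of `factor` bins into one promoted category."""
    return [
        promote_rarest(categories[i:i + factor], frequencies)
        for i in range(0, len(categories), factor)
    ]


# The 16-bin block that the log reports for chr3
block = [5] * 7 + [4] * 5 + [6] + [5] * 3
print(promote_rarest(block, frequencies))
print(downsample([5] * 16 + block, frequencies, 16))
```

Promoting the rarest category keeps uncommon states visible at coarse scales, where a majority vote would replace them with the most abundant state.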


In [None]:
hgo.cooler_info_for_curve_order(8)

In [None]:
hgo.cooler_plot_for_curve_order(8)