In [1]:
import TADselect
import numpy as np



# Available TAD callers

|Caller name|Caller class|
|-----------|------------|
|Lavaburst armatus|`LavaArmatusCaller`|
|Lavaburst modularity|`LavaModularityCaller`|
|Armatus Cpp|`ArmatusCaller`|
|Insulation score|`InsulationCaller`|
|Directionality index|`DirecitonalityCaller`|
|HiCseg with P model|`HiCsegPCaller`|
|HiCseg with N model|`HiCsegNCaller`|
|HiCseg with B model|`HiCsegBCaller`|
|Arrowhead|`ArrowheadCaller`|
|HiCExplorer|`HiCExplorerCaller`|
|TADtree|`TADtreeCaller`|
|TADbit|`TADbitCaller`|

Note that these algorithms are not implemented in the module itself; you have to install them separately.

|Caller|Language|Software|
|------|--------|----|
|Lavaburst armatus, modularity|Python3|Lavaburst module|
|Armatus Cpp|C++, CLI|Armatus|
|Insulation score, Directionality index|Python3|TADtool module|
|HiCseg|R|HiCseg|
|Arrowhead|Java|Juicer|
|HiCExplorer|Python, CLI|HiCExplorer|
|TADtree|Python|TADtree module|
|TADbit|Python|TADbit module|
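
Since the callers wrap external tools, it can help to check what is installed before running anything. The following is a minimal sketch using only the standard library; `check_dependencies` is a hypothetical helper, and the module and executable names passed to it are assumptions that may differ on your system.

```python
import importlib.util
import shutil


def check_dependencies(modules, executables):
    """Report which Python modules and command-line tools can be found."""
    report = {}
    for name in modules:
        # find_spec returns None when a top-level module is not importable
        report[name] = importlib.util.find_spec(name) is not None
    for name in executables:
        # shutil.which returns None when the executable is not on PATH
        report[name] = shutil.which(name) is not None
    return report


# Names below are assumptions for illustration; adjust them to your setup.
status = check_dependencies(
    modules=["lavaburst", "tadtool"],
    executables=["armatus", "java"],
)
for name, found in sorted(status.items()):
    print(f"{name}: {'found' if found else 'missing'}")
```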

Most callers have one or more parameters; a dash marks callers with none listed.

|Caller|Parameters|
|------|----------|
|Armatus, Modularity|gamma|
|Insulation Score, Directionality Index|window, cutoff|
|HiCseg|&mdash;|
|Arrowhead|windowSize|
|HiCExplorer|minDepth, maxDepth, _step, thresholdComparisons, delta, correction_|
|TADtree|max_TAD_size, max_tree_depth, boundary_index_p, boundary_index_q, gamma|
|TADbit|&mdash;|

# Caller with 1-dim parameter

Pass dataset labels and dataset files as two lists in matching order. You also have to pass the data format, chromosome, assembly, resolution and balance flag.

In [2]:
la = TADselect.LavaArmatusCaller(datasets_labels=['Hep_G2_rep1'],
                                 datasets_files=['../example_data/HepG2_NA_NA_1.20000.chr17.cool'],
                                 data_format='cool',
                                 chr='chr17',
                                 assembly='hg19',
                                 resolution=20000,
                                 balance=False)

DEBUG:TADselect:Initializing from files: ['../example_data/HepG2_NA_NA_1.20000.chr17.cool']
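
Because labels and files are matched by position, a mismatch between the two lists is easy to miss. A small pre-check can be sketched as follows; `pair_datasets` is a hypothetical helper and not part of TADselect.

```python
def pair_datasets(labels, files):
    """Return {label: file}, checking that both lists line up one-to-one."""
    if len(labels) != len(files):
        raise ValueError(
            f"got {len(labels)} labels but {len(files)} files"
        )
    if len(set(labels)) != len(labels):
        raise ValueError("dataset labels must be unique")
    # zip pairs items by position, which is the order the caller relies on
    return dict(zip(labels, files))


pairs = pair_datasets(
    ["Hep_G2_rep1"],
    ["../example_data/HepG2_NA_NA_1.20000.chr17.cool"],
)
for label, path in pairs.items():
    print(label, "->", path)
```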


Most of this information can be read or edited through the `_metadata` attribute.

In [4]:
la._metadata

{'assembly': 'hg19',
 'balance': False,
 'caller': 'Lavaburst',
 'chr': 'chr17',
 'data_formats': ['cool'],
 'files_cool': ['../example_data/HepG2_NA_NA_1.20000.chr17.cool'],
 'labels': ['Hep_G2_rep1'],
 'method': 'armatus',
 'params': ['gamma', 'method'],
 'resolution': 20000,
 'size': 0}

To call segmentations, pass the range of values to scan for each parameter.

In [3]:
la.call(params_dict={'gamma': list(np.arange(-5, 5, 0.5))})

DEBUG:TADselect:Calling LavaburstCaller with params: {'gamma': [-5.0, -4.5, -4.0, -3.5, -3.0, -2.5, -2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]}
INFO:TADselect:NaNs filling with 0.00
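
The log above lists 20 gamma values from -5.0 to 4.5: the stop value 5 is excluded, as with any half-open range. The sketch below rebuilds the same grid in plain Python; `float_grid` is a hypothetical helper that multiplies the index by the step rather than accumulating additions, which limits rounding drift.

```python
def float_grid(start, stop, step):
    """Values start, start + step, ... strictly below stop (half-open range)."""
    values = []
    n = 0
    # Compute each value from the index instead of adding step repeatedly
    while start + n * step < stop:
        values.append(start + n * step)
        n += 1
    return values


gammas = float_grid(-5, 5, 0.5)
print(len(gammas), gammas[0], gammas[-1])
```

To include the upper bound, extend the stop by one step, for example `float_grid(-5, 5.5, 0.5)`.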


The resulting segmentations are stored in a nested dictionary: the first-level key is the dataset label and the second-level key is the gamma value. Each segmentation is represented as GenomicRanges.

In [9]:
la._segmentations

{'Hep_G2_rep1': {-5.0: 31	45	1
  61	116	1
  137	145	1
  177	248	1
  269	273	1
  298	302	1
  317	440	1
  452	459	1
  532	537	1
  710	719	1
  764	771	1
  809	822	1
  839	914	1
  938	947	1
  957	984	1
  1043	1073	1
  1277	2098	1
  2098	4060	1, -4.5: 31	45	1
  62	116	1
  137	145	1
  178	248	1
  269	273	1
  298	302	1
  317	440	1
  452	459	1
  532	537	1
  710	719	1
  764	771	1
  810	822	1
  839	914	1
  938	947	1
  957	984	1
  1043	1073	1
  1277	2169	1
  2169	4060	1, -4.0: 31	45	1
  62	116	1
  137	145	1
  176	180	1
  181	248	1
  269	273	1
  298	302	1
  317	436	1
  452	459	1
  532	537	1
  710	719	1
  764	771	1
  810	822	1
  839	914	1
  938	947	1
  957	984	1
  1043	1073	1
  1277	2169	1
  2239	2244	1
  2246	2267	1
  2267	4060	1, -3.5: 31	45	1
  62	116	1
  137	145	1
  176	180	1
  181	248	1
  269	273	1
  298	302	1
  317	436	1
  452	459	1
  532	537	1
  710	719	1
  764	771	1
  812	822	1
  840	914	1
  938	947	1
  957	984	1
  1043	1073	1
  1277	2169	1
  2239	2360	1
  2360	4060	1, -3.0: 31	45	1
  62	11
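
The nested layout (label, then gamma, then intervals) can be walked with two loops. The sketch below uses a toy excerpt of the output above and plain `(bgn, end)` tuples as a stand-in for the GenomicRanges objects, which is an assumption made to keep it self-contained; `summarize` is a hypothetical helper.

```python
# Toy excerpt: first three intervals for two gamma values from the output above
segmentations = {
    "Hep_G2_rep1": {
        -5.0: [(31, 45), (61, 116), (137, 145)],
        -4.5: [(31, 45), (62, 116), (137, 145)],
    }
}


def summarize(segs):
    """Return (label, gamma, number of intervals, mean interval size) rows."""
    rows = []
    for label, by_gamma in segs.items():
        for gamma, intervals in sorted(by_gamma.items()):
            sizes = [end - bgn for bgn, end in intervals]
            rows.append((label, gamma, len(intervals), sum(sizes) / len(sizes)))
    return rows


rows = summarize(segmentations)
for label, gamma, count, mean_size in rows:
    print(f"{label} gamma={gamma}: {count} intervals, mean size {mean_size:.1f} bins")
```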

You can convert segmentations to a `pandas.DataFrame`.

In [12]:
la.segmentation2df()

| |bgn|caller|end|gamma|label|length|method|
|-|---|------|---|-----|-----|------|------|
|0|31|Lavaburst|45|-0.5|Hep_G2_rep1|14| |
|1|63|Lavaburst|109|-0.5|Hep_G2_rep1|46| |
|2|112|Lavaburst|116|-0.5|Hep_G2_rep1|4| |
|3|137|Lavaburst|143|-0.5|Hep_G2_rep1|6| |
|4|181|Lavaburst|199|-0.5|Hep_G2_rep1|18| |
|5|216|Lavaburst|248|-0.5|Hep_G2_rep1|32| |
|6|269|Lavaburst|273|-0.5|Hep_G2_rep1|4| |
|7|317|Lavaburst|324|-0.5|Hep_G2_rep1|7| |
|8|328|Lavaburst|338|-0.5|Hep_G2_rep1|10| |
|9|345|Lavaburst|414|-0.5|Hep_G2_rep1|69| |
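
The `bgn` and `end` values look like bin indices rather than base pairs: the largest `end` in the segmentations above is 4060, which at 20 kb bins is about 81 Mb, roughly the size of chr17. Under that assumption, converting to genomic coordinates is a multiplication by the resolution. `bins_to_bp` below is a hypothetical helper, not a TADselect function.

```python
def bins_to_bp(bgn, end, resolution):
    """Convert a (bgn, end) pair of bin indices to base-pair coordinates."""
    return bgn * resolution, end * resolution


resolution = 20000  # same value as passed to the caller above
bin_intervals = [(31, 45), (63, 109), (112, 116)]  # first rows of the table

bp_intervals = [bins_to_bp(bgn, end, resolution) for bgn, end in bin_intervals]
for (bgn, end), (start_bp, end_bp) in zip(bin_intervals, bp_intervals):
    print(f"bins {bgn}-{end} -> chr17:{start_bp}-{end_bp}")
```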


In [13]:
la._df

| |bgn|caller|end|gamma|label|length|method|
|-|---|------|---|-----|-----|------|------|
|0|31|Lavaburst|45|-0.5|Hep_G2_rep1|14| |
|1|63|Lavaburst|109|-0.5|Hep_G2_rep1|46| |
|2|112|Lavaburst|116|-0.5|Hep_G2_rep1|4| |
|3|137|Lavaburst|143|-0.5|Hep_G2_rep1|6| |
|4|181|Lavaburst|199|-0.5|Hep_G2_rep1|18| |
|5|216|Lavaburst|248|-0.5|Hep_G2_rep1|32| |
|6|269|Lavaburst|273|-0.5|Hep_G2_rep1|4| |
|7|317|Lavaburst|324|-0.5|Hep_G2_rep1|7| |
|8|328|Lavaburst|338|-0.5|Hep_G2_rep1|10| |
|9|345|Lavaburst|414|-0.5|Hep_G2_rep1|69| |


Each call is benchmarked, and the results are available in two forms: a list and a dataframe. CPU time, user time, wall-clock time and memory consumption are recorded in `._benchmark_list`. To get an up-to-date `._benchmark_df`, call `.update_benchmark_df()` first.

In [15]:
la._benchmark_list

[{'ctime': 4.90625,
  'memory': 1041640,
  'utime': 1.25,
  'walltime': 6.866914987564087},
 {'ctime': 3.953125,
  'memory': 1063436,
  'utime': 1.40625,
  'walltime': 5.398842096328735},
 {'ctime': 4.140625,
  'memory': 1063464,
  'utime': 1.515625,
  'walltime': 6.055584907531738},
 {'ctime': 4.75,
  'memory': 1063464,
  'utime': 1.828125,
  'walltime': 7.4982709884643555},
 {'ctime': 4.671875,
  'memory': 1063464,
  'utime': 2.109375,
  'walltime': 7.337170124053955},
 {'ctime': 4.875,
  'memory': 1063464,
  'utime': 1.640625,
  'walltime': 7.135815143585205},
 {'ctime': 4.421875,
  'memory': 1063464,
  'utime': 1.765625,
  'walltime': 6.352094888687134},
 {'ctime': 4.5625,
  'memory': 1063464,
  'utime': 1.46875,
  'walltime': 6.1942479610443115},
 {'ctime': 4.34375,
  'memory': 1063464,
  'utime': 1.296875,
  'walltime': 5.712987899780273},
 {'ctime': 4.171875,
  'memory': 1063464,
  'utime': 1.328125,
  'walltime': 5.536420106887817},
 {'ctime': 4.015625,
  'memory': 1063464,
  '

In [16]:
la.update_benchmark_df()

In [17]:
la._benchmark_df

| |ctime|memory|utime|walltime|
|-|-----|------|-----|--------|
|0|4.90625|1041640|1.25|6.866915|
|1|3.953125|1063436|1.40625|5.398842|
|2|4.140625|1063464|1.515625|6.055585|
|3|4.75|1063464|1.828125|7.498271|
|4|4.671875|1063464|2.109375|7.33717|
|5|4.875|1063464|1.640625|7.135815|
|6|4.421875|1063464|1.765625|6.352095|
|7|4.5625|1063464|1.46875|6.194248|
|8|4.34375|1063464|1.296875|5.712988|
|9|4.171875|1063464|1.328125|5.53642|
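
The benchmark records are plain dictionaries, so they can be summarized without any extra tooling. The sketch below copies the first three records from the list above; `summarize_benchmark` is a hypothetical helper, and the unit of `memory` is not stated in the output, so it is reported as-is.

```python
from statistics import mean

# First three records copied from the benchmark list shown above
benchmark_list = [
    {"ctime": 4.90625, "memory": 1041640, "utime": 1.25, "walltime": 6.866914987564087},
    {"ctime": 3.953125, "memory": 1063436, "utime": 1.40625, "walltime": 5.398842096328735},
    {"ctime": 4.140625, "memory": 1063464, "utime": 1.515625, "walltime": 6.055584907531738},
]


def summarize_benchmark(records):
    """Average the timing fields and take the maximum of the memory field."""
    summary = {
        f"mean_{key}": mean(record[key] for record in records)
        for key in ("ctime", "utime", "walltime")
    }
    summary["peak_memory"] = max(record["memory"] for record in records)
    return summary


summary = summarize_benchmark(benchmark_list)
for key, value in summary.items():
    print(f"{key}: {value}")
```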


# Caller with 2-dim parameter

In [4]:
IS = TADselect.InsulationCaller(datasets_labels=['Hep_G2_rep1'],
                                datasets_files=['../example_data/HepG2_NA_NA_1.20000.chr17.cool'],
                                data_format='cool',
                                chr='chr17',
                                assembly='hg19',
                                resolution=20000,
                                balance=False)

DEBUG:TADselect:Initializing from files: ['../example_data/HepG2_NA_NA_1.20000.chr17.cool']


In [5]:
IS.call(params_dict={'window': np.arange(20000, 100000, 20000), 'cutoff': [0.1, 0.3, 0.5, 0.7]})

DEBUG:TADselect:Calling InsulationCaller with params: {'window': array([20000, 40000, 60000, 80000]), 'cutoff': [0.1, 0.3, 0.5, 0.7]}
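
With two parameters, every `window` value is combined with every `cutoff` value, giving 4 × 4 = 16 settings. The sketch below enumerates that grid with `itertools.product`; that TADselect builds its combinations this way internally is an assumption, although the resulting `(window, cutoff)` tuples match the keys of the segmentation dictionary.

```python
from itertools import product

# Same values as in the call above: 20000, 40000, 60000, 80000
windows = list(range(20000, 100000, 20000))
cutoffs = [0.1, 0.3, 0.5, 0.7]

# Cartesian product: the last argument varies fastest
grid = list(product(windows, cutoffs))

print(len(grid), "parameter combinations")
for window, cutoff in grid[:5]:
    print(window, cutoff)
```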


In [6]:
IS._segmentations

{'Hep_G2_rep1': {(20000, 0.1): 19	949	1
  956	1079	1
  1083	1114	1
  1263	1726	1
  1737	2170	1
  2174	2193	1
  2194	2220	1
  2237	2244	1
  2245	3121	1
  3123	3875	1
  3874	3878	1
  3879	3986	1, (20000, 0.3): 19	949	1
  956	1079	1
  1083	1114	1
  1263	1726	1
  1737	2170	1
  2174	2193	1
  2194	2220	1
  2237	2244	1
  2245	3121	1
  3123	3875	1
  3874	3878	1
  3879	3986	1, (20000, 0.5): 19	949	1
  956	1079	1
  1083	1114	1
  1263	1726	1
  1737	2170	1
  2174	2193	1
  2194	2220	1
  2237	2244	1
  2245	3121	1
  3123	3875	1
  3874	3878	1
  3879	3986	1, (20000, 0.7): 19	949	1
  956	1079	1
  1083	1114	1
  1263	1726	1
  1737	2170	1
  2174	2193	1
  2194	2220	1
  2237	2244	1
  2245	3121	1
  3123	3875	1
  3874	3878	1
  3879	3986	1, (40000, 0.1): 19	953	1
  956	1079	1
  1083	1114	1
  1263	1729	1
  1737	2170	1
  2174	2221	1
  2230	2235	1
  2234	3121	1
  3120	3124	1, (40000, 0.3): 19	950	1
  949	953	1
  956	1079	1
  1083	1114	1
  1263	1729	1
  1737	2170	1
  2174	2221	1
  2235	3121	1
  3120	3124	1, (40000,
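
In the output above, the four cutoffs for window 20000 print identical intervals. When scanning a 2-dim grid it can be useful to find such groups of parameter settings. The sketch below does this on a toy excerpt, again using `(bgn, end)` tuples as a stand-in for the real segmentation objects (an assumption); `group_identical` is a hypothetical helper.

```python
from collections import defaultdict

# Toy excerpt: first intervals for four parameter settings from the output above
segmentations = {
    (20000, 0.1): [(19, 949), (956, 1079)],
    (20000, 0.3): [(19, 949), (956, 1079)],
    (40000, 0.1): [(19, 953), (956, 1079)],
    (40000, 0.3): [(19, 950), (949, 953), (956, 1079)],
}


def group_identical(segs):
    """Group parameter tuples whose interval lists are exactly equal."""
    groups = defaultdict(list)
    for params, intervals in segs.items():
        # A tuple of intervals is hashable, so it can serve as a dict key
        groups[tuple(intervals)].append(params)
    return [sorted(members) for members in groups.values()]


groups = group_identical(segmentations)
for members in groups:
    print(members)
```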