-
Notifications
You must be signed in to change notification settings - Fork 0
/
usage.txt
91 lines (81 loc) · 3.16 KB
/
usage.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
Usage:
./tgif-db <tree file> <coordinates bed file> <optional arguments>
./tgif-dc <tree file> <coordinates bed file> <optional arguments>
Example:
./tgif-db input/tgif-db/tree.txt input/tgif-db/coordinates.bed -o output/tgif-db/
./tgif-dc input/tgif-dc/chr19_tree.txt input/tgif-dc/chr19_100kb.bed -o output/tgif-dc/
Input file format:
<tree file>
1 5 leaf1 input/matrix1.txt
2 5 leaf2 input/matrix2.txt
3 6 leaf3 input/matrix3.txt
4 6 leaf4 input/matrix4.txt
5 7 branch12 N/A
6 7 branch34 N/A
7 -1 root N/A
<coordinates bed file>
chr5 0 40000 0
chr5 40000 80000 1
chr5 80000 120000 2
chr5 120000 160000 3
<input matrix file>
0 0 1000.2
0 1 1201.78
10 1 200.7
See README.md for more details on how these files interact.
Optional arguments:
[For both TGIF-DB & TGIF-DC]
-o <output path and prefix>
Output file path and prefix. Note: will NOT create a directory
if the specified directory does not already exist.
[Default] Output files saved to current directory.
-a <alpha>
Specify an integer value used for the strength of the tree
regularization.
[Default] 1000000 for TGIF-DB and 10000 for TGIF-DC
-l
Generate a file called parameters.log under the specified output
path (or current directory by default) that lists the parameter
settings used.
[Default] No log file of parameters saved.
[For TGIF-DB only]
-b
Generate files boundary_score.txt and background_score.txt under
the specified output path (or current directory by default).
[Default] No boundary-score related files saved.
-p
Generate files boundary_pval.txt and boundary_adj_pval.txt under
the specified output path (or current directory by default).
[Default] No p-value related files saved.
-w <window size>
Specify, in number of basepairs (e.g. 2000000), the size of the
"window" or the dimension of the submatrices where factorization
will be applied.
[Default] 2000000 (i.e., 2MB)
-s <step size>
Specify, in number of basepairs (e.g. 1000000), the size of the
step size for sliding down the block diagonal of the input matrix
to generate each set of submatrices.
[Default] 1000000 (i.e., 1MB)
-t <threshold>
Used to filter out sparse regions with very few interaction
counts and signal. If the threshold value is 0.5 and the window
size is 2000000 (i.e. 2MB), regions that has fewer than half
of its neighbors within 2MB radius are filtered out before
applying TGIF. If the input matrices are sparse (e.g. total read
depth is less than a billion in any of the matrices) AND you
want to apply TGIF-DB to a relatively high resolution (e.g.
10kb), we recommend lowering the threshold to 0.2.
[Default] 0.5
[For TGIF-DC only]
-e
Use this flag if the input matrices for TGIF-DC are O/E count
matrices.
[Default] TGIF-DC assumes input matrices are raw count matrices.
-k <number of clusters>
By default, TGIF-DC operates with k=2 and finds 2 clusters of
genomic regions that correspond to A or B compartments. To find
subcompartments, i.e., more granular clusters, increase k.
-u
Generate output files that contain the factors Us and Vs.
[Default] no factor files are generated.